How do different types of expenditures relate to households’ overall budgets? To characterize this relationship, economists frequently use demand elasticities, which measure the percent change in expenditures on a given item (or group of items) associated with a one percent change in the household's budget. Using household expenditure data, a simple way to generate an elasticity estimate is to regress the logarithm of a type of expenditure on the logarithm of either total expenditures or food expenditures. Use of the logarithmic transformation is ubiquitous in empirical economics research. However, one drawback is that the logarithm is undefined at zero (and at negative values). Thus, when zeros are present in the data, those observations are dropped from any ensuing analysis once the transformation is applied, biasing the resulting estimates relative to the original sample.
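As a minimal sketch of the mechanics described above, the following Python snippet estimates a log-log elasticity and reports how many observations survive the transformation. The data are synthetic and purely illustrative (not survey data), and the names `ols_slope` and `loglog_elasticity` are hypothetical helpers introduced here. Because the zeros in this toy example are assigned at random, dropping them shrinks the sample without biasing the slope; in real data, where zeros are unlikely to be random, the concern is bias as well.

```python
import math
import random

def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (intercept included)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

def loglog_elasticity(budget, spending):
    """Return (slope, n_used); pairs with a non-positive value are dropped
    because the logarithm is undefined for them."""
    kept = [(b, s) for b, s in zip(budget, spending) if b > 0 and s > 0]
    log_b = [math.log(b) for b, _ in kept]
    log_s = [math.log(s) for _, s in kept]
    return ols_slope(log_b, log_s), len(kept)

# Synthetic data (an assumption for illustration): built-in elasticity of 1.15.
rng = random.Random(0)
budget = [math.exp(rng.gauss(2.7, 0.6)) for _ in range(2000)]
spending = [0.1 * b ** 1.15 * math.exp(rng.gauss(0.0, 0.3)) for b in budget]
# Replace roughly a quarter of the spending values with zeros, at random.
spending = [0.0 if rng.random() < 0.26 else s for s in spending]

elasticity, n_used = loglog_elasticity(budget, spending)
print(f"elasticity = {elasticity:.3f} from {n_used} of {len(budget)} households")
```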
To deal with this issue, researchers have recently begun using the inverse hyperbolic sine transformation (IHST) when estimating elasticities, defined as $\mathrm{IHST}(Z) = \log\left(Z + (Z^2 + 1)^{0.5}\right)$. The IHST behaves similarly to a log transformation for positive values, but has the added benefit of remaining defined for zeros and negative values. In recent years, its use over a broad range of applications has become widespread. In a recent paper, Bellemare and Wichman provide guidance on how to convert IHST regression coefficients into conventional elasticities. They note that the IHST is distorted for “small” values (under 10), and suggest scaling the underlying variable (i.e., multiplying it by a constant) before applying the IHST as a way to address the issue.
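To make these properties concrete, here is a small Python sketch using the standard library's `math.asinh`, which computes the same function as the formula above. It shows that the IHST equals zero at zero, is symmetric around zero, and approaches log(2Z) as Z grows, with a visible gap for small values. The function names `ihst` and `ihst_formula` are labels chosen here for illustration.

```python
import math

def ihst(z):
    """Inverse hyperbolic sine, computed with the standard library."""
    return math.asinh(z)

def ihst_formula(z):
    """The same transformation written out as log(z + (z^2 + 1)^0.5).
    Numerically fine for moderate z; math.asinh is safer for large negative z."""
    return math.log(z + math.sqrt(z * z + 1.0))

# The IHST is defined at zero and for negative values, unlike the logarithm.
print("ihst(0)  =", ihst(0.0))
print("ihst(-2) =", ihst(-2.0), " ihst(2) =", ihst(2.0))

# For positive z, compare the IHST with log(2z): the gap shrinks as z grows.
for z in [0.5, 1.0, 10.0, 100.0, 1000.0]:
    gap = ihst(z) - math.log(2.0 * z)
    print(f"z = {z:7.1f}  ihst = {ihst(z):.6f}  log(2z) = {math.log(2.0 * z):.6f}  gap = {gap:.6f}")
```

The shrinking gap is why rescaling small values upward makes the IHST behave more like a logarithm.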
In a forthcoming paper in the American Journal of Agricultural Economics, we study expenditure elasticities of food demand using data from the Nigeria General Household Survey. A primary category of interest in our analysis is expenditures on food eaten away from home (FAFH). Average per capita weekly expenditures on FAFH in the sample were nearly $2.50, or 13% of total food expenditures. However, 26% of households report not consuming any FAFH during the reporting period. A log-log regression model measuring the expenditure elasticity of FAFH would therefore drop 26% of the sample, biasing the estimates by eliminating variation on the extensive margin.
The IHST could be used to include those zero-valued observations; however, there are two issues with doing so. First, the IHST cannot be derived from well-behaved preference relations, making it less likely that observed expenditures reflect rationalizable underlying behaviors. Second, it introduces arbitrary distortions to the data in the form of a shifting value (similar to adding a constant before taking a log) and a further potential distortion from rescaling. Both factors result in implicit, arbitrary weights on the difference between zero and positive consumption that may bias the resulting elasticity estimates.
To illustrate this problem, we focus here on the scaling factor (in the paper, we show that shifting values introduce a similar issue for logarithms as well). We first drop all zero-valued observations, focusing on households that reported positive FAFH consumption. For this subsample, the elasticity resulting from the log-log regression serves as a benchmark for the elasticity of FAFH consumption with respect to total food expenditures. This estimated elasticity is 1.15, suggesting that if total food expenditures were to increase by 10%, then food away from home consumption would increase by 11.5%.
Next, we re-estimate the elasticities using the IHST instead of the log transformation and apply a range of scaling factors prior to estimation. The resulting elasticity estimates are shown in Figure 1, represented by the blue line. Each point along the line represents a separately estimated elasticity, and the x-axis value X is the base-10 logarithm of the scaling factor. Absent rescaling (X = 0), the resulting elasticity estimate is approximately 1, considerably below 1.15. As both FAFH and total food expenditures are scaled upwards by factors of 10, 100, or 1,000 (X = 1, 2, or 3), estimated elasticities converge towards the “true” elasticity of 1.15 (on the right side of the blue line), consistent with Bellemare and Wichman’s suggestion to rescale data when they contain “low” values.
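The convergence on a strictly positive sample can be sketched on synthetic data. The Python snippet below is an illustration under stated assumptions, not a reproduction of our estimates: the data are simulated with a built-in elasticity of 1.15, `ols_slope` is a hypothetical helper, and the code compares raw IHST-on-IHST slopes with the log-log slope without applying the Bellemare and Wichman conversion.

```python
import math
import random

def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (intercept included)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

# Synthetic, strictly positive data (assumed values, not the Nigerian survey).
rng = random.Random(1)
food = [math.exp(rng.gauss(2.7, 0.6)) for _ in range(2000)]
fafh = [0.1 * f ** 1.15 * math.exp(rng.gauss(0.0, 0.5)) for f in food]

# Benchmark: the log-log slope on the positive sample.
benchmark = ols_slope([math.log(f) for f in food], [math.log(v) for v in fafh])
print(f"log-log benchmark: {benchmark:.3f}")

# Scale both variables by 10**power before applying the IHST, then re-estimate.
ihst_slopes = {}
for power in range(4):
    k = 10 ** power
    x = [math.asinh(k * f) for f in food]
    y = [math.asinh(k * v) for v in fafh]
    ihst_slopes[power] = ols_slope(x, y)
    print(f"X = {power} (scale {k:>4}): IHST slope = {ihst_slopes[power]:.3f}")
```

With no zeros in the sample, larger scaling factors move every observation into the range where the IHST is nearly a shifted logarithm, so the slope settles on the log-log benchmark.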
However, in the presence of zeros, we observe new issues. We show this by randomly selecting zero-valued observations from the original data and reintroducing them to the analysis sample such that the share of zeros in the data is 1%, 5%, 10%, and, when incorporating all of the original data, 26%. Each line represents estimates on a sample with a given incidence of zeros; as before, the scaling factor applied to the data before estimation varies along the x-axis.
Troublingly, even scaling factors as large as 1,000 do not lead to convergence in elasticity estimates when zeros are included. As the incidence of zeros increases, the estimates diverge more quickly. The original, full data set is used in the estimates reflected in the orange line. For this sample, and without a clear reason to prefer rescaling by 1,000 instead of 10, one could just as credibly accept an estimated elasticity of 1.4 as one over 2.
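One way to see why zeros prevent convergence is that the IHST of zero stays at zero under any rescaling, while the IHST of every positive value keeps growing with the scaling factor. The following Python sketch illustrates this on synthetic data. It assumes, purely for illustration, that households with lower food expenditures are more likely to report zero FAFH; the helper `ols_slope` and all parameter values are hypothetical, and the raw IHST slope is reported without any elasticity conversion.

```python
import math
import random

def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (intercept included)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

# Synthetic data (assumed, not the survey): zeros are more likely at low budgets.
rng = random.Random(2)
n = 5000
food = [math.exp(rng.gauss(2.7, 0.6)) for _ in range(n)]
fafh = []
for f in food:
    # Assumed logistic probability of any FAFH spending, rising with log budget.
    p_positive = 1.0 / (1.0 + math.exp(-(math.log(f) - 1.5)))
    if rng.random() < p_positive:
        fafh.append(0.1 * f ** 1.15 * math.exp(rng.gauss(0.0, 0.5)))
    else:
        fafh.append(0.0)

share_zero = sum(1 for v in fafh if v == 0.0) / n
print(f"share of zeros: {share_zero:.2f}")

# asinh(k * 0) is 0 for every k, while asinh(k * v) keeps rising for v > 0,
# so the gap between zero and positive spenders widens as k grows.
slopes = {}
for power in range(4):
    k = 10 ** power
    x = [math.asinh(k * f) for f in food]
    y = [math.asinh(k * v) for v in fafh]
    slopes[power] = ols_slope(x, y)
    print(f"X = {power} (scale {k:>4}): IHST slope = {slopes[power]:.3f}")
```

In this sketch the slope keeps rising with each tenfold increase in the scaling factor rather than settling down, which mirrors the qualitative pattern described above; the magnitudes depend entirely on the assumed data-generating process.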
In our paper, we use Monte Carlo simulations to show similar patterns on simulated data, demonstrating that these findings are not unique to the Nigerian GHS. More broadly, we urge researchers to tread carefully when using the IHST to estimate any type of elasticity. In the presence of zeros, which the IHST is, after all, meant to accommodate, elasticity estimates using the IHST may be vulnerable to potentially large and arbitrary distortions like those we illustrate here.
Alan de Brauw is a Senior Research Fellow with IFPRI’s Markets, Trade, and Institutions Division (MTID); Sylvan Herskowitz is an MTID Research Fellow. This work was funded by the IFPRI-led CGIAR Research Program on Agriculture for Nutrition and Health (A4NH).