Attrition Bias in the Estimation of Econometric Models From Panel Data

Russell S. Winer, Columbia University
ABSTRACT - Panel data are often used to estimate the parameters of econometric or other linear models. However, a common problem with panel data is attrition. In this paper, a model is developed that corrects structural econometric models estimated using panel data for possible attrition bias. The model is illustrated using a simultaneous equation structural model and panel data that have an attrition problem. The results indicated that the impact of attrition tended to be on the model's exogenous variables and not on the endogenous variables that were exogenous in the equations.
Russell S. Winer (1981), "Attrition Bias in the Estimation of Econometric Models From Panel Data," in Advances in Consumer Research, Volume 08, ed. Kent B. Monroe, Ann Arbor, MI: Association for Consumer Research, Pages: 220-226.

Advances in Consumer Research Volume 8, 1981      Pages 220-226

INTRODUCTION

Researchers in marketing and other social science disciplines often conduct studies where a sample of individuals is interviewed at two or more points in time. Such data have been referred to in various literatures as longitudinal, cross-sectional, time-series and panel. To marketing researchers, the term panel data usually connotes consumer panel data which are records of household purchasing behavior over time. However, panel studies other than those using consumer panel data have occurred frequently in the marketing literature.

Although panels have been widely used, they are not without problems (Carman 1974). One of these problems is panel attrition or mortality. Except for panels operating under tightly controlled conditions, panel attrition is unavoidable. For example, Charlton and Ehrenberg (1976) report that 88% of their initial sample completed the 25-week panel. The panel utilized by Farley, Howard, and Lehmann (1976) exhibited a 43% drop-out rate over four waves of interviewing spanning fifteen months.

There has been considerable interest in the possibility of panel bias occurring due to attrition. For example, in a panel study of economic attitude study and change covering the years 1954-1957, Sobol (1959) found that renters, people with low income, and people disinterested in the study tended to drop out. Bucklin and Carman (1967) also found the interest factor and its correlates to be highly related to attrition. Ferber (1966) discovered disproportionately high attrition rates related to older age, lower education, self-employment, and high personal asset value. Therefore, assuming an initial probability sample or an alternative sampling approach attempting to represent the population studied, the sample left at the end of the final wave may be biased with respect to the population.

Econometric or other linear models have often been used to analyze panel data. Such applications have been in the areas of evaluating advertising effectiveness (Prasad and Ring 1976, Winer 1980), market segmentation (Frank, Massy, and Boyd 1967, McCann 1974), testing general models of buyer behavior (Farley and Ring 1970, Farley, Howard, and Lehmann 1976), sales management (Futrell and Jenkins 1978), and others.

The effect of attrition on the parameters of linear models has been ignored in studies using panel data. This paper describes the conditions under which parameter bias can occur from attrition and develops a model to correct for the bias. It is demonstrated that the bias is actually specification error in the structural model being estimated. The model correcting for attrition bias simultaneously determines the correlates of attrition and the structural model parameters. The model is estimated using panel data exhibiting attrition.

ATTRITION EFFECTS ON REGRESSION ESTIMATES

Overview

It is assumed that everyone on the panel, both attriters (had they been observed) and non-attriters, follows the same linear model in any given wave,

y = Xb + e    (1)

where

y = N x 1 vector of observations on the dependent variables;

X = N x k matrix of independent variable observations;

b = k x 1 vector of coefficients;

e = N x 1 vector of normal disturbances, e ~ N(0, s_e^2).

That is, if they could be observed, the behavior of the dropouts is not different from that of the people remaining on the panel. Attriters could differ, of course, in the X matrix but not in terms of b. If b differed between attriters and non-attriters, no correction in a non-attriter model could ever be made and population inferences from a sample experiencing attrition could not be drawn.

If attrition probability is a function solely of X or is random, no bias in b is induced. There is, however, some loss of efficiency (i.e., the coefficient standard errors will be larger than they would be with the full sample) since there is a decline in the working sample size. On the other hand, bias will exist in b if the probability of attrition is related to y and, hence, e. Since it is known that interest in the study is a determining factor in the mortality problem, it is possible that a correlate of interest is a prospective dependent variable of a structural model. For example, if behavioral measures such as awareness and attitude are being tracked over time for a product, families continuing to lack knowledge about the product may be more likely to drop out than those that are familiar with it. That the bias exists is demonstrated in Figure 1.

FIGURE 1

EFFECTS OF DEPENDENT VARIABLE-RELATED ATTRITION
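The contrast between the two kinds of attrition can be checked with a small simulation. The sketch below is illustrative only: the slope of 2.0, the sample size, and the two drop-out rules (stay if x > -0.5, stay if y > 0) are hypothetical choices, not values from the study.

```python
import random

def ols_slope(pairs):
    """Slope from a simple regression of y on x (with intercept)."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    return sxy / sxx

rng = random.Random(1)
TRUE_B = 2.0  # hypothetical structural slope

# Everyone, attriter or not, follows the same model y = 1 + b*x + e.
full_panel = []
for _ in range(20000):
    x = rng.gauss(0.0, 1.0)
    y = 1.0 + TRUE_B * x + rng.gauss(0.0, 1.0)
    full_panel.append((x, y))

# Attrition related only to X: members with low x drop out.
stay_on_x = [(x, y) for x, y in full_panel if x > -0.5]
# Attrition related to y (and hence e): members with low y drop out.
stay_on_y = [(x, y) for x, y in full_panel if y > 0.0]

slope_full = ols_slope(full_panel)
slope_x = ols_slope(stay_on_x)
slope_y = ols_slope(stay_on_y)

print(f"full panel:           slope = {slope_full:.3f}")
print(f"attrition on X only:  slope = {slope_x:.3f}")
print(f"attrition on y:       slope = {slope_y:.3f}")
```

Under these assumptions the slope estimated after X-related attrition should stay close to the true value, with only a smaller working sample, while the slope estimated after y-related attrition should be pulled toward zero.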

An implication of this discussion is related to panel operation. For some panels such as commercial consumer panels, households that attrite are replaced by other households. It is clear that the primary criterion for using a family as a replacement should be in terms of a potential dependent variable such as purchasing rather than descriptive variables such as family size, sex, income, etc., if linear models are to be used in analyzing the data. Replacement by household descriptors will cause the sample to resemble the population from which the original panel was drawn but does not cause the bias in b to vanish.

A Model of Attrition

To formalize the above discussion, recent papers in the economics literature will be drawn upon. Hausman and Wise (1976, 1977, 1979), Heckman (1976), and Griliches, Hall, and Hausman (1978) have all developed models for dealing with the general problem of incorporating non-random missing data into econometric analysis of panel data, of which attrition is a special case. Samples from which attrition has occurred are termed censored; in truncated samples, no data at all are available on a group of households as they were systematically excluded. Since data are available for panel dropouts, the probability that an observation is in the sample can be computed from censored samples. That is not the case for truncated samples. This probability is an important part of correcting the structural equation (1) for possible attrition bias.

Model Development

Again, consider equation (1) to be the structural model for the panel members for a given wave. Let D = 1 if y is observed for a panel member, and let D = 0 if y is unobserved due to attrition. Assume that y is observed (D = 1) if

d = ay + Xg + Wq + m > 0   (2)

where y and X are as before and

W = N x m matrix of variables that do not affect y but affect the probability of its being observed;

a, g, q = a scalar, a k x 1 vector, and an m x 1 vector of parameters, respectively, and the disturbance m ~ N(0, s_m^2).

The inequality could be relative to any threshold value but is set > 0 for convenience. The vector, q, could be all zeros. If (1) is substituted into (2), the result is

d = a (Xb + e) + Xg + Wq + m    (3)

   = X (ab + g) + Wq + ae + m.

A reduced form of (3) is

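d = Xr + Wq + d    (4)

where r = ab + g and the d on the right-hand side denotes the composite error ae + m, with standard deviation s_d.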

It follows that

EQUATION    (5)

where F(·) is the standard normal distribution function.

As discussed in the previous section, E(e) may not equal zero since some y values and, hence, errors are truncated when attrition occurs. Given the structural equation (1) and the equation describing the probability of being in the sample, equation (4), then

E(y|X, D=1) = Xb + E(e|X, D=1)   (6)

Of interest is E(e|X, D=1). Since a y-observation and, hence, e, exists only if the latent index d > 0, i.e., only if the error in (4) exceeds -Xr - Wq, the conditional expectation of the error can be rewritten

E(e|X, D=1) = E(e|X, d > - Xr - Wq)   (7)

From Johnson and Kotz (1970 p. 81, 1972 p. 112),

EQUATION    (8)

Equation (8) results from the fact that E (e) is affected by d being truncated from below.

Since s_d > 0, the critical quantity in (8) is the covariance between the structural error, e, and the attrition probability error, d. If cov(e,d) ≠ 0, then the whole second term on the right side of (8) is non-zero. As noted by Heckman (1979), the general sample truncation problem can be equated with ordinary specification error arising from omitted variables in regression analysis. If cov(e,d) = 0, the term drops out from (6) and b is unbiased.

EQUATION    (9)

Then, by adding an error term, equation (6) becomes

E(y|X, D=1) = Xb + lZ + x   (10)

Therefore, a convenient test for attrition bias is the significance of l; if it is significant, the omitted variable, Z, is an important explanatory variable and needs to be included to estimate b without bias.
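For readers who want the correction term written out, the following is a sketch of the quantities behind equations (5), (8), and (9). It assumes the standard form of the Heckman (1979) result under joint normality rather than reproducing the paper's own displays. The letters follow the text: r and q are the coefficients of (4), s_d is the standard deviation of the error in (4), cov(e,d) is the covariance between e and that error, F is the standard normal distribution function, and f is the standard normal density. The shorthand c is introduced here only for convenience.

```latex
% Assumed standard selection-model form; c is shorthand introduced for this sketch.
\begin{align*}
c &= \frac{Xr + Wq}{s_d}, \\
\Pr(D = 1 \mid X, W) &= \Pr(d > 0) = F(c), \\
E(e \mid X, W, D = 1) &= \frac{\operatorname{cov}(e, d)}{s_d} \cdot \frac{f(c)}{F(c)}, \\
l &= \frac{\operatorname{cov}(e, d)}{s_d}, \qquad Z = \frac{f(c)}{F(c)}.
\end{align*}
```

Read this way, l is zero exactly when cov(e,d) is zero, which matches the statement that the term then drops out of (6), and Z is larger for panel members whose predicted probability of remaining is low.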

Estimation

Several approaches have been developed to estimate the parameters of (10). The most complex approach is to use maximum likelihood methods (Hausman and Wise 1979, Griliches, Hall, and Hausman 1978). However, Heckman (1976) proposed a relatively simple estimator that is useful for exploratory data evaluation.

At a given point in time (wave) in the panel, the steps of Heckman's procedure are:

1. Estimate equation (4) using probit analysis (Finney 1964) on the whole sample. That is, at that point in time, D=1 for a person still in the sample, D=0 for an attriter.

2. From the parameter estimates of step 1, estimate Z as shown in (9) for each member of the sample still left.

3. The estimate of Z, Z', is then used in equation (10) to estimate l and b by ordinary least squares (OLS).

This procedure produces consistent estimates of b and l. However, if l ≠ 0, s_x^2, the error variance from (10), is underestimated because the error x in (10) is heteroscedastic (Heckman 1976, p. 480). This will produce inflated estimates of the significance of b and l. Therefore, generalized least squares (GLS) rather than OLS can be used in step 3 of the estimation procedure (Heckman 1979). However, using OLS in step 3 makes the attrition bias correction factor straightforward to implement using commonly available computer routines.
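The three steps can be sketched in a few lines of standard-library Python. This is a minimal illustration on simulated data, not the routine used in the study. The data-generating values (slope 2.0, cov(e,d) = 0.8, one X variable and one W variable) are hypothetical, Z is assumed to be the ratio f/F evaluated at the fitted probit index, and the helper names solve, ols, and probit are introduced here. Standard errors and the GLS refinement are omitted.

```python
import random
from statistics import NormalDist

NORMAL = NormalDist()  # standard normal: pdf plays the role of f, cdf of F

def solve(A, b):
    """Solve A x = b for a small system by Gaussian elimination with pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def ols(rows, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

def probit(rows, d, max_iter=25, tol=1e-8):
    """Probit maximum likelihood by Newton-Raphson, starting from zero."""
    k = len(rows[0])
    coef = [0.0] * k
    for _ in range(max_iter):
        grad = [0.0] * k
        neg_hess = [[0.0] * k for _ in range(k)]
        for r, di in zip(rows, d):
            index = sum(c * v for c, v in zip(coef, r))
            sign = 1.0 if di == 1 else -1.0
            lam = sign * NORMAL.pdf(index) / max(NORMAL.cdf(sign * index), 1e-12)
            weight = lam * (lam + index)
            for i in range(k):
                grad[i] += lam * r[i]
                for j in range(k):
                    neg_hess[i][j] += weight * r[i] * r[j]
        step = solve(neg_hess, grad)
        coef = [c + s for c, s in zip(coef, step)]
        if max(abs(s) for s in step) < tol:
            break
    return coef

# Simulated panel wave (all values hypothetical).
rng = random.Random(0)
TRUE_B = 2.0
panel = []
for _ in range(20000):
    x, w = rng.gauss(0, 1), rng.gauss(0, 1)
    d_err = rng.gauss(0, 1)                      # error of the attrition equation
    e = 0.8 * d_err + 0.6 * rng.gauss(0, 1)      # structural error, cov(e, d) = 0.8
    stays = 1 if 0.5 + x + w + d_err > 0 else 0  # D = 1: still on the panel
    panel.append((x, w, stays, 1.0 + TRUE_B * x + e))

# Step 1: probit of D on X and W over the whole sample.
coef = probit([[1.0, x, w] for x, w, _, _ in panel], [s for _, _, s, _ in panel])

# Step 2: estimate Z for each member still in the sample.
def z_hat(x, w):
    index = coef[0] + coef[1] * x + coef[2] * w
    return NORMAL.pdf(index) / NORMAL.cdf(index)

stayers = [(x, w, y) for x, w, s, y in panel if s == 1]
y_obs = [y for _, _, y in stayers]

# Step 3: OLS of y on X and the estimated Z, next to the uncorrected regression.
naive = ols([[1.0, x] for x, _, _ in stayers], y_obs)
corrected = ols([[1.0, x, z_hat(x, w)] for x, w, _ in stayers], y_obs)

print(f"uncorrected slope: {naive[1]:.3f}")
print(f"corrected slope:   {corrected[1]:.3f}, l estimate: {corrected[2]:.3f}")
```

In this simulated setting the uncorrected slope should fall below the true value and the corrected slope should lie close to it. A pure standard-library implementation was chosen only to keep the sketch self-contained; a statistical package with probit and regression routines would serve the same purpose.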

DATA

The data utilized in this study are from Farley, Howard, and Lehmann (1976). A panel was constructed from a national probability sample of respondents expressing some interest in purchasing subcompact cars within the subsequent two years. Four waves of telephone interviews were conducted over a fifteen-month period with people who would be principal drivers of the new subcompact car if purchased.

Because many questions were asked in each wave and there were eight brands in question, the panel was divided into four groups which were each questioned about only four of the eight brands. Information about the brand used in this study, the Chevrolet Vega, was requested from all four groups. Table 1 demonstrates the attrition rate for the four groups over the four waves.

TABLE 1

SAMPLE SIZES FOR THE GROUPS OF RESPONDENTS OVER TIME

As can be seen, the attrition problem with this panel was substantial despite the fact that the panel was of ordinary design compared to other longitudinal data bases.

MODELS

Structural Model

The model employed is based on the Howard-Sheth model of buyer behavior (Howard and Sheth 1969) and is a variant of that utilized by Farley, Howard, and Lehmann (1976). It is a four-equation system explaining the endogenous variables Intention, Attitude, Confidence, and Brand Comprehension. The model has the form

EQUATION   (11)

where the ij index refers to equation i and variable j, y is a 4-element vector of the endogenous variables, x is a 12-element vector of exogenous variables, and m is a vector of disturbances. The variables can be found in Table 2.

TABLE 2

VARIABLE LIST

Attrition Model

As noted earlier, previous research has found that interest in the study tends to be related to the probability of a panel member dropping out. Interest in the study, when unmeasured, is likely to be a function of demographic socioeconomic variables and other exogenous factors. Therefore, the attrition model is assumed to be composed of the reduced-form variables from the system described by equation (11). That is, by some matrix algebra, equation (11) can be represented by

y' = -B^-1 r' x' + B^-1 m   (12)

    = rx' + v.

In addition, there is empirical evidence that the four groups comprising the panel differ with respect to means of endogenous variables, some slopes of exogenous variables (Farley, Katz, Lehmann, and Winer 1980), and attrition rates (Table 1). To partially account for this, group dummy variables to reflect possible mean differences in attrition probabilities were added. The attrition model thus becomes

EQUATION   (13)

where D is the 0-1 index of being in or out of the sample for a given wave, the independent variables are from Table 2, and the Gi are 0-1 dummies representing three of the four groups.

EMPIRICAL RESULTS

Attrition Model

The results of the probit estimation of the attrition model (13) for waves 2, 3, and 4 are in Table 3. Since sex, age, income, education, number of drivers, and child are measured only once in the first wave, those variables must be repeated for each wave. However, intention through propensity (Table 3) are measured in each wave.

TABLE 3

PROBIT RESULTS FROM ATTRITION

Since panel dropouts obviously are not measured on these variables in the wave they attrite and on, a decision had to be made on the appropriate values to use. Therefore, for all panel members, the wave one values were employed. The only difference between the data matrices for the wave 2, 3, and 4 attrition models was thus in the 0-1 dependent variable.
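The data layout described above can be sketched as follows. The respondent identifiers, the two covariates, and their values are hypothetical, and the sketch assumes that a member who drops out does not re-enter in a later wave.

```python
# Hypothetical records: respondent id -> (wave-1 covariates, last wave completed).
panel = {
    "r1": ({"age": 34, "income": 18.0}, 4),
    "r2": ({"age": 61, "income": 9.5}, 1),
    "r3": ({"age": 45, "income": 22.0}, 3),
    "r4": ({"age": 29, "income": 14.0}, 2),
}

def attrition_design(panel, wave):
    """Return (X, D): wave-1 covariates for everyone, and the 0-1 index for `wave`."""
    ids = sorted(panel)
    X = [[panel[i][0]["age"], panel[i][0]["income"]] for i in ids]
    # D = 1 if the member was still in the sample at this wave, else 0.
    D = [1 if panel[i][1] >= wave else 0 for i in ids]
    return X, D

designs = {wave: attrition_design(panel, wave) for wave in (2, 3, 4)}

for wave, (X, D) in designs.items():
    print(f"wave {wave}: D = {D}")
```

The covariate matrix is identical across the three waves and only the 0-1 vector changes, which is the structure the probit models for waves 2, 3, and 4 share.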

As can be seen from Table 3, the reduced form variables are only weakly related to the probability of attrition for waves 2 and 3. However, by wave 4, more people had dropped out, providing increased information on which to base the relative effects of the exogenous variables. The results indicate that age is negatively related to staying on the panel and income has a positive influence. These results are consistent with the findings in the literature reported earlier. A weaker effect was that people with higher confidence in their ability to judge the Vega tended to drop out. Perhaps this reflects a feeling that there was little additional knowledge to be gained from being on the panel, which translates to a low interest factor.

Structural Model

As described earlier, the probit results are used to create a new independent variable to be added as an explanatory variable to the equations (11). The implications of panel attrition can be found in Tables 4-6 by comparing the results with and without the attrition bias correction factor. OLS was used to estimate each equation as the model is assumed to be recursive with independent inter-equation errors. Tables 4-6 may be found in the appendix at the conclusion of this paper.

In this instance, attrition appears to have a modest impact on the structural equation estimates. The correction factor is significant in three cases--the Confidence equation in wave 2, the Confidence equation in wave 3, and the wave 4 Attitude equation. In all three cases, the impact of attrition is selective; not all variable coefficients are affected. In fact, the effects seem to be centered on the pure exogenous variables rather than the endogenous variables that are exogenous. In the Confidence-wave 3 equation, the greatest impact is on the age variable as it doubles in size and becomes significant. In the Attitude-wave 4 equation, the income variable assumes a much larger value and becomes significant. If the model were being used for segmentation purposes, serious misallocations of resources or missed opportunities could result.

DISCUSSION

It is difficult to determine when attrition should be modeled as part of a structural system. Hausman and Wise (1979) provide empirical evidence supporting the notion that bias is more likely to be present when the structural model is misspecified. This implies that in early stages of model development or when simplicity is sought at the expense of completeness, attrition bias can seriously distort results. The model utilized in this paper has been thoroughly tested and found robust to minor perturbations in specification. Therefore, it is not surprising that correcting for attrition did not have a dramatic impact on the empirical results. Even with this relatively well-specified model, however, the impact of panel attrition can be seen.

Several other issues can be raised. First, if a wave-by-wave analysis is not desired but the analyst wishes to pool, only panel members completing all waves can be used. Therefore, the implication is that only one attrition model need be estimated--the one corresponding to the wave four analysis of Table 3--since interest is focused on predicting the probability of attrition by the final wave. Second, if attrition and missing data occur, both problems could be modeled and two terms inserted in each structural equation. There may be difficulty, however, in separately identifying missing data and attrition functions.

CONCLUSION

This paper has developed and estimated a model that accounts for attrition bias in panel data. It was found that the main impact of attrition was on exogenous variables and not endogenous variables that were exogenous in the equations. Marketing studies employing panels should be aware that attrition can affect model parameter estimates, particularly if the model structure is not well-known.

APPENDIX

TABLE 4

STRUCTURAL EQUATION ESTIMATES: WAVE 2

TABLE 5

STRUCTURAL EQUATION ESTIMATES: WAVE 3

TABLE 6

STRUCTURAL EQUATION ESTIMATES: WAVE 4

REFERENCES

Bucklin, Louis P. and Carman, James M. (1967), The Design of Consumer Research Panels: Conception and Administration of the Berkeley Food Panel, Berkeley, California: Institute of Business and Economic Research, University of California.

Carman, James M. (1974), "Consumer Panels," in Robert Ferber ed., Handbook of Marketing Research, New York: McGraw Hill.

Charlton, P. and Ehrenberg, A. S. C. (1976), "An Experiment in Brand Choice," Journal of Marketing Research, 13, 152-160.

Farley, John U., Howard, John A. and Lehmann, Donald R. (1976), "A 'Working' System Model of Car Buyer Behavior," Management Science, 23, 235-247.

Farley, John U., Katz, Jerrold P., Lehmann, Donald R., and Winer, Russell S. (1980), "Parameter Stationarity in a Consumer Decision Model," working paper, Columbia University.

Farley, John U. and Ring, L. Winston (1970), "An Empirical Test of the Howard-Sheth Model of Buyer Behavior," Journal of Marketing Research, 7, 427-438.

Ferber, Robert (1966), The Reliability of Consumer Reports of Financial Assets and Debts, Urbana, Illinois: Bureau of Economic and Business Research, University of Illinois.

Finney, D. J. (1964), Probit Analysis, 2nd ed., Cambridge, U.K.: Cambridge University Press.

Frank, Ronald E., Massy, William F. and Boyd, Harper W. (1967), "Correlates of Grocery Product Consumption Rates," Journal of Marketing Research, 4, 184-190.

Futrell, Charles M. and Jenkins, Omar C. (1978), "Pay Secrecy Versus Pay Disclosure for Salesmen: A Longitudinal Study," Journal of Marketing Research, 15, 214-219.

Griliches, Zvi, Hall, Bronwyn H. and Hausman, Jerry A. (1978), "Missing Data and Self-Selection in Large Panels," Annales de l'INSEE, 30-31, 137-176.

Hausman, Jerry A. and Wise, David A. (1976), "The Evaluation of Results from Truncated Samples: The New Jersey Income Maintenance Experiment," Annals of Economic and Social Measurement, 5, 421-445.

Hausman, Jerry A. and Wise, David A. (1977), "Social Experimentation, Truncated Distributions, and Efficient Estimation," Econometrica, 45, 919-938.

Hausman, Jerry A. and Wise, David A. (1979), "Attrition Bias in Experimental and Panel Data: The Gary Income Maintenance Experiment," Econometrica, 47, 455-473.

Heckman, James J. (1976), "The Common Structure of Statistical Models of Truncation, Sample Selection and Limited Dependent Variables and a Simple Estimator for Such Models," Annals of Economic and Social Measurement, 5, 475-492.

Heckman, James J. (1979), "Sample Selection Bias as a Specification Error," Econometrica, 47, 153-161.

Howard, John A. and Sheth, Jagdish N. (1969), The Theory of Buyer Behavior, New York: John Wiley.

Johnson, Norman and Kotz, Samuel (1970), Continuous Univariate Distributions - 1, New York: John Wiley.

Johnson, Norman and Kotz, Samuel (1972), Continuous Multivariate Distributions, New York: John Wiley.

McCann, John M. (1974), "Market Segment Response to the Marketing Decision Variables," Journal of Marketing Research, 11, 399-412.

Prasad, V. Kanti and Ring, L. Winston (1976), "Measuring Sales Effects of Some Marketing Mix Variables and their Interactions," Journal of Marketing Research, 13, 391-396.

Sobol, M. G. (1959), "Panel Mortality and Panel Bias," Journal of the American Statistical Association, 54, 52-68.

Winer, Russell S. (1980), "Estimation of a Longitudinal Model to Decompose the Effects of an Advertising Stimulus on Family Consumption Behavior," Management Science, 26 (in press).
