# Statistical Estimation and Consumer Research

To cite:

Albert R. Wildt (1979), "Statistical Estimation and Consumer Research", in NA - Advances in Consumer Research Volume 06, eds. William L. Wilkie, Ann Arbor, MI: Association for Consumer Research, Pages: 569-573.

Direct URL:

http://acrwebsite.org/volumes/9620/volumes/v06/NA-06

INTRODUCTION

Applications of rigorous research methodologies and statistical estimation techniques to problems of social consequence by consumer researchers are on the increase, as they probably should be. Often research of this type is intended to provide policy-makers with information to aid in the setting of public policy. In these cases it might be argued that researchers have a special obligation to provide accurate and valid information due to: (1) the importance, in terms of the degree of possible impact, of policy decisions, (2) the probable unfamiliarity of policy-makers with the techniques employed, as well as their possible inability to evaluate the resulting information, and (3) the likely impact of poorly conducted research on the receptiveness of policy-makers to this type of input for future decisions.

The discussion that follows considers three papers, all of which involve statistical estimation in the context of empirical research dealing with social and/or public policy issues. Barnes and Bourgeois (1978) consider the social problem of alcohol consumption in a study "...designed to explore primarily the question of whether increased governmental regulation of advertising might be expected to contribute to a reduction in per capita consumption of beverage alcohol products." Their research examines aggregate data (annual data for ten Canadian provinces) in an attempt to identify variables which have significant impact on annual per capita alcohol consumption during the period 1966 to 1973. Reizenstein and Barnaby (1978) examine the energy issue with the intent of providing insight to policy-makers. They employ a mail survey and estimate consumer response, in terms of quantity consumed, to hypothetical increases in the price of gasoline. Further, they group respondents according to similarities in response patterns into three market segments and identify these segments in terms of selected AIO measures, media and personal information source utilization variables and demographic measures. This is done in an effort to determine if the segments are "...sufficiently unique to warrant the formulation of market-oriented conservation strategies and tactics for specific groups." And Dwyer (1978) examines and summarizes research in the area of drug compliance (failure of patients to take medications as directed), mentions two sample-related problems with research in this area (small sample sizes and non-representative samples due to the nature of the sampling procedures employed) and indicates that in nonexperimental studies, such as those in the drug compliance area, where exploratory analysis procedures are employed it is very possible for otherwise spurious results to be interpreted as significant. 
Further, Dwyer suggests the use of the jackknife procedure to handle this latter problem and provides an illustration within the drug compliance area.

Two approaches come to mind for organizing the discussion of these papers. They are: (1) focus the discussion on the unique role of consumer research and statistical estimation relative to the "real" world of public policy setting, and (2) address selected statistical problems in estimation as they relate to consumer research and these three papers. Beckwith's (1978) paper in this same volume takes the former approach while the latter approach is pursued in this paper.

Each of the three papers considers estimation in one context or another. Barnes and Bourgeois are concerned with estimating the interrelationships among aggregate "social" variables, Reizenstein and Barnaby, among other things, are estimating consumer response through self-report intention measures, and Dwyer addresses the problem of the validation of estimates obtained when exploratory analysis methods are applied to samples of limited size. In this discussion paper, these specific estimation issues are considered under the general topic areas of: selection of appropriate models for estimation, measurement and estimation, and estimation using exploratory analysis procedures. This approach allows for both a brief general discussion of a set of issues in estimation and inference that relate to consumer research in general, and the detailed consideration of selected aspects of the three papers under discussion.

SELECTION OF APPROPRIATE MODELS FOR ESTIMATION

The model employed in any statistical analysis should be selected on the basis of its applicability to the specific problem under investigation. Unfortunately in some instances, the familiarity, or lack thereof, of the researcher with particular analysis models is the determining factor. This is not to imply that every researcher need be an expert on all analysis techniques, but to point out that it is important to be aware of the limitations of techniques so that inappropriate applications are avoided.

In many situations, especially when dealing with observational data, some doubt exists concerning the "correct" or appropriate statistical model. In instances where any doubt, no matter how small, exists, the researcher should, at a minimum, carefully examine the assumptions of the statistical model before it is employed and empirically test those same assumptions after estimation. [The testing of a model should also include considerations relating to the consistency of the estimated model with relevant behavioral or economic theory.] In some situations there are statistical model comparison procedures available for distinguishing among alternate models and, where applicable, these should be used.

The Problem of Alcohol Consumption

In the Barnes and Bourgeois study dealing with alcohol consumption, the selection of an appropriate statistical model is clearly dependent upon the researcher's perception of the problem. Three modeling issues relating to this study are discussed below: (1) the appropriate degree of aggregation, (2) model specification relative to expected impact of variables, and (3) special considerations when estimating aggregate models.

Aggregate vs. Individual Models. The way in which the decision-maker frames the problem should dictate the analysis approach taken in any type of applied research. The alcohol problem is often described in relation to the heavy or chronic drinker. If this is viewed to be the appropriate focus, then one may argue that the researcher should investigate the heavy drinker and not average consumption. Conversely, the problem may be considered in relation to young drinkers, in which case the focus of the research should be toward this particular segment. In either case, the level of aggregation employed in the present study could be questioned.

The authors do consider the aggregation issue and briefly address it by relating average alcohol consumption to the consumption of heavy drinkers. They point out that available data indicates that the distribution of alcohol consumption is approximately lognormal. Assuming the stability of this distribution over time, they contend that a shift in the parameter of central tendency of this distribution, average consumption, will be accompanied by a corresponding shift in the consumption of the heavy drinker. While this may or may not be a valid assumption, a possibly more plausible explanation for the use of aggregate data is the cost and difficulty of acquiring individual-level data as opposed to the availability of aggregate data.
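The authors' aggregation argument can be traced numerically. The sketch below assumes a lognormal consumption distribution whose dispersion stays fixed while average consumption shifts, and reports the share of the population above a "heavy drinker" cutoff. The dispersion `SIGMA`, the cutoff `HEAVY_CUTOFF`, and the three mean values are hypothetical and are not taken from Barnes and Bourgeois; the sketch shows only what follows from the stability assumption and says nothing about whether that assumption holds.

```python
import math

def mu_for_mean(mean, sigma):
    """Log-scale location mu that gives a lognormal with the requested mean."""
    return math.log(mean) - 0.5 * sigma ** 2

def share_above(threshold, mu, sigma):
    """P(X > threshold) when log(X) is normal with mean mu and s.d. sigma."""
    z = (math.log(threshold) - mu) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2.0))

SIGMA = 0.9         # hypothetical dispersion, assumed stable over time
HEAVY_CUTOFF = 30   # hypothetical cutoff, in arbitrary consumption units

def heavy_share(mean_consumption):
    """Share of the population above the cutoff for a given average consumption."""
    mu = mu_for_mean(mean_consumption, SIGMA)
    return share_above(HEAVY_CUTOFF, mu, SIGMA)

# Hypothetical averages: the heavy-drinker share rises with the mean.
for mean_consumption in (8.0, 9.0, 10.0):
    print(mean_consumption, round(heavy_share(mean_consumption), 4))
```

If the dispersion were to change along with the mean, the link between average consumption and the upper tail would no longer be this direct, which is why the stability assumption carries the argument.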

Expected Impact. It might be effectively argued that the impact of most contemplated governmental regulation concerning this problem, especially that dealing with advertising, would be such that immediate, dramatic effects are not to be expected. That is, the timing of the impact, if any, might be some time in the future. For example, policy actions may result in discouraging present non-problem drinkers from becoming problem drinkers. Issues such as this are important and have serious implications for model specification.

Special Considerations for Aggregate Economic (or Social) Data. Much of consumer research is conducted with individual-level data. But, for certain problems, the use of aggregate data represents a desirable alternative. However, when using aggregate social or economic data it is important that the researcher adjust the model and estimation procedures to the unique characteristics of that data. A considerable amount of investigation into appropriate analysis methods when using aggregate economic data has been done by researchers in the area of econometrics. Econometrics is a special field of economics that deals with the measurement of relationships among economic (and social) variables. Goldberger (1964, p. 1) points out that, "although econometric theory draws heavily on the mainstream of mathematical statistics, it has a distinctive flavor which is attributable to characteristic features of economics. One feature is that as a rule observations on economic phenomena are not obtained by controlled experiments; consequently special methods for the analysis of nonexperimental data have to be devised. Another feature is that there is a rich body of theory of economic behavior; consequently special methods are devised to take advantage of this."

A number of issues frequently encountered in empirical econometrics are applicable to consumer research. Here, a limited number of those issues, which apply to the Barnes and Bourgeois study, are briefly discussed.

Many economic problems involve observations on cross-sections of time-series. Often in these instances there are unique characteristics of the cross-sections which impact directly on the dependent measure or possibly moderate the influence of the independent measures on the dependent variable(s). Also, observations from adjacent time periods often tend to be related to one another. To address these problems, special estimation procedures have been developed for use with time-series data from multiple cross-sections. (The interested reader may refer to Maddala [1977] for a brief discussion of this issue.)

In regard to the Barnes and Bourgeois study, it should be pointed out that the number of useable observations is exceedingly small (approximately 40) and the explicit consideration of cross-sectional parameters in the estimation model would have severely reduced the degrees of freedom, thus causing other problems. However, the questions of cross-sectional differences and temporal relationships could still have been considered by the careful examination of the model which was estimated (e.g., residual analysis).
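One simple form that such a residual analysis could take is a check for first-order serial correlation within each cross-section. The sketch below computes the Durbin-Watson statistic for time-ordered residuals; values near 2 suggest little first-order autocorrelation, values toward 0 suggest positive and values toward 4 negative autocorrelation. The residuals and province labels are hypothetical, and this is offered as one possible check rather than the specific procedure any of the authors used.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals ordered in time."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    if denominator == 0:
        raise ValueError("residuals are all zero")
    return numerator / denominator

# Hypothetical residuals from a pooled regression, grouped by province
# and ordered by year within each province.
residuals_by_province = {
    "province_A": [0.4, 0.5, 0.3, 0.1, -0.2, -0.4, -0.3, -0.4],
    "province_B": [0.3, -0.2, 0.4, -0.3, 0.2, -0.4, 0.1, -0.1],
}

for name, res in residuals_by_province.items():
    print(name, round(durbin_watson(res), 2))
```

With only eight annual observations per province the statistic is a rough diagnostic at best, but a pattern that repeats across provinces would be a signal worth pursuing.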

Also, when multiple dependent measures are employed the question of the interrelationships among these dependent measures arises. These interrelationships can assume any of a number of forms. For example, the multiple dependent measures may be functionally related, say a linear combination of each other, such as total alcohol consumption equaling the weighted sum of the consumption of beer, wine and spirits. (For a discussion of the impact of relationships such as these on modeling and estimation refer to Koehler and Wildt [1978] and McGuire and Weiss [1976].) In other applications the relationship may be indirect, resulting in correlations among the dependent measures due to variables and relationships not explicitly included in the model (see Zellner [1962] for a discussion of estimation procedures in this case). In any event, relationships such as these usually call for simultaneous, rather than separate, modeling and estimation procedures.

The implications of using ordinary least squares (OLS) regression in situations characterized by the more complex models described above may be severe and are worth mentioning. At best, OLS will yield unbiased but inefficient estimators. To complicate matters, the estimated standard errors of those coefficients will be biased, rendering inferences based on OLS estimates tenuous. At worst, the model will be misspecified and biased or even meaningless coefficient estimates will be obtained. Therefore, it is in the best interest of the researcher to carefully consider these issues.

Basis for Statistical Inference

An issue closely related to the selection of an analysis model is the basis used for statistical inference when multiple independent variables are considered. The options available to the researcher include simple, partial and sequential approaches. The simple approach considers each variable separately and ignores possible interrelationships with other variables. Reizenstein and Barnaby adopt this approach when they use a series of univariate F-tests to determine which variables best describe potential market segments. The partial approach considers only the unique aspects of the selected variable after accounting for all other variables. The regression models used by Barnes and Bourgeois, and Dwyer are examples of this approach. The sequential approach requires the researcher to develop a hierarchy within the set of variables and considers the unique aspects of the selected variable after accounting for only the variables above that variable in the hierarchy. Each approach may be useful and has a place in consumer research, although the partial approach is more common with observational studies, such as the three presently under consideration.

Specific analysis procedures often incorporate (either directly or due to available computational procedures) one of these three bases of inference. Therefore, the researcher must make a conscious decision concerning the basis of statistical inference when he selects an analysis model.

MEASUREMENT AND ESTIMATION

Empirical research frequently necessitates the development of operational definitions of variables and the construction of measurement scales based on these definitions. In some cases these measurements are less than precise and might be better thought of as estimates (representations) of an underlying phenomenon or concept. Also, the measurement methods used impact on the nature of the measurements and, therefore, on the accuracy and validity of subsequent analyses.

Gasoline Consumption and Purchase Intentions

A critical measurement and estimation issue in the research reported by Reizenstein and Barnaby is the estimation of consumer response to hypothetical or anticipated changes in the levels of decision variables. This is a difficult problem and, if constrained to a survey research design, self-report intention measures appear to be a reasonable approach. However, a major concern here is the accuracy of consumers' responses to questions concerning their behavioral intentions with respect to some future or hypothetical action. This is especially true if the action relates to conditions with which the respondent is unfamiliar or where existing or perceived social pressures favor a particular response. In the latter case, respondents who are more socially aware and concerned may yield more biased responses. The possibility of this problem affecting the accuracy and validity of the Reizenstein and Barnaby research is very real and a closer look at the measures employed in this study is warranted.

Unfortunately, the measurement instrument used in this study is not provided. However, under almost any conceivable measurement scale one would doubt the ability of respondents to give accurate responses. [The authors defend the "potential" validity of their intentions measure by arguing that the scale used has many properties of a purchase probability scale and refer to Juster (1966) for support concerning the validity of the purchase probability scale. However, it should be pointed out that Juster's work involved consumer durables and considered the probability of purchase, not the number of units purchased. Also, according to Green (1977, p. 106) "...the Bureau of the Census discontinued the Consumer Buying Expectations Survey [which employed such intentions measures] in 1973 because it was concluded that the data it provided were only 'marginally' useful."] Support for the accuracy of the intentions measure might have been provided through the measurement of variables relating to commitments by the consumer which might correlate with intentions. No such measures were reported. The large number of consumers indicating a 7-12 gallons per week reduction in consumption might provide some indication of the accuracy of the intentions measure. Assuming that the respondents were responding to reductions in personal (i.e., nonbusiness) consumption of gasoline, how realistic are these results? What would a 7-12 gallons per week reduction mean to you? According to the __Statistical Abstract of the United States__ (Table No. 995, page 597) average annual U.S. gasoline consumption per car (includes taxicabs and motorcycles) in 1974 was 676 gallons, or 13 gallons per week. Given this average consumption, how realistic is it to expect 10% to 30% of the drivers to reduce consumption 7-12 gallons per week? On carefully considering the problem, it is probably reasonable to expect only small changes in consumption, especially in the short-run, in reaction to a price increase. And lastly, the aggregate data presented in Figure A (Reizenstein and Barnaby, 1978) indicate some minor data inconsistencies which should be carefully examined on an intrarespondent level.
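The plausibility check above comes down to a few lines of arithmetic. The sketch below uses only the figures quoted in the text (676 gallons per car per year, and stated reductions of 7 to 12 gallons per week); the 52-week year and the variable names are assumptions made for illustration.

```python
ANNUAL_GALLONS_PER_CAR = 676   # 1974 average quoted in the text
WEEKS_PER_YEAR = 52            # assumed conversion factor

weekly_gallons = ANNUAL_GALLONS_PER_CAR / WEEKS_PER_YEAR

def implied_cut(reduction_gallons):
    """Stated weekly reduction as a fraction of average weekly consumption."""
    return reduction_gallons / weekly_gallons

low, high = implied_cut(7), implied_cut(12)
print(f"{weekly_gallons:.0f} gallons/week; implied cut of {low:.0%} to {high:.0%}")
```

For a driver at the average, a 7-12 gallon reduction amounts to giving up roughly 54% to 92% of weekly consumption, which is the reason for doubting the stated intentions.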

While it is easy to point out the existence of the problem, it is a much more difficult matter to find a solution. Some consumer researchers may suggest that experimental methods might yield more accurate indications of purchase intentions. However, given time and cost constraints, it is extremely difficult, if not impossible, to develop an accurate measure of purchase intention, especially within the context of survey research; though careful questionnaire design may go a long way in improving the quality of the intentions measure. It should also be noted that the research reported here is probably not very sensitive to inaccuracies in the prediction of the exact amount of decrease in gasoline consumption. The approach used by the authors requires only that respondents be grouped according to their relative propensity to decrease consumption. Therefore it would appear that only ordinal properties are required of the intentions measure.

Choice of Operational Definition of Variables

A somewhat different problem arises in connection with the Barnes and Bourgeois study and relates to the measurement of monetary quantities. When using time series data, it is usually advisable to measure monetary variables in terms of constant dollars rather than current dollars. When measured in current dollars, monetary variables often exhibit positive correlations to each other because of a common movement due to inflationary conditions. In the case at hand it is not clear whether price and per capita income are measured in current or constant dollars. But if measured in current dollars, the positive trend of these variables would be accentuated because of inflation. This could well lead to positive relationships between these variables and consumption of alcohol, possibly due to the coincidental changes in the value of the monetary unit. Also, unnecessary multicollinearity may be introduced which would result in a decrease in the precision of the estimation and a general lessening in the ability to separate the impact of one variable from that of another.
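To make the current-versus-constant dollar distinction concrete, the sketch below deflates a current-dollar series by a price index. The income figures, the index values, and the function name `to_constant_dollars` are hypothetical; they are chosen so that all of the apparent growth in the current-dollar series is due to inflation.

```python
def to_constant_dollars(nominal, price_index, base=100.0):
    """Deflate current-dollar values by a price index whose base period equals `base`."""
    if len(nominal) != len(price_index):
        raise ValueError("series must have equal length")
    return [value * base / index for value, index in zip(nominal, price_index)]

# Hypothetical per capita income in current dollars and a matching price index.
income_current = [3000.0, 3300.0, 3630.0]
price_index = [100.0, 110.0, 121.0]

income_constant = to_constant_dollars(income_current, price_index)
print(income_constant)
```

In this constructed case the current-dollar series rises every year while the constant-dollar series is flat, so any trend shared with consumption would be an artifact of the monetary unit.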

ESTIMATION USING EXPLORATORY ANALYSIS METHODS

General Considerations

A common occurrence in consumer research is the use of exploratory data analysis methods, such as stepwise regression analysis. Often these analysis procedures are used with data sets of limited size and little or nothing is done in the way of validating results. The typical disclaimer is that the number of observations is too small to allow the use of a holdout sample for validation. The three studies considered here are examples. Barnes and Bourgeois conduct four stepwise regression analyses with nine independent variables in their estimation of the impact of selected variables on per capita consumption of alcohol. Reizenstein and Barnaby conduct 46 univariate F-tests in an effort to uncover variables which might differentiate three market segments and, for the two sets of results reported, 6 and 7 of the 46 were statistically significant at α = .05 and 2 and 3 at α = .01. This is slightly more than would be expected by chance. In both of these instances no validation of results was undertaken. Dwyer employed stepwise regression analysis with seven independent variables and found one significant at α = .10. Dwyer did attempt to validate his results, a point which will be discussed later.
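The chance benchmark for the 46 F-tests can be made explicit. The sketch below computes the number of significant results expected under the null hypothesis and the binomial probability of observing at least the smaller of the two reported counts at each level. It assumes the 46 tests are independent, which is unlikely to hold for correlated survey measures, so the probabilities are rough guides only; the helper name `binomial_upper_tail` is introduced here for illustration.

```python
from math import comb

def binomial_upper_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))

N_TESTS = 46  # number of univariate F-tests reported in the text

# (significance level, smaller of the two observed counts at that level)
for alpha, observed in ((0.05, 6), (0.01, 2)):
    expected = N_TESTS * alpha
    tail = binomial_upper_tail(N_TESTS, observed, alpha)
    print(f"alpha={alpha}: expected {expected:.2f} by chance, "
          f"P(at least {observed}) = {tail:.3f}")
```

Under these assumptions about 2.3 of the tests would be expected to reach significance at the .05 level and about 0.46 at the .01 level when no real differences exist.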

As mentioned by Dwyer (1978), because of their tendency to capitalize on spurious correlations in the data, data searching techniques often uncover "significant" relationships that are, in fact, random. Therefore, results from these types of analyses should be carefully evaluated before any conclusions are reached.

Based on observation, one might argue that there are too many exploratory studies with results that are unvalidated. Two possible solutions to this problem may be worth considering. First, reduce the number of exploratory studies. Through the development of a scientific tradition whereby research hypotheses are generated based on the careful consideration of available theory, rather than the data, the reliance on exploratory studies could be reduced. This approach appears to be necessary when the researcher is limited in the number of observations available for analysis. In these cases the researcher must rely heavily on existing theory to formulate the model, and the iterative approach to model building and testing has the serious drawback of using the data to formulate the model and then using the same data to test the model. Procedures such as these raise serious doubts concerning the validity of any conclusions which may be reached. The second alternative is for researchers to take the necessary steps to validate results obtained from exploratory analyses, either through the collection of sufficient data to allow for a holdout sample or by some other means.

The Jackknife

Dwyer (1978) suggests the use of the jackknife procedure as a method of validation which requires no additional data collection and illustrates the method within the context of regression analysis. At first glance the use of the jackknife as a substitute for a holdout sample for validation is very appealing. However, before accepting the procedure we should examine available evidence. Unfortunately, Dwyer offers no evidence supporting the use of the jackknife procedure for this purpose; therefore, other references (Quenouille 1956, Gray and Schucany 1972, the two Miller 1974 papers, and Mosteller and Tukey 1977) were consulted. According to Gray and Schucany (1972, p. v) "the jackknife is a general method for reducing the bias in an estimator and for obtaining a measure of the variance of the resulting estimator by sample reuse. Thus the result of the procedure is usually a nearly unbiased estimator and an associated approximate confidence interval." Mosteller and Tukey (1977), in discussing the direct assessment of the variance of estimators, suggest the usefulness of the jackknife in those cases where insufficient observations are available to utilize the standard method of equivalent subsamples. [If the issue of concern to Dwyer was the direct assessment of variability, Mosteller and Tukey (1977) would probably suggest dividing the sample into sub-samples (after all there are 41 observations and only a single independent variable) in such a way as to measure the variability attributable to other relevant variables, such as hospital and length of time since receiving prescription. Alternately, these variables could have been included in the model.] In this same regard, the jackknife is especially useful in complex multivariate problems where little or no theory exists to give exact tests of significance on coefficients, e.g., discriminant analysis.
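For readers unfamiliar with the mechanics, the common delete-one form of the jackknife can be sketched as follows, applied here to the slope of a simple regression. This is the textbook pseudo-value construction and is not necessarily the exact variant used by Dwyer; the data and the function names `ols_slope` and `jackknife` are hypothetical.

```python
from statistics import mean

def ols_slope(xs, ys):
    """Ordinary least squares slope for a simple regression of y on x."""
    xbar, ybar = mean(xs), mean(ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    return sxy / sxx

def jackknife(xs, ys, estimator):
    """Delete-one jackknife: returns (jackknife estimate, estimated variance)."""
    n = len(xs)
    theta_full = estimator(xs, ys)
    pseudo_values = []
    for i in range(n):
        # Re-estimate with observation i left out.
        theta_minus_i = estimator(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        pseudo_values.append(n * theta_full - (n - 1) * theta_minus_i)
    theta_jack = mean(pseudo_values)
    variance = sum((p - theta_jack) ** 2 for p in pseudo_values) / (n * (n - 1))
    return theta_jack, variance

# Hypothetical data, not taken from any of the three papers.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

slope_jack, slope_var = jackknife(xs, ys, ols_slope)
print(ols_slope(xs, ys), slope_jack, slope_var)
```

Every quantity in the sketch is computed from the same observations, a point that matters when the procedure is considered as a replacement for a holdout sample.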

Now, let's examine the applicability of the jackknife to regression analysis in light of the two above-mentioned benefits. First consider the issue of unbiasedness. The jackknife does not always yield an unbiased estimator but usually reduces the bias of a given estimator. If the regression model is correctly specified both the OLS and the jackknife give unbiased estimators. Under a misspecified model the OLS estimator will often be biased, but in this case the jackknife is also likely to be biased and since both estimators rely on the same incorrect model, neither may be of much use. In any event, research concerning the properties of jackknife estimators under conditions of misspecified regression models appears to be very sparse.

Second, consider the jackknife as a method to obtain a direct measure of variance of an estimate. The usefulness of this is obvious in those complex cases where little or no theory exists concerning the distribution of sample estimates (refer to Crask and Perreault [1977] for such an application to discriminant analysis). But is it needed in the case of the regression model? If the regression model is correctly specified, it can be shown that the jackknife will have higher variance than the unbiased OLS estimator, and it is generally expected that the estimated variance of the jackknife will be greater than that of the OLS estimate. If the structure of the model is correctly specified except for the error term (e.g., non-normal or non-spherical disturbances), the direct assessment of variability could be very useful. However, this may not be necessary since there are methods available for detecting misspecifications of this type which should be used by the careful researcher. In the case of a structurally misspecified model, the jackknife estimator is also based on the misspecified model; therefore, depending on the nature of the misspecification, the usefulness of the jackknife is questionable. (Note: this may be a problem for future research.)

The discussion above presented arguments relating to the use of the jackknife in regression analysis to reduce bias or to obtain a measure of variability. Next, in a more general context, let's examine the question of whether the jackknife provides the same information as would an additional validation sample. The use of a validation sample is similar to randomly dividing the population into j samples. Even when there is no association between (among) variables in the population, spurious association may be observed in some of the j samples. The probability of this occurring is derived from sampling theory. When using exploratory data analysis procedures which capitalize on chance, the likelihood of uncovering significant "non-results" may be large, but the chance of confirming these results on the second (validation) sample is controlled by the α-level selected. However, given a single sample with observed (though spurious) association, the further subdivision of that sample will not remove the association even though statistical tests on subsamples may fail to reject the null hypothesis due to either sampling variation and/or degree of freedom problems, or the use of less efficient estimation methods. The specific data observations comprising the sample are such that the relationship is present, and even the jackknife cannot change this. Since the jackknife is based on these same observations, one is forced to ask the question: Why should the jackknife do any better? It does appear that the jackknife will usually result in a more conservative estimate (i.e., higher variance), but if that is what one wants, why not select a smaller α? In summary, I would not be comfortable in recommending the jackknife as a substitute for a holdout sample given the evidence considered.

CONCLUDING REMARKS

A number of interesting points are raised by the authors in these papers. But, in the opinion of this reader, there are two very important issues of a general nature addressed by the authors that deserve special mention. First, as pointed out by Barnes and Bourgeois for the problem of alcohol consumption and by Reizenstein and Barnaby in the case of energy, there exists the need for careful scientific investigation in the evaluation of contemplated public policy actions. And second, as Dwyer points out, researchers need to consider the problems inherent in the application of exploratory analysis methods to samples of limited size and the validation of results obtained under these conditions. This latter issue is one which everyone knows about but it appears to be the exception when somebody does something about it.

REFERENCES

James G. Barnes and Jacques C. Bourgeois, "Estimating the Effects of Advertising: Application to a Social Problem," __Proceedings of the Association for Consumer Research__, 1978.

Neil E. Beckwith, "Fiction, Folklore, Findings and Facts: A Discussion of Marketing Research in the Public Interest Forum," __Proceedings of the Association for Consumer Research__, 1978.

Melvin R. Crask and William D. Perreault, Jr., "Validation of Discriminant Analysis in Marketing Research," __Journal of Marketing Research__, 14 (1977), 60-8.

F. Robert Dwyer, "Drug Compliance and the Neglected Concern for Validity," __Proceedings of the Association for Consumer Research__, 1978.

H. L. Gray and W. R. Schucany, __The Generalized Jackknife Statistic__ (New York: Marcel Dekker, Inc., 1972).

Arthur S. Goldberger, __Econometric Theory__ (New York: John Wiley & Sons, 1964).

Paul E. Green and Donald S. Tull, __Research for Marketing Decisions__ (Englewood Cliffs: Prentice-Hall, Inc., fourth edition, 1978).

F. Thomas Juster, __Consumer Buying Intentions and Purchasing Probability__ (New York: National Bureau of Economic Research, 1966).

Gary Koehler and Albert R. Wildt, "Characterization and Estimation of Admissible Logically Consistent Parameters for Constrained Linear Models," Research Report No. 78-6, Industrial and Systems Engineering Department, University of Florida, Gainesville, Florida, May 1978.

G. S. Maddala, __Econometrics__ (New York: McGraw-Hill Book Company, 1977).

Timothy W. McGuire and Doyle L. Weiss, "Logically Consistent Market Share Models II," __Journal of Marketing Research__, 13 (1976), 296-302.

Rupert G. Miller, "An Unbalanced Jackknife," __Annals of Statistics__, 2 (1974), 880-91.

Rupert G. Miller, "The Jackknife - A Review," __Biometrika__, 61 (1974), 1-15.

Frederick Mosteller and John W. Tukey, __Data Analysis and Regression__ (Reading, Massachusetts: Addison-Wesley Publishing Company, 1977).

M. H. Quenouille, "Notes on Bias in Estimation," __Biometrika__, 43 (1956), 353-60.

Richard C. Reizenstein and David J. Barnaby, "Assessing the Potential Effects of Differential Price Increases on Gasoline Usage," __Proceedings of the Association for Consumer Research__, 1978.

U.S. Department of Commerce, Bureau of the Census, __Statistical Abstract of the United States: 1976__ (Washington, D.C.: U.S. Government Printing Office, 97th edition, 1976).

Arnold Zellner, "An Efficient Method of Estimating Seemingly Unrelated Regressions and Tests for Aggregation Bias," __Journal of the American Statistical Association__, 57 (1962), 348-68.
