Bootstrapping of Data and Decisions

Joel Huber, Purdue University
[ to cite ]:
Joel Huber (1975), "Bootstrapping of Data and Decisions", in NA - Advances in Consumer Research Volume 02, eds. Mary Jane Schlinger, Ann Arbor, MI : Association for Consumer Research, Pages: 515-524.


Bootstrapping involves the substitution of a simple linear model of judgments in place of the judgments themselves. It has been found that in many decision-making contexts the bootstrapped decisions are better than the judgments from which they were derived. It appears that the linear model is quite successful at capturing the policy of the judge and then making decisions without human inconsistency. Most work on bootstrapping has been done in contexts--such as forecasting--where the criterion of accuracy is clearly defined. This study shows that bootstrapping can be used to upgrade the quality of subjective judgments (data) which have no ultimate criterion of accuracy but are judged in terms of their usefulness as input to a predictive model. Implications of bootstrapping both data and decisions in consumer behavior are explored.

Models of judgment which associate overall worth with a linear combination of the affective and cognitive components are certainly familiar to those involved in consumer research. For example, these models are often used as diagnostic tools by manufacturers interested in changing the image or physical make-up of a product. Bootstrapping represents a different posture toward the modeling of judgments. Linear models are used to replace the raw judgments, rather than, in any rigorous sense, to understand them. Judgments or decisions are represented as linear combinations of cues where the weights are typically derived from a multiple regression using the judgments as the criterion and the cues as predictors. The ensuing decompositional model provides a linear approximation to what the subject is doing as can be inferred from judgments and inputs. Hoffman (1960), who was the first to formally suggest such a procedure, termed such a linear approximation a "paramorphic representation" of the judgment process. These models are paramorphic in the sense that while they might predict quite well, they are not to be construed as models of what the judge is really doing. That is, decision makers may behave as if they are linear machines but that does not mean that, in fact, they are.

When judgments are made in a context where they can be compared with the "true" value, as in a forecast, the linear model has generally been found to be more accurate than the raw judgments. That is, the correlation of the ultimate criterion with the linear model is generally greater than its correlation with the raw judgments used to derive the model. The replacement of raw judgments with this linear combination of cues has come to be called "bootstrapping," a term coined by R. M. Dawes. In effect, the raw judgments "lift themselves up by their bootstraps." Two examples should be sufficient to illustrate this technique.
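The procedure is simple enough to sketch in a few lines. The illustration below uses hypothetical data and arbitrary coefficients (not taken from any of the studies cited): a judge's ratings are regressed on the cues, and the model's fitted values are substituted for the raw judgments.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 60                                  # cases presented to the judge
cues = rng.normal(size=(n, 3))          # three quantitative cues
true_worth = cues @ np.array([1.0, 0.6, 0.3])

# The judge follows a roughly linear policy, but with random error:
judgments = true_worth + rng.normal(scale=0.8, size=n)

# Capture the policy: regress the judgments on the cues.
X = np.column_stack([np.ones(n), cues])
beta, *_ = np.linalg.lstsq(X, judgments, rcond=None)
bootstrapped = X @ beta                 # the model's "judgments"

# The model applies the judge's own policy without the inconsistency,
# so it typically tracks the criterion better than the judge does.
r_raw = np.corrcoef(judgments, true_worth)[0, 1]
r_boot = np.corrcoef(bootstrapped, true_worth)[0, 1]
```

Because the fitted values contain none of the judge's trial-to-trial inconsistency, their correlation with the criterion is typically at least as high as that of the raw judgments from which they were derived.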

One of the first studies to suggest that simple linear models of decisions might produce good decisions was Yntema and Torgerson (1961). Subjects were provided with ellipses of different size, shape and color. Worth was defined so as to increase nonlinearly with increases in size, thinness and brownness. After a ten day training period consisting of giving subjects feedback on their predictions of worth, subjects were required to make a battery of 180 judgments without feedback. The average product moment correlation between these judgments and the true worths was 0.84. A simple additive bootstrapping model derived from predicting these judgments as functions of size, shape and color yielded an average correlation with the true values of 0.89. Thus, the additive bootstrapping models were more accurate than the judges in spite of the fact that these models could not take into account interactions, while the human judges presumably could.

Goldberg (1970) used judgments of clinical psychologists to build a model to discriminate neurotics from psychotics. The predictor variables were scores on the Minnesota Multiphasic Personality Inventory, a test which provides a profile of patients along 11 dimensions. The bootstrapping model provided superior predictions of later diagnosis for 26 out of the 29 judges. Similarly, Dawes (1971) found that admissions evaluations have a higher correlation with actual achievement if derived from a bootstrapping model of judgments rather than the judgments themselves. Bowman (1963) and Kunreuther (1969) were able to demonstrate improved decisions in the field of production management, while in marketing, Heeler et al. (1973) and Montgomery (1972) have applied bootstrapping to the decisions of buyers for supermarket chains with similar results.

In conclusion, then, different researchers, working in different fields, have found that a simple linear model of judgments satisfies the objectives of the judge better than the original judgments.


Bootstrapping works because the linear model is able to make extremely good approximations of most decision processes. The model then makes these judgments without random error. Thus by bootstrapping one replaces the random error of the judge with the nonrandom error of the model. This nonrandom error can be broken into two components: (1) a calibration error due to insufficient sample size to reliably estimate the parameters and (2) a specification error due to the inability of the linear model to capture the complexities of what the judge is doing. These two sources of error will be discussed in an attempt to explain why their sum has generally been less than the error of raw judgments.

Calibration error reflects a fairly minor component of the error in a bootstrapping model. Dudycha and Naylor (1966) provided subjects with cues and worths that were related by a linear model with different levels of error. After a learning period, the bootstrapped models of judgments on 50 stimuli had average correlations with the optimal model of better than 0.90. Even where the beta coefficients themselves appear unstable due to multicollinearity, the predictions from such a model tend to be quite stable.

Specification error reflects the inability of the linear model to account for nonlinearities or interactions in the judgment process. The first researchers of paramorphic representation (concentrated mainly at Oregon Research Institute) saw the linear model as a first approximation of the decision process which would be modified later by nonlinear and interactive components. Then a funny thing happened. Adjustments to the linear model provided very little improvement in predictive accuracy. This result was anticipated by Yntema and Torgerson (1961) who found that a main effects model Y = ai + bj + ck accounts for over 90% of the variance of data generated by the multiplicative model Y = ij + ik + jk (where i, j, and k are integers between one and seven). Thus, one loses very little by approximating such a decision process with only the main effects.
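The Yntema and Torgerson figure is easy to verify numerically. The sketch below is a reconstruction (not their original procedure): it generates Y = ij + ik + jk over the full 7 x 7 x 7 grid and fits a main-effects model by least squares.

```python
import itertools
import numpy as np

levels = range(1, 8)
rows, y = [], []
for i, j, k in itertools.product(levels, levels, levels):
    # Dummy-code the three 7-level factors (intercept + 6 dummies each).
    row = np.zeros(1 + 3 * 6)
    row[0] = 1.0
    if i > 1:
        row[i - 1] = 1.0
    if j > 1:
        row[6 + j - 1] = 1.0
    if k > 1:
        row[12 + k - 1] = 1.0
    rows.append(row)
    y.append(i * j + i * k + j * k)

X = np.array(rows)
y = np.array(y, dtype=float)

# Best additive (main effects only) approximation by least squares:
fit = X @ np.linalg.lstsq(X, y, rcond=None)[0]
explained = 1.0 - np.sum((y - fit) ** 2) / np.sum((y - y.mean()) ** 2)
# explained is exactly 16/17, about 0.941, comfortably over 90%
```

The interaction residual is orthogonal to every additive function of i, j, and k on this grid, which is why the additive fit captures the same fraction of variance no matter how the factors are coded.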

In a large simulation study Rorer (1971) tested the ability of the linear model to approximate data generated by interactive and configural models. These included interactive and configural terms as well as disjunctive and conjunctive step functions and elaborate lexicographic models. The linear model was generally able to account for over 80% of the variance. Furthermore, given reasonable levels of error, the interaction terms in most cases would not be significant. This result appears to be quite general as long as the criterion variable is conditionally monotone with respect to the cues. That is, the direction of the effect of a cue is the same regardless of the levels of the other cues. Cues which are conditionally monotone appear quite often in judgmental situations. For example, economy, performance, styling and closeness to mid-size are all attributes which in a rational man might be conditionally monotone with his judgments of the overall worth of automobiles. Research has shown that if this is the case then a linear model will do a good job of approximating the judgments. Furthermore, with fallible data the analysis of variance will generally lack the power to measure the incremental gain of a non-linear analysis.

While the bootstrapping of decisions generally produces better results than the decisions from which it was derived, it generally does not produce the best linear decision scheme available. Simply regressing the criterion directly on the cues produces better results than going through judgments (Meehl, 1954). In fact, Dawes and Corrigan (1974) show that linear models with random coefficients (but the correct sign) do as well as the bootstrapping models. Thus bootstrapping models are not to be seen as magical or in any sense optimal linear models, but merely a method for picking the appropriate variables and weighting them in the right direction. Furthermore, if an unambiguous criterion exists, a better model can be derived by regressing it directly on the cues.
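The Dawes and Corrigan point can be illustrated on simulated data (the weights, cue structure, and sample size below are arbitrary assumptions, not theirs): when the cues are positively intercorrelated and the signs are right, random weights predict nearly as well as regression-fitted weights.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 200, 4

# Positively intercorrelated cues, as is typical of judgment tasks:
latent = rng.normal(size=(n, 1))
cues = 0.7 * latent + 0.7 * rng.normal(size=(n, p))
criterion = cues @ np.array([0.9, 0.7, 0.5, 0.3]) + rng.normal(size=n)

# Regression-fitted weights (the in-sample optimum):
w_ols, *_ = np.linalg.lstsq(cues, criterion, rcond=None)
r_ols = np.corrcoef(cues @ w_ols, criterion)[0, 1]

# Random weights constrained only to the correct (positive) sign:
r_random = np.mean([
    np.corrcoef(cues @ rng.uniform(0.0, 1.0, size=p), criterion)[0, 1]
    for _ in range(500)
])
# r_random falls only slightly short of r_ols
```

The fitted weights win in-sample by construction; the point is that the margin over random sign-correct weights is small, which is why bootstrapping's real contribution lies in variable selection and sign, not in the precise coefficients.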

It could be argued that there are many situations where the "optimal" linear model derived from a regression of cues on the criterion is less valid than the bootstrapping model derived from a regression of cues on decisions. Consider the admissions problem. Bootstrapping does not do as well as an optimal linear model in predicting rank in class. However, it is feasible that the admissions committee is taking into consideration other goals, such as racial balance or being well-rounded. These considerations would be reflected in the coefficients of the bootstrapping model but not in a model calibrated to class rank. Thus if the objective is to provide a model that satisfies the judge's implicit goals, bootstrapping provides at least a first step in this direction. It is this quality that makes bootstrapping particularly appropriate to many problems in consumer behavior.


The typical validations of bootstrapping have used judgments, such as forecasts, for which the ultimate criterion for accuracy is easily specified. Further, the cues that go into the judgments have been clearly specified and are generally of quantitative form. This study considers the applicability of bootstrapping to data that serves as input to behavioral models and generally lacks the above qualities. A response of a subject to a stimulus or question cannot have ultimate validity but only be considered better or worse to the extent that it can be related to other responses or behavior on the part of the subject. For example, the superiority of a measure of intention to purchase can be ascertained on the basis of its correlation with actual purchases. In the same way bootstrapping will be evaluated on the basis of its effectiveness in improving input to a behavioral model.

The present study represents an attempt to evaluate bootstrapping on preference judgments of particular samples of iced tea. All analysis is done on the basis of the individual subject. The test between bootstrapped and raw judgments is made by comparing which provides better predictions of preference.


The preference judgments from a convenience sample of 22 people were used for this study. Each was required to make judgments on samples of Lipton iced tea that differed over the amount of sugar and tea according to a balanced design. As is illustrated in Figure 1, they were required to make judgments on 7 validation stimuli nested within 16 calibration stimuli.



For each subject the analysis revolved around the following data.

Pi = Preference scale for stimulus i, i=1,16 for the calibration stimuli and i=17,23 for the validation stimuli. This scale was formed for each set from preference differences using Scheffe's (1952) method of analysis modified for analysis of individual data.

dik = Judgment as to the degree to which stimulus i has too much, or too little, sugar (k=1) or tea (k=2). These were coded on an integer scale from -3 to +3, negative numbers indicating too little, zero indicating optimum, and positive numbers indicating too much of the ingredient.

xik = Objective level of sugar (k=1) and tea (k=2) for stimulus i.

Using data on the calibration stimuli, a preference function

(1)   Pi = f(dik) = b0 + b1 |di1| + b2 |di2|

is estimated by multiple regression for each individual. This is a version of the familiar weighted-additive model. The absolute value of the dik's can be interpreted as the distance of stimulus i from the ideal along dimension k. Thus preference is assumed to be a function of the sum of weighted distances along these psychological dimensions. This parameterized model is then used to predict the preferences for the seven validation stimuli.

The bootstrapping model relates the dik's to the real levels of sugar and tea. For dimension k this is

(2)   dik = g(xik) = b0k + b1kxi1 + b2kxi2.

The effectiveness of bootstrapping is gauged by whether the fitted dik's generated from Equation 2 produce better predictions as input to Equation 1 than the raw dik's.

The bootstrapping model given in Equation 2 assumes the amount of change desired in tea and sugar is a linear function of the actual values of these variables. If, for a given individual, preferences are single-peaked or monotone within the physical space, then the physical levels will be monotone with the dik's. Furthermore, since linear functions have been shown (Rorer, 1971; Dawes and Corrigan, 1974) to produce close approximations to most monotone functions, the linear model appears reasonable in this case. This is further supported by the fit of the calibration stimuli to Equation 2. The average product moment correlation across subjects was .84 for sweetness and .62 for tea.
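The two equations can be put together in a short sketch. The data below are simulated stand-ins (the coefficients, noise levels, and physical ranges are assumptions for illustration; the original tea judgments are not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(2)
n_cal = 16                              # calibration stimuli

# Physical levels of sugar (k=1) and tea (k=2), scaled to [0, 1]:
x = rng.uniform(0.0, 1.0, size=(n_cal, 2))
X1 = np.column_stack([np.ones(n_cal), x])

# Simulated "too little / too much" judgments, linear in the levels
# (a continuous stand-in for the -3..+3 integer scale):
true_B = np.array([[-3.0, -3.0], [6.0, 0.0], [0.0, 6.0]])
d = X1 @ true_B + rng.normal(scale=0.5, size=(n_cal, 2))

# Equation 2: regress each d_ik on both physical levels.
B = np.linalg.lstsq(X1, d, rcond=None)[0]
d_boot = X1 @ B                          # bootstrapped d_ik's

# Equation 1: preference as a weighted sum of |d_i1| and |d_i2|.
pref = 5.0 - 1.0 * np.abs(d[:, 0]) - 0.5 * np.abs(d[:, 1]) \
       + rng.normal(scale=0.3, size=n_cal)
Z = np.column_stack([np.ones(n_cal), np.abs(d_boot)])
b = np.linalg.lstsq(Z, pref, rcond=None)[0]   # b0, b1, b2 of Equation 1
```

Substituting the fitted d_boot values for the raw d's at either stage, parameterization or prediction, is exactly the comparison the study makes.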


The effectiveness of bootstrapped against raw judgments was compared at two junctures of the prediction process: (1) to parameterize the preference equation and (2) as input to the parameterized models. In both cases the predicted dik's were simply substituted for the raw dik's to test bootstrapping. As is shown in Table 1, using bootstrapping to parameterize has relatively little effect, while its use on models that have already been parameterized produces large and significant gains in prediction.

If one considers the bootstrapping equations to be the first stage in a two-stage model, then using bootstrapping to parameterize the preference model is equivalent to two-stage least squares. This procedure has some theoretical advantages in that the errors of the bootstrapped values are not correlated with the error terms of the preference scores. In this case, however, the two-stage model did not produce significant gains, probably because the errors in the dik's are relatively random and because of the well-known robustness of linear regression to random error in the independent variables.

By contrast, using bootstrapping to produce variables as input to the parameterized models produced large gains in predicting the preferences on the validation stimuli. This result could only have occurred if the bootstrapped values were, in fact, more accurate estimates of subjective sugar and tea than the original data. This technique could be used to upgrade the quality of data in a wide variety of behavioral science contexts. For example, judgments of sportiness in automobiles could be bootstrapped as a linear function of speed, acceleration, width-to-height ratio, and cornering ability. As input to a second model involving product choice, the bootstrapping equations would provide not only greater reliability but also assistance to product designers interested in translating "sportiness" into more objective components.




The use of a model to impart reliability to raw data is hardly novel in behavioral research. The technique perhaps most closely related to the above use of data bootstrapping is spatial or temporal smoothing. Instead of assuming a global linear relationship, these techniques assume local linearity so that each point can be approximated by a simple function of several contingent points. MacKay (1973) used spatial smoothing to impart greater reliability to store-usage data and found that this resulted in better fits and more interpretable solutions to the quadratic regressions which were used to produce market penetration maps. Even in cases where there is not an objective physical or temporal dimensionality, the assumption of linearity is used to improve data quality. Factor analysis assumes a linear relationship between variables and uses this to impart redundancy in the factors. In the same way the bootstrapping equations used in this study can be seen as the use of a particular linearity assumption that increases the reliability of any particular judgment by forcing it to be consistent with the other judgments in the set.

There is a simple preliminary test to determine whether a bootstrapping model will upgrade the quality of data. It requires one replication of the original judgments. A bootstrapping model calibrated on each half is validated against the other half. The average correlation is called the "double cross-validated correlation." This is compared to the reliability of the judgments which is simply the correlation between the two halves. If the model is exact, the cross-validated correlation should approach the square-root of the reliability. It will not be exactly so unless the sample of judgments is infinite. For practical purposes, however, if the cross-validated correlation is greater than the reliability of the judgments, then the bootstrapping model is better at predicting the raw data itself. There are some risks attached to this procedure. One is that the two replications may not be independent draws from the same judgment process but may reflect a change of viewpoint or merely a rote repetition of the first judgments. The second problem is that the bootstrapped relationship, however optimal it might be with respect to the judgments, may not be at all optimal with respect to the criterion or use to which it is being put. In Einhorn's (1972) study, the optimal combination of cues to predict the criterion was disjunctive while the optimal bootstrapping model was conjunctive.
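For a single judge, the split-half check reads roughly as follows (hypothetical cues and judgments; in practice the two replications would come from the subject, not a random number generator):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 40
cues = rng.normal(size=(n, 2))
policy = cues @ np.array([1.0, 0.5])    # the judge's stable policy

# Two replications of the judgments, each with independent error:
rep1 = policy + rng.normal(scale=0.7, size=n)
rep2 = policy + rng.normal(scale=0.7, size=n)

X = np.column_stack([np.ones(n), cues])

def cross_validated_r(fit_on, test_on):
    """Calibrate the linear model on one replication, validate on the other."""
    beta, *_ = np.linalg.lstsq(X, fit_on, rcond=None)
    return np.corrcoef(X @ beta, test_on)[0, 1]

double_cv = 0.5 * (cross_validated_r(rep1, rep2)
                   + cross_validated_r(rep2, rep1))
reliability = np.corrcoef(rep1, rep2)[0, 1]
# If double_cv exceeds reliability, the model predicts the judgments
# better than they predict themselves, and bootstrapping should help.
```

When the model is nearly exact, double_cv approaches the square root of the reliability, so the comparison gives the model a realistic chance to fail only when the linearity assumption is badly wrong.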

To avoid just these kinds of problems Kunreuther (1969) advocates the use of bootstrapping only when (1) the decision rules are constant over time, (2) the model and resultant coefficients make theoretical sense, and (3) the coefficients are statistically significant. While these rules apply more in a production programming context than the present one, certainly the call for a priori consideration of the bootstrapping model is valid.


The decomposition models that have been considered here as bootstrapping models are likely to be as good at "capturing" consumer decisions as they have been at capturing the decisions of admissions committees, psychologists and production planners. It is also likely that the same frustrating search for interactions and mediator variables that has occurred in these areas will be paralleled in research on consumer behavior.

The user of bootstrapping, to the extent that he is more concerned with approximating than understanding decision processes, is content to use the robust linear models and rest secure in the knowledge that rare combinations of cues and nonlinearities have little effect on explained variance. Such predictive, decompositional models can, however, be quite useful in the study of consumer behavior. They could be used as a preliminary normative step to enable the consumer to understand the implications of his own decision processes. Alternatively, they could be used by regulatory agencies as a first approximation of what the consumer decision process is. Values then might be inferred from decisions rather than imposed from above.

Data bootstrapping is likely to have more limited use in consumer behavior. It can be seen as one of a number of methods to provide reliability in data through the structure of a model. Furthermore, just as such models should be consistent with existing theory, they can add to it by explaining, at least in a preliminary way, the sources of the raw judgments. Thus, data bootstrapping can provide more than just reliable data; it can provide a basis for an understanding of its own validity.


Bowman, E. H. Consistency and optimality in management decision making. Management Science, 1963, 9, 310-321.

Dawes, Robyn M. A case study of graduate admissions: Application of three principles of human decision making. American Psychologist, 1971, 26, 180-188.

Dawes, Robyn M. & Bernard Corrigan. Linear models in decision making. Psychological Bulletin, 1974, 81, 95-106.

Dudycha, L. W. & J. C. Naylor. Characteristics of the human inference process in complex behavior situations. Organizational Behavior and Human Performance, 1966, 1, 110-128.

Fisher, R. A. Statistical methods for research workers. Edinburgh: Oliver and Boyd, 1925.

Goldberg, L. R. Man versus model of man: A rationale, plus some evidence, for a method of improving on clinical inferences. Psychological Bulletin, 1970, 73, 422-432.

Heeler, Roger M, Michael J. Kearney & Bruce McHaffey. Modeling supermarket product selection. Journal of Marketing Research, 1973, 10, 34-37.

Hoffman, P. J. The paramorphic representation of clinical judgment. Psychological Bulletin, 1960, 57, 116-131.

Kunreuther, Howard. Extensions of Bowman's theory of managerial decision making. Management Science, 1969, 15, 8.

MacKay, David B. Spatial measurement of retail store demand. Journal of Marketing Research, 1973, 10, 447-453.

Meehl, P. E. Clinical versus statistical prediction: A theoretical analysis and review of the literature. Minneapolis: University of Minnesota Press, 1954.

Montgomery, David B. New product distribution: An analysis of supermarket buyer decisions. Marketing Science Institute Research Programs, Cambridge, Mass., 1973, 63.

Rorer, L. G. A circuitous route to bootstrapping. In H. B. Haley, A. G. D'Costa and A. M. Schafter (Eds.), Conference on Personality Measurement in Medical Education. Washington, D.C.: Association of American Medical Colleges, 1971.

Scheffe, H. An analysis of variance for paired comparisons. Journal of the American Statistical Association, 1952, 47, 381-400.

Yntema, D. B. & W. S. Torgerson. Man-computer cooperation in decisions requiring common sense. IRE Transactions on Human Factors in Electronics, 1961, HFE-2(1), 20-26.