Subject Impressionism: a Methodology For Capturing Private Judgments and Measuring Inconsistency

Joseph V. Anderson, McIntire School of Commerce, University of Virginia
ABSTRACT - The use of incentive and accountability mechanisms has been thought to improve the accuracy of public judgments, and their consistency with private judgments. Verification of these effects has been problematic due to the inability to measure private judgments. This paper reports a methodology which captures private judgments and in so doing finds that incentive and accountability mechanisms may be counterproductive.
[ to cite ]:
Joseph V. Anderson (1983), "Subject Impressionism: a Methodology For Capturing Private Judgments and Measuring Inconsistency", in NA - Advances in Consumer Research Volume 10, eds. Richard P. Bagozzi and Alice M. Tybout, Ann Arbor, MI: Association for Consumer Research, Pages: 691-695.

Advances in Consumer Research Volume 10, 1983      Pages 691-695



[The author wishes to thank Brian Sternthal, Joshua Eliashburg and the blind reviewers for helpful comments on an earlier draft of this article.]




Much of consumer research is based on the use of self-report data and/or respondents' judgments (let us call them collectively "judgments"). The use of judgments is common in theoretical works on Bayesian Inference, Utility Theory and Decision Theory, as well as in studies of attitude and intention. In addition, judgments play a major role in applied marketing research, where they are incorporated into models addressing the marketing mix and consumers' responses to its manipulation. Clearly, judgments pervade much of what we do, either at the conceptual core or at critical junctures along the route to practical application. Yet despite their widespread use, the veracity of consumers' judgments is problematic. On the one hand, they may be inaccurate. On the other hand, they may be inconsistent.

The inaccuracy problem is fairly straightforward. The assessor (as we shall call the person actually making the judgments) does not possess perfect knowledge. Further, the world is a complex and dynamic place. The combination of bounded rationality and uncertainty yields a situation in which inaccuracy is inevitable, and which evidences itself in observable differences between public judgments and objective reality and/or subsequent behavior. Inconsistency, in contrast, looks not only at public judgments, but at the underlying private judgments as well, asking whether the public judgment is a clear reflection of what the assessor truly believes within the privacy of his own mind. There is no interest in the relation of public judgments to reality or subsequent behavior. The fact of the matter is that private judgments often become altered on their way to becoming public judgments. We can see it in our own lives, whether we alter judgments to further our own cause, use "white lies" to protect the feelings and welfare of others, second guess our gut feelings, or constantly alter our public judgments in an attempt to appear, and be, able to cope with the uncertainty around us. In summary, Figure One shows that accuracy is concerned with matching public judgments and reality or subsequent behavior, while ignoring private judgments. Consistency, in contrast, concerns itself with matching private and public judgments, while ignoring what reality or subsequent behavior might show to be "true" answers.



The inaccuracy problem has been dealt with by "ball-parking", a method, or actually a number of different methods, which assist the assessor in narrowing the range of possibilities to a feasible set with a very small variance. These efforts may entail the use of an ever narrowing range of betting tradeoffs, interrogation and feedback, and/or "group think" methodologies such as the Delphi Method (Seaver et al. 1978; Winkler 1967). Adjustment methodologies have also been constructed to correct these judgments once they have been formulated. This approach is facilitated by the fact that assessors tend to be inaccurate in a systematic rather than random manner, indicating bias and not confusion (Kahneman & Tversky, 1980). The presence of consistent bias, however, indicates that private judgments are a key operator, and therefore our focus moves from inaccuracy back to inconsistency, which appears to be the real key.

The inconsistency problem, however, has not yet been adequately resolved, though several approaches have been attempted. [It might be argued that the author is actually rehashing questions of reliability. Such is not the case. First, reliability only addresses constancy within the same level (public). The present study addresses constancy between levels (private to public). Second, reliability is not concerned with the causes of inconsistency, nor with remedying anything more than symptoms. The present study addresses causes.] Winkler (1967A) reported on the use of several mechanisms that sought to bring private and public judgments into alignment. These included direct interrogation, hypothetical lotteries, and probability distribution graphs. Gonik (1978) used a cash incentive plan in which salesmen were encouraged to match forecasts to their gut level private judgments. Roshwalb (1975) utilized a dual question approach, first eliciting public judgments and second, probability estimates of the veracity of that public judgment. Good (1965) used actual monetary bets as a way to encourage assessors to report their actual private judgments, and finally, de Finnetti (1962) developed a set of scoring rules aimed at making assessors self regulating. Winkler (1967) claims that it is possible to show that scoring rules oblige the assessor to express his private judgment. But to this point the proof lies in mathematical logic, and not in empirical evidence. In fact, all of the above methodologies falter at this very point, for none of them has been able to actually acquire a measure of private judgments. Therefore, neither have they been able to obtain a meaningful measure of inconsistency, which is, after all, the variable of interest. Even Winkler (1967), in his attempt to combine the above methods, has run into the same roadblock. He is therefore in the rather uncomfortable position of saying that methods and strategies must show themselves capable of forcing consistency (Winkler 1967), and responding to himself (Winkler 1971) that we can do no more than assume that consistency is realized. And he bases his rather shaky assumption of success on the tenuous evidence of post-experimental comments by the assessors that they felt compelled to be consistent.

We are therefore left with the proposition that consistency is extremely important, the fact that various strategies have been formulated to encourage its practice, and the observation that consistency is impossible to measure directly - because there is no way for us to collect private judgments. Obviously, a methodology which enabled us to capture private judgments (rather than merely infer them as in conjoint analysis, multidimensional scaling, etc.) would be helpful. To this end, an experiment was conducted in which respondents were asked to answer a series of judgment questions utilizing a measurement device thought to be useful in capturing the long sought private judgment. Clearly the major purpose of this study was a methodological one - the verification of this device as an effective measure of private judgments. However, it also seemed feasible (on the assumption that private judgments could be captured) to incorporate a theoretical test of the causes of inconsistency.

Traditionally, theorists have used two approaches in dealing with the inaccuracy/inconsistency problem. The first has been to raise the salience of accuracy by offering incentives for correct judgments (Gonik, 1978; Good, 1965). The second has involved the invocation of accountability using direct interrogation, iterative processes or scoring rules (Winkler 1967A; de Finnetti, 1962; Roshwalb, 1975). The assumption seemingly underlying both approaches is that increased attention is stimulated in the assessor, thereby improving results. In essence, a de facto premise exists that private and public judgments differ only in their chronological order. The present study, in contrast, is based on the premise that there is a fundamental difference between a private judgment and a public one. By virtue of its privacy alone, the private judgment is subject to considerably lower levels of risk, at both an egoistic and functional level. A public judgment, however, is fraught with risk. As a result, we may actually be confronting a fairly classic case of threat. If such is the case, then the very mechanisms (incentives, accountability) thought to lessen the problems of inaccuracy and inconsistency may actually vitiate matters. We should expect the situation to possess a greater degree of threat when assessors are made personally accountable and when there is the possibility of gaining or losing some incentive (Johnson, Feigenbaum and Weiby, 1964; Feather and Simon, 1971). This increase in threat should cause a greater degree of inconsistency and inaccuracy because of over focusing of attention, somewhat similar to Kahneman and Tversky's (1980) notion of moving from distributional to case data.



Twenty-three respondents were recruited from an undergraduate course in Marketing Management. The study was announced at the previous class session and was introduced as an attempt to investigate how people make subjective judgments. In addition, it was stated that the results of this particular group would be compared to those obtained from three other test groups.


Each respondent was given a booklet consisting of two sheets of paper stapled together. Each sheet included the same series of questions and response sets. However, the sheets differed in several key respects. The top page was labeled "Work Sheet", and had no space for the respondent to record his name. Respondents were assured that this sheet was for their own private use and that no one would ever see that sheet except themselves. In fact, they were advised that at the end of the exercise they could take it home if they so desired. The second sheet was labeled "Report Sheet" and had a space for the respondent to provide his/her name. The respondents were informed that this sheet would be handed in. It was felt that the differing formats of the two pages would closely approximate the private pondering and subsequent public pronouncement of judgments that take place in actual decision processes. It was reasoned that whatever the respondent recorded as his apparently private judgments on the work sheet would be saved by leaving an impression on the sheet below and therefore passed on (unwittingly) for the perusal of the experimenter. A forced choice answering system utilizing computer scanner type coloring blocks on an interval scale was used in order to receive a decipherable impression at a predictable location. The layouts of the worksheet and report sheet were reversed so that answer sections did not overlap, and respondents were issued ball point pens (using a cover story of an experimental scanner that read only ink) to provide a writing instrument of uniform hardness. To eliminate the possibility of second guessing and erasures, which yield indecipherable or multiple impressions, respondents were instructed not to change answers once they were recorded. That, after all, was the purpose of having a worksheet. And finally, an off-handed remark that "neatness counts" was used since the researcher had previously noted that such a comment sets off an obedient response of squaring up the booklet, thus providing proper alignment.

After all instructions had been given, respondents filled out the worksheet of the test booklet followed by the top sheets of two other booklets, which served as a check on serial order effects. The independent variables, consisting of accountability and incentive inductions, were then administered via slips of paper randomly distributed. Respondents were given a moment to process these, then they were instructed to fill out the report sheet of the test booklet followed by the bottom sheets of the other two booklets. Then, as a final check to see if the report sheet was indeed a good measure of public judgment, respondents were called on by name to orally supply answers for two of the questions from the report sheet. The report sheets were then collected and the respondents debriefed.

Two independent variables were manipulated in this study: accountability and incentive. The accountability induction was manipulated between anonymity (not accountable) and identification (accountable) conditions. In the not accountable condition respondents were instructed to make up a number known only to themselves and put it on the report sheet and the bottom sheets of the other two booklets so that results could be tabulated. In the accountable condition respondents were instructed to follow the same procedure, except that they were told to use their name. The second independent variable, incentive, involved manipulating the respondent's awareness of an evaluation criterion based on accuracy and the existence of a prize. Approximately half the group was informed "You will be evaluated according to how accurate you are. The person with the best accuracy score wins a free dinner." (incentive present condition). The other half received no information (incentive absent condition). These two variables were crossed to yield a 2x2 design.

In addition, to check on the possibility that the type of question used might affect consistency and accuracy (Granbois & Summers, 1975; Byrnes, 1964), half the questions were probability questions and half the questions required a forced range quantitative answer. Probability questions included: the probability that Mount Saint Helens would erupt again in 1980, and the probability that at least 900 Cuban refugees would be permanently placed in Kansas City. Quantitative questions included the distance to the moon, and the annual rainfall in Pakistan. The analyses in which accuracy was the dependent measure were based only on the quantitative questions as there is no readily verifiable "true" answer for the probability questions. It will be noted that the questions utilized on the instrument were somewhat less personal, and more objective, than those usually incorporated in consumer research. It was felt that objective, impersonal questions would provide a clearer test of the present methodology.

Two dependent variables were of major interest. The first was inconsistency. Responses were coded using a 0/1 switching rule, receiving a zero if there was no change between the worksheet and the report sheet response for a given question, and a one if there had been a change. A qualitative evaluation of pre-test data indicated that the actual magnitude of changes would not vary significantly from one treatment condition to the next, so the switching rule option presented itself as a parsimonious method that sacrificed very little information. These scores were later transformed additively (Nunnally 1978) to avoid any problems that might come from analyzing zeros. The second major variable was inaccuracy. Winkler (1967) seems to work on the assumption that accuracy improves when methods of consistency-forcing are used. Since the assumption is that we can now measure true private beliefs, it seemed possible to also investigate this accuracy assumption. The forced choice answers were designed as equal interval measures, and coding was performed by recording the number of intervals that a given response lay from the correct answer. This was done in absolute value terms because, except for one question for one respondent, both judgments stayed on the same side of the correct answer. Therefore the concern was with convergence or dispersion rather than direction in a positive or negative sense.
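The two coding rules just described can be illustrated with a short sketch. It is not the author's scoring procedure: the function names, the four-question respondent, and the interval positions are hypothetical, and the additive transformation applied afterward is omitted because its constant is not reported.

```python
def inconsistency_codes(worksheet, report):
    """0/1 switching rule: 1 if the answer changed between sheets, else 0."""
    if len(worksheet) != len(report):
        raise ValueError("both sheets must cover the same questions")
    return [int(w != r) for w, r in zip(worksheet, report)]


def inaccuracy_codes(responses, correct):
    """Absolute number of scale intervals between response and correct answer."""
    if len(responses) != len(correct):
        raise ValueError("one correct answer is needed per response")
    return [abs(x - c) for x, c in zip(responses, correct)]


# Hypothetical respondent: interval positions marked on four questions.
worksheet = [3, 5, 2, 6]   # private judgments (recovered impressions)
report = [3, 4, 2, 7]      # public judgments (handed-in sheet)
correct = [4, 4, 1, 5]     # positions of the correct answers (assumed)

switches = inconsistency_codes(worksheet, report)      # [0, 1, 0, 1]
inconsistency_score = sum(switches)                    # 2 switches
private_error = inaccuracy_codes(worksheet, correct)   # [1, 1, 1, 1]
public_error = inaccuracy_codes(report, correct)       # [1, 0, 1, 2]
```

Under this sketch a respondent's inconsistency score is simply the count of switched answers, while inaccuracy is tracked separately for the private and public sheets.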


Treatment means, standard deviations and cell sizes appear in Table A.



The effects of accountability and incentive on inconsistency were assessed by analysis of variance. It was found that accountability exercised a marginally significant main effect on inconsistency, such that subjects were more inconsistent when they were held accountable for their judgments than when they were not accountable [F(1,19)=3.18, P<.10]. In addition, inconsistency exhibited a significant incentive main effect [F(1,19)=4.61, P<.05] such that subjects were more inconsistent when an incentive was present. Both of these main effects were qualified by an accountability x incentive interaction that was strongly significant [F(1,19)=9.14, P<.0005]. A Newman-Keuls test on the cell means showed that the incentive present/accountable subjects were more inconsistent than any of the other treatment groups (all comparisons exceeded a difference of 1.40 with Q=2.96, P<.05). Further, there were no significant differences among the other three treatment groups.
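To make concrete what it means for two main effects to be "qualified by" an interaction, the following sketch computes the main-effect and interaction contrasts of a 2x2 design from its cell means. The cell means are hypothetical (they are not the values of Table A), the function name is an assumption, and the contrasts use unweighted means, so they ignore the unequal cell sizes that a full analysis of variance would account for.

```python
def two_by_two_effects(cells):
    """Return (accountability, incentive, interaction) contrasts.

    cells maps (accountable, incentive_present) -> mean inconsistency score.
    """
    m00 = cells[(False, False)]  # anonymous, no incentive
    m01 = cells[(False, True)]   # anonymous, incentive
    m10 = cells[(True, False)]   # accountable, no incentive
    m11 = cells[(True, True)]    # accountable, incentive

    accountability = ((m10 + m11) - (m00 + m01)) / 2
    incentive = ((m01 + m11) - (m00 + m10)) / 2
    # Difference of differences: incentive effect when accountable
    # minus incentive effect when anonymous.
    interaction = (m11 - m10) - (m01 - m00)
    return accountability, incentive, interaction


# Hypothetical pattern: only the accountable/incentive cell is elevated.
hypothetical_means = {
    (False, False): 1.0,
    (False, True): 1.0,
    (True, False): 1.0,
    (True, True): 3.0,
}

effects = two_by_two_effects(hypothetical_means)  # (1.0, 1.0, 2.0)
```

In this hypothetical pattern both main-effect contrasts are positive, yet each is produced entirely by the single elevated cell, which is the situation a significant interaction flags.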

In addition to data on inconsistency, the experiment also provided data on inaccuracy. For the purpose of analysis, the inaccuracy scores were coded as interval measures of distance from the correct answer. An overall test was initially run on the data to determine if there was a significant difference between the accuracy scores on the worksheet (Xw) and the report sheet (Xr). Based on the mean difference of the two sheets and a null hypothesis of (Xr - Xw = 0), the t test showed the differences to be insignificant (Xr - Xw = .26, SD = 1.79, n = 23, t = .48, p>.10). Despite the lack of significance between private and public accuracy, the data seemed to suggest that differences within the public sphere might be accounted for by the various treatments. To pursue this matter further, an analysis of variance was performed on the report sheet scores, categorized by independent variables. Neither the main effects nor the interaction showed even marginal significance (all p's>.10). This point was further emphasized by a Newman-Keuls test (all comparisons p>.05).
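The form of the overall test on the worksheet and report sheet scores can be sketched as a t statistic on paired differences. This is a generic textbook computation under stated assumptions, not a reconstruction of the author's analysis: the function name and the five pairs of inaccuracy scores are hypothetical, and the sample standard deviation uses n - 1 in the denominator.

```python
import math
import statistics


def paired_t(worksheet_scores, report_scores):
    """Return (mean difference, SD of differences, t) for Xr - Xw against zero."""
    if len(worksheet_scores) != len(report_scores):
        raise ValueError("scores must be paired by respondent")
    diffs = [r - w for w, r in zip(worksheet_scores, report_scores)]
    n = len(diffs)
    if n < 2:
        raise ValueError("at least two pairs are needed")
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)          # sample SD (n - 1)
    t = mean_diff / (sd_diff / math.sqrt(n))   # df = n - 1
    return mean_diff, sd_diff, t


# Hypothetical inaccuracy scores (intervals from the correct answer).
xw = [2, 1, 3, 2, 4]   # worksheet (private)
xr = [3, 1, 2, 4, 5]   # report sheet (public)

mean_diff, sd_diff, t_value = paired_t(xw, xr)
```

The resulting t would be compared against the critical value for n - 1 degrees of freedom; a positive mean difference indicates larger error on the report sheet.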

As in Winkler's (1967) experiment, the respondents were asked during debriefing if they felt their scores had improved. And, as in Winkler's study, the verbal comments were affirmative. However, as the empirical results show, opinion and fact may differ.

Further evaluation of the data indicated a clear lack of serial order effects (t<1). Likewise, it was found that there was absolutely no difference between the report sheets and the oral reports (t = 0). Thus we can be fairly confident that the report sheet did, in fact, serve as a good approximation of public judgments, even for those in the anonymous conditions.

In the final check there was no significant difference between probability and quantitative questions (t<1 for both inaccuracy and inconsistency scores), which seems to contradict the assumptions by other authors that the type of questions used would affect results. Given the outcomes of these checks, we can assume that the observed results were the product of the manipulated variables and not the result of other noise in the system.


The results of this study indicate that it is possible to obtain some measure of private judgments - or, at least, a very close surrogate. Further, the present study indicates that private and public judgments can vary, and do so in response to manipulations. It is felt that these findings are of interest in that their veracity had previously rested only on logic and not on empirical evidence. In addition, the data suggest that two major efforts in dealing with problems of inconsistency and inaccuracy are counterproductive. The evidence shows that incentives and accountability have no beneficial effect on accuracy, while they exercise a detrimental effect on consistency.

These findings provide support for the threat oriented theory presented earlier in this paper. Consistent with the predictions of that theory, the accountability and incentive conditions were both found to yield significant main effects. But these were qualified by an overwhelmingly significant interaction effect. This would tend to confirm the reasoning put forth to explain the predicted significance of the interaction. The use of incentive on an anonymous respondent lacks full effect because the assessor feels relatively immune to evaluation. Likewise, accountability in the absence of incentive has limited effect (accountable for what one might ask). It is primarily when the two are combined that a successful threat induction is made, thereby causing inconsistency to rise significantly.

As a further point of interest, the data indicate that there is indeed a fundamental difference between making private and public judgments even in the absence of experimental inductions. A t test was performed on the not accountable/incentive absent inconsistency cell as subjects moved from the work sheet (private) to the report sheet (public). This cell represented a control condition where the only force operating was the move from private to public. The results were significant (X = .60, SD = 1.58, n = 5, t = 2.50, p<.05). This tends to confirm the assertion that public judgments represent some more complex activity than merely reporting private judgments, and that perhaps the inherent threat of "going public" may be the driving force behind inconsistency. The treatment effects merely heighten the threat.

From a pragmatic perspective the present findings suggest that managers eliciting forecasts from sales personnel and researchers eliciting judgments from consumers would be best served by not using accountability or incentive mechanisms. Not only is the cause of accuracy not served, but the assessors tend to inject so much inconsistency into the process that it becomes difficult to know if one is dealing with fact or fantasy. However, the findings also indicate a more useful approach to judgments. This would seem to be the situation in which several anonymous assessors are asked to formulate judgments under circumstances where no specific accuracy criteria or incentives are used. While it is admitted that accuracy itself is not improved under this condition, at least consistency is very high. And in some situations honest judgments may be more important than accuracy per se. An interesting example of this was the procedure adopted by Secretary of Defense Robert McNamara when he commissioned a high level, top secret evaluative study of the prelude to, and conduct of, the Viet Nam War. The fact that The Pentagon Papers proved to be a rather insightful and damning indictment of the very administration which commissioned the study may be attributable to the consistency engendered (Sheehan et al., 1971).

From a theoretical perspective, the present research provides evidence for the complementarity of two threat related theories. Ego defensiveness may be used to explain the effects on consistency when personal accountability is induced. Self perception may be used to explain the effects on consistency when accuracy is made salient via incentives. And together, the two may explain inconsistency as a self protective reaction against the potential of negative evaluation coming either from oneself or from others.

The strongest contribution of this study is thought to be in its methodology. It has been shown that private judgments are fundamentally different than public judgments. It has also been demonstrated that it is possible to measure private judgments and therefore investigate questions regarding the causes of inconsistency and inaccuracy. And while evidence has been found to indicate the efficacy of the threat related theory forwarded by this paper, the author's opinion is that its major contribution is that it opens a door which has been an obstacle for a considerable time. Yes, we can indeed capture private judgments.

Further research may build on this work in several directions. First, it would be helpful to investigate exactly what it is that causes the natures of private and public judgments to be fundamentally different. A cursory explanation has been offered in this paper, but it seems far from conclusive. Second, further research is needed to confirm the theoretical explanation for inconsistency forwarded by this paper. In addition, our new found ability to measure inconsistency has brought with it some very intriguing questions. What causes assessors to chronically adjust their public judgments in one direction or another? A number of possible explanations exist - ranging from built-in bias correction, to Pollyanna effects, to risk aversion. What causes the means of consistency to remain static while the variance fluctuates dramatically, or vice versa? Is there a predictable pattern to all this inconsistency, and are there means to suppress it so that we may attain "honest" judgments? Fourth, now that it is possible to separate consistency and accuracy, it may finally be possible to investigate means by which we can improve accuracy, by dealing with the pivotal formulation stage that goes on at the private level. Fifth, research is needed focusing on consistency when dealing with matters of opinion, attitude and intention. These subjective judgments are the real focus of consumer research. The objective judgments used in the present study were used for the sake of clarity. And finally, it is hoped that the methodology presented in this paper will enable researchers to investigate and successfully come to grips with issues that are at present far beyond the purview of the author. For as the proof of a pudding is in the eating, so the proof of a tool is in its use.


Byrnes, J.C. (1964), "Consumer Intentions to Buy," Journal of Advertising Research, 4, 49-51.

de Finnetti, B. (1962), "Does it Make Sense to Speak of 'Good Probability Appraisers'?" in I.J. Good (ed.), The Scientist Speculates - An Anthology of Partly-Baked Ideas, New York: Basic Books, 357-63.

Feather, N.T. and Simon, J.G. (1971), "Attribution of Responsibility and Balance of Outcome in Relation to Initial Confidence and Success and Failure of Self and Other," Journal of Personality and Social Psychology, 18, 173-188.

Gonik, J. (1978), "Tie Salesmen's Bonuses to their Forecasts," Harvard Business Review, May-June, 116-123.

Good, I.J. (1965), The Estimation of Probabilities - An Essay on Modern Bayesian Methods, Cambridge: MIT Press.

Granbois, D.H. and Summers, J.O., (1975), "Primary and Secondary Validity of Consumer Purchase Probabilities," Journal of Consumer Research, vol. 1, 31-38.

Johnson, T.J., Feigenbaum, R., and Weiby, M. (1964), "Some Determinants and Consequences of the Teacher's Perception of Causation," Journal of Educational Psychology, 55, 237-246.

Kahneman, D. and Tversky, A. (1980), "Intuitive Prediction: Biases and Corrective Procedures," Management Science.

Nunnally, J.C. (1978), Psychometric Theory, New York: McGraw-Hill, 120-121.

Roshwalb, I. (1975), "A Consideration of Probability Estimates Provided by Respondents," Journal of Marketing Research, vol. xii, 100-103.

Seaver, D.A., von Winterfeldt, D. and Edwards, W. (1978), "Eliciting Subjective Probability Distributions on Continuous Variables," Organizational Behavior and Human Performance, 21, 379-91.

Sheehan, N., Smith, H., Kenworthy, E.W., and Butterfield, F. (1971), The Pentagon Papers, New York: Bantam Books.

Winkler, R.L. (1967), "The Quantification of Judgement: Some Methodological Suggestions," Journal of the American Statistical Association, vol. 62, no. 320, 1105-1120.

Winkler, R.L. (1967A), "The Assessment of Prior Distributions in Bayesian Analysis," Journal of the American Statistical Association, vol. 62, 776-800.

Winkler, R.L. (1971), "Probabilistic Prediction: Some Experimental Results," Journal of the American Statistical Association, vol. 66, no. 336, 675-685.