Effects of the Cooperative Group Decision-Making Context on the Test-Retest Reliability of Preference Ratings

Kim P. Corfman, New York University
ABSTRACT - An approach to modeling test-retest measurement error as a function of contextual factors is presented. Two situational factors are proposed that may contribute to low reliability when a subject provides preference ratings independently and then with the knowledge that he or she will be making a decision on the same stimuli as a member of a cooperative group. These factors are the group's decision history and empathy.
Kim P. Corfman (1986) ,"Effects of the Cooperative Group Decision-Making Context on the Test-Retest Reliability of Preference Ratings", in NA - Advances in Consumer Research Volume 13, eds. Richard J. Lutz, Provo, UT : Association for Consumer Research, Pages: 554-557.

Advances in Consumer Research Volume 13, 1986      Pages 554-557



[The author wishes to thank Donald R. Lehmann and Punam Anand for their helpful comments on earlier drafts of this manuscript.]




Unless the reliability of measures used in studies of consumer behavior is established, it is not possible to demonstrate the validity of the instruments used or determine the reason for low correlations among constructs (Peter 1979). The test-retest method of reliability assessment is designed to establish the proportion of systematic variance in a measure by administering it twice to the same subjects under conditions that are as similar as possible. As it is rarely possible to exactly duplicate measurement conditions, it is useful to attempt to identify the differences in context from one administration of an instrument to the next. This paper proposes that the inconsistency found in test-retest measures can be partly explained by identifiable changes in the measurement context.

Specifically, it suggests that preference ratings that members of long-term cooperative groups (e.g., families and buying committees) provide when they are questioned individually (with no reference to the group) will differ from those they provide when they know they will shortly be making a decision about the same stimuli with their group. The latter situation will be referred to as the cooperative group decision-making context. It is proposed that these differences are due to the group's decision history - the effect of the outcomes of the group's past decisions on the way a group member values or rates the alternative he or she prefers in future decisions (Corfman, Lehmann and Steckel 1985) - and to the empathy members may have for each other and each other's preferences (Burns and Granbois 1977, Davis 1976).


Classical reliability theory suggests that a subject's observed response to a measurement scale is composed of two parts, a true and an error component (Guilford 1954, Kerlinger 1973). The true component is presumed to be stationary over time and neither component can be observed alone. What is observed is the true score plus or minus an error score.

This error has been termed either random or nonrandom (systematic) error (Carmines and Zeller 1979). Nonrandom error has a systematic biasing effect on measuring instruments, causing them to register consistently high or low scores. Test-retest measures of reliability cannot assess the level of this kind of error because it is present in both the test and retest settings. They can, however, measure the amount of random error. This is because it is assumed that random error is equally likely to move the observed score up or down from the true score, so that over a very large number of administrations of an instrument the mean score will be very close to the true score.
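The distinction between the two kinds of error can be illustrated with a small simulation. This is only a sketch under assumed values (a true score of 60, a constant bias of 5 points, normally distributed random error with a standard deviation of 10, and 2,000 hypothetical subjects); none of these numbers comes from the paper.

```python
import random

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(0)
TRUE_SCORE, BIAS, NOISE_SD = 60.0, 5.0, 10.0  # assumed illustrative values

# Part 1: many administrations to one subject.
# Random error averages out; a constant (nonrandom) bias does not.
n_admin = 20000
random_only = [TRUE_SCORE + rng.gauss(0.0, NOISE_SD) for _ in range(n_admin)]
with_bias = [score + BIAS for score in random_only]
mean_random_only = sum(random_only) / n_admin
mean_with_bias = sum(with_bias) / n_admin

# Part 2: one test and one retest for many subjects.
# Adding the same bias to both administrations leaves the
# test-retest correlation unchanged, so the method cannot detect it.
true_scores = [rng.uniform(20.0, 80.0) for _ in range(2000)]
test = [t + rng.gauss(0.0, NOISE_SD) for t in true_scores]
retest = [t + rng.gauss(0.0, NOISE_SD) for t in true_scores]
r_unbiased = pearson(test, retest)
r_biased = pearson([x + BIAS for x in test], [y + BIAS for y in retest])

print(f"mean, random error only: {mean_random_only:.2f}")
print(f"mean, with constant bias: {mean_with_bias:.2f}")
print(f"test-retest r without bias: {r_unbiased:.3f}")
print(f"test-retest r with bias:    {r_biased:.3f}")
```

The first mean settles near the assumed true score while the second stays offset by the bias, and the two correlations coincide, which is the sense in which test-retest methods register random error but not constant bias.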

This paper discusses another kind of error, context error, which tends to show up as random error in test-retest methods. This error is not a consistent biasing of responses either higher or lower than the true score, nor is it random and equally likely to move the observed score higher or lower than the true score. It is caused by factors which vary from one administration of the instrument to the next (and, perhaps, from subject to subject). Context error is systematic in the sense that it is a function of changes in the situation or context.


Figure 1 represents the basic model. It differs from the usual representation of the test-retest method (Carmines and Zeller 1979) in two ways. First, due to changes in the context, what we are attempting to measure in the retest may be the true score of another phenomenon - a different set of preferences for a different situation. Preferences may have changed due to the measurement context. (An alternative way to view this is that the true score has changed.) Therefore, there is a potentially different true score for each period. The second difference is that in both the test and retest conditions the context may affect how the subject reports his or her true score. If the contexts are different, even if the same true score is being measured, the observed score may change. (Random error will also contribute to changes in observed scores.)

For example, a husband may have a different true preference score when questioned alone than when he is questioned with the knowledge that he will shortly be evaluating the same stimulus with his wife. Concern for her preferences may cause him to incorporate what he thinks she would prefer into his own rating. This may reflect either genuinely different preferences or simply a desire to accommodate his spouse.



If XT1 and XT2 are the subject's true test and retest scores, and XO1 and XO2 are the subject's observed scores, the traditional model is:

XO2 = b1 XT1 + e    (1)

If changes in context affect response, the model in (1) is underspecified. If these changes are measured, their contribution to lower test-retest reliability can be estimated. Z1 is the effect of the context in the first administration of the instrument. Since the difference between the contexts of the two tests is of interest, Z1 is set to zero and the value of Z2 indicates the change. (Clearly, as many context factors as are appropriate may be specified.) The following equations result from the model depicted in Figure 1:

XO1 = b1 XT1 + e1     (2)

XT2 = b2 XT1 + b3 Z2     (3)

XO2 = b4 XT2 + b5 Z2 + e2    (4)

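Substituting (2) and (3) into (4) and collecting terms produces the reduced form (5), in which b1 and b2 are composites of the coefficients in (2) through (4) and e combines the error terms e1 and e2.

XO2 = b1 XO1 + b2 Z2 + e     (5)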
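The substitution can be written out step by step. The sketch below reads (2) as relating the first observed score to the first true score, as in the model of Figure 1, and assumes b1 is nonzero; the composite labels on the last line are only a relabeling for comparison with (5).

```latex
\begin{align*}
X_{T1} &= \frac{X_{O1} - e_1}{b_1}
  && \text{from (2)} \\
X_{T2} &= \frac{b_2}{b_1}\,(X_{O1} - e_1) + b_3 Z_2
  && \text{substituting into (3)} \\
X_{O2} &= \frac{b_2 b_4}{b_1}\,X_{O1} + (b_3 b_4 + b_5)\,Z_2
          + \Bigl(e_2 - \frac{b_2 b_4}{b_1}\,e_1\Bigr)
  && \text{substituting into (4)}
\end{align*}
```

Under this reading, the coefficient on XO1 in (5) corresponds to b2 b4 / b1 and the coefficient on Z2 to b3 b4 + b5. Because b3 (a change in the true score) and b5 (a change in how the score is reported) enter only through that single sum, the two routes cannot be separated from the reduced form alone.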

It will not be possible to determine whether the new context is eliciting a new true score or whether it has simply changed the way the subject reflects the true score in his or her rating. It will, however, show the relative importance of different environmental or contextual factors to test-retest reliability.
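How a change in context can lower test-retest reliability may be sketched with simulated data. Everything numeric here is an assumption made for illustration (unit structural coefficients, true scores with mean 50 and standard deviation 15, random error with standard deviation 5, a standardized context change Z2); the function name `simulate` and its `context_weight` parameter are hypothetical.

```python
import random

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def simulate(context_weight, n=5000, seed=1):
    """Test-retest correlation when Z2 shifts both the retest true score
    and its report by `context_weight` (assumed equal for simplicity)."""
    rng = random.Random(seed)
    xo1, xo2 = [], []
    for _ in range(n):
        xt1 = rng.gauss(50.0, 15.0)            # first true score
        z2 = rng.gauss(0.0, 1.0)               # context change (Z1 set to zero)
        xt2 = xt1 + context_weight * z2        # second true score, as in (3)
        xo1.append(xt1 + rng.gauss(0.0, 5.0))  # first observed score, as in (2)
        xo2.append(xt2 + context_weight * z2 + rng.gauss(0.0, 5.0))  # as in (4)
    return pearson(xo1, xo2)

r_same_context = simulate(context_weight=0.0)
r_changed_context = simulate(context_weight=8.0)
print(f"r, unchanged context: {r_same_context:.3f}")
print(f"r, changed context:   {r_changed_context:.3f}")
```

With no context effect the correlation reflects random error only; once Z2 matters, the same instrument appears less reliable even though the added variation is systematic and, if Z2 is measured, could be modeled.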


Many models of group choice weight individual preferences with the group members' relative influence to predict what the group will choose (Choffray and Lilien 1980, Davis 1973, Keeney and Kirkwood 1975, Krishnamurthi 1981, March 1966). Whether individual preferences are assessed alone or in the presence of the group will lead to important differences in interpretation of results if the context affects the observed preference scores.

The cooperative group context often causes a subject to give different responses than he or she would in an independent (non-group) context. In a recent examination of how individual preferences relate to group decisions, Corfman (1985) had 124 spouses (62 married couples) individually rate a set of 54 stimuli by assigning from 0 to 100 points to each. These ratings were used to create a unique set of 12 to 18 stimulus pairs for each couple so that the items in each pair were ranked differently by the spouses (i.e., one spouse preferred the first alternative and the other spouse preferred the second). An average of 19 days later the experimenter met with the couple and presented the stimulus pairs one by one. As each pair was presented, the spouses first rated the alternatives individually on 100-point constant sum scales (group context ratings) and then made a joint decision on which of the two to acquire.

Altogether the spouses rated 1668 stimulus pairs (124 spouses each rating 12 to 18 pairs). Since the first set of ratings was on a different scale from the second (a 0 to 100 point scale for each stimulus vs. a 100 point constant sum paired comparison of the stimuli in the pair), only the cases in which the preferred alternative changed are examined. Of the 1668 rankings, 541 (32%) changed from the first to the second test. This undoubtedly under-represents the total number of rating changes, some of which did not result in rank changes.
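Because the two administrations used different scales, the comparison reduces to whether the preferred alternative switched. A minimal sketch of that check follows; the function names and the treatment of ties (counted as no change) are assumptions, not details reported in the study, and the example ratings are invented.

```python
def preferred(rating_a, rating_b):
    """Return 'A', 'B', or None for a tie, given two ratings on a common scale."""
    if rating_a == rating_b:
        return None
    return "A" if rating_a > rating_b else "B"

def rank_changed(first_a, first_b, group_points_a, total=100):
    """Compare the ordering implied by independent 0-100 ratings with the
    ordering implied by a constant-sum allocation of `total` points.

    first_a, first_b: independent ratings of alternatives A and B.
    group_points_a:   points given to A in the group context; B receives
                      total - group_points_a.
    Ties in either administration are treated as no change (an assumption).
    """
    before = preferred(first_a, first_b)
    after = preferred(group_points_a, total - group_points_a)
    return before is not None and after is not None and before != after

# Invented examples: A is preferred in the first test (80 vs. 60).
print(rank_changed(80, 60, group_points_a=40))  # A gets 40 of 100: order reversed
print(rank_changed(80, 60, group_points_a=70))  # A gets 70 of 100: order kept

# Share of rank changes reported in the text: 541 of 1668 rankings.
changed_share = 541 / 1668
print(f"{changed_share:.1%}")
```

A rating can move a long way without reversing the order, which is why counting only rank changes understates how many ratings changed.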

Some of these changes may have been due to the differences in the way subjects used the two rating scales or to changes in preference with time, although the number is large enough to suggest there are other explanations. To investigate this possibility, 24 spouses provided extra sets of individual group context ratings on the 100-point scales used in the first test. Immediately before the sequence of decision pairs was presented to the spouses in each of the 12 couples, they provided extra ratings of those stimuli which were about to appear in decision pairs. These couples provided a total of 312 pairs of individual ratings (24 spouses each rating an average of 13 pairs). The average point change from the first to the extra test was 21.47 points. The following rank changes occurred for this group of 24 subjects in their ratings of the decision pairs used in the group task:


This illustrates the potential importance of context to the correct interpretation of the scale responses. If subjects are questioned alone and the group context affects their true scores, a model's parameter estimates may be biased and it may not perform as well as it would were the relevant preferences measured. If subjects are questioned in the group context and the model estimated, it should not be assumed that preference ratings obtained from members in isolation will predict as well. This paper suggests two factors, decision history and empathy, that may affect members' preferences and preference ratings in the cooperative group decision-making context.

Decision History

The first situational factor that is considered is the group's decision history. Assuming our interest is in groups that are not newly formed (such as families, committees, and buying centers), a particular decision made by a group is likely to be one of a sequence the group has made in the course of its existence. If there has been conflict, the outcomes of past decisions will represent wins and losses and may affect members' feelings about future decisions. Specifically, decision history is the effect of the outcomes of past decisions on the way the group member values the alternatives he or she prefers in future decisions. As a result of having lost in the past, an individual may exaggerate the value of his preferred alternative in a subsequent decision and rate it more highly than he would have out of the context of the group decision (Corfman 1985, Corfman, Lehmann and Steckel 1985). Conversely, having had his choice adopted by the group, an individual may express future preferences less extremely. These effects may occur because individuals genuinely value the alternatives differently - their preferences have changed - or because the desire to win or restore equity (Adams 1965) is being reflected in the preference ratings. In either case, the preference ratings will differ from those provided outside the context of decisions made by the group.

Decision history can affect preferences and preference ratings in another way. A group member who has had his selection adopted by the group in the past may place higher values on the alternatives he prefers the more he gets his way. Having his preferences confirmed may give him confidence and lead him to express his preferences more firmly, or winning may be addictive and the desire to win reflected in the ratings. Similarly, a member who has lost in the past may respond with decreased confidence or a sense of futility and rate his preferred alternatives more moderately. Again, changes in ratings may be due either to real changes in preferences or to the reflection of desire to win.


Empathy

The other context effect that will be considered here is the empathy group members may have for each other and each other's preferences (Burns and Granbois 1977, Davis 1976, Greenhalgh, Neslin and Gilkey 1984, Olson 1969). This effect will be particularly important in primary groups (Faris 1937), such as families, and other groups such as social clubs and even some work groups that are characterized by strong commitment or attachment bonds (McCall 1970). In these groups, the satisfaction of other members and preserving relationships with them may be as or more important to a member than having his way in a decision. When a husband (or wife) is asked alone which of several alternatives he prefers, he may give a different response than he would in the context of making a decision with his wife, where he may be less able or willing to distinguish his preferences from hers. Empathy may actually cause changes in individual preferences or it may just reflect the desire to accommodate a spouse. In either case it will contribute to the difference between ratings given independently and those given in the group context.

If a decision history or empathy effect is present and important, then use of independent preference ratings in models of group choice is not appropriate. Even if true independent preference scores are the same, these independent ratings will not reflect subjects' preferences at the time the joint decision is made, due to the group decision-making context.


The general model in (5), modified to include the effects of decision history and empathy in a two-person group, appears in (6). It may be generalized to larger groups by using the subject's perceptions of other members' average preference if the analysis is performed across groups, or by adding a term for each additional member if the analysis is performed across decisions made by a single group. Presented this way, decision history is a variable and empathy is b3, the parameter associated with the member's perception of the other member's preferences.
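Equation (6) itself is not reproduced in this text. Going only by the description above and by the parameters named in the hypotheses (b1 on the initial observed score, b2 on DECISION HISTORY, b3 on the perceived preference of the other member), one form consistent with them would be the following. The symbol P for the subject's perception of the other member's preference and the explicit error term e are assumptions introduced here, not the author's notation.

```latex
\begin{equation*}
X_{O2} = b_1 X_{O1} + b_2\,\mathrm{DECISION\ HISTORY} + b_3 P + e
\tag{6}
\end{equation*}
```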


XO1 and XO2 are the subject's ratings of the stimulus alone and in the group context, respectively. If an experiment is used and the group context ratings are provided all together before the group decisions begin, no effect of decision history will be apparent, although it may well play a part in the group's decisions. For the most accurate measures of the individual preferences that will contribute to the group decision, the ratings should be provided immediately preceding the corresponding group decision.

Given that the outcomes of a sequence of decisions made by each group have been recorded, decision history may be operationalized in at least two ways. It may be simply whether the member had the alternative he or she preferred chosen by the group in the preceding decision. Alternatively, it may be the proportion of preceding decisions in which the alternative he or she preferred was chosen.
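The two operationalizations can be sketched as follows. The input format (a chronological list of booleans, True when the member's preferred alternative was chosen) and the function names are assumptions made for illustration; the first decision has no history, so it is returned as None.

```python
def won_previous(outcomes):
    """For each decision, whether the member's preferred alternative was
    chosen in the immediately preceding decision (None for the first)."""
    return [None] + list(outcomes[:-1])

def win_proportion(outcomes):
    """For each decision, the proportion of all preceding decisions in which
    the member's preferred alternative was chosen (None for the first)."""
    history = []
    wins = 0
    for index, won in enumerate(outcomes):
        history.append(wins / index if index > 0 else None)
        wins += 1 if won else 0
    return history

# Invented sequence of four decisions for one group member.
outcomes = [True, False, False, True]
print(won_previous(outcomes))    # [None, True, False, False]
print(win_proportion(outcomes))  # [None, 1.0, 0.5, 0.3333333333333333]
```

The first measure reacts only to the most recent outcome, while the second accumulates the whole sequence, so the two may behave quite differently late in a long series of decisions.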

The subject's perception of the other member's preferences may be solicited directly, although reactivity may present a problem. If the subject is asked first what he believes the other member prefers it may encourage the use of this information in the subject's reporting of his own preferences. If asked afterward, the subject may bias his report of the other member's preferences. An alternative approach would be to use the other member's self-reported preferences. In a pre-existing cooperative group whose members are well-known to each other and have established preferences for the decision alternatives, this may be a better solution.


H1: The subject's initial observed score, XO1, should clearly account for the largest portion of the variation in the rating he gives in the group context, XO2. The parameter b1 is an indication of the measure's reliability and should lie between zero and one. It should be less than one because, when looking across a large number of ratings, a regression to the mean effect is more likely than consistently more extreme ratings.

H2: The DECISION HISTORY parameter, b2, should be negative: the less the subject's preferred alternative has been chosen in the past, the more highly he will rate his choice in future decisions (and vice versa). The alternative hypothesis suggested earlier is also plausible: the more the subject has had his way in the past, the more highly he will rate his choice in the future. Exploratory research indicates that the former is the more likely phenomenon in cooperative groups (Corfman 1985, Corfman, Lehmann and Steckel 1985).

H3: Due to empathy, b3 should be positive - the subject's rating of his preferred alternative in the group context will be positively related to the way he perceives the other member's preference for the alternative. If member A believes that member B likes the alternative A prefers even more than A does, A will rate the alternative more highly in the group setting than he does independently. If he thinks B likes the alternative less, he will rate it lower in the group setting.
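A sketch of how the three hypotheses might be checked by ordinary least squares is given below. The data are simulated, not drawn from any study: the generating coefficients (0.7, -8, 0.2), the intercept, the noise level, and the variable names are all assumptions chosen only so that the signs match H1 through H3. The small `ols` helper solves the normal equations directly so that the example needs no external libraries.

```python
import random

def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    k = len(rows[0])
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * p for x, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]

rng = random.Random(42)
rows, xo2 = [], []
for _ in range(4000):
    xo1 = rng.uniform(50.0, 100.0)      # independent rating of preferred alternative
    history = rng.random()              # proportion of preceding decisions won
    perceived = rng.uniform(0.0, 100.0) # perceived rating by the other member
    rows.append([1.0, xo1, history, perceived])  # leading 1.0 is an assumed intercept
    xo2.append(10.0 + 0.7 * xo1 - 8.0 * history + 0.2 * perceived
               + rng.gauss(0.0, 5.0))

intercept, b1, b2, b3 = ols(rows, xo2)
print(f"b1 (initial rating):       {b1:.3f}")
print(f"b2 (decision history):     {b2:.3f}")
print(f"b3 (perceived preference): {b3:.3f}")
```

In real data the estimates would of course have to be judged against their standard errors; the sketch only shows which coefficient corresponds to which hypothesis.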

It is possible that "empathy" will be stronger for some subjects and groups than others due to personality and relationship characteristics. Subjects who are generally more caring, accommodating, or dependent, or who dislike conflict may tend to alter their preferences or preference ratings more. The more cooperative group (due to a longer relationship or greater interdependence) may be generally more empathetic. These possibilities could be investigated through interaction terms with the perceived preference of the other group members.
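One way such an interaction might be specified is sketched below. The symbols are assumptions introduced for illustration: P stands for the perceived preference of the other member, C for a measured subject or relationship characteristic (for example, a score for dislike of conflict), and b4 for the interaction parameter.

```latex
\begin{equation*}
X_{O2} = b_1 X_{O1} + b_2\,\mathrm{DECISION\ HISTORY} + b_3 P + b_4\,(P \times C) + e
\end{equation*}
```

Under this specification the empathy effect for a given subject would be b3 + b4 C, so a positive b4 would indicate that subjects scoring higher on C adjust their ratings more toward what they believe the other member prefers.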


Although the data requirements for estimating group choice models are great, collecting the additional preference data needed for this analysis seems reasonable when the added insights that can result are considered. If the context factors examined here contribute to the change in preference ratings given by group members when they are moved from the independent to the group setting, it is clearly important to measure preferences in the appropriate context and to understand how measurements taken in other contexts differ.

There may be other context factors affecting preferences and preference ratings in these and other kinds of groups which have yet to be identified - a useful direction for further research. In some organizational groups, for example, the threat of veto or possession of much greater influence by another member may result in preference rating changes similar to those associated here with empathy. The group context may also affect the riskiness or conservatism of individual ratings provided in anticipation of a group decision on the same issues. In a non-cooperative bargaining situation it may be to a participant's strategic advantage to misrepresent his or her preferences. This might be reflected in individual ratings provided before the exchange begins.

This kind of investigation into the effect of context on test-retest reliability may be useful in areas other than group decision-making. Depending on the instrument, stimuli, and subjects, such factors as differences in the time of day (relating to fatigue), recency of consumption of or contact with the stimuli, and discussion or consideration of the stimuli or instrument may affect true retest scores and the way they are reported. If some of the measurement error produced in situations in which the environment cannot be completely controlled is found to be systematic in the sense that it can be modeled, data may be adjusted accordingly and, hence, more reliable measures used.


Adams, J.S. (1965), "Inequity in Social Exchange," in Advances in Experimental Social Psychology, Volume 2, ed. L. Berkowitz, New York: Academic Press.

Burns, Alvin C. and Donald H. Granbois (1977), "Factors Moderating the Resolution of Preference Conflict in Family Automobile Purchasing," Journal of Marketing Research, 14 (February), 77-86.

Carmines, Edward G. and Richard A. Zeller (1979), "Reliability and Validity Assessment," Sage University Paper Series on Quantitative Applications in the Social Sciences, series no. 07-017, Beverly Hills: Sage Publications.

Choffray, Jean-Marie and Gary Lilien (1980), Market Planning for New Industrial Products, New York: John Wiley.

Corfman, Kim P. (1985), "Models of Group Decision-Making and Relative Influence when Preferences Differ," unpublished Ph.D. dissertation, Columbia University.

Corfman, Kim P., Donald R. Lehmann and Joel H. Steckel (1985), "An Experimental Investigation of Group Conflict Resolution Over Time," unpublished working paper, Columbia University.

Davis, Harry L. (1976), "Decision Making Within the Household," Journal of Consumer Research, 2 (March), 241-260.

Davis, James H. (1973), "Group Decision and Social Interaction: A Theory of Social Decision Schemes," Psychological Review, 80 (March), 97-125.

Faris, Ellsworth (1937), The Nature of Human Nature, New York: McGraw Hill.

Greenhalgh, Leonard, Scott A. Neslin and Roderick W. Gilkey (1984), "The Effects of Negotiator Preferences, Situational Power, and Negotiator Personality on Outcomes of Business Negotiations," Working Paper No. 137, Amos Tuck School, Dartmouth College.

Guilford, J.P. (1954), Psychometric Methods, New York: McGraw-Hill.

Keeney, Ralph L. and Craig W. Kirkwood (1975), "Group Decision Making Using Cardinal Social Welfare Functions," Management Science, 22 (December), 430-437.

Kerlinger, Fred N. (1973), Foundations of Behavioral Research, New York: Holt, Rinehart and Winston.

Krishnamurthi, Lakshmanan (1981), "Modeling Joint Decision Making Through Relative Influence," unpublished dissertation, Stanford University.

McCall, G.J., ed. (1970), Social Relationships, Chicago: Aldine.

March, James G. (1966), "The Power of Power," in Varieties of Political Theory, ed. D. Easton, Englewood Cliffs, New Jersey: Prentice-Hall, 39-70.

Olson, David H. (1969), "The Measurement of Family Power by Self-Report and Behavioral Methods," Journal of Marriage and the Family, 31 (August), 545-550.

Peter, J. Paul (1979), "Reliability: A Review of Psychometric Basics and Recent Marketing Practices," Journal of Marketing Research, 16 (February), 6-17.