The Influence of Survey Format on Judgment Processes: the Case of Ideals and Perceived Similarity

Jack M. Feldman, Georgia Institute of Technology
Scott Wesley, Georgia Institute of Technology
Micael Hein, Georgia Institute of Technology
Ann Gilmore, Georgia Institute of Technology
Jack M. Feldman, Scott Wesley, Micael Hein, and Ann Gilmore (1990), "The Influence of Survey Format on Judgment Processes: the Case of Ideals and Perceived Similarity", in NA - Advances in Consumer Research Volume 17, eds. Marvin E. Goldberg, Gerald Gorn, and Richard W. Pollay, Provo, UT: Association for Consumer Research, Pages: 217-222.

[Thanks are due to John Lynch for his many helpful comments.]

Survey question format and order effects were studied in a similarity judgment context. It was expected that a series of questions requiring comparison of an object to a category ideal would cause the construction of the ideal if none existed and if overall evaluation was insufficiently diagnostic for the judgment. When questions about the attributes of an ideal were asked before an overall similarity judgment, a familiar asymmetry occurred: object-ideal comparisons yielded higher similarity ratings than ideal-object judgments (Tversky, 1977). With the overall rating first, the asymmetry was reversed. Object-specific and question format effects also occurred, qualifying our conclusions.

This study explores the effect of survey formats on judgment in the context of two theoretical domains: the perception of similarity (Tversky, 1977) and the concepts of value lability and response construction (Feldman and Lynch, 1988; Fischhoff, Slovic and Lichtenstein, 1980). It tests the hypothesis that variations in the format and ordering of survey questions will have predictable effects on judgments of similarity between an object or person and some hypothetical ideal, a judgment often used to evaluate the object or person in question.

The concept of the "ideal" occurs frequently in theories of attitude and satisfaction. Johnson (1971), for instance, proposed that consumers have an "ideal point" referent in some product space, and evaluate specific products in terms of their distance from that point. The product space itself is formed by product attribute dimensions (e.g., taste, price). Likewise, Locke's (1976) theory of job satisfaction proposes that one's liking for a job is determined by the perceived differences between the job itself and the job's "ideal" position on each of a set of dimensions (e.g., pay, responsibility). One widely used job satisfaction instrument (Porter, 1962) is cast in an explicitly comparative format, requiring ratings of a job's current outcome levels and the levels of those outcomes that "should" exist. In a more abstract realm, Barsalou (1987) has shown that categories may be represented by, and constructed in terms of, "ideals" formed by goal-relevant properties of objects. While Barsalou's "ad hoc" categories are heterogeneous (e.g., "foods to eat on a diet"), it is not difficult to think of categories that are more homogeneous and which may be represented by a concrete ideal (e.g., "sports cars").

The measurement models above (Johnson, 1971; Porter, 1962) implicitly assume that some ideal representation like those demonstrated by Barsalou exists within the domain of interest. This assumption may, however, be questionable. Fischhoff, et al. (1980) have discussed the concept of "value lability," making the point that people do not have well articulated values in many domains. Without a highly elaborated value structure the nature and form of questioning may in fact influence the representation of issues and concepts (Feldman and Lynch, 1988). In keeping with the principle of cognitive economy (Srull and Wyer, 1989), we assume that the judgments and associations composing a value structure are not formed in the absence of some processing goal or motive. When a motive exists, a judgment is formed based on accessible resources, according to a satisficing criterion. That is, no more elaboration is carried out than is necessary to meet a current processing demand.

Thus, in the absence of environmental contingencies (such as those provided by a particular cultural or interpersonal environment; see Triandis, 1989), it is unlikely that a highly elaborated system of beliefs (including concepts of dimensions and categories of objects) will be formed. This reasoning is consistent with the apparent absence of common ideologies in the survey research literature (e.g., Bishop, Oldendick, Tuchfarber and Bennett, 1980; Schuman and Presser, 1980) and with context dependence in the form of category representation and structure (Barsalou, 1987). Only in the case of expertise (Alba and Hutchinson, 1987) and/or long-term, continuing involvement in a domain (Feldman and Lynch, 1988) would we expect to find elaborated, accessible ideal representations within a given domain. Without such representations, the act of responding to survey questionnaires of a particular form (e.g., "How far do you have to reach to adjust your car's air conditioning?" "How far should you have to reach?") may stimulate the creation of an ideal and its use in evaluating the object in question. This is a precondition for the phenomenon Feldman and Lynch (1988) have termed self-generated validity, in which the measurement operations themselves create judgment processes that seem to confirm the theory being tested.

The hypothesis that measurement operations may cause the creation of ideal representations where none had previously existed may be tested in the domain of similarity judgment, by taking advantage of a phenomenon noted by Tversky (1977). Tversky's feature comparison model predicts asymmetrical judgments of similarity as a function of the phrasing of a comparison. In his model, the more prototypical or salient (prominent and highly elaborated) object has a larger number of accessible features than a less prototypical or salient object. In a similarity judgment more attention is directed to the features of the subject of the comparison than to the referent object. The judgment of similarity itself depends on the number of common and distinctive features of the two objects, with greater weight given to the features of the subject.

If the subject of the judgment has most of its known features in common with a more elaborated referent, it will be judged to be similar even if the referent has many features not shared by the subject. This is normally the case with category member-prototype judgments (e.g., "How similar is Poland to the USSR?"). If the order of comparison is reversed, the more elaborated object becoming the subject of the judgment, judged similarity is lower because the form of the question directs attention to the relatively large number of distinctive features of the (prototypical) subject. Thus, "How similar is the USSR to Poland?" elicits a lower judgment than does the former order.
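The asymmetry described above can be illustrated with a minimal sketch of a feature contrast rule in the spirit of Tversky (1977): similarity rises with common features and falls with distinctive features, with the subject's distinctive features weighted more heavily than the referent's. The function name, the weight values, and the two feature sets are hypothetical choices made only for illustration; they are not taken from the study.

```python
def contrast_similarity(subject, referent, theta=1.0, alpha=0.8, beta=0.2):
    """Directional similarity of `subject` to `referent`.

    theta weights common features, alpha weights features unique to the
    subject, and beta weights features unique to the referent. The default
    weights are illustrative assumptions; alpha > beta encodes the greater
    attention paid to the subject of the comparison.
    """
    common = len(subject & referent)
    subject_only = len(subject - referent)
    referent_only = len(referent - subject)
    return theta * common - alpha * subject_only - beta * referent_only


# Hypothetical feature sets: a sparsely represented object that shares most
# of its known features with a richly elaborated one.
sparse_object = {"a", "b", "c"}
rich_object = {"a", "b", "d", "e", "f", "g"}

sparse_to_rich = contrast_similarity(sparse_object, rich_object)
rich_to_sparse = contrast_similarity(rich_object, sparse_object)

# The less elaborated object is judged more similar to the more elaborated
# one than the reverse.
print(sparse_to_rich > rich_to_sparse)
```

Under this sketch the direction of the asymmetry depends only on which representation carries more distinctive features, so exchanging the roles of the two sets reverses it. With equal weights (alpha equal to beta) the rule is symmetric.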

By extension, judgments of similarity between elaborated ideal category representations and specific category members should be made in the same fashion. That is, the rated similarity between a category member and the category's ideal representation ought to be higher than the rating of similarity between the ideal and the category member.

It is important to note, however, that this prediction depends entirely on the assumptions that the ideal exists and that its cognitive representation is more elaborated than that of the object in question. For judgments in some domains, the reverse may be true.

In a variety of domains (e.g., automobiles), people may lack the knowledge or motivation necessary to form an ideal. They may, however, have constructed a relatively elaborated representation of their own possessions, either because these are an important component of the self-concept (Belk, 1988) or because daily experience and outcome dependence motivates one's construction, as occurs in the domain of person perception (Brewer, 1988; Feldman, 1988).

Thus, it is entirely plausible that under many circumstances people's representations of individual objects and persons are more elaborate and detailed than the corresponding category representations. Tversky's (1977) logic should dictate that, under these circumstances, a representation-exemplar comparison should yield higher similarity ratings than an exemplar-representation judgment, the reverse of the usual finding. If, however, the process of measurement caused the construction of a representation, by asking for ratings of the attributes an ideal category member should have, the usual finding would be obtained. We also expect that asking questions that specifically direct attention to the component attributes of an ideal category representation would produce the effect more strongly than questions calling for a simple similarity judgment between the object and the ideal on the same set of attributes. The latter questions may be answered by accessing an overall evaluation of the object, without the construction of an ideal level on each attribute (Feldman and Lynch, 1988).



Subjects were 181 male and female students enrolled in one of four sections of introductory social or industrial psychology at a major Southeastern university. Student participation was voluntary, in exchange for course credit.

Stimulus Materials

Objects. It was decided that classroom instructors would be useful comparison objects for an exploratory study. First, classroom teacher evaluations are a near-ubiquitous form of survey, and so are of interest in their own right. Second, because of frequent interaction and outcome dependence, students are likely to form individualized or personalized representations of classroom instructors. Third, by including two instructors, each of whom taught two sections of different courses, some degree of generality could be established. Fourth, our objective in this study is to demonstrate the occurrence of a particular, hypothetical judgment process. It was felt that if the hypothesized effects did not occur in the person domain, where individualized or personalized representations seem most generally likely, they would probably not occur in nonsocial object domains (where values and involvement differ more widely across persons).

Questionnaire design. Questionnaire contents were developed using a set of behaviors from the university's standard instructor evaluation form. This form includes behaviors in several domains (e.g., "uses real-world cases to illustrate principles") and requires a Likert-type rating. For the present study, each behavior was cast into one of two forms:

a. A similarity judgment form, requiring a rating of similarity between the present (classroom) instructor and one's ideal instructor on that specific behavior.

b. An "Is-Should Be" form similar to Porter's (1962), requiring the student to rate the frequency of the behavior as performed by their instructor and as ideally performed. Twenty-one behaviors were included.

Each questionnaire also required an overall similarity rating, which was placed either before or after the 21 behavior ratings. This question required either a rating of the similarity between the present instructor and one's ideal instructor, or between one's ideal instructor and the present instructor. All ratings were made on 20-point scales anchored by 1 (= "Not at all similar") and 20 (= "As similar as possible").

For half the subjects, attribute (behavioral) similarity judgments were made on the same scale. For the remaining subjects, behavioral frequency (Is-Should Be) judgments were anchored by "Never" (= 1) and "Always" (= 20). Behavior ratings were always made in an order matching that of the overall similarity judgment, that is, in terms of present instructor-ideal or ideal-present instructor comparisons. Additionally, when the comparison was present instructor-ideal, subjects were instructed to think of the present instructor and form a clear image of that person; when the comparison was ideal-present instructor, subjects were instructed to think of their ideal instructor in the same way.

Design and Procedures

The design was a completely crossed four-factor between subjects factorial. Manipulated variables were:

1. Comparison Order (Instructor-Ideal vs. Ideal-Instructor), intended to create asymmetrical similarity judgments.

2. Order of Global Rating (overall similarity first vs. last), intended to influence the timing of elaboration of an ideal and thus influence the direction of asymmetry in the overall similarity judgments.

3. Behavior Rating Format (Similarity vs. Is-Should Be), intended to influence the relative degree of elaboration of the ideal and, thus, the relative size of the predicted asymmetry-reversal effect.

4. Instructor (Instructor 1 vs. Instructor 2), intended to increase the generality of results.

Overall similarity ratings were the dependent variable of interest.

Subjects participated in their regular classrooms, during ordinary class hours. The second, third, and fourth authors served as experimenters, presenting the study as one of judgment and evaluation processes. After a brief introduction and signing of informed consent forms, one of eight questionnaire forms was distributed at random to each participant. Participants were not told that different forms were being used. The purpose and results of the study were explained during a later class session.
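The crossing of factors described above can be sketched as follows. The three questionnaire factors yield the eight forms, and crossing those with Instructor gives the sixteen cells of the design. The level labels, function names, and the seeded random draw are assumptions made for illustration; the text states only that forms were distributed at random, not how the randomization was carried out.

```python
import itertools
import random

# Factors manipulated through the questionnaire itself (labels are
# illustrative shorthand for the levels named in the design).
FORM_FACTORS = {
    "comparison_order": ("instructor-ideal", "ideal-instructor"),
    "global_rating_position": ("first", "last"),
    "behavior_format": ("similarity", "is-should-be"),
}

# Instructor is fixed by the classroom rather than by the form handed out.
INSTRUCTORS = ("Instructor 1", "Instructor 2")


def questionnaire_forms():
    """Return every combination of the three questionnaire factors."""
    names = list(FORM_FACTORS)
    return [
        dict(zip(names, levels))
        for levels in itertools.product(*FORM_FACTORS.values())
    ]


def design_cells():
    """Cross the questionnaire forms with Instructor for the full design."""
    return [
        dict(form, instructor=instructor)
        for instructor in INSTRUCTORS
        for form in questionnaire_forms()
    ]


def assign_forms(n_participants, seed=0):
    """Draw one form per participant (a hypothetical randomization scheme)."""
    rng = random.Random(seed)
    forms = questionnaire_forms()
    return [rng.choice(forms) for _ in range(n_participants)]


print(len(questionnaire_forms()), len(design_cells()))
```

Independent draws of this kind do not guarantee equal cell sizes; a blocked or shuffled stack of forms would be an alternative if balance were required.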


A 2x2x2x2 ANOVA on overall similarity ratings produced the effects shown in Table 1. A main effect of Instructor accounted for 9% of the explained variance; Instructor 1 was seen as more similar to the ideal (across both comparison orders) than was Instructor 2 (X1 = 13.73; X2 = 11.09). There was also a main effect for Order of Rating; rating overall similarity before any behavioral ratings resulted in lower perceived similarity than rating overall similarity after the behavioral ratings (X before = 12.09; X after = 13.54).

These effects must be considered, however, in light of the significant four-way interaction. As shown in Table 2, the effects of Order of Rating and Comparison Order differed by both Instructor and Format. For Instructor 1, ratings on the Similarity format produced no effects of rating or comparison order. The "Is-Should Be" format produced the expected pattern of ratings, however. Global similarity comparisons between the instructor and the "ideal instructor" reversed Tversky's (1977) findings when global ratings were made prior to behavior ratings (ideal/object > object/ideal), while the usual ordering (ideal/object < object/ideal) was found when global ratings were made subsequent to behavior ratings. The differences, however, did not reach conventional significance levels.

Stronger effects were observed for Instructor 2. The predicted pattern was obtained most strongly when behavior ratings were made using the similarity format. Slope differences between Order of Rating conditions are significant (p < .05). The "Is-Should Be" format produced a similar pattern, complicated by a higher global similarity rating in the "global rating-last" condition.


Support for the hypotheses stated earlier can be described as "moderate but encouraging." Both significant reversals and replications of Tversky's (1977) similarity judgment results were found as a result of question order and format manipulations designed to change the relative degree of elaboration of category representations vs. category members. That is, answering questions likely to cause the formation of an elaborated ideal representation led to higher overall similarity judgments when the judgments were phrased in the "object to representation" order than in the "representation to object" order. When, however, it was likely that the object was the more elaborated of the two (prior to answering the same questions), the reverse was obtained.

The results are complicated, however, by the failure to obtain differences in one instructor/format combination and the weak effects in two others. These results can, we believe, be explained by the relative accessibility and diagnosticity of overall evaluations. As discussed by Feldman and Lynch (1988), summary judgments tend to be highly accessible and, while not as diagnostic as specific judgments, may often be sufficient for the question at hand. Therefore, they are likely to be used when specific judgments are less accessible. In the present case, Instructor 1 was the more highly evaluated of the two (Instructor 2's overall similarity to the ideal being close to the scale midpoint). The more polarized affect toward Instructor 1 may have been sufficiently diagnostic for all judgments of ideal-object similarity; thus, the effect of the similarity judgment format on the construction of an ideal would be nullified. The "Is-Should Be" format may constitute a stronger manipulation, but still not strong enough to completely overcome the greater accessibility of Instructor 1's overall evaluation. Instructor 2's overall evaluation, being more moderate, would be less diagnostic; thus, a judgment construction process would take place as hypothesized. The less supportive results obtained using the "Is-Should Be" format may be explained by affective polarization (Tesser, 1978) produced by rehearsal in the "global rating last" condition. Interestingly, the ratings of Instructors 1 and 2 in the "global rating last" condition of the "Is-Should Be" format are virtually identical.





This explanation is, admittedly, highly speculative. It does, however, permit unambiguous tests. For example, it predicts that over a set of objects varying in initial polarization, the size of any order of questioning effects should be negatively correlated with the absolute value of the initial polarization. Furthermore, because an ideal is not expected to be constructed when a highly polarized evaluative object judgment already exists, we would expect that response times to questions about the attributes of a category ideal would be longer following responses to a series of "similarity to ideal" questions about a highly polarized object than following the same questions about a more moderately evaluated object. We would also expect to find greater increases in attitude accessibility for moderately evaluated objects following questions in an "Is-Should Be" format than following questions in a straightforward similarity judgment format.
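The first of these predictions can be made concrete with a short sketch. Here polarization is taken to be the absolute distance of a mean overall rating from the midpoint of the 1-20 scale; that operationalization, along with the function names, is an assumption introduced for illustration and is not specified in the text. The two means are the overall similarity means reported above for the two instructors.

```python
from math import sqrt

SCALE_MIN, SCALE_MAX = 1, 20
SCALE_MIDPOINT = (SCALE_MIN + SCALE_MAX) / 2  # 10.5 on the 20-point scale


def polarization(mean_rating):
    """Absolute distance from the scale midpoint (assumed operationalization)."""
    return abs(mean_rating - SCALE_MIDPOINT)


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Overall similarity means reported in the results for the two instructors.
reported_means = {"Instructor 1": 13.73, "Instructor 2": 11.09}
polarizations = {name: polarization(m) for name, m in reported_means.items()}

# Instructor 1 lies farther from the midpoint than Instructor 2.
print(polarizations["Instructor 1"] > polarizations["Instructor 2"])
```

To test the prediction, one would pass the polarization of each object and the size of its order-of-questioning effect to `pearson_r`; the account predicts a negative coefficient. Two objects cannot provide such a test, since any two distinct points correlate perfectly, so a larger set of objects varying in initial polarization would be needed.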

Given the qualifications above, these results are consistent with Tversky's (1977) feature-matching model of similarity judgment, and with Barsalou's (1987) concepts of category flexibility and context dependence. They further demonstrate that "ideal points" cannot be assumed to exist in every domain, contrary to earlier assumptions. Even people with substantial experience in a domain, such as college students have with instructors, may not have devoted the time and effort required to create an ideal representation. Just as the typical voter may lack a coherent political ideology and corresponding "ideal" representations of congressmen, presidents, governors, etc., so might the typical consumer lack the "implicit theories" of products such as automobiles, cameras, beers, or breakfast cereals necessary to the existence of context-independent, coherent categories and category representations (Alba and Hutchinson, 1987; Barsalou, 1987; Murphy and Medin, 1985). Without such representations, survey research guided by content models of motivation or value (as Porter's 1962 instrument was guided by Maslow's need hierarchy) may, in fact, cause the construction of ideals different from those that would otherwise have been generated. This phenomenon may have three alternative (and equally undesirable) effects:

1. Under low-involvement, "peripheral" processing conditions (Petty & Cacioppo, 1986), responses to surveys may be constructed which are unrelated to attitudes and behaviors generated later in a different context.

2. Summary judgments stored in memory may guide immediate behavior, such as voting or product choice, in a direction influenced by scale contents but contrary to other values of the respondent that are momentarily less accessible. The individual may later come to regret his or her decision.

3. The individual's representation of the issue or the domain may be more or less permanently influenced by the questioning process and subsequent elaboration, producing change in values at least partially due to questioning procedures (Feldman and Lynch, 1988; Fischhoff et al., 1980; see also Lynch, Chakravarti and Mitra, 1989, for a discussion of similar effects in a different theoretical domain).

The first of these is a methodological problem; the second and third are both methodological and ethical in nature. Both kinds of issues are best addressed by the development of methods for, first, diagnosing the presence or absence of an elaborated ideal representation and, second, developing questioning procedures that elicit multiple perspectives about a given object or issue. As Fischhoff, et al. (1980) have argued, these procedures both provide more ecologically valid responses and benefit the respondent.


