[ to cite ]: David Sheluga, Jacob Jacoby, and Brenda Major (1978), "Whether to Agree-Disagree or Disagree-Agree: The Effects of Anchor Order on Item Response," in NA - Advances in Consumer Research Volume 05, eds. Kent Hunt, Ann Arbor, MI: Association for Consumer Research, Pages: 109-113.

Advances in Consumer Research Volume 5, 1978      Pages 109-113

WHETHER TO AGREE-DISAGREE OR DISAGREE-AGREE: THE EFFECTS OF ANCHOR ORDER ON ITEM RESPONSE

David Sheluga (student), Purdue University

Jacob Jacoby, Purdue University

Brenda Major (student), Purdue University

[The authors acknowledge with appreciation the contributions of Amy L. Lubitz and Peggy Whan in planning and executing this study, and Angelo DeNisi, Bernie Dugoni, and John Hollenbach for their statistical assistance.]

ABSTRACT -

Research to date has generally focused on item content rather than item format as a source of questionnaire response bias. A laboratory experiment was conducted to assess the effects of different orderings of verbal and numerical anchors on responses. Results indicate that different rating scale formats differentially influence the degree of item endorsement. An "Agree (=1)---Disagree (=7)" format generates greater degrees of item endorsement than any other verbal and numerical anchor format.

INTRODUCTION

A commonly used approach in attitude assessment is to have respondents indicate the extent to which they agree or disagree with one or more attitude statements. Various types of biases may arise when employing such a measurement approach, particularly when a series of items is used and a cumulative score derived. A topic which has received considerable attention is the question of whether such scales are susceptible to response sets or styles, particularly acquiescence (e.g., Cloud and Vaughan, 1970; Elliott, 1961; Jackson, 1967) or "yea-saying" (Couch and Keniston, 1960) response bias.

While debate goes on regarding whether response styles are meaningful or even exist (e.g., Rorer, 1965; Jackson, 1967), those who have attempted to control or adjust for possible agreement response bias have focused primarily on manipulating item stems through item content (e.g., Jackson, 1967; Elliott, 1961; Wrightsman, 1965; Cloud and Vaughan, 1970). One approach, the method of balanced keying (e.g., O'Neill, 1967; Terborg and Peters, 1974), presents even numbers of positively and negatively worded statements. In contrast to item content, the question of whether response format is sufficient to create, affect, or be used to control agreement response bias has received relatively little attention. As Cook and Campbell (1976, pp. 242-243) remarked: "attitude scales are often presented to respondents without apparent thought... to varying whether the positive end of the response scale appears on the right or on the left of the page."

Considerable evidence exists to show that order effects exist for checklist alternatives (e.g., Belson, 1966; Rugg and Cantril, 1942). Thus, a reasonable concern is whether such order effects can cause agree or disagree response bias when present in rating scales. Given a seven-point rating scale, would responses differ if the scale ranged from seven (at the top or on the left) to one (at the bottom or on the right), rather than the more common one-to-seven? Similarly, would there be any differential effects if verbal end-point anchors appeared in positive-to-negative (e.g., "agree" to "disagree") rather than negative-to-positive order?

Given that both verbal and numerical anchors are used, another issue meriting consideration is the appropriate combination of these anchors. To illustrate: is "strongly disagree" more reasonably associated with a "1" or a "7"? In some respects, the issue is analogous to the question of "fittingness" described by Kanungo (1968, 1969).

Accordingly, the present investigation examines the question of whether there are order effects for both verbal and numerical rating scale anchors. Specifically, two studies are described which address the following questions:

1. Do differential, perhaps biasing effects result from having horizontal seven-point graphic rating scales anchored in all four combinations of agree-disagree and 1-to-7?

2. What is the most natural or "fitting" combination of verbal and numerical end-point anchors?

Based upon research cited below, a seven-point scale anchored only at the endpoints was chosen for examination. Considerable literature (e.g., Bendig, 1954; Finn, 1972; Green and Rao, 1970; Guilford, 1954; Lehmann and Hulbert, 1972; Komorita and Graham, 1965; Matell and Jacoby, 1971, 1972; Symonds, 1924) exists to suggest that six- or seven-point scales are optimal for individual-level analysis while three-point scales are adequate for aggregate-level analysis. Studies on verbal anchoring suggest that anchors at the end-points are preferred over anchors at every scale point or no anchors at all (Altemeyer, 1970; Bendig, 1953, 1955; Finn, 1972).

METHOD

Subjects

Subjects were 240 (167 male and 73 female) undergraduates enrolled in Introductory Psychology at Purdue University during the Spring 1976 semester who participated in order to fulfill a class requirement.

Procedure

Measures for the present studies were embedded in a larger set of questionnaires designed to investigate several fundamental issues in questionnaire construction. The combined battery, consisting of eight questionnaires, was administered during one 1-hour session. Each subject responded to all eight questionnaires. With the exception of one questionnaire which always appeared first (described below), the remaining questionnaires were presented in counterbalanced order and were randomly assigned to subjects.

Study 1

An eight-item questionnaire was developed to measure attitudes toward abortion. This topic was chosen based on pre-tests which had indicated that, relative to 12 other topics, abortion was a subject of moderate interest to comparable Purdue undergraduates. Although the eight item stems were identical for all subjects, four different response formats were constructed corresponding to a full crossing of verbal and numerical anchors. The response format for any one subject was the same for all eight items. Each response scale had seven points. The end-points were labeled both numerically and verbally as indicated below, while the five intermediate points carried only the intervening numerical descriptors. Subjects responded by circling the one number along the scale which best represented their degree of agreement with that particular attitude statement.

TABLE

Form 2 of the questionnaire is provided as Appendix A.
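The full crossing of verbal and numerical anchor orders can be sketched as follows. This is a minimal Python sketch; the paper identifies Form 1 as Agree (1)---Disagree (7) and Form 4 as the double reversal, so which single-reversal format was numbered Form 2 versus Form 3 is an assumption here.

```python
from itertools import product

# The four response formats from fully crossing verbal and numerical
# anchor orders. Form 1 is Agree (1)---Disagree (7) and Form 4 reverses
# both anchors; the numbering of the two single reversals is assumed.
verbal_orders = [("Agree", "Disagree"), ("Disagree", "Agree")]
numeric_orders = [(1, 7), (7, 1)]

formats = [
    (f"{verbal[0]} ({left})", f"{verbal[1]} ({right})")
    for verbal, (left, right) in product(verbal_orders, numeric_orders)
]

for i, (left_anchor, right_anchor) in enumerate(formats, start=1):
    print(f"Form {i}: {left_anchor} ... {right_anchor}")
```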

Study 2

This study was directed toward determining whether natural associative tendencies existed between the verbal anchors Agree -- Disagree and the numerical scale end-points of 1 and 7. The design corresponded to a 2 x 2 crossing involving the following four scales:

SCALE

This questionnaire was always the first one that each subject received.

Half the subjects received Form 1a while the other half received Form 1b. In both cases, subjects were instructed to write the words "Agree" ("Disagree") and "Disagree" ("Agree") on the lines at the end of the scale (the wording of the instructions was counterbalanced within and across conditions). Form 1b is provided here as Appendix B. Half the subjects in each of these groups then received Form 2a while the other half received Form 2b. Subjects were instructed to write in scale points from 1 to 7 (or 7 to 1) along the line provided. Again, the wording of the instructions was counterbalanced. Form 2a is provided as Appendix C. Finally, half the subjects engaged in the "assign verbal anchors" task first, while the other half received the "assign numerical anchors" task first.

RESULTS

Study 1

Responses to the eight-item abortion scale were re-coded so that all were aligned in a uniform direction, in terms of numerical and verbal anchors. Responses to the eight items were summed for each subject and a two-way analysis-of-variance for unequal n's was applied. Table 1 presents a summary of these data.

TABLE 1

ANALYSIS OF VARIANCE SUMMARY FOR EFFECT OF VERBAL AND NUMERICAL ANCHOR REVERSALS ON TOTAL SCALE SCORES
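The re-coding step described above can be illustrated with a short Python sketch. The data and reverse-keying pattern here are hypothetical; the general rule is that reversing an item's direction on a seven-point scale maps a response r to 8 - r.

```python
# Align a subject's responses to a uniform direction, then sum them.
# On a 7-point scale, a reverse-keyed item maps r -> 8 - r.

def recode(response: int, reverse: bool) -> int:
    """Align one 7-point response to a uniform direction."""
    return 8 - response if reverse else response

def total_score(responses, reverse_flags):
    """Sum the aligned item responses for one subject."""
    return sum(recode(r, f) for r, f in zip(responses, reverse_flags))

# Hypothetical subject: eight raw responses, items 3 and 6 reverse-keyed.
raw = [2, 3, 6, 1, 4, 7, 2, 5]
flags = [False, False, True, False, False, True, False, False]
print(total_score(raw, flags))  # 2+3+2+1+4+1+2+5 = 20
```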

Reversals of both the verbal and numerical anchors produced significant differences in the mean scores (p < .05 and p < .01, respectively), indicating that anchor order does indeed affect the nature of the response. There was no significant interaction effect. Figure 1 depicts these findings.

FIGURE 1

SUMMARY OF MAIN EFFECTS BY TWO-WAY ANOVA FOR REVERSALS OF VERBAL AND NUMERICAL ANCHORS

The mean of Form 1 (30.6) differed significantly from the means of Forms 2, 3, and 4 (M2 = 34.5; M3 = 33.7; M4 = 35.7) using a Newman-Keuls range test (Winer, 1971, p. 193).

Data collected using the four formats were next examined to determine whether these formats differed in terms of degree of reliability. Despite the small number of items, a test of split-half reliability produced moderately respectable correlations ranging from .46 to .55 for the four formats. Using Fisher's r-to-z transformation, these differences were found to be nonsignificant, suggesting that the different forms were equivalently reliable.
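The Fisher r-to-z comparison can be sketched as follows. The per-format group size of 240 / 4 = 60 is an assumption here, and only the two extreme correlations reported (.46 and .55) are compared.

```python
import math

def fisher_z(r):
    """Fisher r-to-z transformation: z = atanh(r)."""
    return math.atanh(r)

def z_diff(r1, n1, r2, n2):
    """z statistic for the difference between two independent correlations."""
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return (fisher_z(r1) - fisher_z(r2)) / se

# Extreme split-half correlations reported (.55 vs. .46), assuming
# roughly 60 subjects per format (240 subjects over four forms).
z = z_diff(0.55, 60, 0.46, 60)
print(round(z, 2))  # well below the 1.96 needed for p < .05
```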

Next, responses to each of the eight items were analyzed using separate one-way ANOVAs. Table 2 summarizes these results.

TABLE 2

ANALYSIS OF VARIANCE SUMMARY FOR EFFECTS OF VERBAL AND NUMERICAL ANCHOR REVERSALS ON ITEM MEANS

Four of the eight items revealed significant (p < .05) differences in means across the four formats, and a fifth item revealed a marginally significant difference (p = .10). Examination of individual items revealed that these five statements were relatively "tight," short, and to the point. The three non-significant items, by comparison, had much longer stems and were somewhat more ambiguous.

Study 2

The results of this study are summarized in Table 3.

TABLE 3

CHI-SQUARE SUMMARY OF RESPONSES: STUDY 2

Nearly 62% of the subjects felt the most natural response to Form 1 was completing the verbal endpoints in the order of Agree-Disagree (p < .001). One-sample Chi-Square analysis showed subjects did not significantly differ in their responses to Form 1a: associating Disagree-Agree (n = 54) with a 1-to-7 scale was almost as frequent as associating Agree-Disagree (n = 64). However, responses to Form 1b were significantly different (p < .001). That is, responding Agree-Disagree (n = 84) to a 7-to-1 scale was much more prevalent than responding Disagree-Agree (n = 38).
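The one-sample chi-square tests above can be reproduced from the reported counts with a short sketch; the statistic compares the observed frequencies against equal expected frequencies.

```python
def one_sample_chi_square(observed):
    """Goodness-of-fit chi-square against equal expected frequencies."""
    expected = sum(observed) / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)

# Form 1b: 84 Agree-Disagree vs. 38 Disagree-Agree responses.
chi2_1b = one_sample_chi_square([84, 38])  # df = 1; 10.83 is the p = .001 cutoff
# Form 1a: 64 vs. 54 -- close to an even split.
chi2_1a = one_sample_chi_square([64, 54])  # below the 3.84 cutoff for p < .05
print(round(chi2_1b, 2), round(chi2_1a, 2))
```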

Subjects completed Form 2 by filling in scale-point anchors in the order of 1-7 63.3% of the time (p < .001). While responses to Form 2a were not significantly different (n = 68 for 1-to-7, and n = 52 for 7-to-1), responses to Form 2b (Disagree-Agree presentation format) were significant (n = 84 for 1-to-7 vs. n = 36 for 7-to-1; p < .001).

DISCUSSION

The investigation focused on two related issues. Attention was first paid to the effects on subject responses caused by reversals of verbal and numerical endpoint anchors. Second, given that "respondent motivation or willingness to report is a prime condition for successful data collection" (Cannell and Kahn, 1968, p. 537), subject preferences and response tendencies for various scale formats were also assessed. Parenthetically, using subject preference as a criterion may be viewed as reflecting a "consumer (qua respondent) orientation." Within the limitations imposed by the sample, questionnaire, etc., the findings may be summarized as follows:

1. Reversals of either numerical or verbal anchors are sufficient to cause significant differences in response to an Agree-Disagree attitude statement.

2. The Agree-Disagree order resulted in more agree endorsement than did the Disagree-Agree order. Likewise, the 1-to-7 order produced more agree endorsement than did the 7-to-1 order.

3. A rating scale employing an Agree (1) -Disagree (7) order produces significantly more agree endorsement than any other combination of numerical and verbal anchors.

4. Ambiguous or double-barreled items appeared resistant to being affected by scale reversals.

5. Scale reversals exerted no significant effect on the split-half reliability of the total scale score.

6. Subjects seem to perceive a "natural" association between 1 and Agree, and between 7 and Disagree.

7. Directional tendencies for a scale combining verbal and numerical anchors result in the Agree (1)-to-Disagree (7) format being most frequently selected and considered most "natural."

Closer examination of the data reveals the strength of this preference for the Agree (1)-Disagree (7) format. When the numerical anchors were presented in 1-to-7 order, the subjects were evenly divided as to which verbal anchor order was most appropriate. In other words, the 1-7 order appeared so natural, that either verbal anchor order seemed appropriate when paired with it. This is in contrast to the responses obtained when the presentation format was in the 7-to-1 order. In this condition, subjects overwhelmingly preferred the Agree-Disagree verbal anchor order.

Similarly, when verbal anchors were presented as Agree- Disagree, subjects were evenly divided on the appropriate order for the numerical anchors. The two orders (i.e., either 1-7 or 7-1) seemed equally appealing. In contrast, when the presentation order was Disagree-Agree, the overwhelming choice for appropriate numerical anchor counterpart was the 1-7 order. It is noteworthy that these data are based on a dependent measure requiring more behavioral input from subjects (i.e., "fill-in the end-point as appropriate") than would be required by simple preference ratings of the four formats.

Pre-testing had indicated that, compared to twelve other questionnaire topics, the topic of abortion was of intermediate interest for this student population. This is not unlike topics used in actual survey research. Actual surveys often feature topics which may be of high interest to the sponsor, but are perceived to be bland by the respondents. Thus, the use of a moderate interest topic for this study was considered desirable.

Earlier findings (Elliott, 1961) suggest that there will be a high amount of agree endorsement for low interest topics. Given the predominantly male sample used in this particular study, the topic could be judged as having low personal relevance for these subjects. Again, actual surveys are not often greeted with equal degrees of interest by all respondents. Rather, survey topics are usually differentially relevant to the respondents and therefore vary in interest to the sampling audience.

Our findings indicate that Form 1 facilitates agree endorsement when interacting with a moderately interesting topic having low personal relevance for the subject sample. In part, this may be due to the fact that the ordering of Form 1 was the most "natural" response format, as judged by subjects in Study 2. This familiarity, coupled with moderate or low interest, seems to facilitate the generation of an agree endorsement response style. The findings also suggest that this endorsement can be counterbalanced by reversing either the numerical or verbal anchor order. Usage of either Form 2 or 3 is therefore recommended on this basis. However, both verbal and numerical anchors should not be reversed at the same time, since this seems to cause an inordinate amount of disagree endorsement, as in the case of Form 4.

Use of any of the four combinations does not seem to affect item reliability. This reflects Nunnally's (1970, p. 429) view that format changes will not affect the important psychometric (e.g., reliability) properties of test instruments.

The experimental findings pose a small dilemma. Because respondent frustration may cause response error or "uncooperative" behavior, care should be taken to provide a preferred response format, one with which the respondent feels comfortable. Therefore, Form 1 -- Agree (1)-to-Disagree (7) -- appears most advisable to use. However, caution should be exercised when employing this format, because it appears that a preferred or "natural feeling" format enhances agree endorsement. It seems advisable that Form 1 be avoided when a relatively low interest topic is being investigated. In such instances, Forms 2 or 3 appear more appropriate. Additional research is necessary to investigate more directly the interaction between high and low content interest and the effects of scale reversals.

APPENDIX A

ATTITUDES TOWARD ABORTION SURVEY

APPENDIX B

APPENDIX C

REFERENCES

Altemeyer, Robert A. "Adverbs and Intervals: A Study of Likert Scales," American Psychological Association Proceedings, 1970, 5 (1), 397-98.

Belson, William A. "On Methods: The Effects of Reversing the Presentation Order of Verbal Rating Scales," Journal of Advertising Research, 1966, 6 (4), 30-7.

Bendig, A. W. "The Reliability of Self-Ratings as a Function of the Amount of Verbal Anchoring and the Number of Categories on the Scale," Journal of Applied Psychology, 1953, 37, 38-40.

Bendig, A. W. "Reliability and the Number of Rating Scale Categories," Journal of Applied Psychology, 1954, 38, 38-41.

Bendig, A. W. "Rated Reliability and the Heterogeneity of the Scale Anchors," Journal of Applied Psychology, 1955, 39, 37-39.

Cannell, Charles F. and Kahn, Robert L. "Interviewing," in Gardner Lindzey and Elliot Aronson (Eds.) The Handbook of Social Psychology (2nd edition), Vol. 2, Reading, Mass.: Addison-Wesley Publishing Co., 1968.

Cloud, Jonathan and Vaughan, Graham M. "Using Balanced Scales to Control Acquiescence," Sociometry, 1970, (June) 33 (2), 193-202.

Cook, Thomas D. and Campbell, Donald T. "The Design and Conduct of Quasi-Experiments and True Experiments in Field Settings," in Marvin D. Dunnette (Ed.) The Handbook of Industrial and Organizational Psychology, Chicago: Rand McNally, 1976, 223-326.

Couch, Arthur and Keniston, Kenneth. "Yeasayers and Naysayers: Agreeing Response Set As a Personality Variable," Journal of Abnormal and Social Psychology, 1960, 60, 151-174.

Elliott, Lois L. "Effects of Item Construction and Respondent Aptitude on Response Acquiescence," Educational and Psychological Measurement, 1961, 21(2), 405-415.

Finn, Robert H. "Effects of Some Variations in Rating Scale Characteristics on the Means and Reliabilities of Ratings," Educational and Psychological Measurement, 1972, (Summer) 32, 255-265.

Green, Paul E. and Rao, Vithala R. "Rating Scales and Information Recovery -- How Many Scales and Response Alternatives to Use," Journal of Marketing, 1970, 34, 33-39.

Guilford, Joy P. Psychometric Methods. New York: McGraw-Hill, 1954; Chapter 11, Rating Scales.

Jackson, Douglas N. "Acquiescence Response Styles: Problems of Identification and Control," in Irwin August Berg (Ed.) Response Set in Personality Assessment, Chicago: Aldine, 1967.

Kanungo, Rabindra N. "Brand Awareness: Effect of Fittingness, Meaningfulness and Product Utility," Journal of Applied Psychology, 1968, 52(4), 290-295.

Kanungo, Rabindra N. "Brand Awareness: Differential Roles of Fittingness and Meaningfulness of Brand Names," Journal of Applied Psychology, 1969, 53(2), 140-146.

Komorita, Samuel S. and Graham, William K. "Number of Scale Points and the Reliability of Scales," Educational and Psychological Measurement, 1965, 25, 987-95.

Lehmann, Donald R. and Hulbert, James. "Are Three-Point Scales Always Good Enough?" Journal of Marketing Research, 1972, 9, 444-46.

Matell, Michael and Jacoby, Jacob. "Is There an Optimal Number of Alternatives for Likert Scale Items? Study 1: Reliability and Validity," Educational and Psychological Measurement, 1971, 31, 657-74.

Matell, Michael and Jacoby, Jacob. "Is There an Optimal Number of Alternatives for Likert Scale Items? Effects of Testing Time and Scale Properties," Journal of Applied Psychology, 1972, 56(6), 506-509.

Nunnally, Jum C. Introduction to Psychological Measurement, New York: McGraw-Hill, 1970, Chapter 14.

O'Neill, Harry W. "Response Style Influence," Public Opinion Quarterly, 1967, 31, 95-102.

Rorer, Leonard. "The Great Response Style Myth," Psychological Bulletin, 1965, 63, 129-156.

Rugg, Donald and Cantril, Hadley. "The Wording of Questions in Public Opinion Polls," Journal of Abnormal and Social Psychology, 1942, 37 (October), 469-495.

Symonds, Percival M. "On the Loss of Reliability in Ratings Due to Coarseness of the Scale," Journal of Experimental Psychology, 1924, 7, 456-61.

Terborg, James R. and Peters, Larry H. "Some Observations on Wording of Item Stems for Attitude Questionnaires," Psychological Reports, 1974, 35, 463-466.

Winer, Benjamin J. Statistical Principles in Experimental Design, 2nd edition, New York: McGraw-Hill, 1971.

Wrightsman, Lawrence S. "Characteristics of Positive Scored and Negative Scored Items from Attitude Scales," Psychological Reports, 1965, 17, 898.
