Latent Trait Theory and Attitude Scaling: the Use of Information Functions For Item Selection

Wagner Kamakura (student), University of Texas at Austin
Rajendra R. Srivastava, University of Texas at Austin
ABSTRACT - This expository paper demonstrates the usefulness of Latent Trait Theory based procedures for purposes of attitude scaling. In particular, it is shown that different items provide different amounts of "information" (or discerning ability) for varying attitude levels. Consequently, items may be chosen according to their ability to provide information at specific attitude levels. Also, redundancy may be reduced by eliminating items presenting similar information.
To cite:
Wagner Kamakura and Rajendra R. Srivastava (1982), "Latent Trait Theory and Attitude Scaling: the Use of Information Functions For Item Selection", in NA - Advances in Consumer Research Volume 09, eds. Andrew Mitchell, Ann Arbor, MI: Association for Consumer Research, Pages: 251-256.

Advances in Consumer Research Volume 9, 1982      Pages 251-256

INTRODUCTION

Traditional scaling procedures based on reliability measures have received the most attention for attitude scaling in the marketing research literature, as evidenced by the articles in the special issue of the Journal of Marketing Research on measurement (February 1979). These procedures assume a constant standard error of measurement along the attitude continuum, i.e., reliability only indicates the overall efficiency of the scale across all attitude levels. Though the correlation of an item with the scale (for example, of a variable with the factor score) may help in choosing the items that contribute most to the scale, it is hard to tell whether these items contribute discerning ability at the high or at the low end of the attitude scale. In addition, traditional procedures do not provide a measure of the specific contribution of each response category to measurement accuracy. For example, for Likert-type items, does "strongly agree" provide more information than "agree" for a given item, and at which points along the attitude scale? Further, the alpha reliability coefficient leads to paradoxical rules for deciding which items, or how many items, to include in the scale: (a) items highly intercorrelated among themselves should be chosen to increase reliability, and (b) items with low intercorrelation among themselves, yet with high correlation with the major trait being measured, should be included to enhance validity.

Finally, "don't know" (DK) responses have traditionally been handled by substituting the mean value (across respondents) or, on a bipolar scale, the middle category to reflect a difficult decision (Coombs and Coombs, 1977). Some research has also been conducted to determine whether DK responses arise from a response style related to respondents' characteristics (Converse 1977, Francis and Busch 1975, Innes 1977). In general, however, research on the treatment of DK's is very limited: we don't know very much about how to handle "don't know" responses.

In this paper we present approaches based on Latent Trait Theory that allow the researcher to select items on the basis of their discriminating ability along the attitude continuum. Thus, if a researcher is particularly interested in increasing accuracy at particular levels of attitude (for example, in the middle range, to identify "switchable" prospects), s/he may do so by increasing the number of items that provide discriminating ability at those levels. An additional advantage of these approaches is the treatment of DK as a nominal response, which allows the category to take different positions along Likert-type measures, depending on the item. As discussed in the next section, this provides additional information that is normally lost by treating DK's as missing values and potentially misrepresented by treating them as mean or middle categories.

LATENT TRAIT THEORY

In this section we examine the basic Latent Trait Models for dichotomous items and their later extensions to the polychotomous case. These models are based primarily on the stochastic approach to mental measurement introduced by Lord (1952) and Rasch (1960).

Models for Dichotomous Responses

The theory developed by Lord (1952) is a variant of Lazarsfeld's (1954) "Latent Structure Theory" restricted to a single dimension. Individuals are placed along a trait/attitude continuum, and the probability of an individual responding positively to a dichotomous item depends on the position of the item relative to the individual's position on this same dimension.

Rasch (1960) used a similar approach, but modeled the probability of a positive response as a logistic function of the individual's trait/attitude and the item characteristics, rather than the Normal Ogive function used by Lord. Because of its simplicity and computational advantages, the Logistic Model has received greater attention in the literature than the Normal Ogive Model. An important extension of Rasch's Logistic Model is the 2-Parameter Logistic Model derived by Birnbaum (1968), which allows items to differ not only in their position on the trait/attitude continuum, but also in their power to discriminate at different levels of that continuum.

In his 2-Parameter Logistic Binary Response (BR) Model Birnbaum defines the probability of a given individual j answering an item i positively as a logistic function of the individual's trait or attitude, and the item characteristics, such that,

$$P_{ij} = \frac{1}{1 + e^{-a_i(\theta_j - b_i)}} \qquad (1)$$

where,

Pij = probability of individual j answering item i with a positive response

θj = attitude level for individual j

bi = position parameter for item i

ai = discrimination parameter for item i

Following Rasch's original conceptualization, this model also places items i and individuals j on the same attitude continuum. The position parameter bi for item i is defined as the point on the attitude continuum at which there is a 50% chance of a positive response. For an individual whose attitude θj equals the position parameter bi, the exponent in Eq. 1 becomes zero, and the probability of individual j answering item i positively is Pij = 1/2.

The discrimination parameter ai for a given item i is proportional to the maximum slope of the logistic function (also called the Item Characteristic Curve, ICC) defined in Eq. 1; for the form of Eq. 1, the steepest slope equals ai/4. The maximum slope of a logistic curve occurs at an attitude level θj equal to the item position bi, i.e., at the midpoint, as shown in Figure 1. The steeper the slope (the higher the value of the discrimination parameter ai) of an item, the better it discriminates among respondents with attitude levels θj in the vicinity of the position parameter bi, because a small variation in attitude produces a large variation in the probability of a positive response. The reader will note that when ai = 0, the item characteristic curve is a horizontal straight line at Pij = 0.50, i.e., a 50-50 chance of a positive response, irrespective of the attitude θj of the respondent.
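A minimal Python sketch of Eq. 1 follows. It assumes the logistic form exactly as written in Eq. 1, without the scaling constant D = 1.7 that some presentations of Birnbaum's model include. The function names `icc_2pl` and `icc_slope` and the parameter values are illustrative and do not come from the paper. The usage lines check three properties described above: Pij = 1/2 at θj = bi, the steepest slope (ai/4) at bi, and a flat curve when ai = 0.

```python
import math

def icc_2pl(theta: float, a: float, b: float) -> float:
    """Probability of a positive response under the 2-parameter logistic model (Eq. 1)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def icc_slope(theta: float, a: float, b: float) -> float:
    """Analytic slope dP/dtheta of the logistic ICC, equal to a * P * (1 - P)."""
    p = icc_2pl(theta, a, b)
    return a * p * (1.0 - p)

# Hypothetical item: position b = 0.5, discrimination a = 2.0
a, b = 2.0, 0.5
print(icc_2pl(b, a, b))      # 0.5 at theta = b
print(icc_slope(b, a, b))    # 0.5, i.e. a / 4, the steepest slope
print(icc_2pl(1.5, 0.0, b))  # 0.5 for any theta when a = 0
```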

The item parameters are easier to understand with reference to Figure 1, where items 1 and 3 are positioned in the lower and higher ranges of attitude, respectively, and item 2 is positioned in the middle range. Only individuals with high attitude levels θj will agree with item 3, because an attitude higher than b3 is needed for the probability of a positive response to exceed 50%. On the other hand, only low-attitude persons will disagree with item 1. The reader can also see that item 3 has the highest discriminatory power: a slight variation of attitude θj around the item position b3 changes the probability from almost zero to nearly one.

FIGURE 1

ITEM CHARACTERISTIC CURVES (I.C.C.): 2-PARAMETER LOGISTIC MODEL (BINARY)

An important contribution by Birnbaum, besides his extension of the Rasch model, is the concept of the Information Function (IF), which provides an indication of measurement accuracy at each level of attitude θj for each item in a scale. Birnbaum defines the information transmitted by an item at a given attitude level θ0 as inversely proportional to the square of the length of the asymptotic confidence interval for the estimate of θ0, or directly proportional to the square of the slope of the ICC at attitude level θ0. For the BR Model, Birnbaum demonstrated that the information transmitted by an item i at a given attitude level θj can be calculated by

$$I_i(\theta_j) = \frac{[P'_{ij}]^2}{P_{ij}(1 - P_{ij})} \qquad (2)$$

where P'ij is the slope of the logistic curve at attitude level θj.

Hence, the concept of information is directly related to the discrimination parameter ai, and for the 2-parameter model, a more discriminating item will provide more information, with its maximum at the item position bi. In Figure 2, the information functions of the same items in Figure 1 are plotted, and one can easily see that for the more discriminating item (Item 3) the information function reaches higher levels, and the peak occurs at the position parameter b3.
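For the logistic ICC of Eq. 1, the slope has a closed form, so Eq. 2 simplifies. The short derivation below assumes the form of Eq. 1 without a scaling constant. It shows why each item's information peaks at bi, with a height that grows with the square of ai:

```latex
\begin{aligned}
P'_{ij} &= \frac{\partial P_{ij}}{\partial \theta_j} = a_i\,P_{ij}(1 - P_{ij}) \\
I_i(\theta_j) &= \frac{[a_i P_{ij}(1 - P_{ij})]^2}{P_{ij}(1 - P_{ij})} = a_i^2\,P_{ij}(1 - P_{ij}) \\
\max_{\theta_j} I_i(\theta_j) &= I_i(b_i) = \frac{a_i^2}{4},
  \quad \text{since } P_{ij}(1 - P_{ij}) \le \tfrac{1}{4} \text{ with equality at } P_{ij} = \tfrac{1}{2}
\end{aligned}
```

Doubling ai therefore quadruples the peak information, and it also makes the information function narrower around bi.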

FIGURE 2

INFORMATION FUNCTIONS: 2-PARAMETER LOGISTIC MODEL

An important characteristic of the IF, as defined by Birnbaum, is its additive property: the information transmitted by different items at a given attitude level can be added to obtain the total information transmitted by the scale at that attitude level. Hence it is possible to evaluate both the contribution of each item and the total information transmitted by the scale. Moreover, the IF indicates the maximum accuracy attainable by each item at each level of attitude, in contrast to traditional reliability measures, which are calculated over the whole range. Thus, adding items with the same characteristics (item position bi, item discrimination ai) will improve accuracy only in the attitude range already covered by those items, with no improvement at other levels of attitude.

Therefore, the concept of the information function resolves the reliability dilemma mentioned earlier in this paper. It provides an objective means of selecting items for a scale, according to the researcher's objective. If a politician were interested in identifying voters with uncertain political attitudes, who might be susceptible to attitudinal changes, the scale should concentrate on items positioned in the middle range.
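The following Python sketch shows one way to turn this idea into a selection rule. It uses a hypothetical item bank, and the function name `select_items` is illustrative. Each item is scored by its 2PL information summed over a set of target attitude levels, and the k highest-scoring items are kept. Because information is additive and items do not interact, this top-k ranking also maximizes the scale's total information at those targets.

```python
import math

def item_information(theta, a, b):
    """2PL item information a^2 * P * (1 - P)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def select_items(bank, target_thetas, k):
    """Return the k items with the largest information summed over the target attitude levels."""
    def score(name):
        a, b = bank[name]
        return sum(item_information(t, a, b) for t in target_thetas)
    return sorted(bank, key=score, reverse=True)[:k]

# Hypothetical item bank: name -> (a, b)
bank = {"low": (1.2, -1.5), "mid1": (1.0, 0.1), "mid2": (1.4, -0.2), "high": (1.8, 1.6)}
middle = [-0.5, 0.0, 0.5]  # "switchable" middle range
print(select_items(bank, middle, 2))  # ['mid2', 'mid1']
```

Changing `middle` to a high range such as `[1.6]` would favor the item positioned there instead.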

Models for Polychotomous Responses

Samejima's (1969) Ordered (Graded) Response (OR) Model provides a polychotomous extension for the case where there are two or more ordered categories. For example, for an item scored on a scale of 1 to 3, two item response curves can be used to describe, in a stage-wise manner, the probability (conditional on attitude level θj) of responding in any particular category. In the first stage, functions are obtained to represent a response in the first category versus a higher category (1 vs. 2 or 3), and a response in the first or second category versus the third (1 or 2 vs. 3). These response functions are represented by the curves in Figure 3. The second stage simply subtracts successive response functions from each other to obtain the response probability for each category. For the extreme categories, the probability of the lowest category is 1.0 minus the first response function, and the probability of the highest category is the last response function minus 0.0. Thus, in Figure 3, for an individual with attitude level θj, the probabilities of responding with categories 1, 2 and 3 are 0.17, 0.63 and 0.20, respectively. The response functions in Figure 3 are easily modeled as:

$$F_{ijp} = \frac{1}{1 + e^{-a_i(\theta_j - b_{ip})}}$$

where Fijp is the probability that person j, with attitude θj, responds to item i with category p or better. As in the binary model, ai may be interpreted as the discriminating power of item i, and bip as the position of the pth category of item i on the attitude continuum.
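The two-stage computation can be sketched in Python as follows. The helper names and the item parameters are hypothetical. `cumulative_curves` evaluates the "category p or better" curves for increasing thresholds, and `category_probs` performs the subtraction step with F = 1.0 below the first category and F = 0.0 above the last. In the second usage line, the cumulative values 0.83 and 0.20 are backed out from the probabilities quoted for Figure 3, not read from the figure itself.

```python
import math

def cumulative_curves(theta, a, thresholds):
    """F_p for p = 2..m: probability of responding in category p or better."""
    return [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]

def category_probs(cum):
    """Second stage: P(category p) = F_p - F_{p+1}, with F_1 = 1.0 and F_{m+1} = 0.0."""
    padded = [1.0] + list(cum) + [0.0]
    return [padded[p] - padded[p + 1] for p in range(len(padded) - 1)]

# Hypothetical 3-category item: a = 1.2, thresholds b_i2 = -0.8 and b_i3 = 0.9
probs = category_probs(cumulative_curves(0.0, 1.2, [-0.8, 0.9]))
print([round(p, 3) for p in probs], round(sum(probs), 6))

# F_2 = 0.83 and F_3 = 0.20 reproduce the Figure 3 probabilities
print([round(p, 2) for p in category_probs([0.83, 0.20])])  # [0.17, 0.63, 0.2]
```

The thresholds must be increasing. Otherwise a middle category would receive a negative probability.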

Samejima defined the information function in a manner equivalent to Birnbaum's formulation. The OR model provides one information function for each response category of the item. Hence the contribution of each response category of an item to measurement accuracy, at each level of attitude, can be assessed. The category IF's can be summed across the categories of an item (weighted by the category response probabilities) to provide a measure of the item's information value. Moreover, Samejima's OR Model makes no metric assumptions about the response categories: the interval between response categories is not fixed "a priori" and may even vary across items. Even the rank-order assumption about response categories is relaxed in the Nominal Response (NR) Model developed by Bock (1972). Bock developed his model as a polychotomous logit model, which explains the choice of one response category for a given item as a function of item-category parameters and the individual's attitude level. The model formulation, though somewhat more complex, is very similar to the binary model, because a 3-category rating scale may be represented by 2 binary rating scales or, in general, an n-category rating scale may be represented by (n-1) binary scales. Bock's model provides ICC's and IF's for each nominal response category that can be interpreted in a manner similar to the binary model. For example, the ICC for each category (for a given item) represents the probability that a respondent with a given attitude will respond with that category.
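Bock's nominal model is usually written as a multinomial logit, P(k | θ) = exp(akθ + ck) / Σh exp(ahθ + ch), with a slope ak and an intercept ck for each category. The Python sketch below implements that formula. The function name is hypothetical, and Bock's identification constraints (slopes and intercepts summing to zero within an item) are omitted, because adding a constant to all slopes or all intercepts leaves the probabilities unchanged. The usage lines check the link to the binary model: with two categories, slopes (0, a) and intercepts (0, -ab), the positive-category curve coincides with Eq. 1.

```python
import math

def nominal_category_probs(theta, slopes, intercepts):
    """Bock's nominal response model: a multinomial logit in the attitude theta.

    P(k | theta) = exp(a_k * theta + c_k) / sum_h exp(a_h * theta + c_h)
    """
    z = [a * theta + c for a, c in zip(slopes, intercepts)]
    m = max(z)  # subtract the maximum to avoid overflow in exp
    weights = [math.exp(v - m) for v in z]
    total = sum(weights)
    return [w / total for w in weights]

# Two categories with slopes (0, a) and intercepts (0, -a*b) reduce to the 2PL of Eq. 1
a, b, theta = 1.4, 0.3, 1.0
p_positive = nominal_category_probs(theta, [0.0, a], [0.0, -a * b])[1]
p_2pl = 1.0 / (1.0 + math.exp(-a * (theta - b)))
print(abs(p_positive - p_2pl) < 1e-12)  # True
```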

FIGURE 3

EXAMPLE OF GRADED RESPONSE MODEL WITH THREE CATEGORIES

The main advantage of the NR Model over the OR Model is that, because no assumption is made about the order of the response categories, their relative ordering is determined by the data themselves and may vary across items. Consequently, the NR Model is useful when there is no "logical" or intuitive order of the response categories, as is the case with "Don't Know," "No Opinion" and "No Answer" responses.
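In the usual form of Bock's model, the odds of category k over category h are exp((ak - ah)θ + (ck - ch)). The category with the smallest slope therefore dominates at low attitude levels, and the one with the largest slope dominates at high levels. The Python sketch below, with a hypothetical helper and made-up slope estimates, reads off the data-determined ordering this way. An intermediate slope places a category in the middle, but whether that category is ever the single most likely response also depends on the intercepts.

```python
def implied_order(categories, slopes):
    """Order response categories along the attitude continuum by their nominal-model slopes.

    The odds of category k over h are exp((a_k - a_h) * theta + (c_k - c_h)), so the
    smallest slope dominates at low theta and the largest slope at high theta.
    """
    return [cat for _, cat in sorted(zip(slopes, categories))]

# Hypothetical slope estimates for one item
order = implied_order(["Disagree", "DK", "Agree"], [-0.4, -1.1, 1.5])
print(order)  # ['DK', 'Disagree', 'Agree']
print("DK is a middle category:", order.index("DK") == 1)
```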

METHODOLOGY AND ANALYSIS

The data used to illustrate the use of Latent Trait Theory for attitude scaling are the "anomia" scale (Srole, 1956), drawn from the National Opinion Research Center (NORC) survey for 1973. Anomia is viewed as an individual's generalized, pervasive sense of social malintegration or "self-to-others alienation." The scale is unidimensional and consists of the 9 items listed in Table 1, each with three response categories (agree, disagree, don't know). Four hundred cases were selected at random from the total sample of approximately 1200. A larger sample was not necessary for computational accuracy and would merely have inflated computing time and costs. The computational algorithm used was the LOGOG program (Kolakowski and Bock, 1973). The analyses and results are presented to illustrate (1) item selection based on information functions and (2) the treatment of don't know responses.

First, the data are analyzed by means of the binary model, which treats DK's as missing values. The derived parameters are used to develop ICC's and IF's. These curves are used to illustrate that (1) item positions vary along the attitude continuum, i.e., items provide information at different attitude levels, (2) items may be duplicative or redundant, i.e., provide the same information, and (3) items with lower slopes provide less information.

TABLE 1

ANOMIA SCALE

1. Next to health, money is the most important thing in life

2. Sometimes you can't help wondering whether anything is worthwhile anymore

3. To make money there are no right and wrong ways anymore, only easy and hard ways

4. Nowadays a person has to live pretty much for today and let tomorrow take care of itself

5. In spite of what some people say, the lot (situation and condition) of the average man is getting worse, not better

6. It's hardly fair to bring a child into the world with the way things look for the future

7. Most public officials are not really interested in the problems of the average man

8. These days a person doesn't really know whom he can count on

9. Most people don't really care about what happens to the next fellow

Source: Srole, L. (1956), "Social Integration and Certain Corollaries," American Sociological Review, 21, 709-16.

The results of the binary model are also compared with Alpha Factor Analysis (based on Cronbach's alpha) to illustrate the similarities and differences between Latent Trait Approaches and Traditional scaling techniques.

Second, polychotomous models are estimated treating DK's first as middle values and then as nominal responses. The latter case allows DK's to "float", i.e., to take a high, middle or low categorical position. The effect of the treatment of DK's on information is shown in two illustrative cases where it would be appropriate and inappropriate, respectively, to treat DK's as middle or mean values.

RESULTS

Selection of Items Using Information Functions

The first two columns of Table 2 present goodness-of-fit statistics (Chi-Square and significance level z) for Birnbaum's 2-Parameter Logistic Model, applied to the 9-item Anomia scale with Don't Knows keyed as missing values. The estimated item position parameters bi and discrimination parameters ai are shown in Table 2, and the corresponding ICC's and IF's are presented in Figures 4 and 5. From the ICC's in Figure 4 and the position parameters bi in columns 3 and 4 of Table 2, one can see that the 9 items concentrate on the range of attitudes between b8 = -1.1 and b3 = 1.6. Therefore, this scale will provide its best measurement accuracy in this range of attitudes (since each item provides its maximum information near its position bi). Comparing ICC's shows that items 5 and 7 are somewhat redundant (b5 = -.588, b7 = -.562; a5 = 1.132, a7 = 1.032), providing most of their information at the same levels of attitude. The IF's plotted in Figure 5 confirm this redundancy, showing items 5 and 7 with the same shape, peaking at the same attitude level. Items 2 and 4 also have similar ICC's and IF's.
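The redundancy of items 5 and 7 can be checked numerically from the estimates quoted above. The Python sketch below assumes those estimates follow the form of Eq. 1 without a scaling constant, so the absolute peak heights depend on that assumption. It evaluates both information functions on a grid, reports where each peaks, and computes a simple redundancy index: the largest gap between the two curves relative to the taller peak. This index is a hypothetical illustration, not a criterion used in the paper.

```python
import math

def item_information(theta, a, b):
    """2PL item information a^2 * P * (1 - P), using the form of Eq. 1."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

# (a_i, b_i) estimates for items 5 and 7 as reported in the text
items = {5: (1.132, -0.588), 7: (1.032, -0.562)}
grid = [x / 100.0 for x in range(-300, 301)]  # attitude levels -3.00 ... 3.00
curves = {i: [item_information(t, a, b) for t in grid] for i, (a, b) in items.items()}

for i, (a, b) in items.items():
    peak_theta = grid[max(range(len(grid)), key=lambda k: curves[i][k])]
    print(f"item {i}: IF peaks near theta = {peak_theta:+.2f} with height {a * a / 4:.3f}")

# Hypothetical redundancy index: largest gap between the curves, relative to the taller peak
gap_ratio = max(abs(u - v) for u, v in zip(curves[5], curves[7])) / max(curves[5])
print(f"relative gap = {gap_ratio:.2f}")
```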

TABLE 2

ITEM PARAMETERS: 2-PARAMETER LOGISTIC MODEL  -  (DON'T KNOWS AS MISSING VALUES)

FIGURE 4

I.C.C. FOR ANOMIA ITEMS: BINARY MODEL

FIGURE 5

INFORMATION FUNCTIONS FOR ANOMIA ITEMS: BINARY MODEL

Figure 5 also shows a clear distinction between items 1 through 4, which provide low information, and items 5 to 9, whose IF's peak at higher values (and which have higher ai values). This difference between the two sets of items confirms the results of an Alpha Factor Analysis performed on the same data (treating DK as missing values). The Alpha Factor Analysis yielded only one factor (eigenvalue = 3.14) under the elbow rule. The factor loadings for items 1 through 9 were, respectively, -0.004, 0.028, 0.074, 0.073, 0.330, 0.560, 0.418, 0.422 and 0.495. Items 5 through 9, which have higher factor loadings, also have higher peaks in their corresponding information functions (Figure 5). Either procedure (Alpha Factor or Latent Trait) would thus appear to select the same items for the scale. However, the IF's provided by the Latent Trait procedure also indicate the attitude levels at which the items are most informative. As shown in Figure 5, item 8 provides its highest information at low attitude levels, while item 6 is more informative at high levels. In addition, the IF allows the researcher to identify redundant items (5 and 7; 2 and 4), which would not be detected by reliability measures. If the objective were to reduce redundancy, it would seem useful to retain item 5 (item 7 has a lower slope and IF peak). However, this would also lower measurement accuracy around attitude level θj = -0.57.
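The accuracy cost of dropping item 7 can be sketched by using the asymptotic standard error of the attitude estimate, 1/√I. The Python sketch below makes two simplifying assumptions. It uses the item 5 and item 7 estimates quoted in the text under the form of Eq. 1, and it leaves the other seven items out of the calculation. The result is therefore only the change contributed by this pair at θ = -0.57, not the full-scale standard error.

```python
import math

def item_information(theta, a, b):
    """2PL item information a^2 * P * (1 - P)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

theta0 = -0.57
info5 = item_information(theta0, 1.132, -0.588)  # item 5 estimates from the text
info7 = item_information(theta0, 1.032, -0.562)  # item 7 estimates from the text

# Asymptotic standard error of the attitude estimate is 1 / sqrt(information)
se_both = 1.0 / math.sqrt(info5 + info7)
se_item5 = 1.0 / math.sqrt(info5)
print(f"items 5 and 7: I = {info5 + info7:.3f}, SE = {se_both:.2f}")
print(f"item 5 only:   I = {info5:.3f}, SE = {se_item5:.2f}")
```

Near their common peak, the two items contribute roughly 0.59 units of information together and about 0.32 without item 7. This quantifies the loss of accuracy described above.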

Treatment of Don't Know Response Categories

The OR Model, with DK placed between "Disagree" and "Agree," was applied to the anomia scale. To avoid this restrictive assumption, the NR model was also applied to the same data. Of the 9 items in the scale, only items 3 and 9 placed DK's at the lower extreme (i.e., in the ordering DK, Disagree, Agree). To demonstrate the effect of treating DK as a middle category, we present the ICC's derived from the NR model for items 7 and 9 in Figure 6. As anomia increases, the probability of agreeing with items 7 and 9 increases. However, as anomia increases, the probability of disagreeing decreases monotonically for item 7, while for item 9 it first increases and then decreases. Finally, for item 9 the probability of answering don't know increases as anomia decreases, while for item 7 it has a maximum at an intermediate level. This indicates that DK is "ordered" as a middle category for item 7 and as a lower extreme category for item 9.

Given the "orderings" based on the NR model, we would expect the OR model to fare as well as the NR model for item 7, but not quite as well for item 9. In the latter case (item 9), forcing DK to be a middle category would lead to a loss of information. This is clearly illustrated by the IF's for items 7 and 9 under the BR, OR and NR models in Figure 7. For item 7, including DK as a middle category increases the information transmitted by the item compared to the binary model, which treats DK as missing data. Relaxing the rank-order assumption in the NR model does not improve the information compared to the OR model, which might be taken as an indication that DK is indeed a middle category. For item 9, including DK as a middle category also results in a gain in the information transmitted by the item. However, when the NR Model is used (resulting in a low extreme position for DK's, as mentioned before), even more information is gained, not only at the peak but also at low attitude levels, where the probability of DK responses increases. This indicates that useful information (or discerning ability) is lost at the low end of the attitude scale when an extreme (low) DK response is treated as missing data or as the middle category. Of course, it would be hard to define DK at the low extreme a priori. It should be noted that the NR model simply positions the DK category to fit the data, so the resulting position should not be taken as the "truth." However, if the ICC for DK covers a narrow range, it is likely that its meaning is consistent across the respondents sampled. If the curve is spread out, DK means different things to different people.
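The information gain from a low-extreme DK category can be illustrated with the Fisher information of the nominal model. Under the usual multinomial-logit form, this information at θ is the variance of the category slopes ak under the category probabilities Pk(θ). The Python sketch below uses made-up parameters, not the paper's estimates, and hypothetical function names. It compares a two-category version of an item (DK dropped) with a three-category version whose DK category has the lowest slope. With slopes (0, a), the two-category information reduces to the 2PL value a²P(1 - P).

```python
import math

def nominal_probs(theta, slopes, intercepts):
    """Category probabilities of the nominal (multinomial logit) model."""
    z = [a * theta + c for a, c in zip(slopes, intercepts)]
    m = max(z)
    w = [math.exp(v - m) for v in z]
    s = sum(w)
    return [x / s for x in w]

def nominal_information(theta, slopes, intercepts):
    """Item information: the variance of the category slopes under P_k(theta)."""
    p = nominal_probs(theta, slopes, intercepts)
    mean_a = sum(pk * ak for pk, ak in zip(p, slopes))
    return sum(pk * (ak - mean_a) ** 2 for pk, ak in zip(p, slopes))

# Hypothetical parameters (not the paper's estimates)
binary = ([0.0, 1.5], [0.0, 0.0])               # Disagree, Agree (DK dropped)
with_dk = ([0.0, 1.5, -1.5], [0.0, 0.0, -1.0])  # Disagree, Agree, DK at the low extreme

for theta in (-2.0, 0.0, 2.0):
    print(f"theta={theta:+.1f}  binary={nominal_information(theta, *binary):.3f}  "
          f"with DK={nominal_information(theta, *with_dk):.3f}")
```

With these illustrative values, the three-category version carries more information throughout, and the largest relative gain occurs at the low end of the attitude scale, where DK responses are most likely.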

FIGURE 6

I.C.C. FOR ITEMS 7 AND 9: ORDERED RESPONSE MODEL  -  (D.K. AT MIDDLE)

CONCLUSIONS

This expository paper illustrates the use of Latent Trait Theory based procedures for attitude scaling. In particular, item characteristic curves and information functions can be useful for item selection in scale construction. Latent Trait Theory models may be more useful than traditional scaling techniques because they provide not only measures of item information value, but also measures of the attitude levels at which items are likely to have the greatest discriminating ability. These measures may be used to delete redundant or duplicative items and/or to deliberately increase the accuracy of the scale at desired attitude levels.

FIGURE 7

COMPARISON OF INFORMATION FUNCTIONS - ITEMS 7 AND 9

Additionally, responses to multiple category rating scales may be analyzed by the Nominal Response Model which provides measures of contributions made by each response category of each item at each attitude level, rather than a general measure of the relationship between individual items and the scale. The Nominal Response Model does not require any metric assumptions about the data and "don't know," "no opinion" and "no answer" categories can be scaled and used as sources of information for attitude measurement. As shown (for items 7 and 9), "don't know" responses may contribute to measurement accuracy and their contribution may occur at different ranges of attitude, depending on the item.

Finally, Latent Trait Theory based procedures have other advantages not discussed in this paper. The calibration procedures are independent of the specific items used (item-free attitude scaling) as well as of the sample (sample-free scale calibration), as discussed by Wright (1968). Also, once the parameters have been determined for each item-category combination, it is feasible to develop "tailored" data collection procedures for use on other samples. For example, if a respondent disagrees with an item that has a low position parameter (bi) on the attitude continuum, there is little value in administering items with higher position parameters. This feature should become increasingly important with the advent of interactive, computerized data collection procedures. It is hoped that this paper will provide the impetus for increased use of Latent Trait Theory based attitude scaling procedures, which furnish more objective criteria for scale construction.
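A tailored procedure of this kind is commonly built on a maximum-information rule. At each step, the rule administers the unused item that is most informative at the respondent's current attitude estimate. The Python sketch below shows only that selection step. It uses a hypothetical calibrated bank and a hypothetical helper name, and it assumes the provisional estimate `theta_hat` is supplied from elsewhere, because updating the estimate after each answer is not shown.

```python
import math

def item_information(theta, a, b):
    """2PL item information a^2 * P * (1 - P)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def next_item(bank, theta_hat, administered):
    """Maximum-information rule: choose the unused item most informative at theta_hat."""
    candidates = [name for name in bank if name not in administered]
    return max(candidates, key=lambda name: item_information(theta_hat, *bank[name]))

# Hypothetical calibrated bank: name -> (a, b)
bank = {"q1": (1.0, -1.5), "q2": (1.3, -0.5), "q3": (1.1, 0.4), "q4": (1.6, 1.5)}

# A respondent who disagreed with low-position item q1 is provisionally placed low,
# so the rule offers another low-position item rather than the high-position q4.
print(next_item(bank, theta_hat=-1.2, administered={"q1"}))  # q2
```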

REFERENCES

Birnbaum, A. (1968), "Some Latent Trait Models and Their Use in Inferring An Examinee's Ability," in F. M. Lord and M. R. Novick (eds.), Statistical Theories of Mental Test Scores, (Reading, Mass.: Addison-Wesley).

Bock, R. D. (1972), "Estimating Item Parameters and Latent Ability When Responses Are Scored in Two or More Nominal Categories," Psychometrika, 37, 29-51.

Converse, J. (Winter, 1977), "Predicting No Opinion on the Polls," Public Opinion Quarterly, 40, 515-30.

Coombs, C. H. and Coombs, L. (Winter 1977), "'Don't Know': Item Ambiguity or Respondent Uncertainty?" Public Opinion Quarterly, 40, 457-514.

Francis, J. and Busch, L. (Summer, 1975), "What We Know About 'I Don't Knows'," Public Opinion Quarterly, 39, 207-18.

Innes, J. M. (1977), "Extremity and 'Don't Know' Sets in Questionnaire Responses," British Journal of Social and Clinical Psychology, 16, 9-12.

Kolakowski, D. and Bock, R. D. (1973), Maximum Likelihood Item Analysis and Test Scoring: Logistic Model for Multiple Item Responses, (Ann Arbor, Michigan: National Educational Resources Inc.).

Lazarsfeld, P. F. (1954), "A Conceptual Introduction to Latent Structure Analysis," in Mathematical Thinking in the Social Sciences, (Glencoe, Illinois: Free Press).

Lord, F. M. (1952), "A Theory of Test Scores," Psychometric Monograph No. 7, Psychometric Society.

Rasch, G. (1960), Probabilistic Models for Some Intelligence and Attainment Tests, (Copenhagen: Danish Institute for Educational Research).

Samejima, F. (1969), "Estimation of Latent Ability Using a Response Pattern of Graded Scores," Psychometrika Monograph Supplement, No. 17.

Srole, L. (1956), "Social Integration and Certain Corollaries," American Sociological Review, 21, 709-16.

Wright, B. D. (1968), "Sample-Free Test Calibration and Person Measurement," Proceedings of the 1967 Invitational Conference on Testing Problems, (Princeton, N.J.: Educational Testing Service), 85-101.
