Modeling Trait Response Error in the Context of a Multitrait-Multimethod Matrix: Scaling Models For Discrete Items

William R. Dillon, University of Massachusetts/Amherst
Thomas J. Madden, University of Massachusetts/Amherst
ABSTRACT - The authors propose and demonstrate a sequential testing strategy applicable for the investigation of the scalability of a set of traits having discrete components. The modeling of response errors is undertaken with the use of latent structure analysis. The extended approach proposed here provides the researcher with the ability to: i) use a formal test statistic to select a response error model; ii) assess the effects due to traits versus methods; and iii) examine and test a wide array of plausible measurement error hypotheses.
[ to cite ]:
William R. Dillon and Thomas J. Madden (1983), "Modeling Trait Response Error in the Context of a Multitrait-Multimethod Matrix: Scaling Models For Discrete Items", in NA - Advances in Consumer Research Volume 10, eds. Richard P. Bagozzi and Alice M. Tybout, Ann Arbor, MI: Association for Consumer Research, Pages: 115-120.

Advances in Consumer Research Volume 10, 1983      Pages 115-120


[The paper presented here is an abridged version of a much longer manuscript originally submitted for publication in this proceedings. Copies of the unabridged version can be obtained by contacting the principal author.]


INTRODUCTION

Discrete indicators having nominal or at best ordinal properties are ubiquitous in behavioral and social research and are frequently encountered with panels and survey research in general. For example, in the 1978 General Social Survey (Davis 1978), which is designed to measure an impressive array of inherently unobservable attitudes and values, 107 of the items are dichotomous and 146 of the items are polytomous, which means that nearly 82% of the 310 recorded variables are discrete in nature. However, to the knowledge of these authors, there is no up-to-date treatment of scaling models appropriate for dichotomous items appearing in the consumer behavior or marketing research literature. Thus, one of the purposes of this paper is to present a discussion of recent advances in modeling errors in measurement and to review the use of scaling models on traits having discrete components.

Our second purpose is to demonstrate how the extension of latent class models to the across-group problem recently proposed by Clogg (1981) can be used to investigate various measurement error models in the context of a traditional multitrait-multimethod matrix. The extended approach proposed here affords the researcher, much in the spirit of Joreskog's general analysis of covariance structures (1971, 1974), the ability to: i) use a formal test statistic to select a response error model that sufficiently represents the scale structure of the data; ii) assess the effects due to traits versus methods; and iii) examine and test a wide array of plausible measurement error hypotheses.

ALTERNATIVE MODELS FOR ASSESSING SCALABILITY

Deterministic Methods: Guttman Scale and Scalogram Analysis

[Much of this section is based on the work of Clogg and Sawyer (1981), who present a comprehensive and critical review of the Guttman scale method.]

The Guttman scale method assumes that the attitude on an issue is unidimensional and that the traits used as reflective indicators of the latent attitude in question are cumulative.

In general, for k items there are k! different orderings; thus, with k=4 items there are k!=24 possible orderings. The Guttman scalogram method posits that only one of these orderings represents the natural ordering of the items. Let "1" refer to the yes response and "2" refer to the no response. Four items would be called perfectly Guttman scalable if the five response patterns

(1,1,1,1)(1,1,1,2)(1,1,2,2)(1,2,2,2)(2,2,2,2)    (1)

contain all of the respondents surveyed, which means that the remaining 16-5=11 response patterns are empty. The response patterns shown in (1) are called scale types and can be numbered 1 through 5 for the purpose of identification. In general, with k items there are k+1 scale types, which can be numbered consecutively 1, 2, ..., (k+1).
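The counting argument above can be checked with a short sketch. The function names are illustrative only, and the item ordering is assumed to be the natural one, so that each scale type is a run of yes responses followed by a run of no responses.

```python
from itertools import product

def guttman_scale_types(k):
    """Return the k+1 scale types for k ordered dichotomous items.

    Coding follows the text: 1 = yes, 2 = no. Each scale type is a run of
    1s followed by a run of 2s.
    """
    return [tuple([1] * (k - t) + [2] * t) for t in range(k + 1)]

def non_scale_patterns(k):
    """Return the response patterns a perfect Guttman scale leaves empty."""
    scale = set(guttman_scale_types(k))
    return [p for p in product((1, 2), repeat=k) if p not in scale]

types = guttman_scale_types(4)
errors = non_scale_patterns(4)
print(types)        # the five patterns listed in (1)
print(len(errors))  # 16 - 5 = 11
```

With k=4 this reproduces the five scale types in (1) and the 11 patterns that must be empty under perfect scalability.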

Probabilistic Extensions

All of the scaling models discussed here can be expressed in terms of the general latent structure model (Lazarsfeld and Henry 1968) and can be fitted to data with the use of the flexible computer program MLLSA (Clogg 1977). Before proceeding to discuss the various scaling models we present some necessary background which introduces latent structure analysis in terms of the Guttman pure scale model.

Assume we have data collected on three dichotomous traits A, B, and C which produce a three-way cross-classification. Assume further that a latent scale-type variable S with T classes explains the association among the traits. This means that

$$p_{ijk} = \sum_{t=1}^{T} p^{S}_{t}\, p^{AS}_{it}\, p^{BS}_{jt}\, p^{CS}_{kt} \qquad (2)$$

where pijk is the expected proportion in the (i,j,k) cell of the (A,B,C) cross-classification, pSt is the expected proportion in the tth latent class, which is associated with a particular "scale type," pASit is the expected conditional probability that trait A takes on response i when S is at scale type t, and where pBSjt and pCSkt have similar definitions (cf. Goodman 1975).

With k=3 items the Guttman pure scale model posits four scale types (i.e., T=4):

(1,1,1)(1,1,2)(1,2,2)(2,2,2)    (3)

In other words, the four other cells in the 2^3 table would have zero counts; that is, pijk = 0 for all response patterns that differ from the ones shown in (3).

For the response patterns shown in (3) we would have pS1 = pABC111, pS2 = pABC112, pS3 = pABC122, and pS4 = pABC222. The latent class model corresponding to the Guttman pure scale model is a restricted four-class model. The restrictions are

EQUATION     (4)
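A minimal sketch may help fix ideas about how the pure scale model sits inside the latent class model in (2). It assumes, consistent with the zero-count statement above, that each latent class gives its own scale-type response with probability one. The function name and the class proportions are illustrative, not taken from the paper.

```python
from itertools import product

def cell_probabilities(class_props, yes_probs):
    """Expected cell proportions under the latent class model in (2).

    class_props[t] is the proportion in latent class t.
    yes_probs[t][j] is the probability that item j takes response 1 in
    class t; response 2 gets the complement. Items are treated as
    independent within a class.
    """
    n_items = len(yes_probs[0])
    table = {}
    for pattern in product((1, 2), repeat=n_items):
        total = 0.0
        for p_t, probs in zip(class_props, yes_probs):
            term = p_t
            for response, p_yes in zip(pattern, probs):
                term *= p_yes if response == 1 else 1.0 - p_yes
            total += term
        table[pattern] = total
    return table

# Guttman pure scale for k = 3: each class answers its own scale type
# with certainty. The class proportions are made-up illustrative values.
class_props = [0.4, 0.3, 0.2, 0.1]
yes_probs = [
    [1.0, 1.0, 1.0],  # scale type (1,1,1)
    [1.0, 1.0, 0.0],  # scale type (1,1,2)
    [1.0, 0.0, 0.0],  # scale type (1,2,2)
    [0.0, 0.0, 0.0],  # scale type (2,2,2)
]
table = cell_probabilities(class_props, yes_probs)
empty_cells = [p for p, value in table.items() if value == 0.0]
```

Under these restrictions each scale-type cell carries exactly its class proportion, and the four remaining cells of the 2^3 table are empty.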

The Goodman Model. Goodman (1975) incorporates an "intrinsically unscalable class" to account for the existence of response errors in the Guttman scale model. The existence of an intrinsically unscalable class means that there are T=k+2 scale types. Let pS0 denote the proportionate frequency of the unscalable class, so that now the pSt, summed over t = 0, 1, ..., k+1, equal 1. To accommodate Goodman's model, we modify the model shown in (2):

EQUATION    (5)

where the parameters are as defined before. The parameter pS0 is an extremely important indicator since it gives an overall index of scalability which is typically computed by taking (1-pS0). When pS0=0 Goodman's model reduces to Guttman's pure scale model, and hence to a deterministic, error-free model. The remaining models to be discussed in this section incorporate response errors in the scale-type respondents.
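As a rough sketch of the Goodman model, assume the intrinsically unscalable class answers the items independently of one another, while every scalable respondent gives exactly the scale-type response. The function name and all numerical values below are hypothetical.

```python
from itertools import product

def goodman_cell_probability(pattern, p0, unscalable_yes, scale_props):
    """Expected proportion for one response pattern.

    p0             proportion in the intrinsically unscalable class
    unscalable_yes unscalable_yes[j] is the probability of response 1 on
                   item j within the unscalable class
    scale_props    maps each scale-type pattern to its class proportion
    """
    prob = p0
    for response, p_yes in zip(pattern, unscalable_yes):
        prob *= p_yes if response == 1 else 1.0 - p_yes
    # Scalable respondents contribute only to their own scale-type cell.
    return prob + scale_props.get(pattern, 0.0)

p0 = 0.2                                # hypothetical unscalable proportion
unscalable_yes = [0.5, 0.5, 0.5]        # hypothetical item probabilities
scale_props = {(1, 1, 1): 0.3, (1, 1, 2): 0.2, (1, 2, 2): 0.2, (2, 2, 2): 0.1}

cells = {
    p: goodman_cell_probability(p, p0, unscalable_yes, scale_props)
    for p in product((1, 2), repeat=3)
}
scalability_index = 1.0 - p0
```

Non-scale patterns such as (2,1,1) are filled only by the unscalable class, which is why a large estimate of pS0 signals poor scalability.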

Uniform-Error Model. The simplest model that incorporates measurement error is Proctor's (1970, 1971) uniform-error model. The existence of response patterns other than those consistent with the true (k+1) scale-types is accounted for by the constant error rate parameter a, which governs the expected frequency of response errors for all k items and all k+1 scale-types. The parameter a is an overall index of item scalability. When a is low the items are said to be scalable, whereas when it is high the scalability of the items can be called into question.

In Proctor's original development, homogeneity was assumed; that is, no provision for an intrinsically unscalable class was made. However, as Dayton and Macready (1979) and Clogg and Sawyer (1981) have demonstrated, an intrinsically unscalable class can be easily incorporated into the uniform-error model. With the existence of an intrinsically unscalable class the uniform-error model for the three-way (A,B,C) table is a restricted five-class latent structure model. This scale model is still consistent with the general latent class model shown in (5) but the appropriate restrictions are

EQUATION   (6)

All of the conditional probabilities shown in (6) are set equal to one another, and thus their common value can be taken to be 1-a. With all of the scale-type conditional probabilities set equal to one another in this manner the item error rate is thus set at a for all items.
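One way to picture the uniform-error restriction: if every item departs from the scale-type response with the same probability a, the chance of a given response pattern depends only on how many items disagree with the respondent's true scale type. The sketch below assumes independent errors across items; the function name and the value of a are illustrative.

```python
from itertools import product

def uniform_error_probability(pattern, scale_type, a):
    """Probability of a response pattern given the true scale type.

    Each of the k items is in error with probability a, so a pattern that
    differs from the scale type on d items has probability
    a**d * (1 - a)**(k - d).
    """
    d = sum(1 for r, s in zip(pattern, scale_type) if r != s)
    return a ** d * (1.0 - a) ** (len(pattern) - d)

a = 0.1                                  # hypothetical error rate
true_type = (1, 1, 2)
p_exact = uniform_error_probability((1, 1, 2), true_type, a)
p_one_error = uniform_error_probability((1, 2, 2), true_type, a)
total = sum(
    uniform_error_probability(p, true_type, a)
    for p in product((1, 2), repeat=3)
)
```

The probabilities over all 2^3 patterns sum to one for any scale type, so the model spreads each scale-type class over the whole table rather than leaving cells empty.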

Equal Item-Specific Error Model. Instead of assuming a uniform rate across items, the equal item-specific model, originally suggested by Proctor (1970) and discussed in greater detail by Dayton and Macready (1976), assumes that error rates aj dictate the response errors for the jth item (1 ≤ j ≤ k) and that these aj's do not vary across true scale-types.

Assuming an intrinsically unscalable class, as we will assume throughout our discussion, the equal item-specific model is a five-class latent structure model with parameters as shown in (5). The appropriate restrictions on the latent class probabilities for this model are

EQUATION   (7)

Equal Scale-Type-Specific Error Model. The equal scale-type error model assumes that the true scale-types have different response error rates that do not depend on the particular item for which a response error is made. Thus, with k items there are k+1 of these error rates, a1, a2, ..., ak+1, which govern the expected occurrence of response errors.

The equal scale-type-specific error model is a five-class latent-structure model, assuming an intrinsically unscalable class (see (5)), with the following restrictions:

EQUATION   (8)
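The difference between the item-specific and scale-type-specific restrictions can be sketched by thinking of an error rate for every (scale type, item) pair and then constraining it along one dimension or the other. The helper names and rates below are hypothetical, and errors are again assumed independent across items.

```python
def pattern_probability(pattern, scale_type, error_rates):
    """Probability of a response pattern for a given true scale type.

    error_rates[j] is the chance that item j departs from the response
    the scale type prescribes.
    """
    prob = 1.0
    for response, expected, rate in zip(pattern, scale_type, error_rates):
        prob *= rate if response != expected else 1.0 - rate
    return prob

def item_specific_rates(item_rates, scale_types):
    """Same rate a_j for item j in every scale type."""
    return {s: list(item_rates) for s in scale_types}

def scale_type_specific_rates(type_rates, scale_types, n_items):
    """Same rate a_t for every item within scale type t."""
    return {s: [a_t] * n_items for s, a_t in zip(scale_types, type_rates)}

scale_types = [(1, 1, 1), (1, 1, 2), (1, 2, 2), (2, 2, 2)]
by_item = item_specific_rates([0.10, 0.20, 0.05], scale_types)
by_type = scale_type_specific_rates([0.05, 0.10, 0.15, 0.20], scale_types, 3)

# A respondent of true type (1,1,2) who answers (1,2,2) errs on item B only.
p_item = pattern_probability((1, 2, 2), (1, 1, 2), by_item[(1, 1, 2)])
p_type = pattern_probability((1, 2, 2), (1, 1, 2), by_type[(1, 1, 2)])
```

Setting every rate to a single value recovers the uniform-error model, which is why it can serve as a common benchmark for the two less restricted models.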

Extended Models Which Consider Method Effects

The latent class models for scaling trait responses proposed in this section are derivatives of the four probabilistic response error models just discussed. The development of these extended models owes much to the work of Clogg (1981), who extended the latent class framework to handle the simultaneous population problem.

Let us now assume that the responses to the three dichotomous traits A, B, and C have been collected under two maximally different methods (i.e., measuring instruments). [All that will be discussed can be easily extended to more than two methods.] Thus, we now have an (A, B, C, M) multiway table where M denotes the method factor with M=1,2 classes. In order to define a T-class latent scale structure for each method simultaneously, consider M as an explicit indicator of each scale-type, and change the number of latent scale-types from T to 2T. The basic latent class model shown in (2) now becomes

EQUATION   (9)

with the fixed restrictions

EQUATION   (10)

Note that for ease of exposition we are now numbering the T=k+2 scale-types consecutively from 1 to 2T, with k+2 scale-types defined for Method 1 and k+2 scale-types defined for Method 2. With the imposed constraints the relationships hold that

EQUATION   (11)

The conditional probabilities shown in (9) take on slightly different meaning; for example, pAS11 is the conditional probability that trait A assumes the level 1 for a member of the first latent scale-type class under Method 1, while pAS1,T+1 is the conditional probability that trait A takes on level 1 for a member of the T+1, or first, latent scale-type class under Method 2. Similar interpretations apply to the other conditional probabilities.

Method and Test Strategy

We begin by testing the hypothesis of no method variation and no measurement error. To ensure that method variation is nonexistent we need to impose equality restrictions on the latent scale-type probabilities. As noted, pSt/p+++1, for t=1,2,...,T, is the proportion in the tth latent scale-type under Method 1, and pSt/p+++2, for t=T+1,T+2,...,2T, is the proportion in the tth latent scale-type under Method 2. To ensure no method effect, we impose the restriction

EQUATION   (12)

which means that the proportion in the tth latent scale-type is constant across methods. The second component of the test concerns the hypothesis of no measurement error. Thus, we impose the restrictions shown in (4) and fit a (k+1) latent scale-type model with no intrinsically unscalable class (i.e., the Guttman pure scale model), or allow for the existence of an intrinsically unscalable class and fit a (k+2) latent scale-type model (i.e., the Goodman model), with the restrictions shown in (4) along with the restrictions shown in (10) and in (11).
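The bookkeeping behind the stacked 2T-class model can be sketched as follows. The sketch assumes that the stacked scale-type proportions for a method sum to that method's share of the sample, so that dividing by the share recovers the within-method proportions described above. All names and numbers are hypothetical.

```python
def stack_scale_types(method_shares, within_props):
    """Stack T scale-type proportions per method into one list.

    method_shares[m] is the share of the sample observed under method m;
    within_props[m][t] is the scale-type proportion within method m.
    """
    stacked = []
    for share, props in zip(method_shares, within_props):
        stacked.extend(share * p for p in props)
    return stacked

def within_method_proportions(stacked, method_shares):
    """Recover within-method proportions by dividing by the method share."""
    n_types = len(stacked) // len(method_shares)
    return [
        [stacked[m * n_types + t] / share for t in range(n_types)]
        for m, share in enumerate(method_shares)
    ]

def is_method_homogeneous(stacked, method_shares, tol=1e-9):
    """True when every scale type has the same proportion in each method."""
    first, *rest = within_method_proportions(stacked, method_shares)
    return all(
        abs(a - b) <= tol for other in rest for a, b in zip(first, other)
    )

shares = [0.6, 0.4]                                   # hypothetical
same = stack_scale_types(shares, [[0.2, 0.3, 0.5], [0.2, 0.3, 0.5]])
different = stack_scale_types(shares, [[0.2, 0.3, 0.5], [0.4, 0.3, 0.3]])
print(is_method_homogeneous(same, shares))       # True
print(is_method_homogeneous(different, shares))  # False
```

In estimation the equality is imposed as a constraint rather than checked after the fact; the check here only illustrates what the no-method-effect restriction asserts.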

Let L2(Mi) be the likelihood ratio chi-square statistic obtained from fitting model Mi and denote by d.f. (Mi) the associated degrees of freedom. The first test considers the overall independence of responses before fitting any scale model. If independence cannot be rejected, there is no evidence for any latent structure beyond the trivial one-class model and the process terminates. Assuming that the hypothesis of independence is rejected, the fit of the Guttman pure scale model [The Guttman pure scale model is a highly restricted latent class model. Because this model yields zero expected counts for all but (k+1) scale-types it is not appropriate, in a strict sense, to report either the likelihood-ratio or Pearson chi-square statistics.], denoted by M1, and the Goodman model, denoted by M2, are examined next. If the fit of either of these two models is adequate, the evidence would tend to support the contention of error-free response items. There is, however, a caveat in interpreting the fit of the Goodman model in particular. The Goodman model may yield an acceptable fit to the data but with a correspondingly large estimated proportion of the respondents in the intrinsically unscalable class. This is problematic in that the large number of respondents deemed intrinsically unscalable may be due to the wrong assumption of error-free data; that is, response errors in the scale-type respondents may be present.
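For readers who want to compute the fit statistic directly, the likelihood ratio chi-square takes the usual form of twice the sum of observed counts times the log ratio of observed to fitted counts. The sketch below uses that standard definition; the counts are hypothetical and the function name is illustrative.

```python
import math

def likelihood_ratio_chi_square(observed, expected):
    """L2 = 2 * sum of n * ln(n / m) over cells with n > 0."""
    total = 0.0
    for n, m in zip(observed, expected):
        if n == 0:
            continue  # a zero observed count contributes nothing
        if m <= 0:
            # The statistic is undefined when a populated cell has a zero
            # fitted count, as happens under the Guttman pure scale model.
            raise ValueError("observed count in a cell with zero expected count")
        total += n * math.log(n / m)
    return 2.0 * total

observed = [40, 10, 20, 30]           # hypothetical counts
expected = [35.0, 15.0, 25.0, 25.0]   # hypothetical fitted counts
l2 = likelihood_ratio_chi_square(observed, expected)
```

The explicit error for zero fitted counts mirrors the remark in the text that chi-square statistics are not strictly appropriate for the pure scale model.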

It is unlikely that the no measurement error hypothesis will be compatible with the observed trait scores, and in cases where it does adequately fit the data the intrinsically unscalable class is likely to be large. Hence, the next series of tests investigates the scaling structure of the response items under the restriction of no method variation (i.e., restrictions (10) and (11)). The following sequence of tests is suggested.

STEP 1: Testing for intrinsically unscalable class under method homogeneity

Uniform scale with intrinsically unscalable class (Model M4).

     versus

Uniform scale only (Model M3).

L2(M3)-L2(M4) with d.f. (M3-M4) = d.f. (M3) - d.f. (M4)

Equal item-specific errors with intrinsically unscalable class (Model M6).

     versus

Equal item-specific errors only (Model M5).

L2(M5)-L2(M6) with d.f. (M5-M6) = d.f. (M5) - d.f. (M6)

Equal scale-type-specific errors with intrinsically unscalable class (Model M8).

     versus

Equal scale-type-specific errors only (Model M7).

L2(M7)-L2(M8) with d.f. (M7-M8) = d.f. (M7) - d.f. (M8)

STEP 2: Testing for a response error model under method homogeneity

A. Depending on the results of STEP 1 compare

Equal item-specific errors only (Model M5).

     versus

Uniform errors only (Model M3).

L2(M3)-L2(M5) with d.f. (M3-M5) = d.f. (M3) - d.f. (M5)

Equal scale-type-specific errors only (Model M7).

     versus

Uniform errors only (Model M3).

L2(M3)-L2(M7) with d.f. (M3-M7) = d.f. (M3) - d.f. (M7)

B. Depending on the results of STEP 1 compare

Equal item-specific errors with intrinsically unscalable class (Model M6).

     versus

Uniform errors with intrinsically unscalable class (Model M4).

L2(M4)-L2(M6) with d.f. (M4-M6) = d.f. (M4) - d.f. (M6)

Equal scale-type-specific errors with intrinsically unscalable class (Model M8).

     versus

Uniform errors with intrinsically unscalable class (Model M4).

L2(M4)-L2(M8) with d.f. (M4-M8) = d.f. (M4) - d.f. (M8)

STEP 3: Testing for method variation

Assume that the previous series of tests, all of which imposed the restriction of equal latent scale-type proportions, has identified a particular scale-type error model (or models) as adequately characterizing the trait responses. Denote by L2(MR) the likelihood-ratio chi-square associated with this model, where the subscript R is used to indicate that the latent scale-type proportions are restricted. Next fit the model that restricts the same conditional probabilities as in model MR except for the restrictions on the latent scale-type proportions, which are now set free. Denote the likelihood ratio chi-square value associated with this model by L2(MU), where the subscript U is used to indicate that the latent scale-type proportions are unrestricted. Thus

L2(MR|MU) = L2(MR) - L2(MU),

with degrees of freedom found by subtraction, is a chi-square statistic that can be used to test method homogeneity.
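The conditional test can be sketched in a few lines. The upper-tail chi-square probability is computed here from standard closed forms for integer degrees of freedom, so no statistical library is needed. The function names and the L2 values are hypothetical, and the sketch assumes the restricted model fits no better than the unrestricted one, so the difference is non-negative.

```python
import math

def chi_square_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df >= 1."""
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    # Recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / gamma(a + 1)
    for _ in range((df - 1) // 2):
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q

def method_homogeneity_test(l2_restricted, df_restricted,
                            l2_unrestricted, df_unrestricted):
    """Return the L2 difference, its degrees of freedom, and the p-value."""
    diff = l2_restricted - l2_unrestricted
    df = df_restricted - df_unrestricted
    return diff, df, chi_square_sf(diff, df)

# Hypothetical fit statistics for a restricted and an unrestricted model.
diff, df, p_value = method_homogeneity_test(30.0, 20, 18.0, 16)
```

A small p-value would lead to rejecting method homogeneity in favor of the model with free scale-type proportions. The same helper applies to the nested comparisons in STEP 1 and STEP 2.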

Note that STEP 2 involves a comparison of the adequacy of fit between the equal scale-type-specific error model (models M7 and M8) and the equal item-specific error model (models M5 and M6); however, because neither of these models is a restricted form of the other, the choice of which one to focus on must, in a strict sense, be made on the basis of criteria other than statistical fit, or on a comparison which pits each model against a competing theoretical model which is hierarchically related to each. In the latter case, the models are compared indirectly by demonstrating that model Mi is not significantly worse than a hierarchically related test model in which it is nested and that model Mj, i≠j, is significantly worse. The second step described above is in the spirit of this type of indirect testing procedure in that the uniform-error model is used as a benchmark to evaluate the other two scaling models.

ILLUSTRATIVE EXAMPLE

The data set used to illustrate the sequential hypothesis testing strategy described in the previous section is presented in Table 1. The data are an adaptation of data originally presented by McHugh (1956) on four creativity-ability items. [Thirteen respondents have been added to the cell count for pattern (2,2,2,2) originally reported by McHugh and, in addition, we have incorporated a second method; the contrived frequency counts appear under Method 2 in the table.] The analysis of the data will uncover substantial method variation and provide an illustration of how various error models can be mixed to better understand the structure of the data. All of the reported results are based on the use of the MLLSA program (Clogg 1977).

TABLE 1

AN ADAPTATION OF THE MCHUGH (1956) DATA

Table 2 presents the likelihood ratio goodness-of-fit chi-square statistics for several models that will be examined in the course of conducting the sequential hypothesis testing strategy. Note first that the four items, A, B, C, and D, and the method factor, M, are clearly not independent, since the likelihood ratio chi-square statistic is 271.20 with 26 degrees of freedom (p < .0001). Because of the relatively large frequencies in response patterns (1,1,1,1) and (2,2,2,2) the hierarchical structure used in this example is comprised of only these two scale-types. In effect, traits A, B, C, and D are presumed to operate as a unit so that the first unit is a conditional prerequisite for the second, third and fourth units.
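The 26 degrees of freedom quoted for the independence test follow from the usual count for a complete independence model: the number of cells, minus one, minus one free marginal parameter per level beyond the first for each variable. A minimal check, with an illustrative function name:

```python
def independence_df(levels):
    """Degrees of freedom for mutual independence in a multiway table.

    levels lists the number of categories of each variable.
    """
    cells = 1
    for n in levels:
        cells *= n
    free_marginals = sum(n - 1 for n in levels)
    return cells - 1 - free_marginals

# Four dichotomous items plus the two-level method factor.
df_independence = independence_df([2, 2, 2, 2, 2])
print(df_independence)  # 32 - 1 - 5 = 26
```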

The data appear to be subject to response errors since both the Guttman pure scale model and the Goodman model, under method heterogeneity and homogeneity, fit the data quite poorly. [The Guttman model was assessed on the basis of a comparison of the observed and the fitted values.] An evaluation of models M3-M8 suggests that the scale-type models which incorporate an intrinsically unscalable class uniformly do better than those models that preclude the existence of such a class; that is, the differences in likelihood ratio chi-square statistics for model M3 versus model M4, model M7 versus model M8, and model M5 versus model M6 are all statistically significant. However, none of the scaling models that incorporate both errors in responses and an intrinsically unscalable class (i.e., models M4, M6, and M8) satisfactorily represents the structure of the data and, therefore, comparisons between them are inappropriate.

In such cases, we turn immediately to the second stage test results which relax the restrictions on the equality of the latent scale-type proportions. From the table we see that models M9, M10 and M11, while providing, for the most part, better fits than their counterpart models M4, M6, and M8, also do not adequately fit the data. Here again any differences in likelihood ratio chi-square tests between these models are inappropriate.

Having found that error models that incorporate an intrinsically unscalable class and method variation are not sufficient to represent the data, we now consider whether the fit can be improved significantly by going to alternative model specifications which incorporate mixtures of scale-types across methods. Inspection of the frequencies shown in Table 1 can provide valuable insights on how to proceed. In particular, note the relatively large frequency for the (1,1,2,2) response pattern under Method 1 and the relatively large frequency for the (1,1,2,1) response pattern under Method 2. The presence of the relatively large frequencies for these response patterns suggests a more complicated structure.

Table 3 presents two models, denoted by M12 and M13, which show different scale-types under different methods. Model M12 has three scale types:

(1,1,2,2) (1,1,1,1) (2,2,2,2) under Method 1

(1,1,2,1) (1,1,1,1) (2,2,2,2) under Method 2

In effect, under Method 1 items A and B and items C and D are presumed to function in the (1,1,2,2) scale-type as units such that the first unit is a conditional prerequisite for the second unit, whereas under Method 2 the first scale-type, namely, (1,1,2,1), can be viewed as a "biform" scale wherein the ordering is assumed to be ABDC (cf. Goodman 1975). This model assumes equal item-specific error rates for scale-types (1,1,1,1) and (2,2,2,2) across methods. The estimated true item-specific error rates are 0.236, 0.210, 0.134, and 0.180, respectively. Thus, for these scale-types, at least, item C appears most reliable while item A appears least reliable. Note also that under Method 1 scale-type (1,1,2,2) is assumed error free. Under Method 2 scale-type (1,1,2,1) is also assumed free of error, except for item C, which has an estimated error rate of 0.467. Finally, it should be noted that the fit of model M12 is better than any of the models previously considered (p = .05).

TABLE 2

CHI-SQUARE VALUES AND GOODNESS-OF-FIT TESTS FOR VARIOUS LATENT CLASS MODELS

TABLE 3

PARAMETER ESTIMATES FOR MODEL 12 AND MODEL 13

Model M13 differs from model M12 only in that an intrinsically unscalable class is incorporated under each of the two methods. Notice first that the fit of the model is quite good (p > .50). The estimated proportion of intrinsically unscalable respondents under Method 1 is 0.322 and under Method 2 the estimated proportion is 0.175. The estimated true item-specific error rates for scale-types (1,1,1,1) and (2,2,2,2) are

0.300, 0.271, 0.146, and 0.200

which shows item C to be the most reliable trait and item A to be the least reliable trait. For scale-type (1,1,2,1) under Method 2, however, item C is clearly unreliable with an estimated error rate of roughly 0.60. Finally, note that the error rates for the four items in the intrinsically unscalable class vary markedly across the two methods, which can indicate the existence of method differences.

To summarize, we found that the response traits

(i) exhibit a rather complex scale structure,

(ii) exhibit different latent scale-type proportions across methods, and

(iii) can be reasonably characterized by what can be called a "mixed-scale" model.

SUMMARY

In this study we have attempted to review recent advances in modeling errors in measurement. In particular, we discussed the use of scaling models on traits having discrete components. The scaling models discussed are all probabilistic generalizations of Guttman's pure scale model and can be fitted by use of the MLLSA program for latent structure analysis. Finally, we demonstrated how latent class models can be modified to allow for the investigation of response errors in the context of a traditional multitrait-multimethod matrix. A sequential testing strategy was also proposed. An illustrative example was provided which demonstrated how the extended approach proposed here provides the researcher with the ability to (i) use a formal test statistic to select a response error model that sufficiently represents the scale structure of the data, (ii) assess the effects due to traits versus methods, and (iii) examine and test a wide array of plausible measurement error hypotheses.

REFERENCES

Bagozzi, R.P. (1980), Causal Models in Marketing. New York: John Wiley & Sons, Inc.

Barr, A.J., et al. (1976), A User's Guide to SAS 76. Raleigh, N.C.: SAS Institute.

Clogg, C.C. (1977), "Unrestricted and Restricted Maximum Likelihood Latent Structure Analysis: A Manual for Users," Working Paper 1977-09. University Park, Penn.: Population Issues Research Office.

Clogg, C.C. (1981), "Latent Class Analysis Across Groups," Proceedings of the Social Statistics Section, 1981 Annual Meeting of the American Statistical Association.

Clogg, C.C. and Sawyer, D.O. (1981), "A Comparison of Alternative Models for Analyzing the Scalability of Response Patterns," in Sociological Methodology 1981, S. Leinhardt, ed., San Francisco: Jossey-Bass.

Davis, J.A. (1978), Codebook for 1975 General Social Survey. Chicago: National Opinion Research Center.

Dayton, C.M. and Macready, G.B. (1979), "A Scaling Model with Response Errors and Intrinsically Unscalable Respondents," Psychometrika, in press.

Goodman, L.A. (1974), "The Analysis of Systems of Qualitative Variables When Some of the Variables are Unobservable. Part I - A Modified Latent Structure Approach," American Journal of Sociology, 79, 1179-1259.

Goodman, L.A. (1975), "A New Model for Scaling Response Patterns: An Application of the Quasi-Independence Concept," Journal of the American Statistical Association, 70, 755-68.

Guttman, L. (1950), "The Basis for Scalogram Analysis," in S.A. Stouffer, et al., eds., Measurement and Prediction: Studies in Social Psychology in World War II, Vol. 4, Princeton: Princeton University Press.

Hays, D.G. and Borgatta, E.F. (1954), "An Empirical Comparison of Restricted and General Latent Distance Analysis," Psychometrika, 19, 271-9.

Joreskog, K.G. (1971), "Statistical Analysis of Sets of Congeneric Tests," Psychometrika, 36, 109-33.

Joreskog, K.G. (1974), "Analyzing Psychological Data by Structural Analysis of Covariance Matrices," in D.H. Krantz, R.D. Luce, R.C. Atkinson, and P. Suppes, eds., Contemporary Developments in Mathematical Psychology, Vol. 2, San Francisco: Freeman.

Lazarsfeld, P.F. and Henry, N.W. (1968), Latent Structure Analysis. Boston: Houghton Mifflin.

Macready, G.B. and Dayton, C.M. (1977), "The Use of Probabilistic Models in the Assessment of Mastery," Journal of Educational Statistics, 2, 99-120.

McHugh, R.B. (1956), "Efficient Estimation and Local Identification in Latent Class Analysis," Psychometrika, 21, 331-47.

Proctor, C.H. (1971), "Reliability of a Guttman Scale Score," Proceedings of the Social Statistics Section, Annual Meeting of the American Statistical Association, Washington, D.C.: American Statistical Association.

Stouffer, S.A. and Toby, J. (1951), "Role Conflict and Personality," American Journal of Sociology, 56, 395-406.
