Specifying Measurement Error in Structural Equation Models: Are Congeneric Measurement Models Appropriate?

Joseph A. Cote, Washington State University
Robert Greenberg, Washington State University
ABSTRACT - It is well known that misspecification of structural equation models leads to biased estimates. It is also widely known that systematic measurement errors are ubiquitous. In spite of these two well-known facts, researchers using structural equation (LISREL) models usually specify congeneric measurement models. This paper outlines why congeneric measurement models may lead to biased estimates of the structural relationships, which can in turn affect theoretical conclusions in empirical research. Several alternative techniques for modelling measurement error are presented.
[ to cite ]:
Joseph A. Cote and Robert Greenberg (1990) ,"Specifying Measurement Error in Structural Equation Models: Are Congeneric Measurement Models Appropriate?", in NA - Advances in Consumer Research Volume 17, eds. Marvin E. Goldberg, Gerald Gorn, and Richard W. Pollay, Provo, UT : Association for Consumer Research, Pages: 426-433.

Advances in Consumer Research Volume 17, 1990      Pages 426-433



Congeneric measurement models are the most commonly used approach for specifying measures in structural equation models (Anderson and Gerbing 1982; Darden, Carlson, and Hampton 1984). Congeneric measurement models assume measures are composed of a single underlying construct (true score, represented by ξ's and η's) and random measurement error (δ's and ε's), as illustrated in Figure 1.

However, it is quite likely that measures tap more than one underlying construct. In this case, the true score contains both valid and invalid components (Gerbing and Anderson 1984, Smith 1974). The valid component is the construct we are trying to measure. The invalid component consists of other trait and method components which are ". . . part of the true portion of variance because they contribute to the covariance of the variable with other variables . . ." (Smith 1974, p. 492). For example, measures of satisfaction might contain variance from product satisfaction, method effects, general life satisfaction, and random error.

When present, invalid components (hereafter referred to as systematic measurement errors) constitute additional constructs whose variance cannot be properly subsumed under random error. As shown in Figure 2, when systematic measurement error exists, a measure contains variance from random error (ε's) and at least two underlying constructs, the construct of interest (η1, η2, or η3) and a second, invalid or confounding component (η4, η5, or η6).

The use of congeneric measurement models, as in Figure 1, is only appropriate if measures do not contain systematic error. Unfortunately, systematic measurement errors are commonly found in behavioral research (Peter and Churchill 1986, Cote and Buckley 1987). Measures often contain invalid components, most notably method effects. It has been suggested that method effects are omnipresent (Fiske 1982, Peter 1981). Cote and Buckley (1987) provided empirical support for this claim and found that, on average, 26.3% of the variance in measures is due to method effects. In addition to method effects, it is quite possible that measures tap multiple constructs (Cattell 1978), a problem that is rarely addressed in consumer behavior (Churchill 1979, Anderson and Gerbing 1982).

In summary, it appears quite likely that measures contain systematic measurement error. Systematic measurement error can be either method effects or some other type of invalid measurement component. This raises questions about how consumer researchers specify measures in structural equation models. Simply put, Figure 2 probably represents measurement reality, yet we commonly specify Figure 1 when fitting structural equation models.


Proper interpretation of structural estimates is possible only when the measurement model is correctly specified (Anderson and Gerbing 1982, Gerbing and Anderson 1984). Failing to model systematic measurement error can lead to biased and inconsistent parameter estimates which may confound theoretical conclusions (Anderson and Gerbing 1982, Burt 1976, Gerbing and Anderson 1984, Kumar and Dillon 1987, Phillips 1981). While it is impossible to determine the effect of misspecification on any single parameter estimate, failure to model existing systematic measurement error will, on average, inflate estimates of structural relationships. This can be seen by considering the model in Figure 1. Suppose that the first indicator for each construct (x1, y1, and y4) has been measured using method 1, the second indicator with method 2, and the third indicator with method 3. Since they are not modeled, the shared method variance must be accounted for in some other way. The model allows for shared method variance only through the structural relationships (γ's and β's). For example, the shared method variance between x1 and y4 can only be accounted for through γ21. Since the structural estimates contain both trait variance and method variance, they may overestimate the effect of one trait on another. Moreover, this bias may make a relationship appear statistically significant when it is actually zero.
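The inflation described above can be illustrated with a small simulation (a hypothetical numerical sketch, not part of the original analysis): two measures of truly uncorrelated traits that share a method factor will nevertheless correlate, and a congeneric model has nowhere to route that covariance except the structural paths.

```python
import numpy as np

# Illustrative simulation (hypothetical loadings, not from the paper):
# two uncorrelated traits whose measures share a common method factor.
rng = np.random.default_rng(0)
n = 100_000

trait1 = rng.standard_normal(n)
trait2 = rng.standard_normal(n)   # true correlation with trait1 is zero
method = rng.standard_normal(n)   # shared method factor

# Loadings chosen so roughly a quarter of each measure's variance is
# method variance, in the spirit of Cote and Buckley's (1987) 26.3% figure.
x = 0.7 * trait1 + 0.5 * method + 0.5 * rng.standard_normal(n)
y = 0.7 * trait2 + 0.5 * method + 0.5 * rng.standard_normal(n)

r = np.corrcoef(x, y)[0, 1]
print(round(r, 2))  # a spurious correlation near .25 despite zero trait correlation
```

A model that ignores the method factor can only absorb this spurious correlation into a structural estimate, which is precisely the bias described above.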


Three methods for modelling systematic measurement error have been suggested: (1) create separate factors to explicitly model invalid components, (2) use second order factor analysis to implicitly model invalid components, or (3) use a correlated errors model. The separate factors approach is exemplified by multitrait-multimethod analysis. As shown in Figure 2, valid components are modeled as separate factors, with the structural relationships specified among the constructs of interest. Invalid components are also modeled as separate factors. These invalid components can either be correlated or uncorrelated with one another, but must be uncorrelated with the valid components (due to identification problems). Systematic error will not confound the estimated relationships among the valid factors, since it is separated out.







Second order factor models are an alternative way to account for systematic error (Gerbing and Anderson 1984, Marsh and Hocevar 1988). A traditional measurement model is used to specify first order factors which contain both valid and invalid components (η1 to η6 in Figure 3). For example, η6 contains variance from trait 3 (η8) and invalid component 6 (ξ6). Second order factors are then specified as a common (valid) component of the first order factors (η7 to η9 in Figure 3). The invalid components are separated out in ξ1 to ξ6.

Marsh and Hocevar (1988) recommend combining the traditional MTMM model and the second order factor model presented above. This combination results in a model with both explicitly and implicitly modeled invalid components (see Figure 4). As with the second order factor model discussed above, the first order factors include both valid and invalid components (η1 to η6). Second order factors for both the valid components (η7 to η9) and known invalid components (η10 and η11) are then specified. Finally, the unique variances for the first order factors (ξ1 to ξ9) model any additional invalid components the researcher cannot identify.

The third alternative is to model the invalid component using correlated measurement errors (John and Reve 1982, Marsh 1988). Model identification is ensured by constraining correlations among errors to be equal. For the model in Figure 1, the constraints could be specified such that only six correlated error parameters are estimated (John and Reve 1982).



The specification for method effects in Figure 2 can be replaced with the following correlated error specifications:

σA = σ1,4 = σ1,7 = σ4,7

σB = σ2,5 = σ2,8 = σ5,8

σC = σ3,6 = σ3,9 = σ6,9

σD = σ1,2 = σ1,5 = σ1,8 = σ4,2 = σ4,5 = σ4,8 = σ7,2 = σ7,5 = σ7,8

σE = σ1,3 = σ1,6 = σ1,9 = σ4,3 = σ4,6 = σ4,9 = σ7,3 = σ7,6 = σ7,9

σF = σ2,3 = σ2,6 = σ2,9 = σ5,3 = σ5,6 = σ5,9 = σ8,3 = σ8,6 = σ8,9

where

σA = correlation among measures containing method 1.

σB = correlation among measures containing method 2.

σC = correlation among measures containing method 3.

σD = correlation between measures containing method 1 and those containing method 2.

σE = correlation between measures containing method 1 and those containing method 3.

σF = correlation between measures containing method 2 and those containing method 3.
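The bookkeeping behind these six equality classes is mechanical. The sketch below (illustrative; it assumes measures 1, 4, and 7 use method 1, measures 2, 5, and 8 use method 2, and measures 3, 6, and 9 use method 3, matching the pattern above) enumerates which error covariances share each parameter.

```python
from itertools import combinations

# Assumed grouping of the nine measures by method (see the sigma classes above).
methods = {1: (1, 4, 7), 2: (2, 5, 8), 3: (3, 6, 9)}

classes = {}
# Within-method classes (the first three sigmas): pairs sharing one method.
for m, group in methods.items():
    classes[f"within_{m}"] = list(combinations(group, 2))
# Between-method classes (the last three sigmas): one measure from each method.
for (m1, g1), (m2, g2) in combinations(methods.items(), 2):
    classes[f"between_{m1}_{m2}"] = [(i, j) for i in g1 for j in g2]

for name, pairs in classes.items():
    print(name, sorted(pairs))
```

All 36 off-diagonal error covariances among the nine measures are covered by the six classes, yet only six free parameters are estimated, which is what keeps the model identified.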

Assuming simple method effects, all the error terms for measures using method 1 (ε1, ε4, and δ1 in Figure 1) would be correlated, with the correlations constrained to be equal. (With LISREL this can only be done by making ξ1 an endogenous factor and converting all the δ's to ε's, as was done in Figure 2.) This is repeated for all other methods and for the intercorrelations among different methods. Unlike the models discussed above, the correlated errors model does not include additional factors to control for invalid components; rather, correlated error terms account for the correlation between invalid components.

It has been argued that using correlated errors is theoretically inelegant and makes actual theoretical relationships unclear (Gerbing and Anderson 1984). This is true only when correlated errors are specified on an atheoretical, post hoc basis. Theoretically supportable correlated errors have been used to model method effects in a multitrait-multimethod model (Arora 1982) and are extensively used with longitudinal data (Alwin and Jackson 1979).

When using correlated errors models, there is no requirement that all constructs be measured using the same invalid components (as with multitrait-multimethod data). For example, some constructs can be measured with a single method component while others have some combination of methods. Finally, correlated error models may even be superior to traditional MTMM factor models (Widaman 1985), since they are not plagued by Heywood cases and provide more intuitively reasonable estimates (Marsh 1987).


Modelling invalid components is common when assessing validity with confirmatory factor analysis, but is much less frequently done when structural relationships are specified (exceptions include: Allen and Taylor 1985; Anderson 1987; Arora 1982; Bielby, Hauser and Featherman 1977; Campbell 1983; Wolfle and Robertshaw 1982). Wolfle and Robertshaw (1982) present a good example of how systematic measurement error affects structural estimates: the effect of locus of control in period 1 on locus of control in period 2 dropped from 0.445 to 0.335 when method effects were accounted for using a correlated errors model. Reanalysis of Churchill and Surprenant's (1982) data further highlights the drastic difference in theoretical conclusions that can result when a model incorporating an invalid component, rather than a congeneric measurement model, is examined.

Churchill and Surprenant examined the effect of expectation and performance on disconfirmation and satisfaction using a video disk player and a plant. Churchill and Surprenant express concern about the inconsistent results between the two models, and suggest this may be due to differences in the measures used for each data set and the possibility of shared method variance. The pair-wise correlations reported in their study were reexamined using EQS to analyze both of these possibilities.

To test the effect of using different measures for the two data sets, a model using all the measures except measure 13 was fit to both data sets. This adjustment did affect the magnitude of the estimates but not the theoretical conclusions. Most notably, the plant data indicated disconfirmation affected satisfaction while the video disk player data still indicated it did not (see β4,3 in Table 1). The video disk player data also indicated a much larger effect for perceived performance on disconfirmation and satisfaction (see β3,2 and β4,2 in Table 1).

To test the possible effect of shared method variance, correlated errors were added to Churchill and Surprenant's model (see Figure 2 in Churchill and Surprenant 1982). Although not identical, the measurement methods used by Churchill and Surprenant for each question were very similar, except for the faces scale (measure 12). A model was specified with the error terms for measures 1 through 11 correlated with one another and constrained to be equal (only one method effect for measures 1 through 11). Accounting for method effects resulted in models with similar estimates for both data sets (see Table 1). While some minor differences in the size of the estimates still exist, the theoretical conclusions for the two models are identical. Both models now indicate that disconfirmation affects satisfaction (β4,3), perceived performance has a minimal effect on satisfaction (β4,2), and perceived performance has a large effect on disconfirmation (β3,2).

In summary, as Churchill and Surprenant suggest, not accounting for systematic measurement error appears to have affected theoretical conclusions. In particular, the conclusion that "the effects of expectation, disconfirmation, and performance on satisfaction may differ for durable and nondurable products" was not supported once shared method variance was modeled. In addition, their conclusion that ". . . both researchers and managers must direct much more attention to the impact of performance levels (on satisfaction)" may have been overstated.


The researcher faces a dilemma when confronted with systematic measurement error. One alternative is to ignore systematic measurement error and use a model that is known to be misspecified. The other alternative is equally unattractive, because the data requirements may be excessive or restrictive assumptions about the nature of the systematic measurement error may be needed.

The ability of the researcher to explicitly model systematic measurement error will be limited for several reasons. First, the multiple measures necessary for this approach are often not available. Either multiple methods have not been developed (e.g., for behavioral intentions) or the researcher is unable to include multiple measures (the data have already been collected, or cost/length constraints apply). Even when multiple measures are available, the model may not be identified. MTMM models with three or more traits and three or more methods are considered identified, even with all traits and methods specified as correlated. However, replacing the correlation specifications with causal relationships may cause identification problems for subcomponents of the model. For example, a standard MTMM model is identified for the Arora (1982) data, but when trait intercorrelations are replaced with the structural relationships suggested by Arora, LISREL indicates the model is not identified. Determining model identification must be handled on a case-by-case basis.

A final limitation is that the researcher must be able to identify and correctly model the invalid component(s). This may be difficult if the invalid component is not method based, or if more than one invalid component exists. If more than one invalid component exists, it is unlikely that the researcher can explicitly model all the relevant factors and still have an identified model. Identifying the relevant systematic error may also prove difficult. Even when multitrait-multimethod data are used, simply modelling method effects does not ensure proper specification; there may be other types of systematic error which are improperly modeled. To help minimize the existence of systematic measurement error (other than method effects), scales should be constructed using the procedures outlined by Churchill (1979) and Gerbing and Anderson (1988).

The second order factor models bypass the need to explicitly identify and model the invalid component. The advantage of this approach is its ability to account for multiple invalid components which do not have to be explicitly identified and modeled. The major problem associated with this approach is model identification (Gerbing and Anderson 1984). As with the explicit modelling approach, the identification problem must be handled on a case-by-case basis.

A problem with the second order factor model in Figure 3 is that the systematic errors are assumed to be uncorrelated. This may be inappropriate in some cases, such as when there are shared method effects. When these correlations are not modeled, the second order factor will contain part of the invalid component, and the problem becomes more serious the higher the correlation among the invalid components. This can be overcome by including correlations among the invalid components; however, empirical identification is difficult to achieve if a large percentage of the invalid components are correlated. Finally, the second order factor model not only requires multiple methods, but also requires the use of multi-item rather than single-item scales. This is more restrictive than the traditional MTMM approach, which requires only a single item for each method/trait combination.

The second order factor model in Figure 4 allows the researcher to explicitly model invalid components that may be correlated while simultaneously accounting for any unknown systematic error. This is the most powerful approach and makes the least restrictive assumptions about measurement conditions. As with the two approaches discussed above, identification can be a major problem. For example, Marsh and Hocevar (1988) present a three trait, three method model with three measures for each trait-method combination. This model was identified when correlations among the traits and methods were specified. However, changing the correlations to causal links will cause the model to be unidentified, since certain subcomponents of the model are not identified. In addition, the data requirements for complex models can be quite formidable: not only are multi-item, multiple-method scales required, but there must be a large number of them, since the number of specified relationships is also large.

Correlated error models have the most restrictive assumptions about the systematic errors. Restricting correlated errors to be equal implies that the percentage of measurement error, due to a specific systematic error, is identical for all measures. This assumption is violated if interactions between traits and methods exist. By definition, trait-method interactions indicate that the method variance differs across traits. The prevalence of trait-method interactions is unknown because they are difficult to assess. However, Widaman (1985) argues that these interactions should not exist and that they may be safely ignored in empirical work (as is common practice in multitrait-multimethod analysis). The appropriateness of the equality constraints can be examined using first-order derivatives and modification indices. If these indicate that particular equality constraints are unreasonable (i.e., significantly reduce fit), the offending constraints can be relaxed. If only a few constraints require removal, the model may still be identified and the problems associated with the equality constraints will be reduced.

The correlated errors model requires that a correlation matrix, rather than a covariance matrix, be analyzed. A covariance matrix cannot be analyzed because the constraint of equal covariances between errors is unlikely to be reasonable if the measures have significantly different variances. Analyzing a correlation matrix is a compromise, however, because of the scale invariance problem: it can lead to questionable estimates of the χ2 value and the standard errors. For example, Bentler and Lee (1983) found the χ2 value was optimistic by about 15%, and the correct standard errors were up to three times less efficient, when a correlation matrix was analyzed. Nevertheless, dealing with scale invariance may be preferable to using a model that is known to be misspecified. Bentler and Lee (1983) also found the parameter estimates were very similar no matter which matrix was analyzed. In addition, Lee (1985) has suggested an estimation procedure which is scale invariant, although programs using this procedure are not yet readily available.
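The rescaling involved is straightforward. As a sketch (the covariance matrix below is invented for illustration), the correlation matrix actually analyzed is R = D^(-1/2) S D^(-1/2), where D holds the diagonal (variances) of the covariance matrix S:

```python
import numpy as np

# Hypothetical covariance matrix for three measures with unequal variances.
S = np.array([[4.0, 1.2, 0.8],
              [1.2, 1.0, 0.3],
              [0.8, 0.3, 2.0]])

d = np.sqrt(np.diag(S))      # standard deviations of the three measures
R = S / np.outer(d, d)       # R = D^(-1/2) S D^(-1/2): rescale rows and columns

print(R.round(3))            # unit diagonal; off-diagonals are correlations
```

An equal-covariance constraint that is unreasonable in S (here the first measure's variance is four times the second's) can still be reasonable in R, which is why the correlated errors model is fit to the correlation matrix despite the scale invariance problem.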


The pervasiveness of systematic measurement error, such as method effects, is widely recognized, yet researchers frequently do not model these invalid components (Darden, Carlson, and Hampton 1984). Commonly used congeneric measurement models may be inappropriate, since they ignore the existence of systematic error and may often lead to inflated estimates of the structural relationships. However, the problem can be remedied by properly modelling systematic error in one of three ways: explicitly modelling the invalid components, using second order factor analysis, or making theoretically grounded use of correlated errors. The results of this study underscore the importance of assessing and properly modelling measurement error before testing theory. Models that ignore systematic measurement error and use similar measures for all variables make it impossible to determine whether results are theoretically important or simply statistical artifacts.



Measurement quality must be carefully considered when using structural equation models. Without careful consideration of the possible presence of systematic measurement error, interpretation of the structural relationships is confounded. When selecting an approach to model systematic measurement errors, there is a tradeoff between the data requirements and the assumptions about the nature of the systematic error. If multi-item measures using multiple methods are available, complex modelling of the invalid components can be accomplished. If such data are unavailable, then a correlated errors model, with its associated restrictive assumptions, must be used. Table 2 summarizes the data requirements and disadvantages of the approaches for including systematic measurement error. Preferably, multiple methods should be used to measure each construct, but the possibility of systematic error must be assessed and properly modeled even when a single method is used.


Allen, Richard L. and Benjamin F. Taylor (1985), "Media Public Affairs Exposure: Issues and Alternative Strategies," Communication Monographs, 52 (June), 186-201.

Alwin, D. F. and Donald J. Jackson (1979), "Measurement Models for Response Errors in Surveys: Issues and Applications," in K. F. Schuessler (ed.), Sociological Methodology 1980, San Francisco: Jossey-Bass, 68-119.

Anderson, James C. (1987), "An Approach for Confirmatory Measurement and Structural Equations Modelling of Organizational Properties," Management Science, 33 (April), 525-41.

Anderson, James C. and David W. Gerbing (1982), "Some Methods for Respecifying Measurement Models to Obtain Unidimensional Construct Measurement," Journal of Marketing Research, 19 (November), 453-460.

Arora, Raj (1982), "Validation of an S-O-R Model for Situation, Enduring, and Response Components of Involvement," Journal of Marketing Research, 19 (November), 505-16.

Bentler, Peter M., and Sik-Yum Lee (1983), "Covariance Structures Under Polynomial Constraints: Applications to Correlation and Alpha-Type Structural Equation Models," Journal of Educational Statistics, 8 (Fall), 207-22.

Bielby, William T., Robert M. Hauser, and David L. Featherman (1977), "Response Error of Nonblack Males in Models of the Stratification Process," Journal of the American Statistical Association, 72 (December), 723-35.

Burt, Ronald (1976), "Interpretational Confounding of Unobserved Variables in Structural Equation Models," Sociological Methods and Research, 5 (August), 3-52.

Campbell, Richard T. (1983), "Status Attainment Research: End of the Beginning or Beginning of the End," Sociology of Education, 56 (January), 47-62.

Cattell, R. B. (1978), The Scientific Use of Factor Analysis in Behavioral and Life Science, New York: Plenum Press.

Churchill, Gilbert A., Jr. (1979), "A Paradigm for Developing Better Measures of Marketing Constructs," Journal of Marketing Research, 16 (February), 64-73.

Churchill, Gilbert A., Jr. and Carol Surprenant (1982), "An Investigation Into the Determinants of Consumer Satisfaction," Journal of Marketing Research, 19 (November), 491-504.

Cote, Joseph A. and M. Ronald Buckley (1987), "Estimating Trait, Method and Error Variance: Generalizing Across Seventy Construct Validation Studies," Journal of Marketing Research, 24 (August), 315-8.

Darden, William, S. Michael Carlson and Ronald D. Hampton (1984), "Issues in Fitting Theoretical and Measurement Models in Marketing," Journal of Business Research, 12 (September), 273-296.

Fiske, Donald (1982), "Convergent-Discriminant Validation in Measurements and Research Strategies," in David Brinberg and Louise H. Kidder (eds), Forms of Validity in Research, San Francisco: Jossey-Bass, Inc., 77-92.

Gerbing, David H. and James C. Anderson (1984), "On the Meaning of Within-Factor Correlated Measurement Errors," Journal of Consumer Research, 11 (June), 572-580.

Gerbing, David H. and James C. Anderson (1988), "An Updated Paradigm for Scale Development Incorporating Unidimensionality and Its Assessment," Journal of Marketing Research, 25 (May), 186-92.

John, George and Torger Reve (1982), "The Reliability and Validity of Key Informant Data from Dyadic Relationships in Marketing Channels," Journal of Marketing Research, 19 (November), 517-24.

Kumar, Ajith and William R. Dillon (1987), "The Interaction of Measurement and Structure in Simultaneous Equation Models with Unobservable Variables," Journal of Marketing Research, 24 (February), 98-105.

Lee, Sik-Yum (1985), "Analysis of Covariance and Correlation Structures," Computational Statistics and Data Analysis, 2 (February), 279-95.

Marsh, Herbert W. (1987), "Confirmatory Factor Analysis of Multitrait-Multimethod Data: Many Problems and a Few Solutions," Unpublished Working Paper, University of Sydney.

Marsh, Herbert W. and Dennis Hocevar (1988), "A New, More Powerful Approach to Multitrait-Multimethod Analyses: Application of Second-Order Confirmatory Factor Analysis," Journal of Applied Psychology, 73, 107-17.

Peter, J. Paul (1981), "Construct Validity: A Review of Basic Marketing Practices," Journal of Marketing Research, 18 (May), 133-45.

Peter, J. Paul and Gilbert A. Churchill, Jr. (1986), "Relationships Among Research Design Choices and Psychometric Properties of Rating Scales: A Meta-Analysis," Journal of Marketing Research, 23 (February), 1-10.

Phillips, Lynn W. (1981), "Assessing Measurement Error in Key Informant Reports: A Methodological Note on Organizational Analysis in Marketing," Journal of Marketing Research, 18 (November), 395-415.

Smith, Kent W. (1974), "On Estimating the Reliability of Composite Indexes through Factor Analysis," Sociological Methods and Research, 2 (May), 485-510.

Widaman, Keith F. (1985), "Hierarchically Nested Covariance Structure Models for Multitrait-Multimethod Data," Applied Psychological Measurement, 9 (March), 1-26.

Wolfle, Lee M. and Dianne Robertshaw (1982), "Effects of College Attendance on Locus of Control," Journal of Personality and Social Psychology, 43, 802-10.