Decomposing the Correlation Matrix in Panel Data

Donald R. Lehmann, Columbia University
John U. Farley, Columbia University
ABSTRACT - ANOVA is suggested as an approach to decomposition of a correlation matrix as an aid to model building. The decomposition proceeds sequentially from a simple "direct effects" to a "significant effects" model relating four constructs measured in two waves of a panel study tracking the introduction of a new small automobile.
Donald R. Lehmann and John U. Farley (1981), "Decomposing the Correlation Matrix in Panel Data", in NA - Advances in Consumer Research Volume 08, eds. Kent B. Monroe, Ann Arbor, MI: Association for Consumer Research, Pages: 233-237.

Advances in Consumer Research Volume 8, 1981      Pages 233-237





The correlation matrix serves as the basis for many types of analysis, including two-variable association, multiple-relationship structural models and reliability and validity studies. Methods of analysis include coefficient alpha, multi-trait and multi-method approaches to the analysis of within-construct and among-construct correlations, factor analysis for assessment of variable structure and dimensionality, and regression and path analysis for estimation of structural parameters linking different constructs. When multi-wave and/or multi-brand data are available, the correlation matrix has regularities that may offer special information useful for systematic model building. This paper describes a direct approach to analysis of patterns in a correlation matrix. The approach is designed to help gain insight into structure in a set of variables in situations when repeat measurement is available.


The basic approach involves decomposing the set of zero-order correlations with analysis of variance, attempting to explain the correlations on the basis of combinations of variable, construct and remeasurement patterns. The approach, which shares its goals with methods for deducing structural relationships among a set of variables (Bagozzi 1979, Joreskog and Sorbom 1977), can be summarized as follows:

1. Underlying constructs are assumed to be linked causally.

2. Constructs are assumed to be measured with error.

3. Measurement errors are assumed to be correlated.

4. Multiple measures are available on constructs over time, over multiple items or both.

5. An ANOVA incorporating direct effects (e.g., wave, brand, construct or variable) is used to assess "base-level" correlation in the data which may represent response style (Gruber 1979) or some other "cosmos factor" characteristic of the measurements.

6. Interactions (combinations of waves, constructs and variables) are specified to model particularly strong or weak relations among variables and hence to suggest intertemporal or intervariable links for a causal model.

This ANOVA approach is similar in spirit to the log-linear method for analysis of individual traits, in that it sequentially tests various effects to assess the impact of that particular variable on patterns of intercorrelations.

Different types of data configuration make different types of ANOVA models appropriate. Some possibilities are shown for illustration in a two-wave correlation matrix in Figure 1.

a) for any single-wave study, a triangular correlation matrix such as I is available. In this case, it is possible to develop a model including both multiple measurements within construct and relationships between constructs.

b) for a two-wave study using different respondents, two such triangles (I and II) are available. Insofar as measurement is on parallel constructs (e.g., K = M and each set of measurements is identical) a parallel model can be estimated viewing the two triangles as replications. A similar approach can be used for two stimuli measured identically on either the same or different samples. Both will yield two triangular correlation matrices which can be viewed as replications.

c) remeasurement on the same respondents either over stimuli in a cross section or over time as in a panel (or both) produces a series of rectangular correlation matrices like III, in addition to triangles I and II. When identical measurements are used on each wave or stimulus, III is square and test-retest reliabilities of the measurements appear on the main diagonal.
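The partition described in (a) through (c) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 4 x 4 matrix below is invented (it is not taken from the paper's Table 1), it assumes identical measurements on both waves, and the function name partition_two_wave is hypothetical.

```python
# Hypothetical two-wave, two-variable correlation matrix (illustrative
# numbers only). Row/column order: wave 1 variables, then wave 2 variables.
K = 2
R = [
    [1.00, 0.40, 0.60, 0.30],
    [0.40, 1.00, 0.25, 0.55],
    [0.60, 0.25, 1.00, 0.45],
    [0.30, 0.55, 0.45, 1.00],
]


def partition_two_wave(R, K):
    """Split a 2K x 2K correlation matrix into triangle I (wave 1),
    triangle II (wave 2) and the square cross-wave block III."""
    # Triangle I: distinct within-wave-1 correlations (p < q).
    tri_I = {(p, q): R[p][q] for p in range(K) for q in range(p + 1, K)}
    # Triangle II: the same pairs measured within wave 2.
    tri_II = {(p, q): R[K + p][K + q] for p in range(K) for q in range(p + 1, K)}
    # Block III: wave 1 variable p against wave 2 variable q.
    block_III = [[R[p][K + q] for q in range(K)] for p in range(K)]
    # With identical measures on both waves, the main diagonal of III
    # holds the test-retest correlations.
    reliabilities = [block_III[p][p] for p in range(K)]
    return tri_I, tri_II, block_III, reliabilities


tri_I, tri_II, block_III, reliabilities = partition_two_wave(R, K)
```

The off-diagonal entries of block III are the cross-wave correlations between different variables, which need not be symmetric in the variable pair.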

Direct decomposition of the correlation matrix uses various features of construct, repeated within-construct measurement, time and stimuli to develop design patterns to be used in analysis of variance models for assessing patterns of systematic differences in arrays of correlations of the type shown in Figure 1. The design matrix for the ANOVA will be idiosyncratic to a particular application, depending on the number of waves, the extent to which parallel measurements are used, and the extent to which measurements on different stimuli (usually brands) can reasonably be viewed as replications. Further, the number of available square cross-wave and cross-stimuli matrices determines the number of replications or semi-replications available, which in turn determines the extent to which various second- and higher-order interactions can be incorporated into the ANOVA design.
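As a rough illustration of how the observations for such a design might be laid out, the following Python sketch labels every distinct correlation in a two-wave matrix with K identically measured variables. The cell labels and the function name classify_cells are assumptions made for this example; the paper's own design matrix is not reproduced here.

```python
from collections import Counter


def classify_cells(K):
    """Label each distinct correlation in a two-wave, K-variable matrix."""
    rows = []
    # Within-wave triangles: one row per variable pair per wave.
    for wave in (1, 2):
        for p in range(K):
            for q in range(p + 1, K):
                rows.append(
                    {"waves": (wave, wave), "vars": (p, q), "cell": "contemporaneous"}
                )
    # Square cross-wave block: wave 1 variable p with wave 2 variable q.
    for p in range(K):
        for q in range(K):
            cell = "test-retest" if p == q else "cross-wave non-paired"
            rows.append({"waves": (1, 2), "vars": (p, q), "cell": cell})
    return rows


# Four variables per wave, as in the application that follows.
rows = classify_cells(4)
counts = Counter(row["cell"] for row in rows)
```

With four variables per wave this yields 28 distinct correlations per brand (12 contemporaneous, 4 test-retest and 12 cross-wave non-paired), which shows how quickly the available degrees of freedom limit the interactions that can be fitted.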


The approach is illustrated in an analysis of four single-measure constructs which constitute the core endogenous variables in a consumer choice model (Farley, Howard and Lehmann 1976):

Intention to buy brand i

Attitude toward brand i

Confidence in ability to judge brand i

Perceived knowledge of brand i

There is theoretical reason to expect systematic contemporaneous inter-construct relationships, and there may be important inter-temporal relationships as well. The data consist of measurements on two waves of a national panel on small cars, with one wave taken immediately following the introduction of Vega and the other three months later. All constructs were measured by self-report on a 10-point bipolar adjective scale. The analysis uses the correlation matrices for Volkswagen (an existing brand) and Vega (the new brand) shown in Table 1.





Regression procedures were used for estimation. The design variables are binary and were coded +1, -1 so coefficients of a given effect sum to zero. Pair-wise interaction terms are formed by direct multiplication of the three direct effects (brand, wave and variable). The two brands are viewed as replications of the same data-generating process. Two different specifications of the interaction term i_pq are used--one symmetric (e.g., i_pq = i_qp) and one in which the coefficients are not constrained to be equal.
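The coding scheme can be illustrated with a minimal Python sketch. It is not the paper's equation (1): it assumes only two binary factors (brand and wave pairing), a balanced layout, and invented correlations, and it uses NumPy's least-squares solver in place of whatever regression package the authors used.

```python
import numpy as np

# Hypothetical layout: each row is one observed correlation, classified by
# brand (+1 / -1) and by wave pairing (+1 same wave, -1 cross wave).
brand = np.array([+1, +1, -1, -1, +1, +1, -1, -1], dtype=float)
wave = np.array([+1, -1, +1, -1, +1, -1, +1, -1], dtype=float)

# Pair-wise interaction formed by direct multiplication of the coded effects.
interaction = brand * wave

# Invented correlations, used as the dependent variable.
r = np.array([0.52, 0.38, 0.47, 0.35, 0.50, 0.40, 0.49, 0.33])

# Design matrix: constant, two direct effects, one interaction.
X = np.column_stack([np.ones_like(r), brand, wave, interaction])
coef, *_ = np.linalg.lstsq(X, r, rcond=None)

constant, b_brand, b_wave, b_interaction = coef
```

Because this toy layout is balanced, the constant equals the grand mean of the correlations and each coefficient is an increment or decrement around it. In an unbalanced layout such as a real correlation matrix, the columns are no longer orthogonal and the coefficients must be read as partial effects.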


The results of the decomposition are discussed in terms of:

1. an assessment of residuals from a baseline "direct effects" model which leads to further inferences about model structure in the context of this potentially incomplete model. This model excludes x and i from (1).

2. tests on direct effects and pair-wise interactions in the full version of (1).

3. interpretation of the parameters of an ANOVA model containing effects identified as significant in steps (1) and (2).

The Use of Residuals.  A "direct effects" model provides the baseline for assessment of possible interactions. Residuals from this ANOVA model, often disregarded in analysis of variance, provide guidance for improved specification. Patterns in the signs of the residuals from this parsimonious direct-effects-only model (Table 2) show clear underprediction of reliabilities and overprediction of cross-wave combinations of non-paired variables. The direct effects model does a reasonably good job predicting correlations of contemporaneous pairings. Further, it appears that a relatively small number of large underpredictions (particularly involving the reliabilities) offsets a larger number of smaller overpredictions (Table 3) in the cross-wave combinations.

These patterns motivate the use of a model incorporating interactions combining various pairings of variables measured across waves. The residuals thus provide guidance in structuring tests for specific interactions--a step necessary because the data base lacks the degrees of freedom for a full set of factorial interactions of all orders.
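A sign tabulation of this kind can be sketched as follows. The correlations are invented to mimic the qualitative pattern described above (they are not the values behind Tables 2 and 3), and the baseline model here contains a single direct effect for wave pairing, which is an assumption made to keep the example small.

```python
import numpy as np

# Made-up correlations for illustration only.
cell = np.array(
    ["contemporaneous"] * 4 + ["test-retest"] * 2 + ["cross-wave non-paired"] * 6
)
r = np.array([0.45, 0.50, 0.40, 0.48,
              0.70, 0.65,
              0.30, 0.25, 0.35, 0.28, 0.32, 0.20])

# Direct effect: +1 for same-wave correlations, -1 for cross-wave ones.
same_wave = np.where(cell == "contemporaneous", 1.0, -1.0)
X = np.column_stack([np.ones_like(r), same_wave])
coef, *_ = np.linalg.lstsq(X, r, rcond=None)

# Positive residual = observed exceeds fitted = underprediction.
residual = r - X @ coef

# Count under- and overpredictions within each cell type.
sign_table = {}
for name in np.unique(cell):
    res = residual[cell == name]
    sign_table[str(name)] = {
        "under": int((res > 0).sum()),
        "over": int((res < 0).sum()),
        "mean_residual": float(res.mean()),
    }
```

In this toy case the two test-retest correlations are underpredicted by a wide margin while all six non-paired cross-wave correlations are overpredicted by smaller amounts, which is the kind of pattern that suggests adding a reliability term to the model.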

ANOVA Results.  Conventional analysis of variance for the full model containing both direct effects and pair-wise interactions is shown in Table 4. Since the design is not orthogonal because of the configuration of the correlation matrices, only the partial sums of squares are attributed to the individual effects. Common variance components are omitted from the numerators of all F statistics. The results:

1. no significant difference was found over time in the correlations, even though the Vega was newly introduced. It appears that intervariable structure becomes stable quite quickly, probably in part because the product class is well-known and heavily advertised.

2. no significant difference was found between Vega and Volkswagen correlations despite significant shifts in means of three of these variables for Vega over the introductory period and absence of such shifts for Volkswagen (Farley, Katz, Lehmann and Winer, 1979). Stable inter-variable relationships can thus exist among key constructs describing consumer behavior even when the means of those constructs are shifting over time.

3. significant differences occur in correlations involving different individual variables. This indicates that the variables are not simply one "common factor" but measure different constructs.

4. a significant test-retest effect was found, indicating reliability and/or the existence of significant carry-over effect for individual variables.

5. an interaction effect also exists for individual variable pairs, indicating the presence of some type of model linking these variables. A test for symmetry of interactions (e.g., that the correlations of attitude and knowledge are different for different wave pairings) shows that the interactions are not asymmetric. This implies that the interactions are stable over time.
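The partial sums of squares used for tests like those above can be computed by comparing a full model with the same model minus the effect under test. The sketch below is a generic partial F computation under stated assumptions: the helper names rss and partial_f are hypothetical, the design matrix is assumed to have full column rank, and the data are invented rather than drawn from Table 4.

```python
import numpy as np


def rss(X, y):
    """Residual sum of squares from a least-squares fit of y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)


def partial_f(X_full, drop_cols, y):
    """F statistic for the effect carried by columns drop_cols, using only
    the variance that effect explains beyond all other columns."""
    keep = [j for j in range(X_full.shape[1]) if j not in drop_cols]
    rss_full = rss(X_full, y)
    rss_reduced = rss(X_full[:, keep], y)
    df_effect = len(drop_cols)
    df_error = len(y) - X_full.shape[1]
    f_stat = ((rss_reduced - rss_full) / df_effect) / (rss_full / df_error)
    return f_stat, df_effect, df_error


# Invented, deliberately non-orthogonal design (x1 and x2 are correlated).
x1 = np.array([1, 1, 1, 1, 1, -1, -1, -1], dtype=float)
x2 = np.array([1, -1, 1, -1, 1, -1, 1, -1], dtype=float)
noise = np.array([0.01, -0.01, -0.01, 0.01, 0.0, 0.01, -0.01, 0.0])
y = 0.4 + 0.1 * x1 + noise

X_full = np.column_stack([np.ones_like(y), x1, x2])
f_x1, df1_x1, df2_x1 = partial_f(X_full, [1], y)
f_x2, df1_x2, df2_x2 = partial_f(X_full, [2], y)
```

Variance shared between correlated effects is credited to neither, so in a non-orthogonal design the partial sums of squares generally do not add up to the total explained sum of squares.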





The Regression Coefficients.  Additional insights into the nature of the data structure are provided by the regression coefficients from an ANOVA, because the analysis of the sums of squares indicates the existence but not the direction of an effect. A reduced model consisting only of those variables identified by the ANOVA just discussed is used--the equivalent of pooling the sums of squares of the insignificant effects in Table 4 with those of the residual error term. The resulting "significant effects" model contains direct effects of variables, plus reliability and symmetric interaction terms. The coefficients (Table 5) are expressed in units of increments or decrements of correlation around the constant term, which represents the grand mean. Among the individual variables, knowledge and attitude have incrementally positive effects while intention is incrementally negative and significant. The reliabilities add significant increments of correlation, while the interactions have modest negative effects.





Reports of ANOVA results often omit measures of goodness of fit which can be useful both for comparing alternative models for a given body of data and for comparing results from different but related studies (Farley, Lehmann and Ryan 1979) in terms of the ability of a model to grasp essential characteristics of a data generating process. The explanatory power of the "significant effects" model compares favorably with those of many semi-aggregate cross sectional models (Farley and Howard 1977), and with those containing various combinations of direct effects and interactions (Table 6) in this specific case.
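The goodness-of-fit measure in question is the ordinary coefficient of determination computed on the correlations themselves. A minimal sketch, with invented observed and fitted values and a hypothetical function name r_squared:

```python
def r_squared(observed, fitted):
    """Share of variation in the observed correlations reproduced by a model."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Invented correlations and two competing sets of fitted values.
observed = [0.2, 0.4, 0.6, 0.8]
fitted_grand_mean = [0.5, 0.5, 0.5, 0.5]  # constant-only model
fitted_two_groups = [0.3, 0.3, 0.7, 0.7]  # model with one binary effect

r2_grand_mean = r_squared(observed, fitted_grand_mean)
r2_two_groups = r_squared(observed, fitted_two_groups)
```

Comparing such values across nested specifications shows how much each added block of effects contributes, although models with more terms will never fit worse by this measure.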




This paper describes an ANOVA approach to direct decomposition of a correlation matrix on the basis of various direct effects and specified interactions. The analysis can be developed sequentially from a parsimonious "direct effects" specification and can lead to a "significant effects" model involving direct effects and interactions. The approach can be used to:

1. identify factors causing systematic variation in the correlations;

2. identify patterns in appropriately classified residuals from an ANOVA for extension of the decomposition process;

3. describe the process generating the correlations using the coefficients of individual variables and the coefficient of determination for an ANOVA.

The approach can be especially useful for constructing a design for analysis of panel measurements when a theory that is essentially contemporaneous is being used. The ANOVA models are likely to be parsimonious and can provide integrated information on reliability of individual measurements over time, on within-construct validity and on general patterns in among-construct relationships. While such a correlation decomposition is not an end in its own right, it may be useful for integrating intertemporal features of panel data into a general contemporaneous specification of a structural model (Farley, Katz, Lehmann and Winer 1979).


Bagozzi, R. P. (1980), Causal Models in Marketing, New York: John Wiley and Sons.

Farley, J. U. and Howard, J. A. (1975), Control of "Error" in Market Research Data. Lexington, Mass.: D.C. Heath.

Farley, J. U., Katz, J. R. and Lehmann, D. R. (1977), "Patterns in Repeated Attitude Measurements on New and Established Brands of Subcompact Car," working paper, Columbia University.

Farley, J. U., Katz, J. R. and Lehmann, D. R. (1978), "Impact of Different Sets of Comparison Brands on Evaluation of a New Subcompact Car Brand," Journal of Consumer Research, September, 5, 2, 138-143.

Farley, J. U., Katz, J. R., Lehmann, D. R. and Winer, R. S. (March 1979 forthcoming), "Two Approaches to Enriching Specifications of Consumer Decision Process Models," Proceedings, First Annual TIMS/ORSA Conference on Market Measurement, Stanford University.

Farley, J. U., Lehmann, D. R. and Ryan, N. J. (1979), "Generalizing From Imperfect Replication," working paper, Columbia University.

Gruber, Robert E. (1978), "The Impact of Response Style and Response Set on Large Scale Survey Research Involving Activity, Interest, and Opinion (AIO) Variables," unpublished Ph.D. dissertation, Columbia University.

Joreskog, K. G. and Sorbom, D. (1977), "Statistical Models for Analysis of Longitudinal Data," in D. J. Aigner and A. S. Goldberger (eds.), Latent Variables in Socioeconomic Models, Amsterdam: North Holland Publishing.