Causal Modeling: a General Method For Developing and Testing Theories in Consumer Research

Richard P. Bagozzi, Massachusetts Institute of Technology
[ to cite ]:
Richard P. Bagozzi (1981), "Causal Modeling: a General Method For Developing and Testing Theories in Consumer Research", in NA - Advances in Consumer Research Volume 08, eds. Kent B. Monroe, Ann Arbor, MI: Association for Consumer Research, Pages: 195-202.

Advances in Consumer Research Volume 8, 1981      Pages 195-202




The behavioral sciences are now experiencing a revolution of sorts that is dramatically influencing the way researchers are approaching problems. The revolution entails a paradigm shift away from the many fragmented methods for inquiry currently employed toward a more holistic framework termed herein causal modeling. The purpose of this article is to outline and illustrate the use of causal modeling in consumer research. To this end, the expectancy-value model of attitude will be examined.

Before presentation of the approach, however, it will prove useful to provide some background. Causal modeling is often regarded by the unfamiliar as merely another data analytic technique to be used primarily in survey research. In reality, it is a method of enormous scope and power, where 'method' is taken here in its broadest philosophy of science sense.

In particular, we may look at causal modeling from at least four perspectives. First, causal modeling is a general philosophical orientation. It provides a way to operationalize the commonly accepted model of scientific inquiry posed by philosophers of science which stipulates that any true theory must contain, at a minimum, theoretical concepts, empirical observations, correspondence rules that link theoretical concepts to observations, and a rationale or set of laws connecting theoretical concepts (Bagozzi 1979). Further, when used as a tool for causal inference, causal modeling proceeds from a set of philosophical principles which guide the specification, test, and interpretation of any theory. Typically, the researcher will use either a neo-positivist or realist conceptualization of causation when employing causal modeling (Bagozzi 1980a).

Despite its common sense meaning, causal modeling is not limited to studies of cause-and-effect. Indeed, from a second perspective, causal modeling can be employed strictly in a measurement sense to compute internal consistency and test-retest reliabilities. Significantly, it allows the researcher the opportunity to assess a whole family of traditional measures of reliability such as Cronbach Alpha, as well as more general alternatives. It does this with less information than is required by traditional procedures in some instances, and it provides a means to take into account errors in variables and systematic error such as methods variance or other external confounds.

Third, causal modeling can be a valuable methodology for the examination of construct validity. Not only can it be used to perform a traditional multitrait-multimethod matrix analysis, but it does this in a more rigorous and less ambiguous way. Moreover, causal modeling furnishes one with a versatile means to investigate other forms of validity such as criterion related, predictive, and nomological validities. An important point to note is that causal modeling offers the advantage over traditional methods that measurement error is taken into account explicitly.

Finally, causal modeling is a general method for testing hypotheses. It can be used expediently in true experiments, quasi-experiments, cross-sectional surveys, panel studies, time series analyses, and cohort investigations, to name a few contexts. As in measurement and construct validation usages, it offers the opportunity to take into account measurement error as well as systematic biases. For example, the causal modeling analogue to the analysis of covariance in experimental settings gives a straightforward way to correct accurately for random error, adjust for differences between experimental and control groups, and generally model the influence of non-random factors. In this sense, it avoids the major limitation of the traditional analysis of covariance which, by failing to take into account errors in variables, can sometimes fail to detect a true causal effect or else erroneously identify an effect which does not in fact exist.

As important as causal modeling portends to be, it is not without faults and shortcomings. A statement of these and some of its benefits is presented at the end of this article. For further explication of causal modeling, the reader is referred to Bagozzi (1980a), Bentler (1980), and Joreskog and Sorbom (1979). We turn now to a prototypic development of causal modeling as a method of research.


To illustrate causal modeling, consider the expectancy-value model of attitude and imagine that one desires to examine its nature, measurement, antecedents, and implications. One way to do this is to scrutinize the following issues: (1) convergent validity, (2) reliability, (3) concurrent validity, (4) discriminant validity, (5) predictive validity, and (6) nomological validity. [Two additional standards that should be examined are the theoretical meaningfulness and the observational meaningfulness of concepts in the theory. The former refers to certain logical criteria as to well-formedness, internal consistency, and the like; while the latter concerns the adequacy of correspondence rules relating theoretical concepts to observed measures. In the interest of brevity, these criteria will not be analyzed here, and the reader is referred to other treatments of these issues by the author (Bagozzi 1979, 1980a, 1980b).] A verbal definition and mathematical expression for each of these follows.

Convergent Validity

At an intuitive level, convergent validity can be conceived as the extent to which two or more attempts to measure the same concept through maximally different methods are in agreement. The traditional standard for convergence requires that the correlations between measures of the same concept be greater than zero, statistically significant, and relatively large. However, because these criteria do not specifically take into account measurement error and can be misleading, the more rigorous causal modeling method will be used herein.

The null hypothesis for the causal model of convergence for a unidimensional attitudinal construct can be expressed as

$$y = \Lambda \xi + \zeta \tag{1}$$

where $y$ is a vector of p observations of attitude $(y_1, y_2, \ldots, y_p)$; $\xi$ is a hypothesized attitudinal construct; $\Lambda$ is a vector of factor loadings relating $y$ to $\xi$; and $\zeta$ is a vector of unique scores (i.e., errors in variables). The attitudinal construct, $\xi$, is taken as an expectancy-value model where each $y_i$ consists of the product of an expectancy (belief)-times-value observation. To fully specify the null hypothesis for the unidimensional attitudinal construct, the variance-covariance matrix of observations, $\Sigma$, must be written as

$$\Sigma = \Lambda \Lambda' + \Psi \tag{2}$$

where $\Psi$ is a diagonal matrix of error variances for attitudinal measures. In words, equations (1) and (2) hypothesize that all of the variation in responses to the attitudinal measures can be accounted for by one underlying expectancy-value construct, except for random error.
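To make equation (2) concrete, it can be written out element by element. The sketch below assumes, as the form of the equation implies, that the construct is scaled to unit variance. The symbols $\lambda_i$ (the ith factor loading), $\psi_{ii}$ (the ith error variance), and $\sigma_{ij}$ (an entry of the variance-covariance matrix of observations) are notation introduced here for illustration.

```latex
\sigma_{ii} = \lambda_i^2 + \psi_{ii}, \qquad
\sigma_{ij} = \lambda_i \lambda_j \quad (i \neq j)
```

Under this reading, every covariance between two measures is the product of their loadings, and it is this restriction that the goodness-of-fit test evaluates.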

The null hypothesis for the causal model of convergence for a multidimensional attitudinal construct can be expressed as

$$y = \Lambda \xi + \zeta \tag{3}$$

$$\Sigma = \Lambda \Phi \Lambda' + \Psi \tag{4}$$

where $\xi$ is now a vector of k < p hypothesized attitudinal dimensions, $\Phi$ is the intercorrelation matrix of attitudinal dimensions, and the remaining symbols are as defined earlier. As a point of interpretation, equations (3) and (4) hypothesize that all of the variation in attitudinal responses can be accounted for by k oblique dimensions, except for random error. This will occur when the responses of subjects (a) achieve a high degree of convergence among measures within attitudinal dimensions (i.e., when the within dimension measures are both highly intercorrelated and uniform in their pattern of values) and (b) exhibit uniform and significantly lower correlations among measures across dimensions. Figure 1 shows a causal diagram which is, in fact, the model achieving construct validity in the present study.
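As a rough illustration of equation (4), the following sketch computes the variance-covariance matrix implied by a set of loadings, dimension intercorrelations, and error variances. The function name and all numerical values are hypothetical and chosen only for illustration; they are not estimates from the study.

```python
def implied_covariance(loadings, phi, psi_diag):
    """Return the matrix (loadings)(phi)(loadings)' + diag(psi_diag) as a list of rows.

    loadings : p x k matrix (list of rows) of factor loadings
    phi      : k x k intercorrelation matrix of the dimensions
    psi_diag : length-p list of error variances
    """
    p, k = len(loadings), len(phi)
    sigma = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            sigma[i][j] = sum(
                loadings[i][a] * phi[a][b] * loadings[j][b]
                for a in range(k)
                for b in range(k)
            )
        # Error variance enters only on the diagonal.
        sigma[i][i] += psi_diag[i]
    return sigma


# Hypothetical example: four measures, two oblique dimensions.
LOADINGS = [[0.8, 0.0], [0.7, 0.0], [0.0, 0.9], [0.0, 0.6]]
PHI = [[1.0, 0.5], [0.5, 1.0]]
PSI = [1 - 0.8**2, 1 - 0.7**2, 1 - 0.9**2, 1 - 0.6**2]

SIGMA = implied_covariance(LOADINGS, PHI, PSI)
for row in SIGMA:
    print([round(value, 3) for value in row])
```

In this hypothetical case the implied covariance between two measures of the same dimension is larger than the covariance between measures of different dimensions, which is the pattern described in conditions (a) and (b).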




Reliability

Two internal consistency measures of reliability can be effectively applied to observations. The reliability of individual items, $\rho_i$, can be computed as

EQUATION    (5)

where $\lambda_i$ is the factor loading relating the ith measure of attitude (i.e., the ith belief times evaluation product) to its respective attitudinal dimension, $\xi_j$; and the remaining symbols are as defined earlier. Similarly, the reliability of a composite, $\rho_c$, of r observations of $\xi_j$ can be calculated as

EQUATION    (6)

As with Cronbach Alpha, $\rho_i$ and $\rho_c$ should only be applied to unidimensional attitude scales (or to homogeneous subdimensions) because to do otherwise would capitalize on the shared variance across dimensions and yield generally inflated values. Hence, it is recommended that one examine convergent validity first before computing reliabilities. Finally, it should be noted that equation (6) is similar to the standard Cronbach Alpha formula except that the latter assumes, a priori, that each observation of $\xi_j$ contributes equally (i.e., the $\lambda_i$'s are set equal to unity). Thus, equation (6) is more general and less restrictive than Cronbach Alpha.
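Because the displayed forms of equations (5) and (6) are not reproduced above, the sketch below assumes the expressions commonly used in the latent-variable literature: $\rho_i = \lambda_i^2 \mathrm{var}(\xi_j) / [\lambda_i^2 \mathrm{var}(\xi_j) + \psi_{ii}]$ for an individual item and $\rho_c = (\sum \lambda_i)^2 \mathrm{var}(\xi_j) / [(\sum \lambda_i)^2 \mathrm{var}(\xi_j) + \sum \psi_{ii}]$ for a composite. These forms, the function names, and the loadings are assumptions made for illustration and may differ in detail from the author's equations.

```python
def item_reliability(loading, error_var, factor_var=1.0):
    """Share of an item's variance attributable to its dimension (assumed form)."""
    true_var = loading**2 * factor_var
    return true_var / (true_var + error_var)


def composite_reliability(loadings, error_vars, factor_var=1.0):
    """Reliability of the unweighted sum of several items (assumed form)."""
    true_var = sum(loadings) ** 2 * factor_var
    return true_var / (true_var + sum(error_vars))


# Hypothetical standardized loadings for three measures of one dimension.
LOADINGS = [0.9, 0.8, 0.7]
ERROR_VARS = [1 - loading**2 for loading in LOADINGS]

ITEM_RHO = [item_reliability(l, e) for l, e in zip(LOADINGS, ERROR_VARS)]
COMPOSITE_RHO = composite_reliability(LOADINGS, ERROR_VARS)

print([round(r, 3) for r in ITEM_RHO])
print(round(COMPOSITE_RHO, 3))
```

Passing unequal loadings lets each measure contribute in proportion to its loading, whereas setting every loading to the same value corresponds to the equal-contribution assumption discussed above.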

Concurrent Validity

The degree to which a measure of a concept correlates with a measure of a similar concept when both should naturally covary is known as concurrent validity (which itself is a special case of criterion-related validity). In the present research, concurrent validity will be examined as the degree of association between the attitude toward the act (measured with five semantic differential items) and the expectancy-value model of attitude. Figure 2 illustrates a causal diagram for concurrent validity when attitude toward the act (Aact) exists as a unidimensional construct and the expectancy-value model is multidimensional.

The null hypothesis for the concurrent validity model of Figure 2 is

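$$y = \Lambda \xi + \zeta \tag{7}$$

$$\Sigma = \Lambda \Phi \Lambda' + \Psi \tag{8}$$

where $y = (y_1, y_2, \ldots, y_{12})'$, $\xi = (\mathrm{Aact}, \mathrm{EV1}, \mathrm{EV2}, \mathrm{EV3})'$, $\zeta = (\zeta_1, \zeta_2, \ldots, \zeta_{12})'$, EQUATION.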



This is a stringent test of concurrent validity because the hypothesis will be sustained only when (a) the within dimension correlations of measures are high, statistically significant, and uniform and (b) the across dimension correlations of measures are uniform and significantly lower than the within dimension values. Given this state of affairs, concurrent validity can be assessed by examination of $\phi_{21}$, $\phi_{31}$, and $\phi_{41}$ (i.e., the correlations between the attitude toward the act and the expectancy-value dimensions). These cross dimension correlations should be relatively high and statistically significant, yet should be lower in magnitude than $\phi_{32}$, $\phi_{42}$, and $\phi_{43}$ which, in turn, represent the intercorrelations among the three expectancy-value dimensions.

Discriminant Validity

Discriminant validity refers to the degree to which a concept differs from other concepts. Given the establishment of convergent validity, discriminant validity can be examined through an inspection of $\phi_{21}$, $\phi_{31}$, and $\phi_{41}$ (see Figure 2). Comparing the goodness-of-fit test for the model of Figure 2 in which $\phi_{21}$, $\phi_{31}$, and $\phi_{41}$ are left unconstrained with that for the same model in which they are constrained to equal unity will provide an explicit test of discriminant validity. The successful achievement of convergence and discrimination as outlined heretofore would indicate that one's construct (attitude) is valid in the sense of achieving homogeneity and uniqueness.
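The comparison described above amounts to a chi-square difference test between nested models. As a sketch, with the subscripts c and u (labels introduced here for illustration) denoting the constrained and unconstrained versions of the Figure 2 model:

```latex
\Delta\chi^2 = \chi^2_c - \chi^2_u, \qquad
\Delta df = df_c - df_u = 3
```

Under the usual counting, three degrees of freedom are gained because three correlations are fixed rather than estimated. A significant value of $\Delta\chi^2$ would indicate that fixing those correlations at unity worsens the fit, which is evidence in favor of discrimination.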

Predictive Validity

If a measure of a concept is related empirically as an antecedent to, or consequent of, a measure of another concept, then it is said to achieve predictive validity. The relation should not be fortuitous but rather should have its basis in the mechanism or theory connecting the two concepts.

The predictive validity of the expectancy-value model will be examined in two senses. As a predicted variable, the expectancy-value model will be observed as a function of the extent of the performance of past behavior, where the behavior relates directly to the act encompassed by the content of attitude. As a predictor, the expectancy-value model will be investigated as it forecasts three behavioral intentions that are logically entailed by a favorable attitude.

Specifically, Figures 3 and 4 show the expectancy-value model as a predicted and predictor variable, respectively. In Figure 3, past behavior (PB) is proposed to be antecedent to three expectancy-value dimensions. The structural equations for the null hypothesis of this model are


Figure 4 posits that the expectancy-value model (EV) predicts three behavioral intentions (BI1, BI2, BI3). The structural equations for the null hypothesis of this model are

EQUATION    (10)





A single construct is used to represent the expectancy-value model in order to avoid problems of multicollinearity which would exist had one employed the three dimensions -- EV1, EV2, EV3 -- as simultaneous predictors. The three measures of EV are, respectively:


where $e_i$ and $v_i$ are, respectively, particular belief and evaluation items from a questionnaire.

Nomological Validity

Nomological validity indicates the degree to which predictions from a formal theoretical network containing a concept of interest are confirmed. From one viewpoint, the difference between predictive and nomological validities might be regarded as one of degree and not kind. Predictive validity entails the relationship of a concept to a single antecedent or consequent. Nomological validity, in contrast, involves many antecedents and/or consequents in a complex theoretical system. The particular test of the nomological validity of the expectancy-value model conducted herein is based on attitude and learning theory (see Bagozzi 1980b). That is, it is hypothesized that a person's intentions to act in a particular way will be a function of (a) one's beliefs about the consequences of performing the behavior and the evaluation of those consequences (i.e., the expectancy-value attitude), (b) the extent of having performed the behavior in the past, (c) one's personal normative belief (PNB) that he or she should perform the behavior, and (d) one's social normative belief (SNB) that others whose opinions are valued feel that he or she should perform the behavior.

Figure 5 illustrates the null hypothesis for the nomological validity model. The structural equations for the key relations are

EQUATION    (11)


The foregoing hypotheses were examined in the context of attitudes toward the act of donating blood. The results presented here constitute analyses performed on a sub-sample of 117 faculty, students, and staff which was drawn from part of a larger study. The full study was a quasi-experiment performed by the author (Bagozzi 1980b) and, for purposes of description, can be termed a post-test only design with nonequivalent multiple groups, multiple covariates, and measurement error modeled explicitly. Although a total of eight complex hypotheses were tested across and within groups on main effects and slope effects, only part of the within group analyses for one of three groups will be investigated herein. This is necessary for purposes of brevity, given space constraints. Hence, the present study is a survey analysis. The reader is urged to examine the quasi-experiment presented in Bagozzi (1980b), in order to gain a more complete picture of the scope and power of causal modeling. Also, the author presents a detailed description of the pretests, questionnaire, sample, methodology, and other related issues. Joreskog and Sorbom's (1978) program, LISREL, was used for all analyses.




On the hypothesis that people would form complex multidimensional expectancy-value attitudes rather than unidimensional ones (see Bagozzi 1980b for the rationale), convergent validity was examined. As predicted, the responses to the seven expectancy times value products did not converge to yield a single underlying attitudinal construct ($\chi^2(14) = 72.95$, p = .00). However, convergence was achieved for the multidimensional expectancy-value model, as hypothesized ($\chi^2(11) = 5.86$, p = .88). The second and third columns of Table 1 list the factor loadings and error variances, respectively, for the seven expectancy times value products described in column one. Notice that each loading is relatively large in value and at least twice its respective standard error, and the error variances are low to moderate in magnitude.
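The probability levels reported above can be checked directly from the chi-square values and their degrees of freedom. The sketch below uses only the Python standard library and the finite-series expressions for the upper tail of a chi-square distribution with integer degrees of freedom. The function name is hypothetical, and the final difference test, which treats the one-dimensional model as a restricted version of the three-dimensional one, is an illustration rather than an analysis reported by the author.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over r = 0 .. df/2 - 1 of (x/2)^r / r!
        term = total = 1.0
        for r in range(1, df // 2):
            term *= (x / 2) / r
            total += term
        return math.exp(-x / 2) * total
    # Odd df: normal-tail term plus a finite correction series.
    chi = math.sqrt(x)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    density = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    return math.erfc(chi / math.sqrt(2)) + 2 * density * total


# Goodness-of-fit values reported in the text.
CHI2_UNI, DF_UNI = 72.95, 14      # single underlying construct
CHI2_MULTI, DF_MULTI = 5.86, 11   # three expectancy-value dimensions

P_UNI = chi2_sf(CHI2_UNI, DF_UNI)
P_MULTI = chi2_sf(CHI2_MULTI, DF_MULTI)
# Illustrative nested comparison of the two models.
P_DIFF = chi2_sf(CHI2_UNI - CHI2_MULTI, DF_UNI - DF_MULTI)

print(f"unidimensional model:   p = {P_UNI:.4f}")
print(f"multidimensional model: p = {P_MULTI:.4f}")
print(f"difference test:        p = {P_DIFF:.4f}")
```

A very small probability for the difference test would point in the same direction as the separate fit tests, namely that a single dimension does not reproduce the observed covariances as well as three.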

Column four in Table 1 shows the individual item reliabilities where it can be seen that all values reach acceptable levels except the seventh which should be regarded as borderline. As presented in the final column of Table 1, the composite reliabilities are quite large in magnitude and thus indicate that the measures of the three dimensions of the expectancy-value model exhibit a high degree of internal consistency.

The goodness-of-fit test for the model testing concurrent validity shows that the hypothesis cannot be rejected (i.e., $\chi^2(48) = 56.96$, p = .18). The intercorrelation matrix of Aact, EV1, EV2, and EV3 is


Each of the correlations between Aact and the EVi is statistically significant at the .001 level or better and is lower than the correlations among the EVi.

Because the likelihood functions for the model represented by equations (7) and (8) could not be evaluated when $\phi_{21}$, $\phi_{31}$, and $\phi_{41}$ were constrained to unity, it was not possible to examine the difference in $\chi^2$-tests necessary for an explicit test of discriminant validity. However, inspection of $\Phi$ indicates that the relevant entries are each far below 1.000, with the differences being greater than the values necessary to achieve significance at the .001 level or better. Thus, the evidence suggests that the expectancy-value model achieves uniqueness when compared to the semantic differential operationalization of attitude toward the act. [It should be noted that Aact, itself, attained convergent validity ($\chi^2(5) = 4.16$, p = .53) and demonstrated high individual ($\rho_i$ = .500 to .788) and composite ($\rho_c$ = .916) reliabilities. The expectancy-value model is largely a cognitive measure of attitude, while Aact is predominantly affective in content.]

Table 2 summarizes the results for predictive validity. Looking first at column one, we can see that past behavior predicts all three behavioral intentions, as hypothesized. [Two of the behavioral intention items asked the probability one would (a) give blood "sometimes in the future" and (b) become a regular donor, respectively. These were measured on 11-point scales. The third behavioral intention item asked how frequently one might give blood if they had become a regular donor and was measured on a 7-point scale ranging from "5 times per year" to "would not become a regular donor."] All parameter values are in the proper direction and are at least twice their standard errors; and the overall goodness-of-fit test indicates a very good correspondence indeed ($\chi^2(16) = 7.67$, p = .96). The findings for the expectancy-value model as a predictor are shown in column two of Table 2. In general, the results are mixed. Although the goodness-of-fit test indicates a poor fit overall ($\chi^2(6) = 22.93$, p = .00), the parameter values are in the proper direction and are greater than twice their respective standard errors. Moreover, inspection of the residual matrix shows that the model captures most of the variation in responses.

Table 3 presents the results for the test of nomological validity. The overall goodness-of-fit test indicates a borderline fit ($\chi^2(18) = 28.22$, p = .06). Notice that the expectancy-value model predicts all three behavioral intentions significantly and in the predicted direction and that two of the three intentions are predicted by past behavior. Neither personal normative beliefs nor social normative beliefs function as significant predictors of behavioral intentions, however. Apparently, future volitions are only under the control of attitudes and past behavior.

But are they in reality? The results of the quasi-experiment suggest that attitudes are not validly related to intentions at all (Bagozzi 1980b). The findings in Table 3 are based on a sample of individuals who had given blood 20 minutes or less before filling out the questionnaire. These individuals thus could have inferred their attitudes from their own prior behavior; and attribution and self-perception arguments could serve as rival explanations for the observed relations. However, when this sample was compared to a second sample of previous donors who had last given blood two months or longer before filling out the questionnaire, only past behavior functioned as a valid determinant of intentions. It appears that attitudes do not supply any predictive power over and above learning theory arguments. Similarly, attitudes failed to predict intentions for those who had never given blood in the past. In sum, while considerable evidence exists for (a) the convergent, concurrent, discriminant, and predictive validities of the expectancy-value model of attitude, (b) reliability in measurement of attitudes, and (c) a valid relation from behavior to attitude, one must question the hypothesis that attitudes influence intentions.

As a point of comparison, it should be noted that it would not have been legitimate to use the commonly computed Fishbein model which is formed by summing the expectancy times value products to arrive at a single number for the expectancy-value attitude. The Fishbein model is justifiable only when the items form a unidimensional scale. Although few authors have ever demonstrated the reliability and validity of their measures (see Bagozzi 1980b for a review), the practice in consumer research has been merely to rely on the face validity of the expectancy-value model and to assume unidimensionality. For discussions and illustrations of the problems entailed by this practice, see Bagozzi (1980b, c).


A number of shortcomings of the causal modeling method deserve mention. First, because the goodness-of-fit chi-square statistic is directly proportional to sample size, it is likely that virtually all models will be rejected in very large samples. There are at least three ways to mitigate this limitation. One alternative is to confine one's analysis to comparisons among nested models and to examine the differences in chi-square tests. This provides a means to assess the tenability of valid causal paths. A second procedure that might be meaningful in certain contexts is to randomly select subsamples from a larger population and then perform the causal analysis on these subsamples. One could then test the invariance of key parameters across subsamples, as well as examine the goodness-of-fit tests for each subsample. This approach might be useful when one desires to use separate fitting and validation samples. A third alternative is to scrutinize the residual matrix and ascertain whether the amount of information remaining is trivial for all practical purposes. Combinations of all three procedures might prove effective in some cases.
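The dependence of the chi-square value on sample size can be illustrated with the multidimensional convergence model reported earlier (chi-square of 5.86 on 11 degrees of freedom, 117 respondents). The sketch assumes the usual maximum likelihood relation in which the statistic equals the sample size minus one times the minimized fit function; that relation, the function name, and the thought experiment of holding the discrepancy fixed while the sample grows are assumptions introduced here.

```python
import math


def chi_square_from_fit(f_min, n):
    """Chi-square implied by a minimized fit-function value and a sample size."""
    return (n - 1) * f_min


REPORTED_CHI2, REPORTED_N = 5.86, 117
CRITICAL_05_DF11 = 19.675  # tabled .05 critical value for 11 degrees of freedom

# Discrepancy per observation implied by the reported result.
F_MIN = REPORTED_CHI2 / (REPORTED_N - 1)

# Smallest sample at which the same discrepancy would exceed the critical value.
SMALLEST_REJECTING_N = math.floor(CRITICAL_05_DF11 / F_MIN) + 2

print(round(F_MIN, 4))
print(SMALLEST_REJECTING_N)
print(round(chi_square_from_fit(F_MIN, SMALLEST_REJECTING_N), 3))
```

Under these assumptions, a model that fits very well with 117 respondents would be rejected at the .05 level once the sample reached a few hundred cases, even though the discrepancy between model and data had not changed.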

A second limitation of causal modeling is that it is strictly appropriate to use only large samples. Although what constitutes a large sample is open to question, this author has found that in many instances it is justifiable to employ causal modeling when the sample size minus the number of parameters to be estimated is greater than about 50. Hence, for most applications, the sample should be at least 60 to 70, or so. Unfortunately, very little is known about the small sample properties of the parameter estimates generated by LISREL, although some work is now being performed in this regard by statisticians. Hence, use of the procedures with small samples must be regarded with caution.
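The rule of thumb above can be applied once the free parameters of a model are counted. The sketch below assumes a simple counting rule for the measurement models used in this article: each measure loads on one dimension, the variances of the dimensions are fixed at one, and errors are uncorrelated, so the free parameters are the loadings, the error variances, and the correlations among dimensions. The rule and the function names are assumptions introduced here; they reproduce the degrees of freedom reported in the results (14, 11, and 48).

```python
def free_parameters(n_measures, n_dimensions):
    """Loadings + error variances + correlations among dimensions."""
    return 2 * n_measures + n_dimensions * (n_dimensions - 1) // 2


def degrees_of_freedom(n_measures, n_dimensions):
    """Distinct variances and covariances minus free parameters."""
    distinct_moments = n_measures * (n_measures + 1) // 2
    return distinct_moments - free_parameters(n_measures, n_dimensions)


def meets_rule_of_thumb(sample_size, n_parameters, margin=50):
    """True when sample size minus parameters exceeds the margin."""
    return sample_size - n_parameters > margin


SAMPLE_SIZE = 117
MODELS = {
    "unidimensional (7 measures)": (7, 1),
    "three dimensions (7 measures)": (7, 3),
    "concurrent validity (12 measures)": (12, 4),
}

for name, (p, k) in MODELS.items():
    q = free_parameters(p, k)
    print(name, q, degrees_of_freedom(p, k), meets_rule_of_thumb(SAMPLE_SIZE, q))
```

By this count, even the largest of the three models leaves the sample of 117 well above the suggested margin.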

A third shortcoming of causal modeling is that it is designed to be used with measures that are at least interval scaled. Very little is known about the consequences of using measures that are ordinal. However, if one forms measures as the sum of independently distributed ordinal measures, say, then it is probably safe to assume that the summed variates satisfy the normality assumptions. Further, if the distribution of responses to variables is not too excessively skewed and if many scale steps are used (say seven or more), then the distributional assumptions of causal modeling may not be violated detrimentally for many applications. When in doubt, it is best to check the distributional properties of one's observations. Joreskog and Sorbom (1978, p. 13) provide references for procedures to "robustify" one's data.

Still another limitation is the requirement that hypothesized relations be linear. This will not pose problems generally when transformations can be made to the data to reflect nonlinearities or when the underlying processes being modeled are linear or approach being linear. However, when such is not the case, causal modeling becomes less useful.

A final problem to note concerns the framework for hypotheses. Causal modeling sets up the null hypothesis such that a nonsignificant chi-square value indicates a satisfactory fit. Ordinarily, one prefers that the hypothesis be constructed in the opposite way, but this is not possible. In practice, the causal modeling approach usually gives valid results. Moreover, Bentler (1980) suggests the use of an index of the amount of information gained as a partial solution to the null hypothesis dilemma.








It is important to stress that causal modeling represents only one link in the research process. Issues such as concept formation, questionnaire development, sampling, overall research design (e.g., random assignment and construction of experimental procedures), and the interpretation of findings are equally important areas for concern. Causal modeling should not be regarded as a replacement for any of these; indeed, each constitutes a necessary step in the process. Rather, causal modeling supplies the researcher with a powerful method for improving the conduct of research, given that care has been taken at all points in the process.


Bagozzi, Richard P. (1979), "The Role of Measurement in Theory Construction and Hypothesis Testing: Toward a Holistic Model," in O. C. Ferrell, S. W. Brown, and C. W. Lamb, Jr., Eds., Conceptual and Theoretical Developments in Marketing, Chicago: American Marketing Association, 15-33.

Bagozzi, Richard P. (1980a), Causal Models in Marketing, New York: John Wiley & Sons.

Bagozzi, Richard P. (1980b), "On the Construct Validity of the Expectancy-Value Model of Attitude," unpublished working paper, Massachusetts Institute of Technology.

Bagozzi, Richard P. (1980c), "A Holistic Methodology for Modeling Consumer Response to Innovation," unpublished working paper, Massachusetts Institute of Technology.

Bentler, Peter M. (1980), "Multivariate Analysis with Latent Variables: Causal Modeling," in Annual Review of Psychology, 31, 419-456.

Joreskog, Karl G., and Sorbom, Dag (1978), LISREL: Analysis of Linear Structural Relationships by the Method of Maximum Likelihood, Chicago: National Education Resources.

Joreskog, Karl G., and Sorbom, Dag (1979), Advances in Factor Analysis and Structural Equation Models, Cambridge, MA: Abt Books.