Exploring Attitude Construct Validity: Or Are We?

Joel B. Cohen, University of Florida
[To cite]: Joel B. Cohen (1979), "Exploring Attitude Construct Validity: Or Are We?", in NA - Advances in Consumer Research Volume 06, eds. William L. Wilkie, Ann Arbor, MI: Association for Consumer Research, Pages: 303-306.


[Construct validity and related philosophy of science issues defy easy resolution. When positions are taken they tend to reflect one's orientation to and faith in science. I wish to acknowledge stimulating discussions on these topics with a number of my colleagues at Florida, particularly Gordon Bechtel and Barry Schlenker.]

INTRODUCTION

Two of the three papers presented in this session deal specifically with construct validity, and both of these choose the construct "attitude" as the basis for their discussion and empirical examination. The Cattin paper also addresses an important topic: cross validation of regression models to provide a more reliable estimate of fit to appropriate criterion variables. Because construct validity is both important and underemphasized, and because attitudes play a central role in the study of consumer behavior, I have chosen to confine my comments to the former two papers and the rather similar positions they take.

WHAT IS CONSTRUCT VALIDITY?

The 1974 revision of Standards for Educational and Psychological Tests defines a construct as, "an idea developed or 'constructed' as a work of informed, scientific imagination; that is...a theoretical idea developed to explain and to organize some aspects of existing knowledge." Central to the concept of construct validity is the dictum that a construct, "is a dimension understood or inferred from its network of interrelationships" (1974, p. 29).

In order to better understand the significance and implications of the above for empirical efforts at construct validation it is necessary to go back a few years. Between 1950 and 1954 a blue ribbon committee was convened by the American Psychological Association to establish standards for test validity. The preceding decade had seen much discussion and dissatisfaction with conventional notions of validity, and out of this came the committee's recommendation to distinguish four types of validity: predictive validity, concurrent validity, content validity, and, for the first time, construct validity. The first two of these are today often grouped together as criterion-related validity and apply when one wishes to infer an individual's standing or performance on some criterion variable from a test or trait score. What is of utmost concern here is that such a test "work" for the particular criterion variable of interest. Content validity is established by showing that the behaviors demonstrated in completing the test constitute a representative sample of behaviors to be found in the criterion domain of interest. Test items, for instance, must encompass all relevant facets (e.g., skills, knowledge) of a performance domain in order for the test to be a valid representation of that domain.

Construct validity rests on a philosophy of science position that, "to 'make clear what something is' means to set forth the laws in which it occurs...a nomological network" (Cronbach and Meehl, 1955, p. 290). The network generates testable propositions which relate test or trait scores (as representations of a construct) to other constructs, some of which are observable through measurement or experimental observation. In a sense, the meaning of test or measure "validity" comes down to an issue of the specific types of inferences one wishes to make. A measure or test is merely a principle for making inferences (see Cronbach and Meehl, 1955, p. 297). Construct validity involves inferences to a network of related constructs and observables, the purpose being to better understand the nature of the construct being measured and to further develop the theoretical structure in which it is embedded.

CAMPBELL AND FISKE'S APPROACH

The construct validity position taken by Campbell and Fiske (1959), in setting forth the logic of the multitrait-multimethod matrix, represents a concern with "the adequacy of tests as measures of a construct rather than with the adequacy of a construct as determined by the confirmation of theoretically predicted associations with measures of other constructs" (1959, p. 100). It is Campbell and Fiske's premise that, "before one can test the relationships between a specific trait and other traits, one must have some confidence in one's measures of that trait" (1959, p. 100).

It is this author's contention that the multitrait-multimethod approach is designed specifically to supply such confidence rather than as a paradigm for examining construct validity. In a subsequent, clarifying paper (Campbell, 1960) a distinction is drawn between "two types of construct validity": "trait validity" and "nomological validity." "Trait validity", to use Campbell's example, is illustrated by validating the Taylor Manifest Anxiety Scale against psychiatrists' ratings. "Nomological validity", on the other hand, is illustrated by interpreting such test scores within the theoretical structure of Hull-Spence learning theory (i.e., as a measure of D) and generating predictions of performance in learning situations.

Whenever possible, it makes a great deal of sense to compare a test with other independent means of measuring the same trait. It is also clear that such "trait validity" goes beyond criterion-based validity (i.e., no one test is the criterion); still, the existence of higher correlations among alternative measures of the same trait relative to other traits may not go far in defining the nature of the construct or examining its role in a theoretical structure. It appears to be stretching things a bit, therefore, to think of "trait validity" as construct validity.

Much depends on the theoretical rationale for and nature of the traits and methods selected: subsequent inferences regarding validity must rest upon exactly what is being examined in the matrix. An example may clarify this. Campbell and Fiske term the monotrait-heteromethod values in their multitrait-multimethod matrix the "validity diagonal" and argue that entries in the validity diagonal provide evidence regarding convergent validity. Take the simplest case, in which each method of measurement for a particular trait is represented by a single item. According to Campbell and Fiske, convergent validity is supported by high correlations among these methods/items. The reader will recognize, however, that this is little more than high internal consistency reliability, which is itself, of course, an important aspect of test construction. In traditional multitrait-multimethod analyses, single items are replaced by entire scales. Still, the concept is the same--an assessment of the degree to which these measures "hang together." In discussing this issue, Campbell and Fiske argue that "reliability and validity can be seen as regions on a continuum," with test-retest and internal consistency reliability progressively closer to validity (1959, p. 83). The key issue to Campbell and Fiske lies in the independence of measurement approaches: reliability representing convergence among maximally similar methods and validity representing convergence among maximally different methods.
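The logic of the validity diagonal can be made concrete with a small simulation. The sketch below is not from Campbell and Fiske; the trait names, method names, and noise levels are invented for illustration. It builds a multitrait-multimethod correlation matrix for three traits each measured by two methods, then reads off the monotrait-heteromethod entries:

```python
# Illustrative sketch of a multitrait-multimethod correlation matrix.
# Traits, methods, and error variances are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
n = 500
traits = ["cognitive", "affective", "conative"]
methods = ["method_A", "method_B"]

# Simulate: each trait is an independent latent score; each observed
# measure is the latent trait plus method-specific measurement error.
latent = rng.normal(size=(n, len(traits)))
scores = {}
for m in methods:
    for t_i, t in enumerate(traits):
        scores[(t, m)] = latent[:, t_i] + rng.normal(scale=0.7, size=n)

# Columns ordered method-by-method: (A: cog, aff, con), (B: cog, aff, con).
cols = [(t, m) for m in methods for t in traits]
data = np.column_stack([scores[c] for c in cols])
R = np.corrcoef(data, rowvar=False)  # the full MTMM matrix

# Monotrait-heteromethod entries (the "validity diagonal"):
# same trait measured by the two different methods.
for t_i, t in enumerate(traits):
    print(f"{t}: validity-diagonal r = {R[t_i, len(traits) + t_i]:.2f}")
```

By construction the validity-diagonal correlations are substantial while heterotrait correlations hover near zero, but Cohen's point survives the arithmetic: nothing in this matrix tells us what the three traits have to do with any nomological network.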

Unless methods of measurement actually represent aspects of the nomological network, however, convergence among methods may do little to illuminate the construct. While methods qua nomological observables conceivably could fit this requirement (e.g., observation of performance theoretically predicted to covary with a trait or state measure), alternative test formats, objective vs. subjective means of assessment and the like cannot be said to be of this character.

Discriminant validity (as operationalized by Campbell and Fiske) would appear to be more directly relevant to construct validity provided that an absence of association among the traits selected for study is theoretically meaningful (i.e., enables the investigator to rule out certain interpretations regarding the meaning of the construct). In Campbell and Fiske's words, "One cannot define without implying distinctions, and the verification of these distinctions is an important part of the validation process" (1959, p. 84). For example, a lack of association between a test of aptitude and a test of specific and relevant knowledge enables the investigator to draw more precise inferences about the construct measured by the test. Or, an absence of systematic variance due to method-trait interaction may help the investigator draw the inference that the construct does not incorporate certain response set factors.

Selection of comparison traits and methods to fill out the matrix is, therefore, the key to any construct validity inferences which could emerge from the analysis. The role of each trait and method of measurement in the nomological network must be identified beforehand. Any interpretation of a multitrait-multimethod matrix with respect to construct validity (as opposed to confidence in the precision of one's measures) should, in this author's opinion, be carefully and critically examined. Convergent validity, in particular, may shed little light on the nature of the construct being studied.

Nothing said above, however, is intended to convey the impression that the multitrait-multimethod approach is anything other than an extremely important contribution to the logic and procedures of test and scale evaluation. Campbell and Fiske have made an extremely strong argument in support of multiple operationalism under the rubric of convergent validation: "Any single observation, as representative of concepts, is equivocal... the addition of a second viewpoint...greatly reduces this equivocality, greatly limits the constructs that could jointly account for both sets of data" (1959, p. 101). In addition, the procedures advocated by Campbell and Fiske provide a means of evaluating unwanted method variance, which may be an extremely important step in consumer research due to the customary use of pencil and paper instruments in standard response formats.

Establishing scale validity should not be thought of as a one-step task but as an ongoing process, and this is particularly true of construct validity. It is encouraging to see the greater concern the field is evidencing with respect to the definition and measurement of key constructs. Such concern appears on the threshold of being reflected in higher standards applied to measurement.

THE PAPERS

[Before going any further, I would like very much to applaud the impressive work by these four authors in carrying out an in-depth assessment of attitude measurement procedures. We need much more validity-oriented work in consumer behavior, and I believe it is high time the field raised its standards for both carrying out and reporting analyses of the validity and reliability of the measures relied on in research studies.]

Both sets of authors follow the Campbell and Fiske approach and look upon the multitrait-multimethod approach as a test of construct validity. While nomological validity is mentioned in both papers, only the Bagozzi and Burnkrant paper actually attempts to specify part of the theoretical structure in which the construct "attitude" is embedded and to examine part of that structure.

What can we learn about construct validity from these papers? The John and Reve paper offers an assessment of alternative ways of analyzing data in a multitrait-multimethod matrix. This should be a helpful addition to methodological discussions seeking to refine analytical procedures in this area. The paper is limited by its substantive reliance on Ostrom's often analyzed data base, in the sense that any inferences to be made are constrained by the trait and method input to this matrix. There is no effort here to look at alternative conceptions of the attitude construct within a theoretical structure. The data, simply put, enable investigators to ask the question, "In the context of the methods used, to what degree are these components interrelated?" Making the remarkable assumption that each scale is a valid and reliable measure of the dimension it is intended to represent (i.e., either the cognitive, affective, or conative dimension), we are left without a means of evaluating the adequacy of any one or combination of these as a measure of attitude. Even if we were to conclude (contrary to the apparently more powerful analytical methods) that these were separately identifiable dimensions, on what basis do we conclude that each is a necessary component of attitude? Knowing that the dimensions are in some ways different from one another and can be identified as such using different methods of measurement is useful information, but what role, if any, each serves in identifying this construct and not something else is yet to be determined. One does not know that a valid measure of a construct exists until the scores obtained are shown to be consistent with predictions made about the covariation in observables tied by theory to the construct.

To clarify this, let's take a simple example. Say a theory is proposed linking two observables to a hypothetical construct. We'll term the observables S and R and the hypothetical construct O. Investigator 1 states that O is a unidimensional construct, while investigator 2 states that, in reality, O is made up of O1, O2 and O3. To prove this the second investigator develops multiple means of measuring each component, administers the series of scales to a group of subjects and shows that each component is completely separate from the other two for each and every method used. Which investigator is right? Well, we've learned that the second investigator has three pretty good and distinct scales, but we really don't know whether he's done any better job representing the construct. That evidence awaits a study in which a prediction involving the construct and one or both observables is tested using such scores. The more complex the theoretical network, of course, the greater the amount and diversity of evidence that is required.

The Bagozzi and Burnkrant paper suffers from similar deficiencies. They propose a two-component model of attitudes and test this using data developed by Fishbein and Ajzen involving five different scaling methods, two of which appear to measure the affective aspect of attitudes and three the cognitive aspect, together with a scaled behavioral intention measure (for part of the sample) and a scaled self-reported behavior measure (for the remaining subjects). Despite fairly reasonable intercorrelation among the five scales, the more rigorous structural equation technique used by the authors allows them to reject a single-factor model in favor of their hypothesized two-factor model, though the presence of an additional parameter in the two-factor model needs somehow to be accounted for in evaluating the overall fit. To this point the authors claim only superior convergent validity for their model, stemming, it seems, from the (slightly) better fit with the measures partitioned into two groups.
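The conventional device for accounting for that extra parameter in nested maximum-likelihood models is a likelihood-ratio (chi-square difference) comparison; this is a standard procedure, not one the authors are reported to have used:

```latex
\Delta\chi^2 \;=\; \chi^2_{\text{1-factor}} \;-\; \chi^2_{\text{2-factor}},
\qquad
\Delta df \;=\; df_{\text{1-factor}} \;-\; df_{\text{2-factor}}
```

Under the null hypothesis that the simpler model is adequate, the difference is itself distributed as chi-square with the difference in degrees of freedom, so a nonsignificant value would favor retaining the one-factor model on parsimony grounds.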

Bagozzi and Burnkrant then propose to investigate the "nomological validity" of the two-factor attitude measures, which (it has been argued earlier) is the essence of construct validity. Behavioral intention and self-reported behavior scores (approximately half the subjects for each) were used as the observables in their "nomological" network. Without question the concept behind this approach to construct validation is not only sound but represents a high-water mark in consumer research. Procedurally, however, one must question the appropriateness of conceptualizing (to use Fishbein and Ajzen's words) "behavioral attitude scores" as a separate part of the nomological network. The three-component view of attitudes (as represented in the Ostrom data analyzed by John and Reve) suggests pushing behavioral intentions into the attitude construct itself, and the similarity among measurement methods (Guttman, Likert, Thurstone) used by Fishbein and Ajzen in developing scales for both attitudinal and behavioral elements may, in fact, tend to produce biases in the direction of consistency. I don't believe this is what Campbell and Fiske had in mind by "maximally different methods." In addition, even if maximally different, it is customary to treat both behavioral intentions and behavior as criterion variables in tests of concurrent or predictive validity. Therefore, it is not clear that the theory linking these to the construct is thought to be sufficiently well developed to term this a test of nomological validity. At best, this must be regarded as very weak evidence for nomological validity--far too weak to support the strong conclusions reached by the authors. With respect to the evidence, no comparison with the more parsimonious one-factor model is offered. The question as to how much is being added by going to a two-component model is worth addressing.

ATTITUDE CONSTRUCT VALIDITY: SOME ISSUES

The preceding discussion of construct validity and the authors' focus on the attitude construct raised several substantive and methodological issues which it might be helpful to explore.

Developing a Nomological Network

Objections were raised earlier to Bagozzi and Burnkrant's selection of scaled behavioral intention and behavior data to test theoretical predictions involving the attitude construct. Since this is a necessary step in establishing construct validity, what sorts of variables might be used? While attitude theory is not particularly well developed, it need not remain so. We might begin with a reexamination of Allport's classic definition of an attitude. [See McGuire (1969) for a related discussion and many helpful references.] An attitude was conceived to be a mental and neural state of readiness to respond, organized through experience and exerting a directive and/or dynamic influence on behavior.

If attitudes are a neural state, physiological variables (e.g., GSR, heart rate) might be used as observables to distinguish people who obtain high vs. low scores on proposed attitude measures, assuming of course that the attitude object used in the study is sufficiently arousing. Those holding rather extreme attitudes (e.g., extreme dislike of a minority group) should respond more intensely to a salience-increasing stimulus. Readiness to respond might, thus, be translated into response latency or intensity (Lott and Lott, 1968; Weiss, 1968). To the extent an attitude represents an organized cognitive structure, certain inference processes (i.e., based on evaluative consistency) may accompany favorable or unfavorable attitudes toward a person or object. Relationships among particular attitudes or attitudes and values might also be predictable on theoretical grounds. One might expect an attitude to exert a directive influence on certain perceptual processes (redefinition of a stimulus, distortion, encoding) as well as on the response side, which, at a minimum, could be measured by something other than the standard pencil and paper set of cognitive response formats (unobtrusive measures, choice behavior, multidimensional scaling).

This is, of course, a much abbreviated list. It is intended to be suggestive of a set of theory-driven relationships which, at present, await more comprehensive conceptualization of the nomological network in which the construct "attitude" is embedded. It is acknowledged that the theoretical links to any one variable are now fairly weak. What is probably needed, therefore, is far greater attention to theory development followed by an investigation of a set of relationships examining various aspects of the nomological network. At the very least, such an approach would guarantee much greater independence among measurement methods as suggested by the Campbell and Fiske paradigm.

Levels of Analysis

As discussed earlier, Bagozzi and Burnkrant proposed a two-component attitude construct made up of cognitive and affective dimensions. They outlined a theoretical system in which these two dimensions impact directly on behavioral predispositions which, in turn, lead to overt behavior. From their discussion, it seems clear that the term "components" refers to what the authors believed to be a necessary partitioning of the attitude construct and not to a deterministic relationship spanning different psychological levels of analysis.

In discussing Fishbein and Ajzen's version of an expectancy-value model, John and Reve state that attitude "consists of" two components (these are mislabeled in the paper, but that is not crucial to this point), and then it is said that these components "determine one's attitude." Consisting of and determining are not the same thing. This highlights the importance of specifying the level of analysis at which the construct is being defined or explained and not attempting to choose among levels in making construct validity inferences. Cook and Campbell (1976) make the point this way in discussing construct validity in reference to "threats" to the proper labeling of cause and effect in experiments. Such "threats" produce "confounding" in the sense that, "cause and effect can be construed in terms of more than one construct, all of which are stated at the same level of reduction...The reference to the level of reduction is important because it is always possible to 'translate' sociological terms into psychological terms, or psychological terms into biological terms" (1976, p. 238).

Similarly, it is possible to "translate" the construct "attitude" into lower-level constructs which are theorized to be its building blocks. Fishbein and Ajzen define attitude as "a learned predisposition to respond in a consistently favorable or unfavorable manner with respect to a given object" (1975, p. 6). Consistent with this definition Fishbein and Ajzen suggest that attitude "should be measured by a procedure which locates the subject on a bipolar affective or evaluative dimension...'' (1975, p. 11). They add, "Beliefs are the fundamental building blocks in our conceptual structure...a person's attitude toward an object is based on his salient beliefs about that object" (1975, p. 14). Accordingly, they describe the relationship between the set of lower-level beliefs and attitude in terms of an expectancy-value model which specifies both the types of beliefs (i.e., beliefs about the object's association with attributes and consequences and beliefs about the evaluation of the attributes and consequences) and the functional relationship among such beliefs.
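The functional relationship Fishbein and Ajzen specify is the familiar expectancy-value sum: attitude is the sum over salient beliefs of belief strength times attribute evaluation. A minimal sketch follows; the object, the beliefs, and the scale values are invented for illustration:

```python
# Fishbein expectancy-value model: A = sum over i of b_i * e_i,
# where b_i is belief strength and e_i the attribute evaluation.
# All beliefs and values below are hypothetical, scaled -3..+3.
salient_beliefs = {
    "is reliable":         (3,  2),   # (b_i, e_i)
    "is expensive":        (2, -3),
    "is widely available": (1,  1),
}

attitude = sum(b * e for b, e in salient_beliefs.values())
print(attitude)  # 3*2 + 2*(-3) + 1*1 = 1
```

Two investigators could thus agree on the resulting attitude score while disagreeing entirely about the lower-level beliefs that produced it, which is precisely the level-of-reduction distinction at issue here.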

It is, of course, meaningful to evaluate attitude measures which vary by level of reduction in terms of criterion-related validity. Different levels of measurement may be appropriate depending upon the criterion variable and the purpose of the investigation (e.g., prediction, diagnosticity).

Parsimony vs. Completeness

Parsimony has long been regarded as a virtue in theory building: unnecessary constructs and overly cumbersome theoretical structures are to be replaced by simpler formulations whenever possible. One must wonder, therefore, whether a multicomponent model of attitudes that fits ever so slightly better is to be preferred over a simpler model. Following his extensive analysis of the theoretical and empirical literature dealing with attitudes, McGuire concluded: "Given the less than perfect state of our measuring procedures, the three components have proven to be so highly intercorrelated that theorists who insist on distinguishing them should bear the burden of proving that the distinction is worthwhile" (1969, p. 157). Despite the development of more finely tuned statistical methods for partitioning matrices and accounting for variance, I wonder if that conclusion is any less valid today. The key word may be "worthwhile".

I differentiate in this section between constructs and theory, on the one hand, and applied research on the other. The goals need not be the same. In the latter case, completeness (in the sense of measuring all the variables one believes will prove useful) is much to be valued. It may be no coincidence that many of the social psychologists who are identified with the multi-component view of attitude were keenly interested in topical issues such as prejudice, attitudes toward the war, political ideology, strategies of persuasion and the like. In seeking to adequately describe the phenomena of interest, it wasn't enough to develop a unidimensional measure of affect, and more elaborate building block models (e.g., information processing, expectancy-value approaches) were still some years away. As a result, not only did researchers (often using survey-type questionnaires) want to know what people knew about the issue, how they felt about it and what types of action they were prepared to take, but also people's intensity of feeling, interconnectedness among beliefs, differentiation of beliefs, relationship to central values, etc. In following this approach a key question is, "What is worth measuring?" The answer is normally decided on criterion-related rather than theoretical grounds.

It may be instructive to contrast the above orientation--which leads to a particularly rich and complete treatment of a construct--with an exceedingly parsimonious approach. Robert Wyer sees an attitude as simply another belief, with no fundamental difference between one belief or another: "A subject's reported attitude toward an object is interpretable in terms of his judgment of the object's membership in a cognitive category" (1974, p. 24). In other words, having an attitude that bank robbers are "bad" is nothing more than a relationship between membership in the category "bank robbers" and membership in the category "bad". Wyer is expressing a preference for laws of behavior that avoid an Aristotelian emphasis on surface characteristics in favor of a Galileian focus on underlying cognitive process. This emphasis is worth thinking about in approaching the subject of construct validity.

REFERENCES

American Psychological Association, Standards for Educational and Psychological Tests (Washington, D.C.: American Psychological Association, 1974).

D. T. Campbell, "Recommendations for APA Test Standards Regarding Construct, Trait, or Discriminant Validity," American Psychologist, 15 (1960), 546-553.

D. T. Campbell and D. W. Fiske, "Convergent and Discriminant Validation by the Multitrait-Multimethod Matrix," Psychological Bulletin, 56 (March 1959), 81-105.

T. D. Cook and D. T. Campbell, "The Design and Conduct of Quasi-Experiments and True Experiments in Field Settings," in M.D. Dunnette (ed.), Handbook of Industrial and Organizational Psychology (Chicago: Rand McNally, 1976), 223-326.

L. J. Cronbach and P. E. Meehl, "Construct Validity in Psychological Tests," Psychological Bulletin, 52 (July 1955), 281-302.

M. Fishbein and I. Ajzen, Belief, Attitude, Intention and Behavior: An Introduction to Theory and Research (Reading, MA: Addison-Wesley, 1975).

A. J. Lott and B. E. Lott, "A Learning Theory Approach to Interpersonal Attitudes," in A. G. Greenwald, T. C. Brock and T. M. Ostrom (eds.) Psychological Foundations of Attitudes, (New York: Academic Press, 1968).

W. J. McGuire, "The Nature of Attitudes and Attitude Change," in G. Lindzey and E. Aronson (eds.), The Handbook of Social Psychology (Reading, MA: Addison-Wesley, 1969).

R. F. Weiss, "An Extension of Hullian Learning Theory to Persuasive Communication," in A. G. Greenwald, T. C. Brock and T. M. Ostrom (eds.) Psychological Foundations of Attitudes, (New York: Academic Press, 1968).

R. S. Wyer, Jr., Cognitive Organization and Change: An Information Processing Approach, (New York: John Wiley, 1974).
