Reliability and Validity in Consumer Research: Where Do We Go From Here?

Stephen A. LaTour, Northwestern University
ABSTRACT - The papers in this session indicate that substantial progress has been made in the application of psychometric techniques to assure the reliability of measures used in consumer research. There is also evidence of greater attention to validity, although additional progress is necessary, particularly with respect to discriminant validity. In addition, it is argued that excessive attention to reliability issues may be resulting in excessively long and unwieldy measurement scales.
[ to cite ]:
Stephen A. LaTour (1983), "Reliability and Validity in Consumer Research: Where Do We Go From Here?", in NA - Advances in Consumer Research Volume 10, eds. Richard P. Bagozzi and Alice M. Tybout, Ann Arbor, MI: Association for Consumer Research, Pages: 696-698.

Advances in Consumer Research Volume 10, 1983      Pages 696-698


INTRODUCTION

In 1979 a special issue of the Journal of Marketing Research appeared in which various authors bemoaned the lack of attention to reliability and validity issues in marketing and consumer research. The papers presented at this session are a testament to the fact that substantial progress has been made since the appearance of that issue. For the most part, all of the papers do an excellent job of determining the reliability of the measures studied. They vary, however, in the quality of their assessment of the validity of measures. Presented below are comments about each paper. This is followed by a discussion of reliability and validity issues requiring further attention.

ANDERSON'S PAPER

Anderson's paper is the most original of the four papers, but it is also the most problematic. Although he does not approach the issue of consistency between private and public judgments in this fashion, it is my view that when he discusses consistency he is fundamentally concerned about construct validity. At its heart, construct validity is concerned with the problem of what is being measured and a concomitant concern that the researcher may be measuring something other than what he or she thinks is being measured. This is basically a concern about confounding. Anderson is concerned that while public judgments include a dimension related to private judgments, they may be confounded with other dimensions such as social desirability. Anderson's concerns about accuracy - the convergence between public judgments and reality or behavior - really imply a concern about predictive validity. For example, can a consumer's attitude about a product be used to accurately predict purchase behavior?

The major contribution of this paper lies in its discussion of the consistency issue. This is not to detract from the accuracy issue, but the kinds of accuracy issues studied in the paper are not of interest to most consumer researchers. In fact, there is a fundamental difference between testing the accuracy of a person's statement about the distance to the moon versus the examination of the ability of consumers' judgments about products to predict behavior. It is the latter kind of accuracy that is of interest to most consumer researchers.

The beauty of Anderson's paper is that he attempts to demonstrate an understanding of the potential confounds involved in public judgments by using various methods designed to eliminate them. Such an experimental approach is extremely powerful.

There are, however, several methodological issues that need attention and that detract from the inferences that can be made from this research. First, the decision to score inconsistency with a dichotomous rule on the basis of pretest results may have been premature, given that the pretest sample must have been quite small (the study itself has only 23 participants). Second, the kinds of judgments involved in this study are not the kinds that should logically induce evaluation apprehension (and, in turn, inconsistency). Thus it is not surprising that there is no effect of anonymity in the incentive-absent condition. The psychological literature on the effects of anonymity on judgments generally shows that anonymity produces different results from those obtained when the respondent is identified and some social desirability is present - and those studies have not involved incentives. This further supports the argument that the judgments in this study simply do not involve social desirability. Had the consumer judgments involved social desirability, there probably would have been an effect in the incentive-absent conditions.

Anderson's logic about the reason for inconsistency in the accountable/incentive-present condition may be correct. It is here that public accountability and concerns about winning the incentive may cause one to shift in going from private to public judgments. But why should this occur? This is not the typical kind of situation in which there is anything to gain from hiding one's true beliefs - there is no inherently socially desirable response. Perhaps it is simply a matter of heightened nervousness resulting in a last-minute change in judgment because of a concern about the accuracy of one's first guesses about the correct answer. There is one other possible explanation, however. The number of respondents is so small that there could have been a failure of randomization. The fact that there is high variance in the one cell that differs from all of the other cells lends support to the possibility that a few respondents who were quite different from the others happened to be assigned to this condition, producing the observed effect.

While these methodological flaws are serious, Anderson's approach has merit and a further study with measurements involving social desirability confounding would be in order.

PRICE AND RIDGWAY'S PAPER

The paper by Price and Ridgway is more of a traditional scale validation study. It does a nice job of assessing the reliability of an individual difference measure relevant for consumer research - use innovativeness. The authors employed an excellent iterative procedure, combining measures of internal consistency and factor analysis, to develop several subscales tapping different dimensions of the overall use innovativeness construct.

Evidence for the validity of the scale is more problematic. There is good evidence for content validity since the authors used independent judges in constructing scale items. Construct validity is supported by the fact that the factor structure is generally as predicted by the authors. Criterion validity, as the authors note, is not strong, but it is not as poor as the authors suggest. The problem is that the authors have handicapped themselves by using variance explained as an indication of criterion validity. The mean difference between groups is actually a better measure of effect size (LaTour 1981), and given that the mean difference between those low and high in use innovativeness averages around 20% of the scale range, there is evidence for criterion validity. Criterion validity might have been further enhanced if the scale had been designed to measure use innovativeness for the specific instance of hand calculators. People who are use innovators for one type of product may not be use innovators for other kinds of products. Thus one would never expect a general use innovativeness scale to strongly differentiate the behaviors of those scoring high on the scale from those scoring low on the scale for a given product. Many psychologists (e.g., Mischel 1968) have in fact argued that personality is situation specific.
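The arithmetic behind this point can be illustrated with a short sketch. The group means, within-group standard deviation, and scale range below are invented for illustration, not taken from Price and Ridgway's data; the point is only that the same group difference looks modest as variance explained but substantial as a proportion of the scale range.

```python
def eta_squared_from_d(d):
    """Variance explained for two equal-size groups whose means differ
    by d within-group standard deviations (standardized mean difference)."""
    return d ** 2 / (d ** 2 + 4)

def pct_scale_range(mean_diff, scale_range):
    """Mean group difference expressed as a percentage of the scale range."""
    return 100.0 * mean_diff / scale_range

# Hypothetical: the low and high use-innovativeness groups differ by
# 1.2 points on a 7-point scale (range = 6), within-group SD = 1.5.
mean_diff, sd, scale_range = 1.2, 1.5, 6.0
d = mean_diff / sd                                   # standardized difference = 0.8

print(round(eta_squared_from_d(d), 3))               # ~0.138: "only 14% of variance"
print(pct_scale_range(mean_diff, scale_range))       # 20.0: one fifth of the scale
```

The same underlying difference that "explains" only about 14% of the variance moves the group means a full fifth of the way across the response scale, which is the sense in which variance explained understates effect size.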

There is a need for more attention to convergent validity in this paper. That is, does a different measure of the construct correlate highly with the measure developed by the authors? For example, one could develop behavioral measures of use innovativeness and determine whether they correlate with the paper-and-pencil test developed by the authors. The authors' validation study almost qualifies as a study of convergent validity, but some of the behaviors that are measured are too removed from use innovativeness per se. For example, frequency of use and length of use at a sitting are not, on the face of it, behavioral use innovativeness measures. The calculator use patterns measure perhaps comes closest to providing a desirable behavioral measure that provides evidence for convergent validity.

Discriminant validity also needs attention. This is particularly crucial for establishing construct validity since use innovativeness may not be differentiable from other constructs such as intelligence. Evidence for discriminant validity is particularly important, for otherwise researchers may be studying the same constructs but think they are addressing different ones. This problem plagued personality psychologists who studied authoritarianism for many years, only to discover that it was highly correlated with educational level.

In the long run it will be important to provide evidence for the nomological validity of this measure. That is, the authors should ultimately derive theoretically expected relationships between this measure and other variables and conduct empirical research to test the hypotheses. For example, one might predict that provision of information in an advertisement about other uses for a product would be responded to differentially by those high and low in use innovativeness. If this in fact occurred, there would be evidence for nomological validity.

LEIGH'S PAPER

Leigh's paper involves a statistically sophisticated examination of the reliability and validity of a measure of information source usage. The use of a confirmatory factor analytic method was particularly impressive, but there are still a few issues that deserve attention. For example, the measure still has a potential reliability problem inherent in the difference between reported information seeking and actual information seeking caused by memory error. This type of reliability problem is not well addressed by internal consistency measures and the related factor analytic study.

The factor analysis, as the author notes, was unfortunately inconclusive. The study might have been salvaged, however, by first conducting a developmental factor analytic study with half of the sample and using the other half of the sample for validation with the confirmatory method. The sample size was sufficiently large to allow for this procedure.
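The split-sample procedure suggested above can be sketched as follows. The sample size and function name are illustrative, and the factor analyses themselves are indicated only as comments; the point is simply that respondents are randomly partitioned once, so the confirmatory test is not performed on the same data that generated the structure.

```python
import numpy as np

def split_halves(n_respondents, seed=0):
    """Randomly partition respondent indices into a developmental half
    (for exploratory factor analysis) and a validation half (for the
    confirmatory factor analytic test of the derived structure)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_respondents)   # random order, no respondent repeated
    mid = n_respondents // 2
    return idx[:mid], idx[mid:]

dev, val = split_halves(400)  # hypothetical sample size
# Run an exploratory factor analysis on data[dev] to derive a structure,
# then fit that structure as a confirmatory model on data[val].
```

Because the split is random rather than systematic, any structure that replicates on the validation half is unlikely to be an artifact of capitalizing on chance in the developmental half.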

Evidence for the construct validity of the measure is provided by the factor analytic procedure demonstrating convergence among subscales, but there is still a need to establish convergent validity for the scale as a whole by correlating it with another measure of the construct. In addition, discriminant validity should be established. In particular, does informational source usage differ from educational level or intelligence? It may very well be that those who use more sources are simply more educated, knowledgeable individuals who are more likely to be aware of the sources of help available to them.

BAMOSSY, SCAMMON, AND JOHNSTON'S PAPER

This is an excellent paper and little needs to be said about it. The reliability assessment was excellent. There is evidence for criterion validity because the measure does divide people with different a priori expected levels of development into appropriate groups. In addition, there is reasonably convincing evidence for construct validity. The measure correlates with a theoretically similar measure - the Cognitive Integration Index Test - but it is discriminably different from that test in that social class correlates with the authors' measure but not with the Cognitive Integration Test. The only problem with this approach is that it depends upon observed relationships with a single other measure to provide evidence for both convergent and discriminant validity. It would have been better to develop a separate measure of aesthetic judgment in order to provide the evidence for convergent validity.

CONCLUSIONS

Most of these papers pay a good deal of attention to the reliability problem. While this is an important issue (and one indeed needs reliable measures in order to have valid measures), we may have become so obsessed with reliability that we are engaging in overkill to assure it. That is, we develop scales with perhaps too many items to assure ourselves that we are accurately sampling the universe of possible wordings. This means that we often end up with unwieldy and time-consuming measures. The key to a reliable and efficient measure is to neither over- nor undersample the universe of construct operationalizations. Often we oversample, and it may be desirable to attempt further reduction of scale items in order to reduce our scales to more manageable proportions. In many instances a single measure may be quite sufficient. For example, in the attitude area, any one measure from the semantic differential will be almost as reliable as the whole scale. The same is true for likelihood measures of belief and behavioral intentions such as those used in Fishbein's model (cf. Fishbein and Ajzen 1975).
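The diminishing returns from added items can be made concrete with the Spearman-Brown formula, which projects the reliability of a test lengthened or shortened by a factor of k. The alpha value and item count below are hypothetical, chosen only to illustrate the shape of the trade-off.

```python
def spearman_brown(r, k):
    """Projected reliability when a test with reliability r is
    lengthened (k > 1) or shortened (k < 1) by a factor of k."""
    return k * r / (1 + (k - 1) * r)

# Hypothetical: a 7-item semantic differential with coefficient alpha = 0.95.
alpha_full, n_items = 0.95, 7

r_single = spearman_brown(alpha_full, 1 / n_items)  # implied single-item reliability
r_three = spearman_brown(r_single, 3)               # a 3-item short form

print(round(r_single, 2))  # 0.73 - a single item is already respectable
print(round(r_three, 2))   # 0.89 - three items recover most of the full scale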

It is also apparent that greater attention needs to be paid in consumer research to the issue of discriminant validity. All too often researchers theorize about a supposedly new construct which is actually a new name for a construct that other researchers, sometimes in other fields, have already been addressing. Elsewhere (LaTour and Peat 1979) I have argued, for example, that consumer satisfaction researchers have not established the discriminant validity of satisfaction vis-a-vis attitude. Each seems to be conceptualized as an affective response to some object or experience. The difference is that attitude researchers in the consumer arena usually focus on affective responses prior to purchase, whereas consumer satisfaction researchers focus on affective responses after the purchase. The two therefore differ in terms of the timing of the measurement or the specificity of the object or experience, but it is essentially the same construct that is being tapped. Unfortunately, because the different sets of researchers are using different vocabularies to address the same basic construct, they rarely talk to one another, and there is therefore less theoretical progress than might otherwise be made. Perhaps there is a basic human desire to be different that leads to the lack of attention to discriminant validity. Willingness to assure oneself that a measure taps a truly unique construct is essential, however, if rapid progress is to be made in understanding consumer behavior.

REFERENCES

Anderson, J.V. (1983), "Subject Impressionism: A Methodology for Capturing Private Judgments and Measuring Inconsistency," in R. Bagozzi and A. Tybout (Eds.), Advances in Consumer Research, Vol. 10.

Bamossy, G., Scammon, D., and Johnston, M. (1983), "A Preliminary Investigation of the Reliability and Validity of an Aesthetic Judgment Test," in R. Bagozzi and A. Tybout (Eds.), Advances in Consumer Research, Vol. 10.

Fishbein, M. and Ajzen, I. (1975), Belief, Attitude, Intention, and Behavior, Reading, MA: Addison-Wesley.

LaTour, S.A. (1981), "Variance Explained: It Measures Neither Effect Size Nor Importance," Decision Sciences, 12, 150-160.

LaTour, S.A. and Peat, M.C. (1979), "Conceptual and Methodological Issues in Consumer Satisfaction Research," in W. Wilkie (Ed.), Advances in Consumer Research, Vol. 6.

Leigh, James (1983), "Reliability and Validity Assessment of Patterns of Information Source Usage," in R. Bagozzi and A. Tybout (Eds.), Advances in Consumer Research, Vol. 10.

Mischel, Walter (1968), Personality and Assessment, New York: John Wiley and Sons, Inc.

Price, L.L. and Ridgway, N. (1983), "Development of a Scale to Measure Use Innovativeness," in R. Bagozzi and A. Tybout (Eds.), Advances in Consumer Research, Vol. 10.
