Reliability of Measured Beliefs in Consumer Research

Roger J. Best, University of Arizona
Del I. Hawkins, University of Oregon
Gerald Albaum, University of Oregon
[ to cite ]:
Roger J. Best, Del I. Hawkins, and Gerald Albaum (1977) ,"Reliability of Measured Beliefs in Consumer Research", in NA - Advances in Consumer Research Volume 04, eds. William D. Perreault, Jr., Atlanta, GA : Association for Consumer Research, Pages: 19-23.

Advances in Consumer Research Volume 4, 1977    Pages 19-23


ABSTRACT -

Consumer beliefs are often measured with semantic scales and used as predictive or explanatory measures of behavior, preference, or intended behavior. The accuracy of research based on measured beliefs, however, is in part dependent upon the reliability of these measurements.

The reliability of consumer beliefs measured in this study was assessed with three alternative methods of inferring reliability. For five stimulus objects, measures of stability ranged from .42 to .61, measures of coefficient alpha ranged from .56 to .65, and reliability inferred from analysis of variance ranged from .62 to .71. At these levels of reliability a consumer belief would have to be viewed within an interval of ±2.3 scale values to be 95% confident that the consumer's true belief lies within that interval. Given the importance of belief measurement in consumer research, the reliability of belief measurements should be in the range of .80 to .90 to provide a 95% confidence interval of ±1 scale value around any given measure of belief.

INTRODUCTION

Consumer researchers have utilized measures of consumer beliefs in a variety of consumer studies. Typically, beliefs toward a particular stimulus object are measured with a six- or seven-interval bipolar semantic scale. At last year's conference alone, beliefs were used in studies of fashion innovation (Painter and Granzin, 1975), comparative advertising (Wilson, 1975; Golden, 1975), multi-attribute attitude models (Lutz, 1975; Ahtola, 1975), and attribution (Mizerski, 1975).

In most cases beliefs are used as explanatory or predictor variables to discern the strength of a statistical association between beliefs and some criterion. However, the measurement error associated with belief measurements is rarely reported in consumer research (Jacoby, 1975). Given the importance of consumer beliefs and of reliability in the study of consumer behavior, the purpose of this paper is to assess the reliability of commonly derived measures of consumer beliefs and to evaluate whether the observed levels of reliability are adequate for substantive consumer research.

MEASURES OF RELIABILITY

Reliability connotes dependability, stability, consistency, predictability and accuracy (Kerlinger, 1973). Though there are a variety of methodologies used to measure reliability, this study investigates three measures of reliability commonly used in consumer research and related psycho-social sciences.

Stability

Reliability evaluated by correlating beliefs measured across time is called a measure of stability or test-retest reliability (Bohrnstedt, 1970). The test-retest approach measures the same stimulus objects again and again with the same instrument and respondents. The stability of measured beliefs is inferred by the extent to which the measurements produce the same or similar results over a specified time interval. In this way reliability is measured by the correlation (r) between measurements observed during the test (t1) and retest (t2). Therefore:

$$\text{Reliability} = r_{t_1 t_2} \tag{1}$$
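
As a minimal sketch of equation (1), the test-retest correlation for one respondent can be computed as below. The function name `stability` and the ratings are hypothetical, invented for illustration; they are not data from this study.

```python
import numpy as np

def stability(test_ratings, retest_ratings):
    """Pearson correlation between test (t1) and retest (t2) ratings."""
    t1 = np.asarray(test_ratings, dtype=float)
    t2 = np.asarray(retest_ratings, dtype=float)
    if t1.shape != t2.shape:
        raise ValueError("test and retest ratings must have the same shape")
    # np.corrcoef returns the 2 x 2 correlation matrix; the off-diagonal is r.
    return float(np.corrcoef(t1, t2)[0, 1])

# Hypothetical ratings by one respondent on ten 6-interval semantic scales.
t1 = [5, 4, 6, 3, 2, 5, 4, 3, 6, 2]
t2 = [4, 4, 5, 3, 3, 6, 3, 3, 5, 2]
r = stability(t1, t2)
print(f"test-retest reliability: {r:.2f}")
```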

There are some obvious problems with test-retest reliability. First, different results may occur depending on the length of time between measurement and remeasurement: the longer the time interval, the lower the reliability (Bohrnstedt, 1970). Second, a consumer's true beliefs have a greater probability of actually changing the longer the time interval between test and retest. Heise (1969) has shown that with three observations across time one can distinguish change from unreliability if the intervals between measurements are the same and if it can be assumed that the errors in measurement are not correlated across time. Third, any test-retest procedure faces the problem of reactivity (Campbell and Stanley, 1963): a respondent's sensitivity or responsiveness to the belief under study may be enhanced by the measurement of that belief.

Measures of Equivalence

Because of the problems inherent in the test-retest approach to estimating reliability, many researchers have relied on measures of equivalence to infer reliability.

It is assumed that when several consumer beliefs are summed into a simple measure of a particular scale or construct, the items are measuring the same underlying phenomenon. For example, in terms of attitude, each belief can be thought of as a measure of the attitude (Bohrnstedt, 1970). Reliability estimates which measure the equivalence of each belief as an indicator of the underlying phenomenon are called measures of equivalence.

The earliest equivalence measures were the split-half methods. In the split-half approach, the scale items are randomly divided into two halves, and summated scores derived from each half of the test are correlated to get an estimate of reliability. However, this approach has fallen into disuse (Bohrnstedt, 1970), since split halves are far from equivalent halves and many replications of the split-half measurement would have to be performed before one could accurately infer an expected value for this measure of reliability.

A second approach which has become more popular (Bohrnstedt, 1970) is that of coefficient alpha (Cronbach, 1951). This approach to internal consistency examines the variance-covariance properties of all scaled beliefs simultaneously rather than in any particular or arbitrary split as in the method of split-half. This method assumes that each belief or scale item has an equivalent belief or scale item which exactly parallels it. In this case, a measure of equivalence is inferred from the underlying variance-covariance properties that can be obtained from a group of consumers expressing their beliefs toward a stimulus on K-semantic scales. This measure of reliability is computed in the following way:

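$$\alpha = \frac{K}{K-1}\left[1 - \frac{\sum_{i=1}^{K} s_i^2}{\sum_{i=1}^{K} s_i^2 + 2\sum_{i<j} s_{ij}}\right] \tag{2}$$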

where:

$\alpha$ = reliability estimate

$s_i^2$ = variance of the ith belief

$s_{ij}$ = covariance between the ith and jth beliefs

$K$ = number of beliefs (i.e., semantic scales)
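
The computation described here can be sketched in a few lines. This is an illustrative sketch only: it uses the standard form of coefficient alpha (Cronbach, 1951) expressed with the symbols defined above, and the 3 by 3 matrix is hypothetical rather than taken from Table 1.

```python
import numpy as np

def coefficient_alpha(cov):
    """Coefficient alpha from a K x K variance-covariance matrix."""
    cov = np.asarray(cov, dtype=float)
    k = cov.shape[0]
    if cov.ndim != 2 or cov.shape != (k, k) or k < 2:
        raise ValueError("cov must be a square matrix with K >= 2")
    sum_variances = np.trace(cov)               # sum of s_i^2 (diagonal)
    sum_cov_upper = np.triu(cov, k=1).sum()     # sum of s_ij over i < j
    total = sum_variances + 2.0 * sum_cov_upper # variance of the summed scale
    return (k / (k - 1)) * (1.0 - sum_variances / total)

# Hypothetical matrix: three beliefs, unit variances, covariances of .5.
example_cov = [[1.0, 0.5, 0.5],
               [0.5, 1.0, 0.5],
               [0.5, 0.5, 1.0]]
alpha_example = coefficient_alpha(example_cov)
print(alpha_example)  # 0.75
```

Only the diagonal and the upper triangle are read, which mirrors the way an upper-half matrix would be used; the doubling accounts for the symmetric lower half.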

For example, the upper-half variance-covariance matrix shown in Table 1 illustrates the nature of this computation. The diagonal of Table 1 is summed to compute $\sum s_i^2$, while the upper half of the matrix is summed to compute $\sum s_{ij}$. Since a variance-covariance matrix is symmetric, multiplying the summed covariance ($\sum s_{ij}$) by two provides the sum of all covariance terms. In this way the reliability of the data in Table 1 can be shown as:

EQUATION   (3)

TABLE 1

VARIANCE-COVARIANCE MATRIX OF 9 BELIEF MEASUREMENTS

Reliability can also be defined through error: "the more error, the greater the unreliability; the less error, the greater the reliability" (Kerlinger, 1973). This approach to measuring reliability was originated by Hoyt (1941) but has been less popular among consumer researchers. A belief observed in this framework is the result of several effects:

$$X_{ij} = p_i + a_j + e_{ij} \tag{4}$$

where:

$X_{ij}$ = the ith consumer's response to the jth belief

$p_i$ = true magnitude of the jth belief held by the ith respondent

$a_j$ = anchor effect of the jth semantic scale

$e_{ij}$ = random error

$i$ = 1, 2, ... N respondents

$j$ = 1, 2, ... K beliefs.

Within this framework the variance in observed responses can be partitioned into alternative sources that can be attributed to scale effects, individual differences, the true magnitude of measured beliefs, and random error (for a detailed mathematical description see Winer, 1971). Using this decomposition, reliability can be estimated from the results of analysis of variance in the following way (Kerlinger, 1973):

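$$\text{Reliability} = 1 - \frac{V_e}{V_{ind}} \tag{5}$$

where $V_{ind}$ is the variance (mean square) attributed to individual differences and $V_e$ is the error variance (mean square).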

To better illustrate this approach to inferring reliability a set of hypothetical data is presented in Table 2. In each case five hypothetical respondents have expressed their beliefs toward a given stimulus object using four, 6-interval semantic scales. For each set of semantic scale measurements a two-way analysis of variance was performed to isolate alternative sources of variation present in the observed responses. These results along with measures of reliability are also shown in Table 2. Though the variance attributed to scale effects was the same in both measurements, the error variance observed in data set II was sufficiently large in comparison to individual differences so as to reduce the reliability of the measurement from .92 to .45.

TABLE 2

VARIANCE APPROACH TO MEASURING THE RELIABILITY OF TWO SETS OF CONSUMER BELIEFS

Reliability = 1 - 2.60/4.70 = .45
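
The analysis of variance approach can be sketched as follows, assuming a two-way layout (respondents by scales) without replication in which reliability is one minus the ratio of the error mean square to the individuals mean square. The function names and the ratings matrix are hypothetical and do not reproduce the data of Table 2; the last line reads 4.70 and 2.60 in the computation above as the individual and error variances, which is an interpretation.

```python
import numpy as np

def reliability_from_mean_squares(ms_individuals, ms_error):
    """Reliability = 1 - (error variance / variance among individuals)."""
    return 1.0 - ms_error / ms_individuals

def anova_reliability(ratings):
    """Reliability from a two-way ANOVA of an N x K ratings matrix."""
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand_mean = x.mean()
    # Sums of squares for respondents (rows), scales (columns), and total.
    ss_individuals = k * np.sum((x.mean(axis=1) - grand_mean) ** 2)
    ss_scales = n * np.sum((x.mean(axis=0) - grand_mean) ** 2)
    ss_total = np.sum((x - grand_mean) ** 2)
    ss_error = ss_total - ss_individuals - ss_scales
    ms_individuals = ss_individuals / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return reliability_from_mean_squares(ms_individuals, ms_error)

# Hypothetical data: five respondents rating one store on four 6-interval scales.
ratings = np.array([[6, 5, 6, 5],
                    [5, 5, 4, 5],
                    [4, 3, 4, 3],
                    [3, 3, 2, 3],
                    [2, 1, 2, 2]])
r_anova = anova_reliability(ratings)

# The mean squares quoted above: 1 - 2.60/4.70, about .45.
r_table2 = reliability_from_mean_squares(4.70, 2.60)
print(f"hypothetical data: {r_anova:.2f}; quoted mean squares: {r_table2:.2f}")
```

Computed this way on a complete ratings matrix, the estimate is algebraically identical to coefficient alpha calculated from the sample variance-covariance matrix of the same ratings.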

Each of these measures of reliability focuses on a different aspect of the belief measurement and reliability. The remainder of this paper is devoted to an empirical analysis of typical belief measurements using these three methodological measures of reliability.

DATA

Eighty-four adult female shoppers from four local church groups completed a questionnaire concerning shoppers' attitudes. Each shopper received a self-administered questionnaire which instructed the respondent to rate her beliefs toward five department stores using 10, 6-interval semantic scales. The 10 scales were derived from a much larger set following a pretest. The five department stores selected as stimulus objects were well-known to the respondents and represented a wide range of store images. The stores included two high quality regional chains, two medium quality national chains, and one lower quality national chain.

The same questionnaire was mailed to the 84 respondents 10 days after completion of the first questionnaire. The test and retest produced 70 complete sets of questionnaires.

ANALYSIS AND RESULTS

Only the 70 respondents who completed both the test and retest questionnaires were included in the analysis. The store attributes used in the study and the mean responses observed in the test and retest measurements are shown in Table 3 for each stimulus object. The correlation between mean responses varied from .97 to .98 across the five stimulus objects, with a median correlation of .97. Therefore, at the aggregate level these results suggest relatively stable, reproducible measurements.

At the individual level test-retest stability was evaluated for each respondent by correlating belief ratings expressed toward each store in the test and retest measurements. The average individual test-retest correlations are shown in Table 4 for each stimulus object. The average correlations ranged from .42 to .61.
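
The contrast between aggregate and individual stability can be illustrated with a small simulation. Everything in it is an assumption made for illustration: the scale means, the normally distributed response error with a standard deviation of 1, and the use of continuous rather than integer ratings. It is not a reanalysis of the study's data.

```python
import numpy as np

rng = np.random.default_rng(0)
n_respondents, n_scales = 70, 10

# Assumed "true" mean belief on each scale, shared by all respondents.
scale_means = np.linspace(2.0, 5.0, n_scales)
noise_sd = 1.0  # assumed stochastic response error per rating

# Test and retest ratings: same true means, independent random error.
test = scale_means + rng.normal(0.0, noise_sd, size=(n_respondents, n_scales))
retest = scale_means + rng.normal(0.0, noise_sd, size=(n_respondents, n_scales))

# Aggregate level: correlate the mean test and mean retest rating per scale.
aggregate_r = np.corrcoef(test.mean(axis=0), retest.mean(axis=0))[0, 1]

# Individual level: correlate each respondent's test and retest ratings.
individual_r = np.array([
    np.corrcoef(test[i], retest[i])[0, 1] for i in range(n_respondents)
])
mean_individual_r = individual_r.mean()

print(f"aggregate r = {aggregate_r:.2f}, mean individual r = {mean_individual_r:.2f}")
```

Averaging over respondents cancels most of the random error, so the correlation between mean profiles is much higher than the typical correlation within a single respondent.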

The measured beliefs observed for each stimulus object were next decomposed into alternative sources of variance in order to measure reliability using the analysis of variance approach. This method of estimating reliability produced estimates that ranged from .62 to .71 for the five stimulus objects shown in Table 4.

TABLE 3

MEAN RESPONSES FOR TEST-RETEST MEASUREMENTS

TABLE 4

MEASURES OF STABILITY AND INTERNAL CONSISTENCY

Utilizing the 10 by 10 variance-covariance matrix produced by these 70 shoppers for each stimulus object, estimates of coefficient alpha were computed and are reported in Table 4. These measures of reliability varied from .56 to .65 across the five stimulus objects.

To obtain a more precise idea of what these levels of reliability mean in terms of stochastic response, Figure 1 was constructed. Scaled along the ordinate is individual test-retest reliability; scaled along the abscissa is the root-mean-square obtained from individual measures of beliefs observed in the test and retest. At one extreme, when reliability is perfect and equal to 1.0, the error and the root-mean-square associated with stochastic response are zero. At the other extreme, when the test-retest correlation is equal to -1.0, stochastic response on a 6-interval semantic scale is at its maximum and the associated root-mean-square is equal to 5. Between these two extremes lies a plot of the relationship between reliability and root-mean-square observed in this study. Thus, when the reliability of measured beliefs is equal to .60, the variability of this measurement on a semantic scale is indicated by a root-mean-square or standard deviation of approximately 1.15.
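
The two extremes can be checked with a short sketch. It assumes that the root-mean-square is taken over the differences between a respondent's test and retest ratings; the text does not spell out the exact computation, so this reading, the function name, and the ratings are all illustrative.

```python
import numpy as np

def rms_difference(test_ratings, retest_ratings):
    """Root-mean-square of the test minus retest rating differences."""
    d = np.asarray(test_ratings, dtype=float) - np.asarray(retest_ratings, dtype=float)
    return float(np.sqrt(np.mean(d ** 2)))

# Identical ratings on both occasions (correlation of 1.0): no stochastic response.
rms_identical = rms_difference([1, 6, 3, 4], [1, 6, 3, 4])   # 0.0

# Fully reversed ratings at the scale end points (correlation of -1.0):
# every difference is 5, the largest possible on a 6-interval scale.
rms_reversed = rms_difference([1, 6, 1, 6], [6, 1, 6, 1])    # 5.0

print(rms_identical, rms_reversed)
```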

DISCUSSION

Reliability of measured beliefs was quite high when measured at the aggregate level. The median correlation used to measure aggregate reliability was equal to .97. However, belief reliability at the individual level was much lower. At the respondent level average test-retest correlations ranged from .42 to .61 across the five stimulus objects. Thus, the average individual test-retest reliability was almost one-half the reliability observed at the aggregate level.

FIGURE 1

THE RELATIONSHIP BETWEEN TEST-RETEST RELIABILITY AND THE ROOT-MEAN-SQUARE (RMS) OF THE TEST-RETEST MEASURES OF CONSUMER BELIEFS

A difference such as this is not uncommon in comparing individual and aggregate phenomena. For example, very static brand share behavior observed at the aggregate level of the marketplace can be contrasted with considerable brand switching at the individual level (Bass, 1974). Thus, in this study the stability or reproducibility of belief measurements at the aggregate level was not due to slight variations in response, but the result of a great deal of stochastic response and averaging which produced approximately the same measure of central tendency in both measurements.

Measures of coefficient alpha were also computed from beliefs associated with each of the five stimulus objects. In this case reliability as a measure of internal consistency varied from .56 to .65. Measures of reliability inferred from isolating alternative sources of variance in measured beliefs attributed to scale effects, individual differences, and error ranged from .62 to .71. In this study median measures of internal consistency (coefficient alpha = .64; ANOVA = .66) were significantly greater (p < .05) than a median measure of test-retest reliability (stability = .49). However, without a third measurement to isolate actual changes in beliefs over time (Heise, 1969), we can only suggest that measures of stability were lower because of some change in beliefs and perhaps the problem of reactivity (Campbell and Stanley, 1963).

Since the reliability of measured beliefs reported in this study appears low, one might argue that beliefs toward department stores are more stochastic in nature than beliefs toward brands of a familiar product class. This is not likely. Holmes (1974) utilized the same test-retest methodology to measure the stability of beliefs toward six familiar brands of beer and reported an overall stability equal to .56. In this study median stability was equal to .49. Furthermore, other measures of reliability converged on the same magnitudes of reliability. Coefficient alpha and the analysis of variance approach produced median measures of reliability equal to .64 and .66, respectively. Recognizing Holmes' measure of belief reliability and the three alternative measures of reliability presented in this study, there is some validity to the inference that the reliability of measured beliefs observed in this study is approximately .60.

If .60 is a reasonable estimate of reliability for measured beliefs, the question then becomes: is this level of reliability acceptable? If not, what is an acceptable level of reliability for measured beliefs? To answer the first question we need to refer to Figure 1. As shown, a reliability of .60 corresponds to a level of stochastic response which has a root-mean-square or standard deviation of approximately 1.15. That means that on a 6-interval scale one would have to consider a range of approximately ±2.3 scale values to be 95% confident that a true belief lies in that interval. Not many consumer researchers or policy makers could accept that level of individual variability. Therefore, in response to the second question, a more plausible range would be ±1 scale value, which translates into a root-mean-square or standard deviation of approximately .5 and a belief reliability between .80 and .90. That is, at a reliability of approximately .85 the true measurement can be viewed with 95% confidence to lie in an interval ±1 scale value from the observed belief.
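
The interval arithmetic in this paragraph can be written out explicitly. The sketch assumes approximately normal response error and a multiplier of about 2 (1.96 rounded) for 95% confidence; the symbol $\sigma$ for the root-mean-square is introduced here for illustration only.

```latex
\begin{align*}
\text{reliability} \approx .60:&\quad \sigma \approx 1.15, \quad 2\sigma \approx 2 \times 1.15 = 2.3 \text{ scale values} \\
\text{reliability} \approx .85:&\quad \sigma \approx 0.5, \quad 2\sigma \approx 2 \times 0.5 = 1.0 \text{ scale value}
\end{align*}
```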

The importance of reliability is easily overlooked. Without reliable measurement, however, the validity of many relationships and consumer-based decisions would have to be questioned. For example, in a familiar bivariate relationship, how can the predictive accuracy of the multi-attribute attitude model be determined if the reliability of belief measurements is .60? Furthermore, in a multivariate analysis, what meaning can be given to a cluster of factor scores that were derived from a factor analysis of measured beliefs whose reliability was equal to .60? Whether the analysis is bivariate or multivariate, large measurement errors in the criterion or predictor variable will result in unstable parameter estimates and a meaningless relationship.
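
One way to make this concern concrete is the classical attenuation relationship from test theory. It is not part of the original analysis and is offered only as an illustrative sketch under the assumption of uncorrelated measurement errors; $\rho_{xy}$ denotes the correlation between error-free measures, and $r_{xx}$ and $r_{yy}$ denote the reliabilities of the predictor and the criterion.

```latex
r_{xy} = \rho_{xy}\sqrt{r_{xx}\, r_{yy}}
\qquad\Rightarrow\qquad
r_{xx} = .60,\; r_{yy} = 1.0:\quad r_{xy} = \rho_{xy}\sqrt{.60} \approx .77\,\rho_{xy}
```

Under these assumptions, beliefs measured with a reliability of .60 could show at most about three-quarters of the true correlation, even against a perfectly measured criterion.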

Reliability is a necessary but not sufficient condition for good consumer research. There is no guarantee that high reliability will produce good research findings, but there cannot be good scientific results without reliability. Therefore, consumer researchers should report levels of reliability as a matter of scientific procedure. Though retest measurements are often not available to measure stability, measures of internal consistency can be assessed using coefficient alpha or analysis of variance to infer the reliability of measured beliefs.

REFERENCES

O.T. Ahtola, "Toward a Vector Model of Intentions," in B.B. Anderson, ed., Advances in Consumer Research, 3 (1975), 481-84.

F.M. Bass, "The Theory of Stochastic Preference and Brand Switching," Journal of Marketing Research, 11 (February, 1974), 1-20.

G.W. Bohrnstedt, "Reliability and Validity Assessment in Attitude Measurement," in Gene F. Summers, ed., Attitude Measurement (Chicago: Rand McNally and Co., 1970), 80-99.

D.T. Campbell and J.C. Stanley, Experimental and Quasi-Experimental Designs for Research (Chicago: Rand McNally and Co., 1963), 9.

L.J. Cronbach, "Coefficient Alpha and the Internal Structure of Tests," Psychometrika, 16 (1951), 297-334.

L.L. Golden, "Consumer Reactions to Comparative Advertising," in B.B. Anderson, ed., Advances in Consumer Research, 3 (1975), 63-67.

D.R. Heise, "Separating Reliability and Stability in Test-Retest Correlations," American Sociological Review, 34 (1969), 93-101.

C. Hoyt, "Test Reliability Estimated by Analysis of Variance," Psychometrika, 6 (1941), 153-60.

C. Holmes, "A Statistical Evaluation of Rating Scales," Journal of the Market Research Society, 16 (April, 1974), 87-107.

J. Jacoby, "Consumer Research: Telling It Like It Is," in B.B. Anderson, ed., Advances in Consumer Research, 3 (1975), 1-11.

F.N. Kerlinger, Foundations of Behavioral Research (New York: Holt, Rinehart and Winston, Inc., 1973), 445-52.

R.J. Lutz, "Conceptual and Operational Issues in the Extended Fishbein Model," in B.B. Anderson, ed., Advances in Consumer Research, 3 (1975), 469-76.

R. Mizerski, "An Investigation into the Differential Effects of Causally Simple and Complex Attributions," in B.B. Anderson, ed., Advances in Consumer Research, 3 (1975), 176-83.

J.J. Painter and K.L. Granzin, "Profiling the Male Fashion Innovator--Another Step," in B.B. Anderson, ed., Advances in Consumer Research, 3 (1975), 40-45.

M.J. Ryan and M.J. Etzel, "The Nature of Salient Outcomes and Referents in the Extended Model," in B.B. Anderson, ed., Advances in Consumer Research, 3 (1975), 1-11.

D.S. Tull and G.S. Albaum, Survey Research: A Decisional Approach (New York: Intext Educational Publishers, 1973), 96.

R.D. Wilson, "An Empirical Evaluation of Comparative Advertising Messages: Subjects' Responses on Perceptual Dimensions," in B.B. Anderson, ed., Advances in Consumer Research, 3 (1975), 1-11.

B.J. Winer, Statistical Principles in Experimental Design (New York: McGraw-Hill Book Co., 1971), 283-97.
