Citation:
Dan E. Schendel, William L. Wilkie, and John M. McCann (1971) ,"", in SV - Proceedings of the Second Annual Conference of the Association for Consumer Research, eds. David M. Gardner, College Park, MD : Association for Consumer Research, Pages: 404-416.
[Associate Professor of Industrial Administration, Assistant Professor of Industrial Administration, and Doctoral Candidate, respectively.]

THE PROBLEM

This research is concerned with potential problems involved in measuring the "importance" of various attributes of an attitude object. ["Importance" should be understood here to refer simply to some desire or need for the presence of a particular attribute in a product. Questions as to which of a set of attributes are most crucial for a purchase decision are more complex issues ("saliency" or "determinism" [See 6,7]) which might use "importance" measurement as one input.]

The measurement of attribute importance is a critical step in at least two major thrusts of recent research in consumer behavior: studies of attitude structure (e.g., applications of a Fishbein-type model [See 1,2,3,5,9]) and studies of the product stream of market segmentation [See 4,10]. In studies of attitude structure this measure is used to individualize the weights of various attributes and thus theoretically improve the raw evaluative measures obtained from the same respondent. In segmentation research these importance measures are used (after certain refinements) to aggregate respondents into relatively homogeneous segments with respect to product benefits most sought; these segments are then tested for differences in purchase behavior, brand predispositions, enduring characteristics, and susceptibility to different advertising appeals.

The importance of this measure is clear, yet little empirical study of possible biases inherent in commonly used instruments has been reported. Because importance weights are typically used as a basis for further analysis (not as end measures), results from a single paper-and-pencil instrument are particularly difficult to evaluate. There are, however, two major questions which should be considered in utilizing these weights:

1. Are the "attributes" themselves sufficiently meaningful and inclusive?
This can be termed the "attribute generation problem."
2. Are the importance scores obtained for these attributes "true scores" in the sense of adequately reflecting the strength of desire for any given attribute by the respondent? This can be termed the "attribute measurement problem."

This paper presents findings related to the second of these problems (i.e., measurement). It will be assumed that a relatively complete and unambiguous set of product characteristics has been presented to the respondent.

Typical methods for obtaining measures of attribute importance include a dichotomous scale (Important - Not Important) for each attribute, rank ordering of the attributes, gradient scales (e.g., 1-6) for each attribute, and point assignments from a common sum for each attribute. Levels of measurement desired may thus range from nominal through interval scales. The particular method to be chosen for a given study should be a function of both the ability of the respondent to make the level of judgment required and of the levels of analysis to follow these measurements. Market surveys attempting simply to delimit those product characteristics desired by most consumers will not require the detail demanded by models testing attitudinal structure, for example, and may be completed faster and more easily by the respondent. Both the survey researcher and the theoretician are implicitly assuming, however, that results would not differ if a different instrument were used.

This study investigates the extent to which this assumption of "true" measurement might hold for two frequently purchased consumer products. Five instruments combining four levels of measurement over a number of attributes are examined for the kind of stability which would give support to the practice of arbitrarily choosing one instrument upon which to base an entire study. A short discussion of research methodology will be followed by presentation of results and several key conclusions arising from the study.
METHODOLOGY

The study began by considering issues in the choice of product classes to study. A literature search was followed by a series of open-ended discussions with small groups selected to be representative of the population to be studied. These discussions were taped and the records content-analyzed to provide both a basis for decision as to product classes and a start toward generation of attributes for these classes. Shampoos and deodorants were selected as the products of interest, and a number of attributes for each class were isolated (17 for deodorants and 20 for shampoos). [See Tables 3a and 3b for lists of these attributes.] These attributes were pretested for clarity, then run with 250 college juniors and seniors.

Five forms of a questionnaire for each product class were randomly assigned to the 250 respondents, 50 per form. Each instrument included one or more of the measurement methods discussed above. Table 1 summarizes the measures contained in each instrument.

TABLE 1: SUMMARY OF MEASURES BY INSTRUMENT

Version 1 used a six-point scale with which the respondent rated each attribute independently as to its importance in making a purchase decision. Version 2 required that each attribute be rank ordered as to importance. In this version the determination of importance is not made independently for each attribute. Version 3 had the respondent allocate 100 points to the set of attributes. Versions 4 and 5 also required the respondent to allocate 100 points, but only after the respondent had first performed another task. Version 4 used a Yes/No importance rating for each attribute, while Version 5 used essentially the Version 1 task (1-6 scale).

There are several natural comparisons of results. Versions 1 and 5a, both using the 1-6 scale, are comparable except for the anticipation of further work in Version 5a. Similarly, Versions 3, 4b, and 5b are similar in point allocations, although 4b and 5b follow performance of another task.
As expected, time to complete the questionnaire for both shampoo and deodorant was least for Version 1 (about 6 minutes) and increased monotonically to Version 5 (about 11 minutes). Little difficulty was evidenced for Versions 1 and 2, whereas the point assignments required by Versions 3, 4, and 5 did lead to errors in addition or failure to complete the task by about 10% of those sampled. Additional respondents were used to increase the sample sizes for these versions to a level comparable to Versions 1 and 2.

A broad statement of the question underlying this work is: Do measures of "attribute importance" obtained from the several questionnaire versions differ? There are several dimensions to this overall question that are worth consideration. Among these dimensions are: levels of measurement, individual versus aggregate results, task differences, and stresses on specific attributes versus patterns across attributes.

As noted earlier, measures obtained in this study range from nominal (Version 4a - Yes/No) to interval (Versions 3, 4b, and 5b). To compare results across these levels of measurement requires that higher, interval levels of measurement be diluted in order to test against weaker, ordinal levels. Tests usable with ordinal level data are required for this purpose.

There are various uses to which information of the type gathered here would be put. Some uses require aggregation of results, as when individual attributes are of major concern, while other uses require that individual respondent vectors be weighted differentially. For this reason it is important that both individual and aggregated results be tested for differences.

Each questionnaire version requires the respondent to perform a somewhat different task. These task differences are viewed here as different contexts under which similar measures are to be generated. Versions 3, 4, and 5, for example, all yield point allocations, but differ in the tasks completed prior to the point assignment.
These task differences may result in differences in outcomes.

Finally, if differences emerge it would be useful to trace such differences, if possible, to specific attributes or to patterns of attribute evaluations made by respondents. This study will attempt to treat the issue of whether observed differences are due to comprehensive tendencies or to relatively few key attributes.

The next section gives the results obtained from the study and moves across these four major dimensions of the problem. The results are generally organized around levels of measurement, proceeding from lower levels to higher levels. Within each level, results on aggregation, task differences, and attribute patterns are presented as appropriate. Finally, some overall results based on factor analysis across all questionnaire versions are given.

RESULTS

Number of Important Attributes

Every one of the attributes for both product classes was termed "important" to some degree by at least a few respondents in the total sample. By defining "important" as 2 or greater on the gradient scale of Versions 1 and 5a, as Yes on Version 4a, and as one or more points on Versions 3, 4b, and 5b, the average number of "important" attributes per respondent was 9.5 for deodorants and 10.7 for shampoo. Thus, in both product classes, somewhat more than half the possible number of attributes were termed important.

One hypothesis of the study was that the results obtained from different measures and different tasks or contexts would yield differences in the number of attributes called "important." Tests of this question are presented in Table 2.

TABLE 2: VERSION DIFFERENCES: NUMBER OF ATTRIBUTES GIVEN POSITIVE IMPORTANCE SCORES

The 1-6 gradient scale (Version 1), compared to the Yes-No measure of importance (Version 4a), tends to shift answers from the "No" they would be given in Version 4a to some low level of Yes.
The 1-6 gradient scale may therefore be able to achieve a finer distinction as to importance and apparently leads to a greater number of "important" attributes.

Within the point allocation versions, where the tasks or contexts varied across Versions 3, 4, and 5, it appears that the presence of a prior task affects the number of attributes receiving points. The nature of this effect seems to differ between the two product classes. Although it is not clear that Version 4 (with its prior Yes-No task) differs from Version 3 for the two products studied, it does appear that Version 5, with a 1-6 gradient scale task prior to point allocations, does lead to a greater number of attributes given importance points.

Ranking of Importance

The rank order of importance can be obtained by aggregating responses to each version in a manner consistent with the level of measurement achieved. These aggregated rank orders of attributes are presented in Table 3 (a and b). It can be seen that the seven measures yield very similar results, with Kendall W = 0.88 for shampoo and 0.96 for deodorant. Extreme ranks, those attributes of greatest or least importance, are especially stable.

TABLE 3a: RANK ORDER OF ATTRIBUTE IMPORTANCE AGGREGATED BY VERSION - SHAMPOO

TABLE 3b: RANK ORDER OF ATTRIBUTE IMPORTANCE AGGREGATED BY VERSION - DEODORANT

This high agreement of ranks across versions is interesting in that it appears that the researcher interested only in aggregate rankings of importance can obtain these data from any of these methods, and can thus use the simplest and fastest of them (Yes-No or 1-6).

Interval or Ordinal Data?

It is usually the case, however, that the researcher will desire at least an interval level of measurement as input to further analysis. Typical uses of the Fishbein model, vector representations, factor analysis, and other multivariate models commonly assume at least interval data, and these data are often obtained from some variant of the 1-6 rating scale.
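The agreement statistic reported above for the aggregated rank orders, Kendall's W (coefficient of concordance), can be illustrated with a minimal sketch. The function name and the rankings below are hypothetical and are not data from this study; the formula is the standard one without a correction for tied ranks, so it assumes each measure supplies a complete ranking 1..n of the same n attributes.

```python
def kendalls_w(rankings):
    """Kendall's coefficient of concordance for m complete rankings of n items.

    rankings: list of m lists, each holding the ranks 1..n (no ties assumed).
    Returns a value between 0 (no agreement) and 1 (identical rankings).
    """
    m = len(rankings)        # number of measures (e.g., questionnaire versions)
    n = len(rankings[0])     # number of attributes ranked
    rank_sums = [sum(r[i] for r in rankings) for i in range(n)]
    mean_rank_sum = m * (n + 1) / 2
    s = sum((rs - mean_rank_sum) ** 2 for rs in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Hypothetical aggregated rank orders of four attributes from three measures.
example_rankings = [
    [1, 2, 3, 4],
    [1, 2, 4, 3],
    [2, 1, 3, 4],
]
print(round(kendalls_w(example_rankings), 3))  # prints 0.822
```

A value near 1, as with the 0.88 and 0.96 reported in the text, indicates that the measures order the attributes almost identically at the aggregate level.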
Analysis of Version 5 data can provide evidence regarding the scaling power of the 1-6 scale. Each attribute was first rated on the 1-6 scale (Version 5a), then given some number of points (Version 5b) by each respondent. Assuming that Version 5b's point assignments lead to an interval scale, these points can be related to their 1-6 ratings to reconstruct a point scale underlying the 1-6 assignments. The results of this analysis are presented in Table 4. It can be seen that the mean intervals associated with the 1-6 ratings are not at all similar, but instead increase dramatically for high importance terms. An analysis of individual responses, moreover, showed that not one of the fifty respondents exhibited an interval scale for either product. Although about ten percent came close, these were more than balanced by some 20% of respondents who were not even monotonic in their point assignments.

TABLE 4: POINTS ASSOCIATED WITH 1-6 SCALE (ACROSS ATTRIBUTES - FROM VERSION 5)

It seems, then, that the desirable characteristics of the 1-6 scale used in this study do not include interval measurement. In order to obtain interval measurement, it would be necessary either to perform a prior scaling study of the sort described by Myers and Warner [8], transform weights through a concurrent scaling study, or move to some version of point assignments.

Point Allocations

As it is often desirable to utilize a device which leads directly to an interval scale, point allocations have appeal. However, no single criterion exists for the evaluation of this level of measurement. For the purposes here, the most straightforward evaluation method seemed to be to ask for point allocation with no prior task (Version 3). This version forms a baseline against which two variants with prior tasks could be evaluated: Version 4, which uses a Yes-No judgment prior to point awards, and Version 5, which asks for a 1-6 rating.
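The Version 5 analysis described earlier in this section, relating each attribute's points to its 1-6 rating, can be sketched for a single respondent as follows. This is a minimal illustration under stated assumptions: the function names and the (rating, points) pairs are hypothetical, and the check shown is only one plausible reading of "monotonic" (mean points never decrease as the rating rises).

```python
from collections import defaultdict


def mean_points_by_rating(pairs):
    """Average the points given to attributes at each 1-6 rating level.

    pairs: iterable of (rating, points), one pair per attribute.
    """
    grouped = defaultdict(list)
    for rating, points in pairs:
        grouped[rating].append(points)
    return {r: sum(p) / len(p) for r, p in sorted(grouped.items())}


def successive_intervals(means):
    """Differences in mean points between adjacent rating levels in use."""
    levels = sorted(means)
    return [means[hi] - means[lo] for lo, hi in zip(levels, levels[1:])]


def is_monotonic(means):
    """True if mean points never fall as the rating increases."""
    return all(step >= 0 for step in successive_intervals(means))


# Hypothetical respondent: eight attributes, 100 points allocated in total.
respondent = [(1, 0), (2, 0), (2, 2), (3, 4), (4, 6), (5, 12), (6, 30), (6, 46)]
means = mean_points_by_rating(respondent)
print(means)                        # mean points at each rating level
print(successive_intervals(means))  # equal steps would suggest an interval scale
print(is_monotonic(means))
```

Under an interval interpretation of the 1-6 scale the successive intervals would be roughly equal; in this made-up respondent, as in the pattern the text reports, the steps widen sharply at the high-importance end.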
As would be expected from the results presented in Table 3, there is no major effect on the rank order of means obtained from Versions 3, 4, and 5. The Kendall W between the three measures is 0.93 for shampoo and 0.97 for deodorant.

Table 5 summarizes both F- and t-tests by attribute for all three versions of the point assignments. F-tests for differences of means among all three versions are significant for only 1 attribute for deodorants and 3 attributes for shampoo. Paired versions and corresponding t-tests similarly showed few significant differences. When the shampoo attributes were tested pairwise across all three versions for mean differences, only 4 out of a possible 60 pairs were significantly different at the .10 level. Similarly, only 7 out of a possible 51 pairs of deodorant attributes were significantly different. It appears that the presence, as well as the nature, of a prior task does not have a great effect on the mean importance scores generated for each attribute.

As pointed out earlier and as indicated in Table 1, the three versions do differ in the number of attributes given one or more points. The differences in mean scores are slight, as just discussed. There is, however, for some attributes considerable difference in the variance surrounding mean scores of the attributes. Table 5 shows that a statistically significant difference in variance exists for both shampoo and deodorant attributes. On a pairwise comparison basis, over 50 percent of the variances were significantly different. Further analyses were made to determine patterns and possible causes of these differences in variances. However, they appear to occur without discernible pattern and are not correlated with the characteristics of the attribute or its corresponding importance weight.

TABLE 5: NUMBER OF ATTRIBUTES WITH SIGNIFICANT DIFFERENCES IN MEANS AND VARIANCES

There is sufficient evidence to suggest that the several questionnaire versions do yield different results.
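The attribute-by-attribute comparisons summarized in Table 5 can be illustrated with a short sketch that computes, for one attribute, the one-way F statistic for differences in mean points across the three point-allocation versions and a variance ratio for a pair of versions. The function names and the point values are hypothetical, not data from the study, and the sketch stops at the test statistics: judging significance would still require comparing them with the appropriate F distribution, which is not done here.

```python
from statistics import mean, variance


def anova_f(groups):
    """One-way ANOVA F statistic for a list of samples (one per version)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((len(g) - 1) * variance(g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def variance_ratio(sample_a, sample_b):
    """Ratio of the larger to the smaller sample variance (always >= 1)."""
    var_a, var_b = variance(sample_a), variance(sample_b)
    return max(var_a, var_b) / min(var_a, var_b)


# Hypothetical points given to one attribute by respondents in each version.
version_3 = [1, 2, 3]
version_4b = [2, 3, 4]
version_5b = [3, 4, 5]

print(anova_f([version_3, version_4b, version_5b]))  # prints 3.0
print(variance_ratio(version_3, [0, 3, 6]))          # prints 9.0
```

Repeating both calculations for every attribute would give the kind of tally the table reports: few attributes with differing means, but many with differing variances.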
Piecemeal comparisons do not point to an overwhelming and easily recognized pattern. Hence, some means of examining pattern differences, as opposed to specific attribute differences, among the several versions seemed desirable. Two approaches were used. The first was to use a multiple discriminant analysis of point vectors for the three versions. [Discriminant analysis is here used to test whether patterns of points across all attributes can be used to correctly identify the questionnaire version leading to that pattern.] The second, which included Versions 1 and 2 as well, was to perform a factor analysis in an attempt to reduce the number of independent dimensions to be considered. In both cases, patterns are of interest, which leads to a somewhat non-standard use of both of these techniques.

The multiple discriminant analysis correctly classified about 50% of the respondents as to the three versions (49% for shampoo and 55% for deodorants). This can be compared to a hit-or-miss chance level of 34%, and is significant beyond 0.001. Although the classificatory power would be expected to shrink somewhat using a holdout sample, the strength of this result does suggest that pattern-wise there are differences among the versions.

Given the contradictory evidence over the several versions studied, it seemed further useful to see whether whatever differences exist would be detected by a technique that might commonly be used to reduce the larger set of attributes to a smaller number of dimensions which are presumed independent of one another. The data obtained from the five versions were submitted to factor analysis. [The method used was principal component factor analysis with varimax orthogonal rotation. Communalities were estimated by the squared multiple correlation coefficient. Only those factors associated with an eigenvalue greater than or equal to 1.0 were subjected to rotation.]
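The factor-retention rule in the note above (keep only factors with an eigenvalue of at least 1.0) and the resulting percent of explained variance can be illustrated with a minimal sketch. Several things here are assumptions rather than the authors' procedure: the correlation matrix is hypothetical, the eigenvalues come from a plain Jacobi rotation routine chosen only to keep the example self-contained, unities are left on the diagonal instead of squared multiple correlations, and the varimax rotation step is omitted.

```python
import math


def symmetric_eigenvalues(matrix, max_sweeps=50, tol=1e-12):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]  # work on a copy
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:               # off-diagonal mass is negligible: done
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle that zeroes the (p, q) element.
                theta = 0.5 * math.atan2(2 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # update columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # update rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def retained_factor_summary(corr):
    """Number of eigenvalues >= 1.0 and the percent of variance they explain."""
    eigenvalues = symmetric_eigenvalues(corr)
    retained = [ev for ev in eigenvalues if ev >= 1.0]
    percent_explained = 100.0 * sum(retained) / len(corr)
    return len(retained), percent_explained


# Hypothetical correlations among three attributes.
corr = [
    [1.0, 0.5, 0.5],
    [0.5, 1.0, 0.5],
    [0.5, 0.5, 1.0],
]
n_factors, pct = retained_factor_summary(corr)
print(n_factors, round(pct, 1))  # prints 1 66.7
```

Running the same summary separately on each version's correlation matrix yields the two quantities compared across versions: the number of factors retained and the percent of variance explained.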
The results of this analysis were examined for agreement between versions in terms of the number of factors obtained, the factor loadings that resulted, and similar measures appropriate to factor analytic methodology.

In the most general sense it might be expected that each of the 6 versions would yield a similar number of factors and similar amounts of explained variance. (Version 5 was analyzed as 5a and 5b. Recall also that Versions 1 and 5a, and Versions 3, 4b, and 5b, were of the same type of measurement.) Table 6a indicates that, for shampoo, the number of factors ranged from 5 (Versions 1 and 5a) to 8 (Versions 3, 4b, and 5b), with Percent Explained Variance ranging from 45% (Versions 1 and 5a) to 71% (Versions 3, 4b, and 5b). While the range was fairly great across versions, there was general overall agreement for similar measures.

Results for the versions across deodorants were less stable. The number of factors ranged from 3 to 7 and the Percent Explained Variance from 36% to 69%. Versions 1 and 5a displayed 4 and 3 factors respectively, with only 51 and 36 percent explained variance. Versions 3, 4b, and 5b were much less stable than for shampoo, with 6, 7, and 4 factors respectively and 67%, 69%, and 41% explained variance.

Examination of the individual factor loadings indicates much more strongly the differences between the six versions. It is rarely an easy task to interpret, summarize, and communicate factor analysis results. The lack of agreement among factor loadings can be seen by examining the number of times attributes were similarly paired by two or more questionnaire versions under identical factor analytic conditions. Little agreement occurred, as Table 6b indicates. The maximum agreement, i.e., all six versions, did not occur for a single pairing for either shampoo or deodorant, nor did any five agree. Only twice were attributes paired by four different versions for deodorants, and not at all for shampoo.
Agreement by three versions occurred only four times for both shampoo and deodorant. Agreement by two versions occurred 11 and 18 times for shampoo and deodorant respectively. Sixty-one and 43 different pairs for shampoo and deodorant were brought out singly by the six different versions. Attempts to name the factors proved difficult and showed little commonality across versions.

While this is a very rough way to summarize the results, the general lack of agreement would indicate that the factors developed by each version of the questionnaire were strikingly different. As before, it is difficult to pinpoint the cause, but it does appear that how importance of attributes is measured seems to make some difference in the results to be achieved.

TABLE 6a: SUMMARY OF FACTOR ANALYSIS RESULTS FOR QUESTIONNAIRE VERSIONS

TABLE 6b: PAIR GROUPS OF FACTOR LOADINGS BY QUESTIONNAIRE VERSION

CONCLUSIONS

No single judgment can be made as to the results obtained in this study, but some findings of potential value for future research have emerged:

1. Aggregated rank orders of attribute importance appear quite stable across versions. Simple "yes-no" or "1-6" judgments work as well as any of the more difficult tasks.

2. Use of an arbitrary dichotomy for importance yields differences across versions. The 1-6 gradient scale seems to generate finer distinctions when used alone and also when used as a warm-up for point allocations. It also offers a combination of both independent and relative ratings of importance and was easily completed by the respondents.

3. The 1-6 scale falls short of interval measurement, however, and its measures should only be viewed as ordinal. If interval data are required, either a separate scaling study should be run in conjunction with the 1-6 scale, or a shift to some version of point allocation should be undertaken.

4. Evidence from three versions of point allocations is mixed. The three versions yield similar mean points for almost all attributes.
The few attributes that differ by version are difficult to characterize. Variances about the attribute means show many more significant differences across versions, but again no systematic rationale can be seen to account for these differences. It is at this point that the stability found on broader levels begins to break down. It appears that respondents may have approached their tasks in comprehensively different fashions, which may lead to instability of patterned responses.

5. The possibility of pattern instability is further confirmed by results of multiple discriminant analysis and factor analysis. These findings clearly showed differences in responses across the versions studied, and raised the question of how adequately a single measuring device really represents the construct it is purporting to study.

In conclusion, the purpose of this study was to provide evidence concerning the problem of measuring attribute importance. Expectations were that results would be straightforward; importance scores across versions would be either consistent and interpretable or inconsistent and interpretable. Actual results, of course, were somewhat consistent, somewhat inconsistent, and generally noninterpretable.

The results are, however, suggestive of follow-up research efforts. Stress should be given to measurement intended for higher-level analysis, since considerable stability was achieved on simple importance rankings measured by any version. With respect to the higher-level analysis, this study indicates that results are sensitive to differences in response method, but they do not provide an answer as to whether the differences are due to unreliability or to a systematic bias. Reliability can be tested through a longitudinal design.
Validity is particularly troublesome due to the absence of even a clear predictive criterion, since the measures are now used as intermediate inputs for testing models (of unknown validity) which are aimed either at inference of attitude structure or prediction of behavior. It would seem that two approaches to the validation issue are possible. A tangential method would involve sensitivity analysis of the models to determine how "true" the measurement needs to be. If results show little sensitivity, the problem can be dispensed with. If not, a creative search for a specific criterion related to the particular model must be undertaken.

REFERENCES

1. Cohen, Joel B., and Michael J. Houston, "Some Alternatives to a Five-Point Likert Scale (Especially if You Have a Purpose in Mind)," paper presented to the Workshop on Attitude Research and Consumer Behavior, University of Illinois, December 4, 1970.
2. Day, George S., "Attitude Structures: Prospects for Theory-Oriented Measurement," paper presented to the Workshop on Attitude Research and Consumer Behavior, University of Illinois, December 4, 1970.
3. Fishbein, Martin, "The Relationships Between Attitudes and Behaviors," paper presented to the Workshop on Attitude Research and Consumer Behavior, University of Illinois, December 4, 1970.
4. Haley, Russell I., "Benefit Segmentation: A Decision-Oriented Research Tool," Journal of Marketing, 32 (July 1968), pp. 30-35.
5. Hughes, G. David, "Distinguishing Salience and Valence," paper presented to the Workshop on Attitude Research and Consumer Behavior, University of Illinois, December 4, 1970.
6. Miller, G. A., "The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information," Psychological Review, 63 (1956), pp. 81-97.
7. Myers, James H., and Mark I. Alpert, "Determinant Buying Attitudes: Meaning and Measurement," Journal of Marketing, 32 (October 1968), pp. 13-20.
8. Myers, James H., and Gregory W. Warner, "Semantic Properties of Selected Evaluation Adjectives," Journal of Marketing Research, 5 (November 1968), pp. 409-412.
9. Talarzyk, W. Wayne, and Reza Moinpour, "Comparison of an Attitude Model and Coombsian Unfolding Analysis for the Prediction of Individual Brand Preference," paper presented to the Workshop on Attitude Research and Consumer Behavior, University of Illinois, December 4, 1970.
10. Wilkie, William L., An Empirical Analysis of Alternative Bases of Market Segmentation, unpublished doctoral dissertation, Stanford University, 1970.
Authors
Dan E. Schendel, Krannert Graduate School of Industrial Administration, Purdue University, Lafayette, Indiana
William L. Wilkie, Krannert Graduate School of Industrial Administration, Purdue University, Lafayette, Indiana
John M. McCann, Krannert Graduate School of Industrial Administration, Purdue University, Lafayette, Indiana
Volume
SV - Proceedings of the Second Annual Conference of the Association for Consumer Research | 1971