Nonmetric Unidimensional Scaling of Consumer Preferences For Proposed Product Designs

Murphy A. Sewall, State University of New York at Albany
ABSTRACT - Multidimensional scaling is an increasingly popular device for analyzing consumer preferences. However, in some instances a one dimensional scale may provide an adequate basis for decision-making. This paper describes a procedure for unidimensional scaling of rating data and presents empirical confirmation of validity based upon an independent data set.
[ to cite ]:
Murphy A. Sewall (1978), "Nonmetric Unidimensional Scaling of Consumer Preferences For Proposed Product Designs", in NA - Advances in Consumer Research Volume 05, eds. Kent Hunt, Ann Arbor, MI: Association for Consumer Research, Pages: 22-25.

Advances in Consumer Research Volume 5, 1978      Pages 22-25


[This project was supported by a Fortune 500 firm as part of a SUNY Albany MBA field project. The sponsor wishes to remain anonymous for proprietary reasons.]


INTRODUCTION

Recent years have seen the development and application of increasingly sophisticated multidimensional scaling techniques for examining consumer perceptions and preferences (Green and Rao, 1972). In some cases, examining two or more components, or dimensions, may be necessary for understanding the mental process underlying relative perceptions and preferences. However, two arguments can be made favoring consideration of the alternative of unidimensional scaling.

From the point of view of the decision-maker, the relevant question is usually: "Which alternative(s) among a set of proposals is most preferred by potential customers?" Such a question is clearly a request for a ranking at minimum, and a ranking that indicates the relative differences between closely ranked proposals if possible. If a scale of only one dimension can fairly represent consumer preferences, such a scale is clearly consistent with the needs of the user (and sponsor) of the research.

Also, a simple explanation is preferable to a complicated one that offers little additional information. Multidimensional solutions are often difficult to interpret, difficult to explain, and hard to understand. Their implications for decision-making may be so difficult to determine that their practical value may be much less than their elegance. While much of the "real world" may be inherently too complex for anything less than a multidimensional solution, it could be a serious mistake to assume, without examination, that more than one dimension is needed to describe or explain a phenomenon.

This paper presents results of a survey of consumer reactions to 36 proposed bed linen designs. The analysis indicates that a nonmetric unidimensional scaling procedure does provide an adequate representation of relative group preference for this particular problem. The scale predicts the relative preferences of an independent ("holdout") sample of consumers with a high degree of accuracy.

THE PROBLEM

The 36 proposed designs evaluated in this study were supplied by a major seller of bed coverings (and a variety of other consumer products). The competition for the 650-700 million dollar market for bed linens has grown increasingly intense. Over thirty name fashion designers are currently producing patterns for competing marketers of these products. In spite of the proliferation of designs, generic demand for some items in the line is actually declining; for example, approximately three quarters of sheets are sold at sale prices (Reif, 1977).

Selecting which patterns to introduce from among those proposed by designers has been an important problem for product managers ever since the advent of "designer" bed linen a decade ago. The increasing intensity in today's market places an even greater premium on preference information obtained from pre-tests of proposed designs on consumers.

The number of new patterns proposed for each selling season is typically twenty or more. Management would like to evaluate as many choices as possible since the number of potentially strong entries identified is believed to increase with the number of ideas examined. A researcher would prefer to keep the number of alternatives small in order to keep the survey task manageable. Requesting too many responses from individual consumers is likely to increase the frequencies of refusals to participate, refusals to provide complete responses, and rote responses which do not reflect the true preferences of the subjects.

SURVEY DESIGN

Most preference scaling techniques are based on expressions of subjects' preferences for one of each possible pairing of all alternatives. Unfortunately, the number of pairs to be evaluated, n(n-1)/2 for n alternatives, grows roughly as the square of the number of alternatives.

It is unreasonable to expect consumers to respond to the large number of paired comparisons that would be required to evaluate the number of bed linen designs that management wishes to consider. Hence, it is necessary to utilize a data collection technique which requires far fewer responses from each subject. A rating scale approach was selected because only one response per pattern is required, and rating also presents respondents with a readily comprehensible evaluation task.
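The size of the response burden can be made concrete with a short calculation. The following is an illustrative sketch only (the function name is hypothetical and not part of the original study); it compares the number of direct paired comparisons with the number of ratings needed for the same set of designs.

```python
from math import comb


def n_pairs(n_stimuli: int) -> int:
    """Number of distinct unordered pairs among n_stimuli alternatives."""
    return comb(n_stimuli, 2)


# Compare the two data collection tasks for several set sizes,
# including the 36 designs considered in this study.
for n in (10, 20, 36):
    print(f"{n} designs: {n_pairs(n)} paired comparisons vs {n} ratings")
```

For the 36 designs in this study, a complete paired comparison task would require 630 judgments per subject, against 36 ratings.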

The thirty-six patterns were photographed on a bed, without head board or foot board, against a plain white background and arranged on five display boards, each of which represented a designer "collection." Subjects were first asked to examine each collection as a whole. After they studied all five collections and were given an opportunity to consider their preferences over the entire set of patterns, respondents were asked to rate each pattern on a five point scale:

If this pattern were offered for sale -

1. I would definitely buy.

2. I probably would buy.

3. I might or might not buy.

4. I probably would not buy.

5. I definitely would not buy.

The survey also included a list of demographic questions (age, income of household, and so forth), questions about the number of bedrooms in the subject's home and the style(s) of furniture in each, and other questions about whether various attributes of the linen or the marketing firm were important to their purchase decisions.

Data were obtained by shopper intercept interviews at shopping centers in eight metropolitan areas (two in the Northeast, one in the Southeast, three in the Midwest, and two on the West Coast). A total of 399 interviews were completed (approximately 50 in each city).

DATA ANALYSIS PROCEDURE

The use of rating scales in marketing research is quite common, and has been for some time. However, quantitative analysis of this type of data often involves statistical calculations which are used to draw inferences which are conceptually inappropriate (Adams, Fagot, and Robinson, 1965). The usual errors are the implicit assumptions that:

1. Adjacent response categories are equidistant (that is, the data are interval scaled).

2. These intervals are constant across subjects.

Rating Scale Analysis

The most common approach to analyzing five-point intention to purchase ratings consists of ranking the stimuli in order of the fraction assigning one of the first two (would buy) ratings (Taylor, Houlahan and Gabriel, 1975). This procedure is subject to several criticisms. The calculations discard information by reducing the five-point ratings to a two-point scale. There is an implicit assumption that all subjects mean the same thing by "probably would buy." And, the analysis does not provide a statistical basis for determining how much more desirable a highly rated pattern is over one with a lower rank.
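The "first two ratings" calculation criticized above can be sketched in a few lines. This is an illustration only, using made-up ratings for three hypothetical designs (1 = definitely would buy, 5 = definitely would not buy); the function name is an assumption introduced here.

```python
def top_two_ranking(ratings):
    """Rank stimuli by the fraction of subjects giving a rating of 1 or 2.

    ratings[s][i] is the rating given by subject s to stimulus i
    on the five-point intention-to-purchase scale (1 = most favorable).
    Returns (shares, order), where order lists stimulus indices from
    highest to lowest share.
    """
    n_subjects = len(ratings)
    n_stimuli = len(ratings[0])
    shares = [
        sum(1 for subj in ratings if subj[i] <= 2) / n_subjects
        for i in range(n_stimuli)
    ]
    order = sorted(range(n_stimuli), key=lambda i: shares[i], reverse=True)
    return shares, order


# Hypothetical ratings: 4 subjects by 3 designs.
toy_ratings = [
    [1, 3, 3],
    [2, 1, 5],
    [4, 4, 2],
    [1, 2, 3],
]
shares, order = top_two_ranking(toy_ratings)
print(shares, order)
```

Note that a rating of 3 and a rating of 5 contribute identically to the result, which is the loss of information referred to above.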

One paper that explicitly recognized the problem of assuming too much about rating data (Harris, 1964) is subject to similar criticism for using an arbitrary rule for determining which values should be counted (the top three categories in a 21 point scale which actually produced a forced ranking of the alternatives). The analysis in the study also assumes homogeneity across subjects in their interpretation of the range covered by the scale. Attempts to validate the scaling against a "sales index" for two types of patterns produced widely divergent results. While the explanation that is given for the disparity appears plausible, it is qualitative. The possibility that the one high correlation between the scale and the sales index was due to chance cannot be eliminated.

A procedure has existed for over 25 years for scaling rating data without assuming equal distances between categories (Edwards, 1952). The method of successive intervals does assume that subjects make the same evaluation of each response category (subject to some error in making individual judgments about the assignments of stimuli to categories). That is, scaling by this method assumes, for example, that all responses of "I probably would buy" represent the same probability of purchase for all subjects.

The method proposed in this paper is less restrictive in its assumptions about the original data than the method of successive intervals. It is also easy to use the scale reported below to verify goodness-of-fit to the original data or to predict responses of subjects from a different sample, although it is possible to perform these calculations with the method of successive intervals.

Plans were made to scale the data for this study by the method of successive intervals for comparative purposes. Unfortunately, a limitation of the successive intervals approach aborted these plans. The frequency distributions for four of the thirty-six designs were found to be badly skewed. More than half of the subjects stated that they definitely would not buy these four designs. The method of successive intervals fails if the median occurs in an end point of the rating scale for any stimulus.

The skew in some of the frequency distributions for the pattern ratings is hardly surprising. The patterns are, after all, new product ideas, and the high consumer rejection rate for new product ideas is well known. The fact that skewed rating data are to be expected in this type of consumer research is an obvious indication that metric data analysis techniques are inappropriate.

Derived Paired Comparisons

The approach taken in this study is, in concept, one of restructuring the data into a form that has previously been solved. Specifically, each subject's ratings were converted into a set of pairwise preferences. The notion of a "derived paired comparison matrix" has been considered before. Neidell (1972) recommends derived measures for large stimulus sets. Green and Rao (1972, p. 25) propose derived paired comparisons from multi-attribute rating profiles. Five-point intention to purchase ratings may be viewed as sorting n stimuli into k classes (Rao and Katz, 1971).

As proposed here, derived paired comparisons for each subject are obtained by comparing the two ratings for each possible pair and assuming the subject prefers the design toward which he has expressed the more favorable purchase intention (Reynolds, 1966). Tied ratings are treated as missing data. That is, subsequent calculations of the fraction of respondents expressing a preference for one pattern over another are based only on that portion of the sample that gave a clearly more favorable rating to one pattern. The alternative approach would be to treat ties as a value of one-half for each pattern in a pair (Benson, 1962). Computations were made for both assumptions about ties. Results were equivalent to two decimal places. The missing data approach is favored by the sponsoring management since subjects who expressed a greater degree of distinction between the patterns produce fewer tied comparisons and are, therefore, given greater weight in the resulting group scale. It is assumed that subjects who gave many patterns the same rating are relatively indifferent to which designs are marketed. There is a more sophisticated, but computationally complex, method for treating tied (no preference) data (Draper, Hunter and Tierney, 1969). Application of this procedure was not deemed appropriate, from a cost-benefit point of view, for this stage of the analysis.
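The derivation of paired comparisons from ratings, with both treatments of ties, can be sketched as follows. This is a minimal illustration under stated assumptions, not the program used in the study: the function name, argument names, and toy ratings are hypothetical, and ratings are assumed to be coded so that 1 is the most favorable category.

```python
from itertools import combinations


def derived_preference_matrix(ratings, ties="missing"):
    """Build a derived paired comparison matrix from rating data.

    ratings[s][i] is subject s's rating of stimulus i (1 = most favorable).
    Returns P, where P[i][j] is the fraction of subjects preferring the
    column stimulus j to the row stimulus i (None where undefined).
    ties="missing" drops tied pairs; ties="half" counts one-half for each.
    """
    n = len(ratings[0])
    wins = [[0.0] * n for _ in range(n)]    # wins[i][j]: j preferred to i
    counts = [[0.0] * n for _ in range(n)]  # comparisons actually counted
    for subj in ratings:
        for i, j in combinations(range(n), 2):
            if subj[i] == subj[j]:
                if ties != "half":
                    continue  # tied ratings treated as missing data
                wins[i][j] += 0.5
                wins[j][i] += 0.5
            elif subj[j] < subj[i]:
                wins[i][j] += 1.0  # lower number = more favorable rating
            else:
                wins[j][i] += 1.0
            counts[i][j] += 1.0
            counts[j][i] += 1.0
    return [
        [wins[i][j] / counts[i][j] if i != j and counts[i][j] else None
         for j in range(n)]
        for i in range(n)
    ]


# Hypothetical ratings: 4 subjects by 3 designs.
toy_ratings = [[1, 3, 3], [2, 1, 5], [4, 4, 2], [1, 2, 3]]
P_missing = derived_preference_matrix(toy_ratings, ties="missing")
P_half = derived_preference_matrix(toy_ratings, ties="half")
print(P_missing[0][1], P_half[0][1])
```

Under the missing data treatment, a subject who gives two patterns the same rating simply drops out of the denominator for that pair, which is how subjects who discriminate more among the patterns come to carry greater weight.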

Only two assumptions need to be made about the data in order to justify this formulation of derived paired comparison matrices theoretically:

1. Each subject applied a consistent underlying scale to his evaluations of all patterns presented.

2. This underlying scale is at least ordinal.

The first assumption implies that the subject is consistent with himself (but not necessarily with other subjects). The second assumes that his ratings are consistent with the stated rating categories. Since the subjects studied all the patterns and responded to opinion questions about groupings of them (the five "collections") before rating the patterns individually, it is plausible that they formulated personal scales which remained nearly fixed over the 36 ratings requested.

Unidimensional Scaling

The derived paired comparison preference matrices are summarized (fraction of subjects preferring the column stimulus to the row stimulus for all columns and rows) and subjected to Thurstone scaling (Case V). The theory and method of this procedure were originally published fifty years ago (Thurstone, 1927) and have subsequently been treated in a wide variety of papers and texts (Mosteller, 1951a and 1951b, among many others). Thus, Thurstone V scaling is taken as "previously solved" insofar as mechanics and conceptual justification are concerned.
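For readers unfamiliar with the mechanics, the usual least squares form of Case V scaling can be sketched briefly: each proportion is converted to a unit normal deviate, and the scale value of a stimulus is the mean of the deviates in its column. The code below is an illustration under stated assumptions, not the program used in this study. It assumes a complete proportion matrix, and the clipping constant used to avoid infinite deviates for proportions of 0 or 1 is an arbitrary choice introduced here.

```python
from statistics import NormalDist


def thurstone_case_v(P, clip=0.01):
    """Least squares Thurstone Case V scale values.

    P[i][j] is the fraction preferring column stimulus j to row stimulus i.
    Diagonal entries are ignored. Proportions are clipped to
    [clip, 1 - clip] so that the normal deviate stays finite.
    Returns scale values with mean zero (origin and unit are arbitrary).
    """
    n = len(P)
    inv = NormalDist().inv_cdf
    Z = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                p = min(max(P[i][j], clip), 1.0 - clip)
                Z[i][j] = inv(p)  # unit normal deviate for the proportion
    # Scale value of stimulus j = mean of column j of the deviate matrix.
    return [sum(Z[i][j] for i in range(n)) / n for j in range(n)]


# Check on synthetic data: proportions generated from known scale values
# should return those same values (up to rounding).
true_scale = [-0.5, 0.0, 0.5]
phi = NormalDist().cdf
P = [[phi(sj - si) for sj in true_scale] for si in true_scale]
estimated = thurstone_case_v(P)
print(estimated)
```

Only differences between scale values are meaningful; the zero point is fixed here at the mean of the stimuli purely for convenience.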

The scaling of stated intention to purchase data by this procedure may be viewed as an unusual application by some. A consumer's evaluation of his purchase intention is likely to involve a considerable number of considerations. Hence, one might think that purchase intention is far too complex a concept for Thurstone's procedure. However, at the outset, Thurstone noted that:

"It is not necessary to limit psychological analysis to stimuli which have intensity or magnitude as their principal attributes. For example, a series of handwriting specimens may be arranged in a continuum on the basis of general excellence (Thurstone, 1927, p. 369)."

A judgment about which of two handwriting specimens is "more excellent" does not seem to be altogether different from an evaluation of which of two bed linen designs is more desirable.

Fortunately, a statistical test exists (Mosteller, 1951b) for the adequacy of applying Thurstone V assumptions to a particular set of data. Numerically, the statistic involves a chi-square calculation based on arc sine transforms of the square roots of the predicted and observed percentages of subjects preferring one of each possible pair of stimuli. Thus, an empirical basis for arguing that an application of Thurstone V scaling does not violate the procedure's assumptions is available. This statistical test, and some others, may also be applied to predictions of the fraction of respondents preferring one pattern over another from a sample of subjects other than those used to develop the scale.
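The chi-square calculation just described can be sketched as follows. This is a hedged illustration of the commonly cited form of the test, in which the variance of the arc sine transform (in radians) is approximated by 1/(4n), with degrees of freedom (k-1)(k-2)/2 for k stimuli. The function and argument names are hypothetical, and a single number of judgments per pair is assumed for simplicity; with ties treated as missing data, that number would in practice vary from pair to pair.

```python
from math import asin, sqrt
from statistics import NormalDist


def mosteller_chi_square(P_obs, scale, n_judges):
    """Chi-square comparing observed proportions with Case V predictions.

    P_obs[i][j]: observed fraction preferring column j to row i.
    scale: Case V scale values, used to predict each proportion as
           Phi(scale[j] - scale[i]).
    n_judges: number of judgments behind each observed proportion
              (assumed equal for all pairs in this sketch).
    Returns (chi_square, degrees_of_freedom).
    """
    phi = NormalDist().cdf
    k = len(scale)
    chi2 = 0.0
    for i in range(k):
        for j in range(i + 1, k):
            p_pred = phi(scale[j] - scale[i])
            theta_obs = asin(sqrt(P_obs[i][j]))   # radians
            theta_pred = asin(sqrt(p_pred))
            # var(theta) is approximately 1 / (4 n) in radians
            chi2 += 4.0 * n_judges * (theta_obs - theta_pred) ** 2
    df = (k - 1) * (k - 2) // 2
    return chi2, df


# A perfectly fitting synthetic example gives a chi-square of zero.
demo_scale = [-0.5, 0.0, 0.5]
demo_phi = NormalDist().cdf
demo_P = [[demo_phi(sj - si) for sj in demo_scale] for si in demo_scale]
print(mosteller_chi_square(demo_P, demo_scale, n_judges=200))
```

A small chi-square relative to its degrees of freedom indicates that the one-dimensional scale reproduces the observed proportions closely. For 36 stimuli the degrees of freedom under this formula would be 595.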

RESULTS

Ratings for the 399 respondents to the survey were randomly sorted into two groups. These groups are labeled 'A' (195 subjects) and 'B' (204 subjects) in the discussion and tables below.

Goodness-of-Fit

Derived paired comparison matrices, and subsequently Thurstone scales, were formulated for each group. The scales were then used to predict the fraction of respondents preferring one of each of the 630 possible pairings of patterns (in matrix form, the fraction preferring the column designs over the row designs for the upper half matrix with missing diagonal).

Tables 1 to 3 each contain four statistical results. Since the data are randomly divided into two subsamples, it is possible to use each as a "holdout sample" to test the predictive power of a scale derived from the other. The same statistics also may be calculated in the usual manner - using the subsample which produced a scale as the observed data (a less rigorous goodness-of-fit test than one based on an independent sample).

Table 1 contains statistics for Mosteller's (1951b) goodness-of-fit. The very low chi-square values indicate a close correspondence between scale predictions and observed proportions.

TABLE 1

MOSTELLER'S GOODNESS-OF-FIT TEST

Table 2 represents an alternative (and perhaps more familiar) perspective of the goodness-of-fit. This table contains correlation statistics between predicted and observed fractions of subjects preferring one of each pair of designs (630 predictions). The values are surprisingly high. The null hypothesis that the true correlations are zero is rejected at any reasonable significance level. It does not appear that multidimensional scaling would be likely to provide additional insight into the samples' aggregate preferences for bed linen designs.

TABLE 2

PRODUCT MOMENT CORRELATIONS
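The holdout comparison behind these statistics can be sketched as follows: a scale derived from one subsample predicts the fraction preferring one design of each pair, and those predictions are compared with the fractions observed in the other subsample. The code is illustrative only. All names and numbers in it are hypothetical and are not the study's results.

```python
from math import sqrt
from statistics import NormalDist, fmean


def predicted_fractions(scale):
    """Predicted fraction preferring j to i for every pair with i < j."""
    phi = NormalDist().cdf
    k = len(scale)
    return [phi(scale[j] - scale[i])
            for i in range(k) for j in range(i + 1, k)]


def pearson_r(x, y):
    """Product moment correlation between two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mean_abs_error(pred, obs):
    """Mean absolute deviation between predictions and observations."""
    return fmean([abs(p - o) for p, o in zip(pred, obs)])


# Hypothetical scale from one subsample and hypothetical observed
# fractions from the other (pairs ordered (0,1), (0,2), (1,2)).
scale_a = [-0.4, 0.1, 0.3]
observed_b = [0.68, 0.77, 0.55]
pred = predicted_fractions(scale_a)
print(pearson_r(pred, observed_b), mean_abs_error(pred, observed_b))
```

With 36 stimuli the same functions would produce the 630 predictions discussed in the text.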

Desirable Patterns

Table 3 contains statistics for means, standard deviations, and mean absolute deviations of the 630 prediction errors. The ninety-five percent confidence interval (based upon across sample error deviations) is approximately twelve percent. That is, if pattern 'A' is predicted to be preferred over 'B' by 62 percent or greater, then the likelihood that 'B' is actually preferred to 'A' is five percent or less.

TABLE 3

PREDICTION ERRORS

Since management seeks above average designs, a decision-rule to classify a pattern as desirable if it has a predicted preference of 62 percent or greater compared to an "average pattern" (the mean preference scale value for all 36 patterns) was recommended. Seven of the 36 patterns in this study were identified as satisfying this decision-rule.
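One way to express this decision-rule in terms of Case V scale values is sketched below. It assumes, as an interpretation rather than a statement of the original computation, that the predicted preference of a pattern over the "average pattern" is the unit normal probability of the difference between the pattern's scale value and the mean scale value. The function name and the scale values are hypothetical.

```python
from statistics import NormalDist, fmean


def desirable_patterns(scale, threshold=0.62):
    """Indices of stimuli predicted to beat the 'average pattern'.

    A stimulus qualifies when Phi(scale value - mean scale value)
    is at least `threshold` (0.62 in the rule described above).
    """
    phi = NormalDist().cdf
    average = fmean(scale)
    return [j for j, s in enumerate(scale) if phi(s - average) >= threshold]


# Hypothetical scale values for five patterns (mean zero).
toy_scale = [-0.6, -0.1, 0.0, 0.2, 0.5]
selected = desirable_patterns(toy_scale)
print(selected)
```

Under this reading, a threshold of 62 percent corresponds to a scale value roughly 0.31 units above the mean, since the unit normal deviate for 0.62 is about 0.31.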

CONCLUSION

After the fact, it does not seem surprising that a scale of one dimension provides an excellent fit to the preferences derived from consumer intention to purchase ratings. A recent article discussing applications of conjoint analysis (Hupfer, 1977) points out that attempts to break down preferences for product choices into component attributes and benefits are probably inappropriate when relatively little conscious thought is given to the purchase. It seems plausible that consumers' analyses of cost-benefit trade-offs occur in choosing whether or not to buy new bed linen, and if so, in what fabric (muslin or percale) and style (flat or fitted) and for what purpose (beds, walls, windows, or furniture) (Reif, 1977). The choice of pattern from among the many competing within chosen fabric and price groups may indeed be made with little conscious analysis.

It is likely that nonmetric unidimensional scaling will prove appropriate for similar applications in other product areas. Certainly, in instances where brand competition is characterized by proliferation of "fashion designs," results similar to those obtained in this study may be expected.

Unidimensional scaling also appears to have merit as a preliminary to multidimensional scaling. When direct paired comparison data are obtained, Thurstone V scaling is straightforward. It seems surprising that applications of Thurstone scaling rarely appear in recent marketing literature.

It also seems puzzling that scaling by the method of successive intervals (Edwards, 1952) has received virtually no attention from marketers. The method is computationally simple (particularly when programmed for a computer). One would expect that this method would be appropriate for the data collected in many studies (Reynolds, 1966, for example). The nonmetric unidimensional scaling procedure described here may prove to be more general, and equivalent to successive intervals when both methods are appropriate, but this assertion remains to be verified.

REFERENCES

Adams, E. W., R. F. Fagot and R. E. Robinson. "A Theory of Appropriate Statistics," Psychometrika, 30 (June 1965), 99-127.

Benson, Purnell H. "A Short Method for a Distribution of Consumer Preferences," Journal of Applied Psychology, 46 (October 1962), 307-13.

Draper, N. R., W. G. Hunter and D. E. Tierney. "Analyzing Paired Comparison Tests," Journal of Marketing Research, 6 (November 1969), 477-80.

Edwards, A. L. "The Scaling of Stimuli by the Method of Successive Intervals," Journal of Applied Psychology, 36 (April 1952), 118-22.

Green, Paul E. and Vithala R. Rao. Applied Multidimensional Scaling. New York: Holt, Rinehart and Winston, Inc., 1972.

Harris, Douglas. "Predicting Consumer Reactions to Product Designs," Journal of Advertising Research, 4 (June 1964), 34-37.

Hupfer, Herbert. "Techniques Useful in Planning New Products, Especially Costly Ones," Marketing News, 10 (January 28, 1977), 10 ff.

Mosteller, Frederick. "Remarks on the Method of Paired Comparisons: I. Least Squares Solution Assuming Equal Standard Deviations and Equal Correlations," Psychometrika, 16 (March 1951), 3-11.

Mosteller, Frederick. "Remarks on the Method of Paired Comparisons: III. A Test of Significance Assuming Equal Standard Deviations and Equal Correlations," Psychometrika, 16 (June 1951), 207-18.

Neidell, Lester A. "Procedures for Obtaining Similarities Data," Journal of Marketing Research, 9 (August 1972), 335-37.

Rao, Vithala R. and Ralph Katz. "Multidimensional Scaling Methods for Large Stimulus Sets," Journal of Marketing Research, 8 (November 1971), 488-94.

Reif, Rita. "Sheets by Design," New York Times (Sunday, February 13, 1977), F-1 ff.

Reynolds, William H. "Some Empirical Observations on a Ten-Point Poor-to-Excellent Scale," Journal of Marketing Research, 3 (November 1966), 388-90.

Taylor, James W., John J. Houlahan and Alan C. Gabriel. "The Purchase Intention Question in New Product Development: A Field Test," Journal of Marketing, 39 (January 1975), 90-92.

Thurstone, L. L. "Psychophysical Analysis," American Journal of Psychology, 38 (July 1927), 368-89.
