# Multiattribute Choice Models: A Critical Review

ABSTRACT - This paper reviews and discusses the work by Green and Desarbo, "Two Models For Representing Unrestricted Choice Data," Leigh, Mackay, and Summers, "On Alternative Methods For Conjoint Analysis," and Acito and Olshavsky, "Limits To Accuracy in Conjoint Analysis."

##### Citation:

Jeffrey E. Danes and Philippe Cattin (1981), "Multiattribute Choice Models: A Critical Review," in NA - Advances in Consumer Research Volume 08, ed. Kent B. Monroe, Ann Arbor, MI: Association for Consumer Research, pages 323-328.


TWO MODELS FOR REPRESENTING UNRESTRICTED CHOICE DATA

Green and Desarbo's paper, "Two Models For Representing Unrestricted Choice Data," is one more advancement to be included in the rapid growth of multidimensional scaling models. Today market researchers may choose between metric and non-metric algorithms, may scale a 2-way or a 3-way matrix, may employ longitudinal scaling, and may choose from a variety of metric axioms: Euclidean, Minkowski, as well as Riemannian (cf. Lindman and Caelli 1978; Piezko 1975). Further, market researchers have numerous data collection methods at their disposal. The Green and Desarbo paper makes an advance in this latter category. They first demonstrate the use of Levine's (1979) model for scaling unconstrained choice data--data obtained from respondents who are neither instructed to choose a fixed number of alternatives nor given an explicit set of possible alternatives. The primary contribution of the Green and Desarbo paper is the extension of Levine's (1979) model to unconstrained choice data in which the stimulus set is prespecified. The outcome of their effort is a multidimensional space that yields ideal points, stimulus points, and __attribute__ points. Although it is traditional to map attributes as vectors, the model provided by Green and Desarbo maps attributes as __points__ in a multidimensional space. Let us focus our attention upon this feature.

Utility of Attributes As Points

One criticism of the Green and Desarbo paper is that they do not explain the role that dimensions play in their model. If attributes are represented as points, what do the dimensions represent?

It is now well known that the dimensions obtained in multidimensional scaling do not necessarily correspond to attributes. This has been found by numerous researchers who have regressed attribute vectors into multidimensional spaces. It is not uncommon to find six or more attribute vectors in a two- or three-dimensional space (cf. Schmidt 1972). The dimensions of a multidimensional space do not necessarily correspond to attributes; thus, it may be more meaningful to view the obtained dimensions as an __arbitrary reference system__ in which points of all kinds are plotted. A common example of an arbitrary reference system in which dimensions have no inherent meaning is the letter-number grid on a city map. These dimensions may be translated and rotated with no harm done to the distance between any two homes. The coordinates simply tell us where "Oak Street" is relative to "Elm Street."
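The point that MDS dimensions form an arbitrary reference system can be illustrated numerically: translating or rotating a configuration leaves every interpoint distance unchanged. A minimal sketch (the coordinates below are invented for illustration):

```python
import math

def dist(a, b):
    """Euclidean distance between two 2-D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def translate(pts, dx, dy):
    """Shift every point by (dx, dy)."""
    return [(x + dx, y + dy) for x, y in pts]

def rotate(pts, theta):
    """Rotate every point about the origin by angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for x, y in pts]

# Hypothetical MDS configuration: two "homes" on a city-map grid.
points = [(1.0, 2.0), (4.0, 6.0)]
d0 = dist(*points)

# Move and spin the whole reference system; distances survive intact.
moved = rotate(translate(points, 3.0, -1.0), math.pi / 5)
d1 = dist(*moved)

print(round(d0, 6), round(d1, 6))  # identical distances
```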

Green and Desarbo's treatment of the attribute as a point in space, however, gives new meaning and new potential applications of multidimensional scaling to market and consumer research. Let us now comment on one possible application of representing attributes as points.

Example Application: Advertising Strategy

The Green and Desarbo paper used __association__ data for their example application. Hence, distance in the space is a function of association in the data--the highly associated points are close together. They also mapped ideal points. Without loss of generality, let us assume a homogeneous population with one ideal point, I; let it be mapped as follows (see Figure 1):

FIGURE 1: A SIMPLE MDS MAP WITH ATTRIBUTES AS POINTS

Let us denote the product as P, the ideal point as I, and three attributes as A_{1}, A_{2}, and A_{3}. Without changing the associations (or inter-distance relations), we may translate the dimensions of the space so that the product, P, has zero coordinates (see Figure 2).

FIGURE 2: TRANSLATED MDS MAP WITH ATTRIBUTES AS POINTS

One marketing goal may be to "move" the product closer to the ideal point. How can this be accomplished?

Communication researchers have long known that messages that repetitively associate two objects (e.g., Watergate and political crime) result in increased perceived similarity between the two objects. Likewise, the same appears to be true for objects and attributes (Barnett, Serota, and Taylor, 1976). Marketers capitalize on this principle when they associate their product with intuitively desirable attributes.

Let us assume that a marketer, through communication, associates its product, P, with A_{3}. Let us also assume that the communication campaign is relatively lengthy. What is the likely outcome? If the campaign successfully teaches the __association__, the product, P, should "move" closer to A_{3}. Let us denote the "new" location as P' in Figure 3. However, this

FIGURE 3: NEW LOCATION OF PRODUCT AFTER ASSOCIATION OF PRODUCT, P, WITH ATTRIBUTE, A_{3}

is not consistent with the original goal--moving the product closer to the ideal point. A campaign that associates the product, P, with, say, A_{2} should be a much better strategy. But the resulting direction of motion is slightly off target. However, if we assume that vector averaging is a viable cognitive principle, then a two-attribute message in which the product, P, is associated with A_{2} __and__ A_{3} would be even better. This principle is diagrammed in Figure 4.

FIGURE 4: NEW LOCATION OF PRODUCT AFTER ASSOCIATION OF PRODUCT, P, WITH ATTRIBUTES, A_{2} AND A_{3}

Figure 4 pictures the "final state" for the product given an advertising campaign that is successful in teaching the new associations, i.e., the product, P, is associated with the two attributes, A_{2} and A_{3}. The example we present here, of course, is oversimplified. However, generalizations to spaces of larger dimensionality with more attributes are possible. Extended models and algorithms for using attributes as points (in multidimensional spaces) for the measurement of communication and marketing processes may be found in Woelfel and Danes (1980).
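Under the vector-averaging assumption, the advantage of a two-attribute message can be sketched numerically. The coordinates below are invented for illustration; the point is only that a message pairing two attributes can pull P closer to the ideal point I than a single-attribute message can:

```python
import math

def dist(a, b):
    """Euclidean distance between two 2-D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def move_toward(p, target, step=0.5):
    """Move point p a fraction of the way toward target
    (a crude stand-in for a lengthy association campaign)."""
    return (p[0] + step * (target[0] - p[0]),
            p[1] + step * (target[1] - p[1]))

# Hypothetical configuration after translating P to the origin.
P = (0.0, 0.0)
I = (2.0, 2.0)                      # ideal point
A2, A3 = (4.0, 1.0), (1.0, 4.0)     # two attribute points

# Single-attribute campaign vs. a two-attribute (vector-averaged) message.
avg = ((A2[0] + A3[0]) / 2, (A2[1] + A3[1]) / 2)
p_single = move_toward(P, A2)
p_paired = move_toward(P, avg)

print(dist(p_single, I), dist(p_paired, I))
```

With these invented coordinates the paired message leaves the product closer to the ideal point than either attribute alone would.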

Summary

Green and Desarbo's work is one more contribution to the rapidly growing arsenal of multidimensional scaling techniques. They provide a multidimensional scaling model derived from Levine's (1979) earlier model development. The primary contribution of their work is the extension of Levine's (1979) model to unconstrained choice data in which the stimulus set is prespecified. The outcome of their effort is a multidimensional space that yields ideal points, stimulus points, and attribute points. The role of the dimension, which is usually treated as an attribute, is not made clear by Green and Desarbo. Our discussion, however, suggests that the dimensions in a multidimensional space may be usefully viewed as an __arbitrary reference system__ that, by itself, has no inherent meaning. Our discussion then moved to potential applications of the new model presented by Green and Desarbo; we discussed an advertising strategy application.

ON ALTERNATIVE METHODS FOR CONJOINT ANALYSIS

Leigh, Mackay, and Summers have presented an interesting study on alternative methods for conjoint analysis. The objective of their study was to evaluate full and fractional experimental designs and various methods of data collection: rank order, pair comparisons, graded pair comparisons, subjective estimates, and rating scales. The evaluative criteria were reliability and validity. The study concluded that the less fractionated designs were more reliable than the more fractionated designs, and that data collection methods involving comparative judgments (i.e., pair comparisons, graded pair comparisons, and ranking) were more reliable than methods based on profile stimulus sets. Because of the small sample sizes, however, the authors hold the results tentatively and agree that the findings are suggestive. The implication, of course, is that if the cell sample sizes had been larger, the results would have been more meaningful estimates of reliability and validity. We show below, however, that this conclusion may be false.

Leigh et al. use the test-retest correlation as a measure of temporal stability. They state that "Test-retest methods are a common way of measuring the temporal stability of Consumer's Responses (p. 318)." Their distinctions between temporal stability and structural reliability follow the work of McCullough and Best (1979), who have stated that "The stability of a measurement is related to the reproduction of measurement results at different points in time. This aspect of reliability generally has been inferred by the correlation between test-retest measurements over some specified time interval (pp. 26-27)." Additionally, McCullough and Best (1979) have stated that structural reliability relates to systematic error. Let us first state that these definitions are not standard psychometric terms. Ordinarily, when systematic changes are accounted for, the degree to which an instrument replicates its measurements is called the __reliability__ of measurement. On the other hand, given measurement that contains __no__ random error, the degree to which an instrument replicates its measurements is called the __stability__ of measurement (Heise 1971; Wiley and Wiley 1971). Nonetheless, Leigh et al. define temporal stability as reliability and structural reliability as stability; hence, one is not sure what they are really after. More importantly, Leigh et al. used the test-retest correlation coefficient; it is shown below that one test-retest correlation does not allow one to separate reliability from stability.

Separating Reliability From Stability

To demonstrate why a single test-retest correlation does not provide a good estimate of reliability or stability, we lean heavily upon the work of Heise (1971). His work stems from Coleman (1968), from classical psychometrics (Lord and Novick 1968), and from path analysis. Let us begin with the path diagram of standardized test-retest measurements given in Figure 5.

In this diagram, the coefficient B_{t1t2} is the standardized path coefficient representing the degree to which the true score at time 1, t_{1}, "causes" the true score, t_{2}, at time 2. In this context, B_{t1t2} is identical to the correlation between true scores, i.e., r_{t1t2}. The coefficient B_{t1t2} (or r_{t1t2}) is an estimate of __stability__, i.e., the degree to which test-retest measurements covary when measurement error is removed. The coefficients l_{t1x1} and l_{t2x2} are path coefficients reflecting the degree to which the true scores "cause" the observed measurements, x_{1} and x_{2}. These coefficients are identical to the correlations of the observed measurements with the true scores, i.e., r_{x1t1} and r_{x2t2}. The square of l_{t1x1} (or r^{2}_{x1t1}) is defined as the __reliability__ of measurement for x_{1} (Lord and Novick 1968). The coefficients l_{e1x1} and l_{e2x2} are coefficients reflecting the degree to which random measurement errors "cause" the observed measurements. Since we assume standardized variables, the variance of x_{1} and x_{2} is equal to one; hence,

l^{2}_{t1x1} + l^{2}_{e1x1} = 1 and l^{2}_{t2x2} + l^{2}_{e2x2} = 1. (1)

The coefficient b_{u2t2} represents __systematic__ changes produced in t_{2}; it is given as:

b_{u2t2} = (1 - B^{2}_{t1t2})^{1/2}. (2)

For the above path model we assume that measurement errors are mutually uncorrelated and that measurement errors correlate only with their respective observed measurements. Following the rules of path analysis, the test-retest correlation, r_{x1x2}, is defined as:

r_{x1x2} = l_{t1x1} B_{t1t2} l_{t2x2}. (3)

Hence, with three unknowns in one equation, we can easily see that the test-retest correlation does not enable one to separate __reliability__ (l^{2}_{t1x1} or l^{2}_{t2x2}) from __stability__, B_{t1t2}. Even if we assume that the reliability of the instrument is constant over the two time periods,

l_{t1x1} = l_{t2x2} = l, (4)

we still have an unsatisfactory situation, with one equation and two unknowns:

r_{x1x2} = l^{2} B_{t1t2}. (5)

Following Coleman's (1968) lead, Heise (1971) demonstrated that it takes at least three time periods to separate reliability from stability. Below we provide a path diagram for the three-wave test-retest paradigm--assuming that the reliability of the instrument is constant (see Figure 6).

If the assumptions stated above are preserved, and if it is assumed that the disturbances, u_{2} and u_{3}, are uncorrelated, we may use the rules of path analysis to write:

r_{x1x2} = l^{2} B_{t1t2},
r_{x2x3} = l^{2} B_{t2t3}, (6)
r_{x1x3} = l^{2} B_{t1t2} B_{t2t3}.

The first two equations may be written as:

B_{t1t2} = r_{x1x2} / l^{2} and B_{t2t3} = r_{x2x3} / l^{2}. (7)

Making the appropriate substitutions into the bottom equation of (6) and solving for l^{2} yields:

l^{2} = (r_{x1x2} r_{x2x3}) / r_{x1x3}. (8)

Once the reliability of measurement is found, the observed correlations may be corrected for attenuation. These corrected correlations, B_{t1t2}, B_{t2t3}, and B_{t1t3}, are the desired __stability__ coefficients.
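The three-wave solution is easy to compute. A minimal sketch of Heise's (1971) estimator, using hypothetical test-test-retest correlations:

```python
def heise(r12, r23, r13):
    """Separate reliability from stability in a three-wave design
    (Heise 1971), assuming constant reliability over time."""
    reliability = r12 * r23 / r13      # l^2 = r12 * r23 / r13
    b12 = r12 / reliability           # stability, wave 1 -> wave 2
    b23 = r23 / reliability           # stability, wave 2 -> wave 3
    b13 = r13 / reliability           # stability, wave 1 -> wave 3
    return reliability, b12, b23, b13

# Hypothetical observed test-retest correlations.
rel, b12, b23, b13 = heise(r12=0.50, r23=0.54, r13=0.45)
print(rel, b12, b23, b13)
```

Note that the corrected stabilities satisfy b13 = b12 * b23, as the path model requires.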

If the assumption that the reliability of three measurements is constant over time seems unrealistic, Wiley and Wiley (1971) present a path analytic model in which this assumption is relaxed.

Leigh et al. used the test-retest correlation as a measure of "temporal stability." Our discussion, however, demonstrated that the test-retest correlation confounds stability with reliability; one test-retest correlation does not permit one to separate stability from reliability. We showed that it takes at least __three__ measurements (i.e., test-test-retest) to estimate stability.

Additional Comments

One interesting result obtained by Leigh, Mackay and Summers is that the average test-retest correlation of the direct dollar metric estimates of the part worths is greater than the average test-retest correlation of the part worths derived by conjoint analysis: .500 (Table 2) vs. .385 (Table 1). The difference may not be significant because of the relatively small sample size. [In this case, the t-value would have to be taken with a grain of salt. As indicated by the authors themselves in their reply to a referee, a t-test is "strictly speaking not appropriate." This is because different data collection methods were used to obtain the part worth estimates derived by conjoint analysis. Hence, the variances of the test-retest correlations may vary across methods.] In any event, one may wonder whether the direct part worth estimates have more predictive validity than those derived by conjoint analysis. Leigh et al. could have used the part worth estimates obtained (directly and by conjoint analysis) the first time to predict the rankings or ratings of the stimuli evaluated the second time. To be able to predict with the direct part worth estimates it is necessary to also know which level is preferred for each attribute and respondent. This can be obtained with one direct question for each attribute. Leigh et al. (apparently) did not have this information and thus could not estimate the predictive validity of the direct part worth estimates.

In a study that involved 41 respondents, Cattin and Weinberger (1980, Tables 3 and 7, pp. 782-783) found the predictive validity of direct part worth estimates to be very close to the predictive validity of part worths derived by conjoint analysis. Of course, direct part worth estimates can have biases. To reduce the biases it is necessary to tell the respondents which levels they should refer to and to think of the products as always exactly the same (on all attributes) except on the attribute they are evaluating. Even so, it has been found in several studies that respondents tend to underestimate the importance of important attributes and to overestimate the importance of less important attributes: see, for instance, Scott and Wright (1976, p. 214) and Cattin and Weinberger (1980, Table 6). However, Green (1979) has shown by simulation that the market share predictions obtained with such distorted part worth values are quite close to the market share predictions obtained with "true" non-distorted part worth values. Hence, the effect of this bias on market simulation results is relatively small. Actually, direct part worth estimates are used commercially: e.g., Aaker and Day (1980, p. 206).

If direct part worth estimates have relatively good predictive validity, combining them (somehow) with conjoint data might improve predictive validity. Bayesian regression procedures derived from Stein (1960) can be used to combine the two sets of data. Cattin and Danes (1981) have shown analytically and empirically that some improvement in predictive validity can thus be achieved.
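One simple way to picture such a combination is precision-weighting the two sets of estimates. The Stein-derived Bayesian regression procedures of Cattin and Danes (1981) are more elaborate than this; the sketch below, with invented numbers, only illustrates the underlying idea of letting the direct estimate act as a prior:

```python
def combine(conjoint, direct, var_conjoint, var_direct):
    """Precision-weighted combination of two estimates of the same
    part worth (a simplified stand-in for Bayesian regression:
    the direct estimate plays the role of a prior mean)."""
    w = (1 / var_conjoint) / (1 / var_conjoint + 1 / var_direct)
    return w * conjoint + (1 - w) * direct

# Hypothetical part worths for one attribute level.
estimate = combine(conjoint=0.40, direct=0.60,
                   var_conjoint=0.04, var_direct=0.04)
print(estimate)
```

With equal variances the combination reduces to a simple average; a noisier conjoint estimate would be shrunk further toward the direct estimate.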

LIMITS TO ACCURACY IN CONJOINT ANALYSIS

Acito and Olshavsky's hypothesis is that the predictive validity obtained with conjoint data (using the full profile approach) in which each attribute is defined with two levels is superior to the predictive validity obtained if each attribute is defined with three levels. Acito and Olshavsky tested this hypothesis in the case where the number of stimuli evaluated by respondents is the same in both the two-level and three-level cases and where part worth attribute utilities (rather than utility functions) are estimated (i.e., one more parameter is estimated for each attribute in the three-level case than in the two-level case).

The arguments in favor of the hypothesis, as stated by Acito and Olshavsky, are the following. First, the parameter estimates obtained in the three-level case are less reliable because there are fewer degrees of freedom (one more parameter is estimated for each attribute). Second, the respondents are given more information in the three-level case (i.e., stimuli are defined on three levels instead of two), which produces more confusion, more carelessness, and thus more noise in the data. It should be pointed out that the hypothesis applies primarily to the full profile approach (whereby the stimuli evaluated by the respondents are defined on all attributes). If one compares, for instance, the predictive validity obtained with ten (3 x 3) tradeoff matrices to the predictive validity obtained with ten (2 x 2) tradeoff matrices, the hypothesis is not likely to hold.
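The degrees-of-freedom argument can be made concrete. With dummy-coded part worths, a k-level attribute adds k - 1 parameters; the attribute count below is hypothetical and chosen only for illustration:

```python
def residual_df(n_stimuli, n_attributes, levels):
    """Degrees of freedom left after estimating part worths
    (dummy coding: levels - 1 parameters per attribute, plus
    an intercept)."""
    n_params = 1 + n_attributes * (levels - 1)
    return n_stimuli - n_params

# Hypothetical design: 5 attributes, 23 evaluated stimuli.
df_two = residual_df(23, 5, levels=2)
df_three = residual_df(23, 5, levels=3)
print(df_two, df_three)
```

Holding the number of stimuli fixed, the three-level design always leaves fewer residual degrees of freedom, and hence noisier part worth estimates.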

Acito and Olshavsky tested their hypothesis using twenty MBA students, ten assigned to the three-level design and ten to the two-level design. In both cases, the attribute utilities were estimated using MONANOVA on the rankings of 23 stimuli. [The 23 stimuli were taken from orthogonal design plans of 25 assemblies for the three-level case and 24 for the two-level case. However, the authors deleted two assemblies from the three-level design and one assembly from the two-level design. Hence, the designs were not orthogonal.] The predictions obtained on validation data show that Acito and Olshavsky's hypothesis seems to be correct. Some of the results are significant even though the sample size was small.

One may wonder whether the superior predictive validity obtained with the two-level design is due (a) to more degrees of freedom, (b) to less noise in the data, or (c) to both. If (a) is the reason, one could improve the predictive validity in the three-level case by increasing the number of stimuli (thus increasing the number of degrees of freedom). If the amount of noise is the same, the predictive validities obtained with both designs would be the same if the "true" attribute utilities could be obtained. However, the estimated part worths have errors because they are based on a limited number of observations, and the error is expected to be greater with the three-level design because there are fewer degrees of freedom. In other words, the resulting shrinkage in the correlation reflected in the predictive validity is expected to be greater in the three-level case. Some insight into the difference in expected shrinkage (obtained with the two- vs. three-level designs) can be gained using equation 8 or 9 in Cattin (1980, p. 410).

There is one way to find out whether there is more noise in the data in the three-level case than in the two-level case. [In the three-level case, Acito and Olshavsky used linear interpolation between part worths to estimate the utility of intermediate levels. Pekelman and Sen (1979) have shown that fitting a (quadratic) curve through the part worths improves predictive validity and is thus more appropriate than interpolation. However, this would have changed the results obtained by Acito and Olshavsky only in the case of the prediction of actual brands (and not in the case of the prediction of "holdout" ranks). It thus appears that the results obtained with the two-level design would still have been superior to the results obtained with the three-level design.] It involves assuming linear utility functions for each attribute in the three-level case, thus estimating one parameter for each attribute (i.e., the same as in the two-level case). Under these conditions, the difference in predictive validity is due to the amount of noise and not to differences in the number of degrees of freedom. However, if linear utility functions are fitted in the three-level case, the violations of the a priori attribute utilities would still be the same: i.e., higher in the three-level case than in the two-level case (Acito and Olshavsky 1981, Table 2). Hence, it appears that there might indeed be more noise in the three-level case than in the two-level case, and that the two-level design would still produce higher predictive validities even if the number of stimuli (to be evaluated by respondents) were increased in the three-level design.

Whether and by how much two-level designs are found to have more predictive validity than three-level designs depends upon several factors, including the type of respondents and the type of attributes. If the respondents are more involved, there should be less noise, and this could improve the results of a three-level design relative to a two-level design. However, whenever two-level designs do produce data with less noise and increase (substantially enough) the predictive validity (when the full profile approach is used), it becomes appropriate not to have more than two levels whenever possible. One way to proceed is to ask the respondents to position the levels of each attribute on (say) an 11-point scale where the least preferred and most preferred levels occupy the two ends (which can be asked of the respondents), and to then have the same respondents evaluate stimuli that are defined using only two levels for each attribute. See, for instance, Wind, Grashof and Goldhar (1978, p. 29-30). This procedure can be used with both discrete and continuous attributes. However, there might be some problems with continuous attributes, especially if the attribute utility is expected to go through a minimum or a maximum (e.g., sugar content). In this case, it might still be best to have more than two levels in the conjoint design.

A final point regarding Acito and Olshavsky's study concerns the use of MONANOVA as the procedure for estimating part worths. MONANOVA is an iterative procedure that tries to minimize stress. However, it can end up in a local rather than global optimum. In a Monte Carlo simulation study, Cattin and Wittink (1976, Table 4, p. 20) found that an estimation procedure based on the maximum likelihood of the LOGIT model led to part worth estimates with lower stress values than MONANOVA more often than not. Hence, the simulation results obtained by Acito and Olshavsky to determine the stress values produced with random data are not very meaningful because there is no guarantee that MONANOVA reached the global optimum. With nonmetric data it is more appropriate to use LINMAP or a LOGIT approach, while regression can be used with metric data. For the tradeoffs between these methods, see, for instance, Green and Srinivasan (1978, p. 112-114).
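The local-optimum problem with iterative minimizers like MONANOVA is generic. A common safeguard (not one used in the original study) is to rerun the minimization from several starting points and keep the best solution. A toy illustration on a one-dimensional objective with a local and a global minimum:

```python
def f(x):
    """Toy objective with one local and one global minimum."""
    return x**4 - 3 * x**2 + x

def grad(x):
    """Derivative of f."""
    return 4 * x**3 - 6 * x + 1

def descend(x, lr=0.01, steps=2000):
    """Plain gradient descent from a single starting point."""
    for _ in range(steps):
        x -= lr * grad(x)
    return x

# A single start can stall in the shallow (local) minimum near x ~ 1.1;
# multiple starts recover the global minimum near x ~ -1.3.
starts = [-2.0, -1.0, 0.0, 1.0, 2.0]
solutions = [descend(x0) for x0 in starts]
best = min(solutions, key=f)
print(best, f(best))
```

The same logic motivates comparing MONANOVA solutions against those of other estimators (LINMAP, LOGIT): a lower achieved stress from another method is evidence that the iterative procedure stopped short of the global optimum.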

REFERENCES

Aaker, D. A. and Day, G. S. (1980), __Marketing Research__, New York: John Wiley and Sons.

Acito, F. and Olshavsky, W. (1981), "Limits to Accuracy in Conjoint Analysis" in __Advances in Consumer Research __ed., Kent B. Monroe, Washington, D. C., Vol. 8, pp. 313-316.

Barnett, G., Serota, L., and Taylor, J. (1976), "Campaign Communication and Attitude Change: A Multidimensional Analysis." __Human Communication Research__, 3, 277-44.

Cattin, P. (1980), "Estimation of the Predictive Power of a Regression Model," __Journal of Applied Psychology__, 65 (August), 407-14.

Cattin, P. and Wittink, D. R. (1976), "A Monte-Carlo Study of Metric and Nonmetric Estimation Methods for Multi-attribute Methods," Research Paper No. 341, Graduate School of Business, Stanford University.

Cattin, P. and Weinberger, M. G. (1980), "Some Validity and Reliability Issues in the Measurement of Attribute Utilities," __Advances in Consumer Research__, ed., Jerry Olson, San Francisco, CA, Vol. 7, pp. 780-783.

Cattin, P. and Danes, J. E. (1981), "A Simple Bayesian Regression Procedure: Applications to the Estimation of Multiattribute Preference Models," Working paper, School of Business Administration, University of Connecticut.

Coleman, J. S. (1968), "The Mathematical Study of Change," in __Methodology in Social Research__, eds., H. M. Blalock, Jr. and A. B. Blalock, New York: McGraw Hill, pp. 428-78.

Green, P. E. (1979), "On the Insensitivity of Brand Choice Simulations to Attribute Importance Weights," Working paper, The Wharton School, University of Pennsylvania.

Green, P. and Desarbo, W. (1981), "Two Models For Representing Unrestricted Choice Data," in __Advances in Consumer Research__, ed., Kent B. Monroe, Washington, D. C., Vol. 8, pp. 309-312.

Green, P. E. and Srinivasan V. (1978), "Conjoint Analysis in Consumer Research: Issues and Outlook," __Journal of Consumer Research__, 5 (September), 103-23.

Heise, D. R. (1971), "Separating Reliability and Stability in Test-Retest Correlation," in __Causal Models in the Social Sciences__, ed., H. M. Blalock, Jr., Chicago: Aldine-Atherton, pp. 348-63.

Leigh, T., Mackay, D., and Summers, J. (1981), "On Alternative Methods For Conjoint Analysis," in __Advances in Consumer Research__, ed., Kent B. Monroe, Washington, D. C., Vol. 8, pp. 317-322.

Levine, J. H. (1979), "Joint-Space Analysis of 'Pick-Any' Data: Analysis of Choices From An Unconstrained Set of Alternatives," __Psychometrika__, 44, 85-92.

Lindman, H. and Caelli, T. (1978), "Constant Curvature Riemannian Scaling," __Journal of Mathematical Psychology__, 17, 89-109.

Lord, F. and Novick, M. (1968), __Statistical Theories of Mental Test Scores__, Reading, Mass.: Addison-Wesley.

McCullough, J. and Best, R. (1979), "Conjoint Measurement: Temporal Stability and Structural Reliability," __Journal of Marketing Research__, 16, 26-31.

Pekelman, D. and Sen, S. K. (1979), "Measurement and Estimation of Conjoint Utility Functions," __Journal of Consumer Research__, 5 (March), 263-71.

Piezko, H. (1975), "Multidimensional Scaling in Riemannian Space," __Journal of Mathematical Psychology__, 12, 449-77.

Schmidt, C. (1972), "Multidimensional Scaling Analysis of the Printed Media's Explanations of the Riots of the Summer of 1967," __Journal of Personality and Social Psychology__, 24, 59-67.

Scott, J. E. and Wright, P. (1976), "Modeling an Organizational Buyer's Product Evaluation Strategy: Validity and Procedural Considerations," __Journal of Marketing Research__, 13 (August), 211-24.

Stein, C. (1960), "Multiple Regression," in __Contributions to Probability and Statistics__, I. Olkin et al., eds., Stanford, California: Stanford University Press.

Wiley, D. and Wiley, J. (1971), "The Estimation of Measurement Error in Panel Data," in __Causal Models in the Social Sciences__, ed., H. M. Blalock, Jr., Chicago: Aldine-Atherton, pp. 364-73.

Wind, Y., Grashof, J. F. and Goldhar, J. D. (1978), "Market-Based Guidelines for Design of Industrial Products," __Journal of Marketing__, 42 (July), 27-37.

Woelfel, J. and Danes, J. (1980), "Multidimensional Scaling Models For Communication Research," in __Multivariate Techniques For Communication Research__, eds., Peter Monge and Joseph Cappella, Academic Press, in press.

----------------------------------------

##### Authors

Jeffrey E. Danes, Virginia Polytechnic Institute and State University

Philippe Cattin, University of Connecticut

##### Volume

NA - Advances in Consumer Research Volume 08 | 1981
