Further Beyond Conjoint Measurement: Toward a Comparison of Methods

Philippe Cattin, University of Massachusetts
Dick R. Wittink, Stanford University
ABSTRACT - An extensive body of research in marketing and related areas concerns the collection and analysis of data to assess preference formation by individuals. Alternative methods are available to analyze the information. Based on synthetic data, metric and nonmetric procedures are compared. It appears that metric procedures perform very well on "weak" (ordinal) data.
[ to cite ]:
Philippe Cattin and Dick R. Wittink (1977) ,"Further Beyond Conjoint Measurement: Toward a Comparison of Methods", in NA - Advances in Consumer Research Volume 04, eds. William D. Perreault, Jr., Atlanta, GA : Association for Consumer Research, Pages: 41-45.

Advances in Consumer Research Volume 4, 1977   Pages 41-45




An extensive body of research in marketing and related areas concerns the collection and analysis of data to assess preference formation by individuals. The data can be collected by a variety of procedures. Similarly, alternative methods are available to analyze the information provided by individual respondents. However, little is known about how these methods compare under a variety of conditions. In addition to the recently developed body of techniques known as conjoint measurement, a researcher may consider traditional methods such as regression analysis. The attractiveness of conjoint measurement appears to be "...its capability of producing relatively 'sophisticated' results, typically scaled at the interval level, from rather 'primitive' data, normally consisting merely of rank order or paired comparison preference data" (Johnson, 1975b). Strictly speaking, metric procedures may not be appropriate for analyzing such weak data. Yet even though the information contained in the preference data is not as good as interval scaled data, the measurement may be better than ordinal. In this paper we will discuss alternative methods for data collection and analysis, and present some results based on synthetic data to illustrate the comparative performance of some available techniques.


There are basically two frameworks for collecting data to determine how the attributes that describe objects affect the overall judgment of those objects. The concept evaluation approach consists of asking respondents to evaluate objects defined along all attributes. These hypothetical objects can be chosen such that the attributes are orthogonal (Green, 1974) or they can be so structured that the combinations of attribute levels result in perhaps more realistic objects by allowing some correlation between attributes (Parker and Srinivasan, 1975). Johnson has presented a tradeoff matrix approach which asks respondents to indicate a preference for hypothetical objects defined on two attributes at a time (Johnson, 1974). It is clear that the concept evaluation approach would become too cumbersome when the number of attributes becomes large. To deal with this, Green has suggested using balanced incomplete block designs (Green, 1974). Even so, especially when objects have to be measured on a large number of attributes, the tradeoff matrix task may be easier to handle for many respondents. On the other hand, when only two attributes are considered at a time, it may seem vacuous to evaluate objects not defined on other relevant attributes. Yet, there is little evidence to suggest the existence of interaction effects in most empirical settings considered to date. Clearly, a lack of interaction suggests that the utility of a given attribute level does not depend on the level of other attributes. Indeed, Johnson has found no reason to believe that the tradeoff matrix approach is not appropriate for obtaining preference judgments (Johnson, 1974). Thus, although some people may feel comfortable arguing for either one of these frameworks on a priori grounds, it appears that an empirical study can offer additional guidelines for determining which framework to choose for a given study.

The preference data are typically collected by asking respondents either for a ranking of the hypothetical objects or for an evaluation by comparing all possible pairs of objects. These two procedures are algebraically equivalent. That is, n ranked objects can be transformed into n(n-1)/2 paired comparisons. Thus, whenever ranked data are obtained, we can also consider methods of analysis that assume the availability of paired comparison data. [In an actual setting, an individual may not provide paired comparison judgments that follow directly from the rank order data due to intransitivities.] The transformation involves something equivalent to assigning the value "one" to each preferred object and the value "zero" to each non-preferred object.


It is generally agreed that respondents are not capable of providing more information than is required to evaluate the objects on a rating scale, rank the objects with no ties allowed, or make paired comparisons. Clearly, such kinds of data do not possess the qualities necessary to attain measurement at an interval scale. This is precisely why conjoint measurement has been advocated extensively. Even so, the data may correspond quite closely to the underlying metric values. And to the extent that they do, metric procedures can be considered for determining the effects of attribute levels on preference. For example, for one particular set of synthetic data (Green and Rao, 1971, p. 356), the Pearson correlation coefficient between the metric and rank order data is 0.99. Unfortunately, in an empirical setting the underlying metric values are not known, so we cannot determine how appropriate a metric procedure is for a given study. We can, however, compare metric and nonmetric procedures by considering alternative transformations of synthetically produced metric data, and compare the procedures on these data.
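To make the comparison between metric values and their ranks concrete, the sketch below computes a Pearson correlation between a set of metric values and the rank order derived from them. The data are hypothetical and are not the Green and Rao (1971) values; the helper names are assumptions introduced for illustration.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def rank_order(values):
    """Rank 1 = smallest value; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return ranks

# Hypothetical metric utilities for eight objects.
metric = [2.2, 0.3, 5.9, 1.1, 4.4, 1.8, 3.5, 4.1]
ranks = rank_order(metric)
r = pearson(metric, ranks)
```

The closer the metric values are to being evenly spaced, the closer this correlation is to one, which is the situation in which a metric procedure applied to ranks loses little.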

Johnson has suggested the following taxonomy of procedures:

(1) Conjoint Measurement, including MONANOVA (Kruskal, 1965) and a monotone regression procedure developed by Johnson (1975a);

(2) Linear Programming procedures, for example Srinivasan and Shocker (1973);

(3) Econometric Methods, such as Ordinary Least Squares (OLS);

(4) Stochastic Modeling, based on methods such as the logit and probit models (Berkson, 1955; Finney, 1964).

The nonmetric algorithms, known as conjoint measurement, minimize iteratively a measure of badness of fit such that the parameter estimates reproduce as closely as possible the ranks, scale values or paired comparisons used as input. That is, if Y represents a vector of ranks provided by an individual, then we wish to obtain Y=XB such that we minimize the violations in terms of ordered data. The parameter estimates, B, are adjusted in successive iterations to produce metric values whose rank order is as consistent as possible with judgments provided by the respondent. The nature of the measure of badness of fit is such that these procedures ". . .tend to produce solutions with many ties or near-ties in differences among estimated utilities" (Johnson, 1975b). Moreover, the solution provided by nonmetric algorithms may be a local rather than the global optimum.
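As a rough illustration of "violations in terms of ordered data", the sketch below counts the pairs whose fitted order contradicts the observed order. This is a simplified count, not Kruskal's stress or the badness-of-fit measure of any particular program; the function name and the treatment of ties (ignored) are assumptions.

```python
from itertools import combinations

def order_violations(y, y_hat):
    """Count pairs ordered one way in y and the opposite way in y_hat.

    y     : observed ranks or ratings
    y_hat : fitted values, e.g. the product X*B for a trial B
    Pairs tied in either sequence are not counted (an assumed convention).
    """
    count = 0
    for i, j in combinations(range(len(y)), 2):
        if (y[i] - y[j]) * (y_hat[i] - y_hat[j]) < 0:
            count += 1
    return count

observed = [1, 2, 3, 4]
fitted = [0.2, 0.9, 0.7, 1.5]   # hypothetical fitted values
n_violations = order_violations(observed, fitted)
```

A nonmetric algorithm would adjust B iteratively to drive a measure of this kind toward zero; once every pair is correctly ordered, such a measure gives no further guidance about the spacing of the estimates.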

Linear programming methods can also be categorized as nonmetric procedures. The objective considered is to minimize the amount or number of violations in terms of recovering the rank order or paired comparison judgments provided by a respondent. These methods differ from the conjoint measurement procedures in at least two respects:

a. the global optimum will be found, although there may be more than one solution (even if the data are not recovered exactly);

b. constraints can be imposed on the parameters.

Econometric methods can be considered by using dummy variables to represent the levels of attributes considered in the evaluation process. The justification for developing and using nonmetric procedures is that metric procedures are not appropriate for analyzing rank order and other "weak" data. However, such an argument suggests that measurement is the only criterion for determining the appropriateness of alternative methods. In fact, if the data do not have interval-scale properties but are better than ordinal, it is not at all clear that we should discard metric procedures. The point is that the criterion considered by procedures such as OLS (i.e., minimizing the squared deviations between actual and predicted values) may be more appropriate if the data are more "precise" than assumed by nonmetric algorithms. Even so, the standard errors should be interpreted with caution. Furthermore, statistical tests of the coefficients assume normality of the error terms. Of course, nonmetric algorithms do not allow for statistical inference so this is not relevant for comparing the methods for estimation purposes. We said earlier that for purposes of estimating the parameters, rank order data are algebraically equivalent to paired comparison data. Thus, we may perform regression analysis on rank order data, or convert them into paired comparisons and estimate by discriminant analysis (two-group discriminant analysis is equivalent to regression analysis on a binary variable). Appendix 1 shows the exact relationship between the parameter estimates obtained for the rank order and paired comparison data.
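A minimal sketch of regression on rank order data follows. It assumes a full factorial design in two-level attributes coded -1/+1 (effects coding rather than 0/1 dummies, chosen so that the example needs no matrix library); for such an orthogonal design the OLS coefficients reduce to simple contrasts. The parameter values are hypothetical.

```python
from itertools import product

def ols_main_effects(design, y):
    """OLS for an orthogonal two-level design coded -1/+1.

    With mutually orthogonal, zero-mean columns, X'X = n*I, so the
    least-squares solution is b_j = sum_i(x_ij * y_i) / n and the
    intercept is the mean of y.
    """
    n = len(y)
    intercept = sum(y) / n
    k = len(design[0])
    coefs = [sum(row[j] * yi for row, yi in zip(design, y)) / n
             for j in range(k)]
    return intercept, coefs

design = list(product([-1, 1], repeat=3))       # 8 profiles, 3 attributes
true_b = [2.0, 1.0, 0.4]                        # hypothetical parameters
utility = [sum(b * x for b, x in zip(true_b, row)) for row in design]

# Replace the metric utilities by ranks (larger rank = higher utility).
order = sorted(range(len(utility)), key=lambda i: utility[i])
ranks = [0] * len(utility)
for r, i in enumerate(order, start=1):
    ranks[i] = r

intercept, coefs = ols_main_effects(design, ranks)
```

In this example the rank-based coefficients are 2.0, 1.0 and 0.5 against true values of 2.0, 1.0 and 0.4: the ordering of the effects is preserved, while the spacing reflects the equal intervals imposed by ranking.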

Finally, the logit model considers the probability, Pij, that stimulus i is preferred to stimulus j as a function of their utilities, yi and yj:
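One common way to write the binary logit form, stated here as an assumption rather than as the authors' exact expression, makes the preference probability depend only on the difference in utilities:

```latex
P_{ij} = \frac{1}{1 + e^{-(y_i - y_j)}} = \frac{e^{y_i}}{e^{y_i} + e^{y_j}}
```

Under this form P_ij + P_ji = 1, and P_ij = 1/2 when the two utilities are equal.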


The probability Pij can actually be defined as a function of yi and yj in many different ways (McFadden, 1976). The logit and probit models are frequently used because of their simplicity. The logit model assumes Weibull distributed errors while the probit model assumes that the errors are normally distributed. The parameters of the model can be estimated by maximizing a likelihood function such as:
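For paired comparison data, a standard form of such a likelihood can be written as below. This is an assumed, textbook form; the indicator δ_ij is notation introduced here and equals 1 if stimulus i was preferred to stimulus j and 0 otherwise.

```latex
L = \prod_{i<j} P_{ij}^{\,\delta_{ij}} \, \left(1 - P_{ij}\right)^{1 - \delta_{ij}}
```

Taking logarithms turns the product into a sum over pairs, which is the quantity usually maximized in practice.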


Basically, the maximum likelihood of the logit model looks beyond the ordinal properties of the data. As Johnson puts it: "The likelihood function 'continues to care about' each order relation, even after it is satisfied" (Johnson, 1975b). This procedure has other desirable properties, for example, the global optimum will be found.
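The sense in which the likelihood "continues to care" can be illustrated with a small sketch. It assumes a binary logit in the utility difference and a ranking in which 1 is most preferred; the function name and the utility values are hypothetical.

```python
import math
from itertools import combinations

def logit_log_likelihood(utilities, ranks):
    """Log-likelihood of the paired comparisons implied by a ranking.

    Assumes P_ij = 1 / (1 + exp(-(y_i - y_j))) and rank 1 = most preferred.
    """
    ll = 0.0
    for i, j in combinations(range(len(ranks)), 2):
        p_ij = 1.0 / (1.0 + math.exp(-(utilities[i] - utilities[j])))
        ll += math.log(p_ij) if ranks[i] < ranks[j] else math.log(1.0 - p_ij)
    return ll

ranks = [1, 2, 3]
close = [0.3, 0.2, 0.1]     # reproduces the order, small differences
spread = [3.0, 2.0, 1.0]    # reproduces the order, larger differences

ll_close = logit_log_likelihood(close, ranks)
ll_spread = logit_log_likelihood(spread, ranks)
```

Both utility vectors recover the ranking without a single violation, so a purely ordinal criterion cannot distinguish them, yet the log-likelihood is higher for the more widely spaced set.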


It appears that the nonmetric procedures assume that the data are exactly at the ordinal scale level of measurement, while the econometric procedures assume the data to be exactly at the interval scale level. Actually, the data provided by respondents may be better than ordinal but less than interval. For that reason, it is of interest to compare the performance of the procedures on synthetic data. In a recent study, Cattin and Bliemel (1976) have compared one of the nonmetric procedures, MONANOVA, with OLS. In the experimental design, they considered situations involving four and nine attributes, all dichotomous, a model with and without a disturbance term, and four transformations of the metric data. Sixteen observations were used to estimate the parameters of the model and fifty replications were made for each cell in the design. To produce the data, the parameters were drawn from a normal distribution, while the disturbances were obtained from a rectangular distribution. For the data containing error, the ratio of the variance due to the disturbance to the total variance was 25% in the four-attribute model and 11% in the nine-attribute model.
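A data-generating process of this kind might be sketched as follows. This is not the Cattin and Bliemel program: the full factorial design for four attributes, the standard normal parameters, the way the error share is imposed, and all names are assumptions made for illustration.

```python
import random
from itertools import product

def make_synthetic(k=4, error_share=0.25, seed=0):
    """Generate one replication of synthetic preference data.

    k           : number of dichotomous (0/1) attributes
    error_share : assumed ratio of disturbance variance to total variance
    Returns (design, beta, y).
    """
    rng = random.Random(seed)
    design = list(product([0, 1], repeat=k))          # 16 profiles when k = 4
    beta = [rng.gauss(0.0, 1.0) for _ in range(k)]    # normal parameters
    signal = [sum(b * x for b, x in zip(beta, row)) for row in design]

    n = len(signal)
    mean = sum(signal) / n
    var_signal = sum((s - mean) ** 2 for s in signal) / n
    # Choose var_err so that var_err / (var_signal + var_err) = error_share.
    var_err = var_signal * error_share / (1.0 - error_share)
    half_width = (3.0 * var_err) ** 0.5   # Uniform(-a, a) has variance a^2 / 3
    y = [s + rng.uniform(-half_width, half_width) for s in signal]
    return design, beta, y

design, beta, y = make_synthetic(k=4, error_share=0.25, seed=1)
```

Setting `error_share=0.0` gives the error-free case, and repeating the call with different seeds gives independent replications.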

The first transformation involved a forced ranking of the objects based on the magnitude of the metric data. The other three transformations involved ratings and are illustrated in Figure 1. Graph A represents what might be termed a "fair respondent." That is, the transformation from metric data to a rating on a 1-7 scale is linear, except for rounding up to the closest integer. An "exaggerating respondent" could be characterized by graph B. Such a respondent tends to use the extreme values of the scale. Graph C shows an "indecisive respondent", who relies heavily on the values in the middle of the scale.
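The three rating styles can be given a rough functional form, as in the sketch below. The curves are hypothetical stand-ins for graphs A, B and C, not the functions used in the study; metric values are assumed to be rescaled to the interval [0, 1], and ratings are rounded to the nearest integer on the 1-7 scale.

```python
import math

def to_rating(u):
    """Map u in [0, 1] to an integer rating on a 1-7 scale (nearest integer)."""
    return int(math.floor(1 + 6 * u + 0.5))

def _bend(x, power):
    """Bend x in [0, 1] about the midpoint; power < 1 pushes values outward,
    power > 1 pulls them toward the middle."""
    c = 2 * x - 1                                   # center on [-1, 1]
    return (math.copysign(abs(c) ** power, c) + 1) / 2

def fair(x):
    """Linear mapping to the scale (in the spirit of graph A)."""
    return to_rating(x)

def exaggerating(x):
    """Favors the extreme scale values (in the spirit of graph B)."""
    return to_rating(_bend(x, 0.5))

def indecisive(x):
    """Favors the middle scale values (in the spirit of graph C)."""
    return to_rating(_bend(x, 2.0))

ratings = (fair(0.7), exaggerating(0.7), indecisive(0.7))
```

For a metric value of 0.7 the three respondents give 5, 6 and 4 respectively, while all three agree at the midpoint and at both ends of the scale.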

Graph A characterizes the best we can hope for when asking respondents to rate objects on a scale. One would expect metric procedures to do very well for such data since metricity is most nearly preserved. The two other transformations, graphs B and C, deviate from this in opposite directions. Here the assumption of interval scaled data is more clearly violated. To compare the ability of MONANOVA and OLS to recover the parameter values, the deviations of the coefficients from the parameter values were calculated. The sum of the absolute values of the deviations (errors) was then calculated across attributes and replications, resulting in the values presented in Table 1. For a more precise description of the calculations involved in comparing the two procedures, see Appendix 2.
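The error measure described above can be sketched as a simple sum. The sketch assumes that each replication has its own true parameter vector and that estimates are already on the same scale as the parameters; any rescaling step described in Appendix 2 is not represented here.

```python
def total_absolute_error(true_params, estimates):
    """Sum |estimate - parameter| across attributes and replications.

    true_params, estimates : lists of equal-length lists, one per replication.
    """
    total = 0.0
    for true_b, est_b in zip(true_params, estimates):
        total += sum(abs(e - t) for t, e in zip(true_b, est_b))
    return total

# Two hypothetical replications with two attributes each.
true_params = [[1.0, 2.0], [0.5, -1.0]]
estimates = [[1.5, 2.0], [0.5, -0.5]]
error = total_absolute_error(true_params, estimates)
```

A smaller total indicates better recovery of the parameters, so two estimation procedures can be compared by computing this quantity for each on the same replications.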





The results tend to suggest that for error-free data MONANOVA recovers the parameters as well as or better than OLS. Often the results were identical for the two methods. This happens whenever the initial MONANOVA solution has zero stress [see Kruskal (1965) for a definition of stress], because the initial solution is equal to the OLS result. Whenever the model includes a disturbance term, OLS clearly outperforms MONANOVA. While this result was expected for the transformation indicated by a fair respondent, it appears that at least for the nonlinear transformations used by Cattin and Bliemel, metric procedures hold up very well. It remains to be seen whether OLS produces better results under more extremely nonlinear transformations. Of course, it is not known presently what transformation, if any, best characterizes actual behavior of respondents. Obviously, the results are also limited to the particular synthetic model used (parameters obtained from a normal distribution). Thus, before we can make recommendations about the use of rank order tasks versus ratings on a scale, it is necessary to consider other types of models, such as a model that heavily weighs one attribute (perhaps a lexicographic model).

Manski (1975) has shown that a nonmetric procedure called "Maximum Score", which is designed to maximize the number of paired comparisons correctly predicted, may be outperformed by the logit model. And finally, Johnson compared his monotone regression procedure with MONANOVA (Johnson, 1975a). He found the two methods in substantial agreement on one set of synthetic data with a sizable perturbation. We have applied the logit model [using the program "QUAIL" by Daniel McFadden et al., Institute of Transportation Studies, University of California, Berkeley] and OLS to the same data and report all results (including Johnson's) in Table 2. For these data, it appears that OLS and the logit model are slightly superior to the nonmetric algorithms in recovering the main effects. Table 2 shows the true main effects, the estimated main effects for each of the four methods, and two measures to indicate the quality of the estimates. The root mean square error summarizes the distance between the true and estimated effects, while the index of metric recovery measures the correlation between true and estimated effects. Of course, these results are not necessarily generalizable. It will be necessary to compare these




Advances in mathematical psychology have precipitated applications of conjoint measurement to problems in marketing and related fields. The attractiveness of nonmetric procedures appears to be the ability to obtain cardinal measurement from "weak" data. Yet it is not clear that metric procedures should be ruled out even when the data are not obtained at the interval scale level. Although we expected the metric procedures to be robust, we find that they may in fact outperform nonmetric procedures. Metric procedures also have an advantage in that the computer programs are more widely available and are more efficient in terms of run time. However, although a variety of conditions have been investigated for comparing metric and nonmetric procedures, it is not known at present how generalizable the results are. Research is currently in progress to compare the procedures further.




J. Berkson, "Maximum Likelihood and Minimum χ² Estimates of the Logistic Function," Journal of the American Statistical Association, 50 (1955), 130-162.

Philippe Cattin and Friedhelm Bliemel, "Problems in Conjoint Measurement: Estimation Procedures, Behavioral Assumptions and Data Collection Methods," Working Paper CP-375, Center for Research in Management Science, University of California, Berkeley, revised (1976).

D.J. Finney, Probit Analysis. (second edition). Cambridge University Press, 1964.

Paul E. Green, "On the Design of Choice Experiments Involving Multi-factor Alternatives," Journal of Consumer Research, 1 (September 1974), 61-68.

Paul E. Green and Vithala R. Rao, "Conjoint Measurement for Quantifying Judgmental Data," Journal of Marketing Research, 8 (August 1971), 355-363.

Richard M. Johnson, "Trade-Off Analysis of Consumer Values," Journal of Marketing Research, 11 (May 1974), 121-127.

Richard M. Johnson, "A Simple Method for Pairwise Monotone Regression," Psychometrika, 40 (June 1975), 163-168 (a).

Richard M. Johnson, "Beyond Conjoint Measurement: A Method of Pairwise Trade-Off Analysis," paper presented at the Sixth Annual Conference of the Association for Consumer Research, 1975 (b).

Joseph B. Kruskal, "Analysis of Factorial Experiments by Estimating Monotone Transformations of the Data," Journal of the Royal Statistical Society, Series B, 27 (1965), 251-263.

Charles F. Manski, "Maximum Score Estimation of the Stochastic Utility Model of Choice," Journal of Econometrics, 3 (1975), 205-228.

Daniel McFadden, "Quantal Choice Analysis: A Survey," Annals of Economic and Social Measurement, forthcoming (1976).

Barnett R. Parker and V. Srinivasan, "A Consumer Preference Approach to the Planning of Rural Primary Health Care Facilities," Research Paper No. 271, Graduate School of Business, Stanford University, 1975.

V. Srinivasan and Allan D. Shocker, "Estimating the Weights for Multiple Attributes in a Composite Criterion Using Pairwise Judgments," Psychometrika, 38 (December 1973), 473-493.