# Regression Versus Interpolation in Conjoint Analysis

To cite:

Dov Pekelman and Subrata Sen (1977), "Regression Versus Interpolation in Conjoint Analysis," in NA - Advances in Consumer Research Volume 04, eds. William D. Perreault, Jr., Atlanta, GA: Association for Consumer Research, Pages: 29-34.

Direct URL:

http://acrwebsite.org/volumes/9324/volumes/v04/NA-04

It is a common practice in conjoint analysis to calculate utilities for several __discrete__ attribute levels and then use linear interpolation to determine utilities for __other__ attribute levels. For __continuous attributes__, a preferred alternative to linear interpolation might be the use of __utility functions__. Using __regression analysis__ to estimate the utility function, this paper presents an analytic procedure to numerically determine the usefulness of the utility function approach relative to interpolation.

INTRODUCTION

The use of conjoint analysis to evaluate new product concepts is an interesting development in marketing research (see Green and Wind, 1973, pp. 54-59, 114-126, and Johnson, 1974, for descriptions of the technique and Davidson, 1973, Fiedler, 1972, and Green and Wind, 1973, pp. 133-215, for some marketing applications). As typically used in marketing, conjoint analysis determines utilities for various __discrete__ levels of the attributes used by a consumer in evaluating product alternatives. For example, suppose that "Top Speed" is one of the attributes used by a consumer in evaluating automobiles and that three levels of "Top Speed" are used to generate the required rank-ordered input data from the respondent. If the three levels are, say, 80, 100, and 120 miles per hour, one obtains as output the consumer's utilities for these three levels of "Top Speed." Now suppose that a proposed automobile brand has a "Top Speed" of 90 miles per hour and that we need the consumer's utility for this level of "Top Speed." If the attribute is __continuous__ (as "Top Speed" is), this utility is generally determined by __linear interpolation__ between the utilities estimated for 80 and 100 miles per hour.
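As an illustration of the interpolation step just described, the following minimal Python sketch computes a utility for an intermediate attribute level from the part-worths at the two bracketing levels. The part-worth values (0.2, 0.6, 0.9) and the function name `interpolate_utility` are hypothetical, chosen only for illustration; they are not taken from the paper.

```python
from bisect import bisect_right


def interpolate_utility(x, levels, utilities):
    """Linearly interpolate a utility at attribute level x.

    levels must be sorted ascending and x must lie within [levels[0], levels[-1]].
    """
    if not levels[0] <= x <= levels[-1]:
        raise ValueError("x lies outside the range of the estimated levels")
    # Upper end of the bracketing segment (clamped so x == levels[-1] works).
    k = min(bisect_right(levels, x), len(levels) - 1)
    x0, x1 = levels[k - 1], levels[k]
    u0, u1 = utilities[k - 1], utilities[k]
    w = (x - x0) / (x1 - x0)  # fraction of the way from x0 to x1
    return u0 + w * (u1 - u0)


# Hypothetical part-worths for Top Speed = 80, 100, 120 mph.
speeds = [80, 100, 120]
part_worths = [0.2, 0.6, 0.9]
u90 = interpolate_utility(90, speeds, part_worths)
```

Because 90 mph is the midpoint of the 80-100 mph segment, the interpolated utility lies halfway between the two bracketing part-worths, whatever the shape of the consumer's underlying utility curve.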

Pekelman and Sen (1975) show such interpolation to be unsatisfactory and suggest, instead, a technique that estimates utility __functions__ for continuous attributes rather than utilities for a few discrete levels of each attribute. The utility function approach makes it possible to estimate utilities for __any level__ of the attribute __within the range__ of the discrete levels used to generate the input data.

While Pekelman and Sen (1975) show that the utility function approach has certain theoretical advantages over existing conjoint analysis techniques, they provide no empirical data to support their position. Hence, it is necessary to ask whether the utility function approach is, in fact, a more accurate estimator of consumer utilities. This question can be answered by conducting simulations (supplemented by tests on actual consumer data) comparing the performance of the utility function approach with, say, the MONANOVA approach for discrete utility levels (Kruskal, 1965). The number of attributes, the number of levels per attribute, the underlying utility functions, etc., could all be varied in the simulations. Each method could be evaluated by how well it predicts the rank ordering of a set of __holdout__ product concepts, using a measure such as Kendall's τ (see, for example, Winkler and Hays, 1975, pp. 871-874).

The simulation outlined above is likely to be time-consuming and expensive. Therefore, to obtain some insight into the general problem of interpolation versus the estimation of a utility function, we replace the __conjoint utility function approach__ with __regression analysis__ and compare the performance of interpolation with that of regression. Regression analysis enables us to investigate this issue analytically. Moreover, we do not expect the __direction__ of the results to change if conjoint measurement is used instead of regression.

Before proceeding further, it is important to emphasize that the issue of interpolation does not arise if __all__ the attributes are categorical. In other words, if the original attribute levels refer to nominal entities such as types of entrees and types of desserts (Green and Wind, 1973, pp. 159-185), it does not make sense to define a new product concept containing attribute levels other than the original levels used in the analysis. If, on the other hand, the attributes are continuous attributes such as "percent discount," "number of stores," and "$ cost of card" (which are three attributes of retail discount cards analyzed in Green and Wind, 1973, pp. 133-158), it is certainly reasonable to ask what the consumer's utility will be for discount cards not defined by the original levels of the three attributes. More generally, conjoint analysis problems involve both continuous and categorical attributes. For example, Green (1974) describes a problem dealing with trans-Atlantic air travel which involves two categorical attributes (e.g., airline, represented by four discrete levels: TWA, BOAC, Pan Am, and Air France) and seven continuous attributes (e.g., arrival time punctuality, anticipated plane load, etc.). For such problems, it is a simple matter to use a variation of the Pekelman and Sen (1975) approach to estimate utility functions for the continuous attributes and specific utility values for the discrete levels of the categorical attributes. Clearly, the interpolation issue remains important for the continuous attributes of such problems. We see, therefore, that except for the case where __all__ the attributes of a conjoint analysis problem are categorical in nature, the issue of interpolation versus the estimation of a utility function remains a matter of importance.

COMPARING REGRESSION WITH LINEAR INTERPOLATION

Preliminary Considerations

To compare interpolation with regression, we deal with only a single attribute and assume that we have a sample of n observations $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$, where $X_i$ is the $i$th level of the attribute and $Y_i$ is the corresponding measured utility. We now wish to predict the utilities for each of p __new__ levels of the attribute, $X_a, X_b, \ldots, X_p$ (such that $X_j \neq X_i$ for $j = a, b, \ldots, p$ and $i = 1, 2, \ldots, n$).

These predictions will be made using (1) interpolation and (2) the regression function estimated from the sample observations $(X_1, Y_1), \ldots, (X_n, Y_n)$. Now, if we were using conjoint analysis to make the utility predictions, the only standard available for judging the accuracy of the predictions would be the respondent's __rank ordering__ of the utilities of the p new levels of the attribute. Thus, the accuracy of the predictions made by conjoint analysis can be evaluated only by comparing the __rank order__ of the predicted utilities with the rank order stated by the respondent. Therefore, to provide a basis for comparison with conjoint analysis, the predictions for both schemes (interpolation and regression) will be converted into a __rank ordering__ of the predicted utilities $Y_a, \ldots, Y_p$ for the p new levels of the attribute. Assuming that we know the __true__ rank order of the utilities for $X_a, X_b, \ldots, X_p$, we can compute Kendall's τ to measure the predictive accuracy of each scheme. The two schemes can then be compared in terms of these computed τ values. We now describe the details of the procedure outlined above.
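Kendall's τ between a predicted and a true ordering can be computed by counting concordant and discordant pairs of objects. The sketch below assumes there are no tied utilities, so it uses the simple form of τ, (concordant − discordant) / number of pairs; the function name and the example scores are illustrative assumptions, not taken from the paper.

```python
from itertools import combinations


def kendall_tau(predicted, true):
    """Kendall's tau between predicted and true utilities (assumes no ties)."""
    if len(predicted) != len(true):
        raise ValueError("predicted and true must have the same length")
    concordant = discordant = 0
    for i, j in combinations(range(len(true)), 2):
        # Concordant: both lists order objects i and j the same way.
        sign = (predicted[i] - predicted[j]) * (true[i] - true[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(true) * (len(true) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical predicted vs. true utilities for holdout levels X_a, X_b, X_c.
example_tau = kendall_tau(predicted=[10.2, 9.1, 9.8], true=[10.0, 9.5, 9.0])
```

In the example the prediction swaps B and C relative to the true order, giving one discordant pair out of three, so τ = 1/3.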

Rank Order Prediction Using Kendall's τ

The underlying utility function that generates the n observations $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ is assumed to have the following quadratic form (see Pekelman and Sen, 1975, for a discussion of the advantages of using a quadratic utility function):

$$Y = qX + rX^2 + \varepsilon \qquad (1)$$

where $\varepsilon$ is the error in measuring $Y$. It is assumed that $\varepsilon$ is normally distributed with $E(\varepsilon) = 0$ and $\mathrm{Var}(\varepsilon) = \sigma^2$ for all values of $X$.
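A minimal sketch of fitting model (1) by ordinary least squares, assuming one observation per attribute level and no intercept term (as equation (1) is written). It solves the 2×2 normal equations directly using only the Python standard library. The "true" parameter values, the error standard deviation, and the function names are hypothetical.

```python
import random


def fit_quadratic_through_origin(xs, ys):
    """OLS estimates (q, r) for Y = qX + rX^2 + e, fitted without an intercept."""
    # Normal equations (Z'Z) b = Z'y for the design matrix Z = [X, X^2].
    s11 = sum(x ** 2 for x in xs)
    s12 = sum(x ** 3 for x in xs)
    s22 = sum(x ** 4 for x in xs)
    b1 = sum(x * y for x, y in zip(xs, ys))
    b2 = sum(x ** 2 * y for x, y in zip(xs, ys))
    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("need at least two distinct nonzero attribute levels")
    # Solve the 2x2 system by Cramer's rule.
    q_hat = (s22 * b1 - s12 * b2) / det
    r_hat = (s11 * b2 - s12 * b1) / det
    return q_hat, r_hat


def predict(q, r, x):
    """Utility predicted by the quadratic model at attribute level x."""
    return q * x + r * x ** 2


# Hypothetical "true" parameters and error standard deviation.
q_true, r_true, sigma = 2.0, -0.01, 1.5
levels = [80, 100, 120]
rng = random.Random(0)  # seeded for reproducibility
observed = [predict(q_true, r_true, x) + rng.gauss(0.0, sigma) for x in levels]
q_hat, r_hat = fit_quadratic_through_origin(levels, observed)
u90 = predict(q_hat, r_hat, 90)  # a level not used to generate the data
```

Unlike interpolation, which uses only the two bracketing observations, the fitted function uses all n observations to predict the utility at any level within the range.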

Now, let A, B, and C be three new product concepts (i.e., p = 3 in this example) with predicted utility values $Y_a$, $Y_b$, and $Y_c$. These predictions have means $\mu_a$, $\mu_b$, and $\mu_c$ and variances $\sigma_a^2$, $\sigma_b^2$, and $\sigma_c^2$, respectively. The means $\mu_a$, $\mu_b$, and $\mu_c$ express the __true__ utility values of A, B, and C. We assume that A > B > C, so that $\mu_a > \mu_b > \mu_c$. Whether the predictions $Y_a$, $Y_b$, and $Y_c$ are obtained by interpolation or by regression, they are linear combinations of the observed $Y_i$ and will therefore be normally distributed, given the assumption regarding the error term in (1). The variances $\sigma_a^2$, $\sigma_b^2$, and $\sigma_c^2$ will depend upon whether the predictions are made using linear interpolation or regression (see Appendixes A and B for specific formulae).

Now, A, B, and C can be rank-ordered in 3! = 6 possible ways, as shown in Table 1. The six possible rank orders can result in 0, 1, 2, or 3 violations of the true rank order of A, B, and C (see Table 1). For a particular number of violations of the true rank ordering of p objects, Kendall's τ takes on a specific value. The τ values for 0, 1, 2, and 3 violations of the true rank ordering of three objects are shown in Table 2. Hence, by calculating the probabilities of 0, 1, 2, and 3 violations, we can determine the __distribution__ of τ for the two predictive schemes. A comparison of the distributions, or of the mean values of τ, will indicate the relative effectiveness of the two schemes in predicting the rank order of a set of p new attribute levels.

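Table 1: POSSIBLE RANK ORDERS AND NUMBER OF VIOLATIONS FOR THREE OBJECTS

| Rank order (most to least preferred) | Violations of A > B > C ($n_v$) |
|---|---|
| ABC | 0 |
| ACB | 1 |
| BAC | 1 |
| BCA | 2 |
| CAB | 2 |
| CBA | 3 |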

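Table 2: VALUES OF KENDALL'S τ FOR $n_v$ VIOLATIONS OF THE RANK ORDERING OF THREE OBJECTS

| Number of violations, $n_v$ | Kendall's τ |
|---|---|
| 0 | 1 |
| 1 | 1/3 |
| 2 | -1/3 |
| 3 | -1 |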
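The entries of Tables 1 and 2 can be generated by enumerating the rank orders and counting discordant pairs. This sketch uses the no-ties formula τ = 1 − 4n_v / (p(p − 1)), which for p = 3 reduces to τ = 1 − 2n_v/3; the helper names are illustrative, and the same helpers apply unchanged for p > 3.

```python
from itertools import combinations, permutations


def count_violations(order, true_order):
    """Count pairs ranked in `order` opposite to their ranking in `true_order`."""
    true_rank = {obj: i for i, obj in enumerate(true_order)}
    # combinations() keeps the input order, so x is ranked above y in `order`.
    return sum(1 for x, y in combinations(order, 2) if true_rank[x] > true_rank[y])


def tau_from_violations(n_v, p):
    """Kendall's tau for p untied objects with n_v discordant pairs."""
    n_pairs = p * (p - 1) // 2
    return (n_pairs - 2 * n_v) / n_pairs


true_order = "ABC"  # A > B > C
table1 = {"".join(o): count_violations(o, true_order) for o in permutations(true_order)}
table2 = {n_v: tau_from_violations(n_v, 3) for n_v in range(4)}
```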

The probabilities of $n_v$ violations (where $n_v$ = 0, 1, 2, 3) are calculated by combining the probabilities of the appropriate rank orders. From Table 1, the appropriate relationships are as shown below:

$$
\begin{aligned}
\Pr(n_v = 0) &= \Pr[ABC] \\
\Pr(n_v = 1) &= \Pr[ACB] + \Pr[BAC] \\
\Pr(n_v = 2) &= \Pr[BCA] + \Pr[CAB] \\
\Pr(n_v = 3) &= \Pr[CBA]
\end{aligned}
$$
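Given the probabilities of the six rank orders, the relationships above aggregate them into the distribution of $n_v$, and hence of τ. The sketch below hard-codes Tables 1 and 2 for three objects; the rank-order probabilities in the usage lines are placeholders (every order equally likely, as for a purely random predictor), not results from the paper.

```python
# Table 1: violations of the true order A > B > C for each rank order.
VIOLATIONS = {"ABC": 0, "ACB": 1, "BAC": 1, "BCA": 2, "CAB": 2, "CBA": 3}
# Table 2: Kendall's tau for n_v violations among three objects.
TAU = {0: 1.0, 1: 1 / 3, 2: -1 / 3, 3: -1.0}


def tau_distribution(rank_probs):
    """Return ({n_v: probability}, mean of tau, variance of tau)."""
    pr_nv = {n_v: 0.0 for n_v in TAU}
    for order, prob in rank_probs.items():
        pr_nv[VIOLATIONS[order]] += prob
    mean = sum(p * TAU[n_v] for n_v, p in pr_nv.items())
    var = sum(p * (TAU[n_v] - mean) ** 2 for n_v, p in pr_nv.items())
    return pr_nv, mean, var


# Placeholder: every rank order equally likely.
uniform = {order: 1 / 6 for order in VIOLATIONS}
pr_nv, mean_tau, var_tau = tau_distribution(uniform)
```

With every order equally likely, the mean of τ is 0 and its variance is 11/27, the usual null-distribution values for three objects; a useful prediction scheme should push the mean toward 1.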

Our task, therefore, boils down to the computation of the probability of each possible rank ordering of the three objects.

Computing the Probability of Specific Rank Orders

We will indicate how the probabilities of the various rank orders are calculated by taking the specific example of the rank order: [BAC].

$$\Pr[BAC] = \Pr\{(Y_b - Y_a) \ge 0 \text{ and } (Y_a - Y_c) \ge 0\} = \Pr(s_1 \ge 0 \text{ and } s_2 \ge 0),$$

where

$$s_1 = Y_b - Y_a \quad \text{and} \quad s_2 = Y_a - Y_c.$$

Since $Y_a$, $Y_b$, and $Y_c$ are jointly normally distributed, $s_1$ and $s_2$ are also jointly normally distributed. Further, the mean of $s_1$ is $\mu_b - \mu_a$, while the mean of $s_2$ is $\mu_a - \mu_c$. However, to compute $\Pr(s_1 \ge 0 \text{ and } s_2 \ge 0)$, we also need $\Sigma$, the variance-covariance matrix of $(s_1, s_2)$. $\Sigma$ will depend upon the specific prediction scheme employed: linear interpolation or regression. The specific expressions for the elements of $\Sigma$ are developed in Appendix A for regression and in Appendix B for interpolation.

Knowing that $s_1$ and $s_2$ are jointly normally distributed with variance-covariance matrix $\Sigma$ and means $\mu_b - \mu_a$ and $\mu_a - \mu_c$, respectively, we can easily calculate $\Pr(s_1 \ge 0 \text{ and } s_2 \ge 0)$ (using numerical integration or, in the case of our simple example, tables of the bivariate normal distribution), and hence $\Pr[BAC]$. We can compute the probabilities of the other five rank orders in a similar manner.
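One way to carry out the numerical integration mentioned above is to integrate the density of $s_1$ over $s_1 \ge 0$, weighted by the conditional probability $\Pr(s_2 \ge 0 \mid s_1)$, with Simpson's rule. This is a standard-library sketch, not the authors' procedure. It assumes a nonsingular Σ, and the means and covariance matrix of $(Y_a, Y_b, Y_c)$ passed to the hypothetical helper `prob_rank_order` would have to come from the appendix formulae; the values in the usage lines are invented for illustration.

```python
import math


def _phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _Phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_both_nonneg(mu1, mu2, var1, var2, cov12, n=2000):
    """Pr(s1 >= 0 and s2 >= 0) for bivariate normal (s1, s2), via Simpson's rule.

    Integrates the density of s1 over s1 >= 0, weighted by Pr(s2 >= 0 | s1).
    Assumes a nonsingular covariance matrix; n must be even.
    """
    sd1 = math.sqrt(var1)
    cond_sd = math.sqrt(var2 - cov12 ** 2 / var1)  # sd of s2 given s1
    lo, hi = max(0.0, mu1 - 10 * sd1), mu1 + 10 * sd1
    if hi <= lo:
        return 0.0  # s1 is (numerically) never nonnegative

    def integrand(x):
        cond_mean = mu2 + cov12 / var1 * (x - mu1)
        return _phi((x - mu1) / sd1) / sd1 * _Phi(cond_mean / cond_sd)

    h = (hi - lo) / n
    total = integrand(lo) + integrand(hi)
    total += 4 * sum(integrand(lo + (2 * k - 1) * h) for k in range(1, n // 2 + 1))
    total += 2 * sum(integrand(lo + 2 * k * h) for k in range(1, n // 2))
    return total * h / 3


def prob_rank_order(order, means, cov):
    """Probability of a rank order (i, j, k), listed from most to least preferred.

    means, cov: mean vector and covariance matrix of the predicted utilities.
    Uses s1 = Y_i - Y_j and s2 = Y_j - Y_k, as in the [BAC] example.
    """
    i, j, k = order
    var1 = cov[i][i] + cov[j][j] - 2 * cov[i][j]
    var2 = cov[j][j] + cov[k][k] - 2 * cov[j][k]
    cov12 = cov[i][j] - cov[i][k] - cov[j][j] + cov[j][k]
    return prob_both_nonneg(means[i] - means[j], means[j] - means[k], var1, var2, cov12)


# Hypothetical moments of (Y_a, Y_b, Y_c), stored at indices (0, 1, 2).
means = [8.0, 5.0, 3.0]
cov = [[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]]
p_bac = prob_rank_order((1, 0, 2), means, cov)  # [BAC]: B > A > C
```

A quick sanity check is that the six rank-order probabilities sum to 1, and that for zero means, unit variances and correlation 0.5 the orthant probability equals the known value 1/4 + arcsin(0.5)/(2π) = 1/3.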

It is now a simple matter to compare the two prediction schemes in terms of the mean and variance of Kendall's τ for any specific set of input data: (a) the n observations $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$; (b) a numerical value for $\sigma^2$ in equation (1); (c) values of the parameters (i.e., q and r) of the utility function postulated in equation (1); and (d) a set of "holdout" product concepts defined by attribute levels $X_a, X_b, \ldots, X_p$.
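The appendices with the exact variance-covariance formulae are not reproduced in this version. As a hedged substitute, the sketch below computes the mean vector and covariance matrix of the predictions from inputs (a) to (d): for regression it uses the standard OLS result $\mathrm{Cov}(\hat Y_i, \hat Y_j) = \sigma^2 z_i^\top (Z^\top Z)^{-1} z_j$ with $z = (X, X^2)$, and for linear interpolation it uses the interpolation weights on the two bracketing observations. It assumes one independent observation per level, sorted levels, and holdout levels inside the observed range, and it may differ in detail from Appendixes A and B. Under model (1) each interpolated mean lies on the chord between the two bracketing true utilities, so the sketch computes that mean explicitly. All names and numbers are hypothetical.

```python
def regression_moments(xs, q, r, sigma2, new_xs):
    """Means and covariance matrix of OLS predictions of model (1) at new_xs.

    Assumes one independent observation per level in xs, error variance
    sigma2, and the no-intercept design Z = [X, X^2].
    """
    s11 = sum(x ** 2 for x in xs)
    s12 = sum(x ** 3 for x in xs)
    s22 = sum(x ** 4 for x in xs)
    det = s11 * s22 - s12 ** 2
    inv = [[s22 / det, -s12 / det], [-s12 / det, s11 / det]]  # (Z'Z)^{-1}
    zs = [(x, x ** 2) for x in new_xs]
    # OLS is unbiased when model (1) is correct, so the means are the true utilities.
    means = [q * x + r * x ** 2 for x in new_xs]
    cov = [[sigma2 * sum(zi[a] * inv[a][b] * zj[b] for a in range(2) for b in range(2))
            for zj in zs]
           for zi in zs]
    return means, cov


def interpolation_moments(xs, q, r, sigma2, new_xs):
    """Means and covariance matrix of linear-interpolation predictions at new_xs.

    xs must be sorted ascending with one independent observation per level;
    every new level must lie within [xs[0], xs[-1]].
    """
    weights = []  # weights[i][m]: weight of observation m in prediction i
    for x in new_xs:
        k = next(m for m in range(1, len(xs)) if xs[m] >= x)
        w = (x - xs[k - 1]) / (xs[k] - xs[k - 1])
        row = [0.0] * len(xs)
        row[k - 1], row[k] = 1.0 - w, w
        weights.append(row)
    true_y = [q * x + r * x ** 2 for x in xs]
    # Each prediction's mean lies on the chord between bracketing true utilities.
    means = [sum(c * y for c, y in zip(row, true_y)) for row in weights]
    cov = [[sigma2 * sum(a * b for a, b in zip(ri, rj)) for rj in weights]
           for ri in weights]
    return means, cov


# Hypothetical design levels, parameters, error variance, and holdout levels.
levels = [80, 100, 120]
holdout = [90, 105, 115]
reg_means, reg_cov = regression_moments(levels, 2.0, -0.01, 1.0, holdout)
int_means, int_cov = interpolation_moments(levels, 2.0, -0.01, 1.0, holdout)
```

The resulting means and covariance matrices can then be turned into rank-order probabilities and a distribution of τ as described in the preceding subsections.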

The results presented above are for three "holdout" product concepts. In principle, they generalize easily to p (where p > 3) "holdout" concepts. In practice, cases where p > 4 would be difficult to handle: each rank-order probability then requires integrating the joint normal distribution of the p - 1 successive differences, which has more than three dimensions, and such integration is not an easy task. However, for most marketing applications, a practical limit of four new concepts does not appear to be overly restrictive.

We now present a method of utilizing the computed Kendall's τ values for regression and interpolation. Our objective is to evaluate the relative desirability of the two prediction schemes in terms of their impact on a company's market share.

PERFORMANCE OF INTERPOLATION AND REGRESSION IN TERMS OF MARKET SHARE

Consider the situation in which three new product concepts are being considered for possible production, and management is interested in determining the most preferred concept. Assume that the __true__ ranking is A > B > C and that the potential market shares of the three products, $MS_a$, $MS_b$, and $MS_c$, are proportional to their true preference values $\mu_a$, $\mu_b$, and $\mu_c$. In other words, $MS_a/MS_b = \mu_a/\mu_b$, $MS_a/MS_c = \mu_a/\mu_c$, and $MS_b/MS_c = \mu_b/\mu_c$. We now show how the two prediction schemes can be compared in terms of interval estimates of the expected market share.

To obtain these interval estimates, we proceed as follows. If we predict [ABC] or [ACB], we will choose product concept A and our market share will be $MS_a$. Similarly, predicting [BAC] or [BCA] will result in a market share of $MS_b$, while a market share of $MS_c$ will be obtained if we predict [CAB] or [CBA]. Let $P_a$, $P_b$, and $P_c$ denote the respective probabilities of predicting that product concepts A, B, and C will be the most preferred. $P_a$, $P_b$, and $P_c$ can be calculated from the probabilities of occurrence of the six possible rank orders; for example, $P_a = \Pr[ABC] + \Pr[ACB]$. Since we know how to compute the probabilities of the various rank orders, we know how to compute $P_a$, $P_b$, and $P_c$. Knowing $P_a$, $P_b$, and $P_c$, we can compute $\overline{MS}$ and $\mathrm{Var}(MS)$, the mean and the variance of the market share, for both estimation schemes.

$$\overline{MS} = P_a MS_a + P_b MS_b + P_c MS_c \qquad (2)$$
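Because market shares are assumed proportional to the true preference values, $MS_b = (\mu_b/\mu_a)MS_a$ and $MS_c = (\mu_c/\mu_a)MS_a$. Substituting into (2), and taking the variance of the three-point distribution of market share, gives the forms that the worked example below evaluates. The variance expression is inferred from the arithmetic of that example; the paper's own numbered equations (3) and (4) do not survive in this version, so this is a reconstruction rather than a quotation.

$$
\overline{MS} = \frac{MS_a}{\mu_a}\left(P_a\mu_a + P_b\mu_b + P_c\mu_c\right),
\qquad
\mathrm{Var}(MS) = \sum_{i \in \{a, b, c\}} P_i \left(MS_i - \overline{MS}\right)^2 .
$$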

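A simple numerical example will indicate how the two estimation schemes can be compared on the basis of $\overline{MS}$ and $\mathrm{Var}(MS)$. Let $\mu_a = 8$, $\mu_b = 5$, and $\mu_c = 3$. $P_a$, $P_b$, and $P_c$ are assumed to be:

| Scheme | $P_a$ | $P_b$ | $P_c$ |
|---|---|---|---|
| Regression | 0.90 | 0.07 | 0.03 |
| Linear Interpolation | 0.70 | 0.20 | 0.10 |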

Using (2), and noting that proportionality gives $MS_b = (5/8)MS_a$ and $MS_c = (3/8)MS_a$, we compute $\overline{MS}$ for the two schemes as follows:

Regression: $\overline{MS} = \frac{MS_a}{8}\{(0.9)(8) + (0.07)(5) + (0.03)(3)\} = 0.955\,MS_a$

Linear Interpolation: $\overline{MS} = \frac{MS_a}{8}\{(0.7)(8) + (0.2)(5) + (0.1)(3)\} = 0.862\,MS_a$

Using (4), we compute Var(MS) for the two schemes as follows:

Regression: $\mathrm{Var}(MS) = MS_a^2\{(0.9)(1 - 0.955)^2 + (0.07)(5/8 - 0.955)^2 + (0.03)(3/8 - 0.955)^2\} = 0.0195\,MS_a^2$

Interpolation: $\mathrm{Var}(MS) = MS_a^2\{(0.7)(1 - 0.862)^2 + (0.2)(5/8 - 0.862)^2 + (0.1)(3/8 - 0.862)^2\} = 0.048\,MS_a^2$

Converting the variances into standard deviations, $\sigma(MS)$, we can compute two-standard-deviation confidence intervals for the two schemes:

Regression: $\sigma(MS) = 0.14\,MS_a$

Confidence interval for two standard deviations:

$\overline{MS} \pm (2)(0.14\,MS_a) = 0.955\,MS_a \pm 0.28\,MS_a = (0.675\,MS_a,\ 1.235\,MS_a)$

Interpolation: $\sigma(MS) = 0.22\,MS_a$

Confidence interval for two standard deviations:

$\overline{MS} \pm (2)(0.22\,MS_a) = 0.862\,MS_a \pm 0.44\,MS_a = (0.422\,MS_a,\ 1.302\,MS_a)$
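The arithmetic of this example can be checked with a few lines of Python. The sketch assumes, as the text does, that market shares are proportional to true preference values, and it expresses every quantity in units of $MS_a$; the function name `market_share_moments` is illustrative.

```python
import math


def market_share_moments(top_probs, true_utils):
    """Mean and variance of market share, in units of MS_a.

    top_probs[i]: probability that concept i is predicted to be most preferred.
    Shares are proportional to true utilities and scaled so that concept 0
    (A, the truly most preferred concept) has share 1, i.e. MS_a.
    """
    shares = [u / true_utils[0] for u in true_utils]
    mean = sum(p * s for p, s in zip(top_probs, shares))
    var = sum(p * (s - mean) ** 2 for p, s in zip(top_probs, shares))
    return mean, var


true_utils = [8, 5, 3]  # mu_a, mu_b, mu_c from the example
schemes = {"regression": [0.90, 0.07, 0.03], "interpolation": [0.70, 0.20, 0.10]}
results = {}
for name, probs in schemes.items():
    mean, var = market_share_moments(probs, true_utils)
    sd = math.sqrt(var)
    # Two-standard-deviation interval, as in the text.
    results[name] = {"mean": mean, "var": var, "interval": (mean - 2 * sd, mean + 2 * sd)}

variance_reduction = 1 - results["regression"]["var"] / results["interpolation"]["var"]
```

This gives means of 0.955 and 0.8625 and variances of about 0.0195 and 0.0483 (in units of $MS_a$ and $MS_a^2$), a variance reduction of about 59.5 percent. Small differences from the printed intervals reflect rounding in the text.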

These confidence intervals are plotted in Figure 1. Looking at Figure 1, we see that in this numerical example, the Regression prediction provides us with a higher mean market share and a lower variance of the market share. Compared to Interpolation, there is roughly a 10 percent increase in expected market share while the variance is reduced by about 59 percent. It would appear that in this example, Regression is preferred on both counts.

Figure 1: CONFIDENCE INTERVALS FOR REGRESSION AND INTERPOLATION

SUMMARY

An alternative to the common practice of calculating utilities for several __discrete__ attribute levels is the estimation of utility __functions__ for continuous attributes. The utility function approach enables us to compute utilities for attribute levels not used to generate the input data and appears to have several theoretical advantages over the conventional approach of interpolating between discrete attribute levels. This paper presented a procedure for numerically determining the usefulness of estimating a utility function by comparing the predictive performance of regression analysis and linear interpolation. Regression analysis was used in this comparison instead of conjoint measurement because (1) the comparison could be made analytically, and (2) it yields an estimate of the magnitude of improvement that could be expected from the use of utility functions.

APPENDIX A: VARIANCE-COVARIANCE MATRIX FOR REGRESSION ANALYSIS

APPENDIX B: VARIANCE-COVARIANCE MATRIX FOR INTERPOLATION

REFERENCES

J. D. Davidson, "Forecasting Traffic on STOL," __Operational Research Quarterly__, 24 (December, 1973), 561-569.

John A. Fiedler, "Condominium Design and Pricing," in M. Venkatesan, ed., __Proceedings of the Third Annual Conference__ (Association for Consumer Research, 1972), 279-293.

Paul E. Green, "On the Design of Choice Experiments Involving Multifactor Alternatives," __Journal of Consumer Research__, 1 (September, 1974), 61-68.

Paul E. Green and Yoram Wind, __Multiattribute Decisions in Marketing__, Hinsdale, Illinois: The Dryden Press, 1973.

Richard M. Johnson, "Trade-Off Analysis of Consumer Values," __Journal of Marketing Research__, 11 (May, 1974), 121-127.

Jan Kmenta, __Elements of Econometrics__, New York, New York: Macmillan, 1971.

Joseph B. Kruskal, "Analysis of Factorial Experiments by Estimating Monotone Transformations of the Data," __Journal of the Royal Statistical Society__, Series B, 27 (1965), 251-263.

Dov Pekelman and Subrata K. Sen, "Utility Function Estimation in Conjoint Measurement," in Ronald C. Curhan, ed., __1974 Combined Proceedings__ (Chicago: American Marketing Association, 1975), 156-161.

Robert L. Winkler and William L. Hays, __Statistics: Probability, Inference, and Decision__ (2nd ed.), New York: Holt, Rinehart, and Winston, 1975.
