Regression Versus Interpolation in Conjoint Analysis

Dov Pekelman, University of Pennsylvania
Subrata Sen, University of Rochester
ABSTRACT - It is a common practice in conjoint analysis to calculate utilities for several discrete attribute levels and then use linear interpolation to determine utilities for other attribute levels. For continuous attributes, a preferred alternative to linear interpolation might be the use of utility functions. Using regression analysis to estimate the utility function, this paper presents an analytic procedure to numerically determine the usefulness of the utility function approach relative to interpolation.
Dov Pekelman and Subrata Sen (1977) ,"Regression Versus Interpolation in Conjoint Analysis", in NA - Advances in Consumer Research Volume 04, eds. William D. Perreault, Jr., Atlanta, GA : Association for Consumer Research, Pages: 29-34.



INTRODUCTION

The use of conjoint analysis to evaluate new product concepts is an interesting development in marketing research (see Green and Wind, 1973, pp. 54-59, 114-126, and Johnson, 1974, for descriptions of the technique and Davidson, 1973, Fiedler, 1972, and Green and Wind, 1973, pp. 133-215, for some marketing applications). As typically used in marketing, conjoint analysis determines utilities for various discrete levels of the attributes used by a consumer in evaluating product alternatives. For example, suppose that "Top Speed" is one of the attributes used by a consumer in evaluating automobiles and that three levels of "Top Speed" are used to generate the required rank-ordered input data from the respondent. If the three levels are, say, 80, 100, and 120 miles per hour, one obtains as output the consumer's utilities for these three levels of "Top Speed." Now suppose that a proposed automobile brand has a "Top Speed" of 90 miles per hour and it is necessary to determine the consumer's utility for this level of "Top Speed." If the attribute is continuous (as "Top Speed" is), the consumer's utility for a "Top Speed" of 90 miles per hour is generally determined by a linear interpolation between the utilities estimated for 80 and 100 miles per hour.

Such interpolation is shown to be unsatisfactory by Pekelman and Sen (1975) who suggest, instead, a technique to estimate utility functions (for continuous attributes) instead of estimating utilities for a few discrete levels of each attribute. The utility function approach makes it possible to estimate utilities for any level of the attribute within the range of the discrete levels used to generate the input data.

While Pekelman and Sen (1975) show that the utility function approach has certain theoretical advantages over existing conjoint analysis techniques, they provide no empirical data to support their position. Hence, it is necessary to inquire whether the utility function approach is, in fact, a more accurate estimator of consumer utilities. This question can be answered by conducting simulations (supplemented by tests on actual consumer data) comparing the performance of the Utility Function approach with, say, the MONANOVA approach for discrete utility levels (Kruskal, 1965). The number of attributes, the number of levels per attribute, the underlying utility functions, etc., could all be varied in the simulations. Each method could be evaluated in terms of the degree to which it can predict the rank ordering of a set of holdout product concepts using a measure such as Kendall's τ (see, for example, Winkler and Hays, 1975, pp. 871-874).

The simulation outlined above is likely to be time consuming and expensive. Therefore, in order to obtain some insight into the general problem of interpolation versus the estimation of a utility function, we replace the conjoint utility function approach with regression analysis and compare the performance of interpolation techniques with that of regression. Regression analysis enables us to investigate this issue analytically. Besides, we do not expect the direction of the results to change if we use conjoint measurement instead of regression.

Before proceeding further, it is important to emphasize that the issue of interpolation does not arise if all the attributes are categorical. In other words, if the original attribute levels refer to nominal entities such as types of entrees and types of desserts (Green and Wind, 1973, pp. 159-185), it does not make sense to define a new product concept containing attribute levels other than the original levels used in the analysis. If, on the other hand, the attributes are continuous attributes such as "percent discount," "number of stores," and "$ cost of card" (which are three attributes of retail discount cards analyzed in Green and Wind, 1973, pp. 133-158), it is certainly reasonable to ask what the consumer's utility will be for discount cards not defined by the original levels of the three attributes. More generally, conjoint analysis problems involve both continuous and categorical attributes. For example, Green (1974) describes a problem dealing with trans-Atlantic air travel which involves two categorical attributes (e.g., airline, represented by four discrete levels: TWA, BOAC, Pan Am, and Air France) and seven continuous attributes (e.g., arrival time punctuality, anticipated plane load, etc.). For such problems, it is a simple matter to use a variation of the Pekelman and Sen (1975) approach to estimate utility functions for the continuous attributes and specific utility values for the discrete levels of the categorical attributes. Clearly, the interpolation issue remains important for the continuous attributes of such problems. We see, therefore, that except for the case where all the attributes of a conjoint analysis problem are categorical in nature, the issue of interpolation versus the estimation of a utility function remains a matter of importance.

COMPARING REGRESSION WITH LINEAR INTERPOLATION

Preliminary Considerations

To compare interpolation with regression, we will deal with only a single attribute and will assume that we have available a sample of n observations: (X1, Y1), (X2, Y2), ..., (Xn, Yn). Xi is the ith level of the attribute and Yi is the corresponding measured utility. We now wish to predict the utilities for each of p new levels of the attribute: Xa, Xb, ..., Xp (such that Xj ≠ Xi for j = a, b, ..., p and i = 1, 2, ..., n).

These predictions will be made using (1) interpolation, and (2) the regression function estimated from the sample observations: (X1, Y1), ..., (Xn, Yn). Now, if we were using conjoint analysis to make the utility predictions, the only standard that would be available to judge the accuracy of the predictions would be the respondent's rank ordering of the utilities of the p new levels of the attribute. Thus, the accuracy of the predictions made by conjoint analysis can be evaluated only by comparing the rank order of the predicted utilities with the rank order stated by the respondent. Therefore, to provide a basis for comparison with conjoint analysis, the predictions for both schemes (interpolation and regression) will be converted into a rank ordering of the predicted utilities: Ya, ..., Yp, for the p new levels of the attribute. Assuming that we know the true rank order of the utilities for Xa, Xb, ..., Xp, we can compute Kendall's τ to measure the predictive accuracy of each scheme. The two schemes can then be compared in terms of these computed τ's. We now describe the details of the procedure outlined above.

Rank Order Prediction Using Kendall's τ

The underlying utility function which generates the n observations: (X1, Y1), (X2, Y2), ..., (Xn, Yn) is assumed to be of the following quadratic form (see Pekelman and Sen, 1975, for a discussion of the advantages of using a quadratic utility function):

Y = qX + rX² + e   (1)

where e is the error in measuring Y. It is assumed that e is normally distributed, E(e) = 0, and Var(e) = σ² for all values of X.
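As a concrete illustration, model (1) can be simulated and the two prediction schemes applied side by side. The following sketch uses assumed values for q, r, and σ (none of these numbers come from the paper); the regression fit has no intercept, matching (1):

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed illustrative parameters for model (1): Y = qX + rX^2 + e
q, r, sigma = 1.0, -0.004, 0.5
X = np.array([80.0, 90.0, 100.0, 110.0, 120.0])  # observed attribute levels
Y = q * X + r * X**2 + rng.normal(0.0, sigma, X.size)

# Regression scheme: least-squares estimates of q and r (no intercept, as in (1))
D = np.column_stack([X, X**2])                   # design matrix
q_hat, r_hat = np.linalg.lstsq(D, Y, rcond=None)[0]

# Predict utilities for holdout attribute levels under both schemes
X_new = np.array([85.0, 95.0, 115.0])
Y_reg = q_hat * X_new + r_hat * X_new**2         # regression prediction
Y_lin = np.interp(X_new, X, Y)                   # linear interpolation
```

Both schemes are linear functions of the noisy observations Y, which is what makes the predictions normally distributed and the analytic comparison below possible.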

Now, let A, B, and C be three new product concepts (i.e., p = 3 in this example) characterized by their predicted utility values: Ya, Yb, and Yc. Ya, Yb, and Yc have means μa, μb, and μc and variances σa², σb², and σc², respectively. The means μa, μb, and μc express the true utility values of A, B, and C. We assume that A > B > C, so that μa > μb > μc. If the predictions Ya, Yb, and Yc are obtained by using either interpolation or regression, Ya, Yb, and Yc will be normally distributed given the assumption regarding the error term in (1). The variances σa², σb², and σc² will depend upon whether the predictions are made using linear interpolation or regression (see Appendixes A and B for specific formulae).

Now, A, B, and C can be rank-ordered in 3! = 6 possible ways as shown in Table 1. The six possible rank orders can result in 0, 1, 2, or 3 violations of the true rank order of A, B, and C (see Table 1). For a particular number of violations of the true rank ordering of p objects, Kendall's τ takes on a specific value. The τ values for 0, 1, 2, and 3 violations of the true rank ordering of three objects are shown in Table 2. Hence, by calculating the probabilities of 0, 1, 2, and 3 violations, we can determine the distribution of τ for the two predictive schemes. A comparison of the distributions or the mean values of τ will indicate the relative effectiveness of the two schemes in predicting the rank order of a set of p new attribute levels.

TABLE 1

POSSIBLE RANK ORDERS AND NUMBER OF VIOLATIONS FOR THREE OBJECTS

TABLE 2

VALUES OF KENDALL τ FOR nv VIOLATIONS OF THE RANK ORDERING OF THREE OBJECTS

Calculation of the probabilities of nv (where nv = 0, 1, 2, 3) violations is done by combining the probabilities of the appropriate rank orders. From Table 1, the appropriate relationships are as shown below:

Pr(nv = 0) = Pr [ABC]

Pr(nv = 1) = Pr [ACB] + Pr [BAC]

Pr(nv = 2) = Pr [BCA] + Pr [CAB]

Pr(nv = 3) = Pr [CBA]

Our task, therefore, boils down to the computation of the probability of each possible rank ordering of the three objects.
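The bookkeeping in Tables 1 and 2 can be sketched as follows. For three objects, τ = (3 - 2nv)/3, which gives the values 1, 1/3, -1/3, and -1 for 0 through 3 violations. The rank-order probabilities used at the end are hypothetical placeholders; in the procedure they come from the bivariate normal calculation described in the next section:

```python
from itertools import permutations

# True preference order is A > B > C (the convention of Table 1)
true_rank = {'A': 0, 'B': 1, 'C': 2}

def violations(order):
    """Count pairs in a predicted ranking that contradict the true order."""
    return sum(1 for i in range(3) for j in range(i + 1, 3)
               if true_rank[order[i]] > true_rank[order[j]])

# Kendall's tau for three objects, as in Table 2: tau = (3 - 2 * n_v) / 3
tau = {o: (3 - 2 * violations(o)) / 3 for o in permutations('ABC')}

# Hypothetical rank-order probabilities for one prediction scheme
pr = {('A', 'B', 'C'): 0.50, ('A', 'C', 'B'): 0.20, ('B', 'A', 'C'): 0.15,
      ('B', 'C', 'A'): 0.06, ('C', 'A', 'B'): 0.06, ('C', 'B', 'A'): 0.03}
mean_tau = sum(pr[o] * tau[o] for o in pr)   # E[tau] for this scheme
```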

Computing the Probability of Specific Rank Orders

We will indicate how the probabilities of the various rank orders are calculated by taking the specific example of the rank order: [BAC].

Pr[BAC] = Pr{(Yb - Ya) > 0 and (Ya - Yc) > 0}

        = Pr(s1 > 0 and s2 > 0), where

s1 = (Yb - Ya) and s2 = (Ya - Yc).

Since Ya, Yb, and Yc are normally distributed, s1 and s2 are also normally distributed. Further, the mean of s1 is given by (μb - μa) while the mean of s2 is given by (μa - μc). However, in order to compute Pr(s1 > 0 and s2 > 0), we also need to determine Σ, the variance-covariance matrix of [s1 s2]. Σ will depend upon the specific prediction scheme employed: linear interpolation or regression. The specific expressions for the elements of Σ are developed in Appendix A for Regression and in Appendix B for Interpolation.

Knowing that s1 and s2 are normally distributed with variance-covariance matrix Σ and means (μb - μa) and (μa - μc) respectively, we can easily calculate Pr(s1 > 0 and s2 > 0) (using numerical integration or, in the case of our simple example, tables of the bivariate normal distribution), and hence Pr[BAC]. We can compute the probabilities of the other five rank orders in a similar manner.
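In place of bivariate normal tables, the orthant probability Pr(s1 > 0 and s2 > 0) can be evaluated numerically. The sketch below uses scipy with assumed means and an assumed covariance matrix for (Ya, Yb, Yc) — the actual covariances come from the formulae of Appendixes A and B — and exploits the identity Pr(s > 0) = Pr(-s < 0), i.e., the CDF of the negated vector at the origin:

```python
import numpy as np
from itertools import permutations
from scipy.stats import multivariate_normal

# Assumed illustrative means (mu_a, mu_b, mu_c) and covariance of the predictions
mu_vec = np.array([8.0, 5.0, 3.0])
cov_Y = np.eye(3)                      # placeholder variance-covariance matrix
idx = {'A': 0, 'B': 1, 'C': 2}

def pr_order(order):
    """Pr(Y_first > Y_second > Y_third) for one candidate rank order."""
    i, j, k = (idx[c] for c in order)
    A = np.zeros((2, 3))
    A[0, i], A[0, j] = 1.0, -1.0       # s1 = difference of the top pair
    A[1, j], A[1, k] = 1.0, -1.0       # s2 = difference of the bottom pair
    m, S = A @ mu_vec, A @ cov_Y @ A.T # mean vector and Sigma for [s1 s2]
    # Pr(s1 > 0 and s2 > 0) = CDF of the negated bivariate normal at the origin
    return multivariate_normal(mean=-m, cov=S).cdf(np.zeros(2))

probs = {o: pr_order(o) for o in permutations('ABC')}
```

Since the six rank orders are exhaustive and mutually exclusive, the six probabilities sum to one, which provides a useful numerical check.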

It is now a simple matter to compare the two prediction schemes in terms of the mean and variance of Kendall τ for any specific set of input data: (a) the n observations: (X1, Y1), (X2, Y2), ..., (Xn, Yn), (b) a numerical value for σ² in equation (1), (c) values of the parameters (i.e., q and r) of the utility function postulated in equation (1), and (d) a set of "holdout" product concepts defined by attribute levels: Xa, Xb, ..., Xp.

The results presented above have primarily been for three "holdout" product concepts. Theoretically, the results can be easily generalized to p (where p > 3) "holdout" concepts. Practically, it would be difficult to deal with cases where p > 4 since integration of multivariate normal distributions with more than three dimensions is not an easy task. However, for most marketing applications, a practical limitation of four new concepts does not appear to be overly restrictive.

We now present a method of utilizing the computed Kendall τ values for Regression and Interpolation. Our objective is to evaluate the relative desirability of the two prediction schemes in terms of the impact on a company's market share.

PERFORMANCE OF INTERPOLATION AND REGRESSION IN TERMS OF MARKET SHARE

Consider the situation in which three new product concepts are being considered for possible production, and management is interested in determining the most preferred concept. Assume that the true ranking is A > B > C and that the potential market shares of the three products, MSa, MSb, and MSc, are proportional to their true preference values μa, μb, and μc. In other words, MSa/MSb = μa/μb, MSa/MSc = μa/μc, and MSb/MSc = μb/μc. We now show how the two prediction schemes can be compared in terms of interval estimates of the expected market share.

To obtain these interval estimates, we proceed as follows. If we predict [ABC] or [ACB] we will choose product concept A and our market share will be MSa. Similarly, predicting [BAC] or [BCA] will result in a market share of MSb while a market share of MSc will be obtained if we predict [CAB] or [CBA]. Let Pa, Pb, and Pc denote the respective probabilities of predicting that product concepts A, B, and C will be the most preferred. Pa, Pb, and Pc can be calculated from the probabilities of occurrence of each of the six possible rank orders. For example, Pa = P[ABC] + P[ACB]. Since we know how to compute the probabilities of the various rank orders, we know how to compute Pa, Pb, and Pc. Knowing Pa, Pb, and Pc, we can compute MS and Var(MS), the mean and the variance of the market share for both estimation schemes.

MS = PaMSa + PbMSb + PcMSc   (2)

Since MSb = (μb/μa)MSa and MSc = (μc/μa)MSa, (2) can be rewritten as

MS = (MSa/μa)(Paμa + Pbμb + Pcμc)   (3)

Var(MS) = Pa(MSa - MS)² + Pb(MSb - MS)² + Pc(MSc - MS)²   (4)

A simple numerical example will indicate how the two estimation schemes can be compared on the basis of MS and Var(MS). Let μa = 8, μb = 5, and μc = 3. Pa, Pb, and Pc are assumed to be:

TABLE

ASSUMED VALUES OF Pa, Pb, AND Pc

                    Pa      Pb      Pc
Regression         0.90    0.07    0.03
Interpolation      0.70    0.20    0.10

Using (2), we compute MS for the two schemes as follows:

Regression: MS = MSa/8 {(0.9)(8) + (0.07)(5) + (0.03)(3)} = 0.955 MSa

Linear Interpolation: MS = MSa/8 {(0.7)(8) + (0.2)(5) + (0.1)(3)} = 0.862 MSa

Using (4), we compute Var(MS) for the two schemes as follows:

Regression: Var(MS) = MSa² {(0.9)(1 - 0.955)² + (0.07)(5/8 - 0.955)² + (0.03)(3/8 - 0.955)²} = 0.0195 MSa²

Interpolation: Var(MS) = MSa² {(0.7)(1 - 0.862)² + (0.2)(5/8 - 0.862)² + (0.1)(3/8 - 0.862)²} = 0.048 MSa²

Converting the variances into standard deviations, σ(MS), we can compute confidence intervals for the two schemes.

Regression: σ(MS) = 0.14 MSa

Confidence Interval for two standard deviations

= MS ± (2)(0.14 MSa) = 0.955 MSa ± 0.28 MSa

= (1.235 MSa, 0.675 MSa).

Interpolation: σ(MS) = 0.22 MSa

Confidence Interval for two standard deviations

= MS ± (2)(0.22 MSa) = 0.862 MSa ± 0.44 MSa

= (1.302 MSa, 0.422 MSa)

These confidence intervals are plotted in Figure 1. Looking at Figure 1, we see that in this numerical example, the Regression prediction provides us with a higher mean market share and a lower variance of the market share. Compared to Interpolation, there is roughly a 10 percent increase in expected market share while the variance is reduced by about 59 percent. It would appear that in this example, Regression is preferred on both counts.
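The arithmetic of this example can be verified mechanically. The probabilities below are the assumed Pa, Pb, and Pc from the example, and market shares are expressed in units of MSa:

```python
import numpy as np

# Values from the paper's example: mu_a = 8, mu_b = 5, mu_c = 3, and the
# assumed choice probabilities Pa, Pb, Pc for each scheme
mu = np.array([8.0, 5.0, 3.0])
ms = mu / mu[0]                        # market shares in units of MSa
schemes = {'regression':    np.array([0.90, 0.07, 0.03]),
           'interpolation': np.array([0.70, 0.20, 0.10])}

results = {}
for name, p in schemes.items():
    mean_ms = p @ ms                   # equation (2), in units of MSa
    var_ms = p @ (ms - mean_ms)**2     # equation (4), in units of MSa^2
    sd = np.sqrt(var_ms)
    results[name] = (mean_ms, var_ms, (mean_ms - 2 * sd, mean_ms + 2 * sd))
```

The means, variances, and two-standard-deviation intervals reproduce the figures in the text (0.955 MSa versus 0.862 MSa, 0.0195 MSa² versus 0.048 MSa², and so on) up to rounding.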

FIGURE 1

CONFIDENCE INTERVALS FOR REGRESSION AND INTERPOLATION

SUMMARY

An alternative to the common practice of calculating utilities for several discrete attribute levels is the estimation of utility functions for continuous attributes. The utility function approach enables us to compute utilities for attribute levels not used to generate the input data and appears to have several theoretical advantages over the conventional approach of interpolating between discrete attribute levels. A procedure to numerically determine the usefulness of estimating a utility function by comparing the predictive performances of regression analysis and linear interpolation was presented in this paper. Regression analysis was used in this comparison instead of conjoint measurement because (1) the comparison could be made analytically, and (2) we would obtain an estimate of the magnitude of improvement that could be expected from the use of utility functions.

APPENDIX A

VARIANCE - COVARIANCE MATRIX FOR REGRESSION ANALYSIS

APPENDIX B

VARIANCE - COVARIANCE MATRIX FOR INTERPOLATION
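The appendix formulae are not reproduced here, but the standard sampling-theory results they rest on can be sketched. For regression, the vector of predictions is a linear map of Y, so its covariance is σ²·Dnew(D'D)⁻¹Dnew'; for linear interpolation, each prediction is a convex combination of the two bracketing observations with weight matrix W, so the covariance is σ²·WW'. The attribute levels and σ² below are assumed for illustration:

```python
import numpy as np

sigma2 = 1.0                               # assumed error variance from (1)
X = np.array([80.0, 100.0, 120.0])         # observed attribute levels
X_new = np.array([85.0, 95.0, 110.0])      # holdout levels

# Regression: predictions are D_new (D'D)^{-1} D' Y, a linear map of Y,
# so Cov = sigma^2 * D_new (D'D)^{-1} D_new'  (model (1) has no intercept)
D = np.column_stack([X, X**2])
D_new = np.column_stack([X_new, X_new**2])
cov_reg = sigma2 * D_new @ np.linalg.inv(D.T @ D) @ D_new.T

# Interpolation: each prediction is a convex combination of the two
# bracketing observations, so Cov = sigma^2 * W W'
W = np.zeros((X_new.size, X.size))
for row, x0 in enumerate(X_new):
    i = np.searchsorted(X, x0) - 1         # index of the left bracketing point
    alpha = (x0 - X[i]) / (X[i + 1] - X[i])
    W[row, i], W[row, i + 1] = 1.0 - alpha, alpha
cov_int = sigma2 * W @ W.T
```

The Σ required for [s1 s2] in the text then follows by pre- and post-multiplying either matrix by the differencing matrix that forms s1 and s2 from the predictions.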


REFERENCES

J. D. Davidson, "Forecasting Traffic on STOL," Operational Research Quarterly, 24 (December, 1973), 561-569.

John A. Fiedler, "Condominium Design and Pricing," in Venkatesan, M., ed., Proceedings of the Third Annual Conference (Association for Consumer Research, 1972), 279-293.

Paul E. Green, "On the Design of Choice Experiments Involving Multifactor Alternatives," Journal of Consumer Research, 1 (September, 1974), 61-68.

Paul E. Green and Yoram Wind, Multiattribute Decisions in Marketing, Hinsdale, Illinois: The Dryden Press, 1973.

Richard M. Johnson, "Trade-Off Analysis of Consumer Values," Journal of Marketing Research, 11 (May, 1974), 121-127.

Jan Kmenta, Elements of Econometrics, New York, New York: Macmillan, 1971.

Joseph B. Kruskal, "Analysis of Factorial Experiments by Estimating Monotone Transformations of the Data," Journal of the Royal Statistical Society, Series B, 27 (1965), 251-263.

Dov Pekelman and Subrata K. Sen, "Utility Function Estimation in Conjoint Measurement," in Ronald C. Curhan, ed., 1974 Combined Proceedings (Chicago: American Marketing Association, 1975), 156-161.

Robert L. Winkler and William L. Hays, Statistics: Probability, Inference, and Decision (2nd ed.), New York: Holt, Rinehart and Winston, 1975.
