Some Findings on the Estimation of Continuous Utility Functions in Conjoint Analysis

Philippe Cattin, University of Connecticut
ABSTRACT - Several functions can be used for representing the utility function of interval-scaled attributes, including the part worth function and continuous functions. Pekelman and Sen have shown that a quadratic function can improve predictions over a part worth function. But then, one can often use a linear function instead of a quadratic or part worth function. The purpose of this paper is to illustrate, with a pilot empirical study, what happens, especially with respect to predictive validity, when using linear functions instead of quadratic or part worth functions.
[ to cite ]:
Philippe Cattin (1982), "Some Findings on the Estimation of Continuous Utility Functions in Conjoint Analysis", in NA - Advances in Consumer Research Volume 09, eds. Andrew Mitchell, Ann Arbor, MI: Association for Consumer Research, Pages: 367-372.

Advances in Consumer Research Volume 9, 1982      Pages 367-372


INTRODUCTION

In conjoint analysis a consumer's utility for a continuous attribute is often estimated for several discrete levels of the attribute using a part worth function model. The utility of an intermediate level must then be interpolated, if needed. Alternatively, a continuous utility function can be assumed and estimated. While only the part worth function model can be used with categorical attributes, continuous attributes can be represented with several models. There are three major types of model: two continuous types (the vector model, usually represented by a linear function, and the ideal point model, usually represented by a quadratic function), and the part worth function model (Green and Srinivasan 1978, Figure, p. 106). Flexibility increases from the vector model to the ideal point model to the part worth function model, allowing more shapes for a utility function. But then, the number of degrees of freedom decreases. Hence, "the reliability of the estimated parameters is likely to improve in reverse order. Consequently, from the point of view of predictive validity, the relative desirability of the three models is not clear" (Green and Srinivasan 1978, p. 106).

Pekelman and Sen (1979a, 1979b) have shown that quadratic utility functions improve predictions over part worth functions when using data produced with quadratic functions, and when the number of attribute levels is at least four. A major reason is that the part worth function requires more parameter estimates (3 or more) than the quadratic function (2); hence, the parameter estimates of the part worth function are not as reliable. [With only 3 attribute levels both the quadratic and part worth functions estimate 2 parameters (beyond an intercept), and will produce the same attribute utility estimates for the 3 levels (and thus will have the same predictive validity if only the 3 levels are used). Attribute utility estimates for intermediate levels will differ because one uses a continuous function, the other interpolation; the function that best represents the actual choice rule will have more predictive validity. With only two attribute levels, the quadratic function cannot be used. The linear function and the part worth function (with interpolation between levels) can. They produce the same attribute utility estimates for all levels.] But then, one can often assume a linear function instead of a quadratic or a part worth function. In practice, a priori expectations can be used to help decide which function to select (Green and Srinivasan 1978, p. 107). For instance, one expects a priori that everybody has a preferred (finite) level of sugar content in (say) a dessert. Hence, the quadratic function is appropriate in this case. In other instances, an attribute utility is expected to be monotone increasing or decreasing. Moreover, whether the utility is of the monotone or the ideal point type can depend upon the consumer.

These expectations were used as a starting point to build Figure 1. For each type of expectation, the appropriate (mathematical) function(s) are shown along with the constraint on the parameter(s) of each function that ensures that the function is indeed as expected (i.e., has a maximum and not a minimum in case 1; is increasing in case 2; is decreasing in case 3; and does not have a minimum in case 4b). The part worth function, which can be used in all cases, is not shown in Figure 1. [Sometimes, one may expect multiple peaks. Such an expectation is not included in Figure 1. Tea (if the range of levels goes from cold to hot) falls in the multiple peaks category because consumers tend to prefer iced and hot tea to in-between temperature levels (Green and Srinivasan 1978, p. 106). This implies two peaks (ideal points). But then, one could argue that iced tea and hot tea belong to two different product categories. Moreover, as argued by Pekelman and Sen (1979a, p. 266), this type of function seems more likely to occur when aggregating responses across consumers than at the individual level. For instance, some consumers prefer low suds detergent and others high suds detergent, which gives two ideal points at the aggregate level (Kuehn and Day 1962, Exhibit 2) but not at the individual level. Since the concern here is with individual-level models (which are used in the market simulations often done in conjoint analysis studies), the multiple peaks case is not included in Figure 1.]

Figure 1 shows that except for the ideal point case (case 1), where only the quadratic and part worth functions should be used, the linear, quadratic and part worth functions can all be used for cases 2, 3, and 4. [A compensatory model is assumed in conjoint analysis and in the work reported in this paper. In one study, respondents asked to think aloud while doing a conjoint task were found to use mostly noncompensatory rules (Olshavsky and Acito 1980). But then, compensatory models were found to have about as much internal and external validity as models built with noncompensatory rules. Two reasons for this phenomenon are that compensatory models can approximate noncompensatory rules and that it is not easy to properly identify noncompensatory rules (Cattin 1981). Additional results in psychology (e.g., Dawes and Corrigan 1974) tend to show that compensatory functions are a good approximation to noncompensatory rules.] In these cases, the part worth and quadratic functions will tend to have more predictive validity if the linear function is sufficiently different from the underlying choice rule, and if the amount of noise in the data is sufficiently small (otherwise, the part worth and quadratic parameter estimates would not be reliable enough). But then, how much noise does it take, and how close a representation must the linear function be, for it to have more predictive validity than the quadratic or part worth function?

It should be noted that the only nonlinear continuous function that is suggested is the quadratic function. Green and Srinivasan (1978, pp. 106-107) and Pekelman and Sen (1979a, 1979b) have also emphasized the quadratic function. There are other nonlinear functions that can be used. For instance, both the logarithmic and the exponential functions are monotone (increasing or decreasing) and nonlinear. But then, their mathematical representations (e.g., $U_i = \log(a + b_i X_i)$ for the logarithmic function and $U_i = \exp(a + b_i X_i)$ for the exponential function) are not linear in $X_i$. If one attribute is represented by a logarithmic or exponential function and another attribute by another function, the resulting multiattribute model is nonlinear and a special estimation procedure is required. On the other hand, the quadratic function ($a_i X_i + b_i X_i^2$) has one term linear in $X_i$ and the other linear in $X_i^2$. Most of the common estimation procedures can be used as long as the other attributes are also represented by linear or "linearized" functions.
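As a sketch of this point, with $c$ an intercept and $e$ an error term (both symbols are assumptions added here, not taken from the source), a model with a quadratic function on every attribute can be written so that each parameter multiplies a known regressor:

```latex
U = c + \sum_i \left( a_i X_i + b_i X_i^2 \right) + e
```

Treating $X_i$ and $X_i^2$ as two separate predictor columns leaves the model linear in $a_i$ and $b_i$, which is why ordinary regression applies. A logarithmic term such as $\log(a + b_i X_i)$ has no such rewriting, because $a$ and $b_i$ sit inside the logarithm.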

In summary, a priori expectations are helpful. However, there is still often some uncertainty as to whether the function with the most predictive validity was selected for all respondents (in a survey). We now turn to a pilot study which illustrates changes in predictive validity as a function of the utility function. It is assumed throughout that there is no interaction. If an interaction is expected, the appropriate interaction term should be added. Alternatively, the attributes can be redefined to eliminate the interaction (Green and Srinivasan 1978, p. 107).

A PILOT STUDY

Data were obtained from a group of eleven French graduate students in business with some working experience. All of them were asked to evaluate an orthogonal array of eighteen hypothetical cars defined on four attributes. They were told beforehand to assume that they were in the market for a car. The dependent variable was an 11-point scale. The purpose of the exercise was to illustrate conjoint analysis with a practical example, as part of a short course in Multivariate Analysis taken by the eleven students. The data were collected before noon and each participant's attribute utilities (using part worth functions) were estimated during the lunch break and handed back to them afterwards.

The attributes and the three levels of each attribute were: (1) Gas Consumption (6, 9 and 12 liter/100 km); (2) Price (15,000, 22,500 and 30,000 Francs); (3) Maximum Speed (110, 130 and 150 km/hour); and (4) Number of Seats (4, 5 and 6). For each attribute the extreme (lowest and highest) levels will be assumed to represent the relevant (c, d) range of attribute levels (e.g., the attribute utility for less than 6 liter/100 km of gas consumption is of no interest). In a real conjoint analysis study the decision makers involved must specify the relevant range of each attribute (based on what they feel they can develop, market, and so on).

The first three attributes are metric and continuous. Utilities for gas consumption and price are expected to be monotone and decreasing (case 3). (With gas consumption defined as it is, utility should indeed be decreasing rather than increasing, as it would be with mpg.) The appropriate mathematical function is the linear or the quadratic function depending upon how much nonlinearity and noise there is. One can expect the utility for maximum speed to have an ideal point. But then, it is likely to be monotone and increasing for some respondents over the relevant range (case 4). (The maximum speed of 150 km/hour is not very high.) Thus, here again, both the linear and the quadratic functions may be appropriate. The fourth attribute takes on discrete values only. Hence, the part worth function is appropriate. However, the part worth function values may be represented by a nonlinear function (or even by a linear function). In summary, both linear and quadratic functions appear appropriate for gas consumption, price and maximum speed, while the part worth, quadratic and linear functions may be appropriate for the number of seats.

There are no validation data. But then, regression can be used as the estimation procedure since the dependent variable is an 11-point scale. As a result, (a) one can obtain an F-value for each estimated parameter (indicating its significance), and (b) one can estimate the predictive validity of each regression model using a formula derived by Rozeboom (1979) for the fixed (predictor variables) case (Cattin 1980, p. 411). Rozeboom's formula is:

EQUATION    (1)

where N is the number of observations in the estimation sample, p the number of regression parameters (excluding the intercept), and $\rho^2$ can be estimated with the adjusted $R^2$. Rozeboom's formula does indeed estimate the predictive validity of a regression model without any validation sample, since it estimates the squared population cross-validated multiple correlation (i.e., the squared correlation that can be expected between the actual and predicted values of the dependent variable on observations not used in the estimation, where the predicted values are obtained using the regression-estimated parameters). The advantage of formula (1) over sample cross-validation is that it does not require a validation sample and that it produces more precise estimates (Cattin 1980).

Of course, $R^2$ increases with the number of parameters. But then, so does the shrinkage between $R^2$ and $\rho^2$. Hence, a model that includes parameters (e.g., quadratic terms) added to a first model (e.g., linear terms only) may or may not have more predictive validity than the first model. It will have more predictive validity if its $R^2$ is sufficiently higher than the $R^2$ of the first model, i.e., if the additional parameters contribute sufficiently to the $R^2$ or are significant enough.
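The trade-off between fit and number of parameters can be illustrated with the adjusted R-square alone. The sketch below does not implement Rozeboom's formula (1), which is not reproduced here. It assumes the common adjustment 1 - (1 - R2)(N - 1)/(N - p - 1), an assumption since the text does not say which adjusted estimator was used. The parameter counts follow from the model definitions (two parameters per quadratic or three-level part worth function, one per linear function), and the R-square value is hypothetical.

```python
def adjusted_r2(r2, n_obs, n_params):
    """Adjusted R-square, assuming the common (N - 1) / (N - p - 1) correction."""
    dof = n_obs - n_params - 1
    if dof <= 0:
        raise ValueError("not enough observations for this many parameters")
    return 1.0 - (1.0 - r2) * (n_obs - 1) / dof

N_OBS = 18  # eighteen hypothetical cars rated by each respondent

# Parameters excluding the intercept, derived from the model definitions.
PARAMS = {
    "Fully Quadratic": 3 * 2 + 2,  # three quadratic functions + part worth for seats
    "Model 2": 3 * 1 + 2,          # three linear functions + part worth for seats
    "Fully Linear": 4 * 1,         # four linear functions
}

# The same hypothetical fit (R-square = .85) is penalized more as p grows.
shrunk = {name: adjusted_r2(0.85, N_OBS, p) for name, p in PARAMS.items()}
for name, value in shrunk.items():
    print(f"{name}: p = {PARAMS[name]}, adjusted R-square = {value:.3f}")
```

With an identical R-square, the model with the fewest parameters keeps the highest adjusted value, so the extra terms must raise the fit enough to offset the penalty.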

The analyses that were done with the data collected from the eleven respondents can be categorized into three steps. They are now discussed.

Step 1

Three regressions were run for each respondent assuming in turn the three following models: (1) Quadratic utility functions for the first three attributes and part worth function for the number of seats (hereafter called the Fully Quadratic Model); (2) Linear utility functions for the first three attributes and part worth function for the number of seats; (3) Linear utility functions for all attributes (hereafter called the Fully Linear Model). The predictive validity of each model was then computed using (1). The results are shown in Table 1 along with the R-squares. It should be pointed out (see footnote 1 and footnote (a) of Table 1) that the quadratic and part worth functions produce the same $R^2$ and the same predictive validity (as estimated with (1)) when the attributes take on three levels in the estimation sample. This is because both functions estimate two parameters, and because the quadratic function fits perfectly the three points of the part worth function.
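The equivalence at three levels can be checked numerically. The following sketch considers a single attribute for simplicity and uses invented ratings. It takes the part worth estimates to be the mean rating at each level, builds the unique quadratic through those three points, and compares the two functions at the observed levels and at an intermediate level.

```python
# Hypothetical ratings of one respondent at three gas-consumption levels
# (liter/100 km); the numbers are invented for illustration only.
ratings = {6.0: [9, 8, 10], 9.0: [7, 6, 7], 12.0: [3, 2, 4]}

def level_means(data):
    """Part worth estimates for a single attribute: the mean rating at each level."""
    return {x: sum(ys) / len(ys) for x, ys in data.items()}

def quadratic_through(points):
    """Return the unique quadratic through three (x, y) points (Lagrange form)."""
    (x0, y0), (x1, y1), (x2, y2) = points
    def q(x):
        return (y0 * (x - x1) * (x - x2) / ((x0 - x1) * (x0 - x2))
                + y1 * (x - x0) * (x - x2) / ((x1 - x0) * (x1 - x2))
                + y2 * (x - x0) * (x - x1) / ((x2 - x0) * (x2 - x1)))
    return q

def interpolate(points, x):
    """Piecewise-linear interpolation between adjacent part worth estimates."""
    pts = sorted(points)
    for (xa, ya), (xb, yb) in zip(pts, pts[1:]):
        if xa <= x <= xb:
            return ya + (yb - ya) * (x - xa) / (xb - xa)
    raise ValueError("x is outside the relevant range")

means = level_means(ratings)
quad = quadratic_through(sorted(means.items()))

# Same utilities at the three observed levels...
for level, mean in sorted(means.items()):
    print(level, round(mean, 3), round(quad(level), 3))

# ...but different utilities at an intermediate level.
print(round(interpolate(means.items(), 7.5), 3), round(quad(7.5), 3))
```

Because the quadratic reproduces the three level means exactly, both functions leave the same residuals and hence the same R-square. They differ only between levels, where one interpolates linearly and the other follows the curve.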

The results (Table 1) indicate that the Fully Quadratic Model has more predictive validity than model 2 for only three respondents (Respondents No. 3, 5 and 11). Moreover, the Fully Quadratic Model has more predictive validity than the Fully Linear Model for four respondents (Respondents No. 3, 5, 10 and 11), while model 2 has more predictive validity than the Fully Linear Model for only two respondents (Respondents No. 5 and 10). Hence, the Fully Linear Model has more predictive validity than the other models more often than not. Moreover, the Fully Linear Model has a slightly greater average predictive validity (.684) than model 2 (.680). The average predictive validity of the Fully Quadratic Model is lower (.653). A clear implication of these results is that, if quadratic functions are assumed across the board for all attributes and respondents, the overall predictive validity may be affected.

Step 2

The second step involved attempts to add quadratic terms to the Fully Linear Model to increase the predictive validity estimated with (1). This was done on a stepwise basis starting with the quadratic term that had the highest F-value in the Fully Quadratic Model. The procedure was stopped when the predictive validity of the resulting model did not increase any more.
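The stopping rule can be sketched as a small forward-selection loop. This is only one reading of the procedure: it assumes the candidate terms are tried in a fixed order (by F-value in the Fully Quadratic Model) and it takes the predictive-validity estimator as a caller-supplied function, since formula (1) is not reproduced here. The names `add_quadratic_terms` and `validity`, and the scores in the usage example, are hypothetical.

```python
def add_quadratic_terms(ordered_candidates, validity):
    """Add quadratic terms one at a time while estimated predictive validity rises.

    ordered_candidates: quadratic terms sorted by decreasing F-value.
    validity: function mapping a list of added terms to an estimated
              predictive validity (supplied by the caller).
    """
    selected = []
    best = validity(selected)  # validity of the fully linear model
    for term in ordered_candidates:
        score = validity(selected + [term])
        if score <= best:
            break  # no further increase: stop the procedure
        selected.append(term)
        best = score
    return selected, best

# Hypothetical validity estimates for illustration only.
toy_scores = {
    frozenset(): 0.60,
    frozenset({"gas^2"}): 0.66,
    frozenset({"gas^2", "price^2"}): 0.64,
}
terms, score = add_quadratic_terms(
    ["gas^2", "price^2", "speed^2"],
    lambda added: toy_scores[frozenset(added)],
)
print(terms, score)
```

In this toy case the first quadratic term raises the estimate, the second lowers it, and the loop stops without trying the third.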

Table 2 shows the results obtained for the respondents whose predictive validity was increased by the addition of at least one quadratic term. One quadratic term was added to the model of five respondents (Respondents No. 1, 2, 8, 10 and 11) and three quadratic terms to the model of one respondent (Respondent No. 5). The predictive validity of the model of the five remaining respondents was not increased by the addition of the quadratic term with the highest F-value. In other words, the choice rules used by the respondents (whatever they were) are such that they are, more often than not, best represented by linear functions rather than by quadratic or part worth functions (since, as indicated earlier, the predictive validities obtained in this study with quadratic and part worth functions are equal).

It should be noted that the resulting predictive validities estimated with (1) are likely to overestimate the actual predictive validities because the derivation of (1) assumes that the predictor variables in the regression model were selected a priori and not with a stepwise procedure. However, the resulting models are likely to have more predictive validity than the Fully Linear or the Fully Quadratic Models for the six respondents in Table 2. The resulting average $R^2$ and $\rho_c^2$ were .840 and .738 respectively. (The Fully Linear Model had an average $\rho_c^2$ of .684.)

The F-values of the added quadratic terms are also shown in Table 2. Both the F-values in the Fully Quadratic Model and the F-values to enter (i.e., to enter into the previous model in the stepwise procedure) are shown. The two sets of F-values (in the model and to enter) are not quite the same because they are based on different models with different p (number of predictor variables) values, but they are not very different.

It is instructive to note that the highest F-value that did not increase the predictive validity (across all respondents) was 1.600 in the Fully Quadratic Model (and 1.794 to enter). These values are lower than all the F-values in Table 2. However, there is no guarantee that this would hold in other instances. The relationship between F and $\rho_c^2$ is far from linear. Nevertheless, this indicates that, if the quadratic term of a quadratic function is not significant enough, its inclusion in the model lowers the predictive validity. Whenever this is the case, it seems appropriate to drop it.

For all practical purposes, as long as an orthogonal array or a fractional factorial design is used, and as long as there are not too many continuous attributes (e.g., 3, 4 or 5), the stepwise procedure used to select quadratic terms is likely to produce the model with the highest $\rho_c^2$. With more attributes this is less and less likely because the number of alternative models ($2^k$, where k is the number of continuous attributes) increases geometrically.
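The count of alternative models can be made concrete by enumerating every subset of quadratic terms that could be added to the fully linear model. The sketch below is illustrative only; the function name and the term labels are hypothetical.

```python
from itertools import combinations

def candidate_models(quadratic_terms):
    """Every subset of quadratic terms that could be added to the linear model."""
    terms = list(quadratic_terms)
    return [subset
            for size in range(len(terms) + 1)
            for subset in combinations(terms, size)]

models = candidate_models(["gas^2", "price^2", "speed^2"])
print(len(models))  # 2**3 subsets, including the empty one (fully linear)

# The count doubles with each additional continuous attribute.
for k in range(3, 9):
    print(k, len(candidate_models(range(k))))
```

An exhaustive search over all subsets is feasible for a handful of attributes, which is the range where the stepwise shortcut is least likely to miss the best model.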

It is noteworthy that the quadratic function was not found to improve predictive validity very often. It improved it for four respondents out of eleven for gas consumption, for two respondents for both price and the number of seats, and for no respondent for maximum speed. (For the maximum speed attribute the utility was found to increase with maximum speed for 10 respondents. It was found to decrease for one respondent, but not significantly (F=.645 in the Fully Linear Model).) Had there been a "clear cut" ideal point attribute (e.g., sugar from none to a lot), the quadratic function would probably have been found to have more predictive validity for most respondents.

Step 3

So far it has been assumed that there is no constraint on any of the parameters. However, as noted earlier, it can be expected that the utility for both gas consumption and price be monotone and decreasing (or at least not increasing) over the relevant range of attribute levels. Hence, the constraints shown in Figure 1 (Cases 3a and 3b) apply. For both maximum speed and the number of seats, we shall assume that both ideal point and monotone (decreasing or increasing) cases are possible. In this case, there is no constraint on a linear function (Figure 1, Case 4a), but there is a constraint on a quadratic function (Figure 1, Case 4b). The linear and quadratic utility functions obtained in Step 2 on each attribute for each respondent were then inspected to determine whether there is any inconsistency, i.e., whether any constraint was violated. Six inconsistencies were found. They are shown in Table 3. Four quadratic functions were found to have a maximum or a minimum within the relevant range, when they should not. Moreover, two linear functions were found to have a positive slope when they should not, but with low F-values (.284 and .336).
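An inspection of this kind can be sketched for a utility function of the form a*X + b*X**2 over a relevant range (c, d). The exact constraint expressions of Figure 1 are not reproduced here, so the checks below are derived directly from the derivative a + 2*b*X and should be read as an assumption about what such constraints look like. The coefficients in the usage lines are hypothetical.

```python
def has_interior_extremum(a, b, c, d):
    """True if a*X + b*X**2 has its maximum or minimum strictly inside (c, d)."""
    if b == 0:
        return False  # purely linear: no turning point
    vertex = -a / (2.0 * b)
    return c < vertex < d

def is_nonincreasing(a, b, c, d):
    """True if a*X + b*X**2 never increases on [c, d].

    The derivative a + 2*b*X is linear in X, so it is enough to check
    that it is not positive at both ends of the range.
    """
    return a + 2.0 * b * c <= 0 and a + 2.0 * b * d <= 0

# Hypothetical gas-consumption functions over the range 6 to 12 liter/100 km.
consistent = (-1.0, 0.02)    # decreasing throughout the range
inconsistent = (1.0, -0.05)  # turning point at X = 10, inside the range

for a, b in (consistent, inconsistent):
    print(a, b, is_nonincreasing(a, b, 6, 12), has_interior_extremum(a, b, 6, 12))
```

A linear function is the special case b = 0, where the check reduces to the sign of the slope.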

At this point, LINMAP (Srinivasan and Shocker 1973) can be used for estimating the models that have inconsistencies. It handles the above constraints and would eliminate the inconsistencies. Alternatively, a least squares procedure with linear constraints (Theil 1971, pp. 42-46) can be used. This can be achieved with ALSOS (Perreault and Young 1980). Whatever the estimation procedure, the slopes of the two inconsistent linear functions would be constrained to be zero rather than positive, and the quadratic functions to have their maximum or minimum at one of the extreme ends of the relevant range rather than within the range. There is no formula for estimating the predictive validity of the resulting models. However, the elimination of inconsistencies can only improve the predictive validity.
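The effect of the constraint on an inconsistent linear function can be shown in the simplest setting, a single predictor. This sketch is neither LINMAP nor ALSOS; it is a hand-written least squares fit under the assumption of one attribute, where the constrained solution is the ordinary slope if it is already non-positive and a zero slope (with the mean rating as intercept) otherwise. The data are hypothetical.

```python
def fit_nonincreasing_line(xs, ys):
    """Least squares line y = intercept + slope * x with slope constrained <= 0."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    if slope > 0:
        slope = 0.0  # constraint is binding: flat utility over the range
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical price ratings (thousands of Francs) with a slightly positive slope.
flat = fit_nonincreasing_line([15.0, 22.5, 30.0], [5.0, 6.0, 5.5])

# Hypothetical gas-consumption ratings that already decrease.
free = fit_nonincreasing_line([6.0, 9.0, 12.0], [9.0, 6.0, 3.0])

print(flat, free)
```

With several attributes the constrained fit has to be solved jointly, which is what a constrained least squares routine or LINMAP would do.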

SUMMARY AND CONCLUDING COMMENTS

The major findings of the pilot study can be summarized as follows: (a) The Fully Linear models were found to have more predictive validity than the Fully Quadratic models for seven respondents out of eleven; (b) A stepwise procedure was then used to find the model with the highest (or close to the highest) predictive validity for each respondent; (c) A few inconsistencies were found (in five of the eleven resulting models); LINMAP or a constrained least squares procedure can be used to eliminate these inconsistencies and improve the predictive validity. Some additional work (whether it be simulation or analytical) would be useful. It would be valuable, for instance, to know under what conditions the linear function improves the predictive validity compared to the quadratic and part worth functions, when the underlying function is expected to be monotone (cases 2 and 3 in Figure 1), or possibly monotone (case 4). Moreover, how much of a difference would a constrained estimation procedure make? A procedure that could be used in commercial studies and that would "optimize" predictive validity would be useful. It would be somewhat cumbersome to use (in a commercial study) the three-step procedure used in the above pilot study. It was meant to illustrate what happens when using linear vs. quadratic or part worth functions (e.g., changes in predictive validity, inconsistencies in parameter estimates, and so on).

FIGURE 1

MAJOR TYPES OF UTILITY FUNCTIONS

TABLE 1

R-SQUARE AND PREDICTIVE VALIDITY OBTAINED FOR EACH RESPONDENT WITH THREE DIFFERENT MODELS

TABLE 2

PREDICTIVE VALIDITY OBTAINED BY LETTING QUADRATIC TERMS THAT INCREASE PREDICTIVE VALIDITY ENTER THE FULLY LINEAR MODEL (ON A STEPWISE BASIS STARTING WITH THE HIGHEST F VALUE IN THE FULLY QUADRATIC MODEL)

TABLE 3

INCONSISTENCIES FOUND ON THE GAS CONSUMPTION AND PRICE UTILITY FUNCTIONS FOR THE MODELS OBTAINED IN TABLE 2 (THE UTILITY FUNCTION FOR BOTH GAS CONSUMPTION AND PRICE IS EXPECTED TO BE MONOTONE AND DECREASING OVER THE RELEVANT RANGE OF ATTRIBUTE LEVELS)

REFERENCES

Cattin, Philippe (1980), "Estimation of the Predictive Power of a Regression Model," Journal of Applied Psychology, 65 (August), pp. 407-414.

Cattin, Philippe (1981), "On the Use of Verbal Protocols in Conjoint Analysis Studies," Decision Sciences, 12 (October).

Dawes, R. M., and Corrigan, B. (1974), "Linear Models in Decision Making," Psychological Bulletin, 81, pp. 95-106.

Green, Paul E., and Srinivasan, V. (1978), "Conjoint Analysis in Consumer Research: Issues and Outlook," Journal of Consumer Research, 5 (September), pp. 102-123.

Kuehn, Alfred A., and Day, Ralph L. (1962), "Strategy of Product Quality," Harvard Business Review, 40, pp. 100-110.

Olshavsky, Richard W., and Acito, Franklin (1980), "An Information Processing Probe Into Conjoint Analysis," Decision Sciences, 11 (July), pp. 451-470.

Pekelman, Dov, and Sen, Subrata R. (1979a), "Measurement and Estimation of Conjoint Utility Functions," Journal of Consumer Research, 5 (March), pp. 263-271.

Pekelman, Dov, and Sen, Subrata R. (1979b), "Improving Prediction in Conjoint Measurement," Journal of Marketing Research, 16 (May), pp. 211-220.

Perreault, William D. Jr., and Young, Forrest W. (1980), "Alternating Least Squares Optimal Scaling: Analysis of Nonmetric Data in Marketing Research," Journal of Marketing Research, 17 (February), pp. 1-13.

Rozeboom, William W. (1979), "The Cross-Validational Accuracy of Sample Regressions," unpublished manuscript, The University of Alberta, Edmonton, Canada.

Srinivasan, V., and Shocker, Allan D. (1973), "Estimating the Weights for Multiple Attributes in a Composite Criterion Using Pairwise Judgments," Psychometrika, 38 (December), pp. 473-493.

Theil, Henry (1971), Principles of Econometrics, New York: John Wiley.
