# Hierarchical Model Testing in Conjoint Analysis

To cite:

J. Douglas Carroll, Paul E. Green, and Wayne S. DeSarbo (1980), "Hierarchical Model Testing in Conjoint Analysis", in NA - Advances in Consumer Research Volume 07, eds. Jerry C. Olson, Ann Arbor, MI: Association for Consumer Research, Pages: 688-691.

Direct URL:

http://acrwebsite.org/volumes/9767/volumes/v07/NA-07

Increasingly, researchers are becoming interested in the relationship of part-worth functions, obtained from conjoint analysis, to other aspects of the respondents (e.g., their demographics, preferences for current brands, etc.). This paper describes a straightforward procedure for determining commonalities among utility functions, as related to other facets of the subject and experimental task.

To date, virtually all applications of conjoint analysis have involved a sufficient number of preference judgments to enable the researcher to estimate utility functions at the individual-respondent level. By so doing, the utility functions can be used later in various types of simulations involving individual choice behavior.

Nevertheless, research situations can arise in which the researcher is interested in what various respondents' utility functions may have in common. For example, are their utilities sufficiently similar to be represented by a common (group-average) function? If not, what correspondences among respondents may exist between the extremes of complete individuality and complete agreement?

In other cases--particularly commercial applications of conjoint analysis where respondent time and survey cost constraints are prevalent--the researcher may have to settle for fewer preference judgments than are needed for individual utility function estimation. Accordingly, one may wish to fit models that assume some type of partial commonality across respondents.

In both kinds of situations it is also typically the case that individual preference judgments are not highly reliable on a test-retest basis. To the extent that estimates based on various levels of aggregation are compatible with the original data, group-based parameter values should be more stable than individual-based estimates.

The purpose of this research note is to describe a statistical procedure that enables the consumer researcher to test alternative utility models in a hierarchical manner. The approach draws upon model comparison techniques in multiple regression by which a researcher can compare some "full" model with some "restricted" model in which the parameters of the latter model are a proper subset of those of the former model. [For a general discussion of model comparison procedures, see Chapter 2 of Green, with Carroll (1978). A related approach to the analysis described here can be found in Ford, Moskowitz, and Wittink (1977).] The test is designed to find out if the additional parameters in the full model account for a significant amount of additional variance in the criterion variable to warrant their inclusion.

The basic formula for carrying out these tests utilizes the F statistic:

$$
F = \frac{(R^{2}_{f} - R^{2}_{r}) / (d_{r} - d_{f})}{(1 - R^{2}_{f}) / d_{f}} \tag{1}
$$

where R^{2}_{f} and R^{2}_{r} denote the coefficients of multiple determination for the full and restricted models, and d_{f} and d_{r} denote their respective (residual) degrees of freedom. Under the usual error term assumptions, this statistic follows the F distribution, with d_{r} - d_{f} degrees of freedom for the numerator and d_{f} degrees of freedom for the denominator.

Model comparison tests are not new. They have been used in such diverse areas as econometric analysis and the analysis of multidimensional contingency tables. However, their use in consumer research is still in its early stages.

THE DATA

Data for application of hierarchical model testing were obtained from another study (Carroll, Green and DeSarbo 1979). A sample of 46 second-year MBA students (33 males and 13 females) were asked to rate 32 profile descriptions of leisure time allocation on a 0-10 point desirability scale (see Table 1). The profiles were made up according to an orthogonal main effects plan entailing four levels each of the five activities in Table 1:

Level 1: 1 or 2 hours

Level 2: 3 or 4 hours

Level 3: 5 or 6 hours

Level 4: 7 or 8 hours

The particular number of hours chosen within level was determined randomly, subject to each of the two possible numbers of hours appearing an equal number of times, within level, over the whole set of 32 profiles.
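The balanced random choice of hours within a level can be sketched as follows. This is only an illustration: the function name `assign_hours`, the placeholder column of levels, and the reading that the balance holds within each activity's column are assumptions, not details given in the text.

```python
import random

def assign_hours(levels, rng):
    """Turn one activity's column of design levels (1-4) into hours.

    Level L covers the two hour values 2L-1 and 2L. Within each level the two
    values are used equally often, in random order (assumed balancing rule).
    """
    hours = [0] * len(levels)
    for lvl in sorted(set(levels)):
        rows = [i for i, l in enumerate(levels) if l == lvl]
        if len(rows) % 2:
            raise ValueError("each level must occur an even number of times")
        low, high = 2 * lvl - 1, 2 * lvl
        pool = [low, high] * (len(rows) // 2)
        rng.shuffle(pool)
        for i, h in zip(rows, pool):
            hours[i] = h
    return hours

# Placeholder column in which each level occurs 8 times over 32 profiles;
# this is NOT the orthogonal plan actually used in the study.
levels = [1, 2, 3, 4] * 8
hours = assign_hours(levels, random.Random(0))
print(hours[:8])
```

Shuffling a pre-balanced pool, rather than drawing each value independently, is what guarantees the equal counts.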

Two classes of multiple regression models were fitted. Based on theoretical considerations, it was hypothesized that a linear-in-logs model (in which the desirability ratings are regressed on log-hours of each activity) might be an appropriate representation. In this case six parameters can be fitted--a partial regression coefficient for each of the five activities and an intercept term.

However, a more general model consists of a dummy-variable regression in which each four-level activity is coded by three dummies. In this case 15 partial regression coefficients, in addition to an intercept term, are fitted.
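The two codings can be compared with a short sketch that builds one row of each design matrix. The profile shown is hypothetical, the natural logarithm is an assumption (the text does not state the base), and the choice of level 1 as the omitted reference level in the dummy coding is arbitrary.

```python
import math

ACTIVITIES = ["TV", "reading", "sports", "hobbies", "socializing"]

def level_of(hours):
    """Map 1..8 hours to design level 1..4 (1-2 h -> 1, ..., 7-8 h -> 4)."""
    return (hours + 1) // 2

def log_hours_row(profile):
    """Intercept plus one log-hours predictor per activity (6 columns)."""
    return [1.0] + [math.log(profile[a]) for a in ACTIVITIES]

def dummy_row(profile):
    """Intercept plus three 0-1 dummies per activity (16 columns).

    Level 1 is treated as the reference level (an assumption).
    """
    row = [1.0]
    for a in ACTIVITIES:
        lvl = level_of(profile[a])
        row.extend(1.0 if lvl == k else 0.0 for k in (2, 3, 4))
    return row

# Hypothetical profile: hours allocated to each activity.
profile = {"TV": 2, "reading": 5, "sports": 8, "hobbies": 3, "socializing": 6}
print(len(log_hours_row(profile)), len(dummy_row(profile)))
```

The column counts (6 versus 16) match the number of parameters quoted for the two models.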

For purposes of illustration we first apply the hierarchy of models approach to the simpler, linear-in-logs model. This is followed by a less comprehensive examination of the dummy-variable regression model and a general discussion of the procedure.

ANALYSIS OF LINEAR-IN-LOGS MODEL

Since each respondent supplies 32 observations (drawn from an orthogonal main effects design), sufficient data are available to estimate all parameters at the individual-respondent level. However, to motivate the approach, a sequence of five models was first fitted and tested:

1. Model 1--a linear-in-logs model fitted to data pooled over all 46 respondents.

2. Model 2--a model that included all of the predictors of model 1 plus a single dummy intercept term denoting the respondent's sex.

3. Model 3--a model-1 extension that included a separate dummy-variable intercept term for each respondent.

4. Model 4--a model-3 extension that included a slope term for each respondent.

5. Model 5--a model-4 extension that fitted a separate linear-in-logs model for each respondent.

While models 1, 2, and 5 are well known, a few comments are in order regarding models 3 and 4. Model 3 assumes that each subject has the same utility function as the group but allows the individual to have an idiosyncratic origin or reference point.

Model 4 assumes that each subject has the same utility function as the group but permits an idiosyncratic origin and an idiosyncratic scale unit by which the utilities are stretched or compressed to best fit the subject's data. Model 4 is not, strictly speaking, a linear model; it is a bilinear model. However, a very good approximate fit can be obtained by a sequence of linear (least squares) model fitting steps, yielding F statistics that are approximately distributed as F (with the appropriate degrees of freedom).

Model 2 Versus Model 1

As shown in Table 2, the R^{2} values for models 1 and 2 were 0.055 and 0.057, respectively. Substituting in equation (1), we have:

$$
F = \frac{(0.057 - 0.055) / 1}{(1 - 0.057) / 1466} \approx 3.11 \tag{2}
$$

which, with 1 degree of freedom for the numerator and 1466 degrees of freedom for the denominator, is not significant at the 0.05 level. [In this example, the degrees of freedom are d_{r} - d_{f} = P_{2} = 1 and d_{f} = n - P_{1} - P_{2} = 1472 - 5 - 1 = 1466, where n denotes the number of cases (46 x 32 = 1472), P_{1} denotes the number of predictors for model 1, and P_{2} denotes the additional dummy variable for sex.]

TABLE 2: A SUMMARY OF MODEL COMPARISONS
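The arithmetic of this comparison can be checked with a minimal sketch. It assumes that equation (1) has the usual nested-model form, ((R^{2}_{f} - R^{2}_{r}) / (d_{r} - d_{f})) / ((1 - R^{2}_{f}) / d_{f}), and that P_{1} = 5 (the log-hours predictors), which reproduces the 1466 denominator degrees of freedom quoted above. The function name and the rounded critical value are illustrative.

```python
def nested_f(r2_full, r2_restricted, df_full, df_restricted):
    """F statistic for a restricted model nested in a full model.

    df_full and df_restricted are residual degrees of freedom, so
    df_restricted > df_full.
    """
    numerator = (r2_full - r2_restricted) / (df_restricted - df_full)
    denominator = (1.0 - r2_full) / df_full
    return numerator / denominator

n = 46 * 32               # 1472 pooled observations
p1, p2 = 5, 1             # log-hours predictors; added sex dummy
df_restricted = n - p1    # model 1
df_full = n - p1 - p2     # model 2: 1466, as stated in the text

f_stat = nested_f(0.057, 0.055, df_full, df_restricted)

# Approximate 0.05 critical value of F with 1 and a very large number of
# degrees of freedom (rounded; look up or compute the exact value in practice).
F_CRIT_05 = 3.85
print(round(f_stat, 2), f_stat > F_CRIT_05)
```

Because the inputs are the rounded R^{2} values from the text, the result is only as precise as those two figures.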

By way of substantive interest, the parameter values of model 1 are:

$$
\begin{aligned}
Y = 3.098 &- 0.021\ \text{log-hours (TV)} \\
&+ 0.189\ \text{log-hours (reading)} \\
&+ 0.204\ \text{log-hours (sports)} \\
&- 0.176\ \text{log-hours (hobbies)} \\
&+ 0.777\ \text{log-hours (socializing)}.
\end{aligned} \tag{3}
$$

A test of each partial regression coefficient indicated that all coefficients except that for TV were significant beyond the 0.05 level. Of the significant coefficients, we note that all are positive except that (-0.176) associated with hobbies. [Subsequent analysis indicated that the utility function for hobbies was of the ideal-point variety (Carroll 1972), in which preference ratings first increased slightly and then decreased rather sharply.]

Other Model Comparisons

In a similar fashion, other model comparisons were made, with results also appearing in Table 2. Model 3 is straightforward to fit, since it requires only 45 respondent dummy variables in addition to the intercept term and the 5 log-hours predictors. As such, one obtains a single value of R^{2} for the complete regression.

Such is not the case for model 4 which allows both idiosyncratic origin and unit. Model 4 is fitted by first computing the regression function for the total group; see equation (3). Following this, the same 32 fitted criterion values Y_{i} (i = 1,2,...,32), as computed for the group, serve as an independent variable in each subject's two-variable regression. Each subject's original Y_{i}'s serve as a criterion variable. Each of the 46 separate regressions yields an R^{2} value. However, what is needed is a single R^{2} that reflects variance accounted-for around the __grand mean__ across the total sample (not around each subject's mean). This summary value, denoted by R^{2}, is computed as follows:

where, using conventional dot notation to indicate total-group versus individual criterion-value means, we have for the k-th individual:

After R^{2} is obtained, this value appears as the appropriate entry for the full model in equation (1).
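The two-step fit of model 4 and the pooling of the individual fits can be sketched as below. Equations (4) through (6) are not legible in this copy, so the sketch uses one construction that is consistent with the verbal description (pooled residual sum of squares divided by the total sum of squares around the grand mean); it should not be read as the authors' exact formula. The function names and the toy data are hypothetical, and the group-level fitted values are assumed to have been computed beforehand from equation (3).

```python
def simple_regression_sse(x, y):
    """Least-squares fit of y on x with an intercept; returns the residual SS."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    syy = sum((b - my) ** 2 for b in y)
    return syy - sxy ** 2 / sxx

def pooled_r2(group_fit, ratings_by_subject):
    """R^2 around the grand mean when every subject gets an own origin and
    scale unit applied to the shared group-level fitted values."""
    all_y = [y for ys in ratings_by_subject for y in ys]
    grand_mean = sum(all_y) / len(all_y)
    total_ss = sum((y - grand_mean) ** 2 for y in all_y)
    resid_ss = sum(simple_regression_sse(group_fit, ys)
                   for ys in ratings_by_subject)
    return 1.0 - resid_ss / total_ss

# Toy data: 4 profiles, 2 subjects (the study has 32 profiles, 46 subjects).
group_fit = [1.0, 2.0, 3.0, 4.0]
ratings = [[2.0, 4.0, 6.0, 8.0], [3.0, 2.0, 5.0, 4.0]]
print(round(pooled_r2(group_fit, ratings), 3))
```

Measuring the total variation around the grand mean means that differences between subjects' mean ratings count as variance the model explains through the individual intercepts.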

A similar procedure was used to obtain a single R^{2} for model 5. However, in this case each individual R^{2}_{k} is found by regressing the k-th subject's Y_{i}'s on his/her own predictor set, followed by application of equation (4) to the individual R^{2}_{k}'s.

ANALYSIS OF THE DUMMY-VARIABLES MODEL

A less extensive hierarchical comparison was made of models based on a dummy-variable formulation of the problem. This comparison entailed three models--models 6 through 8. As noted earlier, each of the five 4-level activities can be coded into three 0-1 dummy variables, leading to a regression equation that fits 15 partial regression coefficients and an intercept term.

The R^{2} of model 6 is computed from data pooled over all 46 respondents. The R^{2} of model 8 is based on individual R^{2}_{k}'s, found from 46 individual fits, followed by application of equation (4).

Model 7, however, is based on a different procedure. In this model we assume that each subject follows the utility of the group but with the additional freedom to exhibit different importance weights, associated with the total-group utilities. This model can be expressed as:

$$
Y_{i}^{(k)} = b_{0}^{(k)} + \sum_{j=1}^{J} b_{j}^{(k)} U_{ij} \tag{7}
$$

where Y_{i}^{(k)} is the k-th subject's fitted rating of the i-th profile, b_{0}^{(k)} is the k-th subject's intercept, the b_{j}^{(k)}'s are his/her importance weights, J is the number of activities, and U_{ij} is the group-average utility for the i-th allocation (i = 1,2,...,32) of the j-th activity. The U_{ij}'s, in turn, are computed from the preliminary group-level regression of model 6 and represent the appropriate partial regression coefficient associated with the l-th level (l = 1,2,3,4) of the j-th activity.

Following computation of the 46 individual R^{2}'s, as based on equation (7), R^{2} was computed by means of equation (4). Since model 7 fits only six parameters for each subject (rather than the 16 parameters fitted in model 8), it is more restrictive. As Table 2 shows, both of the tests are significant. It should be noted that in cases where the researcher is limited in terms of the number of stimuli that can be presented to the subject, model 7 needs only J+1 individual observations while model 8 requires

$$
1 + \sum_{j=1}^{J} (m_{j} - 1)
$$

observations, where m_{j} denotes the number of levels of the j-th attribute (or activity, in this case).
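A short sketch of how the predictors of model 7 could be assembled, and of the observation counts just discussed. The part-worth values are made up for illustration (they are not the estimates of model 6), the function names are hypothetical, and the count for model 8 is taken as one intercept plus m_{j} - 1 dummy variables per attribute, which agrees with the 16 parameters mentioned above.

```python
def utility_predictors(profile_levels, part_worths):
    """Replace each activity's level (1-based) by its group part-worth U_ij."""
    return [part_worths[j][lvl - 1] for j, lvl in enumerate(profile_levels)]

def min_observations(levels_per_attribute):
    """Parameter counts per subject: (salience model 7, dummy model 8)."""
    j = len(levels_per_attribute)
    salience = j + 1
    full_dummy = 1 + sum(m - 1 for m in levels_per_attribute)
    return salience, full_dummy

# Hypothetical group part-worths, one row per activity, one entry per level;
# level 1 is the reference level with part-worth 0 (an assumption).
part_worths = [
    [0.0, 0.1, -0.1, -0.3],
    [0.0, 0.2, 0.3, 0.4],
    [0.0, 0.3, 0.4, 0.5],
    [0.0, 0.2, -0.2, -0.6],
    [0.0, 0.6, 1.0, 1.3],
]

row = utility_predictors([1, 3, 4, 2, 3], part_worths)
print(row, min_observations([4] * 5))
```

Each subject's ratings would then be regressed on rows like `row` (plus an intercept) to estimate the importance weights.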

DISCUSSION

As may be surmised, other classes of models can be fitted that may be appropriate in various applications. For example, cases can arise in which subjects are classified into __a priori__ groups, as illustrated earlier in the case of male versus female respondents.

In the model 2 versus model 1 comparison we examined only the difference in male versus female scale origins (via the fitting of a single intercept term). Clearly, other models could be considered, such as:

1. Differential slopes as well as intercepts.

2. Differential saliences, as in model 7.

Moreover, models could be developed to encompass several sets of background variables--sex, age, marital status--simultaneously, if desired.

Still another class of hierarchical models that can be fitted and compared are those based on a preliminary cluster analysis of the response data. For example, assuming that each respondent receives a set of common (or "core") stimuli, subjects can be initially clustered on the basis of some convenient program like Johnson's hierarchical method (Johnson 1967). The input data may consist of Euclidean distances computed between each subject pair's response vectors, as associated with the core stimuli.

Following this, group utilities are computed for each cluster and the salience model applied for each subject in the cluster. The resulting __average__ R^{2} is then compared to the average R^{2}'s associated with the subject being assigned to each of the other clusters, in turn. The subject is then assigned to that cluster for which the overall average R^{2} is highest.

After all subjects are so classified, group utilities are then computed for the new clusters and the assignment process is repeated until either a researcher-supplied maximum number of iterations is met or until no subject is reassigned from iteration t to iteration t+1.
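The reassignment loop can be outlined as follows. This is a skeleton under simplifying assumptions, not the authors' program: group utilities are held fixed within an iteration, so choosing the cluster with the highest overall average R^{2} reduces to choosing the cluster whose group model fits the subject best. The element-wise mean vector and the direct-prediction R^{2} are stand-ins for the group utility estimation and the salience model, and all names and data are hypothetical.

```python
def group_mean(vectors):
    """Stand-in group model: the element-wise mean rating vector."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def r2_against_group(ratings, group_profile):
    """Stand-in fit measure: 1 - SSE/SST when the group profile predicts the
    subject's ratings directly (can be negative for a poor fit)."""
    mean = sum(ratings) / len(ratings)
    sst = sum((y - mean) ** 2 for y in ratings)
    sse = sum((y - g) ** 2 for y, g in zip(ratings, group_profile))
    return 1.0 - sse / sst if sst > 0 else 0.0

def reassign_until_stable(responses, labels, fit_group, fit_quality, max_iter=20):
    """Refit group models and reassign subjects until no label changes
    or max_iter iterations have been run."""
    labels = list(labels)
    for _ in range(max_iter):
        clusters = sorted(set(labels))
        models = {c: fit_group([r for r, l in zip(responses, labels) if l == c])
                  for c in clusters}
        new_labels = [max(clusters, key=lambda c: fit_quality(r, models[c]))
                      for r in responses]
        if new_labels == labels:
            break
        labels = new_labels
    return labels

# Toy responses: two subjects with rising ratings, two with falling ratings.
responses = [[1, 2, 3, 4], [2, 3, 4, 6], [4, 3, 2, 1], [6, 4, 3, 1]]
initial = [0, 0, 0, 1]   # e.g. labels from a preliminary cluster analysis
final = reassign_until_stable(responses, initial, group_mean, r2_against_group)
print(final)
```

Passing the fitting and scoring steps in as functions keeps the loop itself independent of which utility model is used inside each cluster.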

Having found the average R^{2} associated with the C clusters, one could apply equation (1) to test whether this model should be accepted over a model based on data pooled across all subjects (albeit with significance levels to be taken with a large grain of salt, given the way in which the subgroups were formed in the first place).

Still other model comparisons could be made and only a few of the possibilities have been illustrated here. Suffice it to say that hierarchical model testing provides a very flexible approach to the study of individual or intergroup differences in utility functions. Considering the problems encountered in obtaining reliable data at the individual-respondent level and the pressing need to keep the number of stimuli that each subject receives to a manageable number, the hierarchical testing approach should see increasing application in the future. It provides a uniform approach to selecting the most parsimonious model that is consistent with the systematic variation in the data and the researcher's sequence of candidate models for testing.

REFERENCES

Carroll, J. Douglas (1972), "Individual Differences and Multidimensional Scaling," in R. N. Shepard, A. K. Romney, and S. B. Nerlove (eds.), __Multidimensional Scaling: Theory and Applications in the Behavioral Sciences__. Vol. I. New York: Seminar Press, 105-155.

Carroll, J. Douglas, Paul E. Green, and Wayne S. DeSarbo (1979), "Optimizing the Allocation of a Fixed Resource: A Simple Model and Its Experimental Test," __Journal of Marketing__, 43, 51-57.

Ford, David L., Herbert Moskowitz, and Dick R. Wittink (1977), "Econometric Modeling of Individual and Social Multiattribute Utility Functions," __Multivariate Behavioral Research__, 13, 77-98.

Green, Paul E., with contributions by J. Douglas Carroll (1978), __Analyzing Multivariate Data__. Hinsdale, Ill.: Dryden Press.

Johnson, Steven C. (1967), "Hierarchical Clustering Schemes," __Psychometrika__, 32, 241-254.
