A Strategy For a Priori Segmentation in Conjoint Analysis

James B. Wiley, University of Alberta
ABSTRACT - The general multivariate regression (GMR) model is used as an integrating framework for conjoint analysis. An advantage of the GMR approach is that it offers extensive capabilities for formulating and testing hypotheses. Particularly appealing is the way hypotheses pertaining to interactions between group membership and attribute profiles may be formulated. Group by attribute interactions can provide the basis for segmentation strategies. Illustrations of the formulation of a variety of hypotheses are provided in the present paper using a simple prototypical conjoint task.
[ to cite ]:
James B. Wiley (1993) ,"A Strategy For a Priori Segmentation in Conjoint Analysis", in NA - Advances in Consumer Research Volume 20, eds. Leigh McAlister and Michael L. Rothschild, Provo, UT : Association for Consumer Research, Pages: 142-148.

Advances in Consumer Research Volume 20, 1993      Pages 142-148



Conjoint analysis, introduced to marketing by Green and Rao (1971), enables marketers to determine the relative impact of product/service attribute levels on preference and other dependent variables. The term "conjoint analysis," however, does not imply a specific technique for data collection, manipulation, or estimation. Rather, a variety of approaches qualify as conjoint analysis; they differ in how data are collected, the amount of aggregation prior to estimation, the approach to estimation, statistical assumptions, and the like.

Typically, however, estimates are based on individuals' responses to judiciously constructed attribute profiles which are characterized in terms of a common set of attributes. The profiles differ in terms of the levels ("yes/no", "$1.98/2.58", "often/sometimes/never") that the attributes assume. The respondent is shown a set of profiles and is asked to evaluate each. The relative impact of each attribute level can then be determined using linear (i.e., OLS) or monotonic (Kruskal, 1965) regression, LINMAP (Srinivasan and Shocker, 1973), or other estimation procedures. Ordinary least squares (OLS) regression is probably the most widely used estimation tool at present.

In this paper it is shown how generalized multivariate regression (GMR) may be used for estimation in conjoint analysis (CA) applications. As a conceptual and estimation tool, GMR offers at least three advantages over the OLS regression traditionally used:

- Different forms of a priori segmentation can be introduced in a natural and consistent fashion.

- The fact that individuals make repeated responses, which probably are correlated, is recognized in estimation.

- A broad variety of hypotheses, both within group and across groups, can be formulated and tested within the framework.

Hagerty (1985) and Kamakura (1988) used a structurally equivalent model to implement their respective approaches to aggregated conjoint measurement. The present approach differs from the Hagerty and Kamakura approaches in two ways.

- First, GLS (generalized least squares) estimation is used and, hence, a variety of covariance structures can be accommodated. As a result, hypothesis tests are more efficient than the OLS counterparts in the sense that more of the available information is included in the test statistics. Kamakura (1988) used OLS estimation procedures.

- Second, the emphasis of the present approach is on applications where aggregation segments are defined a priori. The emphasis in the Hagerty and Kamakura papers is on post hoc aggregation. Since the segments are defined a priori, group membership does not depend on the dependent variable, and hypotheses regarding differences between groups on the dependent variable may be tested using traditional testing procedures. Post hoc procedures are widely used in marketing segmentation studies. Generally these procedures seek to maximize some measure of difference between the post hoc groups on the dependent variable. It is well known that under these conditions the assumptions of traditional procedures for testing the significance of differences between groups are violated. One can form groups using the procedures of Kamakura (1988), or test hypotheses using the procedures described below, but one should not do both.


There are three ways CA commonly is formulated as an OLS regression problem: as individual, aggregate, and grouped analyses.

- Provided each respondent evaluates a sufficient number of concepts, separate sets of partworths may be estimated for individuals. The number of observations, $n_o$, corresponds to the number of concept evaluations provided by each respondent, and the number of parameters estimated by OLS is equal to the number of partworths, $n_p$.

- At the other extreme, data from all respondents may be pooled by "stacking" individuals' vectors of observations. The resulting vector of observations will have $n_o n_s$ elements, where $n_s$ is the number of respondents. The number of parameters estimated by OLS remains equal to the number of partworths, $n_p$. However, the partworths now are the average of the partworths estimated using individual analysis.

- A middle ground that retains idiosyncratic differences, at least at the segment or group level, is to cluster respondents according to some criterion or criteria, and then perform a grouped regression for each cluster. Green and Srinivasan (1978) suggest that respondents be clustered according to their partworth utilities. Alternative approaches would be to cluster on the observation vector Y, or on the basis of covariates, such as demographic, socioeconomic, or lifestyle data.

The three approaches, their advantages, and disadvantages are summarized in Figure 1.
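
The three formulations above can be sketched in a few lines of Python with NumPy. This is only an illustration under stated assumptions: the design matrix, the sample sizes, the simulated ratings, and the two-group split are all hypothetical and are not taken from Figure 1. The sketch also checks the claim that, with a common design matrix, the pooled partworths equal the average of the individual partworths.

```python
import numpy as np

rng = np.random.default_rng(0)
n_s, n_o, n_p = 6, 6, 4   # respondents, concepts, partworths (assumed sizes)

# Hypothetical full-rank design matrix shared by all respondents.
X = np.array([
    [1, -1,  1,  1],
    [1,  0, -2,  1],
    [1,  1,  1,  1],
    [1, -1,  1, -1],
    [1,  0, -2, -1],
    [1,  1,  1, -1],
], dtype=float)

# Simulated ratings: one row per respondent, one column per concept.
Y = rng.normal(size=(n_s, n_o))

def ols(design, y):
    """Least-squares coefficients for y = design @ b + e."""
    return np.linalg.lstsq(design, y, rcond=None)[0]

def stacked_ols(Y_rows):
    """Stack the respondents' observation vectors and repeat X to match."""
    return ols(np.tile(X, (Y_rows.shape[0], 1)), Y_rows.reshape(-1))

# Individual analysis: one regression per respondent.
b_individual = np.array([ols(X, Y[i]) for i in range(n_s)])

# Aggregate analysis: all respondents pooled.
b_pooled = stacked_ols(Y)

# Grouped analysis: one pooled regression per (assumed) cluster.
groups = np.array([0, 0, 0, 1, 1, 1])
b_grouped = np.array([stacked_ols(Y[groups == g]) for g in (0, 1)])

# With a common X, pooled partworths are the mean of individual partworths.
print(np.allclose(b_pooled, b_individual.mean(axis=0)))
```

The same averaging property holds within each cluster, which is why the grouped analysis can be read as an aggregate analysis applied segment by segment.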


Aside from the problems mentioned above, each of the traditional approaches to CA suffers from a common set of conceptual shortcomings. First, the data generated by each respondent probably are best thought of as a set of repeated trials in which the individual makes a series of responses to a set of profiles which have the factorial structure that typifies conjoint methodology. That is, the data of a conjoint study can more appropriately be thought of as an ($n_s \times n_o$) matrix of responses (i.e., $n_s$ respondents each make $n_o$ responses) than as an ($n_s n_o \times 1$) vector as in Equation T2 of Figure 1. Second, in the typical study there is only one set of profiles represented by the design matrix X, rather than $n_s$ (or $n_g$) sets of profiles which happen to be the same, which might be inferred from the multiple appearances of X in Equations T1, T2, and T3 of Figure 1. The realities of the aggregated data are captured by the generalized multivariate regression (GMR) formulation of Potthoff and Roy (1964) and Khatri (1966):

(1)  $Y'_{(n_o \times n_s)} = X_{(n_o \times n_p)} \, b_{(n_p \times n_g)} \, A_{(n_g \times n_s)} + e_{(n_o \times n_s)}$,

where Y' is the transpose of the matrix of observations, X is the common design matrix, b is a matrix of parameters, A is a known matrix associated with group membership, and e is a matrix of errors whose columns are independently distributed as an $n_o$-variate distribution with common ($n_o \times n_o$) covariance matrix S and mean vector 0. The subscripts in Equation 1 give the order of each matrix. The matrices X and A are assumed to be of full rank. The interpretation of the respective matrices is as follows:



3.1 The Matrix of Observations, Y

It is assumed that each of the $n_s$ respondents generates an ($n_o \times 1$) vector of responses to $n_o$ concepts. Typically, the responses take the form of ratings or rank orders. Thus, the ith respondent's data consist of the ($n_o \times 1$) vector $y_i$. The matrix Y' has the structure:

$Y' = [\, y_1, y_2, \ldots, y_{n_s} \,]$.

Each column of Y' contains the data of a single respondent. Each row of Y' contains the responses of the $n_s$ individuals to one of the concepts. Each individual is assumed to belong to one or more of $n_g$ groups of a priori interest.
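
To make the dimensions of Equation 1 concrete, the following minimal Python/NumPy sketch simulates a response matrix Y' from the GMR model. Everything numeric here is an assumption made for illustration: the sample sizes, the design matrix, the partworth values, the equicorrelated error covariance, and the choice of a multivariate normal error distribution (the text only requires a common covariance matrix and zero mean).

```python
import numpy as np

rng = np.random.default_rng(1)
n_o, n_p, n_g, n_s = 6, 4, 2, 10   # concepts, partworths, groups, respondents

# Hypothetical common design matrix X (n_o x n_p).
X = np.array([
    [1, -1,  1,  1],
    [1,  0, -2,  1],
    [1,  1,  1,  1],
    [1, -1,  1, -1],
    [1,  0, -2, -1],
    [1,  1,  1, -1],
], dtype=float)

# Hypothetical partworths b (n_p x n_g): one column per group.
b = np.array([
    [ 5.0,  4.0],
    [-1.0, -0.5],
    [ 0.2,  0.0],
    [ 0.5, -0.5],
])

# Known group membership A (n_g x n_s): first half in group one.
groups = np.repeat([0, 1], n_s // 2)
A = np.vstack([groups == 0, groups == 1]).astype(float)

# Errors: each column is an independent n_o-variate draw with common
# covariance S (assumed equicorrelated) and mean vector 0.
S = 0.5 * np.eye(n_o) + 0.5 * np.ones((n_o, n_o))
e = rng.multivariate_normal(np.zeros(n_o), S, size=n_s).T   # (n_o x n_s)

Y_t = X @ b @ A + e   # Y' of Equation 1: one column per respondent
print(Y_t.shape)      # (6, 10)
```

Each column of `Y_t` holds one respondent's ratings, and each row holds all respondents' ratings of one concept, matching the layout described above.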

3.2 The Covariance Matrix, S

The covariance matrix S can be given the partitioned structure:


where p = $n_o$, the number of concepts in the CA application. No special assumptions regarding the elements {$s_{ij}$} of the covariance matrix S need to be made. For example, the variance element $s_{11}^2$ captures variation in responses to concept one due to uncertainty, position, unique combinations of attributes, or whatever; and the covariance element $s_{12} = s_{11} s_{22} r_{12}$ captures the effect of heteroscedasticity and correlation between concepts one and two that may result from the fact that one follows the other in the questionnaire.

3.3 The Parameter Matrix, b

The ($n_p \times n_g$) parameter matrix b contains the $n_p$ partworth estimates for $n_g$ groups. As described in the next section, with "dummy" coding in the A matrix, the first column contains the estimates for group one, the second column the estimates for group two, and so forth. Interpretations of the elements of b will differ, however, with alternative codings of A. Following Morrison (1976) and Grizzle and Allen (1969), the estimator of b in Equation 1 is:

(2)  $b = (X' D^{-1} X)^{-1} X' D^{-1} Y' A' (A A')^{-1}$, where

(3)  $D = Y' Y - Y' A' (A A')^{-1} A Y$.

The matrix D is an estimate of the matrix S, the covariance matrix of e in Equation 1. [It should be noted, however, that a pooled covariance is assumed to apply for all groups and hence S can at best be an approximation of the true covariance matrices of the respective groups. If the groups respond to the concepts in different orders or if they have markedly different preferences for the concepts, then the pooled covariance matrix may not be a good approximation of the true covariance matrices of the groups. Generally speaking, however, studies are conducted using printed questionnaires and concepts are presented in the same order, so the repeated measures aspects of the studies are the same across groups. Also, the attributes used to describe the concepts generally are costs or benefits for which there are monotonic relationships between levels and partworths across groups. In such cases, one would expect that a pooled covariance matrix would be a reasonable approximation across groups.] It should be noted that the amount of heteroscedasticity and multicollinearity in conjoint data is an unresolved empirical issue. While GLS estimates may be obtained using (3) in (2), corresponding OLS estimates may be obtained by replacing (3) with D = I, where I is the identity matrix of appropriate order. Procedures for formulating and testing hypotheses are not affected. The interpretation of the elements of b depends on the coding scheme selected for the grouping matrix A.
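
Equations 2 and 3 translate directly into matrix code. The sketch below is one possible Python/NumPy rendering, not the author's implementation; the function name `gmr_estimate`, the simulated data, and the "true" partworths are hypothetical. Note that D as written in Equation 3 is a sums-of-squares-and-products matrix; rescaling it by a constant leaves b unchanged. D is invertible only when the number of respondents exceeds $n_o + n_g - 1$.

```python
import numpy as np

def gmr_estimate(Y_t, X, A, use_gls=True):
    """Estimate b in Y' = X b A + e.

    Y_t : (n_o, n_s) responses, one column per respondent (Y' in the text)
    X   : (n_o, n_p) common design matrix
    A   : (n_g, n_s) known grouping matrix
    """
    AAt_inv = np.linalg.inv(A @ A.T)
    if use_gls:
        # Equation 3: D = Y'Y - Y'A'(AA')^{-1}AY
        D = Y_t @ Y_t.T - Y_t @ A.T @ AAt_inv @ A @ Y_t.T
    else:
        D = np.eye(Y_t.shape[0])            # OLS: replace D with I
    D_inv_X = np.linalg.solve(D, X)         # D^{-1} X (D is symmetric)
    lhs = X.T @ D_inv_X                     # X' D^{-1} X
    rhs = D_inv_X.T @ Y_t @ A.T @ AAt_inv   # X' D^{-1} Y' A' (AA')^{-1}
    return np.linalg.solve(lhs, rhs), D     # Equation 2

# Hypothetical example: 6 concepts, 4 partworths, 2 groups of 20.
rng = np.random.default_rng(2)
X = np.array([
    [1, -1,  1,  1], [1, 0, -2,  1], [1, 1, 1,  1],
    [1, -1,  1, -1], [1, 0, -2, -1], [1, 1, 1, -1],
], dtype=float)
groups = np.repeat([0, 1], 20)
A = np.vstack([groups == 0, groups == 1]).astype(float)
b_true = np.array([[5.0, 4.0], [-1.0, -0.5], [0.2, 0.0], [0.5, -0.5]])
Y_t = X @ b_true @ A + rng.normal(size=(6, 40))

b_gls, D = gmr_estimate(Y_t, X, A)
b_ols, _ = gmr_estimate(Y_t, X, A, use_gls=False)
print(b_gls.round(2))
print(b_ols.round(2))
```

With dummy coding, the OLS variant reduces to regressing each group's mean response vector on X, which gives a convenient cross-check on the code.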

3.4 The Grouping Matrix, A

Which group, or groups, the individual belongs to is indicated by the matrix A. Under various codings for A, individuals may be represented as a) belonging to a unique group, b) having a probabilistic, or fuzzy, association with groups, or c) belonging to several groups corresponding to a sample structure, such as jointly being a member of gender, lifestyle, income, or usage groups. [The practical difference between the present paper and Kamakura (1988) is that in the present paper the elements of A are taken to be known, hence they are not conditional on Y, while Kamakura uses iterative procedures to estimate the elements of A, hence they are conditional on Y.] In order to simplify the exposition of the overall approach, in the present paper it is assumed that individuals belong to one of two unique groups.

"Dummy" and "effect" coding are the most common strategies for coding the grouping matrices (Kerlinger and Pedhauser, 1973). When used to code the A matrix, the two approaches result in quite different interpretations of the parameter estimates. With "dummy" coding, respondents would be coded as A1 in Figure 2. The first row of the A matrix would contain "1's" if the respondent was in group one, zeros otherwise. The second row would contain "1's" if the respondent was in group two, zeros otherwise. With this coding, the ($n_g \times n_g$) term $(A A')^{-1}$ of Equation 2 has the inverse of the group sizes on the diagonal and zeros off-diagonal. The ($n_o \times n_g$) term $Y' A'$ is the sum of the observations in each group. The product of the two terms gives the mean response for the groups. Assuming the design matrix X is coded as below (with the first column coded as the constant), parameters $b_{11}$ and $b_{21}$ are the within group constants for group one and two, respectively. The parameters $b_{12}$, $b_{13}$, $b_{14}$, $b_{15}$ and $b_{22}$, $b_{23}$, $b_{24}$, $b_{25}$ are the mean partworths for the respective groups. In this respect, dummy coding provides an analysis that is equivalent to the grouped analysis summarized in Figure 1.


An alternative way to code respondents into mutually exclusive groups would be to "effect" code the A matrix, as A2 of Figure 2. The first row of the A matrix would contain a vector of "1's". The second row would contain "+1's" if the respondent was in group one and "-1's" if the respondent was in group two. With this coding, the matrix $A A'$ underlying the term $(A A')^{-1}$ of Equation 2 has the total sample size on the diagonal and the difference between the sample sizes of the two groups off-diagonal; when the groups are of equal size, $(A A')^{-1}$ is diagonal with the inverse of the sample size on the diagonal. The first column of the ($n_o \times n_g$) term $Y' A'$ is the sum of the observations across all groups. The second column contains the difference between the summed responses of group one to each concept and the summed responses of group two to the concept. Again assuming the first column of X is coded for the constant, parameter $b_{11}$ is the grand mean and $b_{12}$, $b_{13}$, $b_{14}$, and $b_{15}$ are the partworths for the data pooled across groups. The parameters $b_{21}$, $b_{22}$, $b_{23}$, $b_{24}$, $b_{25}$ capture the two-way interactions between group membership and partworths, i.e., they provide a direct test that the within group partworths are equal. For example, if $b_{21}$ is zero, then the mean responses for the two groups are equal to the grand mean. If $b_{22}$ is equal to zero, the estimate of the partworth of the first level of the first attribute based on the pooled data can be used in both groups. If not, the partworth for group one is ($b_{12} + b_{22}$) and for group two ($b_{12} - b_{22}$). If it were known that the groups differed in the partworths, A1 would be the logical coding for the analysis. If on the other hand it were known that the groups did not differ, the A2 coding would make the most sense.
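
The two codings of A can be compared numerically. The sketch below is illustrative only: the group membership vector and the simulated responses are hypothetical, and the matrices are not copied from Figure 2. It shows that dummy coding recovers the group mean responses, while effect coding recovers the average of the two group means and half their difference.

```python
import numpy as np

rng = np.random.default_rng(3)
n_o = 6
groups = np.array([0, 0, 0, 1, 1])          # assumed: 3 in group one, 2 in group two
Y_t = rng.normal(size=(n_o, groups.size))   # simulated Y' (n_o x n_s)

# Dummy coding (A1): row g flags membership of group g.
A1 = np.vstack([groups == 0, groups == 1]).astype(float)

# Effect coding (A2): a row of ones, then +1 / -1 for group one / two.
A2 = np.vstack([np.ones(groups.size), np.where(groups == 0, 1.0, -1.0)])

print(A1 @ A1.T)   # group sizes on the diagonal, zeros elsewhere
print(A2 @ A2.T)   # total n on the diagonal, n1 - n2 off the diagonal

# Y' A' (A A')^{-1} under each coding.
M1 = Y_t @ A1.T @ np.linalg.inv(A1 @ A1.T)   # columns: group means
M2 = Y_t @ A2.T @ np.linalg.inv(A2 @ A2.T)   # columns: average, half difference

# Group-level quantities are recovered as (column 1 + column 2) and
# (column 1 - column 2) of the effect-coded result.
print(np.allclose(M2[:, 0] + M2[:, 1], M1[:, 0]))
print(np.allclose(M2[:, 0] - M2[:, 1], M1[:, 1]))
```

Because A2 is a nonsingular linear transformation of A1, the two codings carry the same information; only the interpretation of the columns of b changes.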

3.5 The Design Matrix, X

The within subject matrix, X, must be the same for all individuals. The parameters associated with columns of X, however, can differ and assume the value of zero for some individuals or groups. The coding of X follows familiar conventions in CA. For example, the coding for X provided in the previous section illustrates the use of orthogonal polynomials to code quantitative variables. That is, it is assumed that two groups of respondents evaluate a set of concepts that are profiled in terms of two attributes. The first attribute, a, is assumed to be a quantitative attribute having three levels. Three price levels (low, medium, and high) would generate such an attribute. Given the three levels, linear and quadratic effects can be estimated. The linear effect would be expected to be negative for price; higher prices should result in lower preference. The quadratic component can be interpreted as an indication of whether there is "concavity" or "convexity" in the ratings of the quantitative variable. That is, the {1 -2 1} coding for the quadratic effect represents the difference between the sum of the lowest and highest values and twice the middle value. Equivalently, the quadratic effect with three levels is proportional to $[(a_{low} + a_{high})/2 - a_{middle}]$. Assuming the linear component indicates a significant trend, if the middle value is significantly less than the mean of the extreme values, then the partworths are increasing at an increasing rate (or decreasing at a decreasing rate). If the middle value is greater than the mean of the two extreme values, then the rate of increase is decreasing (or the rate of decrease is increasing). The test of significance on the component provides the appropriate test of the null hypothesis and the sign indicates the direction of change. The following section illustrates how hypotheses may be formulated within the framework of the GMR model.

The second attribute, b, is assumed to be a two level qualitative attribute. A study evaluating two hypothetical brands would generate this sort of attribute. A significant main effect within a group would indicate that group members have a relatively greater preference for one of the brand names. There will be four parameters to estimate within each group ($n_p$ = 4), i.e., the mean, m, the linear effect for the "price-like" attribute a, the quadratic effect for a, and the main effect for the "brandname-like" attribute b. The first column of X gives the coding for the mean, the second and third the coding for linear and quadratic polynomials for a, and the fourth gives main effect coding for b (Kirk, 1982, p. 830).
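
As an illustration of this coding, the sketch below builds a design matrix for a full 3 x 2 factorial (six concepts) in Python with NumPy. The assumption that all six combinations are presented, the level labels, and the example partworths are hypothetical; the design matrix in the original figure may be laid out differently.

```python
import numpy as np

# Orthogonal polynomial codes for the three-level "price-like" attribute a
# and effect codes for the two-level "brandname-like" attribute b.
linear = {"low": -1, "medium": 0, "high": 1}
quadratic = {"low": 1, "medium": -2, "high": 1}
brand = {"brand_1": 1, "brand_2": -1}

rows = []
for b_level in brand:
    for a_level in linear:
        rows.append([1, linear[a_level], quadratic[a_level], brand[b_level]])
X = np.array(rows, dtype=float)   # columns: mean, linear a, quadratic a, main effect b

# The columns are mutually orthogonal, so X'X is diagonal.
print(X.T @ X)

# Quadratic contrast on three hypothetical level means for attribute a.
a_low, a_middle, a_high = 5.0, 4.0, 1.0
contrast = a_low - 2 * a_middle + a_high           # {1 -2 1} coding
half_form = (a_low + a_high) / 2 - a_middle        # mean of extremes minus middle
print(contrast, half_form)                         # contrast is twice half_form
```

In this example the middle value lies above the mean of the extremes, so the contrast is negative: preference falls with price, and falls faster at the high end.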


Hypotheses of the form $H_0: F b C' = 0$ against the alternative $H_1: F b C' \neq 0$ may be formulated and tested using procedures provided in the Appendix. The ($r \times n_p$) matrix F identifies the rows of b that will enter into the hypothesis test. The ($t \times n_g$) matrix C, which enters through its transpose C', identifies the columns of b that will be used. [It is evident from Equation 2 that information about the covariance structure of errors D (Eq. 3) is incorporated in the covariance structure of partworth estimates. The typical conjoint study in which individuals make repeated responses to a fixed set of concepts is the sort of setting that might generate a data set with an arbitrary covariance structure that differs from that assumed for OLS estimation. While the magnitude of the estimates may be robust to departures of D from the independent and identically distributed (iid) case, significance levels of hypotheses defined on the estimated parameters can vary depending on the nature of the departure.]

A variety of hypotheses formulated in terms of F and C are interpreted in Figure 3. For example, Figure 2 indicates that under A2 the interaction terms for each of the four parameters are in the second column of b. Accordingly, in Figure 3 (set 1) there are ones in each column of F, indicating that each of the parameters is to be selected, and a one in the second column of C', indicating that it is the second column of parameters that is to be selected. The test is that the values $b_{21}$, $b_{22}$, $b_{23}$, $b_{24}$, and $b_{25}$ simultaneously are equal to zero. The equivalent coding under A1 tests the hypothesis that the parameter values are equal in the two groups, i.e., their difference is zero. As with the A2 formulation, the F matrix is coded to select each row, but the C' matrix is coded to take the difference between the parameters in each row. If the difference is zero, then there is no group by attribute interaction. Figure 3 (set 2) gives the test of no group by attribute a interaction. The F matrix selects the parameters corresponding to the linear and quadratic effects. The C matrix under A1 tests the hypothesis that the differences between the two sets of parameters simultaneously are equal to zero. The C matrix under A2 directly tests whether the interaction parameters simultaneously are equal to zero. Figure 3 (set 3) selects only the quadratic component and tests it for group by attribute interaction. Figure 3 (set 4) shows how to test whether the grand mean is equal to zero. Under A1, the means of the two groups must be selected and pooled to get the grand mean, which is then tested to see whether it is equal to zero. Under A2, the $b_{11}$ element corresponds to the grand mean and the test is whether it is equal to zero. Taken as a whole, the set of hypotheses provides the basis for so-called "step-down" hypothesis tests on the "price-like" attribute.



Figure 3 (sets 5 and 6) further illustrates that the interpretation of the hypothesis corresponding to given F and C matrices depends on the coding of A. Under A1, set 5 tests the hypothesis that the "brand" effect in group one is zero. Under A2, set 5 tests the hypothesis that the average "brand" effect across groups is zero. Set 6 provides the formulations for testing the hypothesis that the "brand" effects across the two groups are equal.
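
The role of F and C can be illustrated with a small numerical sketch in Python with NumPy. The parameter values below are hypothetical, b is laid out with rows as parameters and columns as groups (following Equation 1 with $n_p$ = 4), and the F and C matrices are plausible reconstructions from the descriptions in the text rather than copies of Figure 3.

```python
import numpy as np

# Hypothetical estimates under dummy coding A1. Rows: mean, linear a,
# quadratic a, brand b. Columns: group one, group two.
b_A1 = np.array([
    [ 5.0,  4.0],
    [-1.0, -0.4],
    [ 0.2,  0.2],
    [ 0.6, -0.2],
])

# Effect coding satisfies A2 = T A1 with T = [[1, 1], [1, -1]], so the same
# fitted values X b A require b_A2 = b_A1 T^{-1}.
T = np.array([[1.0, 1.0], [1.0, -1.0]])
b_A2 = b_A1 @ np.linalg.inv(T)

I4 = np.eye(4)
F_all = I4            # select every parameter
F_attr_a = I4[[1, 2]] # linear and quadratic effects of attribute a
F_quad = I4[[2]]      # quadratic effect only
F_mean = I4[[0]]      # constant only

C_diff = np.array([[1.0, -1.0]])    # A1: difference between groups
C_second = np.array([[0.0, 1.0]])   # A2: second (interaction) column
C_pool = np.array([[0.5, 0.5]])     # A1: average of the two groups
C_first = np.array([[1.0, 0.0]])    # A2: first (pooled) column

# Group by attribute interaction for all parameters, both codings.
print((F_all @ b_A1 @ C_diff.T).ravel())     # group one minus group two
print((F_all @ b_A2 @ C_second.T).ravel())   # half of that difference

# Group by attribute a interaction, and the quadratic component alone.
print((F_attr_a @ b_A1 @ C_diff.T).ravel())
print((F_quad @ b_A1 @ C_diff.T).ravel())

# Grand mean under each coding.
print((F_mean @ b_A1 @ C_pool.T).item(), (F_mean @ b_A2 @ C_first.T).item())
```

Under the two codings the tested quantities differ only by a constant factor, so they are zero together; this is why the same substantive hypothesis needs a different C under A1 and A2.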


The aggregated CA problem may be formulated in terms of the generalized multivariate regression model. Using this approach, hypotheses of the form $H_0: F b C' = 0$ against the alternative $H_1: F b C' \neq 0$ may be formulated. The ($r \times n_p$) matrix F identifies the rows of b that will enter into the hypothesis test. That is, hypotheses regarding partworths, or linear combinations of partworths, are specified by F. The ($t \times n_g$) matrix C, through its transpose C', identifies the columns of b that will be used. In other words, it specifies which groups are used in the hypothesis. A particularly appealing aspect of the approach is that hypotheses regarding interactions between group membership and partworth values may be formulated in an efficient and compact form through judicious choice of F and C. Examples are provided by hypothesis sets one through six of Figure 3. The appeal of having an efficient mechanism for screening group by attribute interactions is that significant interactions of this sort provide evidence for potential market segmentation strategies. That is, significant interactions indicate that members of the respective groups may respond differently to concept formulations.

The difference between the outlined approach and the group approach summarized in Figure 1 is that a covariance matrix that captures empirical heteroscedasticity and collinearity in responses is available. Estimation and subsequent hypothesis tests incorporate this information. Two directions for additional research are planned. First, it is evident that the estimation procedure may be formulated as a within subjects design with conjoint attributes corresponding to "trials" factors and the group factor corresponding to "between group" factors. Many statistical software packages can accommodate within subjects designs. Casting the GMR model as a within subjects design will make widely available estimation packages usable for CA hypothesis testing. Second, there are numerous empirical response processes that might affect the typical conjoint study. A study is currently under way to simulate a number of the possibilities and evaluate the sensitivity of hypothesis tests to violations of the assumptions of OLS, which is commonly used for estimation.




APPENDIX

Hypotheses of the form $H_0: F b C' = 0$ against the alternative $H_1: F b C' \neq 0$ may be tested using the following likelihood ratio test. F is a specified ($r \times n_p$) matrix with $r \leq n_p$. C is a specified ($t \times n_g$) matrix with $t \leq n_g$. The tests are based on the hypothesis and error matrices:

(A1)  $H = F b C' (C R C')^{-1} (F b C')'$

(A2)  $E = F (X' D^{-1} X)^{-1} F'$, where

(A3)  $R = (A A')^{-1} + (A A')^{-1} A Y D^{-1} Y' A' (A A')^{-1} - b' (X' D^{-1} X) b$

A test of $H_0$ is given by the U statistic (Srivastava and Carter, 1983, p. 184):

(A4)  $U_{r,n,t} = |E| / |E + H|$,

where $n = n_{g1} + n_{g2} - n_o + n_p - n_g$, $n_{gi}$ is the number of respondents in group i, and $|\cdot|$ is the determinant of the indicated term. For large n, an asymptotic test may be based on:

(A5)  $\mathrm{TSTAT} = -[\, n - (r - t + 1)/2 \,] \ln(U)$,

(A6)  $P\{\mathrm{TSTAT} \geq z\} \approx P\{\chi^2_{rt} \geq z\}$.
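
The test in (A1) through (A5) can be sketched in Python with NumPy as follows. This is a hedged illustration, not the author's code: the function name `gmr_lr_test` and the simulated data are hypothetical, and the matrix expressions are written so that they are conformable with A of order $n_g \times n_s$ as in Equation 1. The sample size is taken as the number of columns of Y', which equals $n_{g1} + n_{g2}$ for two groups. The p-value in (A6) could then be obtained from a chi-square survival function such as `scipy.stats.chi2.sf(tstat, df)`.

```python
import numpy as np

def gmr_lr_test(Y_t, X, A, F, C):
    """U statistic and TSTAT for H0: F b C' = 0.

    Y_t (n_o, n_s), X (n_o, n_p), A (n_g, n_s), F (r, n_p), C (t, n_g).
    Returns (U, TSTAT, degrees of freedom r*t).
    """
    n_o, n_s = Y_t.shape
    n_p, n_g = X.shape[1], A.shape[0]
    r, t = F.shape[0], C.shape[0]

    AAt_inv = np.linalg.inv(A @ A.T)
    D = Y_t @ Y_t.T - Y_t @ A.T @ AAt_inv @ A @ Y_t.T   # Equation 3
    D_inv = np.linalg.inv(D)
    XtDX = X.T @ D_inv @ X                              # X' D^{-1} X
    G = Y_t @ A.T @ AAt_inv                             # Y' A' (AA')^{-1}
    b = np.linalg.solve(XtDX, X.T @ D_inv @ G)          # Equation 2

    R = AAt_inv + G.T @ D_inv @ G - b.T @ XtDX @ b      # (A3)
    FbC = F @ b @ C.T
    H = FbC @ np.linalg.inv(C @ R @ C.T) @ FbC.T        # (A1)
    E = F @ np.linalg.inv(XtDX) @ F.T                   # (A2)

    U = np.linalg.det(E) / np.linalg.det(E + H)         # (A4)
    n = n_s - n_o + n_p - n_g
    tstat = -(n - (r - t + 1) / 2) * np.log(U)          # (A5)
    return U, tstat, r * t

# Hypothetical data: 6 concepts, 4 partworths, 2 groups of 20 respondents.
rng = np.random.default_rng(4)
X = np.array([
    [1, -1,  1,  1], [1, 0, -2,  1], [1, 1, 1,  1],
    [1, -1,  1, -1], [1, 0, -2, -1], [1, 1, 1, -1],
], dtype=float)
groups = np.repeat([0, 1], 20)
A = np.vstack([groups == 0, groups == 1]).astype(float)   # dummy coding
b_true = np.array([[5.0, 4.0], [-1.0, -0.5], [0.2, 0.0], [0.5, -0.5]])
Y_t = X @ b_true @ A + rng.normal(size=(6, 40))

# Test of no group by attribute interaction on any parameter.
F = np.eye(4)
C = np.array([[1.0, -1.0]])
U, tstat, df = gmr_lr_test(Y_t, X, A, F, C)
print(U, tstat, df)
```

Recoding A with effect coding and replacing C by a matrix that selects the second column of b gives the same U, which is a useful check that the two formulations test the same hypothesis.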


REFERENCES

Green, Paul E. and V. R. Rao (1971), "Conjoint Measurement for Quantifying Judgmental Data," Journal of Marketing Research, 8, 355-63.

Green, Paul E. and V. Srinivasan (1978), "Conjoint Analysis in Consumer Research: Issues and Outlook," Journal of Consumer Research, 5 (September), 103-23.

Grizzle, J. E. and D. M. Allen (1969), "Analysis of Growth and Dose Response Curves," Biometrics, 25, 357-81.

Hagerty, M. R. (1985), "Improving the Predictive Power of Conjoint Analysis: The Use of Factor Analysis and Cluster Analysis," Journal of Marketing Research, 22 (May), 168-84.

Kamakura, W. A. (1988), "A Least Squares Procedure for Benefit Segmentation with Conjoint Experiments," Journal of Marketing Research, 25 (May), 157-67.

Kerlinger, F. N. and E. J. Pedhauser (1973), Multiple Regression in Behavioral Research, New York: Holt, Rinehart and Winston, Inc.

Khatri, C. G. (1966), "A Note on a MANOVA Model Applied to Problems in Growth Curves," Ann. Inst. Statist. Math., 18, 75-86.

Kirk, R. E. (1982), Experimental Design: Procedures for the Behavioral Sciences, 2nd ed., Monterey, CA: Brooks/Cole Publishing Company.

Kruskal, J. B. (1965), "Analysis of Factorial Experiments by Estimating Monotone Transformations of the Data," Journal of the Royal Statistical Society B, 251-263.

Morrison, D.F. (1976), Multivariate Statistical Methods, New York: McGraw-Hill Book Company.

Potthoff, R. R. and S. N. Roy (1964), "A Generalized Multivariate Analysis of Variance Model Useful Especially for Growth Curve Problems," Biometrika, 51, 313-26.

Srinivasan, V. and A. D. Shocker (1973), "Linear Programming Techniques for Multidimensional Analysis of Preferences," Psychometrika, 38, 337-369.

Srivastava, M. C. and E. M. Carter (1983), An Introduction to Applied Multivariate Statistics, New York: North Holland.