On the Design of Multiattribute Choice Experiments Involving Large Numbers of Factors Or Factor Levels

Paul E. Green, University of Pennsylvania
[ to cite ]:
Paul E. Green (1974) ,"On the Design of Multiattribute Choice Experiments Involving Large Numbers of Factors Or Factor Levels", in NA - Advances in Consumer Research Volume 01, eds. Scott Ward and Peter Wright, Ann Arbor, MI : Association for Consumer Research, Pages: 228-241.

Advances in Consumer Research Volume 1, 1974    Pages 228-241



[Paul E. Green is S. S. Kresge Professor of Marketing, the Wharton School, University of Pennsylvania.]

Describing how people choose among multiattribute alternatives is a topic of increasing interest to researchers in consumer behavior. A number of approaches--value expectancy models, functional measurement, conjoint measurement--are being employed in the estimation of consumer utilities in which the underlying model is assumed to be compensatory [5]. One statistical model that has attained rather high popularity in these kinds of measurement applications is the ANOVA model of traditional experimental design.

In conjoint measurement, the ANOVA formulation, particularly the additive, main-effects model, provides a useful approach to modeling multiattribute choice by emphasizing orthogonal designs in which the part-worth contributions of each factor to total utility can be estimated without confounding. Symbolically, the utility U (x) of some multiattribute alternative x = (x1, x2, ..., xn), expressed as an n-component vector, is defined as:

U (x) = U1 (x1) + U2 (x2) + ... + Un (xn)   (1)

where Uj can be any real-valued function of xj, the j-th component of x.

In conjoint measurement methodology f (x) is defined to be a monotonic function of x. More specifically, given any two alternatives, x and x', in the conjoint measurement model:

x >o x'  =>  U (x) > U (x')   (2)

where >o is an observed weak order relation indicating that x is not dispreferred to x'. The primary advantage of conjoint measurement is the fact that interval-scaled utilities can be estimated from response data originally expressed in only ordinal (or possibly categorical) terms. Thus, the additive conjoint measurement model can be viewed as a main-effects ANOVA design with the important modification that the response variable is only rank ordered.

However, a practical problem arising in the application of conjoint measurement to estimating component utilities concerns the large number of experimental combinations that are involved under any multiattribute choice problem of realistic size. The purpose of this note is to describe three procedures for coping with the problem of designing stimuli for respondent evaluation in cases that entail large numbers of factors or experimental levels within factor. While these proposals are described in the context of parameter estimation in multiattribute choice, they would also seem to have some applicability to other kinds of experiments involving judgmental evaluation tasks.

The three approaches are each discussed in terms of a hypothetical problem dealing with consumer evaluation of commercial airline services.


For purposes of illustration, assume that a researcher is interested in developing consumer trade-off measures for the following factors and levels of transatlantic airline travel:

List A -- Factors and Levels (three levels per factor)

A. Airline carrier: TWA; BOAC; Air France.

B. Aircraft type: B-707; B-747; DC-10.

C. Departure time relative to time most preferred: within 4 hours; within 2 hours; within 1/2 hour.

D. Anticipated plane load: 90% full; 70% full; 50% full.

E. Number of stops enroute: two stops; one stop; non-stop.

F. Price: regular fare; 5% discount; 10% discount.

G. Arrival time accuracy: within 2 hours; within 1 hour; within 1/2 hour of scheduled arrival time.

H. In-flight service: below average; average; superior.

In addition, suppose the researcher is interested in the following articulations of in-flight service:

List B -- Factors

A. Seat width.

B. Leg room.

C. Meals and bar service.

D. Entertainment.

While not exhaustive, the preceding lists are illustrative of the kinds of characteristics that consumers might consider in choosing transatlantic flights.

For the moment we concentrate our attention on List A. We note that preparation of all distinct combinations from this list would lead to a 3^8 factorial design of 6,561 flight profiles. Clearly, some means is required to reduce this set to a more manageable number for respondent evaluation. Ways of doing just this provide the primary motivation for this note.
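The size of the full factorial can be checked by direct enumeration. The following is a minimal sketch in Python; the factor letters follow List A, and the coding of levels as 0, 1, 2 is an assumption made only for illustration.

```python
from itertools import product

# Factor labels A-H follow List A; each factor has three levels (coded 0, 1, 2).
factors = "ABCDEFGH"
levels_per_factor = 3

# Every distinct flight profile is one element of the Cartesian product.
profiles = list(product(range(levels_per_factor), repeat=len(factors)))

print(len(profiles))              # size of the full factorial
print(profiles[0], profiles[-1])  # first and last coded profiles
```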


The reader is no doubt aware of the concept of a latin square design--a type of fractional factorial--in which main effects can be estimated from a reduced number of combinations, as compared to a full factorial design. For example, with three factors A, B, and C, each at three levels, the following represents one example of a latin square:


We note that each level of factor C appears once in each row and each column of the above table. Moreover, only nine (rather than 27) combinations are needed to estimate all main effects.

The reader may also be aware of the graeco-latin square design which is a generalization of the latin square to cope with an additional factor, also at the same number of levels. For example, a graeco-latin square design involving four factors, each at three levels, can be constructed as follows:


Note that each pair Ci, Dj appears exactly once in the table and that each Ci and Dj separately appears once in each row and column. Orthogonal arrays extend this idea to deal with still greater numbers of factors (under the appropriate design conditions).
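One common way to build such squares for three levels uses arithmetic modulo 3. The sketch below is illustrative only: the particular squares it prints need not match the ones tabulated in the original article, and the names latin_c, latin_d and is_latin are hypothetical.

```python
k = 3  # number of levels per factor

# Latin square: the level of C in row a, column b is (a + b) mod k.
latin_c = [[(a + b) % k for b in range(k)] for a in range(k)]

# A second square, orthogonal to the first: D = (a + 2b) mod k.
latin_d = [[(a + 2 * b) % k for b in range(k)] for a in range(k)]

def is_latin(square):
    """True if every level occurs once in each row and each column."""
    n = len(square)
    full = set(range(n))
    rows_ok = all(set(row) == full for row in square)
    cols_ok = all({square[r][c] for r in range(n)} == full for c in range(n))
    return rows_ok and cols_ok

# Superimposing the two squares gives a graeco-latin square.
pairs = [(latin_c[a][b], latin_d[a][b]) for a in range(k) for b in range(k)]

for a in range(k):
    print("  ".join(f"C{latin_c[a][b] + 1}D{latin_d[a][b] + 1}" for b in range(k)))
```

The nine (C, D) pairs collected in pairs are all distinct, which is the defining property of the graeco-latin square.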

In general, if each factor is at the same k levels, an orthogonal design leading to the (unconfounded) estimation of all main effects can be constructed if k is a prime or power of a prime. [In some cases, certain interaction effects can be estimated as well; however, here we confine our discussion to estimating main effects only.] Hence, these designs--called orthogonal arrays [10]--can be constructed for k = 2, 3, 4, 5, 7, 8, 9, 11, etc. If each factor is at k levels and the number of experimental combinations is m = k^n, where n is a positive integer (and k is a prime or power of a prime), then orthogonal arrays can be determined for up to s different factors by the expression:

s = (m - 1) / (k - 1) = (k^n - 1) / (k - 1)   (3)

In the above example, m = k^n = 3^2 = 9; hence s = (9 - 1) / (3 - 1) = 4. However, if m = 27 and k = n = 3, we have:

s = (27 - 1) / (3 - 1)

= 13 different factors (from only 27 response combinations)
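The factor counts quoted above can be reproduced numerically. This is a minimal sketch, assuming the expression takes the standard form s = (k^n - 1) / (k - 1), which agrees with both worked figures (s = 4 for nine combinations and s = 13 for 27); the function name max_factors is hypothetical.

```python
def max_factors(k: int, n: int) -> int:
    """Number of k-level factors, s = (k**n - 1) / (k - 1), for m = k**n runs."""
    m = k ** n
    return (m - 1) // (k - 1)

# A few (k, n) pairs, including the two cases discussed in the text.
for k, n in [(3, 2), (3, 3), (2, 4), (4, 2)]:
    print(f"k={k}, m={k ** n}: s={max_factors(k, n)}")
```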

However, if all 13 factors are used in the experimental design, no degrees of freedom are left over for error estimation. By the same token, with conjoint measurement techniques we desire still additional degrees of freedom in the form of non-metric constraints. Since the eight factors of List A would use up only sixteen degrees of freedom (two for each factor), eleven degrees of freedom would remain to help determine the solution, if m were set at 27 combinations and a nonmetric approach were used. [Some Monte Carlo work with MONANOVA has been conducted by the author and his students. Tentatively, the research suggests that the number of nonmetric input parameters should be about twice the number of parameters to be estimated. Unfortunately, the work has not been extensive enough to state anything more than a rough indication at this time.]

Table 1 shows one such orthogonal array, adapted for the eight factors in List A. Note that each level appears an equal number of times (nine) in each column. This design represents a 1/243 replicate of the original 3^8 factorial. Several authors [1, 2, 3, 9, 10] provide procedures for constructing orthogonal arrays for other numbers of levels and factors. Moreover, some extensions have been made to the asymmetrical case in which the number of levels need not be the same across factors [7].
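A 27-run array of this kind can be generated from linear forms over the integers modulo 3. The sketch below shows one such construction under that assumption; it is not claimed to reproduce the row and column arrangement of Table 1, and the choice of the first eight of the thirteen available columns is arbitrary.

```python
from itertools import product, combinations
from collections import Counter

k, n = 3, 3  # three levels, 3**3 = 27 runs

# One generator per "direction" in GF(3)^3: nonzero vectors whose first
# nonzero coordinate is 1. There are (3**3 - 1) / (3 - 1) = 13 of them.
generators = [v for v in product(range(k), repeat=n)
              if any(v) and next(c for c in v if c) == 1]

# Each run is a point of GF(3)^3; each column is a linear form mod 3.
runs = list(product(range(k), repeat=n))
full_array = [[sum(g_i * r_i for g_i, r_i in zip(g, r)) % k for g in generators]
              for r in runs]

# Keep eight of the thirteen columns, one per factor A-H of List A.
design = [row[:8] for row in full_array]

def is_orthogonal(array, levels):
    """Every pair of columns shows each level pair equally often."""
    n_cols = len(array[0])
    target = len(array) // levels ** 2
    for i, j in combinations(range(n_cols), 2):
        counts = Counter((row[i], row[j]) for row in array)
        if len(counts) != levels ** 2 or set(counts.values()) != {target}:
            return False
    return True

print(len(generators), len(design), is_orthogonal(design, k))
```

Because every pair of columns is balanced, each level also appears nine times in each column, in line with the property noted for Table 1.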

Using the Design

In practice the design of Table 1 would be used to construct 27 profiles of the airline characteristics appearing in List A. Each profile, as made up from the factor levels of Table 1, could then be evaluated by having the respondent perform one of various types of evaluation tasks. For example:

1. Assign each profile to an ordered set of five categories, ranging from "definitely would not take this flight" to "definitely would take this flight;" or

2. Assign each profile to an ordered set of five categories and then rank profiles within category in terms of likelihood of taking the flight; or

3. Assign a subjective probability to each profile in turn, ranging from zero to 100%, for taking the flight.

Whatever the evaluation procedure employed, one would obtain a (possibly weak) ranking of the 27 profiles from most to least likely to be selected.

Since conjoint measurement algorithms (e.g., Kruskal's MONANOVA program [8]) can take missing entries, the rank order itself can serve as input, yielding interval-scaled part-worths (with arbitrary zero but common unit) for each factor. Using these scales, one can then construct, via the additivity assumption, estimated utility values for all 6,561 combinations. The ranking of a subset of these utilities could be checked against the ranking of respondent evaluations obtained from a corresponding set of profiles (not used in the calibration phase) as a way of providing validation-type information.
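Once part-worths are in hand, building utilities for every combination is a matter of summation. The following sketch assumes a set of part-worths that are invented purely for illustration (they are not MONANOVA output) and codes the three levels of each List A factor as 0, 1, 2 in the order listed.

```python
from itertools import product

# Hypothetical part-worths for the eight factors of List A (three levels each).
# These numbers are invented for illustration; they are not estimates from data.
part_worths = {
    "A": [0.0, 0.2, 0.1],
    "B": [0.0, 0.5, 0.3],
    "C": [0.0, 0.4, 0.9],
    "D": [0.0, 0.2, 0.3],
    "E": [0.0, 0.6, 1.2],
    "F": [0.0, 0.3, 0.7],
    "G": [0.0, 0.3, 0.5],
    "H": [0.0, 0.5, 1.0],
}
factors = list(part_worths)

def utility(profile):
    """Additive utility: sum of the part-worths of the profile's levels."""
    return sum(part_worths[f][lvl] for f, lvl in zip(factors, profile))

all_profiles = list(product(range(3), repeat=len(factors)))
utilities = {p: utility(p) for p in all_profiles}

best = max(utilities, key=utilities.get)
worst = min(utilities, key=utilities.get)
print(len(utilities), best, worst)
```

Sorting the dictionary by value would give the predicted ranking against which holdout profiles could be compared.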

Since an illustrative analysis of conjoint measurement data has been presented elsewhere [4], we do not go through this step here. Suffice it to say that such orthogonal arrays can be constructed for levels of k = 2, 3, 4, 5, 7, 8, 9, 11, which should cover most cases of practical interest.


In some cases of interest, the researcher may have ten or twelve levels of one or more factors. Moreover, the number of levels may differ from factor to factor. This type of problem can be handled by a three-stage procedure, involving:

1. Separate estimation of each single-factor utility scale, followed by

2. Presentation of an orthogonal array drawn from a 2^n factorial design made up of end-point utility-level descriptions, followed by

3. Rescaling of single-factor utilities in accordance with the common scale unit derived from evaluations of the orthogonal array stimuli.



To illustrate this procedure, let us return to List A and assume that twelve different airline carriers (including TWA, BOAC, and Air France) are involved. Single factor utility estimation would proceed by having the respondent:

1. Select his most preferred carrier and assign it a value of 10. This carrier would represent the upper end-point level.

2. Select his least preferred carrier and assign it a value of 0. This carrier would represent the lower end-point level.

3. Assign values between 0 and 10 (ties permitted, of course) to the remaining ten carriers.

Here we assume that direct magnitude estimation can be used to find single-factor utilities at the interval-scale level. (Less stringent assumptions involving ranking procedures are also possible, but are more time consuming.)

If eight such single-factor utility scales are found, one for each factor in List A, the next step is to select two levels of each factor and present the respondent with an orthogonal array based on a 2^8 factorial. Although the choice of reference levels of each factor for this step in the procedure is generally not critical, usually we would wish to choose those two levels that are most widely separated in terms of single-factor utility. That is, ordinarily we would choose the level receiving the +10 value and the level receiving the 0 value in the single-utility scale estimation step.

Using the Design

Sixteen profiles (with each factor at two end-point levels) would then be made up according to the orthogonal array of Table 2. The respondent would be asked to evaluate these profiles, as described earlier, and a (possibly weak) ranking would be obtained and submitted to MONANOVA. Utility scales would then be found for all end-point stimulus levels.
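A sixteen-run array for eight two-level factors can be generated from four basic columns and their sums modulo 2. The sketch below is one such construction; the particular generators are an assumption made for illustration, and the rows are not claimed to match Table 2.

```python
from itertools import product, combinations
from collections import Counter

# Sixteen runs = all settings of four two-level "basic" columns.
runs = list(product((0, 1), repeat=4))

# Eight columns, each a distinct nonzero GF(2) combination of the basic ones.
# This particular choice of generators is an assumption made for illustration.
generators = [
    (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1),
    (1, 1, 1, 0), (1, 1, 0, 1), (1, 0, 1, 1), (0, 1, 1, 1),
]

design = [[sum(g_i * r_i for g_i, r_i in zip(g, r)) % 2 for g in generators]
          for r in runs]

def pairwise_balanced(array):
    """Each pair of columns shows (0,0), (0,1), (1,0), (1,1) four times each."""
    for i, j in combinations(range(len(array[0])), 2):
        counts = Counter((row[i], row[j]) for row in array)
        if sorted(counts.values()) != [4, 4, 4, 4]:
            return False
    return True

for row in design:
    print(" ".join(str(v) for v in row))
print(pairwise_balanced(design))
```

Here 0 and 1 would stand for the lower and upper end-point levels of each factor.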

The last step entails a rescaling of the intermediately valued factor levels by application of the utility difference between the extremes. That is, the common scale unit found from MONANOVA would be used to rescale the single-factor utility scales. In this way, each of the eight unidimensional utility scales (found from direct magnitude estimation) would be stretched or compressed in accordance with the utility difference estimated by the employment of the lowest and highest-utility levels in the orthogonal array comparisons. For example, suppose the utility difference between lowest and highest evaluated carrier were only one-third that of the utility difference between lowest and highest evaluated aircraft type. If so, the former's scale range would be only one-third of the latter's and all intermediate levels of air carrier utilities would be adjusted proportionately in terms of their values on the original (direct estimation) scale.

After this step has been completed, all utilities (end-point levels and intermediate) would be expressed in a common unit and, again, the researcher could develop estimated additive utilities for any combination of interest. The three-stage approach, outlined above, possesses quite a bit of flexibility for dealing with a relatively large (and not necessarily equal) number of levels within factor. Its disadvantage, of course, is that three steps are involved: (a) single-factor utility estimation; (b) orthogonal array applied to end-point factor levels; and (c) rescaling of single-factor utilities to incorporate a common scale unit.
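The rescaling step amounts to a linear stretch or compression of each 0-10 scale. The following sketch assumes hypothetical direct estimates and hypothetical end-point utilities; "Carrier X" and all numerical values are invented, and the function name rescale is not from the original.

```python
def rescale(single_factor_scale, low_utility, high_utility):
    """Map a 0-10 direct-estimation scale onto [low_utility, high_utility].

    The two end-point utilities are assumed to come from the orthogonal-array
    stage; intermediate levels keep their relative position on the 0-10 scale.
    """
    span = high_utility - low_utility
    return {level: low_utility + span * value / 10.0
            for level, value in single_factor_scale.items()}

# Hypothetical direct estimates (0 = least preferred, 10 = most preferred).
carrier_scale = {"TWA": 10, "BOAC": 6, "Air France": 4, "Carrier X": 0}
aircraft_scale = {"B-747": 10, "DC-10": 7, "B-707": 0}

# Hypothetical end-point utilities: the carrier range is one-third of the
# aircraft range, as in the example in the text.
carrier_utilities = rescale(carrier_scale, 0.0, 0.5)
aircraft_utilities = rescale(aircraft_scale, 0.0, 1.5)

print(carrier_utilities)
print(aircraft_utilities)
```

With these inputs an intermediate carrier rated 6 on the direct scale ends up at 60 percent of the carrier range, which is itself one-third of the aircraft range.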




The third procedure described in this note is also multistage. To illustrate it, we shall assume that the factors in List B are all components of the "higher-level" factor, in-flight service, the eighth factor of List A. Moreover, for purposes of illustration we shall assume that the four factors of List B are each describable according to four levels. For example, "entertainment" could be characterized by: (a) no entertainment; (b) magazines only; (c) magazines and FM music; (d) magazines, music and movies. (However, there is no need to nest levels, as illustrated here.)

Under this method, one would first prepare an orthogonal array for the four subfactors of List B, each at four levels. This orthogonal array appears in Table 3 and constitutes the set of initial stimuli for evaluation.
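Because four is a power of a prime, a sixteen-run array for four four-level factors can be built with arithmetic in the finite field GF(4). The sketch below is one such construction, with field elements coded 0 to 3; it is offered as an illustration and is not claimed to reproduce Table 3.

```python
from itertools import product, combinations
from collections import Counter

# GF(4) with elements coded 0, 1, 2, 3: addition is bitwise XOR, and
# multiplication follows from taking 2 as a root of x^2 + x + 1.
GF4_MUL = [
    [0, 0, 0, 0],
    [0, 1, 2, 3],
    [0, 2, 3, 1],
    [0, 3, 1, 2],
]

def add(x, y):
    return x ^ y

def mul(x, y):
    return GF4_MUL[x][y]

# Columns: first subfactor = i, second = j, third = i + j, fourth = i + 2*j.
design = [[i, j, add(i, j), add(i, mul(2, j))]
          for i, j in product(range(4), repeat=2)]

def is_orthogonal(array, levels):
    """Every pair of columns contains each level pair equally often."""
    target = len(array) // levels ** 2
    for a, b in combinations(range(len(array[0])), 2):
        counts = Counter((row[a], row[b]) for row in array)
        if len(counts) != levels ** 2 or set(counts.values()) != {target}:
            return False
    return True

for row in design:
    print(row)
print(is_orthogonal(design, 4))
```

Ordinary arithmetic modulo 4 would not work here, since 4 is not prime; the field structure is what guarantees that the third and fourth columns are mutually orthogonal.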

Using the Design

In the hierarchical approach, utilities would first be found for the four factors of List B, again by having the respondent rank order (possibly weak) the 16 profiles made up from Table 3 and submitting the data to MONANOVA, as before. Next, the lowest ranked, middle ranked and highest ranked of the 16 profile combinations made up from List B can be substituted for the factor, in-flight service, and the orthogonal array approach applied to the (so-modified) List A, using the design of Table 1. The utility range for these three profiles can then be used to rescale the List B utilities, as obtained from the Table 3 design. [This procedure assumes that the orthogonal array (Table 3) adequately covers the range of the eighth factor in List A.]

The basic idea of the hierarchical approach is to use utilities developed at a higher stage in the hierarchy to rescale lower-stage components. Again, one ends up with a set of (additive) utilities for all levels of all factors. In principle, this idea can be extended to three or more levels in a decision-problem hierarchy.
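The hierarchical rescaling can be sketched as a single multiplicative factor applied to the lower-stage part-worths. All numbers below are hypothetical, as are the two profiles taken to be the lowest and highest ranked; the sketch assumes the rescaling is linear, which the text implies but does not spell out.

```python
# Hypothetical List B part-worths from the first stage (four levels each).
list_b = {
    "seat width":    [0.0, 0.2, 0.5, 0.6],
    "leg room":      [0.0, 0.3, 0.6, 0.9],
    "meals and bar": [0.0, 0.1, 0.3, 0.4],
    "entertainment": [0.0, 0.1, 0.2, 0.3],
}

def profile_utility(profile):
    """Additive utility of a List B profile given as one level index per factor."""
    return sum(list_b[f][lvl] for f, lvl in zip(list_b, profile))

# Hypothetical lowest- and highest-ranked List B profiles from stage one.
lowest, highest = (0, 0, 0, 0), (3, 3, 3, 3)
range_b = profile_utility(highest) - profile_utility(lowest)

# Hypothetical second-stage utilities of in-flight service when those two
# profiles stand in for its lowest and highest levels.
service_low, service_high = 0.0, 1.1
scale = (service_high - service_low) / range_b

# Express every List B part-worth in the unit of the List A utilities.
rescaled_b = {f: [scale * u for u in worths] for f, worths in list_b.items()}
print(round(scale, 3))
```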


The three preceding methods have all been described from the viewpoint of having each respondent react to all combinations of interest.

Situations can arise, however, in which the researcher may be satisfied with analyses based on group data and is willing to present individual respondents with only a subset of the stimulus combinations. Or, even though the respondent is assumed to evaluate all stimuli, the researcher may wish to present them in subsets so as to make any specific ranking task easier. For example, assume that one wished to estimate utility functions for the eight factors in List A. Let us also assume that only two levels of each factor (the first and third levels shown for each factor in List A) are involved in this case. Finally, let us assume that the researcher does not wish to present the respondent with any more than six stimuli at any one time.



Balanced incomplete block (BIB) designs represent a large class of designs that are characterized by the main condition that each pair of "treatments" (combinations) appears an equal number of times in some block (a "block," in this case, being a person). To be specific, assume that we wished to make up a BIB design for the 16 combinations denoting the orthogonal array of Table 2. We recall that this array is based on a 2^8 factorial design.

One of the conditions that BIB designs [11] must satisfy is the following:

bk = rt    (4)

where: b denotes number of blocks; k denotes block size; r denotes number of replications; and t denotes number of treatments.

If, according to Table 2, we have 16 treatments (orthogonal-array combinations) and we desire six combinations for ranking by each of 16 respondents (i.e., k = 6, and b = 16) we have:

16(6) = r (16)

r = 6 replications per treatment

Table 4 shows a BIB design involving 16 respondents (blocks), with six combinations per block and each treatment replicated six times.

In BIB designs, each treatment is paired with all other t-1 treatments some λ times. In general, a BIB design requires that the following relationship hold:

r (k-1) = λ (t-1)    (5)

6 (5) = λ (15)

λ = 2

Thus, in Table 4 each treatment is paired with all other treatments exactly twice.
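The two counting conditions, and a design that satisfies them, can be checked directly. The sketch below uses one known construction for 16 treatments in 16 blocks of six (treatments laid out on a 4 x 4 grid); it is an illustration and is not claimed to be the design of Table 4.

```python
from itertools import combinations
from collections import Counter

t, b, k = 16, 16, 6            # treatments, blocks, block size
r = b * k // t                 # replications, from bk = rt
lam = r * (k - 1) // (t - 1)   # pairings, from r(k - 1) = lambda(t - 1)

# Arrange treatments 1..16 in a 4 x 4 grid; the block attached to each cell
# holds the six other treatments that share its row or its column.
def cell(treatment):
    return divmod(treatment - 1, 4)

blocks = []
for centre in range(1, t + 1):
    row, col = cell(centre)
    blocks.append([x for x in range(1, t + 1)
                   if x != centre and (cell(x)[0] == row or cell(x)[1] == col)])

# Count how often each treatment, and each pair of treatments, appears.
replications = Counter(x for block in blocks for x in block)
pair_counts = Counter(pair for block in blocks
                      for pair in combinations(sorted(block), 2))

print(r, lam)
print(set(replications.values()), set(pair_counts.values()), len(pair_counts))
```

All 120 pairs of treatments occur together in the same number of blocks, which is the balance property the design requires.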

If the BIB design is used in conjunction with any of the three preceding approaches, the rank order needed for MONANOVA is developed by counting the frequency (across respondents or across pairs within respondent) with which each treatment is evaluated higher than each of the other treatments with which it is paired. For example, if the first respondent (Table 4) ranked the combinations in the order most to least preferred: 5; 1; 16; 15; 9; 14, we can infer that combination 5 is more highly valued than 1, 16, 15, 9, and 14; 1 is more highly valued than 16, 15, 9, and 14; and so on.
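The counting of pairwise preferences can be sketched as follows. The first ranking is the one quoted in the text; the second ranking is hypothetical and is included only to show accumulation across blocks. The function name pairwise_wins is an assumption.

```python
from collections import defaultdict

def pairwise_wins(rankings):
    """Count, for each ordered pair (i, j), how often i is ranked above j.

    Each ranking lists the treatments of one block from most to least preferred.
    """
    wins = defaultdict(int)
    for ranking in rankings:
        for pos, higher in enumerate(ranking):
            for lower in ranking[pos + 1:]:
                wins[(higher, lower)] += 1
    return wins

# First respondent's ranking from the text; the second is hypothetical.
rankings = [
    [5, 1, 16, 15, 9, 14],
    [1, 5, 2, 3, 13, 9],
]
wins = pairwise_wins(rankings)

# Total number of times each treatment beats a treatment it was paired with.
totals = defaultdict(int)
for (higher, _lower), count in wins.items():
    totals[higher] += count

print(wins[(5, 1)], wins[(1, 5)], wins[(5, 9)], totals[5])
```

Ordering the treatments by their totals gives a group-level ranking of the kind that could then be passed to a monotone scaling routine.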

Finally, it is possible to use BIB designs in conjunction with a core set of combinations that are given to each respondent. For example, each respondent could be given all 16 combinations of Table 2 and, in addition, a set of BIB designed combinations that could be analyzed (at the group level) for various two-factor interactions.




The three measurement procedures described earlier in this note grew out of a pragmatic need--to be able to estimate parameter values for multiattribute alternatives in problems of realistic size. Other procedures will (and should) be developed as more experience with the methodology is obtained. [R. M. Johnson [6] has proposed another conjoint measurement procedure based on two-factor-at-a-time comparisons. However, his data collection task would seem to be formidable in problems of realistic size, since Johnson proposes--as a rule-of-thumb procedure--that each factor be compared, pairwise, with at least two other factors. Thus, in the case of List A, at least 16 three-by-three tables would be presented for evaluation, requiring a total of 9 (16) = 144 evaluations by the respondent.] However, at this stage in the state of the art it is well to review some of the limitations of these approaches:

1. All three approaches are based on additive utility models.

2. In the second and third (multistage) methods, it is assumed that rescaling is stable over stages.

3. The profile components are assumed to be more or less perceptually independent, i.e., factors and levels are chosen in such a way that the respondent believes that each profile is at least technically realizable.

4. The profiles are assumed to be complete enough (and yet without exhibiting excessive information "overload") to capture the respondent's evaluative process.

Little is known at this point about how realistic these assumptions are. Some evidence has been assembled to support the notion that many types of evaluations are well represented by additive models, even in the face of interactions, so long as each utility scale is conditionally monotone over all levels of all other factors. However, to the author's knowledge, no evidence has been assembled on the stability of utilities as estimated via the hierarchical procedure.

The perceived independence and completeness of profiles is still a problem to contend with, inasmuch as respondents may discount the value of profiles that are: (a) unbelievable or (b) unrepresentative of the total situation. Respondents' "belief" judgments about profiles could, of course, be obtained and examined as ancillary data.

While nonmetric methods of analysis have been emphasized here, it should be apparent that the orthogonal array approach is just as readily suited to metric procedures (ANOVA or dummy-variable regression). In this case the evaluative response data would be assumed to be metric to begin with; hence, conjoint-measurement algorithms would not be needed to perform a monotone rescaling of the response variable. Finally, fractional factorial designs in which (for example) two-factor interactions are retained would seem to hold some promise for relaxing the more stringent assumptions followed here, albeit with a marked increase in the number of profiles required for respondent evaluation. Designs for incorporating two-factor (or higher) interactions are available and can be employed in either nonmetric or metric scaling approaches to utility estimation.
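In the metric case, the balance of an orthogonal array means that main effects can be estimated simply as level means minus the grand mean. The sketch below illustrates this on a nine-run array for four three-level factors; the "true" part-worths are hypothetical and serve only to generate error-free metric responses.

```python
from statistics import mean

k = 3
# Nine-run orthogonal array for four three-level factors (graeco-latin square).
design = [(a, b, (a + b) % k, (a + 2 * b) % k) for a in range(k) for b in range(k)]

# Hypothetical "true" part-worths used only to generate metric responses.
true_worths = [
    [0.0, 1.0, 2.0],
    [0.0, 0.5, 1.5],
    [0.0, 2.0, 1.0],
    [0.0, 0.3, 0.9],
]
responses = [sum(true_worths[f][lvl] for f, lvl in enumerate(run)) for run in design]

grand_mean = mean(responses)

def main_effects(factor):
    """Level means minus the grand mean for one factor."""
    return [mean(y for run, y in zip(design, responses) if run[factor] == lvl)
            - grand_mean for lvl in range(k)]

estimates = [main_effects(f) for f in range(len(true_worths))]
for f, effects in enumerate(estimates):
    print(f, [round(e, 3) for e in effects])
```

The estimates equal the generating part-worths after centering each factor at zero, reflecting the fact that interval-scaled utilities have an arbitrary origin.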


1. Bose, R. C., "Mathematical Theory of the Symmetrical Factorial Design," Sankhya, 8 (1947), pp. 107-66.

2. Bose, R. C., and Bush, K. A., "Orthogonal Arrays of Strength Two and Three," Annals of Mathematical Statistics, 23 (1952), pp. 508-524.

3. Fisher, R. A., "The Theory of Confounding in Factorial Experiments in Relation to Theory of Groups," Annals of Eugenics, 11 (1942), pp. 341-353.

4. Green, Paul E., Carmone, F. J., and Wind, Yoram, "Subjective Evaluation Models and Conjoint Measurement," Behavioral Science, 17 (1972), pp. 288-299.

5. Green, Paul E. and Wind, Yoram, "Recent Approaches to the Modeling of Individuals' Subjective Evaluations," paper presented at the Attitude Research Conference, Madrid, February 1973.

6. Johnson, Richard M., "Trade-off Analysis: A Method for Quantifying Consumer Values," working paper, Market Facts, Inc., Chicago, Ill., September, 1972.

7. Kishen, K. and Tyagi, B. N., "On Some Methods of Construction of Asymmetrical Factorial Designs," Current Science, 30 (1961), pp. 407-409.

8. Kruskal, Joseph B., "Analysis of Factorial Experiments by Estimating Monotone Transformations of the Data," Journal of the Royal Statistical Society, Series B, 27 (1965), pp. 251-263.

9. Plackett, R. L. and Burman, J. P., "The Design of Optimum Multifactorial Experiments," Biometrika, 33 (1946), pp. 305-325.

10. Raghavarao, Damaraju, Constructions and Combinatorial Problems in Design of Experiments, New York: John Wiley & Sons, 1971.

11. Winer, B. J., Statistical Principles in Experimental Design, Second Edition, New York: McGraw-Hill Book Co., 1972.