Limits to Accuracy in Conjoint Analysis

Franklin Acito, Indiana University
Richard W. Olshavsky, Indiana University
ABSTRACT - The relative performance of two conjoint designs, one using two levels per attribute and the other using three levels, is compared. Although the three-level design can potentially provide more information to the researcher, the increased difficulty of the evaluation task results in poorer data than those obtained with the two-level design.
Franklin Acito and Richard W. Olshavsky (1981), "Limits to Accuracy in Conjoint Analysis," in NA - Advances in Consumer Research Volume 08, ed. Kent B. Monroe, Ann Arbor, MI: Association for Consumer Research, Pages: 313-316.



Conjoint analysis, beginning with the expository article by Green and Rao (1971), has developed into a mainstream tool of the market researcher. Investigations of the reliability and validity of the technique (e.g., Green and Wind 1973, Acito 1977, Scott and Wright 1976, and McCullough and Best 1979) have produced generally favorable evaluations. Researchers, however, still face difficult design decisions in using conjoint analysis, for which the existing literature provides little guidance. One such design issue is the number of levels per attribute. Experimental designs (see Addelman 1962) exist for two, three, four, or more levels per factor. In some cases the number of levels is dictated by the context of the problem; in other situations, the analyst has control over the number of levels.

Price is one attribute that allows flexibility in selecting the number of levels. Two levels, $P_A$ and $P_B$, can be used to derive utility values $U_A$ and $U_B$. Utilities for an intermediate price level $P$ lying between $P_A$ and $P_B$ can then be obtained by linear interpolation:

$$U_P = U_A + \frac{P - P_A}{P_B - P_A}\,(U_B - U_A)$$

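As a minimal sketch of the interpolation formula, the Python function below computes the part-worth for an intermediate price from the two estimated endpoints. The function name and the example part-worths (1.0 at $200, -1.0 at $300) are hypothetical illustrations, not values from this study.

```python
def interpolate_utility(p, p_a, p_b, u_a, u_b):
    """Linearly interpolate a part-worth between two estimated price levels.

    p_a, p_b: the two price levels used in the design (must differ)
    u_a, u_b: the part-worth utilities estimated for those two levels
    """
    if p_a == p_b:
        raise ValueError("the two design levels must differ")
    weight = (p - p_a) / (p_b - p_a)  # 0 at p_a, 1 at p_b
    return u_a + weight * (u_b - u_a)

# Hypothetical part-worths: $200 -> 1.0, $300 -> -1.0
print(interpolate_utility(250, 200, 300, 1.0, -1.0))  # 0.0
print(interpolate_utility(275, 200, 300, 1.0, -1.0))  # -0.5
```

The linearity assumption is built into the single `weight` term: any curvature between the two design levels is invisible to this estimate.
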
The analyst may not be willing to make such an assumption of linearity, however. To investigate non-linear effects, a third price level can be included in the design. Including the third level provides, at least in principle, more detailed estimation of the utility function. However, increasing the number of levels per attribute raises two issues that may result in poorer information from the conjoint analysis.

First, increasing the number of levels increases the number of parameters to be estimated, raising the chance of statistical error with a fixed number of profiles or assemblies. (This is true only if the attribute levels are treated as discrete, which would be the case if the analyst did not wish to restrict the utilities to an a priori functional form.) Second, the additional information processing burden on the respondent can cause confusion, carelessness, or the adoption of simplifying choice heuristics, frustrating the researcher's attempt to obtain greater accuracy. This study is designed to determine whether accuracy does in fact decline as the number of levels is increased. Through the simultaneous use of a process tracing technique (protocol analysis), it is also designed to provide information on the hypothesized deterioration in the subject's capacity to cope with the increased information processing burden.


Subjects

Twenty MBA students served as subjects in this study. They were selectively recruited on the basis of their ownership or usage of a product (typewriters) that was of interest in another, unrelated part of the study. Since subjects were randomly assigned to the two-level or three-level condition, the lack of control for prior knowledge of stereo receivers should not be a problem. Subjects were paid for participation. The small sample size is typical of studies in which protocol analysis is performed.

Stimulus Materials

The stereo hi-fi receiver was selected as a product that is of interest to students and for which the number of levels per attribute could be varied uniformly (i.e., most major attributes were inherently multi-level).

For this study, six attributes with three levels each were selected: Power Output (45, 30, 50 percent); Sensitivity (2.0, 2.5, 3.0 microvolts); Signal to Noise Ratio (75, 70, 65 decibels); Price (200, 250, 300 dollars); Warranty (3, 2, 1 years). The number of levels per attribute was either two or three, depending on the condition (underlined values were used in the two-level condition). Using orthogonal array designs (Addelman 1962, Plackett and Burman 1946), 23 profiles were developed for each of the two conditions. [The orthogonal design for the three-level case given by Addelman (1962) contained 25 assemblies; however, only 23 were distinct. The 24-assembly design given by Plackett and Burman for two levels has 22 distinct combinations. Twenty-three were used, so one duplicate was included in this design.] The letters "A" to "W" designated the alternatives in the two-level condition, while the numbers "1" to "23" designated the alternatives in the three-level condition.

A second set of eight holdout profiles was developed (completely different from those in the previous sets) involving only 2 levels per attribute, also based upon an orthogonal array. (The fact that this set had only 2 levels is not a critical issue; all that was desired was a criterion set of preference ranks.) These were numbered "100" through "800."
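
The paper does not reproduce its arrays. As one illustrative way to obtain an 8-run orthogonal plan for six two-level attributes, the sketch below takes six non-constant columns of a Sylvester-type Hadamard matrix of order 8. This construction is an assumption, not necessarily the array the authors used, and the mapping from the -1/+1 codes to actual attribute levels is left open.

```python
import itertools

def sylvester_hadamard(k):
    """Return the 2**k x 2**k Sylvester Hadamard matrix as lists of +1/-1."""
    h = [[1]]
    for _ in range(k):
        # H_2n = [[H, H], [H, -H]]
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h

def two_level_design(n_attributes, k=3):
    """Orthogonal main-effects plan with 2**k runs, coded -1/+1.

    Drops the all-ones first column and keeps the next n_attributes columns.
    """
    h = sylvester_hadamard(k)
    if n_attributes > len(h) - 1:
        raise ValueError("too many attributes for this run size")
    return [row[1:n_attributes + 1] for row in h]

design = two_level_design(6)  # 8 holdout profiles x 6 attributes

# Every pair of attribute columns is orthogonal (zero inner product).
for i, j in itertools.combinations(range(6), 2):
    assert sum(row[i] * row[j] for row in design) == 0
```

Because each column is balanced and every pair is orthogonal, main effects can be estimated independently of one another, which is what makes such a small holdout set usable as a criterion.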

A third set was developed by selecting ten representative "actual" stereo receiver models from a stereo catalog. For this set, actual brand names were used and warranty was dropped, since no warranty information was provided in the catalog. (Subjects were told to assume that all models had the same two-year warranty.) Additional descriptive information was also available, as was a photograph of each receiver.

Procedure

The experimental task required respondents to express preferences for the alternative receiver profiles. These profiles were printed on 3" x 5" cards with each card listing the value of each of the six attributes. In the actual brand condition, pages from the catalog of a local stereo equipment retailer were used. To ensure comprehension of all attributes, definitions were presented to each subject on a separate sheet; subjects were requested to read these definitions and to refer to them throughout the choice process as needed.

The 23 cards for the first task were shuffled (for each subject) and arranged in an array on a table. Ten subjects were randomly assigned to the two-level condition and ten to the three-level condition. The subjects were asked to examine the "receivers" and to select, from the set of alternatives offered, the one model that they would purchase. Subjects were further instructed that they could not move or in any other way manipulate the cards. This was done to simulate the situation encountered in a retail outlet, in which customers cannot physically rearrange the alternatives. [Bettman (1979) has called for such realism in study design. Olshavsky and Acito (1980) have recently investigated the effects of changes in card sorting procedure on choice rule.] Finally, subjects were instructed to verbalize all of their thoughts as the receivers were considered. The experimenter, who was seated opposite the subject, constantly monitored the protocol and, whenever necessary, reminded the subject (in simple, nondirective ways) to articulate his/her thoughts, so as to increase the amount of protocol data. [Evidence concerning the lack of interference of protocol analysis with cognitive tasks has recently been reviewed by Simon (1979).] Each session was tape recorded with the subject's knowledge.

After the subject made the first choice from the 23, that alternative was removed and he/she was asked to imagine that it was no longer available for sale. The subject was then instructed to make another "purchase" decision from the remaining alternatives. This procedure was repeated until a preference ordering of all 23 alternatives was obtained.
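
The sequential choice procedure amounts to building a rank order by repeated first choice with elimination. The sketch below assumes the subject's choice can be represented by a function; the function names and the integer-weighted additive utility in the usage example are purely hypothetical.

```python
def rank_by_sequential_choice(profiles, choose):
    """Build a full preference order by repeated first choice.

    choose(remaining) returns the profile picked from the remaining set;
    that profile is removed and the process repeats until all are ranked.
    """
    remaining = list(profiles)
    order = []
    while remaining:
        pick = choose(remaining)
        remaining.remove(pick)  # removes one copy, so duplicate profiles survive
        order.append(pick)
    return order  # order[0] has rank 1

# Hypothetical profiles (price in dollars, warranty in years) and utility.
def utility(profile):
    price, warranty = profile
    return -price + 100 * warranty

cards = [(300, 3), (200, 1), (250, 2)]
print(rank_by_sequential_choice(cards, lambda rem: max(rem, key=utility)))
# [(300, 3), (250, 2), (200, 1)]
```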

To provide the data needed to test the predictive ability of conjoint analysis, subjects were asked to express their preferences for the set of eight holdout profiles using the same procedure as for the 23-alternative set. Finally, subjects were asked to rank their preferences for the 10 receiver models on the catalog pages. Data for each subject were submitted to MONANOVA (Kruskal 1965) to derive part-worth utility values.
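
MONANOVA (Kruskal 1965) fits an additive model to a monotone transformation of the rank data. As a simpler metric stand-in (ordinary least squares on reversed ranks, which is not MONANOVA itself), the sketch below shows how discrete part-worths are dummy-coded and estimated. It also makes the parameter count concrete, since each attribute with m levels contributes m - 1 free part-worths. NumPy, the function name, and the toy data are assumptions for illustration.

```python
import numpy as np

def estimate_part_worths(profiles, ranks, n_levels):
    """Metric (OLS) stand-in for MONANOVA on dummy-coded attribute levels.

    profiles: tuples of level indices (0..n_levels[j]-1), one per attribute
    ranks: preference rank of each profile (1 = most preferred)
    n_levels: number of levels for each attribute
    Returns one list of part-worths per attribute, centred to sum to zero.
    """
    n = len(profiles)
    # Reverse the ranks so that a larger score means "more preferred".
    y = np.array([n + 1 - r for r in ranks], dtype=float)
    # Intercept plus sum(m - 1) dummy columns: 6 for six 2-level attributes,
    # 12 for six 3-level attributes.
    columns = [np.ones(n)]
    for j, m in enumerate(n_levels):
        for level in range(1, m):  # level 0 is the reference category
            columns.append(np.array([1.0 if p[j] == level else 0.0 for p in profiles]))
    X = np.column_stack(columns)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    part_worths, k = [], 1
    for m in n_levels:
        raw = [0.0] + [float(b) for b in beta[k:k + m - 1]]
        k += m - 1
        centre = sum(raw) / m
        part_worths.append([u - centre for u in raw])
    return part_worths

# Hypothetical 2 x 2 example in which attribute 0 matters more than attribute 1.
profiles = [(0, 0), (0, 1), (1, 0), (1, 1)]
ranks = [4, 3, 2, 1]
print(estimate_part_worths(profiles, ranks, [2, 2]))
# approximately [[-1.0, 1.0], [-0.5, 0.5]]
```

A monotone (nonmetric) fit would replace the reversed ranks `y` with an iteratively re-estimated monotone transformation of them; the dummy coding and parameter count stay the same.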

Results

Prediction of "Holdout" Ranks

The ranks of the eight holdout profiles were predicted from the utility values derived from the conjoint analysis, and Spearman rank correlation coefficients between the predicted and actual ranks were computed for each respondent. Table 1 shows the mean correlations for the two- and three-level designs. The difference in the means was not significant. However, the two-level design produced more direct "hits" on the ranks than did the three-level design (45 out of a possible 80 versus 31 out of 80). A χ² test for the resulting 2 x 2 table indicated a significant difference at the .05 level.
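
The holdout comparisons can be sketched as follows: Spearman's coefficient for untied ranks, a count of exact "hits", and a Pearson χ² statistic for a 2 x 2 table of hits versus misses by design. The paper does not state whether a continuity correction was applied; this sketch omits it, and the function names are hypothetical. With the reported counts the statistic is about 4.91, which exceeds the .05 critical value of 3.84 for one degree of freedom.

```python
def spearman_rho(pred_ranks, actual_ranks):
    """Spearman correlation, valid for untied ranks: 1 - 6*sum(d^2)/(n(n^2-1))."""
    n = len(pred_ranks)
    d2 = sum((p - a) ** 2 for p, a in zip(pred_ranks, actual_ranks))
    return 1 - 6 * d2 / (n * (n * n - 1))

def hits(pred_ranks, actual_ranks):
    """Number of profiles whose predicted rank equals the actual rank."""
    return sum(p == a for p, a in zip(pred_ranks, actual_ranks))

def chi_square_2x2(table):
    """Pearson chi-square for a 2 x 2 table, without continuity correction."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    total = sum(rows)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / total
            stat += (table[i][j] - expected) ** 2 / expected
    return stat

# Hits vs. misses out of 80 holdout ranks (10 subjects x 8 profiles).
print(round(chi_square_2x2([[45, 35], [31, 49]]), 2))  # 4.91
```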

Prediction of Actual Brands

The respondents' ranks for the ten actual models were disaggregated according to the three brands used in the experiment: three models each of Pioneer and Kenwood receivers and four models of the Technics brand. Appropriate utility values derived from the conjoint analysis were combined (using linear interpolation where necessary) to predict the preference ranks within each of the three brands. Table 1 shows the distribution of perfect predictions for each experimental condition. The difference in the overall number of perfect predictions (24 out of 30 for the two-level design versus 15 out of 30 for the three-level design) was significant beyond the .05 level using a χ² test.
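
A within-brand prediction counts as "perfect" when the utility-based order reproduces the subject's ranking exactly. The sketch below assumes each model's total utility has already been formed by summing part-worths (interpolating where a catalog value lies between design levels); the function names, utilities, and ranks in the example are hypothetical.

```python
def predicted_order(utilities):
    """Model indices from highest to lowest predicted utility (ties keep input order)."""
    return sorted(range(len(utilities)), key=lambda i: -utilities[i])

def is_perfect_prediction(utilities, actual_ranks):
    """True if the predicted order matches the subject's within-brand ranking.

    actual_ranks[i] is the subject's rank of model i (1 = most preferred).
    """
    actual_order = sorted(range(len(actual_ranks)), key=lambda i: actual_ranks[i])
    return predicted_order(utilities) == actual_order

# Hypothetical total utilities for three models of one brand.
brand_utilities = [0.8, -0.2, 0.3]
print(is_perfect_prediction(brand_utilities, [1, 3, 2]))  # True
print(is_perfect_prediction(brand_utilities, [2, 3, 1]))  # False
```

With ten subjects and three brands, summing these booleans per condition gives the "out of 30" counts reported above.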





Utility Value Violations

In many situations the direction of respondent preferences for attribute levels can be stated a priori. In this experiment, each attribute had a clearly defined preferred direction (e.g., lower distortion should be preferred to higher distortion, lower price to higher price, etc., with the usual ceteris paribus assumption). For both experimental conditions, the number of violations of these a priori expectations was determined. For the three-level condition, only the two extreme levels of each attribute were used for this assessment. Table 2 shows the results. A total of six violations was observed for the two-level condition, while 11 violations were observed for the three-level condition. (If the sign of the price utility is not considered, the two-level design had three violations and the three-level design had eight.)
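
The extreme-level check can be sketched as below. The expected directions for distortion and price follow the text; the other directions (higher power, signal-to-noise ratio, and warranty preferred; a lower microvolt sensitivity figure preferred), the dictionary keys, and the function name are assumptions for illustration. A tie between the extreme utilities is not counted as a violation in this sketch.

```python
# Assumed expected directions: +1 means the higher numeric level should be preferred.
EXPECTED_DIRECTION = {
    "power": +1, "distortion": -1, "sensitivity": -1,
    "signal_to_noise": +1, "price": -1, "warranty": +1,
}

def extreme_violations(part_worths, levels, expected=EXPECTED_DIRECTION):
    """List attributes whose extreme-level utilities contradict the expected direction.

    part_worths[a]: utilities listed in the same order as levels[a]
    levels[a]: numeric attribute levels; only the min and max are compared
    """
    violations = []
    for attr, direction in expected.items():
        lv, pw = levels[attr], part_worths[attr]
        lo, hi = lv.index(min(lv)), lv.index(max(lv))
        if direction * (pw[hi] - pw[lo]) < 0:
            violations.append(attr)
    return violations

# Hypothetical subject: price behaves as expected, warranty does not.
print(extreme_violations(
    {"price": [1.0, 0.2, -1.0], "warranty": [-0.5, 0.0, 0.5]},
    {"price": [200, 250, 300], "warranty": [3, 2, 1]},
    expected={"price": -1, "warranty": +1},
))  # ['warranty']
```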

The utility values for the intermediate levels of each attribute in the three-level condition were also examined. For each attribute used in this study, it seemed appropriate to assume that the utility value for the intermediate level should lie between those for the extremes; in other words, the utility function for each attribute was assumed to be monotonically related to the levels of the attribute. A number of violations of this monotonicity were observed: two for power, four for price, two for distortion, seven for FM sensitivity, five for noise, and six for warranty.
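
For three levels, the monotonicity assumption reduces to requiring the middle part-worth to lie between the two extreme part-worths. The sketch below tallies violations per attribute across subjects; the function names and the utilities in the example are hypothetical.

```python
from collections import Counter

def middle_between_extremes(u_low, u_mid, u_high):
    """True if the middle level's part-worth lies between the extremes (inclusive)."""
    return min(u_low, u_high) <= u_mid <= max(u_low, u_high)

def count_monotonicity_violations(subjects):
    """Tally, per attribute, the subjects whose middle part-worth is out of range.

    subjects: one dict per subject mapping attribute -> (u_low, u_mid, u_high),
    with the three part-worths listed in order of the attribute's levels.
    """
    counts = Counter()
    for part_worths in subjects:
        for attribute, (u_low, u_mid, u_high) in part_worths.items():
            if not middle_between_extremes(u_low, u_mid, u_high):
                counts[attribute] += 1
    return counts

subjects = [
    {"price": (1.0, 0.2, -1.0), "power": (-1.0, 0.8, 0.5)},  # power violates
    {"price": (1.0, 1.3, -0.2), "power": (-0.5, 0.0, 0.5)},  # price violates
]
print(count_monotonicity_violations(subjects))
```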

Error Estimates

The two-level design required the estimation of six parameters with MONANOVA, while the three-level design required the estimation of twelve. Since the number of profiles was 23 for both designs, the three-level design is more susceptible to error. One perspective on this problem is gained by examining the stress values expected for both designs with random data. The average stress for the two-level design with random data was .72, with a standard deviation of .09. Assuming the stress values to be normally distributed, 5% of the stress values for random data are expected to fall below .72 - 1.645(.09) ≈ .57. A similar procedure for the three-level design indicated that 5% of the stress values for random data would fall below .29.
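
The parameter counts and the random-data cutoff can be reproduced with a short calculation, assuming (as the text does) normally distributed stress values. Only the two-level mean and standard deviation are reported, so only the .57 cutoff is checked here; the function names are hypothetical.

```python
from statistics import NormalDist

def n_part_worth_parameters(levels_per_attribute):
    """Free part-worths when each level is a discrete effect: sum of (levels - 1)."""
    return sum(m - 1 for m in levels_per_attribute)

def random_stress_cutoff(mean, sd, alpha=0.05):
    """Stress below which only a fraction alpha of random-data fits fall (normal model)."""
    return mean + NormalDist().inv_cdf(alpha) * sd  # inv_cdf(0.05) is about -1.645

print(n_part_worth_parameters([2] * 6), n_part_worth_parameters([3] * 6))  # 6 12
print(round(random_stress_cutoff(0.72, 0.09), 2))  # 0.57
```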

Table 1 shows the average stress values observed for the two-level and three-level designs. Since the random-data results above indicate that it is "easier" to achieve a low stress value with the three-level design, lower stress values would be expected for real data under that design. To the contrary, the averages indicate that stress for the two-level design was somewhat lower, although the difference between the two designs was not significant at the .05 level.

Choice Rule Analysis

The protocol analysis was based on transcripts of the tape recordings for the set of 23 profiles only. The objective was to ascertain the decision rule used by each subject. The tape recorded protocols were transcribed and then broken down into a sequence of task relevant statements as is typically done with protocol data (Payne, Braunstein, and Carroll 1978). Subjects were then classified according to the type of choice rule they used (as inferred from the protocols). In the interest of consistency and to ensure reliable categorization, the same definitions of choice rule and coding criteria used by previous researchers were adopted (Olshavsky 1979, Payne 1976, Wright and Barbour 1977). [Details of the procedure for choice rule identification have been discussed in previous papers, e.g., Olshavsky and Acito 1980.]

Of the 10 subjects in the two-level condition, five used a lexicographic choice rule (by attribute). Three used a conjunctive rule (by brand) but modified the importance of the choice criteria according to a specific attribute priority order. The remaining two used a modified conjunctive rule in which the attribute priority was changed after all alternatives with the most desired attribute were depleted.
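
A lexicographic rule (by attribute) can be sketched as successive filtering on attributes in priority order, moving to the next attribute only to break ties. The attribute names, priority order, and receivers in the example are hypothetical; returning the first remaining alternative when ties survive every attribute is an assumption of this sketch.

```python
def lexicographic_choice(alternatives, priorities):
    """Choose an alternative with a lexicographic rule (by attribute).

    priorities: (attribute, better) pairs in order of importance, where
    better is min or max. Only alternatives with the best value on the most
    important attribute survive; remaining ties pass to the next attribute.
    """
    remaining = list(alternatives)
    for attr, better in priorities:
        best = better(a[attr] for a in remaining)
        remaining = [a for a in remaining if a[attr] == best]
        if len(remaining) == 1:
            break
    return remaining[0]

receivers = [
    {"name": "A", "price": 200, "warranty": 1},
    {"name": "B", "price": 200, "warranty": 3},
    {"name": "C", "price": 250, "warranty": 3},
]
choice = lexicographic_choice(receivers, [("price", min), ("warranty", max)])
print(choice["name"])  # B
```

Applied repeatedly with elimination, as in the choice procedure described earlier, this rule yields a full ranking that depends only on the attribute priority order, not on trade-offs between attributes.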

Of the 10 subjects in the three-level condition, seven used a choice strategy so inconsistent and ambiguous that it could not be classified as a single choice rule or even as a combination of choice rules. In those seven cases a pair-wise comparison process (attribute dominance or additive difference) was used, with little or no attempt to apply the pair-wise comparison strategy to all remaining alternatives. Of the remaining three subjects, one used a lexicographic rule but then switched to a pair-wise (attribute dominance) strategy.

Discussion

The major finding of this study is that increasing the number of levels per attribute does not necessarily increase accuracy in utility function estimation. To the contrary, the results suggest that the utility values derived from the two-level condition are somewhat superior in predictive ability when used in the manner described. This superiority persists even where utility values must be interpolated (as for the catalog descriptions). If the three-level design were superior, its superiority should be evident where interpolated utility values were used, since it would capture non-linearities in the utility functions. The utility values derived for the three-level condition were also more likely to violate a priori assumptions about the direction of preferences. These results must, of course, be qualified given the small sample size and the unrepresentativeness of MBA students. Further research on this important issue is required, using larger samples of different types of subjects and less complex products.

These results could be due to instability in the parameter estimates resulting from fewer degrees of freedom remaining for error, to respondent confusion, or to both. However, the error analysis and the protocol analysis performed here suggest that the more likely explanation is that most respondents in the three-level condition were unable to cope with the increased information processing burden imposed on them. This implies that great care must be taken in interpreting conjoint analysis results based on designs involving a large number of levels per attribute.

References

Acito, F. (1977), "An Investigation of Some Data Collection Issues in Conjoint Analysis," Educators' Proceedings, Chicago: American Marketing Association, 82-85.

Addelman, S. (1962), "Orthogonal Main-Effect Plans for Asymmetrical Factorial Experiments," Technometrics, 4 (February), 21-46.

Bettman, J. R. (1979), An Information Processing Theory of Consumer Choice, Reading, MA: Addison-Wesley Publishing Company.

Green, P. E. and Rao, V. R. (1971), "Conjoint Measurement for Quantifying Judgmental Data," Journal of Marketing Research, 8 (August), 355-63.

Green, P. E. and Wind, Y. (1973), Multiattribute Decisions in Marketing: A Measurement Approach, Hinsdale, Illinois: Dryden Press.

Kruskal, J. B. (1965), "Analysis of Factorial Experiments by Estimating Monotone Transformations of the Data," Journal of the Royal Statistical Society, Series B, 27 (March), 251-63.

McCullough, J. and Best, R. (1979), "Conjoint Measurement: Temporal Stability and Structural Reliability," Journal of Marketing Research, 16 (February), 26-31.

Olshavsky, R. W. (1979), "Task Complexity and Contingent Processing in Decision Making: A Replication and Extension," Organizational Behavior and Human Performance, 24, 300-316.

Olshavsky, R. W. and Acito, F. (1980), "The Impact of Data Collection Procedure on Choice Rule," Advances in Consumer Research, 7, 729-732.

Payne, John W., Braunstein, M. L. and Carroll, J. S. (1978), "Exploring Predecisional Behavior: An Alternative Approach to Decision Research," Organizational Behavior and Human Performance, 22, 17-44.

Payne, John W. (1976), "Task Complexity and Contingent Processing in Decision Making: An Information Search and Protocol Analysis," Organizational Behavior and Human Performance, 16 (August), 366-387.

Plackett, R. L. and Burman, J. P. (1946), "The Design of Optimum Multifactor Experiments," Biometrika, 33, 305-325.

Scott, J. E. and Wright, P. (1976), "Modeling an Organizational Buyer's Product Evaluation Strategy: Validity and Procedural Considerations," Journal of Marketing Research, 13 (August), 221-4.

Simon, Herbert A. (1979), "Information Processing Models of Cognition," Annual Review of Psychology, 30 (February), 363-96.

Wright, Peter and Barbour, Fredrick (1977), "Phased Decision Strategies: Sequels to an Initial Screening," in M. K. Starr and M. Zeleny (eds.), Multiple Criteria Decision Making: TIMS Studies in the Management Sciences, Vol. 6, Amsterdam: North-Holland.