
Advances in Consumer Research Volume 8, 1981      Pages 317-322

ON ALTERNATIVE EXPERIMENTAL METHODS FOR CONJOINT ANALYSIS

Thomas W. Leigh, Pennsylvania State University

David B. MacKay, Indiana University

John O. Summers, Indiana University

ABSTRACT

The rise in popularity of conjoint analysis as an approach to studying consumer preferences has been accompanied by the development of a variety of alternative methodological procedures. This paper focuses on the performance of "experimentally efficient" designs. Issues facing the researcher who must decide upon an appropriate design for conjoint analysis are discussed, and results of a pilot study on the test-retest reliability and convergent validity of alternative procedures are presented. The results tend to favor the "less efficient" designs, with the full factorial and pair-comparisons data gathering procedures showing greater reliability and validity.

INTRODUCTION

Over the past decade, conjoint analysis has become an increasingly popular approach for the study of consumer preferences. In an excellent review of the literature, Green and Srinivasan (1978) identified six major methodological steps involved in applying conjoint analysis: (1) selection of a model of preference; (2) data collection method; (3) stimulus set construction for the full profile method; (4) stimulus presentation; (5) measurement scale for the dependent variable; and (6) estimation method. The purpose of this paper is to investigate the effects of the methodological decisions in steps three and five on the reliability and validity of conjoint analysis.

Interest in this area is motivated by the recent emphasis on the use of "experimentally efficient" designs with conjoint measurement (Acito 1979, Green 1974, Green et al. 1978). Green and Srinivasan (1978) define an experimentally efficient design as one that returns a high amount of information per unit of experimental time. They suggest that efficient designs are likely to involve fractional factorials (stimulus set construction) and rank order (as opposed to pair-comparison) data (measurement scale). Of course, efficiency can also be used in a statistical sense to refer to estimates that have the property of minimum variance. By studying the reliability and validity of experimentally efficient designs, insight should be obtained as to whether experimentally efficient designs will lead to statistically efficient estimates of model parameters.

This issue is by no means trivial. Experimentally efficient designs reduce the amount of time required of a subject to provide preference data with respect to alternative product designs with a specific set of attributes. To the extent that reduction of subject time prevents fatigue, boredom, and stereotyped response patterns, random or error variance will be reduced and reliability and validity may be improved. However, the price of this savings in subject time is a reduction in the number of independent observations. Estimates of model parameters are thus more dependent on any given response. Random errors have less opportunity to compensate for one another with a small number of observations, and reliability and validity can be adversely affected. Experimental methods which are empirically shown to have higher reliability and validity have a better chance of providing statistically efficient estimates. However, experimentally efficient designs need not lead to increased reliability and validity.

The following sections expand upon the issues relevant to stimulus set construction, measurement scale, and reliability and validity. Next, a study is described in which the reliability and validity of 13 different conjoint measurement procedures, based on combinations of stimulus sets and measurement scales, were examined. A presentation of the results and a discussion of the findings conclude the paper.

Stimulus Set Construction

The stimulus set construction step includes the issues of attribute range and variation, interattribute correlation, and full versus fractional factorial designs. Only the last issue is considered here. For full factorials, the number of responses required of a subject increases geometrically with the number of attributes. Even with a relatively small number of attributes, full factorial designs can become infeasible.

Even when the number of required responses for a full factorial design appears feasible to use with a given measurement scale (e.g., 5 attributes at 2 levels each; 32 stimuli) one might still find that a fractional factorial design produces more reliable and valid results than a full factorial design. The reduction of the subject's task with a fractional factorial may tend to increase the study's response rate and encourage respondents to put more time and effort into evaluating each individual stimulus. Conversely, the smaller number of responses provides less opportunity for random errors to cancel each other. Statistically, the basic issue is whether the lower degrees of freedom in a fractional factorial design are compensated for by higher quality (lower error) data. This is, of course, an empirical question and depends upon the particular stimuli being investigated, the subjects who participate, and the research setting. Caution is thus advised in generalizing the results from one study to another.
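To make the fractionation trade-off concrete, the following minimal sketch (in Python, with hypothetical -1/+1 attribute coding; it is not the authors' procedure) constructs the half fraction of a 2^5 design using the standard defining relation I = ABCDE, cutting the stimulus set from 32 to 16:

    from itertools import product

    # Full factorial: all 2^5 = 32 combinations of five two-level
    # attributes, coded -1/+1.
    full = list(product([-1, 1], repeat=5))

    # Half fraction via the defining relation I = ABCDE: keep the 16
    # combinations whose coded levels multiply to +1.
    half = [s for s in full if s[0] * s[1] * s[2] * s[3] * s[4] == 1]

    print(len(full), len(half))  # 32 16

The defining relation confounds the five-way interaction with the mean, which is precisely the kind of higher order effect discussed next.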

Another potential problem with fractional factorial designs is that they do not allow for the estimation of some higher order interaction effects. However, these will frequently be negligible and estimation of lower order effects may be sufficient.

Measurement Scale for the Dependent Variable

A variety of procedures for "defining a measurement scale for the dependent variable" have been used. Coombs (1966) refers to these as "methods of collecting data." The procedures fall into two basic categories, nonmetric (e.g., rank order and pair-comparisons) and metric (e.g., rating scales, graded pair-comparisons, and constant sum pair-comparisons). The nonmetric procedures require only ordinal judgments concerning preferences (or purchase intentions), while the metric approaches involve at least interval scale responses. The choice between these two basic approaches involves the empirical questions of whether subjects can provide metric responses and whether the longer time periods that usually accompany the greater number of responses required by some nonmetric methods (e.g., pair-comparisons) result in boredom, fatigue, and greater random error.

Of the two nonmetric approaches, rank order has been the most popular for collecting preference data for conjoint analysis (e.g., Green and Wind 1975; McCullough and Best 1979). This method generally requires less respondent time than pair-comparisons and, for a large number of stimuli, the pair-comparisons method becomes impractical. At a given level of fractionation, the number of responses required for pair-comparisons may greatly exceed that required with rank orders. Thus, a half factorial design for five attributes of two levels each would, for pair-comparisons, involve 120 comparative responses (2^(5-1) = 16 stimuli; C(16,2) = 120), while for rank orders it would involve 16 responses. Corresponding values for a full factorial design are 496 and 32 responses, respectively. Unfortunately, when using the pair-comparisons method, incomplete block designs (Green 1974) often do not allow for the development of a complete rank order of the stimulus set for individual subjects (Cliff 1975).
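The response counts cited above follow directly from elementary combinatorics; a short check of the arithmetic (a sketch using only the numbers stated in the text):

    from math import comb

    half_stimuli = 2 ** (5 - 1)    # 16 stimuli in the half factorial
    full_stimuli = 2 ** 5          # 32 stimuli in the full factorial

    print(comb(half_stimuli, 2))   # 120 pair-comparisons
    print(comb(full_stimuli, 2))   # 496 pair-comparisons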

Among the metric approaches, rating scales are the easiest to apply. Unlike the nonmetric methods of rank order and pair-comparisons, rating scales do not force a subject to use "comparative" judgments since the stimuli are evaluated individually, one at a time. This contributes to the anchoring effect difficulties that the method experiences. However, use of warm-up trials and familiarization periods where the subject reviews all stimuli before making the rating responses can lessen the severity of this problem.

Graded pair-comparisons are a metric procedure that, because of the explicit requirement of comparative evaluation, would appear to have an advantage over rating scales with respect to potential anchoring effects. The method's advantage over (binary) pair-comparisons is that it allows the respondent to specify a degree of preference for one stimulus over another. A further advantage over pair-comparisons is that graded pair-comparisons can use both fractional factorials and incomplete block designs to determine, respectively, the construction and the selection of the stimuli presented to the subject.

Constant sum pair-comparisons, the third metric procedure, require ratio scaled responses and would seem to share some of the problems and advantages associated with graded pair-comparisons. The nature of the stimuli and familiarity of the subjects with the stimuli are important factors in deciding whether ratio scale responses are reasonable to require.

Reliability and Validity

Reliability can be conceptualized as consisting of temporal and structural components (McCullough and Best 1979). The focus of this paper is on temporal stability, which is concerned with the effect of random (error) variance. Test-retest methods are a common way of measuring the temporal stability of consumers' responses. When fractional factorial designs are used, alternative forms should be used on the test and retest portions to reduce the possibility of memory producing a high test-retest correlation. This is of particular importance when the time interval between test and retest is of short duration. Product moment correlations are usually used to measure the similarity of test and retest (e.g., Acito 1977, Green and Srinivasan 1978), though other measures, such as sums of squared differences, are also appropriate. Correlations may be computed among the independent part-worths or among the estimated utilities that are derived from the part-worths. Most studies have employed the latter approach (e.g., Acito 1977, Green et al. 1972, Jain et al. 1978). Acito (1979) argues that the relatively small number of observations available when comparing the test and retest part-worths will lead to unstable results. However, it should be noted that the higher degrees of freedom associated with utility values are partially illusory since these values are functionally related through the part-worths. A design with five attributes at two levels each, for example, produces at most five "independent" part-worths (four when the unit of measurement is arbitrary). Furthermore, if the variance of the part-worths is small (i.e., the attributes are all of relatively equal importance), the test and retest part-worths could be quite close (in an absolute sense) yet yield a low correlation. The severity of this problem can be assessed by either comparing the test-retest correlations to the within test variance of the part-worths or by using a different measure of test-retest association, such as the sum of squared differences.
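As an illustration, the two measures of test-retest association mentioned above can be computed for a single subject as follows (a sketch with fabricated part-worths, not data from the study):

    import numpy as np

    # Hypothetical test and retest part-worths for one subject.
    test   = np.array([0.9, 0.4, 0.3, 0.2, 0.1])
    retest = np.array([0.8, 0.5, 0.2, 0.3, 0.1])

    r = np.corrcoef(test, retest)[0, 1]          # product-moment correlation
    ssd = float(np.sum((test - retest) ** 2))    # sum of squared differences
    print(round(r, 3), round(ssd, 3))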

Validity considerations necessarily depend on the use for which the measure of interest is intended. Conjoint analysis of preference judgments has frequently been proposed as a procedure for "predicting" the success of potential new products (Green and Srinivasan 1978, Pekelman and Sen 1979). As such, predictive validity with actual choice behavior as the criterion would be of central interest (Wittink and Montgomery 1979). For many non-commercial research studies this is economically infeasible, and a less demanding measure of future choice behavior, such as a raffle in which subjects identify the product they would select if they were to win, could be used to provide a measure of predictive validity. Alternatively, the researcher can obtain a second measure of preference or intentions to provide a measure of convergent rather than predictive validity (Scott and Wright 1976, Cattin and Weinberger 1979).

A number of articles have dealt with the reliability and validity of conjoint analysis. A review of these efforts has recently been provided by McCullough and Best (1979). Much of this work has dealt with the reliability and validity of alternative algorithms and/or individual conjoint measurement approaches. The comparative reliability and validity of experimentally efficient designs have generally been ignored.

METHODOLOGY

A convenience sample of 52 undergraduate students in an introductory marketing class was used in this pilot study. The stimuli were written descriptions of inexpensive pocket cameras. Pocket cameras were selected because they represent a moderately complex product category of reasonably high interest to the respondent population. An article on comparative ratings of pocket cameras in a consumer magazine was consulted in choosing the five camera attributes used in the study: (1) focus, fixed versus zone; (2) maximum aperture, large versus small; (3) automatic exposure, "yes" or "no"; (4) electronic flash, "yes" or "no"; and (5) built-in telephoto, "yes" or "no." Many commercial applications consider a larger number of attributes and attribute levels. However, it was necessary to limit the number of attributes used in this study to ensure that the less efficient procedures would be feasible for most of the stimulus sets considered. Five "measurement scales for the dependent variable" were used: two nonmetric methods (rank order and pair-comparisons) and three metric methods (graded pair-comparisons, direct subjective estimates of dollar values, and rating scales).

The rank order instructions were patterned after those presented in Green and Wind (1973, pp. 261-62). Respondents in the pair-comparisons condition were merely asked to check the camera description (within each pair) they most preferred. For graded pair-comparisons, respondents were also asked to specify (for each pair of stimuli) how much more they would be willing to pay for the camera of their choice. The "direct subjective estimates of the dollar value" of the cameras represented the "maximum the respondent was willing to pay (in dollars) for each of the cameras." Finally, the rating scale procedure involved evaluating each camera on an 11-point scale from "least prefer" to "most prefer."

Implementation of the above would suggest 15 approaches (3 levels of fractionation and 5 methods) to collecting preference data. However, two of these (combinations of the full factorial stimulus sets with pair-comparisons and graded pair-comparisons) were not deemed feasible because of the large number of responses required (C(32,2) = 496). These treatment combinations were thus excluded. Even with the 1/2 fractional factorial, 120 (C(16,2)) responses are required for both the graded and "ungraded" pair-comparisons. Because of the subject's added task of evaluating the degree of differential preference under the graded pair-comparisons method, 120 responses were considered infeasible for this method. Hence, a partially balanced incomplete block design was utilized with the 1/2 fractional factorial for graded pair-comparisons. This was possible because of the interval scale nature of the data.
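The study's partially balanced incomplete block design is not reproduced here; as an illustration of how an incomplete design reduces the number of pairs while keeping each stimulus equally replicated, consider a simple cyclic construction (a sketch only, not the design actually used):

    from collections import Counter

    def cyclic_pairs(n, k):
        """Pair each stimulus i with stimuli i+1 .. i+k (mod n)."""
        return [(i, (i + d) % n) for i in range(n) for d in range(1, k + 1)]

    pairs = cyclic_pairs(16, 3)      # 48 pairs instead of C(16,2) = 120
    appearances = Counter(x for p in pairs for x in p)
    print(len(pairs), set(appearances.values()))  # 48 {6}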

The 13 methodological treatments were randomly assigned to the 52 subjects (4 per cell). Each respondent was given a short statement explaining each of the five product attributes to eliminate potential problems with unfamiliarity concerning the terminology used. Following the completion of their respective preference judgment tasks, each subject was asked to give a direct subjective estimate of the differential dollar value of the "higher" (versus the "lower") of the two levels for each attribute. This provided a second measure of each subject's part-worths. As such, it presented an opportunity to investigate the convergent validity of the part-worths. It also permitted the comparison of the test-retest reliability of the conjoint analysis procedures with that of a rather simple and less costly approach to analyzing consumer preferences.

After approximately one month, the subjects repeated their respective tasks. The alternative form approach (i.e., a different set of stimulus descriptions) was not possible for the full factorial case and was not used for the fractional factorials in order to maintain consistency across treatments.

ANALYSIS

Because of the small cell sizes and the relatively high variance within cells, no statistical tests were performed. The results must be considered as only suggestive of the comparative viability of the alternative approaches. We will rely on examining trends in the data in interpreting the results.

Since most of the attributes were naturally dichotomous and all were represented at two levels, the "part-worth function" model was considered the most appropriate for representing the respondents' preferences. The more highly fractionated designs did not allow for the estimation of interaction effects, and none were considered for any of the results presented in order to maintain comparability across data collection conditions. However, the existence of interaction effects was investigated for some of the metric models where possible (e.g., the rating scale used with a full factorial), and few (in most cases none) were found to be significant for any of the subjects.

Monotone analysis of variance (Kruskal 1965) was used to estimate the part-worths for the nonmetric methods (i.e., rank order and pair-comparisons). To convert the pair-comparisons into the predominant rank orders required by the MONANOVA algorithm, TRICON (Carmone et al. 1968) was used. Other procedures for the conversion of pair-comparisons to predominant rank orders are possible (Carroll 1972), but the use of alternative conversion methods makes little difference.
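TRICON's triangularization procedure is not reproduced here; the general idea of deriving a rank order from binary pair-comparisons can be illustrated with a simple win-count (Borda-style) conversion (a sketch only; TRICON's actual algorithm differs):

    from collections import Counter

    # Each tuple (a, b) records that stimulus a was preferred to b.
    choices = [("A", "B"), ("A", "C"), ("B", "C"),
               ("A", "D"), ("B", "D"), ("C", "D")]

    wins = Counter(winner for winner, _ in choices)
    stimuli = {s for pair in choices for s in pair}
    rank_order = sorted(stimuli, key=lambda s: wins[s], reverse=True)
    print(rank_order)  # ['A', 'B', 'C', 'D']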

Least squares estimators were used to derive the part-worths for the metric methods (i.e., graded pair-comparisons, direct subjective estimates, and rating scales). For the graded pair-comparisons data, each observation was classified by the differences in the first order parameters of the paired stimuli (see Dykstra 1958 for a discussion of the analysis of graded pair-comparisons data).
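A minimal sketch of the least squares step for a metric response such as ratings (the data and 0/1 attribute coding are fabricated for illustration; this is not the authors' code):

    import numpy as np

    # Rows: stimuli; columns: five 0/1-coded two-level attributes.
    X = np.array([[1, 0, 1, 0, 1],
                  [0, 1, 1, 1, 0],
                  [1, 1, 0, 0, 0],
                  [0, 0, 0, 1, 1],
                  [1, 1, 1, 1, 1],
                  [0, 0, 1, 0, 0],
                  [1, 0, 0, 1, 0],
                  [0, 1, 0, 0, 1]])
    y = np.array([7.0, 6.0, 4.0, 3.0, 9.0, 3.0, 5.0, 4.0])  # ratings

    # Add an intercept column and solve the least squares problem;
    # the slope coefficients are the estimated part-worths.
    A = np.column_stack([np.ones(len(y)), X])
    coefs, *_ = np.linalg.lstsq(A, y, rcond=None)
    print(np.round(coefs[1:], 2))  # part-worth estimates

For graded pair-comparisons, the same machinery applies with each row of the design matrix replaced by the difference between the attribute codes of the two paired stimuli and y replaced by the graded preference response.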

Pre-Analysis of the Data

The resultant part-worths were examined for a priori directionality and Pareto optimality violations. In this study, there appeared to be a strong rationale for assuming that one level of each attribute should be preferred (e.g., a large aperture should be preferred to a small aperture). For other products and/or attributes such an assumption may not be warranted. However, whether or not this assumption can be made in a particular situation should not affect the comparative reliabilities of the various conjoint measurement approaches. Six subjects had multiple directionality violations on at least one of the "tests." The number of possible Pareto optimality violations was also calculated for each task. Seven subjects were eliminated from the study because their data exhibited 15% or more of all possible Pareto optimality violations. Five of these subjects were among the six having multiple directionality violations.
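A hedged sketch of how Pareto optimality violations can be counted under the assumed directionality (levels coded so that the higher value is preferred on every attribute; the counting rule is our illustration, not necessarily the exact procedure used):

    import numpy as np

    def dominates(a, b):
        """True if stimulus a is at least as good as b on every attribute
        and strictly better on at least one (higher level = preferred)."""
        a, b = np.asarray(a), np.asarray(b)
        return bool(np.all(a >= b) and np.any(a > b))

    def count_violations(stimuli, ranks):
        """ranks[i] is the preference rank of stimuli[i] (1 = most preferred)."""
        possible = violations = 0
        for i in range(len(stimuli)):
            for j in range(len(stimuli)):
                if dominates(stimuli[i], stimuli[j]):
                    possible += 1
                    if ranks[i] > ranks[j]:  # dominated stimulus preferred
                        violations += 1
        return violations, possible

    stimuli = [(1, 1, 0), (1, 0, 0), (0, 1, 1)]
    print(count_violations(stimuli, ranks=[2, 1, 3]))  # (1, 1)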

Test-Retest Reliability

Because of the small cell sizes and substantial within cell variance, the test-retest reliability results must be viewed as primarily suggestive rather than conclusive. Two measures of reliability were considered: (1) within subjects test-retest correlations of the part-worths; and (2) within subjects test-retest correlations of the estimated utilities. Both the average adjusted (for sample size) and unadjusted correlations are presented in Table 1.

The average utility correlations were uniformly high (ranging from .799 to .988 "unadjusted" and from .791 to .988 "adjusted"), while the average part-worth correlations varied widely across the data collection conditions (ranging from .074 to .924 "unadjusted" and from .000 to .879 "adjusted"). Similar results for utility correlations have been reported in the literature (Acito 1977, Green et al. 1972, McCullough and Best 1979). Over all conditions, the average stimulus value (utility) correlation exceeded the average for part-worths by more than .4 (for both "adjusted" and "unadjusted"). A relatively high test-retest stimulus value correlation does not necessarily imply a substantial test-retest part-worth correlation. For example, the average adjusted test-retest stimulus value correlation for the subjective estimate (1/2 fractional factorial) condition was .885, while the corresponding average part-worth correlation was .000. The correlation between the two sets of reliability measures (stimulus value and part-worth) was only .479 (for the adjusted test-retest correlations).

Adjusting for sample size would seem particularly important for part-worth correlations since they typically involve a small "sample" (the number of part-worths) and the unadjusted correlations tend to be low to moderate. This results in a significant attenuation of the correlations. In the present study, the overall average part-worth correlation was reduced by .105 (from .490 to .385), compared with only .004 (.895 to .891) for the stimulus values.
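The paper does not report the adjustment formula used. One common possibility, the adjusted-R-squared analog for a simple correlation truncated at zero, reproduces the pattern described: correlations based on only five part-worths shrink markedly (and can truncate to .000), while those based on 16 or 32 stimulus values barely move. A sketch under that assumption:

    from math import sqrt

    def adjusted_r(r, n):
        """Shrink a product-moment correlation based on n observations
        (adjusted R-squared analog, truncated at zero); assumed formula."""
        adj_sq = 1 - (1 - r * r) * (n - 1) / (n - 2)
        return sqrt(adj_sq) if adj_sq > 0 else 0.0

    print(round(adjusted_r(0.80, 5), 3))   # five part-worths: 0.8 -> ~0.72
    print(round(adjusted_r(0.90, 32), 3))  # 32 utilities: 0.9 barely moves
    print(round(adjusted_r(0.074, 5), 3))  # low r, small n: truncates to 0.0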

The test-retest part-worth correlations would seem to be considerably more relevant than the stimulus value correlations in evaluating the reliability of conjoint analysis procedures, since the primary objective of these procedures is to provide reasonably accurate estimates of the part-worths. The utility value correlations may be somewhat inflated in the sense that merely observing Pareto optimality in making the preference judgments can produce fairly substantial values on this measure. Also, the fact that the utility values are linear functions of the part-worths will tend to make these correlations high. Much the same argument could be made concerning rank order correlations between respondents' test and retest preference judgments, which serve as the input for conjoint analysis.

TABLE 1

ESTIMATED PART-WORTHS AND UTILITIES: AVERAGE TEST-RETEST CORRELATIONS

Earlier it was argued that if the variance of an individual's part-worths was small, one might expect his test-retest reliability to be low. If the argument holds in this situation, one would expect that the correlation of the individual part-worth reliability measures with the "within subject" variances of the part-worths would be positive. This correlation was calculated across all subjects in two ways, using the average of the test and retest variances of the part-worths and using the minimum of the test and retest part-worth variances. The respective correlations were 0.10 and 0.11. While the above argument is intuitively plausible, it did not find substantial support in this study.

In comparing the results for the alternative approaches one notices that only small differences exist with respect to average test-retest stimulus value correlations, suggesting the lack of sensitivity of this measure. However, the part-worth correlations display substantial variability. With the exception of the subjective estimate method, the less fractionated designs (full and half factorials) tend to produce more reliable part-worth estimates. Apparently, in this particular research context, the control over random error provided by the larger effective "sample sizes" of these designs more than offset any tendency for the increased effort demanded from the respondent to produce less stable judgments concerning individual stimuli.

Those methods involving "comparative" judgments (i.e., pair-comparisons, graded pair-comparisons, and rank order) tended to perform better than those requiring "individual" judgments (i.e., subjective estimate and rating scale). Graded pair-comparisons had the highest overall average test-retest part-worth correlation (.628), followed by pair-comparisons (.615) and rank order (.315). Furthermore, the highest average correlation (.897) was obtained for the graded pair-comparisons (1/2 fractional factorial, incomplete block) design. Perhaps this method is more attractive, at least for situations involving a low to moderate number of attributes, than its limited use in previous marketing studies would suggest. However, in its present form, this approach may not be optimal for larger (sample size and number of attributes) commercial studies.

Convergent Validity

The direct dollar metric estimates of the part-worths were included in both the "test" and "retest" to provide a baseline for evaluating the test-retest reliabilities of the various conjoint analysis approaches and also for the purpose of examining convergent validity. These direct dollar metric estimates were obtained after the respondents provided their conjoint analysis data. Their reliability could be affected by fatigue or boredom created by the conjoint analysis tasks. However, this did not seem to be a significant problem. Except for the subjective estimate method, there was no systematic tendency for the less fractionated factorial designs (those requiring more judgments) to be associated with lower test-retest reliabilities for the direct dollar estimates of the part-worths (Table 2). Incidentally, the subjective estimate approach to conjoint analysis resulted in low test-retest reliabilities. It is not clear whether the respondents in this condition were unreliable per se or whether the difficulty of the subjective estimate task negatively affected these subjects' performances on the subsequent dollar metric task.

The direct dollar metric estimates of the part-worths demonstrated a higher average (over all conditions) test-retest reliability than the combined average obtained for the conjoint analysis approaches studied (.500 vs. .385), even though their reliability suffers from a lack of redundancy in the data collected. Furthermore, graded pair-comparisons, which appears to share method variance with the direct dollar metric method, was the only approach for which the test-retest reliability was higher for conjoint analysis than for the dollar metric for the same subjects (.628 vs. .560). In this study at least, the less complex and time consuming task of providing direct dollar metric estimates of part-worths was superior (with respect to test-retest reliability) to most of the popular conjoint analysis approaches. While it could be argued that these direct estimates are more easily remembered, memory should not have been a significant factor in this study since the test and retest were separated by approximately one month and the subjects were unaware that a retest was to be conducted.

Convergent validity relates to the amount of agreement among maximally different methods of measuring the same construct. Our second method (the direct dollar metric estimate of the part-worths), like the conjoint analysis procedures, relies on stated preferences and, as such, shares some method variance with these procedures. Hence, only a weak test of convergent validity was possible.

TABLE 2

CONVERGENT VALIDITY: DERIVED PART-WORTHS AND DOLLAR METRIC ESTIMATES

Each subject's dollar metric estimates were correlated with their conjoint analysis results for both the test and retest, and the correlations were averaged (Table 2). The results show that this measure of convergent validity tends to be higher for the less fractionated designs (i.e., full and half factorials). The methods involving comparative judgments (i.e., rank order, pair-comparisons, and graded pair-comparisons) fared better than those requiring "absolute" judgments. This was expected since these designs previously demonstrated higher test-retest reliability. Convergent validity (with the direct dollar metric estimates) for graded pair-comparisons was substantially higher than that obtained for any of the other methods. However, this is probably due in part to the method variance they share (i.e., the similarity of the measurement tasks).

SUMMARY

In spite of the increasing popularity of conjoint analysis among marketers, only limited published evidence exists regarding its reliability and validity within the various research contexts in which it has been applied. While the superiority of conjoint analysis over "direct subjective estimates" of the "part-worths" appears to be widely accepted, substantial empirical evidence for this position is lacking. Furthermore, those wishing to apply conjoint analysis in a specific decision context have little objective information on which to base their selection of a methodological approach from the many alternatives available.

If conjoint analysis is to be more than a passing fad, researchers must establish its test-retest reliability and predictive validity over a broad range of consumer choice situations. Moreover, it needs to be demonstrated that this set of techniques provides more accurate predictions (forecasts) than those obtainable from merely asking respondents to directly report their "part-worths." It is not sufficient to merely compare conjoint analysis with naive models which utilize no information concerning the relative importance of the various attributes. Finally, there is a need for empirical studies focusing on the comparative viability of the alternative methodological approaches under a variety of research conditions.

The present study has centered on a systematic investigation of the effects of two aspects of conjoint analysis (the use of fractional factorials in stimulus set construction and "measurement scales for the dependent variable") on the test-retest reliability and convergent validity of the estimated part-worths. Since the cell sizes were small, only tentative conclusions concerning the relative effectiveness of the alternative methodological approaches are appropriate.

The results suggest that the less fractionated designs tend to produce higher test-retest part-worth reliabilities when the number of dichotomous attributes is limited, although these designs become less feasible as the number of attributes increases. Those methods involving "comparative" judgments (i.e., pair-comparisons, graded pair-comparisons, and ranking) were more reliable than those requiring "individual" judgments (i.e., direct subjective estimates of the dollar value of the stimuli and rating scales). In the case of rating scales, this may well have been partially due to problems with anchoring effects.

When compared to the test-retest reliability of the direct subjective estimates (dollar metric) of the part-worths, the conjoint analysis results were disappointing. Only graded pair-comparisons appeared to be superior on this criterion. This approach also showed the highest convergent validity with the direct part-worth estimates.

In other research settings, particularly those with extensive stimulus sets, experimentally efficient designs may be more desirable if not necessary. While some may argue that our evaluation of conjoint analysis is unfair in that it considers only individual reliability and validity and does not consider the reliability of the "average" (aggregate) subject, it must be remembered that direct part-worth estimates will also improve when across subject averages are used. The reliability of averages will always improve as sample sizes increase. Furthermore, if individual reliability and validity are weak, the ability of conjoint analysis to explore individual differences and define market segments must be questioned.

REFERENCES

Acito, Franklin (1977), "An Investigation of Some Data Collection Issues in Conjoint Measurement," Proceedings, American Marketing Association, 82-5.

Acito, Franklin (1979), "An Investigation of the Reliability of Conjoint Measurement for Various Orthogonal Designs," Proceedings, Southern Marketing Association, 175-78.

Carmone, Frank J., Green, Paul E., and Robertson, D. J. (1968), "TRICON--An IBM 360/65 Fortran IV Program for the Triangularization of Conjoint Data," Journal of Marketing Research, 5, 219-20.

Carroll, J. Douglas (1972), "Individual Differences and Multidimensional Scaling," in Multidimensional Scaling, Vol. I, Roger N. Shepard, A. Kimball Romney and Sara B. Nerlove, (eds.), New York: Seminar Press.

Cattin, Philippe and Weinberger, Marc (1979), "Some Validity and Reliability Issues in the Measurement of Attribute Utilities," Advances in Consumer Research, Vol. VII, 780-3.

Cliff, Norman (1975), "Complete Orders from Incomplete Data: Interactive Ordering and Tailored Testing," Psychological Bulletin, 82, 289-302.

Coombs, Clyde H. (1966), A Theory of Data, New York: John Wiley & Sons.

Dykstra, Otto Jr. (1958), "Factorial Experimentation in Scheffé's Analysis of Variance for Paired Comparisons," American Statistical Association Journal, 53, 529-42.

Green, Paul E. (1974), "On the Design of Choice Experiments Involving Multi factor Alternatives," Journal of Consumer Research, 1, 61-8.

Green, Paul E., Carmone, Frank J., and Wind, Yoram (1972), "Subjective Evaluation Models and Conjoint Measurement," Behavioral Science, 17, 288-99.

Green, Paul E., Carroll, J. Douglas, and Carmone, Frank J. (1978), "Some New Types of Fractional Factorial Designs for Marketing Experiments," in Research in Marketing. Vol. I, J. N. Sheth, (ed.), Greenwich CT: JAI Press, 99-122.

Green, Paul E. and Srinivasan, V. (1978), "Conjoint Analysis in Consumer Research: Issues and Outlook," Journal of Consumer Research, 5, 103-23.

Green, Paul E. and Wind, Yoram (1973), Multiattribute Decisions in Marketing: A Measurement Approach, Hinsdale, Ill.: The Dryden Press.

Green, Paul E. and Wind, Yoram (1975), "New Ways to Measure Consumers' Judgments," Harvard Business Review, 53, 107-17.

Jain, Arun K., Acito, Franklin C., Malhotra, Naresh K., and Mahajan, Vijay (1978), "A Comparison of the Internal Validity of Alternative Parameter Estimation Methods in Decomposition Multiattribute Preference Models," Working Paper, School of Management, State University of New York at Buffalo.

Kruskal, Joseph B. (1965), "Analysis of Factorial Experiments by Estimating Monotone Transformations of the Data," Journal of the Royal Statistical Society, Series B, 27, 251-63.

McCullough, James and Best, Roger (1979), "Conjoint Measurement: Temporal Stability and Structural Reliability," Journal of Marketing Research, 16, 26-31.

Parker, Barnett R. and Srinivasan, V. (1976), "A Consumer Preference Approach to the Planning of Rural Primary Health Care Facilities," Operations Research, 24, 991-1025.

Pekelman, Dov and Sen, Subrata K. (1979), "Improving Prediction in Conjoint Measurement," Journal of Marketing Research, 16, 211-20.

Scott, Jerome E. and Wright, Peter (1976), "Modeling an Organizational Buyer's Product Evaluation Strategy: Validity and Procedural Considerations," Journal of Marketing Research, 13, 211-24.

Wittink, Dick R. and Montgomery, David B. (1979), "Predictive Validity of Trade-off Analysis for Alternative Segmentation Schemes," Proceedings: American Marketing Association, 69-73.
