Conditional Logit Versus MDA in the Prediction of Store Choice

Stephen J. Arnold, Queen's University
Victor Ruth, University of Toronto
Douglas J. Tigert, University of Toronto
ABSTRACT - When a research problem focuses on determinancy of choice rather than on classification or identification of a retailer's strengths and weaknesses, conditional logit analysis is clearly superior to multiple discriminant analysis. Too often, MDA is misused as a methodology for prediction of choice.
[ to cite ]:
Stephen J. Arnold, Victor Ruth, and Douglas J. Tigert (1981), "Conditional Logit Versus MDA in the Prediction of Store Choice", in NA - Advances in Consumer Research Volume 08, eds. Kent B. Monroe, Ann Arbor, MI: Association for Consumer Research, Pages: 665-670.

Advances in Consumer Research Volume 8, 1981      Pages 665-670




The research described here was inspired by the recent paper of Gensch and Recker (1979) and by an earlier paper by Westin and Watson (1975). The Gensch and Recker paper provided a comparative analysis of the predictive power and diagnostic quality of conditional logit and regression analysis as applied to the problem of consumer choice of food stores. The paper raised a number of critical issues about the adequacy of a number of covariance choice models when the research question focuses on cross-sectional data rather than individual analyses and when the key research issue is determinancy of choice.

The Westin and Watson paper focussed on the issue of whether or not the population should first be segregated on the basis of "importance" characteristics before predictive models of choice are applied. In addition, the authors also raised the issue of the appropriate model for a choice framework rather than a classification framework. The research objectives for the study reported here were threefold and they build upon the key issues arising from the two papers cited above:

i) to shed some light on the differences in the underlying assumptions of both the theoretical underpinnings and the quantitative structure of conditional logit and multiple discriminant analysis.

ii) to test the key assumption of conditional logit related to the concept of a "common behavioral rule" (homogeneity of choice) expressed by McFadden (1974) and challenged by Westin and Watson (1975).

iii) to examine the diagnostic qualities and predictive powers of conditional logit and discriminant analysis when applied to the same body of data.

The Multinomial, Multiattribute Logit Choice Model

Among the wide variety of competing choice models in the consumer behaviour literature, the logit model belongs to the general class of models described as covariance models. This class includes regression, MDA, logit and probit and is distinguished from other major classes such as Fishbein's (1972) learning-based model, lexicographic models, hierarchical models, multidimensional scaling and linear programming. Some models incorporate individual beliefs about attribute importance while others attempt to derive such weights. The covariance models attempt to derive the weights and then use those weights in the prediction equation(s).

The logit model, which has its foundations within the psychological literature, is consistent with the theory of sampling from a population of utility-maximizing decision makers, and uses the attribute ratings of both chosen and unchosen alternatives in the choice set in order to reveal the determinant attributes. Briefly, the conditional logit model posits that when the nth individual in the population (n = 1, ..., N) has a vector of measured attributes $s_n$ and faces $J_n$ alternatives which are described by vectors of attributes $x_{jn}$, then the individual has a utility function

$$U(s_n, x_{jn}) = V(s_n, x_{jn}) + \varepsilon(s_n, x_{jn})$$

where $V$ is a nonstochastic function reflecting "representative" tastes and $\varepsilon$ is stochastic and reflects the idiosyncrasies of this individual in tastes for the alternative with attributes $x_{jn}$. If it is assumed that

$$V(s_n, x_{jn}) = \theta' z_{jn},$$

with $\theta$ a vector of unknown parameters, and that $\varepsilon$ is a function that varies randomly in the population with the property that in each possible alternative set $\{x_{1n}, \ldots, x_{J_n n}\}$ the values $\varepsilon(s_n, x_{jn})$ are independently distributed with the Weibull (double exponential) distribution, then McFadden (1974) has shown that the probability $P_{in}$ of the nth individual selecting the ith alternative is

$$P_{in} = \frac{\exp(\theta' z_{in})}{\sum_{j=1}^{J_n} \exp(\theta' z_{jn})}$$

Estimation of the unknown parameters $\theta$ is based on a maximum likelihood procedure, i.e., finding the vector $\theta$ which maximizes the joint probability of observing the particular choice outcomes of each consumer. Mathematically, this procedure is equivalent to maximizing, by any of a variety of programs for unconstrained nonlinear optimization, the adjusted log likelihood function

$$L(\theta) = -\sum_{n=1}^{N} \sum_{i=1}^{J_n} S_{in} \ln \sum_{j=1}^{J_n} \exp\left[\theta'(z_{jn} - z_{in})\right]$$

where $z_{in}$ is the vector of "ratings" by the nth consumer of the chosen ith alternative,

$z_{jn}$ is the vector of "ratings" by the nth consumer of the jth alternative, and

$S_{in}$ is the value of the dependent variable.
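The choice probability and log likelihood described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' estimation programme: the function names, the coefficient vector and the single hypothetical respondent are all assumptions introduced here.

```python
import numpy as np

def choice_probabilities(theta, Z):
    """Logit choice probabilities for one respondent.

    Z holds one row of attribute ratings per alternative in the evoked set.
    """
    v = Z @ theta                # representative utility theta'z for each alternative
    e = np.exp(v - v.max())      # subtracting the maximum only improves numerical stability
    return e / e.sum()

def log_likelihood(theta, ratings, choices):
    """Sum over respondents and alternatives of S_in * ln(P_in)."""
    return sum(float(S @ np.log(choice_probabilities(theta, Z)))
               for Z, S in zip(ratings, choices))

# Hypothetical respondent: three chains rated on two dummy attributes.
theta = np.array([1.5, 0.7])                         # assumed coefficients, not estimates
Z = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])   # rows: chains, columns: attributes
S = np.array([1.0, 0.0, 0.0])                        # the first chain is the chosen one

P = choice_probabilities(theta, Z)
ll = log_likelihood(theta, [Z], [S])
```

For a single chosen alternative, `ll` equals the negative log of the sum of exponentiated net difference scores, which is why the likelihood can be written entirely in terms of the differences between unchosen and chosen ratings. In practice the vector `theta` would be found by passing the negative of `log_likelihood` to a general-purpose nonlinear optimizer.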

The advantages of the conditional logit model over other methods of identifying determinant attributes are severalfold. The maximum likelihood estimates of the elements of $\theta$ are asymptotically efficient and normally distributed under very general conditions, and the approximation is good even in small samples (N > 50). Standard errors are also provided for each of the elements of $\theta$, thus permitting hypothesis testing and construction of confidence intervals. A determinant attribute can therefore be defined as an attribute for which the null hypothesis that its coefficient is zero is rejected at the $\alpha$ level of significance.
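The definition of a determinant attribute amounts to a two-sided test of a coefficient against zero using its standard error. A minimal sketch follows, assuming the asymptotic normal approximation; the function name `wald_test` and both the coefficient and standard error values are hypothetical and are not taken from the paper's tables.

```python
import math

def wald_test(coef, std_err, alpha=0.05):
    """Two-sided test of H0: coefficient = 0 under a normal approximation."""
    z = coef / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2.0))   # equals 2 * (1 - Phi(|z|))
    return z, p_value, p_value < alpha

# Hypothetical coefficient and standard error, for illustration only.
z_stat, p_value, is_determinant = wald_test(coef=1.2, std_err=0.3)

# A coefficient that is small relative to its standard error is not determinant.
_, _, weak_is_determinant = wald_test(coef=0.1, std_err=0.2)
```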

The logit model can also derive the effect on the choice probability of a change in any of the attributes of any of the alternatives. However, the analysis of these elasticities and cross-elasticities is beyond the scope of this paper. Essentially, the programme assumes a model in which individuals compare all pairs of choice alternatives on the basis of their perceived differences in satisfaction on those attributes possessed by all alternatives. The programme allows for differential consideration sets across respondents so long as there is at least one pair of alternatives. Thus, the research instrument must generate the evoked set for each respondent and provide for a rating system for each alternative on all attributes. $S_{in}$ is the value of the chosen/preferred store in each pair. It could be a dummy variable with the value "1" if the pair includes the preferred store, in which case all pairs of stores which do not include the preferred store are eliminated from the analysis. Alternatively, $S_{in}$ could be the proportion of trips taken to a particular store, or the proportion of total dollar spending at a particular store, etc. Similarly, the ($z_{jn} - z_{in}$) net difference scores could be based on semantic differential scales or simpler measurements such as dummy variables which express whether a particular store did or did not possess a specific characteristic.
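As an illustration of the net difference scores, the sketch below builds the difference vector for every pair that includes the preferred store, using dummy ratings. The chain labels, the three attributes and the helper name `net_differences` are hypothetical and serve only to show the bookkeeping.

```python
import numpy as np

def net_differences(ratings, preferred):
    """Return z_jn - z_in for every pair that includes the preferred store i."""
    z_i = np.asarray(ratings[preferred])
    return {other: np.asarray(z_j) - z_i
            for other, z_j in ratings.items() if other != preferred}

# Hypothetical respondent with three chains in the evoked set.
# Columns: lowest prices, easiest to get to from home, largest assortment.
ratings = {
    "Chain A": [0, 1, 1],
    "Chain B": [1, 0, 0],
    "Chain C": [0, 0, 0],
}
diffs = net_differences(ratings, preferred="Chain A")
```

With dummy ratings each element of a difference vector is -1, 0 or 1; a value of -1 means that the preferred chain was named best on that attribute and the other chain was not.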

How Does Conditional Logit Differ From MDA

Multiple Discriminant Analysis is a fundamentally different type of model in that it is a classification rather than a choice model. The other techniques, including conditional logit, basically assume one population making choices based on their evaluation of the independent variables. MDA assumes several distinct populations (e.g., shoppers of the different chains), each having different patterns of scores on the independent variables. Discriminant analysis begins by pre-classifying respondents into market segments and then searching for a set of coefficients in n-dimensional space that will maximally separate the groups. Essentially, the discriminant programme examines across versus within group variances on the independent variables and develops a set of discriminant functions that will partition the groups in an n-dimensional space in the most efficient manner. MDA utilizes information only about the chosen alternative for each respondent and looks for patterns of differential scores on the attributes across the groups. There is no underlying relationship between attribute differentials and attribute importance in determining store choice. In conditional logit, it can be hypothesized that where large differences occur in attribute ratings between chosen and unchosen alternatives within the consideration set, then those differences should reflect determinancy of choice.

In addition, the preference probability function in conditional logit analysis overcomes several deficiencies in both discriminant and regression analyses. The computed probabilities are constrained to fall in the range of zero to one. More important, the shape of the logistic curve is theoretically consistent with the consumer choice literature. It assumes that the probability of choosing a given alternative is based both on the current position of that alternative relative to other alternatives and on changes in that position based on improvements on determinant attributes. The logistic curve also recognizes the concept of "threshold" levels. Over certain portions of the curve, small improvements in market position might lead to only small improvements in the probability of choosing a particular alternative, whereas at other positions on the curve, small improvements in market position might lead to large increases in that probability. Neither linear regression nor discriminant analysis incorporates those theoretical concepts. The following empirical section clearly demonstrates the diagnostic and predictive superiority of the multinomial logit model over MDA.
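The threshold property of the logistic curve can be seen in a small numerical sketch. It assumes a two-alternative case, in which the choice probability is a logistic function of the utility difference; the step size and the two starting positions are arbitrary illustrative values.

```python
import math

def logistic(v):
    """Two-alternative logit probability for a utility difference v."""
    return 1.0 / (1.0 + math.exp(-v))

def gain(v, step=0.5):
    """Increase in choice probability from improving the position by `step`."""
    return logistic(v + step) - logistic(v)

gain_far_behind = gain(-4.0)   # alternative well behind its competitor
gain_at_parity = gain(0.0)     # alternative level with its competitor
```

The same improvement adds roughly one percentage point of choice probability when the alternative is far behind, but about twelve points when the two alternatives are level, and every probability stays between zero and one.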


The data analyzed were collected by telephone from a central WATS facility in Chicago by a professional interviewing company. The market survey was conducted in the SMSA of Tampa/St. Petersburg in January 1978, with all calling in the evening hours to ensure proper representation of working women. The sample of 903 respondents was drawn on the basis of a systematic random sampling procedure from the appropriate telephone directories covering the market area.

Questionnaire Design

The major section of the questionnaire asked respondents to name the single chain that they felt best answered each of a series of 19 questions representing a fairly exhaustive list of grocery store attributes. The attributes included such dimensions as "lowest prices," "easiest to get to from home," "largest assortment/selection of food products," etc. The procedure, referred to as the associative technique, is simple in design and easy to administer by telephone. Its weakness lies in the fact that the attribute ratings on each chain by each respondent are all dummy variables. Either the chain is the best, the lowest, the cleanest, etc., or it is not. However, the diagnostic qualities of the associative technique in pinpointing strengths and weaknesses of the competitive chains in a market are well known (Tigert and Ma 1978).

In a separate section of the questionnaire, respondents were asked to specify both the most important and the second most important reason for choosing the store where they shopped most often. These "attribute importance" questions, generated by the direct questioning method, were not used in either the logit or MDA analysis but are used later in this analysis to help confirm or reject the logit and MDA results. In fact, one might argue that the logit coefficients and/or the MDA standardized discriminant function coefficients provide a measure of the validity of the direct questioning technique in uncovering determinant attributes.

The total set of outlet attributes was fairly exhaustive and included all the dimensions covered by the Lindquist (1974-75) review. They were based, however, on over 100 grocery shopping studies in Canada, the U.S., the Netherlands and the U.K. that have been completed by the authors. A summary of the development of that attribute list is reported in Arnold, Ma and Tigert (1977).

The dependent variable for the analysis was also a dummy variable represented by the specific chain at which the respondent shopped most often. For the MDA analysis, respondents were grouped into chain sets for each of the six largest chains in the Tampa/St. Petersburg market. Of the total sample of 1,000 respondents, 903 were shoppers of one of these six chains and this group forms the sample for the following analyses.

Definition of the Consideration Set

The logit analysis allows for differential consideration sets for each individual. For purposes of this analysis, a specific chain was included in an individual's evoked set if the chain received at least one mention on the 19 store attribute rating questions. Specifically, the chain had to be best on something. While it is hypothetically possible for a consumer to choose an outlet that is not perceived to be best on anything, no respondent was eliminated from the analysis on the basis of this screening criterion. As shown in the first column of Table 1, the average respondent had about 4 chains in the consideration set and therefore about 6 pairs of stores on which the net difference scores were computed. These results are consistent with the findings of Gensch and Recker (1979).
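The screening rule and the resulting pair count can be sketched as follows. The answers are hypothetical (single-letter chain labels standing in for the 19 "which chain is best" responses) and the helper name `consideration_set` is an assumption made for this illustration.

```python
from itertools import combinations

def consideration_set(best_on):
    """Chains named at least once across the attribute questions."""
    return sorted(set(best_on))

# Hypothetical answers to the 19 "which chain is best on ...?" questions.
answers = ["A"] * 8 + ["B"] * 5 + ["C"] * 4 + ["D"] * 2

evoked = consideration_set(answers)
pairs = list(combinations(evoked, 2))
```

A respondent who names four different chains therefore contributes 4 x 3 / 2 = 6 pairs of stores, which matches the average reported in the text.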

A Test of the Homogeneity of Choice Assumption

The logit analysis was first completed over the total sample of 903 respondents. Subsequently, three sub-samples of respondents were selected based on their responses to the direct questioning procedure on attribute importance. One sub-sample reported that "lowest prices" was the single most important reason for choosing the store where they shopped most often. A second group mentioned "assortment/variety of food products" and a third group rated "easiest to get to from home" as the most important attribute.

The logit model was estimated using, as the choice object, the ratings of the store "shopped most often" rather than the store "shopped last" or a number of other shopping ratings included in the questionnaire. Previous work by Tigert and Arnold (1980) suggested that better predictions and higher logit coefficients result when using the store shopped most often.

The strength and diagnostic quality of the logit model versus MDA were examined by comparing the hit ratios in the prediction results as well as the size and signs of the coefficients relative to the results of the direct questioning procedure on attribute importance.


Table 1 reports on the results of the logit analysis and presents a series of logit coefficients for the total sample and for the three sub-samples described earlier. The coefficients are reported only for the most significant attributes from two different computer runs. In the top half of the table, only five attributes are reported. These five dimensions, covering location, price, assortment and service (2), have consistently appeared in the retailing literature as key determinants of store choice and they generated an 80 percent correct prediction in the logit model.

To investigate the improvement that could be achieved by expanding the attribute set, a second run was made with the same five attributes plus the next best three attributes. The logit coefficients for the original five attributes changed only marginally and the predictive power of the logit model improved only slightly to 81 percent correct prediction.

By far the most powerful store attribute was location/convenience ("easiest to get to from home") with a logit coefficient of 1.48 for the total sample. In the MDA analysis, this dimension was not significant at all in discriminating between the six shopper groups. This major difference in results is directly traceable to the use of net difference scores versus absolute ratings on the chains. Table 2 sheds some light on this key concept.



If we assume that convenience to home is a necessary condition to move a chain into the consideration set, then it would seem reasonable to hypothesize that many consumers would shop most often at the chain that was the "easiest one to get to from home". Thus, we would expect to find that in the associative technique, a large proportion of consumers would report that the chain at which they shopped most often was also the chain that was easiest to get to. The first row of Table 2 confirms that hypothesis. The average percentage across the row (weighted by chain market size) is 71 percent and there is very little variance across the chains. Thus, in terms of the analytical procedure used by MDA, the mean scores on this dimension across the six shopper groups would show little variance and would not, therefore, provide much discriminatory power. As a consequence, in spite of the fact that location is a critical determinant of store choice, the results of the MDA would be negative in terms of the classification objectives. Attributes which do exhibit a high variance across the columns include: i) lowest prices; ii) specialty baked goods; iii) best delicatessen; iv) cleanest stores, etc. In fact, these were the dimensions that were the most significant in the MDA analysis. In short, what MDA does is provide strong diagnostics on how the chains are perceived by their own customers to be different on the various attributes but not which attributes were the most critical in determining store choice.
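The contrast can be illustrated with a small sketch. All of the figures below are hypothetical and are not the values in Table 2; they are chosen only so that the location shares average 71 percent with little spread across six chains, as the text describes, while the price shares vary widely.

```python
import numpy as np

# HYPOTHETICAL shares of each chain's own shoppers who named that chain "best".
own_shopper_share = {
    "easiest to get to": np.array([0.72, 0.70, 0.71, 0.69, 0.73, 0.71]),
    "lowest prices":     np.array([0.65, 0.20, 0.45, 0.10, 0.30, 0.55]),
}

# What MDA exploits: variation in group means across the six shopper groups.
between_group_variance = {attr: float(np.var(share))
                          for attr, share in own_shopper_share.items()}

# What logit exploits: the gap between chosen and unchosen chains.
unchosen_location_share = 0.07   # HYPOTHETICAL share of unchosen chains named best on location
net_gap = float(own_shopper_share["easiest to get to"].mean()) - unchosen_location_share
```

Under these assumed figures the location attribute has almost no between-group variance, so it contributes little to a discriminant function, yet the chosen-versus-unchosen gap on the same attribute is large.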

The logit analysis focuses directly on store choice. It is because the chains all score fairly high on easiest to get to from home among their own customers that the net difference scores are so high. Unchosen stores received very few mentions as rating best on location.

This basic difference in the research objectives of logit and MDA leads to very different results in terms of predictive power in classifying respondents as shoppers of their preferred outlet. Table 3 shows that the MDA analysis generated only 55.4 percent correct predictions compared to 81.4 percent for the logit analysis. The improvement in predictive power of the logit over the MDA was 26.0 percentage points, or about 47 percent in relative terms (26.0/55.4).
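Using the two hit ratios reported in the text, the relative improvement works out as:

```latex
\[
\frac{81.4 - 55.4}{55.4} \;=\; \frac{26.0}{55.4} \;\approx\; 0.469
\]
```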

More important, the logit classification was very close to the actual market share positions of the chains while the MDA results were off by more than 100 percent for some chains. Yet, MDA continues to be used extensively in marketing research for purposes of classifying consumers and predicting choice. In fairness to MDA, we should report that when MDA mis-predicted, it tended to misclassify respondents into the shopper set closest to the correct chain in the perceptual map (not shown). For example, the misclassified Kash N Kerry shoppers were almost all classified as either U-Save shoppers or Pantry Pride shoppers.

Testing Homogeneity of Choice

The underlying assumption of logit analysis is that all consumers have a homogeneous and "representative" set of attributes by which they maximize their utility of store choice. To test this assumption in terms of the relative strength of the logit coefficients, we re-ran the logit analysis for the three sub-samples described earlier. For the sub-sample that indicated earlier in the questionnaire that "lowest prices" was the most important store attribute in choosing the store where they shopped most often, the size of the logit coefficient on "lowest prices" rose dramatically, from 0.71 for the total sample to 1.87 for this sub-sample. Simultaneously, the coefficient on assortment of food dropped precipitously to a non-significant level and the coefficient for location dropped by half. The conclusion from this segmentation analysis is transparent. Consumers who seek out primarily low prices are prepared to trade away both location and assortment for price. The results have major strategic significance for grocery retailing and suggest that both warehouse discount stores and limited assortment box stores in good locations have a bright future.
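One way to read the two price coefficients is as odds multipliers. The sketch below rests on an assumption that is not stated in the paper: with dummy ratings and all other attributes held equal, the odds of choosing a chain named best on price over one that is not are exp(coefficient). The two coefficients are the values quoted in the text.

```python
import math

coef_total_sample = 0.71     # "lowest prices", total sample (from the text)
coef_price_segment = 1.87    # "lowest prices", price-oriented sub-sample (from the text)

# Odds multiplier for a chain named best on price, other ratings held equal.
odds_total = math.exp(coef_total_sample)
odds_segment = math.exp(coef_price_segment)
```

Under that reading, being named best on price roughly doubles the odds of choice in the total sample but multiplies them by more than six in the price-oriented segment.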



In the column labeled location in Table 1, the logit coefficient for location/convenience is so overpowering that a number of other coefficients, including the price coefficient, became non-significant. Consumers appear to be operating in a tradeoff mode between price and location. The significance of that tradeoff can be seen more clearly in Exhibit II. Across 14 major markets, in the U.S., Canada, The Netherlands, and the U.K., the authors have asked the same direct questions about attribute importance. At the same time, chains have been monitored in terms of their relative price positions by a large grocery basket of 120 items. Exhibit II reports on the proportion of respondents who said either low prices or location/convenience were the most important reasons for choosing their preferred grocery outlet. When price differentials across chains are small, upwards of 50-55 percent of consumers say they choose their preferred store on the basis of location/convenience. When price differentials rise, the consumer is sensitized to these differentials through advertising, shopping experiences (learning) and word-of-mouth behaviour. The higher the price differential, the larger is the consumer segment that trades away location for price.

Finally, note in Table 1 that the sub-sample labeled assortment/variety yielded the highest logit coefficient for the assortment/variety attribute, the highest coefficient for the "fast checkout" attribute and a moderately strong coefficient for location. This market segment could best be described by the concept of "one-stop shopping .... for the hassled consumer". One can visualize a customer who says..."find me a convenient supermarket with everything I need and get me out fast". The super combo, best exemplified by Albertson's in Tampa/St. Petersburg, would seem to be ideally suited to serve this consumer segment.

Logit Coefficients Versus Attribute Importance By Direct Questioning

Table 4 reports on the results of our direct questions to respondents about the most and second most important reasons for choosing their preferred grocery outlet. The four attributes receiving the most mentions, i.e., location, price, assortment and service, are identical to the four attributes with the highest logit coefficients in Table 1. Attributes such as best delicatessen and best specialty baked goods, which were most significant in the MDA analysis, do not even appear on the list in Table 4. More important, the relative size of the scores in the average column on the right-hand side of Table 4 is closely paralleled by the relative size of the logit coefficients in Table 1. Location is clearly the dominant attribute in both tables, with price, assortment and service fairly close in second, third and fourth positions.



The logit coefficients, therefore, provide strong support for the direct questioning technique in measuring attribute importance. They also provide support for the early work with gravity models which utilized only two outlet characteristics: i) location (distance from home) and ii) store size (a proxy for assortment). Both the gravity models and conditional logit attempt to incorporate information about all alternatives in the consideration set. MDA does not.




Logit analysis is a powerful tool for understanding consumer choice of food stores, particularly in exposing determinant attributes. While MDA is adequate for purposes of perceptual mapping and understanding weaknesses and strengths of alternative chains, its performance is weak in predicting choice and in validating determinant attributes.

A number of research steps should follow the results achieved here. First, there is a need to examine the improvement in results that could be achieved by measuring both attribute importance and store ratings through alternative scales such as semantic differentials or perhaps forced choice. Given the wealth of models now available that allow for differential evoked sets, further improvements could be made in alternative ways of defining evoked sets for individual respondents. Finally, logit analysis is only one of a variety of new models now available for analyzing consumer choice. More comparative analysis across competing models would be helpful in uncovering the most powerful alternatives.


Arnold, Stephen J., Ma, Sylvia and Tigert, Douglas J. (1978), "A Comparative Analysis of Determinant Attributes in Retail Store Selection," in ed. Advances in Consumer Research, Vol. VI, Ann Arbor: Association for Consumer Research.

Arnold, Stephen J. and Tigert, Douglas J., Determinant Attributes in Consumer Choice, working paper, University of Toronto, 1980.

Fishbein, Martin (1972), "The Search for Attitudinal-Behavioural Consistency," in Joel B. Cohen (ed.) Behavioural Science Foundation of Consumer Behaviour, New York: Free Press.

Gensch, Dennis H. and Recker, Wilfred W. (1979), "The Multinomial, Multiattribute Choice Model," Journal of Marketing Research, XVI, 124-132.

Lindquist, Jay D. (1974-75), "Meaning of Image," Journal of Retailing, 50, 29-38, 116.

McFadden, Daniel (1974), "Conditional Logit Analysis of Qualitative Choice Behaviour," in ed. Paul Zarembka, Frontiers in Econometrics, New York: Academic Press, pp. 105-42.

Tigert, Douglas J. and Ma, Sylvia (1979), "In Search of A Supermarket Strategy: Albertson's Drives on Tampa/St. Petersburg," Proceedings of the 1978 Attitude Research Conference, American Marketing Association, Tarpon Springs, Florida.

Westin, Richard B. and Watson, Peter L. (1975), "Reported and Revealed Preferences As Determinants of Mode Choice Behaviour," Journal of Marketing Research, XII, 282-289.