How Regular Is Regularity? An Empirical Test of the Regularity Assumption

Thomas S. Gruca, University of Massachusetts
ABSTRACT - Using scanner panel data, an empirical test of the regularity assumption in the case of a new product entry is performed. Consumers with similar pre-entry purchasing patterns are assigned to segments on this basis to provide sufficient sample sizes for statistical testing.
[ to cite ]:
Thomas S. Gruca (1990) ,"How Regular Is Regularity? An Empirical Test of the Regularity Assumption", in NA - Advances in Consumer Research Volume 17, eds. Marvin E. Goldberg, Gerald Gorn, and Richard W. Pollay, Provo, UT : Association for Consumer Research, Pages: 398-405.

Advances in Consumer Research Volume 17, 1990      Pages 398-405



[The author would like to thank John Totten of IRI, Inc. for his help in obtaining the data used in this study. This research was partly supported by a grant from Procter and Gamble Co. to the Marketing faculty of the University of Illinois.]


Using scanner panel data, an empirical test of the regularity assumption in the case of a new product entry is performed. Consumers with similar pre-entry purchasing patterns are assigned to segments on this basis to provide sufficient sample sizes for statistical testing.

Seven segments of the caffeinated ground coffee category in two retail markets, containing 6 and 4 regular brands respectively, were studied. No statistically significant violations of the regularity hypothesis were found. This result is consistent with expectations from previous lab work, which pointed to fewer violations as the choice environment increased in realism.


An important problem for consumer researchers is the effect a new product has on consumers' preferences for the products already in the market. There is conflicting evidence from lab studies concerning possible change in preferences when a new alternative becomes available. Ryans (1974) found that the introduction of a new alternative does not change the preference orderings of the incumbent products. In contrast, Huber and Puto (1983) reported that the existence of a new alternative can greatly change the probability that one of the incumbent brands is chosen.

One thing that researchers in choice theory often agree on is: if the number of alternatives is increased from N to N+1 with the introduction of a new alternative, the probability of choosing any one of the original N alternatives does not increase. This assumption is called "regularity" and is a minimal condition for a large number of choice models including the Luce Choice Axiom (Luce 1959, 1977), Tversky's EBA model (Tversky 1972), market share attraction models (Bell, Keeney and Little 1975) and the ideal point model. The importance of regularity to choice models is illustrated by a comment from Luce (1977) that the "only property of general choice probabilities that has not been empirically disconfirmed has been regularity."
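Stated formally, and only as a sketch in notation assumed here for illustration ($A$ for the original choice set, $y$ for the new alternative, $P(x \mid A)$ for the probability of choosing $x$ from $A$), regularity requires:

```latex
P(x \mid A) \;\geq\; P(x \mid A \cup \{y\}) \quad \text{for all } x \in A .
```

A violation occurs when some incumbent alternative $x$ gains choice probability after $y$ is added.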

Despite the above conclusion, Huber, Payne and Puto (1982) and Huber and Puto (1983) present empirical evidence for violations of regularity in the choice processes of the subjects. They found that the addition of another alternative increased the probability that one of the original alternatives was chosen. In response to these findings, some recent models have been proposed to handle situations where regularity is violated (Currim 1982, Batsell and Polking 1985).

The results of these choice experiments and the models proposed to remedy the problem of regularity violations should not be accepted uncritically. The assumption of regularity appears to be violated in the choice tasks reported in these experiments. If these results can be generalized to other common choice situations, those marketers and academics who are interested in consumer choice may have to radically redirect their research to account for this problem. The aforementioned choice models, which cannot deal with violations of regularity, are among the most used models in consumer research.

Thus, the potential importance of the findings of Huber, Payne and Puto (1982) and Huber and Puto (1983) serves as motivation to critically evaluate these studies and their results. Furthermore, there are a number of limitations in these studies (and in all lab studies of this type) which may have accounted for the violations of regularity. Therefore, an empirical (field) test of the regularity assumption is performed.


The main empirical evidence of violations of regularity comes from two lab studies by Huber, Payne, and Puto (1982) and Huber and Puto (1983). In both of these studies, groups of subjects were asked to make choices from sets of two alternatives and from sets of three alternatives. In the Huber, Payne, and Puto (1982) study, the new alternative was asymmetrically dominated by the existing set of products, i.e. dominated by one alternative but not the other. In the Huber and Puto (1983) study, the new alternative was extremely good on one choice dimension but poor on the others.

In the Huber, Payne and Puto (1982) study, it was found that the addition of the new alternative increased the probability of choosing one of the two original alternatives in 18 of 24 cases. This increase was taken as evidence of a violation of the regularity hypothesis. The authors concluded that the number of these violations was statistically significant; "This occurred in 18 out of 24 different cases (p < 0.05)." The proportion of violations in the Huber and Puto (1983) paper was 15 out of 20 different cases using a similar experimental design.

The observed increases in the choice probabilities seemed large in an absolute sense (averaging 14.5% and 15.1% in the respective studies). However, using a z-test for equality of proportions, it can be shown that most of the changes in choice probabilities are not statistically significant. In fact, less than half of the reported violations are significant (p < 0.05). Of the 18 reported violations in the earlier study, only 8 were found significant. From the data provided in the later study, it was found that only 5 of 15 reported violations were significant.

One of the major limitations of these studies may have been the construction of the experimental stimuli. Since the goal of these studies was to illustrate the change in choice probabilities when a similar new brand is introduced (similarity effects), it can be assumed that these effects - attraction or substitution - were responsible for the violations of regularity. However, a replication of these studies by Ratneshwar, Shocker and Stewart (1987) found that the similarity effects demonstrated in these two studies are moderated by stimulus meaningfulness and product familiarity. One might argue that, based on these most recent findings, the remaining cases of violations of the regularity hypothesis may be due to stimulus construction problems.

A further problem with the product choice studies lies in the assumption that the subjects' preferences were homogeneous. This assumption had to be made in order to be able to measure product choice probabilities between subjects. Otherwise, the subjects would have to make repeated choices from the same set of alternatives.

However, it is difficult to argue that consumers are homogeneous. In fact, the entire field of market segmentation is predicated on using these differences to improve the process of marketing. On the other hand, without the assumption of homogeneity, the between subject analysis in the choice experiments is incorrect.

The final limitation is one common to all laboratory studies: external validity. The question of the generalizability of lab results to the world outside is especially important in this case since the lab experiments are limited to 2 to 4 products evaluated on, at most, 3 dimensions. The importance of empirical verification is heightened by the fact that most product categories have a larger number of alternatives which can be evaluated on a wide range of bases.

Although some of these limitations are common to all laboratory research, it is important to replicate findings of this importance outside the lab regardless of the quality of the research. For this reason, a test of the regularity assumption using actual choice data is performed. The advantages of this method and the actual design are discussed in the next section.


The use of actual choice data to test for violations of the regularity assumption avoids some of the limitations of the lab studies discussed above. The main advantages are 1) less restrictive assumptions concerning homogeneity of preferences, 2) no restriction on the number of choice attributes to be used by consumers, 3) a larger number of available alternatives, and 4) choices made in a natural setting. The primary assumption underlying the analysis of actual choice data is that the choice process of each consumer is zero order and stationary. A zero order choice process is one in which the choice of an alternative at time t is independent of the choice made at time t-1. A stationary choice process assumes that the choice probabilities are fixed over time. Under these assumptions, a Z-test for equality of two proportions (choice probabilities before and after the entry of the new product) can be used to test for violations of regularity.

There is one problem with relaxing the assumption concerning homogeneity of preferences. If the test for violations of regularity is performed at the individual level, a large number of choices must be observed. However, the longer the period of observation, the more likely it is that the marketplace will change, leading to non-stationary choice processes. On the other hand, aggregating all of the consumers' choices might obscure possibly important consumer heterogeneity.

As a compromise, it is proposed that the market be segmented according to the choice probabilities of the consumers. A segment is defined as a group of consumers who are homogeneous with respect to the probabilities of choosing different brands in a product class. Those consumers with the same choice probabilities will be considered a segment and treated as a single unit. Since the situation examined in this paper is the introduction of a new brand into an existing market the segmentation will be based on the pre-entry buying behavior of the consumers.

Market Segmentation Using Choice Probabilities

The market is segmented on the basis of pre-entry purchase patterns. The procedure used was developed by Grover and Srinivasan (1987) and relies on latent class modeling, which can be estimated with a widely available computer algorithm.

It is assumed that the probabilistic brand choice process is stationary (constant) and zero order (brand choice at time t is not affected by brand choice at time t-1) so that each household is characterized by its choice probabilities. If there are n brands in the market, an n-component vector of choice probabilities varies over the population. It is assumed that differences in purchase patterns observed across households can be captured by n brand loyal and m switching segments. Each household is assumed to be a member of only one of the n+m segments.

A brand loyal segment will purchase only its preferred brand. This implies that for loyal segment $l$, the proportion of households purchasing brand $i$ on one occasion and brand $j$ on another is given by: $s_{i,j,l} = 1$ if $i = j = l$, and $0$ otherwise, for all $l = 1, \ldots, n$.

For the brand switching segment $k$, let $p_{i,k}$ be the probability of choosing brand $i$. In other words, $p_{i,k}$ is the brand share of brand $i$ for segment $k$. It is clear that $p_{i,k} \geq 0$ and $\sum_i p_{i,k} = 1$ for all $k = 1, \ldots, m$. Since the choice process is assumed to be zero-order and stationary, the probability of buying brand $i$ on one occasion and brand $j$ on another is $s_{i,j,k} = p_{i,k} p_{j,k}$ for segment $k$.

Let the proportion of consumers who are loyal to brand $l$ be $V_l$ and the proportion of consumers in switching segment $k$ be $W_k$. Clearly, $V_l \geq 0$, $W_k \geq 0$ and $\sum_l V_l + \sum_k W_k = 1$. Defining the theoretical proportion of consumers in the market who buy brand $i$ on one occasion and $j$ on another as $S_{i,j}$, then $S_{i,j}$ is given by:

$$S_{i,j} = \sum_l V_l s_{i,j,l} + \sum_k W_k s_{i,j,k} = \sum_l V_l p_{i,l} p_{j,l} + \sum_k W_k p_{i,k} p_{j,k}$$

for $j$ not equal to $i$ and

$$S_{i,i} = V_i + \sum_k W_k (p_{i,k})^2.$$

The latent class procedure estimates the proportion of consumers in each brand loyal segment ($V_l$), the proportion of customers in each switching segment ($W_k$), and the vector of brand choice probabilities for each switching segment ($p_{i,k}$, $i = 1, \ldots, n$). This is accomplished through the decomposition of the brand switching matrix $S = (S_{i,j})$.
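As a rough illustration of the forward direction of this decomposition, the sketch below builds the theoretical switching matrix from given segment sizes and switching-segment brand shares. The function name and the numeric values are hypothetical and are not estimates from the paper; the latent class estimation itself (recovering the parameters from an observed matrix) is not shown.

```python
def switching_matrix(V, W, P):
    """Theoretical brand switching matrix S under the loyal/switching model.

    V: loyal-segment proportions, one per brand (length n)
    W: switching-segment proportions (length m)
    P: P[k][i] = probability that switching segment k chooses brand i
    """
    n = len(V)
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Switching segments contribute W_k * p_ik * p_jk to every cell.
            total = sum(w * p[i] * p[j] for w, p in zip(W, P))
            # Loyal households only ever appear on the diagonal.
            if i == j:
                total += V[i]
            S[i][j] = total
    return S


# Hypothetical two-brand market: two loyal segments and one switching segment.
V = [0.2, 0.1]
W = [0.7]
P = [[0.6, 0.4]]
S = switching_matrix(V, W, P)
```

Because the segment proportions sum to one and each switching segment's brand shares sum to one, the entries of the resulting matrix also sum to one.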

In order to assign each household to a given segment, a likelihood function was used to determine the posterior (Bayesian) probability that a given household belongs to a given segment. The household is assigned to that segment for which this probability is maximized.

The likelihood function used the purchase frequency vector for each household in addition to the estimates from the latent class procedure. Let $r_i$ (for $i = 1, \ldots, n$) denote the number of purchases of brand $i$ for a given household. The likelihood of $(r_1, \ldots, r_n)$ given segment $h$ ($h = 1, \ldots, n+m$) is given by the formula:

$$L_h = \frac{\left(\sum_i r_i\right)!}{\prod_i (r_i)!} \; p_{1,h}^{r_1} \, p_{2,h}^{r_2} \cdots p_{n,h}^{r_n}.$$

The prior probability of belonging to segment $h$ is $V_h$ for $h = 1, \ldots, n$ and $W_{h-n}$ for $h = n+1, \ldots, n+m$, and will be denoted by $B_h$. The posterior probability of belonging to segment $h$ is given by Bayes' Rule:

$$P_h = \frac{B_h L_h}{\sum_j B_j L_j}, \quad j = 1, \ldots, n+m.$$
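The assignment rule can be sketched as follows. This is a minimal illustration rather than the procedure as actually coded for the study: the function names and the segment parameters are hypothetical, and a loyal segment is represented by a probability vector with a 1 for its brand and 0 elsewhere.

```python
from math import factorial, prod


def multinomial_likelihood(r, p):
    """Likelihood of purchase counts r under brand probabilities p."""
    coef = factorial(sum(r))
    for r_i in r:
        coef //= factorial(r_i)  # exact: each partial quotient is an integer
    # Python evaluates 0.0 ** 0 as 1.0, so brands a loyal household never
    # buys do not zero out the likelihood of its own loyal segment.
    return coef * prod(p_i ** r_i for p_i, r_i in zip(p, r))


def posterior(r, priors, seg_probs):
    """Posterior segment membership probabilities via Bayes' rule."""
    weighted = [b * multinomial_likelihood(r, p)
                for b, p in zip(priors, seg_probs)]
    total = sum(weighted)
    if total == 0:
        raise ValueError("purchase pattern has zero likelihood in every segment")
    return [w / total for w in weighted]


def assign_segment(r, priors, seg_probs):
    """Index of the segment with the highest posterior probability."""
    post = posterior(r, priors, seg_probs)
    return max(range(len(post)), key=post.__getitem__)


# Hypothetical two-brand market: loyal to brand 1, loyal to brand 2, one switcher.
priors = [0.2, 0.1, 0.7]
seg_probs = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.4]]

loyal_household = assign_segment([8, 0], priors, seg_probs)
mixed_household = assign_segment([5, 3], priors, seg_probs)
```

In this illustration a household with eight purchases of one brand is assigned to that brand's loyal segment, while any household that buys both brands can only belong to the switching segment.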

From this market segmentation procedure, those consumers with the same pre-entry purchase patterns will be identified. This segmentation will be used to test whether the introduction of a new brand changes the choice probabilities of those consumers who do not buy the new product.


The empirical tests of the regularity hypothesis reported here will try to replicate the violations observed in the lab. The situation studied is the change in choice probabilities with the introduction of a new brand into an existing market. The study will consider the choice probabilities of only those consumers who do not purchase the new brand. Since the empirical test compares the pre-entry and post-entry purchase probabilities, only those consumers who do not try the new brand can be assumed to have a choice process that is stationary and zero order.

This restriction of the sample is similar to those experiments in which none of the subjects chose the new, added alternative. There were 9 such experiments in Huber and Puto (1983).

In the previous laboratory experiments, the added alternative was inferior in some respect to at least one member of the original choice set. In the data used in this paper, however, the new alternative was sampled (and repurchased) by a large number of consumers in the two markets under study. Since many consumers felt that the new brand was sufficiently similar to existing brands to warrant trial, it will be assumed that the perception of similarity of the new brand to existing offerings is widely shared. The new brand is therefore assumed to be similar to existing offerings and not an inferior substitute. In that case, there may be effects of attraction or substitution (which caused the violations of regularity in the lab experiments). Although there is no way to determine if the non-triers perceived the new brand as being similar, if these effects are as strong as demonstrated in the lab, violations of regularity should occur in the purchase patterns of those who did not try the new brand.

The database used is the IRI Academic Research Database of coffee purchases. The markets under study are the retail ground coffee markets of Pittsfield, MA and Marion, IN. The new product entry being studied is the entry of Master Blend, a General Foods brand, in March, 1981. After the brand was introduced, it was available in all grinds in all stores tracked by the IRI data.

The study is limited to ground caffeinated coffee purchases since it has been found that regular ground coffee and instant or decaffeinated coffee are used in different usage situations rather than being substitutes (Urban, Johnson and Hauser, 1984).

Those brand-grind combinations which accounted for at least 1% of the purchases in the pre-entry period were retained for analysis. The brand-grind combinations used in the study are listed in Table 1. The eight brands retained accounted for 94.8% of the pre-entry purchases in Pittsfield. The six brands retained for the Marion market accounted for 95.7% of the pre-entry purchases. It is clear that these brands account for most of the purchases in these markets.

In addition to choosing which brands to include in the study, which households to include must also be decided. The households included in the study have purchased ground coffee at least 8 times before and after the entry of the new brand.



Lighter buyers were excluded since, due to the small number of purchases observed, it is very difficult to accurately measure their purchase probabilities. Including the light buyers might lead to their misassignment to a segment simply because not enough purchases were observed.

Segmentation Results

In these markets, there are a large number of households who purchase both regular and decaffeinated coffee, presumably for different purposes. Those households which purchase both types of coffee will be excluded since changes in the decaffeinated brands (for example, a price hike) might affect purchasing of regular coffee. Therefore, only households loyal to caffeinated brands will be studied.

To determine which households were loyal to which brands, the segmentation procedure described above was applied to all eligible households (those with at least 8 pre-entry purchases). In the interest of brevity of presentation, only the final segmentation results will be presented (for the estimation details, see Grover and Srinivasan, 1987).

In order to obtain these results, the data were split by markets into two subsamples for model identification and then estimation. The chi-square test for the assumption of a zero-order, stationary choice process performed on the brand switching matrices showed no significant deviations. The m+n solution was identified for m = 1, ..., 5 switching segments. It was determined that the four switching segment solution for Pittsfield and the two switching segment solution for Marion were the best fitting (based on the recommended $R^2$ measure) without overfitting the data. The final model parameters were estimated from the second subsample. These estimation results are in Table 2.

Tests for Violations of Regularity

Each consumer was assigned to a single segment based on his pre-entry purchase behavior (total segment size in Table 3). Those consumers who did not purchase the new product at least once in the first year of availability were the subjects of the tests for regularity violations (sample size in Table 3).

The purchase frequency vectors of the consumers in each segment were converted to purchase probabilities by dividing each element of the vector by the total number of purchases made by the consumer. The segment choice probability vector was then computed by aggregating the individual probabilities by segment for both the pre-entry and post-entry time periods. This yields an unweighted "average" choice probability vector.

Without the conversion to probabilities at the individual level, those consumers with a large number of purchases would have a great effect on the segment purchase probability vector. A small change in the buying patterns of a single, high volume user could be magnified, leading to a possibly erroneous conclusion that regularity had been violated.
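The difference between the unweighted average and a pooled share can be seen in a small sketch. The function names and purchase counts below are hypothetical and are used only to illustrate the computation described above.

```python
def segment_choice_vector(households):
    """Unweighted average of household-level purchase shares.

    households: list of purchase frequency vectors, one per household,
    each with at least one purchase.
    """
    n_brands = len(households[0])
    # Convert each household's counts to shares before averaging, so that
    # every household carries the same weight regardless of volume.
    shares = [[count / sum(h) for count in h] for h in households]
    return [sum(s[i] for s in shares) / len(shares) for i in range(n_brands)]


def pooled_choice_vector(households):
    """Volume-weighted alternative: pool all purchases, then take shares."""
    n_brands = len(households[0])
    totals = [sum(h[i] for h in households) for i in range(n_brands)]
    return [t / sum(totals) for t in totals]


# Hypothetical segment with one heavy buyer (40 purchases) and two lighter ones.
households = [[8, 2], [1, 1], [30, 10]]
unweighted = segment_choice_vector(households)
pooled = pooled_choice_vector(households)
```

Here the pooled share of the first brand is 0.75, pulled toward the heavy buyer, while the unweighted average is about 0.68.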





The Z-statistic requires that the number of observations of a choice from a set be supplied. Since the segment choice probability vector is based on the "average" member of the segment, the average number of purchases by a member of a segment will be used as the number of observations.

To determine if regularity has been violated, the pre and post entry purchase vectors will be compared. If there is any increase in the probability of choosing some brand in the post-entry observations, the statistical significance of this increase will be determined using a Z-test for the equality of proportions.

The changes in the post-entry purchasing patterns may be due to factors other than the entry of a new brand and these factors may even be unobservable. However, the procedures used to choose the brands and households for the study attempted to reduce these effects (like spurious changes due to sampling error) and are the same ones used by practitioners using these analysis procedures to study actual market behavior.



Statistical Notation

The hypotheses are:

$H_0: P(x \mid A) \geq P(x \mid B)$ and $H_1: P(x \mid A) < P(x \mid B)$.

If $N_1$, $N_2$ are the number of observations and $R_1$, $R_2$ are the number of times that $x$ was chosen, then define:

$$p^* = \frac{R_1 + R_2}{N_1 + N_2}.$$

The Z-statistic is:

$$Z = \frac{(R_1/N_1) - (R_2/N_2)}{\sqrt{p^*(1-p^*)\left(1/N_1 + 1/N_2\right)}}.$$
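A minimal sketch of this test follows, assuming the standard pooled two-proportion form of the Z-statistic and a one-sided p-value taken from the normal distribution. The function name and the example counts are hypothetical; the first sample is taken to be the pre-entry period and the second the post-entry period.

```python
from math import erf, sqrt


def regularity_z_test(r1, n1, r2, n2):
    """One-sided pooled Z-test of H0: P(x|A) >= P(x|B).

    r1, n1: times x was chosen and number of observations before entry
    r2, n2: the same quantities after entry
    Returns (z, p_value); a large negative z counts against regularity.
    """
    p_pool = (r1 + r2) / (n1 + n2)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("pooled proportion is 0 or 1; Z is undefined")
    z = (r1 / n1 - r2 / n2) / se
    # Lower-tail probability P(Z <= z) under the standard normal.
    p_value = 0.5 * (1 + erf(z / sqrt(2)))
    return z, p_value


# Hypothetical counts: share of brand x rises from 30% to 45% after entry.
z, p_value = regularity_z_test(30, 100, 45, 100)
```

With these illustrative counts the statistic is negative and the one-sided p-value falls below 0.05, which would be read as a significant violation. With the small numbers of observations used at the segment level, the same change in share would be much less likely to reach significance.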

Previous work has used McNemar's test, Fisher's exact test or the chi-square statistic. These are all proper tests for violations of the assumption of a zero-order or stationary choice process since they take into account all choice probabilities. However, the concern in this paper is only for violations of regularity. In addition, since the data have not been collected in the lab, there is no way to ensure that the assumptions of cell size can be fulfilled. This leads to using the Z-statistic to test for violations of regularity.


There were three and four segments in the Pittsfield and Marion markets respectively which had a sufficient number of consumers who did not try the new product. The Z-tests reported in Table 4 do not indicate any significant violations of the regularity hypothesis. Although the largest increases in choice probabilities for each segment averaged over 6%, none of them were statistically significant.


It would be very surprising if there were a large number of significant violations found in the data presented here. However, the situation observed in this study is as close to the previous lab experiments as one could hope for in the real world. How could the results be so different from the lab studies? The real question is: Are the results different?

Recall that Huber, Payne and Puto (1982) and Huber and Puto (1983) reported a large number of violations of regularity but that only a few of these violations were significant. Recall further that the work of Ratneshwar, Shocker, and Stewart (1987) showed that the similarity effects (which lead to violations of regularity) may be mitigated by product familiarity and meaningfulness of the lab stimuli. Since at least 8 choices of coffee by each household were observed in the supermarket, it is reasonable to assume that the consumer was familiar with the product (ground coffee).

The progression of work from the original lab studies to the Ratneshwar, Shocker, and Stewart (1987) paper to this study shows a definite reduction in the incidence and severity of violations of the regularity assumption. It is not an accident that each of the succeeding studies attempted to be more realistic. It is hoped that the corresponding drop in regularity violations is also not an accident.

The study does not have the dramatic implications for marketers and scholars that it would if the results were different. The validation of the results of Ryans (1974) is still important since his study involved a durable good and the product preference measure was a ranking. The product in this study is a grocery item whose implied preference is indicated by a probability. This news should give some comfort to those who use choice models which require the regularity assumption, like the ideal point, attraction models and the others mentioned above.

There are obviously a large number of limitations to this study. The lack of regularity violations in this entry episode does not mean that they cannot or will not occur in other situations. There may be some question about the aggregation of consumers into segments since previous work relied exclusively on pooled or individual level data. Furthermore, there may have been some higher order or non-stationary choice processes at work in this market. Even with these potential problems (and all of the others not mentioned), the results do not seem to be exceptionally out of line with previous research.


Batsell, Richard and John Polking (1985), "A New Class of Market Share Models," Marketing Science, 4 (Summer), 177-198.

Bell, D., Ralph L. Keeney and John D. C. Little (1975), "A Market Share Theorem," Journal of Marketing Research, 12 (May), 136-141.

Currim, I.S. (1982), "Predictive Testing of Consumer Choice Models Not Subject to Independence of Irrelevant Alternatives," Journal of Marketing Research, 19 (May), 208-222.

Grover, Rajiv and V. Srinivasan (1987), "A Simultaneous Approach to Market Segmentation and Market Structuring," Journal of Marketing Research, 24 (May), 139-153.

Huber, Joel, John W. Payne, and Christopher Puto (1982), "Adding Asymmetrically Dominated Alternatives: Violations of Regularity and the Similarity Hypothesis," Journal of Consumer Research, 9 (June), 90-98.

Huber, Joel and Christopher Puto (1983), "Market Boundaries and Product Choice: Illustrating Attraction and Substitution Effects," Journal of Consumer Research, 10 (June), 31-44.

Luce, R. Duncan (1959), Individual Choice Behavior, New York: John Wiley.

Luce, R. Duncan (1977), "The Choice Axiom After Twenty Years," Journal of Mathematical Psychology, 15 (2), 215-233.

Ratneshwar, Srinivasan, Allan D. Shocker, and David W. Stewart (1987), "Toward Understanding the Attraction Effect: The Implications of Product Stimulus Meaningfulness and Familiarity," Journal of Consumer Research, 13 (March), 520-531.

Ryans, Adrian B. (1974), "Estimating Consumer Preferences for a New Durable Brand in an Established Product Class," Journal of Marketing Research, 11 (November), 434-443.

Shocker, Allan D. and V. Srinivasan (1974), "A Consumer-Based Methodology for the Identification of New Product Ideas," Management Science, 20 (February), 921-937.

Tversky, Amos (1972), "Elimination by Aspects: A Theory of Choice," Psychological Review, 86 (November), 542-593.

Urban, Glen, Philip L. Johnson, and John Hauser (1984), "Testing Competitive Market Structures," Marketing Science, 3 (2), 83-112.