Testing the Stability of Market Segmentation Analysis

ABSTRACT - The validity of segments formed by search procedures is of major concern to marketers. Currently split-half analysis is the most recommended method, but this requires large samples. This paper presents an alternative validation method which does not require large samples. The procedure is illustrated using a segmentation by AID analysis.


Tony Schellinck and Ian Fenwick (1981), "Testing the Stability of Market Segmentation Analysis", in NA - Advances in Consumer Research Volume 08, eds. Kent B. Monroe, Ann Arbor, MI : Association for Consumer Research, Pages: 723-727.

Advances in Consumer Research Volume 8, 1981      Pages 723-727


Tony Schellinck, Dalhousie University

Ian Fenwick, Northeastern University

[Part of this research was supported by a grant from the Dalhousie School of Business Administration and the assistance of Marie Kalbfleisch.]

[Tony Schellinck is Assistant Professor of Marketing at Dalhousie University; Ian Fenwick is Visiting Associate Professor of Marketing at Northeastern University.]




Market segmentation is well entrenched as an integral part of marketing theory, but market segmentation analysis still faces many problems. As Wind (1978) points out, "the validity of segmentation research is by far the most crucial question facing management." Do the segments discovered in a segmentation study exist in the population? Is the estimated segment size accurate? And how accurate are the estimated segment responses to the firm's marketing actions?

These questions are particularly important when the researcher uses a clustering-based segmentation method. Powerful search techniques scan the data and will almost certainly identify a structure. The authors have yet to find a real data set that did not yield distinct, interpretable clusters from at least one clustering algorithm. Unfortunately, the identification of such "segments" in the data is no guarantee of their existence on the ground. The susceptibility of search procedures to sampling error is well known (Doyle and Fenwick 1975), and there is a real danger that the segmentations produced may be highly unstable. Typically, consumer researchers have few a priori hypotheses about their segments and consequently can rationalize almost any segmentation, spurious or not.

Validating segment structures is recognized as a particularly difficult task (Anderberg 1973; Frank and Massey 1975; Hartigan 1975; Sherman and Sheth 1975). Replication is the most frequently recommended validation method, usually in the form of a split-half or hold-out analysis (Frank and Massey 1975; Mehrotra and Wells 1977). These methods analyze one half of the data and use the other half to test the predictions of the analysis. In the case of segmentations, groupings derived from one half of the sample may be compared with those derived from the other.

Split-half analysis has two major problems. First, for a viable split-half test, half the data set has to be sufficient for the analysis technique being used. Consequently the researcher has to collect a sample twice the normal size. In practical terms this often translates into either no attempt at validation or else two analyses both based on inadequate samples.

Second, split-half testing will identify discrepancies between the analyses applied to the different halves but will offer no clue as to which is the better representation of reality. If the split-halves fail to agree in some respect (e.g., factor structures differ, segmentations change, or predictive power is low), it is usually impossible to tell whether the differences are due to idiosyncrasies of the particular split used or are genuine indications of an unstable analysis. Had the sample been split in a different way, different variations in the analyses might have been observed and different results labeled unstable. Indeed, researchers' reactions to split-half disagreement vary. While most applications simply drop all parts of the analysis that are not identical in both split-halves, some actually retain all results that are of importance in either split-half (Roberts and Wortzel 1979).

It would be of much greater practical use to have a validation method that was based on more than two sample variations. This could test stability and offer a majority opinion of what the true structure was like. The method presented, here called "N + 1 Sample Analysis", allows just such an estimation of stability without requiring additional data collection.


N + 1 Sample Analysis is based on the jackknife procedure (Tukey 1958). The jackknife requires that the available sample be split, at random, into N equal-size subsets. The analysis to be jackknifed is carried out on the whole sample and then repeated, omitting each subset of the sample in turn. Each of these analyses on the sample with a subset omitted yields an estimate $p_i$, $i = 1, \dots, N$. The jackknifed estimator, $J(p)$, is a weighted difference between the result for the whole-sample analysis and the mean of these estimates:

$$J(p) = N\,p_{\text{all}} - (N-1)\,\bar{p} \tag{1}$$

where $p_{\text{all}}$ is the estimate derived from the whole-sample analysis and $\bar{p}$ is the mean of the $p_i$ values. Jackknifing will often reduce bias, and it enables significance tests to be performed (for a further discussion of the jackknife and its use in marketing, see Fenwick 1979).
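As a minimal sketch of Eq. (1), the Python function below shuffles the sample, cuts it into N equal-size subsets, reruns an arbitrary estimator once per omitted subset, and combines the results. The function name `jackknife`, the `seed` parameter, and the requirement that the sample size divide evenly by N are illustrative assumptions, not details given in the paper.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence


def jackknife(data: Sequence[float],
              estimator: Callable[[List[float]], float],
              n_subsets: int,
              seed: int = 0) -> float:
    """Grouped jackknife estimate, Eq. (1): J(p) = N * p_all - (N - 1) * p_bar.

    The sample is shuffled and cut into `n_subsets` equal-size subsets; the
    estimator is rerun once per subset with that subset left out.
    """
    if len(data) % n_subsets != 0:
        raise ValueError("sample size must divide evenly into n_subsets")
    order = list(range(len(data)))
    random.Random(seed).shuffle(order)              # random assignment to subsets
    size = len(data) // n_subsets
    subsets = [set(order[k * size:(k + 1) * size]) for k in range(n_subsets)]

    p_all = estimator(list(data))                   # whole-sample estimate
    p_i = [estimator([x for j, x in enumerate(data) if j not in omitted])
           for omitted in subsets]                  # one estimate per omitted subset
    p_bar = mean(p_i)
    return n_subsets * p_all - (n_subsets - 1) * p_bar
```

For the sample mean the correction leaves the estimate unchanged. With N equal to the sample size, applied to the variance computed with divisor n, it recovers the familiar divisor n - 1 variance, a standard illustration of the jackknife's bias reduction.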

As it stands, the jackknife procedure cannot be applied to testing segment stability: it is designed for parameter estimation rather than for grouping methods. However, the principle of N + 1 separate analyses (one on the whole sample and N on samples with a subset omitted) is distinctly relevant. The N analyses of samples with a subset omitted can be used to validate the grouping formed in the whole-sample analysis. In the example presented here, N + 1 Sample Analysis is used to validate a segmentation produced using AID.


AID (Automatic Interaction Detection), developed by Morgan and Sonquist (1963), is frequently used as a segmentation method (Assael and Roscoe 1976; Sheth and Roscoe 1972; Gensch 1978; Assael 1970). AID employs a sequential dichotomization technique to partition the sample into progressively smaller groups so as to maximize between-group differences. The analysis forms a tree diagram whose end groups are used to define market segments.
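The quantity maximized at each dichotomization can be written explicitly, here in our own notation. For a group of $n$ cases split into two subgroups of sizes $n_1$ and $n_2$ with dependent-variable means $\bar{y}_1$ and $\bar{y}_2$ (overall mean $\bar{y}$), the between-group sum of squares is

$$\mathrm{BSS} = n_1(\bar{y}_1-\bar{y})^2 + n_2(\bar{y}_2-\bar{y})^2 = \frac{n_1 n_2}{n}\,(\bar{y}_1-\bar{y}_2)^2, \qquad n = n_1 + n_2$$

Because the group's total sum of squares is fixed, maximizing BSS over candidate splits is equivalent to minimizing the pooled within-group sum of squares. For example, splitting 100 cases into two groups of 50 with means 0 and 10 gives $\mathrm{BSS} = (50 \times 50 / 100) \times 10^2 = 2500$.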

AID is notoriously unstable. Its developers cautioned against its use on samples smaller than 1000 and strongly recommended validation. Validation has in fact rarely been attempted, and where it has been, AID has not held up well (Doyle and Fenwick 1975). Nonetheless, AID is very appealing as a segmentation tool. It can handle ratio, interval, ordinal, or nominal data, and it produces an analysis that is easily explained to management and therefore has a good chance of implementation.

N + 1 Sample Analysis allows AID to be validated without a large sample. Moreover, the method enables individual segments to be assessed, so stable groupings may be retained and unstable ones discarded. Similarly, outliers (sample members that do not fit within the segmentation) may be identified and either discarded or set aside for more detailed analysis.


The example presented concerns a segmentation analysis of data collected from telephone interviews with 200 households. With conventional methods, this sample size would probably rule out any split-half validation and would certainly prohibit the use of AID. Yet samples of this size are by no means unusual, particularly when the interview is detailed and the incidence of qualified respondents is low. The dependent variable measures the adoption of a service. The predictors used here are 20 general AID items, found in pretesting to be related to the service in question.

The N + 1 Sample Analysis involves three stages. First, AID is applied to the whole sample. In our case, the "standard" stopping criteria (split eligibility .01, split reducibility .01, minimum group size for splitting 40) produced eight splits, forming 9 end groups (segments). The AID tree from this analysis appears in Figure 1.
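To make the three stopping criteria concrete, here is a simplified AID sketch in Python. It is not Morgan and Sonquist's program: predictors are assumed to be integer-coded ordered categories, and only monotonic `code <= cut` splits are tried (AID can also regroup unordered categories). The criteria follow our reading of the parameters above: a group is eligible for splitting if it has at least `min_size` cases and a within-group sum of squares of at least `eligibility` times the whole-sample total sum of squares, and a split is accepted only if its between-group sum of squares is at least `reducibility` times that total. The function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Rows = List[int]


def best_split(X: Sequence[Sequence[int]], y: Sequence[float],
               rows: Rows) -> Tuple[float, Optional[Rows], Optional[Rows]]:
    """Best 'code <= cut' binary split of `rows`, by between-group sum of squares."""
    n = len(rows)
    best: Tuple[float, Optional[Rows], Optional[Rows]] = (0.0, None, None)
    for v in range(len(X[rows[0]])):             # every predictor
        codes = sorted({X[r][v] for r in rows})
        for cut in codes[:-1]:                   # every monotonic cut point
            left = [r for r in rows if X[r][v] <= cut]
            right = [r for r in rows if X[r][v] > cut]
            m1 = sum(y[r] for r in left) / len(left)
            m2 = sum(y[r] for r in right) / len(right)
            bss = len(left) * len(right) / n * (m1 - m2) ** 2
            if bss > best[0]:
                best = (bss, left, right)
    return best


def aid_segments(X: Sequence[Sequence[int]], y: Sequence[float],
                 eligibility: float = 0.01, reducibility: float = 0.01,
                 min_size: int = 40) -> List[int]:
    """Grow a simplified AID tree and return an end-group label for each case."""
    mean_y = sum(y) / len(y)
    tss = sum((v - mean_y) ** 2 for v in y)      # whole-sample total sum of squares
    pending: List[Rows] = [list(range(len(y)))]
    end_groups: List[Rows] = []
    while pending:
        rows = pending.pop()
        m = sum(y[r] for r in rows) / len(rows)
        ss = sum((y[r] - m) ** 2 for r in rows)
        # Split eligibility: enough cases and enough variation left to explain.
        if len(rows) >= min_size and ss >= eligibility * tss:
            bss, left, right = best_split(X, y, rows)
            # Split reducibility: the split must explain enough of the total SS.
            if left is not None and right is not None and bss >= reducibility * tss:
                pending += [left, right]
                continue
        end_groups.append(rows)
    labels = [0] * len(y)
    for k, rows in enumerate(end_groups):
        for r in rows:
            labels[r] = k
    return labels
```

Because every criterion is measured against the whole-sample total sum of squares, the order in which eligible groups are split does not change the final end groups in this sketch. No cap on the number of splits is imposed here.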



Second, the sample is split, at random, into equal-sized subsets and the AID analysis is repeated, omitting each subset in turn. Although they use identical stopping criteria, these analyses are likely to produce rather dissimilar results. In the example presented here, 10 subsets of 20 cases each were formed, so 10 further AID analyses were performed, each omitting 20 cases. Both the configurations of the AID trees and the predictors entering the analyses varied considerably over the different runs.
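The second stage is essentially bookkeeping, sketched below under a few assumptions: the sample size divides evenly into the subsets (200 cases into 10 subsets of 20 here), cases are identified by integer index, and `segment` is a hypothetical callback that runs the chosen segmentation procedure (for instance an AID routine with the whole-sample stopping criteria) on the listed cases and returns one label per case.

```python
import random
from typing import Callable, Dict, List


def subset_omitted_runs(n_cases: int, n_subsets: int,
                        segment: Callable[[List[int]], List[int]],
                        seed: int = 0) -> List[Dict[int, int]]:
    """Rerun a segmentation once per random subset, leaving that subset out.

    `segment` is a hypothetical callback: given the case indices to analyse,
    it returns one segment label per index. Each run is returned as a
    dict case -> label; the omitted cases are simply absent from it.
    """
    if n_cases % n_subsets:
        raise ValueError("cases must divide into equal-size subsets")
    order = list(range(n_cases))
    random.Random(seed).shuffle(order)           # random assignment to subsets
    size = n_cases // n_subsets
    runs = []
    for k in range(n_subsets):
        omitted = set(order[k * size:(k + 1) * size])
        kept = [i for i in range(n_cases) if i not in omitted]
        runs.append(dict(zip(kept, segment(kept))))
    return runs
```

With 200 cases and 10 subsets, each run covers 180 cases and every case appears in exactly 9 of the 10 runs.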

The AID analysis appears to be affected by small sample variations. However, changes in the shape of the tree and changes in the predictors used are not by themselves evidence of unstable segments. If there is multicollinearity it is quite possible for the membership of the end-groups (segments) to remain unchanged although the variables defining them alter. So different AID trees could in fact be grouping together the same individuals. If we are to test the stability of segments we must look at segments, not the variables defining them.

The final stage of the N + 1 Sample Analysis is to compare the segments found in the whole-sample analysis with those obtained from each of the runs with a data subset omitted. Before any meaningful segment comparison can be made, it is necessary to align the segments, i.e., to make sure we are comparing the most similar groups. Accordingly, for our example, the 9 segments derived in each of the runs with a data subset omitted were arbitrarily lettered. Ten tables can now be formed comparing each of the subset-omitted analyses with the whole-sample analysis. An example of one of these tables is shown in Table 1.
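Each comparison table can be computed directly from the two sets of labels. The minimal sketch below assumes labels are stored as dictionaries from case index to segment label; it expresses each whole-sample segment's membership as percentages over the segments of one subset-omitted run, skipping cases omitted from that run. The function name is illustrative.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable


def row_percentages(whole: Dict[int, Hashable],
                    run: Dict[int, Hashable]) -> Dict[Hashable, Dict[Hashable, float]]:
    """Spread of each whole-sample segment over one run's segments, in percent.

    `whole` and `run` map case index -> segment label; cases missing from
    `run` (its omitted subset) are skipped.
    """
    counts: Dict[Hashable, Counter] = defaultdict(Counter)
    for case, seg in whole.items():
        if case in run:
            counts[seg][run[case]] += 1
    table = {}
    for seg, row in counts.items():
        n = sum(row.values())                    # members of `seg` present in this run
        table[seg] = {col: 100.0 * c / n for col, c in row.items()}
    return table
```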



For each of the segments formed in the whole sample analysis this table shows how that segment's membership was distributed over the segments formed in the first subset omitted analysis. For example, all the members placed in segment 1 in the whole sample analysis were in segment A of this particular subset omitted analysis. The members of segment 4 were considerably more dispersed: 46% turned up in segment D, 15% in segment E and 39% in segment E. The columns of the table are now re-ordered so as to maximize the diagonal elements, and the columns numbered. This procedure is repeated for each of the ten tables. This ensures that segments with the same number are indeed the most similar, i.e., that we are comparing like with like.
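The paper does not say how the columns were re-ordered. One exact way, offered here as an assumption rather than as the authors' method, is to treat the re-ordering as an assignment problem: choose the column permutation that maximizes the diagonal sum of a square table of percentages (or counts). With only 9 segments an exhaustive dynamic program over subsets of columns is cheap; for larger tables `scipy.optimize.linear_sum_assignment(..., maximize=True)` solves the same problem. A run that produces a different number of segments would first need its table padded with zero rows or columns.

```python
from typing import List, Sequence


def align_columns(table: Sequence[Sequence[float]]) -> List[int]:
    """Column order that maximizes the diagonal sum of a square table.

    Exact dynamic programming over subsets of columns, O(k^2 * 2^k), which is
    fast for the handful of segments an AID run produces.
    Returns perm with perm[i] = the column to place in position i.
    """
    k = len(table)
    if any(len(row) != k for row in table):
        raise ValueError("pad the table to a square first")
    neg = float("-inf")
    best = [neg] * (1 << k)    # best[mask]: max diagonal sum using the columns in mask
    choice = [-1] * (1 << k)   # column assigned to the last row when reaching mask
    best[0] = 0.0
    for mask in range(1 << k):
        if best[mask] == neg:
            continue
        i = bin(mask).count("1")                 # next row to assign
        if i == k:
            continue
        for j in range(k):
            if not mask & (1 << j):
                new = mask | (1 << j)
                val = best[mask] + table[i][j]
                if val > best[new]:
                    best[new], choice[new] = val, j
    perm, mask = [], (1 << k) - 1
    while mask:                                  # backtrack from the full assignment
        j = choice[mask]
        perm.append(j)
        mask ^= 1 << j
    return perm[::-1]
```

For instance, `align_columns([[0, 100], [80, 20]])` returns `[1, 0]`: the second column moves into first position, giving a diagonal of 100 + 80.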

It is now possible to evaluate the segmentation. Three measures can be calculated. Shared membership, or common core, estimates the proportion of a segment that would still be placed together were the analysis repeated: if the original segment is stable, its members should consistently show up in a common segment in each of the repeated analyses. Distinctiveness indicates the size of the shared membership relative to the corresponding segment as a whole: members of the original segment should be the only members of that segment in a subset-omitted analysis. Finally, the number of outliers counts individual cases that are not well modeled by the segmentation.
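The first two measures can be sketched for a single run as follows, assuming the alignment step has already produced a mapping from each whole-sample segment to its matched subset-omitted segment. Shared membership is computed over the original segment's members present in that run; distinctiveness divides the same common core by the size of the matched segment. Both definitions are our reading of the text, and the function name is hypothetical. Repeating the calculation for each run yields per-run percentages of the kind reported in Tables 2 and 3.

```python
from typing import Dict, Hashable, Tuple


def shared_and_distinct(whole: Dict[int, Hashable], run: Dict[int, Hashable],
                        match: Dict[Hashable, Hashable]) -> Dict[Hashable, Tuple[float, float]]:
    """(shared membership %, distinctiveness %) of each whole-sample segment in one run.

    `match` maps each whole-sample segment to its aligned segment in the run.
    Only cases present in the run are counted.
    """
    out = {}
    for seg, matched in match.items():
        members = [c for c, s in whole.items() if s == seg and c in run]
        core = [c for c in members if run[c] == matched]       # common core
        matched_size = sum(1 for s in run.values() if s == matched)
        shared = 100.0 * len(core) / len(members) if members else 0.0
        distinct = 100.0 * len(core) / matched_size if matched_size else 0.0
        out[seg] = (shared, distinct)
    return out
```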

Table 2 presents shared membership percentages for this set of data. Segment 1 is perfectly stable: 100% of its members are always classified together. Segment 2 is close to perfect stability, with an average shared membership of over 90%. Segments 6 and 7 are rather unstable: on average they retain only slightly more than half of their original members.

Shared membership is not the only criterion, however. In particular, shared membership could be high simply because a small whole-sample segment is being compared with a large subset-omitted segment. We also require that the segment be distinctive or unique, i.e., the shared membership should make up a major part of the whole segment. Table 3 shows what percentage of each segment the shared membership represents.

Notice that segment 1 is unique: no other cases are ever included with those from segment 1. Segment 3 is rather less distinct. For example, in the 7th analysis, although shared membership was 92% (Table 2), this membership made up only 55% of the total segment. Segments 4, 6, 7, and 9 are particularly indistinct, with a large proportion of their members drifting in and out.

Finally, low shared membership may be the result of outliers. A few cases in the whole-sample segment may fail to be consistently segmented and so cloud the performance of the segment as a whole. Table 4 presents, for each segment, a frequency distribution showing the number of times each case remains within the common core. Thus, for segment 1 all members are consistently segmented in every run in which they appear (note that the maximum number of subset-omitted analyses for any case is 9, since in one run the case falls in the excluded subset). In contrast, segment 7 has not a single case that is consistently segmented in every run, and 5 of its cases appear in this segment only twice. However, there are no bimodal distributions: segments with poor cases lack good cases as well. So outliers do not appear to be biasing the percentages in Tables 2 and 3. If some segments had displayed a concentration of outliers, these could have been excluded and Tables 2 and 3 recomputed.
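The frequency distribution behind Table 4 can be tallied with a short sketch like the one below, assuming the per-run labels and per-run segment matches are kept as dictionaries (a hypothetical data layout). A case scores one point for each run in which it falls in the segment matched to its whole-sample segment, so with 10 subsets no case can score more than 9.

```python
from collections import Counter
from typing import Dict, Hashable, List


def core_frequency(whole: Dict[int, Hashable],
                   runs: List[Dict[int, Hashable]],
                   matches: List[Dict[Hashable, Hashable]]) -> Dict[Hashable, Counter]:
    """Per segment, a frequency table: times in common core -> number of cases."""
    table: Dict[Hashable, Counter] = {}
    for case, seg in whole.items():
        hits = sum(1 for run, match in zip(runs, matches)
                   if case in run and run[case] == match.get(seg))
        table.setdefault(seg, Counter())[hits] += 1
    return table
```

A segment whose table piles up at both the high and low ends (a bimodal distribution) would point to a few outliers dragging down otherwise stable shared membership.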


The N + 1 Sample Analysis described here has enabled the stability of a segmentation to be assessed despite a small sample size. Overall, 2 segments were found to have unacceptably low shared membership and 2 more to be insufficiently distinct. For the purposes of this analysis, a segment was considered unstable if it failed to retain, on average, 60% of its original members or if the membership retained made up less than 60% of the whole segment. Clearly these cut-off proportions will vary with the purpose of the analysis. Exploratory research may well accept segmentations considerably below those used here; high-investment decisions will probably call for more stringent requirements. At the end of an N + 1 Sample Analysis the researcher has a clear idea of the segments in the data and of their likely stability. An informed decision can be made on which segments to exploit, which to reject, and which require further investigation. Although the example presented here uses the AID algorithm, the method is suitable for any segmenting search procedure. Of course, nothing can make up for any unrepresentativeness in the sample. Like the jackknife itself, N + 1 Sample Analysis merely makes more intensive use of the available data; the relevance of its results depends on the sample. Similarly, unstable segments show only that the sample used is too varied to support that segmentation; they do not show that a larger or more representative sample would fail to yield stable segments. N + 1 Sample Analysis is intended to bolster confidence in segmentation results, not to replace large-sample analysis. Despite all these reservations, this method does offer unique advantages to researchers forced to use small samples, and it provides at least a first step towards full-scale validation.
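The cut-off rule used in this analysis can be applied mechanically. The sketch below assumes each run supplies a (shared membership %, distinctiveness %) pair per segment and, as an assumption, averages distinctiveness over runs as well as shared membership. The 60% defaults mirror this analysis and should be changed to suit the purpose of the study; the function name is hypothetical.

```python
from statistics import mean
from typing import Dict, Hashable, List, Tuple


def classify_segments(per_run: List[Dict[Hashable, Tuple[float, float]]],
                      min_shared: float = 60.0,
                      min_distinct: float = 60.0) -> Dict[Hashable, str]:
    """Label each segment 'stable' or 'unstable' from per-run (shared %, distinct %)."""
    verdict = {}
    for seg in per_run[0]:
        shared = mean(run[seg][0] for run in per_run)     # average shared membership
        distinct = mean(run[seg][1] for run in per_run)   # average distinctiveness
        ok = shared >= min_shared and distinct >= min_distinct
        verdict[seg] = "stable" if ok else "unstable"
    return verdict
```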


Anderberg, Michael R. (1973), Cluster Analysis for Applications, New York: Academic Press.

Assael, Henry (1970), "Segmenting Markets by Group Purchasing Behavior: An Application of the AID Technique," Journal of Marketing Research, 7, 153-8.

Assael, Henry, and Roscoe, A. M., Jr. (1976), "Approaches to Market Segmentation Analysis," Journal of Marketing, 40, 67-76.

Doyle, P. and Fenwick, I. (1975), "Pitfalls of A. I. D. Analysis," Journal of Marketing Research, 12, 408-413.

Fenwick, Ian (1979), "Techniques in Market Measurement: The Jack-Knife," Journal of Marketing Research, 16, 410-414.

Frank, R. E. and Massey, W. F. (1975), "Noise Reduction in Segmentation Research," in J. U. Farley and J. A. Howard (eds.), Control of Error in Market Research Data.

Gensch, Dennis H. (1978), "Image-Measurement Segmentation," Journal of Marketing Research, 15, 384-94.

Hartigan, John A. (1975), Clustering Algorithms, Toronto: John Wiley & Sons.

Mehrotra, Sunil and Wells, William D. (1977), "Psychographics and Buyer Behavior: Theory and Recent Empirical Findings," in Arch Woodside, Jagdish Sheth and Peter Bennett (eds.), Consumer and Industrial Buying Behavior, New York: Elsevier North-Holland, 49-66.

Morgan, James N., and Sonquist, John A. (1963), "Problems in the Analysis of Survey Data, and a Proposal," Journal of the American Statistical Association, 58, 415-34.

Roberts, M. L. and Wortzel, L. H. (1979), "New Life-Style Determinants of Women's Food Shopping Behavior," Journal of Marketing, 43, 28-39.

Sherman, Lawrence and Sheth, Jagdish N. (1975), "Cluster Analysis and Its Applications in Marketing Research," Working Paper, University of Illinois.

Sheth, Jagdish N., and Roscoe, A. Marvin, Jr. (1972), "Demographic Segmentation of Long Distance Behavior: Data Analysis and Inductive Model Building," Faculty Working Paper, College of Commerce and Business Administration, University of Illinois at Urbana-Champaign.

Sonquist, J. A. and Morgan, J. N. (1964), The Detection of Interaction Effects, Ann Arbor: Institute for Social Research, University of Michigan.

Tukey, J. W. (1958), "Bias and Confidence in Not-Quite Large Samples," (abstract) Annals of Mathematical Statistics, 29, 614.

Wind, Yoram (1978), "Issues and Advances in Segmentation Research," Journal of Marketing Research, 15, 317-337.


