Variety Seeking Among Songs Which Vary in Similarity

Michael R. Hagerty, University of California, Berkeley
ABSTRACT - A basic model of variety seeking among musical selections was tested against some refinements. Preference was found to decrease for songs similar to the song just heard, but the increase in predictive accuracy over the simpler model was only 1%. A new method of estimating variety seeking models is suggested and evaluated.
Michael R. Hagerty (1983), "Variety Seeking Among Songs Which Vary in Similarity," in NA - Advances in Consumer Research Volume 10, eds. Richard P. Bagozzi and Alice M. Tybout, Ann Arbor, MI: Association for Consumer Research, Pages: 75-79.

Advances in Consumer Research Volume 10, 1983      Pages 75-79






Individual choice behavior has been difficult to predict well (Bass 1974). Subjects are generally observed to switch brands without any external change in the brands offered. Variety seeking has been one effect suggested to account for this, in which an internal change in the organism occurs when a brand is consumed such that preference for it declines. Jeuland (1978) suggested a basic model of variety seeking, in which prior consumption of a brand is captured by an experience function. The more recently a consumer has experienced the brand, the larger the experience function E(t) at the present time t. Overall preference Pi(t) for a brand i at time t is then

EQUATION     (1)

where Gi is general preference for brand i, k is a constant for each person denoting how important variety seeking is to him, and Ei(t) is the experience function with brand i. Thus, the more experience with a brand, the less preferred it becomes. Ei(t) is in turn defined as

$$E_i(t) = c\,E_i(t-1) + d_i(t) \qquad (2)$$

where c is a forgetting parameter between zero and one, and di(t) is equal to 1 only if brand i was chosen on this trial, otherwise zero. Thus experience on the present trial is composed of some fraction of the experience function on the last trial, plus the experience on this trial di(t). As c grows larger, less is forgotten from trial to trial. Jeuland's model effectively captures some important concepts of variety seeking: (1) as experience increases, preference decreases, and (2) as time from last experience increases, preference recovers. This paper considers two more refinements which may be desirable to include in models of variety seeking.
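Equation 2 is a simple recursion, so a few lines of code make its behavior concrete. The sketch below is only an illustration: because the text of Equation 1 is not reproduced here, it assumes a subtractive form, P_i(t) = G_i - k E_i(t), which is consistent with the description of G_i and k but is an assumption, not necessarily Jeuland's exact specification. The function names and parameter values are hypothetical.

```python
def experience_path(choices, brand, c):
    """Iterate Equation 2, E_i(t) = c * E_i(t-1) + d_i(t), from E_i(0) = 0.

    choices: the brand consumed on each trial, in order.
    d_i(t) is 1 if `brand` was consumed on trial t and 0 otherwise.
    """
    e = 0.0
    path = []
    for consumed in choices:
        d = 1.0 if consumed == brand else 0.0
        e = c * e + d  # keep a fraction c of old experience, add this trial's
        path.append(e)
    return path


def preference(general, k, experience):
    """ASSUMED subtractive form of Equation 1: P_i(t) = G_i - k * E_i(t)."""
    return general - k * experience


if __name__ == "__main__":
    path = experience_path(["S", "S", "F", "T"], "S", c=0.5)
    print(path)  # [1.0, 1.5, 0.75, 0.375]
    print([preference(5.0, 2.0, e) for e in path])  # [3.0, 2.0, 3.5, 4.25]
```

Both properties named above appear in the output: preference for S falls while it is consumed on consecutive trials, then recovers as trials pass without it.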

The first possible refinement concerns similarity effects. If brand i is very similar to brand j, one would expect preference for i to decrease when brand j is experienced. Jeuland's model does not predict this. However, the model could be modified to do so by allowing di to vary between 0 and 1. The experience added on the present trial, di(t), would then be 1 if brand i were consumed, 0 if a completely dissimilar brand were consumed, and would move toward 1 as the consumed brand becomes more similar to i. McAlister (1982) suggests a related model which predicts the similarity effect from how close together two stimuli are rated on important attributes.
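A minimal sketch of the similarity refinement, assuming the similarity weights are supplied as a lookup table. The weight of 0.5 between P and S below is a hypothetical number chosen for illustration, not an estimate from this study.

```python
def experience_with_similarity(choices, brand, c, similarity):
    """Equation 2 with d_i(t) allowed to vary between 0 and 1.

    similarity: dict mapping (brand, consumed) to a weight in [0, 1].
    Consuming `brand` itself adds 1; pairs missing from the dict are treated
    as completely dissimilar and add 0, exactly as in Jeuland's model.
    """
    e = 0.0
    path = []
    for consumed in choices:
        d = 1.0 if consumed == brand else similarity.get((brand, consumed), 0.0)
        e = c * e + d
        path.append(e)
    return path


# Hypothetical weight: hearing S adds half a unit of experience with P.
sim = {("P", "S"): 0.5}
print(experience_with_similarity(["S", "S", "F"], "P", 0.5, sim))  # [0.5, 0.75, 0.375]
```

With an empty table the function reduces to Jeuland's model, under which P accumulates no experience at all while S is played.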

The second potential refinement concerns inhibition effects. The experience function in (2) assumes that forgetting is affected only by the passage of time. However, a period of rest might be less effective in forgetting a brand than an equal period in which other brands are consumed, since the intervening brands could "jumble" preceding memories. This effect exists for memory in verbal learning, where material is forgotten more quickly when similar material is interposed between learning and testing; it is called retroactive inhibition. If this effect exists for variety seeking, the experience function depends not only on the passage of time, as in Equation 2, but also on what is experienced during that time. Further, the inhibition effect sometimes counters the similarity effect. Suppose brand i is consumed, and we wish to restore preference for it as soon as possible. The similarity effect predicts that we should give the consumer brands as dissimilar as possible to i, to prevent further decrease in preference for i. The inhibition effect, on the other hand, predicts that we should give the consumer brands as similar as possible to i, to increase forgetting.
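The paper does not give a functional form for inhibition, so the sketch below is one hypothetical parameterization for illustration only: the forgetting factor shrinks from c to c(1 - h * similarity) when another brand intervenes, where the inhibition strength h and the similarity weights are invented for the example. To isolate inhibition, the similarity effect on d_i(t) is deliberately left out.

```python
def experience_with_inhibition(slots, brand, c, h, similarity):
    """HYPOTHETICAL variant of Equation 2 in which intervening brands speed forgetting.

    slots: one entry per equal time slot, a brand label or None for a silent rest.
    h: hypothetical inhibition strength in [0, 1].
    A rest keeps a fraction c of the old experience, as in Equation 2; consuming
    another brand keeps only c * (1 - h * similarity), so a more similar
    intervening brand erases more. The similarity effect on d_i(t) is omitted.
    """
    e = 0.0
    path = []
    for item in slots:
        if item is None:
            keep, d = c, 0.0
        elif item == brand:
            keep, d = c, 1.0
        else:
            keep, d = c * (1.0 - h * similarity.get((brand, item), 0.0)), 0.0
        e = keep * e + d
        path.append(e)
    return path


sim = {("S", "F"): 0.5}  # hypothetical similarity weight
print(experience_with_inhibition(["S", None], "S", 0.5, 0.5, sim))  # [1.0, 0.5]
print(experience_with_inhibition(["S", "F"], "S", 0.5, 0.5, sim))   # [1.0, 0.375]
```

Lower experience after the intervening song means faster recovery of preference for S than after a rest of equal length; with h = 0 the two cases coincide, as Equation 2 predicts.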

To test these effects, consumers listened to recorded songs, then rated which song they wished to hear next. Thus the different songs constituted "brands," and listening constituted "consumption." Songs were used for several reasons. First, variety seeking among songs seems very high, with few people playing the same song twice in a row. This makes variety seeking effects easy to evoke. Second, "consumption," or the sequence of songs, may be easily and realistically manipulated to increase the range of effects. In contrast, "free choice" studies allow consumers to switch brands when they wish, so the fall in preference is limited in range. Third, preference for sequences of songs is of direct interest to radio programmers.

In summary, this paper tests two hypotheses which are refinements to the basic model of variety seeking:

H1: (Similarity) Preference decreases for songs similar to the song just heard.

H2: (Inhibition) Preference recovers more quickly when similar songs intervene than when a silent break intervenes.

In addition to testing for each of these effects, models of the effects are estimated and the increase in predictive ability is measured. The next section describes a problem in testing hypotheses in variety seeking and proposes a solution. Then the solution is used to test the hypotheses. Finally, some implications are suggested.


In order to measure a subject's preference Ri(t) after different levels of experience, subjects were asked to rate their preferences for songs after they listened to each song in the sequence. Each rating period was called a trial. Under usual model testing procedures, the different models would be fit directly to these ratings. However, in studies of variety seeking this cannot be done, since ratings may not be comparable from trial to trial. Figure 1a contains such an example with hypothetical data. At trial 1, the subject has heard nothing yet and prefers song E over A, which is preferred over B. At trial 2, E has just been played, so its preference is lower, while A and B are not affected. So far we have described the subject's true preferences, yet there are reasons why his rated preferences may look more like Figure 1b. In trial 2, his true preferences have a smaller range than in trial 1, so the subject might expand the range of ratings to cover the whole rating scale. The result is shown in Figure 1b, where A appears to have moved up in preference relative to trial 1, even though this is an artifact of the subject's desire to use the entire rating scale. Helson (1964) has shown that ratings are indeed affected in this way when new "anchors" are presented which change the range of the stimuli.
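The range-expansion artifact can be reproduced with a toy rating rule. The sketch assumes, purely for illustration, that a subject maps each trial's true preferences linearly onto a 0-10 scale so that the best song sits at 10 and the worst at 0; the preference values are hypothetical and only follow the ordering in the example above.

```python
def rescale_to_full_scale(true_prefs, low=0.0, high=10.0):
    """HYPOTHETICAL rating rule: stretch one trial's true preferences to fill the scale."""
    lo, hi = min(true_prefs.values()), max(true_prefs.values())
    return {song: low + (v - lo) / (hi - lo) * (high - low) for song, v in true_prefs.items()}


trial_1 = rescale_to_full_scale({"E": 9, "A": 5, "B": 1})
trial_2 = rescale_to_full_scale({"E": 3, "A": 5, "B": 1})  # E just played; A, B unchanged
print(trial_1)  # {'E': 10.0, 'A': 5.0, 'B': 0.0}
print(trial_2)  # {'E': 5.0, 'A': 10.0, 'B': 0.0}
```

The true preference for A is 5 on both trials, yet its rating doubles, because the range of true preferences on trial 2 is half that of trial 1.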


In summary, the problem with using the ratings over several trials is that they may not be comparable across trials: subjects can compare different songs on the same trial, but may not be able to compare the same song on different trials. One possible solution is to fit models by maximizing the sum of the correlations between predicted and actual ratings, computed for each trial (each column in Figure 1b) separately. This prevents any comparison across trials and has been used by McAlister (1982). However, this method does not allow specific hypotheses about changes across trials to be tested.
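A minimal sketch of the summed within-trial correlation criterion, in plain Python with a hand-written Pearson correlation. The data and function names are hypothetical. Because each correlation is computed inside one trial, any positive linear re-anchoring of a trial's ratings leaves the criterion unchanged, which is why it sidesteps the comparability problem.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def summed_within_trial_correlation(predicted, actual):
    """Sum over trials of corr(predicted, actual) computed within each trial.

    predicted, actual: lists of trials; each trial lists ratings of the same
    songs in the same order. No song is ever compared with itself across trials.
    """
    return sum(pearson(p, a) for p, a in zip(predicted, actual))


predicted = [[1, 2, 3], [3, 1, 2]]
actual = [[10, 20, 30], [5, 1, 3]]
stretched = [actual[0], [7 + 0.5 * r for r in actual[1]]]  # re-anchor trial 2 only
print(summed_within_trial_correlation(predicted, actual))     # 2.0
print(summed_within_trial_correlation(predicted, stretched))  # 2.0
```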

A second solution, suggested here, does allow hypothesis testing. The idea is to find the transformation which takes the rated preferences in Figure 1b back to the true preferences in Figure 1a. If we can find the true preferences, then trials may be directly compared and the hypotheses tested. Under the assumption that Helson is right, true preferences can be obtained by (1) finding 2 songs (standards) which can be assumed to remain stable in preference, and (2) applying the linear transformation to the ratings which yields no change for the standard songs between trials. This can be done in Figure 1b by assuming that preferences for A and B have not changed, since they have not been played and are not similar to E. Therefore the line linking the A's and the line linking the B's should be horizontal. The B line is already horizontal. The A line can be made so by shrinking all of the ratings in trial 2. The result of this shrinking is just what is wanted: the true preferences in Figure 1a.
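With exactly two standards, the transformation can be solved in closed form: two equations, a + b * later = reference for each standard, in two unknowns. The sketch uses hypothetical ratings in the spirit of Figure 1b (trial 1: E = 10, A = 5, B = 0; trial 2: E = 5, A = 10, B = 0), with A and B as the standards; the function names are illustrative.

```python
def two_standard_transform(reference, later, std1, std2):
    """Return (a, b) such that a + b * later[s] == reference[s] for both standards s.

    reference, later: dicts song -> rating on the reference trial and a later trial.
    Assumes the two standards were rated differently on the later trial.
    """
    b = (reference[std1] - reference[std2]) / (later[std1] - later[std2])
    a = reference[std1] - b * later[std1]
    return a, b


def apply_transform(ratings, a, b):
    """Apply the linear transformation a + b * rating to every song in a trial."""
    return {song: a + b * r for song, r in ratings.items()}


trial_1 = {"E": 10.0, "A": 5.0, "B": 0.0}
trial_2 = {"E": 5.0, "A": 10.0, "B": 0.0}
a, b = two_standard_transform(trial_1, trial_2, "A", "B")
print(a, b)                            # 0.0 0.5
print(apply_transform(trial_2, a, b))  # {'E': 2.5, 'A': 5.0, 'B': 0.0}
```

The slope of 0.5 is the "shrinking" of trial 2 described above: after it, A and B sit at their trial-1 values, and E shows its true drop in preference.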

To summarize, if one can be reasonably certain that preferences for two standard stimuli have not changed relative to the other stimuli, one can compare ratings across trials. A similar idea is needed, for example, to compare the height of growing plants across weeks: one needs a standard ruler, whose length is assumed not to grow, to measure how much the plants have grown from week to week. This solution can now be generalized to 3 or more standard stimuli and 3 or more trials. Let S be the number of standard stimuli and T be the number of trials. The idea is to minimize the squared deviations of each standard stimulus from trial to trial. This is done by finding the linear coefficients $b_t$ and $a_t$ that transform the rating $R_{st}$ of standard stimulus s on trial t to the "true" preference $a_t + b_t R_{st}$. That is,

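$$\min_{a,\,b} \; \sum_{t=1}^{T-1} \sum_{t'=t+1}^{T} \sum_{s=1}^{S} \left[ (a_t + b_t R_{st}) - (a_{t'} + b_{t'} R_{st'}) \right]^2 \qquad (3)$$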

The first two summations simply add deviations across all possible pairs of trials, and the third adds deviations across all standard stimuli. The coefficients b and a can be found by the method of least squares. First, the degenerate solution b = a = 0 must be ruled out by constraining $b_1 = 1$ and $a_1 = 0$; that is, the first trial is left as it is found and the remaining trials are adjusted to it. Next, derivatives are taken with respect to $a_t$ and $b_t$ (t = 2, ..., T) and set to zero. This yields a set of 2(T-1) simultaneous linear equations of the form:

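$$\sum_{t' \neq t} \sum_{s=1}^{S} \left[ (a_t + b_t R_{st}) - (a_{t'} + b_{t'} R_{st'}) \right] = 0, \qquad t = 2, \ldots, T \qquad (4)$$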

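$$\sum_{t' \neq t} \sum_{s=1}^{S} R_{st} \left[ (a_t + b_t R_{st}) - (a_{t'} + b_{t'} R_{st'}) \right] = 0, \qquad t = 2, \ldots, T \qquad (5)$$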

These can always be solved (as long as there are 2 or more standard stimuli) by standard computer routines to yield the coefficients b and a which minimize the squared deviations of each standard stimulus from trial to trial.
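The following sketch builds the least-squares problem behind criterion (3), with one residual per pair of trials and per standard, and solves its normal equations, which are the zero-derivative conditions of Equations 4 and 5. It is plain Python; a small Gaussian elimination routine stands in for the unspecified "standard computer routines," and the function names and three-trial data are hypothetical. Trials are 0-indexed, so index 0 is the first (reference) trial.

```python
from itertools import combinations


def solve_linear(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting (inputs untouched)."""
    n = len(A)
    M = [list(row) + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for i in range(col + 1, n):
            f = M[i][col] / M[col][col]
            for j in range(col, n + 1):
                M[i][j] -= f * M[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def simultaneous_standardize(R):
    """Minimise criterion (3) over a_t, b_t with trial 0 fixed (a = 0, b = 1).

    R[t][s]: rating of standard stimulus s on trial t. Every pair of trials and
    every standard contributes one residual (a_t + b_t R[t][s]) - (a_u + b_u R[u][s]).
    """
    T, S = len(R), len(R[0])
    n = 2 * (T - 1)  # unknowns: a_t at index 2(t-1), b_t at index 2(t-1)+1
    rows, targets = [], []
    for t, u in combinations(range(T), 2):
        for s in range(S):
            row, const = [0.0] * n, 0.0
            for trial, sign in ((t, 1.0), (u, -1.0)):
                if trial == 0:
                    const += sign * R[0][s]  # reference trial: a = 0, b = 1
                else:
                    row[2 * (trial - 1)] += sign
                    row[2 * (trial - 1) + 1] += sign * R[trial][s]
            rows.append(row)
            targets.append(-const)  # residual = row . theta - target
    # Normal equations X'X theta = X'y: the derivative conditions of Equations 4 and 5.
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    Xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(n)]
    theta = solve_linear(XtX, Xty)
    return [0.0] + theta[0::2], [1.0] + theta[1::2]


R = [[2.0, 5.0, 8.0],     # trial 1 (reference): three hypothetical standards
     [8.0, 14.0, 20.0],   # trial 2: stretched, 2 * reference + 4
     [-1.0, 2.0, 5.0]]    # trial 3: shifted, reference - 3
a, b = simultaneous_standardize(R)
print([round(v, 6) for v in a])  # [0.0, -2.0, 3.0]
print([round(v, 6) for v in b])  # [1.0, 0.5, 1.0]
```

With noise-free data the fitted coefficients undo each distortion exactly; with real ratings the residuals are not zero, and the solution is the compromise that minimizes (3) over all pairs of trials at once.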

This simultaneous equation method is similar to running T-1 simple regressions on the standard stimuli between each trial and the following trial, which would also yield coefficients b and a to transform each trial. However, because the regressions minimize deviations separately for each pair of adjacent trials instead of simultaneously over all pairs, the simultaneous equation approach yields an overall value of criterion (3) at least as low, and is therefore preferred.
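For comparison, the chained alternative can be sketched as follows, together with a function that evaluates criterion (3) for any set of coefficients. The function names are hypothetical, and the regression direction (previous trial's transformed ratings regressed on the current trial's raw ratings) is one reasonable reading of "regressions between each trial and the following trial." Because the simultaneous solution minimizes (3) directly, its criterion value can be no larger than the value returned here for the chained coefficients; on noise-free data both reach zero.

```python
from itertools import combinations


def simple_regression(x, y):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b


def chained_regressions(R):
    """T-1 separate regressions, one per adjacent pair of trials.

    R[t][s]: rating of standard s on trial t. The transformed ratings of trial t-1
    are regressed on the raw ratings of trial t; trial 0 is left as it is.
    """
    a, b = [0.0], [1.0]
    previous = list(R[0])
    for t in range(1, len(R)):
        at, bt = simple_regression(R[t], previous)
        a.append(at)
        b.append(bt)
        previous = [at + bt * r for r in R[t]]
    return a, b


def criterion(R, a, b):
    """Criterion (3): squared deviations of each standard over all pairs of trials."""
    total = 0.0
    for t, u in combinations(range(len(R)), 2):
        for s in range(len(R[0])):
            total += ((a[t] + b[t] * R[t][s]) - (a[u] + b[u] * R[u][s])) ** 2
    return total


R = [[2.0, 5.0, 8.0], [8.0, 14.0, 20.0], [-1.0, 2.0, 5.0]]
a, b = chained_regressions(R)
print(a, b, criterion(R, a, b))  # [0.0, -2.0, 3.0] [1.0, 0.5, 1.0] 0.0
```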


Subjects were 88 undergraduate students in an introductory marketing class at a large state university. They were required to participate in some experiments as part of a subject pool requirement.

Stimuli were six recorded instrumental and vocal musical pieces of about 60 seconds' length. These were Polythene Pam (P) by the Beatles, Sergeant Pepper (S) by the Beatles, Cannonball Rag (C) by Merle Travis, Masterpiece Theatre Theme (M) by Mouret, Triumphal March (T) by Campra, and Fight Song (F) by the University of Illinois Marching Band.

Subjects first listened to all 6 stimuli to acquaint themselves with them, then listened to a sequence of musical selections which included the stimuli. Half of the subjects listened to the sequence PSCMT/FSSFTTXSSBT, called Sequence 1, and half listened to the sequence PSCMT/FSSBTXSSFTT, called Sequence 2. Here X refers to a song not among the 6 stimuli, and B means a 2-minute break in which no song was played or rated. The songs before the slash were listened to but not rated, and served to acquaint the subjects with the songs they were rating. After each song following the slash was heard, subjects rated the 6 stimulus songs with respect to how much they would like each one played next. It was stressed that they should not rate overall preference, but preference for hearing each song if it were played next. Subjects were instructed to locate the initials of the 6 songs along a line to show their preferences, with the most preferred song placed at the left end and the least preferred at the right. Subjects were run in groups of about 10.

At the end of the session, subjects were asked to rate each of the 6 stimuli on 4 dimensions, including degree of complexity, smoothness of sound, degree of orchestration, and level of emotional stimulation. These dimensions were developed by Batsell (1980) in a study of song preferences. Ratings were on an 11-point scale used by McAlister (1979), ranging from 0 to 10, where 0 meant the song had none at all of the attribute and 10 meant the song had the maximum amount conceivable in music.


As a manipulation check on similarity, the difference on each of the 5 rating scales was computed for each pair of songs for each subject. These differences were squared and summed to form a Euclidean distance between each pair of songs. Such distances have often served to estimate dissimilarity between stimuli (Torgerson 1958). The distances are given in Table 2. As expected, the P-S and M-T pairs are rated far more similar than any other pair.
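A minimal sketch of the distance computation for one subject: squared differences are summed over the attribute scales, and the square root is taken as the usual final step of a Euclidean distance (this does not change which pairs are closest). The ratings and function name are hypothetical.

```python
import math
from itertools import combinations


def pairwise_distances(ratings):
    """ratings: dict song -> list of attribute ratings (0-10) from one subject.
    Returns dict (song1, song2) -> Euclidean distance across the attribute scales."""
    return {
        (p, q): math.sqrt(sum((x - y) ** 2 for x, y in zip(ratings[p], ratings[q])))
        for p, q in combinations(sorted(ratings), 2)
    }


ratings = {  # hypothetical 0-10 ratings on four attribute scales
    "P": [6, 4, 3, 7],
    "S": [6, 4, 6, 3],
    "M": [2, 4, 3, 7],
}
d = pairwise_distances(ratings)
print(d[("P", "S")], d[("M", "P")])  # 5.0 4.0
```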



The ratings of the 6 songs, averaged over the subjects in Sequence 1, are shown on the ordinate in Figure 2 as a function of the sequence played. Thus, immediately after the Fight Song was played in the first sequence, the average subject rated F lowest and M highest. S was played next, which reduced its ranking from second to fifth. The rating of the song just played is expected to show the fastest decrease, and it does for all trials.



As we have seen, the change in rating between trials cannot be directly interpreted, because the subjects' task was only to rate the 6 songs relative to each other within a trial. Figure 2 was therefore transformed by the simultaneous equation method to allow direct comparison across trials. Three standard songs were used: C, M, and F. These 3 songs were rarely played during the ratings and are therefore expected to change little in preference. The simultaneous equation method adjusted each trial's ratings for each subject to minimize the change in ratings from trial to trial for these 3 standard songs. These transformed ratings are shown in Figure 3, averaged over the subjects in Sequence 1. The lines representing C, M, and F are dashed. They should be nearly horizontal, indicating no change in rating from trial to trial. Except for trial 1, the C, M, and F lines do appear nearly horizontal in Figure 3. Trial 1 is the major exception: there the variance, or spread, of ratings is considerably larger than in succeeding trials. This seems to be a regression toward the mean effect, which is also observed in simple regression. Since the first trial deviates from the others, it was omitted from further analysis.

Test of H1 (Similarity Effects)

The song S was rated 3 times in each sequence (excluding trial 1). The resulting decrease in rated preference for S was compared to the simultaneous decrease in preference for P (judged very similar to S) for each subject. Since these two were rated very similar, Hypothesis 1 predicts almost as much decrease in P as in S. Inspecting Figure 3 reveals that P decreased moderately. The median decline in P over subjects was only 38% of the decline in S, with a 95% confidence interval between 2% and 58%. This interval does not include zero, thus supporting Hypothesis 1. However, the magnitude of the effect seems small, considering that S and P were chosen to be as similar as possible (same artists, same musical style).

To test how much the similarity model can actually improve predictions, the model was estimated, cross-validated, and compared with the simpler Jeuland model. The similarity model was estimated identically to the Jeuland model in Equations 1 and 2, except that dP(t) was allowed to vary between zero and one when S was played, as was dM(t) when T was played. This captured the effect of similar songs. Both models were estimated using the data from 9 of the 10 trials for each subject. The tenth trial was a cross-validation trial, where predictions were correlated with the actual ratings.

The two models were each estimated in two ways described in the Analysis section. The first is similar to McAlister (1982), where the parameters of the model are estimated by the gradient search computer routine STEPIT, which maximizes the sum of the 9 correlations between predicted ratings and actual ratings on each trial. This method avoids completely the problem of comparing ratings across trials. The second estimation method used not the raw ratings but the ratings transformed by the simultaneous equation process. These ratings should be comparable across trials. The parameters were estimated by STEPIT, which could now maximize the overall correlation between predicted and actual ratings.

The estimation and cross-validation were done twice for each subject for each model, using 2 different validation trials. The results are shown in Table 2 for the two models and the two estimation methods. The largest effect was due to estimation method, with an average cross-validated squared correlation 4% higher for McAlister's method than for the simultaneous equation method (F(1,87) = 7.5, p < .01). The next largest effect was due to the models, with a squared correlation 1% higher for the similarity model than for the Jeuland model. This was only marginally significant (F(1,87) = 2.97, p < .1). All other effects were not significant. Thus the similarity model could improve prediction by only 1%.



Test of H2 (Inhibition Effects)

The recovery of S (its increase in rated preference) between trials 3 and 5 in Sequence 1 was compared to its recovery between trials 3 and 5 in Sequence 2. The only difference between these two conditions is that subjects in Sequence 2 had a 2-minute break instead of hearing F, as in Sequence 1. Subjects in both sequences had heard S on trial 3. Since total elapsed time since hearing S was the same in both sequences, Jeuland's model predicts that the recoveries should be equal. On the other hand, the similarity hypothesis predicts that recovery would be slower in Sequence 1, where the song F was played, since F is more similar to S than a break is. Instead, recovery was faster in Sequence 1, with a mean recovery of 10.2, than in Sequence 2, with a mean recovery of 2.7, consistent with Hypothesis 2. However, the difference was only marginally significant (F(1,86) = 2.29, p < .14).


The similarity hypothesis was confirmed, with similar songs decreasing in preference as well as the song just played. However, the magnitude of this effect was quite small, increasing predictive accuracy by only 1%. Can we improve the predictive accuracy of the similarity model by improving the estimation methods? For example, increasing the number of trials for which data are gathered would improve the accuracy of parameter estimates. Alternatively, subjects with similar responses might be clustered, and parameters estimated on more reliable average ratings. A bound on how much such improved estimation methods could improve predictive accuracy was obtained by a simulation. First, the predictions of the best fitting similarity model were obtained. Then the Jeuland model was estimated from these predictions. Finally, the predictions of the Jeuland model were correlated with a hold-out sample. This procedure gives the maximum possible increase in squared correlation for the similarity model, where the data and estimation procedure contain no error at all. Unfortunately, this maximum obtainable increase in R² for the similarity model is only 1.5%, or an increase of .5% over the present estimation method. Considering that this study purposely chose some especially similar songs, it hardly seems worth the effort of adding the similarity effect to the basic Jeuland model.

The results also showed that the simultaneous equation method of adjusting ratings to make them comparable across trials was workable, though its predictive accuracy was slightly (3%) lower than that of McAlister's method. Thus the McAlister method would seem more useful when the objective is to maximize prediction. However, when testing specific hypotheses about variety seeking, the simultaneous equation method is the only one available, and it provides nearly comparable predictive accuracy. The simultaneous equation method does require pre-planning, since at least two standard stimuli must be included in the design for subjects to rate.

Variety seeking models have some application to radio programming. A typical policy for popular music stations is to play each song on the playlist cyclically, with more popular songs having shorter cycles. If a listener follows the similarity model of variety seeking and listens continuously, this policy would not be optimal; it would need adjustment according to how similar each song is to the other songs. Thus, an unusual or dissimilar song should get more airplay, even though it is less preferred than others. This might help explain the success of "novelty" songs such as "Monster Mash," which many listeners said were not very good but were "different" or "fun."

It is not clear that a variety seeking model can explain all of listeners' preferences in popular radio. Under all the models considered here, a station with access to all of the hits of the last 10 years should almost never repeat a song: once a song is played, its preference should decrease, and songs from other years just as good in overall preference should be preferred next. Instead, "oldies" are rarely played on popular radio. Perhaps overall preference for a song changes over time, such that newer releases are more preferred. Future work might extend variety seeking models by examining how listeners trade off the "newness" of a song against how recently it has been played.


Bass, Frank M. (1974), "The Theory of Stochastic Preference and Brand Switching," Journal of Marketing Research, 11, 1-20.

Batsell, Richard R. (1980), "Consumer Resource Allocation Models at the Individual Level," Journal of Consumer Research, 7, 78-87.

Helson, H. (1964), Adaptation-Level Theory, New York: Harper and Row.

Jeuland, Abel P. (1978), "Brand Preference over Time," Proceedings of the American Marketing Association, S. C. Jain (ed.), 33-37.

McAlister, Leigh (1979), "Choosing Multiple Items from a Product Class," Journal of Consumer Research, 6, 213-224.

McAlister, Leigh (1982), "A Dynamic Attribute Satiation Model of Variety-Seeking Behavior," Journal of Consumer Research, 9, 141-150.

Torgerson, Warren S. (1958), Theory and Methods of Scaling, New York: John Wiley.