Effects of the Mass Media News on Trends in the Consumption of Caffeine-Free Colas

David P. Fan, University of Minnesota
Carol L. Shaffer, University of Minnesota
ABSTRACT - The influence of the news media on consumer behavior is explored using an extension of procedures employed successfully in the past to predict public opinion from the press. For the current caffeine and cola study, texts of news articles were retrieved from the NEXIS computer database and scored by computer. The story scores were used to project expected time trends in overall consumption of caffeine-free colas. The projections show a dramatic increase in expected sales of the caffeine-free product in the early 1980s accompanying the introduction of the colas. Then the model predicts a plateau of little change in market share in good agreement with overall sales of caffeine-free colas.
David P. Fan and Carol L. Shaffer (1990), "Effects of the Mass Media News on Trends in the Consumption of Caffeine-Free Colas", in NA - Advances in Consumer Research Volume 17, eds. Marvin E. Goldberg, Gerald Gorn, and Richard W. Pollay, Provo, UT: Association for Consumer Research, Pages: 406-414.

Advances in Consumer Research Volume 17, 1990      Pages 406-414

[Supported in part by Public Health Service Research Grant MH 39610.]

INTRODUCTION

Prior studies have shown that the news media play a significant role in influencing public opinion (Dearing and Rogers, 1988; Fan, 1988; Iyengar and Kinder, 1987; Page, Shapiro, and Dempsey, 1987). Since purchase patterns are related to opinions, the news media should also affect consumer behavior. This effect should be especially strong for caffeine-free colas for three reasons: prices are the same regardless of caffeine content, caffeine has minimal effect on taste, and all products of the same company are displayed together, so customers can choose easily. Therefore, advertising and news coverage are likely to be the major influences on the purchase of caffeine-free colas. Of the two, the news was likely to be the dominant force, since there would have been no incentive to produce such colas without news messages about caffeine's effects.

METHODOLOGY

Four main steps were used to study the news and market share of caffeine-free colas: (1) retrieval of the texts of relevant news articles from an electronic database, (2) performance of a computer content analysis of the articles, (3) calculation of mathematical projections of consumption trends, and (4) comparisons of the calculated trends with actual opinion values or sales figures. The analysis follows other studies demonstrating that time trends of public opinion could be predicted from computer content analysis scores of news stories from the Associated Press (AP) wire service. Topics of previous studies have included consumer sentiment (Fan, 1988; Tims, Fan, and Freeman, in press) and the 1987 - 1988 presidential contest between George Bush and Michael Dukakis where the time trend calculated from 2603 AP stories was within 2.7 percent, on average, of 120 actual published poll values (Fan and Tims, 1989).

RESULTS

Stories mentioning both "caffeine" and "soft drinks" or "colas" were retrieved from the AP for two reasons. This wire service had already been used successfully for opinion calculations (see references in the previous section), and it is relied upon by most written and electronic news organizations in the United States. The analysis ran from January 1, 1977, well before the discussion of caffeine-free colas, to August 7, 1989, when the analysis was begun. All text within 50 words of the words "caffeine," "cola," or "soft drink" was retrieved into the investigators' computer from all 238 identified stories (650,000 characters of text). Text farther from any of these words was usually irrelevant and was therefore not harvested.
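The 50-word harvesting rule can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the NEXIS retrieval itself. The function name `keyword_windows` is hypothetical. The prefix matching, which lets "colas" match "cola", is an assumption. So is the choice to preserve paragraph breaks, which keeps the output usable by a later paragraph filter.

```python
import re

# Keywords from the retrieval described in the text. Prefix matching below
# lets "colas" and "soft drinks" match "cola" and "soft drink" (an assumption).
KEYWORDS = ("caffeine", "cola", "soft drink")
WINDOW = 50  # words kept on each side of a keyword hit


def keyword_windows(paragraphs, keywords=KEYWORDS, window=WINDOW):
    """Trim paragraphs to words within `window` words of any keyword.

    Paragraphs with no surviving words are dropped. Hypothetical helper
    for illustration only, not the software used in the study.
    """
    # Flatten to (paragraph index, word) pairs so paragraph breaks survive.
    tokens = [(p, w) for p, para in enumerate(paragraphs) for w in para.split()]
    norm = [re.sub(r"[^a-z-]", "", w.lower()) for _, w in tokens]
    keep = [False] * len(tokens)
    for i in range(len(tokens)):
        for kw in keywords:
            parts = kw.split()
            n = len(parts)
            if i + n <= len(norm) and all(
                norm[i + m].startswith(parts[m]) for m in range(n)
            ):
                # Mark the keyword itself plus `window` words on either side.
                for j in range(max(0, i - window), min(len(tokens), i + n + window)):
                    keep[j] = True
    kept = {}
    for (p, w), k in zip(tokens, keep):
        if k:
            kept.setdefault(p, []).append(w)
    return [" ".join(ws) for _, ws in sorted(kept.items())]
```

Working on word positions across the whole story, rather than paragraph by paragraph, lets a window spill into neighboring paragraphs, as a distance-based rule would.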

This text was then analyzed by computer for its support of various positions pertinent to caffeine in colas. A reading of a random sample of the retrieved stories showed that their overall sense could be captured by examining only paragraphs containing the word "caffeine." Therefore, the computer was used to "filter" the text keeping only such paragraphs. The yield was 230,000 characters or 36 percent of the original retrieval. The paragraph was chosen as the unit of analysis because the beginnings of paragraphs are easy for the computer to identify and because AP paragraphs tend to be short, typically discussing one idea in one or two sentences.
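The paragraph filter is the simplest of the filtration steps. The sketch below assumes the text is already split into paragraphs. It also assumes that a case-insensitive substring test is close enough to "contains the word caffeine", so hyphenated forms such as "caffeine-free" also pass. Both function names are hypothetical.

```python
def filter_paragraphs(paragraphs, word="caffeine"):
    """Keep only paragraphs that mention `word`, ignoring case."""
    return [p for p in paragraphs if word in p.lower()]


def yield_percent(kept, original):
    """Characters kept as a percentage of the characters originally retrieved."""
    return 100.0 * sum(len(p) for p in kept) / sum(len(p) for p in original)
```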

The caffeine-containing paragraphs were read by human analysts to determine which positions they supported, and five positions were identified. Four referred to disadvantages of caffeine consumption: harm to the central nervous system, causation of diseases (such as hypertension or heart disease), impediment to the development of fetuses or young children, and harm for unspecified reasons. The fifth position was that caffeine provided a positive benefit or at least did not constitute a health hazard.

The human analysts then constructed a dictionary and a corresponding set of rules to score the paragraphs for their support of the five positions. The scoring exploited the fact that every filtered paragraph contained the word "caffeine" and was therefore about the drug. Most paragraphs mentioning children or pregnancy described caffeine's ill effects. Consequently, the mere presence of such words (e.g., "baby," "conception," "fetus," or "mother") was taken to imply that the drug was discussed in an unfavorable light.

Similarly, words like "cancer" and "illness" led to paragraph scores linking caffeine with diseases; words like "addict" and "anxiety" suggested that caffeine was bad for the nervous system; and words like "avoid" and "problem" meant a score unfavorable to caffeine for nonspecific reasons. Scores for nonspecific disadvantages were also implied by phrases such as "caffeine-free," "no caffeine" and "not ... drink caffeine containing." This latter set of conditions meant that the mere mention of caffeine-free products was scored as adding to the available information claiming that caffeine-free colas were better than their caffeine containing counterparts. Words such as "alert," "benefit," and "analgesic" or "pain killer" resulted in a paragraph being scored as favorable to caffeine.
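A dictionary of this kind maps words and phrases to positions. The sketch below is a hypothetical, heavily abbreviated version built only from the example terms quoted above; the authors' actual dictionary was larger. The position names, the `classify` function and the case-insensitive substring matching are all assumptions.

```python
# Abbreviated dictionary using only the example terms quoted in the text.
# Position names are hypothetical labels for the five positions.
DICTIONARY = {
    "fetus_child": ("baby", "conception", "fetus", "mother"),
    "disease": ("cancer", "illness"),
    "nervous_system": ("addict", "anxiety"),
    "nonspecific": ("avoid", "problem", "caffeine-free", "no caffeine"),
    "favorable": ("alert", "benefit", "analgesic", "pain killer"),
}


def classify(paragraph):
    """Return the set of positions with at least one dictionary hit.

    Substring matching means "fetuses" and "addiction" also count, which
    mimics stemming but can over-match (e.g. "smother" contains "mother").
    """
    text = paragraph.lower()
    return {pos for pos, terms in DICTIONARY.items() if any(t in text for t in terms)}
```

Substring matching is a crude stand-in for whatever word matching the original rules used. It is shown only to make the word-to-position mapping concrete.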

The dictionary also included negation words like "insufficient," "not," and "without." Rules were written so that a negation word close to a word in the "children" class would lead to a paragraph score favorable to caffeine. Thus, for example, a paragraph with the phrase "does not affect fetuses" was scored as favorable to caffeine. In the reverse case, a negation word close to a clause favoring caffeine resulted in a score unfavorable to caffeine for nonspecific reasons.
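The negation rules can be illustrated with a proximity test. The paper does not say how close "close" is, so the three-word window `NEAR` below is an assumption. The function name is also an assumption, and the term lists are taken from the examples in the text. The sketch also works at the level of single terms rather than whole clauses favoring caffeine.

```python
import re

NEGATIONS = {"insufficient", "not", "without"}
CHILD_TERMS = ("baby", "conception", "fetus", "mother")
FAVORABLE_TERMS = ("alert", "benefit", "analgesic")
NEAR = 3  # assumed maximum distance in words between negation and term


def negated_position(paragraph):
    """Return the position implied by a negated term, or None.

    Negated child/pregnancy term -> favorable to caffeine.
    Negated favorable term -> unfavorable for nonspecific reasons.
    """
    words = re.findall(r"[a-z-]+", paragraph.lower())
    neg_at = [i for i, w in enumerate(words) if w in NEGATIONS]

    def negated(terms):
        # True if some word starting with one of `terms` lies near a negation.
        return any(
            w.startswith(terms) and any(abs(i - j) <= NEAR for j in neg_at)
            for i, w in enumerate(words)
        )

    if negated(CHILD_TERMS):
        return "favorable"
    if negated(FAVORABLE_TERMS):
        return "nonspecific"
    return None
```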

Doubt and ambiguity were also coded. Some paragraphs had a favorable and/or unfavorable mention of caffeine together with a conditional word like "disputed" or "inconclusive." The computer was instructed to score such paragraphs as half pro- and half anti-caffeine. Each paragraph had a total possible score of 1.0, with partial scores summing to this value. If a paragraph had phrases scored as being harmful for different reasons, the paragraph score was split between the reasons.
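The doubt and splitting rules turn a set of matched positions into scores that sum to 1.0. The sketch below assumes the positions have already been found, for instance by dictionary lookup. The doubt-word list comes from the text. The function name and the label "unfavorable" for the anti-caffeine half of a doubtful paragraph are assumptions. So is the even split across all matched positions when no doubt word is present.

```python
DOUBT_WORDS = ("disputed", "inconclusive")


def paragraph_scores(positions, paragraph):
    """Distribute a total score of 1.0 across the matched positions."""
    if not positions:
        return {}  # paragraph supports none of the positions
    if any(d in paragraph.lower() for d in DOUBT_WORDS):
        # Doubtful paragraphs count half pro- and half anti-caffeine.
        return {"favorable": 0.5, "unfavorable": 0.5}
    # Otherwise split evenly, e.g. between different reasons for harm.
    share = 1.0 / len(positions)
    return {pos: share for pos in sorted(positions)}
```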

In developing the dictionary and rules described above, two different human analysts compared the computer decisions with impressions obtained by reading the text. The dictionary and rules were changed and refined until the computer decisions closely approximated human evaluations of about 200,000 characters of text from randomly selected stories. Then the computer scored the entire text uniformly and consistently without further human intervention. The broad applicability of the computer instructions is shown by the fact that 217 of the 238 retrieved stories had a score supporting at least one of the five chosen positions.

Since each score also had a date, it was convenient to plot all the scores as a function of time (Fig. 1). For this plot, each paragraph was given its scored value on the day of the AP dispatch. With each passing day, the score from each paragraph was halved. Previous work with the mathematical model of ideodynamics (Fan, 1988) found that this rapid decay reflects the public's loss of memory. With a one-day half-life, a paragraph effectively loses all its persuasive ability within a week.
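The one-day half-life amounts to a running sum in which yesterday's total is halved before today's scores are added. The sketch below assumes the paragraph scores have already been summed by day. The function name and input format are hypothetical.

```python
def decayed_pressure(daily_scores, half_life_days=1.0):
    """Running persuasive pressure when each score halves every `half_life_days`.

    daily_scores[d] is the total paragraph score dated on day d.
    """
    factor = 0.5 ** (1.0 / half_life_days)
    pressure, series = 0.0, []
    for score in daily_scores:
        pressure = pressure * factor + score  # decay old scores, add today's
        series.append(pressure)
    return series
```

A single paragraph scored 1.0 contributes 1.0, 0.5 and 0.25 on successive days. Seven days after the first of these it contributes only 1/128, under 1 percent. That is the sense in which it loses its persuasive force within a week.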

The data in Fig. 1 show that there was relatively little news about caffeine in the cola context before 1979. In late 1978, the Food and Drug Administration (FDA) began reviewing studies implicating the substance as a possible cause of birth defects. In September 1980, the FDA added caffeine to a list of substances that pregnant women should avoid or use sparingly. By December, an FDA review panel recommended removing the drug from the list of substances Generally Recognized as Safe (GRAS) and placing it in a category of substances requiring additional study. In 1981, evidence challenging the FDA's warnings against caffeine began to appear in the news. From 1982 to the beginning of 1984, the press carried many reports debating caffeine's potential health hazards, primarily in the area of fetal development. By late 1984, discussion of the possible harms of caffeine had diminished (Fig. 1).

In addition to discussion of the health aspects of the drug, the stories in Fig. 1 also included substantial coverage of the soft drink manufacturers' introduction of caffeine-free colas in response to the caffeine controversy. Royal Crown introduced the first caffeine-free sugarless cola in March 1980. At that time, the FDA required caffeine in sugared colas. In mid-October, the FDA proposed changing its regulations so that regular colas need not contain caffeine. In March 1982, 7-UP shook the soft drink industry with an anti-caffeine advertising campaign and the introduction of its caffeine-free cola, Like. PepsiCo and Coca-Cola decried the 7-UP campaign, but four months later, in July 1982, PepsiCo introduced its own caffeine-free colas, Pepsi Free and Sugar Free Pepsi Free (renamed Caffeine-free Pepsi and Caffeine-free Diet Pepsi in 1988). Coca-Cola followed suit in April 1983, introducing caffeine-free versions of Coke, Diet Coke, and Tab.

The history of the market share for Coca-Cola and PepsiCo's caffeine-free colas is plotted in Fig. 2 beginning with the first full calendar year after their introduction (Maxwell, 1989). Only Caffeine-free Diet Coke has shown an increase in recent years. However, the overall market share of Diet Coke is so large that the increase in its caffeine-free version alone was enough to cause a small rise in the share of all colas without caffeine in 1987 and 1988 (Fig. 2, bottom frame).

The scores from the 238 AP stories on caffeine and soft drinks or colas (Fig. 1) were used in the mathematical model of ideodynamics to calculate expected purchases of caffeine-free colas. This model is consistent with previous reports noting that consumers respond positively to increasing amounts of information, although not necessarily in a linear fashion (Keller and Staelin, 1989; Meyer and Johnson, 1989; Alba and Marmorstein, 1987).

Ideodynamics argues that people change their purchasing patterns only in response to information favoring another product or opposing their current product choice. Information reinforcing preferences might cause greater product or brand loyalty but will not alter buying habits. By this argument, persuasive information noting the dangers of caffeine should cause a fraction of the public currently purchasing a caffeine-containing soft drink to switch to the caffeine-free product. Let CF be the number of consumers of caffeine-free colas in the population and CC the number of purchasers of the caffeine-containing variety at a particular initial time t1. Then CF + CC = 100 percent at all times tn, provided the population of cola consumers does not change composition significantly during the study. It should be possible to compute time trends for both CF and CC from t1 to tn if the C values at t1 and all intervening changes are known.

FIGURE 1

INFORMATIONAL PRESSURES RELEVANT TO THE PROBLEMS OF CAFFEINE IN STORIES MENTIONING BOTH SOFT DRINK OR COLA AND CAFFEINE

In any time interval t, the number of caffeine-free consumers CF is postulated to increase in proportion to the strength GF of the persuasive paragraphs describing problems with caffeine (Fig. 1, next-to-bottom frame). However, the increase should also be proportional to CC, the population still consuming the caffeine-containing product, since that is the population of potential converts. The more potential converts there are, the more people will change their habits in response to the same persuasive information. There is also a constant of proportionality k incorporating the attention being paid to caffeine information and the natural resistance to switching buying patterns.

The reverse transition is also hypothesized. Here GC, information about caffeine not being a problem (Fig. 1, bottom frame), would convert the caffeine-free drinkers CF back to the caffeinated product. However, caffeine has always been in colas, and it is conceivable that there is a residual preference for the caffeinated variety, perhaps due to a small taste difference or a desire to be kept awake by caffeine. In this case, there might be an additional persuasive force encouraging the drinkers of caffeine-free colas to return to the caffeine-containing soft drink. This persuasive force could be modeled as being proportional to the amount of caffeine-containing colas available: the more colas sold with caffeine, the more the consumer would feel that it was permissible to consume the soft drink with the drug. In other words, the persuasive pressure favoring caffeine would include two components. The first is GC, from news messages saying that caffeine was beneficial or at least not harmful. The second is an added component proportional to CC, the number of drinkers of caffeinated colas. If the proportionality constant for CC is c, then the ideodynamic expressions for the numbers of consumers of caffeine-free and caffeinated colas at time t are

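(Eq. 1)

$$
\begin{aligned}
C_{F,t} &= C_{F,t-1} + k\,G_{F,t}\,C_{C,t-1} - k\,\bigl(G_{C,t} + c\,C_{C,t-1}\bigr)\,C_{F,t-1} \\
C_{C,t} &= C_{C,t-1} - k\,G_{F,t}\,C_{C,t-1} + k\,\bigl(G_{C,t} + c\,C_{C,t-1}\bigr)\,C_{F,t-1}
\end{aligned}
$$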

(see Fan, 1988, for additional details on the derivation of these equations). In the first equation, the positive term reflects the recruitment of caffeine-free drinkers from the caffeine-consuming group. The negative term reflects the loss of caffeine-free buyers who return to the caffeine-purchasing population. The same terms appear with opposite signs in the second equation, so the total population size stays at 100 percent.
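Equations 1 translate directly into a daily loop. The sketch below rests on several assumptions. CF and CC are kept as fractions of the cola-drinking population, summing to 1, rather than as percentages. GF and GC are supplied as daily series of decayed paragraph scores. The function name `simulate` is hypothetical. The step is only sensible while k times the pressures stays well below 1, so that neither group can go negative.

```python
def simulate(GF, GC, k, c, cf0=0.0):
    """Iterate equations 1 once per day and return the daily CF values.

    GF[t], GC[t]: persuasive pressure against / for caffeine on day t.
    k: persuasibility constant; c: residual pull toward caffeinated cola.
    cf0: initial caffeine-free fraction (0 on January 1, 1977).
    """
    cf, cc = cf0, 1.0 - cf0
    trajectory = []
    for gf, gc in zip(GF, GC):
        gain = k * gf * cc               # caffeine drinkers who switch
        loss = k * (gc + c * cc) * cf    # caffeine-free drinkers who return
        # Both updates use the previous day's CF and CC, as in equations 1.
        cf, cc = cf + gain - loss, cc - gain + loss
        trajectory.append(cf)
    return trajectory
```

The gain and loss terms appear with opposite signs in the two updates, so cf + cc stays at 1 every day. This mirrors the 100 percent constraint.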

The initial condition for equations 1 was that the entire population drank caffeine-containing colas, the only kind in existence, on January 1, 1977, when the AP retrievals and calculations started. Computations were then made at 24-hour intervals, with the C values calculated for each day used in the computation for the next day. The only two unknown parameters in these equations are the constants k and c. These parameters were optimized to fit the actual overall consumption of caffeine-free colas beginning in 1984 (Fig. 2, bottom frame). That was the first year in which both Coke and Pepsi caffeine-free colas were sold during the entire calendar year.
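The paper does not say how k and c were optimized. One possibility is a brute-force grid search that minimizes the squared error between projected and observed shares, sketched below. The grid, the error measure and the use of day-indexed observations are all assumptions. Matching daily projections to annual market-share figures would need an extra averaging step, which is not shown here. Both function names are hypothetical.

```python
import itertools


def simulate(GF, GC, k, c):
    """Daily iteration of equations 1, starting from an all-caffeinated market."""
    cf, cc, out = 0.0, 1.0, []
    for gf, gc in zip(GF, GC):
        gain, loss = k * gf * cc, k * (gc + c * cc) * cf
        cf, cc = cf + gain - loss, cc - gain + loss
        out.append(cf)
    return out


def fit_k_c(GF, GC, observed, k_grid, c_grid):
    """Grid search for (k, c) minimizing the sum of squared errors.

    observed maps a day index to the actual caffeine-free share on that day.
    Returns (sse, k, c) for the best grid point.
    """
    best = None
    for k, c in itertools.product(k_grid, c_grid):
        traj = simulate(GF, GC, k, c)
        sse = sum((traj[day] - share) ** 2 for day, share in observed.items())
        if best is None or sse < best[0]:
            best = (sse, k, c)
    return best
```

A grid search is shown only because it is easy to check. A standard numerical optimizer would serve the same purpose for a finer fit.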

Comparison with Fig. 1 explains the shape of the projected time trend in Fig. 2. Before 1980, there was relatively little information about caffeine, so consumption should have remained at 100 percent for the caffeinated variety (Fig. 2). Then, from 1980 to the beginning of 1984, there was a notable increase in anti-caffeine information (Fig. 1, next-to-bottom frame), with a much smaller corresponding increase in pro-caffeine news (Fig. 1, bottom frame). This rise in anti-caffeine coverage caused the rapid increase in the projected preference for caffeine-free colas from 1980 to early 1984. Then the anti-caffeine news decreased sharply (Fig. 1). At this point the c·CC term became important, leading to a plateau or even a slight decrease in the consumption of the caffeine-free product.

If parameter c in the c·CC term is increased, there will be a more pronounced drop in the caffeine-free market share. Such a drop was observed for all the caffeine-free colas except Caffeine-free Diet Coke (Fig. 2). The c·CC term was justified on the grounds that some people might prefer to have caffeine in their cola for reasons other than those given in the mass media. The reverse could also be true: other people might dislike the effects of caffeine, such as its inhibition of sleep. Suppose CC and CF each have their own constant multiplier. Substituting the condition CC + CF = 100 percent into equations 1 then shows that only one term is needed, in this case c·CC. The constant c reflects the extent to which the constant for CC is larger than the one for CF.
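One way to see why a single constant suffices is to write out the day's net change in CF with both residual terms. This sketch assumes that a residual pull toward caffeine-free cola would enter symmetrically, as c_F·CF acting on caffeine drinkers, alongside the pull c_C·CC acting on caffeine-free drinkers. The constants c_C and c_F are introduced only for this illustration and do not appear in the paper.

$$
\begin{aligned}
C_{F,t} - C_{F,t-1}
&= k\,\bigl(G_{F,t} + c_F\,C_{F,t-1}\bigr)\,C_{C,t-1} - k\,\bigl(G_{C,t} + c_C\,C_{C,t-1}\bigr)\,C_{F,t-1} \\
&= k\,G_{F,t}\,C_{C,t-1} - k\,\bigl(G_{C,t} + (c_C - c_F)\,C_{C,t-1}\bigr)\,C_{F,t-1}
\end{aligned}
$$

Both residual flows are proportional to the same product CC·CF, so they collapse into the single c·CC term of equations 1 with c = c_C - c_F. The constant c is positive when the pull toward caffeine is the stronger of the two, and negative otherwise.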

For the studies described above, only AP stories discussing both caffeine and soft drinks or colas were used. The public, however, could use all information about caffeine, not just that in soft drink stories. To account for this, another NEXIS retrieval was made, this time for all stories mentioning the word "caffeine." This retrieval included not only the AP but also the United Press International wire service, the New York Times and Washington Post newspapers, and the U.S. News & World Report and Newsweek magazines. This analysis ran from September 26, 1980, the earliest date for which all the sources were present in the database, to July 27, 1989. With this larger number of sources and no requirement for the words "soft drink" or "cola," 1782 stories were identified. All text within 50 words of the word "caffeine" was retrieved from them (2,500,000 characters of text). The same content analysis was applied as described above, leading to the paragraph plots of Fig. 3. The first filtration, retaining only paragraphs mentioning "caffeine," yielded 1,400,000 characters, or 56 percent of the retrieved text. The computer gave a score for at least one of the six scored positions for 1479 of the 1782 retrieved stories.

FIGURE 2

ACTUAL AND CALCULATED MARKET SHARE OF CAFFEINE-FREE SOFT DRINKS

The paragraph plots in both Figs. 1 and 3 show that caffeine was discussed more often in the context of problems than as being desirable, regardless of whether colas were mentioned. The overall balance of positive and negative information could be explored by calculating likely public opinion using equations 1 without the c·CC term. These simpler equations have been used consistently in past studies relating the mass media to opinion (Fan, 1988; Tims, Fan, and Freeman, in press; Fan and Tims, 1989; Fan and McAvoy, 1989). The CWC method of Fan and McAvoy (1989) gave an estimate of the projected opinion, and hence of the information structure (Fig. 4). That opinion was fairly constant and overwhelmingly against the consumption of caffeine. The projected anti-caffeine opinion was consistently higher for the stories of Fig. 3, closer to 90 percent, than the roughly 80 percent for the AP stories of Fig. 1 (Fig. 4). The difference between the two lines could be due to the requirement of cola mentions in Fig. 1. Alternatively, the AP and the other news sources may differ systematically in how sympathetic they are to caffeine.
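A rough way to relate these percentages to the balance of news is to find the steady state of equations 1 with c = 0. This assumes the pressures stay roughly constant over time; it is not the CWC method, which is not reproduced here. Read CF as the anti-caffeine opinion fraction, CC = 1 - CF as the remainder, and GF and GC as the anti- and pro-caffeine pressures:

$$
\begin{aligned}
0 &= k\,G_F\,C_C^{*} - k\,G_C\,C_F^{*} = k\,G_F\,\bigl(1 - C_F^{*}\bigr) - k\,G_C\,C_F^{*} \\
\Rightarrow\quad C_F^{*} &= \frac{G_F}{G_F + G_C}
\end{aligned}
$$

On this reading, an anti-caffeine opinion near 80 percent corresponds to anti-caffeine pressure about four times the pro-caffeine pressure. One near 90 percent corresponds to about nine times. The constant k cancels, so it governs how quickly opinion moves but not where it settles.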

Besides including an extra c·CC term, the projection in Fig. 2 uses the same AP story scores but a different constant k. The value was k = 0.037 per AP paragraph per day for purchases (Fig. 2) and k = 1.0 per AP paragraph per day for opinion (Fig. 4). The roughly 27-fold larger value for opinion suggests that people are much more ready to start thinking that caffeine is bad than to start purchasing caffeine-free drinks.

An additional question is whether the decreased discussion of caffeine in colas after 1984 reflects less discussion of colas and soft drinks in general. To explore this possibility, another retrieval was made in which 2000 of 7306 AP stories on soft drinks or colas from January 1, 1977, to August 7, 1989, were selected at random. All text within 50 words of one of these key words was retrieved (2,600,000 characters of text). It was scored for the number of paragraphs discussing soft drinks or colas, regardless of whether caffeine was also mentioned. The caffeine stories were ignored because they constituted only 3 percent of the total. Fig. 5 shows that the peak of caffeine discussion in 1982-1984 did not correspond to the major peaks of soft drink discussion in 1985-1987. Those peaks involved other issues, such as the introduction of New Coke and lawsuits involving the major manufacturers.

DISCUSSION

This study has examined the effect of the news media on consumer behavior and was greatly facilitated by the ability to retrieve large amounts of relevant information from electronic databases. Such retrievals are more complete than manual searches because the computer, with no loss of attention, can easily scan for words anywhere in a story instead of merely in the headline. For this analysis, 4020 stories containing 4,700,000 characters of text were retrieved and analyzed by computer, the only practical method for evaluating such a large quantity of text. The new method of successive filtrations is highly flexible, since the user can enter customized dictionaries and rules for individual studies. Because it follows a fixed set of instructions, the computer has the advantage of consistency and uniformity. This advantage compensates for its inability to detect some subtleties of language usage that only a human can catch.

The computer content scores were entered into the mathematical model of ideodynamics to give consumption time trends that are close to actual market shares. The data from Fig. 1 used for the calculations included exogenous reports on the desirability of caffeine. They also included endogenous stories on the public's own behavior in terms of purchases of caffeine-free colas. The computations were made every 24 hours, thereby avoiding the aggregation errors that can arise when data are pooled into quarters or years. The calculations can be performed throughout any time period for which information important to the consumer is available.

The ideodynamic model includes only two parameters and is therefore suitable for rigorous testing using a reasonable number of data points. Both parameters have now been fixed. A good test would therefore be to retrieve stories from later dates and check whether the computer text analysis yields scores that still give accurate market shares without altering either constant.

Even for the data in this paper, no choice of parameter values can change the time at which the public should initially have become interested in caffeine-free colas. With little news prior to 1980, the model permitted only public preference for the caffeinated variety.

Then, with the increase in news about caffeine in colas in 1980-1984, the model requires an increase in interest in caffeine-free colas; the only question is how much. After 1984, news coverage relevant to colas diminished. The main question then becomes the extent to which caffeine-free consumers have inherent preferences that push them back to the caffeinated drink. Depending on this pressure, governed by the constant c, the decrease in caffeine-free cola consumption will be larger or smaller.

The model rests on three important assumptions that merit discussion. First, the model should work only if most of the important persuasive information is used in the calculation. A more accurate calculation would therefore require the inclusion of persuasive information from marketing and advertising.

FIGURE 3

INFORMATIONAL PRESSURES RELEVANT TO THE PROBLEMS OF CAFFEINE IN ALL STORIES ON CAFFEINE

FIGURE 4

OPINION THAT CAFFEINE IS BAD PREDICTED FROM SOFT DRINK STORIES OR FROM ALL CAFFEINE STORIES

FIGURE 5

PARAGRAPHS DISCUSSING SOFT DRINKS AND COLAS REGARDLESS OF WHETHER CAFFEINE IS MENTIONED

Another important assumption was that the same population was followed for the entire 12 years from 1977 to 1989. Its members were also assumed to choose only between the caffeinated and caffeine-free versions of a cola. However, sales of colas increased continually from 1981 to 1988. This raises the question of whether there were substantial numbers of new drinkers and, if so, how they made their choices. Also, a cola consumer might switch not to a caffeine-free variety but to an alternative drink such as a lemon-lime soft drink. The estimates in this paper will be inaccurate if these other factors are very important.

Finally, it is assumed that caffeine-free colas are as readily available as the caffeinated variety.

REFERENCES

Alba, J. W. and H. Marmorstein (1987). The effects of frequency knowledge on consumer decision making. Journal of Consumer Research, 14 (June), 14-25.

Dearing, J. W. and E. M. Rogers (1988). The agenda-setting process for the issue of AIDS. Paper presented at the Annual Conference of the International Communication Association.

Fan, D. P. (1988). Predictions of public opinion from the mass media: Computer content analysis and mathematical modeling. New York: Greenwood Press.

Fan, D. P. and G. McAvoy (1989). Predictions of public opinion on the spread of the disease of AIDS: Introduction of new computer methodologies. Journal of Sex Research, 26, 159-187.

Fan, D. P. and A. R. Tims (1989). The impact of the news media on public opinion: American presidential election 1987-1988. International Journal of Public Opinion Research, 1, 151-163.

Iyengar, S. and D. Kinder (1987). News that matters: Television and American opinion. Chicago: University of Chicago Press.

Keller, K. L. and R. Staelin (1989). Assessing biases in measuring decision effectiveness and information overload. Journal of Consumer Research, 15 (March), 504-508.

Maxwell, J. C. (1989). Annual soft drink report. Beverage Industry, (March), magazine insert.

Meyer, R. J. and E. J. Johnson (1989). Information overload and the nonrobustness of linear models: A comment on Keller and Staelin. Journal of Consumer Research, 15 (March), 498-503.

Page, B., R. Shapiro, and G. Dempsey (1987). What moves public opinion? American Political Science Review, 81, 23-43.

Tims, A. R., D. P. Fan, and J. R. Freeman (in press). The cultivation of consumer confidence: A longitudinal analysis of news media influence on consumer sentiment. Advances in Consumer Research, 16.
