An Experimental Investigation Concerning the Comparative Influence of MTV and Radio on Consumer Market Responses to New Music

Lori Baldwin, Florida State University
Richard Mizerski, Florida State University
ABSTRACT - MTV is a recent media phenomenon that has been attributed almost mystical powers in selling youth on new rock-oriented music. It is often suggested that MTV will soon replace radio as the most influential music medium. This study experimentally examines the proposed differential influence of MTV over radio on dimensions that are suggested to be important for selling new music - song title recall and recognition, lyric recognition, affect toward and intention to purchase recorded songs. Although MTV was found to provide significantly improved awareness, its differential impact on affect and intention was less clear.
Lori Baldwin and Richard Mizerski (1985), "An Experimental Investigation Concerning the Comparative Influence of MTV and Radio on Consumer Market Responses to New Music", in NA - Advances in Consumer Research Volume 12, eds. Elizabeth C. Hirschman and Morris B. Holbrook, Provo, UT: Association for Consumer Research, Pages: 476-481.

Radio airplay has been traditionally considered the primary force behind record sales. However, many of today's new songs are climbing the charts after receiving little or no radio airplay. MTV has been acknowledged as having a major effect on sales. The following provide examples of typical industry comments:

"MTV's proving to be an influential alternative to radio in making the public aware of new music."

Joel Denver, Radio & Records Magazine

"Customers mention the video clip they saw on MTV when they ask for a record."

CBS Records Representative

"A recent nationwide survey by Billboard Magazine found record stores reporting sales increases of 15-20% for acts shown on MTV, particularly new recording artists."

Wall Street Journal

As with any cable channel, much of MTV's attractiveness lies in its ability to effectively reach a specific market segment who is very interested in, and attentive to, its programming content. The innovativeness of the programming is especially appealing due to the presumed stagnant nature of popular radio formats (Radio & Records November 18, 1982). Many formats have been accused of playing it "too safe" by limiting their play lists and exhausting top sellers through overplay. MTV "can afford a less frantic schedule . . . run(ning) about 750 titles in the library, with about 500 in rotation at any one time" (Les Garland, Radio & Records).

In addition, the innovative nature of the medium itself is appealing to the young audience. Music television has gone beyond "Midnight Special" and "American Bandstand," incorporating story lines and superior presentations--MTV is visual radio.

Combining the principles of radio with the added benefit of seeing the artist perform is thought to effect a stronger and longer-lasting impression. Tommy Matola, President of Champion Entertainment, Inc., suggests that today's society is a video-oriented one, due in part to dependence on TV and the popularity of video games. This, coupled with the popularity of music among youth, provides an environment which is very conducive to a medium such as MTV.

Also, there is some evidence from basic research that visual stimuli may be superior to audio in terms of developing affect, recall and recognition--all of which are suggested to affect the purchase of rock music records or tapes. Nonetheless, it is surprising to note that virtually no empirical study has been conducted to validate the intuitive feelings of the industry.


Numerous areas of marketing research including learning experiments, recall and recognition studies, attitude research, and consumer judgment experiments involving visual versus verbal stimuli present evidence which directly relates to the issue at hand.

Effect of Visual Versus Verbal Stimuli

In an experiment conducted by Mitchell and Olson (1981), visual stimuli were found to be more influential than verbal information in encouraging inference about a product. Furthermore, the presence of pictorial stimuli more actively predisposed knowledge constructs and memory schemata.

Other studies have indicated that visual memory is superior to verbal memory. Brainerd, Desrochers, and Howe (1981) found that exposure to pictorial stimuli resulted in better retention of message content than exposure to word stimuli. The authors explained these findings in terms of Dilley and Pavio's (1968) hypothesis that pictorial stimuli enhance recall because they are "particularly effective in evoking sensory images which could mediate response recall."

Based upon findings that visual memory is superior to verbal memory (e.g., Erdelyi and Becker, 1974; Lippman and Shanahan, 1973; Pavio and Csapo, 1969), Lutz and Lutz (1977) examined the effect of interaction within pictorial stimuli. In a study of recall of Yellow Page advertisements, Lutz and Lutz utilized two types of pictorial stimuli. The first consisted of an interactive image which integrated the name of a brand (verbal) and a picture representing the product. Picture interaction was considered present when both the brand and product picture were shown together in interaction. The other depicted the brand representation separately from the verbal form.

The interactive imagery experimental group recalled more brand names than both the non-interactive imagery and control groups. The authors concluded that "an interactive (visual) image facilitates better recall than a noninteractive image, presumably by increasing the concreteness of the material to be learned" and "the more pictorial the interaction, the more facilitative the mediating image."

Gail McKoon (1980) conducted two experiments examining memory representation of pictorial interaction. "Priming," or speeding item recognition, was the technique used to investigate memory structure. McKoon found that the time to recognize a target part of a picture was primed (speeded) when two parts of the picture were interacting. In other words, elements of pictorial stimuli were recognized more quickly when they were seen interacting with other elements of the stimuli (e.g., two characters shaking hands).

Reasons for Differences in Verbal and Visual Processing

The effect of pictorial stimuli has also been examined in the context of information processing. Morris Holbrook and William Moore (1981) posited that differences in judgmental responses between verbal and pictorial stimuli are a result of differences in the encoding of the two types of stimuli.

Holbrook and Moore's research presented numerous arguments which are applicable to the present study. The first concerns the validity of the "verbal additive" paradigm in representing consumer judgements. The verbal additive paradigm has evolved from techniques such as multi-attribute attitude models. These models employ verbal rating scales, combining attribute evaluations and "weighting" techniques to produce predictive indices of brand preference. The verbal additive paradigm is based upon the assumption that the consumer processes information verbally and combines "cues" (bits of information) additively.

Holbrook and Moore suggest that when consumers evaluate "aesthetic" products (whose attributes are multisensory and emotive in nature, such as music), another sort of evaluative processing takes place. Here, cues are encoded simultaneously rather than additively, and the judgement effect of one cue is dependent on other cues. Product evaluations are seen to be dependent on interactions between product features. For example, an evaluation of an article of clothing is dependent on the evaluation of the neckline combined with the evaluation of the waistline, length, etc. In a similar vein, the evaluation of a musical recording may be dependent on the combined effects of the tempo, vocals, etc. These "feature interactions" result in an evaluation in which cues are combined configurally (simultaneously) as well as additively. This approach to determining product evaluation is termed "representational." Holbrook and Moore contend that the representational approach is superior to the verbal additive in identifying judgmental evaluations of aesthetic products (such as clothing, food, and music).

The relationship of the representational approach to the present area of interest lies in its application to the encoding of visual stimuli. Holbrook and Moore (1981) suggest that for products with sensory appeal, pictorial stimuli will be superior to verbal stimuli in encouraging cue configurality, resulting in affective evaluation.

Differences between the processing of pictorial and verbal information have received considerable support from psychological literature on three facets of information processing. The first involves the "dual-coding" hypothesis (e.g., Pavio, 1971; Pavio and Begg, 1974) which proposes that there are two independent cognitive systems. Pictures tend to be perceived, stored, and processed simultaneously in an imagery system, while words are received and processed sequentially in a separate verbal system.

A second body of literature indicates that differences in pictorial and verbal stimuli are dependent on specialized functioning of the right and left hemispheres of the brain (e.g., Anderson, Garrison, and Anderson, 1979; Geschwind, 1979; McGee, 1979). These studies suggest that the left brain dominates the verbal system, while the right brain is primarily responsible for the imagery system. The analytical and logical thought processing tends to be centered in the left brain, while the global/holistic processing is more likely to occur in the right brain.

Finally, factor analytical studies conducted by Das, Kirby, and Jarman (1979) supported the proposition that task-related abilities (i.e., dealing with words versus pictures) depend upon successive (additive) versus simultaneous modes of information processing. The studies showed that verbal abilities cluster together, presumably due to their dependence on successive processing.

An Alternative Perspective--Verbal Better Than Visual

Findings from the earlier-cited studies have received some resistance from other researchers. Elizabeth Loftus, considered an authority on the subject of memory, cites differences in time of response and memory storage between iconic (visual) and echoic (audio) stimuli. Response to light is thought to occur in 180 milliseconds, while response to sound is quicker at 140 milliseconds. Furthermore, Loftus suggests that images fade more quickly from iconic memory than echoic memory (Ries and Trout, 1983).

Music Research

Much of the music research suggests that music has a strong effect on a listener's affective state. In a study of gratifications and expectations associated with popular music consumption, Gantz, Gartenberg, Pearson and Schiller (1980) determined five major areas of musical consumption effects. One of the categories, "Effects on an Individual's Affective State," had "relaxes and calms" and "makes one feel happy, good, or excited" as the two most prevalent responses within the affect category.

Coker (1972) suggested that the listener interacts with the music as he listens (emotional involvement). Coker further proposes that music is made "aesthetically meaningful" by the individual in much the same way that social interaction becomes meaningful, suggesting parallels between the process of music appreciation and the process of interaction between members of a social group.

In summary, the diverse literature from marketing, psychology and music tends to show that visual stimuli, and particularly audio combined with visual stimuli, are more influential than audio stimuli alone over a wide range of responses that the recording industry would like to influence. These responses include recall, recognition and affect toward the music of interest. Although purchase intentions have not yet been addressed, it would seem to follow that the proposed superiority of visual and audio stimuli (over audio stimuli only) on these awareness and affective responses would carry over to prompting stronger intentions to purchase as well. After input from recording industry marketing representatives, several consumer responses were identified as being of particular business interest. These responses are hypothesized to be differentially affected by MTV, as compared to radio presentations of the same musical material.


1. The correct recall of song titles will be greater for subjects exposed to audio and visual stimuli (MTV) than for subjects receiving only audio (radio surrogate).

2. The correct recognition of song titles will be greater for subjects exposed to audio and visual stimuli (MTV) than for subjects receiving only audio (radio surrogate).

3. The correct recognition of song lyrics will be greater for subjects exposed to audio and visual stimuli (MTV) than for subjects receiving only audio (radio surrogate).

4. Positive affect toward a song will be greater for subjects exposed to audio and visual stimuli (MTV) than for subjects receiving only audio (radio surrogate).

5. The intention to purchase a song will be greater for subjects exposed to audio and visual stimuli (MTV) than for subjects receiving only audio (radio surrogate).


Selection of Treatment

A July, 1983 segment of the "Basement Tapes," an MTV program that previews new music in the late evening, was used as the stimulus for both the audiovisual and the audio treatments. The use of an actual MTV production was felt to provide a more realistic presentation than a treatment produced by the investigators. The format of the "Basement Tapes" readily lent itself to the experiment in that it provides both an audio (unlike regular MTV programs), and a visual superimposed identification of the song title before and after each song. Therefore, the audio treatment would provide an identical exposure including song identification, without overdubbing.

All of the songs in the "Basement Tapes" were virtually unknown before their exposure on the program. Post-test inquiry showed that none of the subjects had remembered seeing them previous to the experiment. Thus, there should be no confounding due to familiarity with the treatment songs.

The songs from the program covered the most common types of videos played on MTV (and radio for that matter), and included "in-concert," and "concept" themes. The visual production quality appears to be comparable to songs featured on regular MTV programming. The segment chosen featured six new songs, along with commercials, and was pretested in the audiovisual format. This was done to make sure that none of the songs would be considered too good or too bad in order to alleviate potential "floor" or "ceiling" effects on measures in the main experiment. Subjects in the pretest were similar to respondents used in the main experiment.

Experimental Procedure

Groups consisting of between six and eight subjects were exposed to either an audiovisual presentation of the MTV "Basement Tapes," or an audio-only treatment. The facility used was a focus group room (with all mirrors covered) that was configured like a typical living room. The respondents were seated on chairs and a couch in a semicircular pattern in front of either a television (audiovisual) or a receiver-tuner. The VCR used to play the program was placed out of sight, and the audio portion of both treatments was played through a single JVC-brand speaker in order to provide identical audio quality across treatments.

After entering the room and choosing a seat, the subjects (all undergraduate students) were told that they were part of a study to evaluate a new form of local music programming that either a local TV station (audiovisual treatment) or a radio station (audio treatment) was considering for regular programming. They were further told that they should view/listen to the segment in order to answer some later questions about how they liked the concept, and that they should refrain from talking to one another. However, to make the treatment environment as realistic as possible, the subjects were allowed to read magazines provided on end tables, or to write letters, etc.

The 30-minute treatment was then played to the subjects. After the segment was completed, the respondents were given the first questionnaire that included song title recall and recognition questions. After they completed these measures, the subjects were then re-exposed to 45 seconds of two songs (Song #2 and #6) from the original segment. These songs were judged approximately equal on how much pretest subjects liked the songs. After re-exposure, the subjects completed a second questionnaire that ascertained lyric recognition, affect toward, and their intention to purchase each of the two songs.

It was felt necessary to provide a second exposure to the treatment songs for a more valid indicator of these latter responses. For example, it would be difficult to put much weight on responses concerning affect if the subject did not remember the song.

Dependent Measures

Song Title Recall. Subjects were asked to recall all of the song titles they could remember without the need for giving them in the order presented. Correct recall required the subject to provide at least two of three, or three of four words in the title. This criterion would provide a conservative gauge for recall. One song had a single-word title, and obviously needed recall of that one word to be judged correct. There were no interpreting differences between coders, with intercoder reliability at 100%.
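The recall criterion above can be sketched as a small scoring function. This is only an illustration, not the coders' actual procedure: the function names are hypothetical, word matching is assumed to be exact and case-insensitive, and the threshold for title lengths the paper does not mention (anything other than one, three, or four words) is an assumption.

```python
def required_matches(n_title_words: int) -> int:
    """Minimum number of title words a response must contain.

    The paper gives the rule for one-, three-, and four-word titles only.
    Using "all but one word" for other lengths is an assumption.
    """
    if n_title_words <= 1:
        return 1
    return n_title_words - 1


def title_recalled(response: str, title: str) -> bool:
    """Return True if the response meets the recall criterion for the title."""
    title_words = title.lower().split()
    response_words = set(response.lower().split())
    # Count how many words of the true title appear in the response.
    matches = sum(1 for word in title_words if word in response_words)
    return matches >= required_matches(len(title_words))
```

With a hypothetical three-word title such as "Night Falls Again", the response "night falls" would count as correct recall, while "night" alone would not.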

Song Title Recognition. Subjects were then provided a list of ten song titles and asked to circle the six songs presented in the first segment. Therefore, the subjects could have received a score from 0 (no song correctly recognized) to 6 (all six songs correctly recognized).
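A minimal sketch of the recognition score follows. It assumes that circled foil titles are simply ignored rather than penalized, which the text does not specify; the function name and the titles in the usage note are hypothetical.

```python
from typing import Iterable


def recognition_score(circled: Iterable[str], presented: Iterable[str]) -> int:
    """Count the circled titles that were actually presented.

    With six presented songs the score ranges from 0 to 6.
    Circled foils are ignored here (an assumption, not stated in the paper).
    """
    return len(set(circled) & set(presented))
```

For example, a subject who circles four presented titles and two foils would score 4 under this rule.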

Song Lyric Recognition. After a 45-second re-exposure to each of two selected songs, the subjects were asked to circle one of four lines of lyrics that were featured in each song. Choice of the correct line reflected correct lyric recognition.

Affect Toward the Song. Items from Holbrook and Huber's (1979) Index of Global Evaluation were used to gauge subjects' affect toward the two re-exposed songs. This index employs a list of eight bipolar adjectives, separated by a 7-point scale, that has been suggested to be a more stable and appropriate measure of subjects' affect toward aesthetic stimuli. The responses were summed across the eight adjectives (transformed to account for opposing directions of some items) to provide a theoretical range from 8 (low affect) to 56 (high affect).
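The summed index can be illustrated with a short sketch. It assumes that the transformation for oppositely worded items is the usual reflection on a 1-to-7 scale (a rating r becomes 8 - r); the paper does not say which items were reversed, so the set of reversed positions is a hypothetical parameter.

```python
SCALE_MIN, SCALE_MAX = 1, 7
N_ITEMS = 8


def affect_index(ratings, reversed_items):
    """Sum eight 7-point ratings after reflecting the reversed items.

    ratings: sequence of eight integers from 1 to 7.
    reversed_items: set of zero-based positions whose scale runs in the
    opposite direction (hypothetical; not listed in the paper).
    """
    if len(ratings) != N_ITEMS:
        raise ValueError("expected exactly eight ratings")
    total = 0
    for position, rating in enumerate(ratings):
        if not SCALE_MIN <= rating <= SCALE_MAX:
            raise ValueError("rating outside the 1-7 scale")
        if position in reversed_items:
            # Reflect so that a higher value always means more positive affect.
            rating = SCALE_MIN + SCALE_MAX - rating
        total += rating
    return total
```

A subject giving the most favorable answer on every item scores 56 and one giving the least favorable answer on every item scores 8, matching the theoretical range stated above.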

Intention to Purchase. Finally, each subject was asked to rate the statement, "I would purchase this song the next time I shop for music," along a 7-point scale anchored by Strongly Agree (1), and Strongly Disagree (7).


Because the order of the songs was not varied in their presentation, order effects are probably present in the results. However, the thrust of this study is to compare the differential influence of MTV to an audio/radio presentation, rather than to examine absolute ratings of songs.

Title Recall

The first hypothesis proposes that subjects in the audiovisual treatment will provide significantly more correct recall of song titles than individuals in the audio treatment. The number recalled for each treatment, for each song, is shown in Table 1.



A larger proportion of audiovisual subjects correctly recalled each song title, with the difference significant for song 5 (X2 = 10.38, 2df, p < .001), and marginally significant (X2 = 3.34, 1df, p < .067) for song 6. The consistency in the direction, and the fact that two out of six songs showed the differences to be significant, moderately supports this first hypothesis. [A log linear analysis was not available at the submission deadline, but will be supplied when it is up on the system in May, 1984. The present results should provide a reasonable approximation.]
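For readers who want to see how a chi-square comparison of recall proportions works for a single song, the following is a minimal sketch. It computes the Pearson statistic for a 2 x 2 table (treatment by recalled/not recalled) without a continuity correction and the upper-tail probability for 1 degree of freedom. Whether the authors applied a correction is not stated, the function names are hypothetical, and no cell counts from Table 1 are used.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table

        recalled   not recalled
        a          b             (audiovisual)
        c          d             (audio only)
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one case")
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


def p_value_1df(chi2):
    """Upper-tail probability of a chi-square statistic with 1 df.

    With 1 df the statistic is a squared standard normal deviate, so
    P(X > chi2) = erfc(sqrt(chi2 / 2)).
    """
    return math.erfc(math.sqrt(chi2 / 2.0))
```

As a check on the arithmetic, a statistic near 3.84 corresponds to a probability of about .05 at 1 df, so values somewhat below that, such as the 3.34 reported for song 6, fall in the marginal range.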

Title Recognition

The second hypothesis states that the audiovisual treatment group will provide greater recognition scores than the audio-only subjects. Table 2 shows the mean scores for each group, and reveals that the audiovisual subjects did score higher in terms of correctly recognizing more of the song titles in the stimulus. Because of the metric quality of the data, a t-test was used to ascertain if the differences were significant. Given the significant differences in the two population variances (F = 2.99, p < .001), the pooled variance estimate was used. The results of the t-test show that the audiovisual group's recognition scores were significantly higher (t = 3.67, 93df, p < .001), providing strong support for the second hypothesis.
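The two common ways of forming a two-sample t statistic from group summaries can be sketched as follows. The pooled form is the one named in the text; the separate-variance (Welch) form is shown only for comparison, since it is the usual alternative when group variances differ, and it is not part of the paper's reported analysis. The function names are hypothetical and no values from Table 2 are used.

```python
import math


def pooled_t(mean1, var1, n1, mean2, var2, n2):
    """t statistic and df using a pooled variance estimate."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    se = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / se, df


def welch_t(mean1, var1, n1, mean2, var2, n2):
    """t statistic and Welch-Satterthwaite df using separate variances."""
    a, b = var1 / n1, var2 / n2
    se = math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return (mean1 - mean2) / se, df
```

When the two groups have equal sizes and equal variances the two forms agree. When the variances differ, the separate-variance form yields fewer degrees of freedom than n1 + n2 - 2, which makes the test somewhat more conservative.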



Song Lyric Recognition

The third hypothesis proposes that the audiovisual group will also provide more correct song lyric recognition for the two songs re-exposed to the subjects. Table 3 provides the proportion of correct responses for each treatment group for the two songs. The audiovisual group provided more correct recognition of lyrics for both songs. However, these differences were statistically significant for only the first re-exposed song (X2 = 4.38, 1df, p < .036). Therefore, the third hypothesis can be considered only partially supported.



Affect Toward the Song

The fourth hypothesis projected that the audiovisual group would exhibit more favorable affect toward each of the two songs. Tables 4 and 5 present the mean scores for each treatment group for each song. The means are in the predicted direction for only the first re-exposed song (#2). Here, the audiovisual group tended to have stronger, positive affect toward this song than the audio group. The audio group reported higher mean affect toward the second song (#6). However, applying t-tests to the data showed that the differences reached statistical significance for the first song only (t = 1.73, 91df, p < .043), providing partial support for hypothesis four.





Intention to Purchase Songs

The final hypothesis stated that the audiovisual group should provide stronger intention to purchase each of the two songs. Tables 6 and 7 show the mean scores for each group for each re-exposed song. As with the previous affect measure, a significant difference was found for only the first song re-exposed (#2). Here, the audiovisual group provided stronger intention to purchase (t = 1.89, 91df, p < .03). These findings provide partial support for the last hypothesis.






Subjects provided the audiovisual stimulus (MTV) tended to provide more accurate song title recall and recognition, as well as song-lyric recognition. These responses were suggested as being important, by recording company representatives, for requesting a song for airplay or purchase.

However, the findings are less conclusive when viewing measures of affect toward, and intention to purchase two re-exposed songs. The last two hypotheses were only partially supported. The audio (radio surrogate) treatment group showed stronger (but not significant) affect toward and stronger (but not significant) intention to purchase the second song. On the other hand, the audiovisual group (MTV) had significantly stronger affect toward the re-exposed first song and significantly greater intention to purchase the first song. Therefore, hypotheses four and five were only supported by findings for one of the two songs tested for affect and intention to purchase.

There are several cautions that should be noted before any final conclusion can be made concerning the results. First, the sample of stimuli cannot be presented as statistically representative of rock music on MTV or radio. However, given the dynamic nature of rock music, "statistically representative" may be valid for such a short time that the value of that perspective may be moot.

In addition, the sample was limited to a specific population (college students at a Southeastern university). Still, that group is a major audience of both MTV and rock-oriented radio.

Perhaps covariates such as "musical involvement," or previous propensity to purchase rock music recordings may help explain the inconsistency in affect and intention to purchase, although one would expect these factors to "wash out" given the sample size and the random subject allocation to each treatment.

Finally, the limited number of exposures (once for title recall and recognition; twice for lyric recognition, affect and intention to purchase) offers, at best, a short-run indicator of differential effects between the two media types. Yet, it may be the issue of relative impact of exposure that may offer an insight to the inconsistency of results.

MTV appears to influence indicators of awareness much more strongly, with as little as one exposure, and may have its major impact on this initial aspect of consumer decision-making. The simultaneous presentation of audio and visual cues clearly appears to work better in this area. This impact is not as strongly evidenced for higher level cognitive processes, such as evaluation and intention to act, which may require more time, and input from other sources such as peer groups and opinion leaders.

Nonetheless, the importance of obtaining initial awareness and memorability is critical for introducing new music. Perhaps the anecdotal evidence of the recording industry--that MTV significantly increased the distribution of new groups and music types--is reflected in the findings.

In short, the combination of audio and visual cues provided by MTV may do more for introducing music, especially in the initial stages of a song's life cycle. Radio, with its local flavor (opinion leader surrogate?) and generally higher frequency of playing a song, may still be necessary for a successful record.


