Clark Leavitt (1975), "Theory As a Bridge Between Description and Evaluation of Persuasion", in NA - Advances in Consumer Research Volume 02, eds. Mary Jane Schlinger, Ann Arbor, MI: Association for Consumer Research, Pages: 607-614.

[This research was supported by the College of Administrative Science of The Ohio State University.]

[Clark Leavitt is Professor of Marketing at The Ohio State University.]

This paper reviews an advertising testing procedure that presents the advertising in a simple, uncomplicated way with no attempt to mislead the respondent as to the purpose. However, the response format is extensive--being a set of rating scales reported previously. The scales represent a unique set of dimensions along which consumers map commercials and other ads. The paper deals with the problem of establishing decision rules for acceptance of advertising when independent dimensions must be used as the basis. Data are presented to illustrate recommended procedures.

Earlier publications [Wells, W. D., Leavitt, C., & McConville, M. A reaction profile for TV commercials. Journal of Advertising Research, 1971, 2 (6) 11-17. Wells et al. report the earliest analysis and Leavitt (1970) a further step in developing the scales. Note that the publication dates are in reverse order because of different publication lags.] have described a set of descriptive rating scales for television commercials. Because of their sensitivity and reliability, these scales provided an excellent diagnostic device for comparing both alternative executions and alternative campaigns.

A problem arises, however, in using the scales to provide a more definite evaluation of alternatives. How can the scales be used for quantified predictions of effectiveness? Simply taking the unweighted average of all the scales would give equal weight to each. This arbitrary procedure is hard to defend because certain scales seem to be tapping more important variables than others. For example, most advertisers feel that it is more important to have personally relevant commercials than to have humorous commercials.
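The contrast between an unweighted and a weighted composite can be made concrete with a minimal sketch. The scale names follow the text, but the scores and the weights below are hypothetical values chosen only for illustration; the paper does not supply a weighting.

```python
# Hypothetical scale scores for one commercial (illustrative, not from the paper).
scores = {"Personal Relevance": 300.0, "Humor": 200.0}

# Hypothetical weights expressing the view that relevance matters more than humor.
weights = {"Personal Relevance": 3.0, "Humor": 1.0}


def unweighted_mean(scores):
    """Average the scales, giving every scale the same weight."""
    return sum(scores.values()) / len(scores)


def weighted_mean(scores, weights):
    """Average the scales, weighting each by its assumed importance."""
    total_weight = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total_weight


equal = unweighted_mean(scores)
weighted = weighted_mean(scores, weights)
print(equal, weighted)
```

With these assumed numbers the weighted composite moves toward the Personal Relevance score, which is the behavior an advertiser who values relevance over humor would want. The difficulty the text raises is that no such weights are known in advance.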

Another approach may be to relate the scales to a theory or model of communication. Assuming the root of the problem is the extreme empirical manner in which the scales were originally developed, a theoretical reference point--although it cannot provide a universal set of weights--can at least provide a sense of direction.

A theory that is useful for this purpose is the process model of persuasion used either implicitly or explicitly by most workers in the field. This model asserts that there are four important aspects to the response to a message that mediates behavioral change. These are:

--attention or arousal

--communication of information

--change in attitude or intention

--retention of effects

When these four processes are at optimal levels, a message is persuasive according to the process model.

In order to relate the eight empirically derived scales presented previously to this theoretical framework, further statistical analyses of the existing scales were carried out and modifications were made based on these analyses.


First, a factor analysis was performed with the objective of achieving the fewest number of factors that would adequately explain the data (that is, simple structure). Four factors were found that produced the best combination of high loadings and low first-order intercorrelation among items on different factors.

The second step was construction of a balanced set of items for two primary factors. Balance was achieved by adding negative loading words to the original set and equating the total number of items on the two factors. This was intended both to strengthen the factor structure and to avoid yea-saying response bias. The best negative/positive balance that could be achieved was 12 positive and eight negative words because the supply of reliable negative words was limited.


The first factor to emerge in the new analysis was Stimulating. This includes items from three of the original eight scales: Humor, Vigor and Uniqueness (or Amusing, Energetic and Novel).

The second factor was Relevant. It consisted of three more of the original scales: Personal Relevance, Authoritative and Irritation or Disliked.

The third and fourth factors were the Sensuous and Familiar scales, respectively.

The new Stimulating and Relevant scales were now modified by adding and/or deleting words so that each consisted of approximately 12 positive and eight negative words. In addition, the Sensuous scale was broadened by the inclusion of more positive words derived from further testing and a new understanding of this variable. Familiar was left unchanged.

Other changes were made as the result of further testing. The Professional Execution scale seemed unstable and was replaced by words referring to source credibility and realism. These changes were made possible by the fact that the simple structure achieved in the final analysis became a more powerful tool for picking out stable high loading items.


Comparing results to theory produced several insights. First of all, the underlying nature of the first two factors became clear. Stimulation seems to parallel attention, and the five scales (see Table I) provide scope sufficient to measure all aspects. It would seem likely that this factor would also relate to wearout, since dull commercials could be expected to wear out sooner.

The Relevant factor seems to map the essential condition for change in attitude or intention. It is a good presumption that we intend to use relevant objects and that objects are relevant because we intend to use them.

None of the factors seems to bear on the communication aspect. The word "informative" is part of the scales but correlates with words like "helpful" and "important for me." Obviously, for the average consumer, irrelevant information is not really information at all.

This means that the scales are not sufficient for checking "communication." Apparently, if the analyst wishes to determine whether the copy points were clear (even if not important), he will have to use other means. This is confirmed by actual near-zero correlations with such other measures.

The relation of these four perceptual factors to retention is not yet clear. It is to be hoped that testing the same commercial at various points in its exposure history may help clarify this.

Finally, the greater generality of the Sensuous scale comes as a surprise. Theory was helpful here also because the lack of fit made the surprise that much clearer. This scale was renamed Gratifying because experience gained in the studies reported here shows that it was elevated not only by low-keyed sensuous commercials but also by characters who were idealized in one sense or another and by themes that were folksy or sentimental, or that confirmed common prejudice or folk wisdom. Taken together, these stimuli suggest a process of cognitive and emotional closure in the viewer.



To this point the development of the Consumer Relevance Profile from the original set of perceptual scales has been a classic case of construct validation. That is, the attempt has been to establish a set of postulated variables as self-consistent or reliable. This is a kind of validity since it meets the first requirement of any presumed real variable - that it exists.

The next question is whether these variables are correctly named. Specifically, does Stimulating relate to other measures of attention and does the Relevant dimension predict persuasion or, at least, does it correlate with other--independent--measures of persuasion?

Following are some demonstrations that address the validity questions. The results are brief but encouraging.

These results don't provide a statistical weighting, of course. Another approach to a more exact definition of effectiveness might be by analyzing patterns of scores that seem to indicate poor performance.

For example, in a sample of 5 radio commercials dealing with shoplifting, the one judged least relevant was most stimulating. This is typical of a case where the entertaining aspects of a commercial seem to overpower the persuasive effects.

The opposite case would be a low stimulating score and a high relevance: commonly produced by a simple announcement format. This is not likely to wear well.

A familiar spokesman can produce average relevance but an increase in Gratifying and in Familiar. This could be a danger signal--watch out for an increase in irritation.


Although research here doesn't lead to a universally applicable single score, it is moving in that direction. The most general conclusion is that a commercial should be high in both Relevance and Stimulation but not excessively high in one without the other. It may be possible to refine this and possibly quantify it.
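One way such a rule might eventually be quantified is sketched below. This is only an illustration of the general conclusion: the function name, the cutoff values (`floor`, `max_gap`), and the example scores are all hypothetical and are not taken from the paper.

```python
def screen(stimulation, relevance, floor=250, max_gap=50):
    """Return a list of warnings for one commercial; an empty list means it passes.

    floor and max_gap are hypothetical cutoffs, not values from the paper.
    """
    warnings = []
    if stimulation < floor:
        warnings.append("low stimulation")
    if relevance < floor:
        warnings.append("low relevance")
    # Flag a profile that is much higher on one dimension than the other.
    if abs(stimulation - relevance) > max_gap:
        warnings.append("unbalanced profile")
    return warnings


# Hypothetical scores for three commercials.
balanced = screen(stimulation=300, relevance=290)
entertaining_only = screen(stimulation=320, relevance=240)
announcement_only = screen(stimulation=240, relevance=330)
print(balanced, entertaining_only, announcement_only)
```

The second and third calls correspond to the two failure patterns described earlier: an entertaining commercial with little relevance, and a relevant but dull announcement.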


A rather dull commercial for a new cereal product was tested when it was first aired and again a year later. As would be expected in the case of a commercial with a heavy schedule, the three largest differences of all the 12 scales were:

              1970   1971
Irritation     169    225
Worn Out       173    212
Familiar       246    283

Perhaps if the commercial had been more stimulating to begin with it would have increased in Familiar alone and not in Irritation.
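The size of each shift can be read off by simple subtraction. The short sketch below uses only the scores reported in the table; the variable names are illustrative.

```python
# Scale scores from the cereal commercial test and retest reported above.
scores_1970 = {"Irritation": 169, "Worn Out": 173, "Familiar": 246}
scores_1971 = {"Irritation": 225, "Worn Out": 212, "Familiar": 283}

# Change on each scale between the first airing and the retest a year later.
changes = {scale: scores_1971[scale] - scores_1970[scale] for scale in scores_1970}

# Scale with the largest increase.
largest = max(changes, key=changes.get)

for scale, delta in changes.items():
    print(f"{scale}: +{delta}")
```

Irritation rose by 56 points, more than Worn Out (39) or Familiar (37), which is consistent with the suggestion that a dull commercial gains irritation as well as familiarity under heavy exposure.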

A second study compared two commercials for the same product early in their career on the air and six months later. The retest was done with a theater audience who checked some of the words used in the Perceptual Scales. Here are the results for words used that were about the same on the original test (or, in a few cases, opposite in direction of response).

              First Test       Second Test
Words         Falls   Burn     Falls   Burn
Worn out       160    163       15%    24%
Familiar       323    297       34%    50%
Phony          143    147       22%    31%

The second test cannot be compared directly to the first. However, for these three words the differences in the first test are trivial in two cases, and in the third case they favor the Falls commercial. The percentage differences are all much larger for Burn in the second test.

This seems to reflect the effects of wearout because Burn was exposed more than Falls and was also quite similar to another commercial for the same product. As in the previous case, here, too, Burn was less stimulating originally and could, therefore, be expected to wear out faster.


An award-winning commercial was judged to do an excellent job of creating awareness of a product change but seemed to place too little stress on product benefit. A new commercial was produced which dealt almost exclusively with the end benefit of using the product.

                    Novel    Personal Relevance
Award winner         292           232
Benefit oriented     232           328

The new campaign was less novel and creative (less stimulating) but had greater personal relevance. It might be argued that the two commercials performed in tandem with a high degree of effectiveness for introducing a product change.


Three different types of television commercials were adapted to radio. Eight radio commercials distributed among the three types were tested to see which kind of adaptation succeeded best. To determine this, listeners were given both the Perceptual Scales and a buying interest question. The three types of execution differed as follows:

Type         Personal Relevance    Buying Interest
Dialogue            323                 78%
Announcer           301                 68%
Drama               270                 64%

Commercials using a dialogue were better than the standard announcer technique, but those using a dramatic format did less well in producing buying interest and relevance. The complications of the dramatic form apparently are harder to get across on radio than on TV.


Three client commercials and two for a new brand of frozen dessert that was competitive to one of the client commercials were tested by a syndicated service (McCollum-Spielman).

The service administered a shortened version of the Relevance word list in the form of a check list (respondents checked "yes" or "no"). In addition, the audience was asked to fill out the service's standard intention-to-buy scale that was always used with new products.

                            Buying Interest    Relevance
Client type A                     70              120
Client type B                     58              114
Competition type A (new)          58              111
Competition type B (old)          46               97
Client type C                     46               90

Results show perfect agreement in rank order (allowing for the two ties in Buying Interest) even though fewer Relevance words were used by the service and the manner of responding was a simple check instead of a five-point rating. New product commercials are an area where the use of an intention-to-buy scale is fairly common and more defensible than with established brands. The fact that buying intention and the Relevance words were in agreement in this situation implies that Relevance might be useful in evaluating new products.
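The agreement between the two columns can be checked by counting concordant and discordant pairs of commercials, the quantities that underlie rank statistics such as Kendall's tau. The sketch below uses the five pairs of scores from the table; the function name `pair_counts` is illustrative and this is not presented as the analysis used in the original study.

```python
from itertools import combinations

# (Buying Interest, Relevance) for each commercial, from the table above.
commercials = {
    "Client type A": (70, 120),
    "Client type B": (58, 114),
    "Competition type A (new)": (58, 111),
    "Competition type B (old)": (46, 97),
    "Client type C": (46, 90),
}


def pair_counts(pairs):
    """Count pairs ordered the same way, the opposite way, or tied on a measure."""
    concordant = discordant = tied = 0
    for (b1, r1), (b2, r2) in combinations(pairs, 2):
        product = (b1 - b2) * (r1 - r2)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
        else:
            tied += 1
    return concordant, discordant, tied


concordant, discordant, tied = pair_counts(list(commercials.values()))
print(concordant, discordant, tied)
```

Of the ten possible pairs, eight are ordered the same way on both measures, two are tied on Buying Interest, and none is reversed. With only five commercials this is an encouraging pattern but not a strong statistical test.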


Two gasoline commercials were tested: one corporate- and one product-oriented.

             Personal Relevance
Corporate           297
Product             348

The product-oriented commercial has a much higher degree of personal relevance for ordinary consumers. The results might have been different if the test had been carried out on stockholders or employees.


The lowest and highest ASI pre-post change scores for 60-second carryout food product commercials were compared. The Personal Relevance averages obtained from 30 people in the Burnett Laboratory were also the lowest and highest among the eight 60-second commercials tested.

                 Percent Change    Personal Relevance
Pre-post high          13                 341
Pre-post low            1                 241


