Internal Validity, External Validity and the Passage of Time As Issues in Developing Advertising Effectiveness Measures

Michael L. Rothschild, University of Wisconsin
Michael J. Houston, University of Wisconsin
ABSTRACT - Experimental research needs to establish tradeoffs between strict control of variables (internal validity) and appropriate environmental noise (external validity). As part of the latter one must consider the passage of time which enhances learning and forgetting. As part of the former one must consider the appropriate ordering of response variables.
This paper presents a methodology which strikes an appropriate balance among these issues. Results of a pretest are presented which show the effects of advertising on awareness and behavior, and the effects of the methodology on reactiveness measures. Key independent variables are repetition of advertising (to simulate the passage of time) and order of awareness and behavior measures.


A major problem in laboratory experimental research concerns demand artifacts and reactiveness of treatment variables within specific research scenarios. These issues affect external validity (i.e., whether the results are generalizable beyond the experimental conditions).

A force which often acts to minimize external validity is a striving for internal validity. Here one attempts to develop conditions which allow the researcher to state the achieved result as a function of a well defined variable or set of variables. In order to achieve internal validity or control, one may need to sacrifice the existence of environmental variables which contribute to external validity.

In the area of advertising effects research a number of tradeoffs between internal and external validity are often made. These tradeoffs reflect the pragmatic nature of advertising and the high levels of noise generally surrounding it. Merely testing one advertising variable in a sterile laboratory may give results which bear no resemblance to reality.

In pragmatic commercial testing, day-after recall tests are greatly lacking in internal validity while the Schwerin and ASI tests lack in external validity. Most theoretical research to date has achieved internal validity (strict control over variables) at the expense of external validity. An extreme example is the work of Zajonc (1968).

A third set of variables must also be considered. In the "real world" the processes which are instigated by advertising (the onset of awareness, knowledge, attitude and/or behavior) take place over a period of time. In order to capture these processes in a laboratory experiment, one must speed up the dynamics. Effects will be very weak if they are measured after a short period of real time; a long period of real time cannot be practically observed in the laboratory.

An alternative is to capture a static moment of the process which may be externally valid in terms of the moment but does not represent the more important passage of time. Another alternative is to compress the passage of time in order to observe the results of a longer term process. The logic for doing so rests on the premise that if there is an effect due to the independent variable, it can be magnified or enhanced by speeding up the process; if there is no effect then no amount of speedup will produce one. This logic is found in the biological sciences where, for example, rats are fed large amounts of suspected carcinogenic substances to see if cancer results. If the substance is a carcinogen then cancer will result; if not, no amount of consumption will induce cancer. If one has established appropriate safeguards to maintain internal and external validity, then one can safely speed up the advertising process in a manner similar to that used in oncology research.

In developing a balance between internal validity, external validity, and the speedup of the process there are several variables to consider. This paper considers two such variables: 1) repetition level; 2) order of measurement.

Repetition Level

In the "real world" commercials are rarely seen just once; any such limited exposure would rarely have any impact. One attribute of the Idealized Measurement Procedure (Robinson, 1968) concerns the scope of the advertising and recommends many repetitions in the test. To achieve internal validity one would want several repetitions of test messages while controlling other variables. External validity would call for several repetitions in order to simulate the campaign but a limit on repetitions so as not to create demand artifacts. Speeding up the advertising process also would suggest the use of several repetitions. In terms of repetition, then, one would want enough exposures to simulate a campaign and the passage of time but not so many as to induce reactiveness.

There are two repetition effects to consider. First, it has been shown several times that repetition has a differential effect on awareness, knowledge, attitude and behavior (Ray, et al., 1973). It is now fairly clear that a large advertising budget can create awareness. Since awareness is primarily a function of repetition, a test of message effectiveness should concentrate on knowledge, attitude and behavior. To do so one must use sufficient repetitions to impact on these higher order responses. In doing so, awareness will show high levels across all treatments but differential effects will be seen in the higher order responses.

The second issue concerns repetition effects under conditions of differential environmental noise (e.g., program content, competing messages, household distractions). It can be seen that experiments with weak environmental noise yield strong repetition effects (Zajonc, 1968; Miller, Mazis and Wright, 1971), that experiments with moderate environmental noise yield moderate repetition effects (Ray, et al., 1973) and that experiments with strong environmental noise yield weak repetition effects (Rothschild, unpublished; Houston & Rothschild, 1978). Generally one can say that external validity increases with noise.

Because of this relationship one would want to do research on repetition effects in an environment with low to moderate noise so that these effects could be observed. Conversely, to do research on effectiveness across messages one would want an externally valid environment with moderate to high noise levels where some level of repetition is representative of an ongoing campaign. There exist sufficient data to deal with the former case; the present discussion is concerned with the latter.

Order of Measurement

Order of measurement is important in a different way. Just as learning takes place with repetition, the passage of time is one variable which impacts on forgetting. A measurement which takes place early in a stream of measures will show less forgetting than a later one (ceteris paribus) simply due to the differing amount of elapsed time between the independent treatment variable and each of the two measures. Some dependent variables are sensitive to treatments and build up rapidly (awareness is typically one such variable); they should therefore be given more time to decay so that differential effects of strong and weak treatments can be observed (see Figure 1a). Given some decay one can then more easily see differences between treatments. Other dependent variables will be less sensitive to treatments (behavior is typically one such variable) and should therefore be measured early when any differences between strong and weak treatments which may exist have not yet been eroded (see Figure 1b).



One must, of course, temper this ordering by considering the impact which the early measures have on later measures. That is, will an awareness measure be more likely to impact on a later behavior measure or will the reverse hold?

With the above issues in mind, the authors have developed a research procedure for measuring advertising effects which attempts to capture both internal and external validity as well as allow for a compressed passage of time. The method is designed to test differences between television commercials at each level of the hierarchy of effects. The procedure was utilized to test the effectiveness of four different warning messages in television commercials for over-the-counter antacid drugs.

The purpose of the remainder of this paper is two-fold. First, the specific components and sequence of the overall research procedure are described. Second, a framework is presented for pretesting the procedure such that levels of reactiveness and demand artifacts can be examined simultaneously with the empirical issues relating to repetition level and placement of recall measures. Findings from a limited pretest of the procedure are presented.


This section first describes the overall method of testing advertising effects and then describes the pretest method for determining appropriate repetition levels and measurement ordering. The latter is used to assess the reactivity of the former; it is the pretest which was used to fine-tune the method used in the research project (Houston and Rothschild, 1978).

Advertising Laboratory

The basic design is a post-test-only with control group design, as shown below:


The four treatments involve four warning messages. The control group includes subjects who were exposed to and engaged in all phases of the experiment but did not see any relevant messages.

Subjects were recruited for the alleged purpose of evaluating the late night news programs of three television stations. Upon their arrival at the experimental site, subjects were ushered into a living-room setting where they were told that they would be shown a videotape consisting of segments of three news programs. (Imbedded within the tape were several repetitions of the messages to be tested, along with commercials for other products.)

After viewing the tape, subjects completed a questionnaire concerning the news shows. This distraction task maintained the alleged purpose of the study and also served as a simulation of nonshopping events which would normally occur between exposure to commercials and behavior in the marketplace.

Subjects were next offered an opportunity to purchase some commonly used products. Subjects were placed one at a time in a purchase environment where each could be observed while having the opportunity to read labels, compare products, and purchase products. This simulated store-like setting contained 5 major brands of antacids, 5 brands of each of the other products advertised in the videotape, and 5 brands from two other unrelated product classes. Subjects had the opportunity to buy one brand from each product class offered at a 40% price reduction. Such a price reduction was used to encourage purchase behavior. In this setting, subjects were observed in terms of the amount of time spent reading antacid labels, total time spent in the market, the number of antacid brands handled, and the antacid brand, if any, purchased. In this way, the differential effects of the warning messages on label reading and purchase behavior could be measured.

Subjects were next given an unaided recall measure in which they were asked to recall anything they remembered about the tape just viewed. Unaided recall would serve as one type of effect by which warning messages could be compared. Specifically, differences in recall of Alka Seltzer commercials across different warnings were examined. Subjects were also measured in terms of a series of knowledge items. (Results of this experiment can be found in Houston and Rothschild, 1978.)

The purpose of the experiment was to test warning messages to see if awareness and knowledge could be maximized among that population for whom there were contraindications while not affecting purchase behavior among the population for whom there was no contraindication.

Offering the discounted price on the product created the most adverse conditions for observing such behavior.

The Pretest for Appropriate Repetition Level, Order of Measurement and Reactivity

The pretest of the advertising laboratory was conducted using four through seven repetitions of the message within the thirty-minute videotape. The test began at a minimum of four exposures because an earlier test of a similar method showed no effects with one through three repetitions, even though other acceptable data were derived from the study (Rothschild and Houston, 1977). The unsuccessful earlier test (and the current one) were among the first known to the authors to use program content with imbedded actual commercials in the laboratory. Most prior work had simulated television commercials by using print ads shown as slides on a rear screen projector with no program content. The manipulation of one to three repetitions in a thirty-minute program had previously proven to be too subtle to induce effects on response variables (Rothschild, unpublished).

To test the different repetition levels, subjects were randomly assigned to a treatment of the experiment previously described. In the pretest, the message was held constant as repetition levels changed; in the actual experiment, repetition level was constant and several messages were tested. Dependent variables were awareness, knowledge, behavior and reactivity to the design.

Order of measurement concerned the optimum ordering of dependent variables measured as discussed above. Crucial here was the relative placement of unaided recall and in-store behavior measures. Subjects were randomly assigned to treatments which kept or reversed the ordering of these variables.
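The assignment of subjects to pretest cells can be sketched as follows. This is only an illustration: it assumes the four repetition levels were fully crossed with the two measurement orders and that cells were balanced, neither of which is stated in the text. The names `REPETITION_LEVELS`, `MEASUREMENT_ORDERS` and `assign_subjects` are hypothetical, and the sample size of 48 is taken from the general-public sample described below.

```python
import random
from collections import Counter

# Assumed design: 4 repetition levels crossed with 2 measurement orders.
REPETITION_LEVELS = [4, 5, 6, 7]
MEASUREMENT_ORDERS = ["behavior_first", "recall_first"]


def assign_subjects(n_subjects, seed=0):
    """Return one (repetitions, order) cell per subject, in random order.

    Cells are repeated until every subject has a slot, then shuffled,
    so cell sizes differ by at most one.
    """
    cells = [(r, o) for r in REPETITION_LEVELS for o in MEASUREMENT_ORDERS]
    slots = (cells * (n_subjects // len(cells) + 1))[:n_subjects]
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    rng.shuffle(slots)
    return slots


assignments = assign_subjects(48)
cell_counts = Counter(assignments)
for cell, count in sorted(cell_counts.items()):
    print(cell, count)
```

Under these assumptions each of the eight cells receives six subjects; the actual cell sizes in the pretest may have differed.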

Reactivity was measured in several ways. Upon completion of the experimental procedures, subjects were asked a series of open-ended questions which were then coded for tabulation purposes:

1.  To think back to the time when they were viewing the videotape and to recall why they thought they were being shown the tape.

2.  To think back to the time when they were filling out the distraction questionnaire and to recall what they thought the researchers were trying to learn from the questionnaire.

3.  To think back to the time when they were in the store and to recall what they thought the researchers were trying to learn by having them go through our store.

4.  What went through their minds when they saw the specific messages being tested.

5.  How many test commercials did they think they saw.

A second set of subjects were given a different reactiveness test. These subjects saw the videotape and were immediately asked to state what was being tested. A third group of subjects saw the videotape, completed the distraction task and then were asked what they thought was being tested.

The first set of subjects were recruited from the general public via telephone solicitation using the procedures implemented in the actual project and were paid for participating (n1 = 48). The second and third groups were a convenience sample of students in an introductory marketing class (n2 = 30; n3 = 29).


The results are presented in three parts:

1.  Repetition effects.

2.  Order of measurement effects.

3.  Reactiveness (to the repetition) effects.

Repetition Effects

There are in general no effects due to varying the repetition level. Awareness of the message is very high (see Table 1); behavior appears to be low (Table 2).



Although purchase behavior in the laboratory appears to be low, it is very close to the brand's actual purchase rate. In the laboratory, seven of forty-nine subjects purchased (14.3%); Alka Seltzer's actual market share is about 30% and about half the population uses the product class (ergo, 15% of the population uses the product).
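The comparison above rests on two short calculations, which can be checked with the sketch below. The market share and product-class usage figures are the approximate values quoted in the text; the variable names are illustrative only.

```python
# Laboratory purchase rate: 7 purchasers among 49 subjects (figures from the text).
purchasers = 7
subjects = 49
lab_purchase_rate = purchasers / subjects

# Benchmark: brand share times product-class usage (both approximate in the text).
market_share = 0.30
category_usage = 0.50
expected_rate = market_share * category_usage

print(f"laboratory purchase rate: {lab_purchase_rate:.1%}")  # 14.3%
print(f"market benchmark:         {expected_rate:.1%}")      # 15.0%
```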



Subjects were also asked if the subject brand would be among their top three choices if they were to make a purchase (forced purchase intention). Again, there is no repetition effect as shown in Table 3.



Recall of the message was also tested among the two groups of subjects who went through a partial treatment and reactiveness test (see Table 4). Results here are similar to those shown in Table 1.



Order of Measurement Effects

There are weak but consistent effects which would suggest that the store (behavior) should precede the unaided recall. As discussed above, when responses are consistently extremely high or extremely low it is necessary to separate treatment effects when possible. There is no order effect on recall (Table 5) but there is a weak (n.s.) order effect on behavior (Tables 6 and 7).







Reactiveness of Repetition Levels

The reactiveness measures show that a large number of subjects felt the research concerned, in part or in whole, the commercials shown as part of the tape. At first glance, such a result may seem damaging. However, the authors were neither surprised nor disturbed by this result. The reactiveness measures were, in effect, designed to be reactive themselves. These measures effectively told subjects that the research concerned something other than what was indicated, i.e., the news programs, and asked the subjects to guess what it was. Since the only other things subjects saw besides news were commercials, a simple process of elimination would suggest a commercial test to them. Also, news is rarely tested but commercial testing is fairly common. Finally, subjects were asked reactiveness questions after it was evident (from the store and the final questionnaire) that the research concerned commercials.

Accordingly, the critical response which would indicate subject knowledge of the research purpose was whether Alka Seltzer and/or its warning message was stated. No subjects responded by discussing the subject product or its commercial even though (1) most subjects saw more of this product than any other topic on the videotape; (2) the format of this commercial was different from anything that had ever been broadcast before and was the only unique aspect of the entire videotape; and (3) subjects generally responded appropriately to knowledge questions dealing with the unique characteristic of the commercial.

Table 8 shows that although most subjects suspected a commercial test during the viewing of the videotape, none identified Alka Seltzer or its warning as the focus of the study. There was no repetition effect.



As shown in Table 9, the distraction task served its purpose. At this point, fewer people suspected a commercial test than had after the showing of the tape. Again, there was no repetition effect and no mention of Alka Seltzer.



During the store phase of the test, it was obvious that news was not being tested. There were still no responses dealing with Alka Seltzer, or warning messages. There was one response tied to label reading. (The warning message dealt with label reading.) There were no repetition effects. See Table 10.



Next, subjects were asked how many Alka Seltzer commercials they thought they saw. Table 11 shows a remarkable ability to recreate what had been seen. This is consistent with earlier findings by Webb (1979).



Finally, subjects were asked what went through their minds as they saw the commercials (Table 12). Again, there is no mention of Alka Seltzer; only two subjects mentioned warning messages (the only unique portion of the 30-minute tape). Again, there is no repetition effect.



In the interrupted pretest, subjects were asked why they thought they were shown the videotape. Again, many suspected a test of advertising but only two (out of 59) singled out the warning message. The distraction task served its purpose, since following it fewer subjects thought the test was a pure commercial test. In sum, the interrupted pretest data are very similar to the pretest data presented in the tables above.


In order to accurately test the effects of advertising, one must consider internal validity, external validity, and issues relating to temporal processes. A key issue here deals with maximizing external validity while minimizing reactiveness to the treatment. This paper has presented data from a pretest which deals with these issues and has led to a successful experiment.

Other research has shown that as external validity is increased, the effect of increasing repetition levels is dampened considerably. This in turn calls for higher levels of repetition. There is in the current data no evidence that subjects are more likely to uncover the true purpose of the experiment at the higher levels.

More specifically, there seems to be little positive or negative effect in presenting four or seven repetitions of a thirty-second commercial in a thirty-minute treatment videotape. These differences in repetition level result in approximately 7 to 12% of the tape being devoted to treatment. In this range there seems to be little treatment impact.
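The share of the tape devoted to treatment follows directly from the commercial length, tape length and repetition level given above. The short sketch below reproduces the calculation; the function name `treatment_share` is illustrative only.

```python
COMMERCIAL_SECONDS = 30   # thirty-second commercial
TAPE_SECONDS = 30 * 60    # thirty-minute videotape


def treatment_share(repetitions):
    """Fraction of the tape occupied by the test commercial."""
    return repetitions * COMMERCIAL_SECONDS / TAPE_SECONDS


shares = {r: treatment_share(r) for r in (4, 5, 6, 7)}
for repetitions, share in shares.items():
    print(f"{repetitions} repetitions: {share:.1%}")
# 4 repetitions: 6.7%
# 5 repetitions: 8.3%
# 6 repetitions: 10.0%
# 7 repetitions: 11.7%
```

The endpoints, 6.7% and 11.7%, round to the 7 to 12% range cited in the text.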

A second set of findings deals with the order of measurement. Here weak data show that in testing the effectiveness of several commercials the behavioral measure should precede the awareness measure. Since behavior is low it should be measured before too much decay takes place; since awareness is high, it should be allowed to decay so that differences in strength can form. There is no evidence that either measure has an impact on the other in the context being studied. These two findings are important to advertising effectiveness research. As external validity is sought, reactiveness must be controlled. This has been shown to be possible.


