Methodological Issues in Simulated Shopping Experiments

Douglas M. Stayman, University of California at Berkeley
Michael R. Hagerty, University of California at Berkeley
ABSTRACT - A laboratory experiment is conducted to test for effects of procedural variations in simulated shopping experiments. It is found that different procedures affect a number of measures related to the validity of laboratory experimentation. Most importantly, market share can vary by 20% depending on whether money is physically handed to the subject before shopping. Methods of adjusting the procedures are discussed to improve the accuracy of simulated shopping in both theoretical and applied studies.
Douglas M. Stayman and Michael R. Hagerty (1985), "Methodological Issues in Simulated Shopping Experiments," in NA - Advances in Consumer Research Volume 12, eds. Elizabeth C. Hirschman and Morris B. Holbrook, Provo, UT: Association for Consumer Research, Pages: 173-176.


INTRODUCTION

The use of simulated shopping experiments in marketing is widespread. In such an experiment, shoppers are recruited to a laboratory, where they are asked to buy a brand from a replica of a supermarket shelf. The method is used in both academic research (Zeithaml, 1982) and applications (Silk and Urban, 1978; McNiven, 1979). Its usefulness stems from the savings in cost and time, and the greater secrecy, that the laboratory approach offers over a full test market. Experimentation also provides the well-documented advantages of greater internal validity and minimization of extraneous variables.

Calder and others (Calder et al., 1981) have recently argued that the importance and assessment of external validity differ between theoretical and applied research. In both types of research, however, it is important that the procedures used be reliable and reflect the situations under study. This is especially true for applied studies, where methods must correspond to "real life" in order to yield findings that are directly generalizable to the situation of interest.

In theoretical research, the need for correspondence rests more heavily on the construct validity of the measures: the measures must correspond to the theoretical variables of interest. Achieving this correspondence, however, requires procedures that control for extraneous influences that may interact with the variables of interest as they occur naturally. Without such control, experimental effects may arise from factors other than those intended. For example, a framing effect produced by a particular procedure, and absent in natural settings, may create an apparent relationship between two variables that does not exist outside that specialized setting.

In summary, both theoretical and applied studies rely upon realistic procedures. In addition, comparing results across studies requires similar (standard) procedures, which are clearly lacking in much marketing research. The present paper compares different procedures used in simulated shopping experiments to assess whether they materially affect the results of the research and, if so, which procedures researchers should use. Three measures are collected to test the procedures: 1) actual purchase behavior in the simulated market, 2) how well that behavior agrees with another predictor of market share (concurrent validity), and 3) how realistic the shoppers perceived the shopping experience to be (called perceived realism hereafter).

Research Questions

In this experiment, two procedures will be manipulated to see whether they affect one or more of the three measures. The first procedure either (1) gives actual cash to the consumer before purchase, or (2) gives only the change from the purchase afterward. This is called the "cash in hand/promised change" condition. Although most studies do not report which of these procedures is used, there are reasons to suspect that they will yield different results. Intuition suggests that actual cash in hand may be perceived as more valuable than a mere promise of change; certainly the money is more salient. Under specific framing conditions, such a perceptual difference is supported by prospect theory (Tversky and Kahneman, 1981), which suggests that subjects actually given cash may perceive their purchases as costing money (losses), whereas those merely receiving change after being told to pretend they have money may perceive the change as receiving money (gains). This leads to the first three hypotheses:

HYPOTHESIS 1: Subjects receiving cash will buy less expensive goods because of the greater importance and salience of cash in hand relative to merely promised change.

HYPOTHESIS 2: Purchases of subjects receiving cash will have greater concurrent validity, because actually having cash more closely resembles a realistic situation.

HYPOTHESIS 3: Receiving money and purchasing a good will be perceived by subjects as more realistic than gaining change after pretending to have money.
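The loss/gain framing behind Hypothesis 1 can be illustrated numerically. The sketch below is not the authors' analysis: it assumes a power-form prospect-theory value function with curvature 0.88 and loss-aversion coefficient 2.25 (estimates from later work by Tversky and Kahneman, not from this paper), and the function names are hypothetical. It compares how much extra "pain" the dearest potato chip price (51 cents) carries over the cheapest (32 cents) when the price is coded as a loss (cash in hand) versus when the forgone change out of one dollar is coded as a reduced gain (promised change).

```python
# Illustrative prospect-theory value function (an assumption, not the paper's model).
# ALPHA and LAMBDA are borrowed from Tversky and Kahneman's later estimates.
ALPHA = 0.88   # diminishing sensitivity for gains and losses (assumed)
LAMBDA = 2.25  # loss aversion: losses loom larger than gains (assumed)


def value(x: float) -> float:
    """Subjective value of a gain (x >= 0) or a loss (x < 0), in cents."""
    if x >= 0:
        return x ** ALPHA
    return -LAMBDA * (-x) ** ALPHA


def cash_in_hand_penalty(cheap: float, dear: float) -> float:
    """Extra felt cost of the dearer brand when its price is coded as a loss."""
    return value(-cheap) - value(-dear)


def promised_change_penalty(cheap: float, dear: float, budget: float = 100) -> float:
    """Extra felt cost of the dearer brand when the change is coded as a gain."""
    return value(budget - cheap) - value(budget - dear)


if __name__ == "__main__":
    lo, hi = 32, 51  # lowest and highest potato chip prices in the study (cents)
    print(f"cash in hand:    {cash_in_hand_penalty(lo, hi):.1f}")
    print(f"promised change: {promised_change_penalty(lo, hi):.1f}")
```

Under these assumed parameters the price difference weighs more than twice as heavily in the cash-in-hand frame, which is the direction Hypothesis 1 predicts; other parameter values would change the size of the gap.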

The second variable that will be manipulated is the number of products shopped for in the experiment. For this variable, the following hypotheses are derived:

HYPOTHESIS 4: Selecting a variety of goods will be perceived as more realistic than selecting just one good, since most shopping trips involve multiple choices.

HYPOTHESIS 5: This realism will lead to greater concurrent validity for purchases in the multiple choice situation.

This final hypothesis can be derived from two sources. First, the work of Wright and Kriewall (1980) suggests that putting a subject in a realistic frame of mind improves predictions, and the use of multiple products should help induce this frame of mind in subjects. Second, the multiple product condition should impose some of the cognitive strain and time tradeoffs involved in most real shopping situations.

METHOD

Sample

The subjects were 76 members of the marketing department subject pool at a large western university. The pool is composed of undergraduate students enrolled in the introductory course in marketing. Over 85 percent of students in the course participated in the experiment.

Procedure

The experiment used a two by two between-subjects factorial design. The factors manipulated were giving or not giving cash and the number of products to shop for (either one or five, as discussed above). The 76 subjects were randomly assigned to the four experimental conditions, giving 19 subjects per cell.
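The paper does not say how the random assignment was carried out. As a minimal sketch, assuming a simple shuffle-and-split scheme (the function `assign` and the seed are hypothetical), balanced assignment of 76 subjects to the 2 x 2 cells could look like this:

```python
import random
from itertools import product

# The two manipulated factors and their levels, as described in the text.
MONEY = ("cash in hand", "promised change")
PRODUCTS = (1, 5)


def assign(subject_ids, seed=None):
    """Randomly assign subjects to the 2 x 2 cells in equal numbers."""
    cells = list(product(MONEY, PRODUCTS))
    ids = list(subject_ids)
    if len(ids) % len(cells):
        raise ValueError("subject count must be a multiple of the number of cells")
    rng = random.Random(seed)
    rng.shuffle(ids)
    per_cell = len(ids) // len(cells)
    return {cell: ids[i * per_cell:(i + 1) * per_cell] for i, cell in enumerate(cells)}


if __name__ == "__main__":
    groups = assign(range(1, 77), seed=1985)  # seed chosen arbitrarily
    for cell, members in groups.items():
        print(cell, len(members))  # each cell receives 19 subjects
```

Shuffling once and slicing guarantees equal cell sizes, which simple per-subject coin flips would not.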

The subjects were run in 5 groups of approximately equal size on two consecutive weekday evenings. Each group first completed a questionnaire that collected demographic and usage information and also contained a 4x4 factorial scale measuring preferences for potato chips. Potato chips were used as the test product because a pre-test questionnaire identified them as a product that students purchased frequently, for which brand loyalty was not exceptionally high (allowing room for experimental effects on choice), and that was immediately usable (maximizing the relevance of the shopping decision). Preferences were rated for four prices (32, 37, 43, and 51 cents) and four products (Lays, Laura Scudder's, Granny Goose Natural, and Granny Goose Hawaiian). These preferences served as the measure of concurrent validity for the choice situation. Actual shopping histories were not used as a concurrent validity measure because a store usually carries only one brand of potato chips, so real-world choice is too confounded by distribution. Full factorial scales for two other products (soap and tomato juice) were included to mask the interest in potato chips.
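A full factorial scale pairs every level of one attribute with every level of the other, so the 4 brands and 4 prices above yield 16 brand-price profiles to be rated. The sketch below simply enumerates them; the function name and the printed rating line are illustrative assumptions, since the paper does not report the questionnaire's layout or rating scale.

```python
from itertools import product

# Levels taken from the study's 4x4 potato chip preference scale.
PRICES_CENTS = (32, 37, 43, 51)
BRANDS = ("Lays", "Laura Scudder's", "Granny Goose Natural", "Granny Goose Hawaiian")


def factorial_profiles(brands=BRANDS, prices=PRICES_CENTS):
    """Every brand paired with every price: the cells of a full factorial scale."""
    return [(brand, price) for brand, price in product(brands, prices)]


if __name__ == "__main__":
    for brand, price in factorial_profiles():
        # One line per profile; the actual rating scale is not reported in the paper.
        print(f"{brand} at {price} cents: rating ____")
```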

The second (five product) room was similar to the first (one product) room except that five product categories were displayed. As in the one product room, each brand had two packages displayed, with prices marked on index cards in front. Potato chips were arranged in the same order (and with the same prices) as in the one product condition and were placed closest to where the subjects entered the room to minimize order effects.

The other four products were: soap (3 brands); candy (5 brands); tomato juice (4 brands); and chewing gum (4 brands). All of these products were chosen through a pre-test which showed them to be items often purchased by students. These items, like potato chips, could also be used immediately, which maximized the relevance of the decision to the subject.

As the experimenter walked each subject to the simulated shopping room, s/he gave the subject instructions on the choice situation. Each subject was told that s/he had one dollar to spend and must purchase one (and only one) product from each category (or just potato chips in the one product condition). Each was also told that s/he would then buy the product(s) with the dollar and receive the change in cash. Each subject was asked to take his/her time and select whichever brand(s) s/he wanted at the prices marked. To keep the cost of goods as constant as possible across conditions, subjects in the five product condition were told that they would receive two of their five chosen products, selected at random. They would then buy these two products and receive the appropriate change.
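The lottery that holds costs down in the five product condition can be sketched as follows. This is a hypothetical illustration, not the authors' procedure: the function name, the data layout, and all brand labels and prices for the non-chip categories are invented for the example. Whatever five items a subject picks, only two are bought, so the payout per subject is comparable to a two-product trip.

```python
import random

BUDGET_CENTS = 100  # each subject had one dollar to spend


def settle_five_product_trip(choices, rng, n_awarded=2):
    """Draw the categories a subject actually buys and compute the change.

    `choices` maps product category -> (brand, price in cents).
    """
    awarded = rng.sample(sorted(choices), n_awarded)
    cost = sum(choices[category][1] for category in awarded)
    if cost > BUDGET_CENTS:
        raise ValueError("awarded goods cost more than the one-dollar budget")
    return awarded, BUDGET_CENTS - cost


if __name__ == "__main__":
    # Hypothetical shelf choices and prices, for illustration only.
    choices = {
        "potato chips": ("Lays", 32),
        "soap": ("brand A", 30),
        "candy": ("brand B", 25),
        "tomato juice": ("brand C", 35),
        "chewing gum": ("brand D", 20),
    }
    awarded, change = settle_five_product_trip(choices, random.Random(7))
    print(awarded, change)
```

Sorting the category names before sampling makes a seeded draw reproducible regardless of dictionary insertion order.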

The money factor was manipulated either by actually handing the subject a dollar bill during these directions or by saying, "Pretend that you have one dollar to spend." Each experimenter alternated, giving a dollar to every other subject taken through the choice situation. All subjects were told explicitly that they would keep the product(s) chosen as well as the change from the dollar.

After these instructions were given, the subject entered the simulated shopping room and made his/her choice(s). The experimenter timed the choices and unobtrusively marked them down after they were all made. The experimenter then collected the dollar from the subjects who had received it and returned the change and the brand(s) chosen. Subjects in the five product condition were first awarded, by lottery, the two product classes they would receive.

Subjects were then required to fill out a post-choice questionnaire, which included questions rating, on six-point scales, the realism of the shopping situation and its similarity to a real shopping trip. A final question asked whether subjects had chosen a particular product just to "try out a new brand." After completing the questionnaire, subjects were asked to leave by a route that did not pass the room where the other subjects were waiting, to minimize the chance of intermingling.

RESULTS

The results of the experiment fall into three categories: product choice, concurrent validity, and perceived realism. A two-way ANOVA was used to analyze the data for each of these measures. Because the results vary across measures, they are discussed separately.
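For readers who want to reproduce this kind of analysis, the sketch below computes the F ratios of a balanced two-way ANOVA from sums of squares, using only the Python standard library. It assumes equal cell sizes (as in this design) and at least two observations per cell; the function name and the toy data are hypothetical and are not the study's data. Factor A could stand for money (cash in hand/promised change) and factor B for the number of products.

```python
from statistics import mean


def two_way_anova(cells):
    """Balanced two-way ANOVA.

    cells[i][j] holds the observations for level i of factor A and level j of
    factor B; every cell must contain the same number n >= 2 of observations.
    Returns (F, df_effect, df_error) for A, B, and the A x B interaction.
    """
    a, b = len(cells), len(cells[0])
    n = len(cells[0][0])
    if any(len(cell) != n for row in cells for cell in row):
        raise ValueError("design must be balanced")
    cell_means = [[mean(cell) for cell in row] for row in cells]
    grand = mean(m for row in cell_means for m in row)  # valid for balanced data
    a_means = [mean(row) for row in cell_means]
    b_means = [mean(cell_means[i][j] for i in range(a)) for j in range(b)]

    ss_a = b * n * sum((m - grand) ** 2 for m in a_means)
    ss_b = a * n * sum((m - grand) ** 2 for m in b_means)
    ss_ab = n * sum(
        (cell_means[i][j] - a_means[i] - b_means[j] + grand) ** 2
        for i in range(a) for j in range(b)
    )
    ss_err = sum(
        (y - cell_means[i][j]) ** 2
        for i in range(a) for j in range(b) for y in cells[i][j]
    )
    df_a, df_b = a - 1, b - 1
    df_ab, df_err = df_a * df_b, a * b * (n - 1)
    ms_err = ss_err / df_err
    return {
        "A": (ss_a / df_a / ms_err, df_a, df_err),
        "B": (ss_b / df_b / ms_err, df_b, df_err),
        "AxB": (ss_ab / df_ab / ms_err, df_ab, df_err),
    }


if __name__ == "__main__":
    # Toy numbers only: two observations per cell.
    print(two_way_anova([[[1, 3], [2, 4]], [[5, 7], [6, 8]]]))
```

With the study's 19 subjects per cell, each effect would have 1 and 4 x 18 = 72 degrees of freedom; a p-value can then be obtained from the F distribution, for example with `scipy.stats.f.sf(F, 1, 72)` if SciPy is available.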

Results for the product choice measure are given in Table 1. As the table shows, subjects in the cash in hand condition tended strongly to choose the lower priced products (Lays and Laura Scudder's), while those in the promised change condition (told to pretend they had money) tended to choose the higher priced products (Granny Goose Natural and Hawaiian). In fact, in the cash in hand condition 43% chose Lays and 24% Hawaiian (the lowest and highest priced brands), while in the promised change condition 16% chose Lays and 46% Hawaiian. An analysis using price of purchase as the dependent variable gives a result significant at the .05 level (p = .040, F = 4.32). No relationship between product choice and the number of products is evident (p = .569).
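The size of the shift can be checked directly from the shares reported above. The short sketch below (names hypothetical) expresses the change from the promised change to the cash in hand condition in percentage points of share, which is our reading of the "about 20%" figure: a difference in share points, not a relative change.

```python
# Choice shares (percent) reported in the text for the cheapest and dearest brands.
SHARES = {
    "Lays": {"cash in hand": 43, "promised change": 16},
    "Granny Goose Hawaiian": {"cash in hand": 24, "promised change": 46},
}


def share_shift(brand, shares=SHARES):
    """Change in share, in percentage points, from promised change to cash in hand."""
    s = shares[brand]
    return s["cash in hand"] - s["promised change"]


if __name__ == "__main__":
    for brand in SHARES:
        print(f"{brand}: {share_shift(brand):+d} points")
```

Lays gains 27 points and Hawaiian loses 22 points when cash is handed over.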

TABLE 1

MEAN PRICE OF BRAND CHOSEN (CENTS)

Table 2 gives average concurrent validity for each condition. Each value is the mean rank, according to the full factorial scale in the pre-choice questionnaire, of the brand actually chosen in the experiment (a 1 means the actual choice was the subject's first choice in the questionnaire, and so on), so lower values indicate better agreement. Averaging across number of products, concurrent validity was much better for subjects in the cash in hand condition (mean convergence of 1.87 versus 2.61). This result is significant at the .01 level (F = 7.35). In fact, the promised change condition was slightly worse than chance, the mean rank expected among four brands if choices were unrelated to preferences (2.61 versus 2.50).
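The concurrence measure and its chance level can be sketched as follows. This is a hypothetical reconstruction: it assumes that each subject's questionnaire rating for every brand at its shelf price is available, that higher ratings mean stronger preference, and that ties are resolved in favour of the chosen brand; the paper does not state these details, and the two subjects shown are invented.

```python
from statistics import mean


def concurrence_rank(ratings, chosen):
    """Rank (1 = most preferred) of the chosen brand among the brands on the shelf.

    `ratings` maps each shelf brand to the subject's questionnaire rating for
    that brand at its shelf price. Assumes higher ratings mean stronger
    preference; ties are resolved in favour of the chosen brand.
    """
    mine = ratings[chosen]
    return 1 + sum(1 for brand, r in ratings.items() if brand != chosen and r > mine)


def chance_rank(k):
    """Expected rank of a choice made at random among k brands: (k + 1) / 2."""
    return (k + 1) / 2


if __name__ == "__main__":
    # Hypothetical subjects: (ratings at shelf prices, brand actually chosen).
    subjects = [
        ({"Lays": 6, "Laura Scudder's": 4, "Natural": 5, "Hawaiian": 2}, "Lays"),
        ({"Lays": 3, "Laura Scudder's": 5, "Natural": 4, "Hawaiian": 6}, "Natural"),
    ]
    ranks = [concurrence_rank(r, c) for r, c in subjects]
    print(ranks, mean(ranks), chance_rank(4))
```

With four brands, chance_rank(4) gives 2.5, the benchmark the promised change condition slightly exceeded.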

The 5 product condition also appears slightly better than the 1 product condition (mean convergence of 2.07 versus 2.38), although this result is not significant (p = .275, F = 1.21).

The results for perceived realism (realism and similarity) are given in Table 3. The numbers are the mean ratings on the six-point scales in the post-choice questionnaire. For example, the 3.72 in the upper left-hand quadrant means that, for the cash in hand/1 product condition, the mean rating of the experiment's similarity to an actual shopping situation was 3.72 on a scale from one (very, very similar) to six (not at all similar). Lower values therefore indicate greater perceived realism.

TABLE 2

MEAN CONCURRENCE BETWEEN QUESTIONNAIRE AND ACTUAL CHOICE

The results indicate that the five product condition was perceived as more realistic (3.19 versus 3.67) and more similar to real shopping (3.10 versus 3.72) than the one product condition. For both realism and similarity the results are significant at the .10 level (p = .061, F = 3.64 and p = .085, F = 3.07, respectively). Table 3 shows no such relationship for giving versus not giving money (3.47 versus 3.42 for realism and 3.44 versus 3.42 for similarity), in contrast to the findings for the first two measures.

TABLE 3

PERCEIVED REALISM OF SHOPPING

DISCUSSION

The results show that the procedures used in simulated shopping have a significant effect on each of the three measures used. For product choice, giving actual money led to a large shift in market share (about 20%) toward the lower priced brand (consistent with Hypothesis 1). For concurrent validity, market share for the cash in hand condition is predicted better (consistent with Hypothesis 2), while there is little evidence that five products yield more accurate predictions than one product (Hypothesis 5). For perceived realism, five products were rated more realistic than one product (Hypothesis 4), but giving or not giving money made no difference (contrary to Hypothesis 3). Thus, the most obvious and pertinent result of this research is the finding that different procedures in simulated shopping experiments can affect all three measures of validity.

The results suggest specific recommendations for future research using laboratory experiments. Since almost all, if not all, experiments are concerned with the validity of choices, researchers should begin giving subjects actual cash before shopping starts. Sawyer et al. (1979) suggested that giving subjects the opportunity to receive change is an important boundary variable; our research goes further in suggesting that subjects need to receive the original cash as well as the change. As this research suggests, this modification would not only increase the validity of experiments at no extra cost but also provide for greater standardization of procedures across experiments.

Another potential implication is that a variety of product choices and decisions should be used in experiments testing grocery products, even when only one product is being examined. Previous research has been limited to shopping for one product (Silk and Urban, 1978), probably due to the extra cost of giving consumers several products. However, the lottery method used here allows shopping for five products while costing the same as shopping for two.

It is interesting to note that giving money had a large effect on the behavioral variables (choice and concurrent validity) but a much smaller effect on perceived realism. There were no differences in perceived realism between giving cash and only promising change, yet there were significant differences in the choice of differently priced brands. The same pattern appears in Tversky and Kahneman's work: it is not intuitively obvious that such effects will occur, yet seemingly minor changes in instructions consistently produce large effects on choice. Conversely, the other procedure investigated, shopping for 1 or 5 products, was perceived to make a difference in realism, yet the change in actual behavior was insignificant.

The limitations of this work are several. First, the subjects were a convenience sample and may not be representative of other types of consumers. Nevertheless, this paper demonstrated at least that predicted market shares do depend on procedures for some subjects. Also, an analysis of covariance showed no difference in results due to sex or age, which encourages generalization across these variables (although the relatively homogeneous student population used restricts this conclusion). Second, the dependent variable measuring validity might have been improved. A better measure would have been to validate the predicted market share against actual test market results after a real price change. This exploratory study did not have the resources to operate a full test market, but it did demonstrate that procedures affect laboratory predictions and perceived realism. Third, other dependent measures could have been added. Most importantly, future research should analyze the framing and boundary effects of shopping experiments to better test the applicability of hypotheses based on prospect theory. Additional measures might include brand attitudes after the simulated shopping. Since these attitudes are often used to adjust the simulated shopping results in applied studies (Silk and Urban, 1978), we would hope to find a smaller effect of procedure on attitudes.

Other useful procedures that could be investigated in simulated shopping are (1) forced choice versus free choice (if some subjects refuse to buy any of the brands merely because they already have a supply at home, forcing them to buy a brand might improve the efficiency of the estimate); (2) products in several price ranges (higher cost may give higher subject involvement, for example); (3) alternative methods of displaying the products (for instance, the usual table set-up versus Zeithaml's more elaborate mock shopping aisle); and (4) mandatory tradeoffs, in which purchasing an expensive brand of one product requires the subject to purchase a less expensive brand of another.

In conclusion, this paper presents an argument and evidence for investigating different procedures in simulated shopping experiments. It does not suggest that these experiments are right for all research questions (see Sawyer et al., 1979, and Calder et al., 1983, for a discussion of this and other external validity questions), but it does show that the conditions under which such experiments are conducted will often significantly affect the validity of the outcome. We hope that future research will lead to more definitive proposals for standardizing laboratory research in marketing and improving the predictiveness of simulated shopping.

REFERENCES

Calder, Bobby J., Lynn W. Phillips, and Alice M. Tybout (1981), "Designing Research for Application," Journal of Consumer Research, 8, 3 (September), 197-207.

McNiven, Malcolm A. (1979), "Pillsbury's New Product Measurement System," paper presented to the Special ORSA/TIMS Conference on Market Measurement and Analysis, Stanford University, (March 26).

Sawyer, Alan G., Parker M. Worthing, and Paul E. Sendak (1979), "The Role of Laboratory Experiments to Test Marketing Strategies," Journal of Marketing, 43, (Summer), 60-67.

Silk, Alvin J., and Glen L. Urban (1978), "Pre-Test-Market Evaluation of New Packaged Goods: A Model and Measurement Methodology," Journal of Marketing Research, 15, (May), 171-191.

Tversky, Amos, and Daniel Kahneman (1981), "The Framing of Decisions and the Psychology of Choice," Science, 211, (January), 453-458.

Wright, Peter and Mary Ann Kriewall (1980), "State of Mind Effects on the Accuracy with which Utility Functions Predict Marketplace Choice," Journal of Marketing Research, 17, 277-293.

Zeithaml, Valarie A. (1982), "Consumer Response to In-store Price Information Environments," Journal of Consumer Research, 8, 4 (March), 357-369.
