Response Rates and Internal Validity: Some Implications For Our Literature

Robert Mittelstaedt, University of Nebraska, Lincoln
Citation: Robert Mittelstaedt (1981), "Response Rates and Internal Validity: Some Implications for Our Literature," in Advances in Consumer Research, Volume 8, ed. Kent B. Monroe, Ann Arbor, MI: Association for Consumer Research, 270-273.

Advances in Consumer Research Volume 8, 1981      Pages 270-273


INTRODUCTION

One of the dangers of exposing one's students to some of the classic writings in the Philosophy of Science is that they begin to ask serious questions about the adequacy of the empirical base of our generalizations and our explanatory models. Three years ago, having read of the role of replication in the advancement of a discipline, members of the graduate seminar, which I was privileged to lead, began to consider replication in the literature of our field. The semester ended, the seminar was over, but their - and my - interest continued and led to a wider consideration of the issue and a small-scale study of the replicability of studies reported in the marketing and consumer behavior literature (Madden, Franz and Mittelstaedt 1979).

As that study progressed, several points became evident. First, many are willing to extol the virtues of replication but, when pressed, oppose those moves which would facilitate or support studies which attempt to replicate. Second, much of this opposition seems to proceed from a distrust of the motives of would-be replicators. Third, this distrust, in turn, largely follows from a misunderstanding of the nature of replication. Many seem to believe that, necessarily, the process requires one to provide raw data to anyone who asks to see it and/or answer innumerable foolish questions from those who are too inert or inept to devise their own research projects.

It is the purpose of this paper to describe the several kinds of and purposes for replication and to relate these to the nature of the sampling process and the reporting thereof. Following that, the results of a modest scanning of a sample of ACR papers will be reported which will lead to a few conclusions and recommendations.

TYPES AND PURPOSES OF REPLICATION

A useful classification of the types of replication is given by Lykken (1968) who differentiates among literal, operational and constructive varieties.

Literal replication.  Literal replication is the exact duplication of the original study's sampling procedures, measurement techniques, experimental conditions and methods of analysis. In consumer behavior, literal replication is almost impossible, although testing for split-half reliability or the use of a holdout sample in discriminant analysis comes close to Lykken's concept. If this sort of replication is possible at all, it could be conducted only by the original study's investigator or his/her close associates.

Operational replication.  Operational replication begins with the methodological recipe of the original study and attempts to follow it as closely as possible. For one investigator to replicate the work of another, it is obvious that it would be necessary to know a great deal about the methods of the original study. Realistically, no one expects researchers to report this kind of detail; conference papers and journal articles would become impossibly long and exceedingly dull.

Constructive replication.  Constructive replication occurs when one deliberately avoids the methodology of the first study and verifies its findings by other means, thereby enhancing the generalizability of those relationships. The researcher attempting the constructive replication of another study needs enough information about the original to determine the probability that the relationship exists, its possible strength and the general circumstances in which it is likely to be found. Thus, a description of methodology which allows meaningful interpretation of results is sufficient to permit the constructive replicator to decide whether or not a study's findings are worth replicating, to choose another methodological recipe and to compare the findings of the replication with the original.

The purposes of replication (as suggested by Rose 1954, pp. 262-8; Selltiz, Wrightsman and Cook 1976, pp. 63-70) are to advance the knowledge base of a discipline by:

Testing Methodologies for Soundness.  Obviously very tight operational replication, with prescribed variations, is needed to investigate the soundness of a methodology. Sawyer (1975) has provided some excellent examples of this type of replication in his investigations of demand artifacts.

Testing Predictions Derived From a Model, in a Context Different From That of the Original Study(ies).  Since so many of our models are borrowed from disciplines which have no direct interest in consumer behavior, their predictions are most often untested in any context resembling those of interest to researchers in our field. Thus, while some findings may be well established in, say, social psychology, the first step in demonstrating their relevance to consumer behavior is to show that they hold in circumstances similar to those of the marketplace. To avoid total confusion if they do not work out, it is necessary to hold methodology as constant as possible, implying, again, some type of operational replication.

Testing the Generalizability of a Relationship by Extending the Findings of a Particular Study or Set of Studies to New Time, Place and Situational Contexts.   Obviously, confidence in the existence of any relationship between two variables is enhanced by finding that it holds across a wide variety of circumstances. When Rogers and Shoemaker (1971) report that 203 of 275 studies have found education to be positively related to innovativeness, we accept this generalization, partly because of the sheer number of supportive findings but, more importantly, because those 203 supporting studies represent the efforts of different investigators working in various times, places, and cultural settings, using different measures of innovativeness. Extending findings involves constructive replication and, in terms of building the knowledge base of our field, may be more useful than many operational or literal replications.

RESPONSE RATES AND INTERNAL VALIDITY

To those trained in the research paradigm most of us use, and especially to those with a background in the non-experimental tradition of marketing research, all questions about sampling seem to be related to the issue of external validity. In the apt phrasing of Petrinovich (1979, p. 376):

...great care is extended to obtain a representative sample of subjects from the reference population, and concern is with the reliability (and hence the standard error) of the subject population. The focus is on subject sampling, and the statistical procedures used assign individual differences between subjects to the error term against which the significance of mean differences is assessed. Great care is used to sample subjects, and pains are taken to assure that generalizations can be made to a relevant population of subjects. There is little concern, however, regarding the legitimacy of the generalizations across situations.

By contrast, the central concern of constructive replication is exactly with the "legitimacy of generalizations across situations." Thus, the whole question of response rates might be thought to be nearly irrelevant to the issue of replication. Of course, knowledge of response rates would be helpful to the designer of an operational replication but, in the end, not really crucial to such an effort.

However, the informational requirements of constructive replication are essentially those needed to synthesize a study's results into the relevant literature. Thus, to the would-be constructive replicator, as to the knowledge synthesizer, the whole question of response rates - and the reporting of such rates - is important to the extent it bears on the issue of internal validity.

One obvious question arises from the likelihood that most non-response, for whatever reasons, includes the more "extreme" cases, regardless of the variable. To the extent this is true, the statistical effect of non-response is that the sample variance becomes a biased understatement of the population variance and, therefore, the calculated value of alpha understates the true probability of Type I error in any hypothesis test. While proper reporting would not lessen this problem in real terms, it would alert the interested reader to the necessity of making some subjective revision of the reported alpha level and, therefore, of effect size.
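The understatement can be illustrated with a small simulation (a sketch only; the normal population and the assumption that cases beyond 1.5 standard deviations never respond are arbitrary choices made for illustration):

```python
import random
import statistics

random.seed(1)

# A normally distributed population variable (mean 0, s.d. 1).
population = [random.gauss(0, 1) for _ in range(100_000)]

# Suppose the more "extreme" cases never respond: anyone more than
# 1.5 standard deviations from the mean is lost to non-response.
respondents = [x for x in population if abs(x) <= 1.5]

pop_var = statistics.pvariance(population)
resp_var = statistics.variance(respondents)

# The respondents' variance understates the population variance,
# so standard errors computed from the sample are too small.
print(f"population variance:  {pop_var:.3f}")
print(f"respondent variance:  {resp_var:.3f}")
```

Under this truncation the respondent variance falls well below the population variance, which is why significance tests computed from such a sample overstate their own precision.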

However, the issue of non-response cuts deeper. It seems unlikely that the factors which produce non-response are "random". To the extent they are interactively related with the independent variable and/or related to the dependent variables of a given study, response rate is an important internal validity issue.

To take the most obvious example, the probability of a householder being interviewed in a given study is a function of that person's allocation of time to those activities which he or she does at home. The literature of "time budgeting" supports the intuitive notion that this allocation of time between "away" and "at home" is systematically related to many other behaviors of interest to consumer researchers. However, because the nature of the probability function is partially, perhaps even substantially, controllable by callback procedures, it is essential that information about such efforts be appropriately reported, along with contact and response rates, to allow one to make an adequate interpretation of any given study.

A less obvious example involves refusals. Anyone who has done interviewing probably has some explanations for the reason, or reasons, behind most refusals. A reasonable surmise is that many result from some sort of fear and, while we may not know the roots of those fears, it seems likely that they are systematically related to other variables of interest. Whatever the particular reasons, refusals are not a random phenomenon, and the factors behind them are likely to be related to other important variables.

In summary, there are serious and substantial questions of internal validity associated with sampling procedures and, ultimately, replication is a matter of internal validity. In Kaplan's (1964, p. 128) words:

...The methodological importance of what is called repeatability is, I think, made more plain by its restatement as intersubjectivity. A scientific observation could have been made by any other observer so situated; nature plays no favorites, but exposes herself promiscuously . . . The methodological question is always limited to whether what is reported is an observation that can be used in subsequent inquiry, even if the particular observer is no longer part of the context. I ask "Do you see what I see?" to help decide whether what I see is to be explained by self-knowledge or by knowledge of the presumed object.

REPORTED RESPONSE RATES IN A.C.R. CONFERENCE PAPERS

Described in this section are the results of a reading of 30 ACR papers randomly selected from the "Competitive Papers" sections of Volumes V, VI and VII of Advances in Consumer Research. Conference papers were chosen because they more closely reflect our collective reporting habits than do journal articles, which have been revised before publication according to the standards of the journal.

Table 1 describes the general nature of the 30 selected papers. Twenty reported surveys or, at least, non-experiments. Of the remaining 10, 6 were field experiments and 4 were conducted in a classroom or laboratory. Cut along a different dimension, 15 of the studies used an ad hoc sampling procedure designed for the particular study. In 11 studies, respondents were contacted through, and because of, their membership in an extant grouping such as a panel, classroom or volunteer organization. In the remaining 4 studies, one could not tell how respondents qualified for inclusion in the study.

Regardless of the type, every paper reported a value for "n." Sometimes it was the only aspect of the sample which was reported. Most papers contained a very brief description of the pool from which the sample was drawn: these ranged from the terse (e.g., "a sample of students") to a two or three sentence description of a panel's composition.

Beyond these basic items, what one might expect depends mostly on the type of study involved. At the risk of oversimplification, but in the hope of adding some structure to what follows, it is useful to think of the potential for non-response occurring at two levels. First, there is that level or stage which contains the processes by which the sample is framed and, second, there is the level containing the processes by which usable responses are obtained from a chosen sample.

To begin with the first stage, the choice of a sample frame, in effect, defines the actual universe of which the sample may (or may not) be representative. In all studies involving an ad hoc sampling procedure, it may be presumed that some set of screening questions or interviewer instructions are used to "qualify" respondents. Of the 15 studies using ad hoc sampling procedures, one field experiment reported the use of screening variables but gave no indication of the number rejected because of them. Another field experiment reported the number of persons contacted and the number lost and described some of the screeners used. Two of the surveys stated the qualifications for respondent participation but did not give any indication of the number of contactees lost as a result. Eleven of the 15 studies using ad hoc sampling procedures simply did not mention any conditions or actions which, in effect, defined the universe being sampled.

TABLE 1

SELECTED CHARACTERISTICS OF SAMPLED CONFERENCE PAPERS

Type of study                                   n
  Surveys and other non-experiments            20
  Field experiments                             6
  Classroom or laboratory experiments           4

Basis of sample                                 n
  Ad hoc sampling procedure                    15
  Extant grouping (panel, classroom, etc.)     11
  Not determinable from the paper               4

When sampling from extant groupings, the issue is a bit more complex. First, there are the qualifications for participation in the group itself. As noted, most of the panels were briefly described but only one paper made any serious attempt to describe the means by which the panel had been formed or maintained. It is apparent that those who have access to commercial panels believe that mentioning such a panel by name is sufficient to describe its "representativeness." In a similar fashion, those who use classroom or volunteer organizations as samples seldom feel constrained to describe the "profile" of such groupings; only one paper described the general nature of the volunteer groups from which its respondents were drawn. Second, some further qualifications may or may not be imposed within an extant grouping. Two of the classroom experiments and one survey which drew its sample from a panel described the screening devices employed and the respondent losses associated with each.

At the second stage, it is necessary to divide the studies according to the method of data collection employed and the degree of researcher control over data collection. When a study is conducted by mail, for example, there is a considerable loss of respondents between the presentation of the instrument and the recovery of useable returns while, in the case of an interviewer-conducted survey, the major potential for non-response occurs before the instrument can be presented to the respondent. Even in experiments with extant groups, or in surveys that are administered in a "group" setting, a certain amount of subject loss can be expected, although it was mentioned in only 2 of the 6 such studies examined. In all 7 mail surveys, 2 "drop-off questionnaire" surveys and 2 of the field experiments, respondents completed instruments while out of the presence of the researcher. In these instances it is difficult to sort out the various forms of non-response. However, all but one of the field experiments reported some sort of "response rate" which, in effect, aggregated all losses between the distribution of the instrument and the counting of useable responses and expressed it as a percent of the number of instruments distributed. For example, each of the 7 mail surveys reported the number of questionnaires sent out and the number of useable returns, although none gave any indication of any follow-up procedures, nor did any attempt any analysis based on quickness of return.
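The interpretive problem with such an aggregated figure is easy to show with hypothetical counts for a mail survey (all numbers below are invented for illustration):

```python
# Hypothetical mail-survey counts (invented for illustration).
mailed = 1200      # questionnaires sent out
delivered = 1150   # after undeliverable addresses
returned = 430     # questionnaires that came back
usable = 395       # after discarding incomplete returns

# The single aggregated "response rate" most papers report:
overall = usable / mailed

# The stage-wise rates a reader would need in order to see
# where the losses actually occurred:
delivery_rate = delivered / mailed
return_rate = returned / delivered
usable_rate = usable / returned

print(f"overall:  {overall:.1%}")
print(f"delivery: {delivery_rate:.1%}, return: {return_rate:.1%}, "
      f"usable: {usable_rate:.1%}")
```

Two surveys reporting the same overall percentage can have very different loss patterns across these stages, and hence very different internal-validity problems.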

The surveys conducted by personal or telephone interview and the two field experiments which involved the recruitment of volunteers share a somewhat different set of problems. None of these 9 studies gave any indication of the number of potential respondents contacted, none mentioned refusals, nor did any suggest that there were any less-than-complete responses to presented instruments.

SUMMARY AND CONCLUSIONS

In the end, what a given researcher does or does not do is not really as important as whether or not what was done is openly reported in enough detail to allow the interested reader to make an evaluation of a study's internal validity. By this criterion, our collective reporting habits appear, to put it generously, rather loose. From the previous section, it can be concluded that the interested reader of ACR papers would find it difficult, if not impossible, to answer either of two questions about the sample of the typical study: (1) What universe was being sampled? (2) How representative is the sample of that universe?

No one expects sampling procedures to be reported in such detail as to permit operational replication. Few would claim that such was desirable. But, what should concern everyone is that sampling procedures appear to be reported in such a way as to prohibit the extension of findings through constructive replication and the useful literature synthesis that both precedes and follows such efforts.

Some cynics claim that the reporting of methodology is so casual because many wish to obscure their sloppy procedures. Others acknowledge that the problems of reporting largely follow from a lack of understanding and propose that young scholars receive better training so that, over the next 10 or 20 years, reporting habits will be improved by attrition.

What is apparent to me is that the problem of reporting standards deserves attention now. If there is to be a body of literature, in any meaningful sense of that word, some standards must be imposed. Widely diffused and understood definitions of response rates would be a useful first step; the quality of research might not be improved as a result, but the quality of interpretation would be enhanced considerably. Until all researchers are willing to ask, in Kaplan's phrase, "Do you see what I see?" we are not going to get very far.

REFERENCES

Kaplan, Abraham (1964), The Conduct of Inquiry, San Francisco: Chandler Publishing Company.

Lykken, David T. (1968), "Statistical Significance in Psychological Research," Psychological Bulletin, 70 (February), 151-159.

Madden, Charles S., Franz, Lori Sharp, and Mittelstaedt, Robert (1979), "The Replicability of Research in Marketing: Reported Content and Author Cooperation," in O. C. Ferrell, Stephen W. Brown and Charles W. Lamb, Jr., eds., Conceptual and Theoretical Developments in Marketing, Chicago: American Marketing Association, 76-85.

Petrinovich, Lewis (1979), "Probabilistic Functionalism: A Conception of Research Method," American Psychologist, 34 (May), 373-390.

Rogers, Everett M. and Shoemaker, Floyd (1971), Communication of Innovations, 2nd ed., New York: The Free Press.

Rose, Arnold M. (1954), Theory and Method in the Social Sciences, Minneapolis: The University of Minnesota Press.

Sawyer, Alan G. (1975), "Demand Artifacts in Laboratory Experiments in Consumer Research," Journal of Consumer Research, 1 (March), 20-30.

Selltiz, Claire, Wrightsman, Lawrence S. and Cook, Stuart W. (1976), Research Methods in Social Relations, New York: Holt, Rinehart and Winston.
