An Evaluation of the Characteristics of Response Quality Induced By Follow-Up Survey Methods

P. J. O'Connor, University of Kentucky
Gary L. Sullivan, University of Cincinnati
Wesley H. Jones, E. I. Du Pont De Nemours
ABSTRACT - A study was undertaken to determine the efficacy of using follow-up questionnaires in survey research. Results indicate that such a procedure is an effective tool for increasing response rate. While the number of items omitted is greater in follow-up waves, the reliability and validity of the data collected is not lessened.
Citation: P. J. O'Connor, Gary L. Sullivan, and Wesley H. Jones (1982), "An Evaluation of the Characteristics of Response Quality Induced By Follow-Up Survey Methods," in Advances in Consumer Research Volume 9, ed. Andrew Mitchell, Ann Arbor, MI: Association for Consumer Research, 257-259.

Advances in Consumer Research Volume 9, 1982      Pages 257-259




During the last few years, an increasing awareness of the problems inherent in the interpretation of survey response data has been evident in the published literature (Peterson and Kerin, 1980; Jones and Linda, 1978; Jones and Lang, 1980). In an effort to upgrade the quality of the literature in consumer behavior, explicit attention is now being paid by editors and reviewers in major journals to response rates in evaluation of manuscripts for publication (Ferber, 1980). Increasingly, sample composition differences seem to offer viable alternative explanations for contradictory findings in published research.

Much of the early research on survey methodology was highly singular in its focus on the issue of overall response rates. This early literature lacked a theoretical framework upon which to evaluate the nearly limitless procedural options available to the survey researcher (Kimball, 1961; Watson, 1965). As a result, much of this literature is of the trial-and-error variety. While many of the techniques studied in this fashion are widely used today, typically the only criterion used to evaluate the suitability of a technique was its effect on overall response rate.

Recently, the set of criterion variables has begun to expand. In addition to overall response rate, attention is now being paid to item non-response in questionnaires, response bias in questionnaire completion, and sample composition bias as issues of concern in survey administration (Peterson and Kerin, 1980; Jones and Linda, 1978; Jones and Lang, 1980). This study addresses the as yet unresearched issues of test-retest reliability and convergent validity (Heeler and Ray, 1972) in survey methodology. Specifically, the effects of sponsorship and of follow-up questionnaire mailings on these two criterion variables are examined.

Survey follow-up techniques which include supplying respondents with replacement questionnaires are generally recommended as a means of substantially improving response rates. However, it is important to ask whether the apparent reduction in sampling error might be accompanied by an increase in systematic response error. A respondent may hastily complete and return a questionnaire on the second or third mailing simply to avoid receiving further contact from the sponsor. The completeness of the returned questionnaires and the reliability and validity of subjects' responses would be suspect in such cases. This study was designed to explore these possibilities.

METHOD
As part of a larger research study, a mail survey of 1200 households was conducted in a midwestern state. The study employed a two-stage sampling procedure mandated by the study sponsor. First, eleven counties in the state were randomly selected. Then, within each county, 109 household addresses were selected at random from local telephone directories (110 in one county, bringing the total to 1200).
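The two-stage draw described above can be sketched in Python as follows. The county names and address pools are placeholders, since the actual counties and directories are not reproduced in the paper.

```python
import random

random.seed(42)

# Stage 1: draw eleven counties at random from the state's counties
# (placeholder names; the paper does not list the actual counties).
counties = [f"county_{i}" for i in range(120)]
chosen = random.sample(counties, 11)

# Stage 2: draw 109 addresses per county from its telephone directory,
# 110 in one county so that the total comes to 1200 households.
sample = {}
for rank, county in enumerate(chosen):
    pool = [f"{county}_addr_{j}" for j in range(5000)]  # placeholder directory
    n = 110 if rank == 0 else 109
    sample[county] = random.sample(pool, n)

total = sum(len(addresses) for addresses in sample.values())
print(total)  # 10 * 109 + 110 = 1200
```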

The survey questionnaire was mailed to these 1200 households. Two weeks later, an identical questionnaire was mailed to the same 1200 households along with a note thanking the respondents for their cooperation if they had already returned the questionnaire and asking them to complete and return it promptly if they had not yet done so. Two weeks after that, yet another wave of questionnaires was sent to these households. All of the questionnaires were numerically coded so that they could be identified by mailing wave when returned.

The questionnaire was twelve pages long, the first page of which contained a cover letter stating the research purpose and giving instructions for completing and returning the instrument. The organization sponsoring the survey was manipulated: sponsorship of the study was attributed either to the major university in the respondents' state or to the state's Department of Transportation. Nine pages of the instrument consisted of 99 questions which probed the respondents' attitudes and behavioral intentions with respect to a host of traffic safety issues. The eleventh page queried the respondents about their media habits and, finally, the last page asked for demographic data.

Question 3 of the survey asked respondents to rate the seriousness of seventeen different traffic safety problems using seven-point semantic differential-type scales. Eight pages later, question 98 exactly repeated eleven of the stimulus items from question 3 using the same measurement scales. The remaining six stimuli from question 3 were employed in question 99, which utilized a different measurement format in place of the semantic differentials used previously. In this way an assessment of the test-retest reliability and convergent validity of these questionnaire items could be obtained.

RESULTS
Response rate is the proportion of mailed questionnaires that were returned by the sample households. There were 494 total questionnaires returned from the 1200 households surveyed, producing an aggregate response rate of 41.2% for the study. However, 34 of the questionnaires were unsuitable for analysis. Some people returned the survey along with a note stating that they felt unqualified to respond to the issues. Other questionnaires were simply uninterpretable. Also, a few people cut off the numerical codes, preventing determination of the wave to which they belonged.

This left a total of 460 useable questionnaires, an effective response rate of 38.3% for analyses. As can be seen in Table 1, there was virtually no difference in response rate due to survey sponsorship. However, only 254 questionnaires, or 55.2% of the total useable returns, were obtained from the first mailing wave. There were 139 (or 30.2% of the total) returns from the second wave and 67 (or 14.6% of the total) returns from the third wave. Thus a large percentage of the final sample available for analyses resulted from the follow-up mailing procedure employed in this study.
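The rates reported above follow directly from the counts; a quick arithmetic check:

```python
returned, mailed = 494, 1200
unusable = 34
usable = returned - unusable  # 460 questionnaires available for analysis

print(round(100 * returned / mailed, 1))  # 41.2, the aggregate response rate
print(round(100 * usable / mailed, 1))    # 38.3, the effective response rate

# Usable returns by mailing wave, as percentages of the usable total
waves = {1: 254, 2: 139, 3: 67}
assert sum(waves.values()) == usable
for wave, count in waves.items():
    print(wave, round(100 * count / usable, 1))  # 55.2, 30.2, 14.6
```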



Item omission refers to the total number of questionnaire items that a respondent left unanswered. There were 331 individual items contained in the questionnaire. The number of items omitted ranged from a low of 5 to a high of 281 with a mean omission of 36.72 across all three survey waves. There was a substantial increase in the level of item omission for questionnaires returned after the first wave, however. The mean number of omitted items for each wave was 32.01, 44.56, and 38.33 respectively.

Unreliability was assessed by the within-subjects differences in ratings of the eleven items which were measured twice in the survey instrument. Due to the separation of the items in this lengthy questionnaire, a reasonable test-retest measure of reliability was available. The following unreliability index was constructed for use in the analysis:

U = ( Σ |Fi - Si| ) / n

where
Fi = first rating given the item

Si = second rating given the item

n = number of items rated both times

The index computes the absolute magnitude of the difference in ratings of the repeated items. The differences were then summed and divided by the number of items rated both times by a respondent. In this way the index is standardized across respondents to account for item omissions. The theoretical range of this index is from 0 to 6, where higher numbers indicate greater respondent unreliability. In this study, the unreliability index values actually ranged from 0 to 3.91 with a mean of 1.41 across respondents. A frequency distribution of the unreliability index is presented in Table 2, indicating that a good deal of unreliability was present.
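The computation described above can be sketched as follows. The ratings shown are illustrative, not drawn from the study's data; omitted items (coded None here) are simply dropped, which is how the index standardizes across respondents.

```python
def unreliability(first, second):
    """Mean absolute difference between first and second ratings,
    computed over the items a respondent rated both times."""
    pairs = [(f, s) for f, s in zip(first, second)
             if f is not None and s is not None]
    if not pairs:
        return None  # respondent rated no item twice
    return sum(abs(f - s) for f, s in pairs) / len(pairs)

# Illustrative respondent: eleven 7-point ratings, one omission on retest
first  = [7, 3, 5, 6, 2, 4, 7, 1, 5, 6, 3]
second = [6, 3, 4, 6, 4, 4, None, 1, 5, 5, 3]
print(unreliability(first, second))  # 0.5
```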



An index of invalidity was constructed in a similar manner after correcting for differences in measurement methods. The scale data for the remaining six items were rescaled for directionality in order to make them compatible with the original ratings in question 3 of the instrument. These were then used as input for the following invalidity index, where the symbols are as above:

V = ( Σ |Fi - Si| ) / n
Again the theoretical range of the index is from 0 to 6, with higher numbers indicating greater respondent invalidity. The actual invalidity index values obtained in this study ranged from 0 to 5.00 with a mean of 1.40 across respondents. Table 2 contains a frequency distribution of the invalidity index.
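The rescaling step can be sketched as follows, assuming question 99's items were scored on a reverse-directed seven-point scale (the paper does not give the exact format, so the mapping x → 8 - x is an assumption); after rescaling, the same mean-absolute-difference index applies.

```python
def invalidity(q3_ratings, q99_ratings):
    # Assumed rescaling: reverse-scored 7-point items map x -> 8 - x
    rescaled = [8 - x if x is not None else None for x in q99_ratings]
    pairs = [(f, s) for f, s in zip(q3_ratings, rescaled)
             if f is not None and s is not None]
    return sum(abs(f - s) for f, s in pairs) / len(pairs)

# Six illustrative items; q99 uses the reversed scale, so a 1 there
# corresponds to a 7 on question 3's scale.
q3  = [7, 2, 5, 4, 6, 1]
q99 = [1, 6, 4, 4, 3, 7]
print(round(invalidity(q3, q99), 2))  # 0.33
```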

In order to determine the effects, if any, of survey sponsorship and the follow-up mailing procedure, item omission as well as the unreliability and invalidity indices were used as dependent variables in three separate analyses. Along with survey sponsorship and the wave from which the questionnaire was returned, a few demographic variables were used as covariates in an effort to determine if differences in these variables might account for some of the variation in the dependent measures.

The usual procedure is to analyze this type of data using analysis of variance (ANOVA) techniques. Use of such techniques requires that the effects be additive, a requirement that is satisfied when cell sizes are equal. However, due to the response rate variation (see Table 1), substantial differences in cell sizes were encountered in the analyses. Accordingly, it was necessary to adopt the general linear hypothesis approach (Namboodiri, Carter and Blalock, 1975; Perreault and Darden, 1975) in testing for effects. The general linear hypothesis uses least squares analysis to compute partial regression coefficients as parameters of the additive model effects.

The significant regression coefficients for all three analyses are presented in Table 3. The independent variables of sponsor, wave, sex, and race were dummy variables coded as 0 or 1. University sponsorship was coded 1, as were male sex and non-white race. Education is the subject's report of the number of years of school completed, and income is annual household income in thousands of dollars. Only the main effect results are listed since all interactions were found to be non-significant.
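A dummy-coded least-squares fit of this kind can be sketched with numpy. The data below are simulated (the study's raw responses are not available), and the coefficients are illustrative only; the point is the mechanics of estimating main effects by least squares with unequal cell sizes.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Dummy-coded predictors as in the study: sponsor (1 = University),
# wave (1 = returned after the first mailing), sex (1 = male),
# plus education in years as a covariate.
sponsor = rng.integers(0, 2, n)
late_wave = rng.integers(0, 2, n)
sex = rng.integers(0, 2, n)
educ = rng.integers(8, 20, n)

# Simulated dependent variable: item omissions rise in later waves
# and fall with education (true coefficients chosen for illustration).
y = 30 + 10 * late_wave - 0.8 * educ + rng.normal(0, 4, n)

# General linear hypothesis via least squares: the design matrix X
# includes an intercept plus the dummy and covariate columns.
X = np.column_stack([np.ones(n), sponsor, late_wave, sex, educ])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta.round(2))  # partial regression coefficients for the main effects
```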



Not surprisingly, there were significant beta coefficients for wave with respect to item omission. The follow-up method results in significantly more items left blank. There were also some significant differences due to demographic factors. As can be seen in Table 3, more items were omitted among the less educated, lower-income, and non-white groups.

The wave from which the questionnaire was returned did not affect the unreliability index. Sponsorship did have a significant beta coefficient, however, with University sponsorship resulting in greater unreliability. It was also found that unreliability was greater among the less educated and among males. Neither sponsor nor wave was significant in explaining the invalidity index. In fact, educational level was the only variable that had any effect: the lower the respondents' education level, the more invalid the data.

DISCUSSION
Based upon this research, the use of follow-up questionnaires to increase response rates in mail surveys appears to be a very effective tool. While the completeness of such follow-up questionnaire data is lessened, the quality of the data does not seem to suffer. Perhaps respondents who reply to the second or third wave of a questionnaire are motivated more by a desire to end contact with the sponsor of the study than by interest in the subject matter of the survey; thus, they leave many more items blank. Further research is needed to determine the accuracy of this assumption. However, there was no evidence in this study of any systematic bias in the responses that these subjects provide.

The follow-up procedure used in this study is recommended for survey researchers who need to assess accurately the composition of their sample, but for whom an increased level of item omission would not pose a serious problem. One aspect of the results which is somewhat enigmatic is that due to survey sponsorship. The two different sponsors used in this research were found to have a differential impact on the unreliability index, but not on the invalidity index. The reasons for this are not readily apparent, and this is an issue that deserves further investigation.

REFERENCES
Ferber, Robert (1980), "The Role of Response Rates in Evaluating Manuscripts for Publication," in Advances in Consumer Research, Vol. 8 (Washington, DC: Association for Consumer Research).

Heeler, Roger M. and Ray, Michael L. (November, 1972), "Measure Validation in Marketing," Journal of Marketing Research, 9, 361-70.

Jones, Wesley H. and Lang, James R. (February, 1980), "Sample Composition Bias and Response Bias in a Mail Survey: A Comparison of Inducement Methods," Journal of Marketing Research, 17, 69-76.

Jones, Wesley H. and Linda, Gerald (May, 1978), "Multiple Criteria Effects in a Mail Survey Experiment," Journal of Marketing Research, 15, 280-84.

Kimball, Andrew E. (January, 1961), "Increasing the Rate of Return in Mail Surveys," Journal of Marketing, 25, 63-65.

Namboodiri, N. K., Carter, L. F. and Blalock, H. M. Jr. (1975), Applied Multivariate Analysis and Experimental Designs (New York: McGraw-Hill).

Perreault, William D. Jr. and Darden, William R. (August, 1975), "Unequal Cell Sizes in Marketing Experiments: Use of the General Linear Hypothesis," Journal of Marketing Research, 12, 333-43.

Peterson, Robert A. and Kerin, Roger A. (1980), "Household Income Data Reports in Mail Surveys," Journal of Business Research, 8, 301-13.

Watson, John J. (1965), "Improving the Response Rate in Mail Research," Journal of Advertising Research, 5, 48-50.