Validity Procedures At the Survey Research Center

F. Thomas Juster, Survey Research Center, The University of Michigan
F. Thomas Juster (1975), "Validity Procedures at the Survey Research Center," in Advances in Consumer Research Volume 02, ed. Mary Jane Schlinger, Ann Arbor, MI: Association for Consumer Research, pages 725-740.




Besides the usual validity checks, the Survey Research Center uses tape recorded interviews as a means of observing interviewer-respondent interaction and to monitor interviewer performance. High levels of interviewer performance are also maintained by institutional arrangements: interviewers are regular University of Michigan employees and are compensated on a time and travel cost rather than a completed contract basis. Hence they tend to be relatively experienced and to have more on-the-job training than is usually true for survey organizations. SRC also maintains a highly professional sampling staff and uses rigorous probability sampling techniques.

Validity research at SRC includes work on ways to provide respondents with both incentives and cues. The former is designed to increase willingness to do the work required to provide accurate responses, the latter to create an interview situation maximizing the likelihood that the respondent will be able to recall accurately.

Discussions of validity procedures divide into two basic classes of cases: validity where an unambiguous external check is potentially available, and validity where no such external reference point exists. Most survey data fit into the first category, but at the Survey Research Center a very large proportion of survey measurements fall into the second.

An interview, to use the language of Kahn and Cannell in their classic, The Dynamics of Interviewing, is a specialized form of verbal interaction, initiated for a specific purpose, and focused on some specific content area with consequent elimination of extraneous material. Thus a valid response to a question on an interview is simply one where the process of that interaction results in an answer which corresponds to the externally verifiable reality in the one case, and to an accurate description of the respondent's perceptions, opinions, or judgments in the second. Clearly, however, there is a major difference in strategy for these two cases: if one is trying to find out an objectively verifiable fact--whether the subject was a patient at a hospital or a clinic within the past year, whether he has a television set in the house, how many people live in the house, whether there is indoor plumbing, or how much income the family has--any strategy that will help produce truth is acceptable. But if the subject of the interview is whether the respondent thinks Ford will be a good president, or whether he expects prices to rise and if so, by how much, or whether he feels that he is fairly compensated on his job or that the level of public services in his neighborhood is satisfactory, then the exact specification of the question, the precise words and manner of the interviewer, will determine, to some extent, what kind of answer is forthcoming.

This paper is divided into three sections. In the first, I simply describe existing practice at the Survey Research Center. In the second, methodological studies that are underway at SRC are summarized briefly. In the third, I discuss the relation between validity and cost, and provide some generalizations (largely my own biases) about the payoff to validity checks and procedures and their role in the end product of survey research--the discovery and systematic accumulation of scientifically valid knowledge about behavior.


Validation procedures at the Survey Research Center relate both to the instrument itself and to the performance of interviewers. For the instrument itself, we provide the usual package of routine validity checks--extensive pretesting, small scale pilot studies, debriefing sessions with the pretest and pilot study interviewers, etc. These can be viewed as essentially ways of insuring that the instrument is operational, in the sense that the questions as drafted are (or seem to be) clear to the respondent, and that the answers appear to be responsive to the intent of the question as seen by the researcher who designed it or helped to design it. The principal difference here between our procedures and others is probably that we do more of these tests and that the lag between initial pretest and final version is likely to be longer. That is principally because we have more relaxed time frames--or at least often do--than other organizations, particularly those that need to meet market demands for information on a schedule dictated by the demand side of that relationship.

Other procedures that are systematically incorporated into much SRC survey research relate to measures designed to observe trends and procedures designed to check for drift in the response to open-end questions. As indicated earlier, there are a great many questions and questionnaires in which no validity measure is really possible because what is being obtained is the perception, the opinion, or the judgment of the respondent. But even though there is no way to determine validity in an absolute sense, we often find it useful to repeat identical questions over different periods of time in order to obtain valid measures of differences.

Such differences are valid if and only if the stimulus presented by the question to the respondent is the same in fact, even though the question is identical. To validate that, we use random probes: Every nth respondent is asked a probe "Why is that?" question in order to determine whether the question is being answered with the same frame of reference as was true several years back. For example, questions concerning desegregation are quite likely to produce different responses now than 10 years ago, even if the respondents' attitudes are exactly the same, simply because the word itself tends to mean something different now than 10 years ago.
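The paper does not specify the mechanics of selecting "every nth respondent" for the random probe; a minimal sketch, assuming a simple every-nth rule with a random start (`probe_flags` and its parameters are illustrative names, not SRC's actual procedure):

```python
import random

# Sketch: flag which respondents receive the "Why is that?" random probe.
# Assumes an every-nth selection rule with a randomly chosen starting
# offset -- one common way to implement such probes.
def probe_flags(num_respondents, n, seed=0):
    start = random.Random(seed).randrange(n)  # random offset in [0, n)
    return [i % n == start for i in range(num_respondents)]

flags = probe_flags(10, n=5, seed=1)
print("probed respondents:", [i for i, f in enumerate(flags) if f])
```

Whatever the offset, exactly one respondent in every n is probed, which is the property the drift check relies on.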

We have also conducted extensive experimental work designed to check instrument validity in situations where external reference points are available. These procedures and results are discussed below in the next section.

The last procedure I wish to note in discussing instrument validity is the use of tape recordings as a way of observing interviewer-respondent interaction, problem questions, and so forth. Over the last several years, we have begun routinely to obtain tape recordings of pretest interviews. It is often true that one can judge problem questions--those where the meaning of the question to the respondent is obviously quite different from what was intended--by listening to the full interaction between the interviewer and the respondent. Tape recordings are also used, as discussed below, as a means of monitoring interviewer performance.

Interviewer Quality Control

Possibly the most important single dimension of validity procedures in survey operations is to insure that operational data obtained from the field have the same characteristics as the best data that can be obtained from the survey instrument by a skilled, highly motivated, and well trained interviewer. This does not really bear on the issue of whether the instrument is able to obtain valid data--it simply insures that, whatever the potential validity of the instrument, the actual validity of the results comes as close as possible to that.

In this area, the Survey Research Center uses a wide collection of techniques, many of which are standard and found uniformly elsewhere and some of which are probably not. Among the more or less standard ones are:

(1) Skilled interviewers routinely conduct reinterviews with respondents, and the results are compared against the original interview. Persistent flaws in interviewer performance are thus monitored.

(2) Careful attention is paid to quality control throughout the entire survey process--interviewer selection and training, the preparation of materials for the interview, substantive discussions with interviewers by the study directors, pre-study conferences after training interviewers, and sampling procedures are all given a good bit of attention. Most of what is involved here is based on interaction between the interviewers and the research staff, the research staff and the sampling staff, etc.

(3) Many of our studies involve a reinterview of sample households, and all of our interviewers are aware of this fact.

(4) A sample of respondents is routinely mailed a follow-up questionnaire, primarily to check on the time a survey schedule takes to complete. Interviewer shortcutting may take the form of mixing a small amount of real information with fictitious data, with the result that total time will be substantially less for partly fictitious interviews.

(5) Computerized data on costs per interview, average time per interview, etc. are routinely made available to the field staff, and interviewers whose costs fall at either extreme of these distributions are thus identified regularly. Poor interviewing technique is apt to be associated with either very high cost or very low cost relative to the mean.
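The outlier check in point (5) can be sketched as follows. All figures are invented, and the +/-40% band around the mean is an assumed threshold; the paper does not state how "very high" or "very low" is defined.

```python
import statistics

# Sketch (invented figures): flag interviewers whose average cost per
# interview sits far above or below the staff mean, as in the routine
# cost reports described above. The +/-40% band is an assumption.
costs = {"A": 20.0, "B": 22.0, "C": 21.0, "D": 55.0, "E": 23.0, "F": 9.0}

mean = statistics.mean(costs.values())
flagged = {name: c for name, c in costs.items()
           if not 0.6 * mean <= c <= 1.4 * mean}

print(f"mean cost per interview: ${mean:.2f}")
print("flagged for review:", sorted(flagged))
```

Note that the rule flags both tails: interviewer D (unusually expensive) and interviewer F (suspiciously cheap), matching the point that poor technique shows up at either extreme.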

There are a number of aspects of SRC field operations which are to some degree unique to the organization, and which bear on the quality of interviewer performance and thus on the validity of data.

These procedures in large part have to do with interviewer selection and motivation, rather than the specifics of interviewer training on any given study. Others would be generally applicable to any survey organization.

The first point to note is that SRC interviewers are classified as regular University of Michigan employees, rather than as independent contractors. This means that interviewers have Social Security taxes withheld, and it also means that interviewing costs are subject to University overhead. The latter has both good and bad aspects, since it raises budgeted costs as well as providing overhead funds for general use in the Center. Largely because of the role of SRC as part of the University, we may also gain some advantage in terms of interviewer motivation with respect to training, work performance, etc. For the most part, the subject matter content of SRC studies deals with what most people would construe as important social issues, where the level of interviewer interest might well be higher than in, for example, studies of marketing habits or brand preferences. In addition to their status as regular U of M employees, SRC interviewers are typically paid on a time and travel cost basis, rather than a completed contract basis. That imposes costs, or at least would have that tendency, but it also has, in our view, significant payoffs in terms of the quality of completed interviews.

As a consequence of these contractual arrangements, SRC interviewers have a number of characteristics which are measurable and which can be presumed to contribute to the validity of the survey measures:

(1) The staff is relatively stable--average experience with SRC is 5 years.

(2) Because we have the expectation of greater staff stability, it pays us to invest more than otherwise in on-the-job training for interviewers. It also means that we have an opportunity to retain interviewers who are at the top end of the performance range.

(3) Both staff stability and training mean that we have less need for extensive editing of completed interviews, once a check for general acceptability has been made. That is more of a cost saving than a quality control measure, although it is made possible by the experience and stability controls.

(4) Finally, the pay-for-time and cost system provides no disincentive to obtain interviews that are difficult to obtain. The result should be a more representative sample of the universe as it actually exists, with consequent impact on the validity of the data.

Two other characteristics of our general field operation should be noted.

As indicated earlier, we have been making extensive use of tape recorded interviews as a way of pretesting questions, and these techniques are now being used as a routine check in the evaluation of interviewer performance. Before any interviewer can be raised to a higher pay level, a supervisor report based on tape recorded interviews is mandatory. Our experience here is quite interesting. For example, every survey organization develops folklore about which interviewers "do a good job" and which interviewers one might have doubts about or who appear to be, by some subjective standard, sub-par in their performance. It often turns out, however, that the folklore is wrong. Interviewers who have a great deal of experience and a relatively high standing among peers are sometimes found to be interviewers who "take over" an interview situation and produce a set of results which is as much a consequence of the interviewer's actions as it is the respondent's perceptions. Listening to a tape recording of the completed interview enables our supervisory personnel to measure such straightforward aspects of data quality and validity as how the questions are asked (whether the way they were written or some nonlinear transformation thereof), whether interviewers use directive probes or nondirective ones, whether skip sequences are handled appropriately, whether the pace of interviewing is such as to be conducive to accurate responses, etc. On some of these issues, we have also done some experimental methodological work--for example, on the question of how the "pace" of the interview relates to response validity. 
But the principal point is that periodically listening to the tape recording of an actual interview by one of the field interviewers is an exceptionally good way to nip bad habits in the bud, to be sure that interviewers are doing what they are supposed to be doing and not something different, and in general to control on quality so as to insure that the data are as close as they can be to what the questionnaire instrument is capable of providing.

Like everyone else, I also have my own favorite tales of what happens to data quality when control over interviewers is casual or, through force of circumstances, suboptimal. One recent case in point, or at least my interpretation of it, suggests the dimensions of the problem.

For a number of years, the U.S. Bureau of Census has been conducting household interviews concerned with consumer purchase plans for automobiles, houses and major durables. Around the middle of the 1960's, some research done by myself and others suggested that a more valid measurement of purchase expectations might well be obtained from a questionnaire which focused on subjective probabilities rather than on something called plans or intentions. We did extensive pretesting of this general concept, including one final pretest in which a random sample of households was interviewed with both the plan or intention version and the subjective probability version, the interviews being conducted a few days apart so that the "real" value of the anticipatory variables was presumably unchanged. Results from the pretest were clearcut: in explaining differences among households in purchase behavior, subjective probabilities were overwhelmingly superior to purchase plans or intentions. Put into a multivariate equation, purchase plans washed out entirely, leaving subjective probabilities the dominant variable.

A major deficiency of the purchase plan approach--the tendency for most actual purchases to be made by households who did not report plan or intention--was substantially ameliorated by the subjective probability approach, in that the portion of purchases made by households with non-zero probabilities was significantly larger than it had been for the plan or intention version. The mean values of the subjective probability scale, which if taken literally should be a direct forecast of the actual purchase rate, were quite close to observed purchase rates for the sample households. Thus on all counts that we could think of, the evidence was convincingly clear that subjective probabilities were a better way to predict purchase behavior.
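The two validity checks just described can be made concrete with a minimal sketch. All figures below are hypothetical, invented only to illustrate the arithmetic: if the probability scale is taken literally, its mean across households is itself a forecast of the aggregate purchase rate, and one can also ask what share of actual purchases came from non-zero-probability households.

```python
# Sketch (hypothetical data): the subjective-probability scale, taken
# literally, implies an aggregate purchase-rate forecast equal to the
# mean probability across households.
probabilities = [0.0, 0.0, 0.1, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0, 0.4]
purchased     = [1,   0,   0,   1,   0,   0,   1,   1,   1,   0]

implied_forecast = sum(probabilities) / len(probabilities)
actual_rate = sum(purchased) / len(purchased)
print(f"implied purchase rate: {implied_forecast:.2f}")
print(f"actual purchase rate:  {actual_rate:.2f}")

# Share of actual purchases made by non-zero-probability households --
# the structural property on which the probability version improved.
covered = sum(b for p, b in zip(probabilities, purchased) if p > 0)
print(f"purchases by non-zero-probability households: {covered}/{sum(purchased)}")
```

In the pretest described above, the analogous real comparisons came out close, which is what made the probability version look convincingly superior.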

Hence the Census survey was changed in 1967, with subjective probabilities replacing the purchase intention variables. However, subsequent experience with the use of these data to predict changes in purchases over time does not indicate any superiority at all for the purchase probability measure, and if anything, indicates that it may well be inferior to the plan or intention variable. And in fact, the Census Bureau recently (July 1973) decided to discontinue the buying expectations survey because it felt (whether rightly or wrongly is not the point) that the survey made an insufficient contribution to purchase forecasts to warrant its relatively high cost.

Aside from the inability of the probability measure to improve on the predictive value of the plan or intention measure in time-series analysis, an interesting aspect of the subsequent development is that the structure of the probability data in the operating survey never really approached the structure found in the initial pretest. In particular, it tended to be true that the proportion of total purchases accounted for by non-zero probability households in the operating survey was quite close to the proportion previously found for planners or intenders, rather than substantially larger as the pretest had suggested it would be and as theory suggested it ought to be.

There are many possible explanations for this set of results. One, which is not really possible to explore in retrospect, is that interviewer treatment of this survey instrument essentially invalidated its usefulness. The Census Bureau, as all of you know, concentrates largely on the measurement of objectively verifiable phenomena--whether or not a person is unemployed, whether there is indoor plumbing in the house, whether the household owns a car, whether the house is owned or rented, etc. Census has relatively limited experience with the so-called soft or subjective measures--in general, with the broad area of perceptions, opinions, expectations, etc. The pre-study conferences, some of which I attended, made it quite clear that many Census interviewers regarded asking households about the subjective probability of their making a purchase as an affront to the intelligence of both the interviewer and the respondent. My own experience with this kind of measure is that the interviewer has to be extremely careful not to put an answer into the mouth of the respondent, since the question is of course quite a difficult one and tends to suggest a degree of precision that interviewers and respondents alike are apt to think is unrealistic. But that is not the point, of course, since uncertainty about a probability judgment by no means renders those judgments invalid when aggregated across households. But skepticism on the part of the interviewer about the usefulness of the survey instrument surely can render these judgments invalid, and the inability of the probability data to predict actual purchase behavior, as well as the marked difference in structure between the operating survey and the pretest, suggests that some of this skepticism may have rubbed off on the survey itself and hence on the validity of the data.

Sampling Procedures

An important dimension of survey research which bears strongly on predictive validity--as distinct from the narrower question of the validity of a particular response from a particular respondent--has to do with the procedures by which households are selected and the additional procedures by which designated respondents are the ones to be interviewed. The Survey Research Center probably spends more time fussing about sample selection and obtaining interviews with designated respondents than any other organization outside the Census Bureau. All of our samples are multi-stage probability samples with very limited clustering. Interviews are conducted only with designated respondents, and no substitutions are allowed. One consequence of these comparatively rigid techniques is that we tend to have relatively high nonresponse and, at the same time, relatively high costs.

I had an interesting experience with just how well the sampling operation is done at SRC, in connection with a recent project on the impact of the federal government's General Revenue Sharing Program on state and local governments. To conduct the study, we had to select a sample of U.S. municipalities. Such samples apparently do not exist--except for what the Census Bureau euphemistically calls a "sample" consisting of some 15,000 of the 38,000 total U.S. municipalities. We started with our sampling frame of PSU's, which is of course designed to represent people and not governments, and proceeded to draw a stratified sample of municipalities within some seven city size classes. After any number of modifications of the ongoing sampling frame (new counties had to be added in areas where the municipality sample was especially thin), the final sample was eventually specified and assigned the appropriate weight for its city size class, probability of selection within the PSU and the probability of selection for the PSU as a whole.

The adequacy of this sampling frame can be judged quite accurately, since we have a tape prepared by the Office of Revenue Sharing which includes every U.S. county and municipality and provides data on tax collections, population, and Revenue Sharing allocation. Comparisons against the universe suggest that this 800 municipality sample provides a remarkably close representation of all U.S. counties and municipalities, and of both regional and size class distributions. The error in generating aggregate statistics from this sample, even for regional breakdowns, is apparently going to be no more than a few percent. Although having a carefully drawn probability sample will clearly do nothing to improve the validity of individual responses, it goes a long way toward insuring that validity at the individual response level can be translated into accurate descriptions of the population as a whole.
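The weighting described above reduces to simple arithmetic: each municipality's weight is the inverse of the product of its selection probabilities at each stage. A minimal sketch, with hypothetical probabilities and tax figures (the function name and sample values are illustrative only):

```python
# Sketch (hypothetical figures): a municipality's sample weight as the
# inverse of its overall probability of selection across the stages of
# the multi-stage design -- PSU first, then municipality within PSU.
def municipality_weight(psu_prob, within_psu_prob):
    return 1.0 / (psu_prob * within_psu_prob)

# (PSU selection prob, within-PSU selection prob, tax collections in $)
sample = [
    (0.05, 0.20, 1_200_000),
    (0.10, 0.10, 450_000),
    (0.02, 0.50, 3_000_000),
]

# Weighted estimate of aggregate tax collections for the universe:
# each sampled municipality "stands for" weight-many municipalities.
estimate = sum(municipality_weight(p, q) * taxes for p, q, taxes in sample)
print(f"estimated aggregate tax collections: ${estimate:,.0f}")
```

It is estimates of this weighted kind that were checked against the Office of Revenue Sharing universe tape and found to be within a few percent.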


Some years back the Institute for Social Research, the "holding company" that includes the Survey Research Center as well as three other quasi-independent groups, instituted a research program designed to find ways of improving the quality of information reported in personal interview surveys. Over the years there has been substantial technical progress in most phases of survey research--sampling, statistics, methods of analysis, use of computers, etc. As noted in a recent working paper by Charles Cannell, who heads up this program at the Institute, "The 1973 model of the survey interview does not vary to any significant extent from the 1943 version."

In part, the reason for the technological backwardness of interview methodology in the social sciences is that most of us work with social-psychological variables which lack an objective external reality against which the validity of survey responses can be compared and evaluated. One can of course be tautological--a valid attitude, perception or expectation is that which is reported to an interviewer on a survey. But if this were really so, characteristics of the interviewer--such as age, sex, race, socio-economic status, attitudes and preconceptions--would be uncorrelated with the reporting of social-psychological variables. We know that this is not the case, though, and hence the problem cannot be assumed away in this manner.

Much of the current program of research on validity at SRC is concerned with what is clearly the simpler of the validity issues--how to obtain accurate reporting where we can test experimental procedures against an objectively verifiable reality. The experimental work has been done almost entirely in the health area, thanks to a series of contracts and later grants with the National Center for Health Statistics and the National Center for Health Services Research and Development.

In these studies, the major dependent variable is some health attitude or behavior--chronic and acute illness, injury, medication, use of health facilities, reaction toward medical care, etc. Note that data in the health field have the twin characteristics that (a) they are often capable of being verified from medical records, and (b) even when overtly factual, they contain a good deal of latent emotional and attitudinal content. The remainder of this section summarizes and highlights the results obtained in these methodological studies.

Memory Failure

The model with which Cannell and his coworkers at SRC operate is that an interview is a situation in which interaction between interviewer and respondent has the purpose of stripping away nonessentials (for the interview) and exposing information possessed by the respondent that is relevant to the purposes of the interview. Thus one issue is whether or not the respondent can be programmed to improve recollection and reporting of specific events. In general, the problem in health surveys typically shows up as underreporting, although in principle overreporting would be possible if biases other than memory failure are introduced. The research provides wholly unambiguous results, which are consistent with the findings of others: the longer the duration between the interview and the time of the event (hospitalization in this case), the less likely the respondent is to report the event (having been hospitalized). This is not surprising, and as noted is probably one of the few firmly documented methodological conclusions in validity studies. What is surprising is the rapidity with which the reporting decay reaches major proportions: underreporting of actual hospital or clinic visits grows to almost half of the total of known visits after about a year. (Table 1)
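The shape of this decay can be illustrated with a simple exponential model of reporting probability. To be clear, both the functional form and the rate are assumptions for illustration only: the half-life is back-solved from the paper's rough figure of about half of known hospitalizations going unreported after roughly a year, not estimated from the study's data.

```python
# Assumed exponential-decay illustration of event reporting.
# The 52-week half-life is chosen only to match the rough "about half
# unreported after about a year" figure quoted in the text.
def reporting_probability(weeks_elapsed, half_life_weeks=52):
    return 0.5 ** (weeks_elapsed / half_life_weeks)

for weeks in (1, 13, 26, 52):
    print(f"{weeks:>3} weeks after event: "
          f"{reporting_probability(weeks):.0%} likely to be reported")
```

Even under so simple a model, the practical implication is the one the text draws: recall questions with long reference periods lose a large fraction of real events.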

A comparable dimension to duration is salience--events which are noteworthy in the total stream of the respondent's experiences tended to be reported much more often than other events. The duration of hospital visits or the frequency of clinic visits serve as the measures of salience, and the relationships between the fraction of events reported and salience are shown in Table 2. About 25 percent of one-day hospitalizations were not reported in the interview, and over half of single visits to clinics failed to be reported. (These two kinds of memory effects appear to be additive.)



Refusal Rather than Forgetting

A second dimension of invalid responses on interviews concerns the respondent's unwillingness to provide information, rather than inability to do so. Here, the experimental results concern the probability that reporting a particular diagnosis will be a source of embarrassment, or more directly, responses to the question: "If you had x, how willing would you be to have other people know about it?" Cannell finds that a direct measure of embarrassment--percent willing to report if they had x--corresponds remarkably closely to the fraction of valid reports for those same conditions generated by externally validated and independent data. (Table 3)

What these results suggest is a model in which the interviewer-respondent interaction comes closest to providing valid data when, first, the event on which survey data are to be obtained is one that the respondent has in the forefront of his total recollection of events, and second, the event to be reported is not regarded as threatening to the respondent's self-esteem. Thus the question arises: can one influence the behavior of the respondent by manipulating the behavior of the interviewer in such a way as to change the characteristics of the interaction so as to enhance validity?

Interviewer Respondent Interaction

Since the interview is modelled as an interactive situation, how to change respondent behavior by manipulating interviewer behavior becomes the key issue. Cannell and his colleagues consequently taped a number of household interviews, and had the interviews coded into categories of respondent and interviewer behavior. For example, does the interviewer read the question as written, does he use nondirective probes, does he capture the respondent's answer when he notes the response, etc. For the respondent, is clarification requested, is the answer appropriate to the question, is there refusal, etc.

The analysis indicated a high degree of positive correlation between the total number of units of activity of both the interviewer and the respondent. One would have hoped that reluctant respondents would be cajoled out of reluctance by active interviewers, but this turned out not to be the case. The data also indicated that interviews where both respondents and interviewers showed a high level of "positive" task behaviors tended also to show low levels of nontask relevant behaviors. Thus one possibility is that the respondent follows the pattern set by the interviewer, and if one can change interviewer behavior one can manipulate respondent behavior.

Of particular importance in the interaction process is the nature of feedback used by interviewers. The data indicated that about a quarter of total activity on the part of interviewers constituted feedback--short interjections, longer comments ("That's the kind of information we want..."), etc. But the data also clearly showed that feedback was not related to the adequacy or inadequacy of response. Instead, the customary use of feedback seemed to be comments made after refusals, presumably as a means of building or restoring rapport. It turned out that feedback was not in general used to "reward" positive responses, nor was the absence of feedback used to "penalize" inadequate responses.





The findings from the interaction analysis led to a series of experiments designed to control and use feedback as a positive force to improve reporting, and to capitalize on those results showing a balance in the level of interactions during the interview, with the level being highly correlated with the amount of information reported. The model specifies that: (1) positive feedback will tend to produce better information, and (2) a high level of verbal activity by the interviewer will tend to produce an equally high level on the part of the respondent, with more and better information resulting from these higher levels of verbal activity.

The feedback experiment was conducted by instructing interviewers to provide a reinforcing statement ("That's the kind of information we need," "That's useful information," etc.) after each positive report by the respondent, with several such statements being prepared and the interviewer using them in sequence. The verbal activity experiment was conducted by designing long and short versions of essentially identical questions, for example:

"Have you ever had any trouble hearing?" (short form)

"Trouble hearing is the last item on this list. We are looking for some information about it. Have you ever had any trouble hearing?" (long form)

The results of these experiments were as predicted--reinforcement produced greater validity, as did longer questions. However, when the two procedures were combined with the expectation that they would be mutually reinforcing, both showed main effects, but the combination of the two showed lower reporting rates than either technique by itself. Analysis of these data by respondent educational level suggests the answer, as indicated in Table 4. In a nutshell, reinforcement improves reporting for less well educated respondents, but not for more highly educated ones. And for respondents with less education, short questions produced better results than long ones, and vice versa for highly educated respondents. Overall, for respondents with less education short questions with reinforcement work best, while for more educated respondents, long questions without reinforcement are optimal. One can easily find plausible interpretations of these findings. Less well educated respondents tend to rely more than others on interviewer cues to direct reporting. Thus feedback and reinforcement aid performance for this group, and since the interviewer is apt to be of higher status than the respondent, feedback is even more appropriate and welcome. But more highly educated respondents do not need reinforcement and may even perceive it as inappropriate and condescending.

On the other hand, long questions, simply because they are long, tend to confuse less well educated respondents. Highly educated respondents, however, tend to benefit from longer questions: they can express themselves more fully and reflect nuances of thought, they respond well to the interviewer's higher verbal activity, and they gain a clearer understanding of the specific task to be performed--whether from the language itself or from the additional thinking time the longer question affords.

For the most part, the discussion above has been in terms of validating survey measures that are relatively simple to extract from the respondent's memory: motivation and willingness to exert the effort needed to gather the information is probably a more important source of difficulty than inability to recall. But in at least one major area of survey research--the collection of data from households on financial flows and asset-debt holdings--the validity problem is at least as much a memory problem as a motivation problem.

One interesting direction that needs to be pursued here is increased reliance on objective data that the respondent can obtain from records, rather than an interview situation in which recall is the only possible basis for information. Several issues are involved--do the data sources exist, can the respondent be persuaded to use them, are they sufficiently comprehensive to form the basic raw material for survey responses, can they be judiciously blended with recollection to provide a richer and more accurate data set, and so on.

We have had enough experience over the last several decades to be able to form judgments about the validity of survey data on consumer financial flows and asset-debt holdings. As a generalization, financial flows cannot be accurately recalled, even with the best will in the world and with no concern over disclosure. This is less true for families whose financial affairs are very straightforward--they have no assets but a savings account, they have the same income flow every week or month, they have a spending pattern that usually exhausts all of their receipts and goes for a repetitive set of items, and so on. But that kind of financial flow pattern is less and less common as the society becomes richer, more people enter the labor force on part-time hours, more people hold assets in different forms and carry several kinds of debts, etc.

It seems clear enough that the solution to these kinds of problems is to obtain access to the financial records households already possess. And here the principal difficulty is likely to be willingness rather than ability, since one feature of an increasingly rich society is the greater profusion of paper that seems essential to the way we function. Only a few (about 15%) of U.S. families still lack checking accounts; only a slightly higher percentage do not have savings accounts. The vast majority of income payments are received in the form of checks which must be cashed or deposited; interest, dividends, and many nonregular forms of income are, as a matter of statute, now reported to the household every year on information forms designed to be used as a basis for income tax filing; credit card companies regularly send monthly statements of charges and payments, as well as interest costs; and so forth.

On the basis of these kinds of financial records, it appears that a comprehensive survey of household financial status (income flows, expenditure flows, asset and debt holdings) could be conducted largely from records, which presumably form a much more precise source of data than any alternative.

A second kind of issue bearing on validity in the broader sense is that of costs. It seems unproductive to think of validity only in the narrow sense of the relation between the true value of the desired measure and the value obtained from a survey. Validity has a price, and the question must be faced: what is the optimum degree of validity in a cost-effective sense? In this context, by cost-effective I simply mean the degree of validity which just pays its own way--which reduces the uncertainty that would attach to measures of lesser validity by enough to justify the additional costs. In economists' jargon, the marginal revenue from an increment of validity must be at least as high as the marginal cost of obtaining that increment in order to justify, for either private or social purposes, the use of resources to increase validity to any given point. Unfortunately for neat or simple analysis, there are considerable opportunities for tradeoffs as regards validity in this very broad sense. For example, the validity of a psychological scaling measure depends on the ability of that measure to classify households into appropriate subgroups--those with the same response within a group, but different responses across groups. Validity could be increased by measures ranging from investing heavily in a methodological research design to improve our understanding of what the scale reflects, to simply adding scales that capture similar dimensions of perceptions or attitudes and forming groups from a cluster of such scales. To really push validity to its optimum point--to decide what it is worth to improve our classes--we have to know how much gain would be achieved by each of the possible methods and how much cost is involved, then determine the most efficient way to proceed, then decide how much we really gain given the purposes for which the scale is to be used.
Although I do not have much that is concrete to offer in terms of criteria, it does seem to me important to recognize that higher validity carries a price tag, and that pursuing it past the point where it pays its own way represents a misallocation of research resources.
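The marginal condition described above can be stated compactly. The notation here is illustrative and not from the original text: let v denote the level of validity, MR(v) the marginal revenue (the value of the reduction in uncertainty) from a small increment in v, and MC(v) the marginal cost of obtaining that increment. Then resources should be devoted to improving validity only so long as

\[
MR(v) \ge MC(v),
\]

and the cost-effective optimum \(v^{*}\) is reached where the two are equal:

\[
MR(v^{*}) = MC(v^{*}).
\]

Beyond \(v^{*}\), each further increment of validity costs more than the uncertainty reduction it buys--the misallocation of research resources noted above.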