Special Session Summary Measuring Consumption and Consuming Measurement: the Challenge of Studying Consumers From a Federal Perspective

Frederick Conrad, Bureau of Labor Statistics
[ to cite ]:
Frederick Conrad (1997) ,"Special Session Summary Measuring Consumption and Consuming Measurement: the Challenge of Studying Consumers From a Federal Perspective", in NA - Advances in Consumer Research Volume 24, eds. Merrie Brucks and Deborah J. MacInnis, Provo, UT : Association for Consumer Research, Pages: 330-332.

Advances in Consumer Research Volume 24, 1997      Pages 330-332

Vast amounts of information are available to American consumers and businesses that can help them make wiser decisions. Much of this information is provided by the Federal government and is the result of considerable research about consumer behavior. This session presented samples of such research from three domains: (1) consumers’ use of information on product labels and the implications for labeling policy; (2) the accuracy of data collected in national sample surveys about the public’s economic activities and circumstances; and (3) customer satisfaction with government products and services. The session was organized as a first step in fostering more dialogue between consumer researchers working in the public service and those working in the marketing tradition.

Alan Levy reported a study, carried out jointly with Brenda Derby and Brian Roe, to explore how health claims on food labels can affect consumers’ evaluations of the labeled product and how this can vary with the length, format, and information content of the health claim. They intercepted grocery shoppers at a mall and presented them with three different food products (cereal, low fat yogurt, and frozen lasagna) whose labels contained health claims about certain ingredients in those products. One food was presented in a control condition (no health claim) and the other two in different experimental conditions. The cereal product was labeled with a health claim about folic acid, the yogurt product included a claim about calcium, and the frozen lasagna product carried a health claim about saturated fat and cholesterol. The health claims appeared in either short or long form, either with or without an "authority message" (for example, "The American Heart Association recommends ..."), with or without the seal of the Food and Drug Administration, and with or without an instruction to see the back panel for more information. Because there were more experimental conditions than food products, the assignment of conditions to shoppers was counterbalanced with a Greco-Latin square design.
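The counterbalancing idea can be illustrated with a small sketch. The summary does not give the layout of the square the authors used, so the example below is only a toy 3 x 3 Greco-Latin square under assumed labels: the three foods come from the study, but the condition names and the function `graeco_latin_square` are hypothetical. Rows stand for shopper groups and columns for presentation positions; every food and every condition appears once in each row and column, and each food-condition pairing occurs exactly once.

```python
def graeco_latin_square(n):
    """Build an n x n Graeco-Latin square as (latin, greek) index pairs.

    Uses the standard modular construction, which is valid only for odd n.
    """
    if n < 3 or n % 2 == 0:
        raise ValueError("this construction needs an odd n of at least 3")
    return [[((i + j) % n, (2 * i + j) % n) for j in range(n)]
            for i in range(n)]


# Hypothetical labels: the foods come from the study, the condition names do not.
foods = ["cereal", "yogurt", "lasagna"]
conditions = ["control", "claim version A", "claim version B"]

square = graeco_latin_square(3)
for group, row in enumerate(square, start=1):
    # Each row is one shopper group; each column is a presentation position.
    plan = [f"{foods[f]} / {conditions[c]}" for f, c in row]
    print(f"group {group}: " + ", ".join(plan))
```

A design with more conditions than foods, as in the study, would need a larger or incomplete arrangement; the sketch only shows the balancing property that such a design aims for.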

In the control conditions, there were no health claims on the labels. However, in one control condition, the label included a nutrient content claim. This simply mentioned that the food contained the critical ingredient, for example, "low fat." This manipulation made it possible to compare the effect of health claims in general to those of nutrient content alone.

For each food, the shoppers were asked how likely they were to purchase it, how important it would be to a healthy diet, and how universal its health benefits would be. Overall, the presence of health claims increased shoppers’ rated likelihood of purchasing the food and its rated healthfulness relative to the control condition. However, the different lengths and formats of the health claim made relatively little difference to shoppers on these measures. In particular, there was no effect of label length. This is significant because the Nutrition Labeling and Education Act of 1990 virtually mandates the use of the longer health claims on labels; yet consumers are deriving no more relevant information from longer health claims than from shorter ones. Invoking an authority had a negative effect on purchase intentions and perceived healthfulness, which the authors interpreted as people’s sensitivity to being manipulated. Instructions to see the back panel had a negative effect on purchase intentions and compellingness if the message on the back did not add information; if it did add information, there was no effect on purchase intentions and compellingness.

Like the health claims, the nutrient content claims increased people’s ratings of purchase intention and healthiness. Moreover, health claims were no more effective in shoppers’ ratings than were nutrient content claims. The authors hypothesize this is because the nutrient content claim served as a reminder of dietary information with which the shoppers were already familiar. Consistent with this was the finding that nutrient content claims about folic acid had no effect on shoppers’ ratings; the health effects of this ingredient are not as well known as those of the others in the study. They conclude that the way consumers use health and nutrition information depends on their prior knowledge of the food and the health claim for that food.

Frederick Conrad reported two studies conducted collaboratively with Michael Schober to investigate the costs and benefits of standardized versus conversational survey interviewing. Because survey data serve as the basis of major decisions in government, marketing and politics, the quality of the decisions is limited by the quality of the data. Typically, survey questions are read by interviewers, exactly as worded; the interviewers use only "neutral probes" to clarify what they have read. The logic of such standardized interviews is to reduce survey error by controlling the content of the interaction. Unfortunately, respondents do not always interpret the questions as they were intended by the researcher; unlike ordinary conversation, where participants can converse until they believe that they understand each other, standardized interviews prohibit the kind of interaction that would be required to assure mutual understanding. In this sense, standardization may actually increase certain types of error.

In Conrad and Schober’s first experiment, professional interviewers asked 43 laboratory participants (who were playing the role of respondents) 12 questions taken from large government surveys. Instead of answering about their own lives, the respondents answered about fictional situations described in a packet of scenarios, diagrams and receipts.

For each question, the authors designed one scenario for which the correct answer was clear and one for which it was ambiguous. The ambiguities could be resolved by referring to official definitions for the particular survey. For example, one receipt leading to an ambiguity was from a furniture store and included a charge for a floor lamp. The respondent was asked if the protagonist had "purchased or had expenses for household furniture." In fact, the survey’s definition of furniture specifically excludes floor lamps. The combinations of questions and situations were counterbalanced so that for any one participant, half were clear and half ambiguous.

Half of the interviews were conducted according to standardized procedure, half as conversational or "flexible" interviews. If respondents asked for clarification in the standardized interviews, the interviewers could only repeat the questions or response alternatives; in the flexible interviews, the interviewers could say anything they wanted, including paraphrasing the definitions, to assure that respondents understood the question as intended.

The results showed that the value of either interviewing technique depends on how difficult it is for respondents to interpret the concepts in the questions given the situation about which they are answering. When this is easy, accuracy is high for both standardized (97%) and flexible (98%) interviews. However, when it is difficult, accuracy suffers under standardized interviewing (29%) but is vastly improved in the flexible condition (87%). It is important to consider two practical issues when evaluating these results. First, it is not clear to what extent people’s own situations are ambiguous with respect to most survey questions. If this is rare, then standardized wording is justified. Second, flexible interviews take more time than standardized interviews, and actual, voluntary respondents may be unwilling to participate in them.

Conrad and Schober carried out a second experiment as a more natural telephone survey. Respondents were interviewed twice. The first interview was standardized; the second was standardized in half the cases and flexible in the other half. In each interview, respondents were asked five questions about their housing and five questions about their recent purchases. These questions were all taken from large government surveys. Because the respondents (contacted from a national probability sample) answered questions about their own lives, the experimenters could not validate their responses. As a result, they developed a surrogate measure for accuracy: response change between the first and second interviews. The logic was that if misconceptions were not clarified in the initial standardized interview, they could be clarified in a subsequent flexible interview, leading to changed responses between interviews. If the second interview was also standardized, the misconception was relatively likely to persist. This prediction was supported by the data even though the authors could not control the complexity of the mappings between questions and respondents’ circumstances. When the second interview was standardized, respondents changed their answers between interviews on 11% of the questions. However, when the second interview was flexible, this figure increased to 22%.
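The surrogate measure amounts to a simple proportion: the share of questions on which a respondent's second answer differs from the first. A minimal sketch of that calculation follows, assuming each interview is stored as a list of answers in question order. The function name `response_change_rate` and the answers shown are made up for illustration and are not data from the study.

```python
def response_change_rate(first, second):
    """Return the fraction of questions answered differently in two interviews."""
    if not first or len(first) != len(second):
        raise ValueError("both interviews need the same non-zero number of answers")
    changed = sum(a != b for a, b in zip(first, second))
    return changed / len(first)


# Made-up answers to ten questions (five on housing, five on purchases).
first_interview = ["yes", "no", "no", "yes", "yes", "no", "yes", "no", "no", "yes"]
second_interview = ["yes", "no", "yes", "yes", "yes", "no", "yes", "no", "yes", "yes"]

rate = response_change_rate(first_interview, second_interview)
print(f"{rate:.0%} of answers changed")
```

Averaging this rate separately over respondents whose second interview was standardized and over those whose second interview was flexible would give the kind of comparison the authors report.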

Conrad and Schober concluded that practitioners need to be able to demonstrate that complicated mappings are rare enough that the benefits of standardization outweigh the costs. Unless they can do this, data quality may be compromised by standardization techniques.

Tracy Wellens’ presentation, based on her collaboration with Elizabeth Martin and Frank Vitrano, concerned the challenges of measuring customer satisfaction with Federal products and services. She reported their experience surveying customers’ satisfaction with the products and services of the 14 agencies within the Department of Commerce. The study was a response to an executive order "Setting Customer Service Standards" and the National Performance Review recommendation to "put customers first."

The first issue that Wellens and her collaborators had to confront was how to determine what products and services are provided by the member agencies. These range from Census data tapes to National Weather Service (NWS) weather forecasts, to National Oceanic and Atmospheric Administration (NOAA) fishery inspections, etc. Because the Department of Commerce does not maintain a central list of such services, the authors generated a list and divided it into (1) Information Services and Data Products (e.g. newsletters, catalogues, radio programs and off-the-shelf data products and software), (2) Specialized Services and Products (customized for a particular organization such as data collection, training and disaster relief), and (3) Grants and Regulatory Products and Services (e.g. grants, licenses, inspections and patents).

The survey instrument asked separately about each of these three categories, though the types of questions were similar across the three categories. In particular, the questions were concerned with timeliness, quality, documentation, clarity and ease of use, price, and competence and responsiveness of agency staff. Most of the questions included a satisfaction rating scale. Each section included overall questions about how well that category met the respondents’ requirements and how much red tape was involved.

The next issue confronted by Wellens and her co-authors was whom to survey. They limited their sample to external customers, noting that surveying internal customers is likely to involve different types of questions and formats. However, external customers are not a homogeneous group and are not uniformly easy to identify. For example, the customers who most resemble commercial consumers purchase products and services from the government. Agencies are most likely to have records of these customers. The Federal government also serves customers who request and receive products and services free of charge. They are customers because, as taxpayers, they have already paid for the product or service, and they may be potential paying customers in the more conventional sense. Agencies are less likely to monitor the identity of these customers. A third class of external customers receives products and services passively or unknowingly. For example, someone listening to a radio broadcast that includes a weather forecast from the NWS, or someone eating fish that is safe because of fishery inspections by NOAA, is probably not aware of receiving a Federal service at that time. As it turned out, the participating agencies provided customer lists that ranged widely in size and apparent quality.

The initial questionnaire was followed by a non-response follow-up, as recommended by Dillman, which increased response rates from about 30% to 43% on average. This was certainly in the ballpark of response rates in the customer survey literature, where 30% rates are viewed as successful. Nonetheless, the rates ranged widely between agencies. The authors intimate that if they could have legally guaranteed confidentiality, response rates might have been more uniformly high.

The variation in response rates and quality of customer lists constrained the researchers in the kind of benchmarking activities they could carry out. They identify three such activities: comparing performance on the current survey to results for (1) an ideal organization in the private sector, (2) an average organization, and (3) the Department of Commerce itself surveyed again in the future. Because the samples were not comparable between the 14 agencies, it is virtually impossible to assure comparability with other surveys of other organizations; without this assurance the first two types of benchmarking are not meaningful. Thus comparisons should be made with great caution.

Wellens concluded the presentation with a warning about raising expectations that the quality of products and services will improve as a result of customer satisfaction surveys. If the results are not actually used to make things better, the exercise may increase the cynicism of customers or employees or both.

Norbert Schwarz served as synthesizer and discussion leader. He commented that each presentation represented a different point in a process of developing policy about products which are ultimately consumed (as in the talk by Levy), measuring the consumption of these and other products to create official statistics (as in the talk by Conrad), and measuring the satisfaction of official statistics users as well as consumers of other Federal products and services (as in the talk by Wellens). Schwarz commented that Levy’s presentation illustrated the often post hoc nature of consumer research in supporting policies already in place. The more appropriate role of such research is to inform policy as it is developed. He commented that Conrad’s talk was in the tradition of work that combines cognitive psychology and survey research, and that while much of the best work in this area is done in the Federal arena, it is often specific to particular problems on particular surveys. He advocated research in this area that can be easily generalized. Finally, he added to the kinds of problems reported by Wellens in identifying customers of Federal products and services: involuntary "customers," such as taxpayers filing with the Internal Revenue Service, have inherently negative feelings about their experience yet must legitimately be included in the samples for such measurement exercises.
