Integrating Results From Independent Studies

Michael J. Ryan, The University of Michigan
Donald W. Barclay, The University of Michigan
ABSTRACT - Qualitative and quantitative approaches for combining results from independent studies are organized and compared. The approaches are evaluated within the overall context of the research integration process.
Michael J. Ryan and Donald W. Barclay (1983), "Integrating Results From Independent Studies," in NA - Advances in Consumer Research Volume 10, eds. Richard P. Bagozzi and Alice M. Tybout, Ann Arbor, MI: Association for Consumer Research, Pages: 492-496.

Advances in Consumer Research Volume 10, 1983      Pages 492-496


INTRODUCTION

Since one of the primary objectives of science is the production of generalizations, frequent attempts are made to synthesize historical accounts, case studies, and scientific inquiries. Such undertakings take on a high degree of importance under the aegis of a modern philosophy of science that does not adhere to the principle of falsification (Suppe 1977). More specifically, knowledge is cumulative and accepted as a matter of degree. There is no such thing as a "critical" study, as each study merely affects the degree to which a body of evidence is believed. Whereas rigor in individual studies should not be downplayed, an activity crucial to generalizing involves systematizing the pieces of information provided by individual studies in a manner rigorous enough that different reviewers would reach the same conclusions.

Consumer research reviewers traditionally have followed a literary style, presenting narrative descriptions of different studies. These reviews usually conclude that, due to study differences, generalizations cannot be made. Fortunately, a number of quantitative reviewing techniques are available that are more apt to uncover generalizations. These techniques generally fall under the rubric of "meta-analysis," described by Glass (1976) as "the statistical analysis of a large collection of analysis results from individual studies for the purpose of integrating the findings." Specific meta-analytic objectives, suggested by Jackson (1980), are: to develop estimates of the population parameters of the studied phenomena, to examine how varying characteristics of subjects, contexts, and treatments may affect the phenomena, and to examine the implications of identified methodological strengths and weaknesses in the primary studies. By contrast, qualitative integration approaches typically use the "voting" or "box count" method to tabulate significant versus non-significant studies and declare one side the winner. Dissatisfaction with qualitative reviews led to the emergence of meta-analysis; however, many of the problems in qualitative reviews are attributable to poor reviewer practices stemming from a lack of understanding of the research integration process rather than to the approach itself. Interestingly, if these poor practices are transferred to meta-analysis, they may also undo any advantages of meta-analysis over qualitative approaches. Attention will now turn to the research integration process and the qualitative and quantitative approaches inherent in this process.

RESEARCH INTEGRATION PROCESS

Traditional qualitative literature reviews are often criticized as being biased, incomplete, and non-systematic. These criticisms, especially the first two, can be largely overcome by applying a well-defined process, appropriate to both qualitative and quantitative approaches, that parallels primary research methodology. The phases outlined below are drawn primarily from Jackson (1980) and Glass, McGaw, and Smith (1981).

Question Definition

The question to be answered by a consumer behavior reviewer dictates the type of research integration required. Some of the types of questions that could be answered by synthesizing research include: sizing up new substantive and/or methodological developments in a field, verifying existing theories or developing new ones, synthesizing knowledge from different fields or lines of research, or inferring generalizations about substantive issues from a set of studies (Jackson 1980). The focus of this paper is on inferring generalizations about substantive issues from a set of studies directly bearing on the issues.

In attempting to integrate results from several studies, two levels of questions typically need answering. The first concerns the phenomena being reviewed. For example, does X causally influence Y? The second level of question concerns variations among studies that might account for different findings. This leads to an investigation of interactions between study attributes and study outcomes. Failure to accurately define and clarify the integration question precludes successful integration.

Study Selection

The most serious form of bias enters research integration at the study selection stage (Glass, McGaw, and Smith 1981). Defining the scope of the studies selected for review, which is synonymous with defining the population of interest, is the first stage in study selection. Decisions with respect to the concepts and constructs of interest, the disciplines or fields from which studies will be selected, and the time frame of these studies set this scope.

The second concern is the thoroughness of the search for appropriate studies. The adequacy of indexing systems, the clarity of abstracts, and the ability and willingness to extract indirect research results from studies whose focus was not primarily on the topic of interest all help or hinder the search process. The reviewer must also resolve the issue of whether the resulting list of studies is in fact a population or a sample of studies. The resolution of this issue is important, as a synthesis of a sample of studies requires the drawing of inferences.

What about dissertations and other unpublished research? To ignore this research is to assume that the direction and magnitude of effects are the same in published and unpublished works (Smith 1980). Glass, Smith and Barton, as discussed in Smith (1980), demonstrated in 10 instances where published and unpublished literature could be compared that the average experimental effect from studies published in journals is larger than the corresponding effect estimates from theses and dissertations. The findings reported in journals were 1/3 of a standard deviation more favorably disposed toward the favored hypothesis. In addition, there is the issue of all those studies hidden away in file drawers due to the lack of interest in null results (Rosenthal 1980).

Perhaps the most hotly debated topic with respect to study selection is that of study quality. Glass and his colleagues contend that studies should not be eliminated because of methodological weaknesses. At the study selection stage the reviewer does not know whether study quality does in fact lead to different study results. The contention is that this is an empirical question to be decided through the application of meta-analytic techniques that examine the covariation between study quality and study outcomes. Only at this analysis stage would the reviewer address the topic of study quality. Glass is not advocating poor research but is suggesting that all studies for which there is not strong evidence of biased findings or flagrant errors be included. Any exclusion at the study selection stage would be considered arbitrary and would result in a loss of information (Glass 1978). On the other hand, Eysenck (1978) suggests that Glass and his colleagues are advocating the abandonment of critical judgments of any kind and that it is the role of the reviewer to apply critical judgment at the study selection stage. It is our opinion that eliminating studies for methodological weaknesses could result in a purposive sample of studies, or a biased sample representing only respected investigators. This may inhibit the factual systematizing of results from many studies. A better approach would entail incorporating "quality" as a study characteristic variable in a meta-analysis, as it would force the reviewer to make his/her quality criteria explicit.

Characteristics of Studies

Studies selected for integration are not perfect replications in most cases, especially in consumer research. The research reviewer typically finds differences of a substantive, contextual, methodological, and quality-of-research nature, and in addition could find differences due to the research traditions of different disciplines. A research reviewer should capitalize on this diversity (Light 1979). It may be possible to relate differences in study characteristics to differences in findings within a set of studies.

To be able to complete such an analysis, studies within the set of interest must be categorized along salient dimensions. The research integrator would rely on theoretical underpinnings to assess which of the study characteristics could be hypothesized as leading to differences in study outcomes. How one establishes a working set of characteristics to consider becomes an issue. It seems that content analysis techniques could be used to assemble the common characteristics found within a set of studies.

Traditionally, content analysis has been applied to works which were not intended either to fit together or to constitute "scientific" empirical studies. A typical investigation, for example, would attempt to infer consumer product or brand images from an analysis of the mass media (e.g., Kassarjian 1977). Various units of analysis such as amount of space, number of articles, key words, etc. would be employed. It is quite common to attempt to infer a value system of a public from a content analysis of the messages it consumes.

Whereas the usefulness of traditional content analysis has been amply demonstrated though seldom applied, we are suggesting a somewhat different use. Consumer researchers represent a variety of disciplines which differ in terms of value systems and methodological sins of omission and commission. Thus, for example, econometricians worry a great deal about their models, whereas psychologists focus more on measurement and data related issues. Other differences which may be more subtle could be detected by applying content analysis to sets of studies, thereby enabling an identification of study characteristics which may account for differences in study outcomes.

Now that the initial phases of the research integration process (question definition, study selection, and the determination of study characteristics) have been overviewed, we will turn to the qualitative and quantitative approaches to integration and the issues involved in choosing an approach.

QUALITATIVE INTEGRATION

Qualitative research integration incorporates two techniques: the traditional literature review and the box count, or voting, method of summarizing findings. The traditional literature review provides a narrative overview of prior findings, accommodating smaller bodies of literature than are typically encountered today. Narrative overviews of prior findings, while offering a certain contextual richness, generally do not provide the systematic information a researcher needs to design more powerful future investigations (Pillemer and Light 1980).

The box count or voting method is a more formal method which determines for each study only whether or not a statistically significant difference was found and, if so, in what direction. A count of significant positive, significant negative, and non-significant findings is reported, and a winner declared. An extension of this approach, the sign approach, tabulates the direction of effects without regard to statistical significance and computes the probability of the results obtained under the assumption that the two methods studied in an experimental setting are equally effective (Cooper and Rosenthal 1980). The voting method is biased in that it disregards sample size. For example, several small studies showing not-quite-significant results would outvote one large-sample study showing just-significant results, generating a conclusion quite at odds with one's best instincts (Glass, McGaw, and Smith 1981).
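The box count and the sign approach described above are simple enough to sketch computationally. The following is our own minimal illustration (the function names and sample tallies are hypothetical, not drawn from the literature reviewed here):

```python
import math

def box_count(outcomes):
    """Tally significant-positive ('+'), significant-negative ('-'),
    and non-significant ('0') study outcomes and report the counts."""
    return {k: outcomes.count(k) for k in ("+", "-", "0")}

def sign_test_p(n_one_direction, n_total):
    """The sign approach: two-sided binomial probability of observing at
    least this many effects in one direction out of n_total studies, under
    the null hypothesis that direction is a 50/50 coin flip."""
    k = max(n_one_direction, n_total - n_one_direction)
    tail = sum(math.comb(n_total, i) for i in range(k, n_total + 1)) / 2 ** n_total
    return min(1.0, 2 * tail)

# Eight of ten studies favor the treatment, ignoring significance:
p = sign_test_p(8, 10)  # about 0.109
```

Note that the sign approach uses directional information the box count discards, yet both still ignore sample sizes and effect magnitudes, which is precisely the bias discussed above.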

Several criticisms have been leveled at these qualitative techniques. Subjective criteria for deciding how to synthesize have often resulted in "standards of objectivity, verifiability, replicability, and clarity, against which primary research were judged being ignored or forgotten when scholars turned to the problem of integrating the primary evidence" (Smith 1980). Literature reviewers have been accused of biasing study selection by eliminating studies based on methodological considerations, paying no attention to unpublished studies, narrowly defining the scope of the studies to be reviewed, and failing to capitalize on study differences. The reviewer in effect took advantage of being in the position of information gatekeeper (Cooper and Rosenthal 1980) by carping on the design or analysis deficiencies of all but a few studies--those remaining frequently being one's own work or the work of friends--and hence having these "acceptable" studies reveal the truth (Glass 1976).

Supporters of meta-analysis claim that this more quantitative approach will overcome these weaknesses. However, one can see that it is not the integration approach that is essentially at fault but poor reviewer practice. As an example, Cook and Leviton (1980) discuss the quality of a literature review completed by Zuckerman (as discussed in Cook and Leviton) and the meta-analysis of Arkin, Cooper, and Kolditz, which reach different conclusions about the existence of self-serving attributions in studies of interpersonal influence. It turns out that Arkin, Cooper and Kolditz (1980), in addition to applying meta-analytic techniques to this series of studies, also did a more thorough job of study selection. Cook and Leviton were able to show that it was the improved study selection process that led to the different results, as opposed to the application of meta-analytic techniques. Applying more traditional approaches to the newly selected studies would have led to conclusions similar to those found using meta-analysis.

A second criticism is that qualitative reviews can result in information overload if many studies are involved in a narrative review. Thirdly, the use of the box count or voting method results in not using information that may be available in the primary studies. Effect sizes may be ignored in experimental studies, for example. Some of the above criticisms leveled at qualitative reviews can be overcome. Study selection bias can be reduced, and as Cook and Leviton (1980) demonstrated, qualitative reviews can incorporate much more information than just a box count.

However, it is our contention that three problems remain. First, qualitative reviews have difficulty in handling a large number of studies. Secondly, the diversity of study characteristics is not easily incorporated into such a review, and it is difficult to estimate the impact of these different study characteristics on findings. Finally, narrative overviews do not provide systematic information and may appear quite disjointed, especially when addressing a large number of studies. Thus the strength of the qualitative integration approach rests in dealing with the integration of smaller sets of studies.

QUANTITATIVE INTEGRATION

Within the research integration process, the term meta-analysis has become synonymous with quantitative approaches to integration. Since different meta-analysts incorporate different quantitative techniques into their tool kits, there is a variety of definitions of meta-analysis. These include, for example, "formal procedures for combining the results from several empirical studies" (Pillemer and Light 1980) and Glass, McGaw, and Smith's (1981) claim that meta-analysis includes "any statistical methods that have proved useful in extracting meaning from data." The latter researchers go even further to claim that meta-analysis is a perspective, not a technique. The term itself was coined by Glass (1976), who deemed it "the analysis of analyses," with individual studies becoming the units of analysis.

The benefits of meta-analysis (Pillemer and Light 1980) include

(1) Increasing power due to increased sample size which may result from the pooling of smaller sample sized studies showing concordant but non-significant results.

(2) Obtaining a more precise average effect size measure.

(3) Describing the form of the relationship between two variables over a wider range as individual studies may cover this wider range.

(4) Harnessing the benefits of contradictions and determining the explanations for these. For example similarly labeled treatments may in fact differ in important ways and these differences could be related to differences in outcomes. In consumer research, different brand loyalty measures, attitude measures or perhaps recall methods could be examined with respect to differences in outcomes. Setting-by-treatment interactions, study design differences, or analysis strategy differences can be quantitatively related to study differences.

(5) The ability to effectively handle a broader conceptual scope.

(6) The ability to handle larger samples of studies.

Cook and Leviton (1980) outlined the techniques or steps that they perceive that Glass, Rosenthal, and Light would-each include in their respective definitions of meta-analysis. Accumulating these perspectives, a meta-analytic approach to integration would include any or all of the techniques which are described briefly below.

Cluster Approach

Light and Smith's (1971) method of quantitative research integration does not rely on the use of published statistics from each study but requires access to the original data collected in each study. Where conclusions reached among studies with identical measures diverge, relevant dimensions along which the studies differ are sought and examined as additional factors influencing the study outcomes. Studies hence end up being "clustered," and contingency-theoretic statements about treatments, settings, and populations are produced. This cluster approach does not fall squarely within the meta-analysis paradigm, as it deals with the primary data contained within individual studies, but it is mentioned here for completeness.

Combined Significance Tests

Significance levels from independent studies are summarized primarily by combining the probability levels obtained from two or more studies testing essentially the same directional hypothesis. Thus an overall level of significance can be derived that yields more power than any of the individual studies. This approach is most useful when "the separate studies can be considered independent and essentially random samples estimating a 'true' difference between populations so that variation among study outcomes is attributable to chance" (Pillemer and Light 1980). If there are setting-specific effects, a single answer may be misleading and multiple answers may be more useful, leading instead to the grouping of subsets of studies.

Rosenthal (1978) summarizes and compares nine different procedures for conducting a combined significance test. At the same time, Rosenthal recognized that this approach is not comprehensive, since it does not include a description of the magnitude of relationships and may be overly dependent upon sample size. In addition, in a meta-analysis using this technique the number of studies may be so large that the null hypothesis will be routinely rejected.
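One of the better-known procedures of this kind, Stouffer's sum-of-z method, illustrates how combining probability levels gains power. The sketch below is our own illustration, assuming one-tailed p-values from independent studies testing the same directional hypothesis:

```python
from statistics import NormalDist

_normal = NormalDist()

def stouffer_combined_p(p_values):
    """Stouffer's method: convert each study's one-tailed p to a standard
    normal z, sum the z's, divide by sqrt(k), and convert back to a single
    combined one-tailed p for the pooled evidence."""
    k = len(p_values)
    z_sum = sum(_normal.inv_cdf(1 - p) for p in p_values)
    return 1 - _normal.cdf(z_sum / k ** 0.5)
```

Three studies each falling just short of significance at p = .10 combine to roughly p = .013, a result more powerful than any of the individual studies, exactly as the text describes.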

File Drawer Analysis

Rosenthal (1980) discusses extending the use of combined significance tests to analyze what is perceived to be a bias inherent in the published subset of the studies actually carried out on a given phenomenon. In the extreme, Rosenthal suggests that journals are filled with the 5% of studies that show Type I errors while the file drawers are filled with the 95% of studies that show non-significant results.

The file drawer procedure estimates the number of studies containing null results that must be in file drawers before the overall probability of a Type I error can be brought to any desired level of significance. If the overall level of significance of the research review will be brought down to the just significant level by the addition of a few more non-significant results, the combined finding is susceptible to the file drawer threat.
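Rosenthal's fail-safe N makes this threat concrete: it asks how many z = 0 (null-result) studies would have to sit in file drawers before the combined probability rises above the significance criterion. A sketch under the Stouffer formulation (our own function name and example figures):

```python
import math
from statistics import NormalDist

def fail_safe_n(z_scores, alpha=0.05):
    """Estimate the number of unretrieved null-result studies (each
    contributing z = 0) needed to raise the combined one-tailed p above
    alpha.  Solves sum(z) / sqrt(k + n) = z_alpha for n."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # about 1.645 for alpha = .05
    k = len(z_scores)
    n = (sum(z_scores) ** 2) / z_alpha ** 2 - k
    return max(0, math.floor(n))

# Five retrieved studies, each with z = 2.0: about 31 hidden null-result
# studies would be needed to nullify the combined finding.
```

A large fail-safe N suggests the combined result is robust to the file drawer threat; a small one, as the text notes, means a few unretrieved non-significant studies could overturn the review's conclusion.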

Combining Effect Sizes

The early thrust of meta-analysis as developed by Glass and his colleagues, discussed in Glass (1980), focused on this technique. The combining of effect sizes recognizes that the results of a combined significance test may not be particularly illuminating. The idea is to express group differences from different studies on a common scale so that findings from studies employing different measures and different methods can be meaningfully compared (Walberg and Haertel 1980). The mean difference between a control and a treatment group, expressed in standard-deviation units, would be such a combined effect size measure. The combining of effect sizes suggests that meta-analysis has a descriptive purpose to fulfill, which leads to an "on average" statement in the case of experimental studies.

Other summary measures besides the effect size may be chosen depending upon the type of literature being reviewed. For example, Glass (1978) gives a number of useful formulae for converting various statistics to the metric of Pearson product-moment correlations. For some bodies of literature, including many within consumer research, product-moment correlations may be more easily derived and interpreted.
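Both ideas admit a compact sketch: the standardized mean difference for experimental studies, and one common conversion (a t statistic to the correlation metric) for other literatures. This is our own illustration with hypothetical numbers, not a reproduction of Glass's (1978) formulae tables:

```python
import math

def glass_delta(mean_treatment, mean_control, sd_control):
    """Glass's effect size: the treatment-control mean difference expressed
    in control-group standard-deviation units, placing studies that used
    different scales on a common metric."""
    return (mean_treatment - mean_control) / sd_control

def t_to_r(t, df):
    """Convert a t statistic to the Pearson correlation metric:
    r = sqrt(t^2 / (t^2 + df))."""
    return math.sqrt(t * t / (t * t + df))

def mean_effect_size(deltas):
    """The descriptive 'on average' statement: an unweighted mean of the
    per-study effect sizes."""
    return sum(deltas) / len(deltas)

# A treatment group scoring 105 against a control of 100 (SD 10)
# yields an effect of half a standard deviation:
delta = glass_delta(105.0, 100.0, 10.0)  # 0.5
```

The unweighted average shown here is the simplest pooled summary; weighting by sample size is a common refinement once precision differences across studies matter.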

Study Characteristics versus Study Outcomes

An important component of meta-analysis which both Glass and Light emphatically include in their respective definitions of meta-analytic approaches to research integration is that of quantitatively determining the effect of study characteristics on study outcomes. Differences in study outcomes can perhaps be partially explained by differences in theoretical, methodological, and contextual aspects in a body of literature. Differences may occur due to different disciplines having generated the independent studies, the type of primary samples used, the type of subjects, similarly labeled but in fact different treatments, variable/construct definition, researcher characteristics, or time frame over which treatments are applied.

Meta-analysts hold up this type of analysis as perhaps the strongest reason for following a meta-analytic approach. Qualitative research integration approaches often break down since the diversity among studies becomes so difficult to manage and assess that the reviewer gives up in despair and concludes that inconsistent findings mean that no conclusions can be drawn. Glass, McGaw, and Smith (1981) suggest that the analysis of data in meta-analysis is properly approached as an instance of multivariate data analysis in which the studies are the units on which measurements are taken and the study characteristics and findings are the many variables. Variations in studies are then considered an asset, not a liability.

One approach, outlined by Rosenthal (1978) and applied by Farley, Lehmann and Ryan (1981), involves imposing a quasi-experimental design on the studies, treating each study as a single observation. The descriptive statistics reporting effects, relationships, model parameters, etc. constitute the data. Analysis of variance techniques are used to determine if salient study characteristics do in fact impact study outcomes in a systematic fashion. This process, labeled "imperfect replication" by Farley, Lehmann and Ryan, was used to integrate the results from independent tests of Fishbein Intention Models. Two beta weights and a goodness-of-fit statistic were extracted from each test, and five salient study characteristics were used in this analysis. Only two of these characteristics affected study results: the method for measuring attitude and the researcher's discipline, regardless of attitude measure. An additional benefit put forth for this analysis of variance approach is that of designing research programs to fill sparsely populated ANOVA cells.
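The imperfect-replication design can be mimicked in miniature: each study contributes one effect-size observation, observations are grouped by a study characteristic, and a one-way ANOVA F statistic tests whether that characteristic matters. The grouping variable and numbers below are invented for illustration (they are not Farley, Lehmann and Ryan's data):

```python
def one_way_f(groups):
    """One-way ANOVA F statistic computed from scratch.  'groups' maps a
    level of a study characteristic (e.g. attitude-measurement method) to
    the effect sizes reported by the studies sharing that level."""
    data = [x for xs in groups.values() for x in xs]
    grand_mean = sum(data) / len(data)
    ss_between = sum(
        len(xs) * (sum(xs) / len(xs) - grand_mean) ** 2 for xs in groups.values()
    )
    ss_within = sum(
        (x - sum(xs) / len(xs)) ** 2 for xs in groups.values() for x in xs
    )
    df_between = len(groups) - 1
    df_within = len(data) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical effect sizes grouped by a measurement-method characteristic:
f = one_way_f({"semantic_differential": [0.2, 0.3, 0.4],
               "likert": [0.7, 0.8, 0.9]})  # large F: the characteristic matters
```

A large F relative to its degrees of freedom would flag the study characteristic as one that systematically shifts outcomes, which is exactly the interaction the second-level integration question asks about.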

Farley, Lehmann and Ryan (1982) have extended the approach into "sparse replications," which attempt, for example, to assess patterns of numerical results where there are few studies involved, different products and situations, differences in parameter estimation procedures, and differences in measures. This approach involves the pooling of system parameters from several measurements across several studies to provide an adequate sample of conceptually comparable measurements. Comparison is via scaleless elasticities of these system parameters.

The analysis of study characteristics versus study findings is that component of meta-analysis which Glass claims allows the question of the importance of study quality to be resolved. It is an empirical question as to whether or not "better" studies actually generate outcomes different from "lesser" quality studies.

Farley, Lehmann and Ryan (1981) suggest that this approach is most promising when:

(1) a well developed theory exists, preferably with the form of a functional relationship among the variables specified

(2) a relatively small set of empirical questions is of interest

(3) output measurements from the various studies are expressed in, or easily converted to, comparable units

(4) measurement and estimation methodologies are easily comparable over studies

(5) enough studies exist to provide a sample size adequate for estimation of ANOVA parameters.

In summary it can be seen that the quantitative integration of research, or meta-analysis, consists of a wide range of techniques. What this overview has not delineated is the myriad of specific technical difficulties and issues involved in applying these techniques. The reader is referred to the references in this paper for further details.

CONCLUSION

What has been overviewed in this paper is a process of research integration emphasizing the emerging merits of meta-analytic techniques while at the same time giving properly implemented qualitative integration its due. It is timely that consumer researchers tune into research integration since "to concentrate our energies on more primary research and evaluation studies without systematic integration of previous studies is scientifically and educationally wasteful" (Walberg and Haertel 1980).

Quantitative integration, or meta-analysis, consists of a variety of complementary techniques. Technique selection will depend on the integration problem under investigation and the nature and scope of the body of research being integrated.

The apparent benefits of meta-analysis for discovering what we know about consumer behavior, together with the consumer researcher's love of methodological rigor, suggest that it will be widely applied in consumer research integration. Meta-analysis is not, however, a panacea; should not totally replace well executed qualitative reviews; nor should we lose sight of the need to adhere to the basic process of research integration no matter which approach is chosen.

REFERENCES

Arkin, Robert, Harris M. Cooper, and Thomas Kolditz (1980), "A Statistical Review of the Literature Concerning the Self-Serving Attribution Bias in Interpersonal Situations," Journal of Personality, Vol. 48, No. 4, 435-48.

Cook, Thomas D. and Laura C. Leviton (1980), "Reviewing the Literature: A Comparison of Traditional Methods with Meta-Analysis," Journal of Personality, Vol. 48, No. 4, 449-72.

Cooper, Harris M. and Robert Rosenthal (1980), "Statistical Versus Traditional Procedures for Summarizing Research Findings," Psychological Bulletin, Vol. 87, No. 3, 442-49.

Eysenck, H. J. (1978), "An Exercise in Mega-Silliness," American Psychologist, May 1978, 517.

Farley, John U., Donald R. Lehmann, and Michael J. Ryan (1981), "Generalizing from 'Imperfect' Replication," Journal of Business, Vol. 54, No. 4, 597-609.

Farley, John U., Donald R. Lehmann, and Michael J. Ryan (1982), "Patterns in Parameters of Buyer Behavior Models: Generalizing from Sparse Replications," Unpublished Working Paper, Graduate School of Business, Columbia University.

Glass, Gene V. (1976), "Primary, Secondary, and Meta-Analysis of Research," Educational Researcher, Vol. 5 (Nov.), 3-8.

Glass, Gene V. (1978), "Integrating Findings: The Meta-Analysis of Research," Review of Research in Education, Vol. 5. Chptr. 9, 351-79.

Glass, Gene V. (1980), "Summarizing Effect Sizes," New Directions for Methodology of Social and Behavioral Science, Vol. 5, guest ed. Robert Rosenthal, San Francisco, CA: Jossey-Bass Inc., 13-32.

Glass, Gene V., Barry McGaw, and Mary Lee Smith (1981), Meta-Analysis in Social Research, Beverly Hills, CA: Sage Publications.

Jackson, Gregg B. (1980), "Methods for Integrative Reviews," Review of Educational Research, Vol. 50, No. 3, 438-60.

Kassarjian, Harold H. (1977), "Content Analysis in Consumer Research," Journal of Consumer Research, Vol. 4, 8-18.

Light, Richard J. (1979), "Capitalizing on Variation: How Conflicting Research Findings Can Be Helpful for Policy," Educational Researcher (Oct.), 7-14.

Light, Richard J., and P. V. Smith (1971), "Accumulating Evidence: Procedures for Resolving Contradictions among Different Research Studies," Harvard Educational Review, Vol. 41, 429-71.

Pillemer, David B. and Richard J. Light (1980), "Synthesizing Outcomes: How to Use Research Evidence from Many Studies," Harvard Educational Review, Vol. 50, No. 2, 176-95.

Rosenthal, Robert (1978), "Combining Results of Independent Studies," Psychological Bulletin, Vol. 85, No. 1, 185-93.

Rosenthal, Robert (1980), "Summarizing Significance Levels," New Directions for Methodology of Social and Behavioral Science, Vol. 5, guest ed. Robert Rosenthal, San Francisco, CA: Jossey-Bass Inc., 33-46.

Smith, Mary Lee (1980), "Publication Bias and Meta-Analysis," Evaluation in Education: An International Review Series, Vol. 4, No. 1, guest eds. Herbert J. Walberg and Edward H. Haertel, Elmsford, NY: Pergamon Press Ltd., 22-4.

Suppe, Frederick (1977), The Structure of Scientific Theories, 2nd Edition, Urbana, IL: University of Illinois Press.

Walberg, Herbert J. and Edward H. Haertel (1980), "Research Integration: Introduction and Overview," Evaluation in Education: An International Review Series, Vol. 4, No. 1, guest eds. Herbert J. Walberg and Edward H. Haertel, Elmsford, NY: Pergamon Press Ltd., 5-12.
