A Procedure For Integrating Outcomes Across Studies

Kent B. Monroe, Virginia Polytechnic Institute and State University
R. Krishnan, Virginia Polytechnic Institute and State University
ABSTRACT - This paper illustrates a systematic approach to integrating knowledge within a specific research domain. Using the price-perceived quality literature as a base, a methodology for conducting an integrative review is developed and evaluated. The paper concludes with a set of recommendations for standardizing the reporting of empirical tests of conceptual paradigms.
Kent B. Monroe and R. Krishnan (1983), "A Procedure For Integrating Outcomes Across Studies", in NA - Advances in Consumer Research Volume 10, eds. Richard P. Bagozzi and Alice M. Tybout, Ann Arbor, MI: Association for Consumer Research, Pages: 503-508.



In consumer research, a traditional recommendation at the end of an article is the clarion call for more research to clarify the results, some of which appear to contradict previous findings. Moreover, the next published study in the research area begins by noting the inconsistent findings of previous research. Round and round the process continues without a clear answer to the underlying research question. The research surrounding the hypothesized relationship between price and perceived quality is a case in point. Every so often a new article again attempts to answer the question of whether people positively associate higher prices with higher quality. The end result is one more study that adds to the box score on either the yes or the no side.

If consumer research is to achieve progress, it is necessary to follow the process of knowledge accumulation and refinement. Knowledge accumulation requires a careful, systematic review of previous studies, noting inconsistencies and conflicts. Moreover, the review should seek to resolve these conflicts by discovering their underlying sources. Unfortunately, previous summaries of past research have not realized these goals. The traditional review has taken more of a literary approach, chronicling who did what and with what statistically significant or insignificant results. Rarely has such a review told us about the magnitude of the effects found, nor has such a review combined and compared sets of studies. Further, the analysis underlying the review has been primarily qualitative and subjective.

Some of the major shortcomings of traditional attempts to synthesize knowledge include: (1) methodological deficiencies in the literature search process, i.e., incomplete literature searches; (2) qualitative and judgmental reviews without analytical rigor; (3) a literary and chronological reporting style; (4) highly uncritical reviews; and (5) a lack of definitive results.

The objective of this paper is to develop and illustrate a methodology for synthesizing knowledge. The illustrations will be drawn from an integrative review of the price-perceived quality research literature. Some analytical techniques for combining and comparing studies will be developed and illustrated. The paper concludes with an assessment of the knowledge synthesis process and a set of recommendations for facilitating future knowledge synthesis.


Given the objective of constructing theories of consumer behavior, perhaps a first task is to construct a verbalization of the phenomenon in question. One of the first steps in developing a verbal theory is to search the literature for propositions, conceptualizations, theoretical orientations, and assumptions. A second aspect of the literature search is to discover what propositions and variable relationships have been tested. Moreover, as findings are collected, consolidation should occur, leading to confirmed linkages among variables. Surely such a necessary first step in the theory-building process would have a well-established methodology. Unfortunately, it is the rare methodology book that addresses the issue of performing an integrative review of the literature of a research domain. This section sets out a methodology for conducting literature reviews for the purpose of knowledge synthesis.


The first step in performing an integrative review is to determine the substantive questions that guide the search process. The questions may be very broad, such as: "What factors influence the use of price in the purchase decision?" Other questions may be relatively narrow, such as: "Do higher prices imply higher product quality to prospective buyers?" Skill in asking the proper questions is important to the progress of consumer research.

There are a number of sources that can help frame questions when conducting a review (Jackson 1980). One important source is available theory on the topic. Theory can suggest important questions or issues that need investigation. A second source to consult is previous reviews in the topic area. Sometimes a previous review will recommend fertile areas for knowledge synthesis. It is important to be as critical of previous literature reviews as of empirical research. Indeed, reviews that simply summarize the discussion sections of studies probably smooth still further inconsistent or ambiguous findings that have already been smoothed once. As developed below, a review must concentrate on the results sections of reported research.

A third source for developing questions is the actual literature being reviewed. It is advisable not to frame the specific question until at least a cursory examination of the literature has been completed. The advantage of the preliminary scan is that one may discover the need to become narrower or, conversely, broader in the questions to be investigated. Moreover, other viable questions may surface that would not have been asked. Finally, the fourth source is one's own insight, intuition, and ingenuity. Creativity of the investigator should not be minimized, but at the same time, it should be stressed that objectivity is needed to cement definitive answers to puzzling questions.

In the present illustration, the question arose from researchers' repeated emphasis that the evidence on whether consumers impute quality to products on the basis of price was inconsistent. Moreover, a compilation of the price research literature indicated that there was a sufficient set of "data" to test the generalization that there is a positive relationship between price and perceived product quality.

Data Collection and Sampling

Literature reviews need to specify the population of studies investigated and how the particular sample of reviewed studies was selected. At a minimum, the reviewer should locate as many studies in the research domain as possible. The literature review then must report the search strategy so that others may evaluate the quality of the search process. Once the population of studies has been identified, the reviewer should also specify any decision rules for excluding studies, or whether a random sample of studies was taken. In the present illustration, 48 studies were identified. A careful coding of these studies indicated that the largest set of studies wherein there was some homogeneity of research design and analytical procedures was the price-perceived quality studies utilizing some form of analysis of variance as the statistical technique. This paper reports on the integrative analysis of this subset of studies. The sample of these studies is not random.

Coding the Data

One of the most important tasks of the review is to detail the characteristics of the studies examined. The researcher should take the perspective of a detective and examine each study microscopically. A coding form should be developed that reflects the nature of the research to be examined and the possible sources of variation in results. To the extent possible, the characteristics of the studies and their findings are categorized and quantified to facilitate their integration (Glass, McGaw and Smith 1981). Table 1 presents the coding form developed for the present study.

The purpose of developing this detail is to provide a base for analyzing whether variations in outcomes can be traced to fundamental differences in the studies. An example of a summary of the studies' characteristics is given in Table 2.
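As a rough illustration of how coded characteristics can be made machine-readable, the Python sketch below stores one coded result per record and flags results that cannot enter a quantitative comparison. The class and field names (`StudyCode`, `p_one_tailed`, `effect_size_d`, and so on) are hypothetical and are not taken from Table 1; they only echo characteristics mentioned elsewhere in this paper, such as research design, subjects, price levels, and brand name information.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StudyCode:
    """One coded result from one reviewed study (hypothetical fields)."""
    study_id: str
    design: str                  # e.g. "between-subjects" or "within-subjects"
    subjects: str                # e.g. "students"
    product: str
    price_levels: int            # number of price treatments
    brand_name_present: bool
    statistic: str               # test statistic reported, e.g. "F" or "t"
    p_one_tailed: Optional[float] = None    # None if only "n.s." was reported
    direction: int = 1                      # +1 if higher price -> higher quality
    effect_size_d: Optional[float] = None   # None if d cannot be computed
    notes: List[str] = field(default_factory=list)


def uncodable(records):
    """Return the ids of results lacking an exact p-value or an effect size."""
    return [r.study_id for r in records
            if r.p_one_tailed is None or r.effect_size_d is None]
```

Keeping one flat record per two-variable result makes it easy to see how many results can enter each analysis. In this review, for example, effect sizes could not be computed for every study.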

Quantifying the Studies' Results

Every attempt should be made to quantify the results of the reviewed studies. This quantification may range from the relatively unsophisticated counting of significant and nonsignificant studies to the determination of magnitude-of-effect measures with directional signs. To the extent that the results are quantified according to standardized metrics, studies can be compared and combined to test hypotheses about the research questions. Here, the meaning of results does not refer to the original investigator's conclusions (Rosenthal 1982). Rather, the term results refers to the simple relationship between two variables. It is necessary to address the question: "What is the relationship between price and perceived product quality?" Once a study's results have been broken down into this two-variable context, the next step is to compute estimates of the magnitude of the relationship, the effect size.

Counting Methods. Perhaps the easiest way to begin analyzing results is to prepare a distribution of results across the reviewed studies. The distribution of p-values for 28 price-perceived quality results indicated that 16 of the 28 results reported statistical significance at the .05 level or less. As a box score, the tally is 16 to 12 in favor of the positive price-perceived quality relationship. Obviously, this unsophisticated approach offers little insight into resolving the relationship issue.
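A box-score tally is easy to mechanize. The Python sketch below sorts results into the five signed categories recommended in the next paragraph. Each result is a hypothetical (one-tailed p, sign) pair, where the sign is +1 for a positive price-quality difference, -1 for a negative one, and 0 for no difference. The .05 cutoff is an assumption chosen to match the level used above.

```python
from collections import Counter


def box_score(results, alpha=0.05):
    """Tally (one-tailed p, sign) pairs into five signed categories."""
    tally = Counter()
    for p, sign in results:
        if sign == 0:
            tally["zero difference"] += 1
            continue
        level = "significant" if p <= alpha else "nonsignificant"
        tally[level + (" positive" if sign > 0 else " negative")] += 1
    return tally


# Hypothetical results, for illustration only.
print(box_score([(0.01, 1), (0.30, 1), (0.50, 0), (0.20, -1), (0.04, -1)]))
```

Even this finer tally ignores sample size and the magnitude of effects.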



In other reviews, where both positive and negative relationships are reported, greater care must be taken when using the counting method. At a minimum, the categories of statistically significant positive, nonsignificant positive, zero difference, nonsignificant negative, and statistically significant negative should be used. The reviewer must also make sure that the null hypothesis tested in each original study was one of "no difference." The reviewer should also check whether the hypotheses were all of the same type: one-tailed or two-tailed. Often, published studies report only whether or not a result was significant. Unless the p-values are reported, the reviewer does not know whether the Type I error level is the same across studies. Finally, this counting approach, which is often used in less systematic reviews, provides no information on the magnitude of the differences. As is well known, statistical significance depends on sample size as well as on the differences between treatments:

$$\text{Test of significance} = \text{Effect size} \times \text{Sample size} \qquad (1)$$
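Equation (1) can be made concrete with a small Python calculation. As a simplifying assumption, it uses a two-group z test with a known common standard deviation. For this test, $Z = d\sqrt{n_1 n_2/(n_1 + n_2)}$, where $d = (M_1 - M_2)/s$. A t test behaves the same way apart from small-sample corrections. The effect size of 0.3 and the group sizes are hypothetical.

```python
from statistics import NormalDist


def z_two_groups(d, n1, n2):
    """Z for a two-group comparison with known common SD: d * sqrt(n1*n2/(n1+n2))."""
    return d * (n1 * n2 / (n1 + n2)) ** 0.5


def one_tailed_p(z):
    """Upper-tail probability of a standard normal deviate."""
    return 1 - NormalDist().cdf(z)


for n in (20, 200):   # hypothetical group sizes, same effect size
    z = z_two_groups(0.3, n, n)
    print(f"n = {n} per group: Z = {z:.2f}, one-tailed p = {one_tailed_p(z):.4f}")
```

With the same d = 0.3, Z rises from about 0.95 to 3.00 as each group grows from 20 to 200 subjects. One study would therefore be "nonsignificant" and the other highly significant, although the underlying effect is identical.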



Comparing Studies. Before synthesizing results, studies should be tested for homogeneity of results. Are the results across the studies reasonably similar, or are there considerable differences? When the results vary to some extent, the variations may be due to the quality of the methodology, sampling error, or differences in the constructs investigated. A test for homogeneity helps alert the reviewer to one or more of these possibilities.

For the 28 results reviewed, it was possible to compare studies according to their p-values as well as their effect sizes. The significance test was conducted under two assumptions. First, a result reported as nonsignificant without a corresponding p-value was assigned p = .50. Second, any result reported as significant beyond .01 was treated as p = .01. A standard normal deviate, Z, was determined for each exact p-value, carrying the same directional sign as the original study's result. All p-values must be one-tailed. The test of the homogeneity of the Z's is (Rosenthal 1982):

$$\sum_{j=1}^{N} \left(Z_j - \bar{Z}\right)^2 \quad \text{is distributed as } \chi^2 \text{ with } N - 1 \text{ df.} \qquad (2)$$

For our data, $\sum_{j}(Z_j - \bar{Z})^2$ was 28.85. With 27 df, the probability of a chi-square value this large lies between .30 and .50. Thus, these 28 p-values appear to be relatively homogeneous.
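The Z homogeneity test in equation (2) can be sketched in Python as follows. The p-value conventions follow those stated above: nonsignificant results without an exact p are set to .50, and anything below .01 is set to .01. The Python standard library has no chi-square distribution, so the tail probability here uses the Wilson-Hilferty normal approximation. This approximation is a substitute for a printed chi-square table, not the authors' procedure. The function names are hypothetical.

```python
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()


def signed_z(p_one_tailed, sign, floor=0.01):
    """Signed standard normal deviate for a one-tailed p-value.

    Pass p = .50 for a result reported only as nonsignificant; any p below
    `floor` is raised to `floor`, following the conventions in the text.
    """
    p = max(p_one_tailed, floor)
    return sign * STD_NORMAL.inv_cdf(1 - p)


def z_homogeneity(zs):
    """Return (chi-square, df): sum of (Z_j - mean Z)^2 with N - 1 df."""
    zbar = fmean(zs)
    return sum((z - zbar) ** 2 for z in zs), len(zs) - 1


def chi2_upper_tail(x, df):
    """Approximate P(chi-square > x) via the Wilson-Hilferty transformation."""
    v = 2 / (9 * df)
    z = ((x / df) ** (1 / 3) - (1 - v)) / v ** 0.5
    return 1 - STD_NORMAL.cdf(z)


print(chi2_upper_tail(28.85, 27))   # reported chi-square and df from the text
```

Under this approximation, the reported value of 28.85 with 27 df falls well short of the .05 critical region. That is consistent with treating the 28 p-values as homogeneous.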

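It is also possible to test the statistical homogeneity of the effect-size estimates, although for the price-perceived quality studies we were not able to compute effect sizes for all studies. The effect size $d = (M_1 - M_2)/s$ was computed for each simple two-variable result. These d-values were transformed into correlation effect sizes using the relationship $r = d/\sqrt{d^2 + 4}$ (Cohen 1969). The r-values were then transformed into their associated Fisher z's to test the homogeneity of the r's using (Rosenthal 1982):

$$\sum_{j=1}^{N} (N_j - 3)\left(z_j - \bar{z}\right)^2 \quad \text{is distributed as } \chi^2 \text{ with } N - 1 \text{ df,} \qquad (3)$$

where $N_j$ is the number of subjects on which result $j$ is based. In this equation, $\bar{z}$ is given by:

$$\bar{z} = \frac{\sum_{j=1}^{N} (N_j - 3)\, z_j}{\sum_{j=1}^{N} (N_j - 3)} \qquad (4)$$
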
It has been suggested by several researchers that a no-price treatment ought to be included in price-perceived quality studies. The argument is that this "control" treatment provides a baseline for comparing the effect of price treatments on subjects' perceptions of product quality. Three studies were found that had a price-absent treatment and for which effect sizes could be computed separately for conditions with and without brand name information. For the brand-name-absent condition, the test of homogeneity of effect sizes produced a chi-square value of 8.27 with 14 df, implying homogeneity in the magnitude of effects. A chi-square value of 2.18 with 8 df was found for the brand-name-present condition, also implying homogeneous results. There was some variation in the effect sizes across the two conditions (eight of the twenty-four effects were negative). Nevertheless, this variation is well within what chance alone would produce.
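The effect-size homogeneity test can be sketched in the same way. The Python below converts each d to r with Cohen's relation, which assumes equal group sizes. It then applies Fisher's r-to-z transformation (`math.atanh`) and weights each z by $N_j - 3$, where $N_j$ is the number of subjects behind result j. This weighting follows Rosenthal's usual form of the test; treat it as an assumption about the exact formula used. The function names and data are hypothetical.

```python
import math


def d_to_r(d):
    """Cohen's (1969) conversion r = d / sqrt(d^2 + 4); assumes equal group sizes."""
    return d / math.sqrt(d * d + 4)


def r_homogeneity(rs, ns):
    """Chi-square for heterogeneity of correlations; weights are N_j - 3.

    rs: correlation effect sizes; ns: subjects per result. Returns (chi2, df).
    """
    zs = [math.atanh(r) for r in rs]            # Fisher r-to-z
    ws = [n - 3 for n in ns]
    zbar = sum(w * z for w, z in zip(ws, zs)) / sum(ws)
    chi2 = sum(w * (z - zbar) ** 2 for w, z in zip(ws, zs))
    return chi2, len(rs) - 1


# Hypothetical d-values and sample sizes for three results.
rs = [d_to_r(d) for d in (0.2, 0.5, -0.1)]
print(r_homogeneity(rs, [40, 60, 30]))
```

A set of k results gives k - 1 df. The 14 and 8 df reported above therefore correspond to 15 and 9 effect sizes, which together make up the twenty-four effects mentioned in the text.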

A second substantive issue is the effect of higher prices on perceived quality when price information is present. A number of studies presented price information at different price levels. The prices varied with the products used in each study and with the time of the study. However, by categorizing the prices as low or high for a two-level price treatment, or as low, medium, and high for a three-level price treatment, some comparisons across studies could be made, as illustrated in Table 3 for the high vs. low price comparison. The analysis for low price vs. medium price or high price, and for medium price vs. high price, indicated heterogeneous results. Further, of the 37 effect sizes, only two were negative, both occurring in the medium vs. low price comparison, and these negative effects were the smallest in absolute value of any calculated. The largest effect sizes occurred in the within-subjects research designs.



Combining Studies. Once the results of the set of studies have been compared, the next step is to combine the p levels of these studies. The objective is to obtain an overall estimate of the probability that the set of p levels could have occurred if the null hypothesis of no relationship between the two variables was true (Rosenthal 1980).

The method of adding Zs requires converting all p levels to the appropriate Z values, summing, and dividing by the square root of the number of studies being combined. For the main price effects, the sum of the Zs was 36.7. Therefore:

$$Z_m = \frac{\sum_{j=1}^{N} Z_j}{\sqrt{N}} = \frac{36.7}{\sqrt{28}} = 6.94 \qquad (5)$$

The new statistic, Zm, is distributed as Z and is significant at the .01 probability level, one-tailed. This combined probability supports the majority of the studies that found a significant positive price-perceived quality relationship.
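The method of adding Zs reduces to a one-line computation. This Python sketch (the function name is hypothetical) reproduces the combined value from the reported sum of 36.7 over 28 results. If the individual Zs were available, one would pass the list itself.

```python
import math
from statistics import NormalDist


def stouffer(zs):
    """Combined Z by adding Zs: sum(Z_j) / sqrt(N)."""
    return sum(zs) / math.sqrt(len(zs))


# Only the sum of the Zs (36.7) and the number of results (28) are reported.
z_m = 36.7 / math.sqrt(28)
print(round(z_m, 2))                     # 6.94
print(1 - NormalDist().cdf(z_m) < 0.01)  # True: significant at .01, one-tailed
```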

Another question of interest is how many additional studies reporting a p value of .50 (Z = 0) would be required to reduce the significant Z-value of 6.94 to one that is just significant at the .05 level. Using the relation

$$N = \left(\frac{\sum_{j} Z_j}{1.645}\right)^2 \qquad (6)$$

it can be determined that N = 498. Therefore, 470 additional results reporting a p value of .50 would be necessary to lower the overall probability level to .05.
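Equation (6) and the subtraction that follows can be checked in Python. The function name is hypothetical. It returns both the total number of results needed and the number of additional null (Z = 0) results.

```python
def fail_safe(sum_z, k, z_crit=1.645):
    """Results needed for sum_z / sqrt(N) to fall to z_crit, and how many must be added.

    Total N = (sum_z / z_crit) ** 2; each added result contributes Z = 0.
    """
    total = (sum_z / z_crit) ** 2
    return total, total - k


total, extra = fail_safe(36.7, 28)
print(round(total), round(extra))   # 498 470
```

Because each added result has Z = 0, the numerator stays at 36.7 while the square root of N grows. The combined Z falls to 1.645 only when N reaches about 498.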

Since we are also interested in a combined estimate of the effect size, we should also perform a similar analysis. Using the r effect-size estimates and the Fisher r-to-z transformation, the mean z was calculated (Rosenthal 1982):

$$\bar{z} = \frac{\sum_{j=1}^{N} z_j}{N} \qquad (7)$$

A table of Fisher z values was then used to find the r associated with the mean z. The combined effect sizes for the price present vs. price absent conditions were relatively small, r = .05 and -.01. For the direct price comparisons, the combined effect sizes were considerably larger, r = .45, .30, and .245.
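The averaging step can be sketched in Python. `math.atanh` is Fisher's r-to-z transformation and `math.tanh` is its inverse, so the back-transformation takes the place of the table lookup described above. The correlations in the example are hypothetical, and the mean is a simple unweighted average of the z's.

```python
import math


def combined_r(rs):
    """Average correlations through Fisher z: mean of atanh(r), mapped back with tanh."""
    zbar = sum(math.atanh(r) for r in rs) / len(rs)
    return math.tanh(zbar)


print(combined_r([0.1, 0.5]))   # hypothetical r-values
```

For these two values, averaging in the z metric gives about .31 rather than the raw mean of .30, because the transformation stretches larger correlations.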

Reporting and Interpreting the Results

Once the review has been completed, an important step is to report the results of the analyses and, as far as feasible, arrive at some substantive conclusions. At this stage, the effect of the analytical results on the underlying theory or conceptual framework is vitally important. The reviewer must indicate whether new theory has been induced, old theory has been confirmed, or old theory has been disproved. Where appropriate, the reviewer should suggest the impact of these findings on policies or practices. Finally, it would be helpful if the reviewer could also suggest how future research on the topic could be designed, in terms of either substantive or methodological issues. Suggestions for other types of reviews might also be made.

At a minimum, the written report of the review ought to describe the sampling, measurement, analyses, and findings. All analytical procedures should be described in detail. If the number of reviewed studies is not too large, a summary table of the characteristics of each reviewed study should be prepared. Moreover, tables reflecting the p values, effect sizes, and other data leading to the reviewer's conclusions ought to be provided so that any reader may reanalyze the studies and check the reviewer's conclusions.

Substantively, the review and analysis of the price-perceived quality studies seem to confirm the hypothesis that buyers tend to positively relate price and product quality. Methodologically, although the number of investigations is small, there appears to be little difference in effects whether or not a price-absent treatment is included in studies investigating the influence of price on product evaluations. However, a number of substantive and methodological issues, including the methodological quality of the studies, must be investigated and resolved before this conclusion can become definitive.


In recent years, there have been a number of calls for more replication of prior research and for more published literature reviews. For example, the new editor of the Journal of Marketing Research has called for review manuscripts. However, as the JMR editors have suggested, a review article should "advance the field by virtue of its insightful, integrative, and critical evaluation of a research domain" (Churchill and Perreault 1982). The discipline of consumer research is mature enough to have a sufficient number of studies in several research areas to warrant integrative reviews. Yet even the best reviews of research have rarely provided more information than the direction of the relationship between the variables investigated and whether the tests reached a particular p-level.

Fortunately, this state of affairs is changing and literature reviews are becoming more systematic and quantitative. While such a change is indeed desirable, we must be cautious not to believe that qualitative reviews are undesirable, nor to accept blindly the methods of quantitatively assessing a research domain.

All literature reviews contain a number of qualitative judgments such as the population of studies to be reviewed, the constructs to be investigated, the criteria for categorizing studies, or the assessment of the methodological quality of the reviewed studies.

Increased quantification of a literature review helps disaggregate a large number of studies, but it does not necessarily improve the quality of the inferences made from such reviews. In the hands of careful reviewers who are sensitive to its assumptions, a quantitative assessment of a research domain can be a valuable part of the discovery process. But when a quantitative review is done by uncritical reviewers who are swayed by its apparent simplicity and objectivity, the outcomes could be misleading and even wrong (Leviton and Cook 1981). For this reason, we now provide a list of advantages and disadvantages of quantitatively assessing a research domain.

Advantages

1. A quantitative assessment of a research domain is systematic, clearly articulated, and replicable. It provides a public procedure for documenting the reviewer's interpretations, and provides other researchers with an opportunity to interpret the data differently (Rice 1978).

2. It can be used with data from both the best and the flawed studies of a research domain. However, a control procedure must be used to check whether the flawed studies bias the results in a particular direction (Jackson 1980).

3. By increasing effective sample sizes, the power of statistical tests is enhanced and may help resolve inconsistent findings from smaller data sets (Pillemer and Light 1980).

4. When a sufficient number of studies is available, it permits the use of multivariate statistical techniques to investigate relationships among key characteristics of the reviewed studies: research design, subjects, treatments, settings, and findings.

5. It enhances the ease of manipulating the data from a large number of studies, and increases our ability to isolate relationships that test either the substantive or methodological influences on the dependent variable (Cook and Leviton 1980).

6. It helps the reviewer to obtain a more accurate average effect size estimate.

7. Combining studies increases the number of data points from which to describe the relationship between the variables. By looking across studies, we can consider a wider range of values for a variable, thereby providing for an examination of the mathematical form of the relationship.

8. A formal classification and analysis of studies can be useful for a grounded approach to theory development. It facilitates the use of the building block approach to theory development in that results from smaller studies may be combined into a larger conceptual framework.

Disadvantages

1. Reported studies often do not provide sufficient detail to enable a complete quantitative assessment of the research domain.

2. Because of differences in operationalizing constructs across studies, a quantitative assessment often is forced to have a broad orientation for the constructs. This difficulty could lead to disregard for theoretical relevance and produce misleading inductive inferences (Cook and Leviton 1980).

3. Unless there are a large number of studies, the sample of studies reviewed must be relatively homogeneous across methods and constructs. The more heterogeneous the studies are, the larger the sample of studies we need in order to assume, with some confidence, that we have sampled the true underlying distribution of studies.

4. There is the problem of how to handle variations in sample size across studies. Should each result be weighted by its sample size, or should each result be an unweighted datum?

5. There is the problem of the lack of a common metric across studies. As in the illustrative review of the price-perceived quality literature, different operationalizations of the dependent variable occurred, different measuring scales were used, subjects' tasks and frames of reference varied, and different types of statistics were reported. Analyzing standardized effect measures helps overcome this problem, but we still must cope with the possibility that the standardized effect size obscures the absolute magnitude of the effects (Cooper 1981; Rosenthal and Rubin 1979).

6. An important issue in quantitatively assessing a research domain is the methodological quality of the reviewed studies. If the studies have not been subjected to tests of reliability and validity, what confidence can we have in the combined results?

7. The value of a quantitative assessment of a research domain is limited to inductive approaches to theory development. Thus, conclusions based on such systematic reviews must be viewed as hypotheses worthy of future empirical tests.
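Disadvantage 4 can be illustrated numerically. In this Python sketch, the same two hypothetical results are combined twice. The first combination uses equal weights. The second weights each Fisher z by $N_j - 3$, its inverse sampling variance. This weighting scheme is one common choice, not a recommendation made in this paper.

```python
import math


def mean_effect(rs, ns=None):
    """Mean correlation via Fisher z, unweighted or weighted by N_j - 3."""
    zs = [math.atanh(r) for r in rs]
    ws = [1] * len(rs) if ns is None else [n - 3 for n in ns]
    return math.tanh(sum(w * z for w, z in zip(ws, zs)) / sum(ws))


# Hypothetical: a small study with a large effect, a large study with a small one.
rs, ns = [0.6, 0.1], [23, 203]
print(round(mean_effect(rs), 2), round(mean_effect(rs, ns), 2))
```

Here the unweighted mean is about .38 and the weighted mean about .15, so the choice of weights alone can change the substantive reading of a literature.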


It is clear we need to discover the "facts" about our discipline. To this end we need to review in a systematic manner previous research on a topic and to synthesize previous findings. However, as illustrated in this paper, there are a number of problems and limitations to this part of the discovery process. Some of the problems can be overcome with better research procedures. Other problems can be eased with a better reporting of original research procedures and results. The recommendations given below are a call for standardizing the reporting of research procedures and results. Such standardization will facilitate the synthesis of knowledge and the identification of progress and issues within a research domain.

1. Researchers should routinely report:

a. treatment means and standard deviations;

b. effect sizes;

c. total sample sizes, and, if applicable, cell sizes;

d. p-values for both significant and non-significant statistical tests;

e. complete analysis of variance tables if applicable.

2. Researchers should carefully describe:

a. operationalization of the constructs;

b. actual treatment levels used;

c. measurement procedures and scales used;

d. sampling procedures used for selecting subjects, products, conditions, or settings;

e. the measurement quality of their scales;

f. the justification for the research design used.
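The value of these reporting rules becomes clear when effect sizes have to be recovered from published reports. The Python sketch below uses hypothetical function and key names. It computes d directly when treatment means and a pooled standard deviation are reported (item 1a). When only t, or a 1-df F, and its error degrees of freedom are given, it falls back on the approximation $d = 2t/\sqrt{df}$, which assumes equal group sizes. When a study reports only that a result was significant, it returns nothing.

```python
import math


def d_from_means(m1, m2, sd):
    """d = (M1 - M2) / s from reported treatment means and pooled SD."""
    return (m1 - m2) / sd


def d_from_t(t, df):
    """Approximate d = 2t / sqrt(df); assumes two groups of equal size."""
    return 2 * t / math.sqrt(df)


def recover_d(report):
    """Best available d from a hypothetical report dict, or None if uncodable."""
    if all(k in report for k in ("m1", "m2", "sd")):
        return d_from_means(report["m1"], report["m2"], report["sd"])
    if all(k in report for k in ("t", "df")):
        return d_from_t(report["t"], report["df"])
    if all(k in report for k in ("F", "df")):
        # 1-df F with error df: t = sqrt(F), so this returns |d|;
        # the direction must come from the reported means.
        return d_from_t(math.sqrt(report["F"]), report["df"])
    return None   # e.g. only "p < .05" was reported


print(recover_d({"m1": 5.2, "m2": 4.6, "sd": 1.5}))
print(recover_d({"F": 4.0, "df": 16}))
print(recover_d({"p": "< .05"}))
```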

A systematic, integrative, and quantitative assessment of a research domain can be an important contribution to knowledge. Such reviews will be valuable when applied and interpreted with care. As with any technique, there are many judgments and assumptions to be made. We must be especially mindful of the limitations of such an approach and guard against the faddish misuse of the procedures illustrated in this paper.

References

Churchill, Gilbert A. and Perreault, William D., Jr. (1982), "JMR Editorial Policies and Philosophy," Journal of Marketing Research, 19 (August), 283-287.

Cohen, Jacob (1969), Statistical Power Analysis for the Behavioral Sciences, New York: Academic Press.

Cook, Thomas D. and Leviton, Laura C. (1980), "Reviewing the Literature: A Comparison of Traditional Methods with Meta-Analysis," Journal of Personality, 48 (December), 449-472.

Cooper, Harris M. (1981), "On the Significance of Effects and the Effects of Significance," Journal of Personality and Social Psychology, 41 (No. 5), 1013-1018.

Glass, Gene V., McGaw, Barry and Smith, Mary Lee (1981), Meta-Analysis in Social Research, Beverly Hills, CA: Sage Publications.

Jackson, Gregg B. (1980), "Methods for Integrative Reviews," Review of Educational Research, 50 (Fall), 438-460.

Leviton, Laura C. and Cook, Thomas D. (1981), "What Differentiates Meta-Analysis from Other Forms of Review," Journal of Personality, 49 (June), 231-236.

Pillemer, David B. and Light, Richard J. (1980), "Synthesizing Outcomes: How to Use Research Evidence From Many Studies," Harvard Educational Review, 50 (May), 176-195.

Rice, Robert W. (1978), "Formal Classification of Research Information," American Psychologist, 33 (March), 249-264.

Rosenthal, Robert (1980), "Summarizing Significance Levels," in R. Rosenthal (ed.), New Directions for Methodology of Social and Behavioral Science: Quantitative Assessment of Research Domains, No. 5, San Francisco: Jossey-Bass, 33-46.

Rosenthal, Robert (1982), "Valid Interpretation of Quantitative Research Results," in D. Brinberg and L. Kidder (eds.), New Directions for Methodology of Social and Behavioral Science: Forms of Validity in Research, No. 12, San Francisco: Jossey-Bass, 59-75.

Rosenthal, Robert and Rubin, Donald B. (1979), "A Note on Percent Variance Explained as a Measure of the Importance of Effects," Journal of Applied Social Psychology, 9 (No. 5), 395-396.

(A list of the price-perceived quality studies reviewed for this paper can be obtained from the authors.)