Toward the External Validity of the Information Integration Paradigm

Roger Marshall, Nanyang Technological University
Christina Kwai-Choi Lee, University of Auckland
Jennifer Yee Sum, University of Auckland
ABSTRACT - This paper reports an experiment conducted to test the external validity of the information integration paradigm. Consumers' decision processes are studied as they make quality judgements of orange juice in a naturalistic setting, with the information available controlled in a manner more typical of the market-place than in integration experiments reported thus far. An averaging model is supported.
Roger Marshall, Christina Kwai-Choi Lee, and Jennifer Yee Sum (1995) ,"Toward the External Validity of the Information Integration Paradigm", in NA - Advances in Consumer Research Volume 22, eds. Frank R. Kardes and Mita Sujan, Provo, UT : Association for Consumer Research, Pages: 78-83.

Research in the area of impression formation has been dominated for the last two decades by the information integration approach of Norman Anderson and his associates. Information integration is a research paradigm which allows the researcher to study theoretical combination rules (model algebra) by examining actual combination rules (cognitive algebra) employed by experimental subjects as they make judgements in controlled conditions (Bettman, Capon & Lutz 1975). Based on the use of factorial ANOVA design, this theory has been extended to provide insights into an impressive array of domains of human judgement ranging from psychophysics (magnitude estimation) and decision theory to consumer attitudes (Cohen, Miniard & Dixon 1980).

The two integration models that have received the greatest amount of attention and have been most extensively tested are the adding and the averaging models (Anderson 1981). The averaging model depends on the postulation that the weight, or importance, of an attribute in the model varies according to the weights of the other attributes to be integrated. The adding model, on the other hand, assumes that the weight of each attribute is independent of the other attributes' impact. In short, adding implies a cognitive system in which "the more [items of information] the better", whereas averaging implies the contrary; where "more may not be better" (Shanteau & Ptacek 1983).
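The contrast between the two rules can be sketched numerically. The following is a minimal illustration rather than the procedure of any study cited here: the scale values and weights are hypothetical, and the averaging rule is written in its simplest form, as the weighted sum divided by the sum of the weights.

```python
def adding(values, weights):
    """Adding rule: each item contributes weight * value, independent of the others."""
    return sum(w * s for w, s in zip(weights, values))


def averaging(values, weights):
    """Averaging rule: each item's effective weight is relative to the total weight."""
    return sum(w * s for w, s in zip(weights, values)) / sum(weights)


# Hypothetical scale values on an arbitrary 0-10 quality scale.
HIGH = 9.0   # a highly positive item of information
MILD = 6.0   # a mildly positive item of information

add_alone = adding([HIGH], [1.0])
add_both = adding([HIGH, MILD], [1.0, 1.0])
avg_alone = averaging([HIGH], [1.0])
avg_both = averaging([HIGH, MILD], [1.0, 1.0])

print("adding:   ", add_alone, "->", add_both)   # the judgement rises
print("averaging:", avg_alone, "->", avg_both)   # the judgement falls
```

Under adding, the mildly positive item raises the judgement ("the more the better"); under averaging, the same item pulls the judgement down toward its own value ("more may not be better").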

The results of the numerous studies conducted and reported in the psychology literature enable Shanteau and Ptacek to state with confidence that "...averaging is a very common psychological mechanism for consumer information processing" (1983, p.149). The relevance and importance of this statement to marketers and advertisers is significant; it is odd, then, that there is a relative paucity of reported research into consumer judgements using the integration paradigm. A comment to this effect was made by Troutman and Shanteau in 1976. Then, more than a decade after the technique was first introduced to the Journal of Consumer Research by Bettman et al. in 1975, Lynch also wrote that "in relatively few published articles in marketing and consumer research has the integration paradigm been used to study basic consumer judgement processes" (1985, p.2). There are, inter alia, two reasons for this apparent oversight.

The first is simply that there has been no perceived lacuna in the marketing and buyer behaviour literature. The additive model, and in particular that of Fishbein and Ajzen (1975), seems firmly ensconced. Moreover, a strong argument can be mooted that linear models such as Fishbein's fit the data so well that a barrier has effectively been placed in the way of further investigation into the processes of information integration (Anderson & Shanteau 1977). Markin (1980, p.320) wryly comments that consumer behaviourists are "loath to attack the gods of convention". A quick perusal of the current texts reveals that in the present context this is certainly the case; very few give more than a cursory mention of integration theory or averaging models of impression formation. (See, for example, Assael 1988; Bennett 1988; Engel, Blackwell & Miniard 1990; Howard 1989; Kotler & Armstrong 1989; Peter & Olson 1990.)

The second reason is altogether different, pertaining to specific criticisms of the technique. Many of the technical criticisms levelled have been answered more or less satisfactorily, but continue to provide material for debate. Unfortunately, the criticism of integration theory that is probably the most damning in the eyes of marketers remains unanswered - that there is little reported evidence available to show that the techniques can be applied in market-place conditions. Before this theme of external validity is further pursued, the basic integration technique under discussion is briefly revisited.


Research into attitude formation, human perception and judgement usually requires the use of some form of simple algebraic model. Anderson is no exception in this regard, and originally used algebraic methods to present his ideas (Anderson, 1965). Nevertheless, a graphical approach is conceptually easier to grasp, and so will be used here.

Anderson builds his substantive theory into the measurement technique, thus forming what he calls "functional theory" (Anderson 1974a). In practice, this involves asking respondents to provide judgements of series of stimuli on some scale, then decomposing these judgements mathematically to derive the rules used in their formation; an approach reminiscent of conjoint analysis.

To illustrate the idea, consider Figure 1A. Here, two attributes of some imaginary product have been varied over three levels to provide six combinations of information with which a subject can form a judgement of the product in question. The mean judgements of the sample group are plotted against values on the Y-axis; and the High, Average and Low values of one attribute joined. Note that the lines are parallel. That this is so is partly dependent on an underlying assumption of integration theory, that an item of judgement information retains a constant value when combined with other items of information. There is no interaction between the two lines, thus it can be stated that the judgements for each level are both consistent and independent.

Figure 1B shows the same plots, with an additional series representing the mean judgement of the group when presented only with information about the variable shown on the X-axis. The (dashed) line created by joining these new plots is obviously not parallel with the other two, and actually exhibits a crossover with one of them. As respondents have only been given one item of information, we can be sure that they have employed an additive paradigm in arriving at their judgement in this instance. The fact that the slope of the dashed line is steeper than that of the other two suggests that the integration of the extreme high and low value information has resulted in a judgement value closer to the mean; i.e. an averaging paradigm has been used when two items of information were integrated. More simply, "the addition of moderately polarized information to highly polarized information decreases the polarity of the response" (Anderson 1965, p.397). In fact, although the crossover effect is central to the proof of averaging, an actual crossover is not necessary; it is only necessary to show that the slope of the single-attribute line is significantly steeper (Troutman & Shanteau 1976). ANOVA provides a useful and rigorous test of parallelism.
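The slope argument can be made concrete with a small sketch. It assumes hypothetical scale values for three levels of the X-axis attribute and equal weights for both attributes; the function names are illustrative only. Each rule is used to predict the two paired-information lines and the single-attribute (dashed) line, and the rise of each line from its lowest to its highest level is compared.

```python
# Hypothetical scale values for the low, average and high levels of an attribute.
LEVELS = [2.0, 5.0, 8.0]


def add_rule(items):
    """Equal-weight adding: the judgement is the sum of the scale values."""
    return sum(items)


def average_rule(items):
    """Equal-weight averaging: the judgement is the mean of the scale values."""
    return sum(items) / len(items)


def judgement_lines(rule):
    """Predicted judgements across LEVELS for each line in a Figure 1B style plot."""
    lines = {
        "other attribute low": [rule([x, LEVELS[0]]) for x in LEVELS],
        "other attribute high": [rule([x, LEVELS[-1]]) for x in LEVELS],
        "single attribute only": [rule([x]) for x in LEVELS],
    }
    return lines


def rise(line):
    """Rise of a line between its lowest and highest X-axis level."""
    return line[-1] - line[0]


adding_rises = {k: rise(v) for k, v in judgement_lines(add_rule).items()}
averaging_rises = {k: rise(v) for k, v in judgement_lines(average_rule).items()}

print("adding:   ", adding_rises)     # all three lines rise equally (parallel)
print("averaging:", averaging_rises)  # the single-attribute line is steeper
```

With these values an adding rule predicts three parallel lines, whereas an averaging rule predicts a single-attribute line twice as steep as the paired lines, which is the pattern the crossover test looks for.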




There are four major facets to the accusations of poor external validity of the integration paradigm: the artificial settings of most integration experiments to date, the almost exclusive use of paper-and-pencil tests in the reported integration literature, the problem of inference, and the use of procedural techniques that fall short of replicating marketing conditions.

With regard to the first charge above, Ebbesen and Konecni (1980) are vocal critics, claiming that the laboratory simulations that provide underpinning for the averaging model lack validity in real-world settings. Levin, Louviere, Schepanski and Norman (1983) rush to the defence of the technique, taking the line that generalizations from laboratory to market are justified in the light of the external validation provided for the results of a number of such studies. This is a little unconvincing for many consumer behaviour students and marketing practitioners, however; and even Levin goes on to sound a note of caution about expecting results acquired in the laboratory to be exactly duplicated in a naturalistic setting.

The use of paper-and-pencil tests refers to the prevalence of studies that present information to subjects as written statements rather than in the form of actual products with various (manipulated) attributes. Obviously, the former technique provides a much closer control of the information being manipulated, but raises the possibility that other variables not specifically included might yet inferentially affect the analysis.

Cohen et al. (1980) question the validity of the critical crossover test of Anderson's theory along these same lines, referring readers to the experiment conducted by Troutman and Shanteau (1976). In this experiment, expectant parents were randomly offered a series of nine written descriptions of diapers; each description combined one of three levels of absorbency with one of three levels of durability. Quality ratings elicited from the subjects, along with ratings of products described by offering information on only one of the product attributes, suggested that an averaging model was being used in the judgement information-integration process. Cohen et al. suggested (page 39) that respondents' inferences about the missing attribute data accounted for the averaging. Levin et al. (1984) carried out an experiment of their own to validate Cohen et al.'s idea. Their research offers some support, in that "...inferences based on interstimulus relationships appear to occur only when the relationship between stimulus dimension is strongly established and when the missing information is deemed crucial to the required judgement" (p.101).

In the market-place, of course, the marketer can rarely present attribute information in isolation. As well as the difficulty of using the crossover test in the face of possible inferences being made from the missing judgement dimension, there is an additional hazard. In the event that information items of minor importance are being manipulated, then the combination rules used to integrate them may be obscured from the researcher by stronger inferences drawn from the more important (non-manipulated) evaluative criteria.

The final external validity problem mentioned above relates to the procedural techniques used in a typical integration experiment. Researchers such as Cohen et al. (1980) have criticized Anderson's experimental procedures contending that they create "a task context in which averaging becomes a heuristic for subjects to use" (p.165). Anderson has never failed to supply his subjects with clear instructions as to how to treat the given information. The instructions usually assure subjects that each attribute is as important as the other and that an equal amount of attention should be paid to each. "This is intended to help ensure the assumption of equal weighting, which is necessary for the parallelism prediction under the averaging model" (Anderson 1974b, p.264). Furthermore, the scales are firmly anchored and subjects are given practice runs in order to familiarize them with the use of a measurement scale.

In fact, these procedures are perfectly justifiable in the sense that integrationalists are not really concerned with the absolute utility ascribed to a particular stimulus by a particular individual respondent, but rather the way that utility is affected when the stimulus is combined with another. In practice too, failure to anchor the scale carefully may well be compensated for merely by enlarging the sample until a reasonable distribution around the judgement mean has been achieved. It is true, however, that consumers are not coached to make judgements about quality or value when selecting products in an everyday situation, neither do they necessarily assign equal weight to all of their evaluative criteria. Although it is reasonable to claim that specific targeted market segments may well have similar absolute values for a given attribute, and that consumers should need no coaching in assessing the quality of most commonly purchased items, the procedural question needs to be addressed in order to maximise validity.



The purpose of the experiment described in this paper was to conduct an investigation of impression formation using the integration paradigm in a naturalistic setting, under conditions that would test the external validity of the model. This is to address the criticism that IIT experiments have been, for the most part, confined to unrealistic laboratory conditions. There is a trade-off between experiments undertaken in laboratory conditions and those totally based in market situations. To provide data that would allow the claim of true external validity for IIT techniques would call for consumption decisions to be made and analyzed, with money changing hands and all the usual market-place pressures, distractions and risks present. The experiments described here did not go that far, because the focus of interest was still the paradigm used to integrate information; and the maintenance of that focus demanded that a modicum of control be retained. The basic research task here was to ascertain whether or not information integration techniques are sufficiently robust that they could distinguish between adding or averaging processes even when subjects made their quality judgement of a consumer good within a more realistic market setting than has hitherto been used.


The general design of this experiment is a 3 x 3, fully crossed, within-subject factorial, with 3 levels of orange juice purity (pure, watered and no information included (herein referred to as "not included")) and 3 levels of brand (Freshup, Woolworths and generic).


Judgements about perceived quality of orange juice were solicited from 25 women. More specifically, this quota sample consisted of mothers, who were house-persons of European descent, between the ages of 25 and 45 years, and had lived in Auckland (New Zealand) for at least five years. This forms a single-culture sub-set of a primary target market of the major producer and marketer of fruit juices in New Zealand, the New Zealand Apple and Pear Marketing Board (NZAPMB). (The customer profile and juice used in the experiment were provided by that organisation.)


The stimuli selected for the study consisted of six clear and three opaque, white plastic 1-litre bottles of orange juice, all with professionally-produced labels. Three clear bottles contained an orange juice of a heavy texture and density, in which orange sediment was clearly visible. Each of the three bottles carried either a Freshup, Woolworths or generic brand on their label. The other three clear bottles contained a thinner orange liquid, that had the appearance of a cordial rather than a pure juice, but bearing the same Freshup, Woolworths or generic brand label. The three opaque bottles bore identical labels but actually contained water to ensure an identical weight to the bottles filled with juice. The nine products thus created closely resembled typical supermarket products.

Respondents' perceptions of product quality were captured on a scale card created by drawing a line on a piece of card, and labelling the ends "$1.50" and "$3.50".


Prior research had been undertaken to test brand names for their quality associations for the target group and to find their evaluative criteria for orange juice. An inspection of previous research and discussion with the NZAPMB identified price, purity and brand as the principal evaluative criteria. Once this was established, realistic purity and price levels were subjectively determined by scanning supermarket shelves. These attributes and levels were then pre-tested, until satisfactory separation of scale values was achieved, by conducting a series of 12 dummy-runs of the integration exercise with members of the public who fitted the description of the target group.

In a further pre-test survey, respondents were required to tick a box labelled "High quality", "Medium quality", or "Low quality" against each of three brands. The highest quality brand was the "Freshup" brand of the New Zealand Apple and Pear Marketing Board. A lower, but still positive, quality attribution was accorded to the housebrand of one of New Zealand's largest food chains, Woolworths. (Although unrelated to the Woolworth companies in the United States and Great Britain, the Australasian - and later New Zealand - Woolworths chain has developed from the same "dime-store" beginnings as the overseas counterparts.) The generic label in New Zealand carries a very low perception of quality (this has also been noted in previously published work; see Robertson and Marshall, 1987).

Thus nine "products" were created: a rich coloured juice and a watered, cordial-type juice, each under the Freshup, Woolworths and generic labels, plus the three opaque bottles carrying the same three brand labels.


Data collection took place mid-week in a busy shopping centre in an upper-middle class residential district. Full cooperation was received from the Management of the Shore City Shopping Mall in Takapuna, Auckland. The researcher introduced himself as a member of the University undertaking academic research. After potential subjects had been approached by the researcher and invited to participate, they were asked a short series of screening questions about their age, children and purchase frequency of orange juice, to ensure that they fitted the profile of the target group. No refusals were encountered.

Subjects were invited to sit on a chair in an alcove formed by potted plants at the foot of a stairway. Each interview was conducted one-on-one, although many of the women had small children with them. In these situations the child was given a small carton of orange drink to keep them occupied. The task was explained carefully, with the researcher checking frequently to ensure that the instructions were understood.

Quality ratings were elicited from respondents in terms of the price that they believed to represent the quality of each alternative product, rather than as a quality rating on some arbitrary scale. Thus the inference problem posed by the existence of evaluative criteria other than those manipulated in the experiment (brand and juice purity) is overcome by explicitly including all three evaluative criteria. Furthermore, it was thought that respondents should be able to use prices to reflect quality far more easily than to record perceived quality on a 100-point scale; after all, there can be few people who do not frequently undertake this process during everyday shopping activities.

No preliminary training was given, but the end points of the scale were anchored to compensate for the somewhat small sample size. Anchors were placed by gaining agreement that the lowest quality 1-litre bottle of orange juice retailed normally for about $1.50, and the dearest, highest quality juice for about $3.50. The appropriate stimuli were inspected during this discussion. Subjects were told that they were expected to consider each bottle as it was presented, and call out a price that, in their view, reflected its quality. Instructions were given carefully, and subjects were asked to explain what they thought they had to do before the judgement task began, in order to ensure that the process was fully understood.

Quality judgements of each alternative were recorded by the researcher as they were called out by the subject. The subject held the scale card as a reminder, and their oral responses were recorded by the researcher so that the respondents did not have access to the record of their earlier judgements. Respondents were allowed to hold and inspect the bottles one at a time, as they were presented in a different, predetermined, random order for each respondent. This is important, because if stimuli were presented in the same order on each occasion, then an order bias might develop which could offer an alternative explanation for the results. Respondents took readily to the judgement task, seemingly relating closely to the use of a price level to express their quality judgement. Using price in this way also avoided the need to prompt subjects to make quality rather than preference judgements. The whole judgement procedure rarely took longer than five minutes for each individual.
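A presentation schedule of this kind can be prepared in advance. The sketch below is one possible way of doing so, not a record of how the schedule was actually produced: the function name and the seed are hypothetical, and the only requirements taken from the text are that all nine stimuli appear once per respondent and that each respondent receives a different random order.

```python
import itertools
import random

PURITY = ["pure", "watered", "not included"]
BRANDS = ["Freshup", "Woolworths", "generic"]


def presentation_orders(n_respondents, seed=1995):
    """Return one distinct random ordering of the nine stimuli per respondent.

    The seed is an arbitrary (hypothetical) choice that makes the schedule
    reproducible, i.e. "predetermined".
    """
    stimuli = list(itertools.product(PURITY, BRANDS))  # the 3 x 3 factorial
    rng = random.Random(seed)
    orders, seen = [], set()
    while len(orders) < n_respondents:
        order = stimuli[:]
        rng.shuffle(order)
        if tuple(order) not in seen:  # reject a repeat of an earlier order
            seen.add(tuple(order))
            orders.append(order)
    return orders


orders = presentation_orders(25)
for purity, brand in orders[0]:
    print(f"{brand:<11} {purity}")
```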


The data was analyzed using the established procedures of functional measurement. First, the data was plotted and visually scanned, then subjected to rigorous test by ANOVA. The purpose of the ANOVA test is three-fold. First, it is to rigorously confirm the existence or otherwise of parallelism between the judgement plots (a lack of interaction indicates parallelism). Second, a significant main effect does give an indication that the attributes selected were indeed relevant in the judgement process. Third, the comparative weights of the different attributes in the judgement can be calculated, to ensure that integration has actually occurred.
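The link between parallelism and the interaction term can be illustrated on a table of cell means. The sketch below is illustrative only: the mean prices are hypothetical and the function name is an assumption. It computes the interaction residual for each cell (cell mean minus row mean minus column mean plus grand mean); these residuals are all zero exactly when the plotted lines are parallel, and the ANOVA interaction test asks whether they are larger than sampling error would produce.

```python
def interaction_residuals(cell_means):
    """Residual of each cell mean after removing the row and column effects."""
    n_rows, n_cols = len(cell_means), len(cell_means[0])
    grand = sum(sum(row) for row in cell_means) / (n_rows * n_cols)
    row_means = [sum(row) / n_cols for row in cell_means]
    col_means = [
        sum(cell_means[i][j] for i in range(n_rows)) / n_rows
        for j in range(n_cols)
    ]
    return [
        [cell_means[i][j] - row_means[i] - col_means[j] + grand
         for j in range(n_cols)]
        for i in range(n_rows)
    ]


# Hypothetical mean prices in dollars; rows are two levels of one attribute,
# columns are three levels of the other.
parallel = [[2.0, 2.5, 3.0],
            [1.5, 2.0, 2.5]]
non_parallel = [[2.0, 2.5, 3.0],
                [1.5, 2.5, 3.5]]

print(interaction_residuals(parallel))      # all residuals are zero
print(interaction_residuals(non_parallel))  # non-zero residuals
```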

Anderson suggests that weights represent the salience or relevance of the judgement dimension (Anderson, 1981). In terms of IIT, analysis of two attributes both displaying equal but very small weighting would be meaningless (it doesn't matter how people treat unimportant information). Similarly, if two attributes are combined, but one has a very low weighting and the other a very high one, then the former could be overpowered to such an extent that the "integrated" judgement may show no difference from one in which only the high-weighted attribute is used (i.e., no integration takes place). Assessment of the weights of information items can be made from a post hoc inspection of the ANOVA. Because of the direct relationship between power and sample size, the F test gives no information about the strength of the effects in an ANOVA (Keppel, 1973). Rao (1977) states that the relative importance of each attribute in an analysis of variance is best assessed from a comparison of the proportion of the sum of squares due to each attribute. It is the Hays formulation of omega squared (ω²) that is used as the primary indicator of weight in all the analyses that follow.
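For readers unfamiliar with the statistic, omega squared is commonly written for a fixed-effects design as the effect sum of squares, less the effect degrees of freedom times the error mean square, divided by the total sum of squares plus the error mean square. The sketch below uses that common textbook form; whether exactly this variant was computed for the within-subject design reported here is an assumption, and the input figures are hypothetical rather than taken from the experiment.

```python
def omega_squared(ss_effect, df_effect, ms_error, ss_total):
    """Omega squared in its common fixed-effects form (assumed variant)."""
    return (ss_effect - df_effect * ms_error) / (ss_total + ms_error)


def proportion_of_ss(ss_effect, ss_total):
    """Simple proportion of the total sum of squares due to one effect."""
    return ss_effect / ss_total


# Hypothetical ANOVA table entries, for illustration only.
ss_effect, df_effect, ms_error, ss_total = 40.0, 2, 1.0, 100.0

w2 = omega_squared(ss_effect, df_effect, ms_error, ss_total)
share = proportion_of_ss(ss_effect, ss_total)

print(f"omega squared    = {w2:.3f}")
print(f"proportion of SS = {share:.3f}")
```

Unlike the F ratio, neither quantity grows simply because the sample is larger, which is why such measures are preferred when the question is how much weight an attribute carries.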

Under inspection by ANOVA, the 2 x 3 analysis of the plots for pure and watered juice over the three brands shows a significant main effect (F=39.07, p<.001, ω²=.67). In addition, the visual impression of parallelism is confirmed (F(Pure x Watered)=1.452, p=.237), demonstrating meaning consistency. There is evidence of unequal weighting, although not too severe to impair the results (ω²(Brand)=.18; ω²(Purity)=.49).

The main effect for the 3 x 3 analysis also shows an F value of 39.07 (p<.001, ω²=.42). Furthermore, the interaction between the watered juice and the juice in the opaque white bottles (giving no information about purity) is significant (F=5.9, p=.003). Thus the crossover effect visible in Figure 2, suggesting averaging, is statistically significant.


The overall results obtained in the research lend clear support to averaging, and - by implication - to the many studies that have found similar results for a host of decisions in the laboratory setting. This latter point is really the nub of the issue, because there is an abundance of published research work that has already led to generalizations being made about the use of averaging integration models; these generalizations have been strengthened by the present research, and hopefully made more meaningful to marketing readers. Further, it is possible that some review of Information Integration methods and the consequent development of an averaging judgement paradigm may be made in future marketing texts.

To really address the issue of external validity will call for further work along the same lines, but moving yet closer to the marketplace. Thus, perhaps, a situation might be contrived where consumers are observed using their own money to back their judgements of products described with different information on their labels. The control problems here, though, are formidable.

The practical implications of the support for averaging can be discussed at a general and specific level. Specifically, of course, the conclusion is very clear - Woolworths in New Zealand should note that using their own brand name for high quality merchandise will probably lower the quality perception of the good! In a more general sense, any marketer can fairly easily establish the evaluative criteria for his or her products or services, which can then be evaluated by a sample of the target market. In reality it is not always ethically or practically possible to manipulate the product attributes to the extent utilised in the present experiment setting; but certainly efforts should be made by marketers to change perceptions of marginally positive attributes, or to emphasize or de-emphasize relevant information accordingly.

For advertisers, the recommendation that flows from this work is to minimize advertised material to the principal benefits and not to lower overall attitudes with the inclusion of extra, less positive information about a product or service. This coincides with the old advertising maxim about effective messages containing a single, unique selling point, in that further information may detract from overall evaluation of the product unless the new information is at least as positive as the old.



The most important implication, within the present context, is that techniques of information integration seem sufficiently robust to cope with the unravelling of judgements made by consumers, even when those product evaluations are made in a realistic manner and within a naturalistic market setting. Even the potential problem caused by the unequal weighting of the attributes was insufficiently serious to invalidate the results. Thus, it does seem as if wider acceptance of IIT within the marketing community would be justified, at least on the score of external validity.


Anderson, Norman H. (1965), "Averaging Versus Adding As A Stimulus Combination Rule In Impression Formation," Journal of Experimental Psychology, 70 (4), 394-420.

Anderson, Norman H. (1970), "Functional Measurement And Psychophysical Judgement," Psychological Review, 77 (3), 153-170.

Anderson, Norman H. (1974a), "Algebraic Models In Perception," in Handbook of Perception, Vol.II, Psychophysical Judgement and Measurement, Edward C Carterette and Morton P. Friedman (eds), New York: Academic Press, 215-298.

Anderson, Norman H. (1974b), "Information Integration Theory: A Brief Survey", in Contemporary Development In Mathematical Psychology, Vol. 2, Krantz D.H., R.C.Atkinson, R.D.Luce, P.Suppes (eds), San Francisco: W.H. Freeman and Company.

Anderson, Norman H. (1981), Foundation Of Information Integration Theory, New York: Academic Press.

Anderson, Norman H. (1982), Methods Of Information Integration Theory, New York: Academic Press.

Anderson, Norman H. and Margaret A. Armstrong (1989), "Cognitive Theory And Methodology For Studying Marital Interaction", in Dyadic Decision Making, David Brinberg and James Jaccard (eds), New York: Springer-Verlag, 3-50.

Assael, Henry (1988), Consumer Behaviour And Marketing Action, Melbourne: Thomas Nelson Australia.

Bennett, Peter D. (1988), Marketing, New York: McGraw Hill Book Co.

Bettman, James R., Noel Capon and Richard J. Lutz (1975), "Multiattribute Measurement Models And Multiattribute Attitude Theory: A Test Of Construct Validity," Journal of Consumer Research, 1 (March), 1-15.

Cohen, Joel B., Paul W. Miniard and Peter R. Dixon (1980), "Information Integration: An Information Processing Perspective," Advances in Consumer Research, 7, 161-170.

Ebbesen, Ebbe B. and Vladimir J. Konecni (1980), "On The External Validity Of Decision-Making Research: What Do We Know About Decisions In The Real World", in Cognitive Processes In Choice And Decision Behaviour, Thomas S. Wallsten (ed), Hillsdale, New Jersey: Lawrence Erlbaum Associates Publishers, 21-45.

Engel, James, Roger D. Blackwell and Paul Miniard (1990), Consumer Behaviour, Chicago: The Dryden Press.

Fishbein, Martin and Icek Ajzen (1975), Belief, Attitude, Intention And Behaviour, Reading, Mass: Addison-Wesley Publishing Company.

Hays, W. L. (1963), Statistics For Psychologists, New York: Holt, Rinehart and Winston.

Howard, John A. (1989), Consumer Behaviour In Marketing Strategy, New Jersey: Prentice-Hall, Inc.

Keppel, G. (1973), Design And Analysis: A Researcher's Handbook, Englewood Cliffs, NJ: Prentice Hall Inc.

Kotler, Philip and Gary Armstrong (1989), Principles Of Marketing, New Jersey: Prentice-Hall, Inc.

Levin, Irwin P., Jordan J. Louviere, Albert A. Schepanski and Kent L. Norman (1983), "External Validity Tests Of Laboratory Studies Of Information Integration," Organizational Behaviour and Human Performance, 31, 173-193.

Lynch, John G. Jr. (1985), "Uniqueness Issues in the Decompositional Modelling of Multiattribute Overall Evaluations: An Information Integration Perspective," Journal of Marketing Research, 22 (February), 1-19.

Markin, Rom J. (1980), "The Role Of Rationalization In Consumer Decision Processes: A Revisionist Approach To Consumer Behaviour," Journal of The Academy of Marketing Science, 7 (4), 316-334.

Peter, Paul J. and Jerry C. Olson (1990), Consumer Behaviour And Marketing Strategy, Homewood, Il: Irwin.

Rao, V. R. (1977), "Conjoint Measurement In Marketing Analysis," in Multivariate Methods For Market And Survey Research, J. N. Sheth (ed), Chicago, IL: American Marketing Association, 257-286.

Robertson, Kim R. and Roger Marshall (1987), "Amount of Label Information Effects on Perceived Product Quality and Effectiveness," International Journal of Advertising, London, 6 (2).

Shanteau, James (1988), "Information Integration Theory Applied To Consumer Behaviour," in Proceedings of the Division of Consumer Psychology 1987, L.F. Alwitt (ed), Washington D.C.: American Psychological Association, 100-102.

Shanteau, James and Charles H. Ptacek (1983), "Role And Implications Of Averaging Processes In Advertising," in Advertising and Consumer Psychology, Larry Percy and Arch G. Woodside (eds), Massachusetts: Lexington Books, 149-167.

Troutman, C. Michael and James Shanteau (1976), "Do Consumers Evaluate Products By Adding Or Averaging Attribute Information," Journal of Consumer Research, 3 (September), 101-106.

Troutman, C. Michael and James Shanteau (1977), "Inferences Based On Nondiagnostic Information," Organizational Behaviour and Human Performance, 19, 43-55.

Troutman, C. Michael and James Shanteau (1989), "Information Integration In Husband-Wife Decision Making About Health-Care Services," in Dyadic Decision Making, David Brinberg and James Jaccard (eds), New York: Springer Verlag, 117-151.

Vaughan, G. M. and M. C. Corballis (1969), "Beyond Tests Of Significance: Estimating Strength Of Effects In Selected ANOVA Designs," Psychological Bulletin, 72 (3), 204-213.