Understanding and Analyzing Contingency Data

Edgar A. Pessemier, Krannert Graduate School of Management, Purdue University
[ to cite ]:
Edgar A. Pessemier (1979), "Understanding and Analyzing Contingency Data", in NA - Advances in Consumer Research Volume 06, eds. William L. Wilkie, Ann Arbor, MI: Association for Consumer Research, Pages: 606-610.

Advances in Consumer Research Volume 6, 1979      Pages 606-610

When analyzing individuals' manifest judgments about the attributes of objects, the process by which these data are generated becomes a subject of real interest. Although the object's attributes may be continuously measurable or discrete, in principle the individual's response to any object-attribute stimulus must be categorical (Young). Not infrequently, the response categories are very limited. For example, a judge can say yes, no, or don't know to the question, "Does this object have this attribute?" Frequently, these responses will be coded 1 if yes and 0 otherwise. Other coding schemes may be used, e.g., yes = 1, don't know = 0 and no = -1. Several methods for analyzing this type of perceptual data will be described at a later point. First, attention will be devoted to examining alternative psychological processes which might lead to individual judgments in the form noted above.

A large experimental literature supports the general concept of discriminal dispersion first described by Thurstone (59) and recently elaborated by Luce (77), Yellott (77) and others. This theory states that an external stimulus or signal generates an internal representation. The transformation is stochastic, producing different internal values in response to each exposure to the stimulus. These internal representations form a distribution along a psychological continuum. This distribution can be characterized by its location, shape and dispersion. In most work, a normal, lognormal or logistic distribution has been employed. When the theory is applied to comparative and categorical judgments, it is possible to scale stimuli and/or stimulus boundaries along a psychological continuum.

COMPARATIVE JUDGMENTS (SEE BOCK & JONES 68)

Here, a standard stimulus is selected and each of the remaining stimuli is compared to it and judged to be greater than, equal to or less than the standard. The instantaneous internal values of the standard and of each remaining stimulus determine which judgment is made. In turn, these instantaneous values depend on the location, shape and dispersion of the distributions of the standard and the other stimuli.

In the present context, the standard stimulus can be thought of as a threshold level for responding yes to the question, "Does this object have this attribute?" If yes, don't know, and no responses are admissible, then a high and a low standard can be used. When the instantaneous internal value of the test stimulus falls above the instantaneous value of the high threshold, the response is yes. When the test stimulus value falls below the low threshold, the response is no. When the test stimulus value falls between the high and low thresholds, the response is don't know.
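The two-threshold rule just described can be sketched in a few lines. This is only an illustration under simplifying assumptions that are not in the original text: the internal value is drawn from a normal distribution, the two thresholds are held fixed rather than varying from exposure to exposure, and the function names and numeric values are hypothetical.

```python
import random


def categorical_response(internal_value, low_threshold, high_threshold):
    """Turn one instantaneous internal value into a categorical response."""
    if internal_value > high_threshold:
        return "yes"
    if internal_value < low_threshold:
        return "no"
    return "don't know"


def simulate_judgments(location, dispersion, low_threshold, high_threshold,
                       trials=1000, seed=0):
    """Count the responses produced by repeated exposures to one stimulus.

    Assumption: internal values are normal with the given location and
    dispersion; the thresholds are treated as fixed constants.
    """
    rng = random.Random(seed)
    counts = {"yes": 0, "no": 0, "don't know": 0}
    for _ in range(trials):
        # Each exposure yields a new internal value (discriminal dispersion).
        internal_value = rng.gauss(location, dispersion)
        response = categorical_response(internal_value, low_threshold, high_threshold)
        counts[response] += 1
    return counts
```

Calling `simulate_judgments(0.3, 1.0, -0.5, 0.5)` and then repeating the call with a larger dispersion or a wider gap between the thresholds gives a rough feel for how the share of don't know responses moves.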

In the above discussion, responses are assumed to have purely subjective, perceptual content. Even if objectively correct responses can be determined, there is no clear-cut payoff for making the right judgment. Furthermore, in the case of many object-attribute stimuli, e.g., "Does car A have a racy appearance?", an objectively correct response cannot be specified. In these cases, judgmental thresholds will tend to be subject specific. Some subjects will provide more yes, don't know, or no responses than other subjects, but their response category propensities on a judgment will not be associated with the payoff from a correct or incorrect choice of category. Under these conditions there appears to be little reason for the response criterion to reflect anything but the internal value of the stimuli and past experience with encoding that value as a yes, don't know or no. Nevertheless, the attribute may influence the criterion. A difficult-to-judge attribute (larger dispersion) may spread the locations of the thresholds, increasing the proportion of don't know judgments.

The thresholds' locations (and spread) may be selected in relation to experience such that every object has some significant proportion of its judgments correctly placed in the yes and/or no categories. In the absence of these results, the thresholds would serve no useful purpose. The conflicting forces determining the best proportion relate to discrimination and information content. Rarely observed values will contribute little to the average capacity to discriminate among objects, but rare observations are strong discriminators of single objects (Theil, 67). An individual's selection of response thresholds may be subconscious and inaccessible to direct observation. Furthermore, the thresholds may be subject to both unconscious learning and maturation.

SIGNAL DETECTION THEORY (SEE COOMBS, DAWES & TVERSKY 70)

Signal detection theory separates the sensory aspects of how an individual encodes each stimulus from how the individual makes decisions about responses to the internal value of the stimulus. Here, the signal being sent is known. Due to noise in transmission or encoding error, the internal value of a signal or stimulus may not be a good representation of the known value. Furthermore, the response of an individual may be conditioned by the payoffs associated with giving a particular response, when this response does or does not match the stimulus or transmitted signal. Take a simple case where y = signal sent (written s in the probabilities below), n = no signal, and a respondent must answer Y = yes, signal present, or N = no, signal absent.

The possible signal-response states are:

$$V_{Yy},\quad -V_{Ny},\quad -V_{Yn},\quad V_{Nn}$$

Since the V's are the values assigned to outcomes, the expected values of a yes and of a no, given an observation x (and its associated internal value), are

$$E(Y \mid x) = V_{Yy}\,p(s \mid x) - V_{Yn}\,p(n \mid x)$$

and

$$E(N \mid x) = V_{Nn}\,p(n \mid x) - V_{Ny}\,p(s \mid x).$$

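The response should be yes as long as E(Y|x) > E(N|x). As noted below, the best decision is the one for which the response is yes if the likelihood ratio, l(x), is greater than or equal to the threshold,

$$l(x) = \frac{f(x \mid s)}{f(x \mid n)} \ge \frac{p(n)}{p(s)} \cdot \frac{V_{Nn} + V_{Yn}}{V_{Yy} + V_{Ny}}$$

where p(s) and p(n) are the prior probabilities of signal and no signal and f(x|s) and f(x|n) are the densities of the observation under each.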

In the context initially described here, judgments are assumed to be perceptual and outcome costs are very small and/or roughly equal. When this is the case,

$$\frac{p(s \mid x)}{p(n \mid x)} \ge 1$$

The subjective probabilities on the left hand side of the above equation are functions of the individual's discriminal processes.
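The expected-value rule of this section can be illustrated with a small sketch. It assumes that E(N|x) is the value of a correct no minus the cost of a missed signal, that a tie is resolved as yes, and that all four outcome values are non-negative magnitudes; the function and parameter names are hypothetical.

```python
def expected_values(p_signal, v_yy, v_ny, v_yn, v_nn):
    """Return (E(Y|x), E(N|x)) for a posterior signal probability p(s|x).

    v_yy: value of saying yes when the signal was sent
    v_ny: cost of saying no when the signal was sent
    v_yn: cost of saying yes when no signal was sent
    v_nn: value of saying no when no signal was sent
    """
    p_noise = 1.0 - p_signal
    e_yes = v_yy * p_signal - v_yn * p_noise
    e_no = v_nn * p_noise - v_ny * p_signal
    return e_yes, e_no


def respond(p_signal, v_yy=1.0, v_ny=1.0, v_yn=1.0, v_nn=1.0):
    """Answer 'Y' when yes has at least the expected value of no."""
    e_yes, e_no = expected_values(p_signal, v_yy, v_ny, v_yn, v_nn)
    return "Y" if e_yes >= e_no else "N"


def posterior_odds_threshold(v_yy, v_ny, v_yn, v_nn):
    """Smallest p(s|x) / p(n|x) at which yes is the better response."""
    return (v_nn + v_yn) / (v_yy + v_ny)
```

With equal outcome values the threshold on the posterior odds is 1, so the response simply follows the more probable state; raising the cost of a false yes (`v_yn`) pushes the same observation toward a no.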

TEST THEORY (BIRNBAUM 68, RASCH 66)

The general concepts and computational apparatus employed in theories of comparative judgment and in signal detection theory can be used in more complex categorical judgment tasks. In the case at hand, these extensions will contribute little that is new. Test theory, however, offers some interesting additional insight, particularly as it relates to the Law of Comparative Judgment (Brogden, 77). The Rasch and Birnbaum models express the probability that individual i will correctly respond to item (signal or stimulus) a as:

$$p_{ia} = \frac{\exp \theta_i}{\exp \theta_i + \exp \delta_a} = \frac{\exp(\theta_i - \delta_a)}{1 + \exp(\theta_i - \delta_a)}$$

where

$\exp \theta_i$ = the individual (ability) parameter,

and

$\exp \delta_a$ = the stimulus (difficulty) parameter.

The model involves distributions of ability across individuals and distributions of difficulty across tests. Assuming the item's difficulty and the respondent's ability refer to the same domain, a common scale may be employed. Therefore, responses are the result of sampling from the two distributions and computing $p_{ia}$ as a function of the distance $\theta_i - \delta_a$. Note that

$$L^{-1}(p_{ia}) = \ln \frac{p_{ia}}{1 - p_{ia}} = \theta_i - \delta_a$$

where $L^{-1}(p_{ia})$ is the logistic inverse, which is similar to the normal inverse that appears in the Law of Comparative Judgment.
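As a minimal sketch of the relation between response probability and the ability-difficulty distance, the following assumes the standard one-parameter logistic (Rasch) form, in which the probability depends only on the difference between an ability value and a difficulty value on a common scale. The function names are hypothetical.

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of a correct (yes) response under the logistic form."""
    distance = ability - difficulty
    return 1.0 / (1.0 + math.exp(-distance))


def logistic_inverse(probability):
    """Recover the ability-difficulty distance from a response probability."""
    if not 0.0 < probability < 1.0:
        raise ValueError("probability must lie strictly between 0 and 1")
    return math.log(probability / (1.0 - probability))
```

When ability equals difficulty the probability is one half, and applying `logistic_inverse` to any probability returned by `rasch_probability` gives back the distance that produced it.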

OTHER RELATED MODELS

Closely related to the test theory model is the Guttman scale (Torgerson, 58) along which both stimuli (items) and individual respondents may be scaled. In this case, it is assumed that items can be located along a continuum according to the degree to which they possess the attribute being scaled and that individuals can also be located along the same continuum. Items above the respondent on the scale will receive one response (yes, no; agree-disagree;...) and items below the respondent will receive the other response. Since this model is deterministic, the location of the stimuli and individual forecast the individual's complete response set.
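The deterministic character of the Guttman scale can be shown in a few lines. The sketch assumes, purely for illustration, that items located at or below the respondent receive the response coded 1 and items above receive 0; the text leaves open which response goes with which side.

```python
def guttman_pattern(respondent_location, item_locations):
    """Return the complete response set implied by the scale locations.

    Assumption: items at or below the respondent are coded 1, others 0.
    """
    return [1 if location <= respondent_location else 0
            for location in item_locations]
```

For items ordered along the continuum the result is always a run of one response followed by a run of the other, which is what makes the respondent's location sufficient to forecast every answer.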

Other related procedures are various kinds of bioassay and reliability theory models where the strength of treatment or time in use is used to predict the probability of an event such as death, recovery, purchase or failure. Events are discrete and the treatments increase the probability that an event occurs. Individuals have different thresholds at which the treatment level or stimulus triggers the event. The use of Probit, Normit or a similar form of analysis leads to a linear equation linking the amount of the treatment with the predicted proportion of the treated population that will respond, e.g., die, recover, buy the product once, or say yes. When this is the case, alterations in the physical properties of products may be used to forecast the proportion of all potential buyers who would respond that the product had a particular attribute, say sweet flavor.

Finally, multivariate versions of some of the above models may also be found. For example, the multivariate logistic distribution might be used to estimate the degree to which two or more variables influenced the proportion of the treated subjects who positively responded.

Summary

To one degree or another, the above models represent response theories linking specific types of responses (yes, no, and possibly don't know) to the question, "Does this stimulus object have this attribute?" Each model includes specific assumptions about human perceptual judgments and responses, but differences are less noticeable than similarities. All models involve an internalized value of each stimulus and one or more thresholds or standards used by subjects to make categorical responses. None of the theories directly indicates how to determine the appropriate response for an individual, since the internal value of the stimulus is not observable and the consequences of Y, N, DK responses to purely perceptual judgments offer little guidance. Correctness, truthfulness, and social conformity may be significant motives, but it is far from clear how to incorporate these factors in models of individual response.

INFORMATION AND DISCRIMINATION

Having devoted attention to some alternative response models for contingency data, it is useful to look at the general behavioral influences affecting the observed responses in a particular case. Three elements deserve special attention: how objects enter an individual's evoked set, the attributes that are employed in the perceptual process applied to this set, and how objects are encoded into contingency data.

The Evoked Set. For any class of objects, the more similar a new object is to the objects already in an individual's evoked set, the less perceptual variety it will contribute to the evoked set. Generally, the utility of perceptual redundancy among objects is low. It is the different object that gets attention and becomes a new addition to the set of objects of which the individual is most aware. Therefore, individual search is directed towards finding collections of objects that differ in important ways from each other and not simply adding highly similar objects.

The Attribute Set. Given the above incentives, it is plausible to believe individuals will use attributes that effectively discriminate among the objects in their evoked sets. Non-discriminating attributes do not help an individual identify genuinely new or surprising objects. Therefore, individuals are expected to store and use attribute data that help enhance the useful variety of objects in their evoked sets. These attributes are learned from experience and usually have evaluative content.

The Stored Contingency Data. At this point it is useful to recall that the information content in a message, Object _________ has (does not have) attribute __________, depends on the probability that any object of the type in question has the attribute. The information content of an event whose probability of occurrence is x is

h(x) = -log x.

The expected information (before knowing which object is being evaluated) is

$$H(x) = -\sum_{i=1}^{n} x_i \log x_i$$

and the expected information is a maximum when x = .5 for each event (Theil, 67). Therefore, before knowing which object in a set must be identified, an individual will select independent attributes such that each one is possessed by just half the objects and there is no uncertainty (don't know judgments) about the presence or absence of the attribute.
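The claim that expected information peaks when an attribute is possessed by half the objects can be checked numerically. The sketch below assumes base-2 logarithms (information in bits), which the text does not specify, and uses hypothetical function names.

```python
import math


def information(probability):
    """Information content of one event with the given probability."""
    return -math.log2(probability)


def expected_information(probabilities):
    """Expected information over a set of mutually exclusive events."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def binary_attribute_information(share_with_attribute):
    """Expected information of a present/absent message about an attribute."""
    p = share_with_attribute
    return expected_information([p, 1.0 - p])
```

Evaluating `binary_attribute_information` at 0.5 gives one bit, and any other share between 0 and 1 gives less, while `information` shows that the rarer of the two messages is individually the more informative one.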

Summary

Individuals seek evoked sets of objects (objects about which they are knowledgeable) which are varied and can be readily identified. To accomplish this purpose, the individual selects meaningful attributes across which objects can be expected to significantly vary. Finally, the selection of attributes, objects and the encoding scheme seeks to balance the presence and absence of each attribute across each object, thereby maximizing the expected information content of messages about the set's perceptual content.

INFORMATION, LEARNING AND MATURATION

When confronted with questions about the presence or absence of an attribute for a genuinely new or novel object, the candid individual must answer "don't know" to every object-attribute question. As learning takes place, an assortment of determinant attributes and a growing number of non-redundant objects are associated with the evoked set. In this sense, the perceptual map of the individual grows in both complexity and clarity. Early, each new stimulus tends to have a high degree of novelty and receives much attention. Later, additions to the "new" class of objects tend to be fitted into an established structure and it becomes progressively more difficult to make the individual aware of new additions. At some point, knowledge about the evoked set may become unitized in the manner described by Hayes-Roth (77), and the objects and their attribute evaluations may become resistant to change. Ultimately, new additions will tend to displace old objects in the evoked set.

A second way to look at the learning process is to redefine Luce's (77) beta operator nonlinear learning model. To do so, let the strength of a response to object i and attribute j on trial n be $v_{ijn}$. The strength can be a simple inverse function of the discriminal dispersion of object i along the jth attribute continuum. The receipt of an informative exposure on trial n will increase $v_{ij,n-1}$, the strength of the response (or decrease the prior dispersion of i along j). The constant proportional change from trial to trial is $a_{ij}$. If no informative exposure is received on trial n, the prior strength $v_{ij,n-1}$ will decrease by the constant proportional change $b_{ij}$. Therefore, one of two events happens on each trial,

$$v_{ijn} = a_{ij}\, v_{ij,n-1} \quad \text{(informative trial)}$$

or

$$v_{ijn} = b_{ij}\, v_{ij,n-1} \quad \text{(uninformative trial)}$$

where

$$v_{ijn} = (\text{discriminal dispersion of } i \text{ on } j \text{ at trial } n)^{-1}$$

and

$$a_{ij} > 1 > b_{ij}.$$

For a simple, single threshold model, it follows that the probability of a yes on the nth trial is

EQUATION

when

$B_{ij} = a_{ij}/b_{ij}$.

The time series profile of various learning and forgetting sequences can be represented by the above model and it can be adapted to cover yes, no and don't know responses.
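The two trial-by-trial update rules for response strength can be traced with a short sketch. It covers only the strength sequence, not the response-probability expression, and it assumes for illustration that the uninformative multiplier is positive as well as below one. The function names and sample values are hypothetical.

```python
def update_strength(previous_strength, informative, a, b):
    """Apply one trial's proportional change to the response strength.

    a > 1 strengthens the response after an informative exposure;
    0 < b < 1 weakens it after an uninformative trial (b > 0 is assumed).
    """
    if not a > 1 > b > 0:
        raise ValueError("expected a > 1 > b > 0")
    multiplier = a if informative else b
    return multiplier * previous_strength


def strength_profile(initial_strength, trial_types, a, b):
    """Return the strength after each trial in a learn/forget sequence."""
    profile = [initial_strength]
    for informative in trial_types:
        profile.append(update_strength(profile[-1], informative, a, b))
    return profile
```

For example, `strength_profile(1.0, [True, True, False], 2.0, 0.5)` traces two informative exposures followed by one uninformative trial; because strength is the inverse of discriminal dispersion, the same profile read upside down describes how the dispersion narrows and then widens again.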

PRELIMINARY DATA ANALYSIS

Once a set of contingency data has been collected from a group of individuals, two questions arise: how, if at all, can the data be pooled, and what, if any, preliminary transformations may be useful? The first question largely concerns the degree to which response differences are due to noise or to heterogeneous classes of respondents. Some types of individuals may be well informed and perceptive while others may be ignorant and unperceptive. Whatever factors may be at work, an initial effort to find response typologies by clustering methods should be instructive. It may indicate the absence of important response taxonomies or it may lead to classifying the subjects according to their response profiles. In the latter case, each group's contingency data would be separately analyzed and reasons for group differences could be investigated by discriminant analysis or analysis of variance methods.

The second data question concerns potential data transformations. Several important related issues will be discussed in later sections. Here, three transformations deserve comment. First, if differences in the overall frequencies of yes, no and don't know responses are present, normalizing the values of the responses may be considered. Doing so changes the numerical values of yes (and possibly don't know and no) responses so each subject's data sum equals the mean sum across all subjects. Second, each subject's object-attribute response could be weighted by the information content of the object-attribute responses for the average subject. In other words, more weight would be assigned to rare attributes, those possessed by only a few objects. Third, a priori weights could be assigned to responses for each object and/or attribute to achieve some research objective. For example, effective salience weights might be employed.
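The first two transformations can be sketched as follows. The sketch assumes each subject's data are held as a list of rows of numeric codes, that rescaling is done by a single multiplier per subject, and that information weights are base-2 logarithms of the share of objects possessing an attribute; none of these details are fixed by the text, and the function names are hypothetical.

```python
import math


def normalize_subject_sums(subject_matrices):
    """Rescale each subject's matrix so its sum equals the mean subject sum."""
    sums = [sum(sum(row) for row in matrix) for matrix in subject_matrices]
    target = sum(sums) / len(sums)
    normalized = []
    for matrix, total in zip(subject_matrices, sums):
        # A subject with no positive responses is left at zero.
        scale = target / total if total else 0.0
        normalized.append([[value * scale for value in row] for row in matrix])
    return normalized


def attribute_information_weights(share_of_objects_with_attribute):
    """Give rarer attributes more weight using -log2 of their share."""
    return {attribute: -math.log2(share)
            for attribute, share in share_of_objects_with_attribute.items()
            if share > 0}
```

After normalization a subject who says yes freely and one who says yes sparingly contribute the same total to the pooled data, which is the stated purpose of the first transformation.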

Whatever analysis sample or samples are finally chosen, and/or whatever preliminary transformation has been applied, one or more aggregate data matrices are employed in subsequent analyses. Each cell of each matrix contains the sum across subjects of the entries in an individual's object by attribute contingency matrix. Finally, if repeated measures are obtained over time, one or more aggregate matrices would represent each cross section.
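Pooling individual matrices into an aggregate matrix is a cell-by-cell sum. The sketch below assumes every subject's matrix has the same shape and contains numeric codes only (no blanks); the function name is hypothetical.

```python
def aggregate_matrices(subject_matrices):
    """Sum object-by-attribute contingency matrices across subjects."""
    n_rows = len(subject_matrices[0])
    n_cols = len(subject_matrices[0][0])
    total = [[0] * n_cols for _ in range(n_rows)]
    for matrix in subject_matrices:
        for i, row in enumerate(matrix):
            for j, value in enumerate(row):
                total[i][j] += value
    return total
```

With repeated measures, the same function would be applied once per cross section to give one aggregate matrix for each point in time.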

DATA GENERATION MODELS - AN ALTERNATIVE VIEW

Another way to look at the yes, no, don't know response to perceptions about objects is to consider a multidimensional space spanned by components of object attributes. For an extended discussion of such spatial models, see Green and Carroll (76), Lingoes (77) and Pessemier (77), and the associated bibliographies. Consider a perceptual space (component reduced space) containing three objects, x, y and z, and three (unit length) attribute vectors. Here, orthogonal projections of objects on attributes define the attribute levels of each object.

The above perceptual structure can be used to generate contingency data of the type discussed above by observing that

a) Objects with high (positive) attribute levels will be judged "attribute present" and objects with low (negative) attribute levels will be judged "attribute absent." The present-absent region may be the space falling outside a hyper-sphere with its center at the origin of the space. Alternatively, it may be the regions beyond asymmetric cut offs along each attribute.

b) DK responses will occur whenever the internal value of an object falls inside the boundaries of the present-absent region. The effect of this region on DK responses is similar to the effect of discriminal dispersion on DK responses. When discriminal dispersions are large, the don't know response rate is relatively high. This rate decreases as discriminal dispersion decreases. When the hypersphere is large, more values fall in the don't know response region and as it decreases, the rate of don't know responses declines.
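Points a) and b) can be sketched with the cut-off variant of the model, in which each attribute has a low and a high cut-off and the band between them is the don't know region. Using per-attribute cut-offs rather than a hypersphere, the coordinates, and all names below are assumptions made for illustration.

```python
import math


def attribute_level(object_coordinates, attribute_vector):
    """Orthogonal projection of an object onto an attribute direction."""
    norm = math.sqrt(sum(a * a for a in attribute_vector))
    dot = sum(o * a for o, a in zip(object_coordinates, attribute_vector))
    return dot / norm


def judge(level, low_cutoff, high_cutoff):
    """Convert an attribute level into a present/absent/don't know judgment."""
    if level > high_cutoff:
        return "yes"
    if level < low_cutoff:
        return "no"
    return "don't know"


def contingency_row(objects, attribute_vector, low_cutoff, high_cutoff):
    """Judgments on one attribute for every object in the space."""
    return {name: judge(attribute_level(coords, attribute_vector),
                        low_cutoff, high_cutoff)
            for name, coords in objects.items()}
```

Widening the band between the cut-offs plays the role of a larger hypersphere: more projected levels fall inside it and the don't know rate rises.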

In the above formulation, early in the life cycle of a product or product class, products will tend to have a large diameter hypersphere and high don't know response rates. Later in the product life cycle, the diameter of the hypersphere will shrink to reflect the increasingly clear product perceptions (small discriminal dispersions of more experienced individual judges). This time-dependent change is likely to be a significant aspect of the adoption-diffusion process. Advertising and the observation of others doubtless rank high among influences on the rate at which the perceptual process clarifies.

Although the details will not be discussed here, simulated perceptual data can be readily generated which reflect the above structural characteristics and dynamic behavior. The simulated data are initially produced in interval form and then converted to yes, no, don't know responses. Therefore, the simple recovery characteristics of various multivariate procedures can be directly examined. In this manner, changes in discrimination and recovery can be examined over time and for data that reflect various levels of perceptual clarity. Finally, the reduced perceptual spaces produced with metric data can be compared to perceptual spaces developed from the analysis of contingency data. Several approaches which assume a homogeneous population are discussed in the next section.

MODELS FOR THE ANALYSIS OF CONTINGENCY DATA

The simplest approach to the analysis of contingency data would be non-dimensional. For example, the THAID algorithm could find the extent to which individuals' yes, no and don't know judgments about the presence or absence of attributes discriminate one object from another (Morgan and Messenger, 73). The result of such an analysis would be a hierarchical scheme or tree which indicates how attribute judgments could best be used to identify an object. When developed at regular time intervals, the trees could represent the perceptual-communications dynamics of a product class over the adoption-diffusion cycle or the product life cycle.

A second approach is dimensional. It uses object-attribute contingency data solely for the purpose of developing a proximity matrix from pooled contingent judgments. A number of proximity measures, e.g., weighted or unweighted distances and correlation indices, could be used for this purpose. Depending on the choice of proximity measure and the specific objective, a variety of metric and non-metric multidimensional scaling algorithms are available to obtain a reduced space object configuration. The inter-object distances in this space can be interpreted as object-object dissimilarities. As an aid to interpretation, attribute vectors may be located in the same space.
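One of the simplest proximity measures mentioned, an unweighted distance between object profiles, can be sketched as below. The layout (attributes as rows, objects as columns) and the choice of Euclidean distance are assumptions for illustration; the resulting matrix would then be passed to whichever scaling algorithm is preferred.

```python
import math


def object_distances(pooled_matrix):
    """Euclidean distances between object columns of a pooled matrix.

    pooled_matrix[i][j] is the pooled judgment for attribute i, object j.
    """
    n_objects = len(pooled_matrix[0])
    profiles = [[row[j] for row in pooled_matrix] for j in range(n_objects)]
    return [[math.dist(profiles[a], profiles[b]) for b in range(n_objects)]
            for a in range(n_objects)]
```

The output is a symmetric matrix with zeros on the diagonal, the usual input format for metric or non-metric multidimensional scaling.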

A third approach transforms proportions or probabilities. Thurstone's theories of discrimination suggest the use of response function deviates. If $p_{ij}$ is the proportion of yes answers to the question, "Does object i have attribute j?" and $F^{-1}$ is the inverse response function, the desired deviate (distance) measure is

$$y_{ij} = F^{-1}(p_{ij}).$$

The yij values would appear in each cell of the attribute by object matrix which is subjected to further analysis.

A related approach transforms the proportion or probability into the entropy of the observation (message). In this case, the cell entries in the transformed contingency matrix are

yij = ln(pij).

With respect to either method outlined above, how the proportions are computed becomes a concern. For example, they may be computed over all possible positive responses, over all possible positive responses less the don't know responses, or over only the positive responses for an attribute. The following section will illustrate some of these possibilities.
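The two cell transformations and two of the ways of forming the proportion can be sketched together. The sketch assumes a standard normal response function for the deviate and keeps the logarithmic transform with the sign as printed in the text; the function names and counts are hypothetical.

```python
import math
from statistics import NormalDist


def yes_proportion(yes, no, dont_know, exclude_dont_know=False):
    """Proportion of yes answers, with or without don't know in the base."""
    base = yes + no if exclude_dont_know else yes + no + dont_know
    return yes / base


def normal_deviate(proportion):
    """Inverse standard normal response function applied to a proportion."""
    if not 0.0 < proportion < 1.0:
        raise ValueError("proportion must lie strictly between 0 and 1")
    return NormalDist().inv_cdf(proportion)


def log_transform(proportion):
    """Natural logarithm of the proportion, as in the related approach."""
    return math.log(proportion)
```

Proportions of exactly 0 or 1 have no finite deviate or logarithm, so pooled data with empty or unanimous cells would need some adjustment before either transform is applied; how to adjust is left open here.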

A fourth approach is also dimensional but makes direct use of the pooled contingency judgments. Ries (74) has described the basic computations required to use the approach originally suggested by Benzecri (73). The model described below expands their work by including yes, no and don't know judgments. Subjects judge m objects, Om, by noting the presence or absence of each of N primary attributes, An. The presence of an attribute is recorded by a one, the absence of an attribute is recorded by a zero and uncertainty about the presence or absence of an attribute is recorded by a blank. By using an "ambiguity" attribute to record the number of uncertain attribute judgments or blanks, the level of perceptual ambiguity surrounding each object can be easily measured.

Recording as many positive attributes as necessary to describe an object extends the perceptual domain used by Ring and King (78), where only one object, the one judged to have the most (or least) of an attribute, received a score of one. Furthermore, including an ambiguity attribute and allowing uncertainty judgments ensures that an explicit judgment is recorded for every attribute on every object. The judgments made by single subjects can be recorded in a matrix format. Pooled data for groups of subjects can be recorded in a similar matrix layout. If attributes are rows, then for any object column the count of 1s measures the object's attribute complexity, the count of 0s measures the object's attribute simplicity, and the count of blanks measures the object's ambiguity. On the other hand, attributes can be analyzed by rows, where the count of 1s measures the attribute's rarity and the count of blanks measures the attribute's ambiguity. If the row, column and grand means of non-zero entries in the data matrix are $n_{i.}$, $n_{.j}$, and $n_{..}$, then a matrix of "normalized deviations" has entries given by

EQUATION

The Y matrix of "normalized deviations" is decomposed and plotted by the computational procedure described by Ries (74) and programmed for convenient application by Jarboe (78). The output of this analysis yields a product map in which the attribute "product ambiguity" is appropriately represented and onto which the level of each product or brand can be projected.
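The recording scheme for a single subject (one, zero, or blank) and the derived ambiguity attribute can be sketched as below. Representing a blank by `None`, laying out attributes as rows and objects as columns, and the function names are assumptions for illustration; the decomposition of the normalized deviations is not attempted here.

```python
def object_profile(matrix, column):
    """Count the 1s, 0s and blanks recorded for one object column."""
    values = [row[column] for row in matrix]
    return {
        "ones": sum(1 for value in values if value == 1),
        "zeros": sum(1 for value in values if value == 0),
        "blanks": sum(1 for value in values if value is None),
    }


def add_ambiguity_row(matrix):
    """Append an 'ambiguity' attribute holding each object's blank count."""
    n_objects = len(matrix[0])
    blanks = [sum(1 for row in matrix if row[j] is None)
              for j in range(n_objects)]
    return [list(row) for row in matrix] + [blanks]
```

Tracking the appended row across repeated surveys would show whether ambiguity about each object is falling as perceptions clarify.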

The static characteristics of this map are of interest and can be used for a variety of analytical purposes. Of greater interest, however, are dynamic characteristics of the map, particularly during the earlier phase of a product's or product class' life cycle. During this period, the repeated use of contingency maps of the type just described can track the changing levels of primary product attribute perceptions and the changing levels of ambiguity about products and their various attributes.

SUMMARY

The practical value of contingency data has been emphasized. For investigators who need simple response modes, e.g., in telephone surveys, yes-no or yes-no-don't know responses may be essential. Also, respondents usually find these categories to be natural verbal responses to a wide range of judgmental questions. In questionnaires, these formats are among the easiest for respondents to use and for investigators to code for further analysis.

The main section of the paper briefly described alternative psychological models for generating individual judgments recorded in the forms noted above. Principal attention was devoted to judgments about continuous attributes of objects. These data are needed to produce perceptual maps for various classes of choice objects such as candidates, automobiles or restaurants. Also, interpretation of the frequency of positive and don't know responses was discussed, particularly as the rate may be a function of individual information processing and maturation.

The final section dealt with various procedures which can be used to transform contingency data and produce spatial representations of objects that are interpretable in terms of determinant attributes. The direct analysis of don't know responses was also discussed. Preliminary field tests of several methods of data collection, reduction and interpretation are underway. Perhaps other investigators will test the appropriateness of the alternative psychological models which may underlie the manifest data.

REFERENCES

J. P. Benzecri, L'analyse des données, Vol. 2, L'analyse des correspondances, Dunod, Paris, 1973.

A. Birnbaum, "Some Latent Trait Models and Their Use in Inferring an Examinee's Ability," in F. M. Lord and M. R. Novick, eds., Statistical Theories of Mental Test Scores, Addison-Wesley, 1968.

R. Bock and Lyle V. Jones, The Measurement and Prediction of Judgment and Choice, Holden-Day, 1968.

H. E. Brogden, "The Rasch Model, The Law of Comparative Judgment and Additive Conjoint Measurement," Psychometrika, Vol. 42, December 1977, pp. 631-634.

Clyde H. Coombs, Robyn Dawes and Amos Tversky, Mathematical Psychology, Prentice-Hall, 1970, pp. 165-200 and 270-273.

Paul E. Green and J. Douglas Carroll, Mathematical Tools for Applied Multivariate Analysis, Academic Press, 1976.

Glen R. Jarboe, Market Map Analysis of Patronage Behavior, unpublished dissertation, Purdue University, (in preparation).

Barbara Hayes-Roth, "Evolution of Cognitive Structures and Processes," Psychological Review, 84, May 1977, pp. 260-278.

James C. Lingoes, ed., Geometric Representations of Relational Data, Ann Arbor: Mathesis Press, 1977.

R. Duncan Luce, "Thurstone's Discriminal Processes Fifty Years Later," Psychometrika, Vol. 42, December 1977, pp. 461-489.

James N. Morgan and Robert C. Messenger, THAID: A Sequential Analysis Program for the Analysis of Nominal Scale Dependent Variables, Institute for Social Research, University of Michigan, 1973.

Edgar A. Pessemier, Product Management: Strategy and Organization, Wiley/Hamilton, 1977, pp. 205-258.

G. Rasch, "An Individualistic Approach to Item Analysis," in P. L. Lazarsfeld and N. W. Henry, eds., Readings in Mathematical Social Science, Chicago: Science Research Associates, 1966.

Paul N. Ries, "Joint Space Analysis of Contingency Data," Procter & Gamble, 1974.

Larry J. Ring and Charles W. King, "A Multiple Discriminant Analyses Approach to the Development of Retail Store Positioning," Advances in Consumer Research, H. Keith Hunt, ed., Vol. V, Ann Arbor: Association for Consumer Research, 1978, pp. 227-234.

Henri Theil, Economics and Information Theory, North-Holland, 1967.

L. L. Thurstone, The Measurement of Values, University of Chicago Press, 1959.

Warren S. Torgerson, Theory and Methods of Scaling, John Wiley & Sons, 1958.

John I. Yellott, Jr., "The Relationship Between Luce's Choice Axiom, Thurstone's Theory of Comparative Judgment, and the Double Exponential Distribution," Journal of Mathematical Psychology, 15, 109-144 (1977).

Forrest W. Young, "Optimal Scaling With a Variety of Models," Psychometric Laboratory, University of North Carolina, (undated).
