A Measure of Halo

Joel Huber, Columbia University
William James, University of Alabama
ABSTRACT - Several different ways to measure halo are reviewed and a new measure proposed which is logically consistent and simple to implement. The use of familiarity judgments as an intervening variable is proposed as a way to estimate the degree of directional flow in halo judgments.
Joel Huber and William James (1978), "A Measure of Halo", in Advances in Consumer Research Volume 05, eds. Kent Hunt, Ann Arbor, MI: Association for Consumer Research, Pages: 468-473.





Halo effect appears to be one of those concepts which (like a rainbow) is clear at a distance but dissolves on closer inspection. Halo occurs where affect or preference for an object biases attribute judgments in such a way as to make the judgments more consistent with preference. It may be lamentable, but is certainly understandable and, in certain situations, even quite reasonable. Up close, however, the rainbow fades. It becomes difficult to determine where the edges of halo are: where it exists and where it does not. Its central prerequisite, that changes in affect cause shifts in perception, becomes hopelessly confounded with the alternative explanation, that shifts in perception cause changes in affect. Various definitions of halo, while possessing a degree of face validity, do not allow an unequivocal measure of the strength of the bias due to halo or even its direction. In this study, the use of attributes which have clear physical counterparts, such as size or temperature, allows a relatively independent measure of bias due to halo. It then becomes possible to treat halo as a variable that can be explained by other variables. This begins to move halo effect from the status of a vague explanation to an operationally defined theoretical construct.

Halo has been defined in many ways. These are summarized in Table 1. Much of the work is by psychologists working with personnel evaluation forms. There it is found that a general attitude towards the person being rated, the ratee, results in a corresponding bias in the rating of more objective attributes, such as rated ability to perform a certain task. The measures of halo used have been of four kinds. The first is related to the correlation between attributes, the second to the variance of those attribute ratings for a given subject, the third to an additive interaction between the rater and the object, and the fourth to the degree to which preference predicts attribute judgments.

Defining halo as the excessive correlation between attributes goes back to the initial formulation of the concept (Wells, 1907; Thurstone, 1920). To the extent that preference causes attribute judgments, it will result in a spurious correlation between the attributes due to a preference factor. Unfortunately, as is pointed out by Bingham (1939), there is a degree of correlation expected even without the halo effect being operative. For example, it is reasonable to expect, for most subjects, that cornering ability and sportiness in automobiles correlate with preference regardless of halo. The problem of determining when the level of correlation becomes excessive has not been, and probably cannot be, answered. Taylor and Hastman (1956) use average attribute correlation as a measure of halo to see if different preliminary instructions to raters reduce halo. While instructions as to the pitfalls of halo do appear to reduce the average correlation between attributes, it is impossible to determine whether halo has been reduced or whether raters have merely learned to make artificially independent ratings.
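The correlational measure can be sketched in a few lines. The following is a minimal illustration, not the procedure of any of the cited studies: it assumes one rater's ratings of the same ratees on several attributes and averages the pairwise Pearson correlations. The attribute names and rating values are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mean_attribute_correlation(ratings):
    """Average pairwise correlation between attribute columns.

    `ratings` maps an attribute name to one rater's ratings of the same
    ratees, listed in the same order for every attribute.
    """
    pairs = list(combinations(ratings, 2))
    return sum(pearson(ratings[a], ratings[b]) for a, b in pairs) / len(pairs)


# Hypothetical ratings of five ratees by one rater (illustrative values only).
ratings = {
    "ability": [7, 5, 8, 3, 6],
    "effort": [6, 5, 9, 2, 6],
    "punctuality": [7, 4, 8, 3, 5],
}
avg_r = mean_attribute_correlation(ratings)
print(round(avg_r, 3))
```

A high average says only that the attributes move together. As noted above, it cannot by itself separate halo from a legitimate correlation among the attributes.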



Brown's (1968) definition of halo in terms of attribute variance is based on the notion that a given overall rating will produce correspondingly high or low ratings on the components. If the attributes are coded so that they all produce affective response in the same direction (e.g., more is better), then halo is hypothesized to reduce the variance of the attribute judgments about any given subject. Once again, however, there is no way to tell when the variance is sufficiently high to indicate lack of halo. By pointing out to raters the problem of low variance, Brown (1968) and Borman (1975) are able to significantly increase attribute variance after training, but it is unreasonable to conclude that the change represents less halo effect rather than an artifactual scale effect by compliant subjects.
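As a rough sketch of the variance definition, the snippet below computes, for one rater, the variance of the attribute ratings given to each ratee and averages it over ratees. It assumes all attributes are coded in the same direction; the function name and the before/after values are hypothetical and are not taken from Brown (1968) or Borman (1975).

```python
from statistics import mean, pvariance


def within_ratee_variance(ratings_by_ratee):
    """Mean, over ratees, of the variance of one rater's attribute ratings."""
    return mean(pvariance(r) for r in ratings_by_ratee.values())


# Hypothetical ratings on three attributes, before and after rater training.
before = {"ratee_1": [7, 7, 6], "ratee_2": [3, 3, 4]}
after = {"ratee_1": [8, 6, 5], "ratee_2": [2, 4, 5]}

var_before = within_ratee_variance(before)
var_after = within_ratee_variance(after)
print(var_before, var_after)
```

The variance rises after "training" in this toy data, but nothing in the number distinguishes reduced halo from raters simply spreading their ratings, which is the objection raised above.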

Working across raters, Guilford (1954) defines halo as a rater x object interaction term. By this definition, the preference of a rater for a particular object is hypothesized to result in a constant added to or subtracted from each attribute for that object. Thus, if a particular rater likes an object, ratings on all dimensions should increase for that subject. There are two things to note about this formulation. First, as with the variance measure, all attributes must be coded in the same direction. Given halo bias, preference for an object will increase ratings on attributes where "more is better", decreasing those where "more is worse". The effect of two oppositely coded attributes tends to lessen or cancel Guilford's interaction term. The second problem is conceptual. We have here defined halo to be the effect of preference on attribute ratings. But preference need not be the only cause of a rater x object interaction. Attributes can themselves have a degree of intercorrelation so that a high rating on one attribute may lead to distorted ratings on other attributes. For example, in rating candidates, if one is considered to be intelligent, he is also likely to be rated higher on many other ratings, thus resulting in a rater x object interaction. To the extent that there are factors other than preference contaminating judgments, Guilford's measure will overestimate the halo effect.

There are, however, two advantages with this formulation of halo. First, even though significant rater x object interaction does not positively indicate the presence of halo, a lack of interaction argues strongly for the lack of halo effect. Second, the method provides a reasonable measure of halo defined in the larger sense of the biasing effect of all attributes, including preference, on each other. This, in contrast to either the correlation or the variance definitions, provides an independent measure of halo which can then be used as a basis to begin to understand the phenomenon.
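One way to illustrate the interaction definition is sketched below. It assumes that ratings are first averaged over attributes (all coded so that more is better) and that the interaction is what remains after removing rater and object main effects; this is a simplified reading for illustration, not Guilford's full analysis of variance. The data are hypothetical: an additive pattern in which rater A's ratings of object X have been raised by one point on every attribute.

```python
from statistics import mean


def rater_object_interaction(ratings):
    """Rater x object interaction effects.

    `ratings[rater][obj]` is a list of attribute ratings. Each cell is
    averaged over attributes; the interaction is the cell mean minus the
    rater mean, minus the object mean, plus the grand mean.
    """
    raters = list(ratings)
    objects = list(ratings[raters[0]])
    cell = {r: {o: mean(ratings[r][o]) for o in objects} for r in raters}
    rater_mean = {r: mean(cell[r].values()) for r in raters}
    object_mean = {o: mean(cell[r][o] for r in raters) for o in objects}
    grand = mean(rater_mean.values())
    return {
        r: {o: cell[r][o] - rater_mean[r] - object_mean[o] + grand for o in objects}
        for r in raters
    }


# Hypothetical data: additive except that A's ratings of X are inflated by 1.
ratings = {
    "A": {"X": [5, 6, 7], "Y": [6, 7, 8]},
    "B": {"X": [5, 6, 7], "Y": [7, 8, 9]},
}
inter = rater_object_interaction(ratings)
print(inter)
```

Because the inflation is applied uniformly across attributes, the sketch cannot tell whether it arose from preference or from one attribute contaminating the others, which is the conceptual problem noted earlier.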

Beckwith and Lehmann (1975) provide a definition of halo that goes even further towards making the concept operational. While it is not totally clear what their measure is, halo appears to be defined as the coefficient of preference in a regression predicting attribute ratings once the effect of average belief has been removed. This is:
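A form of Equation (1) consistent with the symbols named in the text (dj2, Ai, B*ij) is sketched below. The intercept d_{j0}, the coefficient name d_{j1}, and the error term e_{ij} are assumptions; only the two regressors and the coefficient d_{j2} on preference are confirmed by the surrounding discussion.

```latex
\[
B_{ij} = d_{j0} + d_{j1}\,B^{*}_{ij} + d_{j2}\,A_{i} + e_{ij} \qquad (1)
\]
```

Here B_{ij} is the subject's belief about object i on attribute j, B^{*}_{ij} is the average belief across subjects, and A_{i} is the subject's preference for object i.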


Equation (1) is parameterized across objects rated by a given subject on an attribute. The idea here is that individual belief can be decomposed into two additive components, the first component representing the effect of average beliefs, which is assumed to be relatively halo free; and the second component representing the effect of preference which alters perception through halo. The larger dj2 the greater halo is presumed to be for that subject.

Beckwith and Lehmann use a system of simultaneous equations and two-stage least squares to estimate the parameters in their system, which, while commendable in principle, resulted in an indeterminate solution. In the case of the Beckwith and Lehmann paradigm, it appears that simple least squares estimation of Equation (1) yields coefficients very similar to those of the more complex simultaneous system. Thus, this definition of halo appears, in this instance, to be relatively independent of whether an individual or simultaneous system of equations is used.

Even taking the more simplified form of their model, there are two problems with measuring halo as in Equation (1). The first problem stems from the use of average belief as a surrogate to "true" belief. In doing so the model biases downward the effect of halo when there is considerable agreement across subjects with respect to preference and biases it upward when individual beliefs have a large, but legitimate, individual component.

For example, if different raters have similar tastes in objects, then average beliefs are likely to be contaminated by a kind of average halo effect. Thus, if all raters like a candidate, then it is reasonable that all attribute beliefs will be biased in a direction consistent with this average affect. In this case, the coefficient of Ai in Equation (1) will be attenuated since average attribute measures already contain a large component of individual preference. Thus, the effect of similarity of preference across subjects is to bias downward the measure of halo.

The opposite bias occurs when an attribute has a large, but legitimate, individual component. Consider the attribute of "fit" as it is applied to the preference for a jacket for an individual. To the extent that individual body sizes differ, the average measure of fit of any given jacket is likely to be relatively unrelated to individual preference. A large man will not have a strong preference toward a jacket that has the highest average fit. By contrast, individual fit is likely to be highly correlated with preference and thus lead one to conclude erroneously that there is a strong halo effect with respect to that attribute.

The final difficulty with the Beckwith and Lehmann formulation is making behavioral sense of Equation (1). That is, individual belief is formulated as a weighted combination of average belief and individual preference. But, following a perceptive comment by Johansson et al. (1976), it is hard to see how B*ij can be considered an input to an individual's belief when it is generally not known to the individual. Thus it is nonsensical to assert that Equation (1) represents what an individual could be doing, consciously or unconsciously. Such an average belief might make sense if the individual were asked what he felt the average belief was. Such a solution, however, runs into problems with respect to the direction of causality since a person's perception of average belief may be seriously tainted by their actual belief.

Thus, there are three problems with the Beckwith and Lehmann formulation of halo. The first concerns the identification problems arising from the use of a simultaneous system of equations, the second concerns the bias in halo estimates brought about by the likelihood that average belief is itself contaminated by halo, and the third concerns the behavioral meaningfulness of Equation (1) given that average belief is not typically known to the person making the rating.


In what follows, a concept of halo is developed that seeks to alleviate or avoid the above problems. Simultaneity is avoided by having all measures simply be correlation coefficients. The use of the average subjects' belief is avoided by using attributes that have clearly defined physical counterparts. This objective measure is then used in place of the average belief. Finally, the problem of using an unknown input in the belief equation is avoided by defining halo as a function of the extent to which the objective value differs from the individual's belief. Thus the "true" value only enters to define perceptual error and as such is not expected to be known by the individual.

To illustrate the development, data are used from a study of 14 master's students with respect to their preferences toward 17 cities as sites for their first job. They were shown pairs of cities and asked which they would prefer to live in given that work related variables, such as opportunity for advancement, were kept constant. They were then asked how much more they would have to be paid per year to choose the less preferred city. From these dollar differences between pairs of cities, a one-dimensional preference scale was constructed for each subject using a procedure developed by Pessemier and Teach (1966) and described by Huber and James (1976). The process resulted in reliable individual preference scales of interval quality.

Three attributes were chosen for analysis: perceived size, average temperature, and opportunities for spectator sports in the city. These were measured on 10-point Likert scales. The corresponding physical attributes are, respectively, population, average temperature, and the number of major professional sports teams in the city. Since the analysis is made on the basis of univariate relationships, no need was felt to include all determinant attributes. In this case, the attributes were chosen because it was relatively easy to find corresponding physical attributes that could serve as the basis for measuring perceptual error.

Thus the input to the measure of halo contains three elements for each individual: a preference or attitude score for each city, a set of beliefs for each city across three attributes, and finally a set of physical measures that correspond to the beliefs. The particular terms used and their operational definitions are provided in Table 2. Conceptually, halo is defined as the degree to which preference for an object biases beliefs about that object in such a way as to make them more consistent with the preference. Perceptual error is defined as the difference between the perceived rating of a city and the best fitting linear approximation given its "true" value. If this residual is positive it means that relative to other cities the individual has judged this city to have a higher rating than is justified given its physical qualities. Halo is then simply the degree to which preference accounts for this perceptual error. This is operationalized as the correlation between the residual of the psychophysical transform and the preference score.
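The operational definition can be sketched for one subject and one attribute as follows. This is a minimal illustration under stated assumptions: the psychophysical transform is taken to be a simple least-squares line of belief on the physical value, and the function names and the six-city data are hypothetical, not values from the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def perceptual_error(belief, physical):
    """Residuals from the least-squares line belief = a + b * physical."""
    n = len(belief)
    mp, mb = sum(physical) / n, sum(belief) / n
    slope = sum((p - mp) * (y - mb) for p, y in zip(physical, belief)) / sum(
        (p - mp) ** 2 for p in physical
    )
    intercept = mb - slope * mp
    return [y - (intercept + slope * p) for p, y in zip(physical, belief)]


def halo(preference, belief, physical):
    """Correlation between preference and perceptual error."""
    return pearson(preference, perceptual_error(belief, physical))


# Hypothetical data for one subject and six cities (illustrative only).
physical = [30, 40, 50, 60, 70, 80]   # e.g. average temperature
belief = [2, 4, 4, 7, 7, 9]           # rated warmth on a 10-point scale
preference = [1, 5, 2, 6, 3, 6]       # interval-scaled preference score

h = halo(preference, belief, physical)
print(round(h, 3))
```

A positive value means that the cities this subject prefers tend to be rated warmer than their physical temperature would justify, relative to the other cities.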



An example may make the logic of this operationalization clearer. Suppose that, within the range of cities tested, a person likes warm climates, that is, being in a warm climate is important to that person. Under halo it is hypothesized that if a person likes a city it will be misperceived to be warmer than it actually is. If, correspondingly, less preferred cities are seen to be colder than is justified, then there will be a high positive correlation between perceptual error and preference.

Notice that the direction and the size of the halo coefficient will be a function of whether the attribute is liked and how much it is liked. In the example, if warmth is disliked, then those cities that are liked will be perceived as colder than they are, producing a negative relationship between perceptual error and preference. In a similar way, the importance of an attribute should be related to the actual size of the halo measure. Thus if warmth, or coldness, is very important to an individual then there is more pressure to bring beliefs in line with preferences. One measure of the degree of importance is the simple correlation between preference scores and beliefs. This measure reflects whether in general a high level of preference is associated with a high level of belief.

These two measures, the halo measure and the correlational importance measure, are provided in Table 3. The correlations between the two measures across subjects are greater than .75 for each attribute. Moreover, it is visually clear that even within subjects the two measures covary strongly. Although this relationship is in part due to a computational artifact stemming from the use of preference in both measures, the strength of the relationship makes behavioral sense and lends credence to this measure of halo.
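The computational artifact can be made explicit. Write A for preference, B for belief, P for the physical value, and E for the least-squares residual of B on P; the correlation symbols below are notation introduced here for illustration, not the paper's. Since B = a + bP + E with E uncorrelated with P, the importance correlation decomposes exactly as:

```latex
\[
r_{AB} \;=\; r_{BP}\,r_{AP} \;+\; \sqrt{1 - r_{BP}^{2}}\;\, r_{AE}
\]
```

The term r_{AE} is the halo measure itself, so the importance measure r_{AB} contains the halo measure scaled by the share of belief not explained by the physical value. The two measures therefore move together by construction, apart from the contribution of r_{BP} r_{AP}.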



The relative importance of preference in determining attributes can be looked at from another perspective. For the three attributes, size, spectator sports and warmth, the physical values of these attributes accounted for an average of 61%, 60% and 78% of their variance, respectively. From the values of halo in Table 3, it is evident that preference accounts for about 10% of the remaining variance. Thus, halo accounts for a relatively small proportion of the error in attribute judgments. This result, for attributes that have sharply defined physical counterparts, differs from the result of the Beckwith and Lehmann study, which used more subjective attributes. They found that preference accounts for about 30% of the variance in attribute judgments. This makes sense: the more subjective the attribute, the easier it is for preference to distort judged attributes. By contrast, the objectivity of a variable like size, or average warmth, makes it more difficult for subjects to distort its value.
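As a rough worked calculation, the share of total belief variance attributable to preference follows from the figures above, assuming the approximate 10% figure applies equally to each attribute (an assumption made here for illustration; the text gives it only as an overall approximation):

```latex
\[
\begin{aligned}
\text{size:} \quad & (1 - 0.61) \times 0.10 \approx 0.039 \\
\text{spectator sports:} \quad & (1 - 0.60) \times 0.10 \approx 0.040 \\
\text{warmth:} \quad & (1 - 0.78) \times 0.10 \approx 0.022
\end{aligned}
\]
```

On these assumptions, preference accounts for roughly 2 to 4 percent of the total variance in judgments of these attributes.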

Thus, this measure of halo enables one to estimate halo in a way that makes conceptual sense and is simple computationally; for reasons of parsimony alone it is to be preferred to the Beckwith and Lehmann version.

The measure has two disadvantages. First, as operationalized, it can only be applied to attributes that have clearly defined physical counterparts. While the method could be used by simply substituting the average subjective belief for the objective measure, such a solution would lead to the problem of average belief contaminated by halo referred to earlier. A partial solution is to develop elaborate psychophysical transforms that predict the subjective attribute (say, good dating opportunities for a city) as a function of several objective attributes (say, the percent of population that is 18-25 years old and the number of nightclubs per capita).

The second problem is more serious but cuts across all measures of halo presented thus far. This is the problem of the direction of causality. There is an alternative explanation for the results we have presented which we shall call the "random experiences explanation." It will be shown that this accounts for the same results as the halo explanation.

Consider once again the person who likes warm cities. Suppose he erroneously considers a city, say St. Louis, to be warmer than it really is. Given that there is a strong positive affective value attached to warmth, St. Louis can be expected to have greater preference due to the misperceived warmth. Furthermore, the degree of preference bias will be approximately proportional to the degree of positive affective value attached to warmth. Thus, the random experiences and halo explanations account for the same experimental results. In the data provided so far, they are empirically equivalent. The difference between the two is in the direction of causality between beliefs and preference. The correlational methods used cannot determine causal direction. Moreover, it is likely that neither explanation is true to the exclusion of the other, but rather that causality runs in both directions to a greater or lesser extent depending on the subject and the attribute. The next section provides a way to begin to identify the predominant direction of causal flow for different attributes and subjects.


If the link between preference and perceptual error is primarily due to the translation of legitimate misperception of attributes into preference ratings, then one would expect this link to diminish as one becomes more familiar with the objects being judged. On the other hand, to the extent that the halo explanation is correct, selective perception should operate to keep the relationship between preference and perceptual error high even as one becomes more familiar with the stimuli.

In the present study, subjects were asked to specify level of familiarity with each city on a ten-point scale. This can be used to test the halo versus the random experiences explanations by estimating the following equation.

EQUATIONS  (2)   and  (3)
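A form of Equations (2) and (3) consistent with the surrounding discussion is sketched below. The coefficients g_{j1} and g_{j2} and the regressors A_i and FAM_i × A_i are named in the text; the intercept g_{j0}, the error term e_{ij}, and the symbol E_{ij} for the perceptual error of city i on attribute j are assumptions.

```latex
\[
E_{ij} = g_{j0} + \left( g_{j1} + g_{j2}\,\mathrm{FAM}_{i} \right) A_{i} + e_{ij} \qquad (2)
\]
\[
E_{ij} = g_{j0} + g_{j1}\,A_{i} + g_{j2}\left( \mathrm{FAM}_{i} \times A_{i} \right) + e_{ij} \qquad (3)
\]
```

In this form the effect of preference on perceptual error is g_{j1} + g_{j2} FAM_i, so a g_{j2} opposite in sign to g_{j1} means the effect shrinks as familiarity rises.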

What is expected under the random experiences explanation is that familiarity will reduce the correlation between preference and error. Thus gj1 and gj2 should have opposite signs. If halo is operant, familiarity should have no effect and the coefficient gj2 should be zero. Equation (2) can be estimated by changing its form to Equation (3) and simply using regression. This was done for each of the three attributes for the fourteen subjects in the study. Two-thirds of the cases tested did not have significant gj2 terms at a 0.20 level of significance, thus generally supporting the halo hypothesis. Of course, there were only 17 cities in the regression so the power of the test was not very strong. In addition, the correlation between Ai and FAMi × Ai results in multicollinearity, which further weakens the power of the test. Since, however, neither hypothesis is expected to be true to the exclusion of the other, the magnitude of the gj2 term can be used to segment subjects on the basis of relative magnitude of the halo effect.
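The regression step can be illustrated with a small sketch. It assumes the interaction form error = g0 + g1·A + g2·(FAM × A), which is one reading of the description above, and solves the normal equations directly. The helper `ols`, the variable names, and the noise-free synthetic data are all hypothetical; no significance test is performed, so this shows only how the coefficients would be obtained, not the test reported in the study.

```python
def ols(X, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix.
    aug = [xtx[i] + [xty[i]] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(k):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [u - f * v for u, v in zip(aug[r], aug[c])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


# Hypothetical data for one subject and one attribute across six cities.
preference = [1, 2, 3, 4, 5, 6]
familiarity = [2, 9, 4, 7, 1, 8]      # ten-point familiarity rating
true_g = [0.1, 0.8, -0.07]            # g0, g1, g2 used to build the data

# Synthetic perceptual error with no noise, so the fit recovers true_g.
error = [
    true_g[0] + true_g[1] * a + true_g[2] * f * a
    for a, f in zip(preference, familiarity)
]

X = [[1.0, a, f * a] for a, f in zip(preference, familiarity)]
g = ols(X, error)
print([round(v, 4) for v in g])

# Implied effect of preference on error at low and high familiarity.
effect_low = g[1] + g[2] * 1
effect_high = g[1] + g[2] * 10
print(round(effect_low, 3), round(effect_high, 3))
```

Here g1 and g2 have opposite signs, the pattern expected under the random experiences explanation: the implied effect of preference on error falls as familiarity rises. With real data of only 17 cities and correlated regressors, the estimates would be far less stable than in this noise-free illustration.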

Figure 1 provides the familiarity-adjusted halo level for subjects displaying low halo effects. Notice that for subject 1, preference is negatively correlated with misperceptions of average temperature, but that the degree of this negative correlation diminishes for those cities with which the subject is more familiar. This implies that as cities become more familiar, the perceptual bias due to preference will diminish. Had halo effect been the explanation for the relationship between preference and error, then the slope of the line would have been more nearly horizontal.

Subject number 2 shows a high positive correlation between preference and affective error. This results from a general preference for warm climates so that preferred cities are misperceived in the direction of being warmer. Once again, however, note that the strength of this bias decreases with increased familiarity. In both of these cases, one would hypothesize that these subjects would be receptive to new information about the warmth of the city and it would be reflected in their preference for cities.



By contrast, where the slope is not significant, it can be inferred that preferences are the primary cause of beliefs and that new information is not likely to reduce the bias due to preference. This can be inferred from the fact that for those attributes where gj2 is not significant, increasing familiarity does not reduce error.

In this study, approximately one-third of the attributes resulted in equations which contradicted the halo explanation at a .20 significance level. This implies that while halo is dominant in the majority of cases, there is a significant minority for which random experiences provides a better explanation. Therefore, it is important to make assessments of the relative importance of the two explanations on a case-by-case basis rather than attempting generalizations across all subjects or attributes. Seen in this way, the significance of gj2 represents a descriptive statistic that enables one to group subjects in terms of the degree of appropriateness of the halo explanation. The ability to measure the relative importance of halo has promise as a marketing tool. To the extent that a subject's perceptions are contaminated with halo, it is unlikely that further communication, contrary to this preference, is going to be accepted in an unmodified form. On the other hand, to the extent that marketers can identify those whose perceptual bias is reduced with greater familiarity, it is likely that those individuals will be good targets for marketing communications.


In contrast to other definitions of halo, the operationalization presented here is simple and logically follows from the idea of halo as perceptual bias due to preference. While the issue of the direction of causality between preference and beliefs is not completely resolved, the use of familiarity ratings provides a promising technique to identify those cases where the causal flow is predominantly in one direction or the other.

The study suggests several paths for future research. While in this model the physical value is defined as the "true" value of an attribute, other measures are possible. Conceptually, what is needed is that the "true" value approximate the halo free rating the individual would give provided enough time and information. For those variables that have clearly defined physical counterparts, the physical values serve this function well. As mentioned earlier, the average rating across subjects is likely to fail on both criteria; it will neither be halo free nor approximate the asymptotically true value for most individuals. To generalize the model, what could be done is to expand the psycho-physical transform to include several physical components that make up higher-order psychological ratings. For example, multiple regression could be used to predict ratings on sportiness in automobiles as a function of such measurable quantities as width-to-height ratio, cornering ability, as well as dummy variables such as rack-and-pinion steering and disk brakes. Such extended psychophysical transforms would be useful not only in measuring halo, but also in providing guidance to designers attempting to create a sporty car.

Familiarity proved to be a promising variable to discriminate between the halo and the random experiences explanations. In this case, a global familiarity rating on each city was used rather than separate familiarity ratings on each attribute. This oversimplification could result in problems to the extent that subjects are more familiar with one aspect of a city than another. Thus, the model could be improved by including a familiarity or degree of certainty estimate for each attribute.

Finally, experimentation is needed to determine if in fact those who appear to have less halo in their judgments are more susceptible to persuasive communications. If this turns out to be true, then the methodology provided here could become a valuable basis for segmenting subjects in terms of sensitivity to advertising or new information.


Neil Beckwith and Donald Lehmann, "The Importance of Halo Effects in Multi-Attribute Attitude Models," Journal of Marketing Research, 12(August, 1975), 265-275.

W. V. Bingham, "Halo, Invalid and Valid," Journal of Applied Psychology, 23(1939), 221-228.

Walter C. Borman, "Effects of Instructions to Avoid Halo Effect on Reliability and Validity of Performance Evaluation Ratings," Journal of Applied Psychology, 60(October, 1975), 556-560.

Eva Metzger Brown, "Influence of Training, Method, and Relationship on the Halo Effect," Journal of Applied Psychology, 52(June, 1968), 195-199.

J. P. Guilford, Psychometric Methods (New York: McGraw Hill, 1954).

Joel Huber and William James, "The Marginal Value of Physical Attributes: A Dollarmetric Approach," presented at the 7th American Marketing Association Attitude Conference, Hilton Head, S.C., February 1976.

Johny K. Johansson, D. J. MacLachlan and R. F. Yalch, "Halo Effects in Multi-Attribute Models: Some Unresolved Issues," Journal of Marketing Research, 8(February, 1976), 414-417.

T. J. Keaveny and A. F. McGann, "A Comparison of Behavioral Expectation Scales and Graphic Rating Scales," Journal of Applied Psychology, 60(December, 1975), 695-703.

Barbara Koltov, "Some Characteristics of Intrajudge Trait Intercorrelations," Psychological Monographs, No. 552 (1962).

E. A. Pessemier and R. D. Teach, "A Single Subject Scaling Model Using Judged Distances Between Pairs of Stimuli," Institute Paper No. 143, Krannert Graduate School of Industrial Admin., Purdue University (1966).

E. K. Taylor and R. Hastman, "Relation of Format and Administration to the Characteristics of Graphic Rating Scales," Personnel Psychology, 9(1956), 181-206.