The Determinants of Esthetic Value and Growth

Joel Huber, Duke University
Morris B. Holbrook, Columbia University
ABSTRACT - When 32 subjects gave similarity, preference, and attribute ratings on 15 recordings by jazz saxophonists, two primary perceptual dimensions emerged: speed and complexity. In information-theoretic terms, these dimensions are analogous to the rate of flow (units per second) and level (bits per unit) of information exposure. Average preference was negatively related to complexity, and change in preference was negatively related to both complexity and speed.
[ to cite ]:
Joel Huber and Morris B. Holbrook (1980), "The Determinants of Esthetic Value and Growth," in NA - Advances in Consumer Research Volume 07, eds. Jerry C. Olson, Ann Arbor, MI: Association for Consumer Research, Pages: 121-126.




[The authors gratefully acknowledge the support of Columbia University's Faculty Research Fund.]




This paper examines the determinants of esthetic value and growth for a particular kind of artistic object: jazz recordings. Though earlier studies of esthetic perception and preference have obtained ratings of well-known pieces of music (Wedin 1972), musical types (Nordenstreng 1968), rhythmic patterns (Gabrielsson 1973), or experimentally manipulated melodic fragments (Crozier 1974), this is believed to be the first that has investigated subjects' reactions to stylistic differences between real musical performances within the framework of established multidimensional scaling techniques.

The large number of listening comparisons involved in the present study suggested that evaluations would evolve over the course of the task. Accordingly, a model was formulated to represent each recording's evaluation by a mean value plus a time trend reflecting its change in preference from the beginning to the end of the experiment.

A further stage of the analysis related the individual preference and growth components to perceived attributes of the recordings. A reduced space was derived from a discriminant analysis of 18 rating scales for each stimulus. While there were large individual differences, the average subject began by preferring the simpler recordings and, through the course of the experiment, grew to like these selections even more. Surprisingly, this result did not vary with the subject's degree of knowledge about jazz or music in general. Tentatively, it appears that the information load imposed on subjects by the nature of the jazz material resulted in an unusually demanding situation where greater simplicity rather than complexity was generally desired.



Samples of recorded saxophone solos were selected to differ over relatively few controlled dimensions but otherwise to be highly similar. If these stimuli had been too dissimilar, the criteria for judging them might have unduly reflected stylistic subsets which could have frustrated attempts to model the overall evaluation process. Accordingly, the solos were all exemplars of post-bop mainstream styles. Each was based upon the major 12-bar blues form and consisted of three choruses from the middle of a longer improvisation so that the names of the tunes could not be identified. Accompaniment was provided in all cases by a conventional rhythm section of bass, drums, and piano or guitar. Thus, insofar as possible using examples of real recorded jazz, musical form and instrumentation were held constant among recordings.



The 15 stimuli are described more specifically in Table 1. Notice that an attempt was made to balance type of saxophone (alto versus tenor) and stylistic orientation (East versus West Coast). (Generally, the East Coast school has a rougher and more dissonant sound compared with the smoother and more consonant West Coast approach.) As further indicated by Table 1, key, tempo, and recording date were also measured, but not experimentally balanced.


The task consumed about four hours and required each subject to listen to all pairs of recordings and, for each pair, to indicate (1) the degree of perceived similarity (6-point scale), (2) the one preferred, and (3) the strength of that preference (6-point scale). To reduce the effects of fatigue, these paired comparisons were broken into two sessions conducted on different days. Following the second day's judgments on pairs, subjects were also asked to rate each recording on the 18 attributes shown in Figure 1. These 18 attributes had been distilled from a larger set of 93 used in an earlier test of similar stimuli (Holbrook and Huber 1979). The attributes retained were those that (1) accounted for a significant proportion of variance among recordings and (2) were not included in the general evaluative dimension. Thus, they were chosen both to span the perceptual space and to be relatively affect free.
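As a rough check on the size of the task, the number of comparisons and the average time available per pair can be worked out as below. The sketch assumes that each unordered pair of the 15 recordings was presented exactly once (the text says only that subjects heard "all pairs"); the variable names are illustrative.

```python
from itertools import combinations

N_RECORDINGS = 15   # from the text
TASK_HOURS = 4      # "about four hours", from the text

# Assumption: each unordered pair is judged once (no repeats, no self-pairs).
pairs = list(combinations(range(1, N_RECORDINGS + 1), 2))
n_pairs = len(pairs)

# Average time budget per pair, ignoring the attribute ratings at the end.
minutes_per_pair = TASK_HOURS * 60 / n_pairs

print(n_pairs)                     # 105
print(round(minutes_per_pair, 1))  # 2.3
```

Under that assumption, each comparison had a little over two minutes on average, which is consistent with splitting the task across two sessions.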


Subjects were recruited through sign-up sheets on the bulletin boards of the business and music departments at a major university. On those sheets, applicants indicated (1) the number of jazz recordings listened to in a typical month and (2) proficiency in playing a musical instrument. On the basis of these criteria, four groups of eight subjects each were selected so as to represent the four cross-classified combinations of jazz familiarity and musical literacy: high jazz/high music, high jazz/low music, low jazz/high music, and low jazz/low music. For their participation, each subject received a phono album chosen from among the artists that comprised the study. This relatively small reward had the effect of assuring that all subjects were internally motivated to listen to the recordings and thus were likely to attend to and learn from the experience. It was expected that there would be substantial differences among the four groups in their perceptual and affective responses to the jazz stimuli. As will be shown, however, these expectations proved largely unfulfilled.


The Measurement of Esthetic Perception

The positions of the recordings in perceptual space were investigated using the 18 attributes shown in Figure 1 in an analysis comparable to the procedure outlined by Holbrook and Huber (1979). First, a principal components analysis was run on the attribute-by-object-and-subject matrix (i.e., correlations between attributes were computed across both objects and subjects). The five summative indices that emerged from this factor analysis were: (1) activity = fast + active + busy; (2) harmoniousness = in-tune + consonant + well-recorded + light; (3) age = traditional + old; (4) complexity = unpredictable + shifting + improvised + complex + changeable + random; and (5) masculinity = masculine. These indices were then used to predict the identity of the 15 recordings in a multiple discriminant analysis, with data again pooled across subjects. The first two discriminant functions accounted for 92% of the variance in the indices and were therefore chosen as the basis for a 2-dimensional perceptual space.
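The summative indices can be sketched directly from the definitions above. The snippet below is a minimal illustration, assuming that each set of ratings is stored in a dictionary keyed by attribute name and that the indices are unweighted sums. The sample ratings and the rating scale are invented for the example, and the name `summative_indices` is hypothetical. The discriminant analysis itself is not shown.

```python
# Index definitions as listed in the text (attribute names as given there).
INDEX_DEFINITIONS = {
    "activity": ["fast", "active", "busy"],
    "harmoniousness": ["in-tune", "consonant", "well-recorded", "light"],
    "age": ["traditional", "old"],
    "complexity": ["unpredictable", "shifting", "improvised",
                   "complex", "changeable", "random"],
    "masculinity": ["masculine"],
}


def summative_indices(ratings):
    """Return each index as the unweighted sum of its attribute ratings."""
    return {name: sum(ratings[attr] for attr in attrs)
            for name, attrs in INDEX_DEFINITIONS.items()}


# Invented ratings by one subject for one recording (scale is an assumption).
example_ratings = {
    "fast": 5, "active": 4, "busy": 4,
    "in-tune": 3, "consonant": 2, "well-recorded": 4, "light": 2,
    "traditional": 2, "old": 1,
    "unpredictable": 5, "shifting": 4, "improvised": 5,
    "complex": 4, "changeable": 3, "random": 2,
    "masculine": 4,
}

indices = summative_indices(example_ratings)
print(indices["activity"], indices["complexity"])  # 13 23
```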



The justification for this methodology is detailed by Holbrook and Huber (1979) and by Huber and Holbrook (1979). Briefly, the discriminant analysis orients the space such that there is maximum relative agreement on the indices of the same recordings by different subjects and maximal difference between the average indices of different recordings. In other words, the MDA space is that which best allows one to predict or discriminate among recordings given knowledge of their indices. The indices, being based on the principal components analysis, serve to combine highly collinear variables into a smaller set of measures that are, at most, weakly correlated. Experience has shown that this preliminary reduction greatly increases the stability of the emerging MDA space.
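The criterion described here can be illustrated for a single index. The sketch below computes the ratio of the between-recording mean square to the within-recording mean square, which is the quantity a discriminant function maximizes over linear combinations of the indices. It is a one-variable illustration with invented scores, not the multivariate analysis reported here, and `variance_ratio` is a hypothetical helper.

```python
from statistics import mean


def variance_ratio(scores_by_recording):
    """Between-recording mean square divided by within-recording mean square."""
    groups = list(scores_by_recording.values())
    all_scores = [x for g in groups for x in g]
    grand = mean(all_scores)
    # Between: spread of the recording means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # Within: disagreement among subjects rating the same recording.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Invented index scores from three subjects on three recordings.
agreeing = {"A": [2, 3, 2], "B": [6, 7, 6], "C": [10, 9, 10]}
noisy = {"A": [2, 9, 5], "B": [6, 1, 8], "C": [10, 4, 3]}

print(variance_ratio(agreeing) > variance_ratio(noisy))  # True
```

An index on which subjects agree about each recording while the recordings differ from one another yields a large ratio and therefore discriminates well.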

Separate discriminant spaces were built for the four subject groups differing in knowledge of jazz and ability to play an instrument. Such spaces, however, failed to differ meaningfully across populations. It had been hypothesized, for example, that more experienced subjects would exhibit spaces of higher dimensionality and would discriminate more cleanly between stylistic schools of jazz. They did not. This apparent homogeneity of perception across groups was further corroborated by an INDSCAL (Carroll and Chang 1970) analysis performed on the 6-point similarity measures. There were no significant differences in the idiosyncratic subject weights across the four groups. Thus, it is safe to use the aggregate MDA space based upon ratings provided by the full set of 32 subjects.

The aggregate MDA solution appears in Figure 2 where attributes have been projected into the space to help identify its axes. This projection of attributes was performed by regressing each set of attribute means across the 15 musicians on their axis coordinates (cf. Carroll 1972). The orientation of each arrow indicates the direction of the greatest explained variability in that attribute within the space. The length of the arrow from the origin is proportional to the strength of that relation, as measured by R². Thus, for example, the horizontal dimension is virtually collinear with the objective variable "beats per minute" as presented in Table 1. The perceptual attributes "busy," "active," and "fast" are also directed along the horizontal dimension, indicating that this axis represents the speed of the recording, measured both physically and perceptually. Since these performances are couched primarily in eighth notes (i.e., two notes per beat), "speed" refers to beats per minute rather than to notes per beat. In information-processing terms, this dimension can therefore be taken to represent units per second in that it reflects the rate of exposure of eighth-note clusters.
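The projection step can be sketched as an ordinary least-squares regression of attribute means on the two axis coordinates. In the minimal example below, the coordinates and attribute means are invented, and `fit_attribute_vector` is a hypothetical name. The arrow direction is taken from the fitted coefficients and its length from R², following the description above. The invented attribute rises exactly along the horizontal axis, so the fitted arrow points at 0 degrees with R² equal to 1.

```python
import math


def fit_attribute_vector(xs, ys, values):
    """Regress attribute means on two axis coordinates.

    Returns (direction_degrees, r_squared), where the direction is the
    angle of the coefficient vector measured from the horizontal axis.
    """
    n = len(values)
    mx, my, mv = sum(xs) / n, sum(ys) / n, sum(values) / n
    dx = [x - mx for x in xs]
    dy = [y - my for y in ys]
    dv = [v - mv for v in values]
    # Sums of squares and cross-products of the centered variables.
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    sxy = sum(a * b for a, b in zip(dx, dy))
    sxv = sum(a * c for a, c in zip(dx, dv))
    syv = sum(b * c for b, c in zip(dy, dv))
    # Solve the 2x2 normal equations for the two slopes.
    det = sxx * syy - sxy * sxy
    bx = (sxv * syy - syv * sxy) / det
    by = (syv * sxx - sxv * sxy) / det
    # R^2 is the explained variation divided by the total variation.
    ss_total = sum(c * c for c in dv)
    r_squared = (bx * sxv + by * syv) / ss_total
    direction = math.degrees(math.atan2(by, bx))
    return direction, r_squared


# Invented coordinates for five stimuli and an invented attribute mean
# that depends only on the horizontal coordinate.
xs = [-2, -1, 0, 1, 2]
ys = [1, -1, 0, -1, 1]
attribute_means = [3 + 2 * x for x in xs]

direction, r2 = fit_attribute_vector(xs, ys, attribute_means)
print(round(direction, 1), round(r2, 3))  # 0.0 1.0
```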

The vertical dimension, by contrast, appears to represent musical complexity or, to pursue the analogy with information theory, bits per unit. Thus, attributes that facilitate processing--consonant, in-tune, West Coast, old, traditional--are projected upward in the space while those placing a strain on processing capacity--random, changeable, complex, unpredictable, shifting--are projected downward.



In summary, then, the horizontal dimension clearly represents the rate of speed at which musical units (notes, beats, or measures) are being presented in a recording. The vertical dimension reflects the complexity or number of bits per unit. While this analogy with information theory is suggestive and receives support from the psychological literature on esthetics (Berlyne 1971; Garner 1962; Meyer 1957; Moles 1966; Munsinger and Kessen 1964), it should be stressed that no attempt has yet been made to measure information load in naturally occurring musical excerpts of the type used here. Though such measures have been attempted in the cases of paintings (e.g., Nicki and Moss 1975) and English prose (e.g., Holbrook 1978), their difficulties of application to real music appear to be virtually insuperable. However, since scales of subjective complexity have been found to relate strongly to measurable information in artificially constructed visual and auditory stimuli (Berlyne 1971, pp. 198 ff.), there is reason to expect the same correspondence to hold for real musical performances. In this light, it is encouraging that the perceptual space can be meaningfully interpreted in terms of information-processing demands. Into this space, the preference vectors for individuals and segments were projected in the next stage of the analysis.

Measurement of Individual Value and Growth Vectors

Given paired-comparison preference ratings of the type described earlier, there are fairly standard ways (Scheffé 1952; Bechtel and O'Connor 1979) to estimate individual utility scales such that the difference between items on a one-dimensional measure of utility approximates, in a least-squares sense, the original preference differences. By using the general linear model and appropriately chosen signed-dummy variables, moreover, it is possible to estimate both an average utility and a linear time trend for each stimulus. Such an estimation model may be written as
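The displayed equation is missing from this copy. One form consistent with the description that follows is given below as a reconstruction, not as the authors' exact notation. The symbols B, U_k, T_k, d_k, and t come from the text; the response symbol P_t, the error term e_t, and the sign convention for d_k are assumptions.

```latex
P_t = B + \sum_{k=1}^{15} U_k \, d_{kt} + \sum_{k=1}^{15} T_k \,(t - \bar{t})\, d_{kt} + e_t \qquad (1)
```

Here P_t is the signed strength of preference on trial t, and d_{kt} is assumed to equal +1 if stimulus k is the first member of the pair presented on trial t, -1 if it is the second, and 0 otherwise. For a pair (i, j) the systematic part then reduces to B + (U_i - U_j) + (T_i - T_j)(t - \bar{t}).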


Using this formulation, least-squares approximations of the parameters can be estimated with common regression packages by defining a dummy variable (d_k) for each stimulus and a time × stimulus variable ((t - t̄)d_k, with time centered at its mean t̄) for each growth trend. The constant term (B) serves to estimate order bias while the regression coefficients (U_k and T_k) estimate the value and growth parameters for stimulus k. Since the complete set of dummy variables is linearly dependent, the full matrix of predictor variables is singular and cannot be inverted. Accordingly, one stimulus is dropped from the estimation procedure. Its value and growth parameters are set arbitrarily at zero to serve as reference points for scaling the utilities and trends of the other stimuli.
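The dummy-variable estimation described above can be sketched on a toy problem. The example below uses three invented stimuli instead of 15, drops one as the reference, and recovers known parameters from noiseless invented responses by ordinary least squares. The sign convention (a positive response favors the first stimulus of a pair), the presentation order, and all function names are assumptions made for illustration.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def least_squares(rows, y):
    """Ordinary least squares via the normal equations X'X b = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


STIMULI = ["A", "B", "C"]   # toy example; the study used 15 recordings
REFERENCE = "C"             # dropped stimulus: its U and T are fixed at zero
ESTIMATED = [s for s in STIMULI if s != REFERENCE]


def design_row(first, second, t, t_bar):
    """Constant, signed dummies d_k, and time x stimulus terms (t - t_bar) d_k."""
    d = [1 if s == first else -1 if s == second else 0 for s in ESTIMATED]
    return [1.0] + d + [(t - t_bar) * dk for dk in d]


# Invented presentation order and noiseless responses from known parameters.
order = [("A", "B"), ("B", "C"), ("C", "A"),
         ("B", "A"), ("C", "B"), ("A", "C")] * 2
t_bar = (len(order) - 1) / 2
true_b = 0.5
true_u = {"A": 2.0, "B": 1.0, "C": 0.0}
true_t = {"A": 0.1, "B": -0.2, "C": 0.0}

rows, y = [], []
for t, (first, second) in enumerate(order):
    rows.append(design_row(first, second, t, t_bar))
    y.append(true_b + (true_u[first] - true_u[second])
             + (true_t[first] - true_t[second]) * (t - t_bar))

b, u_a, u_b, t_a, t_b = least_squares(rows, y)
# Recovers 0.5, 2.0, 1.0, 0.1, -0.2 up to rounding.
print([round(v, 3) for v in (b, u_a, u_b, t_a, t_b)])
```

Dropping the reference stimulus removes the linear dependence among the dummy columns, so the normal equations have a unique solution. Including all three stimuli would leave the system singular.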

Equation 1 was estimated independently for each subject. Across individuals, approximately 64% of the variance was accounted for by the model--about 2% due to the bias terms, about 9% due to the trend parameters, and the remaining 53% due to the average utility values. The low variance accounted for by the trend parameters resulted in only 11% of these coefficients reaching statistical significance at the 0.05 level. Thus, in spite of the time-consuming nature of the experiment, there were only small changes in individual preferences from its beginning to its end. Apparently, tastes built up over a lifetime of listening to music were firmly enough entrenched to resist change over even a 4-hour period of intensive exposure to unfamiliar recordings. As a result, individual trends in growth of esthetic value were relatively weak.

In sum, analysis of the graded paired comparisons for each subject produced measures of average esthetic value and growth for the 15 recordings. The next stage of the analysis related these individual affective measures to the group perceptual space presented earlier.

The Determinants of Esthetic Value and Growth

Estimation of the model shown in Equation 1 generated 1 x 15 vectors of value parameters and growth coefficients for each subject. Additionally, mean vectors were computed for each of the four subgroups. To determine the effects of esthetic perceptions on individual and group preferences, these value and growth vectors were projected into the 2-dimensional MDA space shown in Figure 2. Using an approach comparable to that discussed earlier, this projection was accomplished by regressing value or growth coefficients on the speed and simplicity coordinates across the 15 saxophonists. To check for nonlinearities, squared coordinates were also included in separate regressions (cf. Carroll 1972). However, since only 6 of 64 such nonlinear terms were statistically significant at the 0.05 level, it appears reasonable to summarize the results with a linear vector model.



Figure 3 indicates the positions of the group value vectors in the perceptual space. All four groups tended to prefer the simpler, less dissonant recordings. It had been hypothesized that the groups with greater knowledge of jazz or music in general would be more oriented toward the complex stimuli. However, with the exception of the high jazz/low music group (which tended to prefer greater speed), no such contrast between groups was found.

Furthermore, as is indicated in Figure 4, individual vectors showed far more variance within than between subgroups. Figure 4 gives projections for only those 18 subjects whose regression equations were statistically significant at the 0.05 level, but the same pattern remained even if the less significant respondents were included. That is, while there was a general preference for simpler and more consonant recordings, there were large individual differences that suggested no clear pattern with respect to jazz knowledge or musical training.



For individuals, an average of only 1.6 of 15 preference shifts was statistically significant at the 0.05 level. It followed that virtually none of the individual growth vectors resulted in statistically significant projections in the perceptual space. At the group level, however, somewhat more consistent trends were apparent.

Figure 5 plots the average change vectors for the four segments and the positions of the average value vectors for the first and second halves of the experiment. With the exception of the low jazz/low music segment, all group preferences shifted to the left, indicating that growth in preference generally favored slower music. Notice, however, that though the change vectors were significant at the 0.10 level, shifts in the value vectors between the two time periods were fairly small and fell within normal statistical limits. Thus, for three of the four groups, the directions of growth in esthetic value were consistent but weak.


This predominant consistency across respondent groups suggests that the average respondent tended to prefer improvisations that required less processing per musical unit. Furthermore, over time, those that presented units at a slower rate--and thereby placed smaller demands on the speed of cognitive processes--came to be liked even better. While such a strain toward simplicity might be expected in the context of difficult problem solving, it had not been anticipated in the case of esthetic appreciation. Indeed, one of the present study's motivating hypotheses had been that more musical training, jazz knowledge, or listening experience would lead to greater ability to cope with complexity and, consequently, stronger preferences for more challenging works.



Perhaps this otherwise puzzling result can be explained in terms of the curvilinear relationship between complexity and stimulus value proposed by Berlyne (1971) and shown in Figure 6. Clearly, whether one obtains a positive, negative, or nonmonotonic relationship between stimulus complexity and esthetic value depends upon which segment of the curve is in effect. It had been assumed implicitly that naive subjects would be operating along the right-hand side of the curve--but that general musical training, knowledge of jazz, or experience with the specific stimuli would cause a shift to the left. Instead, it appears that all subjects operated in the right-hand side throughout the course of the experiment. Apparently, then, the set of stimuli was generally too complex to permit the anticipated group and learning effects to make themselves felt. In retrospect, it does seem that the tapes were higher in uncertainty than conventional recorded jazz due to the manner in which three-chorus excerpts were extracted from the middle of recorded solos. This procedure had the advantage of preventing recognition of specific blues tunes (with possible attendant cues concerning the identity of the artist), but it had the concomitant drawback of causing musical passages to begin abruptly and end inconclusively. It might therefore be conjectured that this extra component of complexity tended to push all four groups beyond the optimal point "B." If this were the case, the strain toward the more simple and slow pieces would have been predicted by Berlyne's paradigm.




The present paper has examined the determinants of esthetic value and growth for a set of jazz recordings. Two general surprises have emerged. The first involved the comparability of responses between groups differing in their knowledge of jazz and ability to play an instrument. Here, preference structures had been expected to depend on jazz familiarity and/or past musical training. To the contrary, however, the variance within groups was far greater than that among groups. The second surprise involved the positioning of value and growth vectors within the perceptual space. All of the group preference vectors had high loadings on the simplicity dimension. Further, the effect of repetition, rather than encouraging the liking of more challenging selections, appeared (for all but one group) to favor those that placed smaller demands on information processing.

A possible explanation has been offered of both surprises in accord with Berlyne's (1971) proposed nonmonotonic relationship. Such explanations must be regarded as speculative, however, until validated by studies wherein stimulus complexity is systematically varied over a range broad enough to test the full hypothesis of nonmonotonicity. In the present case, for example, saxophone solos by certain popular rock-and-roll musicians might be used to anchor the simplistic end of the scale while improvisations by some of the more free-form avant-garde artists might push complexity to the point of aversion at the opposite end of the continuum. If such controlled variations can be used to flesh out the Berlynian curve, some progress may be made toward determining the level and rate of exposure of musical information that are optimal in conferring esthetic value and encouraging growth in preference.


Bechtel, Gordon G. and O'Connor, P.J. (1979), "Testing Micropreference Structures," Journal of Marketing Research, 16, 247-57.

Berlyne, D. E. (1971), Aesthetics and Psychobiology, New York: Appleton-Century-Crofts.

Carroll, J. Douglas (1972), "Individual Differences and Multidimensional Scaling," in Multidimensional Scaling: Theory and Applications in the Behavioral Sciences, ed. Roger N. Shepard, A. Kimball Romney, and Sara Beth Nerlove, New York: Seminar Press.

Carroll, J. D. and Chang, J.J. (1970), "Analysis of Individual Differences in Multidimensional Scaling via an N-way Generalization of 'Eckart-Young' Decomposition," Psychometrika, 35, 283-319.

Crozier, J.B. (1974), "Verbal and Exploratory Responses to Sound Sequences Varying in Uncertainty Level," in Studies in the New Experimental Aesthetics, ed. D. E. Berlyne, New York: John Wiley.

Gabrielsson, Alf (1973), "Similarity Ratings and Dimension Analyses of Auditory Rhythm Patterns. II," Scandinavian Journal of Psychology, 14, 161-76.

Garner, Wendell R. (1962), Uncertainty and Structure as Psychological Concepts, New York: John Wiley.

Holbrook, Morris B. (1978), "Effect of Subjective Verbal Uncertainty on Perception of Typographical Errors in a Proofreading Task," Perceptual and Motor Skills, 47, 243-50.

Holbrook, Morris B. and Huber, Joel (1979), "Separating Perceptual Dimensions from Affective Overtones: An Application to Consumer Aesthetics," Journal of Consumer Research, 5, 272-83.

Huber, Joel and Holbrook, Morris B. (1979), "Using Attribute Ratings for Product Positioning: Some Distinctions Among Compositional Approaches," Journal of Marketing Research, 16.

Meyer, L. B. (1957), "Meaning in Music and Information Theory," Journal of Aesthetics and Art Criticism, 15, 412-24.

Moles, A. (1966), Information Theory and Esthetic Perception, Urbana, Ill.: University of Illinois Press.

Munsinger, H.L. and Kessen, W. (1964), "Uncertainty, Structure and Preference," Psychological Monographs, 78(9).

Nicki, R.M. and Moss, Virginia (1975), "Preference for Non-Representational Art as a Function of Various Measures of Complexity," Canadian Journal of Psychology, 29, 237-49.

Nordenstreng, Kaarle (1968), "A Comparison Between the Semantic Differential and Similarity Analysis in the Measurement of Musical Experience," Scandinavian Journal of Psychology, 9, 89-96.

Scheffé, H. (1952), "An Analysis of Variance for Paired Comparisons," Journal of the American Statistical Association, 47, 381-400.

Wedin, Lage (1972), "A Multidimensional Study of Perceptual-Emotional Qualities in Music," Scandinavian Journal of Psychology, 13, 241-57.