# Using Multivariate Nominal Scale Analysis to Identify Demand Segments For Interracial Housing


Kenneth L. Bernhardt and Thomas C. Kinnear (1974), "Using Multivariate Nominal Scale Analysis to Identify Demand Segments For Interracial Housing", in NA - Advances in Consumer Research Volume 01, eds. Scott Ward and Peter Wright, Ann Arbor, MI: Association for Consumer Research, Pages: 201-217.

[The authors wish to thank Frank Andrews, Robert Messenger and Laura Klem, all of the Institute for Social Research, the University of Michigan, and C. Merle Crawford, Graduate School of Business Administration, University of Michigan for their most generous assistance. This study was partially funded by grants from Levitt Building Systems, Incorporated, Stirling Homex Corporation and the Industrial Development Division of the Institute of Science and Technology at the University of Michigan.]

[Kenneth L. Bernhardt is Assistant Professor of Marketing, Georgia State University. Thomas C. Kinnear is Assistant Professor of Business Administration, University of Western Ontario.]

INTRODUCTION

Many analysis techniques have been utilized in segmentation research. Regression, MCA, AID, cluster analysis, factor analysis, discriminant analysis, canonical correlation and multidimensional scaling have all been utilized as procedures to identify segments or to identify the characteristics of segments. Frank, Massy and Wind (1972) present a scheme to indicate where most of these procedures might properly be utilized.

The purpose of this paper is to present a new segmentation research procedure, in the context of an important social marketing problem. The new procedure is Multivariate Nominal Scale Analysis (MNA). MNA uses nominally defined independent variables to predict a nominally defined dependent variable. It can, therefore, be used to identify the characteristics of nominally defined segments using predictors having the weakest scale level assumption.

The social marketing problem to be examined relates to the marketing of interracial housing. It is a well-documented finding that persuasive communications are more effective when directed to receivers who already hold attitudes consistent with the content of the communication (Berelson and Steiner, 1964). The research question here is: can those holding positive attitudes about the concept of interracial housing be identified in terms of their personal characteristics? If this segment can be identified, communications efforts can be more effectively directed to get these individuals to act upon their convictions. Achievement of the goal of wide-spread integrated housing will depend on the actions of those who now support the idea. The intent of this paper is to identify some of the personal characteristics of this important segment through the use of the MNA technique.

METHOD OF ANALYSIS

Multivariate Nominal Scale Analysis (MNA)

MNA is a new data analysis technique developed by Dr. Frank M. Andrews and Dr. Robert C. Messenger at the University of Michigan's Institute for Social Research (Andrews and Messenger, 1973). Essentially, it is an extension of the Multiple Classification Analysis (MCA) program (Andrews, Morgan and Sonquist, 1969) that has been utilized in a number of marketing studies (Newman and Staelin 1971 and 1972; Peters, 1970). MCA accepts nominally scaled independent variables and assumes an intervally scaled dependent variable. MNA accepts both nominal independent and dependent variables, in the context of an additive model. The ability to predict a nominally-defined variable using nominally-defined independent variables constitutes a significant methodological advance in data analysis. In their classification schemes for multivariate data analysis methods, Sheth (1971) and Kinnear and Taylor (1971) noted the absence of any techniques to easily accomplish this type of analysis. Before MNA, a nominally-defined dependent variable had to be dichotomized and the analysis performed with MCA or dummy variable multiple regression (1971). If the dependent variable had more than two categories, it was possible to use dummy variable discriminant analysis (DVDA). However, MNA has significant input and output advantages over DVDA. For DVDA the user must create his own dummy independent variables for input. Further, the output from MNA is much more readable. Since many consumer behavior dependent and independent variables are at a nominal level, the need for a procedure like MNA is well established.

Because MNA is new, a detailed description of its procedures will be undertaken. Andrews and Messenger (1973) describe MNA as being based on the principle of repeated application of least squares dummy variable regression (Suits, 1957). Specifically, the set of original predictor variables (X1, X2, ..., Xp) is transformed into a set of dummy predictor variables (x1, x2, ..., xc1, ..., xr) by treating every nonempty code of each predictor as a new dummy variable and by assigning a value of 1 when the code appears and 0 when it does not appear.

The resulting data set of dummy predictors has one linear dependency for each set of dummy predictors associated with an original predictor. These dependencies yield a singular matrix, which would prevent proper least squares estimation from being carried out. Therefore, the linear dependencies must be eliminated by omitting one dummy predictor from each set. This procedure yields a set of r = c - p independent dummyized predictors, where c = the total number of categories in the independent variables and p = the number of predictors.
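The dummy coding step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `dummy_code`, the two predictors and their codes are hypothetical, and the choice of which dummy to omit (the first code in sorted order) is arbitrary rather than taken from the MNA program.

```python
import numpy as np

def dummy_code(columns, drop_first=True):
    """Turn nominal predictor columns into a 0/1 dummy matrix.

    columns: list of equal-length lists of category codes, one per predictor.
    Returns (matrix, labels), where labels[i] = (predictor_index, code).
    """
    blocks, labels = [], []
    for j, col in enumerate(columns):
        codes = sorted(set(col))      # only non-empty codes can appear here
        if drop_first:
            codes = codes[1:]         # omit one dummy to remove the dependency
        for code in codes:
            blocks.append([1.0 if v == code else 0.0 for v in col])
            labels.append((j, code))
    return np.array(blocks).T, labels

# Hypothetical data: two predictors with 3 and 2 codes, so c = 5 and p = 2.
education = ["hs", "college", "grad", "hs", "college", "grad"]
wife_works = ["yes", "no", "no", "yes", "yes", "no"]

full, _ = dummy_code([education, wife_works], drop_first=False)
reduced, labels = dummy_code([education, wife_works])

# Add a constant column and compare column counts with matrix ranks.
ones = np.ones((len(education), 1))
rank_full = np.linalg.matrix_rank(np.hstack([ones, full]))
rank_reduced = np.linalg.matrix_rank(np.hstack([ones, reduced]))
print(full.shape[1], reduced.shape[1], rank_full, rank_reduced)
```

With all c = 5 dummies plus a constant, the six columns have rank 4 only, which is the singularity the text refers to. After one dummy per predictor is omitted, the r = c - p = 3 remaining dummies plus the constant have full column rank.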

The dependent variable is also dummyized to form a set of G dummy dependent variables, where G is the number of non-empty dependent variable codes. Then, the set of r dummyized predictors is applied successively to the complete set of G dummy dependent variables, using the criterion of minimizing the error sum of squares, which is the least squares criterion, given by:

$$ESS_l = \sum_k w_k (y_{kl} - \hat{y}_{kl})^2 \qquad (l = 1, 2, \ldots, G)$$

where

$ESS_l$ = error sum of squares for the lth dummy dependent variable,

$w_k$ = individual k's weight,

$y_{kl}$ = individual k's score on the lth dummy dependent variable,

$\hat{y}_{kl}$ = individual k's predicted score for the lth dummy dependent variable

and where

$$\hat{y}_{kl} = B_{l0} + B_{l1} x_{k1} + B_{l2} x_{k2} + \cdots + B_{lr} x_{kr} \qquad (l = 1, 2, \ldots, G)$$

here,

$x_{km}$ = the mth dummy predictor score for the kth individual

and $B$ = the regression coefficients.

Partial derivatives of the ESS's with respect to the B coefficients are then calculated. These partials are then set to zero, yielding the G normal equation sets (Cooley and Lohnes, 1971).

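In mathematical notation, setting

$$\frac{\partial ESS_l}{\partial B_{lm}} = 0 \qquad (m = 0, 1, \ldots, r;\ l = 1, 2, \ldots, G)$$

yields the relevant normal equations.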
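A minimal sketch of this estimation step follows, assuming NumPy's general least squares solver stands in for whatever routine the MNA program uses. The function name `fit_dummy_regressions` and the small data set are hypothetical. Weighted least squares is obtained by scaling each row by the square root of its weight, which minimizes the same weighted error sum of squares.

```python
import numpy as np

def fit_dummy_regressions(X, y_codes, weights=None):
    """Fit one weighted least squares regression per dependent variable code.

    X: (n, r) matrix of dummy predictors, one dummy omitted per predictor.
    y_codes: length-n sequence of nominal codes for the dependent variable.
    Returns (categories, B, forecasts); B has shape (r + 1, G).
    """
    y_codes = np.asarray(y_codes)
    n = len(y_codes)
    w = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    categories = np.unique(y_codes)
    # Dummyize the dependent variable: one 0/1 column per non-empty code.
    Y = (y_codes[:, None] == categories[None, :]).astype(float)
    design = np.column_stack([np.ones(n), X])   # constant term plus r dummies
    sw = np.sqrt(w)[:, None]
    # One call solves all G regressions, one per column of Y.
    B, *_ = np.linalg.lstsq(design * sw, Y * sw, rcond=None)
    return categories, B, design @ B

# Hypothetical data: r = 2 dummy predictors, G = 3 dependent variable codes.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
              [0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = ["agree", "agree", "neutral", "disagree",
     "agree", "neutral", "disagree", "disagree"]
categories, B, forecasts = fit_dummy_regressions(X, y)
print(categories)
print(np.round(forecasts, 3))
```

Because a constant term is included and the G dummy dependent variables sum to one for every individual, the G forecasts for each individual also sum to one, although individual forecasts are not constrained to lie between zero and one.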

Solution of these G equation sets gives the B values for the predictive equations and a set of forecasts of individual scores $\{\hat{y}_{k1}, \hat{y}_{k2}, \ldots, \hat{y}_{kG}\}$. This solution yields values expressed as deviations from the one dummy predictor that was omitted from each set. It is possible to present the predictive equations in a more easily understood form, while at the same time assigning values to the previously omitted codes. MNA does this by transforming the results to a form where coefficients are expressed as deviations from the mean of the lth dependent variable. Here,

$$\hat{y}_l = \bar{y}_l + A_{l1} x_1 + A_{l2} x_2 + \cdots + A_{lc} x_c \qquad (l = 1, 2, \ldots, G)$$

where

$\bar{y}_l$ = the mean of the lth dependent variable

and $A_{lm}$ = the mth transformed dummy predictor regression coefficient for the lth dummy dependent variable.

The $A_{lm}$'s are expressed as deviations from the grand means $\{\bar{y}_1, \bar{y}_2, \ldots, \bar{y}_G\}$. This system yields forecasts that are identical to those of the previous system for all individuals and has coefficients attached to all categories of all independent variables.
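The equivalence of the two systems can be checked numerically. The sketch below assumes unit weights and assumes the usual MCA-style centering rule, in which each predictor's coefficients are shifted so that they average to zero over the sample. That rule is not spelled out in the text, and the data are hypothetical.

```python
import numpy as np

# Hypothetical data: two nominal predictors and one dummy dependent variable.
x1 = np.array([0, 0, 1, 1, 2, 2, 0, 1, 2, 0])
x2 = np.array([0, 1, 0, 1, 0, 1, 1, 0, 0, 1])
y = np.array([1, 0, 1, 1, 0, 0, 1, 0, 0, 1], dtype=float)
predictors = [x1, x2]
n = len(y)

# Reference-coded fit: the lowest code of each predictor is the omitted dummy.
columns, owners = [np.ones(n)], []
for j, x in enumerate(predictors):
    for code in np.unique(x)[1:]:
        columns.append((x == code).astype(float))
        owners.append((j, code))
design = np.column_stack(columns)
b, *_ = np.linalg.lstsq(design, y, rcond=None)

# A B coefficient for every code, with 0 for the omitted one.
B = [{code: 0.0 for code in np.unique(x)} for x in predictors]
for (j, code), value in zip(owners, b[1:]):
    B[j][code] = value

# Assumed centering rule: subtract each predictor's sample-mean coefficient,
# so that the A coefficients of a predictor average to zero over the sample.
A = []
for j, x in enumerate(predictors):
    mean_b = np.mean([B[j][v] for v in x])
    A.append({code: B[j][code] - mean_b for code in B[j]})

old_forecast = design @ b
new_forecast = y.mean() + sum(
    np.array([A[j][v] for v in x]) for j, x in enumerate(predictors)
)
print(np.allclose(old_forecast, new_forecast))
```

Every code now carries a coefficient, including the ones that were omitted from the regression, and the forecasts are unchanged because the shift applied to each predictor's coefficients is absorbed by replacing the constant term with the grand mean.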

Statistics Generated by MNA

MNA generates both bivariate and multivariate statistics. Two bivariate statistics are produced to measure the strength of the relationship between the dependent variable and each predictor. The first is the one-way analysis of variance eta-squared statistic which is calculated for each dummy dependent variable and then summarized into a generalized eta-squared. Eta-squared measures the explained variance of each code and the generalized eta-squared statistic measures the explained variance across all codes; i.e. the ratio of explained sums of squares to total sums of squares. [Andrews and Messenger note that the concept of variance when applied to a nominally scaled dependent variable is a subtle one . . . the generalized R2 is actually a variance-weighted average of the R2's which result from separate analyses of each category of the dependent variable when each category is treated as a dummy variable.]

A more useful bivariate statistic, the bivariate theta ($\theta_Y$), is a relatively new statistic formulated by Messenger (1971) to measure the strength of association with correct placement in the dependent variable code as the criterion. Theta is defined as the proportion of the sample correctly classed when using a prediction-to-the-mode strategy in each frequency distribution of each category of the predictor variable. For example, Table 1 presents a set of data from the cross-tabulation of a 3 code dependent variable Y, with a 3 code independent variable Xi. The numbers in the cells are the number of people in the sample assigned to the cells. If we knew nothing about the effect of Xi on Y, our best prediction concerning Y would be Y2, the mode. That is, $\theta_Y$ = 400/1000 = .40 and we will have correctly classified subjects 40 percent of the time. Knowledge of Xi allows for improved classifications. Specifically, if we knew the subject is in X1 the best guess is Y1, if he is in X2 the best guess is Y2 and so on. Then,

$$\theta_{Y/X_i} = (300 + 300 + 200)/1000 = .80$$

and we have correctly classified 80 percent of the subjects.

Messenger and Mandell (1972) note that $\theta_Y$ is really just a more intuitively appealing form of the Goodman and Kruskal Lambda statistic, $\lambda_i$, which is defined as the proportion of reduction in error given predictor Xi's codes:

$$\lambda_i = \frac{\theta_{Y/X_i} - \theta_Y}{1 - \theta_Y} = \frac{.80 - .40}{1 - .40} = \frac{.40}{.60} = .67$$

Thus, $\lambda_i$ is a linear transformation of $\theta_{Y/X_i}$.

[TABLE 1: AN ILLUSTRATION OF BIVARIATE THETA]
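The bivariate theta and lambda calculations can be reproduced with a short script. The cell counts below are hypothetical: they were chosen only so that the sample size (1000), the modal frequency of Y (400) and the within-category modes (300, 300, 200) agree with the figures quoted in the text, and they should not be read as the original table.

```python
import numpy as np

# Hypothetical cell counts (rows = Y1..Y3, columns = X1..X3) that match the
# totals quoted in the text; the original table is not reproduced here.
table = np.array([
    [300,  50,   0],
    [ 50, 300,  50],
    [  0,  50, 200],
])

def theta_marginal(table):
    """Proportion correct when always guessing the modal code of Y."""
    return float(table.sum(axis=1).max() / table.sum())

def theta_given_x(table):
    """Proportion correct when guessing the modal code of Y within each X code."""
    return float(table.max(axis=0).sum() / table.sum())

def lambda_from_theta(theta_y, theta_yx):
    """Proportional reduction in error implied by the two thetas."""
    return (theta_yx - theta_y) / (1.0 - theta_y)

theta_y = theta_marginal(table)
theta_yx = theta_given_x(table)
lam = lambda_from_theta(theta_y, theta_yx)
print(round(theta_y, 2), round(theta_yx, 2), round(lam, 2))  # 0.4 0.8 0.67
```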

The multivariate statistics generated by MNA parallel the bivariate statistics described above. These are the generalized multiple R2 and the multivariate theta statistic. The latter statistic is defined as the proportion correctly classed using a decision rule of predicting each individual as being in that dependent variable category having the maximum forecast value for that individual and written as:

$$\theta_{Y/X_1, X_2, \ldots, X_n} \quad \text{or} \quad \theta_M$$

It is the probability of placing a subject in the correct nominal category of the dependent variable, Y, given knowledge of the code values of the independent variables, X1, X2, ..., Xn, when using a prediction to the mode strategy.

The MNA technique is essentially a series of parallel MCA runs using each of the dummy dependent variables in turn as the dependent variable. For each of the dependent variable codes, a predicted probability (Om) of each subject being in that category is calculated. Each subject thus has a probability figure associated with each code of the dependent variable, and is assigned to the category that is associated with the highest of these probabilities. A check is then made against the actual category, and the proportion of subjects correctly classified is then calculated.
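The classification rule behind the multivariate theta can be sketched as follows. The function name `multivariate_theta`, the category labels and the forecast values are hypothetical. Ties are broken in favor of the first category, which is an assumption of this sketch rather than documented MNA behavior.

```python
import numpy as np

def multivariate_theta(forecasts, actual, categories, weights=None):
    """Share of cases whose highest-forecast category equals the actual one."""
    forecasts = np.asarray(forecasts, dtype=float)
    # Assign each case to the category with the maximum forecast value.
    predicted = np.asarray(categories)[forecasts.argmax(axis=1)]
    hits = (predicted == np.asarray(actual)).astype(float)
    return float(np.average(hits, weights=weights)), predicted

# Hypothetical forecasts for four respondents over three categories.
categories = ["agree", "neutral", "disagree"]
forecasts = [
    [0.60, 0.30, 0.10],
    [0.20, 0.50, 0.30],
    [0.45, 0.15, 0.40],
    [0.10, 0.20, 0.70],
]
actual = ["agree", "neutral", "disagree", "disagree"]

theta_m, predicted = multivariate_theta(forecasts, actual, categories)
print(list(predicted), theta_m)
```

In this toy example three of the four respondents are placed in their actual category, so the multivariate theta is .75.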

THE STUDY

The Data

The data utilized in this study were collected by means of personal interviews held with 193 male heads of households who had purchased a house in the $15,000 to $35,000 price range in a period up to a maximum of six months before the interviews. The respondents were selected using a multistage probability sample based on warranty deed registrations in urban counties in Michigan. The results allow inferences to the buyers of this price-range of houses in urban areas of Michigan. The data used here are a part of a much larger study of husband and wife behavior and attitudes related to the housing purchase decision process.

The Dependent Variable

The dependent variable is a measure of the extent of agreement with the statement that "it is a good idea for neighborhoods (clusters of five or six houses) to have people of different racial backgrounds". Respondents were classified as being in one of five categories ranging from strongly agree to strongly disagree. In all likelihood, this variable approximates an ordinal scale. To analyze this data using a regression routine or MCA would require the assumption that this variable forms an interval scale. By utilizing the MNA model with this variable, we make the weakest possible assumption about the scale properties, a methodologically safer assumption. In interpreting the MNA results given for the individual categories of the dependent variable we can look for the ordinal behavior of the variable. We can easily interpret from nominal findings to possible ordinal implications, but we cannot interpret fallacious interval findings back to ordinal implications.

The Independent Variables

Eight independent variables are analyzed as possible predictors of categories of the dependent variable. They are:

(1) Education of husband

(2) Occupation of husband

(3) Family income

(4) Stage of life cycle

(5) Self confidence level of husband

(6) Family size

(7) Whether or not wife is employed

(8) Ratio of house payment to total income

All of these predictors were treated categorically.

THE RESULTS

Table 2 presents the MNA results for the most interesting variables. The first finding to note is the overall percentage distribution of respondents over the five categories of the dependent variable. We note that 10.4 percent were classified as agreeing strongly, 26.4 percent as agreeing somewhat and so on. The modal category was "agree somewhat", yielding $\theta_Y$ equal to .264. That is, if we knew nothing about the characteristics of the respondents, we could predict the modal category and be right 26.4 percent of the time. The independent variables, X1, X2, ..., Xp, serve to increase our ability to predict above this base level.

We also note the strength of relationship between the set of independent variables and the dependent variable. We do this in three ways. First, the generalized R2 equals .16. This indicates that approximately 16 percent of the variance in the dependent variable is explained. Second, we can examine the category-specific R2's. This examination indicates that the "disagree somewhat" category was best predicted by the independent variables (R2 = .22) and that the "disagree strongly" category was least well predicted (R2 = .10).

A third way to examine the overall relationship between the dependent and independent variables is to note the multivariate theta value. This value is the percentage of respondents that could be correctly classified with knowledge of the independent variables. Multivariate theta, $\theta_{Y/X_1, X_2, \ldots, X_p}$, equals .440. By comparing this value to $\theta_Y$, .264, we note that these independent variables allow us to increase our correct prediction level by 17.6 percentage points (44.0 - 26.4).
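The comparison above is simple arithmetic, and the same two figures can also be put through the proportional-reduction-in-error ratio defined earlier for lambda. Applying that ratio to the multivariate theta is an assumption of this sketch, not a statistic reported in the study.

```python
theta_y = 0.264   # modal-category hit rate reported in the text
theta_m = 0.440   # multivariate theta reported in the text

# Percentage-point improvement over predicting the mode for everyone.
gain_points = (theta_m - theta_y) * 100

# Lambda-style ratio: share of the baseline classification errors removed.
error_reduction = (theta_m - theta_y) / (1 - theta_y)

print(f"{gain_points:.1f} points, {error_reduction:.3f} reduction in error")
# 17.6 points, 0.239 reduction in error
```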

MNA also produces a number of predictor specific calculations and statistics. The generalized eta squared and the bivariate theta are utilized to indicate the strength of the bivariate association between an independent variable and the dependent variable. For example, for occupation $\eta^2$ is .04 and $\theta$ is .36, indicating that occupation explains 4 percent of the variance and correctly classifies 36 percent of the sample. MNA also gives category-specific beta squareds and beta squareds for each predictor. The latter statistic is an approximation of the ability of a predictor to explain variance of each category of the dependent variable while holding constant all other predictor variables. [Beta square is regarded by Andrews and Messenger as an experimental statistic whose precise interpretation is open to further investigation.]

The details of how each category of an independent variable is associated with each category of the dependent variable are also available. MNA produces three sets of figures for each category of each independent variable to show these relationships. The "percent" figures give the bivariate percentage distribution of respondents across the categories of the dependent variable. By comparing rows of percents we can see for example that 54.4 percent (19.2 + 36.2) of professional and technical respondents agree to some extent with the dependent variable statement while 29.4 percent (10.3 + 19.1) of the blue collar respondents agree.

The "coefficient" figures give the effect of being in a specific category of a predictor variable on the likelihood of a respondent being in each category of the dependent variable. These coefficients are the heart of the multivariate analysis. An individual's predicted probability of being in a specific category of the dependent variable is equal to "overall percent" for that category plus the coefficients across all predictor categories relevant to that respondent and that dependent variable category. The coefficients can be interpreted as the amount of increase or decrease in likelihood of dependent variable category membership after holding constant all other predictor variables.

The "adjusted percent" figures are formed by adding the coefficient for that category of the dependent variable to the relevant "overall percent". The result is the percentage distribution of respondents across categories of the dependent variable after allowance has been made for the effects of other predictors.

Examination of the results presented in Table 2 yields a portrait of those who hold positive attitudes about the concept of interracial housing. They can be described as tending to have the following characteristics:

[TABLE]

Specifically, those having all these characteristics have a predicted chance of agreeing strongly of 30.9 percent (base of 10.4 percent plus sum of coefficients of 20.4 percent). Also, they have a predicted chance of 76.4 percent of agreeing somewhat (base of 26.4 percent plus sum of coefficients of 50.0 percent). Thus it appears that MNA has successfully identified the characteristics of a segment which is in agreement with the concept of integrated housing.
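The additive rule used in these predictions can be written out directly. In the sketch below the base of 26.4 percent and the coefficient total of 50.0 come from the text, while the split of that total into four separate coefficients is hypothetical, as are the function names.

```python
def predicted_percent(overall_percent, coefficients):
    """Overall percent for a category plus every coefficient that applies."""
    return overall_percent + sum(coefficients)

def adjusted_percent(overall_percent, coefficient):
    """Overall percent plus the coefficient of one predictor category."""
    return overall_percent + coefficient

# Base and coefficient total are from the text; the four-way split of the
# 50.0 total is hypothetical and only illustrates the summation.
hypothetical_coefficients = [20.0, 15.0, 10.0, 5.0]
agree_somewhat = predicted_percent(26.4, hypothetical_coefficients)
one_predictor = adjusted_percent(26.4, hypothetical_coefficients[0])
print(round(agree_somewhat, 1), round(one_predictor, 1))  # 76.4 46.4
```

Because the model is additive and linear, nothing in this sum keeps a predicted percent between 0 and 100 for extreme combinations of categories, so such figures are best read as scores for ranking categories rather than as strict probabilities.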

The question remains as to how well the types of predictions presented above correctly classify subjects. Table 3 presents a classification matrix that compares actual classifications on the dependent variable with the categories predicted by MNA. The diagonal elements indicate the proportion correctly classified for each dependent variable category. Table 3 also shows the nature of the misclassifications that did occur. For example, we note that for those who agreed somewhat, 60.8 percent were correctly predicted, none were incorrectly predicted as agreeing strongly, while 15.7 percent were incorrectly predicted as not knowing, etc.
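A classification matrix of this kind can be built from paired actual and predicted categories. The sketch assumes rows hold actual categories and that cells are expressed as percentages of each row, which matches the way the example above is worded. The labels and data are hypothetical.

```python
import numpy as np

def classification_matrix(actual, predicted, categories):
    """Counts and row percentages: rows are actual, columns are predicted."""
    index = {c: i for i, c in enumerate(categories)}
    counts = np.zeros((len(categories), len(categories)))
    for a, p in zip(actual, predicted):
        counts[index[a], index[p]] += 1
    row_totals = counts.sum(axis=1, keepdims=True)
    # Rows with no cases are left at zero instead of dividing by zero.
    percents = np.divide(100 * counts, row_totals,
                         out=np.zeros_like(counts), where=row_totals > 0)
    return counts, percents

# Hypothetical actual and predicted categories for ten respondents.
categories = ["agree", "neutral", "disagree"]
actual = ["agree", "agree", "agree", "agree", "neutral",
          "neutral", "disagree", "disagree", "disagree", "disagree"]
predicted = ["agree", "agree", "agree", "neutral", "neutral",
             "agree", "disagree", "disagree", "neutral", "disagree"]

counts, percents = classification_matrix(actual, predicted, categories)
overall_hit_rate = np.trace(counts) / counts.sum()
print(percents)
print(overall_hit_rate)
```

The diagonal of `percents` gives the share correctly classified within each actual category, and the trace of `counts` divided by the sample size is the overall proportion correctly classified.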

The expected ordinal nature of the dependent variable allows for stronger conclusions based on the results in Table 3 than would be possible if the dependent variable were nominal. We note that most of the misclassifications occurred one category on either side of the correct classification. These are the types of misclassifications we would expect when using an ordinal dependent variable. Misclassifications to categories farther removed from the true category are not prevalent.

We are then able to make fairly strong statements about what the knowledge of the predictor variables can do for us in this instance. Specifically, we note that we would misclassify subjects as being at some level of agreement only 24.2 percent of the time if they really strongly disagreed, 22.0 percent if they somewhat disagreed, and 25.0 percent if they did not know. Further, we note that of those predicted to have strong agreement, 55.0 percent are actually at some level of agreement, and 60.8 percent of those predicted as agreeing somewhat actually did agree somewhat. Knowledge of the predictor variables adds greatly to our ability to correctly classify subjects and allows us to make descriptive statements about the characteristics of people classified as being in particular categories or segments.

SUMMARY AND IMPLICATIONS

This paper has described MNA, a new multivariate analysis procedure that is capable of handling nominal variables as both predictors and criterion variables. Its potential usefulness to marketing researchers appears to be extremely significant. Nominally defined dependent variables such as brand choices, consumer typologies, types of behavior etc. can now be effectively examined in a regression type of analysis.

As for the concept of interracial housing, the findings indicate that those holding positive attitudes can be described in terms of their personal characteristics. This aspect has important implications for government agencies, builders' associations and real estate brokers' associations who are concerned with increasing the amount of integrated housing. Their appeals, media choices and types of houses built should all be consistent with this target audience. The results show a chance to practice effective differentiated marketing. This optimism is constrained only by certain limitations of the study. These include the small sample size, the dominance of whites in the sample (93%), and the use of an untested multivariate technique.

REFERENCES

Andrews, F.M., and Messenger, R.C. Multivariate Nominal Scale Analysis. Ann Arbor, Michigan: Institute for Social Research, University of Michigan, 1973.

Andrews, F.M., Morgan, J.N., and Sonquist, J.A. The Multiple Classification Analysis Program. Ann Arbor, Michigan: Institute for Social Research, University of Michigan, 1969.

Berelson, B., and Steiner, G.A. Human Behavior: An Inventory of Scientific Findings. New York: Harcourt, Brace, and World, 1964, 529.

Cooley, W.W., and Lohnes, P.R. Multivariate Data Analysis. New York: John Wiley & Sons, Inc., 1971, 52.

Frank, R.E., Massy, W.F., and Wind, Y. Market Segmentation. Englewood Cliffs, New Jersey: Prentice-Hall, Inc., 1972, 139-169.

Kinnear, T.C., and Taylor, J.R. "Multivariate Methods in Marketing Research: A Further Attempt at Classification", Journal of Marketing, 34 (October, 1971), 56-59.

Messenger, R.C. Theta User's Guide. Unpublished manuscript, Institute for Social Research, University of Michigan, Ann Arbor, Michigan, 1971.

Messenger, R.C., and Mandell, L.M. "A Modal Search Technique for Predictive Nominal Scale Multivariate Analysis", Journal of the American Statistical Association, 67 (December, 1972).

Newman, J.W., and Staelin, R. "Multivariate Analysis of Differences in Buyer Decision Time", Journal of Marketing Research, 8 (May, 1971), 192-198.

Newman, J.W., and Staelin, R. "Prepurchase Information Seeking for New Cars and Major Household Appliances", Journal of Marketing Research, 9 (August, 1972), 249-257.

Peters, W.H. "Using MCA to Segment New-Car Markets", Journal of Marketing Research, 7 (August, 1970), 360-363.

Sheth, J.N. "The Multivariate Revolution in Marketing Research", Journal of Marketing, 34 (January, 1971), 13-19.

Suits, D.B. "Use of Dummy Variables in Regression Equation", Journal of the American Statistical Association, 52 (1957), 548-551.
