An Evaluation of the Servqual Scales in a Retailing Setting


David W. Finn and Charles W. Lamb, Jr. (1991), "An Evaluation of the Servqual Scales in a Retailing Setting," in NA - Advances in Consumer Research Volume 18, eds. Rebecca H. Holman and Michael R. Solomon, Provo, UT: Association for Consumer Research, Pages: 483-490.



David W. Finn, Texas Christian University

Charles W. Lamb, Jr., Texas Christian University


A series of articles by Parasuraman, Zeithaml, and Berry has traced the development of a theory that attempts to explain how consumers acquire perceptions of the quality of service firms. Parallel with their theory development, Parasuraman, et al. have experimented with various ways of measuring the hypothetical dimensions of service quality. Their latest effort resulted in a set of scales they have named SERVQUAL.

The research reported here examined the usefulness of SERVQUAL in a retail setting. Results do not support the proposition that the instrument can be used to assess perceived service quality in retailing.


One of the most critical challenges that U.S. firms faced in the 1980s was to provide consistently high quality goods and services (Leonard and Sasser 1982; Zeithaml, Berry, and Parasuraman 1986). There is a growing body of evidence that providing high quality goods and services enhances profitability, improves productivity, increases market share and return on investment, and reduces costs (Thompson, DeSouza, and Gale 1985; Rudie and Wansley 1985; Phillips, Chang, and Buzzell 1983; Garvin 1983; Deming 1982; Gale and Klavans 1985; Ishikawa 1985). It seems likely that the current emphasis on improving and maintaining high quality will have a substantial influence on management practice throughout the 1990s. As John Young, president of Hewlett-Packard, has noted, "A corporate strategy that focuses on quality as a key element is the best way companies can respond to the [competitive] pressure they face" (Young 1985).

Quality, however, "is an elusive and indistinct construct" (Parasuraman, Zeithaml, and Berry 1985). Defining and measuring quality are complicated because the concept can be viewed from several different perspectives. Garvin (1983), for example, identified five completely different approaches to defining quality. Lewis and Booms (1983) and Gronroos (1982) have also discussed problems associated with defining and measuring quality. The task is even more complicated when quality is associated with the intangible aspects of services rather than with the tangible characteristics of physical products.


In a 1986 Marketing Science Institute (MSI) working paper, Parasuraman, Zeithaml, and Berry (1986) offered a theory that consumers' perception of the quality of a service offering is a function of five separate quality perceptions. As Figure 1 illustrates, their theory holds that five perceptions jointly influence consumers' perception of the overall service quality of a service firm: (1) perceived quality of tangibles (physical facilities, equipment, and appearance of personnel), (2) perceived quality of reliability (ability to perform the promised service dependably and accurately), (3) perceived quality of responsiveness (willingness to help customers and provide prompt service), (4) perceived quality of assurance (knowledge and courtesy of employees and their ability to convey trust and confidence), and (5) perceived quality of empathy (caring and individualized attention the firm provides its customers).

In the original MSI piece and in related articles (1985, 1988), Parasuraman et al. hypothesized that the five dimensions of service quality are, themselves, related to the discrepancy between consumers' expectations and perceptions. Specifically, they proposed that "service quality, as perceived by consumers, stems from a comparison of what they (consumers) feel service firms should offer (i.e., from their expectations) with their perceptions of the performance of firms providing the services" (p. 16). As Figure 1 illustrates, their theory holds that perceived service quality is a function of the magnitude and direction of five specific perceptual discrepancies.

To test the adequacy of this theory, measures of the constructs were needed. In a 1988 Journal of Retailing article, Parasuraman, et al. described a series of iterations that led to the identification of 22 items that appear to measure the five dimensions. They labeled their scales SERVQUAL. SERVQUAL consists of 22 pairs of questions: one question from each pair asks consumers to describe their expectations, and the other asks for their perceptions. The instructions for using SERVQUAL are to subtract the expectations score from the perceptions score for each pair and to use the difference as one of the 22 measurement items. Table 1 lists the components of the SERVQUAL scales. Four items purport to measure consumers' perceptions of TANGIBLES quality, five items measure RELIABILITY quality, four items measure RESPONSIVENESS quality, four items measure ASSURANCE quality, and five items measure EMPATHY quality.
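The scoring rule can be sketched in Python as follows. This is a minimal illustration rather than the developers' own procedure: it assumes the 22 items are numbered consecutively by dimension (items 1-4, 5-9 and 10-13 as stated for the first three dimensions, hence 14-17 for assurance and 18-22 for empathy), and it forms each dimension's composite as the mean of its gap scores, the common practice noted in the discussion of Figure 2. The names `DIMENSIONS`, `gap_scores` and `dimension_scores` are hypothetical.

```python
from statistics import mean

# Hypothetical item-to-dimension map, assuming consecutive numbering
# (items 1-4, 5-9, 10-13, 14-17, 18-22).
DIMENSIONS = {
    "tangibles": range(1, 5),
    "reliability": range(5, 10),
    "responsiveness": range(10, 14),
    "assurance": range(14, 18),
    "empathy": range(18, 23),
}


def gap_scores(perceptions, expectations):
    """Return the 22 gap scores (perception minus expectation) for one respondent.

    Both arguments are sequences of 22 ratings, item 1 first.
    """
    if len(perceptions) != 22 or len(expectations) != 22:
        raise ValueError("SERVQUAL uses 22 perception/expectation pairs")
    return [p - e for p, e in zip(perceptions, expectations)]


def dimension_scores(gaps):
    """Average the gap scores of the items assigned to each dimension."""
    return {name: mean(gaps[i - 1] for i in items)
            for name, items in DIMENSIONS.items()}


# Example: every perception rated 4 and every expectation rated 5
# on the 5-point scale.
print(dimension_scores(gap_scores([4] * 22, [5] * 22)))
```

In the example every gap is -1, so every dimension score is -1: perceived service falls short of expectations on all five dimensions. A negative score signals a shortfall and a positive score signals that perceptions exceed expectations, which is the "magnitude and direction" of the discrepancy the theory refers to.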


The SERVQUAL scales that have been offered to consumer researchers are the result of ONE data collection. Parasuraman, et al.'s exploratory factor analysis of that data set led them to propose the 22 items as measures of the 5 dimensions. Before the scales are accepted as "off the shelf" measures of the dimensions of perceived service quality, they must be subjected to further testing. To date, very few studies have tested either the theory or the measurement scales. One study, by Babakus and Mangold (1989), concluded that SERVQUAL is not 5-dimensional in a health care setting. This paper tests whether SERVQUAL can be used in a retail setting.




The five dimensions identified in Figure 1 and Table 1 are not directly observable; they are theoretical constructs. To say that items 1 through 4 form a measure of the construct named tangibles quality is to say that the answers an individual gives to those pairs of questions depend upon how much tangibles quality s/he perceives. Similarly, unobserved reliability quality causes the answers to question pairs 5 through 9; responsiveness quality causes the answers to question pairs 10 through 13, and so on. Figure 2 illustrates these relationships.

Figure 2 can be thought of as a representation of a factor analysis model in which the λij's are factor loadings linking the theoretical factors to the measures and the δi's represent measurement error. The curved lines linking the factors represent possible correlations among the factors, whose magnitudes are denoted by the φij's. As Figure 2 shows, each measured item is linked to only one theoretical dimension. This reflects the requirement that each measure be a manifestation of only one construct, and therefore measure only that construct. In practice, the items that represent one factor are combined into a composite score (often the mean of the item scores) to provide a measure of the factor, and the composite score is meaningful only if each of the measures is unidimensional (Gerbing and Anderson 1988).
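In conventional confirmatory factor analysis notation, the model in Figure 2 can be written as below. This is a sketch that assumes the standard LISREL-style conditions (errors uncorrelated with one another and with the factors), which the text does not state explicitly; x_i stands for the gap score on item i and ξ_j for the single factor to which item i is assigned.

```latex
\begin{aligned}
x_i &= \lambda_{ij}\,\xi_j + \delta_i, \qquad i = 1, \dots, 22,\\
\Sigma &= \Lambda\,\Phi\,\Lambda^{\top} + \Theta_{\delta}
\end{aligned}
```

Here Λ is the 22 × 5 matrix of loadings λij, with exactly one nonzero entry per row because each item is tied to a single dimension; Φ is the matrix of factor correlations φij; and Θδ is the diagonal matrix of error variances Var(δi). Σ is the covariance matrix that the model implies for the 22 items.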

If the SERVQUAL scales possess construct validity in a retail setting (i.e., if the twenty-two items included in the instrument measure the five distinct dimensions identified by Parasuraman, et al. (1988)), then a survey of retail store customers should produce results that conform to the model specified in Figure 2.


The Sample

To ensure that a variety of retail firms was included, a quota of 60-70 shoppers was set for each of four retail store types: (1) stores like K-Mart, Wal-Mart, etc., (2) stores like J.C. Penney, Sears, etc., (3) stores like Dillards, Foley's, etc., and (4) stores like Saks, Neiman Marcus, etc.

Eleven hundred random telephone numbers were purchased from a commercial sampling house. Each telephone interviewer sought female shoppers from one of the four store types. Before continuing with the questionnaire, the interviewer asked a filter question to determine whether any female in the household had shopped at that type of store.



The Questionnaire

The questionnaire was the same as the Parasuraman, Zeithaml, and Berry (1988) instrument except that a 5-point, rather than a 7-point, scale was used. This change was suggested by the developers of the scale, and a five-point scale is also easier to use in telephone interviewing. Table 1 explains how the answers were coded.

Data Collection

The interviewers dialed 1,100 telephone numbers to get the target samples of users of the different types of retailers. The response rate was 31.9 percent. The final sample had 65 users of stores like K-Mart, 66 users of stores like Sears, 58 users of stores like Dillards, and 69 users of stores like Neiman Marcus.





Confirmatory factor analysis using LISREL V (Joreskog and Sorbom 1981) was used to assess the fit of the data to the model. Confirmatory factor analysis is based on the matrix of variances and covariances (or a correlation matrix when the data are standardized) of the observed variables (the 22 items). The fit of the observed data to the theoretical model is assessed by constructing the covariance matrix that should occur if the model in Figure 2 is correct and then comparing the observed covariance matrix with that theoretical matrix.
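The comparison of observed and model-implied covariance matrices can be sketched with NumPy. The sketch assumes maximum-likelihood estimation (LISREL's usual default) and uses the standard ML discrepancy function; it only scores a given set of parameter values and does not reproduce LISREL's iterative search for the estimates. The function names and the toy parameter values in the usage line are hypothetical.

```python
import numpy as np


def implied_covariance(loadings, phi, theta):
    """Covariance matrix implied by a CFA model: Sigma = L Phi L' + Theta.

    loadings: (p, k) loading matrix; in a simple-structure model each row
              has a single nonzero entry.
    phi:      (k, k) factor correlation matrix.
    theta:    length-p vector of measurement-error variances.
    """
    L = np.asarray(loadings, dtype=float)
    return L @ np.asarray(phi, dtype=float) @ L.T + np.diag(theta)


def ml_discrepancy(S, Sigma):
    """ML fit function F = ln|Sigma| + tr(S Sigma^-1) - ln|S| - p.

    F is zero when Sigma reproduces S exactly and grows as they diverge.
    """
    S = np.asarray(S, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    p = S.shape[0]
    _, logdet_sigma = np.linalg.slogdet(Sigma)
    _, logdet_s = np.linalg.slogdet(S)
    return logdet_sigma + np.trace(S @ np.linalg.inv(Sigma)) - logdet_s - p


# Toy example: four items, two correlated factors (hypothetical values).
Sigma = implied_covariance([[0.8, 0], [0.7, 0], [0, 0.6], [0, 0.9]],
                           [[1, 0.3], [0.3, 1]],
                           [0.36, 0.51, 0.64, 0.19])
print(ml_discrepancy(np.eye(4), Sigma))
```

An estimation program searches for the loadings, factor correlations and error variances that minimize F; the chi-square statistic is then conventionally computed as (N - 1) times the minimized F (some programs use N).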

If the data fit the model, confirmatory factor analysis can supply estimates of the λij's, the correlations among the factors, and the variances of the δi's. Equally important, the LISREL program can supply various indicators of how well the observed data fit the hypothesized model as well as diagnostic tools for identifying problems with the model.

One measure of fit is the chi-square goodness-of-fit statistic. This statistic is computed under the null hypothesis that the observed covariances among the answers came from a population that fits the model. A statistically significant value in the goodness-of-fit test would suggest that the data do not fit the proposed model, i.e., that the observed covariance matrix is statistically different from the hypothesized matrix. Strictly speaking, however, empirical data rarely meet all the assumptions required to use the chi-square test. Joreskog and Sorbom (1986, pp. I.38-I.39) state:

the statistical problem is not one of testing a given hypothesis . . . but one of fitting the model to the data and to decide whether the fit is adequate or not.... Instead of regarding χ2 as a test statistic one should regard it as a goodness (or badness) of fit measure in the sense that large χ2 values correspond to bad fit and small χ2 values correspond to good fit.
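The degrees of freedom and tail probability of the chi-square statistic can be sketched in Python. Two assumptions are labeled here: the parameter count assumes the usual parameterization (factor variances fixed at 1, all factor correlations free, uncorrelated errors), so the df reported in Table 2 may differ; and the p-value routine is a standard-library implementation of the regularized incomplete gamma function (series and continued-fraction forms, as in Numerical Recipes), used only to avoid a SciPy dependency. All function names are hypothetical.

```python
import math


def free_parameters(p, k):
    """Free parameters in a simple-structure CFA with p items and k factors,
    assuming factor variances fixed at 1: p loadings + p error variances
    + k(k-1)/2 factor correlations."""
    return 2 * p + k * (k - 1) // 2


def degrees_of_freedom(p, k):
    """Distinct variances and covariances, p(p+1)/2, minus free parameters."""
    return p * (p + 1) // 2 - free_parameters(p, k)


def chi2_sf(x, df):
    """P(X > x) for a chi-square variable: regularized Q(df/2, x/2)."""
    a, z = df / 2.0, x / 2.0
    if z <= 0:
        return 1.0
    log_prefactor = a * math.log(z) - z - math.lgamma(a)
    if z < a + 1:
        # Series for the lower incomplete gamma P(a, z); return 1 - P.
        term = total = 1.0 / a
        n = a
        for _ in range(10_000):
            n += 1
            term *= z / n
            total += term
            if abs(term) < abs(total) * 1e-15:
                break
        return 1.0 - total * math.exp(log_prefactor)
    # Modified Lentz continued fraction for Q(a, z).
    tiny = 1e-30
    b = z + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return h * math.exp(log_prefactor)


df = degrees_of_freedom(22, 5)
print(df, chi2_sf(377.64, df))
```

Under the assumed parameterization the 22-item, 5-factor model has 253 - 54 = 199 degrees of freedom, and the tail probability for a chi-square of 377.64 is far below .001. As the quotation stresses, that number is best read as a badness-of-fit measure rather than as a formal test.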

When the overall fit is bad, Joreskog and Sorbom suggest comparing each observed covariance with the corresponding covariance implied by the model and computing the normalized residuals. A normalized residual greater than two in magnitude provides a hint at where the model is incorrect.
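Normalized residuals and the tally of items involved in large ones can be sketched with NumPy. The formula used, (s_ij - σ_ij) / sqrt((σ_ii σ_jj + σ_ij²) / N), is the form we understand LISREL documentation to give; treat it as an assumption. The helper names are hypothetical, and item numbers are reported 1-based to match Table 1.

```python
import numpy as np


def normalized_residuals(S, Sigma, n):
    """(s_ij - sigma_ij) / sqrt((sigma_ii * sigma_jj + sigma_ij**2) / n).

    S: observed covariance matrix; Sigma: model-implied matrix;
    n: sample size.
    """
    S = np.asarray(S, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    d = np.diag(Sigma)
    se = np.sqrt((np.outer(d, d) + Sigma ** 2) / n)
    return (S - Sigma) / se


def large_residuals(R, threshold=2.0):
    """(i, j, value) for |residual| > threshold in the lower triangle,
    diagonal included, with 1-based item numbers."""
    p = R.shape[0]
    return [(i + 1, j + 1, float(R[i, j]))
            for i in range(p) for j in range(i + 1)
            if abs(R[i, j]) > threshold]


def items_involved(flagged):
    """Sorted item numbers that appear in at least one flagged pair."""
    return sorted({i for i, _, _ in flagged} | {j for _, j, _ in flagged})
```

Counting the flagged pairs and the distinct items they involve gives the kind of summary used in the assessment of Table 2 (twelve large residuals involving fifteen of the twenty-two variables).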

Assessment of the Overall Fit of the Model

Bagozzi and Yi (1988, p. 76) have pointed out that, before the global fit of a model is examined, the output should first be checked for anomalies. Examples of anomalies in the output are (1) negative estimates for the variances, (2) correlation estimates greater than 1, and (3) extremely large estimates for the parameters. None of these anomalies was present in the output reported here.
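The three checks can be sketched as a simple screening function. This is an illustration, not a LISREL feature: it assumes the factor matrix holds correlations (factor variances fixed at 1), and the cutoff `large` for an "extremely large" estimate is a hypothetical choice, since the text gives no threshold.

```python
def find_anomalies(error_variances, factor_corr, estimates, large=10.0):
    """Flag the three kinds of anomalies listed in the text.

    error_variances: estimated measurement-error variances, item 1 first.
    factor_corr:     estimated factor correlation matrix (list of lists).
    estimates:       dict mapping parameter names to estimated values.
    large:           hypothetical cutoff for an "extremely large" estimate.
    """
    problems = []
    # (1) Negative variance estimates (so-called Heywood cases).
    for item, var in enumerate(error_variances, start=1):
        if var < 0:
            problems.append(f"negative error variance for item {item}: {var}")
    # (2) Correlation estimates greater than 1 in absolute value.
    for a in range(len(factor_corr)):
        for b in range(a):
            r = factor_corr[a][b]
            if abs(r) > 1:
                problems.append(f"factor correlation ({a + 1}, {b + 1}) = {r}")
    # (3) Extremely large parameter estimates.
    for name, value in estimates.items():
        if abs(value) > large:
            problems.append(f"very large estimate {name} = {value}")
    return problems
```

An empty list corresponds to the situation reported here, in which none of the anomalies was present.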

Table 2 shows the evaluation of the fit of the retail store data to the SERVQUAL measurement model. The large chi-square value of 377.64 implies that it is extremely unlikely that the data represent random variation from the model. Therefore, it is appropriate to look at other indicators. All the other indicators of fit demonstrate a decided lack of fit. Notice particularly the normalized residuals. The twelve normalized residuals that are greater than two involve fifteen of the twenty-two variables included in the model. Measured items 19 and 21 (Table 1) accounted for six of these large residuals in their pairings with other items.

One of the diagnostic tools available in LISREL is a table of modification indices, which helps to identify specific problems with a model. An analysis of the output used to construct Table 2 suggested that items 19 and 21 are not unidimensional measures of the "empathy" construct. Accordingly, an alternative measurement model was tested in which the SERVQUAL measure of Empathy Quality was reduced from a five-item scale to a three-item scale (composed of items 18, 20, and 22 in Table 1). Indicators of the fit of that model are shown in Table 3.

The large Chi-Square value again suggests that the model is not properly specified. A further hint at bad fit is the root mean square residual, which is almost 25 percent of the size of the correlation estimates. These results indicate that the SERVQUAL measurement model is not appropriate in a retail store setting.
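The root mean square residual can be sketched with NumPy. This assumes the usual definition: the square root of the mean squared difference between observed and implied matrices over their p(p+1)/2 distinct elements, diagonal included. When the data are standardized these residuals are on the correlation scale, which is what makes a comparison with the size of the correlation estimates meaningful. The function name is hypothetical.

```python
import numpy as np


def rmsr(S, Sigma):
    """Root mean square residual over the distinct elements of S - Sigma."""
    resid = np.asarray(S, dtype=float) - np.asarray(Sigma, dtype=float)
    rows, cols = np.tril_indices(resid.shape[0])  # lower triangle + diagonal
    return float(np.sqrt(np.mean(resid[rows, cols] ** 2)))
```

For example, with two items whose observed correlation is .50 but whose implied correlation is .20, the single off-diagonal residual of .30 is averaged with two zero diagonal residuals, giving an RMSR of sqrt(.03), about .17.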

Interestingly, the five multi-item scales proposed by Parasuraman, et al. meet acceptable standards of reliability for exploratory research, ranging from .59 for Tangibles Quality to .83 for Reliability Quality. This quirk underscores the well-known dictum that correlated sets of items do not necessarily measure anything.
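The text does not name the reliability coefficient; coefficient alpha is assumed in the sketch below because it is the usual choice for multi-item scales. Alpha rises whenever the items covary, whether or not they measure a single construct, which is why respectable reliabilities can coexist with a measurement model that fits poorly. The function name is hypothetical.

```python
from statistics import variance


def cronbach_alpha(items):
    """Coefficient alpha for a multi-item scale.

    items: one list of scores per item, all in the same respondent order.
    alpha = k/(k-1) * (1 - sum of item variances / variance of totals).
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    totals = [sum(scores) for scores in zip(*items)]
    item_var = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var / variance(totals))
```

Two identical items give alpha = 1, while two uncorrelated items give alpha = 0; applied to the gap scores of each dimension's items, the function yields one coefficient per scale.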


The results of this study challenge the validity of the SERVQUAL scales as measures of the determinants of perceived quality in retailing. There are four possible explanations for these results: (1) the study reported here produced atypical results; (2) the SERVQUAL scales do not capture the essence of the service quality construct in retailing; (3) perceived service quality in retailing is not a function of the 5 constructs identified by Parasuraman, Zeithaml, and Berry (1988); or (4) the different data-gathering methods used (telephone interviews versus self-administered questionnaires) account for the differences between the results of the two studies.





Atypical Results

Although it is possible that the sample is atypical, this is not likely. The sample was randomly drawn from a medium-size SMSA (about 1 million people) by a commercial sampling firm, and respondent selection procedures were standard for a telephone survey. There is no apparent reason to believe that the study produced atypical results.

Service Quality in Retailing

Parasuraman, et al. (1988) followed rigorous procedures to develop general scales for measuring the dimensions of perceived service quality across a wide range of service categories, but these scales have never been tested beyond the ONE data set from which they were derived. That data set covered banking, credit card, repair and maintenance, and long distance telephone firms.

Babakus and Mangold (1989) reported problems with the SERVQUAL scales for measuring perceived quality of hospital service. Retailing may be another service industry in which the SERVQUAL scales are inappropriate for measuring the five constructs identified and described by Parasuraman et al. (Tangibles Quality, Reliability Quality, etc.).

Different Constructs in Retailing

It is also possible that perceived service quality in retailing is not a function of the 5 dimensions identified by Parasuraman, et al. (1988). As Zeithaml, Parasuraman, and Berry (1985, p.43) themselves have noted,

While it is useful to generalize about the characteristics of services and service businesses, it appears to be equally important to recognize that differences exist among various services and among the firms that market them.

The service categories used in developing SERVQUAL (appliance repair and maintenance, retail banking, long distance telephone, and credit cards) differ markedly from goods retailing, and they clearly fall closer than store retailing to the pure service end of the pure service-pure goods continuum. It may well be that consumers use different criteria to evaluate competing goods retailers than they use to evaluate retailers that are primarily or exclusively service firms.

Different Methods

Parasuraman et al. (1988) used self-administered questionnaires to gather their data, whereas the results reported here are based on data gathered with a telephone survey instrument. Although differences in data-gathering methodology may account for some differences in results, it is unlikely that they would be large enough to cause the model to be rejected.


A major challenge facing many retailers is finding ways to differentiate themselves from competitors. One alternative available to some retailers is to provide superior customer service. Unfortunately, the quality of an organization's services cannot be measured objectively and precisely, making it difficult to gauge success in reaching that goal.

The purpose of the study reported here was to assess the validity of an instrument designed to measure perceived service quality in a variety of business settings including retailing. If valid, the instrument could be used for a variety of purposes such as tracking customers' perceptions of the quality of service provided by a retailer or measuring consumers' perceptions of the differences in service quality among competing outlets and organizations. This information would be useful for designing marketing strategies.

The results of this study do not support Parasuraman, Zeithaml, and Berry's (1988) conclusion that SERVQUAL can be used to assess the quality of firms in a wide range of service categories. Specifically, data gathered regarding different types of retail stores did not fit the SERVQUAL measurement model.

The immediate implication is that retailers and consumer researchers should not treat SERVQUAL as an "off the shelf" measure of perceived service quality. Much refinement is needed for specific companies and industries.

In the longer term, further research in retailing and other service categories is needed to examine the construct validity of SERVQUAL. Unresolved questions include the following: (1) Are the dimensions of service quality the same regardless of service category? (2) Are the five dimensions of service quality identified by Parasuraman, Zeithaml, and Berry (1988) generic? (3) Does the SERVQUAL instrument measure the determinants of perceived service quality in all service industries? The results reported here suggest that the construct validity of SERVQUAL should be examined on an industry-by-industry basis before the instrument is used to gather consumers' perceptions of service quality.


Babakus, Emin, and W. Glynn Mangold. 1989. "Adapting the 'SERVQUAL' Scale to Health Care Environment: An Empirical Assessment." In Enhancing Knowledge Development in Marketing. Eds. P. Bloom, R. Winer, H. Kassarjian, D. Scammon, B. Weitz, R. Speckman, V. Mahajan, and M. Levy. Chicago: American Marketing Association: 195.

Bagozzi, Richard P. and Youjae Yi. 1988. "On the Evaluation of Structural Equation Models." Journal of the Academy of Marketing Science 16 (Spring): 74-94.

Deming, W. Edwards. 1982. Quality, Productivity, and Competitive Position. Cambridge, Massachusetts: Massachusetts Institute of Technology.

Gale, Bradley T. and Richard Klavans. 1985. "Formulating a Quality Improvement Strategy." The Journal of Business Strategy (Winter): 21-33.

Garvin, David A. 1983. "Quality on the Line." Harvard Business Review (September-October): 64-75.

Gerbing, David W. and James C. Anderson. 1988. "An Updated Paradigm for Scale Development Incorporating Unidimensionality and Its Assessment." Journal of Marketing Research 25 (May): 186-192.

Gronroos, Christian. 1982. Strategic Management and Marketing in the Service Sector. Cambridge, Massachusetts: Marketing Science Institute.

Ishikawa, Kaoru. 1985. What Is Total Quality Control? Englewood Cliffs, New Jersey: Prentice-Hall, Inc.

Joreskog, Karl G. and Dag Sorbom. 1981. LISREL V. Chicago: National Education Resources.

Joreskog, Karl G. and Dag Sorbom. 1986. LISREL VI. 4th Ed. Mooresville, Indiana: Scientific Software, Inc.

Leonard, Frank S. and W. Earl Sasser. 1982. "The Incline of Quality." Harvard Business Review (September-October): 163-171.

Lewis, Robert C. and Bernard H. Booms. 1983. "The Marketing Aspects of Service Quality." In Emerging Perspectives on Services Marketing. Eds. L. Berry, L. Shostack, and G. Upah. Chicago: American Marketing Association: 99-107.

Parasuraman, A., Valarie A. Zeithaml, and Leonard L. Berry. 1985. "A Conceptual Model of Service Quality and Its Implications for Future Research." Journal of Marketing 49 (Fall): 41-50.

Parasuraman, A., Valarie A. Zeithaml, and Leonard L. Berry. 1986. "SERVQUAL: A Multiple-Item Scale for Measuring Consumer Perceptions of Service Quality." Cambridge, Massachusetts: Marketing Science Institute.

Parasuraman, A., Valarie A. Zeithaml, and Leonard L. Berry. 1988. "SERVQUAL: A Multiple-Item Scale for Measuring Consumer Perceptions of Service Quality." Journal of Retailing 64 (Spring): 12-40.

Phillips, Lynn W., Dae R. Chang, and Robert D. Buzzell. 1983. "Product Quality, Cost Position and Business Performance: A Test of Some Key Hypotheses." Journal of Marketing 47 (Spring): 26-43.

Rudie, Mary J. and H. Brant Wansley. 1985. "The Merrill Lynch Quality Program." In Services Marketing in a Changing Environment. Eds. T.M. Bloch, G.D. Upah, and V.A. Zeithaml. Chicago: American Marketing Association.

Thompson, Phillip, Glenn DeSouza, and Bradley T. Gale. 1985. The Strategic Management of Services Quality. Cambridge, Massachusetts: Strategic Planning Institute.

Young, John A. 1985. "The Quality Focus at Hewlett-Packard." The Journal of Business Strategy (Winter): 6-9.

Zeithaml, Valarie A., Leonard L. Berry, and A. Parasuraman. 1985. "Problems and Strategies in Services Marketing." Journal of Marketing 49 (Spring): 33-46.

Zeithaml, Valarie A., Leonard L. Berry, and A. Parasuraman. 1986. "Communication and Control Processes in the Delivery of Service Quality." Journal of Marketing 52 (April): 35-48.


