This paper appeared in the Journal of International Marketing and Marketing Research, Vol. 26, February 2001, 41-46. ©2001
Do Numeric Values
Influence Subjects’ Responses to Rating Scales?
Taiwo Amoo, Ph.D.
Assistant Professor of Business and Quantitative Methods
Brooklyn College (C.U.N.Y.)
E-mail: tamoo@snet.net
Hershey H. Friedman, Ph.D.
Professor of Business and Marketing
Brooklyn College (C.U.N.Y.)
E-mail: x.friedman@att.net
Abstract
A study was conducted with a random sample of 139 college students to determine whether different numbering schemes for rating scales produce different results. Two rating scales, one numbered from +4 to -4 and another numbered from 9 to 1, were employed. The results indicated that the +4 to -4 scales produced more positive evaluations than did the 9 to 1 scales. It appears that a negative number next to the negative-evaluation descriptor, e.g., "awful," makes the descriptor seem much more negative than when the lowest number on the scale is a 1. As a result, the means and frequency distributions of scales numbered +4 to -4 differ from those of the same scales numbered 9 to 1.
Introduction
One of the important tools in the arsenal of a researcher is the rating scale. Rating scales are used to measure everything from voting intentions to customer satisfaction. It is no exaggeration to state that rating scales are indispensable to researchers in such disciplines as marketing, management, political science, management information systems, psychology, and sociology, among others.
All measuring devices, including rating scales, must be objective if they are to have any value to a researcher. Unfortunately, responses to rating scales can be affected by many different factors. These factors include the number of scale categories used and whether the number of categories is even or odd; whether or not the rating scale is balanced (an equal number of favorable and unfavorable response choices); the wording of questions; and the connotations of category labels (Churchill and Peter, 1984; Friedman and Amoo, 1999a, 1999b; Malhotra, 1996). Rating scales are also problematic when used to determine whether groups that differ in some way (e.g., wealthy individuals and poor individuals, men and women, or Americans and Chinese) differ in the intensity of their feelings about something (Goode, 2001). For instance, it would not be appropriate to use rating scales to determine whether homeless people living in Calcutta have a different level of happiness from individuals living in Beverly Hills, or whether smokers desire cigarettes more than heroin addicts desire heroin. The reason is that the connotations of descriptors such as "extremely satisfied" or "very strong desire" are not necessarily the same for different groups.
Some rating scales contain numbers. For instance, the Least-Preferred Coworker (LPC) scale used in management and described in Nahavandi (2000, p. 116) employs 18 rating scales with bipolar adjectives and numbers from 8 to 1 (or 1 to 8 when the negative-evaluation side of the scale is on the left), as follows:
Pleasant 8 7 6 5 4 3 2 1 Unpleasant
Another scale described in Nahavandi (2000, p. 71) uses numbers from 5 through 0, with 5 representing "certainly always true" and 0 indicating "certainly always false." The LMX 7 scale discussed in Northouse (1997, p. 126) uses both numbers and descriptors. For some of its items, 1 indicates "none" and 5 indicates "very high"; for another item, 1 indicates "rarely" and 5 indicates "very often." When a rating scale contains numbers, it is quite possible that these numeric values change the connotations (or the intensity of the connotations) of the scale descriptors. Does using 1 to represent "none" produce the same results as using 0 to represent "none"? Does using 1 for "strongly disagree" and 5 for "strongly agree" yield the same results as a scale on which -2 represents "strongly disagree" and +2 represents "strongly agree"?
Schwarz et al. (1991) found that the responses of German adults to the question "How successful would you say you have been in life?" were influenced by the numeric values used to give meaning to the scale labels. When the scale ranged from 0 ("not at all successful") to 10 ("extremely successful"), 34 percent selected values between 0 and 5. When the scale went from -5 ("not at all successful") to +5 ("extremely successful"), only 13 percent selected values from -5 to 0, even though these values occupy the same six positions on the scale. The authors concluded that numeric values ranging from 0 to 10 suggest "the absence or presence of the attribute to which the scale pertains," i.e., degree of success, whereas negative values from -5 to 0 suggest the presence of the opposite of the attribute, i.e., being a failure.
Schwarz and Hippler (1995) replicated the above findings. In this study, German participants were asked to rate politicians using a rating scale anchored with the descriptors "don’t think very highly" and "think very highly." The scale was numbered either from -5 to +5 or 0 to 10. Mean ratings for the -5 to +5 scale were more positive than for the 0 to 10 scale. They also found that this effect existed whether the study was conducted by written questionnaire or telephone survey.
Haddock and Carrick (1999) asked students to evaluate Tony Blair one day before the election for British Prime Minister on scales ranging from "not at all _______" to "extremely ______" (caring, friendly, honest, intelligent). Students were randomly assigned to one of two types of rating scales: one was numbered from -5 to +5 and the other from 0 to 10. Blair was rated more favorably when the scale ranged from -5 ("not at all _____") to +5 ("extremely ____") than when it ranged from 0 ("not at all _____") to 10 ("extremely ____"): 78% of respondents selected the positive side of the -5 to +5 scale, compared with 63% on the 0 to 10 scale. Haddock and Carrick also found that the scale used before the election (-5 to +5 or 0 to 10) affected participants’ evaluations in a survey conducted five days after the election. Apparently, a more positive evaluation, even one caused by the artifice of the scale's numbering, had a carryover effect and influenced subsequent judgments about Blair.
The purpose of the current study was to determine whether the effect of numbers on rating scales would be present in research using American subjects and with the type of scales more typically used in marketing and business research, e.g., client satisfaction ratings, likelihood of making a change, and overall performance.
Method
A sample of 139 undergraduate students was obtained at a large New York City college. Each subject was asked to complete a two-page questionnaire consisting of eight questions and was randomly assigned to one of two versions of the questionnaire. The first question was the same for both questionnaires. Subjects indicated the extent of their agreement, on a five-point scale going from "strongly agree" to "strongly disagree," with the statement "The core studies increased my skills in _______." This question yielded seven ratings, one for each of the seven skill areas listed (Vocabulary, Reading, Writing, Listening, Communicating, Reasoning, General Knowledge).
The rating scales for the next four questions, which constituted the experimental manipulation, differed between the two questionnaires. The four questions asked subjects for their overall evaluation of their college (OVERALL), the likelihood of changing colleges (CHANGE), the quality of their instructors (QUALITY), and the value of their education at the college (VALUE). The scales on these four questions alternated between a +4 to -4 scale and a 9 to 1 scale. The first questionnaire began this section with the +4 to -4 scale on the OVERALL question, whereas the other questionnaire used the 9 to 1 scale on this question. This section of the first questionnaire is displayed in Figure 1. For analysis purposes, all four questions were coded from 1 to 9, with 1 representing the descriptor on the left and 9 representing the descriptor on the right.
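The positional coding rule can be sketched in Python. This is a minimal illustration, not the authors' procedure: the function name `code_response` and the scale labels `"plus_minus"` and `"nine_one"` are hypothetical. Because coding is by position, +4 and 9 both become 1, the midpoints (0 and 5) both become 5, and -4 and 1 both become 9, so codes 6 through 9 correspond to the four right-hand points counted in Table 2.

```python
def code_response(printed_value: int, scale: str) -> int:
    """Map a printed scale value to its 1-9 position code (1 = leftmost descriptor).

    Hypothetical scale labels: "plus_minus" is the +4 ... -4 version,
    "nine_one" is the 9 ... 1 version.
    """
    if scale == "plus_minus":
        if not -4 <= printed_value <= 4:
            raise ValueError("expected a value from +4 to -4")
        return 5 - printed_value  # +4 -> 1, 0 -> 5, -4 -> 9
    if scale == "nine_one":
        if not 1 <= printed_value <= 9:
            raise ValueError("expected a value from 9 to 1")
        return 10 - printed_value  # 9 -> 1, 5 -> 5, 1 -> 9
    raise ValueError(f"unknown scale: {scale!r}")


# The same position receives the same code on both versions.
for plus_minus, nine_one in [(4, 9), (0, 5), (-4, 1)]:
    print(code_response(plus_minus, "plus_minus"), code_response(nine_one, "nine_one"))
```

Coding by position rather than by printed value is what makes the two versions directly comparable: a mean difference between them can then only reflect where respondents placed their marks, not the numbers printed beside those marks.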
---INSERT FIGURE 1 ABOUT HERE.---
The last three questions were the same for both conditions: an overall rating of the core courses offered at the college (a 6-point scale ranging from "excellent" to "awful") and questions requesting demographic information.
Results
As noted above, subjects were randomly assigned to one of the two questionnaires to ensure that the two questionnaire-groups (Q-groups) were equivalent. Thus, any question that was identical on the two questionnaires should have produced statistically equivalent means. There were eight such measures for which means could be computed: the seven agree/disagree scales and the overall rating of the core courses. A multivariate analysis of variance (MANOVA) on these eight measures resulted in a Wilks' Lambda of .958, which is approximated by an F-statistic (d.f. = 8 and 126) of .697; p > .69. Thus, the vectors of eight means for the two Q-groups did not differ significantly. None of the univariate t-tests was significant either. As expected, the two Q-groups appear to have been statistically equivalent.
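For a two-group MANOVA, Wilks' Lambda converts exactly to an F-statistic with df1 = p and df2 = N - p - 1 degrees of freedom, where p is the number of dependent variables and N the number of cases: F = ((1 - Λ)/Λ)(df2/df1). The sketch below (function names are hypothetical) applies this standard conversion to the values reported above. It yields an F of about .691 from Λ = .958, while inverting the reported F of .697 gives Λ of about .958, so the small gap is consistent with Λ having been rounded to three decimals.

```python
def wilks_to_f(wilks_lambda: float, df1: int, df2: int) -> float:
    """Exact F for Wilks' Lambda in a two-group MANOVA.

    df1 = p (number of dependent variables); df2 = N - p - 1.
    """
    return (1.0 - wilks_lambda) / wilks_lambda * df2 / df1


def f_to_wilks(f_stat: float, df1: int, df2: int) -> float:
    """Invert wilks_to_f: recover Lambda from a reported F."""
    return 1.0 / (1.0 + f_stat * df1 / df2)


# Equivalence check on the eight identical measures (d.f. = 8 and 126).
print(round(wilks_to_f(0.958, 8, 126), 3))  # 0.691
print(round(f_to_wilks(0.697, 8, 126), 3))  # 0.958
```

The exact conversion holds only when two groups are compared; with more groups, Rao's approximation is needed instead.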
In analyzing the data arising from the experimental manipulation, a multivariate analysis of variance comparing the two vectors of four means (the responses on the +4 to -4 scale versus the responses on the 9 to 1 scale) resulted in a Wilks' Lambda of .596, which is approximated by an F-statistic (d.f. = 15 and 122) of 5.508; p < .001. Thus, analyzing the responses to the two scales from a multivariate approach (i.e., considering the responses to all four items simultaneously) showed a significant difference in how subjects responded.
---INSERT TABLE 1 ABOUT HERE.---
Table 1 indicates that all four of the univariate t-values were significant at the .05 level. For the OVERALL question, using the +4 to -4 scale as compared to the 9 to 1 scale caused subjects to be more reluctant to use the negative-evaluation side of the scale. The mean for the +4 to -4 scale indicated a shift to the left side (towards the positive numbers of +4, +3, +2, and +1) when compared to the 9 to 1 scale (mean of 3.49 vs. mean of 4.61). Apparently, the negative numbers (-4, -3, -2, -1) made the "not at all satisfied" anchor on the right side seem much more negative than when the numbers on the right side of the scale ranged from 4 to 1. In fact, it can be readily seen from Table 2 that 11% (8/72) selected the negative-evaluation side of the +4 to -4 scale (i.e., the values of -4, -3, -2, -1); whereas, 26% (18/69) selected the negative-evaluation side (i.e., the values of 4, 3, 2, and 1) of the 9 to 1 scale.
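The univariate t-values in Table 1 can be approximately reproduced from the published summary statistics. The sketch below assumes a pooled-variance (Student) two-sample t-test, which the paper does not state explicitly; `pooled_t` and `TABLE_1` are illustrative names. With the rounded means and standard deviations, the recomputed values agree with Table 1 to within about 0.02, a gap that likely reflects that rounding.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student's two-sample t statistic (equal variances assumed) from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df


# Summary statistics from Table 1: (n, mean, s.d.) for +4 to -4, then for 9 to 1.
TABLE_1 = {
    "Overall": ((72, 3.49, 1.66), (69, 4.61, 2.01)),
    "Change": ((69, 6.28, 2.59), (72, 7.25, 2.26)),
    "Quality": ((72, 3.51, 1.53), (69, 4.61, 1.93)),
    "Value": ((68, 3.26, 1.67), (71, 4.10, 1.98)),
}

for name, ((n1, m1, s1), (n2, m2, s2)) in TABLE_1.items():
    t, df = pooled_t(m1, s1, n1, m2, s2, n2)
    print(f"{name:8s} t = {t:6.2f}  df = {df}")
```

If SciPy is available, `scipy.stats.ttest_ind_from_stats(mean1, std1, nobs1, mean2, std2, nobs2, equal_var=True)` returns the same statistic together with a two-sided p-value.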
---INSERT TABLE 2 ABOUT HERE.---
For the CHANGE question, the mean for the +4 to -4 scale was 6.28, as opposed to 7.25 for the 9 to 1 scale. Since both scales were coded 1 to 9, this again indicates a reluctance on the part of respondents to use the negative-evaluation side of the scale when it consists of negative numbers. Table 2 shows that 54% (37/69) selected the negative-evaluation side of the +4 to -4 scale, whereas 79% (57/72) selected the negative-evaluation side of the 9 to 1 scale. Interestingly, for this scale the right side was not a negative evaluation but simply indicated a reluctance to change colleges. Here, the +4 to -4 scale indicated a higher likelihood of changing colleges than did the 9 to 1 scale. Again, the negative numbers adjacent to the "extremely unlikely" end of the scale appear to have made that end seem more extreme than when the adjacent number was a 1, so fewer respondents chose it.
For the QUALITY question, the mean for the +4 to -4 scale was 3.51, as opposed to 4.61 for the 9 to 1 scale, i.e., the +4 to -4 scale produced a higher rating for the quality of teachers. The frequencies show the same pattern: 10% (7/72) selected the negative-evaluation side of the +4 to -4 scale, whereas 26% (18/69) selected the negative-evaluation side of the 9 to 1 scale.
For the VALUE question (ranging from "extremely valuable" to "not at all valuable"), the mean for the +4 to -4 scale was 3.26 versus 4.10 for the 9 to 1 scale, i.e., the +4 to -4 scale produced a higher rating for the value of the education received at the students’ college. Table 2 shows that 7% (5/68) selected the negative-evaluation side of the +4 to -4 scale, whereas 23% (16/71) selected the negative-evaluation side of the 9 to 1 scale.
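The percentages quoted for the four questions are simple shares of the Table 2 frequencies, taking the "last 4 points on scale" column as the negative-evaluation (right-hand) side. A short sketch that recomputes them from the published counts follows; the names `TABLE_2` and `right_side_share` are illustrative, not from the study.

```python
# Frequencies from Table 2: (first 4 points, middle point, last 4 points).
TABLE_2 = {
    "Overall": {"+4 to -4": (59, 5, 8), "9 to 1": (43, 8, 18)},
    "Change": {"+4 to -4": (15, 17, 37), "9 to 1": (9, 6, 57)},
    "Quality": {"+4 to -4": (57, 8, 7), "9 to 1": (36, 15, 18)},
    "Value": {"+4 to -4": (56, 7, 5), "9 to 1": (47, 8, 16)},
}


def right_side_share(counts):
    """Return (share, count, n) for respondents choosing one of the last four points."""
    first, middle, last = counts
    n = first + middle + last
    return last / n, last, n


for item, versions in TABLE_2.items():
    for version, counts in versions.items():
        share, last, n = right_side_share(counts)
        print(f"{item:8s} {version:9s} {last:2d}/{n:2d} = {share:.0%}")
```

Rounded to whole percentages, these shares match the figures given in the text for all eight item-by-scale combinations.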
Conclusion
Rating scales are used to make various important decisions, including academic tenure, hiring, and product changes. It is therefore important to understand how rating scales work. This research shows that the choice of numerical values for rating scales can influence the results. The findings of Schwarz et al. (1991) appear to extend to many different types of scales, including scales dealing with overall satisfaction and hedonic ratings. It appears that subjects perceive the negative-evaluation side of the scale as more negative when it carries negative numbers rather than positive numbers. In the present study, this happened consistently for four different types of rating scales: a scale ranging from +4 to -4 did not produce the same results as a scale ranging from 9 to 1, even though both employed the same descriptors. In fact, an unethical researcher who wants an object to receive a better evaluation simply has to use negative numbers on the negative-evaluation side of the scale rather than positive numbers. Further research should be conducted to determine which of the two types of scales, if either, produces the more valid and less biased results.
References
Churchill Jr., Gilbert A. and J. P. Peter (1984). "Research Design Effects on the Reliability of Rating Scales: A Meta-analysis." Journal of Marketing Research, 21 (4), 360-375.
Friedman, Hershey H. and Taiwo Amoo (1999a). "Rating the Rating Scales." Journal of Marketing Management, 9 (Winter), 114-123.
Friedman, Hershey H. and Taiwo Amoo (1999b). "Multiple Biases in Rating Scale Construction." Journal of International Marketing and Marketing Research, 24 (3), 115-126.
Goode, Erica (2001). "Researcher Challenges a Host of Psychological Studies." The New York Times, January 2, F1-F7.
Haddock, Geoffrey and Rachael Carrick (1999). "How to Make a Politician More Likeable and Effective: Framing Political Judgments Through the Numeric Values of a Rating Scale." Social Cognition, 17 (3), 298-311.
Malhotra, N. K. (1996). Marketing Research: An Applied Orientation. Upper Saddle River, NJ: Prentice Hall.
Nahavandi, Afsaneh (2000). The Art and Science of Leadership. Upper Saddle River, NJ: Prentice Hall.
Northouse, Peter G. (1997). Leadership: Theory and Practice. Thousand Oaks, CA: Sage Publications.
Schwarz, Norbert, Bärbel Knäuper, Hans J. Hippler, Elisabeth Noelle-Neumann, and Leslie Clark (1991). "Numeric Values May Change the Meaning of Scale Labels." Public Opinion Quarterly, 55 (4), 570-582.
Schwarz, Norbert and Hans J. Hippler (1995). "The Numeric Values of Rating Scales: A Comparison of Their Impact in Mail Surveys and Telephone Interviews." International Journal of Public Opinion Research, 7, 72-74.
Figure 1
Four Experimental Questions: Version One
Overall, how satisfied are you with ___College?
Extremely satisfied +4 +3 +2 +1 0 -1 -2 -3 -4 Not at all satisfied
What is the likelihood that you will change colleges?
Extremely likely 9 8 7 6 5 4 3 2 1 Extremely unlikely
Overall, how would you rate the quality of teachers at ___College?
Excellent +4 +3 +2 +1 0 -1 -2 -3 -4 Awful
Overall, how valuable is the education you received at ___College?
Extremely valuable 9 8 7 6 5 4 3 2 1 Not at all valuable
Table 1
Univariate t-test Results for the Four Variables
| Variable | n (+4 to -4) | Mean (+4 to -4) | s.d. (+4 to -4) | n (9 to 1) | Mean (9 to 1) | s.d. (9 to 1) | t-value | Probability |
|---|---|---|---|---|---|---|---|---|
| Overall | 72 | 3.49 | 1.66 | 69 | 4.61 | 2.01 | -3.62 | < .001 |
| Change | 69 | 6.28 | 2.59 | 72 | 7.25 | 2.26 | -2.38 | .019 |
| Quality | 72 | 3.51 | 1.53 | 69 | 4.61 | 1.93 | -3.75 | < .001 |
| Value | 68 | 3.26 | 1.67 | 71 | 4.10 | 1.98 | -2.68 | .008 |
Note: The left side of the scale was coded as 1 and the right side as 9 for both scales.
Table 2
Frequencies for the 4 Measurements by Scale Type
| Scale | Numbering | First 4 points on scale | Middle point | Last 4 points on scale |
|---|---|---|---|---|
| Overall | +4 to -4 | 59 | 5 | 8 |
| Overall | 9 to 1 | 43 | 8 | 18 |
| Likelihood of Change | +4 to -4 | 15 | 17 | 37 |
| Likelihood of Change | 9 to 1 | 9 | 6 | 57 |
| Quality of Teachers | +4 to -4 | 57 | 8 | 7 |
| Quality of Teachers | 9 to 1 | 36 | 15 | 18 |
| Value of Education | +4 to -4 | 56 | 7 | 5 |
| Value of Education | 9 to 1 | 47 | 8 | 16 |