Customer service rating scales - which one should you go for?

It's been 65 years since Peter Drucker wrote down a fundamental business principle that is still relevant today: "What gets measured, gets managed." It holds true for customer service, too. If you want to stay in control of your support quality - maybe even raise the bar - you need to get your numbers straight.

To collect accurate, objective, and comparable data about your customer service performance, you need to understand that what you measure is as important as how you measure it. You can only trust the interpretation of the metrics that you've set up correctly.

That's why we're digging into the topic of picking the right rating scales in your support feedback surveys. With options ranging from giving a thumbs up to choosing between an endless number of score points, you need to figure out which system works best for your company.

9 out of 10 companies focus on the KPIs that reflect customers' opinions, while only a few track their team’s performance against their own internal quality standards. We believe in the harmony of these perspectives, which is why we built the conversation review tool Klaus to counterbalance the situation.

Thus, this post explores the different rating scales that you can use in customer service conversation reviews. However, as we're looking at the advantages and disadvantages of universal assessment systems, you can apply this knowledge to measuring any support-related feedback - including that from your customers.

2-point scales

Binary assessments allow people to give only negative or positive ratings. There are no in-betweens, the answer can be either yes or no, approve or disapprove, agree or disagree.

Though you can add an option to leave questions unanswered, this additional choice cannot affect the total ticket score calculation. The results of this metric reflect the ratio between positive and negative ratings only.
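To make the calculation concrete, here is a minimal Python sketch of a 2-point ticket score. It assumes the score is reported as the share of positive ratings among the answered categories, and that a skipped category is stored as None; the function name binary_ticket_score and this percentage format are illustrative assumptions, not a description of how any particular tool computes its score.

```python
from typing import Optional, Sequence


def binary_ticket_score(ratings: Sequence[Optional[bool]]) -> Optional[float]:
    """Return the percentage of positive ratings among answered categories.

    True = thumbs up, False = thumbs down, None = skipped category.
    Skipped categories are left out entirely, so they cannot move the score.
    Returns None when every category was skipped (nothing to score).
    """
    answered = [r for r in ratings if r is not None]
    if not answered:
        return None
    # True counts as 1 and False as 0, so the sum is the number of positives.
    return 100.0 * sum(answered) / len(answered)


# Two thumbs up, one thumbs down, one skipped category.
print(round(binary_ticket_score([True, True, False, None]), 1))  # 66.7
```

Adding more skipped categories to the same ticket leaves the result unchanged, which is exactly the property described above.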

Pros: Results are easily comparable, as reviewers cannot interpret the evaluation range differently. If you look at customer interactions, agents' responses can either be correct or incorrect in each category.

Many teams love being able to skip some rating categories without skewing the review results (compare this to the 3-point scale below) while keeping assessments quick and simple.

Cons: Some reviewers might feel that they're missing a way to rate conversations that were correct but not perfect. If you're looking for more nuanced scoring systems, see if 5-, 7-, or 10-point scales would suit you better.

Also, it is crucial to keep an eye on how reviewers use rating categories. If the majority of your support QA consists of neutral evaluations that have no impact on the ticket score, you should look deeper into why reviewers refrain from giving positive or negative ratings. The quality of your customer service depends on it.

Klaus uses a 2-point scale for assessing customer service conversations, allowing reviewers to give thumbs up/down, or a neutral rating, to tickets in all rating categories. So far, it's proven to be an efficient means of providing feedback to support agents, leaving little room for misunderstandings related to rating scales.

3-point scales

Some companies prefer to give their reviewers a third option in the otherwise black and white rating system. It's the possibility to say that the agent's response was neither good nor bad; it was "neutral". Look at it as point 0 between positive and negative.

If you compare this to the 2-point rating scale, you will see that they can both have an option to bypass the binary assessment. However, the difference between these systems lies in how the ticket score is calculated: with a 3-point scale, all ratings have an impact on the total score, while 2-point scales ignore skipped ratings (if the option to skip is used at all).
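The difference is easiest to see on a single ticket scored both ways. The sketch below is only an illustration under stated assumptions: ratings are stored as +1, 0, and -1, the score is the average rating rescaled to a 0-100 range, and the function name ticket_score and the neutral_counts flag are hypothetical.

```python
from typing import Optional, Sequence


def ticket_score(ratings: Sequence[int], neutral_counts: bool) -> Optional[float]:
    """Score a ticket on a 0-100 range from ratings of +1, 0, or -1.

    neutral_counts=True  -> 3-point behaviour: a 0 pulls the score to the middle.
    neutral_counts=False -> 2-point behaviour: a 0 is treated as skipped.
    Returns None when there is nothing left to score.
    """
    used = list(ratings) if neutral_counts else [r for r in ratings if r != 0]
    if not used:
        return None
    average = sum(used) / len(used)   # between -1 and +1
    return 50.0 * (average + 1.0)     # rescaled to 0..100


ratings = [1, 0, 0, 0]  # one positive rating, three neutral ones
print(ticket_score(ratings, neutral_counts=True))   # 62.5
print(ticket_score(ratings, neutral_counts=False))  # 100.0
```

The same four ratings produce a middling ticket on the 3-point scale and a perfect one on the 2-point scale, so it is worth deciding up front which behaviour you want.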

Pros: Reviewers can differentiate between responses that were amazing, horrible, and OK. "That's fine, continue" is great feedback that often falls short in support team motivation packages.

Cons: "Neutral" sounds an awful lot like "an easy way out" to some people. It might tempt reviewers to rush through the QA process and avoid leaving uncomfortable feedback.

If you want to improve the quality of your support, you need to pinpoint the agents who are underperforming and the aspects of the conversations that are not hitting the mark. Make sure that you don't hide your opportunities for growth behind the neutral category.

4-point scales

A four-point scale is a detailed version of the binary rating system. It pushes reviewers to choose between two positive and two negative scores. It also gives them an opportunity to distinguish between terrible and weak; satisfactory and exceptional.

These responses are translated into -2, -1, 1, 2 points on the rating scale. There is no neutral option, so all rating categories must be labeled as good or bad.
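As a rough illustration, the mapping can be written as a small lookup table. The four labels below are borrowed from the wording above; the function name summarize_four_point and the shape of its result are assumptions made for this sketch.

```python
from typing import Dict, Sequence, Union

# Points for each label on the 4-point scale; note there is no 0.
POINTS: Dict[str, int] = {
    "terrible": -2,
    "weak": -1,
    "satisfactory": 1,
    "exceptional": 2,
}


def summarize_four_point(labels: Sequence[str]) -> Dict[str, Union[int, float]]:
    """Translate labels into points and count the positive and negative sides."""
    if not labels:
        raise ValueError("at least one rating is required")
    points = [POINTS[label] for label in labels]  # KeyError on unknown labels
    positive = sum(1 for p in points if p > 0)
    return {
        "mean_points": sum(points) / len(points),
        "positive": positive,
        "negative": len(points) - positive,  # every rating is on one side
    }


print(summarize_four_point(["exceptional", "exceptional", "weak", "satisfactory"]))
```

Because no label maps to 0, the positive and negative counts always add up to the number of ratings.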

Pros: 4-point scales allow you to report contrasts like how many agents were right vs wrong in certain categories. Because the scale is still binary at its core, you won't run the risk of drawing misleading conclusions due to neutral ratings that wouldn't belong to either side of the scale.

Cons: There is no neutral midpoint, so reviewers are always expected to choose between negative and positive scores - even if a ticket seems to be neither this nor that. Cases like this can distort the results and damage the reliability of the entire QA process.

Most teams don't feel comfortable without a neutral rating option, so 4-point scales are quite a rare sight in customer service.

5-point scales

Companies that are looking for ways to get a more nuanced overview of customer conversations usually adopt 5-point rating scales. With five scores to choose from, reviewers can assess the degree to which the ticket performed.

This system usually has two negative, one neutral, and two positive options that stand for scores from 1 to 5. These can reflect assessments like "horrible", "bad", "OK", "good", "excellent".

Pros: Reviewers can easily make a distinction between the aspects of the conversation that agents handled perfectly, and those that were correct but could still be improved.

Cons: People might have a different understanding of what counts as bad or very bad. The consistency of your customer service relies on coherent assessments, so that's something to pay attention to. Calibration exercises become more useful as you increase the number of options reviewers have to choose from.

If you're planning to use a 5-point rating scale in your conversation reviews, make sure you explain what contributes to a response being excellent or horrible, instead of simply good or bad.

7-point scales

Seven-point evaluations add another level of detail to the support review process. They divide the positive and negative sides of the scale into three sub-ratings each.

You can find 7-point scales like "excellent", "very good", "good", "neutral", "bad", "very bad", "horrible". These are also often accompanied by adjectives like "very", "moderately", and "slightly" to distinguish between the degrees of positive or negative attributes.

Pros: The large range of options should cover almost all support evaluation cases. With three choices of intensity on both sides of the spectrum and a neutral option in the middle, the reviewers should find suitable scores for all rating categories.

Cons: Reviewers' subjective interpretation of the scale can bias your conversation review results. What one reviewer regards as "horrible" might feel like "very bad" or just "bad" to others.

When using a 7-point rating scale, you should keep an eye on your reviewers' evaluation trends. Do any of them continuously stand out with especially harsh or questionably positive assessments? Regular calibrations help to iron out the differences in their interpretations and guarantee an equal level of quality in your customer service.
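One simple way to spot such trends is to compare each reviewer's average score with the team-wide average. The sketch below assumes reviews are available as (reviewer, score) pairs on a 1-7 scale; the function names and the threshold of one point are arbitrary choices for illustration, and the comparison is only fair when reviewers see a similar mix of tickets.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def reviewer_offsets(reviews: Sequence[Tuple[str, int]]) -> Dict[str, float]:
    """Return each reviewer's mean score minus the overall mean score."""
    if not reviews:
        raise ValueError("at least one review is required")
    by_reviewer: Dict[str, List[int]] = defaultdict(list)
    for reviewer, score in reviews:
        by_reviewer[reviewer].append(score)
    overall = mean(score for _, score in reviews)
    return {name: mean(scores) - overall for name, scores in by_reviewer.items()}


def outliers(offsets: Dict[str, float], threshold: float = 1.0) -> List[str]:
    """List reviewers whose average differs from the team's by at least threshold."""
    return sorted(name for name, off in offsets.items() if abs(off) >= threshold)


reviews = [("ann", 6), ("ann", 7), ("bob", 2), ("bob", 1), ("cy", 4), ("cy", 4)]
offsets = reviewer_offsets(reviews)
print(offsets)            # ann scores well above the team average, bob well below
print(outliers(offsets))  # ['ann', 'bob']
```

A flagged name is a prompt for a calibration conversation, not proof that the reviewer is wrong.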

11-point scales

There is no limit to the number of points one can draw on a support rating scale. However, eleven is usually the highest that we've seen teams go.

11-point scales are more common in customer-based ratings than in internal evaluations. One of the most prominent use cases of this system is the Net Promoter Score survey, which asks how likely customers are to recommend the service to others on a scale from 0 to 10.

Most teams interpret the results of scales with more than 10 ratings in two steps: first, they group similar scores, and then calculate the ratios on 3- or 5-point scales. This way, options like "slightly OK", "moderately OK", and "very OK" are added up into a single response "OK".
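The Net Promoter Score is a familiar example of this two-step approach. The sketch below uses the conventional NPS grouping (0-6 detractors, 7-8 passives, 9-10 promoters) and the usual formula of percentage of promoters minus percentage of detractors; the function names are hypothetical, and for internal reviews you would swap in whatever grouping your team agrees on.

```python
from collections import Counter
from typing import Iterable


def nps_bucket(score: int) -> str:
    """Step 1: group a 0-10 score into one of three broader answers."""
    if not 0 <= score <= 10:
        raise ValueError(f"score out of range: {score}")
    if score <= 6:
        return "detractor"
    if score <= 8:
        return "passive"
    return "promoter"


def net_promoter_score(scores: Iterable[int]) -> float:
    """Step 2: percentage of promoters minus percentage of detractors."""
    counts = Counter(nps_bucket(s) for s in scores)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one score is required")
    return 100.0 * (counts["promoter"] - counts["detractor"]) / total


# 4 promoters, 3 passives, 3 detractors out of 10 responses.
print(net_promoter_score([10, 9, 8, 7, 6, 3, 10, 9, 5, 8]))  # 10.0
```

Passives count toward the total but toward neither side, so the result can range from -100 to 100.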

Pros: Instead of choosing between semantically expressed options, reviewers are usually expected to think in numbers on scales with 10 or more points. Some find it easier to score tickets 8 or 9 than to figure out if, for example, the agent expressed product knowledge in a very good or excellent manner.

Cons: With 11-point rating scales, it becomes so difficult to guarantee that all reviewers interpret rating points in the same manner that you won't even try. Most of the time, everybody evaluates customer conversations based on their understanding of the meaning behind these points, and you will form broader and (hopefully) meaningful conclusions from there.

The rating scale that you use for evaluating your customer service conversations always has an impact on the review process and its results. With alternatives ranging from binary ratings to scales with 10 or more points, you should think carefully about which system works best for your team. As we aim to give quantifiable insight into customer service quality through internal assessments, we're using a 2-point rating scale in our conversation review tool Klaus. This way, we can be sure that all reviewers understand the scores in the same manner and that the results are comparable regardless of who provided the feedback.

If you'd like to see how thumbs up/down ratings can help you boost the quality of your support, give Klaus a go. Industry leaders like Automattic, PandaDoc, and Figma are already enjoying the perks of systematic customer service feedback.
