November 15, 2020

Using natural language understanding to measure customer satisfaction

By Hadi Ebrahimnejad, Senior Applied Scientist


As a member-obsessed business, Accolade uses many avenues to get user feedback on our product and service offerings. This helps ensure positive engagement, which has been shown to improve healthcare outcomes and reduce expenses.

In call centers, the most common way of gauging caller experience is through automated post-call surveys. Such feedback is often collected by transfer to an Interactive Voice Response (IVR) system following the interaction with the agent, or by an email automatically sent after the call. Accolade leverages these and other methods to monitor and adjust to feedback from our members.

Post-call IVR surveys are the trusted industry norm for capturing satisfaction, but as with all survey research, they have some disadvantages. Probably the most notable is self-selection bias: only the callers who choose to respond are counted. The lower the response rate of a post-call IVR survey, the worse this bias can become.

Measuring Customer Satisfaction in Voice Calls

At Accolade, we use a variety of channels to engage with our members and to provide awareness of various programs that their employer may be providing as part of a benefits package, such as the availability of seasonal flu shots, incentives, or disease management programs. Such channels include secure mobile and web applications, email and postal mail for general benefits information, along with telephone voice interactions, both inbound to our front-line care teams and outbound from one of our Accolade Health Assistant® representatives or clinicians to a member.

Measuring and understanding member satisfaction in phone calls is a particularly valuable signal for Accolade. Such feedback provides insight into the quality of our service as well as operational feedback, allowing us to constantly improve our processes. Not only can we generate a score and monitor it for changes in trend, but we can also compile qualitative feedback and use it to drive improvement.

One tool employed by our Accolade Health Assistant® representatives during voice calls is a scripted “CSAT prompt” that verbally solicits feedback. This enables the Accolade Health Assistant® representative to address any outstanding issues and to solicit a verbal statement of satisfaction or dissatisfaction during the call itself. Accolade constantly improves on ways of asking for caller satisfaction, to minimize bias in the member reply and maximize the natural flow of the interaction. Calls are automatically transcribed, and the CSAT prompt and the member response are present in the transcript.

Even with call transcription, there are significant challenges in automatically identifying a CSAT exchange between an Accolade Health Assistant® representative and a member. Accolade considers human-to-human interaction primary, and in the natural flow of a conversation, a scripted CSAT prompt rarely appears exactly as scripted.

Each engagement with a member has its own unique aspects, and we train our Accolade Health Assistant® representatives to engage in a conversational manner. As a result, even a scripted prompt may change slightly from call to call. From the member’s perspective, this can be desirable, because the rephrased prompt fits more naturally within the conversation. Even if an Accolade Health Assistant® representative adheres very closely to the script, verbal pauses, repeated words, and errors in the transcription can prevent a simple structured search from finding the prompt. Errors in the transcription alone can be a considerable challenge, as the fraction of incorrectly transcribed words (the Word Error Rate) can be as high as one in ten.
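To make the Word Error Rate concrete, here is a minimal sketch of how it is commonly computed: the word-level edit distance between a reference text and a transcript, divided by the number of reference words. The function name and the example transcript are illustrative assumptions, not taken from Accolade's system.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from transcript
                curr[j - 1] + 1,     # extra word in transcript
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: the scripted prompt and a made-up noisy transcript.
scripted = "did i answer all your questions to your satisfaction today"
transcribed = "did i answer all your question to your satisfaction to day"
wer = word_error_rate(scripted, transcribed)
print(f"WER = {wer:.2f}")
```

In this made-up transcript, three word-level errors are enough to defeat an exact string search for the scripted prompt.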

Identifying CSAT Prompt Based on the Meaning

Over the past three years, there has been considerable progress in the field of natural language understanding (NLU), towards computer representation of concepts in words, phrases and sentences. These advances have been accelerated by the advent of modern neural network models. One such technology enables representation of words and phrases as points in high-dimensional space, or “vectors”, in such a way that words and phrases with similar meaning are close to each other in space.
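The idea of "close to each other in space" is often measured with cosine similarity. The sketch below uses made-up three-dimensional vectors purely for illustration; real embeddings are learned from text and have far more dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical vectors, hand-picked so that related words point the same way.
toy_vectors = {
    "satisfied": [0.9, 0.8, 0.1],
    "happy": [0.8, 0.9, 0.2],
    "invoice": [0.1, 0.2, 0.9],
}

sim_close = cosine_similarity(toy_vectors["satisfied"], toy_vectors["happy"])
sim_far = cosine_similarity(toy_vectors["satisfied"], toy_vectors["invoice"])
print(f"satisfied vs happy:   {sim_close:.2f}")
print(f"satisfied vs invoice: {sim_far:.2f}")
```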

Word embeddings, which map individual words to high-dimensional vectors, are among the first successful examples of this approach. In such an approach, the vector representing a given word does not change, even when the context radically changes the word's meaning. Consider the two possible meanings of the word “flies”, a noun (the insects) or a verb, in the classic example “Fruit flies like a banana.”
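The limitation can be seen in a toy lookup table. This is only a sketch with invented vectors: a static word embedding is, in effect, a dictionary keyed by the word alone, so the surrounding sentence cannot influence the result.

```python
# Made-up 3-dimensional vectors; real word embeddings are learned from text.
STATIC_VECTORS = {
    "fruit": [0.9, 0.1, 0.0],
    "flies": [0.4, 0.5, 0.3],
    "like": [0.1, 0.2, 0.8],
    "a": [0.0, 0.1, 0.1],
    "banana": [0.8, 0.0, 0.1],
}


def embed_word(word, context=None):
    """Static lookup: the context argument is accepted but never used."""
    return STATIC_VECTORS[word]


sentence = "fruit flies like a banana".split()
as_noun = embed_word("flies", context=("noun reading", sentence))
as_verb = embed_word("flies", context=("verb reading", sentence))
print(as_noun == as_verb)  # True: both readings get the same vector
```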

The next evolution of word embedding technology represents words as vectors, taking into account context. Even more recent technology, known as transformers and deep averaging networks, can extend beyond individual words, converting phrases, sentences, and large bodies of text into meaningful vectors. Accolade has found sentence embeddings, which represent sentences as vectors, quite suitable for identifying a CSAT exchange in a call transcript.

Sentence embeddings enable us to quantify how likely it is for any contiguous set of words to be the CSAT prompt. A sentence embedder represents each health assistant utterance as a vector, which can be compared with the vector representing the CSAT prompt.

This approach enables the system to identify the prompt even when it is spoken differently or transcribed inaccurately. It scans through the entire call, evaluating the similarity of everything the Accolade Health Assistant® representative says with the expected prompt. The most similar sequence of words is then declared as the CSAT prompt, provided that its similarity score is above a carefully chosen threshold. The figure below shows how this works in practice.


Figure 1 Semantic similarity to the CSAT prompt (“Did I answer all your questions to your satisfaction today?”) for words inside a window sliding over an agent’s turn. The score increases as the window starts to overlap with the prompt, eventually peaking where most of the prompt's words are captured. Note that the model correctly locates the prompt in spite of the transcription errors.
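The sliding-window search can be sketched as follows. This is not Accolade's implementation: `toy_embed` is a hypothetical stand-in that counts character trigrams, which captures spelling overlap rather than meaning, and the threshold of 0.6 and the sample agent turn are arbitrary assumptions. In practice the `embed` argument would be a real sentence embedding model.

```python
import math
from collections import Counter


def toy_embed(text):
    """Stand-in for a sentence embedder: counts of character trigrams."""
    padded = " " + " ".join(text.lower().split()) + " "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[key] for key, count in u.items() if key in v)
    norm = (math.sqrt(sum(c * c for c in u.values()))
            * math.sqrt(sum(c * c for c in v.values())))
    return dot / norm if norm else 0.0


def find_prompt(agent_turn, prompt, threshold=0.6, embed=toy_embed):
    """Slide a prompt-sized window over the turn and keep the best match.

    Returns (span, score); span is None when the best score is below threshold.
    """
    words = agent_turn.split()
    window = len(prompt.split())
    target = embed(prompt)
    best_span, best_score = None, 0.0
    for start in range(max(1, len(words) - window + 1)):
        candidate = " ".join(words[start:start + window])
        score = cosine(embed(candidate), target)
        if score > best_score:
            best_span, best_score = candidate, score
    if best_score < threshold:
        return None, best_score
    return best_span, best_score


prompt = "did i answer all your questions to your satisfaction today"
# Made-up agent turn containing filler words and transcription errors.
turn = ("okay so i will mail that form out and um did i answer all your "
        "question to your satisfaction to day well thank you for calling")
span, score = find_prompt(turn, prompt)
print(span, round(score, 2))

no_span, low_score = find_prompt("thank you for calling have a great day", prompt)
print(no_span, round(low_score, 2))
```

Passing the embedder in as a parameter keeps the search logic independent of the model, so the window size and threshold can be tuned separately from the choice of embedding.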

The sentence embedding system performs much better than simply searching for the words in the CSAT prompt. In the figure below, the orange line represents the performance of a system that scans for individual words. The blue line represents the performance of the sentence embedding approach, a clear improvement.


Figure 2 Comparison of sentence embedding with keyword matching for identifying health assistant utterances containing the CSAT prompt. The y-axis represents the fraction of time the CSAT prompt was identified. Higher coverage in the early and later segments corresponds to coaching and process improvement, which broadened the types of calls where the CSAT prompt is given. The sentence embedding model consistently identifies more correct utterances than searching on keywords.

Analyzing Caller Responses

Upon finding the CSAT prompt, the system analyzes the member’s immediate response. Like Accolade Health Assistant® representatives, Accolade members phrase their answers in many different ways. Using the same sentence embedding approach, member responses are converted to semantic vectors. This provides a numeric evaluation of the degree of positive feedback, as well as a means to uncover themes and topics beyond simple satisfaction.
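One plausible way to turn a response vector into a satisfaction score is to compare it with a few anchor phrases. The sketch below is an assumption about how this could be done, not a description of Accolade's system: the anchor phrases are invented, and `toy_embed` is a bag-of-words stand-in for a real sentence embedder.

```python
import math
from collections import Counter


def toy_embed(text):
    """Stand-in for a sentence embedder: counts of lowercase words."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[key] for key, count in u.items() if key in v)
    norm = (math.sqrt(sum(c * c for c in u.values()))
            * math.sqrt(sum(c * c for c in v.values())))
    return dot / norm if norm else 0.0


# Hypothetical anchor phrases for satisfied and dissatisfied replies.
POSITIVE_ANCHORS = ["yes you did thank you", "absolutely you were very helpful"]
NEGATIVE_ANCHORS = ["no not really", "no i still have questions"]


def satisfaction_score(response, embed=toy_embed):
    """Best positive similarity minus best negative similarity, in [-1, 1]."""
    vec = embed(response)
    best_pos = max(cosine(vec, embed(a)) for a in POSITIVE_ANCHORS)
    best_neg = max(cosine(vec, embed(a)) for a in NEGATIVE_ANCHORS)
    return best_pos - best_neg


happy = satisfaction_score("yes thank you so much you were great")
unhappy = satisfaction_score("not really i still have a question")
print(round(happy, 2), round(unhappy, 2))
```

With a real sentence embedder, the same response vectors could also be clustered to surface recurring themes, which a word-count stand-in like this one cannot do meaningfully.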