
How to Structure the Sentiment Analysis Process for Insurance Data

Sentiment reveals a lot about what customers think of an insurance brand, including how well customer representatives resolve issues and how satisfied customers are with the underwriting process. Sentiment analysis of structured and unstructured data can help insurers understand how their customers feel.

Sentiment analysis, also commonly referred to as opinion extraction or sentiment identification, uses natural language processing techniques to classify the subjects being discussed and the attitudes expressed toward them.

How to Structure the Sentiment Analysis Process

The first step of the process is to determine what the focus is for your sentiment analysis. Typical areas of attention in the insurance industry are claims handling, policy pricing, product, the underwriting process, and the company brand.

The next step is to understand the corpus of data. For example, the data can take the form of surveys, customer feedback, conversations, and call center data, including both text and voice.

Where does this data come from? Within the insurance ecosystem, the sources of customer information can span enterprise application data, third-party data providers, social media, news, and analyst reports. In technical terms, customer data can be broadly divided into two parts:

  • System data with a specific format, and
  • Unstructured free text

The final step is classifying sentiment by its polarity: labeling the sentiment on a scale of negative, neutral, or positive, or some variation of this scale. The polarity can also indicate the magnitude of the signal in either direction.
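
The polarity step can be sketched as a mapping from a numeric score to a label and a magnitude. This is only an illustration, assuming some upstream model already produces a score between -1 and 1; the name `to_polarity` and the neutral band of 0.1 are hypothetical choices, not part of any specific library.

```python
from typing import NamedTuple


class Polarity(NamedTuple):
    label: str        # "negative", "neutral", or "positive"
    magnitude: float  # strength of the signal, from 0.0 to 1.0


def to_polarity(score: float, neutral_band: float = 0.1) -> Polarity:
    """Map a score in [-1, 1] to a polarity label and a magnitude.

    Assumption: scores within +/- neutral_band count as neutral.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError("score must be in [-1, 1]")
    if abs(score) <= neutral_band:
        label = "neutral"
    elif score > 0:
        label = "positive"
    else:
        label = "negative"
    return Polarity(label, abs(score))


print(to_polarity(0.72))
print(to_polarity(-0.05))
```

A finer scale (for example, five levels) would only change the thresholds, not the overall shape of the step.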

Other considerations to keep in mind during the structuring process:

  • What geography is considered while selecting the sources of data?
  • What is the line of business: health, life, property & casualty, agent analysis?
  • What is the mechanism to collate the data as an input? Is it a huge text file, or voice data files collated over many years? Or is it free text that needs to be extracted from emails?
  • What is the objective of the sentiment analysis? Is it to identify the disposition of a particular customer, or to analyze a group of people segmented by pay, education, age, or region? The objective could be extended to measuring the performance of a state, region, group, or department.
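
One way to keep these scoping answers explicit is to record them in a small configuration object before any analysis runs. The sketch below is hypothetical: the class name `AnalysisScope`, its fields, and the example values are illustrative assumptions rather than a prescribed schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AnalysisScope:
    """Hypothetical container for the scoping questions of a sentiment project."""
    focus: str             # e.g. "claims handling" or "policy pricing"
    line_of_business: str  # e.g. "health", "life", "property & casualty"
    geography: str         # region the data sources are drawn from
    input_format: str      # e.g. "survey", "voice files", "email free text"
    objective: str         # "individual" or "segment"
    segment_by: List[str] = field(default_factory=list)

    def validate(self) -> None:
        """Reject scopes that leave the objective ambiguous."""
        if self.objective not in ("individual", "segment"):
            raise ValueError(f"unknown objective: {self.objective!r}")
        if self.objective == "segment" and not self.segment_by:
            raise ValueError("a segment objective needs at least one segment_by key")


# Example values are illustrative only.
scope = AnalysisScope(
    focus="claims handling",
    line_of_business="property & casualty",
    geography="US Northeast",
    input_format="email free text",
    objective="segment",
    segment_by=["age", "region"],
)
scope.validate()
print(scope)
```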

Sentiment Analytics Challenges and Solutions

There are challenges with using unstructured and structured data for sentiment analytics. Fortunately, there are solutions to many of these challenges today, and as AI technologies continue to evolve, there are more answers on the horizon.

Unstructured Data

Unstructured data is free text with no particular schema. For example, a naive analysis of a customer conversation log can have very low accuracy. Today, techniques such as natural language processing can help identify sentiment in unstructured data, including text, video, and images.

Context identification is the most challenging part of unstructured data analysis. For structured data sourced from an internal application, machine, or system, a template or fixed format can be defined. For unstructured data, it is difficult to identify the context to which each word, phrase, or sentence should be mapped.

Templated or fixed-scope data can be parsed with a collection of regular expressions that look for specific patterns (e.g., security events or failures). Unstructured, complex data instead calls for sentiment analytics based on machine learning algorithms and techniques that can recognize emotion. For identifying context, a non-deterministic approach is preferable because it scales better.
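
To make the contrast concrete, here is a minimal sketch of regular-expression parsing for templated data. The category names, keywords, and sample lines are illustrative assumptions, not a real log format.

```python
import re

# Hypothetical patterns for a templated system log.
PATTERNS = {
    "security": re.compile(r"\b(unauthorized|login failed|locked out)\b", re.IGNORECASE),
    "failure": re.compile(r"\b(timeout|error|failed)\b", re.IGNORECASE),
}


def tag_line(line: str) -> list:
    """Return the categories whose pattern matches the line."""
    return [name for name, pattern in PATTERNS.items() if pattern.search(line)]


print(tag_line("payment ERROR: gateway timeout"))
print(tag_line("Login failed for user 42"))
print(tag_line("The agent was helpful but I am still waiting on my claim"))
```

The last line, a piece of free text, matches no pattern at all. Fixed patterns work when the vocabulary is known in advance, but they have nothing to say about text that falls outside the template.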

Positive negation statements: Systems generally struggle with statements that combine negations. For example, a customer might say, “I never said that Y company is not good.” NLP mechanisms can find it challenging to determine the actual sentiment and intent of such a statement. Many techniques are also still unable to derive the sentiment of sarcastic remarks.
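
The difficulty is easy to reproduce with a toy scorer. The sketch below is a deliberately naive lexicon approach with a negation window; the word lists and the `window` parameter are assumptions made for illustration, not a recommended technique.

```python
import re

# Toy lexicons, for illustration only.
POSITIVE = {"good", "great", "helpful"}
NEGATIVE = {"bad", "poor", "slow"}
NEGATORS = {"not", "never", "no"}


def naive_score(text: str, window: int = 3) -> int:
    """Score text with a toy lexicon, flipping polarity after negators.

    Negators found within `window` tokens before a sentiment word are
    counted; an odd count flips the sign of that word.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, tok in enumerate(tokens):
        if tok in POSITIVE:
            value = 1
        elif tok in NEGATIVE:
            value = -1
        else:
            continue
        preceding = tokens[max(0, i - window):i]
        negations = sum(1 for t in preceding if t in NEGATORS)
        if negations % 2 == 1:
            value = -value
        score += value
    return score


sentence = "I never said that Y company is not good."
print(naive_score(sentence))            # window of 3 sees only "not"
print(naive_score(sentence, window=8))  # wider window also sees "never"
```

With the default window the sentence scores -1, and with a window of 8 it scores +1. The result depends on an arbitrary parameter rather than on the speaker's intent, which is why negation needs more than keyword rules.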

Accuracy concerns of the sentiment analyzer: A great deal of unstructured data processing depends on lexical analysis of the language. Word embedding is a common technique for feature mapping. Scalability and accuracy issues may arise when multiple languages are involved, as some approaches to sentiment analysis are language-specific. A separate set of dictionaries should therefore be maintained for each language, and the analytical model should be trained to look for important keywords or groups of keywords in subsequent data.

Other techniques involve language modeling using neural networks.

The first layer of such a network uses word embeddings as its representation of language. Training a deep neural network requires a good number of labeled documents, so that the network can learn to recognize sequences of words and assign them to sentiment labels.
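
As a rough sketch of what "word embedding as the first layer" means, the code below runs a forward pass through an embedding lookup, mean pooling, and a softmax output layer in plain Python. The vocabulary, dimensions, and random weights are all assumptions; the network is untrained, so its probabilities are meaningless until the weights are fitted on labeled documents.

```python
import math
import random

LABELS = ["negative", "neutral", "positive"]
VOCAB = {"<unk>": 0, "claim": 1, "price": 2, "slow": 3, "helpful": 4}  # toy vocabulary
EMBED_DIM = 4

rng = random.Random(0)  # fixed seed so the sketch is reproducible
# First layer: one vector per vocabulary entry (random, untrained).
embeddings = [[rng.uniform(-1, 1) for _ in range(EMBED_DIM)] for _ in VOCAB]
# Output layer: one weight vector per sentiment label (random, untrained).
weights = [[rng.uniform(-1, 1) for _ in range(EMBED_DIM)] for _ in LABELS]


def predict_proba(tokens):
    """Embed the tokens, average the vectors, and apply a softmax layer."""
    if not tokens:
        raise ValueError("tokens must not be empty")
    ids = [VOCAB.get(t, VOCAB["<unk>"]) for t in tokens]
    vectors = [embeddings[i] for i in ids]
    pooled = [sum(col) / len(vectors) for col in zip(*vectors)]
    logits = [sum(w * x for w, x in zip(row, pooled)) for row in weights]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return {label: e / total for label, e in zip(LABELS, exps)}


print(predict_proba(["claim", "was", "slow"]))
```

Training would adjust both `embeddings` and `weights` so that the predicted label matches the labeled documents. Real systems would typically use a deep learning framework and a sequence-aware architecture rather than simple averaging.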

Library enrichment for important keywords in a domain: Words such as “claims” and “price” carry particular weight when identifying customer sentiment in the context of an auto insurance claim. The same or similar words, such as “claimed” and “price,” can mean something quite different outside the insurance domain.

This calls for continually updating the library or dictionary that specifies which words should be ignored and which should be given special significance, depending on the line of business.
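
A per-line-of-business dictionary of this kind might look like the sketch below. Everything in it is an assumption for illustration: the polarity values, the ignore list (words that a general lexicon treats as negative but that are ordinary descriptors in auto claims), and the boost factors.

```python
import re

# Toy general-purpose polarity lexicon.
POLARITY = {"good": 1.0, "bad": -1.0, "slow": -1.0, "accident": -1.0, "damage": -1.0}

# Hypothetical domain dictionary: words to ignore and words to emphasize.
DOMAIN_CONFIG = {
    "auto": {
        "ignore": {"accident", "damage"},       # neutral descriptors in auto claims
        "boost": {"claim": 1.5, "price": 1.5},  # keywords given special significance
    },
}
DEFAULT_CONFIG = {"ignore": set(), "boost": {}}


def score(text: str, line_of_business: str) -> float:
    """Sum word polarities, skipping ignored words and scaling by keyword emphasis."""
    cfg = DOMAIN_CONFIG.get(line_of_business, DEFAULT_CONFIG)
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in cfg["ignore"]]
    base = sum(POLARITY.get(t, 0.0) for t in tokens)
    emphasis = max((cfg["boost"].get(t, 1.0) for t in tokens), default=1.0)
    return base * emphasis


text = "My claim after the accident was slow"
print(score(text, "auto"))     # "accident" ignored, "claim" boosts the result
print(score(text, "general"))  # no domain config: general lexicon only
```

Here the auto configuration yields -1.5 (only "slow" counts, scaled by the "claim" boost), while the general lexicon yields -2.0 because it also penalizes "accident". Updating the dictionary changes the outcome without touching the scoring code.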

Structured Data

For structured data, a major effort is spent on analyzing the templates or the scope in which data will be received as an input from the customer. This process includes analyzing the system and customizing the input processing module to identify the semantics of data and attach a sentiment weight for each data point as part of the configuration.
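
Attaching a sentiment weight to each data point as configuration can be sketched as a lookup table keyed by survey column and answer category. The column names, answer categories, and weights below are hypothetical.

```python
# Hypothetical survey template: column -> answer category -> sentiment weight.
SURVEY_TEMPLATE = {
    "claim_experience": {
        "very satisfied": 1.0,
        "satisfied": 0.5,
        "neutral": 0.0,
        "dissatisfied": -0.5,
        "very dissatisfied": -1.0,
    },
    "would_recommend": {"yes": 1.0, "no": -1.0},
}


def score_response(response: dict) -> float:
    """Average the configured weights over the recognized answers."""
    weights = []
    for column, answer in response.items():
        mapping = SURVEY_TEMPLATE.get(column)
        if mapping is None:
            continue  # column is not part of the template
        key = str(answer).strip().lower()
        if key in mapping:
            weights.append(mapping[key])
    if not weights:
        raise ValueError("no recognized answers in response")
    return sum(weights) / len(weights)


print(score_response({"claim_experience": "Satisfied", "would_recommend": "yes"}))
```

Because the template is plain configuration, supporting a new survey means adding entries to the table rather than changing the processing code.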

On the flip side, once the source system for the input is known, the same template or data structure can be reused to process the data for sentiment analysis. For example, even before analyzing sentiment for a group of 100 customers, the system knows that the data is survey data from an automated system that collects survey questions and answers.

The data columns or survey questions and the expected answer categories are also known.

Most importantly, the context is already known for structured data. The analyst already knows whether the survey was run to analyze retention scores for a period or to assess cross-sell opportunities for a region or set of customers.

Sentiment analysis when combining both structured and unstructured data

When free text is used as a subjective answer to a formatted survey question, a general-purpose sentiment analysis may not work best. In 95% of cases, all the unstructured data must first be converted to structured data using initial knowledge of the human language. This initial knowledge, obtained via supervised or self-supervised learning, is transferred to the system and provides a benchmark starting point.

The next step combines this initial knowledge with additional, lower-level abstraction layers, where more detail is available for precisely mapping input text to output sentiment labels.

Feature mapping and sentiment classification apply to both structured and unstructured data. Solutions may use rule-based or machine learning-based sentiment classifiers.
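
The two classifier styles can be contrasted in a few lines. The sketch below pairs a keyword rule with a small Naive Bayes classifier written in plain Python; the keyword sets and the four training sentences are invented for illustration, and Naive Bayes is just one possible stand-in for a machine learning-based classifier.

```python
import math
from collections import Counter, defaultdict


def rule_based(text):
    """Rule-based: fixed keyword sets decide the label."""
    tokens = set(text.lower().split())
    if tokens & {"slow", "rude", "denied"}:
        return "negative"
    if tokens & {"quick", "helpful", "fair"}:
        return "positive"
    return "neutral"


class NaiveBayes:
    """Learning-based: word statistics are estimated from labeled examples."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)
        self.label_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        tokens = text.lower().split()
        total_docs = sum(self.label_counts.values())
        best_label, best_logp = None, -math.inf
        for label, n_docs in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)  # Laplace smoothing
            logp = math.log(n_docs / total_docs)
            for tok in tokens:
                logp += math.log((counts[tok] + 1) / denom)
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label


# Toy training data, invented for this sketch.
texts = [
    "claim was settled quickly",
    "agent was very helpful",
    "claim was denied without reason",
    "premium increase was unfair",
]
labels = ["positive", "positive", "negative", "negative"]
model = NaiveBayes().fit(texts, labels)

sample = "the agent settled it quickly"
print(rule_based(sample), model.predict(sample))
```

On the sample sentence the rule returns "neutral" because "quickly" is not in its keyword list, while the trained model returns "positive" from the word statistics it learned. Rules are transparent but brittle; learned classifiers generalize further but depend on the labeled data they are given.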


Sentiment analytics is a powerful tool for understanding how your customer feels about your service or product. It’s important to understand the focus of the data – where is the data coming from? What area of the business are you focusing on? What is the format of the data? Data can come in many different formats – structured or unstructured – and both have their own challenges. Once you have all the necessary information, you are ready to apply natural language processing to understand how your customer is feeling.


Stuti Singh Magdani
Stuti is the Sr. Product Development Manager at Aureus. She holds a Bachelor of Engineering from Krishna Institute of Engineering & Tech and an MBA (IT) from Symbiosis Centre for Information Technology, Pune. She has worked with Culture Machine, Citrix, Cummins, and HCL Technologies.
