
How to Structure the Sentiment Analysis Process for Insurance Data

Sentiment reveals a great deal about what customers think of an insurance brand, including how well customer representatives are resolving issues and how satisfied customers are with the underwriting process. Sentiment analysis of structured and unstructured data can help insurers understand how their customers are feeling.

Sentiment analysis, also commonly referred to as opinion extraction or sentiment identification, uses natural language processing techniques to classify the subjects of a text and the attitudes expressed toward them.

How to Structure the Sentiment Analysis Process

The first step of the process is to determine what the focus is for your sentiment analysis. Typical areas of attention in the insurance industry are claims handling, policy pricing, product, the underwriting process, and the company brand.

The next step is to understand the corpus of the data. For example, the data can take the form of surveys, customer feedback, conversations, and call center records, including both text and voice.

Where are the sources of this data? Within the insurance ecosystem, the data sources containing customer information could span enterprise application data, third-party data providers, social media, news, and analyst reports. In technical terms, customer data can be broadly divided into two parts:

  • System data with a specific format, and
  • Unstructured free text

The final step is classifying sentiment by its polarity: labeling it on a scale of negative, neutral, or positive (or a finer-grained variation of this scale). The polarity score can also indicate the magnitude of the signal in either direction.
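A minimal sketch of this polarity-labeling step, assuming a toy lexicon of weighted words (the word lists, weights, and threshold below are illustrative placeholders, not a production lexicon):

```python
# Toy polarity classifier: sum signed word weights, then bucket the score.
# Word lists and weights are illustrative assumptions for this sketch.
POSITIVE = {"helpful": 1.0, "quick": 0.5, "great": 1.0, "fair": 0.5}
NEGATIVE = {"slow": -0.5, "denied": -1.0, "rude": -1.0, "confusing": -0.5}

def polarity(text, threshold=0.25):
    """Return (label, score); the score's magnitude indicates signal strength."""
    words = text.lower().split()
    score = sum(POSITIVE.get(w, 0.0) + NEGATIVE.get(w, 0.0) for w in words)
    if score > threshold:
        return "positive", score
    if score < -threshold:
        return "negative", score
    return "neutral", score
```

The returned score carries the magnitude mentioned above, so downstream reporting can distinguish a mildly negative remark from a strongly negative one.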

Other considerations one should be cognizant of during the structuring process:

  • What geography is considered while selecting the sources of data?
  • What is the line of business: health, life, property & casualty, agent analysis?
  • What is the mechanism to collate the data as an input? Is it a huge textual data file, or voice data files collated over many years? Or is it free text that needs to be extracted from emails?
  • What is the objective of the sentiment analysis? Is it to identify the disposition of a particular customer, or to analyze a group of people segmented by income, education, age, or region? The objective could also be extended to measure the performance of a state, region, group, or department.

Sentiment Analytics Challenges and Solutions

There are challenges with using unstructured and structured data for sentiment analytics. Fortunately, there are solutions to many of these challenges today, and as AI technologies continue to evolve, there are more answers on the horizon.

Unstructured data

The data available is free text and doesn't follow a particular schema. If one wants to analyze a customer conversation log, for example, accuracy can be quite low without specialized techniques. Today, natural language processing techniques can help identify sentiment in unstructured text, video, and images.

Context identification is the most challenging part of unstructured data analysis. For structured data, a template or fixed format can be defined for data sourced from an internal application or system, but for unstructured data it is difficult to identify the context to which each word, phrase, or sentence should be mapped.

Unlike templatized or fixed-scope data, where parsing is a collection of regular expressions looking for specific patterns (e.g., security events or failures), unstructured complex data calls for sentiment analytics using machine learning algorithms and techniques that can model emotion. For identifying context, a non-deterministic approach is preferable because it scales better.
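To make the contrast concrete, here is what the templatized, regular-expression style of parsing looks like. The log format below is an assumption for illustration; the point is that this approach only works when the input follows a known, fixed pattern:

```python
import re

# Regex parsing for templatized data: works only when the format is fixed.
# The "date | topic | outcome" log format is a hypothetical example.
CALL_LOG = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2}) \| (?P<topic>\w+) \| (?P<outcome>resolved|escalated)"
)

def parse_line(line):
    """Return the structured fields, or None when the line doesn't match."""
    m = CALL_LOG.match(line)
    return m.groupdict() if m else None
```

Free-form customer remarks fall straight through this parser (it returns None), which is exactly why unstructured data needs the machine learning approach instead.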

Positive negation statements are generally not understood by systems. Consider, for example, a customer who says, "I never said that Y company is not good." NLP mechanisms can find it challenging to determine the actual sentiment and intent, and many techniques are still unable to derive the sentiment of sarcastic remarks.
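A very naive sketch of why negation is hard, assuming a tiny illustrative lexicon: counting negators and flipping polarity handles simple double negation like the example above, but it knows nothing about negation scope or sarcasm, which is where real systems struggle:

```python
# Naive negation handling: flip polarity once per preceding negator.
# An even count of negators cancels out (double negation). The lexicon
# is a toy assumption; real scope detection needs parsing, not counting.
NEGATORS = {"not", "never", "no"}
LEXICON = {"good": "positive", "bad": "negative"}

def flip(label):
    return {"positive": "negative", "negative": "positive"}[label]

def naive_polarity(text):
    words = [w.strip(".,!?\"'").lower() for w in text.split()]
    for i, w in enumerate(words):
        if w in LEXICON:
            label = LEXICON[w]
            flips = sum(1 for p in words[:i] if p in NEGATORS)  # count negators before the sentiment word
            return flip(label) if flips % 2 else label
    return "neutral"
```

On "I never said that Y company is not good," the two negators cancel and the naive rule lands on "positive," but the same counting trick misfires as soon as the negators belong to different clauses, which illustrates the limitation described above.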

Accuracy concerns with the sentiment analyzer: A great deal of unstructured data processing depends on lexical analysis of the language. A common technique for feature mapping is word embedding. Scalability and accuracy issues may occur when multiple languages are involved, as some approaches to sentiment analysis are language-specific. Hence, a separate dictionary should be maintained per language, and the analytical model should be trained to look for important keywords or groups of keywords in subsequent data.

Other techniques may also involve language modeling using neural networks.

To populate the first layer of such a network, word embeddings provide the initial representation of the language. Training a deep neural network then requires a good number of labeled documents, so that the network can learn to map sequences of words to sentiment labels and perform well.
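The simplest shape of this pipeline, sketched with hand-set toy values: word embeddings feed a linear layer and a sigmoid to produce a sentiment probability. In practice both the embeddings and the weights are learned from the labeled documents described above; every number below is an illustrative assumption:

```python
import math

# Toy neural-style sentiment scorer: mean word embedding -> linear layer -> sigmoid.
# Embeddings and weights are hand-set for illustration; in practice both are learned.
EMBED = {                      # 2-dimensional toy word vectors
    "great": [1.0, 0.2],
    "awful": [-1.0, 0.1],
    "claim": [0.0, 0.8],
}
W, B = [1.5, 0.0], 0.0         # linear layer weights and bias

def sentiment_prob(text):
    """Return P(positive) for the text; 0.5 when no known words appear."""
    vecs = [EMBED[w] for w in text.lower().split() if w in EMBED]
    if not vecs:
        return 0.5             # no known words: undecided
    mean = [sum(col) / len(vecs) for col in zip(*vecs)]
    logit = sum(w * x for w, x in zip(W, mean)) + B
    return 1 / (1 + math.exp(-logit))
```

A trained deep network replaces the hand-set linear layer with many learned layers, but the flow (embed, combine, score) is the same.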

Library enrichment for important keywords in a domain: Many have witnessed the importance of the words "claims" and "price" when identifying the sentiment of a customer within the context of an auto insurance claim. There is a significant difference when the same words, or words like "claimed" and "price," are used outside of the insurance domain.

Therefore, the library or dictionary must be continually updated to specify which words should be ignored and which should be given special significance, depending on the line of business.
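One way to structure such a domain-aware library, sketched with illustrative placeholder weights (the idea that "claims" carries weight inside auto insurance but is neutral elsewhere follows the discussion above; the specific numbers are assumptions):

```python
# Domain-aware keyword library: the same word scores differently per line of business.
# Weights are illustrative placeholders, maintained and updated per domain.
LEXICONS = {
    "auto_insurance": {"claims": -0.3, "price": -0.2, "settled": 0.8},
    "general": {},             # outside the domain, these words carry no signal
}

def score(text, domain="general"):
    """Sum the domain lexicon's weights over the words in the text."""
    lex = LEXICONS.get(domain, {})
    return sum(lex.get(w.lower(), 0.0) for w in text.split())
```

Updating the library for a new line of business then means adding one dictionary entry rather than retraining the whole pipeline.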

Structured Data

For structured data, the major effort is spent analyzing the templates or the scope in which data will be received as input. This process includes analyzing the source system and customizing the input-processing module to identify the semantics of the data and attach a sentiment weight to each data point as part of the configuration.
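A minimal sketch of that configuration step, assuming a hypothetical survey template (the column names, answer categories, and weights are illustrative, not a real schema):

```python
# Configured template: each known survey column maps answer categories
# to sentiment weights. Schema and weights are illustrative assumptions.
TEMPLATE = {
    "satisfaction": {"very satisfied": 1.0, "satisfied": 0.5,
                     "neutral": 0.0, "dissatisfied": -1.0},
    "would_renew": {"yes": 0.5, "no": -0.5},
}

def score_response(response):
    """Sum the configured weights over the known survey columns."""
    return sum(TEMPLATE[col].get(ans.lower(), 0.0)
               for col, ans in response.items() if col in TEMPLATE)
```

Because the template is fixed up front, every response from the same source system can be scored by the same configuration with no per-record interpretation.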

On the flip side, whenever the source system for the input is known, it is easy to reuse the same template or data structure for sentiment analysis. For example, before any sentiment is analyzed for a group of 100 customers, the system already knows the data is survey data coming from an automated system that collects survey questions and answers.

The data columns, survey questions, and expected answer categories are also known.

Most importantly, the context is already known for structured data. A person already knows whether the survey was conducted to analyze retention scores for a period, or to analyze the cross-sell opportunity for a region or set of customers.

Sentiment analysis when combining both structured and unstructured data

When free text is used as a subjective answer to a formatted survey question, a general-purpose sentiment analyzer may not work best. In most cases, the unstructured data must be converted to structured data using initial knowledge of human language. This initial knowledge, obtained via supervised or self-supervised learning, is transferred to the system and provides a benchmark starting point.

The next step is to use the initial knowledge of the human language in combination with additional abstraction layers at a lower level, where more detail is available for precisely mapping the input text to output sentiment labels.
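The conversion step above can be sketched as attaching a sentiment label to each free-text answer so that it can join the structured survey record. The keyword rule below is a stand-in assumption for the trained model described earlier:

```python
# Convert a free-text survey answer into a structured record with a
# sentiment label. The keyword classifier is a stand-in for a trained model.
def classify(text):
    t = text.lower()
    if any(w in t for w in ("great", "helpful", "fast")):
        return "positive"
    if any(w in t for w in ("slow", "denied", "rude")):
        return "negative"
    return "neutral"

def structure_answer(question_id, free_text):
    """Produce a structured row that can be merged with the survey data."""
    return {"question": question_id, "text": free_text,
            "sentiment": classify(free_text)}
```

Once every free-text answer is reduced to a row like this, the structured and unstructured sides of the survey can be analyzed together with the same tooling.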

Feature mapping and sentiment classification techniques are involved in both structured and unstructured data. Some of the solutions may opt for rule-based or for machine learning-based sentiment classifiers.


Sentiment analytics is a powerful tool for understanding how your customer feels about your service or product. It’s important to understand the focus of the data – where is the data coming from? What area of the business are you focusing on? What is the format of the data? Data can come in many different formats – structured or unstructured – and both have their own challenges. Once you have all the necessary information, you are ready to apply natural language processing to understand how your customer is feeling.

Interested in learning how Aureus can help you leverage machine learning to predict your customer's behavior? Click on the link below to get more information.

More Information

Stuti Singh Magdani
Stuti is the Sr. Product Development Manager at Aureus. She completed her Bachelor of Engineering at Krishna Institute of Engineering & Tech and her MBA (IT) at Symbiosis Centre for Information Technology, Pune. She has worked with Culture Machine, Citrix, Cummins, and HCL Technologies.
