
How Artificial Neural Networks Unlock Insights from Unstructured Data

One of the challenges data scientists face is handling unstructured data with traditional machine learning models. These models are trained on structured data that has input features with corresponding output labels. Unstructured data, by contrast, cannot be used directly as input features. One approach is to use Artificial Neural Networks (ANN) to unlock business insights from unstructured data.

The Growth of Unstructured Data

Traditionally, most of the data available to companies was structured data. Structured data is defined as data that can be easily organized in traditional relational databases in the form of rows and columns. For insurance companies, the structured data they possess includes:

1. Data about their customers

A. Contact details

B. Demographic information like their age, gender, etc.

2. Policy information like issue dates, expiry dates, premium, sum assured, renewal details, etc.

3. Claims history like claimed amount, date of claim, claim status, etc.

4. Product information like characteristic features of the various insurance plans

5. Prospect Data

6. Data on the company's agents

However, in recent years, with technology getting smarter, faster, and more widely accessible, the amount and type of data available to companies have changed drastically. Trends such as a surge in social media usage and the easy availability of cameras in cell phones have made a large amount of data available to companies in the form of images, videos, audio, and free text. The cost of dealing with this kind of data has also gone down significantly for companies with the availability of cloud storage and cloud computing.

This type of data is commonly referred to as unstructured data. It is difficult and impractical to store and process this data in the form of traditional tables with rows and columns, i.e., it is difficult to define a standard structure for this type of data.

Insurance companies naturally possess and deal with a good deal of unstructured data, a few examples being:

  • Social media comments
  • Customer emails
  • Audio files of customer calls
  • Customer feedback from surveys
  • Images/videos of car accidents/car damage (auto insurance)
  • X-ray images (health insurance/life insurance)
  • Scanned document images submitted at the time of policy issuance

The challenge of dealing with unstructured data using traditional machine learning models

Traditional or classical machine learning models are trained on data with certain input features and corresponding output labels. The machine learning model learns from this data and progressively improves its ability to predict the output label for a given set of input features.

One complication with unstructured data is that the data cannot be used directly as input features. While some processing and transformation is required even for structured data, converting unstructured data to features requires a good deal of technical and domain expertise. Feature engineering is a crucial step in using classical machine learning techniques with unstructured data.

Let’s take the example of an auto insurance company determining the damage to a car using images.

[Image: pic 1]

This image by itself cannot be used as an input feature. Deliberate feature engineering must be done to define edges, corners, contours, change in shades, etc. by different calculations using pixel values.
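As a rough illustration of what such hand-built feature engineering can look like, the sketch below computes one edge-based feature from raw pixel values. The tiny 5x5 grayscale image, the use of Sobel kernels, the threshold, and the function names are all assumptions made for this example; the article does not prescribe a particular method. It is written in plain Python for portability, whereas real pipelines would typically rely on an image processing library.

```python
# Hand-engineered edge feature for a tiny grayscale image (pure Python).
# The kernels, threshold, and sample images are illustrative assumptions.

SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]    # responds to vertical edges (left-to-right change)
SOBEL_Y = [[-1, -2, -1],
           [0, 0, 0],
           [1, 2, 1]]     # responds to horizontal edges (top-to-bottom change)


def filter2d(image, kernel):
    """Slide a 3x3 kernel over the image (no padding) and return the responses."""
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(rows - 2):
        out_row = []
        for c in range(cols - 2):
            total = 0
            for i in range(3):
                for j in range(3):
                    total += image[r + i][c + j] * kernel[i][j]
            out_row.append(total)
        out.append(out_row)
    return out


def edge_density(image, threshold=100):
    """Fraction of positions whose gradient magnitude exceeds the threshold."""
    gx = filter2d(image, SOBEL_X)
    gy = filter2d(image, SOBEL_Y)
    strong = 0
    count = 0
    for row_x, row_y in zip(gx, gy):
        for x, y in zip(row_x, row_y):
            count += 1
            if (x * x + y * y) ** 0.5 > threshold:
                strong += 1
    return strong / count


# Dark left side (0) and bright right side (255): one vertical edge.
edged_image = [[0, 0, 255, 255, 255] for _ in range(5)]
# Uniform gray: no edges at all.
flat_image = [[128] * 5 for _ in range(5)]

print(edge_density(edged_image))  # roughly 0.67
print(edge_density(flat_image))   # 0.0
```

A single number like this could then be fed to a classical model as one input feature. Deciding which kernels, thresholds, and summaries are useful is exactly the expertise-heavy part.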

Deep Learning using Artificial Neural Networks

Artificial Neural Networks (ANN) are a set of algorithms inspired by the anatomy of the human nervous system. ANN models are composed of multiple layers, and each layer is composed of several neurons. Each neuron receives information as input from the previous layer and passes on its calculations as output to the next layer. Together, these layers and neurons form a network, which is called an artificial neural network.

[Image: pic 2]
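The layer-by-layer flow described above can be sketched in a few lines. This is a minimal forward pass only, assuming a sigmoid activation and made-up weights; the layer sizes, the weight values, and the function names are hypothetical and chosen purely for illustration.

```python
import math


def sigmoid(x):
    """Squash a number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def dense_layer(inputs, weights, biases):
    """Each neuron takes a weighted sum of all inputs, adds a bias, applies sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]


def forward(inputs, layers):
    """Pass the inputs through each layer in turn; a layer's output feeds the next."""
    activations = inputs
    for weights, biases in layers:
        activations = dense_layer(activations, weights, biases)
    return activations


# Hypothetical network: 3 inputs -> 2 hidden neurons -> 1 output neuron.
layers = [
    ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1]),  # hidden layer
    ([[1.0, -1.0]], [0.0]),                              # output layer
]

output = forward([1.0, 2.0, 3.0], layers)
print(output)  # a single value between 0 and 1
```

In a trained network the weights would be learned from data rather than typed in by hand.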

Why do artificial neural networks perform better than classical models for unstructured data?

The principal benefit of using Artificial Neural Networks for unstructured data is the ability of these models to detect input features on their own. There are quite a few types of ANNs that deal with unstructured data. Here, we will take the example of two types of neural networks: the Convolutional Neural Network and the Recurrent Neural Network.

Convolutional Neural Network

Taking the example of an auto insurance company assessing the damage to cars using car images:

This is a classic example of object detection. The purpose of the problem is to detect dents and scratches.

A Convolutional Neural Network (CNN) is a special kind of ANN that is well suited to image analytics. Image analytics can get quite complicated, as each image is composed of a large number of pixel values. If the image is in color, that adds to the complexity. A CNN modifies the ANN to reduce the complexity of this input while retaining the algorithm's ability to discern the various features in the images.
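One way this reduction shows up is in the number of weights to be learned, since a convolutional layer reuses the same small kernel at every position of the image. The arithmetic below compares the two cases. The 224x224 color image, the 100-neuron layer, and the sixteen 3x3 kernels are assumed sizes picked for illustration, not figures from any particular model.

```python
def dense_weight_count(height, width, channels, neurons):
    """Fully connected layer: every pixel value connects to every neuron, plus biases."""
    return height * width * channels * neurons + neurons


def conv_weight_count(kernel_size, in_channels, filters):
    """Convolutional layer: one small kernel per filter, shared across all positions."""
    return kernel_size * kernel_size * in_channels * filters + filters


# Assumed sizes: 224x224 RGB image, 100 dense neurons, sixteen 3x3 kernels.
dense_total = dense_weight_count(224, 224, 3, 100)
conv_total = conv_weight_count(3, 3, 16)

print(dense_total)  # 15052900
print(conv_total)   # 448
```

Under these assumptions, the convolutional layer has tens of thousands of times fewer weights than the fully connected one.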

Like every other ANN, a CNN may be composed of many layers. To detect the defects in the car image, the earlier layers of the CNN may detect only simple features such as vertical and horizontal edges. Each successive layer detects increasingly complex features until the final layers can detect the dents and scratches in the car image.
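The kind of computation an early CNN layer performs can be sketched as a convolution followed by an activation and a pooling step. In the example below the kernels are fixed by hand to mimic vertical and horizontal edge detectors; in a real CNN they would be learned during training. The 6x6 image, the kernel values, and the function names are assumptions for illustration.

```python
def conv2d(image, kernel):
    """Slide a square kernel over the image (no padding) to build a feature map."""
    k = len(kernel)
    out_rows = len(image) - k + 1
    out_cols = len(image[0]) - k + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(k) for j in range(k))
            for c in range(out_cols)
        ]
        for r in range(out_rows)
    ]


def relu(feature_map):
    """Keep positive responses and set negative ones to zero."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Keep the strongest response in each non-overlapping size x size block."""
    return [
        [
            max(feature_map[r + i][c + j]
                for i in range(size) for j in range(size))
            for c in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for r in range(0, len(feature_map) - size + 1, size)
    ]


# Hand-set kernels standing in for what early layers might learn.
VERTICAL_EDGE = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]
HORIZONTAL_EDGE = [[-1, -1, -1], [0, 0, 0], [1, 1, 1]]

# 6x6 image with a dark left half and a bright right half.
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]

vertical_map = max_pool(relu(conv2d(image, VERTICAL_EDGE)))
horizontal_map = max_pool(relu(conv2d(image, HORIZONTAL_EDGE)))

print(vertical_map)    # [[3, 3], [3, 3]]
print(horizontal_map)  # [[0, 0], [0, 0]]
```

The vertical-edge kernel responds strongly while the horizontal one stays silent, and the pooled 2x2 maps are much smaller than the 6x6 input. Later layers would take such maps as their input and combine them into more complex features.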

CNN works better than the classical methods, as no domain or technical expertise goes into defining what features are critical in defining dents. The CNN algorithm optimizes its feature detection in such a way that the accuracy in detecting dents is also maximized.
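The idea that the feature detector is learned rather than defined by hand can be shown on a toy problem. The sketch below starts from a kernel of zeros and uses plain gradient descent on a squared-error loss to recover a kernel that marks intensity changes in one-dimensional signals. The signals, the learning rate, the number of steps, and the function names are all hypothetical; real CNNs learn many two-dimensional kernels at once using backpropagation.

```python
def apply_kernel(signal, kernel):
    """Slide a 1D kernel along the signal (no padding)."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def train_kernel(signals, targets, steps=500, learning_rate=0.1):
    """Fit a length-2 kernel by gradient descent on the mean squared error."""
    kernel = [0.0, 0.0]
    for _ in range(steps):
        gradient = [0.0, 0.0]
        count = 0
        for signal, target in zip(signals, targets):
            predictions = apply_kernel(signal, kernel)
            for i, (p, t) in enumerate(zip(predictions, target)):
                for j in range(2):
                    # Derivative of (p - t)^2 with respect to kernel[j].
                    gradient[j] += 2 * (p - t) * signal[i + j]
                count += 1
        kernel = [w - learning_rate * g / count
                  for w, g in zip(kernel, gradient)]
    return kernel


# Toy training data: the target marks where the signal steps up (+1) or down (-1).
signals = [[0, 0, 1, 1, 1], [1, 1, 0, 0, 1], [0, 1, 0, 1, 1]]
targets = [[b - a for a, b in zip(s, s[1:])] for s in signals]

learned = train_kernel(signals, targets)
print([round(w, 3) for w in learned])  # close to [-1.0, 1.0]
```

Nobody told the procedure that a difference kernel was the right feature; it emerged from minimizing the prediction error.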

CNNs have a lot of potential in the health and life insurance market, where companies deal with many images such as X-rays and MRI scans. Companies can use CNNs at the underwriting stage. The models can be used to identify abnormalities in these scans faster and with higher accuracy. This information can be used to assess the risk and price the policy accordingly.

The models can also be used to examine the scans at the time of claims and possibly even identify fraudulent cases. While such use cases surely need medical personnel's expertise, artificial neural network models can help speed up the process and identify more accurate evidence.

Recurrent Neural Network

A Recurrent Neural Network (RNN) is another type of ANN, primarily tailored to take sequences as input. The most common application of RNNs is text, which is represented as a sequence of words.

For an insurance company, one of the main parameters it looks at is customer feedback. While customer feedback given in the form of ratings can be a good source of structured data, feedback is often more valuable when expressed as free text. Sentiment analysis on free-form customer feedback has been successfully attempted many times using classical machine learning methods such as SVM.
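Classical text models are often fed word counts, a representation commonly called bag of words. The article does not say which features were used with SVM, so this is an assumption made for illustration. The sketch shows one limitation of that representation: two sentences with opposite meanings can produce identical features because word order is discarded. The sentences and the helper name are made up for the example.

```python
from collections import Counter


def bag_of_words(text):
    """Count how often each word occurs, ignoring case, punctuation, and order."""
    cleaned = text.lower().replace(",", "").replace(".", "")
    return Counter(cleaned.split())


positive = "The service was good, not bad."
negative = "The service was bad, not good."

print(bag_of_words(positive) == bag_of_words(negative))  # True
```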

The benefit of using an RNN in such cases is that it has an internal memory that takes into account both the words in the text and the order in which they appear. This improves the accuracy with which it recognizes the sentiment of the text.

e.g., Your branch service is very good.

The word 'good' has a positive sentiment attached to it. The word 'very' by itself has no sentiment attached to it. An RNN, as an algorithm with memory, recognizes when looking at the word 'good' that it is preceded by the word 'very' and enhances the positive sentiment inherent in the word 'good.'
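The memory mechanism can be sketched as a hidden state that is updated one word at a time, so what the network holds when it reaches 'good' depends on what came before. The two-dimensional word vectors and the weights below are untrained, hypothetical values, so the numbers carry no real sentiment; the point is only that the state differs depending on the preceding words and their order.

```python
import math


def rnn_step(x, h_prev, w_x, w_h, b):
    """One recurrent step: h_t = tanh(W_x x_t + W_h h_(t-1) + b)."""
    h = []
    for k in range(len(h_prev)):
        total = b[k]
        total += sum(w_x[k][i] * x[i] for i in range(len(x)))
        total += sum(w_h[k][j] * h_prev[j] for j in range(len(h_prev)))
        h.append(math.tanh(total))
    return h


def run_rnn(words, embeddings, w_x, w_h, b):
    """Read the words in order, carrying the hidden state from one word to the next."""
    h = [0.0] * len(b)
    for word in words:
        h = rnn_step(embeddings[word], h, w_x, w_h, b)
    return h


# Hypothetical, untrained values chosen only for illustration.
embeddings = {"very": [0.0, 1.0], "good": [1.0, 0.0]}
w_x = [[0.9, 0.1], [0.2, 0.7]]
w_h = [[0.5, 0.6], [-0.3, 0.4]]
b = [0.0, 0.0]

state_good = run_rnn(["good"], embeddings, w_x, w_h, b)
state_very_good = run_rnn(["very", "good"], embeddings, w_x, w_h, b)
state_good_very = run_rnn(["good", "very"], embeddings, w_x, w_h, b)

print(state_good)
print(state_very_good)
print(state_good_very)
```

All three states differ, even though the last two contain the same words. A trained RNN would learn weights such that these differences translate into a stronger or weaker sentiment score.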


CNN and RNN are just two examples of the most commonly used ANNs for unstructured data. Artificial neural networks and deep learning have been around for a long time. They were not widely used until recently because ANNs were computationally expensive in terms of time and cost. With computation becoming cheaper and faster, and data becoming more accessible and more varied, applications of artificial neural networks to unstructured data continue to grow.


Neeraja Vaidya
Neeraja has a Post Graduate degree in Economics from Mumbai University and is a part of the data sciences team, responsible for developing statistical models for products.
