
Stream-based Data Integration – No Formatting Required

One of the challenges insurers face when implementing any new cloud-based application into their workflow is the integration of both internal and external data. Gaining access and permission to use internal data can be the first hurdle. Adding a requirement to supply that data in a specific format can be a show-stopper.

Stream-based data integration eliminates the need to format data in a specific way, regardless of the source, and is also a key component for enabling real-time predictive analytics capabilities.

What is Stream-based Data Integration?

Stream-based data integration is the ability to access and query data within a very short time after it is captured. It allows data to be analyzed almost immediately, with no waiting for batch processing to complete. Stream-based data integration, or stream processing, gives you quick access to time-sensitive information.

Stream-based data integration has emerged as a preferred method for data scientists and analysts to manage and manipulate data in complex environments that involve a variety of data stores, integration methodologies, and processing needs.

Stream-based data integration enables organizations to use data in real-time to: 

  • Improve predictive analytics
  • Extend distribution channels
  • Modernize aging systems – refresh and extend legacy applications

Stream-based data integration is required to generate analytics results in real time. By building data streams, you can feed data into analytics tools as soon as it is generated and get near-instant analytics results.

Fraud detection is an excellent example of where stream-based data integration is commonly used. When claim transaction data is streamed, fraudulent activity can be detected in real time, allowing suspicious transactions to be investigated quickly, before they are completed.

How Does it Work?

All incoming data is captured and stored, even if it is not being used immediately (a use can always be found for it in the future). As the data comes in, it is processed and analyzed. The results are then examined, and alerts are raised when a certain threshold or criterion is met.
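The capture, store, analyze, and alert steps above can be sketched in a few lines. This is a minimal illustration, not a production stream processor; the Claim shape and the alert threshold are hypothetical stand-ins:

```python
from dataclasses import dataclass

@dataclass
class Claim:
    claim_id: str
    amount: float

# Hypothetical threshold; in practice this would come from a model or rules engine.
ALERT_THRESHOLD = 10_000.0

def process_stream(claims):
    """Capture every event, analyze it as it arrives, and alert on threshold breaches."""
    stored = []   # all events are retained, even if not used immediately
    alerts = []
    for claim in claims:
        stored.append(claim)                    # capture and store
        if claim.amount > ALERT_THRESHOLD:      # analyze against the criterion
            alerts.append(claim.claim_id)       # raise an alert
    return stored, alerts

stored, alerts = process_stream([
    Claim("C-100", 2_500.0),
    Claim("C-101", 15_000.0),   # exceeds the threshold, so it triggers an alert
    Claim("C-102", 900.0),
])
```

Note that every claim is kept in the store, while only the suspicious one surfaces as an alert.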

In an upcoming blog, we will explain in more detail how the variables and events from data streams are configured, identified, and stored for use by predictive analytics.

Three Types of Data

The insurance industry produces an enormous amount of data on a daily basis that contains valuable information for predictive analytics. This data falls into three categories:

1. Structured Data

This is clean, organized data that can be easily searched and queried. The format is clearly defined and adhered to: fields such as dates, numbers, and predefined lists of values. The rules are defined in the database, and adherence is essentially guaranteed, so there are few surprises when working with the data. Structured data is the easiest type of data to work with because it is very predictable; it also represents a very small percentage of all data.

Keep in mind that just because data is structured, that doesn’t mean it’s simple. There are hundreds of structured systems, each with thousands of tables and hundreds of thousands of data structures. The complexity of data goes beyond the relatively simple structure and format of the data to the semantics of what each data value actually means and how it relates to other values.
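To make "the rules are defined in the database" concrete, here is a minimal sketch in which adherence is enforced by the database schema itself. The policy table and its constraints are hypothetical examples, not a real insurance schema:

```python
import sqlite3

# A minimal sketch: the rules live in the schema, so adherence is enforced.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE policy (
        policy_id   TEXT PRIMARY KEY,
        start_date  TEXT NOT NULL,                          -- ISO-format date
        premium     REAL NOT NULL CHECK (premium > 0),
        status      TEXT NOT NULL
                    CHECK (status IN ('active', 'lapsed', 'cancelled'))
    )
""")
conn.execute("INSERT INTO policy VALUES ('P-1', '2024-01-01', 1200.0, 'active')")

# A row that breaks the rules is rejected by the database itself.
try:
    conn.execute("INSERT INTO policy VALUES ('P-2', '2024-02-01', -50.0, 'active')")
except sqlite3.IntegrityError:
    print("rejected: premium must be positive")
```

The invalid row never reaches the table, which is exactly why structured data carries so few surprises downstream.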

2. Semi-structured Data

Semi-structured data has a defined format and meaning, but it requires the application to understand how to handle it. The data itself cannot easily be analyzed, but it will most likely carry metadata that can be. It may be formatted according to rules that vary depending on what data is provided. For example, image or x-ray files contain mostly pixels, and perhaps some words, but will most likely have metadata that provides more insight.

When semi-structured data is exchanged between organizations, the data will be in XML/JSON format with metadata tags that can be analyzed. Below are some examples of semi-structured data in the insurance industry when in XML/JSON format:

  • ACORD (Association for Cooperative Operations Research and Development)
  • HIPAA (Health Insurance Portability and Accountability Act)
  • SWIFT (Society for Worldwide Interbank Financial Telecommunication) files
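As a simplified illustration (this is not a real ACORD message, just the general shape of a tagged payload), the snippet below shows how the metadata tags of a semi-structured JSON document can be analyzed even when the attached content itself cannot:

```python
import json

# Hypothetical, simplified message: the field names are illustrative only.
message = json.loads("""
{
    "messageType": "ClaimNotification",
    "metadata": {"standard": "ACORD", "version": "2.0"},
    "claim": {"claimId": "CL-77", "lossDate": "2024-03-15"},
    "attachments": [{"type": "xray", "note": "left wrist"}]
}
""")

# The tags can be analyzed even when the payload (e.g. image pixels) cannot.
standard = message["metadata"]["standard"]
attachment_types = [a["type"] for a in message["attachments"]]
```

The x-ray pixels themselves would need image processing, but the tags around them are immediately queryable.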

3. Unstructured Data

Unstructured data cannot be easily searched and doesn’t follow any specified format. Handling semi-structured and unstructured data is where Big Data comes in. This data is very complex and challenging to process. 

Here are some examples of unstructured data:

  • Voice messages
  • Claim diary notes
  • Surveys
  • Underwriter notes
  • Email
  • Call Center Logs

Insurers also use a large amount of unstructured data from external sources such as:

  • Medical records
  • DMV reports
  • Social media posts

Sources such as email contain some specific fields, such as the sender, recipient, and subject, but the content in the body is generally more difficult to extract in an ordered fashion. Until relatively recently, processing unstructured data wasn't cost-effective due to the difficulties involved. Now, with natural language processing (computer understanding of normal human language), data integration technology can understand the sentiment or other ideas contained within the body of emails or social media communications.
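As a toy illustration of the idea (real systems use trained NLP models; the keyword lists below are hypothetical stand-ins, not a production technique), sentiment can be approximated by scanning free text for sentiment-bearing words:

```python
# Hypothetical keyword lists standing in for a trained sentiment model.
NEGATIVE = {"angry", "delay", "complaint", "unacceptable"}
POSITIVE = {"thanks", "great", "satisfied", "helpful"}

def sentiment(body: str) -> str:
    """Classify free text by counting sentiment-bearing keywords."""
    words = set(body.lower().split())
    negatives = len(words & NEGATIVE)
    positives = len(words & POSITIVE)
    if negatives > positives:
        return "negative"
    if positives > negatives:
        return "positive"
    return "neutral"

label = sentiment("the claim delay is unacceptable and I am angry")
```

A real model also handles negation, context, and misspellings, which is precisely what made unstructured data expensive to process until recently.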

Where is stream-based data integration used? 

Stream-based data integration has become increasingly popular and is being used in all areas of business.

Financial Services: Real-time risk analysis, as well as monitoring and reporting of financial securities trading, is a huge help to this industry. Stream-based integration can also be used to calculate foreign exchange prices.

Insurance: Insurance fraud is a growing epidemic, and stream-based data integration is an effective way to detect and deter it. It can identify and raise alerts on suspicious activity to give the insurer near real-time information when specific events occur.

Marketing: Real-time marketing can create advertisements and offers tailored to specific geographical areas or situations unique to a customer.  

Retail: Many leading retailers are now using streaming data to have up-to-date data on their inventory and even their sales patterns at a particular store. This allows companies to respond to their customers’ buying patterns at unprecedented speed and granularity.

Supply Chain and Logistics: The ability to track shipments in real time and to detect and report potential arrival delays is a standard consumer expectation these days. Streaming data can also drive stocking levels based on changes in demand and shipping predictions.

The Benefits of Stream-based Data Integration

In the insurance industry, it almost always comes down to whether business and IT resources are available to work on the project.

1.  No reformatting of data is required

Very often, the data you need is siloed in different areas of the business and IT departments. Obtaining access to all the different data sets within the organization can be challenging. Because the data does not have to be reformatted, the business areas don't have to spend precious extra time reformatting or organizing it.

2.  Start with two or three data sources as opposed to ten

With this approach, you can start with just a few data sources and add on over time. You don’t need to have the perfect structure at the beginning.

3.  Stream-based data integration is more practical and easier than traditional approaches

Traditional approaches such as data warehouses are not ideal for enabling real-time predictive analytics capabilities, as they use the following approach for data ingestion:

  • Fixed target structure in which to ingest data
  • Source data is transformed to fit into the target structure
  • Any 'alien' data is just ignored and dropped
  • Unstructured data is sparingly allowed
  • Reporting and analytics are run on top of this target structure
  • Structure is reviewed periodically for a change in definitions

Stream-based data integration allows organizations to absorb data without defining it first, enabling them to discover the structure and value of their data on an ongoing basis.
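A minimal sketch of this "absorb first, define later" approach: events of different shapes are stored as-is, nothing is dropped as 'alien', and structure is discovered at query time. The event shapes below are hypothetical:

```python
import json

# Raw store with no fixed target structure.
raw_store = []

def ingest(event_json: str) -> None:
    """Absorb an event without forcing it into a predefined schema."""
    raw_store.append(json.loads(event_json))   # nothing is dropped as 'alien'

ingest('{"type": "claim", "amount": 1200, "region": "north"}')
ingest('{"type": "quote", "channel": "web"}')                       # different shape, still kept
ingest('{"type": "claim", "amount": 300, "diary_note": "caller upset"}')

# Structure is discovered on an ongoing basis, at read time:
claim_total = sum(e.get("amount", 0) for e in raw_store if e.get("type") == "claim")
fields_seen = sorted({key for event in raw_store for key in event})
```

Contrast this with the warehouse approach above, where the quote event would have been transformed to fit a fixed target structure or discarded.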


Stream-based data integration eliminates the need to format data in a specific way regardless of the source and is also a key component for enabling real-time predictive analytics capabilities. Stream-based data integration also supports structured data, semi-structured data, and unstructured data.

In addition to the insurance industry, stream-based data integration has been deployed in financial services, marketing, retail and supply chain and logistics.

Organizations can start predictive analytics projects with just a few data sources, as opposed to waiting to have the perfect structure in place.

Stream-based data integration allows organizations to absorb data without defining it first, enabling them to discover the structure and value of their data on an ongoing basis.

Interested in learning how Aureus can help you understand and utilize stream-based data integration and how it can help you understand your customer's journey? Click on the link below to get more information.

More Information


Nitin Purohit
Nitin is CTO and co-founder at Aureus. With over 15 years of experience in leveraging technology to drive and achieve top-line and bottom-line numbers, Nitin has helped global organizations optimize value from their significant IT investments. Over the years, Nitin has been responsible for the creation of many product IPs. Prior to this role at Aureus, Nitin was the Global Practice Head for Application Services at Omnitech Infosolutions Ltd and was responsible for sales and profitability of offerings from application services across geographies.
