Stream-based Data Integration – No Formatting Required

One of the challenges insurers face when bringing any new cloud-based application into their workflow is the integration of both internal and external data. Gaining access and permission to use internal data can be the first hurdle. Adding a requirement to convert that data into a specific format can be a show-stopper.

Stream-based data integration eliminates the need to format data in a specific way regardless of the source and is also a key component for enabling real-time predictive analytics capabilities.

What is Stream-based Data Integration?

Stream-based data integration is the ability to access and query data within a very short time after the data is captured. It allows data to be analyzed almost immediately – no waiting for batch processing to complete to get your data. Stream-based data integration, or stream processing, allows you quick access to time-sensitive information.

Stream-based data integration has emerged as a preferred method for data scientists and analysts to manage and manipulate data in complex environments that involve a variety of data stores, integration methodologies, and processing needs.

Stream-based data integration enables organizations to use data in real-time to: 

  • Improve predictive analytics
  • Extend distribution channels
  • Modernize aging systems – refresh and extend legacy applications

Stream-based data integration is required to generate analytics results in real time. By building data streams, you can feed data into analytics tools as soon as it is generated and get near-instant analytics results.

Fraud detection is an excellent example of where stream-based data integration is commonly used. When claim transaction data is streamed, fraudulent activity can be detected in real time, allowing suspicious transactions to be investigated quickly, before they are completed.

How Does it Work?

All data that comes in is captured and stored, even if it is not being used immediately (we can always find a use for it in the future). As the data comes in, it is processed and analyzed. The results are then examined, and alerts are raised when a certain threshold or criterion is met.
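As a rough illustration of the flow described above (store everything, analyze each record as it arrives, alert on a threshold), here is a minimal Python sketch. The `ClaimEvent` fields, the `StreamProcessor` class, and the toy `amount_score` rule are all assumptions made up for this example; a real deployment would use a streaming platform and a trained model rather than an in-memory list and a hand-written rule.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Iterator


@dataclass
class ClaimEvent:
    """Hypothetical claim transaction; the field names are illustrative."""
    claim_id: str
    policy_id: str
    amount: float


@dataclass
class StreamProcessor:
    """Stores every event, scores it on arrival, and yields alerts."""
    score: Callable[[ClaimEvent], float]
    threshold: float
    store: list[ClaimEvent] = field(default_factory=list)

    def process(self, events: Iterable[ClaimEvent]) -> Iterator[tuple[ClaimEvent, float]]:
        for event in events:
            self.store.append(event)    # capture and keep everything
            risk = self.score(event)    # analyze the event as it comes in
            if risk >= self.threshold:  # alert only when the threshold is met
                yield event, risk


def amount_score(event: ClaimEvent) -> float:
    """Toy rule (an assumption): risk grows with claim amount, capped at 1.0."""
    return min(event.amount / 50_000, 1.0)


incoming = [
    ClaimEvent("C-1", "P-10", 1_200.0),
    ClaimEvent("C-2", "P-11", 48_000.0),
    ClaimEvent("C-3", "P-10", 75_000.0),
]

processor = StreamProcessor(score=amount_score, threshold=0.9)
alerts = list(processor.process(incoming))
for event, risk in alerts:
    print(f"ALERT {event.claim_id}: risk={risk:.2f}")
```

Note that the low-risk claim is still kept in `processor.store` even though it raises no alert, which mirrors the idea of storing data that has no immediate use.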

In an upcoming blog post, we will explain in more detail how the variables and events from data streams are configured, identified, and stored for use by predictive analytics.

Three Types of Data

The insurance industry produces an enormous amount of data on a daily basis that contains valuable information for predictive analytics. This data falls into three types:

1. Structured Data

This is clean, organized data that can be easily searched and queried. The format is clearly defined and adhered to. It contains fields such as dates, numbers, and values from a predefined list. The rules are defined in the database, so adherence is all but guaranteed, which leaves little room for surprises when working with the data. Structured data is the easiest type of data to work with because it’s very predictable; it also represents a very small percentage of all data.

Keep in mind that just because data is structured, that doesn’t mean it’s simple. There are hundreds of structured systems, each with thousands of tables and hundreds of thousands of data structures. The complexity of data goes beyond the relatively simple structure and format of the data to the semantics of what each data value actually means and how it relates to other values.

2. Semi-structured Data

Semi-structured data carries its format and meaning along with the data itself, typically as tags or metadata, and relies on the application to understand how to handle it. The content cannot easily be analyzed, but it will most likely have metadata that can be. It may be formatted according to rules that vary depending on what data is provided. For example, image or X-ray files contain mostly pixels and some words, but will most likely have metadata that provides more insight.

When semi-structured data is exchanged between organizations, it is typically in XML or JSON format, with metadata tags that can be analyzed. Below are some examples from the insurance industry of data that is semi-structured when exchanged in XML or JSON format:

  • ACORD (Association for Cooperative Operations Research and Development)
  • HIPAA (Health Insurance Portability and Accountability Act)
  • SWIFT (Society for Worldwide Interbank Financial Telecommunication) files
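To make the idea of analyzable metadata tags concrete, the sketch below parses a small JSON message and flattens its payload into path/value pairs without knowing the payload layout in advance. The message is entirely hypothetical and does not follow any actual ACORD, HIPAA, or SWIFT layout; the `flatten` helper is likewise an illustrative name, not part of any standard.

```python
import json

# Hypothetical message; NOT an actual ACORD, HIPAA, or SWIFT layout.
raw_message = """
{
  "metadata": {"messageType": "ClaimNotice", "sentAt": "2023-01-15T10:30:00Z"},
  "payload": {
    "claimId": "C-1001",
    "attachments": [{"fileName": "xray.png", "contentType": "image/png"}],
    "notes": "Insured reports water damage in basement."
  }
}
"""


def flatten(node, prefix=""):
    """Yield (dotted_path, value) pairs for every leaf in nested JSON."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from flatten(value, f"{prefix}{key}.")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from flatten(value, f"{prefix}{index}.")
    else:
        yield prefix.rstrip("."), node


message = json.loads(raw_message)
metadata = message.get("metadata", {})   # tags describing the message
fields = dict(flatten(message["payload"]))  # whatever the payload contains

print("type:", metadata["messageType"])
for path, value in fields.items():
    print(path, "=", value)
```

Because the payload is walked generically, a message with extra or missing fields is still ingested; only the set of discovered paths changes.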

3. Unstructured Data

Unstructured data cannot be easily searched and doesn’t follow any specified format. Handling semi-structured and unstructured data is where Big Data comes in. This data is very complex and challenging to process. 

Here are some examples of unstructured data:

  • Voice messages
  • Claim diary notes
  • Surveys
  • Underwriter notes
  • Email
  • Call center logs

Insurers also use a large amount of unstructured data from external sources such as:

  • Medical records
  • DMV reports
  • Social media posts

Sources such as emails contain specific information such as the sender, recipient, subject and body text. The content in the body of the email is generally more difficult to extract in an ordered fashion. Until relatively recently, processing unstructured data wasn’t cost-effective due to the difficulties involved. Now, with natural language processing (computer understanding of normal human language), data integration technology can understand the sentiment or other ideas contained within the body of emails or social media communications.
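The split between the structured parts of an email and its free-text body can be sketched with Python's standard `email` package. The sample message and the `NEGATIVE_WORDS` list are invented for illustration, and the keyword match is only a crude stand-in for real natural language processing, which would use a trained language model rather than a word list.

```python
from email import message_from_string, policy

# Invented sample message for illustration only.
raw_email = """\
From: policyholder@example.com
To: claims@example.com
Subject: Claim C-1001 still unresolved

I am very frustrated. This is the third time I have asked for an update
and nobody has called me back.
"""

# Crude stand-in for sentiment analysis (an assumption, not real NLP).
NEGATIVE_WORDS = {"frustrated", "angry", "unacceptable", "cancel"}


def parse_email(raw: str) -> dict:
    """Extract the structured headers and a rough signal from the body."""
    msg = message_from_string(raw, policy=policy.default)
    body_part = msg.get_body(preferencelist=("plain",))
    body = body_part.get_content() if body_part is not None else ""
    words = {word.strip(".,!?").lower() for word in body.split()}
    return {
        "sender": str(msg["From"]),
        "recipient": str(msg["To"]),
        "subject": str(msg["Subject"]),
        "body": body.strip(),
        "negative_terms": sorted(words & NEGATIVE_WORDS),
    }


parsed = parse_email(raw_email)
print(parsed["sender"], "->", parsed["recipient"])
print("negative terms:", parsed["negative_terms"])
```

The headers come out cleanly because they follow a defined format, while the body needs interpretation, which is exactly where the difficulty described above lies.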

Where is Stream-based Data Integration Used?

Stream-based data integration is becoming increasingly popular and is now used across many industries.

Financial Services: Real-time risk analysis, along with monitoring and reporting of securities trading, is a huge help to this industry. Streaming data can also be used to calculate foreign exchange prices.

Insurance: Insurance fraud is a growing epidemic, and stream-based data integration is an effective way to detect and deter it. It can identify and raise alerts on suspicious activity to give the insurer near real-time information when specific events occur.

Marketing: Real-time marketing can create advertisements and offers tailored to specific geographical areas or situations unique to a customer.  

Retail: Many leading retailers are now using streaming data to have up-to-date data on their inventory and even their sales patterns at a particular store. This allows companies to respond to their customers’ buying patterns at unprecedented speed and granularity.

Supply Chain and Logistics: The ability to track shipments in real time and to detect and report potential delays in arrival is a standard expectation among consumers these days. Streaming data can also be used to adjust stocking levels based on changes in demand and shipping predictions.

The Benefits of Stream-Based Data Integration

In the insurance industry, whether a project goes ahead almost always comes down to whether there are business and IT resources available to work on it.

1.  No reformatting of data is required

Very often, the data you need is siloed in different areas of the business and IT departments. Obtaining access to all the different data sets within the organization can be challenging. The fact that the data does not have to be reformatted is a benefit because the business areas don’t have to spend precious extra time reformatting or organizing it.

2.  Start with two or three data sources as opposed to ten

With this approach, you can start with just a few data sources and add on over time. You don’t need to have the perfect structure at the beginning.

3.  Stream-based data integration is more practical and easier than traditional approaches

Traditional approaches such as data warehouses are not ideal for enabling real-time predictive analytics capabilities, as they use the following approach for data ingestion:

  • Fixed target structure in which to ingest data
  • Source data is transformed to fit into the target structure
  • Any 'alien' data is just ignored and dropped
  • Unstructured data is sparingly allowed
  • Reporting and analytics are run on top of this target structure
  • Structure is reviewed periodically for a change in definitions

Stream-based data integration allows organizations to absorb data without defining it first, enabling them to discover the structure and value of the data on an ongoing basis.
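The contrast between the two ingestion styles can be shown in a few lines of Python. This is only a sketch under assumed inputs: the three sample records, the `TARGET_COLUMNS` tuple, and both function names are hypothetical. The first function mimics a fixed target structure that drops unknown fields; the second keeps records as they arrive and profiles which fields and value types actually appear.

```python
from collections import Counter

# Hypothetical records from three sources with different shapes.
records = [
    {"source": "policy_admin", "policy_id": "P-10", "premium": 1200.0},
    {"source": "claims", "claim_id": "C-1", "policy_id": "P-10", "amount": 5400.0},
    {"source": "call_center", "policy_id": "P-10", "transcript": "Customer asked about renewal."},
]

TARGET_COLUMNS = ("policy_id", "premium")  # fixed, warehouse-style structure


def fit_to_target(record: dict) -> dict:
    """Traditional ingestion: keep the target columns, drop everything else."""
    return {column: record.get(column) for column in TARGET_COLUMNS}


def discover_structure(stored: list[dict]) -> dict[str, Counter]:
    """Stream-style ingestion: keep records as-is, then profile the fields seen."""
    profile: dict[str, Counter] = {}
    for record in stored:
        for name, value in record.items():
            profile.setdefault(name, Counter())[type(value).__name__] += 1
    return profile


warehouse_rows = [fit_to_target(record) for record in records]
profile = discover_structure(records)

print(warehouse_rows[2])  # the call transcript is lost in the fixed structure
for name, types in sorted(profile.items()):
    print(name, dict(types))
```

Running the profile again as new sources are added is one simple way to discover structure on an ongoing basis, rather than agreeing on it before any data is ingested.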


Stream-based data integration eliminates the need to format data in a specific way regardless of the source and is also a key component for enabling real-time predictive analytics capabilities. Stream-based data integration also supports structured data, semi-structured data, and unstructured data.

In addition to the insurance industry, stream-based data integration has been deployed in financial services, marketing, retail and supply chain and logistics.

Organizations can start predictive analytics projects with only two or three data sources as opposed to waiting to have the perfect structure in place.

Because data is absorbed without being defined up front, organizations can discover its structure and value on an ongoing basis.

Interested in learning how Aureus can help you understand and utilize stream-based data integration and how it can help you understand your customer's journey? Click on the link below to get more information.

More Information


Nitin Purohit
Nitin is CTO and co-founder at Aureus. With over 15 years of experience in leveraging technology to drive and achieve top-line and bottom-line numbers, Nitin has helped global organizations optimize value from their significant IT investments. Over the years, Nitin has been responsible for the creation of many product IPs. Prior to this role at Aureus, Nitin was the Global Practice Head for Application Services at Omnitech Infosolutions Ltd and was responsible for sales and profitability of offerings from application services across geographies.