
Cheers to Stream-based Data Integration...and to Never Losing a Byte!!

Enterprise applications belong to a vibrant ecosystem, and consequently the data they generate is large and varied. Enterprises both benefit and suffer from this nature of applications and data. Whenever a new application that integrates with the rest of the ecosystem is to be deployed in an enterprise, the precondition is an 'expansive data definition with referential value' on day one. Traditionally, this approach to data integration involves identifying a target data structure and force-fitting data from all sources into it. This is done to ensure a 'seamless' integration - never mind the loss of data considered irrelevant.

Data Warehouse Approach

Traditional data warehouses use the following approach for data ingestion:

  • Fixed target structure into which data is ingested
  • Source data is transformed to fit the target structure
  • Any 'alien' data is simply ignored and dropped (a minimal sketch follows this list)
  • Unstructured data is sparingly allowed
  • Reporting and analytics run on top of the target structure
  • The structure is reviewed periodically for changes in definitions
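
To make the cost of this approach concrete, here is a minimal Python sketch of warehouse-style ingestion against a hypothetical fixed schema; the field names are assumptions for illustration. Anything outside the target structure is silently lost:

    # Hypothetical fixed target structure for a customer record.
    TARGET_SCHEMA = {"customer_id", "name", "dob", "policy_no"}

    def warehouse_ingest(record: dict) -> dict:
        """Force-fit a source record into the fixed target structure.

        Fields outside TARGET_SCHEMA are silently discarded - the
        'alien data is ignored and dropped' step described above.
        """
        return {k: v for k, v in record.items() if k in TARGET_SCHEMA}

    source = {
        "customer_id": "C-1001",
        "name": "Jane Doe",
        "policy_no": "P-556",
        "preferred_channel": "whatsapp",  # 'alien' variable: lost forever
        "last_app_login": "2023-01-03",   # 'alien' variable: lost forever
    }

    print(warehouse_ingest(source))
    # {'customer_id': 'C-1001', 'name': 'Jane Doe', 'policy_no': 'P-556'}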

So how do we absorb data without defining it all up front, so that we can discover the structure and value of the data on an ongoing basis?

Benefits of Stream-based Data Integration

In the real world, data is collected or elicited through events - enterprise-initiated or customer-initiated. The flow of data from transactional systems to analytical systems can follow a similar pattern if event data is allowed a variable structure. This works best when data flows as streams from multiple sources. This approach is called event-based data modelling using data streams.

  1. The unit of integration is a data packet, which contains a series of connected name-value pairs.
  2. Data packets flow into the data ingestion environment and onto one or more target streams. A packet is considered for processing on a stream only if the minimum variables required for that stream are present in the packet.
  3. Each data packet can provide foundational information for multiple different events.
  4. Data is stored as events; data for the same event can be provided incrementally.
  5. Variables in a packet that are unknown to the data ingestion environment are not ignored or dropped. Instead, they are retained on all events identified from the data packet (see the sketch after this list).
  6. Since events are real-world concepts, they form an excellent foundation for analytical models targeted towards behavioral outcomes.
  7. Data ingestion can start very quickly, with conformance to a minimum set of variables required per stream.
  8. Adjunct variables can be discovered after ingestion and used for analytical value.
  9. This works very well with the real-time paradigm in which the businesses of today compete.
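
A minimal Python sketch of these rules, assuming a hypothetical stream registry keyed by the minimum variables each stream requires (points 1, 2 and 5 above). A packet is admitted to every stream whose minimum variables it carries, and unknown variables travel with the resulting events instead of being dropped:

    from typing import Dict, List, Set

    # Hypothetical registry: stream name -> minimum variables required.
    STREAM_REGISTRY: Dict[str, Set[str]] = {}

    def ingest_packet(packet: Dict[str, str]) -> List[dict]:
        """Route one data packet (name-value pairs) onto target streams.

        A packet is considered for a stream only if that stream's
        minimum variables are all present. Variables unknown to the
        environment are retained on every event, never dropped.
        """
        events = []
        for stream, required in STREAM_REGISTRY.items():
            if required <= packet.keys():  # minimum variables present?
                known = {k: packet[k] for k in required}
                extras = {k: v for k, v in packet.items() if k not in required}
                events.append({"stream": stream, "data": known, "extras": extras})
        return events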

In consumer-facing enterprise businesses, where relationships are long term and influenced by experience, the following streams are essential:

  • Customer
  • Relationship
  • Transactions
  • Interactions

These streams abstract the natural structure of the processes that govern the operations of the business. And to top it all, this model can be kick-started quickly and improved upon on an ongoing basis.
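
Continuing the Python sketch above, the four streams might be registered with illustrative minimum variables; the exact variable names are assumptions for the example, not a prescription:

    # Illustrative minimum variables per stream (assumed for this sketch).
    STREAM_REGISTRY.update({
        "customer":     {"customer_id", "name"},
        "relationship": {"customer_id", "relationship_id"},
        "transactions": {"customer_id", "txn_id", "amount"},
        "interactions": {"customer_id", "channel", "timestamp"},
    })

    packet = {
        "customer_id": "C-1001",
        "txn_id": "T-9",
        "amount": "250.00",
        "loyalty_tier": "gold",  # unknown variable, still retained
    }

    for event in ingest_packet(packet):
        print(event["stream"], "->", event["extras"])
    # transactions -> {'loyalty_tier': 'gold'}

A richer packet can just as easily satisfy several streams at once, which is how a single packet provides foundational information for multiple events.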

Advantages of Stream-based Integration

The following are some of the technical advantages of stream-based integration:

  1. Cloud technologies: applications hosted on the cloud, such as Salesforce, can be connected to via native APIs or integration APIs like MuleSoft, using adapters.
  2. Legacy application integration: legacy applications that allow connectivity via message queues or flat files can also integrate in a stream-based environment.
  3. Batch-based upload: data that is available after EOD processing, or arrives from external systems as flat files, can also be integrated on streams.
  4. Real-time integration: enterprise environments with an ESB can easily connect in real time to the web service end-point of the relevant stream, and applications that can call web services can also connect in real time (a minimal sketch follows this list).
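
For the real-time case, a producer could simply POST each packet to the web service end-point of the relevant stream. A minimal sketch using only Python's standard library; the endpoint URL is hypothetical:

    import json
    import urllib.request

    # Hypothetical web service endpoint for the 'transactions' stream.
    STREAM_ENDPOINT = "https://ingest.example.com/streams/transactions"

    def send_packet(packet: dict) -> int:
        """POST one data packet to a stream endpoint; return the HTTP status."""
        req = urllib.request.Request(
            STREAM_ENDPOINT,
            data=json.dumps(packet).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(req) as resp:
            return resp.status

    # Example: send_packet({"customer_id": "C-1001", "txn_id": "T-9", "amount": "250.00"})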

Conclusion

If you want to see analytics results in real time, then stream-based processing is the way to go. A stream-based approach to data integration preserves the sanctity of data through its life cycle. It also fits easily into many different environments and sources: the cloud, legacy applications, batch uploads and real-time integration.


Download the Aureus Analytics whitepaper "Data Integration with CRUX" to learn how data stream-based integration is used to bring together multiple datasets in real-time.


Nitin Purohit
Nitin is CTO and co-founder at Aureus. With over 15 years of experience in leveraging technology to drive and achieve top-line and bottom-line numbers, Nitin has helped global organizations optimize value from their significant IT investments. Over the years, Nitin has been responsible for the creation of many product IPs. Prior to this role at Aureus, Nitin was the Global Practice Head for Application Services at Omnitech Infosolutions Ltd and was responsible for sales and profitability of offerings from application services across geographies.
