What Is Streaming Analytics?
Streaming analytics is the continuous ingestion, processing and analysis of real-time streaming data. Because data arrives perpetually from a variety of real-time sources, you can take immediate action while events are still happening. Streaming analytics platforms can gather and analyze large volumes of data arriving in "streams" from always-on sources. These include sensor data, telematics data, machine logs, social media feeds, location data and change data capture (CDC) data from traditional and relational databases and data warehouses.
How Is Streaming Analytics Different from Traditional Data Analytics?
What’s the difference between streaming analytics and traditional analytics? The answer lies in when data gets analyzed. Traditional analytics follows a store-first, then-analyze paradigm. This involves first storing data in a cloud repository like a cloud data lake or cloud data warehouse. From there, you can analyze the data in batch mode to derive insights. Traditional analytics is also mainly applied to data at rest. In streaming analytics, you analyze the data first, continuously, while the events are still happening. Then, you store the relevant data for batch analysis. This way, streaming analytics platforms can handle the scale and constant flow of information. They also deliver continuous insights to users across the organization.
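The contrast between the two paradigms can be sketched in a few lines of Python. This is a minimal illustration only, not how any particular platform works: the function names and the sample readings are made up for the example. The batch function needs the complete stored data set before it can answer, while the streaming function produces an updated answer after every event.

```python
from typing import Iterable, Iterator, List


def batch_average(stored: List[float]) -> float:
    """Store first, then analyze: requires the whole data set at rest."""
    return sum(stored) / len(stored)


def streaming_average(events: Iterable[float]) -> Iterator[float]:
    """Analyze first: emit an updated average as each event arrives."""
    count, total = 0, 0.0
    for value in events:
        count += 1
        total += value
        yield total / count


# Hypothetical sensor readings used only for illustration.
readings = [10.0, 20.0, 30.0, 40.0]

batch_result = batch_average(readings)              # one answer, at the end
running_results = list(streaming_average(readings))  # one answer per event
```

Both approaches end at the same value, but the streaming version has a usable result after the first event rather than after the last one.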
What Are the Benefits of Streaming Analytics?
The top benefits of streaming analytics are:
- Improve operational efficiencies:
The data and analytics team can understand ongoing events faster and act on them immediately. The operations team can also get access to continuous data. This enables them to track and work on KPIs to keep the operations running.
- Reduce infrastructure cost:
Streaming data analytics and ingestion cloud services can help your organization manage workloads efficiently. They do so by auto-scaling clusters to optimize costs.
- Provide faster insights and actions:
An end-to-end AI-powered streaming analytics solution can ingest any data from any source at any latency. It then processes and operationalizes that data continuously, delivering faster insights to all users.
Streaming analytics extracts business value from data in motion whereas traditional analytics tools use data at rest. Organizations in every industry have some form of streaming data available. Data sources include applications, social media, sensors, devices, websites and more. Flexible tools and processes are needed to make this data accessible for use.
Real-time streaming data analytics can, for example, trigger an alert the moment something happens. This can be set up using a real-time streaming extract, transform, load (ETL) processing pipeline. It could be helpful in a case where fraud is detected on a customer’s account. As a result, the customer gets a better service experience with faster response times. In addition, a company may discover and act on minor outages or incidents right away, addressing them before they grow into broader, more serious issues. This avoids higher resolution costs and potential harm to the company’s reputation.
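As a rough sketch of such a pipeline, the Python below chains three generator stages so that each transaction flows through extract, transform and load as soon as it arrives. Everything here is an assumption made for illustration: the "account,amount" line format, the fixed 1,000 limit standing in for a real fraud model, and the in-memory list standing in for an alerting channel.

```python
from typing import Any, Dict, Iterable, Iterator, List

Transaction = Dict[str, Any]


def extract(raw_lines: Iterable[str]) -> Iterator[Transaction]:
    """Parse hypothetical 'account,amount' lines one at a time."""
    for line in raw_lines:
        account, amount = line.strip().split(",")
        yield {"account": account, "amount": float(amount)}


def transform(transactions: Iterable[Transaction],
              limit: float = 1000.0) -> Iterator[Transaction]:
    """Flag each transaction; the fixed limit is a placeholder rule."""
    for tx in transactions:
        yield {**tx, "suspicious": tx["amount"] > limit}


def load(transactions: Iterable[Transaction], alerts: List[str]) -> int:
    """Raise an alert per suspicious transaction; return the count processed."""
    processed = 0
    for tx in transactions:
        processed += 1
        if tx["suspicious"]:
            alerts.append(
                f"ALERT account={tx['account']} amount={tx['amount']:.2f}"
            )
    return processed


# Made-up input standing in for a live feed.
incoming = ["A1,25.00", "B7,4300.50", "A1,12.99"]
alerts: List[str] = []
processed = load(transform(extract(incoming)), alerts)
```

Because the stages are generators, no stage waits for the full input: the alert for the second transaction is raised before the third one is even parsed.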
Additionally, businesses can use information derived from analytics in real time. This can identify anomalies and business changes as they occur. Examples would be a sudden spike in demand for a product or service or a defect in manufacturing. Such information allows companies to take instant action in a way they might not be able to do otherwise.
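One simple way to spot a sudden spike as it occurs is to compare each new value with a rolling window of recent values. The sketch below is only one of many possible approaches, and the window size, the threshold of three standard deviations and the order counts are all illustrative assumptions.

```python
from collections import deque
from statistics import mean, pstdev
from typing import Iterable, Iterator, Tuple


def detect_spikes(values: Iterable[float], window: int = 5,
                  threshold: float = 3.0) -> Iterator[Tuple[int, float]]:
    """Yield (index, value) when a value deviates strongly from the
    rolling window of values that preceded it."""
    history: deque = deque(maxlen=window)
    for index, value in enumerate(values):
        if len(history) == window:
            mu = mean(history)
            sigma = pstdev(history)
            # Skip the check when the window is perfectly flat.
            if sigma > 0 and abs(value - mu) / sigma > threshold:
                yield index, value
        history.append(value)


# Hypothetical orders per minute for a single product.
orders = [100, 102, 98, 101, 99, 103, 250, 100]
spikes = list(detect_spikes(orders))
```

Note that the spike itself enters the window afterwards and inflates the baseline for a while; a production detector would typically handle that more carefully.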
Streaming Analytics Platforms and How Informatica Integrates with Them
Streaming data analytics platforms include open-source Apache projects such as Apache Spark, Apache Kafka, Storm and Flink. They also include proprietary offerings from vendors like IBM, TIBCO, SAS and Software AG, and cloud services such as Amazon Kinesis, Azure Event Hubs and Google Pub/Sub. Informatica’s Intelligent Data Management Cloud™ (IDMC) includes code-free and low-code streaming data processing services. These services help organizations connect to just about any open-source or commercial streaming solution, so they can ingest, process and operationalize streaming data for advanced analytics and AI consumption.
Informatica Stream Processing
In addition to streaming data processing services, Informatica’s IDMC cloud-native data management platform delivers more than 250 other services, as well as the CLAIRE® unified metadata and AI layer. IDMC also provides over 10,000 metadata-aware connectors, including for cloud ecosystems such as Amazon Web Services, Microsoft Azure and Google Cloud.
In addition, Informatica has partnered with Datumize to maximize IDMC’s compatibility with IoT. The platform also offers comprehensive cloud capabilities and is available on a consumption-based pricing model.
Moreover, IDMC provides a single architecture and user experience for data ingestion, whether data arrives via streaming, batch or another method. Stream processing and data integration solutions build on this common foundation. You can see this architecture in Figure 2. Features relevant to streaming include real-time ingestion, mass ingestion, automated handling of schema and data drift, and a Kappa messaging architecture.
What Does the Informatica Stream Processing Platform Do?
Informatica's solution for stream processing consists of several services within IDMC. The backbone of this solution comes in the form of three services: Cloud Mass Ingestion, High-Performance Messaging for Distributed Systems, and Data Engineering Streaming. Between them, these provide — respectively — streaming ingestion, high-speed messaging, and stream processing. Other IDMC services can add to this core, including Cloud Data Integration and Cloud Data Governance and Catalog. Taken as a whole, IDMC lets you ingest streaming data and move it to wherever it needs to be. At the same time, it enables you to process, transform and govern data as necessary.
Let's look at these services in greater detail. Cloud Mass Ingestion provides format-agnostic data movement and mass data ingestion. This includes file transfer, change data capture (CDC) and exactly-once database replication. It also offers mass streaming ingestion from various sources. This comes with real-time monitoring, alerting and lifecycle management.
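To make the idea of change data capture more concrete, the sketch below applies an ordered stream of change events to an in-memory copy of a table. It is a generic illustration and does not represent how Cloud Mass Ingestion is implemented: the event shape (`op`, `key`, `row`) and the sample rows are assumptions made for the example.

```python
from typing import Any, Dict, Iterable, Optional

Row = Dict[str, Any]


def apply_changes(replica: Dict[int, Row],
                  changes: Iterable[Dict[str, Any]]) -> Dict[int, Row]:
    """Apply insert/update/delete events, in order, to a replica table."""
    for change in changes:
        key = change["key"]
        if change["op"] == "delete":
            replica.pop(key, None)          # tolerate deleting a missing key
        elif change["op"] in ("insert", "update"):
            row: Optional[Row] = change.get("row")
            replica[key] = dict(row or {})  # copy so events stay immutable
        else:
            raise ValueError(f"unknown operation: {change['op']!r}")
    return replica


# Hypothetical change events captured from a source table.
change_log = [
    {"op": "insert", "key": 1, "row": {"name": "Ana", "city": "Lima"}},
    {"op": "insert", "key": 2, "row": {"name": "Bo", "city": "Oslo"}},
    {"op": "update", "key": 1, "row": {"name": "Ana", "city": "Quito"}},
    {"op": "delete", "key": 2},
]
replica_table = apply_changes({}, change_log)
```

Only the changes travel over the wire, which is why CDC suits continuous replication better than repeatedly copying whole tables.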
High-Performance Messaging for Distributed Systems is what the name suggests. It’s a performant messaging system boasting "ultra-low latency," targeted at distributed systems. In addition, it provides high resiliency and guaranteed message delivery. Alternatively, you could leverage Kafka or another messaging service using Informatica's connectivity options. For instance, Cloud Data Integration can gather and load batch data from Kafka directly. These have an "at least once" delivery guarantee. Cloud Data Governance and Catalog can also scan Kafka deployments to extract relevant metadata. An example would be message structure.
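An "at least once" guarantee means a message may be delivered more than once, so consumers usually need to tolerate duplicates. A common pattern, sketched here under the assumption that every message carries a unique ID, is to remember which IDs have already been handled. The function and message names are hypothetical and not part of any Informatica or Kafka API.

```python
from typing import Any, Callable, Iterable, Set, Tuple


def consume_once(messages: Iterable[Tuple[str, Any]],
                 handler: Callable[[Any], None]) -> Set[str]:
    """Invoke handler once per unique message ID, skipping redeliveries."""
    seen: Set[str] = set()
    for message_id, payload in messages:
        if message_id in seen:
            continue  # duplicate delivery: already processed
        handler(payload)
        seen.add(message_id)  # mark as done only after the handler succeeds
    return seen


# Simulated delivery in which message "m1" arrives twice.
delivered = [("m1", 10), ("m2", 5), ("m1", 10), ("m3", 7)]
handled = []
seen_ids = consume_once(delivered, handled.append)
```

In a long-running consumer the set of seen IDs would have to be persisted and bounded; the in-memory set keeps the example short.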
Data Engineering Streaming is a continuous event processing engine built on Spark Streaming. It is designed to handle big data. It supports batch and streaming data and features out-of-the-box connectivity to various messaging sources and no-code visual development. This includes hundreds of prebuilt transformation functions, connectors and parsers. You can also pipe in your own code or build your own functions using Informatica's business rules builder. Essentially, it allows you to enrich your streaming data in real time. This could mean improving data quality, masking sensitive data or aggregating data.
IDMC also supports Spark Structured Streaming. This can be important if you want to aggregate streaming data based on event time, as opposed to processing time. Hence, you can reorder data that has arrived out of order before delivering it to your data target. It also supports Confluent Schema Registry. This can parse Kafka messages, retrieve message structure and handle schemas as they change and grow.
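The difference between event time and processing time is easier to see with a small example. The plain-Python sketch below is not the Spark Structured Streaming API; it only illustrates the underlying idea of buffering events and releasing them in event-time order once a watermark has passed them. The `max_lateness` parameter and the sample events are assumptions for the example.

```python
import heapq
from typing import Any, Iterable, Iterator, List, Optional, Tuple


def reorder_by_event_time(events: Iterable[Tuple[int, Any]],
                          max_lateness: int) -> Iterator[Tuple[int, Any]]:
    """Take (event_time, payload) pairs in arrival order and yield them in
    event-time order, holding each event back for up to max_lateness."""
    buffer: List[Tuple[int, int, Any]] = []
    max_seen: Optional[int] = None
    arrival = 0  # tie-breaker so payloads are never compared
    for event_time, payload in events:
        heapq.heappush(buffer, (event_time, arrival, payload))
        arrival += 1
        max_seen = event_time if max_seen is None else max(max_seen, event_time)
        watermark = max_seen - max_lateness
        # Release everything the watermark has passed, oldest first.
        while buffer and buffer[0][0] <= watermark:
            t, _, p = heapq.heappop(buffer)
            yield t, p
    # End of input: flush whatever is still buffered, in order.
    while buffer:
        t, _, p = heapq.heappop(buffer)
        yield t, p


# Events listed in arrival order; the timestamps show they are shuffled.
arrivals = [(1, "a"), (3, "c"), (2, "b"), (5, "e"), (4, "d")]
ordered = list(reorder_by_event_time(arrivals, max_lateness=2))
```

An event that arrives later than `max_lateness` allows is emitted immediately in this sketch and may therefore still be out of order; real engines typically drop such events or route them elsewhere.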
Moreover, you can use IDMC’s CLAIRE AI to augment your solution with machine learning. For example, you can use it to automatically detect and generate schemas. This is beneficial because it allows you to discover and rectify data (and schema) drift. Informatica also provides ready-made integration data pipelines for data science and machine learning. This helps you apply machine learning and AI models to your streaming flows.
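Schema drift can be illustrated without any machine learning at all. The sketch below simply compares an incoming record with an expected schema and reports fields that were added, went missing or changed type. It is a deliberately simple stand-in and does not reflect how CLAIRE detects drift; the schema format and field names are hypothetical.

```python
from typing import Any, Dict


def detect_drift(expected: Dict[str, str],
                 record: Dict[str, Any]) -> Dict[str, Any]:
    """Compare a record with an expected {field: type name} schema."""
    added = {
        field: type(value).__name__
        for field, value in record.items()
        if field not in expected
    }
    missing = sorted(field for field in expected if field not in record)
    changed = {
        field: (expected[field], type(value).__name__)
        for field, value in record.items()
        if field in expected and type(value).__name__ != expected[field]
    }
    return {"added": added, "missing": missing, "changed": changed}


# Hypothetical schema and a record that has drifted from it.
expected_schema = {"id": "int", "amount": "float", "region": "str"}
incoming_record = {"id": 7, "amount": "12.50", "currency": "USD"}
drift = detect_drift(expected_schema, incoming_record)
```

A pipeline could use such a report to decide whether to evolve the target schema automatically or to hold the record for review.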
Informatica recognizes the need to govern your streaming data. Our robust suite of IDMC data governance services is available to do so. This includes data cataloging, preparation, discovery, lineage and visualization. These are all designed with self-service and collaboration in mind. Security features like masking, authentication and access control are also available. And the Operational Insights service provides real-time job monitoring, analytics and visualization.
Real-World Streaming Analytics Examples & Use Cases
Streaming data analytics is used across a variety of industries. Here are a few examples:
- Manufacturing:
Embed intelligent sensors in devices throughout the production line and supply chain. Analyzing the data from these sensors in real time makes it easier to spot problems, so manufacturers can correct them before a product leaves the production line. This improves production and operational efficiency, and saves money. For more on Industrial Internet of Things (IoT) and IoT data management, see this article.
- Cybersecurity:
Identify anomalous behavior and suspicious activities. Organizations can flag security issues for immediate investigation. Taking early action can minimize damage, as opposed to fixing a problem after it occurs.
- Hospitality:
Track reservations in real time. For example, a hotel might notice that there’s high availability in the late afternoon. It could then text or email special promotions to guests in its area to fill those empty rooms that night.
- Retail:
Improve sales in retail by combining real-time customer transaction records and historical spending patterns. This can help identify opportunities to generate real-time offer alerts.
- Healthcare:
Improve patient healthcare by collecting and batch processing bedside monitor data. This helps clinical researchers better understand and detect diseases.
- Financial services:
Fortify financial services risk management. Prevent credit card fraud by tracking customer transactions in real time. When anomalies are detected, companies can send SMS alerts for potentially fraudulent activities.
- Telecommunications:
Accelerate innovation in telecommunications and improve customer experience to reduce customer churn. Organizations can capture and analyze network connectivity data and predict potential downtime. They can then inform customers via SMS or email.
Data Streaming in Action: Customer Success Stories
KLA is a leading maker of process controls and yield management systems. With their business growing, they needed analytics capabilities. They wanted to provide data-driven outcomes for their customers. They also needed to perform CDC in Snowflake for high-performance data capture. They looked to Informatica and Snowflake to help them with their cloud-first data strategy. With Informatica Cloud Mass Ingestion, they’re able to deliver continuous data replication. When KLA moved to their centralized Snowflake cloud data lake, they migrated 1,000 Oracle database tables, representing 12 years of historical ERP data, in a single weekend. They also captured and integrated incremental Oracle data changes directly into Snowflake.
Ovo, which is part of Lippo Group, is Indonesia’s largest payment, rewards and financial services platform. They wanted to increase customer engagement with the Ovo ecosystem. This includes Ovo apps, Ovo points, Ovo deals and Ovo merchants. With IDMC’s Data Engineering Streaming service, they can now stream large quantities of customer data in real time from their apps to identify customer behavior and patterns. They can also integrate the data with customer transactions and patterns to deliver personalized real-time campaigns in less than 15 seconds.
Get Started with Streaming Analytics
Informatica’s Intelligent Data Management Cloud offers highly capable and well-integrated stream processing as part of its overall data management functionality. This enables you to ingest, process and analyze real-time streaming data. The end result improves operational efficiency, optimizes costs and provides faster insights.
Informatica ranked #2 in the Gartner® Market Share Analysis: Event Stream Processing Platforms, Worldwide, 2021 report. [1]
To learn more about how to implement streaming analytics, read this white paper, “Change Data Capture for Real-Time Data Ingestion and Streaming Analytics.”
Gartner is a registered trademark of Gartner, Inc. and its affiliates. Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings or other designation. Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose. GARTNER is a registered trademark and service mark of Gartner, Inc. and/or its affiliates in the U.S. and internationally and is used herein with permission. All rights reserved.
[1] Gartner, Market Share Analysis: Event Stream Processing Platforms, Worldwide, 2021, Varsha Mehta, et al., 24 June 2022