
Turn to CDC for real-time data

If you are looking to extract value from data, remember that big data is still too slow and unstructured for real-time use.



In the worlds of business and technology, almost everything is up for debate. But one thing is clear and irrefutable: Smart organizations make decisions based on data. As data volumes grow in size and complexity, the demands for processing that data grow as well. So how do you ensure that decision makers have real-time data when they need it?

Unfortunately, many traditional extract, transform, and load (ETL) tools can fall short when it comes to providing real-time updates. Because they work in batches, they struggle when entire batches of data have to be staged, updated, and moved. The problem is compounded by installed bases of legacy applications whose records often lack creation and change dates, which leaves no simple way to tell which data is new or modified.
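Missing change dates matter because a common way to extract only new data is a "high-water mark": remember the latest change timestamp seen so far and pull only rows stamped after it. The sketch below assumes a hypothetical `updated_at` field on each row; if a legacy application never records that field, this shortcut is unavailable and the whole table has to be re-extracted each time.

```python
from datetime import datetime
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def extract_since(rows: List[Row], high_water_mark: datetime) -> Tuple[List[Row], datetime]:
    """Return rows changed after high_water_mark, plus the advanced mark.

    Assumes each row carries a hypothetical "updated_at" change date,
    which is exactly what many legacy applications lack.
    """
    changed = [r for r in rows if r["updated_at"] > high_water_mark]
    # Advance the mark to the newest change seen; keep it if nothing changed.
    new_mark = max((r["updated_at"] for r in changed), default=high_water_mark)
    return changed, new_mark


rows = [
    {"id": 1, "updated_at": datetime(2024, 1, 1)},
    {"id": 2, "updated_at": datetime(2024, 1, 3)},
]
changed, mark = extract_since(rows, datetime(2024, 1, 2))
print([r["id"] for r in changed], mark)
```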

Capturing change data

One solution is change data capture (CDC), an approach to data integration based on the identification, capture, and delivery of changes made to enterprise data sources. Also known as event-based data integration, CDC is not new, but it is effective because it flags only the data that has been inserted, updated, or deleted. As a result, smaller subsets of data can move more quickly through the system on an as-needed basis.
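The identify-capture-deliver idea can be illustrated with a minimal sketch that compares two keyed snapshots of a table and emits one insert, update, or delete event per changed row. This is snapshot comparison, one of the simplest CDC techniques; many CDC products instead read database transaction logs, and nothing here represents Informatica's implementation. The function name and event format are hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple

Row = Dict[str, Any]
ChangeEvent = Tuple[str, Any, Optional[Row]]


def capture_changes(before: Dict[Any, Row], after: Dict[Any, Row]) -> List[ChangeEvent]:
    """Compare two snapshots keyed by primary key; emit only the differences."""
    events: List[ChangeEvent] = []
    for key, row in after.items():
        if key not in before:
            events.append(("insert", key, row))   # new key
        elif before[key] != row:
            events.append(("update", key, row))   # same key, new values
    for key in before:
        if key not in after:
            events.append(("delete", key, None))  # key disappeared
    return events


before = {1: {"name": "Ada", "balance": 100}, 2: {"name": "Bob", "balance": 50}}
after = {1: {"name": "Ada", "balance": 120}, 3: {"name": "Cy", "balance": 10}}
events = capture_changes(before, after)
print(events)
```

Unchanged rows produce no events at all, which is what lets downstream systems move only the small change set.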

CDC's benefits include:

  • Improved IT responsiveness to business needs
  • Enhanced business agility
  • Reduction in IT costs through lower resource usage

CDC also benefits vertical industries such as financial institutions, manufacturing companies, and health insurance companies that rely on real-time information and routinely run large batch jobs. When CDC flags changes, queries run only against the changed data rather than the entire batch. Otherwise, the volume of data queried would rule out real-time reporting.
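To make "query only the changed data" concrete, here is a minimal sketch that keeps a report figure (a total balance) current by applying change events instead of rescanning every row. The `BalanceSummary` class, the `balance` field, and the `(operation, key, row)` event shape are hypothetical choices for illustration.

```python
from typing import Any, Dict, Iterable, Optional, Tuple

Row = Dict[str, Any]
ChangeEvent = Tuple[str, Any, Optional[Row]]


class BalanceSummary:
    """Keeps a total balance current by touching only changed rows."""

    def __init__(self, rows: Dict[Any, Row]) -> None:
        self.rows = dict(rows)
        # One full scan at load time; later updates are incremental.
        self.total = sum(r["balance"] for r in self.rows.values())

    def apply(self, events: Iterable[ChangeEvent]) -> None:
        for op, key, row in events:
            old = self.rows.get(key)
            if old is not None:
                self.total -= old["balance"]  # back out the previous value
            if op == "delete":
                self.rows.pop(key, None)
            else:  # insert or update
                self.rows[key] = row
                self.total += row["balance"]


summary = BalanceSummary({1: {"balance": 100}, 2: {"balance": 50}})
summary.apply([
    ("update", 1, {"balance": 120}),
    ("insert", 3, {"balance": 10}),
    ("delete", 2, None),
])
print(summary.total)
```

The cost of each refresh grows with the number of changes, not with the size of the table, which is the property that makes real-time reporting feasible.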

CDC not a big data best bet

Just as there are areas where CDC shines, there are also circumstances where the data is too unwieldy to capture. Many people point to big data as an untapped resource that will become valuable once the processes and technologies are in place to harness that data. But big data consists largely of unstructured or semi-structured datasets from mobile devices, social networks, log files, machines, and Web-based applications.

CDC loses its effectiveness when data changes unpredictably and invisibly, as it does in flat files. Contrast this with the structured data that exists in databases and data warehouses. The good news about structured data is that it’s just that: structured. As a result, CDC can detect changes easily. Without structured data, developers will need to use the Unix “tail” command and named pipes to stream only the changes for efficient big data processing.
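The “tail” approach works by remembering how far into a file you have already read and picking up only what was appended since. The Python sketch below mimics that incremental read for an append-only log file; the function name is hypothetical, and it assumes the file only grows (rotation or truncation is not handled). A named pipe, by contrast, would let a producer stream changes straight to a consumer without an intermediate file.

```python
from typing import List, Tuple


def read_new_lines(path: str, offset: int) -> Tuple[List[str], int]:
    """Return complete lines appended to path since byte offset, and the new offset.

    Like `tail -f`, this reads only newly appended bytes rather than
    re-reading the whole file. Assumes an append-only UTF-8 file.
    """
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    # Stop at the last newline; a trailing partial line waits for the next call.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8").splitlines()
    return lines, offset + end
```

Calling it in a loop with the returned offset yields each appended line exactly once, so downstream processing sees only the changes.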

Further explore how IT departments and developers can tap into CDC to deliver up-to-the-minute data in "Informatica Big Data Edition: Change Data Capture: Driving Results with Event Driven Data."
