Surfing the Mavericks of Data with AI

Last Published: Aug 05, 2021
John Haddad

GVP, Product Marketing

We all know by now that AI needs large amounts of data to increase the power of predictive analytics, but with the exponential growth in data, data in turn needs AI.

I live a mile from the Pacific Ocean where some of the greatest athletes surf some of the largest waves in the world at a site called the Mavericks just off of Half Moon Bay in California. Surfing the biggest waves requires thousands of hours of learning through experience so that the ability to react instantaneously is fully automated through muscle memory. Similarly, managing big waves of data requires AI.

In surfing, you need to identify where the big wave is going to occur, then quickly prepare for the wave by gaining momentum and catching the lip of the wave. You need optimal balance to ride the wave and to govern your abilities in case you need to bail out of the wave to minimize your risk of injury. All in all, this requires split-second decisions that can be achieved only through intensive learning. AI for data management also requires intensive learning, but in this case machine learning is needed to manage the huge wave of data feeding your AI and analytic algorithms.

There are four major benefits of AI for data management – productivity, efficiency, data experience, and data governance. A few examples are cited in the blog “4 Ways AI Puts Rocket Fuel in Your Data Management.” Informatica has given its AI/ML engine for data management a name. We call it CLAIRE, and it’s built on a unified enterprise metadata management foundation to power much of the automation and guided user behavior across the Informatica Intelligent Data Platform. To make this more real, let’s take a quick look at CLAIRE in action. We’ll focus on improving customer engagement through customer analytics, since that is a concern that’s top of mind with many business and IT executives.

Using CLAIRE to Improve Customer Engagement

When improving customer engagement, you need to know your customer through details such as their transaction history, interactions, and behavior. You also need to know what actions would help improve their overall customer experience. To pull together a more complete view of your customers, CLAIRE uses innovative “synthesis matching” to discover non-obvious relationships and stitch together a full Customer 360 consisting of demographic, account, transaction, interaction, and unstructured data. Then it’s up to the data scientists and analysts to build analytic models that predict next-best actions that will improve customer experience.

These data scientists and analysts run hundreds of experiments in search of a valid model. Therefore, they need to be able to find and prepare data quickly. This is where an AI-powered data catalog and data prep tool are critical. With thousands of data assets across the enterprise, CLAIRE automatically discovers relationships between datasets to identify composite business entities (e.g., purchase orders, invoices, etc.). CLAIRE automatically tags data domains similar to how Facebook tags people in images. And CLAIRE identifies similar datasets using the Bray-Curtis and Jaccard coefficients. When you combine all of this information with user behavior, CLAIRE is able to make intelligent recommendations of datasets to data scientists and analysts.
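To make the similarity idea concrete, here is a minimal Python sketch of the two coefficients named above, applied to the values of two columns. It is an illustration only and does not describe how CLAIRE computes similarity internally; the function names and the sample columns are assumptions made for this example.

```python
from collections import Counter


def jaccard_similarity(a, b):
    """Share of distinct values that the two columns have in common."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0  # two empty columns are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def bray_curtis_similarity(a, b):
    """1 minus the Bray-Curtis dissimilarity, computed over value counts."""
    count_a, count_b = Counter(a), Counter(b)
    total = sum(count_a.values()) + sum(count_b.values())
    if total == 0:
        return 1.0  # two empty columns are treated as identical
    shared = sum(min(count_a[v], count_b[v]) for v in count_a.keys() & count_b.keys())
    return 2 * shared / total


# Hypothetical sample columns, for example a "state" field in two datasets.
states_crm = ["CA", "NY", "TX", "CA"]
states_billing = ["CA", "NY", "WA"]

jaccard = jaccard_similarity(states_crm, states_billing)
bray_curtis = bray_curtis_similarity(states_crm, states_billing)
print(f"Jaccard: {jaccard:.3f}, Bray-Curtis: {bray_curtis:.3f}")
```

Jaccard looks only at which distinct values overlap, while Bray-Curtis also weighs how often each value occurs, so the two scores can differ for the same pair of columns.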

Once a data scientist has built and validated their predictive model, it’s time to hand off the data pipeline that feeds the model to a data engineer. CLAIRE can assist the data engineer by automatically discovering structure in complex files such as clickstream and machine data. Customer analytics is only as good as the data that feeds the AI and predictive analytic model algorithms – and customer data can’t be used if it is not governed properly per industry regulations, data quality standards, and corporate policies.

To help govern data, CLAIRE assists business and data stewards by automatically linking business terms to datasets, saving thousands of hours of manual effort. As policies are mapped to business terms, data quality checks are generated based on the data quality rules associated with these policies. CLAIRE uses ML and NLP (natural language processing) to parse data quality requirements specified in plain text and automatically identify or generate a data quality rule.

To protect customer data and comply with data privacy regulations such as GDPR and CCPA, CLAIRE automatically identifies sensitive data. It also evaluates and scores data elements that, in combination, can identify data subjects, which supports subject registry reporting. Furthermore, CLAIRE uses behavior analytics to detect usage anomalies and suspicious activity to help prevent data breaches.

CLAIRE assists not only in data discovery, data pipeline development, and data governance, but also in deployment and maintenance through intelligent and automated DevOps. AI-powered operational analytics helps optimize capacity planning and auto-scale data management runtime resources.

In summary, CLAIRE helps a wide spectrum of data professionals improve customer engagement through AI and analytics:

  • Data analysts will find it easier to locate and prepare the data they need
  • Data scientists will gain an understanding of the data faster
  • Data engineers will find many implementation tasks partially or even fully automated
  • Business users will quickly identify data that should be subject to prescribed data governance and compliance controls
  • Data stewards will find it easier to visualize the quality of data
  • Data security and privacy professionals will find it simpler to detect data misuse, protect sensitive data, and demonstrate that appropriate controls are maintained
  • Administrators and operators will have the power of predictive maintenance and performance optimization of data management processes

As the waves of data get bigger and bigger, so does the role of AI for data management. Our vision is that CLAIRE will continue to learn from millions of existing mappings and user actions to self-integrate, self-heal, self-tune, and self-secure data across the enterprise. To learn more about CLAIRE and AI for data management, I encourage you to download and read the new Informatica white paper, “Artificial Intelligence for the Data-Driven Intelligent Enterprise.”

First Published: Apr 09, 2020