There is a story I’ve always liked about the invention of chess. The reward the inventor requested from his emperor for creating this fascinating game was grain. But he asked for it to be counted in a peculiar way: one grain of rice placed on the first square of the board, two on the second, doubling with each square until all 64 were covered. The emperor and his court realized the futility of the task well before counting out grains for half the squares on the board. Furious at being made a fool of, notwithstanding the quick lesson in geometric progression, the emperor had the inventor beheaded.
However oblique the analogy, we face a similar situation with the exponential growth of data. It is estimated that overall accessible data doubles every 12 to 18 months, and data in the cloud is expected to grow 10x in the same period. All this data needs to be processed and made usable and trustworthy while adhering to governance policies. Added to this is the need to move quickly and respond to changes in business strategy and processes. The effort involved in preparing data for digital transformation initiatives has grown in complexity, in step with the growth in data volume. By some estimates, close to one million additional data science professionals will be needed by next year to tackle this immense and growing workload. Add the data engineers, data stewards, and other data professionals required, and the total skyrockets into the tens of millions.
We cannot solve these challenges by continuously throwing more engineers and developers at the problem -- this is not an issue that can be solved at linear, human scale. Traditional approaches are riddled with inefficiencies: projects are implemented in silos with little end-to-end metadata visibility and limited automation; there is no learning from past work; processing is expensive; and governance and privacy steps are repeated over and over. So how can organizations move at the speed of business, enable self-service, better serve their customers, increase operational efficiency, and rapidly innovate?
This is where AI shines. AI can automate and simplify tasks across the data management lifecycle – discovery, integration, cleansing, governance, and mastering of data. Machine learning methods can learn and take over mundane, repetitive tasks, freeing developers and users to work on high-value, innovative projects. AI improves data understanding and identifies data privacy and quality anomalies. AI is a perfect partner to developers, analysts, stewards, and business users, speeding up tasks through automation and augmentation with recommendations and next-best actions.
AI is most effective when you think about how it can help you accelerate end-to-end processes across your entire data environment. That’s why we consider AI an essential part of a System Thinking approach to data management. With System Thinking, you have an overall framework for data-driven digital transformation. And as you start implementing this framework, you’ll start to see major benefits.
In general, AI benefits data management teams in four major ways: improving the productivity of data professionals, improving the efficiency of operations, providing a more intelligently guided data experience and deeper understanding, and speeding up data governance processes. Below are a few examples to show what’s possible today.
A recommender system for data integration helps data engineers rapidly build mappings to extract, transform, and deliver data. The recommender learns from existing mappings, understands the business content of databases and file systems, and suggests appropriate transformations for standardizing and cleansing data before delivery to target systems and data consumers.
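As an illustrative sketch only (not Informatica’s actual implementation), a minimal version of such a recommender can learn which transformation engineers most often applied to columns of a given semantic type in past mappings, then suggest the most frequent one for new columns. The type-inference rules, column names, and transformation names below are all hypothetical:

```python
from collections import Counter, defaultdict

def infer_type(column_name):
    """Crude semantic-type inference from the column name (illustrative only)."""
    name = column_name.lower()
    if "phone" in name:
        return "phone"
    if "email" in name:
        return "email"
    if "date" in name:
        return "date"
    return "generic"

def learn(past_mappings):
    """past_mappings: iterable of (column_name, transformation) pairs.
    Count how often each transformation was used per inferred type."""
    history = defaultdict(Counter)
    for column, transform in past_mappings:
        history[infer_type(column)][transform] += 1
    return history

def recommend(history, column_name):
    """Suggest the most frequently used transformation for this column's type."""
    counts = history.get(infer_type(column_name))
    if not counts:
        return None
    return counts.most_common(1)[0][0]

# Hypothetical mappings previously built by engineers
past = [
    ("cust_phone", "standardize_phone"),
    ("contact_phone_no", "standardize_phone"),
    ("email_addr", "lowercase_trim"),
    ("order_date", "parse_iso_date"),
]
history = learn(past)
print(recommend(history, "billing_phone"))  # -> standardize_phone
```

A production recommender would learn far richer signals (data profiles, lineage, usage), but the pattern of mining past mappings to rank suggestions is the same.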
In a typical enterprise, thousands of data integration processes run every day. The monitoring of these processes is largely passive, with administration tools merely logging the time taken and the CPU and memory consumed. AI can learn from historical values of time-series data in log and monitoring files, proactively flag outlier values, and predict issues before they occur.
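As a simple sketch of the idea (real systems use far more sophisticated time-series models), a z-score check against historical job runtimes can flag an anomalous run; the runtime values here are made up:

```python
import statistics

def flag_outlier(history, latest, z_threshold=3.0):
    """Flag the latest runtime as an outlier if it deviates more than
    z_threshold standard deviations from the historical mean."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return latest != mean
    return abs(latest - mean) / stdev > z_threshold

# Hypothetical job runtimes (seconds) from past daily runs
runtimes = [120, 118, 125, 122, 119, 121, 123, 120]
print(flag_outlier(runtimes, 310))  # True  (anomalous run)
print(flag_outlier(runtimes, 124))  # False (normal variation)
```

The same pattern extends to CPU and memory metrics, and to forecasting models that warn before a threshold is breached rather than after.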
When a real-world entity (e.g., a Patient Record or Sales Order) is stored in a database or a set of files, its data gets shredded and distributed into multiple tables or files, optimizing for storage and performance. AI can detect relationships among data and reconstitute the original entity quickly. Users don’t have to remember or look up outdated documentation on primary-key/foreign-key relationships and join the various datasets by hand. Furthermore, AI can identify similar datasets and make recommendations based on usage patterns, data quality, and crowdsourced collaboration.
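One way to sketch relationship detection (an assumption on my part, not the product’s algorithm) is to score column pairs by value overlap: a column whose values are almost all contained in another table’s column is a likely foreign-key candidate. The tables and thresholds below are hypothetical:

```python
def candidate_join_keys(table_a, table_b, min_overlap=0.8):
    """table_a / table_b: dict mapping column name -> list of values.
    Return (col_a, col_b, overlap) triples where table_b's column values
    are largely contained in table_a's column, suggesting a PK/FK link."""
    pairs = []
    for col_a, vals_a in table_a.items():
        set_a = set(vals_a)
        for col_b, vals_b in table_b.items():
            set_b = set(vals_b)
            if not set_b:
                continue
            # fraction of table_b's distinct values found in table_a's column
            overlap = len(set_a & set_b) / len(set_b)
            if overlap >= min_overlap:
                pairs.append((col_a, col_b, round(overlap, 2)))
    return pairs

# Hypothetical shredded entity: orders reference customers by id
customers = {"id": [10, 11, 12, 13], "name": ["Ann", "Bo", "Cy", "Di"]}
orders = {"order_id": [1, 2, 3, 4], "cust_id": [10, 11, 10, 12]}
print(candidate_join_keys(customers, orders))  # [('id', 'cust_id', 1.0)]
```

Discovered key pairs can then drive automatic joins that reassemble the original entity, with data types, name similarity, and usage patterns refining the candidates in practice.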
A common but tedious step in data governance is associating business terms with physical data elements to establish business context and make the data understandable to users. AI, in many cases, can automatically link business terms to physical data using a combination of natural language processing (NLP) techniques and business type identification. This can dramatically reduce the drudgery of this error-prone task.
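As a deliberately simplified stand-in for the NLP techniques mentioned above, name similarity alone can already link many glossary terms to physical columns; the glossary, column names, and threshold here are invented for illustration:

```python
from difflib import SequenceMatcher

def link_terms(glossary, columns, threshold=0.6):
    """Match each physical column to its best business term by string
    similarity between the normalized column name and the term."""
    links = {}
    for col in columns:
        best_term, best_score = None, 0.0
        normalized = col.lower().replace("_", " ")
        for term in glossary:
            score = SequenceMatcher(None, normalized, term.lower()).ratio()
            if score > best_score:
                best_term, best_score = term, score
        if best_score >= threshold:
            links[col] = best_term  # only keep confident matches
    return links

# Hypothetical business glossary and physical column names
glossary = ["Customer Name", "Order Date", "Email Address"]
columns = ["cust_name", "order_date", "email_addr", "xyz"]
print(link_terms(glossary, columns))
```

Real implementations combine such lexical matching with business type identification (recognizing that a column contains, say, email addresses from its values, not just its name), which is what lets them link poorly named columns as well.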
These are just a few examples of the many ways Informatica applies AI and machine learning techniques to propel data management. AI helps us tame the exponential growth in data volume and complexity. And unlike the emperor, we don’t have to lop off, however symbolically, the inventor’s head. With AI we can unleash the transformative power of data.
Read more about System Thinking in our blog series: