Five Industry Trends That Drive the Need for Intelligent Data Cataloging

Last Published: Aug 05, 2021 | Maneeza Malik

The year 2020 will live in our collective memory as a challenging, disruptive and, for countless families and individuals, devastating year. Like many of my fellow citizens, I exhaled a sigh of relief as the new year began. While we are not out of the proverbial woods yet, I feel a sense of optimism that we will finally turn a corner in 2021. There is much to be hopeful for as COVID-19 vaccines are more widely distributed and administered this year.

As a Leader in the Gartner 2020 Magic Quadrant for Metadata Management Solutions [1], we witnessed a number of trends unfold in 2020 that we believe will gain further momentum in 2021. For instance, we saw a substantial increase in digital transformation initiatives across industries and geographies. Enterprises of all types and sizes are accelerating their digital transformation with a greater sense of urgency. According to IDC, over 50% of all IT spending will be allocated to digital transformation initiatives by 2024. Moreover, digital transformation investments will grow at a CAGR of 15.5% from 2020 to 2023 and are expected to reach $6.8 trillion as enterprises become digital-at-scale businesses.[2]

Trend #1: “Data Literacy” at Scale

Many of the enterprises we engaged with indicated that data cataloging was at the core of their digital transformation strategy. In fact, data cataloging underpinned every strategic use case they described: democratizing the use of data, enabling self-service analytics and AI at scale, supporting enterprise-wide data governance and compliance initiatives, and accelerating the migration to a multi-cloud environment. We expect this trend to continue at a faster pace in 2021.

As many enterprises have told us, the need to become “data literate” is greater than ever before. The “new normal” for enterprises will revolve not only around how well they operate in a highly complex data landscape while remaining agile, responsive and competitive, but also around how well they support a remote workforce.

Becoming “data literate” will require enterprises to have full visibility into their data. It will require knowing what data they have and where it resides. Enterprises will need the ability to easily and rapidly discover their data and to understand it in detail: its quality, age, privacy and governance constraints, who owns it, and whether it is certified for use. All of this must be coupled with continuously fostering a culture of collaboration on data. According to Gartner, “By 2023, organizations that promote data sharing will outperform their peers on most business value metrics.”[3]
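To make the attributes listed above more concrete, here is a minimal Python sketch of what a catalog record and a simple discovery query might look like. It is purely illustrative: the `CatalogEntry` class, its field names, the `discover` function, the thresholds and the sample data are all hypothetical and do not represent the Enterprise Data Catalog data model or API.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class CatalogEntry:
    """Hypothetical catalog record; every field name here is an assumption."""
    name: str
    location: str          # where the data resides
    owner: str             # who owns it
    quality_score: float   # 0.0 (poor) to 1.0 (excellent)
    last_updated: date     # used to derive the age of the data
    contains_pii: bool     # a simple stand-in for privacy constraints
    certified: bool        # whether the data is certified for use
    tags: tuple = ()       # free-form business terms


def discover(entries, keyword, min_quality=0.8, max_age_days=365, today=None):
    """Return names of certified, fresh, good-quality entries matching a keyword."""
    today = today or date.today()
    keyword = keyword.lower()
    return [
        e.name
        for e in entries
        if e.certified
        and e.quality_score >= min_quality
        and (today - e.last_updated).days <= max_age_days
        and (keyword in e.name.lower() or keyword in (t.lower() for t in e.tags))
    ]


# Made-up sample data for illustration only.
entries = [
    CatalogEntry("customer_orders", "s3://sales/orders", "sales-ops",
                 0.93, date(2021, 1, 15), True, True, ("customer", "orders")),
    CatalogEntry("customer_leads_raw", "hdfs://landing/leads", "marketing",
                 0.55, date(2020, 11, 2), True, False, ("customer",)),
    CatalogEntry("fleet_telemetry", "s3://iot/fleet", "logistics",
                 0.90, date(2021, 2, 1), False, True, ("fleet",)),
]

result = discover(entries, "customer", today=date(2021, 3, 21))
print(result)  # ['customer_orders']
```

Only the certified, high-quality dataset is returned; the raw leads table matches the keyword but is filtered out because it is neither certified nor of sufficient quality.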

Trend #2: Data Science and AI Go Mainstream

We believe that 2021 will be a pivotal year for data science and AI adoption. It is estimated that 76% of enterprises are prioritizing AI and machine learning in their 2021 IT budgets.[4] Machine learning is becoming ubiquitous: virtually every enterprise application and business process will leverage some level of embedded machine-learning algorithms or robotic process automation (RPA) to enable hyper-automation and intelligence at scale.

With increased automation and integration across business processes and throughout the data, analytics and AI pipelines (from data ingestion, data cataloging, data governance, data quality and privacy, data preparation and data modeling to analytics and data visualization), enterprises will be in a much better position to extract value from their AI and data science investments. Moreover, they will be able to extend data science and AI to a broader community of users within their organizations, expanding beyond a small pool of data scientists to include citizen data scientists.

At Informatica, we understood early on the importance of machine learning and an integrated approach to data management. For instance, the Informatica Enterprise Data Catalog, powered by the metadata-driven intelligence in the Informatica CLAIRE™ AI engine, leverages advanced machine learning algorithms to automate and add intelligence to a host of data curation and metadata management tasks, from data discovery, identifying data similarities and making data recommendations to business term associations.

An ever-growing number of our customers, from large to mid-size enterprises, use Enterprise Data Catalog to empower their data science teams with rapid data discovery and collaboration on data. Users are leveraging the catalog to operationalize a host of self-service analytics and AI initiatives, including Customer 360, fleet optimization, risk management and drug discovery.

Trend #3: From One Cloud to Multi-Cloud

The rapid adoption of cloud is clearly the new normal for enterprises, both as a strategic initiative to reduce costs, drive innovation and achieve greater agility and efficiencies, and as a way to ensure business continuity. According to IDC, by the end of 2021, 80% of enterprises will have migrated to the cloud.[5] Moreover, it is predicted that 93% of organizations will enforce a multi-cloud strategy in 2021 and beyond in order to reduce dependency on a single vendor, prevent vendor lock-in, and mitigate long-term risks.

While a multi-cloud strategy has its share of benefits, it also adds complexity. Enterprises will have to manage multiple data sources across cloud and on-premises environments, with the potential risk of creating a new set of data and technology silos.

To reduce some of this complexity, enterprises will look to leverage an intelligent data catalog that is cloud-agnostic. For instance, Enterprise Data Catalog enables users to easily discover the data they have regardless of where it resides. With end-to-end data lineage and impact analysis capabilities, users can gain an in-depth understanding, at a granular level, of all the transformations their data underwent throughout the data lifecycle, from source to target, across on-premises and multi-cloud environments. Additionally, it allows enterprises to maintain a holistic view of their data with a single data catalog, rather than having to maintain, reconcile and toggle across multiple cloud-specific data catalogs.

Trend #4: Advanced Data Lineage Is the New Norm

Data lineage is key to driving successful data-driven business transformations such as enterprise-wide data governance and compliance, data warehouse modernization, and self-service analytics and AI. However, not all data lineage tools are created equal. In 2021, enterprises will increasingly look for advanced data lineage capabilities that allow them to extract deep, granular lineage from a host of data sources in order to build a comprehensive data catalog.

The need for data lineage extraction that is both deep and broad will only increase as the enterprise data landscape gets more complex. Moreover, in many regulated industries, users will need a granular understanding of their data to support data science and AI initiatives as well as to meet stringent regulatory compliance requirements.
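As a rough illustration of what lineage and impact analysis involve, lineage can be thought of as a directed graph of data assets. The Python sketch below walks such a graph downstream (impact analysis) and upstream (lineage). The asset names, the `LINEAGE_EDGES` list and the helper functions are all hypothetical; this is a simplified table-level model under those assumptions, not how any particular lineage tool is implemented.

```python
from collections import defaultdict, deque

# Hypothetical lineage edges: (upstream asset, downstream asset).
LINEAGE_EDGES = [
    ("crm.customers", "staging.customers_clean"),
    ("erp.orders", "staging.orders_clean"),
    ("staging.customers_clean", "warehouse.customer_360"),
    ("staging.orders_clean", "warehouse.customer_360"),
    ("warehouse.customer_360", "dashboard.churn_report"),
]


def build_graph(edges, reverse=False):
    """Build an adjacency map; reverse=True points each asset at its sources."""
    graph = defaultdict(set)
    for upstream, downstream in edges:
        if reverse:
            graph[downstream].add(upstream)
        else:
            graph[upstream].add(downstream)
    return graph


def reachable(graph, start):
    """Breadth-first search returning every asset reachable from start."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def impact_of(asset, edges=LINEAGE_EDGES):
    """Impact analysis: everything downstream that a change to asset could affect."""
    return reachable(build_graph(edges), asset)


def lineage_of(asset, edges=LINEAGE_EDGES):
    """Lineage: every upstream asset that feeds into asset."""
    return reachable(build_graph(edges, reverse=True), asset)


print(sorted(impact_of("erp.orders")))
print(sorted(lineage_of("warehouse.customer_360")))
```

In this toy model, a change to `erp.orders` would flag the staging table, the Customer 360 table and the churn dashboard for review. Column-level lineage, which "deep" extraction implies, would use the same traversal with columns rather than tables as graph nodes.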

Trend #5: MLOps Takes Off

MLOps is a methodology and set of best practices, inspired by DevOps, that promotes greater collaboration and communication among data scientists, citizen data scientists, data engineers, data stewards, and operations professionals to help manage machine-learning pipelines throughout their lifecycle, from development to production. MLOps is positioned to be a game-changer in operationalizing AI at scale and will likely gain steam in 2021. According to Gartner, “By the end of 2024, 75% of enterprises will shift from piloting to operationalizing AI, driving 5X increase in streaming data and analytics infrastructures.”[6]

Given that any AI (machine and deep learning) initiative is heavily dependent on the use of quality data to build accurate models, we expect intelligent data catalogs to play a pivotal role in empowering MLOps teams to rapidly discover, validate, and collaborate on the data they need to build robust, iterative and continuous machine learning pipelines.

To learn more, I would like to invite you to visit the Informatica Enterprise Data Catalog web page. You may also wish to read a couple of our latest eBooks: Drive Your Business Forward with a Catalog of Catalogs and Extract Value from Your Data with AI-Powered Data Discovery.

Sources:

  1. Gartner, Magic Quadrant for Metadata Management Solutions, Guido De Simoni, Mark Beyer, Ankush Jain, Alan Dayley, 11 November 2020
  2. IDC FutureScape: Worldwide Digital Transformation 2021 Predictions
  3. Gartner, Data Sharing Is a Business Necessity to Accelerate Digital Business, Lydia Clougherty Jones, 3 December 2020
  4. Forbes: 76% of Enterprises Prioritize AI and Machine Learning in 2021 IT budgets
  5. Data Center Frontier: IDC Sees Enterprise Cloud Shift Accelerating in 2021
  6. Gartner, Top 10 Trends in Data and Analytics, 2020, Rita Sallam et al., 11 May 2020
First Published: Mar 21, 2021