Why Do Enterprises Today Need Data Cataloging More Than Ever?

May 02, 2021 | Mamun Reza

Data Engineering, Product Marketing

Organizations need to turn their data assets into revenue and profits

The data landscape is changing and growing rapidly, and data has become the competitive battlefield of our time. As a result, many organizations today are committed to becoming “data-driven.” However, even leading organizations seem to be failing in these efforts.

The 2019 Big Data and AI Executive Survey published by NewVantage Partners revealed that the percentage of organizations identifying themselves as data-driven has steadily declined, from 37.1% in 2017 to 32.4% in 2018 to 31.0% in 2019. Furthermore, the survey, which covered 64 C-level business and technology executives from organizations such as American Express, Ford Motor, GE, GM, and Johnson & Johnson, found that 72% of executives have yet to forge a data culture, while 53% are not yet treating data as a business asset.

Organizations need to turn their data assets into revenue and profits. The first step in any data-driven digital transformation initiative is to manage your data as an enterprise asset: take inventory of it, assess its value, and maximize its use, just as you would with any other significant investment or resource.

Challenges organizations face

In today’s organizations, data is diverse and distributed across many departments, applications, data warehouses, and data lakes (some on-premises, others in the cloud), making it a challenge to know exactly what data you have and where it resides. Further, the lack of visibility into how data moves across the data supply chain can be overwhelming. As the number of data sources, types, and formats increases, the data landscape becomes even more complex.

Consider the following challenges that occur due to such circumstances:

  1. Limited View of the Organization’s Data: Distributed data makes it challenging to get a complete view of an organization; silos tend to present only a fragmented picture of business activities. As a result, you end up missing valuable business insights hidden within the data.
  2. Compromised Data Integrity: Data silos inherently fragment data, which can degrade data quality. Poor-quality data can cause analysis problems later and even costly damage. For example, the Canadian power company TransAlta, while using spreadsheets to store, analyze, and move its data, made a simple cut-and-paste mistake that cost it $24 million.
  3. Data Exfiltration: In a self-service analytics environment, accidental data exfiltration is a high risk and can damage the enterprise’s reputation, as when an employee leaving the Federal Deposit Insurance Corporation (FDIC) accidentally downloaded data for 44,000 FDIC customers onto a personal storage device and took it out of the agency.
  4. Increased Cost: Data has a financial cost, too. Storing data incurs infrastructure costs, moving data incurs transportation costs, and even collecting and using data has a cost. Under such circumstances, data redundancy, duplication, and maintenance demand even more resources.
  5. Validating Data Pipelines: Compliance requirements oblige you to ensure that all approved data assets source data exclusively from authorized data sources and that data pipelines are not erroneously using unauthorized data. There is a growing need to know where sensitive and personal data assets are and how they move around the organization. The key concern is ensuring that metadata stays within the customer’s cloud organization or project.
  6. Hindered Collaboration: Data silos emerge from the organizational silos formed by organizational separations. As these layers of separation build on top of each other, creating both cultural boundaries and technical incompatibilities, collaboration within the organization becomes more difficult.
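
To make the pipeline validation challenge (item 5) more concrete, here is a minimal Python sketch. It assumes lineage is available as a simple mapping from each data asset to the assets it reads from directly. The asset names, the `LINEAGE` mapping, the `AUTHORIZED_SOURCES` allow-list, and both helper functions are hypothetical illustrations, not part of any specific catalog product.

```python
from typing import Dict, List, Set

# Hypothetical lineage: each asset maps to the assets it reads from directly.
LINEAGE: Dict[str, List[str]] = {
    "sales_dashboard": ["sales_mart"],
    "sales_mart": ["crm_orders", "scratch_export"],
    "crm_orders": [],
    "scratch_export": [],
}

# Hypothetical allow-list of authorized root data sources.
AUTHORIZED_SOURCES: Set[str] = {"crm_orders"}


def root_sources(asset: str, lineage: Dict[str, List[str]]) -> Set[str]:
    """Return every root source (an asset with no upstream) that feeds `asset`."""
    roots: Set[str] = set()
    seen: Set[str] = set()
    stack = [asset]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited; also protects against cycles
        seen.add(current)
        upstream = lineage.get(current, [])
        if upstream:
            stack.extend(upstream)
        else:
            roots.add(current)
    return roots


def find_unauthorized_sources(
    asset: str, lineage: Dict[str, List[str]], authorized: Set[str]
) -> Set[str]:
    """Return the root sources feeding `asset` that are not on the allow-list."""
    return root_sources(asset, lineage) - authorized


if __name__ == "__main__":
    print(find_unauthorized_sources("sales_dashboard", LINEAGE, AUTHORIZED_SOURCES))
```

In this sketch, an asset that is missing from the lineage mapping is treated as a root source, so unknown inputs are flagged unless they are explicitly authorized.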

So, how do you address these issues?

Build a single source of truth for your organization’s data. A data catalog helps you address all of the above problems and more. It creates and maintains an inventory of data assets through the discovery, description, and organization of distributed datasets. It also provides the context that data engineers, data scientists, data stewards, data and business analysts, and other line-of-business data consumers need to find and understand relevant datasets and extract business value from them. A data catalog is essential to business users and decision makers because it synthesizes the details about an organization’s data assets from multiple data dictionaries and organizes them into a simple, consumable format. Turning enterprise data into a competitive advantage requires that business users can easily access, understand, and use trusted, clean, high-quality data across the enterprise.
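
As a rough illustration of what "an inventory of data assets" means in practice, the following Python sketch models a toy in-memory catalog. The `CatalogEntry` fields, the `DataCatalog` class, and the sample dataset names are assumptions made for this example; real catalogs discover this metadata automatically and store far richer detail.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CatalogEntry:
    """Descriptive metadata for one data asset (hypothetical fields)."""
    name: str
    location: str
    description: str
    owner: str
    tags: List[str] = field(default_factory=list)


class DataCatalog:
    """A toy inventory of data assets, searchable by keyword."""

    def __init__(self) -> None:
        self._entries: Dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        # Registering the same name again overwrites the earlier entry.
        self._entries[entry.name] = entry

    def search(self, keyword: str) -> List[CatalogEntry]:
        """Return entries whose name, description, or tags contain the keyword."""
        needle = keyword.lower()
        matches = [
            entry
            for entry in self._entries.values()
            if needle in entry.name.lower()
            or needle in entry.description.lower()
            or any(needle in tag.lower() for tag in entry.tags)
        ]
        return sorted(matches, key=lambda entry: entry.name)


if __name__ == "__main__":
    catalog = DataCatalog()
    catalog.register(CatalogEntry(
        name="crm_customers",
        location="cloud warehouse",
        description="Customer master records from the CRM system",
        owner="sales-ops",
        tags=["customer", "pii"],
    ))
    catalog.register(CatalogEntry(
        name="erp_invoices",
        location="on-premises database",
        description="Invoices issued to each customer account",
        owner="finance",
        tags=["billing"],
    ))
    for entry in catalog.search("customer"):
        print(entry.name, "-", entry.location)
```

The point of the sketch is that a single search spans assets held in different locations and owned by different teams, which is what lets a catalog act as one entry point to distributed data.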

How should you evaluate a data catalog?

Before embarking on a data catalog evaluation, you must figure out what you want to accomplish with one. Data catalog tools are exciting because they can democratize data across an organization. However, data is only meaningful to decision makers if it is enriched with context, which comes from people and metadata. Connecting data to its context is the difference between making the right and the wrong decisions with data. For example, mixing up imperial and metric units when hanging a shelf might not seem like a big problem. For NASA, however, that gap in understanding cost $125 million in 1999.

To address such complexities, more and more organizations are combining artificial intelligence (AI) and machine learning models with the power of the cloud to make informed business decisions, create competitive advantage, and accelerate digital transformation. According to a leading analyst firm, there is growing demand for cloud capabilities, connected data architectures, metadata, and the automation of routine and nonroutine tasks through the application of AI.

5 must-have capabilities of a data catalog

These five capabilities help ensure you can make the most of your enterprise data. 

  • Automated Intelligence: Automated processes within data catalogs that incorporate machine learning and AI help avoid manual data tagging, classification, and organization. These technologies can also leverage data usage and queries to link or assign business context to data assets at scale.
  • Data Democratization: An accessible data catalog allows even non-technical users to locate and utilize data, enabling collaborative data use across an enterprise.
  • Data Discovery and Lineage Analysis: These are two key pillars of any data catalog; together, they help build trust and confidence in the data that you use to derive business insights.
  • Data Governance/Protection: Integrating an automated data governance solution with a data catalog ensures that data users can access data compliantly and securely, according to their needs. Although everyone can access the same data catalog, only users with the appropriate permissions can access certain data sets, which protects sensitive data.
  • Metadata Curation: As organizations increasingly adopt hybrid multi-cloud environments, a data catalog tool that can connect to and extract metadata from multiple systems (data warehouses, data lakes, ETL tools, and BI tools) is key to scaling data access through a centralized catalog.
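
The governance capability in the list above (a shared catalog, but permission-gated access to the data itself) can be sketched in a few lines of Python. The role names, the `Dataset` and `User` types, and the rule that an empty role set means "open to all catalog users" are assumptions chosen for this illustration, not a description of how any particular governance product works.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Dataset:
    name: str
    # Roles allowed to read the data; an empty set means open to everyone.
    required_roles: FrozenSet[str] = frozenset()


@dataclass(frozen=True)
class User:
    name: str
    roles: FrozenSet[str] = frozenset()


def can_access(user: User, dataset: Dataset) -> bool:
    """A user may read a dataset if it is open or they hold a required role."""
    if not dataset.required_roles:
        return True
    return bool(user.roles & dataset.required_roles)


def accessible_datasets(user: User, datasets: List[Dataset]) -> List[str]:
    """Every user sees the same catalog; this filters what they may read."""
    return [d.name for d in datasets if can_access(user, d)]


if __name__ == "__main__":
    datasets = [
        Dataset("product_catalog"),
        Dataset("customer_pii", frozenset({"data_steward", "compliance"})),
    ]
    analyst = User("analyst", frozenset({"analyst"}))
    steward = User("steward", frozenset({"data_steward"}))
    print(accessible_datasets(analyst, datasets))
    print(accessible_datasets(steward, datasets))
```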

Get the most from your data with an intelligent enterprise data catalog

Informatica Enterprise Data Catalog (EDC) is an AI-powered data catalog that provides a machine learning-based discovery engine to scan and catalog data assets across the enterprise, both multi-cloud and on-premises. Powered by the CLAIRE® engine, it leverages metadata to deliver recommendations, suggestions, and automation of data management tasks. Informatica EDC enables both business and IT users to easily discover and understand relevant, trusted data with powerful semantic search, end-to-end data lineage, automatic domain discovery, integrated data quality and profiling statistics, holistic relationship views, intelligent recommendations, and an integrated business glossary. Informatica EDC integrates with Informatica Intelligent Cloud Services (IICS) data management services, which provide critical capabilities for cloud data warehouses, data lakes, and lakehouses. IICS Cloud Data Integration provides high-performance ETL, ELT, data ingestion, synchronization, and replication for multi-cloud environments. Learn how Informatica has been helping organizations achieve:

  • Self-service analytics – Generali, the third largest insurance provider in the world, wanted to democratize and organize data by establishing an enterprise data catalog that allows employees to easily discover and inventory data assets. Informatica Enterprise Data Catalog, combined with Informatica Axon Data Governance, helped the company improve data quality and accessibility, save time on data governance and discovery, and strengthen its data-driven analytics capabilities.
  • Data Governance – In order to increase data governance maturity, L.A. Care Health Plan, one of the largest publicly operated health plans in the United States, chose Informatica to help it keep data semantics consistent across the enterprise, agree on enterprise-wide definitions and terms, and develop business context around data from intake to its final destination for reporting and analytics.
  • Cloud Modernization – Informatica empowered a major broadcasting company to modernize public broadcasting by leveraging customer insights to build loyalty and improve audience satisfaction. Informatica EDC helped increase the value of customer data by making it easier to locate, understand, and re-use.
  • 360-degree view of business – AIA Singapore wanted to start a new phase in their data governance journey and gain a better understanding of their business terms, data lineage, and the quality of data at the source. Informatica Axon Data Governance integrated with Enterprise Data Catalog and Data Quality empowered them to build a complete solution for next-generation data governance.

It’s harder than ever to manage data today, as organizations collect huge volumes of data from many heterogeneous sources (such as ERP, CRM, streaming devices, and SaaS applications) that generate data of varying quality in real time. While a basic data catalog can help, an intelligent enterprise-class data catalog allows you to discover, understand, and trust data across:

  • Cloud and on-premises
  • All data ecosystems
  • All business use cases

Organizations using an intelligent enterprise-class data catalog are able to scan tens of millions of records, catalog hundreds of data sources, correlate thousands of business terms in minutes (versus days or weeks) and discover end-to-end data lineage in seconds (versus days, weeks or even months).

If you want to gain end-to-end visibility, find any data with a simple search, and use data you can trust, watch this video to learn how Informatica can help: