How Data Catalogs Deliver Data Intelligence
What is a Data Catalog?
A data catalog is a tool that helps data users assess which data assets are available and provides relevant information about that data. Data catalogs help you identify and organize information about your data:
- The source and origins of the data (data provenance)
- The lineage of the data
- The data's classification
- The location of the data
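As a rough illustration, the four kinds of information listed above can be pictured as fields on a catalog record. The sketch below is a minimal in-memory model, not any vendor's schema: the `CatalogEntry` and `Catalog` names, their fields and the substring search are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One data asset as a catalog might describe it (hypothetical schema)."""
    name: str
    location: str          # where the data lives
    provenance: str        # source system the data originated from
    classification: str    # e.g. "public", "internal", "PII"
    upstream: list[str] = field(default_factory=list)  # lineage: assets this one derives from


class Catalog:
    """A tiny in-memory catalog keyed by asset name."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        """Add or replace the entry stored under its name."""
        self._entries[entry.name] = entry

    def search(self, keyword: str) -> list[CatalogEntry]:
        """Return entries whose name contains the keyword (case-insensitive)."""
        kw = keyword.lower()
        return [e for e in self._entries.values() if kw in e.name.lower()]
```

A user could register a few assets and then call `search("customer")` to see which ones exist, where they live and how they are classified before deciding whether to use them.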
Make Better Decisions with a Data Catalog
A data catalog is a lot like a menu at a restaurant. Before you select and order a meal, you browse the menu to learn about the available options (with details about those options). A data catalog works in a similar way.
If you're a discerning diner, you might want to know more before you place your order. You might want to know where the food originated, how to avoid certain ingredients or how a dish is prepared. A similar thing happens in organizations as they become more data driven. Data consumers want to understand their organization's data. They want to ensure that they can trust the asset before "consuming" it.
And much like their dining counterparts, modern data consumers need to trust the data because they understand that “you are what you eat.” Data consumers know that their outcomes depend on the data consumed.
As organizations continue toward their goal of data-driven innovation, they need to depend on reliable data. A data catalog can help. It gives users a place to collect, index, relate, annotate and consume information about data assets. Data catalogs offer users a single, centralized source of data that’s accessible, trusted and consistent. And that makes it easier to maximize the value of your data.
You can use an AI-powered data catalog to automate data discovery and data classification. Automation lets you create a relevant and trusted view of your data. This view enhances collaboration and leads to better, more intelligent decision making. It also helps to provide contextual meaning and insights through derived metadata.
Why Do You Need a Data Catalog?
Your data has the potential to change your business. Data-driven, actionable insights are essential to a company’s ability to compete. But if you don’t know what data you have, where it’s located or understand its lineage, it’s difficult to get full value from it. A data catalog helps minimize the frustration and inefficiency associated with data silos. Instead, users throughout the company can use the data catalog to locate the data they need to achieve critical business outcomes.
Some ways that a data catalog can benefit your organization:
- Improve collaboration and trust at scale as part of a data governance program
- Fuel data analytics and digital transformation initiatives with trusted data
- Provide a foundation for analytics with a better understanding of data quality
- Establish a basis for data democratization and data sharing in a data marketplace
- Accelerate data and analytics governance with a faster time to value
Data Catalog Benefits
Data catalogs let data and business analysts scan and catalog data assets across the entire enterprise. This helps them find the most relevant data and, ultimately, turn that data into insights that drive efficiency, growth and profitability. The benefits fall into four areas:
- Cloud Modernization
- Data Governance
- Self-Service Analytics
- Holistic View of Data
Cloud Modernization: Migrate Data Warehouses to the Cloud Without Breaking the Business
A data catalog is your best tool for providing the context and curation you need to create a well-informed migration plan:
- Where the data lives
- The sources and origin of the data (data provenance)
- Where the data comes from and how it’s moved
- How the data is used and transformed
- Whether the data is trustworthy and fit for use
Your organization can use this information to identify the most relevant and trustworthy data. You can also determine who relies on what data to get their work done. Cataloging and tracing the complex relationships of your data lets you perform impact analyses. These analyses allow you to understand the downstream effects of migrating workloads and data sets to the cloud before you begin. This helps identify critical data sets and better understand risks. Learn more about modernizing and moving data warehouses to the cloud in the eBook, “Accelerate Your Cloud Journey with an Intelligent Data Catalog.”
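The impact analysis described above amounts to walking a lineage graph downstream from the asset you plan to migrate. The sketch below assumes lineage is available as a simple mapping from each asset to its direct consumers. The asset names and the `downstream_assets` helper are hypothetical, and a real catalog would expose lineage through its own interface.

```python
from collections import deque


def downstream_assets(lineage: dict[str, list[str]], start: str) -> list[str]:
    """Return every asset reachable downstream of `start`, in breadth-first order.

    `lineage` maps an asset to the assets that read from it directly.
    """
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for consumer in lineage.get(current, []):
            if consumer not in seen:  # skip assets already reached by another path
                seen.add(consumer)
                order.append(consumer)
                queue.append(consumer)
    return order


# Hypothetical lineage: a warehouse table feeds derived tables, which feed reports.
lineage = {
    "dw.orders": ["dw.daily_sales", "dw.customer_ltv"],
    "dw.daily_sales": ["report.sales_dashboard"],
    "dw.customer_ltv": ["report.sales_dashboard", "ml.churn_features"],
}

impacted = downstream_assets(lineage, "dw.orders")
```

Here `impacted` lists the derived tables, the dashboard and the feature set that depend on `dw.orders`, which is the set of workloads to review before moving that table.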
Data Governance: Use Your Data with Confidence
Data governance is a set of principles, standards and practices that ensures your data is reliable and consistent. Governance allows you to trust the data to:
- Help drive business initiatives
- Inform decisions
- Power digital transformations
A data catalog helps identify critical data elements that need governance. You can then use end-to-end data lineage to inform, define and enforce governance policies.
Much of the traditional need for data governance stemmed from regulatory and legal requirements. Risk management and policy compliance still play a significant role for companies in some industries. But today’s organizations also use data governance to deliver trusted data for other business needs.
A governance rule can be any practice that the organization wishes to follow or enforce. Governance usually codifies policies, procedures and best practices such as:
- Where you can store certain types of data
- What applications can use the data
- Preferred methods for data protection (such as encryption and password strength) to reduce risk
- How to back up data
- Who can access data (and under what conditions)
- When you should destroy archived data
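Rules like these can be written down as data and checked automatically. The following sketch is one possible encoding, assuming a policy per data classification. The `Policy` fields, the location and role names, and the `violations` and `can_access` helpers are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """A hypothetical governance rule set for one data classification."""
    allowed_locations: frozenset[str]  # where this type of data may be stored
    allowed_roles: frozenset[str]      # who may access it
    require_encryption: bool           # whether the data must be encrypted at rest


POLICIES = {
    "PII": Policy(frozenset({"eu-warehouse"}), frozenset({"steward", "analyst"}), True),
    "public": Policy(frozenset({"eu-warehouse", "us-lake"}), frozenset({"steward", "analyst", "guest"}), False),
}


def violations(classification: str, location: str, encrypted: bool) -> list[str]:
    """List the storage rules an asset breaks under its classification's policy."""
    policy = POLICIES[classification]
    problems = []
    if location not in policy.allowed_locations:
        problems.append(f"{classification} data may not be stored in {location}")
    if policy.require_encryption and not encrypted:
        problems.append(f"{classification} data must be encrypted")
    return problems


def can_access(classification: str, role: str) -> bool:
    """Check whether a role may read data of the given classification."""
    return role in POLICIES[classification].allowed_roles
```

Keeping the rules in one table like `POLICIES` means a change in policy is a change to data rather than to the checking code.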
A successful data governance framework and program enables you to scale for growing volumes of data and adapt as technologies evolve.
Self-Service Analytics: Make It Easy for Data Users to Turn Data into Business Insights
Intelligent data cataloging empowers everyone who uses your data. Data catalogs make more data visible and understandable, and enable self-service access. An intelligent data catalog offers end-to-end visibility into data sources and lineage. This allows business users and data analysts alike to locate relevant and trusted data. Data analysts, engineers and scientists can use intuitive, cloud-based analytics tools. Business users, data architects and stewards can have more confidence in the data — without bottlenecks. This self-sufficiency delivers greater productivity and user satisfaction.
Holistic View of Data: Realize Value Faster
Today’s organizations benefit from a holistic view of their data. They can:
- Enhance customer experience
- Avoid or minimize disruptions in their supply chain
- Drive more value in digital commerce
- Provide insightful financial planning and analysis
In the past, organizations have usually stored data across myriad departments and systems. The results of this approach? Data that’s incomplete, inconsistent, duplicative and fragmented. A data catalog changes all that. It delivers a comprehensive understanding of your data. You know what you have, where it’s coming from, how it’s related to other data, how it gets used and more. The result: an organization that’s more agile, resilient and competitive.
An AI-powered intelligent data catalog gives your data assets meaning and contextual relevance. This intelligence allows your organization to:
- Extract technical metadata from a wide variety of structured and unstructured sources. These can include databases, data warehouses, cloud data stores, applications, legacy systems and more.
- Extract the most granular metadata and track data dependencies across data sources. You can use advanced metadata scanners to run end-to-end data flow analysis. Some multivendor ETL tools also let you extract metadata and lineage from proprietary third-party systems.
- Integrate with a business glossary to create business context for cataloged data assets. The right tool can easily and automatically import and link business terms and definitions. You can capture complex relationships among data assets, and discover non-obvious relationships. A metadata knowledge graph continuously updates all metadata — structural, semantic and usage — as data flows through an enterprise.
- Discover and classify data assets using both AI-driven and human-assisted curation. A data catalog can accelerate discovery and classification and detect similarity between data assets.
- Capture the complex relationships among data assets and discover non-obvious relationships. The knowledge graph continuously updates all data and metadata (structural, semantic and usage) flowing through an enterprise. The inference engine deduces user intentions and recommends the best way to consume data for each use case. This combination ensures that trusted data gets to where it’s needed at the right time.
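To make the combination of automated and human-assisted classification concrete, here is a deliberately simple sketch. It uses regular-expression rules as a stand-in for the AI-driven part, which a real catalog would implement with far richer rules and trained models, and it flags low-confidence columns for a human steward. The pattern set, the `classify_column` name and the 0.8 threshold are assumptions for this example.

```python
import re

# Hypothetical patterns standing in for automated classifiers.
PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "us_phone": re.compile(r"^\d{3}-\d{3}-\d{4}$"),
}


def classify_column(values: list[str], threshold: float = 0.8) -> tuple[str, bool]:
    """Return (label, needs_review) for a sample of column values.

    The label is the pattern matching the largest share of values.
    If no pattern reaches `threshold`, the column is flagged for human review.
    """
    if not values:
        return ("unknown", True)
    best_label, best_share = "unknown", 0.0
    for label, pattern in PATTERNS.items():
        share = sum(1 for v in values if pattern.match(v)) / len(values)
        if share > best_share:
            best_label, best_share = label, share
    return (best_label, best_share < threshold)
```

A column sampled as mostly email addresses would be labeled automatically, while a column with only a few matching values would carry a tentative label and a review flag.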
Types of Data Catalogs
Data catalogs have emerged as a foundational need for modern data-driven organizations. As a result, they have become ubiquitous in data management solutions. But not all data catalogs are created equal. Some are part of other tools, such as business intelligence tools (like Tableau) or analytics tools (like Databricks). These data catalogs only scan and catalog datasets and reports within that environment for a limited use case. The large cloud platform providers offer their own data catalogs as well. Microsoft Azure, Amazon Web Services (AWS) and Google Cloud Platform (GCP) offer catalogs focused on their own cloud ecosystems. These data catalogs do not cover data that resides on-premises or in other cloud ecosystems, which can promote vendor lock-in.
True enterprise-scale data catalogs have the ability to scan, catalog and inventory data of all types, across all data sources and across cloud and on-premises. This allows them to remain more platform- and vendor-agnostic. Some enterprise-scale catalogs take it a step further, offering the ability to scan and ingest metadata from other data catalogs. Because these solutions act as a “catalog of catalogs,” they can provide a comprehensive metadata system of record for the organization.
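One way to picture a “catalog of catalogs” is as a merge of the metadata each underlying catalog holds about the same assets. The sketch below assumes each source catalog can be exported as a plain mapping of asset names to metadata. The `merge_catalogs` function, its first-writer-wins rule and the `seen_in` field are illustrative choices, not a description of how any product resolves conflicts.

```python
def merge_catalogs(sources: dict[str, dict[str, dict]]) -> dict[str, dict]:
    """Combine asset metadata from several catalogs into one record per asset.

    `sources` maps a catalog name to {asset_name: metadata}. Catalogs are
    processed in order: later ones fill in missing fields but do not overwrite
    existing ones. Each merged record keeps a `seen_in` list so the origin of
    its metadata stays traceable.
    """
    merged = {}
    for catalog_name, assets in sources.items():
        for asset_name, metadata in assets.items():
            record = merged.setdefault(asset_name, {"seen_in": []})
            record["seen_in"].append(catalog_name)
            for key, value in metadata.items():
                record.setdefault(key, value)  # keep the first value seen
    return merged
```

The order of `sources` decides which catalog wins when two disagree, so in practice the most trusted catalog would be listed first.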
Data Catalog vs. Data Dictionary vs. Business Glossary
A data catalog, data dictionary and business glossary are related but distinctly different concepts. A data catalog is a tool that scans, indexes and catalogs data across an organization to provide a searchable inventory of all data assets. Data catalogs play a critical role in simplifying discovery and understanding of data.
A data dictionary provides technical documentation about the data and metadata that you have in your databases. Technical users refer to a data dictionary when they need to fully understand the data in databases.
A business glossary contains the definitions of commonly used business terms in the organization. The business glossary typically serves as the single authoritative source for concepts and definitions of commonly used business terms. Associating business glossary terms and their definitions to physical datasets helps add valuable business context to the data.
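Associating glossary terms with physical datasets can be sketched as a lookup from each business term to the columns that appear to hold it. The example below assumes a naive match on normalized names. The `link_terms` function, the glossary entries and the column names are hypothetical, and a real tool would use richer matching than a substring check.

```python
def link_terms(glossary: dict[str, str], columns: list[str]) -> dict[str, list[str]]:
    """Map each business term to the physical columns whose names mention it.

    Matching is a naive check: the term is lowercased, spaces become
    underscores, and any column name containing that key is linked.
    """
    def normalize(text: str) -> str:
        return text.lower().replace(" ", "_")

    links = {}
    for term in glossary:
        key = normalize(term)
        links[term] = [c for c in columns if key in c.lower()]
    return links


glossary = {
    "Customer ID": "Unique identifier assigned to a customer at signup.",
    "Order Total": "Sum of line items on an order, including tax.",
}
columns = [
    "crm.customers.customer_id",
    "dw.orders.customer_id",
    "dw.orders.order_total",
    "dw.orders.created_at",
]
links = link_terms(glossary, columns)
```

With links like these, someone browsing `dw.orders.order_total` can see the agreed business definition next to the technical column name.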
Data Catalog Examples: Customer Success Stories
From healthcare and insurance companies to engineering and construction firms, companies are using intelligent data cataloging tools as a foundation for transformation.
Streamlining Data-Driven Insights at Point72
Point72’s mission is to be the industry’s premier asset management firm, inventing the future of finance. A focus on data-driven insights helps drive customer value. They’re changing how they develop their people and how they use data to shape their thinking. It all starts with having the confidence to find, understand, trust and access reliable data to enable analytics programs.
Their “catalog of catalogs” approach has cut their data discovery time by as much as 75%. They’ve connected their Informatica data catalog with Databricks metastore to streamline metadata-driven insights. Faster data discovery and data preparation now help them accelerate analytics and AI. That means they can spend more time analyzing data, driving new business outcomes and mitigating risks, and less time searching across disconnected data sources.
Breaking Down Data Silos at HelloFresh
As a meal-kit pioneer, HelloFresh is changing the way people eat. Providing access to high-quality data for all HelloFresh employees is critical to helping them deliver step-by-step recipes and fresh, affordable, pre-portioned ingredients right to customers’ doors. They needed to establish new standards for data governance, and ensure they had the right tooling to catalog metadata and track data lineage.
HelloFresh now uses Informatica data cataloging to classify and document data from its Cloudera cloud data warehouse and other critical sources. Better documentation of datasets in one central place gives HelloFresh employees greater clarity about where the data comes from, who owns it, and whether it meets their needs. Informatica provides the data platform with integrated data quality capabilities for cataloging and data governance requirements.
Coordinating a Better Response at UNC Health
UNC Health comprises UNC Hospitals and its provider network, the clinical programs of the UNC School of Medicine, and 12 affiliate hospitals and hospital systems across North Carolina. To deliver a coordinated response to COVID-19, they needed to understand the impact of the pandemic and get clear, concise information to facilitate decision-making and improve patient outcomes.
UNC Health deployed Informatica Enterprise Data Catalog to automatically catalog enterprise data and allow data analysts, developers, and architects to view it in tables and columns, so they could easily understand data lineage and expedite impact analysis.
Becoming More Customer Centric at Assicurazioni Generali
Generali is one of the world’s largest and oldest insurance providers. Like many companies, it is undergoing digital transformation to become more customer centric. It relies heavily on insured and policy data and wanted to create a data-driven culture across all its business units.
Generali established an enterprise data catalog to democratize and organize data to enable employees to easily discover and inventory data assets. Integrating Informatica Data Catalog with Informatica Data Governance allowed them to quickly find the data they need to govern, manage data effectively and uncover analytics insights.
Increasing the Quality of Provider Data at L.A. Care Health Plan
L.A. Care’s mission is to provide access to quality healthcare for Los Angeles County’s low-income communities and support the safety net required to achieve that purpose. The organization grew quickly after the passage of the Affordable Care Act and needed to protect, govern and manage vast amounts of patient information, as well as leverage its data for analytics to improve the health of its population.
L.A. Care integrated Informatica Data Quality, Enterprise Data Catalog and Axon Data Governance to improve its population health information efforts with governed, high-quality data that provides invaluable insight into the county’s most vulnerable residents.
Data Catalog Resources
Discover how AI-powered intelligent data catalogs let you discover, inventory, and organize data assets quickly and accurately.
- Informatica was named a Leader in the “IDC MarketScape: Worldwide Data Catalog Software 2022 Vendor Assessment.” Download your copy now.
- Learn how you can enable core data initiatives: download your copy of the “Build the Data Foundation for Every Digital Transformation Priority” eBook.
- Discover 4 ways to get started with a data catalog
- Learn more about 5 key benefits of data and analytics governance
- Read why enterprises need data cataloging now more than ever
- Learn more about the benefits of AI-augmented data cataloging