5 Key Takeaways from Back to Basics: Data Catalog 101

Last Published: Mar 07, 2022
Melanie Henry

Sr. Product Marketing Manager

What does a data catalog have in common with a farm-to-table restaurant menu? One of our data catalog experts, Khuan Tan, Senior Director of Product Management, joined Nathan Turajski, Senior Director of Product Marketing, to explain this analogy during the first installment of our Back to Basics: Data Catalog webinar series — Data Catalog 101. During the lively discussion about the basics of data cataloging, we answered questions that included “What is a data catalog?” and “What should be included in a data catalog?” and took some questions directly from the audience.

If you missed the first episode of our Back to Basics: Data Catalog webinar series, don’t worry: you can catch a replay of Episode 1 on demand. We’ve also summarized some of the key takeaways from the session below:

1. A Data Catalog is Your “Menu” for Data-Driven Innovation

To give a simple, relatable answer to the question “What is a data catalog?”, we compared a data catalog to a restaurant menu. Just as diners browse a menu to see the available options (and details about those options) before they order a meal, users browse a data catalog to see which data assets are available and whether an asset is fit for use. The catalog does this by providing an inventory of available data alongside relevant information about that data.

As diners become more discerning, they want to know more than the available meal options: they also want to know where their food originated, what it contains, and how it was prepared. Likewise, as organizations become more data-driven, data consumers want to thoroughly understand their organization’s data and ensure it can be trusted before “consuming” it.

To continue with the food and dining analogy, modern data consumers want to understand their data because they know that “you are what you eat.” They realize that their outcomes depend on the data they consume.

As organizations continue moving towards data-driven innovation, the need for high-quality, reliable data increases. A data catalog helps meet this need by enabling stakeholders to inventory, find, and understand relevant data, so they can confirm that the data is trusted and ready for consumption.

2. Metadata Drives the Data Catalog

Metadata, simply described as “data about data,” is what drives data catalogs and provides information that helps data consumers understand the quality and fitness of data for usage.

During our conversation, we discussed the three primary elements of a data catalog:

  1. Technical Metadata: data structures, relationships, semantics, transformations, and lineage
  2. Business Metadata: business definitions, data asset relationships, data flows, critical data elements
  3. Wisdom of the Crowd: tribal knowledge, relevance, usage, ratings, comments, Q&A

Together, these three elements increase the value of data in a catalog by giving users extensive information about the data, allowing them to decide whether the data is fit for use.
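For readers who think in code, the three elements above can be pictured as one record per data asset. The following is only an illustrative sketch, not how any particular catalog product stores metadata: every class name, field name, and sample value (`CatalogEntry`, `sales.orders`, and so on) is a hypothetical chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TechnicalMetadata:
    # Hypothetical fields: column name -> data type, plus upstream lineage.
    columns: Dict[str, str]
    upstream_sources: List[str] = field(default_factory=list)


@dataclass
class BusinessMetadata:
    # Hypothetical fields: a plain-language definition and critical data elements.
    definition: str
    critical_data_elements: List[str] = field(default_factory=list)


@dataclass
class CrowdKnowledge:
    # Hypothetical fields: user ratings (1 to 5) and free-text comments.
    ratings: List[int] = field(default_factory=list)
    comments: List[str] = field(default_factory=list)

    def average_rating(self) -> Optional[float]:
        """Return the mean rating, or None when nobody has rated the asset."""
        if not self.ratings:
            return None
        return sum(self.ratings) / len(self.ratings)


@dataclass
class CatalogEntry:
    # One catalog entry combines all three kinds of metadata for a single asset.
    name: str
    technical: TechnicalMetadata
    business: BusinessMetadata
    crowd: CrowdKnowledge = field(default_factory=CrowdKnowledge)


# Sample entry with made-up values, for illustration only.
orders = CatalogEntry(
    name="sales.orders",
    technical=TechnicalMetadata(
        columns={"order_id": "INTEGER", "customer_id": "INTEGER", "total": "DECIMAL"},
        upstream_sources=["erp.raw_orders"],
    ),
    business=BusinessMetadata(
        definition="One row per confirmed customer order.",
        critical_data_elements=["order_id", "total"],
    ),
    crowd=CrowdKnowledge(ratings=[4, 5], comments=["Refreshed nightly."]),
)
```

A data consumer deciding whether `sales.orders` is fit for use could then look at the structure and lineage, the business definition, and the crowd's rating (`orders.crowd.average_rating()`) in one place.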

3. Must-Have Capabilities for a Data Catalog Include Comprehensive Metadata Connectivity and AI-enabled Automation

Broad and deep metadata connectivity is a core capability of a data catalog. To capture existing data assets, along with relevant metadata, a data catalog should be able to connect to a variety of sources across an organization’s data landscape, including legacy applications, on-premises systems, and cloud platforms.

A data catalog should also use AI-enabled automation for efficient, scalable data classification, discovery, association, curation, and more.

Learn about additional must-have capabilities in our eBook, Build the Data Foundation for Every Digital Transformation Priority.

4. When Adopting a Data Catalog: Develop a Strategy. Start Small. Refine and Expand.

The most successful organizations have a clearly defined strategy when adopting a catalog. This strategy should include objectives and measurable outcomes, as these considerations will drive the program. Next, for quick yet impactful results, start with a small pilot project, focused on a high-priority use case and pain points.

Finally, once the pilot project is successful, refine your methods based on the lessons learned, and then expand, for example by adding more users to the pilot or moving on to other use cases.

Check out our step-by-step workbook, 7 Best Practices to Drive Data Catalog Adoption, for guidance on how to get started with a data catalog.

5. A Data Catalog Can Help Accelerate Various Organizational Initiatives

Data catalogs can facilitate a range of activities from cloud migration to data science projects. For instance, in the case of migrating to the cloud, a data catalog enables users to understand what data assets exist within an organization and the potential impact of moving these assets to the cloud, which is extremely helpful when selecting and prioritizing assets to migrate.
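To make the cloud migration example more concrete, here is a minimal sketch of the kind of impact analysis that catalog lineage makes possible. It assumes lineage is available as a simple mapping from each asset to the assets that read from it. The asset names, the `LINEAGE` mapping, and the `downstream_assets` function are all hypothetical; a real catalog would expose lineage through its own interface.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical lineage: each asset maps to the assets that consume it directly.
LINEAGE: Dict[str, List[str]] = {
    "crm.customers": ["dw.customer_dim"],
    "dw.customer_dim": ["bi.churn_report", "ml.churn_features"],
    "ml.churn_features": ["ml.churn_model"],
    "erp.invoices": ["dw.revenue_fact"],
}


def downstream_assets(lineage: Dict[str, List[str]], start: str) -> Set[str]:
    """Return every asset that depends, directly or indirectly, on `start`."""
    seen: Set[str] = set()
    queue = deque([start])
    while queue:
        asset = queue.popleft()
        for dependent in lineage.get(asset, []):
            # Skip assets already visited (and the start itself, in case of cycles).
            if dependent not in seen and dependent != start:
                seen.add(dependent)
                queue.append(dependent)
    return seen


# Which assets could be affected if crm.customers moves to the cloud?
impacted = downstream_assets(LINEAGE, "crm.customers")
print(sorted(impacted))
```

An asset with many downstream dependents may need more planning before it moves, while an asset with few or none may be an easier first candidate for migration.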

Register for the Back to Basics: Data Catalog Webinar Series

Be sure to join us for the next episode of our Back to Basics: Data Catalog series, where, in addition to learning about the fundamentals of data discovery, you will have an opportunity to have your questions answered by the experts. As a reminder, the remaining episodes in the series will cover data lineage, data asset analytics, and data and analytics governance. Select the episodes that interest you or join us for all of them. Register today!

First Published: Mar 03, 2022