4 Ways to Start with Data Catalog

Sep 09, 2022
Arun G Krishnan

Technical Marketing Consultant

You've probably read and heard a lot about how data catalogs help organizations operationalize data governance and data privacy at scale. But before you can get started, it’s better if you know the answers to a few questions:

  1. Do you need a data catalog?
  2. What are the critical business needs that you can address with a data catalog?
  3. Where is the best place to start?
  4. What factors should you consider when you’re selecting a data catalog tool?

To learn the answers to these and other questions, please keep reading.

Figure 1: A data catalog makes it easy for you to do everything from finding data to ensuring data quality.

 

Data catalogs play a central role in building a data-driven culture in an organization. Traditionally, data catalogs have served as a central data search and discovery platform, allowing users to take ownership of their data. Today’s data catalogs have evolved into a collaborative platform for both data owners and users, enabling data democratization as well. This makes it easy for data users to find data along with its technical and business metadata, understand data policies, view lineage and gain trust through data quality. Such objective information empowers analytics and data engineering teams, which in turn supports overall data governance. For example, data lineage visualizations can be used to identify critical data, its usage and its downstream impact during mergers and acquisitions. The data catalog can also help you choose the reporting system that best meets your specific data quality needs.

Figure 2: A data lineage visualization enables you to assess the downstream impact of critical data.
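
To make the idea of downstream impact concrete, here is a minimal sketch in Python. It assumes lineage is stored as a simple mapping from each asset to the assets it feeds; the asset names and the `downstream_assets` helper are hypothetical and do not come from any particular catalog product.

```python
from collections import deque

# Hypothetical lineage edges: each key feeds the assets listed as its value.
LINEAGE = {
    "crm.customers": ["warehouse.dim_customer"],
    "erp.orders": ["warehouse.fact_orders"],
    "warehouse.dim_customer": ["reports.customer_360", "warehouse.fact_orders"],
    "warehouse.fact_orders": ["reports.revenue_dashboard"],
}


def downstream_assets(lineage, source):
    """Return every asset reachable downstream of `source`, breadth-first."""
    seen = {source}  # track visited assets so cycles cannot loop forever
    order = []
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for child in lineage.get(current, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


impacted = downstream_assets(LINEAGE, "crm.customers")
print(impacted)
```

In this toy graph, a change to the customer source touches both reports, which is the kind of answer a lineage view is meant to surface before a migration or merger.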

 

Do You Need a Data Catalog?

An organization doesn’t need a data catalog if its data engineers can correctly answer questions about its data: what it means, where it comes from, how it is connected, and what happens to it downstream. Now, if you have data users who are constantly running around asking others what a specific data element means, or attempting to validate reports and their contents, that’s a sign that you don't have a scalable data governance operating model in place. With so many business functions, data sources, cloud data lakes, and more, it is impossible for everyone to have all the information (unless they had somehow been able to set up every data pipeline from beginning to end). And that’s not usually the case: in data-intensive industries such as healthcare and retail, which have complex analytics architectures, data engineers work only on specific data pipelines.

Identify Your Business Needs

There are plenty of data governance and catalog tools available in the market for you to choose from based on your budget, data needs and product capabilities. But which one is best suited for you? Do you need all the bundled tools and features they offer? You may have already read quite a bit about these tools and probably seen many demos to help you make the right choice.

Take your time, discuss and answer the questions below as clearly as you can before you finalize your selection of a data catalog tool.

  1. What are the key business use cases for a data catalog in your organization?
  2. What are the desired business and technical capabilities that you need to achieve your governance objectives?

Describing the business need for a data catalog should be the first item on your agenda when you think about investing in one. Discuss it with your business and technical peers. Make a list of the business use cases you wish to solve with the tool and identify what metadata is required to be in the catalog. Your data peers need to understand what data problems you’re trying to solve and agree on shared responsibilities before you move forward. Below are some of the most typical business requirements.

  • Tailored for a specific industry
  • Common business vocabulary across the organization
  • Supports all data sources and formats currently in use
  • Metadata management
  • Understand the upstream systems that contribute to data and downstream systems that make use of data
  • Observe changes in quality and identify data issues
  • Operationalize data privacy policies that apply to data

Where Is the Best Place For You to Start?

Your objectives for the data catalog should be more than just metadata search. With the right solution, you can enable users to act on the insights they find. Start small by identifying a critical project in the organization. For example, in a retail company, that critical project could be data governance for customer order details to improve cross-sell and upsell opportunities. Form a team of data owners and users and plan your first data catalog project. Begin by identifying the data sources you need for this project and bringing the data into the data catalog. If necessary, data owners and users should validate the description, policies, quality and lineage, and add business context so the data is easily understood by a larger audience. Once you have classified, profiled and added data to the catalog, you can reuse the information for future data governance activities. For greatest impact, be sure to highlight how the catalog improves business processes for users and overall data quality. Your next goal should be to drive adoption by creating organizational workflows around the data catalog.
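
As a rough illustration of what "classified, profiled and added to the catalog" can mean for a single column, here is a minimal Python sketch. The `CatalogEntry` structure, the `profile_column` helper, the column name and the sample values are all hypothetical; a real catalog tool would collect this automatically and store far more.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Hypothetical catalog record for one column."""
    name: str
    description: str
    owner: str
    classifications: list = field(default_factory=list)
    profile: dict = field(default_factory=dict)


def profile_column(values):
    """Compute a basic profile: row, null and distinct counts plus completeness."""
    total = len(values)
    non_null = [v for v in values if v is not None]
    return {
        "row_count": total,
        "null_count": total - len(non_null),
        "distinct_count": len(set(non_null)),
        "completeness": len(non_null) / total if total else 0.0,
    }


# Made-up sample values for a customer order column.
emails = ["a@example.com", None, "b@example.com", "a@example.com"]

entry = CatalogEntry(
    name="orders.customer_email",
    description="Email address the customer supplied at checkout.",
    owner="retail-data-team",
    classifications=["PII"],
    profile=profile_column(emails),
)
print(entry.profile)
```

Data owners would then review the description and classification, while the profile gives users a quick signal of how complete the column is.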

Figure 3: A business glossary allows you to document common business terms and help ensure consistent data usage.

 

For more on the Fundamentals of Data Cataloging, check out the webinar series, Back to Basics: The Fundamentals of Data Cataloging.

Much More than a Data Catalog

Not all data catalogs are created equal. Some can do much more than cataloging and lineage. For customers in heavily regulated industries, for example, the catalog should make it easy to identify which data is impacted by specific regulations. The lineage also enables them to see the source of their data and how it is consumed. A built-in data quality engine in the data catalog is a key enabler for effective governance of your data estate.

Figure 4: Data quality helps ensure that data is trustworthy.

 

An AI-powered data catalog can use metadata to deliver intelligent recommendations, suggestions and automation of data management tasks. It uses a machine-learning-based discovery engine to scan and catalog data assets, on-premises and in the cloud.

Informatica can serve as the one-stop shop for a data governance program. Our technology platform is built to be modular, integrated and highly interoperable. In addition, our CLAIRE® engine applies artificial intelligence and machine learning to automate formerly manual processes like data discovery, cataloging, reporting and even applying metadata so your team can spend more time on analysis and strategy.

Build your business use cases first, start small, set measurable goals and make sure you can showcase the ROI. You should choose a tool based on the complexity and size of your data ecosystem. Ideally, you should look for a product partner whose offering can scale and help with AI-powered auto-discovery, auto glossary associations, recommendations, active metadata management, policy management and end-to-end lineage.

Remember, this is a journey. That makes it particularly important to choose a partner wisely. It’s always better if you can base your selection on the partner’s depth of expertise — ideally, demonstrated with an organization similar to yours. That way, you can be sure you’re choosing a partner that meets customer needs and can support you in the long run. And, just like any software you procure for your organization, you also need people’s buy-in and contribution to succeed. So, socialize the concept and get people onboard to collaborate and contribute throughout the selection process. Otherwise, you may end up with a shiny new tool that no one uses.

Ready to Get Started? Here’s What You Can Do Next.

Register for our Meet the Experts: Empowering Business with Cloud Data Governance & Catalog Webinar.

In “Meet the Experts: Empowering Business with Cloud Data Governance & Catalog,” you will explore what’s new in Informatica’s data governance solution and how it can empower your business with trusted data. We’ll cover:

  • Data cataloging as the key to intelligent data governance
  • Unified capabilities across Cloud Data Governance and Catalog
  • Risk management with intelligent data discovery

Click here to register for the on-demand webinar.