In December 2017, data catalogs officially became cool when Gartner published its report “Data Catalogs Are the New Black in Data Management and Analytics.” Over the last couple of years, demand for data catalogs has been strong and has lived up to the hype. And over the last year, that demand has translated into a growing number and variety of data catalog offerings from vendors – standalone data catalogs, domain-specific data catalogs from cloud platform providers, and data catalogs embedded in other tools such as data preparation and analytics products.
At the same time, as enterprises embark on data-driven business transformations and the data landscape becomes increasingly complex, intelligent, enterprise-scale data catalogs have emerged as a foundational part of the modern data environment. A robust, well-designed catalog simplifies and accelerates the implementation of all data management projects by providing the enterprise-wide metadata foundation to add context to data and fuel metadata-driven artificial intelligence and automation.
Unfortunately, as in any market, more options don’t necessarily mean more value and clarity for customers. On the contrary, many data cataloging solutions are limited along one or more of these dimensions:
- Data coverage: limited to specific systems or platforms, so unable to provide a complete view of the enterprise data landscape.
- Use cases: limited to only one use case, like data governance or self-service analytics, so unable to fully support all data-driven business priorities.
- Lineage: limited ability to trace data lineage across systems and across cloud and on-premises environments, or to extract lineage from code (e.g., stored procedures, ETL) and complex applications.
- Metadata: limited to capturing basic technical metadata, so unable to provide the rich context that can be extracted from other types of metadata such as business, operational and usage metadata.
- Scalability: unable to address true enterprise-class requirements for scanning and cataloging tens of millions of data assets.
So, if you are looking for a true enterprise-scale catalog that covers all of your data and provides the metadata foundation for all of your data-driven business priorities, choosing a limited catalog can seriously set you back in efficiency, effectiveness, and time. To avoid this, here are six key questions to consider as you select the right data catalog to power your data-driven business priorities:
- Can you catalog all of your data? Look for a data catalog that can scan and capture metadata from all data sources across the enterprise – on-premises databases and data warehouses, cloud data stores, BI tools, ETL tools, legacy and mainframe systems, complex enterprise applications, and even other data catalogs.
- Can you understand the lineage of your data and the impact of any changes? Look for a data catalog that can provide granular, end-to-end lineage across systems, and across cloud and on-premises environments, with no black boxes, including the ability to drill down into complex data transformations (e.g., stored procedures and ETL code).
- Can you automate data curation leveraging AI/ML? Relying on just manual and crowd-sourced data curation will not scale. You need AI/ML-powered intelligence for automated domain discovery, tagging and classification, detection of similar data, and association of business glossary terms to physical data assets.
- Does your data catalog facilitate seamless collaboration? You should be able to easily tap into shared data knowledge that’s distributed across the enterprise, and leverage the best of AI and human expertise.
- Can your data catalog support all business use cases? Look for integrated solutions to address key business use cases like data governance, data privacy, self-service analytics, and cloud modernization.
- Can you ramp up to support true enterprise scale? Check if your data catalog can scale to support tens of millions of data objects across hundreds of data sources.
If your data catalog fails to meet even one of these requirements, you run the risk of creating metadata silos that don’t provide the enterprise-wide visibility and support you need for all business use cases. To learn more about why you need an enterprise-class “catalog of catalogs” to move forward with your data-driven business initiatives, download our eBook “Drive Your Business Forward With a Catalog of Catalogs.” And read this Maersk case study for a real-world perspective on how the company is taking a multi-pronged approach to data-driven business transformation: putting in place a scalable data governance program, modernizing its data infrastructure with a cloud data lake, and transforming its business intelligence and self-service analytics, all of which are enabled by an intelligent, enterprise-scale data catalog.
Gartner, Inc., “Data Catalogs Are the New Black in Data Management and Analytics,” by Ehtisham Zaidi, Guido De Simoni, Roxane Edjlali, and Alan D. Duncan, 13 December 2017.