Metadata management is critical to building a data-driven business. But before we can define metadata management, we have to define metadata.
Metadata is data about data. It provides the context you need to use your data effectively and unleash its full value. Specifically, metadata helps you discover data, understand the relationships between different pieces of data, track how data is used, and assess the benefits and risks associated with that use. As data grows at an explosive rate and becomes more distributed, metadata is fueling mission-critical processes, which is why metadata management now plays a central, strategic role in driving digital transformation.
So what is metadata management? It is the portfolio of best-practice processes and technologies that allow businesses to manage this data about their data and derive insights for more effective data management. It allows users of all kinds – business, technical, and operational – to search for, understand, and securely access the data they need to do their jobs.
Why is metadata management important?
Metadata management is a foundational practice for any organization striving to make better use of its data. It ensures data is trusted and accessible as appropriate to data consumers throughout the enterprise. Most critically, it gives data consumers visibility into what data is available and answers any questions they might have about its data lineage.
Data lineage is essentially the “provenance” for data: an ongoing and continuously updated record of where data originates, how it moves through the organization, how it gets transformed, where it’s stored, who accesses it, and other key metadata. Because data lineage clarifies the availability, ownership, security, and quality of data as it flows across the organization, it is an essential part of metadata management.
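One way to picture data lineage is as a directed graph in which each edge says "this dataset is derived from that one." The Python sketch below is only an illustration under that assumption; the `LineageGraph` class and the dataset names are hypothetical and not taken from any particular product.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Directed graph: an edge source -> target means target is derived from source."""

    def __init__(self):
        self._downstream = defaultdict(set)
        self._upstream = defaultdict(set)
        self.transformations = {}

    def add_edge(self, source, target, transformation=""):
        # Record the edge in both directions so either traversal is cheap.
        self._downstream[source].add(target)
        self._upstream[target].add(source)
        self.transformations[(source, target)] = transformation

    def _walk(self, start, edges):
        # Breadth-first traversal that collects every reachable dataset.
        seen, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            for nxt in edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def upstream(self, dataset):
        """Everything the dataset was derived from (its provenance)."""
        return self._walk(dataset, self._upstream)

    def downstream(self, dataset):
        """Everything that would be affected if the dataset changed."""
        return self._walk(dataset, self._downstream)


graph = LineageGraph()
graph.add_edge("crm.customers", "staging.customers", "deduplicate")
graph.add_edge("staging.customers", "warehouse.dim_customer", "conform keys")
graph.add_edge("warehouse.dim_customer", "reports.churn_dashboard", "aggregate")

origins = graph.upstream("warehouse.dim_customer")
impacted = graph.downstream("staging.customers")
```

Here `origins` answers "where did this data come from?" and `impacted` answers "what depends on it?", which are the two questions lineage is usually asked.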
Reasons that metadata management – encompassing data lineage – is important:
Shows users what data is available
Ensures visibility into where the data is coming from, whether it can be trusted, and how to find it
Enables transparency across hybrid and multicloud architectures, giving a complete view into data across on-premises and cloud data repositories
Leverages automation to eliminate the need for tedious, manual curation of data sets
Managing the breadth of data
Metadata management must be able to handle the breadth of data across the enterprise. There are two dimensions to breadth. The first dimension is the ability to collect metadata from sources across the enterprise, both on-premises and cloud. The other dimension of breadth is about the four different types of metadata that must be managed:
Technical metadata management: This covers database schemas, mappings and code, transformations, and quality checks
Business metadata management: This involves management of glossary terms, governance processes, and application and business context metadata
Operational and infrastructure metadata management: Managing run-time stats, time stamps, volume metrics, log information, and system and location metadata falls into this category
Usage metadata management: Managing such things as user ratings, comments, and access-pattern metadata comes into play here
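The four types listed above can be sketched as a simple data structure. This is a minimal illustration, assuming one catalog entry per dataset; the `MetadataType` and `CatalogEntry` names and the sample values are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class MetadataType(Enum):
    TECHNICAL = "technical"      # schemas, mappings, transformations
    BUSINESS = "business"        # glossary terms, governance context
    OPERATIONAL = "operational"  # run-time stats, time stamps, volumes
    USAGE = "usage"              # ratings, comments, access patterns


@dataclass
class CatalogEntry:
    dataset: str
    # One bucket of key/value pairs per metadata type.
    metadata: dict = field(
        default_factory=lambda: {kind: {} for kind in MetadataType}
    )

    def annotate(self, kind, key, value):
        self.metadata[kind][key] = value

    def missing_types(self):
        """Metadata types for which nothing has been collected yet."""
        return [kind for kind in MetadataType if not self.metadata[kind]]


entry = CatalogEntry("warehouse.orders")
entry.annotate(MetadataType.TECHNICAL, "schema", {"order_id": "int", "total": "decimal"})
entry.annotate(MetadataType.BUSINESS, "glossary_term", "Customer Order")
entry.annotate(MetadataType.OPERATIONAL, "row_count", 1204)

gaps = entry.missing_types()  # only usage metadata is still missing
```

Keeping all four types on one entry makes gaps in coverage easy to spot, which is the point of managing breadth.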
Active metadata represents a new metadata concept. The idea is that metadata is even more valuable if it is overlaid with artificial intelligence (AI) or machine learning (ML). Active metadata management doesn’t just help users analyze metadata passively, but in and of itself recommends or triggers action. For example, metadata can flag missing or incorrect data. It can then help improve the quality of analytics by automatically correcting and enriching the metadata to improve decision-making and avoid costly mistakes.
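To make the idea of metadata that "recommends or triggers action" concrete, here is a small Python sketch. It is an assumption-laden illustration: fixed rules stand in for the AI or ML model, and the profile fields (`row_count`, `null_count`, `declared_type`, `observed_type`) and action names are hypothetical.

```python
def recommend_actions(column_profiles, max_null_fraction=0.05):
    """Return (column, action) pairs for columns whose profile looks unhealthy.

    A real active-metadata system would learn thresholds and patterns;
    the fixed rules here are only a stand-in for that model.
    """
    actions = []
    for column, profile in column_profiles.items():
        rows = profile["row_count"]
        if rows == 0:
            continue  # nothing to judge on an empty column
        if profile["null_count"] / rows > max_null_fraction:
            actions.append((column, "flag_missing_values"))
        if profile.get("observed_type") != profile.get("declared_type"):
            actions.append((column, "flag_type_mismatch"))
    return actions


profiles = {
    "email": {"row_count": 1000, "null_count": 120,
              "declared_type": "string", "observed_type": "string"},
    "signup_date": {"row_count": 1000, "null_count": 0,
                    "declared_type": "date", "observed_type": "string"},
    "country": {"row_count": 1000, "null_count": 10,
                "declared_type": "string", "observed_type": "string"},
}

actions = recommend_actions(profiles)
```

With these sample profiles, `email` is flagged for missing values and `signup_date` for a type mismatch, while `country` passes.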
How does metadata management work?
To build a robust data foundation to support decision-making, follow five distinct steps: build, operate, control, consume, and measure. Intelligent metadata management helps with every step in that lifecycle.
The build step in the framework is managed by enterprises’ data architects. It includes identifying the datasets to be moved and deciding whether all or only part of those datasets needs to be moved.
The metadata management solution needed for the build stage is the data catalog.
Metadata management at this stage enables enterprises to:
Perform domain discovery that helps identify sensitive data
Find similar and related data that can be moved to a data repository
Make information on data profiling and quality available as data is identified for onboarding
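Domain discovery, the first capability in the list above, can be approximated by matching sampled column values against known patterns. The sketch below is a simplified assumption of how such a check might look; the patterns, the `discover_domains` function, and the 0.75 threshold are illustrative only, and production tools typically combine many more signals.

```python
import re

# Hypothetical patterns for two sensitive data domains.
DOMAIN_PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "us_ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
}


def discover_domains(samples_by_column, min_match_ratio=0.75):
    """Tag a column with a domain when enough of its sampled values match."""
    tags = {}
    for column, samples in samples_by_column.items():
        values = [value for value in samples if value]
        if not values:
            continue
        for domain, pattern in DOMAIN_PATTERNS.items():
            matches = sum(1 for value in values if pattern.match(value))
            if matches / len(values) >= min_match_ratio:
                tags[column] = domain
                break
    return tags


samples = {
    "contact": ["ana@example.com", "li@example.org", "n/a",
                "sam@example.net", "kim@example.com"],
    "tax_id": ["123-45-6789", "987-65-4321"],
    "notes": ["called twice", "prefers email"],
}

sensitive = discover_domains(samples)
```

Using a match ratio rather than requiring every value to match tolerates placeholder entries such as "n/a" in otherwise sensitive columns.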
In the operate step, responsibility for metadata management falls on data engineers and operations leaders. They need to manage the constant changes to the data, associate the right business context – through metadata – to the data, and avoid duplication of data.
Two tools are essential at this point: data catalog and data governance tools.
The benefits of metadata management during this step include:
Get notified of any data changes or any additional data coming into a data warehouse or data lake through impact analysis, lineage analysis, and data duplication analysis
Easily decide if the new data needs curation, business context association, or sensitivity analysis
Ensure the governance lifecycle for all onboarded data
Change workflows as data or business context changes
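Of the analyses named in the list above, data duplication analysis is the easiest to sketch. One simple heuristic, assumed here purely for illustration, is to fingerprint each dataset's schema and group datasets whose fingerprints collide; the function names and dataset names are hypothetical, and a matching schema only marks a candidate for review, not a confirmed duplicate.

```python
import hashlib
from collections import defaultdict


def schema_fingerprint(columns):
    """Hash column names and types, ignoring column order and letter case."""
    normalized = sorted((name.lower(), dtype.lower()) for name, dtype in columns)
    return hashlib.sha256(repr(normalized).encode("utf-8")).hexdigest()


def find_duplicate_candidates(schemas):
    """Group datasets that share an identical schema fingerprint."""
    groups = defaultdict(list)
    for dataset, columns in schemas.items():
        groups[schema_fingerprint(columns)].append(dataset)
    return [sorted(names) for names in groups.values() if len(names) > 1]


schemas = {
    "lake.raw_customers": [("ID", "int"), ("Email", "string")],
    "warehouse.customers_copy": [("email", "string"), ("id", "INT")],
    "warehouse.orders": [("order_id", "int"), ("total", "decimal")],
}

candidates = find_duplicate_candidates(schemas)
```

In this sample, the two customer datasets are grouped together even though their columns differ in order and case.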
The control step is managed by the data quality engineer, the data steward, or both working together. They ensure that the data is of high enough quality to support important business and operational decisions, and that usage of sensitive data adheres to both internal privacy policies and external regulations.
Apply data quality rules at scale across all data warehouses and data lakes
Ensure data privacy regulations such as GDPR are adhered to as data makes its way to consumers
Make better decisions
Improve employee productivity
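Applying data quality rules at scale, as mentioned in the list above, usually means defining each rule once and running it against every dataset. The following is a minimal sketch under that assumption; the rule helpers, dataset names, and sample rows are hypothetical.

```python
def not_null(column):
    """Build a rule that passes when the column has a value."""
    def rule(row):
        return row.get(column) is not None
    rule.__name__ = f"not_null({column})"
    return rule


def in_range(column, low, high):
    """Build a rule that passes when the column value lies within [low, high]."""
    def rule(row):
        value = row.get(column)
        return value is not None and low <= value <= high
    rule.__name__ = f"in_range({column})"
    return rule


def run_rules(datasets, rules):
    """Return {dataset: {rule_name: failed_row_count}} for every dataset and rule."""
    report = {}
    for name, rows in datasets.items():
        report[name] = {
            rule.__name__: sum(1 for row in rows if not rule(row))
            for rule in rules
        }
    return report


datasets = {
    "warehouse.orders": [
        {"order_id": 1, "total": 20.0},
        {"order_id": None, "total": -5.0},
    ],
    "lake.orders_raw": [{"order_id": 2, "total": 15.5}],
}
rules = [not_null("order_id"), in_range("total", 0, 10_000)]

report = run_rules(datasets, rules)
```

The report counts failing rows per rule and per dataset, so a steward can see at a glance which repository needs attention.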
In the consume step, data scientists, data analysts, business users, and citizen scientists are involved. Their ongoing challenge: to find and understand the data they need to do their jobs.
Data catalog, data governance, and data marketplace solutions are critical at this point.
The benefits to organizations include:
Access a semantic search capability that helps find data in business language
Use review and rating capabilities so that consumers make informed decisions about using data
Help users apply data to specific analytics use cases
Enable users to “shop” for the required data and place requests for new data
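Finding data "in business language," the first item in the list above, depends on mapping everyday terms to the glossary terms attached to each dataset. The sketch below is a deliberately simplified stand-in: it uses synonym lookup and substring matching rather than true semantic search, and the glossary, catalog, and function names are hypothetical.

```python
# Hypothetical business glossary: canonical term -> synonyms users might type.
GLOSSARY = {
    "customer": {"client", "account holder"},
    "revenue": {"sales", "income"},
}

# Hypothetical catalog: dataset -> glossary terms attached as business metadata.
CATALOG = {
    "warehouse.dim_customer": {"customer"},
    "warehouse.fact_sales": {"revenue", "customer"},
    "lake.web_logs": set(),
}


def to_terms(query):
    """Map free-text words in the query onto canonical glossary terms."""
    text = query.lower()
    found = set()
    for term, synonyms in GLOSSARY.items():
        if any(phrase in text for phrase in {term, *synonyms}):
            found.add(term)
    return found


def search(query):
    """Rank datasets by how many of the query's glossary terms they carry."""
    terms = to_terms(query)
    scored = [(len(terms & tags), name) for name, tags in CATALOG.items()]
    ranked = sorted(scored, key=lambda item: (-item[0], item[1]))
    return [name for score, name in ranked if score > 0]


results = search("sales by client")
```

A user who types "sales by client" never mentions "revenue" or "customer," yet the glossary mapping still surfaces the datasets tagged with those terms.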
Finally, we get to measuring the performance and value of the data that is now stored in a data warehouse or data lake. Data stewards are in charge now, and they must monitor how the datasets are performing and measure their value to the business.
Solutions to use during this phase of metadata management: data governance and data catalog.
The benefits include:
Understand the quality of each data asset
Ensure the right business context is associated
Apply the right data-quality rules
Measure, monitor, and manage the value of data assets
Metadata management best practices
Here are eight best practices for running a successful metadata management operation.
Make metadata management active
As mentioned earlier, applying AI and ML to metadata management and making it active delivers significant benefits. Among other advantages, active metadata management can automatically integrate systems based on learned patterns and adjust for external changes; it can also provide the business context to the data required to increase confidence when users perform analytics to make business decisions.
Create a common metadata foundation
Enterprises should also create a common metadata foundation that can be used to deliver insight and intelligence for all data management processes. In effect, you want to be able to support all data management business use cases from one place with an integrated approach. By leveraging this shared metadata foundation, you get a window into your data across its entire lifecycle.
Gather metadata from the full breadth of data sources
Another best practice is being able to collect metadata from all types of data sources, both multi-cloud and on-premises. You want to avoid having holes in your metadata, which can happen when the metadata takes the form of code inside a database and your tools are not able to parse that code. Your metadata could even be trapped in an Excel spreadsheet silo. An inability to pull it out and make it available as part of your shared metadata foundation will only hold you back.
Make metadata management a priority when creating your data warehouse and data lake strategies
Given how critical data-driven business transformation is today, and how much it depends on metadata, make metadata management an essential part of your data lake and data warehouse planning, alongside scale, performance, and other attributes.
Include the source of metadata in data lineage
ETL software, BI tools, relational database management systems and modeling tools, enterprise applications, and custom applications all create their own data about your data. This metadata is key to understanding where your data has been and how it has been used.
Involve owners of metadata sources in verifying data lineage
The owners of the tools and applications that create metadata about your data know better than anyone else how timely, accurate, and relevant the metadata is. Leverage that knowledge.
Identify appropriate metadata solutions
Many metadata management solutions exist. Choosing the right ones is only part of the battle. The metadata management team should also promote awareness and use of the approved tools to embed the right attitude and aptitude for metadata management throughout the IT team.
Develop a metadata stewardship program
Your metadata team should include metadata stewards who take the high-level metadata management policies and implement them on the ground. Developing such a program can be the single most important indicator of a successful metadata management strategy.
Customer success stories
Global insurance and reinsurance company AXA XL, based in Hamilton, Bermuda, and Stamford, Connecticut, has more than 100 offices across six continents. It was deploying a big data infrastructure entirely in the cloud that fed a Microsoft Azure-based data lake, and wanted to standardize on a single data management platform with integrated products to manage all of its metadata. With Informatica Enterprise Data Catalog, it was able to attain more profitable growth through cross-selling and upselling of insurance policies by making data actionable and easy to find. Then, using Informatica Enterprise Data Preparation, it was able to democratize data discovery and preparation to give data scientists, analysts, and actuaries faster and deeper insight that drove faster policy introductions by the company.
Evolve Your Approach to Metadata Management
For any business serious about its data, metadata management is critical. You need to cover all types of metadata, including active metadata; use intelligent metadata management tools; and follow best practices to get the most out of your investment in big data.
Learn more about metadata and how to manage it by downloading our eBook, “Why an Intelligent Data Catalog is Essential to Digital Transformation.”