
Metadata management is critical to building a data-driven business. But before we can define metadata management, we have to define metadata.

Metadata is data about data. It provides the context you need to use your data effectively and unleash its full value. Specifically, metadata helps you discover data, understand the relationships between different pieces of data, track how data is used, and assess the benefits and risks associated with that use. As data grows at an explosive rate and becomes more distributed, metadata increasingly fuels mission-critical processes, which is why metadata management now plays a central, strategic role in driving digital transformation.

So what is metadata management? It is the portfolio of best-practice processes and technologies that allow businesses to manage this data about their data and derive insights for more effective data management. It allows users of all kinds – business, technical, and operational – to search for, understand, and securely access the data they need to do their jobs.

Why is metadata management important?

Metadata management is a foundational practice for any organization striving to make better use of its data. It ensures data is trusted and accessible as appropriate to data consumers throughout the enterprise. Most critically, it gives data consumers visibility into what data is available and answers any questions they might have about its data lineage.

Data lineage is essentially the “provenance” for data: an ongoing and continuously updated record of where data originates, how it moves through the organization, how it gets transformed, where it’s stored, who accesses it, and other key metadata. Because data lineage clarifies the availability, ownership, security, and quality of data as it flows across the organization, it is an essential part of metadata management.
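
To make the idea concrete, lineage can be modeled as a directed graph in which each edge records one movement or transformation of data from a source to a target. The Python sketch below is illustrative only: the dataset names, the edge list, and the `upstream_sources` helper are all hypothetical, and a real lineage tool would capture far more detail (owners, access records, timestamps).

```python
from collections import defaultdict

# Hypothetical lineage edges: (source dataset, target dataset, transformation).
LINEAGE_EDGES = [
    ("crm.customers", "staging.customers", "nightly extract"),
    ("staging.customers", "warehouse.dim_customer", "deduplicate and standardize"),
    ("erp.orders", "warehouse.fact_orders", "load"),
    ("warehouse.dim_customer", "reports.customer_360", "join"),
    ("warehouse.fact_orders", "reports.customer_360", "join"),
]


def upstream_sources(dataset, edges):
    """Return every dataset that feeds `dataset`, directly or indirectly."""
    # Index the edges by target so we can walk the graph backwards.
    parents = defaultdict(list)
    for source, target, _transformation in edges:
        parents[target].append(source)

    seen = set()
    stack = [dataset]
    while stack:
        current = stack.pop()
        for parent in parents[current]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


if __name__ == "__main__":
    # Which datasets does the (hypothetical) customer report ultimately depend on?
    print(sorted(upstream_sources("reports.customer_360", LINEAGE_EDGES)))
```

Walking the same graph in the opposite direction would answer the complementary question of which downstream assets are affected when a source changes.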

Reasons that metadata management – encompassing data lineage – is important:

  • Shows users what data is available

  • Ensures visibility into where the data is coming from, if it can be trusted, and how to find it

  • Enables transparency across hybrid and multicloud architectures, giving a complete view into data across on-premises and cloud data repositories

  • Leverages automation to eliminate the need for tedious, manual curation of data sets

Managing the breadth of data

Metadata management must be able to handle the breadth of data across the enterprise. There are two dimensions to breadth. The first dimension is the ability to collect metadata from sources across the enterprise, both on-premises and cloud. The other dimension of breadth is about the four different types of metadata that must be managed:

  • Technical metadata management: This covers database schemas, mappings and code, transformations, and quality checks

  • Business metadata management: This involves management of glossary terms, governance processes, and application and business context metadata

  • Operational and infrastructure metadata management: Managing run-time stats, time stamps, volume metrics, log information, and system and location metadata falls into this category

  • Usage metadata management: This covers user ratings, comments, and access-pattern metadata
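
One way to picture the four types is as four groups of attributes attached to the same dataset. The sketch below assumes a hypothetical `CatalogEntry` record; the field names and sample values are invented for illustration and are not taken from any specific catalog product.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One dataset's metadata, grouped by the four types listed above."""
    name: str
    technical: dict = field(default_factory=dict)    # schemas, mappings, transformations
    business: dict = field(default_factory=dict)     # glossary terms, business context
    operational: dict = field(default_factory=dict)  # run-time stats, time stamps, volumes
    usage: dict = field(default_factory=dict)        # ratings, comments, access patterns

    def missing_types(self):
        """List the metadata types that have not been populated yet."""
        groups = {
            "technical": self.technical,
            "business": self.business,
            "operational": self.operational,
            "usage": self.usage,
        }
        return [label for label, values in groups.items() if not values]


# Hypothetical entry with three of the four metadata types filled in.
entry = CatalogEntry(
    name="warehouse.dim_customer",
    technical={"columns": {"customer_id": "INTEGER", "email": "VARCHAR"}},
    business={"glossary_term": "Customer"},
    operational={"row_count": 120000},
)

if __name__ == "__main__":
    print(entry.missing_types())
```

Keeping the four groups on one record makes gaps in coverage easy to spot, which is the point of managing the full breadth of metadata.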

Active metadata represents a new metadata concept. The idea is that metadata is even more valuable if it is overlaid with artificial intelligence (AI) or machine learning (ML). Active metadata management doesn’t just help users analyze metadata passively, but in and of itself recommends or triggers action. For example, metadata can flag missing or incorrect data. It can then help improve the quality of analytics by automatically correcting and enriching the metadata to improve decision-making and avoid costly mistakes.
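
The shift from passive to active metadata can be sketched as a function that reads metadata and returns recommended actions. The example below deliberately uses simple hand-written rules as a stand-in for the AI or ML models described above; the `recommend_actions` function, the metadata keys, and the 5% threshold are all assumptions made for illustration.

```python
def recommend_actions(metadata, max_null_rate=0.05):
    """Turn passive metadata into recommended actions using simple rules.

    This is a rule-based stand-in, not a machine learning model.
    """
    actions = []
    if not metadata.get("owner"):
        actions.append("assign a data owner")
    if not metadata.get("description"):
        actions.append("add a business description")
    # Flag columns whose share of missing values exceeds the threshold.
    for column, null_rate in metadata.get("null_rates", {}).items():
        if null_rate > max_null_rate:
            actions.append(f"review missing values in column '{column}'")
    return actions


# Hypothetical metadata record for one dataset.
example = {
    "name": "warehouse.dim_customer",
    "owner": "",
    "description": "One row per customer",
    "null_rates": {"customer_id": 0.0, "email": 0.12},
}

if __name__ == "__main__":
    for action in recommend_actions(example):
        print(action)
```

In an active setup, the returned actions would feed a workflow or notification rather than wait for someone to inspect the metadata by hand.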

How does metadata management work?

To build a robust data foundation to support decision-making, follow five distinct steps: build, operate, control, consume, and measure. Intelligent metadata management helps with every step in that lifecycle.


Build

The build step is managed by the enterprise's data architects. Their work includes identifying the datasets to be moved and deciding whether all or only part of each dataset needs to be moved.

The metadata management solution needed for the build stage is the data catalog.

Metadata management at this stage enables enterprises to:

  • Perform domain discovery that helps identify sensitive data

  • Find similar and related data that can be moved to a data repository

  • Make information on data profiling and quality available as data is identified for onboarding
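
Domain discovery, the first capability in the list above, can be approximated by testing sample values from a column against known patterns. The following is a minimal sketch under stated assumptions: the two regular expressions, the `discover_domain` helper, and the 80% match threshold are hypothetical, and production tools typically combine many more signals than value patterns alone.

```python
import re

# Hypothetical patterns; a real deployment would use a much richer rule set.
DOMAIN_PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "us_ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
}


def discover_domain(sample_values, min_match_ratio=0.8):
    """Return the first domain whose pattern matches most sample values, else None."""
    values = [v for v in sample_values if v]  # ignore empty samples
    if not values:
        return None
    for domain, pattern in DOMAIN_PATTERNS.items():
        matches = sum(1 for v in values if pattern.match(v))
        if matches / len(values) >= min_match_ratio:
            return domain
    return None


if __name__ == "__main__":
    print(discover_domain(["ana@example.com", "li@example.org"]))
    print(discover_domain(["123-45-6789", "987-65-4321"]))
    print(discover_domain(["hello", "world"]))
```

A column tagged with a sensitive domain at this stage can then be handled appropriately before it is onboarded.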


Operate

In the operate step, responsibility for metadata management falls on data engineers and operations leaders. They need to manage the constant changes to the data, associate the right business context with the data through metadata, and avoid duplication of data.

Two tools are essential at this point: data catalog and data governance tools.

The benefits of metadata management during this step include:

  • Get notified of any data changes or any additional data coming into a data warehouse or data lake through impact analysis, lineage analysis, and data duplication analysis

  • Easily decide if the new data needs curation, business context association, or sensitivity analysis

  • Ensure the governance lifecycle for all onboarded data

  • Change workflows as data or business context changes
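
Data duplication analysis, mentioned in the first item above, can be sketched by comparing technical metadata rather than the data itself. The example below assumes that two datasets with the same column names and types are candidate duplicates; the `schema_fingerprint` and `find_possible_duplicates` helpers and the dataset names are hypothetical, and a matching schema is only a hint that still needs review.

```python
import hashlib
from collections import defaultdict


def schema_fingerprint(columns):
    """Hash a dataset's column names and types, ignoring column order and case."""
    normalized = sorted((name.lower(), dtype.lower()) for name, dtype in columns.items())
    encoded = "|".join(f"{name}:{dtype}" for name, dtype in normalized)
    return hashlib.sha256(encoded.encode("utf-8")).hexdigest()


def find_possible_duplicates(datasets):
    """Group dataset names that share an identical schema fingerprint."""
    groups = defaultdict(list)
    for name, columns in datasets.items():
        groups[schema_fingerprint(columns)].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Hypothetical technical metadata for three datasets in a data lake.
datasets = {
    "lake.customers_v1": {"ID": "int", "Email": "varchar"},
    "lake.customers_copy": {"email": "VARCHAR", "id": "INT"},
    "lake.orders": {"order_id": "int"},
}

if __name__ == "__main__":
    print(find_possible_duplicates(datasets))
```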


Control

The control step is managed by the data quality engineer, the data steward, or both working together. They ensure that the data is of sufficiently high quality to support important business and operational decisions, and that usage of sensitive data adheres to both internal privacy policies and external regulations.

The solutions that address this are data governance, data quality, and data privacy management tools.

The benefits that accrue during this step of metadata management include:

  • Apply data quality rules at scale across all data warehouses and data lakes

  • Ensure data privacy regulations such as GDPR are adhered to as data makes its way to consumers

  • Minimize risk

  • Make better decisions

  • Improve employee productivity
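
One plausible way to apply data quality rules at scale is to attach each rule to a business term once and let metadata decide which columns it applies to. The sketch below assumes that approach; the rule definitions, the `check_quality` helper, and the column-to-term mapping are hypothetical and are not a description of how any particular tool works.

```python
# Hypothetical quality rules, defined once per business glossary term.
QUALITY_RULES = {
    "Email": lambda value: isinstance(value, str) and "@" in value,
    "Age": lambda value: isinstance(value, int) and 0 <= value <= 130,
}


def check_quality(rows, column_terms):
    """Return the share of passing values per column.

    `column_terms` maps a column name to the glossary term tagged on it;
    columns whose term has no rule are skipped.
    """
    scores = {}
    for column, term in column_terms.items():
        rule = QUALITY_RULES.get(term)
        if rule is None or not rows:
            continue
        passed = sum(1 for row in rows if rule(row.get(column)))
        scores[column] = passed / len(rows)
    return scores


# Hypothetical sample rows and business metadata for one table.
rows = [
    {"email": "ana@example.com", "age": 30},
    {"email": "invalid", "age": 200},
]
column_terms = {"email": "Email", "age": "Age", "name": "Name"}

if __name__ == "__main__":
    print(check_quality(rows, column_terms))
```

Because the rules hang off glossary terms rather than individual tables, any newly onboarded column tagged with the same term picks up the same checks.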


Consume

Data scientists, data analysts, business users, and citizen scientists are involved at this step. Their ongoing challenge: to find, discover, and understand the data they need to do their jobs.

Data catalog, data governance, and data marketplace solutions are critical at this point.

The benefits to organizations include:

  • Access a semantic search capability that helps find data in business language

  • Use review and rating capabilities so that consumers make informed decisions about which data to use

  • Help users apply data to specific analytics use cases

  • Enable users to “shop” for the required data and place requests for new data
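
Searching for data in business language depends on business metadata that links everyday terms to datasets. The sketch below uses plain synonym matching against a small glossary as a simplified stand-in for true semantic search; the glossary, the catalog tags, and the `search` function are hypothetical.

```python
# Hypothetical business glossary: term -> synonyms users might type.
GLOSSARY = {
    "customer": {"client", "account holder"},
    "revenue": {"sales", "income"},
}

# Hypothetical catalog: dataset -> glossary terms tagged on it.
CATALOG = {
    "warehouse.dim_customer": {"customer"},
    "warehouse.fact_orders": {"customer", "revenue"},
    "warehouse.dim_product": {"product"},
}


def search(query):
    """Find datasets whose glossary tags match a business-language query."""
    query = query.lower()
    matched_terms = {
        term
        for term, synonyms in GLOSSARY.items()
        if term in query or any(synonym in query for synonym in synonyms)
    }
    return sorted(name for name, tags in CATALOG.items() if tags & matched_terms)


if __name__ == "__main__":
    print(search("sales by client"))
    print(search("income"))
```

The user never needs to know table names; the glossary tags carry the translation from business vocabulary to physical datasets.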


Measure

Finally, we get to measuring the performance and value of the data that is now stored in a data warehouse or data lake. Data stewards are in charge at this step: they must monitor how the datasets are performing and measure their value to the business.

Solutions to use during this phase of metadata management: data governance and data catalog.

The benefits include:

  • Understand the quality of each data asset

  • Ensure the right business context is associated

  • Apply the right data-quality rules

  • Measure, monitor, and manage the value of data assets

Metadata management best practices

Here are eight best practices for running a successful metadata management operation.

Make metadata management active

As mentioned earlier, applying AI and ML to metadata management and making it active delivers significant benefits. Among other advantages, active metadata management can automatically integrate systems based on learned patterns and adjust for external changes; it can also provide the business context to the data required to increase confidence when users perform analytics to make business decisions.

Create a common metadata foundation

Enterprises should also create a common metadata foundation that can be used to deliver insight and intelligence for all data management processes. In effect, you want to be able to support all data management business use cases from one place with an integrated approach. By leveraging this shared metadata foundation, you get a window into your data across its entire lifecycle.

Gather metadata from the full breadth of data sources

Another best practice is to collect metadata from all types of data sources, both multicloud and on-premises. You want to avoid holes in your metadata, which can arise when, for example, the metadata takes the form of code inside a database and your tools cannot parse that code. Your metadata could even be trapped in an Excel spreadsheet silo. An inability to pull it out and make it available as part of your shared metadata foundation will only hold you back.

Make metadata management a priority when creating your data warehouse and data lake strategies

Because data-driven business transformation is so critical today, and metadata is so important to it, make metadata management an essential part of your data lake and data warehouse strategy, alongside considerations such as scale and performance.

Include the source of metadata in data lineage

ETL software, BI tools, relational database management systems and modeling tools, enterprise applications, and custom applications all create their own data about your data. This metadata is key to understanding where your data has been and how it has been used.

Involve owners of metadata sources in verifying data lineage

The owners of the tools and applications that create metadata about your data know better than anyone else how timely, accurate, and relevant the metadata is. Leverage that knowledge.

Identify appropriate metadata solutions

Many metadata management solutions exist. Choosing the right ones is only part of the battle. The metadata management team should also promote awareness and use of the approved tools to embed the right attitude and aptitude for metadata management throughout the IT team.

Develop a metadata stewardship program

Your metadata team should include metadata stewards who take the high-level metadata management policies and implement them on the ground. Developing such a program can be the single most important indicator of a successful metadata management strategy.

Customer success stories

Global insurance and reinsurance company AXA XL, based in Hamilton, Bermuda, and Stamford, Connecticut, has more than 100 offices across six continents. It was deploying a big data infrastructure entirely in the cloud that fed a Microsoft Azure-based data lake, and wanted to standardize on a single data management platform with integrated products to manage all of its metadata. With Informatica Enterprise Data Catalog, it was able to attain more profitable growth through cross-selling and upselling of insurance policies by making data actionable and easy to find. Then, using Informatica Enterprise Data Preparation, it was able to democratize data discovery and preparation to give data scientists, analysts, and actuaries faster and deeper insight that drove faster policy introductions by the company.

Evolve Your Approach to Metadata Management

For any business serious about its data, metadata management is critical. To get the most out of your investment in big data, you need to cover all types of metadata, including active metadata; use intelligent metadata management tools; and follow best practices.

Learn more about metadata and how to manage it by downloading our eBook, “Why an Intelligent Data Catalog is Essential to Digital Transformation.”
