5 Key Takeaways from Back to Basics: Data Lineage

Last Published: Aug 16, 2023
Melanie Henry

Sr. Product Marketing Manager

If you’ve been searching for information about data lineage and why it matters, we hope you were able to join us for Episode 3 of our Back to Basics: Data Catalog webinar series. If you missed this episode, be sure to watch it on demand: Back to Basics Series Episode 3: Data Lineage. (Replays of our earlier episodes are available, too.)

In this session, our data catalog and lineage experts (along with a special guest) discussed the how and why of tracking data throughout its lifecycle. The group also answered several questions from the audience. Here are some of the key takeaways from the session:

1. Data Lineage Provides Essential Details About the Origin and Journey of Data

Remember the farm-to-table analogy from Episode 1? In this episode, we compared understanding your data’s lineage to the journey of a coffee bean, from a plant on the farm to the coffee in your mug.

When it comes to holistically understanding your data, you need to know information about the data, including its source and how the data was transformed or enriched. Data lineage provides this knowledge to help you answer simple, yet critical, questions related to where the data originated, where it’s going and where it’s being consumed.
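
To make the "where did this data originate?" question concrete, here is a minimal Python sketch that is not taken from the webinar. It assumes lineage is stored as a list of (source, target, transformation) edges with no cycles; every asset name and transformation label is hypothetical.

```python
from collections import defaultdict

# Hypothetical lineage edges: (source, target, transformation applied).
EDGES = [
    ("crm.customers", "staging.customers", "copy"),
    ("staging.customers", "warehouse.dim_customer", "deduplicate"),
    ("erp.orders", "warehouse.fact_orders", "aggregate by day"),
    ("warehouse.dim_customer", "report.revenue_by_customer", "join"),
    ("warehouse.fact_orders", "report.revenue_by_customer", "join"),
]


def build_upstream(edges):
    """Index the edges so each asset maps to its direct sources."""
    upstream = defaultdict(list)
    for source, target, transform in edges:
        upstream[target].append((source, transform))
    return upstream


def origins(asset, upstream):
    """Return the root sources that feed `asset` (assumes no cycles)."""
    parents = upstream.get(asset, [])
    if not parents:
        # Nothing feeds this asset, so it is an origin itself.
        return {asset}
    found = set()
    for source, _transform in parents:
        found |= origins(source, upstream)
    return found


UPSTREAM = build_upstream(EDGES)
print(sorted(origins("report.revenue_by_customer", UPSTREAM)))
# prints ['crm.customers', 'erp.orders']
```

In coffee terms, the report is the cup in your hand and the two printed assets are the farms the beans came from.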

2. Data Lineage Supports a Vast Array of Initiatives

Knowing your data’s lineage is useful for a variety of reasons. One of the primary use cases for data lineage is regulatory compliance. Capturing lineage can assist with identifying where sensitive data is located and how it’s being used, which is helpful for complying with regulations such as the General Data Protection Regulation (GDPR).

Although meeting regulatory requirements is a high-priority use case for data lineage, other common use cases include performing impact analysis and validation when organizations are migrating to the cloud or moving data to other systems.
Read more about other data lineage use cases in our eBook, AI-Powered Data Lineage: The New Business Imperative.
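
Impact analysis is essentially a walk downstream through the lineage graph. The sketch below is one illustrative way to do it in Python, assuming lineage is available as a mapping from each asset to the assets that read from it. The asset names are hypothetical, and this is not how any particular catalog implements the feature.

```python
from collections import deque

# Hypothetical lineage: each asset maps to the assets that read from it.
DOWNSTREAM = {
    "onprem.sales_db.orders": ["etl.load_orders"],
    "etl.load_orders": ["warehouse.fact_orders"],
    "warehouse.fact_orders": ["bi.sales_dashboard", "bi.finance_report"],
    "onprem.hr_db.employees": ["bi.headcount_report"],
}


def impacted_assets(changed, downstream):
    """Breadth-first walk returning every asset downstream of `changed`."""
    seen = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for child in downstream.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


print(sorted(impacted_assets("onprem.sales_db.orders", DOWNSTREAM)))
# prints ['bi.finance_report', 'bi.sales_dashboard', 'etl.load_orders', 'warehouse.fact_orders']
```

Before migrating the on-premises orders table, a team could use a list like this to see which jobs and reports need validation afterward.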

3. Data Lineage Has Different Audiences with Unique Needs

Within organizations, different stakeholders are interested in understanding their data’s lineage, but for different reasons and at different levels of detail. Generally, these stakeholders fall into one of two broad categories: business stakeholders and technical stakeholders.

Business stakeholders are interested in lineage at a high level: they typically want to know the source of data and need a summary of how systems have transformed the data. When business stakeholders are creating financial reports, for instance, they want to know that they can trust and use the data. Knowing where the data originated and which calculations underlie key metrics helps build that trust and increases data intelligence.

Technical stakeholders normally need access to more details about an organization’s data, as they are responsible for initiatives such as migrating data to the cloud, as mentioned in the previous section. Technical teams need granular details to understand the impact of implementing new systems, for example.

To serve both audiences, a data catalog ideally should provide both business and technical lineage views.
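
One way to picture the two views: the technical view keeps every column-level edge, while the business view rolls those edges up to the system level. The Python sketch below illustrates that roll-up under assumed naming (assets written as "system.table.column"). The names are hypothetical, and a real catalog may derive its business view differently.

```python
# Hypothetical column-level (technical) lineage as (source, target) pairs,
# with each asset named "system.table.column".
TECHNICAL_EDGES = [
    ("crm.accounts.id", "warehouse.dim_customer.customer_id"),
    ("crm.accounts.name", "warehouse.dim_customer.customer_name"),
    ("warehouse.dim_customer.customer_id", "warehouse.fact_revenue.customer_id"),
    ("warehouse.fact_revenue.amount", "finance_report.summary.total_revenue"),
]


def system_of(asset):
    """Return the leading system name of a 'system.table.column' asset."""
    return asset.split(".", 1)[0]


def business_view(edges):
    """Collapse column-level edges into distinct system-to-system hops."""
    summary = set()
    for source, target in edges:
        src, dst = system_of(source), system_of(target)
        if src != dst:  # hide movement inside a single system
            summary.add((src, dst))
    return sorted(summary)


print(business_view(TECHNICAL_EDGES))
# prints [('crm', 'warehouse'), ('warehouse', 'finance_report')]
```

The four technical edges shrink to two hops, which is closer to what someone preparing a financial report needs to see.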

4. Data Discovery and Lineage Are United in a Data Catalog

Organizations must know what data exists and where it’s located before obtaining its lineage. As we emphasized in Episode 2, data discovery is a significant first step for understanding your data. Fortunately, an intelligent data catalog combines these capabilities, allowing you to perform data discovery and obtain data lineage automatically.

Using a data catalog helps organizations find and understand their data, which ultimately helps to build trust in the data.

5. Broad and Deep Metadata Connectivity Is Necessary for Capturing Data Lineage

Intelligent data catalogs use scanners to locate data and metadata automatically across an organization’s data landscape. Because many organizations have complex hybrid or multi-cloud data environments, their data catalogs must be able to scan data across diverse systems, such as on-premises databases, data lakes and various cloud platforms.

In addition to scanning a broad range of data sources, these scanners must be able to perform deep scans, obtaining detailed information from sources such as ETL tools, BI tools and SQL scripts, so they can relay the data’s lineage to users.
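
As a rough illustration of what a deep scan of a SQL script involves, the Python sketch below pulls source and target tables out of a single INSERT ... SELECT statement using regular expressions. This is a deliberately simplified assumption: production scanners rely on full SQL parsers, and this pattern matching will miss subqueries, common table expressions, quoted identifiers and much more. The table names are hypothetical.

```python
import re

# Simplified patterns: "INSERT INTO <target>" and "FROM/JOIN <source>".
TARGET_RE = re.compile(r"\binsert\s+into\s+([\w.]+)", re.IGNORECASE)
SOURCE_RE = re.compile(r"\b(?:from|join)\s+([\w.]+)", re.IGNORECASE)


def lineage_from_sql(statement):
    """Return (source, target) table pairs for one INSERT ... SELECT."""
    target = TARGET_RE.search(statement)
    if target is None:
        return []  # not a statement this sketch understands
    sources = sorted(set(SOURCE_RE.findall(statement)))
    return [(source, target.group(1)) for source in sources]


SQL = """
INSERT INTO warehouse.fact_orders
SELECT o.id, c.region, o.amount
FROM staging.orders AS o
JOIN staging.customers AS c ON c.id = o.customer_id
"""

print(lineage_from_sql(SQL))
# prints [('staging.customers', 'warehouse.fact_orders'), ('staging.orders', 'warehouse.fact_orders')]
```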

Data catalogs whose scanners offer both broad and deep connectivity give you a comprehensive understanding of your data.

Register for the Back to Basics: Data Catalog Webinar Series

Be sure to join us for the rest of our Back to Basics: Data Catalog series. We’ll be covering the benefits of data asset analytics and how data catalogs facilitate data and analytics governance. You’ll also have an opportunity to have your questions answered by the experts! Register today!

First Published: Apr 12, 2022