Data Governance and Catalog Fundamental to Modern Architectural Needs

Jul 18, 2022
Rameez Ghous

Sr. Manager, Technical Marketing

The World of Data Is Complex

The world of data has been evolving for some time, but the explosion of technology and storage capabilities has accelerated changes in the volume, variety and velocity of data. According to Forbes, the amount of data created, captured, copied and consumed worldwide increased 5000% from 2010 to 2020. Keeping up with demand in this changing data landscape has been a daunting challenge, and organizations looking for actionable business insights are turning to well-managed data.

Organizations originally built data warehouses and data marts on premises to store and manage their data. Then came the advent of big data. As the number of data sources and types grew rapidly, on-premises big data gave way to cloud storage. Today, many organizations manage their data in data lakes and data lakehouses, using cloud, multi-cloud or hybrid cloud models. These large data lakes and lakehouses have helped organizations modernize their data platforms with analytics, data science and AI/ML capabilities.

Figure 1: Pictorial Representation of Data Warehouse, Data Lake and Data Lakehouse (Source: Databricks)

 

Unfortunately, this approach has created its own set of challenges, and those challenges have made newer architectural frameworks, such as data fabric and data mesh, the talk of the data community. A key point to note: data governance and catalog is fundamental to the success of both frameworks.

Data Governance and Catalog: A Key Requirement for Data Fabric

Companies that chose to augment their data analytics and AI capabilities in the cloud often created new problems. Moving these capabilities to the cloud frequently exposed data management strategies that were not well thought out, leaving the company struggling with a patchwork of data management technologies and no unifying framework.

Data fabric, a concept defined by Forrester analyst Noel Yuhanna back in the mid-2000s, is being touted as a way to address this challenge, and for good reason. Essentially, a data fabric is a metadata-driven way of connecting a disparate collection of data tools to address key pain points such as fragmented tooling and the lack of a unifying framework. A data fabric solution delivers capabilities such as data access, transformation, integration, lineage and orchestration.

Figure 2: Pictorial Representation of Data Fabric Framework

 

Some of the key building blocks of this framework are data governance, a data catalog, metadata and a data marketplace. A closer look shows that data governance, metadata and the data catalog act as the glue that holds the fabric together.
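To make the "metadata-driven" idea a little more concrete, here is a minimal Python sketch of a catalog acting as that glue. It is an illustration under stated assumptions, not the implementation of any particular product: the `DatasetMetadata` and `MetadataCatalog` classes, the platform labels and the dataset names are all hypothetical. It shows one metadata layer describing datasets that live on different platforms, answering a discovery question (which datasets are tagged `pii`?) and tracing lineage upstream.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class DatasetMetadata:
    """Hypothetical catalog entry describing one dataset, wherever it is stored."""
    name: str
    platform: str                    # e.g. "on-prem warehouse", "cloud data lake"
    owner: str
    tags: set[str] = field(default_factory=set)
    upstream: list[str] = field(default_factory=list)  # names of source datasets


class MetadataCatalog:
    """Minimal in-memory catalog: one metadata layer spanning many platforms."""

    def __init__(self) -> None:
        self._entries: dict[str, DatasetMetadata] = {}

    def register(self, meta: DatasetMetadata) -> None:
        self._entries[meta.name] = meta

    def find_by_tag(self, tag: str) -> list[str]:
        """Discovery: names of all datasets carrying a tag, sorted alphabetically."""
        return sorted(n for n, m in self._entries.items() if tag in m.tags)

    def lineage(self, name: str) -> list[str]:
        """Every upstream dataset feeding `name`, in breadth-first order."""
        order: list[str] = []
        seen: set[str] = set()
        queue = deque(self._entries[name].upstream)
        while queue:
            current = queue.popleft()
            if current in seen:
                continue
            seen.add(current)
            order.append(current)
            entry = self._entries.get(current)
            if entry is not None:          # sources outside the catalog are still listed
                queue.extend(entry.upstream)
        return order


# Hypothetical datasets spread across an on-prem warehouse, a data lake and a lakehouse.
catalog = MetadataCatalog()
catalog.register(DatasetMetadata("crm_customers", "on-prem warehouse", "sales", {"pii"}))
catalog.register(DatasetMetadata("web_events", "cloud data lake", "marketing"))
catalog.register(DatasetMetadata("customer_360", "cloud lakehouse", "analytics",
                                 {"pii"}, ["crm_customers", "web_events"]))

print(catalog.lineage("customer_360"))  # ['crm_customers', 'web_events']
print(catalog.find_by_tag("pii"))       # ['crm_customers', 'customer_360']
```

The point of the design is that the catalog never moves the data; it only records where each dataset lives, who owns it and what feeds it, which is the kind of shared metadata a fabric's access, lineage and orchestration capabilities could build on.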

Data Governance: One of the Fundamental Principles for Data Mesh

In an attempt to run their business on data, many organizations have invested in a central data lake and a central data team. But as the data lake grows with the addition of more and larger data domains, problems emerge. The centralized data team has to keep learning about new domains while scrambling to support a growing and diverse set of users. At the other end of the spectrum are organizations that have completely decentralized their data lakes, letting individual domain teams manage data for their own business needs. This approach leads to data silos, a lack of consistency and standards, and a loss of efficiency.

Data mesh is an architectural framework that takes the best of both the centralized and decentralized approaches. Its four fundamental principles (domain ownership, data as a product, a self-serve data infrastructure platform and federated governance) address data and analytics needs while also dealing with the drawbacks of both approaches. The federated governance principle ensures that there are enterprise standards for data interoperability and governance policies, which helps enterprises adhere to organizational rules and industry regulations. These standards also help domain owners gain broader context across different data products. A federated governance model also makes global regulations like GDPR (or local ones like CCPA or CPRA) easier to enforce and manage.
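As a rough illustration of federated governance, the Python sketch below assumes each domain team publishes a descriptor for its own data product, while a small set of global policies is defined once and applied to every product. Everything here is hypothetical: the `DataProduct` fields, the two example rules and the `audit` function are illustrations, not a regulatory checklist or a product API. The masking rule only gestures at the kind of control a regulation such as GDPR or CCPA might call for.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class DataProduct:
    """Hypothetical descriptor a domain team publishes for its data product."""
    name: str
    domain: str
    owner: str
    columns: dict[str, set[str]] = field(default_factory=dict)  # column -> tags
    masked_columns: set[str] = field(default_factory=set)


# A global policy inspects one product and returns violations (empty list = compliant).
Policy = Callable[[DataProduct], list[str]]


def has_owner(product: DataProduct) -> list[str]:
    """Illustrative rule: every data product must name an accountable owner."""
    return [] if product.owner else [f"{product.name}: no owner assigned"]


def pii_is_masked(product: DataProduct) -> list[str]:
    """Illustrative rule: any column tagged 'pii' must be masked."""
    return [
        f"{product.name}.{col}: PII column is not masked"
        for col, tags in product.columns.items()
        if "pii" in tags and col not in product.masked_columns
    ]


# Defined once, centrally; applied uniformly to every domain's products.
GLOBAL_POLICIES: list[Policy] = [has_owner, pii_is_masked]


def audit(products: list[DataProduct], policies: list[Policy]) -> dict[str, list[str]]:
    """Run every global policy against every product and group findings by domain."""
    report: dict[str, list[str]] = {}
    for product in products:
        findings = [v for policy in policies for v in policy(product)]
        report.setdefault(product.domain, []).extend(findings)
    return report


products = [
    DataProduct("orders", "sales", "sales-team",
                {"order_id": set(), "email": {"pii"}}, masked_columns={"email"}),
    DataProduct("leads", "marketing", "",
                {"lead_id": set(), "phone": {"pii"}}),
]

print(audit(products, GLOBAL_POLICIES))
# {'sales': [], 'marketing': ['leads: no owner assigned', 'leads.phone: PII column is not masked']}
```

Keeping the policies as plain functions separates the two roles the principle describes: a central group maintains one shared rule set, while each domain keeps ownership of its own products and fixes its own findings.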
 

Figure 3: Pictorial Representation of Data Mesh Framework

 

A Data-Driven Blueprint for Success

As you can see, data governance and catalog plays a crucial role in some of the most talked-about architectural concepts today. Gone are the days when data governance and catalog were nice to have (and often an afterthought). Today, you measure the success of your data management by how well you can enforce its governance and how well your organization understands its data and that data's context. A well-thought-out architecture that emphasizes data governance and catalog can therefore give your organization a strong blueprint for success.

Learn More

There are several places you can start:

  • Read more about our Intelligent Data Management Cloud, the industry’s only cloud-native, API-driven and microservices-based comprehensive data management platform
  • Watch this video to discover whether data mesh or data fabric is the right fit for your enterprise
  • Read about how data governance is an enabler for data mesh and data fabric