Enabling Data Governance in a Big Data World

Last Published: Dec 23, 2021
Rob Karel

As new information management technology trends continue to emerge, I try to evangelize a truism: the core data governance framework, processes and best practices are agnostic of – and should support – any technology. For example, I discourage terms like “Master Data Governance,” “Cloud Data Governance,” “Mobile Data Governance” and “Big Data Governance” because they imply there are distinct disciplines separate from good old Data Governance and Data Stewardship. That’s misleading, and risky if your organization actually chooses to have separate siloed teams governing your data. The whole point of data governance is to bridge and eliminate silos, right? So don’t create new ones.

That said, I won’t minimize the fact that existing data governance programs face new challenges when it comes to supporting emerging big data use cases. Data stewards in particular are the front line of your data governance effort. Stewards are responsible for operationalizing and ensuring compliance with data governance policies and standards, and are on the hook to bridge old-world data standards with the unique requirements emerging in the world of big data or, as we like to call it, big data engineering.

4 Stages of Data Governance Process

So which core data governance processes will be most critical to bridging traditional data management needs with the data volume, structure and latency complexities inherent in next-generation big data use cases? Let’s start with the Process Stages of Data Governance that I’ve evangelized for many years:

[Figure: Data Governance process stages]

To make sure your data stewards are big data ready, give them the right skills and tools to:

  1. Profile and discover across vast islands of data. In a big data landscape, targeted column- and table-based data profiling is insufficient. Stewards need help identifying where relevant data may exist and evaluating that data’s usefulness across vast volumes. Automated and guided data and metadata discovery will be necessary to support this effort. In other words: it’s easier to find a needle in a haystack with a magnet than with a magnifying glass.
  2. Update your definitions and standards with new business context. Your existing business terminology, taxonomies and relationships, as well as the policies, rules and standards for capturing, managing and consuming critical information, must be passed through the business lens of a big data use case. As a simple example, your definition of a consumer customer in the world of transactional systems may have been limited to individuals who purchased your product in the last year. But in a big data social analytics initiative, marketing analysts may want to expand that definition to include those who purchased at any time and have actively mentioned your product via social networks in the past year (aka influencers).
  3. Operationalize new data quality and security standards in new environments. Data management professionals must now apply any existing or refreshed data quality and security standards to the new types of data and sources leveraged in big data environments. At the same time, they must implement the necessary workflows for stewards and business consumers to effectively manage exceptions and escalate concerns regarding this data.
  4. Measure your progress and success. Proactively monitoring compliance with data quality and security rules and policies is hard. Measuring the value of data governance efforts is very hard. Demonstrating ROI from big data investments in these early days of the journey is extremely hard. But focusing on the critical few metrics that can illustrate the effectiveness of your data governance and big data efforts will mean the difference between recalling that interesting big data experiment on your resume and running a fully funded strategic big data initiative.
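
To make the customer-definition example in step 2 concrete, here is a minimal Python sketch of the two definitions side by side. Everything in it is an assumption made for illustration: the Person record, its field names, the 365-day reading of "last year," and the reading of "expand" as "the original definition plus influencers." It is not taken from any particular product or data model.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Person:
    # Hypothetical record layout, for illustration only.
    name: str
    last_purchase: Optional[date]        # None if the person never purchased
    last_social_mention: Optional[date]  # None if no product mention was found


def within_last_year(d: Optional[date], today: date) -> bool:
    """True if d falls within the 365 days up to and including today."""
    return d is not None and timedelta(0) <= today - d <= timedelta(days=365)


def is_customer_transactional(p: Person, today: date) -> bool:
    """Traditional definition: purchased the product in the last year."""
    return within_last_year(p.last_purchase, today)


def is_customer_expanded(p: Person, today: date) -> bool:
    """Expanded definition (assumed reading): the traditional definition,
    plus anyone who purchased at any time and mentioned the product on
    social networks in the last year."""
    influencer = (
        p.last_purchase is not None
        and within_last_year(p.last_social_mention, today)
    )
    return is_customer_transactional(p, today) or influencer


# Made-up sample data to compare the two definitions.
today = date(2015, 11, 11)
people = [
    Person("alice", date(2015, 6, 1), None),
    Person("bob", date(2012, 3, 15), date(2015, 10, 1)),
    Person("carol", None, date(2015, 10, 20)),
    Person("dave", date(2012, 3, 15), None),
]

transactional = [p.name for p in people if is_customer_transactional(p, today)]
expanded = [p.name for p in people if is_customer_expanded(p, today)]
```

Keeping both definitions as named, versioned rules rather than overwriting one with the other lets stewards show exactly which records a change in business context adds, which is the kind of evidence the metrics in step 4 depend on.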

Ensure your data governance program, and all the stakeholders dependent on its success, are provided with the appropriate systems, tools and processes. By providing the enabling infrastructure to support Big Data Management requirements, you can ensure your big data engineering initiatives deliver not only information but, because they are effectively governed, high-quality, trusted and valuable actionable insights.

First Published: Nov 11, 2015