I grew up near Johnstown, Pennsylvania, a city that’s known for one very unfortunate incident—the Johnstown Flood of 1889. And it carries several lessons that are just as important when developing data governance for analytics and overseeing data lakes as they are for constructing an actual water-filled lake.
As a child, I learned the full story of the Johnstown Flood: how local wealthy residents wanted a lake to enjoy near their equivalent of a country club, so they hastily modified a dam, without proper planning, and created a lake suitable for boating, fishing, and swimming.
Sadly, after a particularly hard spring storm, the dam broke, with devastating consequences for those living downriver. If there had been procedures put in place to check on the strength of the dam from time to time, a community agreement to support proper planning and building policies from the start, and a joint effort to create a lake that worked for everyone, the whole situation could have been prevented.
It’s no surprise that I’m using this narrative from my youth to illustrate that a data lake can also fail if it isn’t built on the same logic: collaborate on policies, create a data-focused culture that’s supported at all levels, and govern the right amount of data to develop trust in the results. Data governance and analytics are inextricably linked: each makes the other better. Good data governance for analytics means scientists, analysts, and line-of-business owners can rely upon the results.
One of the most crucial ways that data governance benefits analytics efforts is the creation of a data-driven culture. From the top down, organizations need a community that embraces the decision to be data-driven. It all comes down to trust. Can you trust the data? If you can’t, then can you trust each other to fix it? From policy creation to deciding who data owners and stakeholders are, enterprise data governance programs depend on collaboration: all teams working together to agree upon policies and procedures that meet the needs of each team, and not just one department. To learn how to set up a data-driven culture, check out our CIO’s guide to developing a data-driven culture.
Committing to a data governance program—no matter what its scale—means your teams must work together to incorporate governance into every process. The good news is, analytics-enabled data governance can help by using machine learning and other intelligent technologies to automate some data quality and monitoring tasks. As these systems “learn” which data governance rules and triggers to watch for, they can reduce the management tasks for data stewards and strengthen your program.
If you want your organization to succeed with analytics, give your teams data that they can trust. This is especially true within data lakes, where a commitment to governance becomes even more important.
As we discuss in the eBook, “Five Keys to Optimize Your Data Lake with Data Governance,” data governance policies must adapt and evolve as your data lake volume rises and more streams feed into the lake. Your policies also need to be more flexible, as not all data in the data lake requires the same amount of governance. Some data, such as customer transaction data, needs to be tightly governed. Other times, the data governance policies for your data lake can be less rigid, because data scientists are looking for directionally correct patterns in the data rather than precise measurements. Perhaps they are trying to gauge general brand sentiment or predict future product preferences. In these scenarios, organizations still need to trust their data, but they may not need the same level of control as they would if they were looking to comply with CCPA, GDPR, or other privacy-centric regulations.
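To make the idea of "the right amount of governance" concrete, here is a minimal Python sketch of assigning datasets to governance tiers. It is only an illustration: the tier names, the `Dataset` fields, and the `assign_tier` rule are hypothetical and not taken from the eBook or from any particular governance product.

```python
from dataclasses import dataclass
from enum import Enum


class GovernanceTier(Enum):
    STRICT = "strict"    # regulated or transactional data
    RELAXED = "relaxed"  # exploratory, directionally correct analysis


@dataclass(frozen=True)
class Dataset:
    name: str
    contains_personal_data: bool  # hypothetical flag: in scope for CCPA, GDPR, etc.
    is_transactional: bool        # hypothetical flag: e.g. customer transactions


def assign_tier(dataset: Dataset) -> GovernanceTier:
    """Pick a governance tier from two illustrative flags."""
    if dataset.contains_personal_data or dataset.is_transactional:
        return GovernanceTier.STRICT
    return GovernanceTier.RELAXED


datasets = [
    Dataset("customer_transactions", contains_personal_data=True, is_transactional=True),
    Dataset("brand_sentiment_posts", contains_personal_data=False, is_transactional=False),
]

# Map each dataset name to the tier the rule above selects.
tiers = {d.name: assign_tier(d) for d in datasets}
```

In practice the decision would come out of the collaborative policy process described earlier, with data owners and stakeholders agreeing on the criteria; the two flags here simply stand in for that agreement.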
It’s our responsibility in data governance to make those decisions. Simply put, in the world of data governance and analytics, enterprise data governance leaders are the lifeguards of the data lake, ensuring that everyone can enjoy their swimming while trusting that the data environment is safe.
Finally, with so much data coming into the data lake, organizations need to streamline the most common data governance tasks that impact analytics efforts. Today, data quality checks can be automated so that organizations can take full advantage of analytics-enabled data governance. AI and machine learning allow us to teach data governance systems to assist in monitoring the health of our data so that more team members will trust the data, rely upon it, and build even more (and better) analytics projects with it.
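As a rough illustration of what an automated data quality check can look like, the Python sketch below computes the share of missing values in a field and flags an unusual daily row count. It uses a simple statistical rule rather than machine learning, and the function names, sample records, and threshold of three standard deviations are all assumptions made for the example.

```python
from statistics import mean, stdev


def null_rate(records, field):
    """Share of records where `field` is missing or None."""
    if not records:
        return 0.0
    missing = sum(1 for r in records if r.get(field) is None)
    return missing / len(records)


def is_volume_anomaly(history, today, threshold=3.0):
    """Flag today's row count if it lies more than `threshold`
    sample standard deviations from the historical mean."""
    if len(history) < 2:
        return False  # not enough history to judge
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return today != mu
    return abs(today - mu) / sigma > threshold


# Hypothetical sample data for the example.
records = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 2, "amount": None},
    {"customer_id": None, "amount": 5.0},
    {"customer_id": 4, "amount": 12.5},
]
daily_row_counts = [1000, 1020, 980, 1010, 990]

amount_null_rate = null_rate(records, "amount")
volume_alert = is_volume_anomaly(daily_row_counts, today=400)
```

A check like this could run each time new data lands in the lake, so that data stewards review only the datasets that trip a rule instead of inspecting everything by hand.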
While the consequences of not governing your data lake pale in comparison to the repercussions of not adhering to safety policies for an actual lake, they can still hold you back from enjoying the rewards of a successful analytics initiative. By helping to build a data-driven culture that focuses on collaborating instead of controlling, and by choosing the right level of governance for your data lake, you can help your analytics projects succeed with trusted, governed data at their core. This level of trust will help to ensure that your analysis can be relied upon throughout your organization to make major business decisions and fuel strategic initiatives.
For more information, check out: Five Keys to Optimize Your Data Lake with Data Governance.