Modernizing Your Data Warehouse? Prioritize Data Governance Early On
At Informatica, we’re like many of our customers in that we’ve been going through our own cloud modernization journey and moving our data infrastructure to the cloud. A key aspect of this journey is modernizing from an on-premises data warehouse to a cloud data warehouse. Let me just save you some time right now. You will be much better off if you take a strategic approach to data management, and in particular prioritize data governance early on in your data warehouse initiatives. In our cloud data warehouse use case, we thought we had the luxury of thinking about data governance later. We were wrong.
Why Data Warehousing Needs Strategic Data Management
Let’s start by making the case for having a well-thought-out architecture for your data management platform. Architecture is just as important here as it is everywhere else in the CIO’s business. As a CIO, you would never manage your application portfolio or your enterprise security on an ad hoc project-by-project basis and hope that the sum of the parts would somehow meet the needs of the whole. Your data management should be no different.
But when it comes to data warehousing, many of these projects don’t arise out of a long-term strategic plan with governance baked in. From our internal experience and from talking to customers, it seems that most data warehouse projects start organically within one function. It may start with a copy of application tables in an on-prem database, with a visualization solution like Tableau on top. It might start in sales operations with the objective of getting better reporting than native Salesforce allows. Or it might start in the finance organization with an ERP-focused use case.
Wherever the project starts, the analytics community within that function understands their data at a very detailed level. They know how the data is created via the processes, and they know how the application works. They’re therefore able to very quickly create large numbers of dashboards and reports that people fall in love with and start to depend on. And then one day they decide to join Salesforce data with ERP data, or Salesforce data with data from Marketo. They want to share it across functions, and then all hell breaks loose. Data governance becomes an issue as soon as data created by one function is used by another. Why, you ask? Read on…
Prioritize Governance Early On
Unless you know to prioritize data governance early on in your cloud data warehouse project, you might not see this coming until it hits you. In our experience, here are some signs that governance will become an issue sooner rather than later.
You don’t have agreed-upon definitions of terms or calculation methods. Let’s say you’ve got two different groups trying to independently calculate the value of net new bookings (NNB) or a renewal rate – two very common KPIs in a subscription type of company like ours. If you’re trying to calculate NNB, you have to know whether you include the upsell amount on a renewal as part of that NNB number. You need to figure out if you take the pre-carved amount or the post-carved amount after the people in finance have done their processing. And you could have two equally talented analytics folks in two different organizations with a different perspective come up with completely different answers. Then you have the unproductive debate that we’re all familiar with about whose report is right.
You can’t find all the data you need. As a dataset grows and the number of users grows with it, it becomes more and more difficult to find all the data that you need. It might be there. But if you don’t know how to access it and how to interpret it, it’s completely useless. So you need some data discovery and cataloging capability.
You’re relying on specialized knowledge that does not scale. Often, when a data warehouse project starts organically there’s no business data model or semantic layer. The result is that everyone needs to understand the data to the same level of detail. Maybe everyone needs to understand how the tables are structured in the application and how the data is created so they can create the joins and select statements in the visualization layer when they’re building the first reports. That just does not scale.
For these reasons, we bumped into the governance problem much faster than we thought we would. We thought governance was something we could come along and do at the end, but if you don’t do it toward the beginning, it gets in your way.
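To make the first of those signs concrete, here is a minimal sketch of how two analysts can start from the same deals and report different NNB numbers. Everything in it is a hypothetical illustration, not our actual data model or calculation: the `Deal` fields, the two function names, and the sample amounts are all assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deal:
    """Hypothetical deal record; field names are illustrative assumptions."""
    kind: str             # "new" or "renewal"
    new_amount: float     # brand-new business booked on the deal
    upsell_amount: float  # expansion sold as part of a renewal


def nnb_excluding_upsell(deals):
    """Analyst A's definition: only brand-new deals count toward NNB."""
    return sum(d.new_amount for d in deals if d.kind == "new")


def nnb_including_upsell(deals):
    """Analyst B's definition: upsell on a renewal also counts toward NNB."""
    return sum(d.new_amount + d.upsell_amount for d in deals)


# Made-up sample data, used only to show the two definitions diverging.
deals = [
    Deal(kind="new", new_amount=100_000.0, upsell_amount=0.0),
    Deal(kind="renewal", new_amount=0.0, upsell_amount=25_000.0),
    Deal(kind="new", new_amount=50_000.0, upsell_amount=0.0),
]

if __name__ == "__main__":
    print("NNB excluding upsell:", nnb_excluding_upsell(deals))
    print("NNB including upsell:", nnb_including_upsell(deals))
```

Neither function is wrong. They simply encode different definitions, which is why the definition has to be agreed on and governed before the reports are built rather than argued about afterward.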
Manage Data as a Strategic Asset
Prioritizing data governance for your cloud data warehouse really means thinking of your data as a strategic asset. And if data really is the strategic asset that we all say it is, then we need to manage it that way. CIO assets of the past were infrastructure assets. Most CIOs I talk to can tell me how many virtual machines they run, how many laptops they support, or how many petabytes of storage they manage. And in 2021 who cares? These assets don’t underpin your digital transformation. Your data does.
If data really is a strategic asset, we should be thinking about it the way a CFO thinks about money. If you were to bump into a CFO and ask whether they know where all the money is in their company, hopefully they’re going to say yes. If you ask whether they know who has access to it, you’re going to get the same answer. If you ask whether they have processes to control how money moves around and how it’s transacted in the company, they’re going to know that, too. How about having a governance process to ensure that the money is being used for the most productive purpose? Most CFOs can answer yes to all of those questions.
Ask a CIO any of those questions, swapping the word money for data, and you may get an uncomfortable look. But if data is the asset that digital transformation depends on, we had better figure this out quickly. Because our competition is.
CIOs Must Think Long Term
One thing I’ve noticed as we progressed through our cloud data warehouse project is that a CIO, CDO, or head of architecture naturally has a different perspective from the folks on the ground who are just trying to execute the next project. You really need to think long term.
For instance, it may make sense to lay down the data management platform ahead of when you need it. Just as with your application portfolio, the data management platform will work more effectively and it will be faster to innovate on if it’s built deliberately and strategically with an architecture view in mind. But thinking like this and anticipating the future needs for future projects can create tension between your short-term goals and your long-term strategy.
Your unique perspective as the CIO or architect lead may include thinking about the marginal cost of the next project. And the right action may be to build the solution in such a way that it satisfies that first project with the next project in mind. The project teams working on that first project will never have that perspective. I’ve learned to watch out for these points of tension and take them as my cue to step in and make a longer-range decision.