Build Your Healthcare Cloud Data Lake With a Data Governance Foundation

Last Published: Aug 05, 2021
Richard Cramer

Chief Strategist, Healthcare and Life Sciences

We’ve all heard the cautionary tale across healthcare (and beyond) that poorly deployed cloud data lakes run the risk of becoming data swamps, where untrusted volumes of data deliver limited-to-no value or insights to the organization. This is not a myth. While most healthcare organizations, providers, and payers have embraced the need to deliver real-time, self-service clinical, patient experience, and operational insights to their front-line caregivers, they haven’t always been successful. We frequently hear from healthcare organizations large and small that have managed to ingest petabytes of data into their lakes, only to struggle to achieve business adoption because trusted, relevant data is hard to find, difficult to understand, and of questionable lineage.

Keep Data Swamps at Bay with Data Governance

The way to prevent a data swamp is data governance, which delivers a “governed data lake.” CTI is one of our consulting partners in the healthcare market. One of their areas of focus is advanced analytics and how to consistently and repeatedly derive business and clinical insights from data. As we all know, reliable and trustworthy data is a prerequisite to analytics. CTI has developed a comprehensive perspective on how to govern a data lake, helping healthcare organizations deliver the value required.


[Figure: How to govern a data lake (CTI)]


An integral part of CTI’s approach recognizes how data should progress through four different environments, or “regions” as they refer to them, as it becomes progressively more governed, more trustworthy, and ultimately certified for use. The idea of regions is not unique, but CTI has taken it further: the organization clearly defines each region, its purpose and, most important, the governance criteria and milestones required to move from one region to the next.
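To make the idea of regions and promotion criteria concrete, here is a minimal sketch of how a gate between regions could be modeled. It is only an illustration: the article says there are four regions but does not name them here, so the region names, the criteria, and the helper functions below are all hypothetical and are not CTI's definitions.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Region(IntEnum):
    # Hypothetical names; the article only states that there are four regions.
    PERSONAL = 1
    SHARED = 2
    CURATED = 3
    CERTIFIED = 4


# Hypothetical governance milestones a dataset must have reached
# before it may ENTER each region.
ENTRY_CRITERIA = {
    Region.SHARED: {"owner_assigned", "cataloged"},
    Region.CURATED: {"lineage_documented", "quality_checked"},
    Region.CERTIFIED: {"steward_approved"},
}


@dataclass
class Dataset:
    name: str
    region: Region = Region.PERSONAL
    milestones: set = field(default_factory=set)


def missing_criteria(dataset):
    """Return the milestones still unmet for entering the next region."""
    if dataset.region == Region.CERTIFIED:
        return set()  # already in the final region
    next_region = Region(dataset.region + 1)
    return ENTRY_CRITERIA[next_region] - dataset.milestones


def promote(dataset):
    """Move the dataset forward one region, or raise if the gate is not met."""
    if dataset.region == Region.CERTIFIED:
        raise ValueError(f"{dataset.name} is already certified")
    missing = missing_criteria(dataset)
    if missing:
        raise ValueError(f"{dataset.name} is missing: {sorted(missing)}")
    dataset.region = Region(dataset.region + 1)
    return dataset.region
```

In this sketch a dataset starts in the personal region with no milestones, and `promote` refuses to move it forward until every milestone listed for the next region has been recorded. A real implementation would draw its criteria from the organization's governance policy rather than a hard-coded dictionary.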

When we think in terms of a successful healthcare data lake implementation, or even the more recent concept of a data lakehouse—an analytics approach that combines the benefits of both a cloud data lake and a cloud data warehouse—adoption is key. We want a wide variety of data consumers with all manner of workloads to use the data lake instead of using local desktop or server-based tools that are not enterprise-class. By using the data lake, these users will gain the benefit of a modern cloud data platform providing data cataloging, data lineage, data governance, data quality, and other industrial-strength tools and capabilities of a proper data platform.

Analysts and data scientists must have access to enterprise-class tools. Their data should be managed so that it can begin the journey from an ungoverned private workspace onward to data or analysis that is curated, certified, and governed. Only then can data be used as an enterprise asset that delivers the desired health and business outcomes.

Maintain Ungoverned Personal Workspaces

This idea of a personal, largely ungoverned workspace for a user is key to the adoption of the data lake platform. There’s a low barrier to entry, great value from leveraging enterprise-class tools, and no need to incur the “burden” of data governance until the data is ready to be shared with others in the enterprise. The important data governance aspect of this personal workspace is to ensure that users understand the data is not yet “fit for use.” It’s their sandbox, a place to explore what’s possible and what additional information and insights they may require.

And so begins the governed data lake journey and data’s progression through the regions…

To learn more about CTI’s comprehensive approach to the governed data lake, listen to the full story on Informatica’s DataTALKS on-demand webinar. There’s also a link to a detailed white paper from CTI following the on-demand presentation.

First Published: Aug 02, 2020