We’ve seen a tremendous amount of innovation in data and analytics over the last 20 years: from Data 1.0, where data was limited to bespoke applications, to Data 2.0, where data was unlocked and brought in for enterprise reporting, data warehouses, and data marts, to Data 3.0, where organizations started innovating with broad-based data-driven transformations.
But today we have Data 4.0. Data 4.0 is cloud-native, cloud-first, and driven by intelligent, automated data management at enterprise scale. The focus is on delivering trusted data for trusted insights to a broad range of users. In fact, at its heart, Data 4.0 is about the true democratization of data.
This golden era of data democratization requires trust. That’s accepted now. A recent study found that 81% of executives rate data as very critical to their businesses' outcomes, and 76% of CFOs agree that having a single version of the truth is essential.
Customers agree. More than two-thirds (69%) of global consumers will boycott a company that does not take data protection seriously. Even more (84%) are more loyal to brands with strong data practices. Not to mention the billions of dollars in fines levied each year on organizations that failed to practice effective data privacy: $6 billion to date and growing. So it’s critical to have trustworthy data.
In this blog, we’ll explain how the worthy goal of trusted data democratization is achievable only with strong, scalable data governance and privacy. In fact, governed data democratization is what every enterprise needs to strive for today. And the only way to get there is through AI- and ML-powered intelligent automated data governance and privacy.
It’s important to note that when I talk about giving data consumers access to data, I’m not just talking about technical users, but also nontechnical users within lines of business who need easy access to enterprise data they’d previously not known about or been denied.
The number of these users is growing. The population of data consumers within enterprises has grown by 500% over the last few years.
The good news is that the data all these users need to do their jobs exists. However, to get full value from it, it needs to be cataloged, connected, and put in context, both within each of your lines of business and enterprisewide. That means connecting finance data to sales data, and marketing data to IT data, and to R&D and legal and back again.
By connecting all of the data from all these lines of business, you can build a data knowledge graph that helps you understand all the various domains of data throughout your organization. This in turn allows you to learn your data language across customers, partners, orders, inventory, employees, suppliers, products, and so on. You can then forge a new self-service model that drives true data democratization across your entire population of users.
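To make the idea of connecting data across lines of business concrete, here is a minimal sketch in Python. It is an illustration only, not how any particular product builds its knowledge graph: the catalog contents, dataset names, and the rule "two datasets are related if they share a column name" are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical catalog: dataset name -> (line of business, column names).
CATALOG = {
    "finance.invoices": ("finance", {"customer_id", "order_id", "amount"}),
    "sales.orders": ("sales", {"order_id", "customer_id", "product_id"}),
    "marketing.campaigns": ("marketing", {"campaign_id", "customer_id"}),
    "supply.inventory": ("supply", {"product_id", "warehouse_id"}),
}


def build_graph(catalog):
    """Link every pair of datasets that share at least one column name."""
    graph = defaultdict(dict)
    names = sorted(catalog)
    for i, left in enumerate(names):
        for right in names[i + 1:]:
            shared = catalog[left][1] & catalog[right][1]
            if shared:
                # Store the shared columns on both directions of the edge.
                graph[left][right] = shared
                graph[right][left] = shared
    return graph


def related(graph, dataset):
    """Return the datasets directly connected to `dataset`, sorted by name."""
    return sorted(graph.get(dataset, {}))


graph = build_graph(CATALOG)
print(related(graph, "finance.invoices"))
```

Even in this toy version, a finance user can see that invoices connect to sales orders and marketing campaigns through shared customer and order identifiers. A real knowledge graph would rely on much richer signals than matching column names.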
This is good for the business. In fact, 80% of organizations that are innovation leaders possess strong data-sharing practices both internally and beyond company borders.
But having said all this, we come full circle: how do your users know that the data they can now access is trustworthy? And how can data stewards be sure that the requested use of the data would comply with external regulations and internal policies? Because the fact is, the practice of data governance is still maturing. Eight out of 10 enterprises feel that strong data governance is critical for successful business outcomes, yet 42% of them don’t assess, measure, or monitor governance in any meaningful way. And only 20% are expected to succeed in scaling their governance programs to support digital business practices.
Clearly, a lot of work still needs to be done on the data governance front.
We define governed data democratization as a holistic approach to managing data that spans the governance teams and all data stakeholders, as well as the policies and rules they create, and the metrics they measure success by.
Governed data democratization allows you to deeply understand your data estate and to link all the policies and controls that apply to it. You do this enterprisewide, across all lines of business. In short, governed data democratization drives a trusted, 360-degree view of the business.
Governed data democratization is how you put into place the necessary privacy policies to ensure that you maintain consumer trust and at the same time make sure that your organization is strictly in compliance with both external regulatory mandates and internal privacy policies.
And it's on this foundation of governed data democratization that you deliver the right data to the right consumers with the right quality and the right degree of trust.
To achieve governed data democratization, you need intelligent automation. It’s the only way to practically deliver on the vision and scale operationally.
At Informatica, intelligent data governance is delivered through the CLAIRE engine, the AI- and ML-powered metadata intelligence that powers our Intelligent Data Platform.
So what is CLAIRE?
It starts with the ability to scan metadata from all different sources. You name it, we scan it: code or scripts, big data platforms, ETL tools and integration platforms (like DataStage, Oracle Data Integrator, or Talend DI), or business intelligence platforms like Tableau, Power BI, Cognos, and QlikView. CLAIRE spans all file formats, from unstructured to complex hierarchical, to relational data sources. And it’s inclusive of on-premises databases, cloud platforms, and many, many others.
Using this metadata, we train the most comprehensive set of AI and ML data governance algorithms in the industry. These algorithms automate the translation of business rules into data quality rules, identify column similarity across thousands of tables, intelligently tag and influence search rankings, perform mass data corrections, do mass data-quality assessments, and even infer data domains. The list goes on and on.
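As a rough illustration of what "column similarity" can mean, the sketch below compares columns from different tables by the overlap of their values, using Jaccard similarity (shared distinct values divided by all distinct values). This is a deliberately simple stand-in chosen for the example, not a description of CLAIRE’s algorithms. The table names, sample values, and the 0.5 threshold are assumptions.

```python
def jaccard(a, b):
    """Jaccard similarity of two value collections: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similar_columns(tables, threshold=0.5):
    """Return (column, column, score) for cross-table pairs at or above threshold."""
    columns = [
        (table, column, values)
        for table, cols in tables.items()
        for column, values in cols.items()
    ]
    matches = []
    for i, (t1, c1, v1) in enumerate(columns):
        for t2, c2, v2 in columns[i + 1:]:
            if t1 == t2:
                continue  # only compare columns from different tables
            score = jaccard(v1, v2)
            if score >= threshold:
                matches.append((f"{t1}.{c1}", f"{t2}.{c2}", round(score, 2)))
    # Highest-scoring pairs first.
    return sorted(matches, key=lambda m: -m[2])


# Hypothetical sample data: two tables that name the same field differently.
TABLES = {
    "crm_contacts": {
        "email": ["a@x.com", "b@x.com", "c@x.com"],
        "country": ["US", "DE", "FR"],
    },
    "billing_accounts": {
        "contact_mail": ["a@x.com", "b@x.com", "d@x.com"],
        "region": ["EMEA", "APAC"],
    },
}

for left, right, score in similar_columns(TABLES):
    print(left, "~", right, score)
```

Comparing every pair of columns this way does not scale to thousands of tables, which is one reason the problem calls for trained models and smarter indexing rather than brute force.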
And it's on the foundation of CLAIRE that we are able to deliver the knowledge graphs for metadata I mentioned earlier – the knowledge graphs that give you a better understanding of suppliers, customers, partners, and locations, as well as of the data language used across the various data domains.
This allows you to automatically relate business and technical domains, use Natural Language Processing (NLP) to infer and measure data quality from business rules, and associate data governance policies to dynamically monitor risk of sensitive data.
You can automatically discover data domains, identify them, and auto-tag them, as well as shine a light on black boxes and discover hand coding, lineage, and how data is moving through each stage of your data pipelines.
We do all this transparently in a business-relevant context to empower the lines of business to understand the quality and context of their data. And it's on this foundation of what we can do for data quality that we also ensure data privacy and protection.
That’s because, as we're scanning for quality and identifying data domains, we're also discovering personally identifiable information (PII) as well as other kinds of sensitive data. We highlight how that sensitive data is moving through the organization to empower you to understand how data is consumed, and how to protect it when and where it needs to be protected.
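A simple way to picture sensitive-data discovery is a scanner that samples the values in each column and flags the column when most values match a known pattern. The Python sketch below does only that, with two regular expressions. It is an assumption-laden toy, not the product’s detection logic: the patterns, the 80% threshold, and the sample table are all made up for illustration, and real PII detection needs far more than regexes.

```python
import re

# Hypothetical patterns; real detectors cover many more data domains.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\d{3}-\d{2}-\d{4}"),
}


def classify_column(values, min_share=0.8):
    """Return the first label whose pattern fully matches at least
    `min_share` of the non-empty values, or None if nothing qualifies."""
    values = [str(v) for v in values if v]
    if not values:
        return None
    for label, pattern in PATTERNS.items():
        hits = sum(1 for v in values if pattern.fullmatch(v))
        if hits / len(values) >= min_share:
            return label
    return None


def scan_table(table):
    """Map each flagged column name to the sensitive-data label found."""
    flagged = {}
    for column, values in table.items():
        label = classify_column(values)
        if label is not None:
            flagged[column] = label
    return flagged


# Hypothetical sample table with uninformative column names.
SAMPLE = {
    "contact": ["ann@example.com", "bob@example.org"],
    "tax_id": ["123-45-6789", "987-65-4321"],
    "city": ["Oslo", "Lima"],
}

print(scan_table(SAMPLE))
```

Note that the scanner looks at values rather than column names, so it can flag sensitive data even when a column is labeled something as vague as "contact".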
Once you’ve used CLAIRE to achieve intelligent data governance, you can go even further. CLAIRE helps you curate datasets and democratize them by publishing them in a governed data marketplace.
Such a marketplace allows data consumers from within your organization to shop for data much like they would shop for goods on Amazon. They can browse different categories of data and, in doing so, discover data assets that they may not have previously known about. They can click on an asset, view its quality ranked across various quality dimensions, and see what privacy policies apply to it. If interested, they put it in a “shopping cart” and request approval to access it at “checkout.” Once the request is approved through automated workflows, the consumer is provisioned with the dataset and can use it to run analytics, whether directly in Tableau or within an S3 bucket, for example. All transparent, all trusted.
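The cart, checkout, approval, and provisioning steps described above can be pictured as a small state machine. The Python sketch below is a hypothetical model written for this post, not the marketplace’s actual workflow engine: the class names, status names, and the approval rule (sensitive datasets require a cleared consumer) are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    name: str
    sensitive: bool = False  # assumed flag set by an upstream privacy scan


@dataclass
class AccessRequest:
    consumer: str
    dataset: Dataset
    status: str = "in_cart"
    history: list = field(default_factory=list)

    def _move(self, expected, new_status):
        """Advance the request, refusing any out-of-order transition."""
        if self.status != expected:
            raise ValueError(f"cannot go from {self.status} to {new_status}")
        self.history.append((self.status, new_status))
        self.status = new_status

    def checkout(self):
        """Consumer submits the cart for approval."""
        self._move("in_cart", "pending_approval")

    def review(self, consumer_cleared):
        """Automated policy check: sensitive data needs a cleared consumer."""
        allowed = (not self.dataset.sensitive) or consumer_cleared
        self._move("pending_approval", "approved" if allowed else "rejected")

    def provision(self):
        """Deliver the dataset; only possible after approval."""
        self._move("approved", "provisioned")


request = AccessRequest("analyst@example.com", Dataset("sales.orders"))
request.checkout()
request.review(consumer_cleared=False)
request.provision()
print(request.status)
```

Because every transition is checked and recorded in `history`, a request can never be provisioned without passing review, and data stewards get an audit trail for free.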
Now, that is true governed data democratization. With intelligence and automation. Powered by CLAIRE.
Join us for the Intelligent Data Summit for Data Governance and Privacy on Informatica Live. By registering, you’ll gain access to on-demand sessions from our recent summits on AI-Powered Innovation; Cloud Data Warehouses, Data Lakes, and Lakehouses; and Business 360.