A recent McKinsey survey on the state of AI in 2020 had some interesting insights. First, it confirms what we have been seeing over the last couple of years: organizations are increasing their use of AI to both increase revenues and lower costs. The data goes even further and points to clear evidence of a positive impact on value at the enterprise level. An impressive 22% of respondents reported that at least 5% of their organizations’ EBIT was attributable to AI. The survey also confirms that the global pandemic accelerated the trend toward becoming more data-driven. Growth in the use of AI is both an indicator and a beneficiary of this trend: the study found that 61% of high-performing companies had increased their investments in AI during the pandemic.
Interestingly, amidst all the rosy indicators of value driven by AI, the same study points to some lurking challenges that may impede AI adoption. Respondents from companies that have adopted AI more aggressively were also more likely to report that they saw their models “misperform” during the pandemic. Of course, it’s not surprising that the organizations using AI more aggressively are also the ones more likely to report issues. But it’s important to note the underlying reason for this “misperformance”: rapid market shifts during COVID invalidated the assumptions underlying the AI models, which in turn triggered the need to re-evaluate both the input data and the models. A recent Forbes article highlighted how COVID kicked off a flurry of model building and summarized an academic study of these models, which found that all of the several hundred models it evaluated had “fatal flaws.” These flaws fell into one of two general categories: data (using small datasets that didn’t represent the population being studied) and transparency (or the lack of it, i.e., limited disclosure of information related to data sources, modeling techniques, and potential sources of bias). Not surprisingly, these findings highlight the paramount importance of data (its scale, quality, lineage, and fitness for purpose) and the need for transparency (visibility, shared understanding, explainability, and performance against business metrics).
As the use of AI models becomes more pervasive across industries, in many domains these types of issues can have an outsized impact on consumers and citizens, and ultimately erode trust in AI. With that in mind, let’s take a deeper look at both types of problems. As the CEO of Databricks, one of our partners, said at an Informatica World event, “The hardest part of AI isn’t the AI, it’s the data.” Too often, issues arise because not enough attention is paid to the data used across the different stages of the AI model lifecycle, such as training, validation, and production. A recent Google paper on the importance of data quality for AI called out that “data is the most under-valued and de-glamorized aspect of AI.” Some of the more common data issues that seem to bedevil AI/ML initiatives are:
The good news is that many of these data problems can be addressed with data intelligence that gives data consumers the context and understanding required to ensure appropriate use of data. Organizations need to adopt modern data governance practices that let them define policies governing data quality and the appropriate use of data for different business needs, and that automate tracking and reporting of performance against key business metrics. Even AI-centric data issues such as bias can in many cases be codified into business policies, with the relevant data quality rules applied to the data and monitored over time.
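To make the idea of codifying a policy as a data quality rule a little more concrete, here is a minimal Python sketch. It is purely illustrative and does not represent any particular product or API: the names `QualityRule`, `completeness`, `min_group_share`, and `evaluate`, the sample records, and the thresholds are all assumptions made for this example. Each rule scores a dataset between 0 and 1 and is compared against a policy threshold, so the same check can be re-run as new data arrives.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Record = Dict[str, Optional[object]]
Scorer = Callable[[List[Record]], float]


@dataclass(frozen=True)
class QualityRule:
    """A hypothetical policy: a named check plus the minimum acceptable score."""
    name: str
    check: Scorer      # returns a score between 0.0 and 1.0
    threshold: float   # the rule passes when score >= threshold


def completeness(field: str) -> Scorer:
    """Share of records that have a non-missing value for `field`."""
    def score(records: List[Record]) -> float:
        if not records:
            return 0.0
        present = sum(1 for r in records if r.get(field) is not None)
        return present / len(records)
    return score


def min_group_share(field: str, groups: List[str]) -> Scorer:
    """Smallest share held by any expected group.

    A low value suggests a group is under-represented, which is one
    simple (assumed) way to express a bias concern as a data rule.
    """
    def score(records: List[Record]) -> float:
        if not records:
            return 0.0
        counts = [sum(1 for r in records if r.get(field) == g) for g in groups]
        return min(counts) / len(records)
    return score


def evaluate(rules: List[QualityRule],
             records: List[Record]) -> Dict[str, Tuple[float, bool]]:
    """Return {rule name: (score, passed)} for every rule."""
    results = {}
    for rule in rules:
        s = rule.check(records)
        results[rule.name] = (s, s >= rule.threshold)
    return results


# Illustrative data and thresholds only.
records = [
    {"age": 34, "region": "north"},
    {"age": None, "region": "north"},
    {"age": 51, "region": "south"},
    {"age": 29, "region": "north"},
]
rules = [
    QualityRule("age_completeness", completeness("age"), 0.9),
    QualityRule("region_representation",
                min_group_share("region", ["north", "south"]), 0.2),
]
report = evaluate(rules, records)
for name, (score, passed) in report.items():
    print(f"{name}: score={score:.2f} passed={passed}")
```

Running `evaluate` on a schedule and recording each report would be one simple way to monitor such rules over time. In practice the thresholds and group definitions would come from the business policy rather than being hard-coded as they are here.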
But is data governance alone enough? What about other types of issues that impact AI model performance? Without getting into the details of how models are developed, some key considerations are:
While the nature and scope of these problems differ from what we outlined earlier for data, we can see common themes in what is needed to adequately manage these and similar issues: the need for documentation, the ability to connect the documentation to what’s being developed and deployed, the ability to set policies, the ability to track and monitor compliance with the policies, the ability to trigger actions or alerts when necessary, and the need to foster collaboration across different teams. These can be addressed through an appropriate governance framework, but clearly we need a more comprehensive one that goes beyond just data to encompass governance of AI models as well.
Given how critical the data is to model performance, you cannot govern AI models effectively unless you have an integrated solution for governing both the AI models and the data that fuels the models. Interested in learning more? Join us for our upcoming Cloud Data Intelligence Summit where you will be hearing diverse perspectives from analysts, customers and Informatica executives on the topic of agile data, analytics and AI governance in the cloud.
Jun 17, 2022