Why Data Management and Generative AI Are a Match Made in Heaven

Last Published: Aug 02, 2024

Dharma Kuthanur

Vice President, Product Marketing

Generative AI and large language models (LLMs) have taken the world by storm, emerging as hot discussion topics everywhere, from boardrooms to dinner tables. LLMs mark a fundamental shift because of their unparalleled ability to comprehend and generate human-like text with context awareness. This allows LLMs to perform various tasks, from language translation and sentiment analysis to code generation and creative writing. This ability to generalize knowledge and understanding across diverse domains could revolutionize numerous industries and reshape our interactions with AI technologies.

According to a survey conducted by KPMG, 77% of business leaders believe that generative AI will have the most significant impact on their businesses out of all emerging technologies. Additionally, 71% of these leaders plan to implement their first generative AI solution within the next two years.1

And several industry leaders are already out of the gate with game-changing innovations. For instance, Morgan Stanley Wealth Management is leveraging OpenAI’s GPT-4 LLM to instantly put the know-how of the most knowledgeable person on its wealth management team into every customer’s hands.2 In another example, Khan Academy is reinventing the future of teaching and learning with its Khanmigo virtual AI tutor.3

Approaches for Adoption of Generative AI

Technology vendors are racing to meet this demand with newer, more capable models. Soon after the launch of ChatGPT, OpenAI announced GPT-4, the next version of its LLM. Google introduced PaLM 2, and Meta released an open-source LLM called Llama 2. Many well-funded start-ups are also building LLM-based products. General-purpose LLMs are becoming an important foundational tool for generative AI, much as public cloud services were foundational to the cloud computing era.

As enterprises start looking into how they can adopt LLMs to drive productivity and innovation, they are exploring the following approaches:

  1. Prompt Engineering: The data team feeds prompts into an LLM and iteratively refines them to get logically coherent and reliable responses.
  2. Fine-tuning LLMs: Data teams fine-tune an existing LLM with domain-specific data and knowledge to enable more accurate and relevant responses based on business context.
  3. Retrieval-Augmented Generation (RAG) with LLMs: Organizations can make even more effective use of domain-specific data, including structured data, with a RAG approach.4 Relevant documents or data are retrieved first, fed to an LLM as contextual input and then used to generate the response.
  4. Custom LLMs: Training a custom LLM from the ground up for the business.

For most enterprise applications, only the first three options are practical since the fourth option (developing a custom LLM) requires deep AI expertise and involves high costs for compute and training.
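The RAG flow in option 3 can be sketched in a few lines. Below is a minimal, illustrative Python example: the in-memory document store and keyword-overlap scoring stand in for a production vector database, and the final LLM call (`call_llm`) is a hypothetical placeholder, not a real API.

```python
# Minimal RAG sketch: retrieve relevant documents first, then feed
# them to an LLM as context. The store and scoring are toy stand-ins.

DOCUMENTS = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Premium support is available 24/7 for enterprise customers.",
    "The quarterly report shows a 12% increase in revenue.",
]

def tokens(text):
    """Lowercase and split text, stripping basic punctuation."""
    return set(word.strip(".,?!") for word in text.lower().split())

def retrieve(query, docs, top_k=1):
    """Rank documents by keyword overlap with the query."""
    q = tokens(query)
    ranked = sorted(docs, key=lambda d: len(q & tokens(d)), reverse=True)
    return ranked[:top_k]

def build_prompt(query, context_docs):
    """Combine retrieved context and the question into one prompt."""
    context = "\n".join(context_docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

query = "What is the refund policy?"
context = retrieve(query, DOCUMENTS)
prompt = build_prompt(query, context)
# In a real system the prompt would now be sent to the model, e.g.:
# response = call_llm(prompt)   # call_llm is a hypothetical placeholder
```

Because the model only sees retrieved context, answers stay grounded in the organization's own data rather than the LLM's general training corpus.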

How Will Organizations Differentiate?

If all organizations rely on the same public LLMs offering similar capabilities, how will they stand out? The answer lies in their data and knowledge base. The effectiveness of prompting and fine-tuning depends on how well organizations use their data and knowledge base to enhance LLM results. In the Morgan Stanley Wealth Management example above, 300 employees test and refine GPT-4 results to make the firm’s knowledge base available to every customer.

Combining LLMs with domain-specific data has immense potential to solve real-world problems and boost efficiency. According to the KPMG survey, 73% of senior business leaders believe that generative AI will enhance workforce productivity. However, organizations must overcome challenges to reach this goal. Senior leaders are well aware of the risks and challenges involved — 92% of them in the KPMG survey believe that generative AI implementation poses moderate to high-risk concerns.5

If your domain-specific data holds the key to driving value and differentiation, then it stands to reason that many of the biggest challenges and risks are also tied to your organization’s data. Some of the critical data-related questions you will need to tackle are:

  1. Do you have the required quantity of domain-specific data to address your need for training and fine-tuning of LLMs?
  2. Can you quickly integrate data from diverse sources and bring them into one place for analysis?
  3. Does your training data have the required quality, or is it too noisy? Are you able to cleanse, curate and preprocess the data for training?
  4. Is your training data free of bias? How will you verify that the data remains bias-free at each stage?
  5. Do you have trusted, accurate data on key business entities such as customers, products and suppliers? Do you have unified views of these entities?
  6. Can your data scientists and analysts easily find, understand and access the data they need for training and fine-tuning?
  7. How will you ensure sensitive/private data is used appropriately? Is there a way to prevent unintentional “leakage” of sensitive data to public LLMs?
  8. Do you have appropriate policy frameworks, processes and tools to ensure governance and compliance with policies and regulations like the EU AI Act?
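Question 7 above is often addressed with a redaction layer that screens prompts before they leave the organization for a public LLM. The sketch below is a minimal, illustrative example using two regex patterns of my own choosing; real deployments rely on dedicated data classification and data loss prevention tooling with far broader coverage.

```python
import re

# Illustrative guard against sensitive-data "leakage" to public LLMs:
# redact obvious sensitive values before a prompt is sent out.
# These two patterns are examples only, not a complete DLP solution.

PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact(prompt):
    """Replace matches of known sensitive patterns with placeholders."""
    for label, pattern in PATTERNS.items():
        prompt = pattern.sub(f"[{label} REDACTED]", prompt)
    return prompt

safe = redact("Contact jane.doe@example.com, SSN 123-45-6789, about the renewal.")
```

The redacted prompt can then be forwarded to the LLM while the original stays inside the organization's boundary.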

Data Management for AI

Addressing these data challenges is critical for organizations to drive scalable and responsible adoption of generative AI. A recent BCG survey on responsible AI indicates that improving responsible AI awareness and measures requires changes in operations and culture.6 These changes take an average of three years to implement, so companies need to start this journey now to prepare for successful adoption of generative AI. A unified approach to data management that provides all the capabilities required to address these challenges would set them on a path to success. The key capabilities include:

AI for Data Management

Managing data is becoming increasingly difficult as the volume and variety of data grow. Training and fine-tuning LLMs require large amounts of data delivered through secure data pipelines, and manual processes cannot manage data at this scale. To address these challenges, organizations need AI-powered automation: AI copilots that automate tasks such as data classification, generating and applying data quality rules, developing data pipelines and managing master data through data matching. Generative AI can also accelerate data democratization and boost productivity by providing a natural language interface that lets all data users interact with their data.
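As a concrete illustration of the data matching task mentioned above, the sketch below pairs customer records from two systems by name similarity. The sample records, the single-field comparison and the 0.7 threshold are simplified assumptions for illustration; production master data management tools use much richer matching rules across many attributes.

```python
from difflib import SequenceMatcher

# Toy records from two systems that may refer to the same entities.
customers_a = [{"id": 1, "name": "Acme Corporation"}, {"id": 2, "name": "Globex Inc"}]
customers_b = [{"id": 101, "name": "ACME Corp"}, {"id": 102, "name": "Initech LLC"}]

def similarity(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match_records(left, right, threshold=0.7):
    """Return (left_id, right_id) pairs whose names are similar enough."""
    return [
        (l["id"], r["id"])
        for l in left
        for r in right
        if similarity(l["name"], r["name"]) >= threshold
    ]

matches = match_records(customers_a, customers_b)
```

Here "Acme Corporation" and "ACME Corp" pair up while the unrelated companies do not; an AI copilot automates exactly this kind of matching, at enterprise scale and with learned rather than hand-set thresholds.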

What’s Next

Here are some resources to help you prepare to adopt generative AI successfully:

The cloud data management conference of the year is going on the road. Collaborate and learn about the latest industry trends and innovations in cloud data management at Informatica World Tour, coming to a city near you.

Get an early look at how you can transform your organization’s data management experience. Sign up for a private preview of CLAIRE GPT, an AI-powered data management tool that uses generative technology.

1 KPMG survey, Generative AI: From buzz to business value
2 Forbes, How Morgan Stanley is Training GPT to Help Financial Advisors
3 Khan Labs, Khanmigo virtual AI tutor
4 LinkedIn, Leveraging Generative AI and Retrieval-Augmented Generation for Enterprise Data Integration, Thomas Schweitzer
5 KPMG survey, Generative AI: From buzz to business value
6 BCG, Responsible AI at a Crossroads

First Published: Aug 14, 2023