Realize the Benefits of AI with the Right Framework
In the current economic environment, organizations are focused on KPIs and metrics that matter today and in the near term. They are adopting analytics in the cloud, many by building cloud lakehouses, which blend the strengths of cloud data warehouses and data lakes so they can handle any type of data (structured, semi-structured, and unstructured), arriving at any speed (batch and real-time), at petabyte scale. Analytics modernization in the cloud offers potentially big benefits: cost savings, supply chain optimization, increased productivity, improved operational efficiency, and more. Similar potential benefits surround more advanced predictive analytics, AI, and ML projects. But AI/ML is hard and new to many organizations. In these times, project success is paramount, and it requires the right framework.
According to Gartner, “A litmus test of organizations’ maturity is how quickly and repeatedly they can get these AI systems into production. Our surveys are showing that organizations are not managing to do this as quickly as they had hoped. The result is an organizational schism, given the high expectations executive boards have regarding the transformative power of AI.” In effect, AI benefits can only be realized when you can productize your machine learning models successfully, on time, and at scale. Hence, the need for machine learning operations (MLOps).
What is MLOps?
"MLOps is the process of operationalizing your machine learning models."
But it needs a framework. When AI/ML projects lack a framework and architecture to support model building, deployment, and monitoring, they fail. To succeed, you need collaboration between data scientists and data engineers to automate and productize machine learning algorithms.
DataOps provides a way to operationalize your data platform by extending the concepts of DevOps to the world of data. Like DevOps, DataOps is built on a simple framework of CI/CD: continuous integration, continuous delivery, and continuous deployment. When you extend this framework further with a data marketplace as the on-ramp, you get a solid framework: MLOps.
The MLOps Framework
There are five steps that form the framework for successful MLOps.
- Understand your business KPIs. Business understanding is the first and the most defining step in the process. This means you have clearly defined OKRs (objectives and key results) and KPIs (key performance indicators). While this is primarily a non-technical phase, Informatica Axon Data Governance can help facilitate collaboration between data stewards and subject matter experts around relevant systems and data for your ML project.
- Acquire data. During the data acquisition phase, data scientists and data engineers collaborate on discovering the data needed for machine learning, ingesting and integrating it into the cloud data lakehouse, ensuring the data quality rules are applied, and data is prepared and ready for modeling. Informatica Enterprise Data Catalog (EDC), Cloud Mass Ingestion (CMI), Data Engineering Integration (DEI), Data Engineering Quality (DEQ), Data Engineering Streaming (DES), and Enterprise Data Preparation (EDP) ensure the data pipelines are processed and made available in the data scientist's workspace.
- Develop ML models. Model development is the core of the MLOps framework. This is where the data scientist shifts from advisor and approver to driver. With OKRs and KPIs defined and the datasets ready, the data scientist can apply their expertise to model development, which is usually iterative until expectations are met. Some iterations may involve collaborating with data engineers and data stewards to obtain more data. The Informatica Data Engineering portfolio integrates with ML development tools and processes to support this step.
- Deploy ML models. With data pipelines established from the data acquisition phase, the data engineer integrates the ML model developed by the data scientist and validates it against the production data. This involves further validating OKRs and KPIs. The data pipeline is then deployed into production with DataOps teams for continuous use and monitoring. Informatica Data Engineering Integration helps customers deploy their ML model in the production pipeline.
- Monitor ML models. During the model monitoring phase, a deployed pipeline is integrated with a metrics monitoring mechanism. The DataOps team can then monitor the pipeline metrics, ensuring continued value and increasing confidence in ML. Using Informatica Data Engineering Integration, customers can establish processes across the software development lifecycle. Data Engineering Quality can be used to track profiles and business metrics.
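To make the deploy and monitor steps above more concrete, here is a minimal Python sketch of the underlying idea: a model is promoted only if it meets a KPI on validation data, and it is flagged later if its recent metrics drift below that KPI. This is an illustration only, not an Informatica API. The names Kpi, ready_to_deploy, and needs_attention, the accuracy metric, the 0.80 threshold, and the three-reading window are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Kpi:
    """Hypothetical KPI: the minimum acceptable value of a model metric."""
    name: str
    minimum: float


def accuracy(predictions, labels):
    """Fraction of predictions that match the labels."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and equal length")
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def ready_to_deploy(metric_value, kpi):
    """Deploy step: promote the model only if it meets the KPI on validation data."""
    return metric_value >= kpi.minimum


def needs_attention(recent_values, kpi, window=3):
    """Monitor step: flag the model when the mean of the last `window`
    metric readings falls below the KPI minimum."""
    if len(recent_values) < window:
        return False  # not enough readings yet to judge
    return mean(recent_values[-window:]) < kpi.minimum


# Example usage with made-up numbers (assumed, not from any real project).
kpi = Kpi(name="churn_prediction_accuracy", minimum=0.80)
validation_accuracy = accuracy([1, 0, 1, 1, 0], [1, 0, 1, 0, 0])
print("deploy:", ready_to_deploy(validation_accuracy, kpi))
print("alert:", needs_attention([0.82, 0.79, 0.76, 0.75], kpi))
```

In practice the metric, threshold, and window would come from the OKRs and KPIs agreed in the first step, and the monitoring check would run on a schedule against the production pipeline's metrics.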
A systematic approach as laid out in the MLOps framework above is essential to the success of your data science use cases. The Informatica Data Engineering product portfolio provides end-to-end functionality for MLOps.
 Gartner, “Predicts 2020: Artificial Intelligence — the Road to Production,” by Anthony Mullen, Saniye Alaybeyi, Van Baker, Arun Chandrasekaran, Alexander Linden, Magnus Revang, Svetlana Sicular, 2 December 2019