Accelerate your data engineering pipeline development on Databricks and govern your Delta Lake with Informatica’s Intelligent Data Management Cloud.
Informatica recently announced advanced ingestion and integration capabilities for Databricks Delta with its summer 2021 release. The new capabilities help data analysts and data scientists move large amounts of data into Databricks Delta Lake for AI, machine learning, and data science projects.
Delta Lake is an open format storage layer that delivers reliability, security, and performance on your data lake — for both streaming and batch operations. By replacing data silos with a single home for structured, semi-structured, and unstructured data, Delta Lake is the foundation of a cost-effective, highly scalable lakehouse.
Without fast access to accurate and prepared datasets, data teams are challenged to build accurate AI and ML models. In addition, inaccurate or incomplete data can skew results and undermine confidence in AI and ML projects.
If you are a data scientist or data analyst who works with petabytes of data and performs complex analysis on top of it, you have likely heard of Databricks Delta by now. With the availability of open, cost-effective, unified cloud data infrastructure, businesses can access all data within their organization. This data is useful for a variety of downstream applications, but we’d like to focus on two types of data consumers: data analysts and data scientists who are developing new machine learning models to reliably forecast important business insights.
One of the most important requirements for data scientists and data analysts solving business-critical problems is access to accurate, curated, and reliable data. Databricks Delta plays a critical role in meeting this demand with its multi-cloud presence and support for ACID (atomicity, consistency, isolation, durability) transactions. In an era where machine learning is poised to disrupt any industry, the data lakehouse is a modern data management architecture that dramatically simplifies enterprise data infrastructure and accelerates innovation.
Previously, organizations isolated reporting and ML modeling use cases, so separate data lake and cloud data warehouse deployments were common. However, growing overlap between reporting and ML modeling has brought the two categories together. It is now more advantageous to source data for both use cases from a single data lakehouse, such as Databricks Delta.
To build a data lakehouse with Delta, organizations need to figure out how to bring in data from on-premises or legacy applications. They also need to address important questions such as:
There are many options available to move your data to the cloud, but each comes with its own advantages and limitations. It’s wise to pick a solution that is proven, industry-leading, comprehensive, and easy to connect to all possible data sources. With Informatica’s summer 2021 release, we announced a new connector for Databricks Delta that helps enterprises build reliable data lakehouses on Delta using Informatica’s Intelligent Data Management Cloud.
Informatica’s latest integration with Databricks Delta, delivered through the self-service Intelligent Data Management Cloud (IDMC), allows users to ingest data into Delta at scale with a wizard-driven experience, and then transform and enrich that data in Delta at scale using out-of-the-box transformations and functions.
Faster access to accurate and prepared datasets is critical for enterprise analytics to deliver better business outcomes. Informatica and Databricks partnered to provide a scalable data and machine learning solution with faster data discovery, ingestion, and preparation that accelerates development and increases model accuracy.
The Informatica partnership with Databricks brings together the best of both platforms. The following capabilities help organizations rapidly build best-in-class AI, ML, and analytics projects to drive meaningful business insights.
The joint Informatica and Databricks solution enables organizations to build and iterate on machine learning models faster to address rapid go-to-market demands. As the reference architecture below shows, the joint solution accelerates data engineering pipelines for AI and analytics.
With its summer 2021 release, Informatica is providing new connectivity for Databricks Delta that helps customers source data from Delta tables in their Informatica mappings. With these new capabilities, you can easily ingest data from various cloud and on-premises sources (applications, databases, files, streaming, or IoT) and move this data into Delta Lake. We’ve simplified the UI so that it is easy to configure: in just a few steps, you’re up and running and ready to move data into Delta Lake at large scale.
With more data available in Delta, data scientists and data analysts can now broaden their use cases and run complex AI and ML models and predictive analytics.
Informatica’s Intelligent Data Management Cloud also provides Amazon S3 and Microsoft ADLS Gen2 connectivity, which can be used to enrich complex files before Databricks Delta uses them to create tables. Check out more information on the Informatica S3 connector.
The new Databricks Delta connector for Informatica’s Intelligent Data Management Cloud helps customers build new integration pipelines with a user-friendly Informatica GUI. Some of the key features of this connector include:
With Informatica’s new Databricks Delta integration capability, our joint customers can uncover meaningful insights using AI and machine learning at scale.