It’s undeniable that moving to the cloud has become a business imperative. That’s why we’re excited to share that Informatica PowerCenter customers can now modernize their on-premises data warehouse to Databricks using the Informatica PowerCenter Modernization Solution.
This solution includes automation tools that help migrate PowerCenter data pipelines to the Informatica Intelligent Data Management Cloud (IDMC) and repoint them to Databricks Delta. Over 90 percent of existing PowerCenter mappings, mapplets, sessions and workflows can be migrated automatically with modern integration patterns on IDMC and Databricks. Our customers have accelerated their time to value eightfold and cut migration costs by 50 percent.
Databricks and the Informatica Intelligent Data Management Cloud Together
Traditionally, Informatica PowerCenter customers have used classic data warehousing practices and load patterns. These worked well with on-premises data warehouses. However, as customers modernize to Databricks, they typically prefer using the data lakehouse pattern recommended by Databricks. Moreover, they want to use Delta Lake features with a medallion architecture and with structured and semi-structured data.
Such customers also prefer using extract, load, transform (ELT) flows for transformations within the data lakehouse. The Informatica Intelligent Data Management Cloud (IDMC) has been architected from the ground up to support such modern features and patterns. At the same time, it supports the initial as-is migration of traditional data flows before customers start optimizing to take advantage of new features. It provides a “best of both worlds” experience — you can connect to Databricks while keeping your existing data flow design as-is at first so the change can be easily validated, and then you can optimize it to take advantage of Databricks recommendations.
IDMC data management features allow users to use their existing extract, transform, load (ETL) data pipelines with Databricks. Or they can switch to using ELT data pipelines with the IDMC cloud mass ingestion service and SQL ELT features that push down data pipelines and transformations to run natively in Databricks SQL.
IDMC also offers enterprise data governance for Databricks, including data profiling, data quality and data cataloging. These capabilities are integrated with Databricks Unity Catalog to form a comprehensive platform for delivering trusted data to customers. In addition, IDMC simplifies and accelerates complex data management requirements with artificial intelligence (AI) by leveraging the power of CLAIRE, the AI engine in IDMC.
The Benefits of the Informatica Cloud Modernization Program
Using the Informatica cloud modernization program to migrate to Databricks offers you unique advantages. Informatica understands PowerCenter data pipelines and how to map them to IDMC at a very granular level. Our automation tools have built-in, comprehensive feature mapping to make sure that the properties, overrides and parameters configured in your PowerCenter assets are analyzed and mapped to either a matching feature in IDMC or an equivalent one. This not only preserves the business logic implemented in your PowerCenter mappings after migration, but it also frees you from having to manually account for everything you have built over the last several years.
The modernization process involves first assessing your existing PowerCenter assets to determine how much of your migration can be automated. Then, you need to apply a minor patch to your setup to link the PowerCenter domain to IDMC. Once that’s done, you will be on a cloud data integration setup that’s compatible with PowerCenter. In addition, you will get access to self-service tools that allow you to migrate and repoint PowerCenter assets (mappings, sessions, workflows, etc.) to IDMC and Databricks. Each of these steps is described in detail below.
Assessment of PowerCenter Install
The assessment service uses various checks to understand the nature of your PowerCenter mappings (including mapplets), tasks of various types and workflows/worklets. It analyzes the use of various design time configurations — such as reusable objects, shortcuts, etc. — and runtime configurations — such as connections, parameters, overrides in sessions, etc. With this, it will determine how many of your assets can be migrated to IDMC automatically. In most cases, Informatica can automate the conversion of 90% of the assets or more. The remaining 10% or less are usually outliers that need specific review and action by your team or the implementation team.
Reports are generated at both the summary level and the artifact level to show migration coverage, so you know exactly which data pipelines can be converted automatically and which, if any, cannot. These reports give you, or any team you are working with, a comprehensive reference for planning the actual migration on your desired timeline.
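To make the idea of summary-level and artifact-level coverage concrete, here is a minimal Python sketch. It is not the Informatica assessment service or its report format. The `Artifact` record, the `summarize_coverage` helper and the sample asset names are all hypothetical, used only to illustrate how a coverage percentage and a manual-review list could be derived from an inventory of assets.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Artifact:
    """Hypothetical inventory record; not an Informatica data structure."""
    name: str
    kind: str               # e.g. "mapping", "session", "workflow"
    auto_convertible: bool  # outcome of the (assumed) assessment checks


def summarize_coverage(artifacts):
    """Return (overall_pct, per_kind_pct, names_needing_manual_review)."""
    total = Counter(a.kind for a in artifacts)
    auto = Counter(a.kind for a in artifacts if a.auto_convertible)
    # Summary level: percentage of automatically convertible assets per kind.
    per_kind = {kind: 100.0 * auto[kind] / total[kind] for kind in total}
    overall = 100.0 * sum(auto.values()) / len(artifacts) if artifacts else 0.0
    # Artifact level: the specific assets that need review by a team.
    manual = [a.name for a in artifacts if not a.auto_convertible]
    return overall, per_kind, manual


# Illustrative inventory with made-up asset names.
inventory = [
    Artifact("m_load_customers", "mapping", True),
    Artifact("m_load_orders", "mapping", True),
    Artifact("m_legacy_custom_java", "mapping", False),
    Artifact("s_load_customers", "session", True),
    Artifact("wf_nightly", "workflow", True),
]

overall, per_kind, manual = summarize_coverage(inventory)
print(f"Overall automated coverage: {overall:.1f}%")
for kind, pct in sorted(per_kind.items()):
    print(f"  {kind}: {pct:.1f}%")
print("Needs manual review:", manual)
```

In a real assessment the convertibility flag would come from the service's own checks. The sketch only shows how the two report levels relate to each other.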
Applying Patches for Cloud Data Integration for PowerCenter
The next step is to apply specific patches on top of your existing PowerCenter installation to connect it to IDMC. There are major benefits of doing this, including:
- The patch shifts the control of your environment to IDMC. Now, your domain and administrator console can be configured from IDMC.
- The patch makes available the IDMC services needed to repoint your mappings to Databricks and to validate the new data pipelines in IDMC before switching to production.
- Further patches or upgrades required in this environment are now performed by IDMC automatically.
Then, you can choose to migrate your assets on your own timeline using the self-service tools available on Cloud Data Integration for PowerCenter.
PowerCenter to Cloud Modernization Service
Once you’re on Cloud Data Integration for PowerCenter, you can plan and migrate your assets to Databricks. You can do it yourself, sign up to work with the Informatica Migration Factory team or work with the approved implementation partner of your choice. You can run this as a major project that migrates a logical functional area all at once, or you can spread the work over weeks or months at a pace that works for you.
Key features of the modernization service include:
- Deep metadata insights with pattern recognition and recommendations
- Conversion of PowerCenter assets (mapplets, mappings, sessions and workflows) to equivalent assets in IDMC
- Auto repointing from your existing data warehouse to Databricks
- 90 percent or higher automated conversion
- Test data generation and validation
- CLAIRE-based recommendations for most suitable patterns with Databricks
Cloud Data Validation
After the assets are migrated, most customers prefer to run the “before” workflows (PowerCenter with an on-premises data warehouse) and the “after” workflows (IDMC with Databricks) to ensure the functionality is intact. With Cloud Data Validation, you can:
- Compare data sets side-by-side
- Identify data differences and anomalies — missing records, mismatched values, etc.
- Use exact or fuzzy matches
- Take endpoint differences into account (for example, date formats in the on-premises data warehouse vs. those in Databricks)
- Generate and review summary and detailed reports
Data validation becomes a vital step after migration, and it can significantly speed up your testing process to ensure new data pipeline outputs are consistent with the original mapping outputs.
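The kinds of checks listed above can be illustrated with a small, self-contained Python sketch. It does not use Cloud Data Validation or any Informatica API. The `normalize_date` and `compare` helpers, the column names and the sample rows are assumptions made for illustration, and a numeric tolerance stands in as a very simple form of fuzzy matching.

```python
from datetime import datetime


def normalize_date(value, fmt):
    """Parse an endpoint-specific date string and return it in ISO format."""
    return datetime.strptime(value, fmt).date().isoformat()


def compare(before, after, key, tolerance=0.0):
    """Compare two lists of row dicts that share a key column.

    Returns (missing_keys, extra_keys, mismatches), where each mismatch is
    a tuple of (key, column, before_value, after_value).
    """
    b = {row[key]: row for row in before}
    a = {row[key]: row for row in after}
    missing = sorted(b.keys() - a.keys())  # in "before" but not in "after"
    extra = sorted(a.keys() - b.keys())    # in "after" but not in "before"
    mismatches = []
    for k in sorted(b.keys() & a.keys()):
        for col, bv in b[k].items():
            av = a[k].get(col)
            if isinstance(bv, float) and isinstance(av, float):
                same = abs(bv - av) <= tolerance  # tolerant numeric match
            else:
                same = bv == av                   # exact match
            if not same:
                mismatches.append((k, col, bv, av))
    return missing, extra, mismatches


# "Before" rows use a day/month/year date format (an assumed example).
before_rows = [
    {"id": 1, "amount": 10.00, "order_date": normalize_date("31/01/2024", "%d/%m/%Y")},
    {"id": 2, "amount": 20.50, "order_date": normalize_date("01/02/2024", "%d/%m/%Y")},
    {"id": 3, "amount": 5.25, "order_date": normalize_date("02/02/2024", "%d/%m/%Y")},
]
# "After" rows use ISO dates; normalizing both sides removes the
# endpoint difference before values are compared.
after_rows = [
    {"id": 1, "amount": 10.004, "order_date": normalize_date("2024-01-31", "%Y-%m-%d")},
    {"id": 2, "amount": 21.50, "order_date": normalize_date("2024-02-01", "%Y-%m-%d")},
]

missing, extra, mismatches = compare(before_rows, after_rows, key="id", tolerance=0.01)
print("Missing in target:", missing)
print("Unexpected in target:", extra)
print("Mismatched values:", mismatches)
```

Here the row with `id` 3 is reported as missing, the small rounding difference on `id` 1 is accepted under the tolerance, and the amount on `id` 2 is flagged as a mismatch.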
At this point, it’s beneficial to analyze the flows from an ELT perspective. If your mappings can be run in SQL ELT (advanced pushdown) mode, Informatica converts the mapping logic into Databricks SQL and runs it on a Databricks SQL endpoint. This is done by default if the original mapping in PowerCenter was configured for pushdown, but you can also turn it on for your mappings in general. At runtime, Informatica first attempts to push the mapping down. If it cannot do so for some reason, it runs the mapping using the default Informatica runtime. We recommend enabling SQL ELT while you are performing functional validation so that any difference in behavior can be addressed.
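The attempt-then-fall-back behavior described above can be sketched in a few lines of Python. This is only an illustration of the control flow, not how Informatica implements SQL ELT. The translation table, the `to_sql` and `run_mapping` helpers, the `PushdownNotSupported` exception and the stubbed SQL executor are all hypothetical.

```python
class PushdownNotSupported(Exception):
    """Raised when the (hypothetical) translator cannot express logic in SQL."""


# Assumed, tiny translation table from mapping functions to SQL expressions.
SQL_TRANSLATIONS = {"upper": "UPPER({col})", "trim": "TRIM({col})"}

# Local equivalents, standing in for the default runtime.
LOCAL_FUNCTIONS = {
    "upper": str.upper,
    "trim": str.strip,
    "reverse": lambda s: s[::-1],  # no SQL translation in this sketch
}


def to_sql(column, function, table):
    """Translate a single-column transformation into a SQL statement."""
    template = SQL_TRANSLATIONS.get(function)
    if template is None:
        raise PushdownNotSupported(function)
    return f"SELECT {template.format(col=column)} AS {column} FROM {table}"


def run_mapping(column, function, table, rows, execute_sql):
    """Try pushdown first; fall back to local processing if it is not possible."""
    try:
        sql = to_sql(column, function, table)
    except PushdownNotSupported:
        fn = LOCAL_FUNCTIONS[function]
        return "local", [fn(row[column]) for row in rows]
    return "pushdown", execute_sql(sql)


# Stub executor that records the statement instead of calling a warehouse.
executed = []


def fake_execute(sql):
    executed.append(sql)
    return ["ALICE", "BOB"]


rows = [{"name": "alice"}, {"name": "bob"}]
mode1, out1 = run_mapping("name", "upper", "customers", rows, fake_execute)
mode2, out2 = run_mapping("name", "reverse", "customers", rows, fake_execute)
print(mode1, out1, executed)
print(mode2, out2)
```

Returning the execution mode alongside the result mirrors why validation matters here: knowing which path actually ran makes it easier to explain any difference in output.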
In addition, at this stage you can revisit your assets on IDMC and consider using modern patterns for some of those. This can involve:
- Using an ingestion pattern for data extraction with Cloud Mass Ingestion
- Considering design tweaks to additional mappings so they support SQL ELT (pushdown) to Databricks. Mappings that process data between supported source and target combinations are candidates for SQL ELT to Databricks. Refer to the SQL ELT section of the Informatica Databricks connector documentation for more information. Using Databricks Delta SQL to transform data can eliminate data egress, and processing is also faster with the compute power of Databricks.
- Exploring IDMC functionalities, such as advanced mappings, advanced serverless and application integration, and the vastly expanded connectivity available on IDMC.
With the Informatica Cloud Modernization Service, PowerCenter customers can easily modernize their existing data warehouse to Databricks and their existing classical data flows to use modern integration patterns. By automating over 90 percent of the migration process, the Informatica Cloud Modernization program with Databricks significantly accelerates and de-risks modernizing to a state-of-the-art, enterprise-grade, cloud-based data and analytics platform.
To get started:
- Learn more about PowerCenter cloud modernization by visiting: https://www.informatica.com/platform/powercenter-cloud-modernization.html
- Or delve further into the partnership between Informatica and Databricks here: https://www.informatica.com/solutions/explore-ecosystems/databricks.html