In your cloud-based analytics journey, you’re likely using ETL (extract, transform, load) or ELT (extract, load, transform) to move data from your transaction-based business systems (such as ERP or CRM) to a cloud-native data lake and/or data warehouse. If so, I’m sure you’re also looking for ways to save time, improve performance, and reduce costs.
Whether you use ETL or ELT for your data pipeline, you need to transform your data at some point to make sure even the transactional data is analytics ready. The key difference with ETL is that extraction and transformation happen before loading the data into the data warehouse. With ELT, data is loaded into a data lake or data warehouse staging area first, and then some transformation logic is used to push the transformed data into the data warehouse.
Transformation of data is a vital step, regardless of whether you choose ETL or ELT. The data you extract is in a state that is either optimized for transaction updates or designed for object-oriented representation of entities, which is often not useful for analytics consumption. That said, chances are that you have been using some (oftentimes very compute-intensive) transformation logic to remodel your transactional data in a state that data scientists, ML engineers, and data analysts can use.
This is where advanced pushdown optimization can help.
Advanced pushdown optimization (APDO) is a cost and performance optimization technique where, instead of running the transformation logic inside the third-party ETL engine, the logic is translated into SQL or cloud ecosystem-specific commands and pushed down to the cloud data warehouse. This eliminates the time-consuming tasks of pulling it out of the cloud infrastructure, transforming it, and then moving it back in the cloud again. And the reduced overall processing time drops the data egress cost, resulting in a significant cost-benefit for the customer.
Not to mention, cloud is here to stay. In a recent Accenture survey, 90% of business and IT executives agree that to be agile and resilient, their organizations need to fast forward their digital transformation with cloud at its core1. We believe that using APDO can help you succeed in your cloud-based data management initiatives.
So, how does APDO work?
Let’s consider the following use case, where data is being transported from Azure Data Lake Storage Gen2 (ADLS Gen2), the source, to Snowflake, the target. One way to approach this is to extract and pull the data from ADLS Gen2 to a third-party ETL engine, perform all transformations, and then move (again) the data back to the data warehouse. This requires a two-way movement of data from and to your cloud infrastructure, resulting in significant network and data I/O costs.
With APDO, all the underlying transformations get translated into ecosystem-specific commands and SQL statements, and then get executed in the underlying cloud infrastructure itself. Here, you do not have to move the data out of the cloud infrastructure to transform the data.
Below are the sample SQL statements and mapping that got triggered behind the scenes and executed within the cloud platform itself:
For a deeper dive on how APDO helps lower costs and speed up data processing, check out this short video.
While cost reduction due to zero data egress remains one of the major benefits of using APDO, there are other advantages as well:
There are two major use cases for APDO. Both allow you to maximize the value of your existing investment in a cloud data warehouse by harnessing its scalable processing power:
1. Ecosystem pushdown: This covers the data movement and transformation between a data lake and a data warehouse. In Figure 4, Informatica reads the data from data lakes such as Amazon S3 or ADLS and uses cloud ecosystem commands to move data into the cloud data warehouse.
2. Cloud data warehouse pushdown: In this use case, data movement happens within a cloud data warehouse such as Amazon Redshift, Azure Synapse, Google BigQuery, or Snowflake. As shown in Figure 5, the entire transformation logic gets translated into SQL statements and executed on the cloud data warehouse.
If you are currently using Informatica Cloud Data Integration or want to bring scalable, cost-efficient, productive integration to your data architecture, APDO is essential for meeting your cloud data management needs.
Ready to dive in? Get started with a 30-day trial.