Data transformation is the process of taking data that exists in one format or state and converting it into a different format or state. Transformation is the middle step in the extract, transform, and load (ETL) process — and the final step in the extract, load, transform (ELT) process.
Data can serve many purposes. Data is transformed to suit your needs. When you convert raw data from its source — from locations like customer transactions, files or databases — you make that data more usable. Once you’ve transformed data, you can do other things with it. For example, you can use transformed data in analytics to gain trustworthy and actionable business intelligence, verify data quality by enforcing specific date ranges or to simply run business functions. You can even apply mathematical functions within data transformation for machine learning or data science purposes.
To achieve the greatest value from your data, you need to normalize or standardize it to be fit for your needs. Transformations can be active (it will modify the number of rows) or passive (the number of rows is not changed, data is only modified at the row level), and can include cleansing, consolidation, code lookups and general calculations. Some common types of data transformation include:
The Aggregator transformation performs aggregate calculations, such as averages and sums. The integration service performs aggregate calculations as it reads and stores data group and row data in an aggregate cache. The aggregator transformation is an active transformation.
You use a Data Masking transformation to change sensitive production data to realistic test data for non-production environments. The Data Masking transformation modifies source data based on masking rules that you configure for each column. This type of transformation lets you create masked data for software development, analytics, testing, training and data mining. You can maintain data relationships in the masked data and maintain referential integrity between database tables. The Data Masking transformation is a passive transformation.
Use the Expression transformation to calculate values in a single row. For example, you might need to adjust employee salaries, concatenate first and last names or convert strings to numbers. You can also use the Expression transformation to test conditional statements before you pass the results to a target or other transformations. The Expression transformation is a passive transformation.
Use the Joiner transformation to join source data from two related heterogeneous sources residing in different locations or file systems. You can also join data from the same source. The Joiner transformation joins sources with at least one matching column and uses a condition that matches one or more pairs of columns between the two sources. The Joiner transformation is an active transformation.
Use the Filter transformation to filter out rows in a mapping. As an active transformation, the Filter transformation may change the number of rows passed through it. The Filter transformation allows rows that meet the specified filter condition to pass through. It drops rows that do not meet the condition. You can filter data based on one or more conditions. The Filter transformation is an active transformation.
A filter condition returns TRUE or FALSE for each row that the Integration Service evaluates, depending on whether a row meets the specified condition. For each row that returns TRUE, the Integration Services passes through the transformation. For each row that returns FALSE, the Integration Service drops and writes a message to the session log.
Use a Lookup transformation in a mapping to look up data in a flat file, relational table, view or synonym. You can import a lookup definition from any flat file or relational database. You can also create a lookup definition from a source qualifier. You can use multiple Lookup transformations in a mapping. Although the Lookup transformation can be an active or passive transformation, it is mostly a passive transformation.
You can select only the top or bottom rank of data with a Rank transformation. The Rank transformation is an active transformation. Use a Rank transformation to return the largest or smallest numeric value in a port or group. You can also use a Rank transformation to return the strings at the top or the bottom of a session sort order. During the session, the Integration Service caches input data until it can perform the rank calculations.
The Rank transformation differs from the transformation functions MAX and MIN, in that it lets you select a group of top or bottom values, not just one value. For example, use Rank to select the top 10 salespeople in a given territory. Or, to generate a financial report, you might also use a Rank transformation to identify the three departments with the lowest expenses in salaries and overhead. While the SQL language provides many functions designed to handle groups of data, identifying top or bottom strata within a set of rows is not possible using standard SQL functions.
You connect all ports representing the same row set to the transformation. Only the rows that fall within your configured rank measures pass through the Rank transformation. You can also write expressions to transform data or perform calculations.
As an active transformation, the Rank transformation might change the number of rows passed through it. You might pass 100 rows of data to the Rank transformation but choose to rank only the top 10 rows and pass them from the Rank transformation to another transformation.
A Router transformation is similar to a Filter transformation because both transformations allow you to use a condition to test data. A Filter transformation tests data for one condition and drops the rows of data that do not meet the condition. However, a Router transformation tests data for one or more conditions and gives you the option to route rows of data that do not meet any of the conditions to a default output group. The Router transformation is an active transformation.
The Union transformation is a multiple input group transformation that you use to merge data from multiple pipelines or pipeline branches into one pipeline branch. It merges data from multiple sources like the UNION ALL SQL statement to combine the results from two or more SQL statements. Similar to the UNION ALL statement, the Union transformation does not remove duplicate rows. The Union transformation is an active transformation.
You can add an XML Source Qualifier transformation to a mapping by dragging an XML source definition to the Cloud Mapping Designer workspace or by manually creating one. When you add an XML source definition to a mapping, you need to connect it to an XML Source Qualifier transformation. The XML Source Qualifier transformation defines the data elements that the integration service reads when it executes a session. The XML Source Qualifier transformation is an active transformation. You can also apply XML parser and XML generator transformations.
The Normalizer transformation receives a row that contains multiple-occurring columns and returns a row for each instance of the multiple-occurring data. The transformation processes multiple-occurring columns or multiple-occurring groups of columns in each source row. The Normalizer transformation is an active transformation.
To convert hierarchical data models to relational database you can use H2R transformations and change relational data to hierarchical data apply R2H transformation. H2R transformation is predominantly used to convert the XML JSON hierarchical input to relational outputs.
The answer depends on your priorities and resources. A dedicated cloud data integration environment offers versatility, scalability and high availability. But if your data is already in the database, your cloud data warehouse may be the better option because then you can limit data movement.
ELT allows you to do the transformations using the compute resources of the cloud data warehouse – without having to move the data out of it.
Transforming data after uploading it to modern cloud ecosystems is most effective for:
The ELT process improves data conversion and manipulation capabilities due to parallel load and data transformation functionality. This schema allows data to be accessed and queried in near real time.
However, you might want to stick with ETL if your business collects data from multiple source systems, or the data is in dissimilar formats or you have dirty data: duplicate, incomplete or inaccurate data will require data engineers to clean and format prior to data loading.
Most people understand that better data will result in better business outcomes. The reverse is also true. A recent survey by Experian found that 95% of organizations believed poor data quality hurts their business.
Digital transformation has put data at the center of every organization.
There’s been an explosion in data over the past few decades. Costs for storage and processing have gone down dramatically. On top of that, the regulatory focus on data quality, policy and governance is higher than ever. For these reasons and more, enterprises have reshaped their business models to harness the great potential of a core business asset: their data.
Successful data transformation lets businesses extract all kinds of data, from everywhere. They can take data — from the cloud, mobile, streaming, IoT, social data, or others — and use it for the good of their business. Feed high-quality, transformed data into different applications to drive better decision making. Streamline operations with machine-to-machine communication free of bugs due to dirty data. In fact, the total reimagining of business in the digital age revolves around data transformation. And what is that but digital transformation?
By transforming your data, you can improve business processes across the entire enterprise. This allows you to:
The scale, automation and trust required by today’s modern enterprise can only be achieved with AI/ML capabilities. AI/ML requires that you normalize or transform your data for one source of truth. Five key reasons organizations transform their data:
Many organizations have achieved dramatic business success with their data transformation efforts. Here are a few examples:
Businesses, markets and technologies evolve and change over time. Your one constant for a sustainable competitive advantage is data. That's why we help you transform it from simply binary information to extraordinary innovation.
Whether you’re driving next-gen analytics, delivering perfectly timed customer experiences or ensuring governance and privacy, you can always know your data is accurate, your insights are actionable and your possibilities are limitless.
Want more information on data transformation and how Informatica can help? Start with these resources: