4. Data Transformation With Matillion
Data transformation with Matillion is the "T" of the ELT (Extract, Load, Transform) process. Instead of transforming data before loading it, Matillion runs transformations directly inside the target cloud data warehouse, using the warehouse's own compute. You do this by building visual transformation pipelines that clean, combine, and reshape data to prepare it for analysis.
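To make the ELT idea concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for a cloud data warehouse. This is an assumption for illustration only: Matillion targets real cloud warehouses and its components issue the SQL for you. The point is the order of operations. Raw rows are loaded unchanged, and the cleaning then happens as SQL executed by the database engine, not in client code. The table names `raw_orders` and `clean_orders` and the function `run_elt` are hypothetical.

```python
import sqlite3


def run_elt(raw_rows):
    """Load raw rows unchanged, then transform them inside the database (ELT)."""
    conn = sqlite3.connect(":memory:")  # stand-in for the cloud data warehouse

    # Extract + Load: land the data as-is, with no cleaning on the client side.
    conn.execute("CREATE TABLE raw_orders (order_id INTEGER, amount TEXT, country TEXT)")
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", raw_rows)

    # Transform: the database engine does the work, driven by SQL.
    conn.execute(
        """
        CREATE TABLE clean_orders AS
        SELECT order_id,
               CAST(amount AS REAL) AS amount,   -- fix the data type
               UPPER(TRIM(country)) AS country   -- normalize text values
        FROM raw_orders
        WHERE amount IS NOT NULL                 -- drop incomplete rows
        """
    )
    rows = conn.execute(
        "SELECT order_id, amount, country FROM clean_orders ORDER BY order_id"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    raw = [(1, "20.50", " us "), (2, None, "de"), (3, "5", "Fr")]
    print(run_elt(raw))  # [(1, 20.5, 'US'), (3, 5.0, 'FR')]
```

In a real ELT setup, the warehouse, not the machine running the pipeline, does the heavy lifting, which is why this pattern scales with the warehouse's compute.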
Building a Transformation Pipeline
A transformation pipeline in Matillion is a graphical workflow that defines the steps to process your data. Here is a step-by-step guide to building one:
🡆Start a new pipeline: In the Matillion designer, navigate to the Pipelines pane and click the Add button to create a new transformation pipeline. Give it a descriptive name that identifies its purpose.
🡆Add a Read component: Every pipeline must start with a component that reads data from a source. A common choice is the Table Input component, which pulls data that has already been loaded into a source table in your data warehouse.
🡆Transform the data: Drag and drop transformation components from the Components pane onto the canvas to manipulate the data. These components are the building blocks of your data cleaning and enrichment process.
- Example: To combine the first_name and last_name columns into a single full_name column, you would add a Calculator component and define an expression that concatenates the two columns with a space between them.
- Example: To find and remove duplicate customer records, you might use a Rank component to number each customer's rows (for example, by most recent update) and then a Filter component to keep only the first-ranked row for each customer.
🡆Validate and review: At any point in the pipeline, you can use the Data Sample feature to see how your data looks after the transformations up to that component have been applied. This helps you verify that the transformations are working as expected before finalizing the process.
🡆Save the results: Once all transformations are complete, add a write component, such as Rewrite Table or Table Output, to save the processed data to a table or view in your data warehouse.
🡆Run or integrate: You can run the pipeline manually with the Run button or integrate it into a larger orchestration pipeline using the Run Transformation component.
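The steps above map naturally onto SQL that runs inside the warehouse. The sketch below is an illustration under stated assumptions, not the SQL Matillion actually generates. It again uses sqlite3 as a stand-in warehouse (window functions need SQLite 3.25 or later), and the table and column names (`customers_raw`, `customers_clean`, `customer_id`, `updated_at`) and helper functions (`build_pipeline`, `sample`, `write_table`) are hypothetical. Each stage of the query is commented with the step it roughly corresponds to: reading the source table, deriving full_name, ranking each customer's rows by recency, filtering to one row per customer, and finally writing the result. `sample()` mimics checking a data sample before anything is written.

```python
import sqlite3

# Hypothetical source data: customer 1 appears twice; the newer row should win.
SOURCE_ROWS = [
    (1, "Ada", "Lovelace", "2024-01-05"),
    (1, "Ada", "King", "2024-03-01"),
    (2, "Alan", "Turing", "2024-02-10"),
]

# Read -> Calculate -> Rank -> Filter, expressed as one pushed-down query.
TRANSFORM_SQL = """
WITH table_input AS (              -- Read: rows already loaded in the warehouse
    SELECT customer_id, first_name, last_name, updated_at
    FROM customers_raw
),
calculated AS (                    -- Calculate: derive full_name from two columns
    SELECT *, first_name || ' ' || last_name AS full_name
    FROM table_input
),
ranked AS (                        -- Rank: number each customer's rows, newest first
    SELECT *, ROW_NUMBER() OVER (
        PARTITION BY customer_id ORDER BY updated_at DESC
    ) AS rn
    FROM calculated
)
SELECT customer_id, full_name, updated_at
FROM ranked
WHERE rn = 1                       -- Filter: keep one row per customer
ORDER BY customer_id
"""


def build_pipeline(conn):
    """Load the hypothetical source table and return the transformation query."""
    conn.execute(
        "CREATE TABLE customers_raw "
        "(customer_id INTEGER, first_name TEXT, last_name TEXT, updated_at TEXT)"
    )
    conn.executemany("INSERT INTO customers_raw VALUES (?, ?, ?, ?)", SOURCE_ROWS)
    return TRANSFORM_SQL


def sample(conn, query, limit=10):
    """Preview up to `limit` transformed rows without writing anything."""
    return conn.execute(query).fetchmany(limit)


def write_table(conn, query, target):
    """Write: materialize the result as a table (target must be a trusted name)."""
    conn.execute(f"DROP TABLE IF EXISTS {target}")
    conn.execute(f"CREATE TABLE {target} AS {query}")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for the cloud data warehouse
    query = build_pipeline(conn)
    print(sample(conn, query))  # validate before writing
    write_table(conn, query, "customers_clean")
    print(conn.execute("SELECT COUNT(*) FROM customers_clean").fetchone())
    conn.close()
```

`ROW_NUMBER()` is used here rather than `RANK()` so that two rows with the same `updated_at` still leave exactly one row per customer. With `RANK()`, ties would share rank 1 and both rows would pass the filter.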