4.3 Handling Nulls, Data Types, and Formatting
Matillion is an ELT tool, so handling nulls, data types, and formatting is done primarily through SQL functions within transformation components, leveraging the power of the cloud data warehouse .
Handling Data Types
While Matillion often performs automatic type conversion, you can use the Convert Type component for explicit casting .
🡆Convert Type Component: This component provides a graphical interface to explicitly change a column’s data type, such as converting a VARCHAR to an INTEGER. It also allows you to define the size, precision, and scale for numeric or text types to prevent data truncation .
🡆Type Casting with SQL: For more granular control, you can use the CAST() function directly within a Calculator component
🡆Example: To convert a text column containing numbers into an integer, you would use CAST(my_text_column AS INTEGER).
Handling Null Values
Nulls represent missing or unknown values and are not the same as an empty string or a zero.
🡆Replacing Nulls: The COALESCE() function is a common way to handle nulls. It returns the first non-null expression in a list .
🡆Example: To replace null values in a customer_rating column with “N/A,” you would use the expression COALESCE(customer_rating, ‘N/A’) within a Calculator
🡆Filtering Out Nulls: You can use a Filter component to remove rows where a specific column is null .
🡆Example: To remove all sales transactions that do not have an order_id, you would set the filter condition to order_id IS NOT NULL.
🡆Complex Logic: For more complex conditions, you can use a CASE statement in a Calculator component.
🡆Example: To categorize a customer’s feedback as ‘Missing’ if it is null or ‘Found’ if it has a value, you would use the expression CASE WHEN my_column IS NULL THEN ‘Missing’ ELSE ‘Found’ END.
Formatting Data
Formatting data involves standardizing how data is presented, such as capitalizing text or rounding numbers. This is done using SQL functions in a Calculator component .
🡆Formatting Strings: Functions like TRIM(), UPPER(), and LOWER() are used to clean and standardize text data.
🡆Example: To remove extra spaces and standardize all product names to uppercase, you would use UPPER(TRIM(product_name)).
🡆Formatting Dates: You can use your cloud data platform’s date functions to format dates.
🡆Example (Snowflake): The function TO_VARCHAR(my_date_column, ‘YYYY-MM-DD’) converts a date column into a string with a specific format.
🡆Formatting Numbers: Functions like ROUND() and FORMAT() are used to standardize numeric values.
🡆Example: To round a price to two decimal places, you would use ROUND.