Matillion ETL Transformations
Agenda
• Evolution of ETL
ETL Solution and Transformations
a) Orchestrations
b) Transformations
Orchestration Components
Matillion provides an extensive list of Orchestration components:
• AWS Components: Specific to AWS services such as SNS and SQS.
• Connectors: Query and load data directly from services such as Apache Hive, Google
BigQuery, and Facebook Query. Other connectors unload data to services such as
Amazon S3.
• DDL and Transactions: Perform DDL operations and transactions on the data warehouse (DWH).
• Scripting: Run Python or Bash scripts directly from the orchestration job.
• Flow: Control the flow of job execution.
• Iteration: Loop over other components.
• Variables: Manipulate the various types of variables available in Matillion.
Python Script Component
This component runs a Python script within the Orchestration job.
• The script is executed in-process by an interpreter of the user's choice (Jython, Python 2 or
Python 3). Any output written via print statements appears as the task completion
message, so output should be brief.
• This component can also be used to manipulate the values of environment and job variables.
• To access the database defined in the current environment, use the 'cursor' object
provided (Jython only).
• Additional Python modules may be installed by running the pip command.
• The Python Script component can carry out custom tasks for which Matillion has no default
component, e.g. moving or archiving files in an S3 bucket, placing messages on message
queues, or calling external APIs.
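The variable and cursor bullets above can be sketched as follows. This is a minimal sketch, assuming the Jython interpreter and assuming that Matillion's built-in `context` object exposes `cursor()` and `updateVariable()` as described in the Matillion ETL documentation; verify both against your version. The function name, table name and variable name are hypothetical.

```python
# Sketch of logic for a Python Script component (Jython interpreter assumed).
# Inside Matillion the `context` object is supplied by the component itself;
# here it is passed in as a parameter so the logic can be tried elsewhere.

def store_row_count(context, table_name, variable_name):
    """Count the rows in a table and save the count in a job variable."""
    cursor = context.cursor()  # assumed API: database cursor, Jython only
    cursor.execute("SELECT COUNT(*) FROM " + table_name)
    row_count = cursor.fetchone()[0]
    # Assumed API: updates a job or environment variable for later components.
    context.updateVariable(variable_name, str(row_count))
    # Printed text becomes the task completion message, so keep it brief.
    print("rows: %s" % row_count)
    return row_count
```

In a real component the call would be something like `store_row_count(context, 'orders', 'row_count')`, where `row_count` is a job variable that already exists.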
Python Script Example
This script places a message on an Amazon SQS queue using boto3, the AWS SDK for
Python.
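A minimal sketch of such a script, not the original one, might look like this. It assumes boto3 is installed on the Matillion instance and that the instance has credentials allowing `sqs:SendMessage`; the queue URL, region and payload are placeholders.

```python
import json


def send_sqs_message(sqs_client, queue_url, payload):
    """Serialise `payload` as JSON and place it on the given SQS queue."""
    response = sqs_client.send_message(
        QueueUrl=queue_url,
        MessageBody=json.dumps(payload),
    )
    # SQS returns the ID it assigned to the message.
    return response["MessageId"]


# Usage inside a Python Script component (placeholder values):
#
#   import boto3
#   client = boto3.client("sqs", region_name="eu-west-1")
#   message_id = send_sqs_message(
#       client,
#       "https://sqs.eu-west-1.amazonaws.com/123456789012/example-queue",
#       {"job": "load_sales", "status": "done"},
#   )
#   print(message_id)
```

Passing the client in as a parameter keeps the function easy to test with a stand-in object before it is pointed at a real queue.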
SQL Script Component
This component allows you to write your own custom SQL script within the Orchestration job.
• Custom database maintenance tasks can be executed from within this script.
• It allows multiple SQL statements to be executed within a single script.
• It is useful in scenarios where custom logic is needed for update/delete statements.
• When used with Redshift, this component can specify custom queries for unloading
data to S3 buckets.
• Environment and job variables can be used within the scripts. References to these
variables are substituted at run time, which helps in parameterizing the queries (dynamic
SQL).
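The run-time substitution described above can be imitated outside Matillion with Python's `string.Template`, which uses the same `${name}` notation that Matillion is assumed to use for variable references. This is only an illustration of the idea, not Matillion's own mechanism; the table name, column name and variable names are hypothetical.

```python
from string import Template

# A parameterized statement as it might be written in the SQL Script component.
sql_script = Template(
    "DELETE FROM ${target_table} WHERE load_date < '${cutoff_date}';"
)

# Stand-ins for job variables that Matillion would resolve at run time.
job_variables = {"target_table": "sales_staging", "cutoff_date": "2020-01-01"}

resolved_sql = sql_script.substitute(job_variables)
print(resolved_sql)
# DELETE FROM sales_staging WHERE load_date < '2020-01-01';
```

Because the values are pasted into the statement as text, variables that feed a query should come from trusted sources.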
Transformation Components
Shared Jobs
• Apart from the browser-based GUI, Matillion also provides an API to interact with Matillion Hub.
• The Matillion ETL API lets you interact with Matillion ETL programmatically, extend the
functionality of the product, and perform high-volume data transfers.
• The Matillion ETL API is a standard REST-based API that uses HTTP or HTTPS requests to
GET, POST, and DELETE data. The API service is accessed through a Uniform Resource Identifier
(URI).
• The Matillion API can be used to:
o Run Matillion jobs.
o Export information (task history and job metadata).
o Export and import user configuration.
o Update or create Matillion schedules.
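As a rough sketch of the first use case, the snippet below builds the URL for running a job and sends it as a POST request with basic authentication. The path layout (`/rest/v1/group/name/.../job/name/.../run?environmentName=...`) is an assumption based on the v1 Matillion ETL API and should be checked against the documentation for your instance; the host, names and credentials are placeholders.

```python
import base64
import urllib.request
from urllib.parse import quote, urlencode


def build_run_job_url(base_url, group, project, version, job, environment):
    """Build the (assumed) v1 endpoint URL that starts a job run."""
    path = (
        "/rest/v1/group/name/{}/project/name/{}"
        "/version/name/{}/job/name/{}/run"
    ).format(*(quote(part, safe="") for part in (group, project, version, job)))
    query = urlencode({"environmentName": environment})
    return base_url.rstrip("/") + path + "?" + query


def run_job(url, user, password):
    """POST to the run endpoint using HTTP basic authentication."""
    credentials = "{}:{}".format(user, password).encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    request = urllib.request.Request(
        url,
        data=b"",
        method="POST",
        headers={"Authorization": "Basic " + token},
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


# Usage (placeholder values):
#   url = build_run_job_url("https://etl.example.com", "Analytics",
#                           "My Project", "default", "load_sales", "prod")
#   print(run_job(url, "api-user", "secret"))
```

Keeping the URL builder separate from the network call makes it possible to check the generated address before any request is sent.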
THANK YOU