Matillion ETL Transformations

This document provides an overview of Matillion ETL, including its evolution, architecture, integration capabilities, ETL solutions and transformations, and implementation for Amazon Redshift. It describes Matillion's orchestration and transformation jobs, components, shared jobs, and API.

Uploaded by

Rahul Pant

August 2021

POV – Matillion ETL

Agenda

• Evolution of ETL

• Matillion ETL: Introduction and Architecture

• Supported Integration Capabilities

• ETL Solutions and Transformations

• Implementation of Matillion ETL for Amazon Redshift


Evolution

ETL Solutions and Transformations

• Building ETL solutions on Matillion involves two main types of jobs:

a) Orchestrations

b) Transformations

• These jobs can be organized into folders.

• Each Orchestration job can contain other orchestrations or Transformation jobs.

• A Transformation job contains only transformation components.

Orchestration Components
Matillion provides an extensive list of Orchestration components:
• AWS Components: specific to AWS services such as SNS and SQS.
• Connectors: query and load data directly from services such as Apache Hive, Google BigQuery, Facebook Query, etc. Other connectors unload data to services such as Amazon S3.
• DDL and Transactions: perform DDL operations and transactions on the DWH.
• Scripting: run Python/Bash scripts directly from the orchestration job.
• Flow: control the flow of job execution.
• Iteration: iterate over other components.
• Variables: manipulate the various types of variables available in Matillion.
Python Script Component
This component runs a Python script within the Orchestration job.
• The script is executed in-process by an interpreter of the user's choice (Jython, Python 2 or Python 3). Any output written via print statements appears as the task completion message, so output should be brief.
• This component can also be used to manipulate the values of environment and job variables.
• To access the database defined in the current environment, we can use the 'cursor' object provided (Jython only).
• Additional Python modules may be installed by running the pip command.
• A Python script can carry out custom tasks for which Matillion has no default component. Ex: move/archive files in an S3 bucket, place messages on message queues, make external API calls, etc.
Python Script Example

This script places a message on an Amazon SQS queue using boto3, the AWS SDK for Python.
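
A minimal sketch of a script of this kind is given here. The queue URL, region, job name, message layout, and the helper names build_message and send_job_message are hypothetical and chosen only for illustration; the only boto3 call used is the SQS client's send_message.

```python
import json


def build_message(job_name, status):
    """Serialize a small status payload (hypothetical layout) as JSON."""
    return json.dumps({"job": job_name, "status": status})


def send_job_message(sqs_client, queue_url, job_name, status):
    """Send one message to the queue and return the MessageId from the response."""
    response = sqs_client.send_message(
        QueueUrl=queue_url,
        MessageBody=build_message(job_name, status),
    )
    return response["MessageId"]


def main():
    # boto3 is imported here so the helpers above can be used without it.
    import boto3

    # Assumed region and queue URL; replace with real values.
    sqs = boto3.client("sqs", region_name="us-east-1")
    queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/example-queue"

    message_id = send_job_message(sqs, queue_url, "load_orders", "SUCCESS")
    # Keep the output short: it becomes the task completion message.
    print("Sent SQS message " + message_id)
```

Inside a Python Script component, the script body would end with a call to main(). Passing the client in as an argument keeps the sending logic easy to check with a stand-in client.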

SQL Script Component
This component allows you to write your own custom SQL script within the Orchestration job.
• Custom database maintenance tasks can be executed from within this script.
• It allows us to execute multiple SQL statements within a single script.
• It is useful in scenarios where custom logic is needed for update/delete statements.
• With Redshift, this component can be used to specify custom queries for unloading data to S3 buckets.
• Environment and job variables can be used within the scripts. References to these variables are substituted at run time, which helps in parameterizing the queries (dynamic SQL).
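
The run-time substitution described above can be pictured with a short sketch. This is not Matillion's implementation: it uses Python's string.Template, which happens to accept the same ${name} reference style, to show how a parameterized Redshift UNLOAD statement turns into concrete SQL. The variable names (schema, table, bucket, prefix, iam_role) and their values are hypothetical.

```python
from string import Template

# Parameterized UNLOAD statement; each ${...} is a variable reference.
UNLOAD_SQL = Template(
    "UNLOAD ('SELECT * FROM ${schema}.${table}') "
    "TO 's3://${bucket}/${prefix}' "
    "IAM_ROLE '${iam_role}'"
)


def render_unload(variables):
    """Replace every ${name} reference; raises KeyError if a variable is missing."""
    return UNLOAD_SQL.substitute(variables)


example_variables = {
    "schema": "public",
    "table": "orders",
    "bucket": "my-bucket",
    "prefix": "exports/orders_",
    "iam_role": "arn:aws:iam::123456789012:role/unload",
}
rendered_sql = render_unload(example_variables)
```

Changing only the variable values (for example per environment) produces a different statement from the same script text, which is the point of parameterizing the query.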

Transformation Components

There are a number of transformation components provided by Matillion:


• Read: various components to input data into the pipeline.
• Join: components to join / unite various types of input or other components.
• Transform: a list of components to transform the data, ranging from aggregations, filters, transpose, lead/lag, window calculation and many more.
• Write: write the output to target tables.
• Wizard: wizard to replicate the data to an external table.

Shared Jobs

• Along with Orchestration and Transformation jobs, Matillion provides the functionality to convert a predefined pipeline into a shared job component.

• A shared job can be used within an orchestration job. This ensures reusability and encapsulation of existing workflows.

• If a set of transformations is common to various jobs, it makes sense to group these transformations under a single shared job.


Matillion ETL API

• Apart from the browser-based GUI, Matillion also provides an API to interact with Matillion Hub.
• The Matillion ETL API lets you interact with Matillion ETL programmatically, extend the functionality of the product, and perform high-volume data transfers.
• The Matillion ETL API is a standard REST-based API that uses HTTP or HTTPS requests to GET, POST, and DELETE data. The API service is accessed through a Uniform Resource Identifier (URI).
• The Matillion API can be used to:
o Run Matillion jobs.
o Export information (task history and job metadata).
o Export and import user configuration.
o Update or create Matillion schedules.
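
As an illustration of the first use, the sketch below builds a POST request that asks an instance to run a job. The endpoint path is an assumption based on the URL pattern of the Matillion ETL v1 REST API and should be checked against the API documentation of the instance in use. The host, group, project, version, job, environment, and credentials are hypothetical, and HTTP basic authentication is assumed.

```python
import base64
import urllib.parse
import urllib.request


def build_run_job_request(base_url, group, project, version, job,
                          environment, user, password):
    """Build (but do not send) a POST request that runs a job.

    The /rest/v1/.../run path layout is an assumption; verify it first.
    """
    parts = [("group", group), ("project", project),
             ("version", version), ("job", job)]
    # Percent-encode each name so spaces and slashes are safe in the path.
    path = "/rest/v1" + "".join(
        "/{}/name/{}".format(kind, urllib.parse.quote(name, safe=""))
        for kind, name in parts
    )
    query = urllib.parse.urlencode({"environmentName": environment})
    url = "{}{}/run?{}".format(base_url.rstrip("/"), path, query)

    # HTTP basic authentication header (assumed authentication scheme).
    token = base64.b64encode(
        "{}:{}".format(user, password).encode("utf-8")
    ).decode("ascii")
    return urllib.request.Request(
        url,
        data=b"",
        method="POST",
        headers={"Authorization": "Basic " + token},
    )


request = build_run_job_request(
    "https://matillion.example.com", "Analytics", "Sales DW", "default",
    "load_orders", "prod", "api_user", "secret",
)
```

The request is only constructed here; passing it to urllib.request.urlopen would send it to the instance.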
THANK YOU