Introduction To ETL in Python: Stefano Francavilla
ETL IN PYTHON
Stefano Francavilla
CEO - Geowox
What is ETL?
Extract, Transform, Load
The scene
Private equity fund called "DataCamp Capital Group" (DCG Capital)
Residential assets
The pipeline
In this lesson
Requests
GET requests.get('<url>')
Common requests attributes
response = requests.get('https://example.com/ny-properties-onsale.csv')
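The attribute list on this slide did not survive extraction. As a minimal sketch, the helper below collects a few attributes that a requests Response object exposes (status_code, headers, content, encoding). The function names fetch and summarize_response are hypothetical and not part of the course material.

```python
def fetch(url):
    """Send a GET request and return the requests.Response object."""
    # Imported inside the function so summarize_response can be tried
    # without the requests package or network access.
    import requests

    response = requests.get(url, timeout=30)
    response.raise_for_status()  # raise an HTTPError for 4xx/5xx status codes
    return response


def summarize_response(response):
    """Collect a few commonly used Response attributes (hypothetical helper)."""
    return {
        "status_code": response.status_code,  # e.g. 200 on success
        "content_type": response.headers.get("Content-Type"),
        "size_in_bytes": len(response.content),  # body as raw bytes
        "encoding": response.encoding,  # encoding used to decode response.text
    }
```

Calling summarize_response(fetch('<url>')) would return these values for a live URL; response.content (bytes) is the attribute to use when the body is a binary file such as a ZIP archive.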
Zipfile
ZipFile class: from zipfile import ZipFile
Read mode
Zipfile: an example
from zipfile import ZipFile
filepath = "/my/custom/path/example.zip"
with ZipFile(filepath, mode='r') as f:
    name_list = f.namelist()
    print("List of files:", name_list)
    extract_path = f.extract(name_list[0], path="/my/custom/path/")
    print("Extract Path:", extract_path)
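The paths in the example are placeholders, so it cannot run as-is. The sketch below repeats the same namelist() and extract() calls against a small archive that it first creates in a temporary directory. The file name sales.csv, its contents, and the helper extract_first_file are made up for illustration.

```python
import os
import tempfile
from zipfile import ZipFile


def extract_first_file(zip_path, target_dir):
    """Extract the first member of the archive and return its path on disk."""
    with ZipFile(zip_path, mode="r") as f:
        name_list = f.namelist()
        return f.extract(name_list[0], path=target_dir)


with tempfile.TemporaryDirectory() as tmp_dir:
    # Build a tiny archive so there is something to read back.
    zip_path = os.path.join(tmp_dir, "example.zip")
    with ZipFile(zip_path, mode="w") as f:
        f.writestr("sales.csv", "County,Price\nDublin,297000\n")

    extracted_path = extract_first_file(zip_path, tmp_dir)
    extracted_name = os.path.basename(extracted_path)
    with open(extracted_path, mode="r", encoding="utf-8") as csv_file:
        extracted_text = csv_file.read()

print(extracted_name)
print(extracted_text)
```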
Let's practice!
Ask the right questions
Where we are in the pipeline
Dataset example
Date of Sale (dd/mm/yyyy) | Address | Postal Code | County | Price (€) | Description of Property
12/02/2021 | 123 WALKINSTOWN PARK, WALKINSTOWN, DUBLIN 12 | Dublin 12 | Dublin | €297,000.00 | Second-Hand Dwelling house /Apartment
04/01/2021 | 12 Oileain Na Cranoige.Cranogue Isl, Balbutcher Lane, BALLYMUN | Dublin 11 | Dublin | €192,951.00 | New Dwelling house /Apartment
Open a file
Built-in open() function
Character Meaning
'r' open for reading (default)
'w' open for writing
Open a file: example
Read mode
Write mode
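The code for this slide was lost in extraction. A minimal sketch of both modes follows, assuming a throwaway file in a temporary directory; the file name notes.txt and its contents are made up.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    filepath = os.path.join(tmp_dir, "notes.txt")

    # Write mode: creates the file, or truncates it if it already exists.
    with open(filepath, mode="w", encoding="utf-8") as f:
        f.write("first line\nsecond line\n")

    # Read mode (the default): iterate over the file line by line.
    with open(filepath, mode="r", encoding="utf-8") as f:
        lines = [line.rstrip("\n") for line in f]

print(lines)
```

Using open() inside a with block closes the file automatically, even if an error occurs while reading or writing.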
CSV module
csv implements classes to read and write tabular data in CSV format
Read in action
Code
Output
OrderedDict([
('Date of Sale (dd/mm/yyyy)', '03/01/2021'),('Postal Code', 'Dublin 4'),
('Address', '16 BURLEIGH COURT, BURLINGTON ROAD, DUBLIN 4'),('County', 'Dublin'),
('Price (€)', '€450,000.00'), ...])
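The code that produced this output is missing from the extracted slide. One plausible way to get rows keyed by column name is csv.DictReader, sketched below on an in-memory sample that reuses the first five columns of the row shown above; a real run would pass an opened CSV file instead. Since Python 3.8, DictReader yields plain dict objects rather than OrderedDict.

```python
import csv
import io

# Sample reusing the first five columns of the row in the output above.
raw_csv = (
    "Date of Sale (dd/mm/yyyy),Postal Code,Address,County,Price (€)\n"
    '03/01/2021,Dublin 4,"16 BURLEIGH COURT, BURLINGTON ROAD, DUBLIN 4",'
    'Dublin,"€450,000.00"\n'
)

# DictReader uses the first line as the keys for every following row.
reader = csv.DictReader(io.StringIO(raw_csv))
rows = list(reader)
first_row = rows[0]

print(first_row["Address"])
print(first_row["Price (€)"])
```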
Write in action
Code
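The code for this slide was also lost. A minimal sketch of writing rows with csv.DictWriter follows, assuming a temporary output file and a single row taken from the dataset example; the file name output.csv is made up.

```python
import csv
import os
import tempfile

fieldnames = ["Date of Sale (dd/mm/yyyy)", "County", "Price (€)"]
rows = [
    {
        "Date of Sale (dd/mm/yyyy)": "03/01/2021",
        "County": "Dublin",
        "Price (€)": "€450,000.00",
    },
]

with tempfile.TemporaryDirectory() as tmp_dir:
    filepath = os.path.join(tmp_dir, "output.csv")

    # newline="" lets the csv module control line endings itself.
    with open(filepath, mode="w", encoding="utf-8", newline="") as csv_file:
        writer = csv.DictWriter(csv_file, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

    with open(filepath, mode="r", encoding="utf-8", newline="") as csv_file:
        written_text = csv_file.read()

print(written_text)
```

The price is quoted in the output file because it contains a comma, which is the field delimiter.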
Let's practice!
Extracting
End goal
Automated pipeline
cron
Command line utility used for scheduling
execute.py
1. extract.py
2. transform.py
3. load.py
E(xtract)TL
In this lesson: E(xtract)
Create a folder
Make sure the downloaded_at folder exists
import os
Allows Python to interact with the operating system
os.makedirs(path, exist_ok=[True|False])
Create a folder: an example
January 1st, 2021
First time we run the cron job
# Create <root>/source/downloaded_at=2021-01-01
path = "root/source/downloaded_at=2021-01-01"
os.makedirs(path, exist_ok=True)
# 1. Create source
# 2. Create downloaded_at=2021-01-01
/source/downloaded_at=2021-01-01/<zipfile_name>.zip
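In a scheduled job the date would not be hard-coded. The sketch below builds the same downloaded_at=<date> folder from a date object and shows that exist_ok=True makes a repeated call harmless. The helper create_download_folder and the use of a temporary root directory are assumptions for illustration.

```python
import os
import tempfile
from datetime import date


def create_download_folder(root, run_date):
    """Create <root>/source/downloaded_at=<YYYY-MM-DD> and return its path."""
    path = os.path.join(root, "source", f"downloaded_at={run_date.isoformat()}")
    # exist_ok=True avoids a FileExistsError if the folder is already there.
    os.makedirs(path, exist_ok=True)
    return path


with tempfile.TemporaryDirectory() as root:
    first = create_download_folder(root, date(2021, 1, 1))
    second = create_download_folder(root, date(2021, 1, 1))  # no error
    folder_exists = os.path.isdir(first)
    relative_path = os.path.relpath(first, root)

print(relative_path)
```

With exist_ok=False (the default), the second call would raise FileExistsError.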
Save ZIP file locally
open()
Commonly used with two arguments: open(filepath, mode)
Character Meaning
'w' open for writing in text format
'wb' open for writing in binary format
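A ZIP archive is binary data, so it has to be written with 'wb'; passing bytes to a file opened with 'w' raises a TypeError. The sketch below uses an in-memory archive as a stand-in for the downloaded bytes (for example response.content). The helper save_bytes and the file names are made up.

```python
import io
import os
import tempfile
from zipfile import ZipFile, is_zipfile


def save_bytes(content, filepath):
    """Write raw bytes (for example response.content) to filepath."""
    with open(filepath, mode="wb") as f:
        f.write(content)


# Stand-in for the downloaded bytes: a small ZIP archive built in memory.
buffer = io.BytesIO()
with ZipFile(buffer, mode="w") as archive:
    archive.writestr("sales.csv", "County,Price\nDublin,297000\n")
zip_bytes = buffer.getvalue()

with tempfile.TemporaryDirectory() as tmp_dir:
    filepath = os.path.join(tmp_dir, "example.zip")
    save_bytes(zip_bytes, filepath)

    # Check that the saved file is still a readable archive.
    saved_is_zip = is_zipfile(filepath)
    with ZipFile(filepath, mode="r") as archive:
        saved_names = archive.namelist()

print(saved_is_zip, saved_names)
```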
Let's practice!
Project folder structure
Extract, Transform and Load
# Import libraries

def methodX():
    # Code here
    pass

def methodY():
    # Code here
    pass

def main():
    methodX()
    methodY()
Execute
# Import extract, transform and load
import extract, transform, load
python execute.py
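The body of execute.py is not shown on the slide. One way it could chain the three steps is sketched below, assuming each module exposes a main() function as in the template above. The helper run_pipeline is hypothetical, and the module calls are left as comments so the sketch runs without the project files.

```python
def run_pipeline(steps):
    """Call each (name, function) pair in order and return the names that ran."""
    completed = []
    for name, step in steps:
        print(f"Running {name}...")
        step()  # an exception here stops the pipeline before later steps run
        completed.append(name)
    return completed


# In execute.py the steps would be the main() functions of the three modules:
#     import extract, transform, load
#     run_pipeline([
#         ("extract", extract.main),
#         ("transform", transform.main),
#         ("load", load.main),
#     ])
```

Running the steps in a fixed order matters because transform depends on the files produced by extract, and load depends on the output of transform.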
Let's practice!