Building Data Pipelines - 3


On the importance of tests

Oliver Willekens
Data Engineer at Data Minded
Software tends to change
Common reasons for change:
- new functionality is desired
- bugs need to get squashed
- performance needs to be improved

Core functionality rarely evolves.

How to ensure stability in light of changes?



Rationale behind testing
- improves the chance of code being correct in the future
- prevents introducing breaking changes
- raises confidence (not a guarantee) that code is correct now:
  assert that actuals match expectations (see the sketch below)
- serves as the most up-to-date documentation:
  a form of documentation that is always in sync with what’s running
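
As a minimal sketch of what “assert actuals match expectations” looks like (the function and numbers here are hypothetical, not course code):

# A hypothetical unit under test, checked expectation-style.
def unit_price(price, quantity):
    return price / quantity

def test_unit_price():
    computed = unit_price(10, 5)
    expected = 2
    assert computed == expected  # fails loudly when behavior drifts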



The test pyramid: where to invest your efforts
Testing takes time:
- thinking about what to test
- writing tests
- running tests

Testing has a high return on investment:
- when targeted at the correct layer
- when testing the non-trivial parts, e.g. the distance between 2 coordinates, rather than trivial ones like uppercasing a first name (sketched below)

(Image: “TestPyramid”, © Martin Fowler)
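
As an illustration of such a non-trivial unit (a sketch; the function, cities, and tolerance are assumptions, not course code):

import math

# Great-circle distance between two coordinates: non-trivial, so worth a test.
def haversine_km(lat1, lon1, lat2, lon2):
    earth_radius_km = 6371.0
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(math.radians(lat1)) * math.cos(math.radians(lat2))
         * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))

def test_haversine_paris_to_brussels():
    # Paris to Brussels is roughly 264 km as the crow flies.
    assert abs(haversine_km(48.86, 2.35, 50.85, 4.35) - 264) < 5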


Let’s have this sink in!
Writing unit tests for PySpark

Oliver Willekens
Data Engineer at Data Minded
Our earlier Spark application is an ETL pipeline



Separate transform from extract and load
prices_with_ratings = spark.read.csv(…)  # extract
exchange_rates = spark.read.csv(…)       # extract

unit_prices_with_ratings = (prices_with_ratings
    .join(…)          # transform
    .withColumn(…))   # transform
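
The load step, the remaining I/O boundary, then comes last (a sketch; the output path and format are assumptions, not course code):

# Load - the remaining I/O boundary (output path and format are assumptions)
unit_prices_with_ratings.write.mode("overwrite").parquet("/tmp/unit_prices")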



Solution: construct DataFrames in-memory

# Extract the data
df = spark.read.csv(path_to_file)

- depends on input/output (network access, filesystem permissions, …)
- unclear how big the data is
- unclear what data goes in

from pyspark.sql import Row

purchase = Row("price",
               "quantity",
               "product")
record = purchase(12.99, 1, "cake")
df = spark.createDataFrame((record,))

+ inputs are clear
+ data is close to where it is being used (“code-proximity”)
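
These snippets assume an active SparkSession named spark; in a pytest suite it is commonly supplied by a session-scoped fixture (a sketch of an assumed setup, not course code):

# conftest.py - a sketch of a shared SparkSession fixture (assumed setup)
import pytest
from pyspark.sql import SparkSession

@pytest.fixture(scope="session")
def spark():
    session = (SparkSession.builder
               .master("local[2]")       # small local runtime keeps tests fast
               .appName("pipeline-tests")
               .getOrCreate())
    yield session
    session.stop()  # release resources once the whole test session ends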



Create small, reusable and well-named functions
One long, chained expression:

unit_prices_with_ratings = (prices_with_ratings
    .join(exchange_rates, ["currency", "date"])
    .withColumn("unit_price_in_euro",
                col("price") / col("quantity")
                * col("exchange_rate_to_euro")))

can be split into small, named functions:

def link_with_exchange_rates(prices, rates):
    return prices.join(rates, ["currency", "date"])

def calculate_unit_price_in_euro(df):
    return df.withColumn(
        "unit_price_in_euro",
        col("price") / col("quantity") * col("exchange_rate_to_euro"))


These functions then compose at the call site:

unit_prices_with_ratings = calculate_unit_price_in_euro(
    link_with_exchange_rates(prices, exchange_rates))


Testing a single unit

from pyspark.sql import Row
from pyspark.testing import assertDataFrameEqual  # shipped with PySpark since 3.5

def test_calculate_unit_price_in_euro():
    record = dict(price=10,
                  quantity=5,
                  exchange_rate_to_euro=2.)
    df = spark.createDataFrame([Row(**record)])
    result = calculate_unit_price_in_euro(df)

    expected_record = Row(**record, unit_price_in_euro=4.)
    expected = spark.createDataFrame([expected_record])

    assertDataFrameEqual(result, expected)
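
(With a fixture-based setup like the conftest.py sketch earlier, spark would arrive as a test argument, i.e. def test_calculate_unit_price_in_euro(spark):, rather than as a global.)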



Take home messages
1. Interacting with external data sources is costly.
2. Creating in-memory DataFrames makes testing easier:
   - the data is in plain sight,
   - the focus is on just a small number of examples.
3. Creating small and well-named functions leads to more reusability and easier testing.



Let’s practice!
Continuous testing

Oliver Willekens
Data Engineer at Data Minded
Running a test suite
Execute tests in Python with one of:

- in stdlib: unittest, doctest
- 3rd party: pytest, nose

Core task: assert or raise

Examples:

assert computed == expected

with pytest.raises(ValueError):  # pytest specific; completed in the sketch below
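
For example (a hypothetical helper, not course code), the context-manager form asserts that the right exception is raised:

import pytest

# Hypothetical helper: quantities must be positive integers.
def parse_quantity(raw):
    quantity = int(raw)
    if quantity <= 0:
        raise ValueError(f"quantity must be positive, got {quantity}")
    return quantity

def test_parse_quantity_rejects_zero():
    # The test passes only if ValueError is raised inside the block.
    with pytest.raises(ValueError):
        parse_quantity("0")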



Manually triggering tests
In a Unix shell:

cd ~/workspace/my_good_python_project
pytest .
# Lots of output…
== 19 passed, 2 warnings in 36.80 seconds ==

cd ~/workspace/my_bad_python_project
pytest .
# Lots of output…
== 3 failed, 1 passed in 6.72 seconds ==

Note: Spark increases time to run unit tests.



Automating tests
Problem:
- forgetting to run the unit tests when making changes

Solution:
- automation

How:
- Git: configure hooks (sketched below)
- configure a CI/CD pipeline to run tests automatically
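
As one sketch of such a Git hook (hooks can be any executable; the path and contents here are assumptions, not course code), an executable .git/hooks/pre-push can run the suite and abort the push when it fails:

#!/usr/bin/env python
# .git/hooks/pre-push - a sketch: run the test suite, abort the push on failure
import subprocess
import sys

# A non-zero exit code makes Git cancel the push.
sys.exit(subprocess.call(["pytest", "."]))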



CI/CD
Continuous Integration:
- get code changes integrated with the master branch regularly

Continuous Delivery:
- create “artifacts” (deliverables like documentation, but also programs) that can be deployed into production without breaking things



Configuring a CI/CD tool
CircleCI looks for .circleci/config.yml.

Example:

jobs:
  test:
    docker:
      - image: circleci/python:3.6.4
    steps:
      - checkout
      - run: pip install -r requirements.txt
      - run: pytest .

Often:
1. checkout code
2. install test & build requirements
3. run tests
4. package/build the software artefacts
5. deploy the artefacts (update docs / install app / …)



Let’s practice!
