Lecture 3 Data Engineering Concepts, Processes, and Tools
Sharing top billing on the list of data science capabilities, machine learning and artificial
intelligence are not just buzzwords: many organizations are eager to adopt them. But before
building intelligent products, you need to gather and prepare the data that fuels AI. A separate
discipline called data engineering lays the necessary groundwork for analytics projects.
Tasks related to it occupy the first three layers of the data science hierarchy of needs
suggested by Monica Rogati.
Within a large organization, there are usually many different types of operations management
software (e.g., ERP, CRM, production systems), each containing databases with varied
information. In addition, data can be stored as separate files or pulled in real time from
external sources such as IoT devices. Having data scattered across different formats prevents
the organization from seeing a clear picture of its business state and running analytics.
Data engineering addresses this problem step by step.
Data ingestion collects data from its various sources and moves it into a centralized
repository where it can be processed.
Data transformation adjusts the disparate data to the needs of end users. It involves removing
errors and duplicates, normalizing the data, and converting it into the needed format.
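As a quick illustration, a transformation step might look like the following sketch in Python
with pandas; the column names and values are invented for the example.

import pandas as pd

# Hypothetical raw records pulled from two overlapping source systems.
raw = pd.DataFrame({
    "customer_id": [101, 102, 102, 103],
    "email": ["A@Example.com", "b@example.com", "b@example.com", "c@example.com"],
    "signup_date": ["2024-01-05", "2024-01-09", "2024-01-09", "2024-02-11"],
})

# Remove exact duplicates introduced by overlapping extracts.
clean = raw.drop_duplicates().copy()

# Normalize values so every consumer sees one consistent representation.
clean["email"] = clean["email"].str.lower()

# Convert date strings into a proper datetime type for downstream use.
clean["signup_date"] = pd.to_datetime(clean["signup_date"])

print(clean)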
Data serving delivers transformed data to end users — a BI platform, dashboard, or data
science team.
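A minimal sketch of what serving can look like, assuming SQLite as a stand-in for a real data
warehouse (the table and file names are invented):

import sqlite3
import pandas as pd

# Illustrative transformed data that is ready to be served.
clean = pd.DataFrame({
    "customer_id": [101, 102],
    "email": ["a@example.com", "b@example.com"],
})

# Persist the data where a BI platform or dashboard can query it.
conn = sqlite3.connect("analytics.db")
clean.to_sql("customers", conn, if_exists="replace", index=False)

# An analyst, dashboard, or data science team then reads it with plain SQL.
print(pd.read_sql("SELECT customer_id, email FROM customers", conn))
conn.close()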
Data flow orchestration provides visibility into the data engineering process, ensuring that
all tasks are successfully completed. It coordinates and continuously tracks data workflows to
detect and fix data quality and performance issues.
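One widely used open-source orchestrator is Apache Airflow. The sketch below shows a minimal
DAG that runs the three steps in order and lets Airflow track each task's status; the DAG id,
schedule, and task bodies are assumptions for illustration (Airflow 2.x API):

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def ingest():
    pass  # pull data from source systems

def transform():
    pass  # remove errors and duplicates, normalize, convert formats

def serve():
    pass  # deliver the transformed data to the BI layer

with DAG(
    dag_id="customer_data_pipeline",  # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    ingest_task = PythonOperator(task_id="ingest", python_callable=ingest)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    serve_task = PythonOperator(task_id="serve", python_callable=serve)

    # Declare the order of the steps; Airflow runs them in sequence
    # and reports each task's success or failure.
    ingest_task >> transform_task >> serve_task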
The mechanism that automates the ingestion, transformation, and serving steps of the data
engineering process is known as a data pipeline.
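In its simplest form, such a pipeline is just a program that runs the steps in sequence on a
schedule; a self-contained sketch (with invented data and names) follows:

import sqlite3
import pandas as pd

def ingest() -> pd.DataFrame:
    # Stand-in for reading from an ERP, CRM, file, or IoT stream.
    return pd.DataFrame({
        "customer_id": [101, 102, 102],
        "email": ["A@Example.com", "b@example.com", "b@example.com"],
    })

def transform(raw: pd.DataFrame) -> pd.DataFrame:
    # Deduplicate and normalize, as described above.
    clean = raw.drop_duplicates().copy()
    clean["email"] = clean["email"].str.lower()
    return clean

def serve(clean: pd.DataFrame) -> None:
    # Load the result where end users can reach it.
    conn = sqlite3.connect("analytics.db")
    clean.to_sql("customers", conn, if_exists="replace", index=False)
    conn.close()

def run_pipeline() -> None:
    serve(transform(ingest()))  # ingestion -> transformation -> serving

if __name__ == "__main__":
    run_pipeline()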