Data engineering Flow-

Data engineering serves as a bridge between data producers and consumers, facilitating the management of large volumes of data generated by various applications. The end-to-end data pipeline involves data ingestion, ETL processes, and storage solutions, with a focus on both structured and unstructured data. Key responsibilities include effective communication, cost management, data architecture, and ensuring data reliability and security.

Generation → Ingestion → Transformation → Serving.

Data generation applications - MySQL, PostgreSQL, MongoDB, and third-party applications like Stripe, Salesforce, Google Analytics, etc.

Upstream (data producers) → Data manipulation (data engineering) → Downstream (data consumers)

Data engineering is the bridge between data producers and data consumers.
Here, data producers on different platforms generate data, and consumers use it for
analysis and decision making.

Data engineering can be done on streamed data (Kafka & Flink) as well as stored data.

Why is data engineering important?

Many platforms and apps are in use nowadays and produce large amounts of data;
data engineering helps manage that data.

End-to-end data pipeline-

- The process of receiving data and storing it is called data ingestion. A minimal sketch is shown below.
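
A minimal ingestion sketch, assuming the third-party requests package and a hypothetical API endpoint and file name: pull data from a source system and store the raw response for later processing.

import json
import requests  # assumed to be installed; not part of the original notes

def ingest(url, out_path):
    # Receive the data from the upstream source...
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    # ...and store it as-is (a raw landing zone for later transformation).
    with open(out_path, "w") as f:
        json.dump(response.json(), f)

ingest("https://api.example.com/orders", "raw_orders.json")  # hypothetical endpoint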

Compute vs Storage:

At the top of the data pipeline is compute and at the bottom is storage.

* MPP - Massively parallel processing.

It can process large amounts of data very quickly by splitting the dataset into smaller chunks and processing them in parallel.
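
A rough single-machine sketch of the MPP idea, using only Python's standard library (the dataset and chunk size are made up for illustration): split the dataset into chunks, process the chunks in parallel, then combine the partial results.

from multiprocessing import Pool

def process_chunk(chunk):
    # Hypothetical per-chunk work: here we just sum the values.
    return sum(chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    chunk_size = 250_000
    # Split the dataset into smaller chunks...
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

    # ...and process them in parallel across worker processes.
    with Pool(processes=4) as pool:
        partial_sums = pool.map(process_chunk, chunks)

    print(sum(partial_sums))  # combine the partial results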

ETL -
E - Extraction: the process of taking data out of its sources.
T - Transformation: reshaping the data into more usable formats.
L - Loading: loading the data into storage.
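
A minimal ETL sketch using only the standard library; the file names and field names ("orders.csv", "name", "amount") are hypothetical, just to show the three steps.

import csv
import json

def extract(path):
    # E: take the data out of the source (here, a CSV file).
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(rows):
    # T: reshape the data into a more usable format (trim names, cast types).
    return [
        {"name": r["name"].strip().title(), "amount": float(r["amount"])}
        for r in rows
    ]

def load(rows, path):
    # L: load the transformed data into storage (here, a JSON file).
    with open(path, "w") as f:
        json.dump(rows, f, indent=2)

if __name__ == "__main__":
    load(transform(extract("orders.csv")), "orders_clean.json")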

On-premises - the company purchases its own hardware and stores data on its own premises.
• Hadoop - allows data engineers to handle data at terabyte and petabyte scale.

Modern data stack: made up of collections of open-source platforms and third-party tools that connect together.

* Data maturity - determines the complexity of an organisation's data processes and pipelines.

Simply, how data is used by a given organisation as the amount of data grows: Start → Scale → Lead.

Responsibilities of data engineering →

Communicate with both technical and non-technical stakeholders.
Understand how to store and manage data.
Minimise cost and work within a budget.
Create good data architecture.
Build and manage for reliability.
Security, data governance, automation and observability.

Structured data (row-based) -
SQL;
used for BI and ML to detect small features.

Unstructured data (column-based) -
audio, video, images, log files;
deep learning and neural networks to detect large and micro features.

Event streams -

Producers (websites) - produce the data.

Event broker - storage and distribution.

Event consumers - consume the data in real time.
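
A minimal sketch of the producer → broker → consumer pattern, assuming the third-party kafka-python package and a Kafka broker running on localhost:9092 (both assumptions; the topic name and event fields are made up).

import json
from kafka import KafkaProducer, KafkaConsumer

# Producer side: a website/app emits events to the broker.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("page_views", {"user_id": 42, "page": "/home"})
producer.flush()

# Consumer side: reads events from the broker in (near) real time.
consumer = KafkaConsumer(
    "page_views",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    auto_offset_reset="earliest",
)
for event in consumer:
    print(event.value)  # e.g. {'user_id': 42, 'page': '/home'}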

Challenges:-
Messages come in asynchronously.
Ordering.
Duplication of data.

Idempotency: an operation is idempotent when the same result comes out no matter how many
times you run it. (Important for managing duplicate data.)
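
A minimal sketch of an idempotent consumer: handling the same message twice leaves the final state unchanged, which is how duplicate deliveries are managed. The message shape and in-memory stores are hypothetical (in practice the processed-IDs set would live in a durable store such as a database or Redis).

processed_ids = set()   # which message IDs have already been applied
balances = {}           # hypothetical downstream state

def handle_payment(message):
    if message["id"] in processed_ids:
        return  # duplicate delivery: already applied, do nothing
    balances[message["user"]] = balances.get(message["user"], 0) + message["amount"]
    processed_ids.add(message["id"])

event = {"id": "evt-1", "user": "alice", "amount": 100}
handle_payment(event)
handle_payment(event)   # re-delivered duplicate is ignored
print(balances)         # {'alice': 100} no matter how many times it is delivered
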
Popular event streaming platforms - AWS SQS, Amazon Kinesis, RabbitMQ, Kafka, Pulsar, Spark.

Storage:- central part of the data pipeline.

HDD - hard disk drive - a traditional magnetic drive with a rotating disk and arm.

SSD - solid state drive - a faster kind of drive (no moving parts).

RAM - faster than SSD in terms of latency (temporary memory).

Networking and cloud storage.

Serialisation: turning data into a byte stream so it can easily be saved and transported.

Data is serialised into a standard format, sent around, and deserialised on the receiving end.

Row-based - XML, JSON, CSV

Column-based - Parquet, ORC
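
A minimal sketch of row-based vs column-based serialisation. The JSON part uses only the standard library; the Parquet part assumes the third-party pyarrow package is installed, and the records and file names are made up.

import json
import pyarrow as pa
import pyarrow.parquet as pq

rows = [
    {"user": "alice", "amount": 100},
    {"user": "bob", "amount": 250},
]

# Row-based: each record is serialised as one unit (JSON lines).
with open("events.jsonl", "w") as f:
    for row in rows:
        f.write(json.dumps(row) + "\n")

# Column-based: values are grouped by column (Parquet), which compresses well
# and lets analytical queries read only the columns they need.
table = pa.Table.from_pylist(rows)
pq.write_table(table, "events.parquet")
print(pq.read_table("events.parquet").to_pylist())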

• Single machine vs distributed storage

Vertical scaling (a bigger machine) and horizontal scaling (more machines).

Strong vs eventual consistency:

Strong consistency - the system doesn't allow read operations until all the nodes with replicated data are updated.

Eventual consistency - user read requests are not halted until all the replicas are updated; instead the update process happens eventually. Some users might receive old data, but eventually all replicas hold the latest data.

ACID vs BASE -
ACID - single machine, strong consistency.
BASE - distributed systems, eventual consistency.

Storage systems:-

File storage - local files, NAS, cloud storage.

Block storage - HDD/SSD, AWS EBS.
Object storage - AWS S3.
Cache storage - RAM, Redis.
Streaming storage - buffering.
