
DATA ENGINEERING
TERMS YOU NEED TO KNOW
PART 1

1. Data Pipeline

[Diagram: Collection → Ingestion → Preparation → Computation → Presentation]

A data pipeline is a series of interconnected processes that automate the flow of data from various sources, through transformation steps, and into a destination system, such as a data warehouse, data lake, or analytics platform. Pipelines ensure that data is collected, processed, and made available for analysis with minimal manual intervention.
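
As a minimal sketch in Python, here is the idea reduced to three hypothetical stages (the source data and destination are invented stand-ins):

```python
# A toy pipeline: each stage is a plain function, and the pipeline
# is just their composition applied to incoming records.

def extract():
    # Stand-in for reading from a real source (API, file, database).
    return [{"user": "a", "amount": "10"}, {"user": "b", "amount": "25"}]

def transform(records):
    # Convert raw strings into typed, analysis-ready values.
    return [{**r, "amount": int(r["amount"])} for r in records]

def load(records, destination):
    # Stand-in for writing to a warehouse table or file.
    destination.extend(records)

warehouse = []
load(transform(extract()), warehouse)
print(warehouse)
```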

2. ETL
(Extract, Transform, Load)

[Diagram: Extract data → Transform → Load into database]

ETL is a data integration process involving three key stages:

Extract: Gathering data from diverse sources, such as databases, APIs, or files.
Transform: Cleaning, validating, and converting the data into the desired format or structure.
Load: Moving the transformed data into a destination system like a data warehouse or database for further analysis.
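
A compact illustration in Python, assuming a hypothetical CSV source and an in-memory SQLite database as the destination:

```python
import csv, io, sqlite3

# Extract: read raw rows from a CSV source (here an in-memory string).
raw = io.StringIO("name,age\nAda,36\nGrace,45\n")
rows = list(csv.DictReader(raw))

# Transform: validate and convert types.
cleaned = [{"name": r["name"].strip(), "age": int(r["age"])} for r in rows]

# Load: insert the transformed rows into a database table.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE people (name TEXT, age INTEGER)")
db.executemany("INSERT INTO people VALUES (:name, :age)", cleaned)
print(db.execute("SELECT * FROM people").fetchall())
```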

3. Data Lake

A data lake is a large, centralized repository that stores vast amounts of unstructured, semi-structured, and structured data at scale. Unlike traditional databases, data lakes hold raw data in its native form, enabling storage flexibility and easier access for machine learning, analytics, and future processing.

4. Data Warehouse

A data warehouse is a centralized system designed for storing structured data from different sources, optimized for querying and analysis. Data warehouses typically follow a schema-on-write approach, where data is pre-processed and structured for high-performance analytics, commonly used in business intelligence (BI).

5. Data Governance

Data governance refers to the processes, policies, and standards implemented to ensure that data is accurate, accessible, secure, and used responsibly across an organization. Effective governance includes data quality management, compliance with regulations, and ensuring the proper handling of sensitive data.

6. Data Quality

Data quality refers to the condition of data, focusing on factors such as accuracy, completeness, consistency, timeliness, and reliability. High-quality data is essential for ensuring accurate analysis, business intelligence, and decision-making.
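
A small sketch of rule-based quality checks in Python; the field names and thresholds are invented for illustration:

```python
records = [
    {"id": 1, "email": "a@x.com", "age": 34},
    {"id": 2, "email": None,      "age": 30},   # incomplete
    {"id": 3, "email": "c@x.com", "age": -5},   # inconsistent
]

# Completeness: every required field is present and non-null.
complete = [r for r in records
            if all(r.get(k) is not None for k in ("id", "email", "age"))]

# Consistency: values fall inside plausible ranges.
consistent = [r for r in complete if 0 <= r["age"] <= 120]

print(f"passed {len(consistent)} of {len(records)} records")
```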

7. Data Cleansing

Data cleansing is the process of identifying and correcting or removing erroneous, incomplete, or inconsistent data from datasets. It helps improve the quality and reliability of data before it is used for analysis or decision-making.
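
One common way to do this in Python is with pandas; a minimal sketch with invented sample data:

```python
import pandas as pd

df = pd.DataFrame({
    "name": [" Ada ", "Grace", "Grace", None],
    "age":  [36, 45, 45, 29],
})

df["name"] = df["name"].str.strip()   # fix inconsistent whitespace
df = df.drop_duplicates()             # remove repeated rows
df = df.dropna(subset=["name"])       # drop rows missing a required field
print(df)
```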

8. Data Modeling

Data modeling involves creating a conceptual representation of data and its relationships. It helps in organizing data into structured formats (e.g., tables, entities, and attributes) that align with business needs, making it easier to store, query, and analyze data effectively.
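
As a tiny illustration, here is one way to express entities, attributes, and a relationship with Python dataclasses (the Customer/Order model is invented):

```python
from dataclasses import dataclass

@dataclass
class Customer:          # entity
    customer_id: int     # attribute (identifying key)
    name: str

@dataclass
class Order:             # entity
    order_id: int
    customer_id: int     # attribute referencing Customer (relationship)
    total: float

alice = Customer(1, "Alice")
order = Order(100, alice.customer_id, 42.50)
```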

9. Data Integration

Data integration is the process of combining data from different sources, both internal and external, into a unified view. It ensures that all data is accessible and can be analyzed holistically, improving decision-making and insights across the organization.
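
A minimal pandas sketch, assuming two invented sources (a CRM export and a billing system) keyed on customer_id:

```python
import pandas as pd

crm     = pd.DataFrame({"customer_id": [1, 2], "name": ["Ada", "Grace"]})
billing = pd.DataFrame({"customer_id": [1, 2], "balance": [120.0, 0.0]})

# Combine both sources into one unified view on a shared key.
unified = crm.merge(billing, on="customer_id", how="outer")
print(unified)
```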

10. Data Orchestration

Data orchestration is the automated management and coordination of data workflows across multiple systems. It ensures that data flows smoothly between different stages of a pipeline, including extraction, transformation, and loading, and handles scheduling, dependencies, and error management.
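
In practice this is the job of tools such as Apache Airflow; as a toy sketch, here is dependency-ordered execution using only Python's standard library (the task names are invented):

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Each task lists the tasks it depends on.
dag = {
    "extract":   set(),
    "transform": {"extract"},
    "load":      {"transform"},
    "report":    {"load"},
}

# Run tasks in an order that respects every dependency.
for task in TopologicalSorter(dag).static_order():
    print(f"running {task}")
```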

11. Data Transformation

Data transformation refers to modifying data into a desired format or structure for analysis, reporting, or integration with other systems. This can include actions such as aggregating data, filtering, sorting, and converting data types.
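
A short pandas sketch showing a few of these typical transformations (the sales data is invented):

```python
import pandas as pd

sales = pd.DataFrame({
    "region": ["east", "east", "west"],
    "amount": ["100", "250", "75"],   # raw strings from the source
})

sales["amount"] = sales["amount"].astype(int)      # convert data types
big = sales[sales["amount"] > 50]                  # filter
totals = big.groupby("region")["amount"].sum()     # aggregate
print(totals.sort_values(ascending=False))         # sort
```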

12. Real-time Data Processing

Real-time data processing involves collecting and analyzing data instantly as it is generated, often used for systems such as IoT, social media, or financial trading platforms. It enables quick decision-making and actions based on live data.
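
Production systems typically consume from a streaming platform such as Apache Kafka; as a self-contained sketch, here is a simulated event stream processed the moment each event arrives:

```python
import random, time

def sensor_stream(n=5):
    # Stand-in for a live source (IoT device, message queue, socket).
    for _ in range(n):
        yield {"temp_c": round(random.uniform(15, 45), 1)}
        time.sleep(0.1)

# React to each event as it is generated, not in a later batch.
for event in sensor_stream():
    if event["temp_c"] > 40:
        print("ALERT: overheating", event)
    else:
        print("ok", event)
```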

13. Batch Processing

Batch processing refers to processing large volumes of data in chunks, typically at scheduled intervals. This approach is often used when real-time processing isn't required, and it allows for efficient handling of massive datasets, such as overnight data processing.
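
A minimal sketch of chunked processing in Python (the dataset and chunk size are invented):

```python
def batches(items, size):
    # Split a large dataset into fixed-size chunks.
    for i in range(0, len(items), size):
        yield items[i:i + size]

dataset = list(range(10))          # stand-in for a large dataset
for batch in batches(dataset, size=4):
    total = sum(batch)             # process one chunk at a time
    print(f"processed batch {batch} -> total {total}")
```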

14. Cloud Data Platform

A cloud data platform is a data storage and analytics solution hosted on cloud services like AWS, Azure, or Google Cloud. These platforms offer scalability, flexibility, and reduced infrastructure management overhead, enabling organizations to store, process, and analyze data from anywhere in the world.

15. Data Sharding

Data sharding is the practice of partitioning a large database into smaller, more manageable pieces called "shards," which can be stored on different servers. Sharding improves performance by distributing the load across multiple systems, allowing for more efficient querying and scaling.
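
A toy sketch of hash-based shard routing in Python (the shard count and keys are invented):

```python
import hashlib

NUM_SHARDS = 4   # e.g., four database servers

def shard_for(key: str) -> int:
    # Hash the key deterministically so the same key always
    # routes to the same shard.
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

for user_id in ("user-1", "user-2", "user-3"):
    print(user_id, "->", f"shard {shard_for(user_id)}")
```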

16. Data Partitioning

Data partitioning involves dividing large datasets into smaller, logically separated sections, or partitions. This can enhance performance and manageability by allowing data to be processed and queried in parallel, improving scalability and reducing latency.
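
A small sketch partitioning records by date in plain Python (the records are invented); real systems do the equivalent at the storage layer:

```python
from collections import defaultdict

events = [
    {"day": "2024-01-01", "value": 10},
    {"day": "2024-01-01", "value": 20},
    {"day": "2024-01-02", "value": 5},
]

# Group records into logically separate partitions by date,
# so each partition can be scanned or processed independently.
partitions = defaultdict(list)
for e in events:
    partitions[e["day"]].append(e)

for day, rows in partitions.items():
    print(day, "->", len(rows), "rows")
```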

17. Data Source

A data source refers to any origin from which data is collected. This can include databases, applications, websites, APIs, or external data providers, each of which contributes raw data for processing and analysis.

18. Data Schema

A data schema is a blueprint that defines the structure of a database or data model. It includes specifications such as tables, columns, relationships, constraints, and data types. A schema ensures that data is organized in a logical and efficient manner for querying and storage.
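
As a tiny illustration, here is a schema defined with SQL DDL through Python's built-in sqlite3 (the tables are invented):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,   -- data type + constraint
        name        TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(customer_id),  -- relationship
        total       REAL
    );
""")
```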

19. Data Warehouse Automation (DWA)

Data Warehouse Automation (DWA) refers to tools and technologies that streamline the creation, management, and maintenance of data warehouses. Automation reduces the need for manual intervention and increases the speed and consistency of data management processes.

20. Metadata

Metadata is "data about data." It provides context


and additional information about the structure,
source, quality, and relationships of data. Examples
include data types, table descriptions, and field
names, which help data engineers and analysts
understand how to use and interpret the data.
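
A quick illustration with pandas, where column names and dtypes are metadata describing the data rather than the data itself:

```python
import pandas as pd

df = pd.DataFrame({"name": ["Ada"], "age": [36]})

# Metadata: structure and types, not the values themselves.
print(df.columns.tolist())   # field names
print(df.dtypes)             # data type per column
```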

Was it useful?
Let me know in the comments

@theravitshow
