Lecture 3 Data Engineering Concepts, Processes, and Tools

Data engineering is essential for preparing and making data usable for analytics, machine learning, and AI projects within organizations. It involves a series of processes including data ingestion, transformation, and serving, which are managed through data pipelines to ensure data quality and accessibility. The discipline addresses challenges posed by disparate data sources and formats, enabling clearer insights into business operations.

Uploaded by

genesiskalya
Copyright
© All Rights Reserved

Lecture 4: Data Engineering Concepts, Processes, and Tools

Machine learning and artificial intelligence share top billing on the list of data science
capabilities, and they are not just buzzwords: many organizations are eager to adopt them. But
before building intelligent products, you need to gather and prepare the data that fuels AI. A
separate discipline, data engineering, lays the necessary groundwork for analytics projects.
Its tasks occupy the first three layers of the data science hierarchy of needs
suggested by Monica Rogati.

Data science layers towards AI by Monica Rogati.

What is data engineering?


Data engineering is a set of operations to make data available and usable to data scientists,
data analysts, business intelligence (BI) developers, and other specialists within an
organization. It takes dedicated experts – data engineers – to design and build systems for
gathering and storing data at scale as well as preparing it for further analysis.

Within a large organization, there are usually many different types of operations management
software (e.g., ERP, CRM, and production systems), each with its own database holding different
information. Data can also be stored as separate files or pulled from external sources, such as
IoT devices, in real time. With data scattered across different formats, the organization cannot
see a clear picture of its business state or run analytics.

Data engineering addresses this problem step by step.

Data engineering process


The data engineering process covers a sequence of tasks that turn a large amount of raw data
into a practical product meeting the needs of analysts, data scientists, machine learning
engineers, and others. Typically, the end-to-end workflow consists of the following stages.

A data engineering process in brief.

Data ingestion (acquisition) moves data from multiple sources (SQL and NoSQL databases, IoT
devices, websites, streaming services, etc.) to a target system to be transformed for further
analysis. Data comes in various forms and can be both structured and unstructured.
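
As a rough illustration of ingestion, the sketch below pulls records from two kinds of sources into one staging area. Everything in it is an assumption made for the example: the in-memory SQLite table stands in for an operational database, the JSON strings stand in for IoT messages, and the names ingest_sql, ingest_json_lines, and staging are hypothetical.

```python
import json
import sqlite3

def ingest_sql(conn, query):
    """Read rows from a SQL source and return them as plain dicts."""
    conn.row_factory = sqlite3.Row
    return [dict(row) for row in conn.execute(query)]

def ingest_json_lines(lines):
    """Parse newline-delimited JSON, e.g. messages sent by a device."""
    return [json.loads(line) for line in lines if line.strip()]

# Hypothetical structured source: a small customer table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])

# Hypothetical semi-structured source: two sensor readings.
sensor_feed = [
    '{"device": "t-1", "temp_c": 21.5}',
    '{"device": "t-2", "temp_c": 19.0}',
]

# The "target system" here is just a dict acting as a staging area.
staging = {
    "customers": ingest_sql(conn, "SELECT id, name FROM customers ORDER BY id"),
    "sensors": ingest_json_lines(sensor_feed),
}
```

In a real setup the staging area would more likely be a data lake or warehouse table, but the shape of the step is the same: read from each source in its own format and land the records in one place.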

Data transformation adjusts disparate data to the needs of end users. It involves removing
errors and duplicates from data, normalizing it, and converting it into the needed format.
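
The transformation tasks named above can be sketched in a few lines. This is a minimal example under stated assumptions, not a prescribed method: the record fields (email, amount), the validity rule (an address must contain "@"), and the function name transform are all hypothetical.

```python
def transform(records):
    """Drop invalid rows and duplicates, normalize fields, convert types."""
    seen = set()
    cleaned = []
    for rec in records:
        # Normalize: trim whitespace and lowercase the address.
        email = (rec.get("email") or "").strip().lower()
        # Remove errors: skip rows without a plausible address.
        if "@" not in email:
            continue
        # Remove duplicates: keep only the first row per address.
        if email in seen:
            continue
        seen.add(email)
        # Convert format: amounts arrive as strings, store them as floats.
        cleaned.append({"email": email, "amount": round(float(rec["amount"]), 2)})
    return cleaned

raw = [
    {"email": " Ada@Example.com ", "amount": "10.50"},
    {"email": "ada@example.com", "amount": "10.50"},  # duplicate once normalized
    {"email": "not-an-email", "amount": "3"},         # invalid
    {"email": "grace@example.com", "amount": "7"},
]
clean = transform(raw)
```

Note that normalization runs before the duplicate check; otherwise the first two rows would look like different customers.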

Data serving delivers transformed data to end users — a BI platform, dashboard, or data
science team.

Data flow orchestration provides visibility into the data engineering process, ensuring that
all tasks are successfully completed. It coordinates and continuously tracks data workflows to
detect and fix data quality and performance issues.
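
To make the idea of orchestration concrete, here is a small sketch that runs tasks in dependency order and records what happened to each one. It assumes Python 3.9 or later (for graphlib), and the task names, the run_workflow helper, and the simulated failure are invented for the example; production orchestrators add scheduling, retries, and alerting on top of this.

```python
from graphlib import TopologicalSorter

def run_workflow(tasks, dependencies):
    """Run tasks in dependency order and return a status report.

    A task is skipped when any task it depends on did not succeed.
    """
    status = {}
    for name in TopologicalSorter(dependencies).static_order():
        upstream = dependencies.get(name, ())
        if any(status.get(dep) != "success" for dep in upstream):
            status[name] = "skipped"
            continue
        try:
            tasks[name]()
            status[name] = "success"
        except Exception as exc:
            status[name] = f"failed: {exc}"
    return status

# Hypothetical tasks; the transform step simulates a data quality problem.
def ingest():
    pass

def transform():
    raise ValueError("duplicate keys found")

def serve():
    pass

tasks = {"ingest": ingest, "transform": transform, "serve": serve}
# Each task maps to the set of tasks that must finish before it.
dependencies = {"ingest": set(), "transform": {"ingest"}, "serve": {"transform"}}

report = run_workflow(tasks, dependencies)
```

The report is the "visibility" part: it shows which step failed and which downstream steps were held back as a result.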

The mechanism that automates the ingestion, transformation, and serving steps of the data
engineering process is known as a data pipeline.
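
A toy pipeline can be sketched by chaining the three steps as functions. The data, the field names (city, sales), and the function names are assumptions made for illustration; each function is a stand-in for a much larger system.

```python
def ingest():
    # Stand-in for reading from databases, devices, or files.
    return [
        {"city": " paris ", "sales": "120"},
        {"city": "Paris", "sales": "120"},  # duplicate once normalized
        {"city": "Oslo", "sales": "80"},
    ]

def transform(rows):
    # Normalize city names, convert sales to int, drop exact duplicates.
    unique = {}
    for row in rows:
        key = (row["city"].strip().title(), int(row["sales"]))
        unique[key] = {"city": key[0], "sales": key[1]}
    return list(unique.values())

def serve(rows):
    # Stand-in for loading a dashboard: total sales per city.
    totals = {}
    for row in rows:
        totals[row["city"]] = totals.get(row["city"], 0) + row["sales"]
    return totals

def run_pipeline():
    # The pipeline is the automated chain: ingest, then transform, then serve.
    return serve(transform(ingest()))

result = run_pipeline()
```

Because each stage only depends on the output of the previous one, a stage can be replaced (for example, swapping the hard-coded rows for a database query) without touching the others.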
