AI ML Data Pipeline

Uploaded by

aaabhidaaa

Data pipeline
• What is a data pipeline
• A data pipeline is a sequence (series) of data processing steps
• Elements of a data pipeline

Sources of Data → Processing of Data → Destination of Data
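
The three elements above can be sketched as three small functions chained together. This is a minimal illustration only: the function names and the hard-coded records are assumptions made for the example, not part of the original slides.

```python
from typing import Iterable, Iterator, List


def source() -> Iterator[str]:
    """Source of data: yields raw records (hard-coded here for illustration)."""
    yield from ["3", " 7 ", "not-a-number", "10"]


def process(records: Iterable[str]) -> Iterator[int]:
    """Processing of data: clean each record, skip unparseable ones, double the rest."""
    for record in records:
        text = record.strip()
        if text.isdigit():
            yield int(text) * 2


def destination(values: Iterable[int]) -> List[int]:
    """Destination of data: an in-memory list stands in for a real store."""
    return list(values)


result = destination(process(source()))
print(result)  # [6, 14, 20]
```

Because each stage only consumes an iterable and produces another, a stage can be swapped out (for example, a file reader as the source) without touching the others.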
What is a Big Data Pipeline
Big data pipelines are data pipelines built to accommodate one
or more of the three traits of big data: velocity, volume, and variety.

• The velocity of big data makes it appealing to build streaming data
pipelines for big data.

• The volume of big data requires that data pipelines be scalable,
as the volume can vary over time.

• The variety of big data requires that big data pipelines be able to
recognize and process data in many different formats: structured,
unstructured, and semi-structured.
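
To make the "variety" point concrete, here is a small sketch of one pipeline stage that accepts structured (CSV), semi-structured (JSON), and unstructured (free text) input and normalizes all of it into the same record shape. The function name `parse_record`, the field names, and the sample payloads are hypothetical, chosen only for illustration.

```python
import csv
import io
import json


def parse_record(payload: str, fmt: str) -> dict:
    """Normalize one incoming payload into a dict with the keys name, age, text."""
    if fmt == "csv":  # structured: fixed column order (name, age)
        name, age = next(csv.reader(io.StringIO(payload)))
        return {"name": name, "age": int(age), "text": None}
    if fmt == "json":  # semi-structured: named keys, some may be absent
        data = json.loads(payload)
        return {"name": data.get("name"), "age": data.get("age"), "text": None}
    if fmt == "text":  # unstructured: keep the raw text for later processing
        return {"name": None, "age": None, "text": payload}
    raise ValueError(f"unsupported format: {fmt}")


incoming = [
    ("Asha,31", "csv"),
    ('{"name": "Ravi"}', "json"),
    ("call me back later", "text"),
]
records = [parse_record(payload, fmt) for payload, fmt in incoming]
for record in records:
    print(record)
```

Downstream steps then only have to deal with one record shape, whatever format the data arrived in.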
What is the difference between a data pipeline and ETL
ETL refers to a specific type of data pipeline.
ETL stands for “extract, transform, load.”
It is the process of moving data from a raw source, such as an application, to a
destination, usually a data warehouse.
• “Extract” refers to pulling data out of a source;
• “Transform” is about modifying the data so that it can be loaded into the destination; and
• “Load” is about inserting the data into the destination.
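
The three ETL stages can be sketched with the Python standard library alone. This is an illustrative example under stated assumptions: the CSV text stands in for an application export, an in-memory SQLite database stands in for the data warehouse, and the `orders` table and its columns are invented for the example.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a source application.
RAW_CSV = """order_id,amount,currency
1,10.50,usd
2,,usd
3,7.25,eur
"""


def extract(raw_text):
    """Extract: pull rows out of the source."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Transform: drop rows with a missing amount and convert types for the destination."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue
        cleaned.append((int(row["order_id"]), float(row["amount"]), row["currency"].upper()))
    return cleaned


def load(rows, connection):
    """Load: insert the transformed rows into the destination table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS orders (order_id INTEGER, amount REAL, currency TEXT)"
    )
    connection.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    connection.commit()


warehouse = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), warehouse)
loaded = warehouse.execute(
    "SELECT order_id, amount, currency FROM orders ORDER BY order_id"
).fetchall()
print(loaded)  # [(1, 10.5, 'USD'), (3, 7.25, 'EUR')]
```

The row with the missing amount never reaches the destination, which is the kind of decision the transform stage exists to make.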

Extract → Transform → Load


Some examples of data pipelines
• Basic example
• Streaming data pipeline
Machine Learning Pipeline
A machine learning pipeline is a way to codify and automate the workflow
it takes to produce a machine learning model.
• Machine learning pipelines consist of multiple sequential steps that do everything
from data extraction and preprocessing to model training and deployment.

A machine learning pipeline also refers to the creation of independent and
reusable modules that can be pipelined together to create an entire workflow.
• In other words, we divide the work into smaller subtasks and automate each one,
so that the entire task runs as a chain of small subtasks.
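
The idea of independent, reusable modules can be sketched as follows. The `Pipeline` class and the step functions here are hypothetical, written only for this illustration; libraries such as scikit-learn provide their own pipeline abstractions with different interfaces.

```python
from typing import Any, Callable, List, Tuple

Step = Tuple[str, Callable[[Any], Any]]


class Pipeline:
    """Runs named steps in order, feeding each step's output into the next."""

    def __init__(self, steps: List[Step]) -> None:
        self.steps = steps

    def run(self, data: Any) -> Any:
        for _name, step in self.steps:
            data = step(data)
        return data


def drop_missing(values):
    """Remove missing (None) entries."""
    return [v for v in values if v is not None]


def scale_to_unit(values):
    """Rescale values to the range [0, 1]; constant input maps to all zeros."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def mean(values):
    return sum(values) / len(values)


# The same drop_missing module is reused in two different workflows.
cleaning = Pipeline([("drop_missing", drop_missing), ("scale", scale_to_unit)])
summary = Pipeline([("drop_missing", drop_missing), ("mean", mean)])

raw = [2.0, None, 4.0, 6.0]
print(cleaning.run(raw))  # [0.0, 0.5, 1.0]
print(summary.run(raw))   # 4.0
```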
Characteristics of an ML pipeline
• The ML pipeline is itself the product (an IT product)
• It is a fully automated process
• It should establish co-operation between the data scientist and the engineer
• It has a fast iteration cycle
• It involves automated testing and performance monitoring
• It is version-controlled
Advantages of ML Pipeline
• Reusability of components: building the pipeline from separate components gives the
flexibility to use them independently and iteratively.
• Ease of implementation: constructing new ML models becomes much easier because we can
re-use some of the existing components.
• Scalability and customization: the pipeline scales well because its structure does not
depend on the size of the data, and customization is easy because only the specific
module concerned has to be changed.
• Automatic updates: the ML pipeline does not require manual updates, as it is updated
at regular intervals.
Steps involved in ML Pipeline
• Data Extraction / Data collection
• Data pre-processing, including formatting and handling of missing values, outliers, etc.
• Feature engineering (feature creation/splitting)
• Model selection
• Model training
• Model evaluation
• Model deployment
• Re-visiting the pipeline
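
The steps above can be walked through end to end on a toy problem. This is a sketch under loud assumptions: the data are made up, the "model selection" is simply a straight line fitted by least squares, and "deployment" is reduced to serializing the fitted parameters to JSON. A real pipeline would replace each function with something far more substantial.

```python
import json


def collect():
    """Data collection: toy (hours_studied, exam_score) pairs, one score missing."""
    return [(1, 52), (2, 54), (3, None), (4, 58), (5, 60), (6, 62), (7, 64), (8, 66)]


def preprocess(rows):
    """Pre-processing: drop rows with a missing value."""
    return [(x, y) for x, y in rows if y is not None]


def split(rows, test_size=2):
    """Split: hold out the last test_size rows for evaluation."""
    return rows[:-test_size], rows[-test_size:]


def train(rows):
    """Model training: fit y = slope * x + intercept by ordinary least squares."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    variance = sum((x - mean_x) ** 2 for x, _ in rows)
    slope = covariance / variance
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def predict(model, x):
    return model["slope"] * x + model["intercept"]


def evaluate(model, rows):
    """Model evaluation: mean squared error on held-out rows."""
    return sum((predict(model, x) - y) ** 2 for x, y in rows) / len(rows)


def deploy(model):
    """Deployment stand-in: serialize the parameters so another system could load them."""
    return json.dumps(model)


train_rows, test_rows = split(preprocess(collect()))
model = train(train_rows)
mse = evaluate(model, test_rows)
artifact = deploy(model)
print(f"test MSE: {mse:.4f}")
print(artifact)
```

Re-visiting the pipeline then amounts to re-running the same chain when new data arrive or when the evaluation score degrades.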
