AI ML Data Pipeline
Data pipeline
• What is a data pipeline?
• A data pipeline is a sequence of data processing steps, where the output of one step becomes the input to the next.
• Elements of a data pipeline
• The volume of big data requires that data pipelines be scalable,
since the volume can vary over time.
• The variety of big data requires that big data pipelines be able to
recognize and process data in many different formats—structured,
unstructured, and semi-structured.
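The idea of a pipeline as a sequence of processing steps can be sketched in a few lines. The step functions and sample records below are illustrative assumptions, not a specific framework's API:

```python
# A minimal sketch of a data pipeline: each step is a function that
# takes the previous step's output as its input.

def clean(records):
    # Step 1: drop records missing a required field.
    return [r for r in records if r.get("value") is not None]

def normalize(records):
    # Step 2: coerce string values to floats.
    return [{**r, "value": float(r["value"])} for r in records]

def aggregate(records):
    # Step 3: sum the values per key.
    totals = {}
    for r in records:
        totals[r["key"]] = totals.get(r["key"], 0.0) + r["value"]
    return totals

def run_pipeline(records, steps):
    # Feed the output of each step into the next, in order.
    for step in steps:
        records = step(records)
    return records

raw = [
    {"key": "a", "value": "1.5"},
    {"key": "a", "value": None},   # dropped by clean()
    {"key": "b", "value": "2"},
]
result = run_pipeline(raw, [clean, normalize, aggregate])
print(result)  # {'a': 1.5, 'b': 2.0}
```

Because each step only sees the previous step's output, individual steps can be tested, replaced, or scaled independently.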
What is the difference between a data pipeline and ETL?
ETL refers to a specific type of data pipeline.
ETL stands for “extract, transform, load.”
It is the process of moving data from a raw source, such as an application, to a
destination, usually a data warehouse.
• “Extract” refers to pulling data out of a source;
• “transform” refers to modifying the data so that it can be loaded into the destination; and
• “load” refers to inserting the data into the destination.
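The three ETL stages above can be sketched end to end. This is a toy illustration under stated assumptions: an in-memory CSV string stands in for the raw application source, and an in-memory SQLite table stands in for the data warehouse; the table name and schema are hypothetical:

```python
# A hedged sketch of ETL: extract from a raw source, transform the rows,
# and load them into a destination table.
import csv
import io
import sqlite3

def extract(csv_text):
    # "Extract": pull rows out of the raw source (a CSV export here).
    return list(csv.DictReader(io.StringIO(csv_text)))

def transform(rows):
    # "Transform": reshape rows to match the destination schema
    # (e.g. cast the amount column to an integer).
    return [(r["name"], int(r["amount"])) for r in rows]

def load(conn, rows):
    # "Load": insert the transformed rows into the destination table.
    conn.execute("CREATE TABLE IF NOT EXISTS orders (name TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    conn.commit()

raw = "name,amount\nalice,3\nbob,5\n"
conn = sqlite3.connect(":memory:")
load(conn, transform(extract(raw)))
total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
print(total)  # 8
```

In a production ETL job the extract step would read from an application database or API, and the load step would target a warehouse such as a columnar analytics store, but the three-stage shape stays the same.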