ELTP: Extending ELT for Modern AI and Analytics
Whether you are building the next great AI app, an analytics suite, or a new data
warehouse, the ELTP model is a simple and extensible architecture for building robust data
pipelines.
Whether you are building data pipelines for AI, analytics, or general data engineering, there are two very common design mistakes that almost everyone makes at least once.
The first mistake is designing everything as a single “ETL” operation, running Extract, Transform, and Load as one combined step. The second is assuming that the best place to process data is also the best place to host it, omitting the important “Publish” step.
To address these challenges, I introduce the “ELTP” architecture: Extract, Load, Transform, and
Publish. This simple and scalable model is crafted to prevent these common mistakes. Below,
we'll delve into the nuances of ELTP and explore its broad applicability for AI, analytics, and big
data.
The original “ETL” approach to data movement combined business logic and data replication
into a single operation. This was inherently fragile, because ETL processes would break whenever
business logic changed, and any breakage would prevent source data from loading at all. Since
many of the transforms required aggregating and joining across datasets, memory errors were frequent, and new failure types could arise as data volumes grew.
To address these issues and others, modern data engineering best practices now advocate for an
“ELT” or “EL+T” approach, where the replication (Extract and Load) is performed first, as a
standalone process. Then, business logic and transformations are applied only after the raw data
is safely landed in a database or data lake that we control. In this model, the Extract-Load Pair, or “EL Pair”, is executed as a single step in the process, with composable Extract and Load connectors working together to replicate data efficiently from source to destination.
While the ELT architecture is more stable and more scalable than the ETL model, ELT is still lacking in that it doesn’t specify a method for delivering data to downstream users and systems - what is sometimes called “Reverse ETL”. To address this gap, we introduce the “Publish” step, leading to the “EL+T+P” or “ELTP” model. In full, the ELTP model encompasses the following steps:

1. Extract data from the source system.
2. Load the raw data into a database or data lake that we control.
3. Transform the data by applying business logic and any needed cleanup.
4. Publish the transformed data to downstream systems and consumers.
This design preserves the benefits of ELT while also giving a framework for ensuring that data is
efficiently published to downstream consumers and applications.
The biggest difference between the initial Extract-Load and the final Publish is that in the latter case, the EL process extracts from our data platform and loads to an external system or publish location.
Note that in this two-stage EL architecture, we can add any number of custom transformations
between the initial EL step that loads the raw data and the second EL step that publishes
transformed data to downstream applications.
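As a minimal sketch of this pattern, the two-stage flow might be orchestrated like the example below. The three stage functions are hypothetical stubs standing in for whatever EL tool, SQL engine, or scripts you actually use:

```python
# Minimal ELTP orchestration sketch. Each stage function is a hypothetical stub.

def extract_load(source: str, destination: str) -> None:
    """Stage 1: an EL pair replicates raw data from the source into storage we control."""
    print(f"Replicating {source} -> {destination}")

def transform(models: list[str]) -> None:
    """Apply business logic to the raw data, e.g. dbt models or custom Python scripts."""
    print(f"Running transformations: {', '.join(models)}")

def publish(source: str, destination: str) -> None:
    """Stage 2: a second EL pair extracts from our platform and publishes externally."""
    print(f"Publishing {source} -> {destination}")

if __name__ == "__main__":
    extract_load(source="app_database", destination="warehouse.raw")
    transform(models=["clean_customers", "orders_enriched"])
    publish(source="warehouse.marts", destination="partner_s3_bucket")
```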
Why Publish?
There are several reasons for adding a Publish step to one’s data architecture, namely: sending
data files to external systems, decentralizing analytic queries, and publishing to downstream
applications and indexes.
Sending Data Files to External Systems

The oldest example of a “publish” operation is sending data files to partner companies, vendors, or regulatory agencies. For example, a marketing agency might request a regular upload of your customer data file, which they in turn process and return with added market segmentation data. These publish operations often have externally defined file formats and upload locations controlled by the third party.
Here is a simple example, where we publish one file to a partner via S3 and another file to a
vendor using SFTP:
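A minimal sketch of those two publish operations in Python - the bucket name, host, paths, and credentials below are illustrative placeholders:

```python
# Hypothetical publish step: upload one extract to a partner's S3 bucket
# and another to a vendor's SFTP server. All names and credentials are placeholders.
import boto3
import paramiko

# Publish one file to the partner via S3.
s3 = boto3.client("s3")
s3.upload_file(
    Filename="exports/customer_file.csv",
    Bucket="partner-data-exchange",          # placeholder bucket name
    Key="inbound/customer_file.csv",
)

# Publish another file to the vendor via SFTP.
transport = paramiko.Transport(("sftp.vendor.example.com", 22))  # placeholder host
transport.connect(username="acme_upload", password="********")
sftp = paramiko.SFTPClient.from_transport(transport)
sftp.put("exports/vendor_feed.csv", "/uploads/vendor_feed.csv")
sftp.close()
transport.close()
```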
Decentralizing Analytic Queries

While I was at Amazon Video, I built and maintained systems that processed terabytes of data
daily. To achieve that kind of scale, my team developed ELT pipelines on AWS Redshift, carefully
right-sizing our clusters to efficiently execute all our transformations.
But when it came time to consume the data, we ran into problems. Our system couldn’t handle
heavy query traffic, and one runaway query could cause significant performance issues for every
other user. Our clusters optimized for heavy-duty processing tasks weren’t necessarily equipped
for ad-hoc analysis or concurrency.
The solution was to add a final “publish” operation, where we could deliver data to our users exactly how and where they needed it. We didn’t bar users from querying the main cluster, but downstream consumers were incentivized to use Tableau and S3 datasets, which offered better performance and let consumers right-size their own analytics infrastructure to their specific requirements. Meanwhile, the central database was better insulated from runaway queries that might otherwise lead to costly outages.
Publishing to Downstream Applications (Reverse ETL)

A popular class of Publish operations is the so-called “Reverse ETL” use case, where transformed data is published to one or more business applications or SaaS providers. Popular examples include publishing data to Salesforce for business operations, to HubSpot to keep customer CRM data up-to-date, and to Active Directory to keep employee and organizational data current.
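As a rough sketch of one such publish, using the simple_salesforce client - the credentials, record IDs, and the Customer_Tier__c field below are illustrative placeholders:

```python
# Hypothetical Reverse ETL publish: push enriched customer attributes from the
# warehouse back into Salesforce. Credentials, IDs, and fields are placeholders.
from simple_salesforce import Salesforce

sf = Salesforce(
    username="data-platform@example.com",
    password="********",
    security_token="********",
)

# Rows produced by an upstream Transform step (normally read from the warehouse).
enriched_contacts = [
    {"Id": "0035e00000AbCdEfG", "Customer_Tier__c": "Gold"},
    {"Id": "0035e00000HiJkLmN", "Customer_Tier__c": "Silver"},
]

for row in enriched_contacts:
    record_id = row.pop("Id")
    sf.Contact.update(record_id, row)  # write the remaining fields back to the CRM
```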
Publish to Vector Stores as AI Long-Term Memory (RAG)
In a retrieval-augmented generation (RAG) setup, a chatbot retrieves relevant text from a vector store before answering, giving the model a form of long-term memory. To run this process using ELTP, we follow the same basic steps as before:

1. Extract the text documents from the source systems.
2. Load the raw documents into a database or data lake that we control.
3. Transform the data by splitting large text into smaller chunks and calculating a numeric representation of each chunk (often called a “vector” or “embedding”).
4. Publish the text data and embeddings to our vector store indexes, where the chatbot can then make use of this data when answering questions it wouldn’t otherwise be able to answer.
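A minimal sketch of the Transform and Publish steps is shown below. The embed() function is a toy placeholder for a real embedding model, and Chroma stands in for whichever vector store you use:

```python
# Transform + Publish sketch for RAG. embed() is a toy placeholder; swap in a real
# embedding model in practice. Chroma is used here as an example vector store.
import chromadb

def chunk(text: str, size: int = 500) -> list[str]:
    """Transform: split a large document into smaller, fixed-size chunks."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(chunks: list[str]) -> list[list[float]]:
    """Placeholder embedding: returns toy 2-dimensional vectors."""
    return [[float(len(c)), float(sum(map(ord, c)) % 997)] for c in chunks]

# Raw text that an earlier Extract-Load step would have landed in our storage.
document = "Our return policy allows refunds within 30 days of purchase..."
chunks = chunk(document)
vectors = embed(chunks)

# Publish: upsert the chunks and their embeddings into a vector store index.
client = chromadb.Client()
collection = client.get_or_create_collection("support_docs")
collection.add(
    ids=[f"doc-1-chunk-{i}" for i in range(len(chunks))],
    documents=chunks,
    embeddings=vectors,
)
```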
ELTP in Practice
The ELTP framing offers a practical and holistic model for what might otherwise be expressed as “ELT + Reverse ETL”, while addressing additional use cases that traditionally would not be in scope for Reverse ETL. Here we’ll discuss a few advanced use cases and real-world design and deployment considerations.
As a tutorial or a proof of concept (POC) data flow, there’s nothing wrong with this ETL-like flow:
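As a rough stand-in for a flow diagram, the single-pass version looks something like this (all helper functions are hypothetical stubs):

```python
# POC / ETL-like flow: Extract, Transform, and Load to the vector store in one pass.
# Nothing durable sits between the source system and the vector index.

def extract_docs() -> list[str]:
    """Stub: pull raw documents from the source system."""
    return ["raw document text ..."]

def chunk_and_embed(doc: str) -> list[tuple[str, list[float]]]:
    """Stub: split the text into chunks and compute an embedding for each."""
    return [("chunk text ...", [0.1, 0.2, 0.3])]

def write_to_vector_store(records: list[tuple[str, list[float]]]) -> None:
    """Stub: upsert chunks and vectors straight into the vector store index."""
    print(f"Upserting {len(records)} records")

for doc in extract_docs():
    write_to_vector_store(chunk_and_embed(doc))
```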
Skipping the database Load step is very convenient and keeps the example approachable. The author has less surface area to maintain and doesn’t need to introduce durable long-term storage into the architecture. This reduces the likelihood that someone following along will be blocked by not having access to the same or similar options for the long-term storage layer.
However, if and when our POC is successful, we will very soon want to migrate to the ELTP architecture, loading our raw data into long-term storage and then publishing to the vector store as a separate step. Here is the same solution restructured as ELTP:
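Again as a rough stand-in for a diagram, and under the same assumptions as the stubs above, the decoupled flow looks like this:

```python
# ELTP flow: the initial EL step lands raw documents in durable storage; transformation
# and publishing run later, reading from that storage rather than from the source.

def extract_load_raw() -> None:
    """Stub: an EL pair replicates raw documents into the warehouse or data lake."""
    print("Replicating raw documents into durable storage")

def transform_chunks_and_embeddings() -> None:
    """Stub: chunk text and compute embeddings from the stored raw documents."""
    print("Chunking and embedding stored documents")

def publish_to_vector_store() -> None:
    """Stub: a second EL pair publishes transformed chunks and vectors to the index."""
    print("Upserting chunks and vectors into the vector store")

extract_load_raw()
transform_chunks_and_embeddings()
publish_to_vector_store()
```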
Adding a database for long-term storage decouples the Extract/Load step from the Publish
operation, and gives us the ability to add any number of transformations in between the initial
EL operation and the Publish operation - for instance dbt models or custom Python scripts to
clean the data and to apply business logic.
Our speed of iteration is also greatly improved with this design. When we are ready to
experiment with different embedding models and tune the chunking parameters, we already
have the raw text extracted and ready to go. We can even run side-by-side experiments
comparing different transformation and publication options, without re-extracting the raw data
and without making any changes to the “production” data flow.
The ELTP model allows us to scale more effectively as well: from tens of documents to thousands
or even millions of documents. Each step in the process can run on hardware specialized for that
step, and all of the replication steps (EL and Publish) can operate incrementally: extracting only
new or updated documents when sending data downstream. As business logic evolves and as
bugs are identified, we can rerun or restate our transformations without needing to re-extract
the text from the upstream data source.
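As a sketch of what incremental extraction can look like, assuming the source exposes an updated_at timestamp - the state file, query, and helper below are illustrative placeholders:

```python
# Incremental extract sketch: only pull documents created or updated since the last
# successful run, tracked via a simple cursor file. Paths and queries are placeholders.
import json
from datetime import datetime, timezone
from pathlib import Path

STATE_FILE = Path("state/docs_cursor.json")

def load_cursor() -> str:
    """Return the last saved high-water mark, or a very old timestamp on the first run."""
    if STATE_FILE.exists():
        return json.loads(STATE_FILE.read_text())["last_updated_at"]
    return "1970-01-01T00:00:00Z"

def save_cursor(value: str) -> None:
    """Persist the new high-water mark after a successful run."""
    STATE_FILE.parent.mkdir(parents=True, exist_ok=True)
    STATE_FILE.write_text(json.dumps({"last_updated_at": value}))

def fetch_updated_docs(since: str) -> list[dict]:
    """Placeholder: query the source for documents with updated_at > since."""
    return []  # e.g. SELECT * FROM documents WHERE updated_at > :since

cursor = load_cursor()
for doc in fetch_updated_docs(since=cursor):
    pass  # hand each new or updated document to the Load (or Publish) step

save_cursor(datetime.now(timezone.utc).isoformat())
```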
The upshot of all these benefits is that ELTP allows more opportunities for iteration, while also providing scaling and performance advantages. While perhaps counter-intuitive, adding the durable storage layer actually reduces our total cost of ownership (TCO) versus running the ETL or “POC” model, which performs Extract-Transform-Load all in one step.
Airbyte is an open source data movement tool which can be used for both “EL” and “Publish”
steps in the ELTP architecture. Airbyte currently supports several vector store destinations,
including Pinecone, Milvus, Chroma, and Weaviate. These can be used to directly load data using
an ETL or ELTP architecture - according to your own preferences and requirements. All of these
vector store destinations provide access to inline text splitting and vector embeddings using a
declarative, configuration-based approach.
Whereas traditional data pipelines prioritized numeric and tabular data, AI and LLM data flows
greatly benefit from text data in “unstructured” document formats like PDF, Word, Markdown,
and Google Docs.
This week, Airbyte is also launching source connectors to extract “text-as-data” from those
unstructured documents and others. Text documents are a powerful new class of data source for
data pipelines, and the new text document connectors enable the same ELTP, ELT, and ETL
architectures across all of your data, whether for AI or for analytics.
For more information, please see the companion blog post announcement.
With the ability to create your own open source connectors when needed, the same tooling and design techniques can be applied to almost any imaginable data pipeline.
Airbyte aims to provide data teams with an open-source standard in data movement. We’re
currently investing heavily in Publish connectors for AI and LLM use cases. In 2024, we will
expand to cover additional Reverse ETL and ELTP workflows and use cases, to simplify and
commoditize data movement in any direction.
In Closing
While the ELT model is a great improvement over ETL, it doesn’t solve for the last mile of data delivery. ELTP is a model that handles the heavy lifting while also providing a real solution for delivering data products.
Whether you are building the next great AI application or a more traditional data analytics stack,
you can set yourself up for success by understanding the ELTP model and designing your
architecture with these best practices in mind.
Interested in more data content? Join us for the upcoming ELTP webinar for AI or check out our
blog, where we publish articles in-house and from the community on data engineering,
databases, AI infrastructure and more.