Lecture 09 RAG

RAG (Retrieval Augmented Generation) is a system that combines large language models (LLMs) with reliable external knowledge bases to generate responses, reducing hallucinations and ensuring up-to-date information. The document outlines the basic pipeline of RAG systems, including text chunking methods, advanced techniques, and open-source tools for chunking, embedding, and indexing. It also discusses the importance of indexing and similarity search algorithms for efficient retrieval in RAG implementations.

RAG (Retrieval Augmented Generation)
Young-Sik Choi
Department of Artificial Intelligence
Korea Aerospace University
What is RAG?
• A system in which an LLM generates responses by consulting a reliable
external knowledge base
• Because responses are grounded in trusted knowledge, ‘hallucinations’
can be reduced,
• responses can be based on knowledge more recent than the training data,
and
• the sources of a generated response can be cited.
• An LLM combined with a search service is a kind of RAG system
• e.g., Perplexity, SearchGPT
Basic Pipeline of RAG System
[Figure: the basic RAG pipeline. Indexing: documents are split into chunks,
each chunk is embedded, and the embeddings are stored in an index.
Retrieval: the query is embedded and the top-k most similar chunks are
fetched from the index. Generation: the query and the retrieved chunks are
passed to the LLM, which produces the response.]
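To make the three stages concrete, here is a minimal, illustrative sketch in Python. The sentence-transformers package and the public all-MiniLM-L6-v2 checkpoint are assumptions, and the final LLM call is left as a placeholder; this is a toy, not a reference implementation.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

# Indexing: split documents into chunks and embed them.
chunks = [
    "RAG combines retrieval with generation.",
    "Chunks are embedded and stored in an index.",
    "The top-k chunks are passed to the LLM as context.",
]
index = model.encode(chunks, normalize_embeddings=True)  # (n_chunks, dim)

# Retrieval: embed the query and take the top-k most similar chunks.
query = "How does RAG reduce hallucinations?"
q = model.encode([query], normalize_embeddings=True)[0]
scores = index @ q                     # cosine similarity (unit vectors)
top_k = np.argsort(-scores)[:2]
context = "\n".join(chunks[i] for i in top_k)

# Generation: the prompt with retrieved context would go to an LLM here.
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
print(prompt)  # replace with a call to the LLM of your choice
```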
Text Chunking
https://github.com/BARG-Curtin-University/llm-chunking-stratagies
Basic Chunking Methods
• Character Splitting: Dividing text strictly by character count, which can
distort words and meanings, reducing response quality.
• Recursive Character Splitting: Using delimiters like new lines or specific
characters to split text recursively, providing slightly more context than
basic character splitting (see the sketch after this list).
• Document-Based Chunking: Splitting text based on document types or
structures, like Python code or Markdown, aiming to retain more
context compared to basic methods.
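A toy sketch of recursive character splitting, as referenced above. The delimiter hierarchy and length budget are assumptions for illustration, not any particular library's implementation:

```python
def recursive_split(text, max_len=200, seps=("\n\n", "\n", ". ", " ")):
    """Split text on coarse delimiters first, recursing to finer ones."""
    if len(text) <= max_len:
        return [text]
    for sep in seps:
        if sep in text:
            pieces, current = [], ""
            for part in text.split(sep):
                candidate = f"{current}{sep}{part}" if current else part
                if len(candidate) <= max_len:
                    current = candidate          # keep growing this piece
                else:
                    if current:                  # flush, re-splitting if long
                        pieces.extend(recursive_split(current, max_len, seps))
                    current = part
            if current:
                pieces.extend(recursive_split(current, max_len, seps))
            return pieces
    # No delimiter left: fall back to plain character splitting.
    return [text[i:i + max_len] for i in range(0, len(text), max_len)]
```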
Advanced Chunking Techniques
• Semantic Chunking: Using embeddings to analyse the semantic relationship
between text segments, grouping chunks based on meaning and significantly
improving the relevancy of data chunks (a sketch follows this list).

• Agentic Chunking: With this method, chunks are processed using a large
language model to ensure each chunk stands alone with complete meaning,
enhancing coherence and context preservation.

• Subdocument Chunking: It summarizes entire documents, attaches the
summaries as metadata to each chunk’s embedding, and uses a hierarchical
retrieval process searching summaries first to improve efficiency and accuracy.
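The semantic-chunking sketch referenced above: embed consecutive sentences and start a new chunk wherever the similarity between neighbors drops below a threshold. The model name and threshold are assumptions; real implementations use more careful boundary detection.

```python
from sentence_transformers import SentenceTransformer

def semantic_chunks(sentences, threshold=0.5):
    model = SentenceTransformer("all-MiniLM-L6-v2")   # assumed model choice
    emb = model.encode(sentences, normalize_embeddings=True)
    chunks, current = [], [sentences[0]]
    for i in range(1, len(sentences)):
        sim = float(emb[i - 1] @ emb[i])  # cosine similarity (unit vectors)
        if sim < threshold:               # semantic break: start a new chunk
            chunks.append(" ".join(current))
            current = []
        current.append(sentences[i])
    chunks.append(" ".join(current))
    return chunks
```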
Opensource for Chunking Text
• NLTK (Natural Language Toolkit) & spaCy: support basic chunking methods (a small NLTK-based sketch follows this list)

• semchunk: A fast and lightweight Python library designed to split text into semantically
meaningful chunks. It supports various tokenizers and allows customization of chunk sizes.

• semantic-text-splitter: This Python library divides text into semantic chunks up to a desired size,
supporting length calculations by characters and tokens. It's callable from both Rust and Python,
making it versatile for different applications.

• semantic-chunker: A versatile library that divides text into semantically meaningful chunks by
employing a "Bring Your Own Embedder" approach. Users can provide their own embedding
functions to map text into vector spaces, facilitating flexible and context-aware chunking.
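As one concrete example of the basic-method support above, a minimal sentence-based chunker built on NLTK's sent_tokenize (the character budget is an assumed convention for this sketch):

```python
import nltk

nltk.download("punkt", quiet=True)  # newer NLTK versions may need "punkt_tab"

def sentence_chunks(text, chunk_size=500):
    """Pack whole sentences into chunks of at most chunk_size characters."""
    chunks, current = [], ""
    for sent in nltk.sent_tokenize(text):
        if current and len(current) + len(sent) + 1 > chunk_size:
            chunks.append(current)
            current = sent
        else:
            current = f"{current} {sent}".strip()
    if current:
        chunks.append(current)
    return chunks
```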
Text Embedding
Text and Code Embedding by Contrastive Pre-Training (OpenAI, 2022)
• An encoder maps each chunk to an embedding vector.
• Similarity between embeddings is measured with cosine similarity (computed below).
• The encoder is trained with contrastive learning.
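The cosine similarity written out with NumPy, for reference:

```python
import numpy as np

def cosine_similarity(u, v):
    """cos(u, v) = (u . v) / (||u|| * ||v||), in [-1, 1]."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))
```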
Opensource for Text Embedding
• Sentence Transformers: A Python framework that provides state-of-the-art
pre-trained models for generating sentence and text embeddings. It supports
tasks like semantic search, clustering, and paraphrase mining (a short usage
sketch follows this list).

• FastEmbed: A lightweight and fast Python library designed for embedding generation.
It supports popular text models and is optimized for speed and efficiency, making it
suitable for serverless runtimes.

• Hugging Face Transformers: A comprehensive library offering a wide range of
pre-trained models for generating embeddings, including BERT, GPT, and
RoBERTa. It supports both text and code embeddings, facilitating diverse NLP
tasks.
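The Sentence Transformers usage sketch referenced above; all-MiniLM-L6-v2 is one common public checkpoint, and any compatible model id would work:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
emb = model.encode(
    ["RAG reduces hallucinations.",
     "Retrieval grounds the LLM in external knowledge."],
    normalize_embeddings=True,
)
print(emb.shape)               # (2, 384) for this model
print(float(emb[0] @ emb[1]))  # cosine similarity of the two sentences
```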
Indexing
Billion-scale similarity search with GPUs (Facebook AI Research 2017)
IVFADC (Inverted File with
Asymmetric Distance Computation)
• Inverted File System
• Coarse Quantization: The data space is partitioned into distinct cells using a
coarse quantizer, typically achieved through k-means clustering. Each cell
corresponds to a cluster centroid.
• Indexing: Each data point is assigned to the nearest centroid, and these
assignments are stored in an inverted index, allowing for efficient retrieval of data
points associated with specific centroids.
• Product Quantization
• Residual Quantization: After assigning a data point to a centroid, the residual
vector (the difference between the data point and the centroid) is computed.
• Subspace Decomposition: The residual vector is divided into sub-vectors, each
quantized separately using pre-trained codebooks.
Retrieval: first select 𝜏 clusters from the inverted file, then compute distances to the vectors inside those clusters and return the k nearest vectors (see the Faiss sketch below).
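A hedged Faiss sketch of IVFADC-style search using IndexIVFPQ: a coarse k-means quantizer partitions the space, residuals are product-quantized, and nprobe plays the role of 𝜏 above. All parameter values and the random data are illustrative.

```python
import faiss
import numpy as np

d, nlist, m, nbits = 128, 100, 8, 8   # dim, cells, PQ sub-vectors, bits/code
xb = np.random.rand(10000, d).astype("float32")  # toy database vectors
xq = np.random.rand(5, d).astype("float32")      # toy query vectors

quantizer = faiss.IndexFlatL2(d)      # coarse quantizer (centroids)
index = faiss.IndexIVFPQ(quantizer, d, nlist, m, nbits)
index.train(xb)                       # learn centroids and PQ codebooks
index.add(xb)                         # assign vectors to inverted lists
index.nprobe = 8                      # clusters probed at query time (the 𝜏)
D, I = index.search(xq, 10)           # distances and ids of the 10 nearest
```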
Similarity Search on GPUs
• A GPU k-selection algorithm: operates in fast register memory and is
flexible enough to be fused with other kernels; the paper provides a
complexity analysis for it.
• A Near-Optimal Algorithmic Layout: for exact and approximate k-nearest
neighbor search on the GPU.
• A Range of Experiments: showing that these improvements outperform the
previous art by a large margin on mid- to large-scale nearest-neighbor
search tasks, in single- or multi-GPU configurations (a minimal GPU sketch
with Faiss follows).
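Moving a Faiss index to the GPU is a short path in the library's Python API (it requires a faiss-gpu build); a minimal single-GPU sketch, with toy data as an assumption:

```python
import faiss
import numpy as np

d = 128
xb = np.random.rand(10000, d).astype("float32")  # toy database vectors
xq = np.random.rand(5, d).astype("float32")      # toy query vectors

cpu_index = faiss.IndexFlatL2(d)        # exact (brute-force) index on CPU
cpu_index.add(xb)
res = faiss.StandardGpuResources()      # manages GPU scratch memory
gpu_index = faiss.index_cpu_to_gpu(res, 0, cpu_index)  # copy to device 0
D, I = gpu_index.search(xq, 10)         # same search API as on the CPU
```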
Opensource for Indexing (Retrieval)
• Faiss: Renowned for its high performance in similarity search and
clustering of dense vectors. Its comprehensive documentation and active
community contribute to its widespread adoption.
• HNSWlib: Implements the Hierarchical Navigable Small World (HNSW)
algorithm for approximate nearest neighbor searches (a usage sketch follows
this list).
• Annoy: Developed by Spotify, Annoy (Approximate Nearest Neighbors Oh
Yeah) is a C++ library with Python bindings designed for fast approximate
nearest neighbor searches in high-dimensional spaces.
• ScaNN: Developed by Google Research, ScaNN (Scalable Nearest Neighbors)
is a high-performance library for efficient vector similarity search at scale.
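The HNSWlib usage sketch referenced above; the parameter values (M, ef_construction, ef) are illustrative defaults and the data is random:

```python
import hnswlib
import numpy as np

d, n = 128, 10000
data = np.random.rand(n, d).astype("float32")    # toy vectors

index = hnswlib.Index(space="cosine", dim=d)
index.init_index(max_elements=n, ef_construction=200, M=16)
index.add_items(data, np.arange(n))    # vectors with integer ids
index.set_ef(50)                       # query-time accuracy/speed knob
labels, distances = index.knn_query(data[:5], k=10)
```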
