Lecture 09 RAG

RAG (Retrieval Augmented Generation) is a system that combines large language models (LLMs) with reliable external knowledge bases to generate responses, reducing hallucinations and ensuring up-to-date information. The document outlines the basic pipeline of RAG systems, including text chunking methods, advanced techniques, and open-source tools for chunking, embedding, and indexing. It also discusses the importance of indexing and similarity search algorithms for efficient retrieval in RAG implementations.

RAG (Retrieval Augmented Generation)
Young-Sik Choi
Department of Artificial Intelligence
Korea Aerospace University
What is RAG?
• A system in which an LLM generates responses by consulting a reliable
external knowledge base
• Because responses are grounded in trusted knowledge, ‘hallucinations’
can be reduced,
• responses can be based on knowledge more recent than the training data,
and
• the sources of a generated response can be cited.
• An LLM combined with a search service is a kind of RAG system
• e.g., Perplexity, SearchGPT
Basic Pipeline of RAG System
[Figure: the basic RAG pipeline. Indexing: documents are split into chunks,
each chunk is embedded, and the embeddings are stored in an index.
Retrieval: the query is embedded and the top-k most similar chunks are
fetched from the index. Generation: the query and the retrieved chunks are
passed to the LLM, which produces the response.]
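To make the three stages concrete, here is a minimal, illustrative sketch in Python. The sentence-transformers package and the public all-MiniLM-L6-v2 checkpoint are assumptions, and the final LLM call is left as a placeholder; this is a toy, not a reference implementation.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

# Indexing: split documents into chunks and embed them.
chunks = [
    "RAG combines retrieval with generation.",
    "Chunks are embedded and stored in an index.",
    "The top-k chunks are passed to the LLM as context.",
]
index = model.encode(chunks, normalize_embeddings=True)  # (n_chunks, dim)

# Retrieval: embed the query and take the top-k most similar chunks.
query = "How does RAG reduce hallucinations?"
q = model.encode([query], normalize_embeddings=True)[0]
scores = index @ q                     # cosine similarity (unit vectors)
top_k = np.argsort(-scores)[:2]
context = "\n".join(chunks[i] for i in top_k)

# Generation: the prompt with retrieved context would go to an LLM here.
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
print(prompt)  # replace with a call to the LLM of your choice
```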
Text Chunking
https://github.com/BARG-Curtin-University/llm-chunking-stratagies
Basic Chunking Methods
• Character Splitting: Dividing text strictly by character count, which can
distort words and meanings, reducing response quality.
• Recursive Character Splitting: Using delimiters like new lines or specific
characters to split text recursively, providing slightly more context than
basic character splitting (see the sketch after this list).
• Document-Based Chunking: Splitting text based on document types or
structures, like Python code or Markdown, aiming to retain more
context compared to basic methods.
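A toy sketch of recursive character splitting, as referenced above. The delimiter hierarchy and length budget are assumptions for illustration, not any particular library's implementation:

```python
def recursive_split(text, max_len=200, seps=("\n\n", "\n", ". ", " ")):
    """Split text on coarse delimiters first, recursing to finer ones."""
    if len(text) <= max_len:
        return [text]
    for sep in seps:
        if sep in text:
            pieces, current = [], ""
            for part in text.split(sep):
                candidate = f"{current}{sep}{part}" if current else part
                if len(candidate) <= max_len:
                    current = candidate          # keep growing this piece
                else:
                    if current:                  # flush, re-splitting if long
                        pieces.extend(recursive_split(current, max_len, seps))
                    current = part
            if current:
                pieces.extend(recursive_split(current, max_len, seps))
            return pieces
    # No delimiter left: fall back to plain character splitting.
    return [text[i:i + max_len] for i in range(0, len(text), max_len)]
```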
Advanced Chunking Techniques
• Semantic Chunking: Using embeddings to analyse the semantic relationship
between text segments, grouping chunks based on meaning and significantly
improving the relevancy of data chunks (a sketch follows this list).

• Agentic Chunking: With this method, chunks are processed using a large
language model to ensure each chunk stands alone with complete meaning,
enhancing coherence and context preservation.

• Subdocument Chunking: It summarizes entire documents, attaches the
summaries as metadata to each chunk’s embedding, and uses a hierarchical
retrieval process searching summaries first to improve efficiency and accuracy.
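The semantic-chunking sketch referenced above: embed consecutive sentences and start a new chunk wherever the similarity between neighbors drops below a threshold. The model name and threshold are assumptions; real implementations use more careful boundary detection.

```python
from sentence_transformers import SentenceTransformer

def semantic_chunks(sentences, threshold=0.5):
    model = SentenceTransformer("all-MiniLM-L6-v2")   # assumed model choice
    emb = model.encode(sentences, normalize_embeddings=True)
    chunks, current = [], [sentences[0]]
    for i in range(1, len(sentences)):
        sim = float(emb[i - 1] @ emb[i])  # cosine similarity (unit vectors)
        if sim < threshold:               # semantic break: start a new chunk
            chunks.append(" ".join(current))
            current = []
        current.append(sentences[i])
    chunks.append(" ".join(current))
    return chunks
```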
Opensource for Chunking Text
• NLTK (Natural Language Toolkit) & spaCy: support basic chunking methods (a small NLTK-based sketch follows this list)

• semchunk: A fast and lightweight Python library designed to split text into semantically
meaningful chunks. It supports various tokenizers and allows customization of chunk sizes.

• semantic-text-splitter: This Python library divides text into semantic chunks up to a desired size,
supporting length calculations by characters and tokens. It's callable from both Rust and Python,
making it versatile for different applications.

• semantic-chunker: A versatile library that divides text into semantically meaningful chunks by
employing a "Bring Your Own Embedder" approach. Users can provide their own embedding
functions to map text into vector spaces, facilitating flexible and context-aware chunking.
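As one concrete example of the basic-method support above, a minimal sentence-based chunker built on NLTK's sent_tokenize (the character budget is an assumed convention for this sketch):

```python
import nltk

nltk.download("punkt", quiet=True)  # newer NLTK versions may need "punkt_tab"

def sentence_chunks(text, chunk_size=500):
    """Pack whole sentences into chunks of at most chunk_size characters."""
    chunks, current = [], ""
    for sent in nltk.sent_tokenize(text):
        if current and len(current) + len(sent) + 1 > chunk_size:
            chunks.append(current)
            current = sent
        else:
            current = f"{current} {sent}".strip()
    if current:
        chunks.append(current)
    return chunks
```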
Text Embedding
Text and Code Embedding by Contrastive Pre-Training (OpenAI, 2022)
• An encoder maps each chunk to an embedding vector.
• Similarity between embeddings is measured with cosine similarity (computed below).
• The encoder is trained with contrastive learning.
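The cosine similarity written out with NumPy, for reference:

```python
import numpy as np

def cosine_similarity(u, v):
    """cos(u, v) = (u . v) / (||u|| * ||v||), in [-1, 1]."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))
```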
Opensource for Text Embedding
• Sentence Transformers: A Python framework that provides state-of-the-art
pre-trained models for generating sentence and text embeddings. It supports
tasks like semantic search, clustering, and paraphrase mining (a short usage
sketch follows this list).

• FastEmbed: A lightweight and fast Python library designed for embedding generation.
It supports popular text models and is optimized for speed and efficiency, making it
suitable for serverless runtimes.

• Hugging Face Transformers: A comprehensive library offering a wide range of
pre-trained models for generating embeddings, including BERT, GPT, and
RoBERTa. It supports both text and code embeddings, facilitating diverse NLP
tasks.
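The Sentence Transformers usage sketch referenced above; all-MiniLM-L6-v2 is one common public checkpoint, and any compatible model id would work:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
emb = model.encode(
    ["RAG reduces hallucinations.",
     "Retrieval grounds the LLM in external knowledge."],
    normalize_embeddings=True,
)
print(emb.shape)               # (2, 384) for this model
print(float(emb[0] @ emb[1]))  # cosine similarity of the two sentences
```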
Indexing
Billion-scale similarity search with GPUs (Facebook AI Research 2017)
IVFADC (Inverted File with
Asymmetric Distance Computation)
• Inverted File System
• Coarse Quantization: The data space is partitioned into distinct cells using a
coarse quantizer, typically achieved through k-means clustering. Each cell
corresponds to a cluster centroid.
• Indexing: Each data point is assigned to the nearest centroid, and these
assignments are stored in an inverted index, allowing for efficient retrieval of data
points associated with specific centroids.
• Product Quantization
• Residual Quantization: After assigning a data point to a centroid, the residual
vector (the difference between the data point and the centroid) is computed.
• Subspace Decomposition: The residual vector is divided into sub-vectors, each
quantized separately using pre-trained codebooks.
Retrieval: first select 𝜏 clusters from the inverted file, then compute distances to the vectors inside those clusters and return the k nearest vectors (see the Faiss sketch below).
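A hedged Faiss sketch of IVFADC-style search using IndexIVFPQ: a coarse k-means quantizer partitions the space, residuals are product-quantized, and nprobe plays the role of 𝜏 above. All parameter values and the random data are illustrative.

```python
import faiss
import numpy as np

d, nlist, m, nbits = 128, 100, 8, 8   # dim, cells, PQ sub-vectors, bits/code
xb = np.random.rand(10000, d).astype("float32")  # toy database vectors
xq = np.random.rand(5, d).astype("float32")      # toy query vectors

quantizer = faiss.IndexFlatL2(d)      # coarse quantizer (centroids)
index = faiss.IndexIVFPQ(quantizer, d, nlist, m, nbits)
index.train(xb)                       # learn centroids and PQ codebooks
index.add(xb)                         # assign vectors to inverted lists
index.nprobe = 8                      # clusters probed at query time (the 𝜏)
D, I = index.search(xq, 10)           # distances and ids of the 10 nearest
```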
Similarity Search on GPUs
• A GPU k-selection algorithm: operates in fast register memory and is
flexible enough to be fused with other kernels; the paper provides a
complexity analysis for it.
• A Near-Optimal Algorithmic Layout: for exact and approximate k-nearest
neighbor search on the GPU.
• A Range of Experiments: showing that these improvements outperform the
previous art by a large margin on mid- to large-scale nearest-neighbor
search tasks, in single- or multi-GPU configurations (a minimal GPU sketch
with Faiss follows).
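Moving a Faiss index to the GPU is a short path in the library's Python API (it requires a faiss-gpu build); a minimal single-GPU sketch, with toy data as an assumption:

```python
import faiss
import numpy as np

d = 128
xb = np.random.rand(10000, d).astype("float32")  # toy database vectors
xq = np.random.rand(5, d).astype("float32")      # toy query vectors

cpu_index = faiss.IndexFlatL2(d)        # exact (brute-force) index on CPU
cpu_index.add(xb)
res = faiss.StandardGpuResources()      # manages GPU scratch memory
gpu_index = faiss.index_cpu_to_gpu(res, 0, cpu_index)  # copy to device 0
D, I = gpu_index.search(xq, 10)         # same search API as on the CPU
```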
Opensource for Indexing (Retrieval)
• Faiss: Renowned for its high performance in similarity search and
clustering of dense vectors. Its comprehensive documentation and active
community contribute to its widespread adoption.
• HNSWlib: Implements the Hierarchical Navigable Small World (HNSW)
algorithm for approximate nearest neighbor searches (a usage sketch follows
this list).
• Annoy: Developed by Spotify, Annoy (Approximate Nearest Neighbors Oh
Yeah) is a C++ library with Python bindings designed for fast approximate
nearest neighbor searches in high-dimensional spaces.
• ScaNN: Developed by Google Research, ScaNN (Scalable Nearest Neighbors)
is a high-performance library for efficient vector similarity search at scale.
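The HNSWlib usage sketch referenced above; the parameter values (M, ef_construction, ef) are illustrative defaults and the data is random:

```python
import hnswlib
import numpy as np

d, n = 128, 10000
data = np.random.rand(n, d).astype("float32")    # toy vectors

index = hnswlib.Index(space="cosine", dim=d)
index.init_index(max_elements=n, ef_construction=200, M=16)
index.add_items(data, np.arange(n))    # vectors with integer ids
index.set_ef(50)                       # query-time accuracy/speed knob
labels, distances = index.knn_query(data[:5], k=10)
```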
