
Building a RAG AI ChatBot API with Python FastAPI and Open Source LLMs
By Irvan Fauziansyah
Who am I?
● Meetap Member, Tangerang Chapter
○ Join now and enjoy the benefits: https://linktr.ee/Meetap
● Beginner programmer who loves to learn and share
● Let's connect
○ LinkedIn: https://linkedin.com/in/IrvanFza
○ X/Twitter: https://x.com/IrvanFza
○ Website: https://irvan.cc
Media Partners - Big Thanks to:
Limitations and Disclaimer
● Limitations
○ Will not cover the fundamentals of AI (machine learning, NLP, and so on)
○ Will not cover Python programming basics
○ Will not cover frontend UI integration (API only)
○ Will not cover the deployment process
● Disclaimer
○ I am not an AI expert
○ I am not an AI engineer
○ Just a beginner programmer who loves to learn and share
● Prerequisites to follow the tutorial
○ Python installed (version 3.7 or higher)
○ Basic understanding of Python programming
○ Basic knowledge of web development concepts
○ Familiarity with command-line interface (CLI)
○ Git installed for version control
○ An IDE or text editor of your choice (e.g., VSCode, PyCharm)
Introduction Agenda
1. Understanding the Core Concepts
a. What is a Large Language Model (LLM)?
b. Limitations of LLMs
c. Introduction to Retrieval Augmented Generation (RAG)
d. Text Embeddings and Vector Databases
2. Open Source vs. Paid LLMs
a. Pros and Cons of Open Source LLMs
b. Comparison with Paid LLMs (e.g., OpenAI)
c. Cost Estimations and Use Case Recommendations
3. Challenges and Best Practices
a. Hosting Open Source LLMs
b. Mitigating Challenges
4. Running Open Source LLMs
a. Different Methods and Tools
b. Why Build Your Own API with FastAPI
What is a Large Language Model LLM?
● Definition:
○ AI models trained on vast text data to understand and generate human-like language.
○ Dataset can be books, articles, websites, or any other form of written content.
● Examples: GPT-3, GPT-4, OpenAI o1, GPT-Neo, LLaMA.
● Capabilities:
○ Natural language understanding.
○ Text generation and completion.
○ Language translation and summarization.
● How does an LLM work?
○ Input: the LLM takes in input data, which can be text, questions, or any other form of language.
○ Masked language modeling: the LLM tries to predict which word or subword is missing or incorrect in the input tokens.
○ Next sentence prediction: the model tries to predict the next sentence in a sequence of text based on the input.
○ Language modeling: the model tries to predict the next word in a sequence of text, given the context provided by the previous words.
LLM Layers in the Field of AI
Image source: https://medium.com/data-science-at-microsoft/how-large-language-models-work-91c362f5b78f

How LLMs Work
Image source: https://medium.com/data-science-at-microsoft/how-large-language-models-work-91c362f5b78f
Limitations of LLMs
● Hallucinations:
○ Generating plausible but incorrect or nonsensical answers.
○ Presenting false information confidently.
● Context Limitations:
○ Limited context window; may forget earlier conversation parts.
○ Difficulty maintaining long-term conversation coherence.
● Data Biases:
○ Reflecting biases present in training data.
○ Potentially producing biased or unfair content.
● Resource Intensive:
○ Require significant computational power to run.
○ High memory and processing demands.
Retrieval Augmented Generation RAG
● Concept:
○ Combines LLMs with an external knowledge base.
○ Grounds model responses in specific, factual information.
● Benefits:
○ Reduces Hallucinations: Provides accurate, contextually relevant answers.
○ Up-to-Date Information: Access to current data not present in training.
○ Customization: Tailors responses to domain-specific knowledge.
● RAG Process (a minimal code sketch follows this slide):
○ Query Understanding: Analyze the user's question.
○ Document Retrieval: Fetch relevant information from the knowledge base.
○ Response Generation: The LLM generates an answer using the retrieved data.
Image source: https://qdrant.tech/articles/what-is-rag-in-ai/
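A minimal sketch of these three steps in Python. Here embed, vector_db.search, and llm_generate are hypothetical stand-ins for the embedding model, vector database, and LLM covered later; only the control flow is shown.

def answer_with_rag(question: str) -> str:
    # 1. Query understanding: turn the question into an embedding.
    query_vector = embed(question)  # hypothetical helper
    # 2. Document retrieval: fetch the most similar documents.
    documents = vector_db.search(query_vector, top_k=3)  # hypothetical helper
    # 3. Response generation: ground the LLM in the retrieved context.
    context = "\n".join(doc.text for doc in documents)
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return llm_generate(prompt)  # hypothetical helper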
Text Embeddings and Vector Databases
● Text Embeddings
○ Definition: Numerical representations of text capturing semantic meaning.
○ Purpose:
■ Allows comparison of text based on meaning, not just keywords (see the sketch after this slide).
■ Facilitates similarity searches in a high-dimensional space.
● Vector Databases
○ Function:
■ Store and index text embeddings.
■ Enable efficient retrieval of semantically similar documents.
● Benefits
○ Efficient Retrieval:
■ Quickly find relevant documents among large datasets.
○ Enhanced Search Capabilities:
■ Goes beyond keyword matching to understand context and semantics.
○ Scalability:
■ Designed to handle and search through massive amounts of data efficiently.
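To make "comparison based on meaning, not just keywords" concrete, here is a small sketch using the sentence-transformers package (an assumed choice; the slides do not prescribe an embedding library):

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice

# Semantically similar sentences that share almost no keywords.
a = model.encode("How do I reset my password?")
b = model.encode("I forgot my login credentials.")
c = model.encode("The weather is nice today.")

print(util.cos_sim(a, b))  # high similarity score
print(util.cos_sim(a, c))  # low similarity score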
RAG's Seven Deadly Sins
● Missing Content
● Missed Top Ranked
● Not in Context
● Wrong Format
● Incomplete Data
● Not Extracted
● Incorrect Specificity

Source: https://arxiv.org/pdf/2401.05856
Open Source LLMs
● Pros
○ Cost-Effective: No licensing fees or per-use costs.
○ Customizable:
■ Full control over model parameters and architecture.
■ Ability to fine-tune on specific datasets.
○ Transparency:
■ Access to source code and model internals.
■ Facilitates auditing and understanding model behavior.
● Cons
○ Resource Requirements:
■ Requires significant computational power, especially GPUs.
■ Higher upfront infrastructure costs.
○ Maintenance: Responsibility for updates, bug fixes, and security patches.
○ Performance: May lag behind the latest advancements in paid models.
Paid LLMs (e.g., OpenAI)
● Pros
○ Ease of Use:
■ Simple API integrations with extensive documentation.
■ Quick setup without the need for specialized infrastructure.
○ Performance: Access to cutting-edge models with superior capabilities.
○ Support:
■ Professional customer service.
■ Regular model updates and improvements.
● Cons
○ Cost:
■ Usage fees based on number of requests or tokens processed.
■ Costs can escalate with high-volume usage.
○ Data Privacy:
■ Sending data to third-party servers may raise compliance issues.
■ Potential concerns over proprietary data exposure.
○ Customization: Limited ability to modify or fine-tune models beyond provided parameters.
Use Cases and Recommendations
● Open Source LLMs
○ Cost Considerations:
■ Initial Setup: Investment in hardware or cloud infrastructure.
■ Operational Costs: Ongoing expenses for maintenance and energy consumption.
○ Best Suited For:
■ Projects requiring full control over the model.
■ Applications needing to process sensitive data in-house.
■ Organizations with the technical expertise to manage ML infrastructure.
● Paid LLMs
○ Cost Considerations:
■ Usage-Based Pricing: Charges per API call or per token (e.g., OpenAI's GPT-4).
■ Scalability Costs: Expenses grow proportionally with usage.
○ Best Suited For:
■ Rapid development and deployment needs.
■ Access to state-of-the-art model performance.
■ Teams without extensive ML infrastructure experience.
Challenges and Best Practices
● Challenges
○ Hardware: GPUs or high-performance CPUs are necessary for reasonable inference times.
○ Expertise Needed:
■ Requires knowledge in machine learning deployment and optimization.
■ System administration skills for infrastructure setup and management.
○ Scalability Issues: Difficulty handling increased user requests without performance degradation.
● Best Practices
○ Cloud Services: Use GPU-enabled instances from providers like AWS, GCP, Azure.
○ Optimized Implementations:
■ Utilize efficient libraries (e.g., LLaMA.cpp) for better performance on less powerful hardware.
○ Containerization:
■ Deploy models using Docker for consistent environments and easier management.
○ Monitoring and Logging:
■ Implement tools to track performance metrics and system health.
■ Use services like Prometheus, Grafana, or CloudWatch.
Running Open Source LLMs 1
● LLaMA.cpp (https://github.com/ggerganov/llama.cpp)
○ Description: Lightweight implementation for running LLaMA models on CPUs.
○ Pros:
■ Low resource requirements.
■ Can run on consumer-grade hardware.
○ Cons:
■ Slower inference compared to GPU solutions.
■ May not handle large models effectively.
● vLLM (https://docs.vllm.ai/)
○ Description: Optimized inference and serving engine for LLMs.
○ Pros:
■ High throughput and low latency.
■ Supports dynamic batching.
○ Cons:
■ Requires GPUs for optimal performance.
■ More complex setup.
Running Open Source LLMs 2
● Hugging Face (https://huggingface.co/)
○ Description: Comprehensive library for state-of-the-art models.
○ Pros:
■ Wide range of pre-trained models.
■ Active community and support.
○ Cons:
■ May need adaptation for production-scale deployment.
■ Can be resource-intensive.
● Ollama (https://ollama.com/)
○ Description: Platform simplifying the deployment of large models.
○ Pros:
■ Streamlined setup process.
■ Focus on simplifying ML deployment.
○ Cons:
■ Less flexibility in customization.
■ May have limitations in model choices.
Running Open Source LLMs 3
● LM Studio (https://lmstudio.ai/)
○ Description: User-friendly interface for running language models locally.
○ Pros:
■ Easy to use GUI.
■ Good for experimentation.
○ Cons:
■ Not ideal for backend API integrations.
■ Limited scalability.
Why Build Your Own API with FastAPI
● Customization:
○ Tailor API endpoints to specific application requirements.
● Scalability:
○ Optimize performance to handle increased traffic.
● Integration:
○ Seamlessly connect with other services (databases, frontends, etc.).
● Control:
○ Full oversight over data processing and security.
● Flexibility:
○ Implement custom authentication and authorization mechanisms.
○ Adjust middleware and request handling as needed.
Live Demo Agenda
1. Building the API with FastAPI
a. Setting Up the Environment
b. Creating Basic API Endpoints
c. Integrating Open Source LLMs
2. Managing and Parsing Documents
a. Creating the Document Upload Endpoint
b. Parsing Text Documents
c. Parsing PDF Documents with OCR
3. Database Integration and Management
a. Setting Up PostgreSQL
b. Integrating the Database with FastAPI
c. Testing the FastAPI Application
4. Configuring the LLM Model
a. Understanding Model Parameters
b. Adjusting Configurations for Optimal Performance
Setting Up the Environment
● Prerequisites
○ Python 3.7+ installed
○ Basic Python programming knowledge
○ Familiarity with command-line interface (CLI)
○ Git installed for version control
○ An IDE or text editor (e.g., VSCode, PyCharm)
● Creating a Virtual Environment
○ Create Virtual Environment: python -m venv env
○ Activate the Environment:
■ macOS/Linux: source env/bin/activate
■ Windows: env\Scripts\activate
● Installing Required Packages with uv (example command below)
○ uv is a fast Python package manager: https://github.com/astral-sh/uv
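A minimal install sketch, mirroring the uv add command used later in this deck; the exact package list is my assumption, based on the libraries this tutorial uses:

uv add fastapi uvicorn transformers torch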
Building the API with FastAPI 1
● Creating Basic API Endpoints
○ Create main.py: The main application file.
○ Import FastAPI: from fastapi import FastAPI
○ Initialize the App: app = FastAPI()
● Creating the Root Endpoint
○ Define Root Endpoint:
@app.get("/")
def read_root():
return {"message": "Welcome to the RAG AI ChatBot API"}
○ Explanation:
■ This endpoint responds to GET requests at the root URL.
■ Returns a welcome message as a JSON response.
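One way to try it locally: run the app with uvicorn and call the endpoint (assuming uvicorn was installed in the setup step):

uvicorn main:app --reload
curl http://localhost:8000/

Expected response: {"message": "Welcome to the RAG AI ChatBot API"}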
Building the API with FastAPI 2
● Creating the Generate Endpoint
○ Define Generate Endpoint:
@app.post("/generate")
def generate_response(user_input: str):
# Placeholder for response generation
return {"response": f"Echo: {user_input}"}
○ Explanation:
■ This endpoint responds to POST requests at /generate.
■ Receives user_input as a string.
■ Currently echoes back the input; to be updated with LLM integration.
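Because user_input is declared as a plain str rather than a request body model, FastAPI treats it as a query parameter, so the endpoint can be exercised like this:

curl -X POST "http://localhost:8000/generate?user_input=hello"

Expected response: {"response": "Echo: hello"}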
Integrating Open Source LLMs 1
● Loading the Model and Tokenizer
○ Import Libraries:
from transformers import AutoModelForCausalLM, AutoTokenizer
○ Load Pre-trained Model and Tokenizer:
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
Integrating Open Source LLMs 2
● Updating the Generate Endpoint
○ Modify generate_response Function:
@app.post("/generate")
def generate_response(user_input: str):
inputs = tokenizer.encode(user_input + tokenizer.eos_token, return_tensors="pt")
outputs = model.generate(inputs, max_length=500, pad_token_id=tokenizer.eos_token_id)
text = tokenizer.decode(outputs[0], skip_special_tokens=True)
return {"response": text}

○ Explanation:
■ Encoding Input:
● Adds end-of-sequence token to user input.
● Converts text to tensor format for the model.
■ Generating Output:
● Uses the model to generate a response.
● Specifies maximum length and padding behavior.
■ Decoding Output:
● Converts generated tokens back to human-readable text.
● Skips special tokens during decoding.
Integrating Open Source LLMs 3
● Understanding Model Parameters
○ max_length:
■ Sets the maximum number of tokens to generate.
■ Prevents overly long responses.
○ pad_token_id:
■ Specifies the token used for padding sequences.
■ Ensures consistent sequence lengths.
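Beyond these two, generate() accepts sampling parameters that control response variety; a sketch with illustrative values (sampling only takes effect when do_sample=True):

outputs = model.generate(
    inputs,
    max_length=500,
    pad_token_id=tokenizer.eos_token_id,
    do_sample=True,   # sample from the distribution instead of greedy decoding
    temperature=0.7,  # lower values make output more deterministic
    top_p=0.9,        # nucleus sampling: keep the top 90% of probability mass
)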
Managing and Parsing Documents 1
● Creating the Document Upload Endpoint
○ Add File Upload Support:
from fastapi import File, UploadFile

@app.post("/upload")
async def upload_document(file: UploadFile = File(...)):
    contents = await file.read()
    # Process the contents
    return {"filename": file.filename, "status": "Processed"}
○ Explanation:
■ upload_document Endpoint:
● Asynchronously handles file uploads.
● Uses UploadFile for efficient file handling.
■ Processing Steps:
● Reads contents of the uploaded file.
● Placeholder for parsing and storing the document.
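One way to fill in the processing placeholder is to dispatch on the uploaded file's type; a sketch, using the parser functions defined on the following slides:

@app.post("/upload")
async def upload_document(file: UploadFile = File(...)):
    contents = await file.read()
    # Route to the matching parser based on the file extension.
    if file.filename.lower().endswith(".pdf"):
        text = parse_pdf_document(contents)
    else:
        text = parse_text_document(contents)
    return {"filename": file.filename, "status": "Processed", "characters": len(text)}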
Managing and Parsing Documents 2
● Parsing Text Documents
○ Function to Parse Text Content:
def parse_text_document(contents):
    text = contents.decode("utf-8")
    # Further processing (e.g., cleaning, tokenization)
    return text
○ Explanation:
■ Decoding Content:
● Converts byte data to a string using UTF-8 encoding.
■ Additional Processing:
● Prepare text for embedding (e.g., removing noise).
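A common step when preparing text for embedding is splitting it into overlapping chunks so each piece fits the embedding model's input size; a naive sketch (the sizes are illustrative):

def chunk_text(text, size=500, overlap=50):
    # Slide a window over the text; the overlap keeps some context
    # shared between neighboring chunks.
    chunks = []
    step = size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
    return chunks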
Managing and Parsing Documents 3
● Parsing PDF Documents with OCR
○ Add Dependencies:
uv add PyPDF2 pytesseract pillow
○ Function to Parse PDF Content:
import PyPDF2
from PIL import Image
import pytesseract
import io
def parse_pdf_document(contents):
    pdf_reader = PyPDF2.PdfReader(io.BytesIO(contents))
    text = ""
    for page in pdf_reader.pages:
        # extract_text() can return None or "" for image-only pages.
        text += page.extract_text() or ""
    # If direct text extraction fails (e.g., a scanned PDF), fall back to OCR.
    if not text.strip():
        images = convert_pdf_to_images(contents)
        for image in images:
            text += pytesseract.image_to_string(image)
    return text
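Note that convert_pdf_to_images is not defined on the slide; a minimal sketch using the pdf2image package (my assumption; it also requires the poppler system tool to be installed):

from pdf2image import convert_from_bytes

def convert_pdf_to_images(contents):
    # Render each PDF page as a PIL image so pytesseract can OCR it.
    return convert_from_bytes(contents)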
Managing and Parsing Documents 4
● Explanation of PDF Parsing
○ PDF Text Extraction:
■ Uses PyPDF2 to extract text directly from PDF pages.
○ Handling Non-Text PDFs:
■ Some PDFs are scanned images without embedded text.
■ If direct extraction yields no text, fallback to OCR.
○ OCR Process:
■ Converts PDF pages to images.
■ Uses pytesseract to perform OCR on images.
■ Extracts text from images for further processing.
Database Integration and Management
● Setting Up PostgreSQL
○ Install PostgreSQL:
■ Download and install from the official website.
○ Create a New Database:
psql -U postgres
CREATE DATABASE rag_chatbot;
○ Explanation:
■ psql: PostgreSQL command-line interface.
■ Database Creation: Initializes a new database for the application.
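The slides stop at creating the database; a minimal sketch of connecting it to the app with SQLAlchemy (an assumed library choice; it needs the psycopg2 driver, and the credentials below are placeholders):

from sqlalchemy import create_engine, text

engine = create_engine("postgresql://postgres:password@localhost/rag_chatbot")

with engine.begin() as conn:
    # Simple table for parsed documents; a real schema would also store embeddings.
    conn.execute(text(
        "CREATE TABLE IF NOT EXISTS documents "
        "(id SERIAL PRIMARY KEY, filename TEXT, content TEXT)"
    ))

def save_document(filename, content):
    # Persist a parsed document so it can be retrieved for RAG later.
    with engine.begin() as conn:
        conn.execute(
            text("INSERT INTO documents (filename, content) VALUES (:f, :c)"),
            {"f": filename, "c": content},
        )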
What's Next? 1
● RAG Algorithm Enhancement
○ Advanced Retrieval Techniques:
■ Implement BM25, TF-IDF, or Hybrid Retrieval combining semantic and keyword search.
○ Contextual Embeddings:
■ Use models that consider conversation history for more coherent responses.
○ Feedback Mechanisms:
■ Incorporate user feedback to refine retrieval and response generation.
■ Adjust weights or retrain models based on interactions.
● Finding Best and Optimized Open Source LLMs
○ Model Comparison: Evaluate models like GPT-J, BLOOM, LLaMA-based models.
○ Resource Considerations: Balance model size and performance with hardware capabilities.
○ Optimization Techniques (an example follows this slide):
■ Quantization: Reduce model size by decreasing precision (e.g., from 32-bit to 8-bit).
■ Pruning: Remove redundant weights to streamline the model.
○ Benchmarking:
■ Test models in your specific use case to determine the best fit.
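As an example of the quantization technique above, loading a model with 8-bit weights via the transformers + bitsandbytes integration (a sketch; it assumes bitsandbytes, accelerate, and a CUDA GPU are available):

from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    "gpt2",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # 8-bit weights
    device_map="auto",  # place layers on available devices automatically
)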
What's Next? 2
● Authentication and Deployment
○ Authentication Methods
■ OAuth 2.0:
● Industry-standard protocol for authorization.
● Allows third-party applications to access user data without exposing credentials.
■ JSON Web Tokens (JWT):
● Stateless authentication mechanism.
● Encode user information securely in tokens.
○ Deployment Strategies
■ Containerization (Docker): Package applications with all dependencies for consistency.
■ Orchestration (Kubernetes):
● Manage containerized applications across multiple hosts.
● Provides scalability and high availability.
■ Serverless Platforms:
● Use services like AWS Lambda or Google Cloud Functions.
● Automatically scale with demand; reduce server management overhead.
What's Next? 3
● Production-Ready System (1)
○ Scalability
■ Auto-Scaling:
● Configure systems to adjust resource allocation based on load.
■ Load Balancing:
● Distribute incoming traffic to prevent overloading a single instance.
○ Monitoring and Logging
■ Monitoring Tools:
● Prometheus, Datadog for metrics collection.
● Real-time alerts for system issues.
■ Logging Solutions:
● ELK Stack (Elasticsearch, Logstash, Kibana) for centralized log management.
● Facilitates debugging and performance tuning.
What's Next? 4
● Production-Ready System (2)
○ Security Best Practices
■ Encryption: Use HTTPS/TLS for secure data transmission.
■ Input Validation: Prevent injection attacks by validating and sanitizing inputs.
■ Rate Limiting:
● Protect against denial-of-service attacks.
● Control the number of requests a client can make in a given time frame.
○ Compliance and Data Privacy
■ Regulatory Compliance:
● Adhere to GDPR, CCPA, or other relevant regulations.
● Implement consent mechanisms where necessary.
■ Data Anonymization:
● Remove personally identifiable information from datasets.
■ User Consent and Control:
● Provide options for users to manage their data and privacy settings.
Thank You!
If you have any other questions, you can get in touch with me through the following channels:
LinkedIn: https://linkedin.com/in/IrvanFza
Website: https://irvan.cc
Community: https://linktr.ee/Meetap
