Module 1 - Intro to GenAI - PEC Generative AI Training
Ubaid Ullah
Chemist from University of Sialkot (USKT)
Trainer and Moderator at ICodeGuru.
Module 1 Agenda
Understanding A Generative AI Model
○ Finding a Generative Model
○ Transformer
○ Understanding ChatGPT Model Process
○ Stemming and lemmatization
○ Understanding its parameters, required inputs, and performance requirements
○ Creating and using an API Key
○ Vector Database
○ RAG and its different techniques (theory)
Leading Engineers Forward: PEC Generative AI Training Program - Cohort 2
Deep Learning
DL is a type of machine learning that uses Artificial Neural Networks to learn complex patterns from data.
Generative AI
• Generative AI is a subset of AI that aims to generate new content from given instructions
(prompts).
● LLM APIs
○ GPT (3.5, 4, 4 Turbo, 4o)
○ Gemini
○ Mistral
● Specialized models
○ YOLO
○ Google T5-base
○ Whisper
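Calling one of these LLM APIs usually means sending an authenticated HTTPS request with an API key. The sketch below builds (but does not send) an OpenAI-style chat request; the endpoint URL and key are placeholders, not real values.

```python
import json
import urllib.request

# Placeholder endpoint and key -- substitute a real provider's values.
API_URL = "https://api.example.com/v1/chat/completions"
API_KEY = "sk-your-key-here"

payload = {
    "model": "gpt-3.5-turbo",  # which LLM to call
    "messages": [{"role": "user", "content": "Hello!"}],
}

request = urllib.request.Request(
    API_URL,
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": f"Bearer {API_KEY}",  # the API key authenticates you
        "Content-Type": "application/json",
    },
)
# urllib.request.urlopen(request) would send it; skipped here (no real key).
```

The request/response shape varies slightly between providers, but the pattern (key in a header, model and prompt in a JSON body) is common to most of them.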
Chat Interfaces
• Simplest way to interact with LLMs (Large Language Models)
• These interfaces exist for most of the big LLMs available today, e.g.:
• ChatGPT
• Gemini
• Claude
• Groq
• Most of these services have a free plan that limits usage (a certain number of prompts per day) or
limits the models available (e.g. GPT-3.5 vs. GPT-4)
• These chat interfaces are a great way to play around with LLMs, work on prompting skills, and
get everyday tasks done, like writing emails, documentation, study questions, etc.
chatgpt.com
What is an API?
• APIs, or Application Programming Interfaces, are like waiters at a restaurant. They
take requests from you (the customer), tell the kitchen (the server) what you
want, and then bring the response back to you.
• Imagine a Restaurant:
• You (the customer): Want to order food (ask for data or action).
• Waiter (API): Takes your order to the kitchen (server).
• Kitchen (Server): Prepares the food (processes the request).
• Waiter (API): Brings the food back to you (returns the data).
• Simple Breakdown:
• You make a request: "What’s the weather in New York?"
• API delivers the request: Takes it to the server.
• Server processes: Finds the weather data.
• API brings back: "The weather is sunny!"
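The restaurant flow above can be sketched as plain functions (a toy simulation; `kitchen` and `waiter` are made-up names for illustration, not a real API):

```python
def kitchen(order):
    # The server: processes the request and prepares the result.
    data = {"What's the weather in New York?": "The weather is sunny!"}
    return data.get(order, "Sorry, that's not on the menu.")

def waiter(order):
    # The API: carries your request to the server and brings the response back.
    return kitchen(order)

reply = waiter("What's the weather in New York?")  # you make a request
```

The caller never talks to `kitchen` directly; that separation is the whole point of an API.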
What is “GPT”?
Generative Pre-trained Transformer
(from the paper "Attention Is All You Need")
Embedding
• Similar chunks end up as vectors close to each other (words with similar meanings,
pixels with similar RGB values)
• Can do vector operations like addition, subtraction, dot, and cross products
• [Vector for "King"] – [Vector for "Man"] + [Vector for "Woman"] ≈ [Vector for "Queen"]
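The King/Man/Woman identity can be tried with toy 3-dimensional vectors (hand-made here for illustration; real embeddings are learned and have hundreds of dimensions):

```python
import math

# Hand-crafted toy embeddings (real models learn these from data).
vectors = {
    "king":  [0.9, 0.8, 0.1],
    "man":   [0.9, 0.1, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "queen": [0.1, 0.8, 0.9],
    "apple": [0.5, 0.0, 0.2],
}

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def subtract(a, b):
    return [x - y for x, y in zip(a, b)]

def cosine(a, b):
    # Cosine similarity: 1.0 means the vectors point the same way.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# [King] - [Man] + [Woman] ...
target = add(subtract(vectors["king"], vectors["man"]), vectors["woman"])

# ... is closest to [Queen] among the remaining words.
closest = max(
    (w for w in vectors if w not in {"king", "man", "woman"}),
    key=lambda w: cosine(vectors[w], target),
)
```

With real embeddings the result is only approximately "Queen", which is why ≈ rather than = is the honest way to write the identity.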
• Stop words are common words ("the", "is", "and", …) that carry little standalone meaning.
Removing them lowers the tokens used and gets you more bang for your buck when
fine-tuning a Large Language Model
• Utilities to remove them: NLTK, a Python package, is the most common way to
remove them; the full list ships with NLTK's stopwords corpus.
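A minimal sketch of stop-word removal, using a small hand-picked subset of NLTK's English stop-word list so the example stays self-contained (the real NLTK list has roughly 180 entries):

```python
# A few entries from NLTK's English stop-word list (subset for illustration).
STOP_WORDS = {"the", "is", "a", "an", "of", "to", "and", "in"}

def remove_stop_words(text):
    # Keep only the tokens that carry meaning; fewer tokens = cheaper fine-tuning.
    return [w for w in text.lower().split() if w not in STOP_WORDS]

tokens = remove_stop_words("The weather in New York is sunny")
# tokens: ['weather', 'new', 'york', 'sunny']
```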
• Stemming standardizes words, allowing models to treat different forms of a word
consistently
• The most common stemmer is PorterStemmer, which uses rules such as removing
common suffixes like "ed" and "ing" and converting plurals to singulars (cars → car)
• Use it wisely! Be cautious of "over-stemming": all stemmers have false positives where a
word might be reduced to a form that is meaningless, e.g. "university" might be reduced
to "univers"
• Stemming words can make them lose their contextual meaning, e.g. agreement → agree
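The suffix-stripping idea can be sketched in a few lines. This is a toy, simplified Porter-style stemmer, not the real PorterStemmer (which has many more rules), but it reproduces both the intended behaviour and the over-stemming failure mode:

```python
SUFFIXES = ["ing", "ment", "ed", "s"]

def stem(word):
    """Strip one common suffix, keeping at least a 3-letter stem."""
    for suf in SUFFIXES:
        if word.endswith(suf) and len(word) - len(suf) >= 3:
            word = word[: -len(suf)]
            # Undo a doubled final consonant left behind (running -> runn -> run).
            if len(word) >= 2 and word[-1] == word[-2] and word[-1] not in "aeiou":
                word = word[:-1]
            break
    return word

# Intended behaviour:
#   stem("cars") -> "car", stem("running") -> "run"
# Over-stemming false positives, exactly as warned above:
#   stem("agreement") -> "agree", stem("news") -> "new"
```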
• Lemmatization reduces words to their basic dictionary form (lemma), so "university" will remain as such
• Helps reduce overhead and the dimensionality of the vectors that words produce
• Helps information retrieval: "best coffee" will also retrieve results for "good coffee"
• Context can still be lost, e.g. "running" and "run" have different nuances in certain contexts, which might be
lost after lemmatization
• Rules are specific to each language, requiring different approaches and tools for different languages
• Ambiguous words can lead to incorrect lemmatization. For example, "bats" can be the plural of "bat" (the
animal) or a form of the verb "to bat"
• Different lemmatization tools might produce inconsistent results for the same text (so use the same tool
throughout)
What is RAG?
RAG (Retrieval-Augmented Generation) retrieves relevant documents at query time and supplies
them to the model alongside the prompt.
Why RAG?
Large Language Models (LLMs) like GPT-4 are powerful at generating human-like text
but face key limitations in certain tasks.
Challenges with LLMs:
● Static Knowledge: LLMs are limited to knowledge up to their last training cut-off and lack real-time
updates.
● Contextual Limits: They struggle with generating accurate responses for highly specific or less
common topics without sufficient context.
● Large Scale Data Handling: Handling vast amounts of information and ensuring relevance and
accuracy in responses can be challenging.
Benefits of RAG
Updated Knowledge: Access to real-time or recent information.
Types of RAG
● Simple RAG
● Simple RAG with Memory
● Branched RAG
● Adaptive RAG
● Corrective RAG (CRAG)
● Self-RAG
● Agentic RAG
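The first entry above, Simple RAG, can be sketched in a few lines: retrieve the most relevant document and prepend it to the prompt. This toy retriever ranks by naive word overlap; real systems use vector similarity over an embedding database.

```python
def retrieve(query, docs):
    # Toy retriever: pick the doc sharing the most words with the query.
    q = set(query.lower().split())
    return max(docs, key=lambda d: len(q & set(d.lower().split())))

def build_prompt(query, docs):
    # Simple RAG: augment the LLM prompt with retrieved, up-to-date context.
    context = retrieve(query, docs)
    return f"Context: {context}\nQuestion: {query}\nAnswer:"

docs = [
    "The PEC Generative AI Training Program is in its second cohort.",
    "Stemming reduces words to their root form.",
]
prompt = build_prompt("What does stemming do?", docs)
```

The other variants in the list (memory, branching, self-correction, agents) all elaborate on this same retrieve-then-generate loop.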