
GenAI Workflow Automation NPTEL Zoom Course

The document outlines a workshop on Generative AI (GenAI) and workflow automation, covering topics such as Natural Language Processing (NLP), Large Language Models (LLMs), and prompt engineering. Participants will learn to utilize tools like Google Gems and n8n for automating workflows and improving their understanding of GenAI applications. The workshop includes interactive activities and aims to enhance participants' skills in using GenAI effectively in various contexts.

Uploaded by

aneeshsv.career
Copyright
© © All Rights Reserved

Generative AI Agents and Workflow Automation
Session 1

Introduction
About NPTEL+
elearn.nptel.ac.in

About EdTech Society
▪ Public forum, professional non-profit association, committed to
improving instruction and learning through the use of educational
technologies.
▪ Launched in April 2022.
▪ 370+ members; 2 online events per month; one T4E conference per
year.

etsociety.org

Instructors

Ramkumar Rajendran, Yash Desai


Introduction Activity

What comes to mind when you hear the word GenAI?

https://www.menti.com/alzsc5es5bja

Go to Menti.com and enter the code: 1484 4863
Knowledge Level in GenAI

What is your understanding of the term GenAI?

1. I have heard the term GenAI but never worked with it
2. I know ML basics but not GenAI
3. I use GenAI for language editing - email, summaries - with ChatGPT or Gemini
4. I use GenAI in my workplace - for what?
5. I know how NLP works but not GenAI
6. I know how GenAI works and am interested only in agents and workflow automation

Go to Menti.com and enter the code: 1484 4863
Objective of this Workshop
After the workshop, learners will be able to:
- Explain the basics of NLP
- Recognise the terms used in building LLMs
- Use LLMs more effectively with a few strategies
- Use Google Gems and AI Studio
- Automate a workflow using n8n

Main project submission is compulsory for workshop completion.


Session 2

Basics of NLP
NLP – Analysing a sentence
▪ Lemmatisation – Grouping the inflected forms of a word under its root word (lemma)
• Eating, Ate, Eats → Eat
• Talking, Talked, Talks, Talk → Talk

▪ Stemming – Applying a set of rules that remove suffixes to obtain the word stem
• Remove "ing", "ed", "ly" if the word ends with one of these suffixes

What is the difference? Stemming will not turn "Ate" into "eat", but lemmatisation will.

Resource: Natural language processing course. Prof. Dan Jurafsky
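
The difference between the two approaches can be sketched in a few lines of Python. This is only an illustration: the suffix list comes from the slide, while the lemma table is a tiny hand-made dictionary (an assumption, not a real lexicon), and the function names are hypothetical.

```python
# Suffix rules taken from the slide; the lemma table is a tiny hand-made
# dictionary (an assumption for illustration, not a real lexicon).
SUFFIXES = ("ing", "ed", "ly")
LEMMAS = {
    "ate": "eat", "eating": "eat", "eats": "eat",
    "talking": "talk", "talked": "talk", "talks": "talk",
}


def stem(word: str) -> str:
    """Strip the first matching suffix; no dictionary lookup is involved."""
    lower = word.lower()
    for suffix in SUFFIXES:
        if lower.endswith(suffix) and len(lower) > len(suffix) + 1:
            return lower[: -len(suffix)]
    return lower


def lemmatise(word: str) -> str:
    """Look the word up in the lemma table; fall back to the word itself."""
    lower = word.lower()
    return LEMMAS.get(lower, lower)


for w in ["Eating", "Ate", "Talked"]:
    print(w, "-> stem:", stem(w), "| lemma:", lemmatise(w))
```

The stemmer leaves "Ate" unchanged because no rule matches, whereas the lemmatiser maps it to "eat" through the lookup table.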

Activity
NLP
▪ Example Sentence: Walking is good for health. But Jogging is better
than walking

Activity Response
NLP
▪ Example Sentence: Walking is good for health. But Jogging is better
than walking

▪ Lemmatised: Walk be good for health. But jog be good than walk
▪ Stemmed: walk is good for health. But jog is bett than walk

N-Gram
▪ Unigram
▪ Bi-gram
▪ Tri-gram
▪ N-gram

Sentence: I like to drink coffee.

Unigram Dictionary: I, Like, To, Drink, Coffee
Bigram Dictionary: I like, Like to, To drink, Drink coffee
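
As a minimal sketch, the dictionaries for this sentence can be produced with a sliding window over the tokens. The helper name `ngrams` is hypothetical, and whitespace tokenisation is an assumption.

```python
def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = "I like to drink coffee".split()
unigrams = ngrams(tokens, 1)
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)

print(bigrams)
print(trigrams)
```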

Which word will come next?
▪ Markov assumption: the probability of the current word depends only on the previous word
• P(w_n | w_1, ..., w_{n-1}) ≈ P(w_n | w_{n-1})

Mark likes to eat meal with his family. Kail likes to sing and
eats meal with her friend. Kiran likes music.
P(to | likes) = ?
P(to | likes) = count(likes to) / count(likes) = 2/3
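
The count-based estimate can be checked with a short script. This is a sketch under simple assumptions: sentences are split on full stops, words are lower-cased, and the function name `bigram_probability` is hypothetical.

```python
import re
from collections import Counter

CORPUS = (
    "Mark likes to eat meal with his family. "
    "Kail likes to sing and eats meal with her friend. "
    "Kiran likes music."
)


def bigram_probability(corpus, previous, current):
    """Estimate P(current | previous) as count(previous current) / count(previous)."""
    unigram_counts, bigram_counts = Counter(), Counter()
    for sentence in corpus.split("."):
        words = re.findall(r"[a-z]+", sentence.lower())
        unigram_counts.update(words)
        # Bigrams are counted within a sentence only, never across a full stop.
        bigram_counts.update(zip(words, words[1:]))
    return bigram_counts[(previous, current)] / unigram_counts[previous]


p = bigram_probability(CORPUS, "likes", "to")
print(p)
```

"likes" occurs three times and is followed by "to" twice, which gives the 2/3 from the slide.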

How similar are two words or sentences?
▪ Minimum edit distance

▪ Which word is most similar to "analytics"?
• Analysis
• Lytics
• Anlytics

▪ How do we find this?
• Apply the operators Insert, Delete, and Substitute, and count the minimum number needed

▪ Have you seen this applied anywhere? If yes, where?
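
A minimal dynamic-programming sketch of minimum edit distance follows. It assumes every operation (insert, delete, substitute) costs 1; other cost schemes exist, and the function name is hypothetical.

```python
def edit_distance(source, target):
    """Minimum number of inserts, deletes and substitutions turning source into target."""
    previous = list(range(len(target) + 1))  # distance from "" to each target prefix
    for i, s_char in enumerate(source, start=1):
        current = [i]  # distance from source[:i] to ""
        for j, t_char in enumerate(target, start=1):
            cost = 0 if s_char == t_char else 1
            current.append(min(
                previous[j] + 1,         # delete s_char
                current[j - 1] + 1,      # insert t_char
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


for candidate in ["analysis", "lytics", "anlytics"]:
    print(candidate, edit_distance("analytics", candidate))
```

Under this unit-cost assumption, "anlytics" is the closest candidate because a single deletion separates it from "analytics".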

Bag of Words
▪ Word frequency
▪ Similar words
▪ Sparse Vector
Sentence 1: Students interact with peers in class.
Sentence 2: Peer instruction increases students’ interest
Bow_sen1 = {students:1, interact:1, with:1, peer:1, in:1, class:1}
Bow_sen2 = {peer:1, instruction:1, increases:1, students:1, interest:1}
Bow = {students:2, interact:1, with:1, peer:2, in:1, class:1, instruction:1,
increases:1, interest:1}

Bag of words
Bow = {1: students, 2:interact, 3:with, 4:peer, 5:class, 6:instruction,
7:increases, 8:interest}
Sentence 1: Students interact with peers in class.
Sentence 2: Peer instruction increases students’ interest
Sen1 = {1,1,1,1,1,0,0,0}
Sen2 = {1,0,0,1,0,1,1,1}
Index
Sen1 = {1,2,3,4,5}
Sen2 = {4,6,7,1,8}
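
The vector and index forms can be reproduced with a short sketch. It assumes the tokens are already normalised to match the 8-word vocabulary on the slide (lower-cased, "peers" reduced to "peer", "in" dropped); the function names are hypothetical.

```python
VOCAB = ["students", "interact", "with", "peer", "class",
         "instruction", "increases", "interest"]

# Tokens are assumed to be already normalised so that they match the
# vocabulary on the slide (lower-cased, "peers" -> "peer", "in" dropped).
SEN1 = ["students", "interact", "with", "peer", "class"]
SEN2 = ["peer", "instruction", "increases", "students", "interest"]


def bow_vector(tokens, vocab):
    """Count how often each vocabulary word occurs in the sentence."""
    return [tokens.count(word) for word in vocab]


def bow_indices(tokens, vocab):
    """Sparse form: 1-based vocabulary index of each token, in sentence order."""
    return [vocab.index(token) + 1 for token in tokens if token in vocab]


print(bow_vector(SEN1, VOCAB), bow_vector(SEN2, VOCAB))
print(bow_indices(SEN1, VOCAB), bow_indices(SEN2, VOCAB))
```

The dense vectors are mostly zeros once the vocabulary grows, which is why the index form is the usual way to store a sparse vector.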

Activity
Bag of Words
▪ 100 students wrote essays, which were validated by human experts. How
could we create an algorithm to grade the essays?

Preliminary Idea
Bag of Words

Tools
▪ https://corenlp.run/
Session 3

Intro to Word Embedding
Basics of LLMs
What was the state of the art before GPT?
▪ Word embedding
▪ A vector for each word
• We can define the dimension and the context

Sample: two words, assuming the vector size is only 2.

Pen = [3, 2] and Pencil = [4, 3]
How do you find the distance between these words?

“You shall know a word by the company it keeps” - J. R. Firth

Word2Vec
Introduced by Mikolov et al. (2013), Word2Vec learns word
embeddings (dense vector representations of words) from raw text.
It uses a shallow neural network to predict either:

● A word from its context (CBOW)

● The context from a word (Skip-Gram)

After training, the hidden-layer weights become the word vectors that
capture semantic meaning.
CBOW & Skip-Gram
▪ Continuous Bag of Words (CBOW): Given the surrounding words
(context window) as input, predict the target (center) word. The
model averages (or sums) the context-word embeddings and
applies a softmax output to guess the missing word.
• For example, in “The cat sat on the mat” with window size 2,
CBOW would use “The, cat, on, the” to predict “sat”.

▪ Skip-Gram: The inverse task. Given a center word, predict the
surrounding context words.
• For example, input “sat” and try to predict “The, cat, on, the”.
• Skip-gram tends to work better for infrequent words, because it
generates multiple training pairs from one word.
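
The two training setups differ only in how the context window is turned into examples. The sketch below builds the training examples for the sentence above; it does not train a network, and the helper names are hypothetical.

```python
def context_window(tokens, center, window):
    """Words up to `window` positions to the left and right of the center word."""
    left = tokens[max(0, center - window):center]
    right = tokens[center + 1:center + 1 + window]
    return left + right


def cbow_examples(tokens, window):
    """CBOW: one example per position, (context words, target word)."""
    return [(context_window(tokens, i, window), tokens[i])
            for i in range(len(tokens))]


def skipgram_pairs(tokens, window):
    """Skip-gram: one (center word, context word) pair per context word."""
    return [(tokens[i], ctx)
            for i in range(len(tokens))
            for ctx in context_window(tokens, i, window)]


tokens = "The cat sat on the mat".split()
print(cbow_examples(tokens, 2)[2])
print([pair for pair in skipgram_pairs(tokens, 2) if pair[0] == "sat"])
```

One occurrence of "sat" yields a single CBOW example but four skip-gram pairs, which is the "multiple training pairs from one word" point above.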
Demo - Word2Vec

https://projector.tensorflow.org/

http://epsilon-it.utu.fi/wv_demo/
Word2Vec
▪ How do you find the distance between two word vectors?
• Euclidean distance

http://epsilon-it.utu.fi/wv_demo/

https://code.google.com/archive/p/word2vec/ - Tomas Mikolov
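
As a worked example, the Euclidean distance between the two sample vectors Pen = [3, 2] and Pencil = [4, 3] can be computed directly. The function name is hypothetical.

```python
import math

pen, pencil = [3, 2], [4, 3]


def euclidean_distance(u, v):
    """Square root of the summed squared differences between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


distance = euclidean_distance(pen, pencil)
print(distance)  # sqrt((3-4)^2 + (2-3)^2) = sqrt(2)
```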


Encoder-Decoder Models

● Also known as Sequence-to-Sequence (Seq2Seq) architecture.

● First popularized in machine translation (e.g., English → French)

● Two-part architecture:

○ Encoder reads the input sequence (e.g., “How are you?”)

○ Decoder generates the output (e.g., “Comment ça va ?”)
Encoder-Decoder Models
● Encoder: Processes the input sequence token by token to produce hidden states that
encapsulate the sequence's information.

● Context Vector: The final hidden state from the encoder, serving as a
condensed representation of the entire input sequence.

● Decoder: Uses the context vector to generate the output sequence,
producing one token at a time.

A significant limitation of the basic Seq2Seq model is its reliance on a fixed-length
context vector, which can lead to information loss, especially with longer input
sequences.
Attention - Solution to Single Vector Bottleneck
1. Seq2Seq Bottleneck: Compressing the entire input into a single vector leads to
information loss, especially in long sequences.

2. Introduction of Attention: Allows the decoder to focus on different parts of the input
sequence at each step, mitigating the fixed-vector limitation.

3. Limitations of Attention: Despite improvements, attention mechanisms in RNN-based
models still face challenges like sequential processing and limited parallelization.

4. Emergence of Self-Attention: Replaces recurrence with mechanisms that enable
models to process sequences in parallel, enhancing efficiency and scalability.
Self Attention
1. Transforms each input token into Query, Key, and Value vectors and computes attention
scores from the similarity between queries and keys.

2. Captures long-range dependencies by allowing each token to attend to all others in
the sequence.

3. Enables contextual understanding by weighting input elements based on relevance.

4. Supports parallel computation, making it efficient and scalable for large datasets.
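
The four points above can be made concrete with a small scaled dot-product self-attention sketch in plain Python. The input matrix and the identity projection weights are made-up values chosen for readability; in a real model the Query, Key, and Value projections are learned.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Turn a list of scores into weights that sum to 1."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over all tokens at once."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]  # transpose of K
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, k_t)]
    weights = [softmax(row) for row in scores]  # one weight row per token
    return matmul(weights, v), weights


# Three tokens with embedding size 2 (illustrative values only).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
output, weights = self_attention(x, identity, identity, identity)
print(weights)
```

Each row of `weights` says how much one token attends to every token in the sequence, and all rows are computed independently, which is what allows parallel computation.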
Transformers - Attention is All You Need
1. Encoder

2. Decoder

3. Attention

4. Feed Forward Network

5. Layer Normalization

6. Positional Encoding
Transformers - BERT & GPT
BERT - Bidirectional Encoder Representations from
Transformers
1. BERT was introduced in October 2018 by researchers at Google.
2. BERT uses only the encoder part of the Transformer architecture, comprising
multiple identical layers. Each token is represented by the sum of three
embeddings: Token, Segment, and Position.
3. BERT is pre-trained on two unsupervised tasks: Masked Language Modeling (MLM)
and Next Sentence Prediction (NSP). After pre-training, BERT can be fine-tuned with
just one additional output layer to perform specific tasks like question answering,
sentiment analysis, or named entity recognition.
Visualize BERT

https://colab.research.google.com/drive/1hXIQ77A4TYS4y3UthWF-Ci7V7vVUoxmQ?usp=sharing
GPT - Generative Pre-trained Transformer
1. GPT uses a multi-layer Transformer decoder
architecture with masked self-attention, so each
position can attend only to the preceding tokens.
2. Input tokens are converted into context vectors by
passing them through token and position
embedding layers.
3. The final output is generated by a softmax layer,
which produces a probability distribution over
potential target tokens.
4. After pre-training, the model's parameters are
adapted to specific supervised downstream tasks
using labeled data.
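
Masked self-attention in point 1 can be illustrated with a causal mask: position i may look at positions 0..i and nothing later. The sketch below is a toy in plain Python with hypothetical function names, not GPT's actual implementation.

```python
import math


def causal_mask(n):
    """Lower-triangular mask: entry [i][j] is 1 when token i may attend to token j."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


def masked_softmax(scores, mask_row):
    """Softmax over the visible positions only; hidden positions get weight 0."""
    exps = [math.exp(s) if visible else 0.0 for s, visible in zip(scores, mask_row)]
    total = sum(exps)
    return [e / total for e in exps]


mask = causal_mask(3)
print(mask)
# Equal scores for the second token: attention is shared over tokens 0 and 1 only.
print(masked_softmax([0.0, 0.0, 0.0], mask[1]))
```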
GPT’s Task-specific input transformations
▪ For some tasks, like text classification, GPT can be directly
fine-tuned without any changes to the input. Certain other tasks, like question
answering or textual entailment, have structured inputs such as
ordered sentence pairs, or triplets of document, question, and
answers.
▪ Since the pre-trained model was trained on contiguous sequences of
text, these structured inputs are converted into a single ordered token sequence before fine-tuning.
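
A rough sketch of such input transformations is shown below. It follows the general pattern of wrapping the text in start and end markers and joining the parts with a delimiter; the marker strings `<start>`, `<delim>`, and `<extract>` and the function names are placeholders chosen for illustration, not the model's actual vocabulary.

```python
# Placeholder special tokens (assumptions for illustration only).
START, DELIM, EXTRACT = "<start>", "<delim>", "<extract>"


def classification_input(text):
    """Single text: wrap it in start and extract markers."""
    return f"{START} {text} {EXTRACT}"


def entailment_input(premise, hypothesis):
    """Ordered sentence pair: premise, delimiter, hypothesis."""
    return f"{START} {premise} {DELIM} {hypothesis} {EXTRACT}"


def question_answering_inputs(document, question, answers):
    """Triplet: one sequence per candidate answer, scored independently."""
    return [f"{START} {document} {question} {DELIM} {answer} {EXTRACT}"
            for answer in answers]


print(entailment_input("A man is jogging.", "A person is moving."))
print(question_answering_inputs("Jogging is better than walking.",
                                "What is better than walking?",
                                ["Jogging", "Sitting"]))
```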
Visualize GPT

https://bbycroft.net/llm
LLMs - From GPT to Now
Session 4

How to use LLMs Effectively


How do you use LLMs effectively? - Brainstorm

Think about the ways you improve your interaction with LLMs.

Type them as free-text answers.

Go to Menti.com and enter the code: 1484 4863

Code Methods

Low Code Methods

No Code Methods
How do you use LLMs effectively? - Response
How to Customize LLMs effectively?
▪ Prompt Engineering
▪ RAG
▪ Agent
▪ Fine-Tuning
▪ Parameterization
▪ RLHF
Interactive Tools
▪ Prompt Engineering -> Frameworks - choose your own and rate
them
▪ RAG -> Gemini Gems
▪ Parameterization -> Google Colab Notebook
▪ Agent -> Gems, n8n
▪ Fine-Tuning
▪ RLHF
Prompt Engineering
▪ Prompt engineering is writing and optimizing prompts for LLMs
• The goal is to get the best possible response
▪ Prompt - the input you provide to the model to obtain a specific response.
▪ Why? People often treat LLMs as if they were human and do not provide all relevant
details in the required format. LLMs are probabilistic sequence models that
predict the next token based on context.
▪ Is it important to learn? Not for all tasks. For example, tools such as
Anthropic’s Dashboard automate prompt tuning. But
for specific tasks it is good to understand.

https://cloud.google.com/discover/what-is-prompt-engineering , https://console.anthropic.com/dashboard
Prompt Engineering - Components
▪ Prompt format
▪ Context and Examples
▪ Fine Tuning and Adapting
▪ Iterative Conversations

- RTF framework - Role, Task, and Format.


Prompt Engineering - Types of Prompts
▪ Direct Prompts (Zero-shot): Ask the model to perform a task
without giving any examples (e.g., summarization, translation).

▪ One-/Few-/Multi-shot Prompts: Provide one or more examples to
guide the model in generating desired responses.

▪ Chain of Thought (CoT) Prompts: Encourage step-by-step
reasoning for tasks requiring complex logic.

▪ Zero-shot CoT Prompts: Combine direct instructions with reasoning
to improve zero-shot performance.
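
The prompt types differ only in what text is placed before the model's turn. The sketch below builds three of them as plain strings; the "Input:/Output:" layout and the function names are illustrative assumptions, and "Let's think step by step." is a commonly used zero-shot CoT trigger phrase.

```python
def zero_shot(task, text):
    """Direct prompt: instruction and input, no examples."""
    return f"{task}\n\nInput: {text}\nOutput:"


def few_shot(task, examples, text):
    """Few-shot prompt: worked input/output examples precede the real input."""
    shots = "\n\n".join(f"Input: {x}\nOutput: {y}" for x, y in examples)
    return f"{task}\n\n{shots}\n\nInput: {text}\nOutput:"


def zero_shot_cot(question):
    """Zero-shot CoT: ask for step-by-step reasoning without giving examples."""
    return f"Q: {question}\nA: Let's think step by step."


examples = [("I loved the workshop", "positive"), ("The demo kept crashing", "negative")]
print(few_shot("Classify the sentiment.", examples, "The n8n session was useful"))
print(zero_shot_cot("A class has 3 rows of 8 students. How many students are there?"))
```

A chain-of-thought few-shot prompt uses the same `few_shot` layout, with each example output containing the intermediate reasoning steps before the final answer.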
Chain of Thought Prompting
▪ Introduced in Wei et al. (2022), chain-of-thought (CoT) prompting enables
complex reasoning capabilities through intermediate reasoning steps.
▪ It involves providing the LLM with a few examples (few-shot exemplars)
where the input is followed by a series of intermediate reasoning steps
(the "chain of thought") leading to the final output.
▪ The model is prompted to generate a coherent sequence of these
reasoning steps, mimicking a human-like thought process.
▪ This allows models to decompose complex, multi-step problems into
more manageable parts, significantly enhancing performance on tasks
requiring arithmetic, commonsense, and symbolic reasoning.

https://arxiv.org/pdf/2201.11903 , https://www.promptingguide.ai/techniques/cot
Effective Prompting Guidelines
▪ Set Clear Goals and Objectives - Define the desired length and format of
the output, and specify the target audience.
▪ Important Terms related to Prompt Size -
• Token Limit – Maximum number of tokens (input + output) the
model can process.
• Max Tokens
▪ Use few-shot prompting and give varied examples.
▪ Adjust the level of detail and specificity
▪ Leverage CoT
▪ Iterate and Experiment
Activity - Try Prompts
https://ai.google.dev/gemini-api/docs/prompting-strategies#additional-prompt-guides

- Think of a question you want an LLM to answer
- Develop chain-of-thought and few-shot prompts
- Use Gemini or GPT and test your prompts
- Compare the responses
- Share your analysis
Automated Prompt Engineering - Tools
▪ https://platform.openai.com/docs/guides/text?api-mode=responses
▪ https://console.anthropic.com/dashboard
Fine-Tuning - Theoretical Approach
▪ What is Fine-Tuning?
Modifies LLM weights using task-specific data to align the model with niche objectives. Unlike prompt
engineering or RAG, it updates model parameters directly.
▪ Full vs. PEFT (Parameter-Efficient Fine-Tuning):
Full fine-tuning updates all weights (resource-heavy, with a risk of catastrophic forgetting); PEFT updates only a
small number of parameters, which is lighter and reduces these issues.
▪ Types of PEFT Methods:
• Selective: Fine-tune only part of the model.
• Reparameterization: Use efficient low-rank approximations (e.g., LoRA).
• Additive: Insert new trainable components like adapters or soft prompts.
▪ Fine-Tuning Setup Needs:
Requires training data, evaluation metrics, a pretrained model, and hyperparameters like learning rate
and optimizer.
▪ Use Cases:
Used in instruction tuning (for chat & tasks) and domain adaptation (specializing in fields like law,
medicine, etc.)
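
To see why reparameterization methods such as LoRA are lighter, it helps to count trainable parameters. LoRA freezes a weight matrix W of shape d_out × d_in and learns a low-rank update B·A, with B of shape d_out × r and A of shape r × d_in. The layer size and rank below are illustrative assumptions, not values from the workshop.

```python
def full_finetune_params(d_out, d_in):
    """Full fine-tuning trains every entry of the weight matrix."""
    return d_out * d_in


def lora_params(d_out, d_in, rank):
    """LoRA trains only B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


d_out, d_in, rank = 4096, 4096, 8  # illustrative layer size and rank
full = full_finetune_params(d_out, d_in)
lora = lora_params(d_out, d_in, rank)
print(full, lora, f"{100 * lora / full:.2f}% of the full update")
```

For this single layer the low-rank update trains well under 1% of the parameters that full fine-tuning would touch.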
Parameterization
● Decoding = Decision-making during text generation based on token probabilities.
● Greedy Search: Picks the token with the highest probability at each step
(deterministic, fast, low creativity).
● Beam Search: Keeps several candidate sequences in parallel and selects the
complete sequence with the highest overall probability.
● Example: num_beams=5 means the model keeps the 5 most probable candidate sequences at each step before
finalizing the output.
● Useful for: Structured responses like summaries or formal text completions.
● Default Behavior: Greedy search is applied if no sampling parameters are specified.
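
The difference between greedy and beam search shows up even on a toy model. In the sketch below the next-token table `NEXT` is invented for illustration (tokens A to F and their probabilities are assumptions), and sequences have a fixed length of two tokens.

```python
import math

# Toy next-token distributions, keyed by the previous token (made-up numbers).
NEXT = {
    "<s>": {"A": 0.6, "B": 0.4},
    "A": {"C": 0.5, "D": 0.5},
    "B": {"E": 0.9, "F": 0.1},
}


def greedy(steps=2):
    """Always take the single most probable next token."""
    seq, prob = ["<s>"], 1.0
    for _ in range(steps):
        dist = NEXT[seq[-1]]
        token = max(dist, key=dist.get)
        prob *= dist[token]
        seq.append(token)
    return seq[1:], prob


def beam_search(num_beams=2, steps=2):
    """Keep the num_beams best partial sequences (by log-probability) at each step."""
    beams = [(["<s>"], 0.0)]
    for _ in range(steps):
        candidates = []
        for seq, score in beams:
            for token, p in NEXT[seq[-1]].items():
                candidates.append((seq + [token], score + math.log(p)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:num_beams]
    best_seq, best_score = beams[0]
    return best_seq[1:], math.exp(best_score)


print(greedy())
print(beam_search())
```

Greedy search commits to "A" because it looks best locally, while beam search also keeps "B" alive and ends up with a sequence of higher overall probability.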
Parameterization
● Sampling = Controlled randomness in token selection for creative or varied outputs.
● Temperature: Controls sharpness of token distribution; lower = more focused, higher =
more diverse.
● Top-K Sampling: Samples only from the top K probable tokens (e.g., top_k=50).
● Top-P Sampling (Nucleus): Samples from the smallest set of tokens with cumulative
probability ≥ p (e.g., top_p=0.95).
● Example Use: Creative writing, brainstorming, dialogue generation.
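
The three sampling controls can be sketched as small filters applied to the token distribution before drawing a token. The logits below are made-up numbers and the function names are hypothetical; libraries implement the same ideas on tensors.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Lower temperature sharpens the distribution; higher temperature flattens it."""
    scaled = [value / temperature for value in logits.values()]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return {token: e / total for token, e in zip(logits, exps)}


def top_k_filter(probs, k):
    """Keep the k most probable tokens and renormalise."""
    kept = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


def top_p_filter(probs, p):
    """Keep the smallest set of tokens whose cumulative probability reaches p."""
    kept, cumulative = [], 0.0
    for token, prob in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in kept)
    return {token: prob / total for token, prob in kept}


def sample(probs, rng):
    """Draw one token according to its probability."""
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]


logits = {"cat": 2.0, "dog": 1.0, "car": 0.5, "tree": 0.1}  # made-up scores
probs = softmax_with_temperature(logits, temperature=0.7)
rng = random.Random(0)
print(sample(top_k_filter(probs, k=2), rng))
print(sample(top_p_filter(probs, p=0.95), rng))
```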
Parameterization - Activity

Google Colab link:

https://colab.research.google.com/drive/16NXoGqppNy6Pj20pwB7-ftpIZZRCBdwG?usp=sharing
Reinforcement learning from human feedback - RLHF
▪ RLHF fine-tunes LLMs by aligning their responses with human preferences.

▪ Involves training a reward model on human-annotated preference data.

▪ The reward model scores responses based on quality and preference.

▪ A policy model is then fine-tuned using PPO (Proximal Policy Optimization) to maximize the reward signal.

▪ Requires two datasets: preference pairs (for the reward model) and prompt-responses (for the
RL loop).
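
Reward models are commonly trained on preference pairs with a pairwise loss of the form -log sigmoid(r_chosen - r_rejected). The sketch below computes that loss for given reward scores; the scores are made-up numbers, and the slide does not state which loss the workshop used, so treat this as one standard choice.

```python
import math


def preference_loss(reward_chosen, reward_rejected):
    """-log(sigmoid(r_chosen - r_rejected)), written as log(1 + exp(-margin))."""
    margin = reward_chosen - reward_rejected
    return math.log1p(math.exp(-margin))


# The loss is small when the preferred response already scores higher,
# and large when the reward model ranks the pair the wrong way round.
print(preference_loss(2.0, 0.0))
print(preference_loss(0.0, 2.0))
```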
Reinforcement learning from human feedback - RLHF
▪ Implemented using libraries like trl (Transformer Reinforcement Learning).

▪ PPOConfig defines the learning rate, batch size, and optimization epochs.

▪ PPOTrainer updates the model by iteratively optimizing response rewards.

▪ RLHF can help reduce hallucination and toxicity and align tone.

▪ Limitations: High cost of human feedback and compute → alternatives like DPO (Direct Preference Optimization) & RLAIF (RL from AI Feedback).
▪ HF LINK
Session 5

Agent and Workflow Automation


Retrieval Augmented Generation - RAG
What is RAG?
A method that augments LLMs with external knowledge
to reduce hallucinations and improve domain-specific
responses.

Why use RAG?
Doesn’t modify model weights; it’s cost-effective and
requires no re-training for new domains.

How it works – Two Stages:
1. Retrieval: Find relevant knowledge chunks.
2. Generation: Use them to create a rich, grounded
answer.

[Figure: RAG pipeline with the labels Query, Vectors, DB, Relevant content, LLMs, Response]
Retrieval Augmented Generation - RAG
Retrieval Steps:
▪ Chunking: Break documents into small, meaningful pieces.
▪ Embedding: Convert each chunk and the user query into vector
format.
▪ Indexing: Store vectors in a vector database for efficient search.
▪ Similarity Search: Find top matches based on vector closeness.

Generation Steps:
▪ Augmented Query Creation:
Combine top retrieved chunks with the user query to form
context-enriched input.
▪ Generation Step:
Feed augmented query into the LLM to generate a more accurate,
knowledge-grounded response.
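
The retrieval and augmentation steps above can be sketched end to end in plain Python. This is a toy under stated assumptions: word-count vectors stand in for a real embedding model, a Python list stands in for a vector database, the sample documents are invented for the example, and the final LLM call is left out.

```python
import math
import re
from collections import Counter


def chunk(text, size=12):
    """Chunking: split a document into pieces of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text):
    """Embedding (toy): word-count vector instead of a learned embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(u, v):
    """Similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u)
    norm = (math.sqrt(sum(c * c for c in u.values()))
            * math.sqrt(sum(c * c for c in v.values())))
    return dot / norm if norm else 0.0


def retrieve(query, index, top_k=2):
    """Similarity search: return the top_k chunks closest to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]


def augment(query, chunks):
    """Augmented query: retrieved context followed by the user question."""
    context = "\n".join(f"- {c}" for c in chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


DOCS = [
    "The project submission is due on 13th June via a Google form.",
    "n8n lets you automate workflows by connecting apps and services.",
    "Word2Vec learns dense word vectors from raw text.",
]
# Indexing: store each chunk next to its vector.
index = [(c, embed(c)) for doc in DOCS for c in chunk(doc)]

query = "When is the project submission due?"
prompt = augment(query, retrieve(query, index, top_k=1))
print(prompt)  # this string would be sent to the LLM in the generation step
```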
Retrieval Augmented Generation - Brainstorm
Advantages of RAG:
● Reduces hallucinations and improves factual accuracy.
● Easily adapts general LLMs to domain-specific tasks.
● Doesn’t require re-training or fine-tuning the base model.

Common Use Cases:
● Customer support bots with internal docs.
● Academic or legal question answering.
● Enterprise knowledge assistants.

Think and Answer

What will be your use cases for RAG?

In which instances will RAG be better than just simple prompting?
Retrieval Augmented Generation - State Of The Art
● OpenAI (ChatGPT with HyDE)
Utilizes Hypothetical Document Embeddings (HyDE)
● Google DeepMind (Gemini 2.5 Pro)
Employs Long-Context Multimodal RAG: Integrates 1M+ token context windows
with multimodal inputs (text, images, audio) for comprehensive, real-time
reasoning.

● Anthropic (Claude 4 Opus)
Implements Hybrid Retrieval with Multilingual Support
● DeepSeek (Advanced RAG Chatbot)
Adopts Hybrid Search with Neural Reranking
Retrieval Augmented Generation - Activity

● Gemini Gems - Demo


● Create your own Gem now
● This Gem can be converted to your course project
AI Agent
AI Agents
● What is an Agent?

An agent is a system that can perceive its environment through sensors, process this information, and act
upon the environment through actuators to achieve specific goals.

● What is an AI Agent?

An AI agent is an agent that applies artificial intelligence techniques—such as machine learning, search, logic,
or knowledge representation—to make decisions or improve its behavior over time.
AI Agents - Components
● Sensors: Collect data from the environment.
● Actuators: Execute actions in the environment.
● Percept Sequence: History of all that the agent has perceived.
● Agent Function: Maps percept sequence to actions.
● Agent Program: Implements the agent function.
● PEAS Framework: Performance measure, Environment, Actuators, Sensors
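
These components map directly onto a small program. The sketch below is a hypothetical thermostat agent, chosen only for illustration: temperature readings play the role of sensor input, the returned action strings stand in for actuator commands, and the class and method names are assumptions.

```python
class ThermostatAgent:
    """Agent program for a simple reflex agent that keeps a room near a target temperature."""

    def __init__(self, target):
        self.target = target
        self.percepts = []  # percept sequence: everything sensed so far

    def agent_function(self, percepts):
        """Map the percept sequence to an action (here only the latest reading matters)."""
        latest = percepts[-1]
        if latest < self.target - 1:
            return "heat_on"
        if latest > self.target + 1:
            return "heat_off"
        return "no_op"

    def step(self, percept):
        """One sense-decide-act cycle: record the percept, return the action."""
        self.percepts.append(percept)
        return self.agent_function(self.percepts)


agent = ThermostatAgent(target=21)
actions = [agent.step(reading) for reading in [18, 20.5, 23]]
print(actions)
```

An LLM-based agent keeps the same loop but replaces the hand-written rules in `agent_function` with a model call that chooses the next tool or action.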
AI Agents - Characteristics
● Autonomy
● Reactive and Proactive Behavior
● Adaptability
● Goal-Oriented
● Interactivity
● Persistence
AI Agents - Tools

● LangChain: A powerful framework for developers to build complex, multi-step AI agents by
integrating LLMs with tools, memory, and external data sources via code.

● Langflow: A visual no-code builder for creating LLM-powered agents and workflows using
LangChain components.

● Flowise: An open-source drag-and-drop interface to build and deploy LLM applications and
agents with ease.

● Autogen Studio: A Microsoft tool for visually building multi-agent LLM systems that
collaborate or converse with each other.

● Superagent: A platform to create and manage AI agents with memory, tool use, and API
integrations, backed by Y Combinator.

Google AI Studio Demo Video:

https://drive.google.com/file/d/1ugXDBVI60UQJagD8SaIAYOuCeyQmLMlb/view?usp=sharing
AI Agents - Workflow & Studios
● AI Agents perform intelligent tasks by combining perception, reasoning, and actions.

● Modern agents often operate through workflows: structured sequences of decisions, tool use,
and API calls.

● Visual studios like Langflow, Flowise, and Autogen Studio help design these AI workflows
without intensive coding.

● This opens the door to tools like n8n, where agents and automations can be visually built and
deployed.
n8n - Community
▪ https://n8n.io/workflows/3499-ai-powered-student-assistant-for-course-information-via-twilio-sms/
▪ https://n8n.io/workflows/
▪ You can contribute your own template and publish it in the n8n community as a
creator.

Demo Video: https://drive.google.com/file/d/1ysQsvk9urfaIMXnKXRM4OPozkL6HoM8Q/view?usp=sharing

Demo 2:
https://drive.google.com/file/d/15O9gkPSoXFYUu7nzwuSfK3rpfxtJFSEV/view?usp=sharing
Course Project
Create an n8n workflow or a customised Gem

As an assessment of this workshop, participants should work on a
hands-on project applying the concepts learned.
Two options:

Build a RAG system
- Think about a use case in your domain
- Develop a RAG chatbot for a specific task using the documents provided

Design a workflow automation using n8n
- Identify a repetitive task or process in your domain.
- Use n8n to automate the workflow by connecting apps or services.
Course Project Deliverables
● A working prototype or demo

● Short presentation (5–10 minutes) explaining the use case,
approach, and outcome - record your video and share a YouTube or
Google Drive link

We will share the Google form to collect your responses. Due 13th June.
Next Session
14th June 5 pm

Project Presentation
- We will select 10-12 diverse projects from the submissions and play the
videos
- Discuss the project ideas
- Listen to different projects and learn from them
Thank You
Final Submission
Title

Description

Photo of the learner

n8n link, workflow screenshot,

Gem start-up screenshot and example prompt

Consent to share the project in public

Do you agree to share your project in the public domain?

Yes, No
