Sequential Modelling
Week 11
Need for sequential modelling
• Fully Connected Network (FCN)
• Fixed input dimension
• e.g.: input [x1 x2 x3 x4 … xn]
• If input size is < n → pad the remaining positions with zeros
• If input size is > n → the extra input data is ignored
• Convolutional Neural Network (CNN)
• Carry spatial information
• Good for image data
FCN and CNN:
▪ Output for a given snapshot
▪ Next set of inputs is treated as a new snapshot
▪ Fixed input dimension
▪ Does not carry memory
Need for sequential modelling (cont’d)
• Motivation for sequential models:
➢ Time series data, which typically shows:
   o Periodic cycles
   o Trends
   o Regularity
   o Sudden spikes/drops
➢ Examples of time series data:
   • Video
   • Autonomous vehicle: object state
   • Electric circuit
   • Temperature variation
   • Stock price
➢ Natural Language:
o Email auto complete
o Translation (e.g.: English to French)
o Sentiment analysis
Need for sequential modelling (cont’d)
NLP: varying input size
• "Today is the coolest temperature in Windsor"
   → token IDs: 1 2 3 4 5 6 7
• "The historical average temperature in November is 12 degree Celsius"
   → token IDs: 1 8 9 5 6 10 2 11 13 14
Tokenization is the process of breaking down text into smaller, manageable pieces called "tokens."
Word tokenization – ["I", "love", "NLP"]
Character tokenization – ["N", "L", "P"]
A token ID is a numerical identifier assigned to each token during the tokenization process.
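Below is a minimal sketch, in plain Python, of word- and character-level tokenization and token-ID assignment as described above; the sentence and the ID mapping are illustrative only (real tokenizers build their vocabulary from a full training corpus, as covered later in this section).

```python
# Word- and character-level tokenization, plus a toy token-to-ID mapping.
# The sentence and the ID assignment are illustrative; real tokenizers build
# their vocabulary from a full training corpus.
text = "Today is the coolest temperature in Windsor"

word_tokens = text.split()        # ['Today', 'is', 'the', 'coolest', ...]
char_tokens = list("NLP")         # ['N', 'L', 'P']

# Assign each unique word token a numerical identifier (token ID).
token_to_id = {tok: i + 1 for i, tok in enumerate(dict.fromkeys(word_tokens))}
ids = [token_to_id[tok] for tok in word_tokens]

print(word_tokens)
print(char_tokens)
print(ids)                        # [1, 2, 3, 4, 5, 6, 7]
```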
Need for sequential modelling (cont’d)
• Example 1 – loan approval: inputs (salary, credit score, experience, age) are mapped to an outcome (loan granted, loan rejected, or needs verification). Here the sequence (order) of the inputs does not matter.
• Example 2 – sentiment analysis: the words of the sentence "I like this dish" are mapped to a sentiment (positive, negative, or neutral). Here the sequence of the words matters.
• Problems when a fixed-size network is used for sequential data:
   • Varying input size
   • Too much computation
   • No parameter sharing
Recurrent Neural Network (RNN)
• The network is unrolled (unwrapped) across time steps (see the sketch below)
• Simple RNN: one hidden layer
• Deep RNN: many hidden layers
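Below is a minimal sketch of a simple RNN forward pass in NumPy; the layer sizes, weights, and input are illustrative assumptions, not values from the slides.

```python
import numpy as np

# Minimal forward pass of a simple (single hidden layer) RNN.
# All sizes and random weights below are illustrative assumptions.
input_size, hidden_size, seq_len = 4, 8, 5

rng = np.random.default_rng(0)
W_xh = rng.normal(size=(hidden_size, input_size))   # input-to-hidden weights
W_hh = rng.normal(size=(hidden_size, hidden_size))  # hidden-to-hidden (recurrent) weights
b_h = np.zeros(hidden_size)

x_seq = rng.normal(size=(seq_len, input_size))      # one input vector per time step
h = np.zeros(hidden_size)                           # initial hidden state

# Unrolling over time: the same weights are reused at every step
# (parameter sharing), and h carries memory from one step to the next.
for x_t in x_seq:
    h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)

print(h.shape)  # (8,) -> final hidden state summarizing the sequence
```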
Issues with RNN
• Vanishing gradient
• Exploding gradient
Long Short-Term Memory (LSTM)
• LSTMs introduce special units called memory cells to store information across time steps in a sequence. These
cells can maintain their state (memory) over a longer period of time than traditional RNN units.
• The memory cells are controlled by three gates: input gate, forget gate, and output gate. These gates allow
LSTMs to decide which information to keep, which to discard, and which new information to add.
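As a hedged illustration, the sketch below uses PyTorch's nn.LSTM, which implements the memory cells and the input/forget/output gates internally; the layer sizes and the random input are illustrative assumptions.

```python
import torch
import torch.nn as nn

# PyTorch's nn.LSTM implements the memory cell and the input/forget/output
# gates internally. Layer sizes and the random input are illustrative.
lstm = nn.LSTM(input_size=4, hidden_size=8, batch_first=True)

x = torch.randn(2, 5, 4)             # (batch=2, time steps=5, features=4)
output, (h_n, c_n) = lstm(x)

print(output.shape)  # (2, 5, 8) -> hidden state at every time step
print(h_n.shape)     # (1, 2, 8) -> final hidden state
print(c_n.shape)     # (1, 2, 8) -> final cell state (the long-term memory
                     #              controlled by the three gates)
```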
Gated Recurrent Unit (GRU)
• GRUs are similar to Long Short-Term Memory (LSTMs) but have a simpler structure and fewer parameters,
making them computationally more efficient.
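A rough sketch of the "fewer parameters" point: comparing the parameter counts of an LSTM layer and a GRU layer of the same size (the sizes are illustrative assumptions).

```python
import torch.nn as nn

# Comparing parameter counts of an LSTM and a GRU layer of the same size.
# The sizes are illustrative; the point is that a GRU, with three gate weight
# blocks instead of the LSTM's four, has roughly 3/4 of the parameters.
def n_params(module):
    return sum(p.numel() for p in module.parameters())

lstm = nn.LSTM(input_size=64, hidden_size=128)
gru = nn.GRU(input_size=64, hidden_size=128)

print("LSTM parameters:", n_params(lstm))
print("GRU parameters: ", n_params(gru))
```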
Large Language Models (LLM)
• Language Models:
➢ Basic NLP tasks (answering questions, translation, sentiment analysis)
• LLM is a form of Generative Artificial Intelligence (GenAI – Able to generate new content)
• LLM is a Neural Network designed to
➢ Understand
➢ Generate
➢ Respond
to human-like text
• Deep NN trained on massive (large) amounts of data
Why do we call them Large Language Models?
• Trained on a massive amount of data
• Billions of parameters
Large Language Models (cont’d.)
LLM vs Earlier NLP (or simple LM) Models
• NLP/LM:
➢ very specific tasks (e.g., translation, sentiment analysis)
➢ Not able to write an email from given instructions
• LLM:
➢ Can do a wide range of NLP tasks
➢ Able to write an email for a given set of instructions, and more
• Why are LLMs so good compared to earlier NLP/LM models?
TRANSFORMER ARCHITECTURE
➢ Not all LLMs are transformers
➢ Not all transformers are LLMs
Large Language Models (cont’d.)
• Generative Artificial Intelligence (GenAI): generates new content
• LLMs typically deal with text, but do they have to be limited to text only?
NO
• GPT-4 is a multimodal model that can process text and images; however, it is referred to as an LLM because its primary focus and fundamental design are around text-based tasks.
• Waymo's multimodal end-to-end model refers to their integrated approach for autonomous driving, where
multiple types of data inputs (camera, radar and lidar) are processed together to make driving decisions.
Use Cases of LLM
o Machine translation: LLMs can be used to translate text from one language to another.
o Content generation: LLMs can generate new text, such as fiction, articles, and even computer
code.
o Sentiment analysis: LLMs can be used to analyze the sentiment of a piece of text, such as
determining whether it is positive, negative, or neutral.
o Text summarization: LLMs can be used to summarize a long piece of text, such as an article or a
document.
o Chatbots and virtual assistants: LLMs can be used to power chatbots and virtual assistants,
such as OpenAI's ChatGPT or Google's Gemini (formerly called Bard).
o Knowledge retrieval: LLMs can be used to retrieve knowledge from vast volumes of text in
specialized areas such as medicine or law.
Stages of Building LLMs
Note: pretraining carries a huge computational cost (e.g., GPT-3's training cost is approximately 4.6 million dollars).
▪ Stage 1: Implementing the LLM architecture and the data preparation process. This stage involves preparing and sampling the text data and understanding the basic mechanisms behind LLMs.
▪ Stage 2: Pretraining an LLM to create a foundation model. This stage involves pretraining the LLM on unlabeled data, typically a large, diverse data set (also known as a general data set).
▪ Stage 3: Fine-tuning the foundation model to become a personal assistant or text classifier. This stage involves fine-tuning the pretrained LLM on labeled data, which can be either an instruction dataset or a dataset with class labels.
Why is fine-tuning important?
• Train on your specific data set
• Customize for your application or organization (e.g., health care, airline, law firm, educational institute, etc.)
Simplified Transformer Architecture
• An encoder that processes the input text
and produces an embedding representation
(a numerical representation that captures
many different factors in different
dimensions) of the text
• Encodes input text into vectors
• A decoder that uses this representation to generate the translated text one word at a time
• Generates output text from the encoded vectors
Self-attention mechanism:
• Key part of transformers that allows the model to weigh the importance of different words/tokens relative to each other (sketched below)
• Enables the model to capture long-range dependencies
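Below is a minimal sketch of scaled dot-product self-attention, the core operation of the transformer; the sequence length, embedding size, and random projection matrices are illustrative assumptions, not the exact configuration of any specific LLM.

```python
import numpy as np

# Scaled dot-product self-attention in NumPy. The sequence length, embedding
# size, and random projection matrices are illustrative assumptions.
def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

seq_len, d_model = 5, 16                    # 5 tokens, 16-dimensional embeddings
rng = np.random.default_rng(0)
X = rng.normal(size=(seq_len, d_model))     # token embeddings

W_q = rng.normal(size=(d_model, d_model))   # query projection (learned in practice)
W_k = rng.normal(size=(d_model, d_model))   # key projection
W_v = rng.normal(size=(d_model, d_model))   # value projection

Q, K, V = X @ W_q, X @ W_k, X @ W_v

# Every token attends to every other token: the attention weights express how
# important each token is relative to the others, regardless of distance.
scores = Q @ K.T / np.sqrt(d_model)
weights = softmax(scores)                   # (5, 5) attention weights
out = weights @ V                           # (5, 16) context-aware representations

print(weights.shape, out.shape)
```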
Transformer Architecture
Attention Is All You Need
https://arxiv.org/pdf/1706.03762
BERT Vs GPT Architecture
• Bidirectional encoder representations from
transformers (BERT): the encoder segment
exemplifies BERT-like LLMs, which focus on
masked word prediction and are primarily
used for tasks like text classification
• Predict hidden words in a given
sentence
• Generative pre-trained transformers (GPT):
the decoder segment showcases GPT-like
LLMs, designed for generative tasks and
producing coherent text sequences
• Generate new words
GPT Architecture
• The GPT architecture employs only the
decoder portion of the original transformer.
• It is designed for unidirectional, left-to-right
processing, making it well suited for text
generation and next-word prediction tasks.
• Generates text in an iterative fashion, one word at a time.
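Below is a toy sketch of this iterative, one-token-at-a-time generation loop; the vocabulary and the stand-in scoring function are made up for illustration and do not come from a real trained model.

```python
import numpy as np

# Toy version of the left-to-right generation loop: at each step the model
# scores the next token given the tokens generated so far, and the chosen
# token is appended to the context. The vocabulary and scoring function
# below are stand-ins for illustration, not a real trained model.
vocab = ["the", "coolest", "temperature", "in", "Windsor", "<|endoftext|>"]
rng = np.random.default_rng(0)

def next_token_scores(context_ids):
    # Placeholder for a decoder-only transformer forward pass.
    return rng.normal(size=len(vocab))

context = [0]                               # start from the token "the"
for _ in range(5):                          # generate at most 5 more tokens
    next_id = int(np.argmax(next_token_scores(context)))   # greedy decoding
    context.append(next_id)
    if vocab[next_id] == "<|endoftext|>":
        break

print(" ".join(vocab[i] for i in context))
```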
GPT Architecture (cont’d.)
Working with text
Embedding
Vector Embedding
Words corresponding to similar concepts often appear close to each other in the embedding space. For instance, different types of birds appear closer to each other in the embedding space than to countries and cities.
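A minimal sketch of this idea using cosine similarity; the 2-D vectors are made-up illustrative values, not real learned embeddings.

```python
import numpy as np

# Cosine similarity between made-up 2-D "embeddings": the two birds end up far
# more similar to each other than either is to a country. Real embeddings are
# learned and have hundreds or thousands of dimensions.
embeddings = {
    "sparrow": np.array([0.9, 0.8]),
    "eagle":   np.array([0.85, 0.75]),
    "canada":  np.array([-0.7, 0.2]),
}

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(embeddings["sparrow"], embeddings["eagle"]))   # close to 1
print(cosine_similarity(embeddings["sparrow"], embeddings["canada"]))  # much lower
```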
Tokenizing Texts
Here, we split an input text into
individual tokens, which are either
words or special characters, such as
punctuation characters.
Converting Tokens into Token IDs
We build a vocabulary by tokenizing the
entire text in a training dataset into
individual tokens. These individual
tokens are then sorted alphabetically,
and duplicate tokens are removed. The
unique tokens are then aggregated into
a vocabulary that defines a mapping
from each unique token to a unique
integer value. The depicted vocabulary is
purposefully small and contains no
punctuation or special characters for
simplicity.
Converting Tokens into Token IDs (cont’d.)
Starting with a new text sample, we tokenize the
text and use the vocabulary to convert the text
tokens into token IDs. The vocabulary is built from
the entire training set and can be applied to the
training set itself and any new text samples. The
depicted vocabulary contains no punctuation or
special characters for simplicity.
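Below is a minimal sketch of building such a vocabulary and applying it to a new text sample; the training text and the whitespace/punctuation tokenization rule are illustrative assumptions.

```python
import re

# Build a vocabulary from a (tiny, illustrative) training text, then use it to
# convert a new text sample into token IDs, following the steps above.
training_text = "Today is the coolest day. The coolest day in Windsor."

# Split on whitespace and punctuation, keeping punctuation as its own token.
tokens = [t for t in re.split(r"([,.]|\s)", training_text) if t.strip()]

# Unique tokens, sorted alphabetically, each mapped to a unique integer ID.
vocab = {token: idx for idx, token in enumerate(sorted(set(tokens)))}
print(vocab)

# The same vocabulary is applied to a new text sample.
sample = "The coolest day"
sample_ids = [vocab[t] for t in sample.split()]
print(sample_ids)
```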
Adding special context tokens
We add special tokens to a vocabulary to deal with
certain contexts. For instance, we add
an <|unk|> token to represent new and unknown
words that were not part of the training data and
thus not part of the existing vocabulary.
Furthermore, we add an <|endoftext|> token that
we can use to separate two unrelated text sources.
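A minimal sketch of how these special tokens can be used; the tiny vocabulary below is illustrative.

```python
# Handling unknown words and document boundaries with special tokens.
# The tiny vocabulary below is illustrative.
vocab = {"Hello": 0, "world": 1, "<|unk|>": 2, "<|endoftext|>": 3}

def encode(text):
    # Any word not found in the vocabulary is replaced by <|unk|>.
    return [vocab.get(tok, vocab["<|unk|>"]) for tok in text.split()]

doc1 = "Hello world"
doc2 = "Hello Windsor"                    # "Windsor" is not in the vocabulary

# <|endoftext|> separates the two unrelated text sources.
ids = encode(doc1) + [vocab["<|endoftext|>"]] + encode(doc2)
print(ids)   # [0, 1, 3, 0, 2]
```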
Byte Pair Encoding (BPE)
The BPE tokenizer was used to train LLMs such as GPT-2, GPT-3, and the original model used in ChatGPT.
BPE tokenizers break down unknown words into
subwords and individual characters. This way, a BPE
tokenizer can parse any word and doesn’t need to
replace unknown words with special tokens, such
as <|unk|>.
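A short sketch using the tiktoken package's GPT-2 encoding (assuming tiktoken is installed); the example string is illustrative.

```python
# Requires the tiktoken package (pip install tiktoken).
import tiktoken

enc = tiktoken.get_encoding("gpt2")       # the BPE tokenizer used by GPT-2

text = "Hello, do you like tea? <|endoftext|> someunknownPlace"
ids = enc.encode(text, allowed_special={"<|endoftext|>"})
print(ids)

# Unknown words are split into known subwords/characters, so decoding
# reproduces the original text without any <|unk|> placeholder.
print(enc.decode(ids))
```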