Unit 6

Transfer learning is a machine learning technique that reuses pre-trained models for new tasks, enabling efficient training with less data. Key concepts include feature extraction, fine-tuning, and domain adaptation, with applications in computer vision and natural language processing. The Reformer model addresses the limitations of traditional transformers by using Locality-Sensitive Hashing for efficient attention and reversible layers for memory efficiency.

BUILDING MODELS / CASE STUDIES

TRANSFER LEARNING
 Transfer learning is a machine learning (ML) technique where an already developed ML model is reused in another task.
 Transfer learning is a popular approach in deep learning, as it enables the training of deep neural networks with less data.
TRANSFER LEARNING
 Key Concepts of Transfer Learning
 Pre-trained Models: These are models that have already been trained on large datasets (like ImageNet for images). Instead of starting from zero, you can use these models to help with your specific task.
 Feature Extraction: The earlier layers of a neural network usually learn general features, such as shapes and colors. The later layers learn more specific features for the original task. In transfer learning, you can keep the early layers fixed (not changing them) and only train the later layers on your new task (see the sketch below).
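
A minimal sketch of feature extraction, assuming PyTorch and torchvision are installed; the ResNet-18 backbone and NUM_CLASSES are illustrative choices, not part of the original slides.

# Feature extraction: freeze a pre-trained backbone, train only a new head.
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 5  # hypothetical number of classes in the new task

# Load a model pre-trained on ImageNet.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

# Keep the early, general-feature layers fixed (frozen).
for param in model.parameters():
    param.requires_grad = False

# Replace the final classification layer; only this new head will be trained.
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)

# Only the head's parameters are given to the optimizer.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)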
TRANSFER LEARNING
 Fine-tuning: This means making small adjustments to the model's weights by training it a bit on your new dataset. This helps the model learn the specifics of your new task without losing what it already knows (a code sketch follows at the end of this slide).
 Domain Adaptation: Sometimes, the
original task and your task are quite
different. Domain adaptation helps adjust the
model so it can work better on the new task.
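
Continuing the feature-extraction sketch above, fine-tuning could look like the following; which layers to unfreeze and the learning rates are illustrative choices.

# Fine-tuning: unfreeze the last residual block of the ResNet-18 above and
# update it gently, so the pre-trained weights shift only slightly.
for param in model.layer4.parameters():
    param.requires_grad = True

optimizer = torch.optim.Adam([
    {"params": model.layer4.parameters(), "lr": 1e-5},  # small steps for pre-trained weights
    {"params": model.fc.parameters(), "lr": 1e-3},      # the new head can learn faster
])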
APPLICATIONS OF TRANSFER LEARNING
 Computer Vision: Using models trained on large datasets for specific tasks, like recognizing medical images.
 Natural Language Processing: Using pre-trained language models (like BERT or GPT) to perform tasks such as understanding sentiment or identifying names in text.
ADVANTAGES OF TRANSFER LEARNING
 Reduced Training Time: Since the model already knows a lot, it requires less time to train on the new task.
 Better Performance: Especially when you have limited data, transfer learning can give better results compared to training a new model from scratch.
 Lower Resource Requirements: It usually needs less computing power and memory.
CHALLENGES
 Negative Transfer: If the original task is
very different from the new task, the model
might perform worse than if it had been
trained from scratch.
 Domain Shift: If the pre-trained model was trained in a different environment, it might not work well on your new data.
LINK TO CODE
 https://colab.research.google.com/drive/1_UZ2xtL6Ejvqj6CjLXc5xAWvLwCD-ese?usp=sharing
BERT
BERT (Bidirectional Encoder Representations
from Transformers) is a powerful language
model developed by Google in 2018.
 It’s designed to understand language better than earlier models, making it highly effective for natural language processing (NLP) tasks.
BERT
 Bidirectional Context: Unlike traditional models that process text left-to-right (or right-to-left), BERT reads text bidirectionally, capturing the full context of a word by looking at both its previous and next words.
 This leads to better comprehension of nuanced language.
BERT
 Transformer Architecture: BERT is based on the Transformer, a deep learning architecture that uses attention mechanisms to weigh the importance of different words in a sentence.
 This allows it to handle long-range dependencies and relationships between words.
BERT
 Pre-trained on Large Corpora: BERT is pre-trained on large text corpora (like Wikipedia and BookCorpus) using two tasks: masked language modeling (MLM) and next sentence prediction (NSP); a short MLM example follows below.
 This helps it generalize well across different NLP tasks.
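
A short illustration of the masked language modeling objective, assuming the Hugging Face transformers library and the public bert-base-uncased checkpoint; the example sentence is made up.

# Masked language modeling: BERT predicts the token hidden behind [MASK].
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-uncased")
for prediction in unmasker("BERT is pre-trained on large text [MASK]."):
    print(prediction["token_str"], round(prediction["score"], 3))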
ARCHITECTURE: TRANSFORMER-BASED
 Capture Long-Range Dependencies: Self-attention enables BERT to weigh relationships between all words in a sentence simultaneously, rather than sequentially, as in recurrent neural networks (RNNs).
 Parallelize Processing: Unlike RNNs, Transformers process
tokens in parallel, allowing for much faster training.
 The Transformer consists of:
 Self-Attention Layers: Compute how much attention each
word should pay to every other word in the sequence.
 Feedforward Neural Networks: After the self-attention
layer, these apply transformations to enrich token
embeddings.
 Positional Encoding: Since self-attention has no inherent notion of word order, BERT adds positional embeddings to encode the order of words in a sentence.
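
A minimal NumPy sketch of the scaled dot-product self-attention these layers compute; the shapes and random weights are only illustrative.

# Scaled dot-product self-attention for a single head (illustrative shapes).
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """X: (seq_len, d_model); Wq, Wk, Wv: (d_model, d_k) projection matrices."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])            # attention of every word to every other word
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over the sequence
    return weights @ V                                 # context-mixed token representations

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 16))                           # 8 tokens, 16-dimensional embeddings
Wq, Wk, Wv = (rng.normal(size=(16, 16)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)             # (8, 16)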
BIDIRECTIONAL TRAINING
 Understand Polysemy: By considering the
surrounding words, BERT can interpret words
with multiple meanings correctly based on
context (e.g., "bank" in "river bank" vs.
"savings bank").
 Improve Contextual Understanding:
Bidirectional training enables BERT to
understand more intricate dependencies,
such as handling complex negations or
sarcasm.
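
A sketch of how the "bank" example can be observed, assuming the Hugging Face transformers library and PyTorch: the contextual vector for "bank" differs between the two sentences.

# Polysemy: the same word gets different contextual vectors in different sentences.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def bank_vector(sentence):
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]            # (seq_len, 768)
    bank_id = tokenizer.convert_tokens_to_ids("bank")
    position = inputs["input_ids"][0].tolist().index(bank_id)    # where "bank" sits
    return hidden[position]

v_river = bank_vector("He sat on the river bank.")
v_money = bank_vector("She deposited cash at the bank.")
print(torch.cosine_similarity(v_river, v_money, dim=0))          # noticeably below 1.0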
COMMON APPLICATIONS
 Text Classification: Assigning categories to
text (e.g., spam detection, sentiment
analysis).
 Question Answering: Extracting answers from text for given questions.
 Named Entity Recognition (NER):
Identifying names, dates, locations, etc.,
within text.
 Sentence Similarity: Evaluating how
similar two sentences are, which is helpful in
paraphrasing and semantic search.
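
As a brief illustration of the text classification use case, a pipeline sketch assuming the Hugging Face transformers library (it downloads a default English sentiment checkpoint; the example inputs are made up).

# Sentiment analysis with a pre-trained transformer classifier.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
print(classifier(["I love this course!", "The lecture was far too long."]))
# e.g. [{'label': 'POSITIVE', 'score': ...}, {'label': 'NEGATIVE', 'score': ...}]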
REFORMER MODEL
 Traditional transformer models (like BERT and GPT)
revolutionized natural language processing (NLP) by
introducing mechanisms that allowed them to
understand context and relationships in text.
However, these models have limitations:
 Complexity: The standard self-attention
mechanism computes attention scores between
every pair of tokens, leading to quadratic complexity
(O(n²)), which makes it computationally expensive
and memory-intensive for long sequences.
 Memory Constraints: As the length of input
sequences increases, the memory required to store
intermediate activations also increases, limiting the
length of sequences that can be processed.
THE REFORMER ARCHITECTURE
 The Reformer model addresses these limitations through two key innovations:
 Locality-Sensitive Hashing (LSH)
 Locality-Sensitive Hashing (LSH) is used in the attention mechanism to reduce the number of tokens each token attends to. Instead of computing attention scores for all pairs of tokens, LSH groups tokens based on similarity, allowing each token to attend only to its closest neighbors.
THE REFORMER ARCHITECTURE
 Example:
 Suppose you have a sequence of 8 tokens: ["I",
"love", "to", "learn", "about", "deep", "learning",
"today"].
 In traditional self-attention, every token would
compute attention scores with every other token
(8x8 matrix).
 With LSH, you might hash these tokens into
buckets based on their similarity (e.g., based on
their embeddings), allowing each token to only
attend to other tokens within the same bucket.
 This drastically reduces the computational
complexity from O(n²) to O(n log n), making it
feasible to process longer sequences.
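
A simplified sketch of the bucketing idea from the example above, using a random projection hash; the real Reformer shares queries and keys and uses several hash rounds, so this only illustrates how tokens could be grouped.

# LSH-style bucketing (simplified): hash token embeddings so that similar
# tokens tend to land in the same bucket, then attend only within a bucket.
import numpy as np

rng = np.random.default_rng(0)
tokens = ["I", "love", "to", "learn", "about", "deep", "learning", "today"]
embeddings = rng.normal(size=(len(tokens), 16))        # stand-in token embeddings

n_buckets = 4
projection = rng.normal(size=(16, n_buckets))          # random directions used as the hash
buckets = np.argmax(embeddings @ projection, axis=-1)  # bucket id for each token

for b in range(n_buckets):
    members = [t for t, bucket in zip(tokens, buckets) if bucket == b]
    if members:
        print(f"bucket {b}: attention computed only among {members}")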
THE REFORMER ARCHITECTURE
 Reversible Layers
 Reformer's architecture includes reversible layers, which allow for memory-efficient training.
 Instead of storing all intermediate activations for each layer, reversible networks compute the output of each layer based on the output of the previous layer and can reconstruct previous activations during backpropagation.
THE REFORMER ARCHITECTURE
 Example:
 In a traditional feed-forward neural network, each layer’s output is saved for backpropagation.
 If you have 12 layers, you need to store 12 outputs, consuming significant memory.
 In a reversible layer, when you compute the output of layer n, you can discard the output of layer n−1, since it can be computed again during backpropagation.
 This means you only need to store the input to the first layer and the output of the last layer, drastically reducing memory usage.
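
A toy sketch of a reversible block in the RevNet style that Reformer builds on: the input is split into two halves, and the inverse recovers them exactly from the outputs, so activations can be recomputed instead of stored. F and G stand in for the real attention and feed-forward sublayers.

# Reversible block: y1 = x1 + F(x2), y2 = x2 + G(y1); the inverse recovers x1, x2.
import numpy as np

F = lambda x: np.tanh(x)        # stand-in for the attention sublayer
G = lambda x: 0.5 * x           # stand-in for the feed-forward sublayer

def forward(x1, x2):
    y1 = x1 + F(x2)
    y2 = x2 + G(y1)
    return y1, y2

def inverse(y1, y2):
    x2 = y2 - G(y1)             # recompute instead of storing
    x1 = y1 - F(x2)
    return x1, x2

x1, x2 = np.random.default_rng(0).normal(size=(2, 4))
y1, y2 = forward(x1, x2)
r1, r2 = inverse(y1, y2)
print(np.allclose(x1, r1), np.allclose(x2, r2))   # True True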
THE REFORMER ARCHITECTURE
 Attention Mechanism
 The Reformer model employs a modified self-attention mechanism where the attention scores are computed only for the tokens in the same bucket, as determined by LSH.
 The efficient attention mechanism in Reformer uses:
 Chunking (Buckets): The input sequence is divided into smaller chunks, and attention is computed within these chunks.
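
A sketch of chunked attention under the simplifying assumption that tokens attend only within their own fixed-size chunk; the real Reformer also lets each chunk look at neighbouring chunks.

# Chunked attention (simplified): full attention only inside each chunk,
# so the per-chunk cost is chunk_size**2 rather than seq_len**2 overall.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def chunked_attention(Q, K, V, chunk_size):
    out = np.zeros_like(V)
    for start in range(0, len(Q), chunk_size):
        s = slice(start, start + chunk_size)
        scores = Q[s] @ K[s].T / np.sqrt(K.shape[-1])   # attention within one chunk only
        out[s] = softmax(scores) @ V[s]
    return out

rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, 1024, 64))                # a long sequence of 1,024 tokens
print(chunked_attention(Q, K, V, chunk_size=64).shape)  # (1024, 64)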
