Lec6 RNN Attention Search
INTRODUCTION
TO ARTIFICIAL
INTELLIGENCE
Tianyi Zhou
09/18/2023
University of Maryland
Some slides are adapted from Song, Abbeel & Russell @ Berkeley and Fei-Fei @ Stanford
Seven Components of this course
[Course-overview diagram: the Agent at the center, connected to Human users (via a Language Models interface) and the World, with components Acting, Action & Prediction, Probabilistic Reasoning, Embodied & Multi-modal AI, and Neural Networks]
Plan
• Today
– CNN, RNN, Attention, Uninformed Search
– Reading:
• Russell and Norvig (4th edition): Ch 1-2, 3.1-3.4
• Goodfellow, Bengio, & Courville: Ch 9-10
• Coding:
– Numpy:
https://piazza.com/class/llup7xc8lm44w4/post/34
– PyTorch:
https://piazza.com/class/llup7xc8lm44w4/post/21
• Next lecture
– Uninformed Search
– Informed Search & A*
Project topics
• Vision-Language-Action (VLA) models
• Search and planning for Curriculum learning
• Mini-batch In-Context Learning on LLMs
• Mixture of Adapters for Personalized LLMs and VLMs
• Safety Verification and Validation of RL agents
• LLM agent controlling AI generation of images or videos
• In general:
– Input: n ✕ H ✕ W ✕ Nin
– Weights: Nout ✕ k ✕ k ✕ Nin
– Output: n ✕ H’ ✕ W’ ✕ Nout
– 𝐻’ = (H − k)/s + 1, 𝑊’ = (W − k)/s + 1 (kernel size k, stride s, assuming no padding)
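A minimal PyTorch shape check for these formulas (the batch size, image size, and channel counts below are assumptions for illustration; note that PyTorch orders tensors as n ✕ Nin ✕ H ✕ W rather than n ✕ H ✕ W ✕ Nin):

```python
import torch
import torch.nn as nn

x = torch.randn(8, 3, 32, 32)                      # n=8, Nin=3, H=W=32 (assumed)
conv = nn.Conv2d(in_channels=3, out_channels=16,   # Nout=16
                 kernel_size=3, stride=1)          # k=3, s=1, no padding
y = conv(x)
print(conv.weight.shape)   # torch.Size([16, 3, 3, 3]), i.e., Nout x Nin x k x k
print(y.shape)             # torch.Size([8, 16, 30, 30]); H' = (32 - 3)/1 + 1 = 30
```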
Depth-wise convolution (left) &
depth-wise separable convolution (right)
• Can we reduce the computation and number of parameters for multi-channel input and output?
https://eli.thegreenplace.net/2018/depthwise-separable-convolutions-for-machine-learning/
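Answering the question above, a hedged PyTorch sketch of a depthwise separable convolution (channel counts and spatial size are assumptions): a per-channel depthwise conv (groups=Nin) followed by a 1✕1 pointwise conv needs far fewer weights than a standard convolution.

```python
import torch
import torch.nn as nn

Nin, Nout, k = 32, 64, 3                  # assumed channel counts and kernel size
x = torch.randn(1, Nin, 56, 56)

standard  = nn.Conv2d(Nin, Nout, kernel_size=k, padding=1)             # Nout*Nin*k*k weights
depthwise = nn.Conv2d(Nin, Nin, kernel_size=k, padding=1, groups=Nin)  # Nin*k*k weights
pointwise = nn.Conv2d(Nin, Nout, kernel_size=1)                        # Nout*Nin weights

print(standard.weight.numel())                              # 64*32*3*3 = 18432
print(depthwise.weight.numel() + pointwise.weight.numel())  # 32*3*3 + 64*32 = 2336
print(pointwise(depthwise(x)).shape)                        # torch.Size([1, 64, 56, 56])
```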
CNN: Receptive field
• Each pixel in the output feature map of layer l is
produced from a region of the input image (and thus
captures the features of that region).
• The size (height x width) of this region is called the
receptive field (RF): k = kernel size, s = stride.
• Different RFs capture features at different scales.
• Multi-scale features can be helpful for many tasks.
https://www.baeldung.com/cs/cnn-receptive-field-size
https://theaisummer.com/receptive-field/
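A small pure-Python sketch of the RF recursion used in the linked references, RF_l = RF_{l-1} + (k_l − 1) ✕ (product of the strides of the earlier layers); the layer configuration in the example is an assumption.

```python
def receptive_field(layers):
    """layers: list of (kernel_size, stride), ordered from the first layer to layer l."""
    rf, jump = 1, 1                 # jump = product of strides of the layers seen so far
    for k, s in layers:
        rf += (k - 1) * jump        # each layer enlarges the RF by (k-1) input-pixel jumps
        jump *= s
    return rf

# Assumed example: three 3x3 convs, the second one with stride 2
print(receptive_field([(3, 1), (3, 2), (3, 1)]))   # 1 + 2 + 2 + 4 = 9, i.e., a 9x9 RF
```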
Atrous (Dilated) convolution
• RF increases exponentially with the number of layers, and lower-layer features have smaller RFs.
• How can we increase the RF without increasing the depth or changing the kernel configuration?
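One answer is dilation, sketched below in PyTorch (input size and channel count are assumptions): inserting gaps between kernel taps enlarges a layer's RF without adding depth or parameters.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 8, 32, 32)                        # assumed input
conv   = nn.Conv2d(8, 8, kernel_size=3, dilation=1)  # this layer sees a 3x3 region
atrous = nn.Conv2d(8, 8, kernel_size=3, dilation=2)  # same 3x3 weights, but sees a 5x5 region
print(conv(x).shape, atrous(x).shape)                # (1, 8, 30, 30) and (1, 8, 28, 28)
print(conv.weight.shape == atrous.weight.shape)      # True: identical parameter count
```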
CNN: pooling
• Max pooling: take the maximum over each k×k block, max_{(i,j) ∈ k×k block} x_ij
• Average pooling: take the mean over each k×k block, (1/k²) Σ_{(i,j) ∈ k×k block} x_ij (e.g., 1/4 of the sum for a 2×2 block)
https://distill.pub/2017/feature-visualization/
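A short PyTorch sketch of both pooling operators on an assumed 2×2 block:

```python
import torch
import torch.nn as nn

x = torch.tensor([[[[3., 2.],
                    [4., 0.]]]])           # a 1 x 1 x 2 x 2 block with assumed values
print(nn.MaxPool2d(kernel_size=2)(x))      # max over the block -> 4
print(nn.AvgPool2d(kernel_size=2)(x))      # (1/4) * (3 + 2 + 4 + 0) -> 2.25
```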
CNN: flatten layer + MLP
How to compute the final prediction from the CNN featuremap?
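One common answer, sketched in PyTorch (feature-map and layer sizes are assumptions): flatten the final C ✕ H ✕ W feature map into a vector and feed it to an MLP that outputs class logits.

```python
import torch
import torch.nn as nn

feat = torch.randn(8, 64, 4, 4)        # assumed final CNN feature map: n x C x H x W
head = nn.Sequential(
    nn.Flatten(),                      # 64*4*4 = 1024 features per example
    nn.Linear(64 * 4 * 4, 128),
    nn.ReLU(),
    nn.Linear(128, 10),                # e.g., logits for 10 classes
)
print(head(feat).shape)                # torch.Size([8, 10])
```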
CNN: design
Any Questions?
Data Augmentation
Data augmentation: a technique used to increase the amount of data by adding slightly modified copies of already existing data.
• Flip
• Crop
• Rotation
• Translation: moving the image along the X or Y direction
• Cutout
• CutMix
• MixUp
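A hedged torchvision sketch of the geometric augmentations above, plus a minimal MixUp step (the specific parameter values and the Beta mixing coefficient are assumptions):

```python
import torch
import torchvision.transforms as T

augment = T.Compose([
    T.RandomHorizontalFlip(p=0.5),                     # Flip
    T.RandomResizedCrop(size=224),                     # Crop (and resize back)
    T.RandomRotation(degrees=15),                      # Rotation
    T.RandomAffine(degrees=0, translate=(0.1, 0.1)),   # Translation along X/Y
    T.ToTensor(),
])

def mixup(x, y, alpha=0.2):
    """Minimal MixUp sketch: blend a batch with a shuffled copy of itself."""
    lam = torch.distributions.Beta(alpha, alpha).sample()
    perm = torch.randperm(x.size(0))
    return lam * x + (1 - lam) * x[perm], (y, y[perm], lam)
```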
ResNet: practical CNN
• Residual block: output F(x) + x, where a skip connection ⨁ adds the input x to the residual branch F(x).
• Skip connections make it practical to train networks with 100+ layers.
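A minimal residual-block sketch (channel count assumed): the block outputs F(x) + x, so gradients can also flow through the identity skip connection.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.F = nn.Sequential(                          # the residual branch F(x)
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return torch.relu(self.F(x) + x)                 # skip connection: add x back

x = torch.randn(2, 64, 8, 8)
print(ResidualBlock()(x).shape)                          # torch.Size([2, 64, 8, 8])
```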
Transposed convolution
Input = [[3, 2], [4, 0]], Kernel = [[1, 1], [1, 1]] (stride 1)
Each input entry scales the kernel and places it at the corresponding position of the 3×3 output; the placed maps are then summed:
= [[3, 3], [3, 3]] at (0,0) + [[2, 2], [2, 2]] at (0,1) + [[4, 4], [4, 4]] at (1,0) + [[0, 0], [0, 0]] at (1,1)
= [[3, 5, 2],
   [7, 9, 2],
   [4, 4, 0]]
Transposed convolution (practice)
Input = [[1, 2], [3, 0]], Kernel = [[1, 0], [0, 1]] (stride 1)
= ___ + ___ + ___ + ___
= ___
Transposed convolution (answer)
Input = [[1, 2], [3, 0]], Kernel = [[1, 0], [0, 1]]
= [[1, 0], [0, 1]] at (0,0) + [[2, 0], [0, 2]] at (0,1) + [[3, 0], [0, 3]] at (1,0) + [[0, 0], [0, 0]] at (1,1)
= [[1, 2, 0],
   [3, 1, 2],
   [0, 3, 0]]
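A hedged check of both worked examples with PyTorch's ConvTranspose2d (single channel, 2✕2 kernel, stride 1, no bias):

```python
import torch
import torch.nn as nn

def transposed_conv(inp, ker):
    tconv = nn.ConvTranspose2d(1, 1, kernel_size=2, stride=1, bias=False)
    tconv.weight.data = torch.tensor(ker).reshape(1, 1, 2, 2)   # set the kernel by hand
    return tconv(torch.tensor(inp).reshape(1, 1, 2, 2)).detach().squeeze()

print(transposed_conv([[3., 2.], [4., 0.]], [[1., 1.], [1., 1.]]))
# tensor([[3., 5., 2.],
#         [7., 9., 2.],
#         [4., 4., 0.]])
print(transposed_conv([[1., 2.], [3., 0.]], [[1., 0.], [0., 1.]]))
# tensor([[1., 2., 0.],
#         [3., 1., 2.],
#         [0., 3., 0.]])
```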
U-Net
Diffusion model
• Forward diffusion (image to
noise): progressively add noise to
an image for T steps.
• Backward diffusion (noise to
image): apply U-Net recursively
for T denoising steps.
• Train the U-Net to predict the noise
at each step t and then remove the
predicted noise.
• Diffusion model aims to learn a
generative model capturing the
distribution of images.
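A heavily simplified sketch of the noise-prediction objective (the linear schedule, step count, and `model` interface are assumptions; in practice `model` is the U-Net taking the noisy image and the step index):

```python
import torch
import torch.nn.functional as F

T_steps = 1000
betas = torch.linspace(1e-4, 0.02, T_steps)          # assumed linear noise schedule
alpha_bar = torch.cumprod(1.0 - betas, dim=0)        # cumulative product of (1 - beta_t)

def diffusion_loss(model, x0):
    """Train the network to predict the noise added at a random step t."""
    t = torch.randint(0, T_steps, (x0.size(0),))
    eps = torch.randn_like(x0)                       # forward diffusion: Gaussian noise
    ab = alpha_bar[t].view(-1, 1, 1, 1)
    x_t = ab.sqrt() * x0 + (1 - ab).sqrt() * eps     # noisy image at step t (closed form)
    return F.mse_loss(model(x_t, t), eps)            # predict eps so it can be removed
```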
Stable diffusion & AIGC
Any Questions?
Why Recurrent neural networks?
• How to model time-series data (speech, text, videos, house prices, trajectories, sensor signals), i.e., (x_1, x_2, …, x_T)?
– How to capture the time dependency?
– How to represent a sequence?
• What kinds of tasks do we expect to address?
– Denoising of a sequence: T inputs → T outputs
– Generation of sequences: T inputs → T′ outputs
– Sequential labeling: T inputs → T outputs
– Sequence classification: T inputs → 1 output
• Can we apply a neuron to the input at each time step?
• How to process input data with different lengths using the same DNN?
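On the variable-length question above, a common PyTorch recipe (the sequences below are made up) is to pad the batch and pack it before running the shared RNN:

```python
import torch
from torch import nn
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence

seqs = [torch.randn(5, 10), torch.randn(3, 10), torch.randn(7, 10)]   # three sequences, feature dim 10
lengths = torch.tensor([len(s) for s in seqs])
padded = pad_sequence(seqs, batch_first=True)                         # shape: 3 x 7 x 10
packed = pack_padded_sequence(padded, lengths, batch_first=True, enforce_sorted=False)

rnn = nn.RNN(input_size=10, hidden_size=16, batch_first=True)
_, h_T = rnn(packed)        # h_T: last hidden state of each sequence, ignoring the padding
print(h_T.shape)            # torch.Size([1, 3, 16])
```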
Sequential generation of non-sequential data
• Image captioning: image to a sequence of words
• Action recognition: video frames to an action class
• Machine translation: Chinese to English
• Speech recognition: word-level prediction
Sequence to sequence =
many to one + many to many
• Encoder-decoder architecture that is widely used in machine translation
RNN: update hidden state from input
RNN: generate output from hidden state
Unrolling RNN
Vanilla RNN
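A hedged sketch of the standard vanilla RNN update behind the three slides above (dimensions assumed): h_t = tanh(W_xh x_t + W_hh h_{t-1} + b) and y_t = W_hy h_t, with the same weights reused at every step.

```python
import torch
import torch.nn as nn

class VanillaRNNCell(nn.Module):
    def __init__(self, d_in=10, d_hidden=16, d_out=5):
        super().__init__()
        self.W_xh = nn.Linear(d_in, d_hidden)
        self.W_hh = nn.Linear(d_hidden, d_hidden, bias=False)
        self.W_hy = nn.Linear(d_hidden, d_out)

    def forward(self, x_t, h_prev):
        h_t = torch.tanh(self.W_xh(x_t) + self.W_hh(h_prev))   # update hidden state from input
        return h_t, self.W_hy(h_t)                             # generate output from hidden state

# Unrolling: apply the same cell at every time step
cell, h = VanillaRNNCell(), torch.zeros(1, 16)
for x_t in torch.randn(4, 1, 10):                              # T = 4 inputs (assumed)
    h, y = cell(x_t, h)
```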
Backpropagation through time
• “Many to One” task as an example.
Gradient vanishing & explosion
• Recurrent: send the previous step's hidden state to the next step's input, for T time steps.
• RNN: like an MLP with T layers, but all layers share the same weight matrix.
• Training RNNs can be a problem: backpropagation through time (BPTT) leads to gradient vanishing (or explosion).
https://towardsdatascience.com/the-exploding-and-
vanishing-gradients-problem-in-time-series-6b87d558d22
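A tiny numerical sketch of the problem (random gradient, idealized recurrent Jacobian; the scales are assumptions): multiplying by the same matrix T times shrinks or blows up the backpropagated gradient.

```python
import torch

torch.manual_seed(0)
d, T = 16, 50
grad = torch.randn(d)                      # gradient arriving at the last time step
for scale in (0.5, 1.5):                   # assumed spectral scale of the recurrent weights
    W = scale * torch.eye(d)
    g = grad.clone()
    for _ in range(T):
        g = W.T @ g                        # one BPTT step through the shared weights
    print(scale, g.norm().item())          # ~0.5**50: vanishes; ~1.5**50: explodes
```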
RNNs equipped with gates
LSTM: long short-term memory; GRU: gated recurrent unit
https://towardsdatascience.com/illustrated-guide-to-
lstms-and-gru-s-a-step-by-step-explanation-44e9eb85bf21
LSTM: forget gate
LSTM: input gate
LSTM: cell state
LSTM: output gate
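The four gate slides above correspond to the standard LSTM update; a minimal sketch with `torch` (the stacked parameter layout is an assumption; in practice `nn.LSTM` / `nn.LSTMCell` implement this):

```python
import torch

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step; W (4d x d_in), U (4d x d), b (4d) stack the four gates' parameters."""
    d = h_prev.size(-1)
    z = x_t @ W.T + h_prev @ U.T + b
    f = torch.sigmoid(z[..., 0*d:1*d])    # forget gate: what to drop from the cell state
    i = torch.sigmoid(z[..., 1*d:2*d])    # input gate: how much new content to write
    g = torch.tanh(z[..., 2*d:3*d])       # candidate cell content
    o = torch.sigmoid(z[..., 3*d:4*d])    # output gate: what to expose as h_t
    c_t = f * c_prev + i * g              # cell state update
    h_t = o * torch.tanh(c_t)             # new hidden state
    return h_t, c_t
```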
Seq2seq task using LSTM
From perceptron to attention
• Neurons: weighted sum of inputs +
nonlinear activation
http://jalammar.github.io/illustrated-transformer/
Decoder model
• No encoder, decoder
only.
• A generative model only; it cannot directly provide a sentence embedding.
• Most SOTA LLMs use
this architecture.
• But discriminative tasks
(e.g., classification) need
further processing.
Vision transformer
• Split the input image into 16x16 patches.
• Each image patch is treated as a token.
• Apply Transformer encoder to the tokens.
• Attention(task token, image tokens) produces
embedding for the whole image.
• Apply MLP to the image embedding to
produce class probabilities.
• ViT: global convolution with kernel weights
computed by attention for each token.
• All tokens share the same attention
parameters.
• More details in Perception section.
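A hedged sketch of the ViT front end (image size, embedding dimension, and head count are assumptions): split the image into 16×16 patches, embed each patch as a token, prepend a task token, and run a Transformer encoder layer.

```python
import torch
import torch.nn as nn

img = torch.randn(1, 3, 224, 224)                    # assumed input image
patch, d = 16, 192                                   # 16x16 patches, assumed embedding dim

# A strided conv is equivalent to "split into patches + shared linear projection"
to_tokens = nn.Conv2d(3, d, kernel_size=patch, stride=patch)
tokens = to_tokens(img).flatten(2).transpose(1, 2)   # 1 x 196 x 192 (14*14 patches)

cls = torch.zeros(1, 1, d)                           # task ("class") token
tokens = torch.cat([cls, tokens], dim=1)             # 1 x 197 x 192
encoder = nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True)
print(encoder(tokens).shape)                         # torch.Size([1, 197, 192])
```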
Any Questions?
Seven Components of this course
[Same course-overview diagram as above: the Agent connected to Human users (via a Language Models interface) and the World, with Acting, Action & Prediction, Probabilistic Reasoning, Embodied & Multi-modal AI, and Neural Networks]
Optimization vs. search