
ECBM E4040
Neural Networks and Deep Learning
Micah Goldblum
Acknowledgement for slides - Abhinav Shrivastava (UMD)
Administrative
• We will have pop quizzes
• Based on previous lectures
• Bring paper to use for quiz
• No phones or laptops allowed
• Lowest couple grades dropped
• Will send out HW assignment 1 soon
Introduction to Statistical Learning
Statistical Learning
• Making sense of data
Supervised Statistical Learning
• Given inputs (data-label pairs), learn a model to predict output
• Requires training data to learn
Example 1: Image classification
• Input: images; desired output: category labels (apple, pear, tomato, cow, dog, horse)
[Figure: grid of example images paired with their labels]

Example: Lazebnik, Images: ETH-80 dataset

Basic Supervised Learning Framework
• Training time: training samples (apple, pear, tomato, cow, dog, horse) → features + training labels → training → learned model
• Testing time: test sample → features → learned model → prediction
Credit: Lazebnik
Basic Supervised Learning Formulation

y = f(x)
where y is the output, f is the prediction function, and x is the input

• Training (or learning): given a training set of labeled examples {(x1,y1), …, (xN,yN)}, instantiate a predictor f
• Testing (or inference): apply f to a new test example x and output the predicted value y = f(x)

Credit: Lazebnik
Basic Supervised Learning Formulation

y = f(x)
where y is the output, f is the prediction function, and x is the input

Formulation:
• Given training data: {(x1,y1), …, (xN,yN)}
• Find f using the training data
  • What kind of functions are allowed? f ∈ 𝓗, the hypothesis space
• such that f is correct on test data
  • Is there any connection between training and test data? They are i.i.d. samples from the same distribution D
  • How do we measure correctness? Loss measures are covered in the next class

Credit: Lazebnik, Liang
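To make the formulation concrete, here is a minimal sketch (mine, not from the slides), assuming NumPy and a toy 1-D problem: train and test data are drawn i.i.d. from the same distribution D, the hypothesis space 𝓗 is the set of threshold classifiers, and f is chosen to fit the training data, then evaluated on held-out test data.

    import numpy as np

    rng = np.random.default_rng(0)

    def sample(n):
        # i.i.d. samples from the same distribution D:
        # x ~ Uniform(0, 1), label y = 1 if x > 0.5 (plus a little label noise)
        x = rng.uniform(0, 1, size=n)
        y = (x > 0.5).astype(int)
        flip = rng.random(n) < 0.05
        y[flip] = 1 - y[flip]
        return x, y

    x_train, y_train = sample(100)   # training data {(x_i, y_i)}
    x_test, y_test = sample(100)     # test data from the same D

    # Hypothesis space H: threshold classifiers f_t(x) = 1[x > t]
    thresholds = np.linspace(0, 1, 101)
    train_acc = [np.mean((x_train > t).astype(int) == y_train) for t in thresholds]
    best_t = thresholds[int(np.argmax(train_acc))]   # "find f in H using training data"

    test_acc = np.mean((x_test > best_t).astype(int) == y_test)  # "correct on test data"
    print(f"chosen threshold t = {best_t:.2f}, test accuracy = {test_acc:.2f}")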


Simple Models
Classification and Regression
Simple Classifier (classification model)
[Figure: 2-D scatter plot of training examples from class 1 and training examples from class 2]
Credit: Lazebnik

Simple Classifier
[Figure: the same scatter plot with a test example added]
f(x) = ?
Credit: Lazebnik
Nearest Neighbor Classifier
[Figure: the same scatter plot; the test example takes the label of its nearest training example]

f(x) = label of the training example nearest to x

• All we need is a distance function for our inputs
• No training required!
Credit: Lazebnik
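A minimal sketch of this rule (my illustration, not the course's code), assuming Euclidean distance as the distance function:

    import numpy as np

    def nearest_neighbor_predict(x_train, y_train, x):
        # distance function: Euclidean distance to every training example
        dists = np.linalg.norm(x_train - x, axis=1)
        # f(x) = label of the nearest training example; no training step at all
        return y_train[np.argmin(dists)]

    x_train = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 0.8]])
    y_train = np.array([1, 1, 2, 2])          # class 1 vs. class 2
    print(nearest_neighbor_predict(x_train, y_train, np.array([0.1, 0.0])))  # -> 1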
K-Nearest Neighbor (kNN) Classifier

k = 3

• Find the k nearest training points to x
• f(x) = majority vote over the labels of the k nearest points

Credit: An Introduction to Statistical Learning
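Extending the previous sketch to k neighbors (again a hypothetical illustration, not the course's reference implementation); with k = 1 this reduces to the plain nearest neighbor classifier:

    import numpy as np
    from collections import Counter

    def knn_predict(x_train, y_train, x, k=3):
        dists = np.linalg.norm(x_train - x, axis=1)
        nearest = np.argsort(dists)[:k]                 # indices of the k nearest points
        votes = Counter(y_train[nearest])               # vote with their labels
        return votes.most_common(1)[0][0]               # majority class wins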


kNN vs. Nearest Neighbor
[Figure comparing nearest neighbor and k-NN decision boundaries on the same data]

• k-NN is more robust to outliers

Credit: Lazebnik, CS231n Karpathy

kNN Classifier – problems in higher dimensions
Better distance functions – handcrafted features
• Euclidean distance is often bad in high-dimensional spaces
• Doesn't match semantic similarity
• Consider images – color, translations, …
• Map samples to another space
• Make the mapping invariant to things that don't affect semantic meaning
Credit: Lazebnik, CS231n Karpathy
kNN Classifier – problems in higher dimensions
Example: Histogram of Oriented Gradients (HOG)
• Divide the image into cells
• Compute local gradient directions within each cell (convolution with filters)
• Detects edges
• Build per-cell histograms of gradient orientations
• Robust to re-coloring and other transformations
[Figure: input image and its HOG visualization]
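A minimal sketch of computing HOG features (my illustration; the course does not prescribe this code), using scikit-image's hog function. The sample image and the parameter values shown are common choices, not values specified on the slide:

    import numpy as np
    from skimage import data
    from skimage.color import rgb2gray
    from skimage.feature import hog

    image = rgb2gray(data.astronaut())          # any grayscale image

    # Divide into 8x8-pixel cells, compute gradient-orientation histograms per cell,
    # and normalize over 2x2 blocks of cells.
    features = hog(image,
                   orientations=9,
                   pixels_per_cell=(8, 8),
                   cells_per_block=(2, 2))

    print(features.shape)   # one long feature vector; compare such vectors with Euclidean distance in kNN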
Linear Classifier
[Figure: the same two-class scatter plot]

Find a linear function that separates the classes

f(x) = sgn(w1x1 + w2x2 + … + wDxD + b) = sgn(wᵀx + b)

Credit: Lazebnik
Linear Classifier
[Figure: class 1 and class 2 regions separated by a line, with a test example]

Find a linear function that separates the classes

f(x) = sgn(w1x1 + w2x2 + … + wDxD + b) = sgn(wᵀx + b)
f(x) = sgn(w1x1 + w2x2 + … + wDxD + w0x0) = sgn(wᵀx)   (bias absorbed by appending a constant feature x0 = 1)

Credit: Lazebnik
Quick review
Linear Classifier
[Figure: class 1 and class 2 regions separated by a line, with a test example]

Find a linear function that separates the classes

f(x) = sgn(w1x1 + w2x2 + … + wDxD + b) = sgn(wᵀx + b)
f(x) = sgn(w1x1 + w2x2 + … + wDxD + w0x0) = sgn(wᵀx)

Credit: Lazebnik
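A minimal sketch of the prediction rule (my own illustration; the slides show no code), including the trick of absorbing the bias into the weight vector via a constant feature. The weight values are arbitrary examples:

    import numpy as np

    def linear_predict(w, b, x):
        # f(x) = sgn(w^T x + b)
        return np.sign(w @ x + b)

    def linear_predict_homogeneous(w_aug, x):
        # Same classifier with the bias absorbed: append x0 = 1 and fold b into w
        x_aug = np.append(x, 1.0)
        return np.sign(w_aug @ x_aug)

    w = np.array([2.0, -1.0]); b = 0.5
    x = np.array([0.3, 0.7])
    print(linear_predict(w, b, x))                          # sign(2*0.3 - 0.7 + 0.5) = 1.0
    print(linear_predict_homogeneous(np.append(w, b), x))   # identical result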
Linear Classifier for >2 classes

One vs. All: total of K−1 classifiers
One vs. One: total of K(K−1)/2 classifiers

Source: Bishop PRML, page 183
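A quick, purely illustrative check of the classifier counts (the value K = 5 is my example):

    from itertools import combinations

    K = 5
    one_vs_all = K - 1                                  # one-versus-the-rest, as in Bishop PRML
    one_vs_one = len(list(combinations(range(K), 2)))   # every unordered pair of classes
    print(one_vs_all, one_vs_one, K * (K - 1) // 2)     # -> 4 10 10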
Visualizing (low-dim) Linear Classifier
Seismic data classification
[Figure: scatter of body wave magnitude (x-axis) vs. surface wave magnitude (y-axis); a linear boundary separates earthquakes from nuclear explosions]

Credit: Lazebnik
Linear Classifier: Perceptron view
[Figure: inputs x1, x2, x3, …, xD multiplied by weights w1, w2, w3, …, wD and summed to give the output sgn(wᵀx + b)]

Credit: Lazebnik
Loose inspiration: Human neurons

Credit: Lazebnik, https://mcgovern.mit.edu/2019/02/28/ask-the-brain-how-do-neurons-communicate/


Loose inspiration: Human neurons

Credit: Lazebnik, https://giphy.com/gifs/harvard-brain-neuron-9N2UvCx7wXLnG, https://www.youtube.com/watch?v=N_ynH3BeKIo


Recall: A brief history of Neural Networks
1943: McCulloch and Pitts neurons
1958: Rosenblatt's perceptron
1969: Minsky and Papert's Perceptrons book
….AI Winter….

Snippet credit: PRML 2006, Bishop; Slide credit: Lazebnik
Perceptrons, linear separability, and Boolean functions

Minsky and Papert studied the theory of Boolean functions in their book

What can perceptrons represent?
Credit: Lazebnik
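As a concrete illustration of the question (my example, not from the slides): a single perceptron can represent AND, whose inputs are linearly separable, but not XOR, which no single line can separate. The weight values below are one hand-picked choice:

    import numpy as np

    def perceptron(w, b, x):
        return int(w @ x + b > 0)

    inputs = [np.array(p) for p in [(0, 0), (0, 1), (1, 0), (1, 1)]]

    # AND is linearly separable: w = (1, 1), b = -1.5 works
    print([perceptron(np.array([1.0, 1.0]), -1.5, x) for x in inputs])   # [0, 0, 0, 1]

    # XOR is not: no single (w, b) reproduces [0, 1, 1, 0] (Minsky & Papert)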
kNN vs. Linear Classifiers

kNN Pros:
+ Simple to implement
+ Decision boundaries not necessarily linear
+ Works for any number of classes
+ Nonparametric method

kNN Cons:
– Need good distance function
– Slow at test time

Linear Pros:
+ Low-dimensional parametric representation
+ Easy to learn (more later)
+ Very fast at test time

Linear Cons:
– Works for two classes (?)
– How to train the linear function?
– What if data is not linearly separable?

Credit: Lazebnik
Prediction Scenarios
• Supervised Learning
• Given: Inputs (data-label pairs)
• Later classes: other prediction scenarios
Learning Paradigms
Supervised Learning
• Given inputs (data-label pairs), learn a model to predict output

• Labeled data is available


• Labeled data is clean
• No restrictions on type of input and labels
Unsupervised Learning
• Given just data as input (no labels), learn some sort of underlying structure
• The goal is often vague or subjective (compared to supervised learning, where labels define the goals)
• Also known as exploratory/descriptive data analysis

Credit: Lazebnik
Unsupervised Learning: Clustering
• Discover groups of “similar” data points

Credit: Lazebnik
Unsupervised Learning: Quantization
Quantization or data compression
• Encode the data into a more compact form
[Figure: data points and three cluster centers C1, C2, C3]

Credit: Lazebnik
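Both clustering and quantization are often done with k-means; this is my choice of algorithm for illustration (the slides only show the idea), in a bare NumPy sketch with empty-cluster handling omitted. The cluster assignments give the groups, and the centers play the role of C1, C2, C3 as compact codes:

    import numpy as np

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(c, 0.3, size=(50, 2)) for c in [(0, 0), (3, 0), (0, 3)]])

    # Plain k-means with k = 3
    k = 3
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(20):
        # assign each point to its nearest center ...
        assign = np.argmin(np.linalg.norm(X[:, None] - centers[None], axis=2), axis=1)
        # ... then move each center to the mean of its assigned points
        centers = np.array([X[assign == j].mean(axis=0) for j in range(k)])

    # Clustering: `assign` groups similar points. Quantization: store only `assign`
    # (one small integer per point) plus the k centers instead of the full data.
    compressed = centers[assign]
    print(centers.round(2))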
Unsupervised Learning: Dimensionality Reduction
Dimensionality reduction, manifold learning
• Discover a lower-dimensional surface on which the data lives
• Goals: preserve (compress) the data; separate classes

Credit: Lazebnik, Image source1, source2
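One standard linear method is principal component analysis; the slides do not name a method, so PCA is my pick for a sketch. The data here is synthetic 3-D data that nearly lies on a 2-D plane:

    import numpy as np

    rng = np.random.default_rng(0)
    # 3-D data that really lives near a 2-D surface
    latent = rng.normal(size=(200, 2))
    X = latent @ rng.normal(size=(2, 3)) + 0.05 * rng.normal(size=(200, 3))

    # PCA via SVD of the centered data
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ Vt[:2].T                 # 2-D coordinates on the discovered surface
    X_hat = Z @ Vt[:2] + X.mean(axis=0)

    print(np.mean((X - X_hat) ** 2))  # small reconstruction error: most structure preserved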


Unsupervised Learning: Learning Data Distribution
Density Estimation
• Find a function that approximates the probability density of the data (i.e., the function's value is high for "typical" points and low for "atypical" points)
• Can be used for anomaly detection

Credit: Lazebnik, Image source1, source2
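A minimal sketch (illustrative only; the slides don't specify a method) using a Gaussian kernel density estimate from SciPy, flagging low-density points as anomalies:

    import numpy as np
    from scipy.stats import gaussian_kde

    rng = np.random.default_rng(0)
    data = rng.normal(0, 1, size=(500, 2))          # "typical" points

    kde = gaussian_kde(data.T)                      # density estimate of the data

    queries = np.array([[0.1, -0.2],                # typical point
                        [6.0, 6.0]])                # atypical point
    densities = kde(queries.T)
    print(densities)                                # high for the first, near zero for the second

    threshold = np.quantile(kde(data.T), 0.01)      # 1st percentile of training densities
    print(densities < threshold)                    # -> [False  True]: anomaly flagged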


Unsupervised Learning: Learning Data Distribution
Learning to sample (e.g., GANs or Generative Adversarial Networks)
• Produce samples from a data distribution that mimics the training set

Credit: Lazebnik, Image DeepMind’s BigGAN, NVIDIA GAN


Extremes of Learning Paradigms

Supervised Learning:
• Needs nice, clean labels
• All data needs to be labeled
• Labels correspond to task of interest

Unsupervised Learning:
• No labels
• How does this work?
(Some) Learning Paradigms

Supervised Learning:
• Needs nice, clean labels
• All data needs to be labeled
• Labels correspond to task of interest

Weakly-Supervised Learning:
• Noisy, often incorrect, labels
• All data needs to be labeled
• Labels might not correspond to task of interest, but need to be related

Unsupervised Learning:
• No labels
• How does this work?
Weakly-Supervised Learning
• Noisy or incomplete labels (e.g., image classification)

Vocabulary: Person, Cat, Dog, Horse, Sheep, Shirt, Jeans, Hat, Grass, Sky, Road, Fur
Labels: Person, Sheep
Missing Labels: Dog, Grass, Shirt, Jeans, Hat, Fur, etc.

Image: COCO dataset

Weakly-Supervised Learning
• Noisy or incomplete labels: extracting labels (e.g., captions to labels)
• Flickr images come with user-assigned or machine-assigned tags and captions
• Use ML to assign concepts to the images

Figures: Joulin et al.
Weakly-Supervised Learning
• Related labels, but don't correspond to the task (e.g., object detection)
• Input – Vocabulary: Person, Cat, Dog, Horse, Sheep, Cow; Labels: Person, Sheep, Dog
• Goal – detect the person, sheep, and dog in the image

Image: COCO dataset

Weakly-Supervised Learning
• Related labels, but don't correspond to the task (e.g., image segmentation)
• Input – Vocabulary: Person, Cat, Dog, Horse, Sheep, Cow; Labels: Person, Sheep, Dog, Grass
• Goal – segment the person, sheep, dog, and grass in the image

Image: COCO dataset

(Some) Learning Paradigms

Supervised Learning:
• Needs nice, clean labels
• All data needs to be labeled
• Labels correspond to task of interest

Weakly-Supervised Learning:
• Noisy, often incorrect, labels
• All data needs to be labeled
• Labels might not correspond to task of interest, but need to be related

Semi-Supervised Learning:
• Needs nice, clean labels
• Only a small portion of data needs to be labeled
• Labels correspond to task of interest

Unsupervised Learning:
• No labels
• How does this work?
Semi-supervised Learning (SSL)

Figure credit: UCL slides
Semi-supervised Learning (SSL)
• SSL algorithms may assign different labels to unlabeled samples

Figure: source
(Some) Learning Paradigms
Supervised Learning · Weakly-Supervised Learning · Semi-Supervised Learning · Unsupervised Learning

Active Learning:
• Human-in-the-loop
• Questions can be complete labels or incomplete information
Active Learning
• Human in the loop: the learner asks questions / requests labels
• Hierarchical labels: confident models might require more granular information
• Reduces human time expenditure

Figure: source
Active + Weakly-supervised Learning
• Human in the loop to ask incomplete questions/labels

Figure: source
(Some) Learning Paradigms
Supervised Learning · Weakly-Supervised Learning · Semi-Supervised Learning · Unsupervised Learning
Active Learning
(Some) Learning Paradigms
Supervised Learning · Weakly-Supervised Learning · Semi-Supervised Learning · Unsupervised Learning
Active Learning · Self-supervised/Predictive Learning

Self-supervised/Predictive Learning:
• Proxy tasks
• Use supervision naturally arising from the data (without any human-provided labels)
Self-supervised/Predictive Learning
• Proxy-tasks (e.g., image colorization):
• Use supervision naturally arising from data
• No human provided labels

Credit: Lazebnik, Figure: source
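A sketch of how the colorization proxy task generates its own labels from unlabeled images (my illustration; the helper name make_colorization_pair and the NumPy formulation are mine, not from the slides): the grayscale version is the model's input and the original colors are the free supervision.

    import numpy as np

    def make_colorization_pair(rgb_image):
        # rgb_image: float array of shape (H, W, 3) in [0, 1]
        # Input to the model: grayscale version (supervision comes for free)
        gray = rgb_image @ np.array([0.299, 0.587, 0.114])   # (H, W)
        # Target the model must predict: the original colors
        target = rgb_image                                    # (H, W, 3)
        return gray[..., None], target

    # Any unlabeled photo collection becomes a labeled dataset for the proxy task
    rng = np.random.default_rng(0)
    fake_photo = rng.random((32, 32, 3))
    x, y = make_colorization_pair(fake_photo)
    print(x.shape, y.shape)   # (32, 32, 1) (32, 32, 3)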


Self-supervised/Predictive Learning
• Proxy tasks (e.g., future flow prediction):
• Use supervision naturally arising from data
• No human-provided labels

• Start from unlabeled video
• Use an algorithm to label frames with directions of motion, using the video itself
• Train a neural network to predict the motion from a single image

Figure: source
Self-supervised/Predictive Learning
• Proxy tasks (e.g., future motion prediction): next-frame prediction, model-predictive control, action-conditioned MPC
• Use supervision naturally arising from data
• No human-provided labels

Credit: Lazebnik, Figure: source1, source2

Self-supervised/Predictive Learning
• Proxy-tasks (e.g., context prediction):
• Use supervision naturally arising from data
• No human provided labels

Figure: source
Self-supervised/Predictive Learning
• Proxy tasks (e.g., supervision from tracking):
• Use supervision naturally arising from data
• No human-provided labels
• Goal: representation learning – track objects in videos without supervision, then train the model to assign similar features to the same object across different frames

Figure: source
(Some) Learning Paradigms
Supervised Learning · Weakly-Supervised Learning · Semi-Supervised Learning · Unsupervised Learning
Active Learning · Self-supervised/Predictive Learning
(Some) Learning Paradigms
Supervised Learning · Weakly-Supervised Learning · Semi-Supervised Learning · Unsupervised Learning
Active Learning · Self-supervised/Predictive Learning · Reinforcement Learning
Reinforcement Learning
• The agent learns in an interactive environment by sequential trial and error, using feedback/rewards from its own actions and experiences

Figure source1, source2
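A minimal sketch of the agent-environment interaction loop (illustrative only; the use of the gymnasium package, the CartPole-v1 environment, and a random policy are my assumptions, not part of the slides):

    import gymnasium as gym

    env = gym.make("CartPole-v1")
    obs, info = env.reset(seed=0)

    total_reward = 0.0
    for _ in range(200):
        action = env.action_space.sample()        # trial and error: a random policy for now
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += reward                    # feedback from the environment
        if terminated or truncated:
            obs, info = env.reset()

    print(total_reward)   # an RL algorithm would update the policy to increase this
    env.close()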


Reinforcement Learning

Credit: Lazebnik, AlphaGo


Reinforcement Learning

Credit: Lazebnik, Playing Atari with Deep RL, breakout video


Reinforcement Learning

Credit: source
“RLHF”

https://aws.amazon.com/what-is/reinforcement-learning-from-human-feedback/
(Some) Learning Paradigms
Supervised Learning · Weakly-Supervised Learning · Semi-Supervised Learning · Unsupervised Learning
Active Learning · Self-supervised/Predictive Learning · Reinforcement Learning
(Some) Learning Paradigms
Supervised Learning · Weakly-Supervised Learning · Semi-Supervised Learning · Unsupervised Learning
Active Learning · Self-supervised/Predictive Learning · Reinforcement Learning · Life-long/Never-ending Learning
Life-long/Never-ending Learning

• Start with categories (e.g., politician, fruit, etc.) and relations (e.g., playsOnTeam(athlete, sportsTeam))
• Use search engine APIs to find new instances of nouns belonging to categories and noun pairs representing relations
• Keep training systems for information extraction

Credit: Lazebnik, NELL

Life-long/Never-ending Learning

Credit: Lazebnik, NELL


Life-long/Never-ending Learning

• Like NELL but using images
• Start with seed images and object categories
• Train detectors to locate those objects in an image
• Discover relationships between categories (e.g., tails are on animals like cats and dogs; a TV looks similar to a computer monitor)
• Use the system to label new data and retrain the detectors
• Cluster images to find categories

Paper: NEIL
Life-long/Never-ending Learning

Paper: NEIL
(Some) Learning Paradigms
Supervised Learning · Weakly-Supervised Learning · Semi-Supervised Learning · Unsupervised Learning
Active Learning · Self-supervised/Predictive Learning · Reinforcement Learning · Life-long/Never-ending Learning
Next...
Neural networks, finally
