DLCV Ch1 Introduction

This document provides an introduction to deep learning for computer vision. It discusses collecting labeled training data and building models based on feature spaces. It also covers bounding boxes, gradient descent, and using PyTorch to build a linear regression model for localization. Examples are provided on computing gradients for bounding box coordinate updates, as well as calculating precision and recall for object detection tasks.

Uploaded by

Mario Parot

Deep Learning for

Computer Vision

INTRODUCTION TO
DEEP LEARNING

Prof. G.S. Jison Hsu 徐繼聖


• Artificial Vision Laboratory
• National Taiwan University of
Science and Technology

https://www.youtube.com/watch?v=kE5QZ8G_78c [9:39]
• Collection of training data
• Model built upon features (or more precisely, on the feature space)
• Model-based prediction when new data is given

What is a Bounding Box?

Ground Truth:
• $x_{center}$: 443
• $y_{center}$: 346
• $W$ (width): 167
• $H$ (height): 158

Initial Box:
• $\hat{x}_{center}$: 678
• $\hat{y}_{center}$: 105
• $W$ (width): 167
• $H$ (height): 158



Confidence Score of Bounding Box
• The confidence is defined as Pr(Object) × IoU(pred, truth). If no object exists
in a grid cell, the confidence score should be zero. Otherwise, we want the
confidence score to be as high as possible.
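
The IoU term in this definition can be sketched in a few lines. The sketch below assumes boxes are given in the center format used on the bounding-box slide, (x_center, y_center, width, height); the function names `iou` and `confidence` and the probability value passed in are illustrative choices, not taken from the slides.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x_center, y_center, width, height)."""
    def corners(box):
        xc, yc, w, h = box
        return xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2

    ax1, ay1, ax2, ay2 = corners(box_a)
    bx1, by1, bx2, by2 = corners(box_b)

    # Width and height of the overlap region, clamped at zero when the boxes are disjoint
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - intersection
    return intersection / union if union > 0 else 0.0


def confidence(probability, predicted_box, truth_box):
    """Confidence as the probability term multiplied by the IoU (probability is supplied by the caller)."""
    return probability * iou(predicted_box, truth_box)


ground_truth = (443, 346, 167, 158)
initial_box = (678, 105, 167, 158)

print(iou(ground_truth, initial_box))   # 0.0: the two boxes do not overlap
print(iou(ground_truth, ground_truth))  # 1.0: identical boxes
```

The initial box is 235 pixels away from the ground truth along x, which is more than the box width of 167, so the overlap and therefore the score are zero.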

Initial Parameters
• Iterations = 2
• Learning rate = 0.3
• $P = (678, 105)$, the initial box center (score: 0)
• $G = (443, 346)$, the ground-truth box center (score: 1)



Iteration Results

                   x     y     S
Initial position   678   105   0
Iteration 1        537   249   0.15
Iteration 2        481   307   0.85
Ground Truth       443   346   1

S = IoU score for the bounding box



Gradient Descent
• Loss function
  $L_x = (P_x - G_x)^2$
  $L_y = (P_y - G_y)^2$
• Derivative of the loss function
  $d_x = 2(P_x - G_x)$
  $d_y = 2(P_y - G_y)$
• Update position ($lr$ is the learning rate)
  $\hat{x} = x - lr \cdot d_x$
  $\hat{y} = y - lr \cdot d_y$
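
A single update with these formulas can be written as a small function. This is a minimal sketch; the helper name `gradient_step` is illustrative, and the values of P, G and the learning rate are the initial parameters given earlier.

```python
def gradient_step(p, g, lr):
    """One gradient-descent update of a coordinate p toward the target g."""
    grad = 2 * (p - g)       # derivative of the squared-error loss (p - g)^2
    return p - lr * grad     # move against the gradient


P = (678, 105)   # initial box center
G = (443, 346)   # ground-truth box center
lr = 0.3

new_x = gradient_step(P[0], G[0], lr)
new_y = gradient_step(P[1], G[1], lr)
print(new_x, new_y)   # approximately 537 and 249.6
```

The x and y coordinates are updated independently because each has its own loss term.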



• Iteration 1
  Loss
  $L_{1x} = (678 - 443)^2$
  $L_{1y} = (105 - 346)^2$
  Gradient
  $d_{1x} = 2(678 - 443) = 470$
  $d_{1y} = 2(105 - 346) = -482$
  Update position
  $537 = 678 - 0.3 \times 470$
  $249.6 = 105 - 0.3 \times (-482)$

• Iteration 2
  Loss
  $L_{2x} = (537 - 443)^2$
  $L_{2y} = (249.6 - 346)^2$
  Gradient
  $d_{2x} = 2(537 - 443) = 188$
  $d_{2y} = 2(249.6 - 346) = -192.8$
  Update position
  $480.6 = 537 - 0.3 \times 188$
  $307.44 = 249.6 - 0.3 \times (-192.8)$
Update Position

                   x       y        Loss of x   Loss of y   Gradient of x   Gradient of y
Initial position   678     105      55225       58081       470             -482
Iteration 1        537     249.6    8836        9292.96     188             -192.8
Iteration 2        480.6   307.44   1413.76     1486.87     75.2            -77.12
Ground Truth       443     346      0           0           0               0
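
Why the position closes in on the ground truth so quickly can be read off the update rule. Substituting the gradient into the update gives, for the x coordinate (the y coordinate behaves the same way); the iteration superscript $(k)$ is notation introduced here, not taken from the slides:

```latex
\begin{aligned}
x^{(k+1)} - G_x &= x^{(k)} - lr \cdot 2\,\bigl(x^{(k)} - G_x\bigr) - G_x \\
                &= (1 - 2\,lr)\,\bigl(x^{(k)} - G_x\bigr)
\end{aligned}
```

With $lr = 0.3$ the distance to the ground truth is multiplied by $0.4$ at every iteration: $235 \to 94 \to 37.6$ for x. The same expression shows that, for this loss, the update only converges when $|1 - 2\,lr| < 1$, that is, for $0 < lr < 1$.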



Use PyTorch to build our first linear regression model. We employ SGD as
the optimizer and the MSE loss function to train the model.
We can also visualize the training process.



• Set up the base structure of this model in PyTorch

import torch              # needed later for torch.optim
import torch.nn as nn
import numpy as np
import matplotlib.pyplot as plt
plt.ion()                 # interactive mode, so figures can update during training

# Toy dataset
x_train = np.array([[3.3], [4.4], [5.5]], dtype=np.float32)
y_train = np.array([[1.7], [2.76], [2.09]], dtype=np.float32)

# Hyper-parameters
num_epochs = 50
learning_rate = 0.001

# Initial parameters of the line y = ax + b
a = -0.5
b = 1



• Initialize the model type and declare the forward pass

# Define linear regression model (one input feature, one output)
model = nn.Linear(1, 1)
# Initialize the weight and bias with a and b
model.weight.data.fill_(a)
model.bias.data.fill_(b)



Use the Mean Squared Error (MSE), which is the most
commonly used regression loss function
# Define loss
criterion = nn.MSELoss()

Use the Stochastic Gradient Descent (SGD) optimizer
to update the model parameters (weight and bias)
# Define optimizer
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
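
The slides define the model, loss and optimizer but do not show the training loop itself. The following is one possible sketch of it, repeating the setup so that it runs on its own. The loop structure, the full-batch update and the `losses` list are assumptions made for illustration, and plotting is left out.

```python
import numpy as np
import torch
import torch.nn as nn

# Toy dataset and hyper-parameters, as defined in the setup
x_train = np.array([[3.3], [4.4], [5.5]], dtype=np.float32)
y_train = np.array([[1.7], [2.76], [2.09]], dtype=np.float32)
num_epochs = 50
learning_rate = 0.001

# Model y = ax + b with a = -0.5 and b = 1
model = nn.Linear(1, 1)
model.weight.data.fill_(-0.5)
model.bias.data.fill_(1.0)

criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

inputs = torch.from_numpy(x_train)
targets = torch.from_numpy(y_train)

losses = []  # loss per epoch, kept so the training curve can be plotted afterwards
for epoch in range(num_epochs):
    outputs = model(inputs)             # forward pass
    loss = criterion(outputs, targets)  # mean squared error over the three samples

    optimizer.zero_grad()               # clear gradients from the previous step
    loss.backward()                     # backpropagate
    optimizer.step()                    # update weight and bias

    losses.append(loss.item())

print(f"first loss = {losses[0]:.4f}, last loss = {losses[-1]:.4f}")
```

To visualize training, `losses` could be passed to `plt.plot` once the loop finishes, or the fitted line could be redrawn inside the loop.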



Example 1.2
Please use the initial position and the ground truth to compute the
gradient, and then complete the table below.

                   x     y
Initial position   678   105
Iteration 1        537   249
Iteration 2        481   307
Iteration 3
Iteration 4
Iteration 5
Ground Truth       443   346
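
One way to check the completed table is to run the update rule in a loop. The sketch below assumes the learning rate of 0.3 from the initial parameters; the function name `descend` is illustrative.

```python
def descend(start, target, lr, num_iterations):
    """Return the box-center position after each gradient-descent iteration."""
    x, y = start
    gx, gy = target
    history = []
    for _ in range(num_iterations):
        x -= lr * 2 * (x - gx)   # gradient of (x - gx)^2 is 2 (x - gx)
        y -= lr * 2 * (y - gy)   # gradient of (y - gy)^2 is 2 (y - gy)
        history.append((x, y))
    return history


history = descend(start=(678, 105), target=(443, 346), lr=0.3, num_iterations=5)
for k, (x, y) in enumerate(history, start=1):
    print(f"Iteration {k}: x = {x:.2f}, y = {y:.2f}")
```

The first two printed rows are 537.00, 249.60 and 480.60, 307.44, which the table shows rounded to whole pixels.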
Example 1.2
• Iteration 1
  Loss
  $L_{1x} = (678 - 443)^2$
  $L_{1y} = (105 - 346)^2$
  Gradient
  $d_{1x} = 2(678 - 443) = 470$
  $d_{1y} = 2(105 - 346) = -482$
  Update position
  $537 = 678 - 0.3 \times 470$
  $249.6 = 105 - 0.3 \times (-482)$

• Iteration 2
  Loss
  $L_{2x} = (537 - 443)^2$
  $L_{2y} = (249.6 - 346)^2$
  Gradient
  $d_{2x} = 2(537 - 443) = 188$
  $d_{2y} = 2(249.6 - 346) = -192.8$
  Update position
  $480.6 = 537 - 0.3 \times 188$
  $307.44 = 249.6 - 0.3 \times (-192.8)$
Example 1.3 Dog Detection

Three dogs in the image.

TP = 2, FP = 2, FN = 1

$\text{Precision} = \frac{TP}{TP + FP} = \frac{2}{2 + 2} = 0.5$

$\text{Recall} = \frac{TP}{TP + FN} = \frac{2}{2 + 1} \approx 0.667$



Example 1.3 Face Detection

TP = 4, FP = 2, FN = 1

$\text{Precision} = \frac{TP}{TP + FP} = \frac{4}{4 + 2} \approx 0.667$

$\text{Recall} = \frac{TP}{TP + FN} = \frac{4}{4 + 1} = 0.8$
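
Both calculations follow the same two formulas, so they can be wrapped in one small helper. This is a minimal sketch; the function name `precision_recall` is illustrative, and returning 0.0 when a denominator is zero is a convention chosen here rather than something stated in the slides.

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP), Recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0   # 0.0 when nothing was detected
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0      # 0.0 when there is nothing to find
    return precision, recall


dog_precision, dog_recall = precision_recall(tp=2, fp=2, fn=1)
face_precision, face_recall = precision_recall(tp=4, fp=2, fn=1)

print(f"Dog:  precision = {dog_precision:.3f}, recall = {dog_recall:.3f}")
print(f"Face: precision = {face_precision:.3f}, recall = {face_recall:.3f}")
```

Precision is penalized by false detections (FP), while recall is penalized by missed objects (FN), which is why the two examples share the same FP and FN counts but differ in both scores.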



Example 1.3 Confusion Matrix

Basic Form

GT\Pred    Class 1   Class 2
Class 1    TP        FN
Class 2    FP        TN

True positive = TP, false positive = FP, true negative = TN, false negative = FN

Example 1.3

GT\Pred    Dog   Others
Dog        2     1
Others     2     0

GT\Pred    Face   Others
Face       4      1
Others     2      0
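
The four cells of a confusion matrix can be counted directly from paired ground-truth and predicted labels. The sketch below is illustrative only: the label lists are a hypothetical encoding chosen to reproduce the dog-detection counts (TP = 2, FP = 2, FN = 1), and the function name `confusion_counts` is not from the slides.

```python
def confusion_counts(ground_truth, predicted, positive):
    """Count TP, FP, FN, TN for one positive class from paired label lists."""
    tp = fp = fn = tn = 0
    for gt, pred in zip(ground_truth, predicted):
        if gt == positive and pred == positive:
            tp += 1   # object present and detected
        elif gt != positive and pred == positive:
            fp += 1   # detection where there is no such object
        elif gt == positive and pred != positive:
            fn += 1   # object present but missed
        else:
            tn += 1   # correctly rejected
    return tp, fp, fn, tn


# Hypothetical labels: three real dogs (one missed) and two false detections
ground_truth = ["dog", "dog", "dog", "others", "others"]
predicted = ["dog", "dog", "others", "dog", "dog"]

tp, fp, fn, tn = confusion_counts(ground_truth, predicted, positive="dog")
print(f"TP = {tp}, FP = {fp}, FN = {fn}, TN = {tn}")   # TP = 2, FP = 2, FN = 1, TN = 0
```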



True positive (TP) = correctly identified
False positive (FP) = incorrectly identified
True negative (TN) = correctly rejected
False negative (FN) = incorrectly rejected

$\text{Precision} = \frac{TP}{TP + FP}$

$\text{Recall} = \frac{TP}{TP + FN}$



https://www.youtube.com/watch?v=prWyZhcktn4&ab_channel=Simplilearn [4:28 – 24:46]


Training and Testing Sets
Training Set
– A set of data that is known to the system and is used to build the
classification/regression model.
– For example, in a face recognition neural network, the face
images used to train the network.

Face image from Multi-PIE


Training and Testing Sets
Testing Set
– A set of data that is unknown to the system (not seen during training) and is presented to it for recognition.
– For example, the face images to be recognized by the trained
face recognition network.

Face Image from Celeb-HQ
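
When a single collection of images has to serve both purposes, a common practice is to hold out part of it as the testing set. The sketch below shows one simple way to do that; the function name, the 80/20 ratio, the fixed seed and the file names are all hypothetical and are not taken from the slides.

```python
import random


def train_test_split(samples, test_fraction=0.2, seed=0):
    """Shuffle the samples and hold out a fraction of them as the testing set."""
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical file names standing in for face images
images = [f"face_{i:03d}.png" for i in range(10)]
train_set, test_set = train_test_split(images)

print(len(train_set), len(test_set))   # 8 2
```

The key property is that no image appears in both sets, so the testing images stay unknown to the model during training.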


Summary
