Week8 WEB

The document discusses convolutional neural networks (CNNs) and the YOLO object detection model. It provides an overview of CNN architecture including convolution, activation, pooling, flattening, and fully connected layers. It explains how CNNs use shared weights and biases to detect features across image regions. The document also describes how YOLO improves on previous models by predicting bounding boxes and class probabilities simultaneously for real-time object detection in images.

Uploaded by

Ankit Shaw
Copyright
© © All Rights Reserved

TECHIN513 – Managing Signal and Data Processing


Week 8
Today’s Agenda
• CNN
• YOLO
• ICTE
• FPDAWT
Today’s Agenda
• Convolutional Neural Network
• You Only Look Once
• In Class Team Exercise
• Final Project Discussion And Work Time
Announcement
• Purchasing supplies for final project
• Budget of $40 per team
• Requests must be made by Monday, February 26 at 9:59am

Link to Request Form:


TECHIN513 Final Project Supply Request Form - Google Sheets
What is a convolutional neural network?
• A network architecture for deep learning
• CNNs can have tens or hundreds of hidden layers
• Includes a typical artificial neural network architecture
• Useful for finding patterns in images to recognize objects
Stages of a CNN
• Input image
• Convolution
• Activation
• Pooling
• Flattening
• Fully Connected ANN
• Activation
• Output

Convolutional Operations | Medium


Greyscale Image Data
• Pixel values range from 0 to 255
• The example image is stored as a 24x16 matrix

How Do Machines Read and Store Images? | Analytics Vidhya


Color Image Data
• One image has three matrices, or “channels”
• Pixel values range from 0 to 255
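
As a rough illustration of the "three matrices" idea, the sketch below splits a tiny color image into its channels. The 2x2 image, the (red, green, blue) ordering, and the helper name `split_channels` are assumptions made for this example; they are not taken from the slides.

```python
# Hypothetical 2x2 color image: each pixel is a (red, green, blue) tuple, values 0-255.
pixels = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]


def split_channels(image):
    """Return one 2D matrix per channel (assumed order: red, green, blue)."""
    return [[[pixel[ch] for pixel in row] for row in image] for ch in range(3)]


red, green, blue = split_channels(pixels)
print(red)    # [[255, 0], [0, 255]]
print(green)  # [[0, 255], [0, 255]]
print(blue)   # [[0, 0], [255, 255]]
```

A greyscale image would need only one such matrix.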

How Do Machines Read and Store Images? | Analytics Vidhya


CNN Overview
Feature Extraction

Feature Extraction with CNNs | Towards Data Science


Typical Artificial Neural Network
• Each neuron in the input layer is connected to each neuron in the hidden layer
• Each connection has a weight value
• Each neuron has a bias value
• The model learns these values during the training process
• Values are updated with each new training example

Introduction to Deep Learning - MATLAB


Convolutional Neural Network
• The weights and bias values are the same for all neurons in a hidden layer
• All neurons in a hidden layer are detecting the same feature (e.g. edge) in different regions of an image
• The network is better equipped to detect the feature regardless of its location in an image
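
One way to see what weight sharing buys is to count learnable values. The sketch below compares a fully connected hidden layer with a single shared 3x3 kernel on a 24x16 greyscale input; the hidden-layer size of 100 and both helper names are hypothetical, chosen only for illustration.

```python
def dense_parameter_count(n_inputs, n_hidden):
    """One weight per connection plus one bias per hidden neuron."""
    return n_inputs * n_hidden + n_hidden


def conv_parameter_count(kernel_height, kernel_width, in_channels, n_filters):
    """Each filter shares one kernel (plus one bias) across every image region."""
    return (kernel_height * kernel_width * in_channels + 1) * n_filters


# Assumed sizes: 24x16 greyscale input, 100 hidden neurons, one 3x3 filter.
print(dense_parameter_count(24 * 16, 100))  # 38500
print(conv_parameter_count(3, 3, 1, 1))     # 10
```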

Introduction to Deep Learning - MATLAB


Convolutional Operation

An operation on two functions which produces a third, combined function
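
For reference, the standard textbook definition of convolution is written out below, followed by the discrete 2D form typically used in CNN layers. The symbols f, g, I (image), K (kernel) and S (feature map) are chosen here for illustration and do not appear on the slide. Note that most CNN libraries apply the kernel without flipping it, which is strictly a cross-correlation.

```latex
% Continuous convolution of two functions f and g
(f * g)(t) = \int_{-\infty}^{\infty} f(\tau)\, g(t - \tau)\, d\tau

% Discrete 2D form commonly used in CNN layers (no kernel flip)
S(i, j) = \sum_{m} \sum_{n} I(i + m,\; j + n)\, K(m, n)
```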

Convolution Integral | Statistics How To


Convolutional Operation
• A convolutional kernel is a small 2D matrix; different kernel types detect different features
• The kernel is applied to the input image by element-wise multiplication and addition
• The kernel moves across the image as a sliding window, here with stride = 1
• The output is a matrix of lower dimensions (the feature map)

Convolutional Operations | Medium
Convoluting to Create Feature Maps

CNNs | simplilearn
45*0 + 12*(-1) + 5*0 + 22*(-1) + 10*5 + 35*(-1) + 88*0 + 26*(-1) + 51*0 = -45
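
The arithmetic above can be reproduced with a few lines of plain Python. This is a minimal sketch: the 3x3 arrangement of the nine products (read row by row) is an assumption, and `apply_kernel` and `convolve2d` are hypothetical helper names rather than functions from any library. No padding is used, so the feature map is smaller than the input.

```python
# Values from the worked example, assumed to be laid out row by row.
patch = [[45, 12, 5],
         [22, 10, 35],
         [88, 26, 51]]
kernel = [[0, -1, 0],
          [-1, 5, -1],
          [0, -1, 0]]


def apply_kernel(window, kernel):
    """Multiply matching entries and add them up: one feature-map value."""
    return sum(w * k
               for window_row, kernel_row in zip(window, kernel)
               for w, k in zip(window_row, kernel_row))


def convolve2d(image, kernel, stride=1):
    """Slide a square kernel over the image (no padding) to build a feature map."""
    size = len(kernel)
    out_rows = (len(image) - size) // stride + 1
    out_cols = (len(image[0]) - size) // stride + 1
    feature_map = []
    for r in range(out_rows):
        top = r * stride
        row = []
        for c in range(out_cols):
            left = c * stride
            window = [image_row[left:left + size]
                      for image_row in image[top:top + size]]
            row.append(apply_kernel(window, kernel))
        feature_map.append(row)
    return feature_map


print(apply_kernel(patch, kernel))  # -45
```

Applied to a 4x4 image with stride 1, `convolve2d` returns a 2x2 feature map, which shows the reduction in dimensions described on the previous slide.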
Activation Step: Rectified Linear Unit (ReLU)
• The activation function takes the output of a neuron and passes it through unchanged if it is positive
• If the output is negative, the function maps it to zero
• ReLU is a commonly used activation function in deep learning

Introduction to Deep Learning - MATLAB


ReLU activation retains only positive values
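
A minimal sketch of ReLU in plain Python, applied first to single values and then to a small feature map. The example values and the helper name `relu_feature_map` are made up for illustration.

```python
def relu(value):
    """Keep positive values unchanged; map negative values to zero."""
    return max(0, value)


def relu_feature_map(feature_map):
    """Apply ReLU to every entry of a 2D feature map."""
    return [[relu(v) for v in row] for row in feature_map]


print(relu(-45))  # 0
print(relu(12))   # 12
print(relu_feature_map([[-45, 3], [7, -2]]))  # [[0, 3], [7, 0]]
```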

CNNs | simplilearn
CNN Overview
Pooling Step
• Pooling reduces the dimensionality of the feature map, producing a new, smaller feature map
• Different pooling filters can be used (e.g. max pooling)
• Condenses regions of neurons into a single output
• Simplifies the layers that follow, reducing the number of parameters the model needs to learn
• Pooling retains the most important information but lowers resolution
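
The sketch below shows one common pooling filter, max pooling with a 2x2 window and stride 2. The 4x4 input values and the function name `max_pool2d` are assumptions made for this example.

```python
def max_pool2d(feature_map, size=2, stride=2):
    """Condense each size x size region into its largest value."""
    out_rows = (len(feature_map) - size) // stride + 1
    out_cols = (len(feature_map[0]) - size) // stride + 1
    pooled = []
    for r in range(out_rows):
        pooled_row = []
        for c in range(out_cols):
            window = [feature_map[r * stride + i][c * stride + j]
                      for i in range(size) for j in range(size)]
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled


example = [[1, 3, 2, 0],
           [4, 6, 1, 2],
           [7, 2, 9, 4],
           [0, 1, 3, 5]]
print(max_pool2d(example))  # [[6, 2], [7, 9]]
```

The 4x4 map becomes 2x2: the strongest response in each region survives, while the resolution is halved in each direction.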

Introduction to Deep Learning - MATLAB


Pooling Applies Various Filters

CNNs | simplilearn
Pooling Enhances Edges
• Three iterations of max pooling using a (2, 2) kernel
• Features (edges) are enhanced, but resolution is reduced

Pooling In Convolutional Neural Networks | paperspace


CNN Overview
Flattening
• The flatten layer lies between the CNN and the ANN
• Converts the feature map from the pooling layer into an input that the ANN can understand
• The ANN requires a one-dimensional array as input
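
Flattening can be sketched in a couple of lines. The example below assumes the pooling layer produced two 2x2 feature maps (hypothetical values) and unrolls them channel by channel, row by row; real frameworks may use a different ordering.

```python
def flatten(feature_maps):
    """Unroll a list of 2D feature maps into a single one-dimensional list."""
    return [value
            for feature_map in feature_maps
            for row in feature_map
            for value in row]


pooled_maps = [[[6, 2], [7, 9]],
               [[0, 1], [2, 3]]]
print(flatten(pooled_maps))  # [6, 2, 7, 9, 0, 1, 2, 3]
```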

Feature Maps | educative.io , Dense layers | Pysource


Softmax Activation Step
• Often used as the last activation function, after the last fully connected layer, to normalize the output of a network to a probability distribution over predicted output classes
• The output of a softmax is a vector with probabilities of each possible outcome
• Mathematical representation: softmax(z)_i = exp(z_i) / sum_j exp(z_j)
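
A minimal softmax sketch using only the standard library. The input scores are made-up values for a three-class problem; subtracting the largest score before exponentiating is a common numerical-stability trick and does not change the result.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(scores)  # stability: avoids overflow for large scores
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probabilities = softmax([2.0, 1.0, 0.1])  # hypothetical scores for 3 classes
print(probabilities)
print(sum(probabilities))
```

The largest score receives the largest probability, and the outputs can be read as the network's confidence in each class.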

Softmax Activation Function | Towards Data Science


CNN Output Layer
The final layer of the CNN architecture provides the final classification output: a vector of length K, equal to the number of classes

Introduction to Deep Learning - MATLAB


Classification, Detection, & Segmentation

or object localization

Object Segmentation vs. Object Detection | LinkedIn


You Only Look Once
• "You Only Look Once" (YOLO)
• YOLOv1 paper posted to arXiv in June 2015 (latest revision May 2016)
• Uses a CNN as its backbone network architecture
• YOLO predicts bounding boxes and class probabilities for these boxes simultaneously
• Improvement on previous model: R-CNN

https://arxiv.org/abs/1506.02640
YOLO

https://pjreddie.com/darknet/yolo/

https://arxiv.org/abs/1506.02640
Previous Model for Image Detection: R-CNN
• Regions with CNN features
• Published Oct 2014
• Proposes about 2000 candidate regions (bounding boxes) per image, then classifies each region
• Drawbacks:
• Long time to train: 2000 regions must be classified per image
• Detection is not real-time: 47 sec per test image
• Bounding box inaccuracies

R-CNN | Towards Data Science


How does YOLO work?
YOLO Architecture
• Resizes the input image to 448x448
• 1x1 convolutions are applied to reduce the number of channels
• 24 convolutional layers
• 4 max pooling layers
• The activation function is leaky ReLU (the final layer uses a linear activation)
• Two fully connected layers

https://arxiv.org/abs/1506.02640
What is Object Detection?
First let’s talk about object localization
What is object localization?
Object localization is finding what and where a (single) object exists in a single image

Bounding box labels in the figure: position (bx, by), width (bw), height (bh)
How is object localization described numerically in YOLO?
• The coordinates of a bounding box are described as a vector
• x_train is the image; y_train is its label vector:

Pc 1 (probability of class)
Bx 0.5
By 0.6
Bw 0.4
Bh 0.3
C1 1
C2 0
C1 = car class
C2 = motorcycle class
How is object localization described numerically in YOLO?
• The coordinates of a bounding box are described as a vector
• The image corners are (0,0) and (1,1); in the figure, (bx, by) = (0.5, 0.6), bw = 0.4, bh = 0.3
• x_train is the image; y_train is its label vector:

Pc 1 (probability of class)
Bx 0.5
By 0.6
Bw 0.4
Bh 0.3
C1 1
C2 0
C1 = car class
C2 = motorcycle class
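
To connect the label vector to actual pixels, the sketch below converts it into box corners. It assumes that (Bx, By) is the box center, that Bw and Bh are fractions of the image width and height, and that (0,0) is the top-left corner; the 448x448 image size and the function name `label_to_pixel_box` are also assumptions for illustration.

```python
def label_to_pixel_box(label, image_width, image_height):
    """Convert [Pc, Bx, By, Bw, Bh, C1, C2] into (x_min, y_min, x_max, y_max) pixels.

    Assumes (Bx, By) is the box center and all values are fractions of the image.
    Returns None when Pc is 0, i.e. no object is present.
    """
    pc, bx, by, bw, bh = label[:5]
    if pc == 0:
        return None
    x_min = (bx - bw / 2) * image_width
    y_min = (by - bh / 2) * image_height
    x_max = (bx + bw / 2) * image_width
    y_max = (by + bh / 2) * image_height
    return (x_min, y_min, x_max, y_max)


y_train = [1, 0.5, 0.6, 0.4, 0.3, 1, 0]  # the car example from the slide
box = label_to_pixel_box(y_train, 448, 448)
print(box)  # roughly (134.4, 201.6, 313.6, 336.0)

no_object = [0, None, None, None, None, None, None]  # "-" entries stored as None
print(label_to_pixel_box(no_object, 448, 448))  # None
```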
How is object localization described numerically in YOLO?
• The coordinates of a bounding box are described as a vector
• The image corners are (0,0) and (1,1); in the figure, (bx, by) = (0.5, 0.6), bw = 0.4, bh = 0.3

Output of Neural Network:

Pc 1 (probability of class)
Bx 0.5
By 0.6
Bw 0.4
Bh 0.3
C1 0.97
C2 0.03
C1 = car class
C2 = motorcycle class
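How is object localization described numerically in YOLO?
• The coordinates of a bounding box are described as a vector
• x_train is the image; y_train is its label vector (here for an image with no object, so the remaining entries are left as "-"):

Pc 0 (probability of class)
Bx -
By -
Bw -
Bh -
C1 -
C2 -
C1 = car class
C2 = motorcycle class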
What about multiple objects?

YOLO algorithm | YouTube


What about multiple objects?

Pc 0
Bx -
By -
Bw -
Bh -
C1 -
C2 -

C1 = dog class
C2 = person class

YOLO algorithm | YouTube


What about multiple objects?
Person’s object belongs to this cell

Pc 1
Bx 0.05
By 0.3
Bw 2
Bh 1.3
C1 1
C2 0

C1 = dog class
C2 = person class

YOLO algorithm | YouTube


What about multiple objects?

Pc 1
Bx 0.32
By 0.02
Bw 2.2
Bh 1.7
C1 0
C2 1

C1 = dog class
C2 = person class

YOLO algorithm | YouTube


What about multiple objects?

All other cells (the full label is a 4x4x7 matrix)

Pc 0
Bx -
By -
Bw -
Bh -
C1 -
C2 -

C1 = dog class
C2 = person class
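
The 4x4x7 label can be built as in the sketch below: each object is assigned to the grid cell that contains its center. Several details are assumptions inferred from the slide values rather than stated there: Bx and By are measured from the cell's corner, Bw and Bh are in units of one cell (so they can exceed 1), and the "-" entries of empty cells are stored as 0.0. The function name `encode_targets` and the example object are hypothetical.

```python
def encode_targets(objects, grid_size=4, num_classes=2):
    """Build a grid_size x grid_size x (5 + num_classes) label.

    objects: list of (center_x, center_y, width, height, class_index),
    with coordinates given as fractions of the whole image.
    """
    depth = 5 + num_classes
    target = [[[0.0] * depth for _ in range(grid_size)]
              for _ in range(grid_size)]
    for center_x, center_y, width, height, class_index in objects:
        # The cell containing the object's center is responsible for it.
        col = min(int(center_x * grid_size), grid_size - 1)
        row = min(int(center_y * grid_size), grid_size - 1)
        cell = target[row][col]
        cell[0] = 1.0                          # Pc
        cell[1] = center_x * grid_size - col   # Bx, relative to the cell
        cell[2] = center_y * grid_size - row   # By, relative to the cell
        cell[3] = width * grid_size            # Bw, in cell units
        cell[4] = height * grid_size           # Bh, in cell units
        cell[5 + class_index] = 1.0            # one-hot class entry
    return target


# Hypothetical object of class C1 (index 0), centered at (0.625, 0.375).
target = encode_targets([(0.625, 0.375, 0.5, 0.25, 0)])
print(target[1][2])  # [1.0, 0.5, 0.5, 2.0, 1.0, 1.0, 0.0]
print(target[0][0])  # [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```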

YOLO algorithm | YouTube


Training the YOLO Model

YOLO algorithm | YouTube


YOLO Prediction

YOLO algorithm | YouTube


Evaluating Image Detection Models
• Common Objects in Context (COCO) dataset
• Published by Microsoft
• Used to evaluate algorithms’ performance on real-time object detection
• 330,000 images
• 200,000 are labeled
• 1.5 million object instances
• 5 captions per image

Example label (y_train):

Pc 1
Bx 0.5
By 0.6
Bw 0.4
Bh 0.3
C1 1
C2 0

COCO Dataset | viso.ai


Evaluating Image Detection Models
• Mean Average Precision (mAP)
• Benchmark metric used to evaluate the robustness of object detection models
• Incorporates mathematics from:
• Error matrix (confusion matrix)
• Intersection over union (IoU) ratio for bounding box
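
The IoU ratio compares a predicted box with the ground-truth box: the area where they overlap divided by the area they cover together. Below is a minimal sketch, assuming boxes are given as (x_min, y_min, x_max, y_max) corners; the box format and the example coordinates are assumptions, not taken from the slides.

```python
def intersection_over_union(box_a, box_b):
    """IoU of two boxes given as (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap width and height are clamped at zero when the boxes do not touch.
    overlap_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    overlap_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = overlap_w * overlap_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


predicted = (0, 0, 2, 2)
ground_truth = (1, 1, 3, 3)
print(intersection_over_union(predicted, ground_truth))  # 1/7, about 0.143
```

An IoU of 1 means a perfect match and 0 means no overlap. When computing mAP, a detection is usually counted as correct only if its IoU exceeds a chosen threshold.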

Understanding Confusion Matrix | Towards Data Science


Best Object Detection Models

Object Detection | viso.ai


YOLOv8

YOLOv8 Tutorial - Colaboratory (google.com)


YOLOv8

Ultralytics YOLOv8 | GitHub


ICTE
