03_pytorch_computer_vision

The document discusses computer vision and convolutional neural networks (CNNs), outlining various problems such as binary classification, multiclass classification, object detection, and segmentation. It covers the architecture of CNNs using PyTorch, including data handling, model creation, training, and evaluation. Additionally, it addresses concepts like overfitting, data augmentation, and popular architectures in computer vision.

Uploaded by

Saksham

Computer Vision & Convolutional Neural Networks with PyTorch
Where can you get help?
“If in doubt, run the code”

• Follow along with the code
• Try it for yourself
• Press SHIFT + CMD + SPACE to read the docstring
• Search for it
• Try again
• Ask

https://www.github.com/mrdbourke/pytorch-deep-learning/discussions
“What is a computer vision problem?”
Example computer vision problems

• Binary classification (one thing or another): “Is this a photo of steak or pizza?”
• Multiclass classification (more than one thing or another): “Is this a photo of sushi, steak or pizza?”
• Object detection: “Where’s the thing we’re looking for?”
• Segmentation: “What are the different sections in this image?”

Source: On-device Panoptic Segmentation for Camera Using Transformers.
Tesla Computer Vision

Source: Tesla AI Day Video (49:49). PS see 2:01:31 of the same video for surprise ;)
Tesla Computer Vision

Source: AI Drivr YouTube channel.

What we’re going to cover
(broadly)
• Getting a vision dataset to work with using torchvision.datasets

• Architecture of a convolutional neural network (CNN) with PyTorch

• An end-to-end multi-class image classification problem

• Steps in modelling with CNNs in PyTorch

• Creating a CNN model with PyTorch

• Picking a loss and optimizer

• Training a PyTorch computer vision model

• Evaluating a model

👩🍳 👩🔬
(w e’ ll be co ok ing u p lots of co d e! )

How:
fi
Computer vision inputs and outputs
Input: an image with W = 224, H = 224, C = 3 (C = colour channels: R, G, B)

Numerical encoding (normalized pixel values):
[[0.31, 0.62, 0.44…],
 [0.92, 0.03, 0.27…],
 [0.25, 0.78, 0.07…],
 …,

Model: this is often a convolutional neural network (CNN)! (often already exists, if not, you can build one)

Predicted output (comes from looking at lots of these), in the order 🍣 🥩 🍕:
[[0.97, 0.00, 0.03],
 [0.81, 0.14, 0.05],
 [0.03, 0.07, 0.90],
 …,

Actual output: Sushi 🍣, Steak 🥩, Pizza 🍕
Input and output shapes
(for an image classification example)

Input: a 224 × 224 image (gets represented as a tensor)
[[0.31, 0.62, 0.44…],
 [0.92, 0.03, 0.27…],
 [0.25, 0.78, 0.07…],
 …,

[batch_size, height, width, colour_channels]
Shape = [None, 224, 224, 3]
or
Shape = [32, 224, 224, 3] (32 is a very common batch size)

We’re going to be building CNNs to do this part!

Output: 🍣 🥩 🍕
[0.00, 0.97, 0.03] (prediction probabilities)
Shape = [3]

These will vary depending on the problem you’re working on.
Input and output shapes

Input: a 28 × 28 image (gets represented as a tensor)
[[0.00, 0.62, 0.44…],
 [0.00, 0.03, 0.27…],
 [0.01, 0.78, 0.07…],
 …,

(colour channels last)
[batch_size, height, width, colour_channels] (NHWC)
or (colour channels first)
[batch_size, colour_channels, height, width] (NCHW)

Shape = [None, 28, 28, 1] (NHWC)
Shape = [None, 1, 28, 28] (NCHW)
or
Shape = [32, 28, 28, 1] (32 is a very common batch size)

Output: 🥾 👕 👖…
[0.00, 0.97, …] (prediction probabilities)
Shape = [10]

These will vary depending on the problem you’re working on.
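To make the two layouts above concrete, here is a minimal sketch that moves a batch from NHWC (colour channels last) to NCHW (colour channels first). The random tensor is a made-up stand-in for a real batch of images, and the variable names are only for illustration. PyTorch layers such as torch.nn.Conv2d expect NCHW input by default.

```python
import torch

# A hypothetical batch of 32 grayscale 28x28 images in NHWC order
# (random values standing in for real pixel data).
images_nhwc = torch.rand(32, 28, 28, 1)

# Reorder the dimensions: batch stays first, channels move to position 1.
images_nchw = images_nhwc.permute(0, 3, 1, 2)

print(images_nhwc.shape)  # NHWC: [32, 28, 28, 1]
print(images_nchw.shape)  # NCHW: [32, 1, 28, 28]
```

If in doubt about which layout a tensor is in, print its shape and check where the colour channel dimension sits.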
“What is a convolutional neural network (CNN)?”
Let’s code!
FashionMNIST
“What type of clothing is in this image?”

Multiclass classification (more than one thing or another)

torchvision.datasets.FashionMNIST
FashionMNIST: Batched
torchvision.datasets.FashionMNIST → torch.utils.data.DataLoader

• batch_size=32 (32 samples per batch)
• shuffle=True (samples all mixed up)
• Number of batches = num samples / batch_size

Figure: batches 0, 1, 2, 3, 4, … shown as rows, each holding samples 0, 1, 2, 3, 4, 5, … up to 32 per batch.
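As a quick worked example of "num samples / batch_size", the sketch below counts batches for the FashionMNIST splits (60,000 training and 10,000 test images). The helper name num_batches is made up for illustration; its drop_last flag mirrors the DataLoader option of the same name, which discards a final partial batch.

```python
import math


def num_batches(num_samples: int, batch_size: int, drop_last: bool = False) -> int:
    """Return how many batches a dataset is split into (hypothetical helper)."""
    if drop_last:
        # Any leftover samples that do not fill a whole batch are discarded.
        return num_samples // batch_size
    # Otherwise the last batch is simply smaller than the rest.
    return math.ceil(num_samples / batch_size)


train_batches = num_batches(60_000, 32)                 # 1875 full batches
test_batches = num_batches(10_000, 32)                  # 312 full batches + 1 partial
test_batches_dropped = num_batches(10_000, 32, drop_last=True)

print(train_batches, test_batches, test_batches_dropped)
```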
Architecture of a CNN (typical)*
(what we’re working towards building)

Output classes: Steak 🥩, Pizza 🍕, Sushi 🍣

*Note: there is an almost unlimited number of ways you could stack together a convolutional neural network; this slide demonstrates only one.
Typical architecture of a CNN
(coloured block edition)

• Simple CNN
• Deeper CNN
• CNN Explainer model

Legend: Input layer, Conv2d layers, ReLU activation layers, Pooling layers, Output layer

Source: CNN Explainer website, architecture is known as TinyVGG.
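The layer types in the legend above can be stacked roughly as follows. This is only a sketch in the spirit of TinyVGG, not the exact CNN Explainer model: the class name, hidden_units=10 and padding=1 are assumptions chosen so that a 28 × 28 grayscale image shrinks to 14 × 14 and then 7 × 7 after the two pooling layers.

```python
import torch
from torch import nn


class TinyVGGSketch(nn.Module):
    """Two conv blocks followed by a linear classifier (illustrative only)."""

    def __init__(self, in_channels: int, hidden_units: int, num_classes: int):
        super().__init__()
        self.block_1 = nn.Sequential(
            nn.Conv2d(in_channels, hidden_units, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.Conv2d(hidden_units, hidden_units, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2),  # 28x28 -> 14x14
        )
        self.block_2 = nn.Sequential(
            nn.Conv2d(hidden_units, hidden_units, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.Conv2d(hidden_units, hidden_units, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2),  # 14x14 -> 7x7
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),  # [N, hidden_units, 7, 7] -> [N, hidden_units * 49]
            nn.Linear(hidden_units * 7 * 7, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.block_2(self.block_1(x)))


model = TinyVGGSketch(in_channels=1, hidden_units=10, num_classes=10)
dummy_batch = torch.rand(32, 1, 28, 28)  # random stand-in for a FashionMNIST batch (NCHW)
logits = model(dummy_batch)
print(logits.shape)  # one row of 10 class scores per image
```

Passing a dummy batch through the model like this is a handy way to check that the shapes line up before any training happens.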

Breakdown of torch.nn.Conv2d layer
Example code: torch.nn.Conv2d(in_channels=3, out_channels=10, kernel_size=(3, 3), stride=(1, 1), padding=0)
Example 2 (same as above): torch.nn.Conv2d(in_channels=3, out_channels=10, kernel_size=3, stride=1, padding=0)

Hyperparameter name | What does it do? | Typical values
in_channels | Defines the number of input channels of the input data. | 1 (grayscale), 3 (RGB color images)
out_channels | Defines the number of output channels of the layer (could also be called hidden units). | 10, 128, 256, 512
kernel_size (also referred to as filter size) | Determines the shape of the kernel (sliding window) over the input. | 3, 5, 7 (lower values learn smaller features, higher values learn larger features)
stride | The number of steps a filter takes across an image at a time (e.g. if stride=1, a filter moves across an image 1 pixel at a time). | 1 (default), 2
padding | Pads the target tensor with zeroes (if “same”) to preserve input shape, or leaves the target tensor as is (if “valid”), lowering output shape. | 0, 1, “same”, “valid”

📖 Resource: For an interactive demonstration of the above hyperparameters, see the CNN Explainer website.
Breakdown of torch.nn.Conv2d layer (Visually)

📖 Resource: For an interactive demonstration of the above hyperparameters, see the CNN Explainer website.
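To see how kernel_size, stride and padding change the output shape, here is a small worked sketch of the output-size formula given in the torch.nn.Conv2d documentation, applied to one spatial dimension. The function name conv2d_output_size is made up for illustration, and it only handles integer padding (not the “same”/“valid” strings).

```python
def conv2d_output_size(size: int, kernel_size: int, stride: int = 1,
                       padding: int = 0, dilation: int = 1) -> int:
    """Output height (or width) of a Conv2d layer for one spatial dimension."""
    # floor((size + 2*padding - dilation*(kernel_size - 1) - 1) / stride + 1)
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


# A 28x28 image with a 3x3 kernel:
no_padding = conv2d_output_size(28, kernel_size=3, stride=1, padding=0)    # shrinks by 2
with_padding = conv2d_output_size(28, kernel_size=3, stride=1, padding=1)  # size preserved
strided = conv2d_output_size(28, kernel_size=3, stride=2, padding=0)       # roughly halved

print(no_padding, with_padding, strided)
```

With a 3 × 3 kernel and stride 1, padding=1 keeps the input size, which is what “same” padding achieves in this case.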
FashionMNIST -> CNN
Inputs → Numerical encoding → Layers learn numerical representation → Output layer outputs predictions

Numerical encoding:
[[0.00, 0.62, 0.44…],
 [0.00, 0.03, 0.27…],
 [0.01, 0.78, 0.07…],
 [0.21, 0.34, 0.00…],
 [0.91, 0.66, 0.81…],
 [0.90, 0.55, 0.99…],
 [0.00, 0.22, 0.57…],
 …,

Output classes: 🥾 👕 👖 👡 👗 … (keep going until number of classes is fulfilled)
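One common way to get from an output layer's raw scores to prediction probabilities and a class label is softmax followed by picking the largest value. The sketch below does this in plain Python; the raw scores are invented for illustration, and treating the model output as raw scores (logits) is an assumption, not something the slides state. The class names are the ten FashionMNIST labels.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class_names = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

# Hypothetical raw scores for one image, one value per class.
raw_scores = [0.1, 0.3, -1.2, 0.0, 0.4, 1.1, -0.5, 2.0, 0.2, 3.5]

probabilities = softmax(raw_scores)
predicted_index = probabilities.index(max(probabilities))
predicted_class = class_names[predicted_index]

print(predicted_class)
```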
torchvision.transforms
torch.utils.data.Dataset
torch.utils.data.DataLoader
torch.nn
torch.nn.Module
torchvision.models
torch.optim
torchmetrics
torch.utils.tensorboard
torch.save
torch.load

See more: https://pytorch.org/tutorials/beginner/ptcheat.html

What is overfitting?
Overfitting: when a model over-learns patterns in a particular dataset and isn’t able to generalise to unseen data.

For example, think of a student who studies the course materials too hard and then isn’t able to perform well on the final exam, or who tries to put their knowledge into practice at the workplace and finds what they learned has nothing to do with the real world.

Underfitting | Balanced (goldilocks zone) | Overfitting
Improving a model (from a model’s perspective)

Common ways to improve a deep model (going from a smaller model to a larger model):

• Adding layers
• Increase the number of hidden units
• Change/add activation functions
• Change the optimization function
• Change the learning rate
• Fitting for longer

(because you can alter each of these, they’re hyperparameters)
Improving a model (from a data perspective)

Method to improve a model (reduce overfitting) | What does it do?
More data | Gives a model more of a chance to learn patterns between samples (e.g. if a model is performing poorly on images of pizza, show it more images of pizza).
Data augmentation | Increase the diversity of your training dataset without collecting more data (e.g. take your photos of pizza and randomly rotate them 30°). Increased diversity forces a model to learn more generalisable patterns.
Better data | Not all data samples are created equally. Removing poor samples from or adding better samples to your dataset can improve your model’s performance.
Use transfer learning | Take a model’s pre-learned patterns from one problem and tweak them to suit your own problem. For example, take a model trained on pictures of cars to recognise pictures of trucks.
What is data augmentation?
Looking at the same image but from different perspective(s)*.

Original | Rotate | Shift | Zoom

*Note: There are many more different kinds of data augmentation, such as cropping, replacing and shearing. This slide only demonstrates a few.
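As a toy illustration of the idea, the sketch below applies a shift and a rotation to a tiny image stored as a nested list. The helper names are made up, the rotation is fixed at 90° to keep the code short (real pipelines usually rotate by arbitrary angles), and in practice you would more likely reach for the ready-made transforms in torchvision.transforms.

```python
import random


def shift_right(image: list[list[int]], pixels: int) -> list[list[int]]:
    """Shift every row to the right, filling the gap on the left with zeros."""
    width = len(image[0])
    return [[0] * pixels + row[: width - pixels] for row in image]


def rotate_90(image: list[list[int]]) -> list[list[int]]:
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def random_augment(image: list[list[int]], rng: random.Random) -> list[list[int]]:
    """Pick one augmentation (or none) at random for this sample."""
    choice = rng.choice(["original", "shift", "rotate"])
    if choice == "shift":
        return shift_right(image, pixels=1)
    if choice == "rotate":
        return rotate_90(image)
    return image


# A made-up 3x3 "image" so the effect of each transform is easy to read.
image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]

shifted = shift_right(image, pixels=1)
rotated = rotate_90(image)
augmented = random_augment(image, random.Random(0))

print(shifted)
print(rotated)
```

Each training pass can then see a slightly different version of the same picture, while the label stays the same.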
Popular & useful computer vision architectures: see torchvision.models

Architecture | Release Date | Paper | Use in PyTorch | When to use
ResNet (residual networks) | 2015 | https://arxiv.org/abs/1512.03385 | torchvision.models.resnet… | A good backbone for many computer vision problems
EfficientNet(s) | 2019 | https://arxiv.org/abs/1905.11946 | torchvision.models.efficientnet… | Typically now better than ResNets for computer vision
Vision Transformer (ViT) | 2020 | https://arxiv.org/abs/2010.11929 | torchvision.models.vit_… | Transformer architecture applied to vision
MobileNet(s) | 2017 | https://arxiv.org/abs/1704.04861 | torchvision.models.mobilenet… | Lightweight architecture suitable for devices with less computing power
The machine learning explorer’s motto
“Visualize, visualize, visualize”

• Data
• Model
• Training
• Predictions

It’s a good idea to visualize these as often as possible.
The machine learning practitioner’s motto
“Experiment, experiment, experiment”

👩🍳 👩🔬 (try lots of things and see what tastes good)