
Tutorial 4: Dataset preparation for image classification tasks

&
Loading and evaluation of pre-trained image classification models

CHEN JIELIN
Department of Architecture, National University of Singapore
Image Classification Task

Image Classification is a fundamental task in vision recognition that aims to understand and categorize an image as a
whole under a specific label, and it typically pertains to single-object images.

https://paperswithcode.com/task/image-classification
https://vitalflux.com/difference-binary-multi-class-multi-label-classification/#:~:text=Multiclass%20Classification%20is%20where%20each,labels%20to%20each%20data%20sample.
Image Classification Task: Binary vs Multi-Class vs Multi-Label

https://medium.com/@saugata.paul1010/a-detailed-case-study-on-multi-label-classification-with-machine-learning-algorithms-and-72031742c9aa
Image Classification Task: Binary vs Multi-Class vs Multi-Label

To ensure only one class is selected each time, we apply the Softmax activation function at the last layer: the sum of the probabilities of all classes is 1.

In the case where multi-label classification is needed, we use multiple Sigmoids on the last layer and thus learn a separate distribution for each class: the probability of each class is independent of the others.

https://www.analyticsvidhya.com/blog/2021/07/demystifying-the-difference-between-multi-class-and-multi-label-classification-problem-statements-in-deep-learning/
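The contrast is easy to see in code. Below is a minimal PyTorch sketch (PyTorch is an assumption here; any framework behaves the same way) applying both activations to the same raw scores:

    import torch

    logits = torch.randn(1, 5)      # raw scores for 5 classes from the last layer

    # Multi-class: softmax makes the 5 probabilities compete and sum to 1
    multi_class_probs = torch.softmax(logits, dim=1)
    print(multi_class_probs.sum())  # tensor(1.)

    # Multi-label: an independent sigmoid per class; probabilities need not sum to 1
    multi_label_probs = torch.sigmoid(logits)
    print(multi_label_probs)        # each value lies in (0, 1), independently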
Existing open-sourced large-scale datasets for image classification task
Some example datasets for image classification

ImageNet: contains 14,197,122 annotated images. The average image resolution on ImageNet is 469x387 pixels. The publicly released dataset contains a set of manually annotated training images. A set of test images is also released, with the manual annotations withheld.

https://paperswithcode.com/datasets?task=image-classification
Existing open-sourced large-scale datasets for image classification task
Some example datasets for image classification

CIFAR-10: 60,000 32x32 color images, each labelled with one of 10 mutually exclusive classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. There are 6,000 images per class, with 5,000 training and 1,000 testing images per class.

https://paperswithcode.com/datasets?task=image-classification
Existing open-sourced large-scale datasets for image classification task
Some example datasets for image classification

MNIST: a large collection of handwritten digits, with a training set of 60,000 examples and a test set of 10,000 examples. Each image is a 28x28 (784-pixel) handwritten digit from "0" to "9". Each pixel value is a grayscale integer between 0 and 255.

https://paperswithcode.com/datasets?task=image-classification
Existing open-sourced large-scale datasets for image classification task
Some example datasets for image classification

Fashion-MNIST: a dataset of 28x28 grayscale images of 70,000 fashion products from 10 categories, with 7,000 images per category. The training set has 60,000 images and the test set has 10,000 images. Fashion-MNIST shares the same image size, data format, and training/testing split structure with the original MNIST.
https://paperswithcode.com/datasets?task=image-classification
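All four datasets above are bundled with torchvision, so they can be downloaded and inspected in a few lines; a minimal sketch (the ./data directory is an arbitrary choice, and MNIST or FashionMNIST can be swapped in for CIFAR10):

    import torchvision
    import torchvision.transforms as transforms

    transform = transforms.ToTensor()
    train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                             download=True, transform=transform)
    test_set = torchvision.datasets.CIFAR10(root="./data", train=False,
                                            download=True, transform=transform)
    print(len(train_set), len(test_set))  # 50000 10000
    image, label = train_set[0]
    print(image.shape)                    # torch.Size([3, 32, 32])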
For classification tasks, we need to split the original dataset into
subsets for training and testing
(This also applies to all discriminative tasks/supervised learning)

There are different practical methods for splitting the dataset; the most common ratios are 80:20, 70:30, or 90:10, depending on the size of your original dataset.

Adopted from “https://labelyourdata.com/articles/machine-learning-and-training-data”
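As a minimal sketch, an 80:20 split with scikit-learn's train_test_split on toy stand-in data (the array shapes are illustrative only):

    import numpy as np
    from sklearn.model_selection import train_test_split

    # Toy stand-ins for real images and labels
    images = np.random.rand(1000, 32, 32, 3)
    labels = np.random.randint(0, 10, size=1000)

    # 80:20 split; stratify keeps the class proportions equal in both subsets
    X_train, X_test, y_train, y_test = train_test_split(
        images, labels, test_size=0.2, random_state=42, stratify=labels)
    print(X_train.shape, X_test.shape)  # (800, 32, 32, 3) (200, 32, 32, 3)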


Feature Engineering: One-Hot Encoding for categorical or string-based
labels

Since every machine learning or deep learning model requires exact mathematical and statistical computation, which can only be performed on numerical data types, we need to convert our data into int/float values when building such models.

One-Hot Encoding is frequently used for this purpose. It is a process by which categorical or string-based labels are converted into 1s and 0s. New columns are introduced in this process; the number of columns depends on the number of categorical values in the original column.

There is no correlation between the values of any pair of data points in the newly generated columns, which is desirable, as it is normally assumed that all data points are mutually independent of each other.
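A minimal sketch of one-hot encoding string labels with pandas and PyTorch (the label values here are hypothetical examples):

    import pandas as pd
    import torch
    import torch.nn.functional as F

    # String labels -> integer codes (alphabetical: cinema=0, houses=1, school=2)
    labels = pd.Series(["houses", "school", "cinema", "houses"])
    indices = torch.tensor(labels.astype("category").cat.codes.values,
                           dtype=torch.long)

    # Integer codes -> one column of 1s/0s per category
    one_hot = F.one_hot(indices, num_classes=3)
    print(one_hot)
    # tensor([[0, 1, 0],   <- "houses"
    #         [0, 0, 1],   <- "school"
    #         [1, 0, 0],   <- "cinema"
    #         [0, 1, 0]])  <- "houses"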
Annotated Image Database of Architecture (AIDA)
● A repository of building imagery with high diversity and high coverage, retrieved from the professional architectural website Archdaily®

● Each image is annotated with ground-truth architectural category labels and scene labels

● 14,659 images and 25 architecture categories; the number of images in each architectural category of each scene class varies from 20 to 1,400

● 11,730 images from the dataset are randomly selected for training and 2,929 for testing

Chen, J., Stouffs, R., & Biljecki, F. (2021). Hierarchical (multi-label) architectural image recognition and classification. In PROJECTIONS, Proceedings of the 26th International Conference of the Association for Computer-Aided Architectural Design Research in Asia
(CAADRIA) 2021 (pp. 161-170).
Hands-on Exercise of Image Dataset Preparation
Architectural Image Classification
Models using AIDA

[Figure: hierarchical classification pipeline. An input image is first assigned a scene category (outdoor, indoor, or street-level), then an architectural category (houses, school, ..., cinema).]

Chen, J., Stouffs, R., & Biljecki, F. (2021). Hierarchical (multi-label) architectural image recognition and classification. In PROJECTIONS, Proceedings of the 26th International Conference of the Association for Computer-Aided Architectural Design Research in Asia
(CAADRIA) 2021 (pp. 161-170).
Architectural Image Classification Models using AIDA

[Figure: model pipeline based on a Convolutional Neural Network.]

Chen, J., Stouffs, R., & Biljecki, F. (2021). Hierarchical (multi-label) architectural image recognition and classification. In PROJECTIONS, Proceedings of the 26th International Conference of the Association for Computer-Aided Architectural Design Research in Asia
(CAADRIA) 2021 (pp. 161-170).
Convolutional Neural Network (CNN)

[Figure: a typical CNN, alternating convolution and pooling layers followed by fully connected layers.]

In simple terms, what a CNN does is extract the features of an image and convert them into a lower-dimensional representation without losing their essential characteristics. In a CNN, the hidden layers include one or more layers that perform convolutions, letting the model learn its own feature engineering through convolution kernels. As a convolution kernel slides along the input matrix of a layer, the convolution operation generates a feature map, which in turn contributes to the input of the next layer. This is followed by other layers such as pooling layers or fully connected layers.

Pooling layers are used to reduce the spatial size of the feature maps while preserving important information. This reduces the computational cost of the network and helps to prevent overfitting.

A fully connected layer is applied on the feature map at the end to map the learned features to a chosen number of classes.

LeCun, Y., Boser, B., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W., & Jackel, L. D. (1989). Backpropagation applied to handwritten zip code recognition. Neural computation, 1(4), 541-551.
https://towardsdatascience.com/types-of-convolutions-in-deep-learning-717013397f4d
https://www.superannotate.com/blog/guide-to-convolutional-neural-networks
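A minimal PyTorch sketch of such a network, with illustrative layer sizes rather than any specific architecture from the slides:

    import torch
    import torch.nn as nn

    class SmallCNN(nn.Module):
        def __init__(self, num_classes=10):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 16, kernel_size=3, padding=1),  # feature extraction
                nn.ReLU(),
                nn.MaxPool2d(2),                             # 32x32 -> 16x16
                nn.Conv2d(16, 32, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.MaxPool2d(2),                             # 16x16 -> 8x8
            )
            # Fully connected layer maps the learned features to the classes
            self.classifier = nn.Linear(32 * 8 * 8, num_classes)

        def forward(self, x):
            x = self.features(x)
            return self.classifier(torch.flatten(x, 1))

    model = SmallCNN()
    print(model(torch.randn(1, 3, 32, 32)).shape)  # torch.Size([1, 10])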
Convolutional Neural Network (CNN)

Image Kernels explained visually:
https://setosa.io/ev/image-kernels/

A convolution kernel is a small matrix used to apply effects like the ones you might find in Photoshop, such as blurring, sharpening, or embossing. In CNNs, kernels are used for feature extraction, and the process is referred to as "convolution". The values of the kernels are iteratively updated during the training of a CNN model.
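A minimal sketch applying one such hand-written kernel with PyTorch's conv2d (the values below are the classic 3x3 sharpening filter; in a trained CNN they would be learned instead):

    import torch
    import torch.nn.functional as F

    # Classic 3x3 sharpening kernel, shaped (out_channels, in_channels, H, W)
    kernel = torch.tensor([[ 0., -1.,  0.],
                           [-1.,  5., -1.],
                           [ 0., -1.,  0.]]).reshape(1, 1, 3, 3)
    image = torch.rand(1, 1, 8, 8)          # batch, channel, height, width
    sharpened = F.conv2d(image, kernel, padding=1)
    print(sharpened.shape)                  # torch.Size([1, 1, 8, 8])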

Convolutional Neural Network (CNN)

Stride denotes how many steps the kernel moves at each step of the convolution; a stride size of one moves the kernel one position at a time.

To keep the output dimensions the same as the input, we use padding. Padding is the process of adding zeros around the input matrix symmetrically; after applying padding, the output has the same dimensions as the original input.
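The effect of stride and padding on the output size can be checked directly; a minimal sketch with an 8x8 input and a 3x3 kernel:

    import torch
    import torch.nn.functional as F

    image = torch.rand(1, 1, 8, 8)
    kernel = torch.rand(1, 1, 3, 3)

    # No padding: the 3x3 kernel can only slide to 6x6 positions
    print(F.conv2d(image, kernel).shape)                       # torch.Size([1, 1, 6, 6])

    # Zero-padding of 1 on each side keeps the output at the input size
    print(F.conv2d(image, kernel, padding=1).shape)            # torch.Size([1, 1, 8, 8])

    # A stride of 2 halves the spatial resolution
    print(F.conv2d(image, kernel, padding=1, stride=2).shape)  # torch.Size([1, 1, 4, 4])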

ResNets

Classical CNNs are not able to scale to a large number of layers, as they face the "vanishing gradient" problem (with too many layers, repeated multiplications eventually shrink the gradient until it "disappears"). ResNets provide a solution to the vanishing gradient problem by adding "skip connections" between every two or three layers.

He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770-778).
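A minimal sketch of a residual block in PyTorch (batch normalization, which real ResNet blocks include, is omitted for brevity; the channel count is illustrative):

    import torch
    import torch.nn as nn

    class ResidualBlock(nn.Module):
        def __init__(self, channels=64):
            super().__init__()
            self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
            self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
            self.relu = nn.ReLU()

        def forward(self, x):
            out = self.conv2(self.relu(self.conv1(x)))
            return self.relu(out + x)  # skip connection: the input bypasses both convs

    block = ResidualBlock()
    print(block(torch.randn(1, 64, 8, 8)).shape)  # torch.Size([1, 64, 8, 8])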
ResNeXt

ResNeXt is an alternative model based on the ResNet design, which adds another dimension, cardinality, in the form of the number of independent paths in a block; increasing the cardinality has been found to be a more effective way of gaining accuracy than making the network deeper or wider.

[Figure: a block of ResNet (left) vs. a block of ResNeXt with cardinality = 32 (right).]

Xie, S., Girshick, R., Dollár, P., Tu, Z., & He, K. (2017). Aggregated residual transformations for deep neural networks. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1492-1500).
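In code, the 32 parallel paths are typically implemented as a single grouped convolution; a minimal sketch (channel counts are illustrative):

    import torch
    import torch.nn as nn

    # groups=32 is the "cardinality" dimension: 32 independent 4-channel paths
    grouped_conv = nn.Conv2d(128, 128, kernel_size=3, padding=1, groups=32)
    print(grouped_conv(torch.randn(1, 128, 8, 8)).shape)  # torch.Size([1, 128, 8, 8])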
DenseNet

A ResNet variation that attempts to resolve the issue of vanishing gradients by creating more connections. The authors of DenseNet ensured the maximum flow of information between the network layers by connecting each layer directly to all the others: every layer obtains additional inputs from all preceding layers and passes its own feature maps on to all subsequent layers.

Huang, G., Liu, Z., Van Der Maaten, L., & Weinberger, K. Q. (2017). Densely connected convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 4700-4708).
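A minimal sketch of this dense-connectivity pattern (the "growth rate" of 16 and the other sizes are illustrative choices, not DenseNet's actual configuration):

    import torch
    import torch.nn as nn

    class TinyDenseBlock(nn.Module):
        def __init__(self, in_channels=32, growth=16, num_layers=3):
            super().__init__()
            # Each layer's input width grows by `growth` channels per earlier layer
            self.layers = nn.ModuleList(
                nn.Conv2d(in_channels + i * growth, growth, 3, padding=1)
                for i in range(num_layers))

        def forward(self, x):
            features = [x]
            for layer in self.layers:
                # Every layer sees the concatenation of ALL previous feature maps
                out = torch.relu(layer(torch.cat(features, dim=1)))
                features.append(out)
            return torch.cat(features, dim=1)

    block = TinyDenseBlock()
    print(block(torch.randn(1, 32, 8, 8)).shape)  # torch.Size([1, 80, 8, 8])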
Interpreting performance of image classification models:
Gradient-weighted Class Activation Mapping (Grad-CAM)

Visual explanations: making Convolutional Neural Network (CNN)-based models more transparent by visualizing the regions of the input that are "important" for the models' predictions.

https://medium.com/@mohamedchetoui/grad-cam-gradient-weighted-class-activation-mapping-ffd72742243a
Selvaraju, R. R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., & Batra, D. (2017). Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision (pp. 618-626).
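A bare-bones sketch of the Grad-CAM computation itself, using forward/backward hooks (the resnet18 backbone and its layer4 stage are assumptions chosen for illustration; in practice the tutorial's own trained models would be substituted):

    import torch
    import torch.nn.functional as F
    from torchvision import models

    model = models.resnet18(weights="DEFAULT").eval()  # downloads ImageNet weights
    activations, gradients = [], []
    layer = model.layer4                               # last conv stage
    layer.register_forward_hook(lambda m, i, o: activations.append(o))
    layer.register_full_backward_hook(lambda m, gi, go: gradients.append(go[0]))

    image = torch.randn(1, 3, 224, 224)   # stand-in for a preprocessed photo
    score = model(image)[0].max()         # score of the predicted class
    score.backward()

    # Global-average-pool the gradients to get one weight per channel,
    # then take the ReLU of the weighted sum of activation channels
    weights = gradients[0].mean(dim=(2, 3), keepdim=True)
    cam = F.relu((weights * activations[0]).sum(dim=1))
    cam = F.interpolate(cam.unsqueeze(1), size=(224, 224), mode="bilinear")
    print(cam.shape)  # torch.Size([1, 1, 224, 224]): heatmap to overlay on the image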
Using Grad-CAM to interpret the performance of an architectural image
classification model

Chen, J., Stouffs, R., & Biljecki, F. (2021). Hierarchical (multi-label) architectural image recognition and classification. In PROJECTIONS, Proceedings of the 26th International Conference of the Association for Computer-Aided Architectural Design Research in Asia (CAADRIA) 2021 (pp. 161-170).
Hands-on Exercise of Image Classification Model
Loading and Evaluation
Assignment 3: Individual work (10% of final grade)

For this assignment, you can choose one of the two following tasks:

1. Choose a target design website suitable for collecting image datasets for training an image classification model related to your field of design practice (architecture, landscape architecture, industrial design, etc.), and construct an image dataset using one of the data crawling and pre-processing approaches introduced in Tutorials 3 & 4. Write a report of at least 500 words (one or two paragraphs), with screenshots or illustrations of your image dataset construction process.

2. Use the two pre-trained architectural image classification models (densenet161 and resnext101) to classify the same 15 randomly selected architectural images from the test set of AIDA. Analyse and compare the classification results of the two models in terms of accuracy, and use the Grad-CAM visualization tool to interpret model performance (a model-loading sketch is shown after this section). Write a report based on your analysis; it should contain at least 500 words (one or two paragraphs), with comparison charts and the corresponding Grad-CAM visualization results.

Please note that the next assignment will allow you to train your own image classifier model, either
based on the AIDA dataset or the image dataset from task 1 above.
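For task 2, a minimal sketch of loading one of the pre-trained models (assumptions: the course-provided checkpoints are torchvision state dicts with a 25-class head, and the checkpoint filename below is hypothetical):

    import torch
    from torchvision import models

    # densenet161 exposes its head as .classifier; resnext101_32x8d uses .fc instead
    model = models.densenet161()
    model.classifier = torch.nn.Linear(model.classifier.in_features, 25)
    model.load_state_dict(torch.load("aida_densenet161.pth", map_location="cpu"))
    model.eval()

    with torch.no_grad():
        image = torch.randn(1, 3, 224, 224)  # stand-in for a preprocessed test image
        predicted_class = model(image).argmax(dim=1)
    print(predicted_class)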
Assignment assessment criteria (10% of final grade)

Completeness: Make sure your report is complete with respect to the assignment requirements.

Critical thinking is expected: If you choose the second task, we will look into the width and depth of your thinking concerning the performance of the classification models in terms of design practice.
