Tutorial 4 - Image Classification A &
Loading and evaluation of pre-trained image classification models
CHEN JIELIN
Department of Architecture, National University of Singapore
Image Classification Task
Image classification is a fundamental task in visual recognition that aims to understand an image as a
whole and assign it a specific label. It typically applies to images that contain a single object.
https://paperswithcode.com/task/image-classification
https://vitalflux.com/difference-binary-multi-class-multi-label-classification/#:~:text=Multiclass%20Classification%20is%20where%20each,labels%20to%20each%20data%20sample.
Image Classification Task: Binary vs Multi-Class vs Multi-Label
https://medium.com/@saugata.paul1010/a-detailed-case-study-on-multi-label-classification-with-machine-learning-algorithms-and-72031742c9aa
https://www.analyticsvidhya.com/blog/2021/07/demystifying-the-difference-between-multi-class-and-multi-label-classification-problem-statements-in-deep-learning/
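The three settings differ mainly in how the target label is encoded and how model outputs are turned into predictions. A minimal sketch in plain Python follows; the class names, the scores, and the 0.5 threshold are made-up values for illustration, not taken from any model in this tutorial.

```python
import math

CLASSES = ["house", "school", "cinema"]  # hypothetical class names

# Binary: a single yes/no label (e.g. "is this a house?").
binary_target = 1

# Multi-class: exactly one class applies -> one-hot vector.
multiclass_target = [0, 1, 0]  # "school"

# Multi-label: any number of classes may apply -> multi-hot vector.
multilabel_target = [1, 0, 1]  # "house" and "cinema"


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1 (multi-class)."""
    exps = [math.exp(s - max(scores)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(score):
    """Turn one raw score into an independent probability (binary, multi-label)."""
    return 1.0 / (1.0 + math.exp(-score))


def predict_multiclass(scores):
    """Pick the single most probable class."""
    probs = softmax(scores)
    return CLASSES[probs.index(max(probs))]


def predict_multilabel(scores, threshold=0.5):
    """Keep every class whose own probability reaches the threshold."""
    return [c for c, s in zip(CLASSES, scores) if sigmoid(s) >= threshold]


scores = [2.0, -1.0, 0.5]  # made-up model outputs
print(predict_multiclass(scores))  # house
print(predict_multilabel(scores))  # ['house', 'cinema']
```

The same scores give one label under the multi-class rule and two labels under the multi-label rule, because softmax makes the classes compete while sigmoid scores each class on its own.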
Existing open-sourced large-scale datasets for image classification task
Some example datasets for image classification
https://paperswithcode.com/datasets?task=image-classification
● Each image is annotated with ground-truth architectural category labels and scene labels.
● 11,730 images from the dataset are randomly selected for training and 2,929 for testing.
Chen, J., Stouffs, R., & Biljecki, F. (2021). Hierarchical (multi-label) architectural image recognition and classification. In PROJECTIONS, Proceedings of the 26th International Conference of the Association for Computer-Aided Architectural Design Research in Asia (CAADRIA) 2021 (pp. 161-170).
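A random train/test split of this kind can be sketched as below. This is only an illustration: the image identifiers are hypothetical placeholders, the seed is arbitrary, and the sketch assumes the two subsets are disjoint and drawn from a single pool of 11,730 + 2,929 images.

```python
import random

N_TRAIN, N_TEST = 11_730, 2_929  # split sizes quoted above


def split_dataset(image_ids, n_train, seed=0):
    """Shuffle a copy of image_ids and cut it into train and test lists."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(image_ids)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical identifiers standing in for real image file names.
image_ids = [f"img_{i:05d}" for i in range(N_TRAIN + N_TEST)]
train_ids, test_ids = split_dataset(image_ids, N_TRAIN)
print(len(train_ids), len(test_ids))  # 11730 2929
```

Fixing the seed keeps the test set the same between runs, so results from different models stay comparable.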
Hands-on Exercise of Image Dataset Preparation
Architectural Image Classification Models using AIDA
[Figure: input image; scene category (outdoor, indoor, street-level); architectural category (houses, school, ..., cinema)]
Chen, J., Stouffs, R., & Biljecki, F. (2021). Hierarchical (multi-label) architectural image recognition and classification. In PROJECTIONS, Proceedings of the 26th International Conference of the Association for Computer-Aided Architectural Design Research in Asia (CAADRIA) 2021 (pp. 161-170).
Convolutional Neural Network (CNN)
In simple terms, a CNN extracts features from an image and converts them into a lower-dimensional
representation while preserving the characteristics that matter for recognition. In a CNN, the hidden
layers include one or more convolutional layers, which let the model learn its own features with
convolution kernels instead of relying on hand-engineered ones. As a convolution kernel slides along
the layer's input matrix, the convolution operation generates a feature map, which in turn contributes
to the input of the next layer. The convolutional layers are followed by other layers such as pooling
layers or fully connected layers.
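The sliding operation can be sketched in plain Python as follows. The toy image and the hand-picked kernel are made up for illustration; the sketch uses no padding and a stride of 1, and, like most deep learning libraries, it does not flip the kernel.

```python
def conv2d(image, kernel):
    """Slide kernel over image (no padding, stride 1) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Multiply the kernel with the patch under it, element by element, and sum.
            value = sum(
                kernel[a][b] * image[i + a][j + b]
                for a in range(kh)
                for b in range(kw)
            )
            row.append(value)
        feature_map.append(row)
    return feature_map


# Toy 4x4 "image": dark left half (0), bright right half (1).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hand-picked 2x2 kernel that responds to a dark-to-bright vertical edge.
kernel = [
    [-1, 1],
    [-1, 1],
]
feature_map = conv2d(image, kernel)
print(feature_map)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The feature map is smaller than the input (3x3 from 4x4) and is non-zero only in the column where the edge sits, which is the sense in which a kernel "extracts" a feature.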
Pooling layers are used to reduce the spatial size of the feature maps while preserving important
information. This reduces the computational cost of the network and helps to prevent overfitting.
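As an illustration, max pooling (one common kind of pooling, chosen here as an example) keeps only the largest value in each window. The feature map values below are made up, and the window size of 2 is an assumption.

```python
def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling: keep the largest value in each size x size window."""
    pooled = []
    for i in range(0, len(feature_map) - size + 1, size):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, size):
            window = [
                feature_map[i + a][j + b]
                for a in range(size)
                for b in range(size)
            ]
            row.append(max(window))
        pooled.append(row)
    return pooled


feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 2],
    [2, 2, 3, 6],
]
pooled = max_pool2d(feature_map)
print(pooled)  # [[4, 2], [2, 6]]
```

The 4x4 map shrinks to 2x2, so later layers process a quarter of the values while the strongest response in each region is retained.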
A fully connected layer is applied to the feature maps at the end to map the learned features to a
chosen number of classes.
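A minimal sketch of this last step, assuming three classes and a tiny 2x2 feature map. The weights and biases are hypothetical numbers; in a real CNN they are learned during training.

```python
import math


def fully_connected(features, weights, biases):
    """Compute one score per class: dot(weights[c], features) + biases[c]."""
    return [
        sum(w * x for w, x in zip(class_weights, features)) + b
        for class_weights, b in zip(weights, biases)
    ]


def softmax(scores):
    """Turn class scores into probabilities that sum to 1."""
    exps = [math.exp(s - max(scores)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


pooled = [[4, 2], [2, 6]]                      # made-up 2x2 feature map
features = [v for row in pooled for v in row]  # flatten to [4, 2, 2, 6]

# Hypothetical parameters for 3 classes (one row of weights per class).
weights = [
    [0.1, 0.0, 0.0, 0.2],
    [0.0, 0.3, 0.1, 0.0],
    [-0.1, 0.0, 0.2, 0.0],
]
biases = [0.0, 0.1, 0.0]

scores = fully_connected(features, weights, biases)
probs = softmax(scores)
print(scores)
print(probs)
```

The number of rows in `weights` fixes the number of output classes, which is why this layer is typically the part that gets replaced when a pre-trained network is reused for a new label set.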
LeCun, Y., Boser, B., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W., & Jackel, L. D. (1989). Backpropagation applied to handwritten zip code recognition. Neural computation, 1(4), 541-551.
https://towardsdatascience.com/types-of-convolutions-in-deep-learning-717013397f4d
https://www.superannotate.com/blog/guide-to-convolutional-neural-networks
Image Kernels explained visually:
https://setosa.io/ev/image-kernels/
A convolution kernel is a small matrix used to apply effects such as blurring, sharpening, or
embossing, like the filters found in Photoshop. In CNNs, kernels are used for 'feature extraction',
and the process of applying them is referred to as "convolution". The kernel values are not set by
hand: they are iteratively updated during training of the CNN model.
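To make "iteratively updated" concrete, here is a toy sketch in which a one-dimensional kernel is learned by gradient descent. Everything in it is an assumption made for illustration (the signal, the target kernel, the learning rate, the number of steps); real CNNs learn many 2D kernels at once through backpropagation in a framework.

```python
def conv1d(signal, kernel):
    """Valid 1D sliding dot product (the operation CNN layers call convolution)."""
    k = len(kernel)
    return [
        sum(kernel[a] * signal[i + a] for a in range(k))
        for i in range(len(signal) - k + 1)
    ]


signal = [0.0, 1.0, 3.0, 2.0, 5.0, 4.0, 1.0, 0.0]  # made-up input
target_kernel = [1.0, 0.0, -1.0]                   # filter we pretend is the ideal one
target = conv1d(signal, target_kernel)             # outputs the model should reproduce

kernel = [0.0, 0.0, 0.0]  # start from an uninformative kernel
learning_rate = 0.01

for step in range(2000):
    output = conv1d(signal, kernel)
    errors = [o - t for o, t in zip(output, target)]
    # Gradient of the mean squared error with respect to each kernel value.
    grads = [
        2 * sum(e * signal[i + a] for i, e in enumerate(errors)) / len(errors)
        for a in range(len(kernel))
    ]
    # Move each kernel value a small step against its gradient.
    kernel = [w - learning_rate * g for w, g in zip(kernel, grads)]

print([round(w, 3) for w in kernel])  # close to target_kernel
```

Nobody types the final kernel values in; they emerge from repeatedly reducing the error between the produced and the desired outputs.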
ResNets
Classical CNNs do not scale well to a large number of layers, as they face the “vanishing
gradient” problem (with too many layers, the repeated multiplications performed during
backpropagation eventually shrink the gradient until it “disappears”). ResNets provide a solution
to the vanishing gradient problem by adding “skip connections” that bypass every two or three layers.
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770-778).
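The effect of a skip connection on the gradient can be illustrated with a deliberately simplified scalar model. This is a toy under stated assumptions, not the actual ResNet block: each "layer" is just y = w * x with a made-up slope w, whereas real residual blocks wrap two or three convolutional layers with non-linearities.

```python
def plain_gradient(layer_slopes):
    """Gradient through stacked layers y = w * x: the product of all slopes."""
    grad = 1.0
    for w in layer_slopes:
        grad *= w
    return grad


def residual_gradient(layer_slopes):
    """Gradient through stacked blocks y = x + w * x: each factor is 1 + w."""
    grad = 1.0
    for w in layer_slopes:
        grad *= 1.0 + w  # the "1" is contributed by the skip connection
    return grad


slopes = [0.1] * 20  # 20 toy layers, each with a small local slope
print(plain_gradient(slopes))     # about 1e-20: the gradient has vanished
print(residual_gradient(slopes))  # about 6.7: the identity path keeps it alive
```

Because the skip connection adds the input back to the output, every block passes the gradient through an identity path in addition to the learned path, so the signal reaching early layers no longer depends on a long product of small numbers.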
ResNeXt
Xie, S., Girshick, R., Dollár, P., Tu, Z., & He, K. (2017). Aggregated residual transformations for deep neural networks. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1492-1500).
DenseNet
Huang, G., Liu, Z., Van Der Maaten, L., & Weinberger, K. Q. (2017). Densely connected convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 4700-4708).
Interpreting performance of image classification models:
Gradient-weighted Class Activation Mapping (Grad-CAM)
Visual explanations: making Convolutional Neural Network (CNN)-based models more transparent by
visualizing the regions of input that are “important” for predictions from these models.
https://medium.com/@mohamedchetoui/grad-cam-gradient-weighted-class-activation-mapping-ffd72742243a
Selvaraju, R. R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., & Batra, D. (2017). Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision (pp. 618-626).
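The core computation of Grad-CAM can be sketched as below: each feature map of a chosen convolutional layer is weighted by the spatial average of the gradient of the class score with respect to it, and the weighted sum is passed through a ReLU. The activation and gradient values here are made up; in practice they are read from the network (for example with hooks in a deep learning framework), and the resulting heatmap is upsampled and overlaid on the input image.

```python
def grad_cam(activations, gradients):
    """Combine K feature maps into one heatmap, following the Grad-CAM recipe.

    activations[k][i][j]: feature map k of the chosen convolutional layer.
    gradients[k][i][j]:   gradient of the class score w.r.t. that activation.
    """
    height, width = len(activations[0]), len(activations[0][0])
    # 1. One weight per feature map: the spatial average of its gradients.
    weights = [
        sum(sum(row) for row in grad_map) / (height * width)
        for grad_map in gradients
    ]
    # 2. Weighted sum of the feature maps, then ReLU to keep positive evidence only.
    heatmap = []
    for i in range(height):
        heatmap.append([
            max(0.0, sum(w * act[i][j] for w, act in zip(weights, activations)))
            for j in range(width)
        ])
    return heatmap


# Made-up values for two 2x2 feature maps and their gradients.
activations = [
    [[1.0, 0.0], [0.0, 2.0]],
    [[0.0, 3.0], [1.0, 0.0]],
]
gradients = [
    [[0.5, 0.5], [0.5, 0.5]],      # map 0 supports the class
    [[-1.0, -1.0], [-1.0, -1.0]],  # map 1 counts against it
]
heatmap = grad_cam(activations, gradients)
print(heatmap)  # [[0.5, 0.0], [0.0, 1.0]]
```

Locations where only the negatively weighted map is active are clipped to zero, so the heatmap highlights just the regions that raise the score of the class being explained.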
Using Grad-CAM to interpret the performance of an architectural image
classification model
Chen, J., Stouffs, R., & Biljecki, F. (2021). Hierarchical (multi-label) architectural image recognition and classification. In PROJECTIONS, Proceedings of the 26th International Conference of the Association for Computer-Aided Architectural Design Research in Asia (CAADRIA) 2021 (pp. 161-170).
Hands-on Exercise of Image Classification Model Loading and Evaluation
Assignment 3: Individual work (10% of final grade)
For this assignment, you can choose one of the two following tasks:
1. Choose a target design website suitable for collecting image datasets for training an image
classification model related to your field of design practice (architecture, landscape architecture,
industrial design, etc.), and construct an image dataset using one of the data crawling and
pre-processing approaches introduced in Tutorials 3 and 4. Write a report of at least 500 words (one
or two paragraphs), with screenshots or illustrations of your image dataset construction process.
2. Use each of the two pre-trained architectural image classification models (densenet161 and
resnext101) to classify 15 randomly selected architectural images from the test set of AIDA. Analyse
and compare the classification results of the two models in terms of accuracy, and use the Grad-CAM
visualization tool to interpret the model performance. Write a report based on your analysis. The
report should contain at least 500 words (one or two paragraphs), with comparison charts and
corresponding Grad-CAM visualization results.
Please note that the next assignment will allow you to train your own image classifier model, either
based on the AIDA dataset or the image dataset from task 1 above.
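For task 2, the accuracy comparison itself needs only a few lines of bookkeeping. The sketch below uses placeholder labels for 5 images purely to show the calculation; they are not real model outputs, and the task asks for 15 images.

```python
def accuracy(predictions, ground_truth):
    """Fraction of images whose predicted label matches the ground truth."""
    correct = sum(p == t for p, t in zip(predictions, ground_truth))
    return correct / len(ground_truth)


# Placeholder labels, NOT real results: replace with your own observations.
ground_truth = ["school", "cinema", "houses", "school", "cinema"]
predictions = {
    "densenet161": ["school", "cinema", "houses", "houses", "cinema"],
    "resnext101": ["school", "houses", "houses", "houses", "cinema"],
}

for model_name, preds in predictions.items():
    print(f"{model_name}: {accuracy(preds, ground_truth):.0%}")
```

Keeping the per-image predictions, rather than only the final percentages, makes it easy to pick out the images on which the two models disagree and inspect them with Grad-CAM.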
Assignment assessment criteria (10% of final grade)
Critical thinking is expected: if you choose the second task, we will look at
the breadth and depth of your thinking about the performance of the
classification models in relation to design practice.