
AI in Computer Vision

Military Artificial Intelligence


Prof. Dr. Eng. Wisnu Jatmiko, S.T., M.Kom.
Dr. Ario Yudo Husodo, S.T., M.T.
Grafika Jati, S.Kom., M.Kom.
© Fasilkom UI - 2023
Discussion Topic

AI in Computer Vision
Presentation Outline
❖ Basic understanding of the Convolutional Neural Network (CNN)
❖ Usage of CNNs in CV
➢ Image Classification
➢ Semantic Segmentation
➢ Object Detection

❖ Practical use of CNN in CV


Basic Knowledge: The Convolution Operation
The Convolution Operation

[Figure: comparison of convolution and cross-correlation]

Image by Cmglee - Own work, CC BY-SA 3.0,
https://commons.wikimedia.org/w/index.php?curid=20206883
Discrete cross-correlation: 2-D example
Can be viewed as a "sliding window" operation: a 3x3 kernel slides over every 3x3 patch of the 5x5 input, and each output value is the sum of the element-wise products of the kernel and the patch it covers.

Input (5x5):

1 0 0 1 2
0 0 0 3 0
0 1 2 1 1
1 1 3 0 0
3 0 0 0 1

Kernel (3x3):

0 0 1
0 2 0
1 1 0

Output (3x3), often called the feature map:

1  4 11
4 11  5
7  7  1
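To make the sliding-window mechanics concrete, here is a minimal NumPy sketch of valid-mode 2-D cross-correlation (the function name and loop structure are illustrative, not from the slides):

import numpy as np

def cross_correlate_2d(image, kernel):
    """Valid-mode 2-D cross-correlation: slide the kernel over every
    patch of the image and sum the element-wise products."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

image = np.array([[1, 0, 0, 1, 2],
                  [0, 0, 0, 3, 0],
                  [0, 1, 2, 1, 1],
                  [1, 1, 3, 0, 0],
                  [3, 0, 0, 0, 1]])
kernel = np.array([[0, 0, 1],
                   [0, 2, 0],
                   [1, 1, 0]])

print(cross_correlate_2d(image, kernel))
# [[ 1.  4. 11.]
#  [ 4. 11.  5.]
#  [ 7.  7.  1.]]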
Problem Definition: ANN vs. CNN (Convolutional Neural Network)
ANN (MLP) versus CNN for Image Analysis

[Figure: an ANN (MLP) architecture with hidden layers, alongside a simple CNN architecture ending in a classification/output layer]
MLP Problems in Image Analysis

• The number of weights rapidly becomes unmanageable for large images: a 224 x 224 pixel image with 3 color channels already gives over 150,000 weights per first-layer neuron.
• MLPs react differently to an input image and its shifted version; spatial information is lost.
• They do not scale well for images.
• They ignore the information carried by pixel position and correlation with neighbors.
• They cannot handle translations.

https://towardsdatascience.com/simple-introduction-to-convolutional-neural-networks-cdf8d3077bac
CNN in Image Analysis

CNNs do:
• Analyze the influence of nearby pixels using a kernel.
• Extract features of the image, called feature maps.
• Exploit the fact that pixel position and neighborhood have semantic meaning.
• Exploit the fact that elements of interest can appear anywhere in the image.

https://towardsdatascience.com/simple-introduction-to-convolutional-neural-networks-cdf8d3077bac
Visualizing convolution activations

[Figure: input image and its feature maps at successive layers; the resolution is lower after pooling, and deeper-layer activations are not as easy to interpret]

Image credit: http://cs231n.github.io/convolutional-networks/
Visualizing learned convolutions

Learned convolutions from the first layer of AlexNet show filters tuned to different orientations, different spatial frequencies, and different colour contrasts.

Krizhevsky et al. Imagenet classification with deep convolutional neural networks. NIPS 2012.
Biological connections

Several principles of ConvNets were inspired by elements of neuroscience models of the primary and secondary visual cortex, including:

• Limited receptive field (at least at early stages)
• Filters tuned to different scales, orientations, spatial frequencies and colour contrasts

Image credit: Wendell. Foundations of Vision: https://foundationsofvision.stanford.edu/
Notable Differences: ANN vs. CNN
• CNNs work in the field of pattern recognition within images: they encode image-specific features directly into the architecture.
• This reduces the complexity required to process image data.
• Example: a 28 x 28 MNIST image fed to an ANN needs 28 x 28 x 1 = 784 weights per first-layer neuron.
• A 64 x 64 RGB image needs 64 x 64 x 3 = 12,288 weights per first-layer neuron.

With 3 channels (the RGB case), the number of weights rapidly becomes unmanageable for large images (see the comparison sketched below).

Why not just increase the number of hidden layers? Because a huge ANN:
• requires enormous computational power and time to train
• will likely overfit (either too general or too over-engineered)
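To make the weight counts concrete, a hedged Keras sketch (the layer sizes of 100 neurons and 32 kernels are illustrative assumptions):

import tensorflow as tf

# MLP: every one of the 64 x 64 x 3 = 12,288 inputs is wired to every neuron.
mlp = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(64, 64, 3)),
    tf.keras.layers.Dense(100, activation="relu"),  # 12,288 weights per neuron
])

# CNN: 32 small 3x3x3 kernels are shared across all spatial positions.
cnn = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, kernel_size=3, activation="relu",
                           input_shape=(64, 64, 3)),
])

print(mlp.count_params())  # (12,288 + 1) * 100 = 1,228,900
print(cnn.count_params())  # (3*3*3 + 1) * 32 = 896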
Sample Architecture
CNN Visualization

https://www.youtube.com/watch?v=f0t-OCG79-U
Usage of CNN in Image Processing
CNNs for Image Classification
Image Classification

• When the system is given an image of a handwritten number (from 0 to 9), can the system classify the image of that number?
• A few things to keep in mind:
• The location/position of the number (localization) is not important in image classification.
• Generally, classes that are not included in the training set are not taken into account in the performance measure.
• Image classification performance can be evaluated using confusion-matrix measurements (e.g., accuracy, precision, recall), F1-score, ROC curves, etc.
LeNet-5

• MNIST images are 28x28 but are zero-padded into 32x32.
• Uses the tanh activation function on every layer (except the classification layer).
• Uses 5x5 convolution kernels and average pooling.
• Uses a Radial Basis Function (RBF) layer as the non-linear function of the classification layer.
• Demo: http://yann.lecun.com/exdb/lenet/

LeCun, Yann, et al. "Gradient-based learning applied to document recognition." Proceedings of the IEEE 86.11 (1998): 2278-2324.
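A hedged Keras sketch of a LeNet-5-style network follows (layer sizes match the 1998 paper; replacing the original RBF output layer with a softmax layer is an assumption made for simplicity):

import tensorflow as tf
from tensorflow.keras import layers

# LeNet-5-style network: 32x32 input, 5x5 convolutions, average pooling,
# tanh activations; a softmax output replaces the original RBF layer.
lenet5 = tf.keras.Sequential([
    layers.Conv2D(6, kernel_size=5, activation="tanh",
                  input_shape=(32, 32, 1)),               # C1: 6 maps, 28x28
    layers.AveragePooling2D(pool_size=2),                 # S2: 14x14
    layers.Conv2D(16, kernel_size=5, activation="tanh"),  # C3: 16 maps, 10x10
    layers.AveragePooling2D(pool_size=2),                 # S4: 5x5
    layers.Conv2D(120, kernel_size=5, activation="tanh"), # C5: 120 maps, 1x1
    layers.Flatten(),
    layers.Dense(84, activation="tanh"),                  # F6
    layers.Dense(10, activation="softmax"),               # output: digits 0-9
])
lenet5.summary()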
AlexNet

• Very similar to LeNet-5, with some differences:
• Bigger and deeper.
• Stacks multiple convolutional layers directly, without pooling layers in between.
• Uses the Rectified Linear Unit (ReLU) as the non-linear activation function.
• Uses several regularization techniques:
• Dropout layer with a 50% dropout rate
• Data augmentation (i.e., random shifting, horizontal flipping, and changing lighting conditions)

Géron, Aurélien. Hands-on machine learning with Scikit-Learn, Keras, and TensorFlow: Concepts, tools, and techniques to build intelligent systems. O'Reilly Media, 2019.
Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep convolutional neural networks." Advances in neural information processing systems. 2012.
GoogLeNet
• Key points of GoogLeNet:
• Bigger and deeper than previous CNN architectures.
• Uses the inception module, which lets GoogLeNet use its parameters more effectively:
• Better performance despite far fewer parameters (6 million) compared to AlexNet (60 million).
• The inception module proposed in the GoogLeNet architecture performs feature extraction using a combination of 1x1, 3x3, and 5x5 filters to cover various patterns in the image.

Géron, Aurélien. Hands-on machine learning with Scikit-Learn, Keras, and TensorFlow: Concepts, tools, and techniques to build intelligent systems. O'Reilly Media, 2019.
Szegedy, Christian, et al. "Going deeper with convolutions." Proceedings of the IEEE conference on computer vision and pattern recognition. 2015.
GoogLeNet's Inception Module

• In GoogLeNet's inception module, a 1x1 kernel is used for dimension reduction.
• Why is a 1x1 kernel important?
• Remember that convolution is cross-channel: it computes over all input channels (RGB for the image) or feature maps (for input from a hidden layer).
• If the number of kernels used is smaller than the number of feature maps in the input, the 1x1 kernel acts as a bottleneck layer that performs dimension reduction.
• The kernel combinations [1x1, 3x3] and [1x1, 5x5] function as cross-channel convolution (for 1x1) and cross-channel + spatial convolution (for 3x3, 5x5). A sketch of such a module follows below.

Szegedy, Christian, et al. "Going deeper with convolutions." Proceedings of the IEEE conference on computer vision and pattern recognition. 2015.
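A minimal Keras sketch of an inception-style module; using the filter counts of GoogLeNet's inception (3a) block here is an assumption for illustration:

import tensorflow as tf
from tensorflow.keras import layers

def inception_module(x, f1x1, f3x3_reduce, f3x3, f5x5_reduce, f5x5, f_pool):
    """Inception-style module: parallel 1x1, [1x1 -> 3x3], [1x1 -> 5x5],
    and [pool -> 1x1] branches, concatenated along the channel axis.
    The 1x1 'reduce' convolutions act as bottlenecks for dimension
    reduction."""
    b1 = layers.Conv2D(f1x1, 1, padding="same", activation="relu")(x)
    b2 = layers.Conv2D(f3x3_reduce, 1, padding="same", activation="relu")(x)
    b2 = layers.Conv2D(f3x3, 3, padding="same", activation="relu")(b2)
    b3 = layers.Conv2D(f5x5_reduce, 1, padding="same", activation="relu")(x)
    b3 = layers.Conv2D(f5x5, 5, padding="same", activation="relu")(b3)
    b4 = layers.MaxPooling2D(3, strides=1, padding="same")(x)
    b4 = layers.Conv2D(f_pool, 1, padding="same", activation="relu")(b4)
    return layers.Concatenate()([b1, b2, b3, b4])

# Example: the filter counts of GoogLeNet's "inception (3a)" block.
inputs = tf.keras.Input(shape=(28, 28, 192))
outputs = inception_module(inputs, 64, 96, 128, 16, 32, 32)
print(tf.keras.Model(inputs, outputs).output_shape)  # (None, 28, 28, 256)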
VGGNet
• Deeper and larger (has more parameters).
• VGGNet was the runner-up of the 2014 ImageNet Large Scale Visual Recognition Challenge (ILSVRC); the winner was GoogLeNet.

Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." arXiv preprint arXiv:1409.1556 (2014).
Residual Network

• The Residual Network (ResNet) proposes a residual unit (RU) that allows us to use very deep CNNs (deepest version: 152 trained layers).
• Within each RU there is a skip connection (or shortcut connection) which makes the signal flow more easily in the forward and backward passes. A sketch of an RU follows below.
• Google's proposed Inception-v4 architecture combines the ideas of GoogLeNet and ResNet.
• Note: at the beginning of training, an RU is an identity function (or almost an identity function), because almost all weights in its convolutional layers start close to 0 (zero). This makes RUs easy and fast to train.

Géron, Aurélien. Hands-on machine learning with Scikit-Learn, Keras, and TensorFlow: Concepts, tools, and techniques to build intelligent systems. O'Reilly Media, 2019.
He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
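A hedged Keras sketch of a basic residual unit (the two-convolution variant; the input/output shapes in the usage lines are illustrative assumptions):

import tensorflow as tf
from tensorflow.keras import layers

def residual_unit(x, filters, stride=1):
    """Basic residual unit: two 3x3 convolutions plus a skip connection.
    When the output shape differs from the input (stride > 1 or a new
    channel count), a 1x1 convolution projects the shortcut to match."""
    shortcut = x
    y = layers.Conv2D(filters, 3, strides=stride, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    if stride != 1 or x.shape[-1] != filters:
        shortcut = layers.Conv2D(filters, 1, strides=stride,
                                 padding="same")(x)
        shortcut = layers.BatchNormalization()(shortcut)
    y = layers.Add()([y, shortcut])  # the skip connection
    return layers.ReLU()(y)

inputs = tf.keras.Input(shape=(56, 56, 64))
outputs = residual_unit(inputs, filters=128, stride=2)
print(tf.keras.Model(inputs, outputs).output_shape)  # (None, 28, 28, 128)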
Extreme Inception (Xception)

• Extreme Inception (Xception) from Google proposes a depth-wise separable convolutional layer to strengthen the inception module.

Regular convolutional layer:
• Kernels with size > 1 perform computations at both the spatial and cross-channel levels.
• A kernel with a size of 1x1 performs only cross-channel computation.

Depth-wise separable convolutional layer:
• Performs the spatial computation for each input channel/feature map separately.

Chollet, François. "Xception: Deep learning with depthwise separable convolutions." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.
Xception: Extreme Inception Module

• In general, the depth-wise separable convolutional layer uses fewer parameters, less memory, and less computation than a regular convolutional layer (see the comparison sketched below).
• Recommended for use in deeper layers (i.e., after multiple convolutional layers).

[Figure: a 3x3 depth-wise convolutional layer followed by a 1x1 convolutional layer (cross-channel only)]

Chollet, François. "Xception: Deep learning with depthwise separable convolutions." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.
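To illustrate the parameter savings, Keras exposes the depth-wise separable convolution as SeparableConv2D; the shapes below are illustrative assumptions:

import tensorflow as tf
from tensorflow.keras import layers

# A per-channel 3x3 spatial convolution followed by a 1x1 cross-channel one,
# compared against a regular 3x3 convolution with the same output channels.
inputs = tf.keras.Input(shape=(32, 32, 64))

regular = layers.Conv2D(128, 3, padding="same")(inputs)
separable = layers.SeparableConv2D(128, 3, padding="same")(inputs)

print(tf.keras.Model(inputs, regular).count_params())
# (3*3*64 + 1) * 128 = 73,856
print(tf.keras.Model(inputs, separable).count_params())
# 3*3*64 + (64 * 128) + 128 = 8,896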
Squeeze-and-Excitation Network (SENet)
• The Squeeze-and-Excitation Network (SENet) is the winner of the ILSVRC 2017 challenge, proposing a squeeze-and-excitation (SE) block.
• The SE block recalibrates the feature maps output by the previous layer by giving them additional scaling weights.
• The scaling values used in SENet are learned by the SE module during the training process and applied during testing.
• The SE block pays no attention to spatial patterns; it focuses on which feature maps are most closely related during training (strongly correlated feature maps).

Géron, Aurélien. Hands-on machine learning with Scikit-Learn, Keras, and TensorFlow: Concepts, tools, and techniques to build intelligent systems. O'Reilly Media, 2019.
Hu, Jie, Li Shen, and Gang Sun. "Squeeze-and-excitation networks." Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.
SENet: Squeeze-and-Excitation Block

• E.g., the input to an SE block is 256 feature maps.
• Squeeze:
• Global average pooling: the output is 256 values of much smaller dimension (usually 1 neuron per feature map).
• Dense connection: encode/embed the data with a fully connected layer (e.g., down to 16 neurons from the initial 256).
• Excitation: decode the embedded data back to the initial number of feature maps (from 16 back to 256 neurons), one scaling weight per feature map. A sketch follows below.

Géron, Aurélien. Hands-on machine learning with Scikit-Learn, Keras, and TensorFlow: Concepts, tools, and techniques to build intelligent systems. O'Reilly Media, 2019.
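A minimal Keras sketch of an SE block for the 256-feature-map example (the reduction ratio of 16 matches the slide's 256 → 16 embedding; the spatial size is an illustrative assumption):

import tensorflow as tf
from tensorflow.keras import layers

def se_block(x, ratio=16):
    """Squeeze-and-excitation block: squeeze each feature map to one number
    (global average pooling), embed through a small dense layer, then decode
    to one sigmoid scaling weight per feature map and rescale the input."""
    channels = x.shape[-1]                             # e.g. 256 feature maps
    s = layers.GlobalAveragePooling2D()(x)             # squeeze: (batch, 256)
    s = layers.Dense(channels // ratio, activation="relu")(s)  # embed: 16
    s = layers.Dense(channels, activation="sigmoid")(s)        # excite: 256
    s = layers.Reshape((1, 1, channels))(s)            # broadcastable scaling
    return layers.Multiply()([x, s])                   # recalibrated maps

inputs = tf.keras.Input(shape=(14, 14, 256))
outputs = se_block(inputs)
print(tf.keras.Model(inputs, outputs).output_shape)  # (None, 14, 14, 256)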
CNNs for Semantic Segmentation
(Semantic) Segmentation

• Image segmentation performance can be evaluated using precision, recall, F1-score (Dice similarity coefficient), ROC curve, etc.

Géron, Aurélien. Hands-on machine learning with Scikit-Learn, Keras, and TensorFlow: Concepts, tools, and techniques to build intelligent systems. O'Reilly Media, 2019.
Fully Convolutional Networks (FCN)
• In CNN architectures for classification tasks, the last layers are usually fully-connected layers.
• In Fully Convolutional Networks (FCN), all layers that have weights to train (trainable layers) are convolutional layers.
• A strategy is needed to restore the spatial resolution that has been reduced by the convolutional layers.
• We can restore the image dimensions in the last layers by using up-sampling or transpose convolutional layers.

Long, Jonathan, Evan Shelhamer, and Trevor Darrell. "Fully convolutional networks for semantic segmentation." Proceedings of the IEEE conference on computer vision and pattern recognition. 2015.
Recovering Spatial Resolution
• Up-sampling can be done by replicating the values in the feature maps, or by interpolating (e.g., bilinear interpolation).
• It can also be done with a transpose convolutional layer, where the feature maps are expanded by inserting 0 (zero) values and then a convolution operation is performed as usual. For more precise results, we can add a skip connection from an earlier layer. All three up-sampling options are sketched below.

Géron, Aurélien. Hands-on machine learning with Scikit-Learn, Keras, and TensorFlow: Concepts, tools, and techniques to build intelligent systems. O'Reilly Media, 2019.
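A short Keras sketch of the three up-sampling options just described (the shapes are illustrative assumptions):

import tensorflow as tf
from tensorflow.keras import layers

# Replication, bilinear interpolation, and a trainable transpose convolution,
# each doubling an 8x8 feature map to 16x16.
inputs = tf.keras.Input(shape=(8, 8, 64))

nearest = layers.UpSampling2D(size=2)(inputs)  # replicate the values
bilinear = layers.UpSampling2D(size=2, interpolation="bilinear")(inputs)
learned = layers.Conv2DTranspose(64, kernel_size=3, strides=2,
                                 padding="same")(inputs)  # trainable

for t in (nearest, bilinear, learned):
    print(t.shape)  # (None, 16, 16, 64) in each case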
Pixel-to-Pixel Segmentation

• The results of (semantic) segmentation from low-resolution feature maps are very imprecise because there is very little spatial information left.
• Therefore, we can combine (spatial) information from shallower feature maps that have more spatial information.
• This merging, usually done using a skip connection, is also very useful for speeding up the training process.

Long, Jonathan, Evan Shelhamer, and Trevor Darrell. "Fully convolutional networks for semantic segmentation." Proceedings of the IEEE conference on computer vision and pattern recognition. 2015.
U-Net

[Figure: the U-shaped architecture, with the encoder on the left, the decoder on the right, skip connections between them, and a bottleneck at the bottom]

• U-Net uses encoder, decoder, bottleneck, and skip-connection strategies simultaneously (a toy sketch follows below).
• It can be said that U-Net performs domain mapping from the source domain (image) to the target domain (semantic segmentation).
• The bottleneck can be considered a latent representation containing the most important features needed to perform semantic segmentation of the original image.
• U-Net is very popular in medical image analysis.

Ronneberger, Olaf, Philipp Fischer, and Thomas Brox. "U-net: Convolutional networks for biomedical image segmentation." International Conference on Medical image computing and computer-assisted intervention. Springer, Cham, 2015.
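A toy Keras sketch of the U-Net idea, with only two levels (the real network is deeper and the original paper uses unpadded convolutions); all shapes here are illustrative assumptions:

import tensorflow as tf
from tensorflow.keras import layers

def conv_block(x, filters):
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return layers.Conv2D(filters, 3, padding="same", activation="relu")(x)

# Encoder, bottleneck, decoder, with skip connections concatenating encoder
# features into the decoder at matching resolutions.
inputs = tf.keras.Input(shape=(128, 128, 1))

e1 = conv_block(inputs, 32)                        # encoder level 1 (128x128)
e2 = conv_block(layers.MaxPooling2D(2)(e1), 64)    # encoder level 2 (64x64)

b = conv_block(layers.MaxPooling2D(2)(e2), 128)    # bottleneck (32x32)

d2 = layers.Conv2DTranspose(64, 2, strides=2)(b)   # decoder level 2 (64x64)
d2 = conv_block(layers.Concatenate()([d2, e2]), 64)    # skip connection
d1 = layers.Conv2DTranspose(32, 2, strides=2)(d2)  # decoder level 1 (128x128)
d1 = conv_block(layers.Concatenate()([d1, e1]), 32)    # skip connection

outputs = layers.Conv2D(1, 1, activation="sigmoid")(d1)  # per-pixel mask
model = tf.keras.Model(inputs, outputs)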
CNNs for Object Detection
Object Detection
• In object detection, we also want information about the location/position of the desired object. This is commonly called localization.
• In practice, localization is done using bounding boxes, where each object has a label in a tuple (image_id, (class_label, bounding_box)).
• The bounding box itself has 4 values: the coordinates of the center of the object (x, y) and the size of the bounding box (height, width).
• Convolutional networks (or other machine learning methods) can be optimized for object detection by regressing the values (x, y, height, width) using the mean squared error (MSE) loss function.
• Meanwhile, object detection performance can be evaluated using Intersection over Union (IoU), sketched below.

Géron, Aurélien. Hands-on machine learning with Scikit-Learn, Keras, and TensorFlow: Concepts, tools, and techniques to build intelligent systems. O'Reilly Media, 2019.
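A small, self-contained sketch of the IoU computation for boxes in the slide's (x, y, height, width) center format (the example boxes are made up for illustration):

def iou(box_a, box_b):
    """Intersection over Union for two boxes in (x, y, height, width)
    format, where (x, y) is the center of the box, as on the slide."""
    def to_corners(box):
        x, y, h, w = box
        return x - w / 2, y - h / 2, x + w / 2, y + h / 2

    ax1, ay1, ax2, ay2 = to_corners(box_a)
    bx1, by1, bx2, by2 = to_corners(box_b)

    # Overlapping region (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    union = ((ax2 - ax1) * (ay2 - ay1)
             + (bx2 - bx1) * (by2 - by1) - intersection)
    return intersection / union if union > 0 else 0.0

print(iou((10, 10, 10, 10), (12, 12, 10, 10)))  # 64 / 136 ≈ 0.47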
Object Detection's Sliding Window
• Object detection is usually done using a sliding window that covers the entire image:
• Divide the image into several parts (example: a 6x8 grid, as shown in the figure).
• Run the CNN with a sliding window (example: the large black 3x3 rectangle) over all parts of the image.
• Draw and save 1 bounding box for each object detected by the CNN (example: the red rectangle).
• Run the non-maximum suppression method to keep 1 bounding box per detected object (a sketch of NMS follows below).
• Objects in the image can have various sizes (small, large). Therefore, the sliding-window pass can be repeated with different window sizes (e.g., 2x2 → 3x3 → 4x4).

Non-maximum suppression: https://towardsdatascience.com/non-maximum-suppression-nms-93ce178e177c
Géron, Aurélien. Hands-on machine learning with Scikit-Learn, Keras, and TensorFlow: Concepts, tools, and techniques to build intelligent systems. O'Reilly Media, 2019.
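A greedy NMS sketch, reusing the iou() helper from the previous example (the scores and boxes are made up for illustration):

def non_max_suppression(detections, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop every remaining box
    that overlaps it by more than iou_threshold, and repeat.
    `detections` is a list of (score, box) pairs; boxes use the same
    (x, y, height, width) center format as the iou() sketch above."""
    detections = sorted(detections, key=lambda d: d[0], reverse=True)
    kept = []
    while detections:
        best = detections.pop(0)
        kept.append(best)
        detections = [d for d in detections
                      if iou(best[1], d[1]) <= iou_threshold]
    return kept

detections = [(0.9, (10, 10, 10, 10)),  # strongest detection of an object
              (0.8, (11, 11, 10, 10)),  # overlaps the first one: suppressed
              (0.7, (40, 40, 10, 10))]  # a separate object: kept
print(non_max_suppression(detections))  # keeps the 0.9 and 0.7 detections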
CNNs for Object Detection: R-CNN Model Family
• R-CNN (2014): Girshick, Ross, et al. "Rich feature hierarchies for accurate object detection and semantic segmentation." Proceedings of the IEEE conference on computer vision and pattern recognition. 2014.
• Uses 3 modules: (1) Region Proposal: selective search (i.e., a non-CNN method for proposing bounding boxes), (2) Feature Extraction: AlexNet (a CNN), and (3) Classifier: SVM (a non-CNN method for object classification).
• Faster R-CNN (2015): Ren, Shaoqing, et al. "Faster R-CNN: Towards real-time object detection with region proposal networks." Advances in neural information processing systems. 2015.
• Uses 2 modules: (1) a Region Proposal Network (a CNN that decides which bounding boxes will be classified) and (2) a Fast R-CNN detector (a CNN for object classification and bounding-box regression; Girshick, Ross. "Fast R-CNN." Proceedings of the IEEE international conference on computer vision. 2015).
CNNs for Object Detection: YOLO & SSD
• You Only Look Once (YOLO) (2016): Redmon, Joseph, et al. "You only look once: Unified, real-time object detection." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
• Has only one CNN for localization (bounding boxes) and object classification.
• Single Shot MultiBox Detector (SSD) (2016): Liu, Wei, et al. "SSD: Single shot multibox detector." European conference on computer vision. Springer, Cham. 2016.
• Like YOLO, SSD has only 1 CNN for object detection.
• Difference between SSD and YOLO: SSD can detect smaller objects and computes faster than YOLO.
• Due to the faster computation, SSD can be used in real-time object detection systems.
YOLO vs. SSD

Redmon, Joseph, et al. "You only look once: Unified, real-time object detection." Proceedings of the IEEE
conference on computer vision and pattern recognition. 2016.
Liu, Wei, et al. “SSD: Single shot multibox detector." European conference on
computer vision. Springer, Cham. 2016.
Other Architectures for Object Detection
• SSD and YOLO have various model variants that are faster and better for real-time systems. For example: SSD300, SSD500, YOLOv2, YOLOv3, YOLO9000.
• Mask R-CNN (2017): object detection and instance segmentation at the same time.
He, Kaiming, et al. "Mask R-CNN." Proceedings of the IEEE international conference on computer vision. 2017.
Practical use of CNN in CV
Computer Vision in Construction

https://www.youtube.com/watch?v=l1LYg9NCKWY
Computer Vision Aerial Surveillance

https://www.youtube.com/watch?v=9tLCFbupeOI
Computer Vision Autonomous Car

https://www.youtube.com/watch?v=HS1wV9NMLr8
Computer Vision in Security

https://www.youtube.com/watch?v=HHXRqCGCRCs
Computer Vision in Military

https://www.youtube.com/watch?v=g0zxmO6qlD8
Summary
• We have studied various CNN architectural models for 3 types of tasks in computer vision, namely image classification, semantic segmentation, and object detection.
• We have seen how the convolutional layer is very easy to implement for a wide variety of architectures and tasks.
• We have also seen how convolution can be modified to meet the needs of a specific task (e.g., inception module, residual unit, squeeze-and-excitation block).
• Understanding the data and the problems and assumptions associated with the task will greatly facilitate the development of new CNN models.
• A lot of working code and examples are available on the internet:
• https://keras.io/examples/vision/
Thank You
