Exploratory Project Report
Comparative Analysis of Semantic Segmentation Architectures: U-Net, SegNet and FCN
by
MUKHRAM YADAV
22075049
I, the undersigned student, hereby declare that the project entitled “Comparative
Analysis of Semantic Segmentation Architectures: U-Net, SegNet and
FCN” submitted by me to the Indian Institute of Technology (BHU) Varanasi
during the academic year 2023-24 in fulfillment of the requirements of the Ex-
ploratory Project for the award of Degree of Bachelor of Technology in Computer
Science and Engineering is a record of bonafide project work carried out by me
under the guidance and supervision of Dr. Rajeev Srivastava.
I have worked on the project and followed the ethical guidelines for conducting
research and ensured that my methods and results were accurate and reliable. I
have also maintained a detailed record of my research methodology, data collection,
and analysis procedures, and given due credit to external sources through citations.
I further declare that the work reported in this project has not been submitted
and will not be submitted, either in part or in full, for the award of any other
degree or diploma in this institute or any other University.
Mukhram Yadav
(22075049)
CONTENTS
CERTIFICATE
ACKNOWLEDGEMENT
ABSTRACT
LIST OF FIGURES
Chapter 1: Introduction
1.1 Overview
1.2 Motivation
Chapter 4: Training
4.0.1 Cityscapes dataset
REFERENCES
ACKNOWLEDGEMENT
I would like to express my sincere gratitude to Dr. Rajeev Srivastava and Dr. SK
Singh, Head of the Department of Computer Science and Engineering, for their
guidance, and to Mr. Arun Shahi and Adarsh Kumar for their support.
Mukhram Yadav
ABSTRACT
LIST OF TABLES
CHAPTER 1
INTRODUCTION
1.1 Overview
This overview delves into the realm of semantic segmentation architectures,
focusing on three prominent CNN-based models: U-Net, SegNet and Fully Convolutional
Networks (FCNs). Each architecture offers unique design choices and innovations
tailored to address the challenges of semantic segmentation.
U-Net: U-Net pairs a contracting encoder path with a symmetric expansive decoder
path, using skip connections to recover spatial detail during upsampling.
SegNet: SegNet reuses the max-pooling indices from the encoder to perform
efficient upsampling in the decoder. It is known for its simplicity and
computational efficiency[15].
Fully Convolutional Networks (FCNs): FCNs were among the pioneering archi-
tectures for semantic segmentation, featuring a fully convolutional end-to-end de-
sign. They introduced transposed convolutions for upsampling, enabling dense
pixel-wise predictions[3].
1.2 Motivation
Over the past two decades, machine learning has advanced significantly, from a
laboratory curiosity to a practical tool with widespread commercial application.
Machine learning has become the approach of choice in artificial intelligence (AI)
for creating useful software for computer vision, speech recognition, natural
language processing, robot control, and other applications [9]. The recent surge
of interest in deep learning methods lies in the fact that they have been shown
to outperform previously existing technologies in various tasks, and in their
ability to learn from the abundance of complex data from different sources
(e.g., visual, audio, medical, social, and sensor) [17].
The primary motivation for taking on this project was our desire to venture into
an exciting field of research with immense future capacity. The chance to learn
more about Computer Vision and Image Segmentation, areas we had not been
exposed to earlier, pushed us to work harder[13]. Studying the effects of
implementing various machine learning and deep learning algorithms to solve
real-life problems across domains has helped us delve deeper into the practical
impact of our work and motivated us to improve it.
CHAPTER 2
LITERATURE REVIEW
Ronneberger et al. (2015) introduced U-Net, an encoder-decoder architecture
augmented with skip connections[1]. This design facilitated the precise
localization of objects in medical imaging tasks while mitigating the vanishing
gradient problem. SegNet, introduced by Badrinarayanan et al. (2017), focused
on computational efficiency by leveraging max-pooling indices for upsampling
in the decoder[1].
Comparative studies have played a pivotal role in evaluating the performance and
characteristics of different semantic segmentation architectures[14]. Zhao et al.
(2017) conducted a comprehensive comparative analysis of FCNs, SegNet, and
U-Net on benchmark datasets, highlighting the trade-offs between accuracy and
computational efficiency. Similarly, Ronneberger et al. (2015) compared U-Net
with traditional segmentation methods, showcasing its superior performance in
medical image segmentation tasks.
CHAPTER 3
ARCHITECTURES USED
3.1 U-NET
U-Net is a widely used deep learning architecture that was first introduced in
the “U-Net: Convolutional Networks for Biomedical Image Segmentation” paper.
The primary purpose of this architecture was to address the challenge of limited
annotated data in the medical field. This network was designed to effectively
leverage a smaller amount of data while maintaining speed and accuracy.
The contracting path in U-Net is responsible for identifying the relevant features in
the input image. The encoder layers perform convolutional operations that reduce
the spatial resolution of the feature maps while increasing their depth, thereby
capturing increasingly abstract representations of the input. This contracting path
is similar to the feedforward layers in other convolutional neural networks. On the
other hand, the expansive path works on decoding the encoded data and locating
the features while maintaining the spatial resolution of the input. The decoder
layers in the expansive path upsample the feature maps, while also performing
convolutional operations. The skip connections from the contracting path help
to preserve the spatial information lost in the contracting path, which helps the
decoder layers to locate the features more accurately.
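To make the interplay between upsampling and skip connections concrete, the
following is a minimal sketch of a single expansive-path step in PyTorch
(PyTorch and all layer sizes here are illustrative assumptions, not the
configuration of the original paper):

    import torch
    import torch.nn as nn

    class UpBlock(nn.Module):
        """One U-Net expansive-path step: upsample, concatenate skip, convolve."""
        def __init__(self, in_ch, skip_ch, out_ch):
            super().__init__()
            # Transposed convolution doubles the spatial resolution.
            self.up = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2)
            self.conv = nn.Sequential(
                nn.Conv2d(out_ch + skip_ch, out_ch, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
            )

        def forward(self, x, skip):
            x = self.up(x)
            # Skip connection: concatenating the copied encoder feature map
            # restores spatial detail lost in the contracting path.
            x = torch.cat([skip, x], dim=1)
            return self.conv(x)

    x = torch.randn(1, 256, 32, 32)        # bottleneck features
    skip = torch.randn(1, 128, 64, 64)     # copied encoder features
    out = UpBlock(256, 128, 128)(x, skip)  # shape: (1, 128, 64, 64)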
Figure 3.1: U-Net architecture (example for 32x32 pixels in the lowest
resolution). Each blue box corresponds to a multi-channel feature map; the
number of channels is denoted on top of the box and the x-y size at the lower
left edge. White boxes represent copied feature maps, and the arrows denote
the different operations.
3.2 SegNet
Figure 3.2: SegNet encoder-decoder architecture
Figure 3.2 illustrates the SegNet architecture. There are no fully connected
layers, so the network is fully convolutional. A decoder upsamples its input
using the transferred pool indices
from its encoder to produce a sparse feature map(s). It then performs convolution
with a trainable filter bank to densify the feature map. The final decoder output
feature maps are fed to a soft-max classifier for pixel-wise classification. The
encoder network consists of 13 convolutional layers which correspond to the first 13
convolutional layers in the VGG16 network [1] designed for object classification.
We can therefore initialize the training process from weights trained for classifica-
tion on large datasets [41]. We can also discard the fully connected layers in favour
of retaining higher resolution feature maps at the deepest encoder output. This
also reduces the number of parameters in the SegNet encoder network significantly
(from 134M to 14.7M) as compared to other recent architectures [2], [4]. Each
encoder layer has a corresponding decoder layer and hence the decoder network
has 13 layers. The final decoder output is fed to a multi-class soft-max classifier
to produce class probabilities for each pixel independently.
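Because the encoder matches the first 13 convolutional layers of VGG16,
initializing it from classification weights is straightforward. The following is
a hedged sketch using torchvision (an illustrative assumption; the report's
experiments used a Caffe implementation, and segnet_encoder_convs below is a
hypothetical list of the SegNet encoder's convolutional layers):

    import torch.nn as nn
    from torchvision.models import vgg16

    # vgg16().features holds the 13 convolutional layers (interleaved with
    # ReLU and max-pooling) whose weights can seed the SegNet encoder.
    vgg = vgg16(weights="IMAGENET1K_V1")
    encoder_convs = [m for m in vgg.features if isinstance(m, nn.Conv2d)]
    assert len(encoder_convs) == 13

    # Copying classification weights into a SegNet encoder would look like:
    # for src, dst in zip(encoder_convs, segnet_encoder_convs):
    #     dst.weight.data.copy_(src.weight.data)
    #     dst.bias.data.copy_(src.bias.data)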
3.2.1 Encoder
In the encoder network, each convolutional layer produces feature maps that un-
dergo batch normalization and rectified linear unit (ReLU) activation. Subse-
quently, max-pooling with a 2x2 window and stride 2 is applied for down-sampling,
aiming for translation invariance. However, multiple layers of max-pooling lead
to a loss of spatial resolution, which is detrimental for segmentation tasks requir-
ing precise boundary delineation. To address this, the encoder feature maps are
stored efficiently by retaining only the max-pooling indices, representing the lo-
cations of maximum feature values in each pooling window. This storage method
significantly reduces memory usage compared to storing entire feature maps. Al-
though this approach incurs a slight accuracy loss, it remains suitable for practical
applications with memory constraints.
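In PyTorch terms (an illustrative sketch, not the report's Caffe code), storing
only the pooling indices looks like this:

    import torch
    import torch.nn as nn

    pool = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)

    feat = torch.randn(1, 64, 360, 480)   # an encoder feature map
    pooled, indices = pool(feat)          # both (1, 64, 180, 240)

    # Only `indices` (the location of the maximum in each 2x2 window) must
    # be kept for the decoder; the full feature map itself can be discarded,
    # which is what makes this storage scheme memory efficient.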
3.2.2 Decoder
The appropriate decoder in the decoder network upsamples its input feature
map(s) using the memorized max-pooling indices from the corresponding encoder
feature map(s). This step produces sparse feature map(s). These feature maps
are then convolved with a trainable decoder filter bank to produce dense feature
maps. A batch normalization step is then applied to each of these maps. Note
that the decoder corresponding to the first encoder (closest to the input image)
produces a multi-channel feature map, although its encoder input has 3 channels
(RGB). This is unlike the other decoders in the network which produce feature
maps with the same size and number of channels as their encoder inputs. The
high dimensional feature representation at the output of the final decoder is fed to
a trainable soft-max classifier. This soft-max classifies each pixel independently.
The output of the soft-max classifier is a K channel image of probabilities where
K is the number of classes. The predicted segmentation corresponds to the class
with maximum probability at each pixel.
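A matching decoder step, sketched under the same PyTorch assumption, places the
pooled values back at their memorized locations, densifies the sparse map with
a trainable filter bank, and classifies each pixel with a soft-max:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    pool = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)
    unpool = nn.MaxUnpool2d(kernel_size=2, stride=2)
    densify = nn.Conv2d(64, 64, kernel_size=3, padding=1)  # trainable filter bank

    feat = torch.randn(1, 64, 180, 240)
    pooled, indices = pool(feat)

    sparse = unpool(pooled, indices)  # zeros except at the max locations
    dense = densify(sparse)           # convolution densifies the feature map

    K = 20                                       # number of classes (illustrative)
    classifier = nn.Conv2d(64, K, kernel_size=1)
    probs = F.softmax(classifier(dense), dim=1)  # (1, K, 180, 240) probabilities
    pred = probs.argmax(dim=1)                   # class with maximum probability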
3.3 FCN
Long et al. [7] showed that a fully convolutional network (FCN), trained
end-to-end, pixels-to-pixels on semantic segmentation, exceeds the state of the
art without further machinery. Theirs was the first work to train FCNs
end-to-end (1) for pixelwise prediction and (2) from supervised pre-training.
Fully convolutional
versions of existing networks predict dense outputs from arbitrary-sized inputs.
Both learning and inference are performed whole-image-at-a-time by dense feed-
forward computation and backpropagation. In-network upsampling layers enable
pixelwise prediction and learning in nets with subsampled pooling.
This method is efficient, both asymptotically and absolutely, and precludes the
need for the complications in other works. Patchwise training is common [27, 2,
8, 28, 11], but lacks the efficiency of fully convolutional training. Their approach
does not make use of pre- and post-processing complications, including superpix-
els [8, 16], proposals [16, 14], or post-hoc refinement by random fields or local
classifiers [8, 16]. Their model transfers recent success in classification [19, 31, 32]
to dense prediction by reinterpreting classification nets as fully convolutional and
fine-tuning from their learned representations. In contrast, previous works have
applied small convnets without supervised pre-training [8, 28, 27].
Figure 3.4: FCN structure
Convolution is a process that makes the output size smaller. The name
"deconvolution" is therefore used when we want to upsample and make the output
size larger. (The name is sometimes misinterpreted as the reverse process of
convolution, but it is not.) The operation is also called up-convolution or,
more precisely, transposed convolution.
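A stride-2 transposed convolution, for instance, doubles the spatial size with
learnable weights (a PyTorch sketch; the kernel size and channel counts are
illustrative):

    import torch
    import torch.nn as nn

    # An ordinary stride-2 convolution halves the spatial size...
    down = nn.Conv2d(16, 32, kernel_size=4, stride=2, padding=1)
    # ...while the transposed convolution with the same settings doubles it,
    # using learned weights rather than fixed interpolation.
    up = nn.ConvTranspose2d(32, 16, kernel_size=4, stride=2, padding=1)

    x = torch.randn(1, 16, 64, 64)
    y = down(x)   # (1, 32, 32, 32)
    z = up(y)     # (1, 16, 64, 64)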
Figure 3.5: Upsampling via deconvolution
CHAPTER 4
TRAINING
4.0.1 Cityscapes dataset
The Cityscapes dataset contains annotations for 30 different classes of objects and elements
commonly found in urban environments.
Images are captured across 50 different cities, spanning several months
(spring, summer, fall), primarily during daytime with good to medium weather
conditions. The dataset includes manually selected frames
with varying scene layouts, backgrounds, and a large number of dynamic objects,
contributing to its complexity.
Since the test set does not have annotations, I split the training data into
train and test subsets.
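A minimal sketch of such a split (assuming the 2975 finely annotated Cityscapes
training images; the 80/20 ratio is illustrative but reproduces the 2380/595
split sizes used below):

    import random

    random.seed(0)  # make the split reproducible

    # `samples` stands in for the (image, annotation) file pairs of the
    # annotated Cityscapes training set.
    samples = [(f"img_{i}.png", f"label_{i}.png") for i in range(2975)]
    random.shuffle(samples)

    cut = int(0.8 * len(samples))
    train_samples, test_samples = samples[:cut], samples[cut:]
    print(len(train_samples), len(test_samples))  # 2380 595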
Figure 4.1: Label visualisation
We use the Cityscapes road scenes dataset to benchmark the performance of
the decoder variants. This dataset is small, consisting of 2380 training and 595
testing RGB images (day and dusk scenes) at 360 x 480 resolution. The challenge
is to segment 20 classes such as road, building, cars, pedestrians, signs, poles,
side-walk etc. We perform local contrast normalization [4] to the RGB input. The
encoder and decoder weights were all initialized using the technique described
in He et al. [5]. To train all the variants we use stochastic gradient descent
(SGD) with a fixed learning rate of 0.1 and momentum of 0.9 [12] using our Caffe
implementation of SegNet-Basic [6]. We train the variants until the training loss
converges. Before each epoch, the training set is shuffled and each mini-batch (12
images) is then picked in order thus ensuring that each image is used only once
in an epoch. We select the model which performs highest on a validation dataset.
We use the cross-entropy loss [2] as the objective function for training the network.
The loss is summed up over all the pixels in a mini-batch. When there is large
variation in the number of pixels in each class in the training set (e.g. road, sky
and building pixels dominate the CamVid dataset) then there is a need to weight
the loss differently based on the true class. This is termed class balancing. We
use median frequency balancing [13] where the weight assigned to a class in the
loss function is the ratio of the median of class frequencies computed on the entire
training set divided by the class frequency. This implies that larger classes in the
training set have a weight smaller than 1 and the weights of the smallest classes
are the highest. We also experimented with training the different variants without
class balancing or equivalently using natural frequency balancing.
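A sketch of median frequency balancing feeding a weighted cross-entropy loss
(written in PyTorch for illustration; the report's training used a Caffe
implementation):

    import torch
    import torch.nn as nn

    def median_frequency_weights(label_maps, num_classes):
        """Per-class weight = median class frequency / class frequency."""
        counts = torch.zeros(num_classes)
        for labels in label_maps:  # each an (H, W) tensor of class ids
            counts += torch.bincount(labels.flatten(),
                                     minlength=num_classes).float()
        freq = counts / counts.sum()
        # Dominant classes (road, sky, ...) get weights below 1; the rarest
        # classes get the largest weights.
        return freq.median() / freq

    num_classes = 20
    label_maps = [torch.randint(0, num_classes, (360, 480)) for _ in range(8)]
    weights = median_frequency_weights(label_maps, num_classes)

    # Cross-entropy summed over all pixels in the mini-batch.
    criterion = nn.CrossEntropyLoss(weight=weights, reduction="sum")
    logits = torch.randn(4, num_classes, 360, 480)   # network output
    target = torch.randint(0, num_classes, (4, 360, 480))
    loss = criterion(logits, target)

    # The optimizer described above would be:
    # torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)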
CHAPTER 5
RESULTS
The first table below reports the mIoU for training, validation, and test for
all three architectures. The second table reports the corresponding loss for
training, validation, and test.
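For reference, mIoU can be computed per class from intersections and unions of
the predicted and true masks (a generic sketch, not the exact evaluation code
used for these tables):

    import torch

    def mean_iou(pred, target, num_classes):
        """Mean intersection-over-union over the classes that occur."""
        ious = []
        for c in range(num_classes):
            pred_c, target_c = pred == c, target == c
            union = (pred_c | target_c).sum()
            if union == 0:
                continue  # class absent from prediction and ground truth
            inter = (pred_c & target_c).sum()
            ious.append(inter.float() / union.float())
        return torch.stack(ious).mean()

    pred = torch.randint(0, 20, (360, 480))    # predicted class per pixel
    target = torch.randint(0, 20, (360, 480))  # ground-truth class per pixel
    print(mean_iou(pred, target, num_classes=20))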
Below is a comparison of the computational time and hardware resources required
by the three architectures. The caffe time command was used to compute the time
requirement, averaged over 10 iterations, with mini-batch size 1 and an image
of 360 x 480 resolution. We used the nvidia-smi unix command to measure memory
consumption. For the training memory figures we used a mini-batch of size 4,
and for inference memory the batch size was 1. Model size was the size of the
caffe models on disk. SegNet is the most memory efficient model during inference.
Network   Forward pass (ms)   Backward pass (ms)   GPU training memory (MB)
SegNet    422.76              488.21               6803
U-Net     317.34              394.71               9731
FCN       484.00              470.68               9735
From Table 5.3, we see that bilinear interpolation based upsampling without
any learning performs the worst based on all the measures of accuracy. All the
other methods which either use learning for upsampling (FCN-Basic and vari-
ants) or learning decoder filters after upsampling (SegNet-Basic and its variants)
perform significantly better. This emphasizes the need to learn decoders for seg-
mentation. This is also supported by experimental evidence gathered by other
authors when comparing FCN with SegNet-type decoding techniques [4].
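The contrast is between fixed interpolation, which has no parameters to learn,
and learned upsampling, as in FCN-Basic and its variants (a PyTorch sketch with
illustrative sizes):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    x = torch.randn(1, 64, 45, 60)  # a coarse decoder input

    # Fixed bilinear upsampling: no learnable parameters at all.
    fixed = F.interpolate(x, scale_factor=8, mode="bilinear",
                          align_corners=False)

    # Learned upsampling: a transposed convolution with trainable weights.
    learned_up = nn.ConvTranspose2d(64, 64, kernel_size=16, stride=8, padding=4)
    learned = learned_up(x)

    print(fixed.shape, learned.shape)  # both (1, 64, 360, 480)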
Figure 5.1: Quantitative results
CHAPTER 6
CONCLUSION
Deep learning models have often achieved increasing success due to the
availability of massive datasets and expanding model depth and parameterisation.
However, in practice, memory and computational time during training and testing
are important factors to consider when choosing a model from a large bank of
models. Training time becomes an important consideration particularly when
the performance gain is not commensurate with increased training time as shown
in our experiments[2]. Test time memory and computational load are important
to deploy models on specialised embedded devices, for example, in AR applica-
tions. From an overall efficiency viewpoint, I feel less attention has been paid to
smaller, more memory- and time-efficient models for real-time applications such as
road scene understanding and AR. This was the primary motivation behind the
proposal of SegNet, which is significantly smaller and faster than other competing
architectures, but which we have shown to be efficient for tasks such as road scene
understanding.
There are practical trade-offs involved in designing architectures for
segmentation, particularly training time and memory versus accuracy. Those
architectures which store the encoder network feature maps in
full perform best but consume more memory during inference time[8]. SegNet on
the other hand is more efficient since it only stores the max-pooling indices of the
feature maps and uses them in its decoder network to achieve good performance.
On large and well known datasets SegNet performs competitively, achieving high
scores for road scene understanding. End-to-end learning of deep segmentation
architectures is a harder challenge and we hope to see more attention paid to this
important problem.
For the future, I would like to exploit the understanding of segmentation
architectures gathered from this analysis to design more efficient architectures
for real-time applications. I am also interested in estimating the model
uncertainty for predictions from deep segmentation architectures.
REFERENCES
[2] Meriem Amrane, Saliha Oukid, Ikram Gagaoua, and Tolga Ensari. Breast
cancer classification using machine learning. In 2018 electric electronics, com-
puter science, biomedical engineerings’ meeting (EBBT), pages 1–4. IEEE,
2018.
[3] Prakhar Bansal, Rahul Kumar, and Somesh Kumar. Disease detection in
apple leaves using deep convolutional neural network. Agriculture, 11(7):617,
2021.
[6] H. Noh, S. Hong, and B. Han. Learning deconvolution network for semantic
segmentation. In ICCV, pages 1520–1528, 2015.
[7] J. Long, E. Shelhamer, and T. Darrell. Fully convolutional networks for
semantic segmentation. In CVPR, pages 3431–3440, 2015.
[8] Peng Jiang, Yuehan Chen, Bin Liu, Dongjian He, and Chunquan Liang. Real-
time detection of apple leaf diseases using deep learning approach based on
improved convolutional neural networks. IEEE Access, 7:59069–59080, 2019.
[9] Michael I Jordan and Tom M Mitchell. Machine learning: Trends, perspec-
tives, and prospects. Science, 349(6245):255–260, 2015.
[10] K. Simonyan and A. Zisserman. Very deep convolutional networks for
large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
[12] Muhammad Ramzan, Adnan Abid, Hikmat Ullah Khan, Shahid Mahmood
Awan, Amina Ismail, Muzamil Ahmed, Mahwish Ilyas, and Ahsan Mahmood.
A review on state-of-the-art violence detection techniques. IEEE Access,
7:107560–107575, 2019.
[13] Mubarak Shah, Omar Javed, and Khurram Shafique. Automated visual
surveillance in realistic scenarios. IEEE MultiMedia, 14(1):30–39, 2007.
[14] Wei Song, Dongliang Zhang, Xiaobing Zhao, Jing Yu, Rui Zheng, and Antai
Wang. A novel violent video detection scheme based on modified 3d convo-
lutional neural networks. IEEE Access, 7:39172–39179, 2019.
[15] Ranjita Thapa, Noah Snavely, Serge Belongie, and Awais Khan. The plant
pathology 2020 challenge dataset to classify foliar disease of apples. arXiv
preprint arXiv:2004.11958, 2020.
[16] Fath U Min Ullah, Mohammad S Obaidat, Amin Ullah, Khan Muhammad,
Mohammad Hijji, and Sung Wook Baik. A comprehensive review on vision-
based violence detection in surveillance videos. ACM Computing Surveys,
55(10):1–44, 2023.
[17] Athanasios Voulodimos, Nikolaos Doulamis, Anastasios Doulamis, Eftychios
Protopapadakis, et al. Deep learning for computer vision: A brief review.
Computational intelligence and neuroscience, 2018, 2018.
[18] Anju Yadav, Udit Thakur, Rahul Saxena, Vipin Pal, Vikrant Bhateja, and
Jerry Chun-Wei Lin. Afd-net: Apple foliar disease multi classification using
deep learning on plant pathology dataset. Plant and Soil, 477(1-2):595–611,
2022.