0% found this document useful (0 votes)

15 views11 pages

Bayesian Segnet: Model Uncertainty in Deep Convolutional Encoder-Decoder Architectures For Scene Understanding

The document presents Bayesian SegNet, a deep learning framework for probabilistic pixel-wise semantic segmentation that incorporates model uncertainty through Monte Carlo sampling with dropout at test time. This approach improves segmentation performance by 2-3% across various architectures and is particularly effective for smaller datasets. The framework is benchmarked on the SUN Scene Understanding and CamVid datasets, demonstrating its ability to provide reliable uncertainty estimates in semantic segmentation tasks.

Uploaded by

24mu01

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

15 views11 pages

Bayesian Segnet: Model Uncertainty in Deep Convolutional Encoder-Decoder Architectures For Scene Understanding

Uploaded by

24mu01

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 11

Bayesian SegNet: Model Uncertainty in Deep Convolutional Encoder-Decoder

Architectures for Scene Understanding

Alex Kendall Vijay Badrinarayanan Roberto Cipolla

University of Cambridge
agk34, vb292, rc10001 @cam.ac.uk
arXiv:1511.02680v2 [cs.CV] 10 Oct 2016

Abstract Input Images

We present a deep learning framework for probabilistic

pixel-wise semantic segmentation, which we term Bayesian
SegNet. Semantic segmentation is an important tool for
visual scene understanding and a meaningful measure of
uncertainty is essential for decision making. Our contri-
bution is a practical system which is able to predict pixel- Bayesian SegNet Segmentation Output
wise class labels with a measure of model uncertainty. We
achieve this by Monte Carlo sampling with dropout at test
time to generate a posterior distribution of pixel class la-
bels. In addition, we show that modelling uncertainty im-
proves segmentation performance by 2-3% across a number
of state of the art architectures such as SegNet, FCN and
Dilation Network, with no additional parametrisation. We
also observe a significant improvement in performance for Bayesian SegNet Model Uncertainty Output
smaller datasets where modelling uncertainty is more effec-
tive. We benchmark Bayesian SegNet on the indoor SUN
Scene Understanding and outdoor CamVid driving scenes
datasets.

1. Introduction
Semantic segmentation requires an understanding of an Figure 1: Bayesian SegNet. These examples show the perfor-
image at a pixel level and is an important tool for scene un- mance of Bayesian SegNet on popular segmentation and scene
derstanding. It is a difficult problem as scenes often vary understanding benchmarks: SUN [35] (left), CamVid [4] (cen-
significantly in pose and appearance. However it is an im- ter column) and Pascal VOC [11] (right). The system takes an
portant problem as it can be used to infer scene geometry RGB image as input (top), and outputs a semantic segmentation
and object support relationships. This has wide ranging ap- (middle row) and model uncertainty estimate, averaged across
all classes (bottom row). We observe higher model uncertainty
plications from robotic interaction to autonomous driving.
at object boundaries and with visually difficult objects. An on-
Previous approaches to scene understanding used low line demo and source code can be found on our project webpage
level visual features [32]. We are now seeing the emergence mi.eng.cam.ac.uk/projects/segnet/
of machine learning techniques for this problem [31, 25].
In particular deep learning [25] has set the benchmark on
many popular datasets [11, 8]. However none of these deep can trust the semantic segmentation output is important for
learning methods produce a probabilistic segmentation with decision making. For instance, a system on an autonomous
a measure of model uncertainty. vehicle may segment an object as a pedestrian. But it is de-
Uncertainty should be a natural part of any predictive sirable to know the model uncertainty with respect to other
system’s output. Knowing the confidence with which we classes such as street sign or cyclist as this can have a strong

1
effect on behavioural decisions. Uncertainty is also imme- the output and ensuring label consistency. However none
diately useful for other applications such as active learning of these proposed segmentation methods generate a proba-
[7], semi-supervised learning, or label propagation [1]. bilistic output with a measure of model uncertainty.
The main contribution of this paper is extending deep Neural networks which model uncertainty are known as
convolutional encoder-decoder neural network architec- Bayesian neural networks [9, 26]. They offer a probabilistic
tures [3] to Bayesian convolutional neural networks which interpretation of deep learning models by inferring distribu-
can produce a probabilistic segmentation output [13]. In tions over the networks weights. They are often compu-
Section 4 we propose Bayesian SegNet, a probabilistic deep tationally very expensive, increasing the number of model
convolutional neural network framework for pixel-wise se- parameters without increasing model capacity significantly.
mantic segmentation. We use dropout at test time which Performing inference in Bayesian neural networks is a dif-
allows us to approximate the posterior distribution by sam- ficult task, and approximations to the model posterior are
pling from the Bernoulli distribution across the network’s often used, such as variational inference [14].
weights. This is achieved with no additional parameterisa- On the other hand, the already significant parameteriza-
tion. tion of convolutional network architectures leaves them par-
In Section 5, we demonstrate that Bayesian SegNet sets ticularly susceptible to over-fitting without large amounts of
the best performing benchmark on prominent scene under- training data. A technique known as dropout is commonly
standing datasets, CamVid Road Scenes [4] and SUN RGB- used as a regularizer in convolutional neural networks to
D Indoor Scene Understanding [35]. In particular, we find prevent overfitting and co-adaption of features [36]. During
a larger performance improvement on smaller datasets such training with stochastic gradient descent, dropout randomly
as CamVid where the Bayesian Neural Network is able to removes units within a network. By doing this it samples
cope with the additional uncertainty from a smaller amount from a number of thinned networks with reduced width. At
of data. test time, standard dropout approximates the effect of aver-
Moreover, we show in section 5.4 that this technique is aging the predictions of all these thinnned networks by us-
broadly applicable across a number of state of the art archi- ing the weights of the unthinned network. This is referred
tectures and achieves a 2-3% improvement in segmenation to as weight averaging.
accuracy when applied to SegNet [3], FCN [25] and Dila- Gal and Ghahramani [13] have cast dropout as approx-
tion Network [40]. imate Bayesian inference over the network’s weights. [12]
Finally in Section 5.5 we demonstrate the effectiveness shows that dropout can be used at test time to impose
of model uncertainty. We show this measure can be used to a Bernoulli distribution over the convolutional net filter’s
understand with what confidence we can trust image seg- weights, without requiring any additional model parame-
mentations. We also explore what factors contribute to ters. This is achieved by sampling the network with ran-
Bayesian SegNet making an uncertain prediction. domly dropped out units at test time. We can consider these
as Monte Carlo samples obtained from the posterior dis-
2. Related Work tribution over models. This technique has seen success in
modelling uncertainty for camera relocalisation [19]. Here
Semantic pixel labelling was initially approached with we apply it to pixel-wise semantic segmentation.
TextonBoost [32], TextonForest [30] and Random Forest We note that the probability distribution from Monte
Based Classifiers [31]. We are now seeing the emergence of Carlo sampling is significantly different to the ‘probabili-
deep learning architectures for pixel wise segmentation, fol- ties’ obtained from a softmax classifier. The softmax func-
lowing its success in object recognition for a whole image tion approximates relative probabilities between the class
[21]. Architectures such as SegNet [3] Fully Convolutional labels, but not an overall measure of the model’s uncertainty
Networks (FCN) [25] and Dilation Network [40] have been [13]. Figure 3 illustrates these differences.
proposed, which we refer to as the core segmentation en-
gine. FCN is trained using stochastic gradient descent with 3. SegNet Architecture
a stage-wise training scheme. SegNet was the first archi-
tecture proposed that can be trained end-to-end in one step, We briefly review the SegNet architecture [3] which we
due to its lower parameterisation. modify to produce Bayesian SegNet. SegNet is a deep
We have also seen methods which improve on these core convolutional encoder decoder architecture which consists
segmentation engine architectures by adding post process- of a sequence of non-linear processing layers (encoders)
ing tools. HyperColumn [16] and DeConvNet [27] use and a corresponding set of decoders followed by a pixel-
region proposals to bootstrap their core segmentation en- wise classifier. Typically, each encoder consists of one
gine. DeepLab [6] post-processes with conditional random or more convolutional layers with batch normalisation and
fields (CRFs) and CRF-RNN [42] use recurrent neural net- a ReLU non-linearity, followed by non-overlapping max-
works. These methods improve performance by smoothing pooling and sub-sampling. The sparse encoding due to the
Convolutional Encoder-Decoder Stochastic Dropout Segmentation
Input
Samples mean

Model Uncertainty
variance

RGB Image Conv + Batch Normalisation + ReLU

Dropout Pooling/Upsampling Softmax

Figure 2: A schematic of the Bayesian SegNet architecture. This diagram shows the entire pipeline for the system which is trained
end-to-end in one step with stochastic gradient descent. The encoders are based on the 13 convolutional layers of the VGG-16 network
[34], with the decoder placing them in reverse. The probabilistic output is obtained from Monte Carlo samples of the model with dropout
at test time. We take the variance of these softmax samples as the model uncertainty for each class.

pooling process is upsampled in the decoder using the max- the Kullback-Leibler (KL) divergence between this approx-
pooling indices in the encoding sequence. This has the im- imating distribution and the full posterior;
portant advantage of retaining class boundary details in the
segmented images and also reducing the total number of KL(q(W) || p(W | X, Y)). (2)
model parameters. The model is trained end to end using
stochastic gradient descent. Here, the approximating variational distribution q(Wi ) for
We take both SegNet [3] and a smaller variant termed every K × K dimensional convolutional layer i, with units
SegNet-Basic [2] as our base models. SegNet’s encoder is j, is defined as:
based on the 13 convolutional layers of the VGG-16 net-
work [34] followed by 13 corresponding decoders. SegNet- bi,j ∼ Bernoulli(pi ) for j = 1, ..., Ki ,
(3)
Basic is a much smaller network with only four layers each Wi = Mi diag(bi ),
for the encoder and decoder with a constant feature size of
64. We use SegNet-Basic as a smaller model for our analy- with bi vectors of Bernoulli distributed random variables
sis since it conceptually mimics the larger architecture. and variational parameters Mi we obtain the approximate
model of the Gaussian process in [12]. The dropout proba-
4. Bayesian SegNet bilities, pi , could be optimised. However we fix them to the
standard probability of dropping a connection as 50%, i.e.
The technique we use to form a probabilistic encoder-
pi = 0.5 [36].
decoder architecture is dropout [36], which we use as ap-
In [12] it was shown that minimising the cross entropy
proximate inference in a Bayesian neural network [12]. We
loss objective function has the effect of minimising the
can therefore consider using dropout as a way of getting
Kullback-Leibler divergence term. Therefore training the
samples from the posterior distribution of models. Gal and
network with stochastic gradient descent will encourage the
Ghahramani [12] link this technique to variational inference
model to learn a distribution of weights which explains the
in Bayesian convolutional neural networks with Bernoulli
data well while preventing over-fitting.
distributions over the network’s weights. We leverage this
method to perform probabilistic inference over our segmen- We train the model with dropout and sample the poste-
tation model, giving rise to Bayesian SegNet. rior distribution over the weights at test time using dropout
For Bayesian SegNet we are interested in finding the pos- to obtain the posterior distribution of softmax class prob-
terior distribution over the convolutional weights, W, given abilities. We take the mean of these samples for our seg-
our observed training data X and labels Y. mentation prediction and use the variance to output model
uncertainty for each class. We take the mean of the per class
p(W | X, Y) (1) variance measurements as an overall measure of model un-
certainty. We also explored using the variation ratio as
In general, this posterior distribution is not tractable, there- a measure of uncertainty (i.e. the percentage of samples
fore we need to approximate the distribution of these which agree with the class prediction) however we found
weights [9]. Here we use variational inference to approx- this to qualitatively produce a more binary measure of
imate it [14]. This technique allows us to learn the distri- model uncertainty. Fig. 2 shows a schematic of the segmen-
bution over the network’s weights, q(W), by minimising tation prediction and model uncertainty estimate process.
(a) Input Image (b) Semantic Segmentation (c) Softmax Uncertainty (d) Dropout Uncertainty (e) Dropout Uncertainty
Car Class Car Class All Classes

Figure 3: Comparison of uncertainty with Monte Carlo dropout and uncertainty from softmax regression (c-e: darker colour
represents larger value). This figure shows that softmax regression is only capable of inferring relative probabilities between classes. In
contrast, dropout uncertainty can produce an estimate of absolute model uncertainty.

Weight Monte Carlo Training

Averaging Sampling Fit
variant we insert dropout after the central four encoder
Probabilistic Variants G C I/U G C I/U G C I/U and decoder units.
No Dropout 82.9 62.4 46.4 n/a n/a n/a 94.7 96.2 92.7 • Bayesian Classifier. In this variant we insert dropout
Dropout Encoder 80.6 68.9 53.4 81.6 69.4 54.0 90.6 92.5 86.3 after the last decoder unit, before the classifier.
Dropout Decoder 82.4 64.5 48.8 82.6 62.4 46.1 94.6 96.0 92.4
Dropout Enc-Dec 79.9 69.0 54.2 79.8 68.8 54.0 88.9 89.0 80.6
Dropout Central Enc-Dec 81.1 70.6 55.7 81.6 70.6 55.8 90.4 92.3 85.9 For analysis we use the smaller eight layer SegNet-Basic
Dropout Center 82.9 68.9 53.1 82.7 68.9 53.2 93.3 95.4 91.2 architecture [3] and test these Bayesian variants on the
Dropout Classifier 84.2 62.6 46.9 84.2 62.6 46.8 94.9 96.0 92.3
CamVid dataset [4]. We observe qualitatively that all four
Table 1: Architecture Variants for SegNet-Basic on the variants produce similar looking model uncertainty output.
CamVid dataset [4]. We compare the performance of weight That is, they are uncertain near the border of segmentations
averaging against 50 Monte Carlo samples. We quantify perfor- and with visually ambiguous objects, such as cyclist and
mance with three metrics; global accuracy (G), class average ac- pedestrian classes. However, Table 1 shows a difference in
curacy (C) and intersection over union (I/U). Results are shown as quantitative segmentation performance.
percentages (%). We observe that dropping out every encoder and We observe using dropout after all the encoder and de-
decoder is too strong a regulariser and results in a lower training coder units results in a lower training fit and poorer test per-
fit. The optimal result across all classes is when only the central formance as it is too strong a regulariser on the model. We
encoder and decoders are dropped out.
find that dropping out half of the encoder or decoder units
is the optimal configuration. The best configuration is drop-
ping out the deepest half of the encoder and decoder units.
4.1. Probabilistic Variants
We therefore benchmark our Bayesian SegNet results on
A fully Bayesian network should be trained with dropout the Central Enc-Dec variant. For the full 26 layer Bayesian
after every convolutional layer. However we found in prac- SegNet, we add dropout to the central six encoders and de-
tice that this was too strong a regulariser, causing the net- coders. This is illustrated in Fig. 2.
work to learn very slowly. We therefore explored a number In the lower layers of convolutional networks basic fea-
of variants that have different configurations of Bayesian tures are extracted, such as edges and corners [41]. These
or deterministic encoder and decoder units. We note that results show that applying Bayesian weights to these lay-
an encoder unit contains one or more convolutional layers ers does not result in a better performance. We believe
followed by a max pooling layer. A decoder unit contains this is because these low level features are consistent across
one or more convolutional layers followed by an upsam- the distribution of models because they are better modelled
pling layer. The variants are as follows: with deterministic weights. However, the higher level fea-
tures that are formed in the deeper layers, such as shape and
• Bayesian Encoder. In this variant we insert dropout contextual relationships, are more effectively modelled with
after each encoder unit. Bayesian weights.
• Bayesian Decoder. In this variant we insert dropout
4.2. Comparing Weight Averaging and Monte Carlo
after each decoder unit.
Dropout Sampling
• Bayesian Encoder-Decoder. In this variant we insert
dropout after each encoder and decoder unit. Monte Carlo dropout sampling qualitatively allows us to
• Bayesian Center. In this variant we insert dropout af- understand the model uncertainty of the result. However,
ter the deepest encoder, between the encoder and de- for segmentation, we also want to understand the quanti-
coder stage. tative difference between sampling with dropout and using
• Bayesian Central Four Encoder-Decoder. In this the weight averaging technique proposed by [36]. Weight
Column-Pole
Sign-Symbol

Global avg.
Pedestrian

Class avg.
Side-walk

Mean I/U
Bicyclist
Building

Fence
Road
Tree

Sky

Car
Method
SfM+Appearance [5] 46.2 61.9 89.7 68.6 42.9
89.5 53.6 46.6 0.7 60.5 22.5 53.0 69.1 n/a
Boosting [37] 61.9 67.3 91.1 71.1 58.5
92.9 49.5 37.6 25.8 77.8 24.7 59.8 76.4 n/a
Structured Random Forests [20] n/a 51.4 72.5 n/a
Neural Decision Forests [29] n/a 56.1 82.1 n/a
Local Label Descriptors [39] 80.7 61.5 88.8 16.4 n/a 98.0 1.09 0.05 4.13 12.4 0.07 36.3 73.6 n/a
Super Parsing [38] 87.0 67.1 96.9 62.7 30.1 95.9 14.7 17.9 1.7 70.0 19.4 51.2 83.3 n/a
Boosting+Detectors+CRF [22] 81.5 76.6 96.2 78.7 40.2 93.9 43.0 47.6 14.3 81.5 33.9 62.5 83.8 n/a
SegNet-Basic (layer-wise training [2]) 75.0 84.6 91.2 82.7 36.9 93.3 55.0 37.5 44.8 74.1 16.0 62.9 84.3 n/a
SegNet-Basic [3] 80.6 72.0 93.0 78.5 21.0 94.0 62.5 31.4 36.6 74.0 42.5 62.3 82.8 46.3
SegNet [3] 88.0 87.3 92.3 80.0 29.5 97.6 57.2 49.4 27.8 84.8 30.7 65.9 88.6 50.2
FCN 8 [25] n/a 64.2 83.1 52.0
DeconvNet [27] n/a 62.1 85.9 48.9
DeepLab-LargeFOV-DenseCRF [6] n/a 60.7 89.7 54.7
Bayesian SegNet Models in this work:
Bayesian SegNet-Basic 75.1 68.8 91.4 77.7 52.0 92.5 71.5 44.9 52.9 79.1 69.6 70.5 81.6 55.8
Bayesian SegNet 80.4 85.5 90.1 86.4 67.9 93.8 73.8 64.5 50.8 91.7 54.6 76.3 86.9 63.1

Table 2: Quantitative results on CamVid [4] consisting of 11 road scene categories. Bayesian SegNet outperforms all other methods,
including shallow methods which utilise depth, video and/or CRF’s, and more contemporary deep methods. Particularly noteworthy are
the significant improvements in accuracy for the smaller/thinner classes.

82 87.5 computed in parallel on a GPU this cost can be reduced for

practical applications.
Global Accuracy (%)

Global Accuracy (%)

81.5
87

81
86.5
80.5 4.3. Training and Inference
86
80

85.5
Following [3] we train SegNet with median frequency
79.5
Monte Carlo Dropout Sampling
Weight Averaging
Monte Carlo Dropout Sampling
Weight Averaging
class balancing using the formula proposed by Eigen and
79 85
0 10 20 30 40
Number of Samples
50 60 0 10 20 30
Number of Samples
40 50 60
Fergus [10]. We use batch normalisation layers after ev-
ery convolutional layer [17]. We compute batch normali-
(a) SegNet Basic (b) SegNet
sation statistics across the training dataset and use these at
test time. We experimented with computing these statistics
Figure 4: Global segmentation accuracy against number of while using dropout sampling. However we experimentally
Monte Carlo samples for both SegNet and SegNet-Basic. Re- found that computing them with weight averaging produced
sults averaged over 5 trials, with two standard deviation error bars, better results.
are shown for the CamVid dataset. This shows that Monte Carlo We implement Bayesian SegNet using the Caffe library
sampling outperforms the weight averaging technique after ap- [18] and release the source code and trained models for pub-
proximately 6 samples. Monte Carlo sampling converges after lic evaluation 1 . We train the whole system end-to-end us-
around 40 samples with no further significant improvement be-
ing stochastic gradient descent with a base learning rate of
yond this point.
0.001 and weight decay parameter equal to 0.0005. We train
the network until convergence when we observe no further
reduction in training loss.
averaging proposes to remove dropout at test time and scale
the weights proportionally to the dropout percentage. Fig. 5. Experiments
4 shows that Monte Carlo sampling with dropout performs
better than weight averaging after approximately 6 samples. We quantify the performance of Bayesian SegNet on
We also observe no additional performance improvement three different benchmarks using our Caffe implementa-
beyond approximately 40 samples. Therefore the weight tion. Through this process we demonstrate the efficacy of
averaging technique produces poorer segmentation results, Bayesian SegNet for a wide variety of scene segmentation
in terms of global accuracy, in addition to being unable to tasks which have practical applications. CamVid [4] is a
provide a measure of model uncertainty. However, sam- 1 An online demo and source code can be found on our project webpage

pling comes at the expense of inference time, but when mi.eng.cam.ac.uk/projects/segnet/

Method G C I/U Method G C I/U
RGB RGB
Liu et al. [24] n/a 9.3 n/a FCN-32s RGB [25] 60.0 42.2 29.2
FCN 8 [25] 68.2 38.4 27.4 SegNet [3] 66.1 36.0 23.6
DeconvNet [27] 66.1 32.3 22.6 Bayesian SegNet (this work) 68.0 45.8 32.4
DeepLab-LargeFOV-DenseCRF [6] 67.0 33.0 24.1 RGB-D
SegNet [3] 70.3 35.6 22.1 Gupta et al. [15] 60.3 - 28.6
Bayesian SegNet (this work) 71.2 45.9 30.7 FCN-32s RGB-D [25] 61.5 42.4 30.5
RGB-D Eigen et al. [10] 65.6 45.1 -
Liu et al. [24] n/a 10.0 n/a RGB-HHA
Ren et. al [28] n/a 36.3 n/a FCN-16s RGB-HHA [25] 65.4 46.1 34.0

Table 3: SUN Indoor Scene Understanding. Quantitative com- Table 4: NYU v2. Results for the NYUv2 RGB-D dataset [33]
parison on the SUN RGB-D dataset [35] which consists of 5050 which consists of 654 test images. Bayesian SegNet is the top
test images of indoor scenes with 37 classes. SegNet RGB based performing RGB method.
predictions have a high global accuracy and out-perform all previ-
ous benchmarks, including those which use depth modality. Parameters Pascal VOC Test IoU
Method (Millions) Non-Bayesian Bayesian
Dilation Network [40] 140.8 71.3 73.1
road scene understanding dataset which has applications for FCN-8 [25] 134.5 62.2 65.4
autonomous driving. SUN RGB-D [35] is a very challeng- SegNet [3] 29.45 59.1 60.5
ing and large dataset of indoor scenes which is important
for domestic robotics. Finally, Pascal VOC 2012 [11] is a Table 5: Pascal VOC12 [11] test results evaluated from the online
RGB dataset for object segmentation. evaluation server. We compare to competing deep learning archi-
tectures. Bayesian SegNet is considerably smaller but achieves a
5.1. CamVid competitive accuracy to other methods. We also evaluate FCN
CamVid is a road scene understanding dataset with 367 [25] and Dilation Network (front end) [40] with Monte Carlo
training images and 233 testing images of day and dusk dropout sampling. We observe an improvement in segmentation
performance across all three deep learning models when using
scenes [4]. The challenge is to segment 11 classes such as
the Bayesian approach. This demonstrates this method’s appli-
road, building, cars, pedestrians, signs, poles, side-walk etc.
cability in general. Additional results available on the leaderboard
We resize images to 360x480 pixels for training and testing host.robots.ox.ac.uk:8080/leaderboard
of our system.
Table 2 shows our results and compares them to previous
benchmarks. We compare to methods which utilise depth Using the depth modality would necessitate architectural
and motion cues. Additionally we compare to other promi- modifications and careful post-processing to fill-in missing
nent deep learning architectures. Bayesian SegNet obtains depth measurements. This is beyond the scope of this paper.
the highest overall class average and mean intersection over Table 3 shows our results on this dataset compared to
union score by a significant margin. We set a new bench- other methods. Bayesian SegNet outperforms all previous
mark on 7 out of the 11 classes. Qualitative results can be benchmarks, including those which use depth modality. We
viewed in Fig. 5. also note that an earlier benchmark dataset, NYUv2 [33],
is included as part of this dataset, and Table 4 shows our
5.2. Scene Understanding (SUN) evaluation on this subset. Qualitative results can be viewed
SUN RGB-D [35] is a very challenging and large dataset in Fig. 6.
of indoor scenes with 5285 training and 5050 testing im-
5.3. Pascal VOC
ages. The images are captured by different sensors and
hence come in various resolutions. The task is to segment The Pascal VOC12 segmentation challenge [11] consists
37 indoor scene classes including wall, floor, ceiling, table, of segmenting a 20 salient object classes from a widely
chair, sofa etc. This task is difficult because object classes varying background class. For our model, we resize the
come in various shapes, sizes and in different poses with input images for training and testing to 224x224 pixels. We
frequent partial occlusions. These factors make this one of train on the 12031 training images and 1456 testing images,
the hardest segmentation challenges. For our model, we re- with scores computed remotely on a test server. Table 5
size the input images for training and testing to 224x224 shows our results compared to other methods, with qualita-
pixels. Note that we only use RGB input to our system. tive results in Fig. 9.
Figure 5: Bayesian SegNet results on CamVid road scene understanding dataset [4]. The top row is the input image, with the ground
truth shown in the second row. The third row shows Bayesian SegNet’s segmentation prediction, with overall model uncertainty, averaged
across all classes, in the bottom row (with darker colours indicating more uncertain predictions). In general, we observe high quality
segmentation, especially on more difficult classes such as poles, people and cyclists. Where SegNet produces an incorrect class label we
often observe a high model uncertainty.

Figure 6: Bayesian SegNet results on the SUN RGB-D indoor scene understanding dataset [35]. The top row is the input image, with
the ground truth shown in the second row. The third row shows Bayesian SegNet’s segmentation prediction, with overall model uncertainty,
averaged across all classes, in the bottom row (with darker colours indicating more uncertain predictions). Bayesian SegNet uses only RGB
input and is able to accurately segment 37 classes in this challenging dataset. Note that often parts of an image do not have ground truth
labels and these are shown in black colour.
Percentile Pixel-Wise Classification Accuracy
Confidence CamVid SUN RGBD
90 99.7 97.6
50 98.5 92.3
10 89.5 79.0
0 86.7 75.4

Table 6: Bayesian SegNets accuracy as a function of confidence

for the 90th percentile (10% most confident pixels) through to the
0th percentile (all pixels). This shows uncertainty is an effective
measure of prediction accuracy.

Figure 7: Bayesian SegNet performance compared to mean

model uncertainty for each class in CamVid road scene un-
This dataset is unlike the segmentation for scene under- derstanding dataset. This figure shows that there is a strong in-
standing benchmarks described earlier which require learn- verse relationship between class accuracy and model uncertainty.
ing both classes and their spatial context. A number of tech- It shows that the classes that Bayesian SegNet performs better at,
niques have been proposed based on this challenge which such as Sky and Road, it is also more confident at. Conversely,
are increasingly more accurate and complex 2 . Our efforts for the more difficult classes such as Sign Symbol and Bicyclist,
in this benchmarking experiment have not been diverted Bayesian SegNet has a much higher model uncertainty.
towards attaining the top rank by either using multi-stage
training [25], other datasets for pre-training such as MS-
COCO [23], training and inference aids such as object pro- 5.5. Understanding Model Uncertainty
posals [27] or post-processing using CRF based methods Qualitative observations. Fig. 5 shows segmentations
[6, 42]. Although these supporting techniques clearly have and model uncertainty results from Bayesian SegNet on
value towards increasing the performance it unfortunately CamVid Road Scenes [4]. Fig. 6 shows SUN RGB-D In-
does not reveal the true performance of the deep architec- door Scene Understanding [35] results and Fig. 9 has Pascal
ture which is the core segmentation engine. It however does VOC [11] results. These figures show the qualitative perfor-
indicate that some of the large deep networks are difficult to mance of Bayesian SegNet. We observe that segmentation
train end-to-end on this task even with pre-trained encoder predictions are smooth, with a sharp segmentation around
weights. Therefore, to encourage more controlled bench- object boundaries. These results also show that when the
marking, we trained Bayesian SegNet end-to-end without model predicts an incorrect label, the model uncertainty is
other aids and report this performance. generally very high. More generally, we observe that a high
model uncertainty is predominantly caused by three situa-
tions.
5.4. General Applicability
Firstly, at class boundaries the model often displays a
To demonstrate the general applicability of this method, high level of uncertainty. This reflects the ambiguity sur-
we also apply it to other deep learning architectures trained rounding the definition of defining where these labels tran-
with dropout; FCN [25] and Dilation Network [40]. We sition. The Pascal results clearly illustrated this in Fig. 9.
select these state-of-the-art methods as they are already Secondly, objects which are visually difficult to identify
trained by their respective authors using dropout. We take often appear uncertain to the model. This is often the case
their trained open source models off the shelf, and evalu- when objects are occluded or at a distance from the camera.
ate them using 50 Monte Carlo dropout samples. Table 5 The third situation causing model uncertainty is when
shows the mean IoU result of these methods evaluated as the object appears visually ambiguous to the model. As an
Bayesian Neural Networks, as computed by the online eval- example, cyclists in the CamVid results (Fig. 5) are visually
uation server. similar to pedestrians, and the model often displays uncer-
tainty around them. We observe similar results with visually
This shows the general applicability of our method. By
similar classes in SUN (Fig. 6) such as chair and sofa, or
leveraging this underlying Bayesian framework our method
bench and table. In Pascal this is often observed between
obtains 2-3% improvement across this range of architec-
cat and dog, or train and bus classes.
tures.
Quantitative observations. To understand what causes
the model to be uncertain, we have plotted the relationship
2 See the full leader board at http://host.robots.ox.ac.uk: between uncertainty and accuracy in Fig. 7 and between
8080/leaderboard uncertainty and the frequency of each class in the dataset in
Figure 9: Bayesian SegNet results on the Pascal VOC 2012 dataset [11]. The top row is the input image. The middle row shows
Bayesian SegNet’s segmentation prediction, with overall model uncertainty averaged across all classes in the bottom row (darker colours
indicating more uncertain predictions). Ground truth is not publicly available for these test images.

Shower curtain

Whiteboard

Night stand
Bookshelf

Floor Mat
Window

Counter

Bathtub
Cabinet

Shelves

Clothes
Dresser
Curtain

Ceiling
Picture

Person
Mirror
Blinds

Fridge
Pillow

Books

Towel

Lamp
Toilet
Paper
Chair

Table
Floor

Door

Desk
Wall

Sink
Sofa

Box
Bed

Bag
TV
86.6
92.0
52.4
68.4
76.0
54.3
59.3
37.4
53.8
29.2
49.7
32.5
31.2
17.8

53.2
28.8
36.5
29.6

14.4
67.7
32.4
10.2
18.3
19.2
11.5

38.7

22.6
55.6
52.7
27.9
29.9
5.3

0.0

0.0
8.9

4.9

8.1
SegNet [3]
80.2
90.9
58.9
64.8
76.0
58.6
62.6
47.7
66.4
31.2
63.6
33.8
46.7
19.7
16.2
67.0
42.3
57.1
39.1

24.4
84.0
48.7
21.3
49.5
30.6
18.8

24.1
56.8
17.9
42.9
73.0
66.2
48.8
45.1
24.1
0.1

0.1
Bayesian SegNet

Table 7: Class accuracy of Bayesian SegNet predictions for the 37 indoor scene classes in the SUN RGB-D benchmark dataset [35].

Fig. 8. Uncertainty is calculated as the mean uncertainty and Bayesian SegNet with 10 Monte Carlo samples at 90ms
value for each pixel of that class in a test dataset. We ob- per frame on Titan X GPU. However inference time will de-
serve an inverse relationship between uncertainty and class pend on the implementation.
accuracy or class frequency. This shows that the model is
more confident about classes which are easier or occur more
6. Conclusions
often, and less certain about rare and challenging classes.
Additionally, Table 6 shows segmentation accuracies for We have presented Bayesian SegNet, the first probabilis-
varying levels of confidence. We observe very high levels tic framework for semantic segmentation using deep learn-
of accuracy for values of model uncertainty above the 90th ing, which outputs a measure of model uncertainty for each
percentile across each dataset. This demonstrates that the class. We show that the model is uncertain at object bound-
model’s uncertainty is an effective measure of confidence aries and with difficult and visually ambiguous objects. We
in prediction. quantitatively show Bayesian SegNet produces a reliable
measure of model uncertainty and is very effective when
5.6. Real Time Performance
modelling smaller datasets. Bayesian SegNet outperforms
Table 5 shows that SegNet and Bayesian SegNet main- shallow architectures which use motion and depth cues, and
tains a far lower parameterisation than its competitors. other deep architectures. We obtain the highest perform-
Monte Carlo sampling requires additional inference time, ing result on CamVid road scenes and SUN RGB-D indoor
however if model uncertainty is not required, then the scene understanding datasets. We show that the segmenta-
weight averaging technique can be used to remove the need tion model can be run in real time on a GPU. For future
for sampling (Fig. 4 shows the performance drop is mod- work we intend to explore how video data can improve our
est). Our implementation can run SegNet at 35ms per frame model’s scene understanding performance.
[13] Y. Gal and Z. Ghahramani. Dropout as a Bayesian approximation:
Representing model uncertainty in deep learning. arXiv:1506.02142,
2015. 2
[14] A. Graves. Practical variational inference for neural networks. In
Advances in Neural Information Processing Systems, pages 2348–
2356, 2011. 2, 3
[15] S. Gupta, R. Girshick, P. Arbeláez, and J. Malik. Learning rich fea-
tures from rgb-d images for object detection and segmentation. In
Computer Vision–ECCV 2014, pages 345–360. Springer, 2014. 6
[16] B. Hariharan, P. Arbeláez, R. Girshick, and J. Malik. Hypercolumns
for object segmentation and fine-grained localization. arXiv preprint
arXiv:1411.5752, 2014. 2
[17] S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep
network training by reducing internal covariate shift. arXiv preprint
arXiv:1502.03167, 2015. 5
Figure 8: Bayesian SegNet class frequency compared to mean [18] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick,
model uncertainty for each class in CamVid road scene un- S. Guadarrama, and T. Darrell. Caffe: Convolutional architecture for
derstanding dataset. This figure shows that there is a strong in- fast feature embedding. arXiv preprint arXiv:1408.5093, 2014. 5
verse relationship between model uncertainty and the frequency at [19] A. Kendall and R. Cipolla. Modelling uncertainty in deep learning
which a class label appears in the dataset. It shows that the classes for camera relocalization. arXiv preprint arXiv:1509.05909, 2015. 2
that Bayesian SegNet is more confident at are more prevalent in [20] P. Kontschieder, S. Rota Buló, H. Bischof, and M. Pelillo. Structured
class-labels in random forests for semantic image labelling. In Com-
the dataset. Conversely, for the more rare classes such as Sign
puter Vision (ICCV), 2011 IEEE International Conference on, pages
Symbol and Bicyclist, Bayesian SegNet has a much higher model 2190–2197. IEEE, 2011. 5
uncertainty. [21] A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classifica-
tion with deep convolutional neural networks. In Advances in neural
information processing systems, pages 1097–1105, 2012. 2
References [22] L. Ladickỳ, P. Sturgess, K. Alahari, C. Russell, and P. H. Torr. What,
where and how many? combining object detectors and crfs. In Com-
[1] V. Badrinarayanan, F. Galasso, and R. Cipolla. Label propagation puter Vision–ECCV 2010, pages 424–437. Springer, 2010. 5
in video sequences. In Computer Vision and Pattern Recognition [23] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan,
(CVPR), 2010 IEEE Conference on, pages 3265–3272. IEEE, 2010. P. Dollár, and C. L. Zitnick. Microsoft coco: Common objects in
2 context. In Computer Vision–ECCV 2014, pages 740–755. Springer,
[2] V. Badrinarayanan, A. Handa, and R. Cipolla. Segnet: A deep convo- 2014. 8
lutional encoder-decoder architecture for robust semantic pixel-wise [24] C. Liu, J. Yuen, A. Torralba, J. Sivic, and W. T. Freeman. Sift flow:
labelling. arXiv preprint arXiv:1505.07293, 2015. 3, 5 Dense correspondence across different scenes. In Computer Vision–
[3] V. Badrinarayanan, A. Kendall, and R. Cipolla. Segnet: A deep ECCV 2008, pages 28–42. Springer, 2008. 6
convolutional encoder-decoder architecture for image segmentation. [25] J. Long, E. Shelhamer, and T. Darrell. Fully convolutional networks
arXiv preprint arXiv:1511.00561, 2015. 2, 3, 4, 5, 6, 9 for semantic segmentation. arXiv preprint arXiv:1411.4038, 2014.
[4] G. J. Brostow, J. Fauqueur, and R. Cipolla. Semantic object classes in 1, 2, 5, 6, 8
video: A high-definition ground truth database. Pattern Recognition [26] D. J. MacKay. A practical bayesian framework for backpropagation
Letters, 30(2):88–97, 2009. 1, 2, 4, 5, 6, 7, 8 networks. Neural computation, 4(3):448–472, 1992. 2
[5] G. J. Brostow, J. Shotton, J. Fauqueur, and R. Cipolla. Segmenta- [27] H. Noh, S. Hong, and B. Han. Learning deconvolution network for
tion and recognition using structure from motion point clouds. In semantic segmentation. arXiv preprint arXiv:1505.04366, 2015. 2,
Computer Vision–ECCV 2008, pages 44–57. Springer, 2008. 5 5, 6, 8
[6] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. [28] X. Ren, L. Bo, and D. Fox. Rgb-(d) scene labeling: Features and
Yuille. Semantic image segmentation with deep convolutional nets algorithms. In Computer Vision and Pattern Recognition (CVPR),
and fully connected crfs. arXiv preprint arXiv:1412.7062, 2014. 2, 2012 IEEE Conference on, pages 2759–2766. IEEE, 2012. 6
5, 6, 8 [29] S. Rota Bulo and P. Kontschieder. Neural decision forests for se-
[7] D. A. Cohn, Z. Ghahramani, and M. I. Jordan. Active learning with mantic image labelling. In Computer Vision and Pattern Recognition
statistical models. Journal of artificial intelligence research, 1996. 2 (CVPR), 2014 IEEE Conference on, pages 81–88. IEEE, 2014. 5
[8] C. Couprie, C. Farabet, L. Najman, and Y. LeCun. Indoor se- [30] J. Shotton, M. Johnson, and R. Cipolla. Semantic texton forests for
mantic segmentation using depth information. arXiv preprint image categorization and segmentation. In Computer vision and pat-
arXiv:1301.3572, 2013. 1 tern recognition, 2008. CVPR 2008. IEEE Conference on, pages 1–8.
[9] J. Denker and Y. Lecun. Transforming neural-net output levels to IEEE, 2008. 2
probability distributions. In Advances in Neural Information Pro- [31] J. Shotton, T. Sharp, A. Kipman, A. Fitzgibbon, M. Finocchio,
cessing Systems 3. Citeseer, 1991. 2, 3 A. Blake, M. Cook, and R. Moore. Real-time human pose recog-
[10] D. Eigen and R. Fergus. Predicting depth, surface normals and se- nition in parts from single depth images. Communications of the
mantic labels with a common multi-scale convolutional architecture. ACM, 56(1):116–124, 2013. 1, 2
arXiv preprint arXiv:1411.4734, 2014. 5, 6 [32] J. Shotton, J. Winn, C. Rother, and A. Criminisi. Textonboost for
[11] M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisser- image understanding: Multi-class object recognition and segmenta-
man. The pascal visual object classes (voc) challenge. International tion by jointly modeling texture, layout, and context. International
journal of computer vision, 88(2):303–338, 2010. 1, 6, 8, 9 Journal of Computer Vision, 81(1):2–23, 2009. 1, 2
[12] Y. Gal and Z. Ghahramani. Bayesian convolutional neural networks [33] N. Silberman, D. Hoiem, P. Kohli, and R. Fergus. Indoor segmenta-
with bernoulli approximate variational inference. arXiv:1506.02158, tion and support inference from rgbd images. In Computer Vision–
2015. 2, 3 ECCV 2012, pages 746–760. Springer, 2012. 6
[34] K. Simonyan and A. Zisserman. Very deep convolutional networks
for large-scale image recognition. arXiv preprint arXiv:1409.1556,
2014. 3
[35] S. Song, S. P. Lichtenberg, and J. Xiao. Sun rgb-d: A rgb-d scene
understanding benchmark suite. In Proceedings of the IEEE Confer-
ence on Computer Vision and Pattern Recognition, pages 567–576,
2015. 1, 2, 6, 7, 8, 9
[36] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and
R. Salakhutdinov. Dropout: A simple way to prevent neural net-
works from overfitting. The Journal of Machine Learning Research,
15(1):1929–1958, 2014. 2, 3, 4
[37] P. Sturgess, K. Alahari, L. Ladicky, and P. H. Torr. Combining ap-
pearance and structure from motion features for road scene under-
standing. In BMVC, volume 1, page 6, 2009. 5
[38] J. Tighe and S. Lazebnik. Superparsing. International Journal of
Computer Vision, 101(2):329–349, 2013. 5
[39] Y. Yang, Z. Li, L. Zhang, C. Murphy, J. Ver Hoeve, and H. Jiang.
Local label descriptor for example based semantic image labeling.
In Computer Vision–ECCV 2012, pages 361–375. Springer, 2012. 5
[40] F. Yu and V. Koltun. Multi-scale context aggregation by dilated con-
volutions. In ICLR, 2016. 2, 6, 8
[41] M. D. Zeiler and R. Fergus. Visualizing and understanding convolu-
tional networks. In Computer Vision–ECCV 2014, pages 818–833.
Springer, 2014. 4
[42] S. Zheng, S. Jayasumana, B. Romera-Paredes, V. Vineet, Z. Su,
D. Du, C. Huang, and P. Torr. Conditional random fields as recur-
rent neural networks. arXiv preprint arXiv:1502.03240, 2015. 2,
8

A Comprehensive Review of Modern Object Segmentation Approaches
No ratings yet
A Comprehensive Review of Modern Object Segmentation Approaches
177 pages
AML - Lecture - 11 - 19nov24
No ratings yet
AML - Lecture - 11 - 19nov24
103 pages
DL Unit 5
No ratings yet
DL Unit 5
63 pages
A Survey On Deep Learning Techniques For Image and Video Semantic Segmentation
No ratings yet
A Survey On Deep Learning Techniques For Image and Video Semantic Segmentation
68 pages
Lecture 5 - CNNs For Detection and Segmentation
No ratings yet
Lecture 5 - CNNs For Detection and Segmentation
62 pages
Harley MSC Thesis Menos Especializadpo
No ratings yet
Harley MSC Thesis Menos Especializadpo
71 pages
Object Detection and Segmentation - Part 2
No ratings yet
Object Detection and Segmentation - Part 2
36 pages
DAAI - Lecture - 15 - 23nov22
No ratings yet
DAAI - Lecture - 15 - 23nov22
113 pages
Report Explo
No ratings yet
Report Explo
31 pages
Segmentation by Gan
No ratings yet
Segmentation by Gan
18 pages
Semantic Segmentation
No ratings yet
Semantic Segmentation
22 pages
【SegFormer】NeurIPS 2021 Segformer Simple and Efficient Design for Semantic Segmentation With Transformers Paper
No ratings yet
【SegFormer】NeurIPS 2021 Segformer Simple and Efficient Design for Semantic Segmentation With Transformers Paper
14 pages
SegNeXt Rethinking Convolutional Attention Design Segmentation
No ratings yet
SegNeXt Rethinking Convolutional Attention Design Segmentation
15 pages
Overview of Semantic Segmentation
No ratings yet
Overview of Semantic Segmentation
20 pages
BSSNet A Real-Time Semantic Segmentation Network For Road Scenes Inspired From AutoEncoder
No ratings yet
BSSNet A Real-Time Semantic Segmentation Network For Road Scenes Inspired From AutoEncoder
15 pages
Haar Wavelet Downsampling
No ratings yet
Haar Wavelet Downsampling
14 pages
BASeg - Boundary Aware Semantic Segmentation For Autonomous
No ratings yet
BASeg - Boundary Aware Semantic Segmentation For Autonomous
11 pages
NeurIPS 2022 Segnext Rethinking Convolutional Attention Design For Semantic Segmentation Paper Conference
No ratings yet
NeurIPS 2022 Segnext Rethinking Convolutional Attention Design For Semantic Segmentation Paper Conference
17 pages
Main
No ratings yet
Main
13 pages
Applsci 11 08802 - Compressed
No ratings yet
Applsci 11 08802 - Compressed
28 pages
Image Segmentation Using Deep Learning A Survey
No ratings yet
Image Segmentation Using Deep Learning A Survey
20 pages
Lecture Sematic-Segmentation
No ratings yet
Lecture Sematic-Segmentation
23 pages
【SETR】Zheng Rethinking Semantic Segmentation From a Sequence-To-Sequence Perspective With Transformers CVPR 2021 Paper
No ratings yet
【SETR】Zheng Rethinking Semantic Segmentation From a Sequence-To-Sequence Perspective With Transformers CVPR 2021 Paper
10 pages
Optimisation of Semantic Segmentation Algorithm For Autonomous Driving Using U-NET Architecture
No ratings yet
Optimisation of Semantic Segmentation Algorithm For Autonomous Driving Using U-NET Architecture
16 pages
Large Kernel Matters
No ratings yet
Large Kernel Matters
11 pages
Semantic Segmentation by Using Down-Sampling and S
No ratings yet
Semantic Segmentation by Using Down-Sampling and S
14 pages
Seggpt Paper
No ratings yet
Seggpt Paper
12 pages
Dlcv2017d3l1segmentation 170623173102
No ratings yet
Dlcv2017d3l1segmentation 170623173102
36 pages
2015 - DeepLab v1 - Semantic Image Segmentation With Deep Convolutional Nets and Fully Connected Crfs
No ratings yet
2015 - DeepLab v1 - Semantic Image Segmentation With Deep Convolutional Nets and Fully Connected Crfs
14 pages
Two-Stage Framework For Faster Semantic Segmentation
No ratings yet
Two-Stage Framework For Faster Semantic Segmentation
9 pages
Image Segmentation in Deep Learning
No ratings yet
Image Segmentation in Deep Learning
12 pages
Bise Net
No ratings yet
Bise Net
17 pages
Fully Convolutional Networks For Semantic Segmentation
No ratings yet
Fully Convolutional Networks For Semantic Segmentation
12 pages
Boundary-Aware Segmentation Network For Mobile and Web Applications
No ratings yet
Boundary-Aware Segmentation Network For Mobile and Web Applications
19 pages
Image Segmentation Using Deep Learning: A Survey
No ratings yet
Image Segmentation Using Deep Learning: A Survey
23 pages
DFANet Deep Feature Aggregation For Real-Time Semantic Segmentation
No ratings yet
DFANet Deep Feature Aggregation For Real-Time Semantic Segmentation
10 pages
Deep Dual-Resolution Networks For Real-Time and Accurate Semantic Segmentation of Road Scenes
No ratings yet
Deep Dual-Resolution Networks For Real-Time and Accurate Semantic Segmentation of Road Scenes
12 pages
UNet For Semantic Segmentation - DTD - 19april2024
No ratings yet
UNet For Semantic Segmentation - DTD - 19april2024
20 pages
Bayesian Deep Learning With Monte Carlo Dropout For Qualification of Semantic Segmentation
No ratings yet
Bayesian Deep Learning With Monte Carlo Dropout For Qualification of Semantic Segmentation
4 pages
2018 - SeGAN - Adversarial Network With Multi-Scale L 1 Loss For Medical
No ratings yet
2018 - SeGAN - Adversarial Network With Multi-Scale L 1 Loss For Medical
10 pages
Mobilepotrait c&g19
No ratings yet
Mobilepotrait c&g19
10 pages
Image Segmentation ÔÇö A BeginnerÔÇÖs Guide - Medium
No ratings yet
Image Segmentation ÔÇö A BeginnerÔÇÖs Guide - Medium
16 pages
Fully Convolutional Networks For Semantic Segmentation
No ratings yet
Fully Convolutional Networks For Semantic Segmentation
12 pages
BSBITU402 Learner Guide NewPi
No ratings yet
BSBITU402 Learner Guide NewPi
90 pages
Alexnet Tugce Kyunghee
No ratings yet
Alexnet Tugce Kyunghee
35 pages
F Test
No ratings yet
F Test
10 pages
Recent Progress in Semantic Image Segmentation: Xiaolong Liu Zhidong Deng Yuhan Yang
No ratings yet
Recent Progress in Semantic Image Segmentation: Xiaolong Liu Zhidong Deng Yuhan Yang
18 pages
Image Segmentation Keras: Implementation of Segnet, FCN, Unet, Pspnet and Other Models in Keras
No ratings yet
Image Segmentation Keras: Implementation of Segnet, FCN, Unet, Pspnet and Other Models in Keras
5 pages
Deep Semantic Segmentation New Model of Natural and Medical Images
No ratings yet
Deep Semantic Segmentation New Model of Natural and Medical Images
4 pages
Deep Semantic Segmentation New Model of Natural and Medical Images
No ratings yet
Deep Semantic Segmentation New Model of Natural and Medical Images
4 pages
Brain Function
No ratings yet
Brain Function
68 pages
SegNet: A Deep Convolutional Encoder-Decoder Architecture For Image Segmentation
No ratings yet
SegNet: A Deep Convolutional Encoder-Decoder Architecture For Image Segmentation
15 pages
6 Segnet
No ratings yet
6 Segnet
14 pages
Weakly Supervised Semantic Segmentation Using Superpixel Pooling Network
No ratings yet
Weakly Supervised Semantic Segmentation Using Superpixel Pooling Network
7 pages
W-Net A Deep Model For Fully Unsupervised Image Segmentation
No ratings yet
W-Net A Deep Model For Fully Unsupervised Image Segmentation
13 pages
2014-2015 Ece Consoldate
No ratings yet
2014-2015 Ece Consoldate
579 pages
Fully Convolutional Networks For Semantic Segmentation: Jonathan Long Evan Shelhamer Trevor Darrell UC Berkeley
No ratings yet
Fully Convolutional Networks For Semantic Segmentation: Jonathan Long Evan Shelhamer Trevor Darrell UC Berkeley
10 pages
A Comparative Study of Real-Time Semantic Segmentation For Autonomous Driving
No ratings yet
A Comparative Study of Real-Time Semantic Segmentation For Autonomous Driving
11 pages
Mine Wastes PDF
No ratings yet
Mine Wastes PDF
18 pages
Thesis AlexanderJaus BIBTEX
No ratings yet
Thesis AlexanderJaus BIBTEX
9 pages
Parts of A Bar.
100% (2)
Parts of A Bar.
3 pages
Image Segmentation DeepLearning
No ratings yet
Image Segmentation DeepLearning
18 pages
Dental X-Ray Image Segmenation Using A U-Shaped Deep Convolutional Network
No ratings yet
Dental X-Ray Image Segmenation Using A U-Shaped Deep Convolutional Network
13 pages
Creating Photorealistic Images With AI V1
100% (1)
Creating Photorealistic Images With AI V1
182 pages
Mpumalanga Geography - Desktop Research Task 2024
No ratings yet
Mpumalanga Geography - Desktop Research Task 2024
13 pages
Treatment Planning 02
No ratings yet
Treatment Planning 02
56 pages
Cakes
100% (1)
Cakes
21 pages
Nervous System Powerpoint Slides
No ratings yet
Nervous System Powerpoint Slides
47 pages
1 Image Segmentation Using Deep Learning
No ratings yet
1 Image Segmentation Using Deep Learning
6 pages
Uncreditted Remittances
No ratings yet
Uncreditted Remittances
67 pages
Overview of Lean Six Sigma: Presented by The University of Texas-School of Public Health
No ratings yet
Overview of Lean Six Sigma: Presented by The University of Texas-School of Public Health
22 pages
Coficients of Positive Moment For Continous Beam
No ratings yet
Coficients of Positive Moment For Continous Beam
44 pages
I-Arch: Nata Exam Syllabus
100% (1)
I-Arch: Nata Exam Syllabus
4 pages
Iit-Jee Prepration: A Complete Guide by Students Helper
No ratings yet
Iit-Jee Prepration: A Complete Guide by Students Helper
7 pages
Machine Learning - Advanced Concepts
From Everand
Machine Learning - Advanced Concepts
Derrick Mwiti
No ratings yet
Generalizability of Semantic Segmentation Techniques: Keshav Bhandari Texas State University, San Marcos, TX
No ratings yet
Generalizability of Semantic Segmentation Techniques: Keshav Bhandari Texas State University, San Marcos, TX
6 pages
Reformer Furnace 02
No ratings yet
Reformer Furnace 02
8 pages
Electrical Noise and Transients: Application Note
No ratings yet
Electrical Noise and Transients: Application Note
5 pages
BIOL122 Final Unit Outline Canberra
No ratings yet
BIOL122 Final Unit Outline Canberra
15 pages
IL Presentation
No ratings yet
IL Presentation
13 pages
Cala NJ Test Prep
No ratings yet
Cala NJ Test Prep
2 pages
Sie Gujrat WP PDWP
No ratings yet
Sie Gujrat WP PDWP
13 pages
En SCC Schirmklemmen LoRes
No ratings yet
En SCC Schirmklemmen LoRes
8 pages
How To Use Electrical Wiring Diagram: Section 1
No ratings yet
How To Use Electrical Wiring Diagram: Section 1
3 pages
Active Low-Pass Filters
No ratings yet
Active Low-Pass Filters
9 pages
ST 2SA928: PNP Silicon Epitaxial Planar Transistor
No ratings yet
ST 2SA928: PNP Silicon Epitaxial Planar Transistor
1 page
BME Assessment 02 QP
No ratings yet
BME Assessment 02 QP
2 pages
Truth Between Friends
No ratings yet
Truth Between Friends
9 pages
Press Release - RECARO Aircraft Seating AIX 2024 Preview
No ratings yet
Press Release - RECARO Aircraft Seating AIX 2024 Preview
3 pages
Rle Requirements Wardspcl Area NCP DS
No ratings yet
Rle Requirements Wardspcl Area NCP DS
3 pages

Bayesian Segnet: Model Uncertainty in Deep Convolutional Encoder-Decoder Architectures For Scene Understanding

Uploaded by

Bayesian Segnet: Model Uncertainty in Deep Convolutional Encoder-Decoder Architectures For Scene Understanding

Uploaded by

Bayesian SegNet: Model Uncertainty in Deep Convolutional Encoder-Decoder

Architectures for Scene Understanding

Alex Kendall Vijay Badrinarayanan Roberto Cipolla

Abstract Input Images

We present a deep learning framework for probabilistic

RGB Image Conv + Batch Normalisation + ReLU

Weight Monte Carlo Training

82 87.5 computed in parallel on a GPU this cost can be reduced for

Global Accuracy (%)

pling comes at the expense of inference time, but when mi.eng.cam.ac.uk/projects/segnet/

Table 6: Bayesian SegNets accuracy as a function of confidence

Figure 7: Bayesian SegNet performance compared to mean

You might also like