
2021 IEEE 23rd Int Conf on High Performance Computing & Communications; 7th Int Conf on Data Science & Systems; 19th Int Conf on Smart City; 7th Int Conf on Dependability in Sensor, Cloud & Big Data Systems & Application (HPCC/DSS/SmartCity/DependSys)
978-1-6654-9457-1/21/$31.00 ©2021 IEEE | DOI: 10.1109/HPCC-DSS-SmartCity-DependSys53884.2021.00206

 
Deep Learning-Based Semantic Segmentation in Autonomous Driving

Hrag-Harout Jebamikyous
Dept of Electrical and Computer Engineering
Ryerson University
Toronto, Canada
[email protected]

Rasha Kashef
Dept of Electrical and Computer Engineering
Ryerson University
Toronto, Canada
rkashef@ryerson.ca


Abstract-Perception is the first and most important task of any autonomous driving system. It extracts visual information about the surrounding environment of the vehicle. The perception data is then fed to a decision-making system to provide the optimum decision given a specific scenario to avoid potential collisions. In this paper, we have developed variants of the U-Net model to perform semantic segmentation on urban scene images to understand the surroundings of an autonomous vehicle. The U-Net model and its variants are adopted for semantic segmentation in this project to account for the power of the U-Net in handling large and small datasets. We have also compared the best-performing variant with other commonly used semantic segmentation models. The comparative analysis was performed using three well-known models, including FCN-16, FCN-8, and SegNet. After conducting sensitivity and comparative analysis, it is concluded that the U-Net variants performed the best in terms of the Intersection over Union (IoU) evaluation metric and other quality metrics.

Keywords-Autonomous Driving, Semantic Segmentation, U-Net, FCN, SegNet, Encoder-Decoder
I. INTRODUCTION

As automotive technology evolves, the demand for Autonomous Vehicles (AV) with different levels of autonomy is increasing due to the increase in affordability and accessibility in different regions around the globe. The increase in the number of AVs will result in a safer driving experience and fewer injuries and deaths due to mistakes made by human drivers. Deploying accurate and efficient deep learning models trained on large real-world datasets with various scenarios is an essential part of autonomous driving to ensure the safety of the driver, the passengers, and the pedestrians.

The Autonomous Vehicle Perception (AVP) system extracts visual information about the surrounding environment of the vehicle. The perception data is then fed to a deep learning model to make the optimum decision. The two main tasks that an AVP system performs are Object Detection and Semantic Segmentation. Object detection [1][2][3] is the task of classifying and locating an object in an image and then drawing a bounding box around that object. On the other hand, Semantic Segmentation performs per-pixel classification by classifying each pixel in an image into a certain class and assigning a unique color to each class.

In this paper, we tackled semantic segmentation using a very well-known model originally developed for biomedical image segmentation tasks, called U-Net. The name of the model is inspired by the shape of the architecture, which looks like the letter U. The U-Net model is one of the few existing architectures which perform well on small datasets, and it had not previously been tested in an autonomous driving scenario with a large number of classes and a small number of training images. After training multiple U-Net models with different activation functions, regularization techniques, and different depths, we show that U-Net could have a promising future in the field of autonomous driving and scene understanding due to its ability to answer the "What" and "Where" object questions. To the best of our knowledge, no research work highlights the use of the U-Net model in AVP with an extensive comparison against other commonly used semantic segmentation models. Thus, the main contributions of this paper are:

1) Surveying the most recent research work in Semantic Segmentation of urban areas.
2) Building five variants of the U-Net model.
3) Building two variants each of the SegNet, FCN-16, and FCN-8 models.
4) Extensive sensitivity and comparative analysis of the different models.

The rest of this paper is organized as follows: Section II discusses the literature review and related work, Section III introduces the proposed U-Net models and the adopted models used for comparison, Section IV shows the experimental results, and finally, Section V concludes the paper and discusses future directions.

II. RELATED WORK

AVP systems rely heavily on semantic segmentation to navigate through urban areas. Semantic segmentation assigns each pixel in the image to a particular class. All pixels belonging to a specific class are assigned a single color: as shown in Fig. 1, trees are painted green, roads are painted brown, cars are painted red, etc. Most semantic segmentation models are based on deep learning architectures, which help the models understand the complex features and patterns in images.

Fig. 1. Semantic Segmentation on the CityScapes Dataset [4].

In [5], the authors implemented an encoder-decoder-based deep Convolutional Neural Network (CNN) model. The encoder network is similar to the VGG-16 architecture: it consists of 13 convolution layers, each layer followed by a max-pooling layer to decrease the size of the feature maps. Residual learning was used to perform element-wise addition and shortcut connections to preserve spatial information. The corresponding decoder network consists of 13 de-convolutional layers, each layer followed by an up-sampling layer. The model was trained and tested on two datasets, "CamVid" and "CityScapes". The authors also implemented the ENet and SegNet models to perform a comparative analysis and showed that their proposed model outperformed the two other models.

The authors in [6] tackled the problem of ignoring the different importance levels of classes in most semantic segmentation models. For instance, segmenting pedestrians and cars is more important than segmenting the sky. To avoid catastrophic collisions, cars, pedestrians, and many other essential classes must be segmented as accurately as possible. To tackle this problem, the authors proposed a loss function called 'Importance-Aware Loss' (IAL) to emphasize the importance of critical objects in urban scenarios. Four semantic segmentation models were trained using the IAL loss function, namely SegNet, ENet, FCN, and ERFNet, and these models were tested on the "CityScapes" and "CamVid" datasets. Experimental results have shown that the proposed loss function improved the segmentation results on the important classes.

In [7], the authors built a Formula-SAE electric car equipped with a LiDAR sensor, but the LiDAR sensor could not accurately detect the road edges and road lane markings. To solve that problem, they installed and calibrated a low-cost monocular camera and used the SegNet model to detect the above-mentioned classes accurately. The experimental results on the "CamVid" dataset proved that their method improved the performance of detecting road edges and road lane markings.

The authors in [8] introduced a Dense Feature Pyramid Network (DFPN) based deep learning model to accurately extract road markings by concatenating the shallow feature channels with the deep feature channels. The model was trained on mobile laser scanning (MLS) point clouds to extract the road markings. Their experimental results have shown that their method outperformed state-of-the-art models in road markings segmentation.

A 3D semantic segmentation of point clouds based on deep learning is introduced in [9]. The authors conducted a comparative study using three semantic segmentation algorithms: SPGraph, PointNet, and PointCNN. All three models were trained and tested on an outdoor aerial survey point cloud dataset, and the overall accuracy was used as the evaluation metric. Experimental results have shown that SPGraph, PointNet, and PointCNN scored 83.4%, 83%, and 72.7% overall accuracy, respectively.

In [10], the authors combined deep learning and classical segmentation for semantically segmenting autonomous driving images. This novel combination method, which divides the image into its constituent regions using classical segmentation, improved the results of the DeepLab v3+ network. The proposed method was trained with two backbone networks, MobileNetV2 and Xception, on the "Cityscapes" dataset and showed promising results.

The comparison between the recent literature papers in terms of the algorithms used and the datasets is shown in Table I.

TABLE I. LITERATURE REVIEW COMPARISON

Reference   Algorithm                               Dataset
[5]         VGG16 & Residual Encoder-Decoder        Cityscapes & CamVid
[6]         FCN, SegNet, ENet, ERFNet               Cityscapes & CamVid
[7]         SegNet                                  CamVid
[8]         DFPN                                    Self-collected
[9]         PointNet, PointCNN, SPGraph             Fused 3D point cloud
[10]        DeepLab v3+                             Cityscapes
[11]        ENet & Bonnet                           Cityscapes & USYD
[12]        EfficientNet                            Cityscapes
[13]        Road Profile Semantic Segmentation      Self-collected
[14]        RFNet                                   Cityscapes & Lost and Found

III. LEVERAGING U-NET FOR URBAN SCENE SEGMENTATION

To perform semantic segmentation for scene understanding in autonomous vehicles, we have implemented five different variations of the U-Net model. The U-Net model was previously designed and implemented exclusively for medical image segmentation tasks [15]. As the model's name may imply, the model architecture (Fig. 2) has the letter 'U' shape.


Fig. 2. The U-Net Architecture.

The U-Net consists of two paths, a contracting path and an expansive path. The contracting path, also called the down-sampling path, consists of repeated blocks of two 3x3 convolutions with a Rectified Linear Unit (ReLU) as their activation function, followed by a 2x2 max-pooling operation with a stride of 2 used to down-sample. At each down-sampling step in the contracting path, the number of feature channels is doubled; the image size gradually decreases while the depth increases. The expansive path, also called the up-sampling path, consists of an up-sampling of the feature map followed by a 2x2 convolution that halves the number of feature channels, a concatenation with the correspondingly cropped feature map from the contracting path, and two 3x3 convolutions with ReLU as their activation function. The final layer is a 1x1 convolution that maps the feature vectors to the corresponding number of classes. The size of the image in the expansive path gradually increases, and the depth decreases.

We trained two other U-Net models with four times smaller feature channels, as shown in Fig. 3: one with ReLU as its activation function and the second model with LeakyReLU as its activation function.

Fig. 3. The U-Net Architecture with Smaller Feature Channels (Small U-Net).

We also trained two more U-Net variants and called them "Long U-Net", because we added two layers on the contracting path and two layers on the corresponding expansive path, as shown in Fig. 4. To help the model generalize better, we used a regularization technique called Dropout. We trained one "Long U-Net" model with a Dropout rate of 0.5 and another "Long U-Net" with a Dropout rate of 0.7 to analyze the model performance.

Fig. 4. The Long U-Net Architecture.
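For illustration, the following minimal Keras sketch shows one contracting block, one expansive block with its skip concatenation, and the final 1x1 classification layer; the depth, filter counts, and the comments indicating where the Small and Long variants differ are simplified assumptions rather than the exact configurations used in our experiments.

```python
from tensorflow.keras import layers, models

def down_block(x, filters, dropout=None):
    # Two 3x3 convolutions with ReLU, then a 2x2 max-pooling with stride 2.
    c = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    c = layers.Conv2D(filters, 3, padding="same", activation="relu")(c)
    if dropout:                      # used by the Long U-Net variants (0.5 or 0.7)
        c = layers.Dropout(dropout)(c)
    p = layers.MaxPooling2D(2)(c)    # image size halves; channels double at the next block
    return c, p

def up_block(x, skip, filters):
    # 2x2 up-convolution halves the channels, then the skip feature map is concatenated.
    u = layers.Conv2DTranspose(filters, 2, strides=2, padding="same")(x)
    u = layers.concatenate([u, skip])
    u = layers.Conv2D(filters, 3, padding="same", activation="relu")(u)
    u = layers.Conv2D(filters, 3, padding="same", activation="relu")(u)
    return u

def build_unet(input_shape=(512, 512, 3), num_classes=32, base_filters=64):
    inputs = layers.Input(input_shape)
    s1, p1 = down_block(inputs, base_filters)       # Small U-Net: base_filters // 4
    s2, p2 = down_block(p1, base_filters * 2)       # Long U-Net: add further levels here
    b = layers.Conv2D(base_filters * 4, 3, padding="same", activation="relu")(p2)
    u2 = up_block(b, s2, base_filters * 2)
    u1 = up_block(u2, s1, base_filters)
    # Final 1x1 convolution maps the feature vectors to per-pixel class scores.
    outputs = layers.Conv2D(num_classes, 1, activation="softmax")(u1)
    return models.Model(inputs, outputs)
```

In this sketch, the Small U-Net variants would divide base_filters by four, while the Long U-Net variants would add one more down/up level on each path and insert Dropout (0.5 or 0.7) in the deeper blocks.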
 
 
A. The Adopted Models

We have adopted three commonly used semantic segmentation models, SegNet, FCN-16, and FCN-8, to compare their performance with the best-performing U-Net model. In addition to the original three models, we have built three other variants whose only difference is the Dropout technique added to each model, because the best-performing U-Net is the one with a Dropout rate of 0.5.

a) The SegNet model

The SegNet model consists of an encoder and a corresponding decoder network. The final layer performs pixel-wise classification of the input image, as shown in Fig. 5. Inspired by the VGG-16 network, which was designed for object classification, the authors used 13 convolutional layers in the encoder network (the blue boxes in Fig. 5), followed by pooling layers (green boxes) to reduce the dimensions of the feature maps. They discarded the fully connected layers to retain higher-resolution feature maps at the encoder output. By discarding the three fully connected layers of VGG-16, the authors drastically reduced the number of SegNet model parameters. Each encoder layer has a corresponding decoder layer; the decoder network also has 13 layers, preceded by up-sampling layers (red boxes) to make the output feature maps the same size as the input. The decoder output is fed to a soft-max classifier which produces class probabilities for each pixel, and the prediction corresponds to the class with the maximum probability at every pixel.

Fig. 5. The SegNet Architecture [16].
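As a rough illustration of this encoder-decoder pattern (deliberately much shallower than the 13-layer SegNet and without the pooling-index upsampling of the original work [16]), a sketch could look as follows; the layer and filter counts are assumptions chosen for brevity.

```python
from tensorflow.keras import layers, models

def build_mini_segnet(input_shape=(512, 512, 3), num_classes=32):
    inputs = layers.Input(input_shape)
    # Encoder: VGG-style convolution blocks followed by pooling (blue/green boxes in Fig. 5).
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(inputs)
    x = layers.MaxPooling2D(2)(x)
    x = layers.Conv2D(128, 3, padding="same", activation="relu")(x)
    x = layers.MaxPooling2D(2)(x)
    # Decoder: up-sampling followed by convolution (red boxes), mirroring the encoder.
    x = layers.UpSampling2D(2)(x)
    x = layers.Conv2D(128, 3, padding="same", activation="relu")(x)
    x = layers.UpSampling2D(2)(x)
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    # Soft-max classifier producing per-pixel class probabilities.
    outputs = layers.Conv2D(num_classes, 1, activation="softmax")(x)
    return models.Model(inputs, outputs)
```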

b) Fully Convolutional Network (FCN-16, FCN-8)

We have implemented only the FCN-16 and FCN-8 [17][18], because FCN-32 had proven its poor performance in the literature: at the output of conv7, as shown in Fig. 6 below, the image size becomes very small, and to make the segmentation output the same size as the input image, 32x up-sampling is performed, which makes the output very rough, because the spatial location information is lost when going deeper. That is why FCN-16 and FCN-8 perform better: they use two and four times less up-sampling, respectively. In the FCN-16 network, the output of conv7 is 2x up-sampled and fused with pool4, and then 16x up-sampling is performed. In the FCN-8 architecture, the output of conv7 is 4x up-sampled and fused with 2x up-sampled pool4 and with pool3, and then 8x up-sampling is performed.

Fig. 6. The Architecture of FCN (FCN-32, FCN-16, FCN-8) [19].
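A minimal sketch of the FCN-8 fusion step is shown below; it assumes the backbone feature maps pool3, pool4, and conv7 are already available as Keras tensors, and the kernel sizes chosen for the transposed convolutions are illustrative rather than taken from [19].

```python
from tensorflow.keras import layers

def fcn8_head(pool3, pool4, conv7, num_classes=32):
    # Score layers: 1x1 convolutions reduce each feature map to num_classes channels.
    s_pool3 = layers.Conv2D(num_classes, 1)(pool3)
    s_pool4 = layers.Conv2D(num_classes, 1)(pool4)
    s_conv7 = layers.Conv2D(num_classes, 1)(conv7)
    # conv7 scores are 4x up-sampled, pool4 scores 2x up-sampled, then fused with pool3.
    up_conv7 = layers.Conv2DTranspose(num_classes, 8, strides=4, padding="same")(s_conv7)
    up_pool4 = layers.Conv2DTranspose(num_classes, 4, strides=2, padding="same")(s_pool4)
    fused = layers.add([s_pool3, up_pool4, up_conv7])
    # A final 8x up-sampling restores the input resolution, followed by a per-pixel softmax.
    out = layers.Conv2DTranspose(num_classes, 16, strides=8, padding="same")(fused)
    return layers.Softmax(axis=-1)(out)
```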
B. Training

The code was written in the Python 3 programming language, using the TensorFlow library as the backend and the Keras library as the frontend. The models were trained on the Google Colab platform using the GPU provided by Google. We used "Adam" as the optimizer for all models and "Categorical Cross-Entropy" as the loss function, and the batch size was set to 32, as per best practice in Machine Learning research. The maximum number of epochs (iterations) was set to 100. Still, we used the early stopping method to stop training when the models' validation mean Intersection over Union (mIoU) does not improve for ten epochs. Due to memory and GPU time restrictions imposed by Google Colab, we had to use a widely used dataset with a small number of images, called the Cambridge-Driving Labeled Video Database (CamVid) [20], which consists of 701 images overall.
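The corresponding compile-and-fit setup might look like the following sketch, assuming one-hot encoded masks held in NumPy arrays and a TensorFlow version that provides the OneHotMeanIoU metric; the variable names are placeholders rather than our exact code.

```python
import tensorflow as tf

def compile_and_train(model, x_train, y_train, x_val, y_val, num_classes=32):
    # Adam optimizer, categorical cross-entropy loss, batch size 32, at most 100 epochs.
    model.compile(
        optimizer="adam",
        loss="categorical_crossentropy",
        metrics=[tf.keras.metrics.OneHotMeanIoU(num_classes=num_classes, name="mean_iou")],
    )
    # Early stopping: halt once the validation mean IoU stops improving for ten epochs.
    early_stop = tf.keras.callbacks.EarlyStopping(
        monitor="val_mean_iou", mode="max", patience=10, restore_best_weights=True
    )
    return model.fit(x_train, y_train, validation_data=(x_val, y_val),
                     batch_size=32, epochs=100, callbacks=[early_stop])
```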
C. Performance Evaluation Metrics

Performance evaluation is required to evaluate and optimize any machine learning model and to compare it with other models. Different evaluation metrics are used in the literature; this section describes the most efficient and widely used metrics in semantic segmentation tasks. The Intersection Over Union (IoU) metric, also known as the Jaccard Index, is widely used to evaluate semantic segmentation models. It computes the percent overlap between the ground truth mask and the prediction output. As shown in Eq. 1, IoU measures the number of common pixels between the prediction and ground truth masks and divides it by the total number of pixels present in both masks. Multi-class segmentation tasks use the mean Intersection Over Union (mIoU) metric for model evaluation, which first computes the IoU of each class and then computes the average over all classes.

IoU = |Target ∩ Predicted| / |Target ∪ Predicted|    (1)
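A straightforward NumPy sketch of the per-class IoU of Eq. 1 and its average over classes (mIoU) is given below, assuming the inputs are integer class-index masks of the same shape.

```python
import numpy as np

def iou(pred, target, cls):
    # Per-class IoU: common pixels divided by pixels present in either mask (Eq. 1).
    p, t = (pred == cls), (target == cls)
    union = np.logical_or(p, t).sum()
    return np.logical_and(p, t).sum() / union if union else float("nan")

def mean_iou(pred, target, num_classes=32):
    # mIoU: average the per-class IoU over all classes (classes absent from both masks are skipped).
    scores = [iou(pred, target, c) for c in range(num_classes)]
    return np.nanmean(scores)
```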
Accuracy (Eq. 2) is the most used evaluation metric in Machine Learning research, but it is unreliable in semantic segmentation tasks. It measures all the correctly identified classes and is helpful when all the classes are equally important.

Accuracy = (True Positive + True Negative) / (True Positive + False Positive + True Negative + False Negative)    (2)

We used the F1-Score [21]-[25], a better evaluation metric than Accuracy for imbalanced class distributions; it is measured by calculating the harmonic mean of the Precision and Recall. The F1-Score is calculated by Eq. 5.

Precision (P) = True Positive / (True Positive + False Positive)    (3)

Recall (R) = True Positive / (True Positive + False Negative)    (4)

F1-Score = 2 * (Precision * Recall) / (Precision + Recall)    (5)

We also used the Dice Coefficient (Eq. 6), which is similar to IoU because they are positively correlated. It calculates the area of overlap between the ground truth and the predicted mask, divides it by the total number of pixels in both masks, and then doubles the resulting value.

Dice Coefficient = 2 * Area of Overlap / Total Number of Pixels in both Masks    (6)
55GD35K#AEE?
Accuracy, Loss, mloA+ )5AD73@6;575A788;5;7@F
U, F 1 -Score, and Dice coefficient.-7I;>>
We will
)F;>>I7GE76F:773D>KEFABB;@9?7F:A6FAEFABFD3;@;@9I:7@
Still, w e used the early stopping method t o stop training when
3>EA
also 67?A@EFD3F7
demonstrate F:7
the B7D8AD?3@57
performance A8of 735:
each ?A67>
model A@ on F:D77
three
F:7
the ?A67>EQ
models' H3>;63F;A@
validation ?73@ @F7DE75F;A@ AH7D
mean Intersection over +@;A@
Union ? A+
(mioU)
6;887D7@F;?397E8DA?F:7F7EFE7F
different images from the test set.
6A7E@AF;?BDAH738F7DF7@7BA5:E
does not improve after ten epochs.

A. Dataset

The dataset we used is called the Cambridge-Driving Labeled Video Database (CamVid). It provides per-pixel semantic segmentation of over 700 images and their corresponding Ground Truth masks (labels): 367 training, 101 validation, and 233 test pairs over 32 semantic classes. The semantic classes are the objects commonly found in a regular driving scene, ranging from Cars, Pedestrians, Animals, Buildings, Sidewalks, and Traffic Lights to many more. The overall database consists of ten minutes of high-quality 30 Hz footage, and the images we used to train our models were captured at 1 Hz.

B. Preprocessing

The images and masks are in separate folders, and we paired each image with its corresponding mask. Both images and masks are resized to 512x512. Images are converted to numpy arrays for easier tensor calculations and normalized by dividing by 255. The masks are also converted to numpy arrays and are mapped to the corresponding classes. The classes are given in an Excel sheet with the 'r', 'g', and 'b' values of each class.
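A condensed sketch of this preprocessing is shown below; the file paths, the class_colors list of (r, g, b) tuples read from the class sheet, and the helper name are illustrative assumptions.

```python
import numpy as np
from PIL import Image

def load_pair(image_path, mask_path, class_colors, size=(512, 512)):
    # Resize both the image and the mask to 512x512; normalize the image to [0, 1].
    image = np.asarray(Image.open(image_path).resize(size)) / 255.0
    mask_rgb = np.asarray(Image.open(mask_path).resize(size, Image.NEAREST))
    # Map each (r, g, b) mask color to its class index, then one-hot encode.
    label = np.zeros(size, dtype=np.int32)
    for idx, color in enumerate(class_colors):        # class_colors: list of (r, g, b) tuples
        label[np.all(mask_rgb[..., :3] == color, axis=-1)] = idx
    one_hot = np.eye(len(class_colors), dtype=np.float32)[label]
    return image.astype(np.float32), one_hot
```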
C. Experimental Results

After building a total of eleven models (5 U-Net, 2 SegNet, 2 FCN-16, and 2 FCN-8), we can conclude that the Long U-Net with Dropout=0.5 performed the best based on the mean Intersection over Union (mIoU) evaluation metric, with an mIoU of 0.5731, as shown in Table II and Fig. 7. The superiority of this model lies in the depth of its architecture. The Dropout regularization improved the generalization of the model, and the model converged at the 100th epoch, unlike other models that converged at a much earlier epoch.

Fig. 7. The mIoU of the Long U-Net model with Dropout=0.5.

The sensitivity analysis we performed by building five U-Net models was intended to find the best-performing U-Net model. We first implemented the original U-Net model with ReLU. We then built two other models with smaller feature channels, one with ReLU and the other with LeakyReLU as their activation functions. At this stage, we could not see a difference in terms of mIoU; instead, the accuracies for the models with smaller feature channels were lower, and the losses were higher than for the original U-Net. To obtain a bigger difference, the model architecture needs to change. We therefore built two deeper models with two layers added on each path (Long U-Net), with different dropout rates, which better learned the sophisticated features in the dataset. The two models scored a higher mIoU than the previous models, with a dropout rate of 0.5 performing the best. To make a fair comparison with the three adopted models (SegNet, FCN-16, and FCN-8), we implemented two variants of each of the three models: the first one is the original model, and the second variant uses the dropout regularization technique with a dropout rate of 0.5. The Long U-Net models still outperformed the three other models.

We also compared the models in terms of their Loss, Accuracy, F1-Score, and Dice coefficient. The U-Net model with the ReLU activation function recorded the lowest loss of 0.3823. The Long U-Net with a Dropout rate of 0.7 recorded the highest accuracy of 89.85% and the highest Dice Coefficient of 0.919, and the Long U-Net with a Dropout rate of 0.5 recorded the highest F1-Score of 0.6384, as shown in Table II and Figure 8.

TABLE II. LOSS VS. ACCURACY

Model                        Loss      Accuracy
U-Net ReLU                   0.3823    89.74%
Small U-Net ReLU             0.5254    87.96%
Small U-Net Leaky ReLU       0.4436    87.80%
Long U-Net (Dropout=0.7)     0.3971    89.85%
Long U-Net (Dropout=0.5)     0.3901    89.13%
SegNet                       0.4978    85.37%
SegNet (Dropout=0.5)         0.5652    83.97%
FCN-16                       0.5027    84.92%
FCN-16 (Dropout=0.5)         0.4482    85.85%
FCN-8                        0.4860    86.03%
FCN-8 (Dropout=0.5)          0.4216    87.00%

Fig. 8. Performance Measures (mIoU, F1-Score, and Dice Coefficient).

In Table III, we show three sample images with different scenarios and lighting conditions and their corresponding ground truth labels; we tested each image on all eleven models.

TABLE III. SEGMENTATION RESULTS (ALL MODELS)

(Qualitative results on three test images. Each column shows, from top to bottom: the Original Image, the True Label, and the predictions of the Long U-Net (Dropout=0.5), SegNet, SegNet (Dropout=0.5), FCN-16 (Dropout=0.5), and FCN-8 (Dropout=0.5) models.)
We can see that most U-Net models performed better in segmenting the images than the FCN-16, FCN-8, and SegNet models. The best segmentations were performed by the Long U-Net model, as expected from the numerical analysis and comparison. It performed very well on the first two images due to the good lighting conditions in both images. In the third image, we can see that even though the lighting condition is poor, the Long U-Net model performed well in segmenting most of the objects, except for the bus, which can be traced to the lack of training images containing busses. The red segmentations, instead of pink, are due to the bus's height, which causes the models to mistake it for a building.

V. CONCLUSIONS AND FUTURE DIRECTIONS

As the adoption of autonomous vehicles with different levels of autonomy increases, the need for precise and accurate perception systems increases drastically to ensure the safety of the passengers, the pedestrians, and the drivers of the surrounding vehicles. Based on our extensive experiments presented in this paper, we can conclude that U-Net can precisely classify and localize a wide range of objects in a complex driving environment and outperform previously used well-known models in terms of mIoU, F1-Score, and accuracy.

In the future, we will train the U-Net models with different data augmentation techniques and with better computing power. To tackle the problem of poorly segmented classes, such as the bus in the third image or the traffic poles, we will increase the weights of such classes and decrease the weights of other, less necessary classes to improve the model's overall performance. We will also train and test different variations of the U-Net model on larger datasets and compare them with other state-of-the-art semantic segmentation models. On the other hand, we will implement ensemble learning algorithms by combining multiple state-of-the-art models to achieve the best performance possible. We will also take advantage of unlabeled datasets to train unsupervised learning algorithms, and those models can automatically learn complex road features with minimal human input.

REFERENCES

[1] H. Kim, Y. Lee, B. Yim, E. Park and H. Kim, "On-road object detection using deep neural network," in IEEE International Conference on Consumer Electronics-Asia (ICCE-Asia), Seoul, Korea (South), 2016.
[2] J. Yang, C. Wang, H. Wang and Q. Li, "A RGB-D based real-time multiple object detection and ranging system for autonomous driving," IEEE Sensors Journal, vol. 20, pp. 11959-11966, 2020.
[3] G. Prabhakar, B. Kailath, S. Natarajan and R. Kumar, "Obstacle detection and classification using deep learning for tracking in high-speed autonomous driving," in IEEE Region 10 Symposium (TENSYMP), Cochin, India, 2017.
[4] M. Cordts et al., "The Cityscapes dataset for semantic urban scene understanding," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[5] Y. G. Naresh et al., "A Residual encoder-decoder network for semantic segmentation in autonomous driving scenarios," in European Signal Processing Conference (EUSIPCO), 2018.
[6] B. Chen et al., "Importance-aware semantic segmentation for autonomous vehicles," IEEE Transactions on Intelligent Transportation Systems, vol. 20, pp. 137-148, 2019.
[7] K. L. Lim, T. Drage and T. Braunl, "Implementation of semantic segmentation for road and lane detection on an autonomous ground vehicle with LIDAR," in International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI), 2017.
[8] S. Chen, Z. Zhang, R. Zhong, L. Zhang, H. Ma and L. Liu, "A Dense feature pyramid network-based deep learning model for road marking instance segmentation using MLS point clouds," IEEE Transactions on Geoscience and Remote Sensing, vol. 59, pp. 784-800, 2021.
[9] C. Lowphansirikul, K.-S. Kim, P. Vinayaraj and S. Tuarob, "3D Semantic segmentation of large-scale point-clouds in urban areas using deep learning," in 11th International Conference on Knowledge and Smart Technology (KST), Phuket, Thailand, 2019.
[10] M. H. Hamian, A. Beikmohammadi, A. Ahmadi and B. Nasersharif, "Semantic segmentation of autonomous driving images by the combination of deep learning and classical segmentation," in CSI International Computer Conference (CSICC), 2021.
[11] W. Zhou et al., "Automated evaluation of semantic segmentation robustness for autonomous driving," IEEE Transactions on Intelligent Transportation Systems, 2018.
[12] J. Park et al., "Semantic segmentation with improved edge detail for autonomous vehicles," in IEEE 16th International Conference on Automation Science and Engineering (CASE), Hong Kong, China, 2020.
[13] G. Cheng, J. Y. Zheng and M. Kilicarslan, "Semantic segmentation of road profiles for efficient sensing in autonomous driving," in IEEE Intelligent Vehicles Symposium, 2019.
[14] L. Sun et al., "Real-time fusion network for RGB-D semantic segmentation incorporating unexpected obstacle detection for road-driving images," IEEE Robotics and Automation Letters, vol. 5, no. 4, 2020.
[15] O. Ronneberger, P. Fischer and T. Brox, "U-Net: Convolutional networks for biomedical image segmentation," LNCS, vol. 9351, pp. 234-241, 2015.
[16] V. Badrinarayanan, A. Kendall and R. Cipolla, "SegNet: A deep convolutional encoder-decoder architecture for image segmentation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 12, pp. 2481-2495, Dec. 2017.
[17] J. Cheng, L. Ye, Y. Guo, J. Zhang et al., "Ground crack recognition based on fully convolutional network with multi-scale input," IEEE Access, vol. 8, pp. 53034-53048, 2020.
[18] Ç. Kaymak and A. Uçar, "Semantic image segmentation for autonomous driving using fully convolutional networks," in International Artificial Intelligence and Data Processing Symposium (IDAP), 2019, pp. 1-8.
[19] J. Long, E. Shelhamer and T. Darrell, "Fully convolutional networks for semantic segmentation," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 3431-3440.
[20] G. Brostow, J. Shotton, J. Fauqueur and R. Cipolla, "Segmentation and recognition using structure from motion point clouds," in ECCV, 2008.
[21] Z. Karami and R. Kashef, "Smart transportation planning: Data, models, and algorithms," Transportation Engineering, vol. 2, 100013, 2020.
[22] R. Kashef, "A boosted SVM classifier trained by incremental learning and decremental unlearning approach," Expert Systems with Applications, vol. 167, 114154, 2021.
[23] R. Kashef, "Scattering-based quality measures," in IEEE International IOT, Electronics and Mechatronics Conference (IEMTRONICS), 2021, pp. 1-8.
[24] R. Kashef, "Enhancing the role of large-scale recommendation systems in the IoT context," IEEE Access, vol. 8, pp. 178248-178257, 2020.
[25] D. Nawara and R. Kashef, "Context-aware recommendation systems in the IoT environment (IoT-CARS) - A comprehensive overview," IEEE Access, 2021.

