Deep Learning-Based Semantic Segmentation in Autonomous Driving
Hrag-Harout Jebamikyous
Dept. of Electrical and Computer Engineering
Ryerson University
Toronto, Canada
[email protected]

Rasha Kashef
Dept. of Electrical and Computer Engineering
Ryerson University
Toronto, Canada
[email protected]
Abstract: Perception is the first and most important task of any autonomous driving system. It extracts visual information about the surrounding environment of the vehicle. The perception data is then fed to a decision-making system to provide the optimum decision given a specific scenario to avoid potential collisions. In this paper, we have developed variants of the U-Net model to perform semantic segmentation on urban scene images to understand the surroundings of an autonomous vehicle. The U-Net model and its variants are adopted for semantic segmentation in this project to account for the power of the U-Net in handling large and small datasets. We have also compared the best performing variant with other commonly used semantic segmentation models, using three well-known models including FCN-16, FCN-8, and SegNet. After conducting sensitivity and comparative analysis, it is concluded that the U-Net variants performed the best in terms of the Intersection over Union (IoU) evaluation metric and other quality metrics.

Keywords: Autonomous Driving, Semantic Segmentation, U-Net, SegNet, FCN, Encoder-Decoder.
I. INTRODUCTION

As automotive technology evolves, the demand for Autonomous Vehicles (AV) with different levels of autonomy is increasing due to the increase in affordability and accessibility in different regions around the globe. The increase in the number of AVs will result in a safer driving experience and fewer injuries and deaths due to mistakes made by human drivers. Deploying accurate and efficient deep learning models trained on large real-world datasets with various scenarios is an essential part of autonomous driving to ensure the safety of the driver, the passengers, and the pedestrians.

Autonomous Vehicle Perception (AVP) extracts visual information about the surrounding environment of the vehicle. The perception data is then fed to a deep learning model to make the optimum decision. The two main tasks that an AVP system performs are Object Detection and Semantic Segmentation. Object detection [1][2][3] is the task of classifying and locating an object in an image and then drawing a bounding box around that object. Semantic Segmentation, on the other hand, performs per-pixel classification by classifying each pixel in an image into a certain class and assigning a unique color to each class.

In this paper, we tackled semantic segmentation using a very well-known semantic segmentation model originally used for biomedical image segmentation tasks, called U-Net. The name of the model is inspired by the shape of its architecture, which looks like the letter U. The U-Net model is one of the few existing architectures which perform well on small datasets, and it was not previously tested in an autonomous driving scenario with a large number of classes and a small number of training images. After training multiple U-Net models with different activation functions, regularization techniques, and different depths, we proved that U-Net could have a promising future in the field of autonomous driving and scene understanding due to its ability to answer the "What" and "Where" object questions. To the best of our knowledge, no research work highlights the use of the U-Net model in AVP with an extensive comparison with other commonly used semantic segmentation models. Thus, the main contributions of this paper are:

1) Surveying the most recent research work in Semantic Segmentation of urban areas.
2) Building five variants of the U-Net model.
3) Building two variants each of the SegNet, FCN-16, and FCN-8 models.
4) Extensive sensitivity and comparative analysis of different models.

The rest of this report is organized as follows: Section II discusses the literature review and related work, Section III introduces the proposed U-Net models and the adopted models used for comparison, Section IV shows the experimental results, and finally, Section V concludes the paper and discusses future directions.

II. RELATED WORK

AVP systems rely heavily on semantic segmentation to navigate through urban areas. Semantic segmentation assigns each pixel in the image to a particular class. All pixels belonging to a specific class are assigned a single color; as shown in Fig. 1, trees are painted green, roads are painted brown, cars are painted red, etc. Most semantic segmentation …
The authors in [6] tackled the problem of ignoring the different importance levels of classes in most semantic segmentation models. For instance, segmenting pedestrians and cars is more important than segmenting the sky. To avoid catastrophic collisions, cars, pedestrians, and many other essential classes must be segmented as accurately as possible. To tackle this problem, the authors proposed a loss function called "Importance-Aware Loss" (IAL) to emphasize the importance of critical objects in urban scenarios. Four semantic segmentation models were trained using the IAL loss function, namely SegNet, ENet, FCN, and ERFNet, and these models were tested on the "CityScapes" and "CamVid" datasets. Experimental results have shown that the proposed loss function improved the segmentation results on the important classes.

In [7], the authors built a Formula-SAE electric car equipped with a LiDAR sensor, but the LiDAR sensor could not accurately detect the road edges and road lane markings. To solve that problem, they installed and calibrated a low-cost monocular camera and used the SegNet model to detect the above-mentioned classes accurately. The experimental results on the "CamVid" dataset proved that their method improved the performance of detecting road edges and road lane markings.

The authors in [8] introduced a Dense Feature Pyramid Network (DFPN) based deep learning model to accurately extract the road markings by concatenating the shallow feature …

TABLE I. SEMANTIC SEGMENTATION MODELS AND DATASETS IN RELATED WORK

Ref.   Model                                Dataset
[7]    SegNet                               CamVid
[8]    DFPN                                 Self-collected
[9]    PointNet, PointCNN, SPGraph          Fused 3D point cloud
[10]   DeepLab v3+                          Cityscapes
[11]   ENet & Bonnet                        Cityscapes & USYD
[12]   Efficient Net                        Cityscapes
[13]   Road Profile Semantic Segmentation   Self-collected
[14]   RFNet                                Cityscapes & Lost and Found

III. LEVERAGING U-NET FOR URBAN SCENE SEGMENTATION

To perform semantic segmentation for scene understanding in autonomous vehicles, we have implemented five different variations of the U-Net model. The U-Net model was previously designed and implemented exclusively for medical image segmentation tasks [15]. As the model's name may imply, the model architecture (Fig. 2) has the letter 'U' shape.
The U-Net consists of two paths, a contracting path and an expansive path. The contracting path, also called the down-sampling path, consists of repeated blocks of two 3 x 3 convolutions with a Rectified Linear Unit (ReLU) as their activation function, followed by a 2 x 2 max pooling operation with a stride of 2 used to down-sample. In each down-sampling step of the contracting path, the number of feature channels is doubled; the image size gradually decreases and the depth increases. The expansive path, also called the up-sampling path, consists of an up-sampling of the feature map and a 2 x 2 convolution to halve the number of feature channels, a concatenation with the corresponding cropped feature map from the contracting path, and two 3 x 3 convolutions with a Rectified Linear Unit (ReLU) as their activation function. The final layer is a 1 x 1 convolution to map the feature vectors to the corresponding number of classes. The size of the image in the expansive path gradually increases, and the depth decreases.

Fig. 2. The U-Net Architecture

We trained two other U-Net models with four times smaller feature channels, as shown in Fig. 3, one with ReLU as its activation function and the second model with Leaky ReLU as its activation function.

Fig. 3. The U-Net Architecture with Smaller Feature Channels (Small U-Net)

We also trained two more U-Net variants and called them "Long U-Net", because we added two layers on the contracting path and two layers on the corresponding expansive path, as shown in Fig. 4. To help the model generalize better, we used a regularization technique called Dropout. We trained one "Long U-Net" model with a Dropout rate of 0.5 and another "Long U-Net" with a Dropout rate of 0.7 to analyze the model performance.

Fig. 4. The Long U-Net Architecture
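The building blocks just described can be sketched in Keras (the framework used for our implementation, as described later in the Training subsection). This is a minimal illustration only: the number of levels, channel widths, input size, and the placement of Dropout are assumptions for readability rather than the exact configuration of the trained variants.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def conv_block(x, filters, dropout=0.0):
    # Two 3x3 convolutions with ReLU, optionally followed by Dropout (regularization).
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    if dropout:
        x = layers.Dropout(dropout)(x)
    return x

def build_unet(input_shape=(512, 512, 3), num_classes=32,
               base_filters=64, dropout=0.0):
    inputs = layers.Input(shape=input_shape)

    # Contracting (down-sampling) path: channels double, spatial size halves.
    skips, x = [], inputs
    for i in range(4):
        x = conv_block(x, base_filters * 2 ** i, dropout)
        skips.append(x)
        x = layers.MaxPooling2D(pool_size=2, strides=2)(x)

    x = conv_block(x, base_filters * 16, dropout)  # bottleneck

    # Expansive (up-sampling) path: up-sample, halve channels, concatenate the skip.
    for i in reversed(range(4)):
        x = layers.UpSampling2D(size=2)(x)
        x = layers.Conv2D(base_filters * 2 ** i, 2, padding="same", activation="relu")(x)
        x = layers.Concatenate()([x, skips[i]])
        x = conv_block(x, base_filters * 2 ** i, dropout)

    # Final 1x1 convolution maps the feature vectors to per-pixel class scores.
    outputs = layers.Conv2D(num_classes, 1, activation="softmax")(x)
    return Model(inputs, outputs)
```

Under this sketch, the Small U-Net roughly corresponds to a four times smaller base_filters value, and the Long U-Net to one additional block on each path with the Dropout rate set to 0.5 or 0.7.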
A. The Adopted Models

We have adopted three commonly used semantic segmentation models, SegNet, FCN-16, and FCN-8, to compare their performance with the best performing U-Net model. In addition to the original three models, we have built three other variants whose only difference is the Dropout technique added to each model, because the best performing U-Net is the one with a Dropout rate of 0.5.

a) The SegNet model: The SegNet model consists of an encoder and a corresponding decoder network. The final layer performs pixel-wise classification of the input image, as shown in Fig. 5. Inspired by the VGG-16 network, which was designed for object classification, the authors used 13 convolutional layers in the encoder network (represented by blue boxes), followed by pooling layers (represented by green boxes) to reduce the dimensions of the feature maps. They discarded the fully connected layers to retain higher resolution feature maps at the encoder output. By discarding the three fully connected layers of VGG-16, the authors drastically reduced the number of SegNet model parameters. Each encoder layer has a corresponding decoder layer; the decoder network also has 13 layers, preceded by up-sampling layers (represented by red boxes) to make the output feature maps the same size as the input. The decoder output is fed to a soft-max classifier, which produces class probabilities for each pixel, and the prediction corresponds to the class with the maximum probability at every pixel.

Fig. 5. The SegNet Architecture [16]
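For concreteness, the encoder-decoder structure described above can be outlined as follows. This is a simplified, hedged sketch: it truncates the VGG-16-style encoder, uses plain up-sampling instead of the pooling-index unpooling of the original SegNet [16], and its layer counts and filter widths are illustrative assumptions.

```python
from tensorflow.keras import layers, Model

def build_segnet_like(input_shape=(512, 512, 3), num_classes=32):
    inputs = layers.Input(shape=input_shape)
    x = inputs

    # Encoder: VGG-style convolution blocks, each followed by pooling
    # (the real SegNet reuses the 13 convolutional layers of VGG-16).
    encoder_filters = [64, 128, 256]
    for f in encoder_filters:
        x = layers.Conv2D(f, 3, padding="same", activation="relu")(x)
        x = layers.Conv2D(f, 3, padding="same", activation="relu")(x)
        x = layers.MaxPooling2D(2)(x)

    # Decoder: mirrors the encoder, up-sampling back to the input resolution.
    for f in reversed(encoder_filters):
        x = layers.UpSampling2D(2)(x)
        x = layers.Conv2D(f, 3, padding="same", activation="relu")(x)
        x = layers.Conv2D(f, 3, padding="same", activation="relu")(x)

    # Pixel-wise soft-max classifier over the class channels.
    outputs = layers.Conv2D(num_classes, 1, activation="softmax")(x)
    return Model(inputs, outputs)
```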
b) Fully Convolutional Network (FCN-16, FCN-8): We have implemented the FCN-16 and FCN-8 [17][18] only, because FCN-32 had proven its poor performance in the literature: at the output of conv7, as shown in Fig. 6, the image size becomes very small, and to make the segmentation output the same size as the input image, 32x up-sampling is performed, which makes the output very rough, because when going deeper the spatial location information is lost. That is why FCN-16 and FCN-8 perform better: they both use two and four times less up-sampling. In the FCN-16 network, the output of conv7 is 2x up-sampled, fused with pool4, and then 16x up-sampling is performed. In the FCN-8 architecture, the output of conv7 is 4x up-sampled, fused with 2x up-sampled pool4 and with pool3, and then 8x up-sampling is performed.

Fig. 6. The Architecture of FCN (FCN-32, FCN-16, FCN-8) [19]
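The up-sample-and-fuse pattern just described can be sketched as below. The tensors pool3, pool4, and conv7 are assumed to be feature maps exposed by a backbone at strides 8, 16, and 32; the use of bilinear up-sampling (rather than learned transposed convolutions) is a simplifying assumption, not the exact FCN-8 implementation of [19].

```python
from tensorflow.keras import layers

def fcn8_head(pool3, pool4, conv7, num_classes=32):
    # Score each feature map with a 1x1 convolution (per-class logits).
    s3 = layers.Conv2D(num_classes, 1)(pool3)   # stride 8
    s4 = layers.Conv2D(num_classes, 1)(pool4)   # stride 16
    s7 = layers.Conv2D(num_classes, 1)(conv7)   # stride 32

    # Bring everything to stride 8: conv7 is 4x up-sampled, pool4 is 2x up-sampled.
    s7_up = layers.UpSampling2D(4, interpolation="bilinear")(s7)
    s4_up = layers.UpSampling2D(2, interpolation="bilinear")(s4)
    fused = layers.Add()([s3, s4_up, s7_up])

    # Final 8x up-sampling back to the input resolution, then per-pixel soft-max.
    logits = layers.UpSampling2D(8, interpolation="bilinear")(fused)
    return layers.Softmax()(logits)
```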
B. Training

The code was written in the Python 3 programming language, using the TensorFlow library as the backend and the Keras library as the frontend. The models were trained on the Google Colab platform using the GPU provided by Google. Due to memory and GPU time restrictions imposed by Google Colab, we had to use a widely used dataset with a small number of images, called the Cambridge-Driving Labeled Video Database (CamVid) [20], which consists of 701 overall images. We used "Adam" as the optimizer for all models and "Categorical Cross Entropy" as the loss function, and the batch size was set to 32, as per the best practice in Machine Learning research. The maximum number of epochs (iterations) was set to 100. Still, we used the early stopping method to stop training when the models' validation mean Intersection over Union (mIoU) does not improve after ten epochs.
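A minimal sketch of this training setup in Keras is shown below. The variable names (model, x_train, y_train, x_val, y_val) and the metric wiring are assumptions for illustration; in particular, monitoring the validation mIoU through OneHotMeanIoU requires a recent TensorFlow release.

```python
import tensorflow as tf

# model: a Keras segmentation model such as the U-Net sketch above.
# x_train, y_train, x_val, y_val: NumPy arrays of images and one-hot masks,
# prepared as described in the Preprocessing subsection.
model.compile(
    optimizer="adam",                                # Adam optimizer for all models
    loss="categorical_crossentropy",                 # Categorical Cross Entropy loss
    metrics=[tf.keras.metrics.OneHotMeanIoU(num_classes=32, name="mean_iou")],
)

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_mean_iou",     # stop when the validation mIoU...
    mode="max",
    patience=10,                # ...does not improve for ten epochs
    restore_best_weights=True,
)

model.fit(x_train, y_train,
          validation_data=(x_val, y_val),
          batch_size=32, epochs=100,
          callbacks=[early_stop])
```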
C. Performance Evaluation Metrics

Performance evaluation is required to evaluate and optimize any machine learning model and to compare it with other models. Different evaluation metrics are used in the literature; this section describes the most efficient and widely used metrics in semantic segmentation tasks. The Intersection over Union (IoU) metric, also known as the Jaccard Index, is widely used to evaluate semantic segmentation models. It computes the percent overlap between the ground truth mask and the prediction output. As shown in Eq. 1, IoU measures the number of common pixels between the prediction and ground truth masks and divides it by the total number of pixels present in both masks. Multi-class segmentation tasks use the mean Intersection over Union (mIoU) metric for model evaluation, which first computes the IoU of each class and then computes the average over all classes.

IoU = |Target ∩ Predicted| / |Target ∪ Predicted|    (1)

Accuracy (Eq. 2) is the most used evaluation metric in Machine Learning research, but it is unreliable in semantic segmentation tasks. It measures all the correctly identified classes and is helpful when all the classes are equally important.

Accuracy = (True Positive + True Negative) / (True Positive + True Negative + False Positive + False Negative)    (2)

We used the F1-Score [21]-[25], a better evaluation metric than Accuracy for imbalanced class distributions; it is measured by calculating the harmonic mean of the Precision and Recall. The F1-Score is calculated by Eq. 5.

Precision = True Positive / (True Positive + False Positive)    (3)

Recall = True Positive / (True Positive + False Negative)    (4)

F1-Score = 2 * (Precision * Recall) / (Precision + Recall)    (5)

We also used the Dice Coefficient (Eq. 6), which is similar to IoU because they are positively correlated. It calculates the area of overlap between the ground truth and the predicted mask, divides it by the total number of pixels in both masks, and then doubles the resulting value.

Dice Coefficient = 2 * (Area of Overlap) / (Total Number of Pixels in both Masks)    (6)
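For reference, these metrics can be computed from a predicted mask and a ground-truth mask roughly as in the NumPy sketch below; it assumes integer class-index masks and only illustrates Eqs. 1, 2, and 6, not the exact evaluation code behind the reported numbers.

```python
import numpy as np

def segmentation_metrics(pred, target, num_classes=32):
    """pred, target: integer class-index masks of identical shape."""
    ious, dices = [], []
    for c in range(num_classes):
        p, t = (pred == c), (target == c)
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue                                   # class absent from both masks
        inter = np.logical_and(p, t).sum()
        ious.append(inter / union)                     # Eq. 1 (per-class IoU)
        dices.append(2 * inter / (p.sum() + t.sum()))  # Eq. 6 (per-class Dice)
    miou = float(np.mean(ious))                        # mIoU: average over present classes
    dice = float(np.mean(dices))
    accuracy = float((pred == target).mean())          # Eq. 2 (pixel accuracy)
    return miou, dice, accuracy
```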
IV. EXPERIMENTAL RESULTS AND ANALYSIS

This section will show and analyze the results of implementing eleven models and compare them based on their Accuracy, Loss, mIoU, F1-Score, and Dice coefficient. We will also demonstrate the performance of each model on three different images from the test set.
A. Dataset

The dataset we used is called the Cambridge-Driving Labeled Video Database (CamVid). It provides per-pixel semantic segmentation of over 700 images and their corresponding Ground Truth masks (labels): 367 training, 101 validation, and 233 test pairs over 32 semantic classes. The semantic classes are of the commonly existing objects in a regular driving scene, ranging from Cars, Pedestrians, Animals, Buildings, Sidewalks, and Traffic Lights to many more. The overall database consists of ten minutes of high-quality 30 Hz footage, and the images we used to train our models were captured at 1 Hz.
B. Preprocessing

The images and masks are in separate folders, and we paired each image with its corresponding mask. Both images and masks are resized to 512 x 512. Images are converted to NumPy arrays for easier tensor calculations and normalized by dividing by 255. The masks are also converted to NumPy arrays and are mapped to the corresponding classes. The classes are given in an Excel sheet with the 'r', 'g', and 'b' values of each class.
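A minimal sketch of this preprocessing is given below; the folder layout, the Excel file name, and its column names ('r', 'g', 'b') are assumptions, since only the general procedure is described above.

```python
import numpy as np
import pandas as pd
from PIL import Image

def load_pair(image_path, mask_path, class_colors, size=(512, 512)):
    """class_colors: list of (r, g, b) tuples, one per class, read from the Excel sheet."""
    image = np.asarray(Image.open(image_path).resize(size), dtype=np.float32) / 255.0
    mask_rgb = np.asarray(Image.open(mask_path).resize(size, Image.NEAREST))

    # Map every RGB mask pixel to its class index, then one-hot encode the labels.
    label = np.zeros(size, dtype=np.int64)
    for class_idx, color in enumerate(class_colors):
        label[np.all(mask_rgb == color, axis=-1)] = class_idx
    one_hot = np.eye(len(class_colors), dtype=np.float32)[label]
    return image, one_hot

# Hypothetical class sheet: one row per class with its 'r', 'g', 'b' values.
# class_df = pd.read_excel("class_dict.xlsx")
# class_colors = list(zip(class_df["r"], class_df["g"], class_df["b"]))
```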
C. Experimental Results

After building a total of eleven models (5 U-Net, 2 SegNet, 2 FCN-16, and 2 FCN-8), we can conclude that the Long U-Net with Dropout = 0.5 performed the best, based on the mean Intersection over Union (mIoU) evaluation metric, with an mIoU of 0.5731, as shown in Table II and Fig. 7. The superiority of this model lies in the depth of its architecture, in the Dropout regularization that was used, which improved the generalization of the model, and in the fact that the model converged at the 100th epoch, unlike other models that converged at a much smaller epoch.

Fig. 7. The mIoU of the Long U-Net model with Dropout = 0.5

The sensitivity analysis we performed by building five U-Net models was to find the best performing U-Net model. We first implemented the original U-Net model with ReLU. We then built two other models with smaller feature channels, one with ReLU and the other with Leaky ReLU as their activation functions. At this stage, we could not see a difference in terms of mIoU; instead, the accuracies for the models with smaller feature channels were lower, and the losses were higher than the original U-Net. To obtain a bigger difference, the model architecture needs to change. We built two deeper models with two layers added on each path (Long U-Net), with different dropout rates, which better learned the sophisticated features in the dataset. The two models scored a higher mIoU than the previous models, with the dropout rate of 0.5 performing the best.

To make a fair comparison with the three models SegNet, FCN-16, and FCN-8, we implemented two variants of each of the three models: the first one is the original model and the second variant uses the dropout regularization technique with a dropout rate of 0.5, and the Long U-Net models still outperformed the three other models. We also compared the models in terms of their Loss, Accuracy, F1-Score, and Dice coefficient. The U-Net model with the ReLU activation function recorded the lowest loss of 0.3823. The Long U-Net with a Dropout rate of 0.7 recorded the highest accuracy of 89.85% and the highest Dice Coefficient of 0.919, and the Long U-Net with a Dropout rate of 0.5 recorded the highest F1-Score of 0.6384, as shown in Table II and Figure 8.

TABLE II. (LOSS VS. ACCURACY)

Model                        Loss     Accuracy
U-Net ReLU                   0.3823   89.74%
Small U-Net ReLU             0.5254   87.96%
Small U-Net Leaky ReLU       0.4436   87.80%
Long U-Net (Dropout=0.7)     0.3971   89.85%
Long U-Net (Dropout=0.5)     0.3901   89.13%
SegNet                       0.4978   85.37%
SegNet (Dropout=0.5)         0.5652   83.97%
FCN-16                       0.5027   84.92%
FCN-16 (Dropout=0.5)         0.4482   85.85%
FCN-8                        0.4860   86.03%
FCN-8 (Dropout=0.5)          0.4216   87%

Fig. 8. Performance Measures (mIoU, F1-Score, and Dice Coefficient)

In Table III, we have three samples of images with different scenarios and lighting conditions and corresponding ground truth labels, and we tested each image on all eleven models.
TABLE III. SEGMENTATION RESULTS (ALL MODELS)

[Table III presents, for three test images, the original image, the true label, and the predicted segmentation masks of the Long U-Net (Dropout=0.5), SegNet, SegNet (Dropout=0.5), FCN-16 (Dropout=0.5), and FCN-8 (Dropout=0.5) models.]
We can see that most U-Net models performed better in segmenting the images than the FCN-16, FCN-8, and SegNet models, but the best segmentations were performed by the Long U-Net model, as expected from the numerical analysis and comparison. It performed very well in the first two images due to the good lighting conditions in both images. In the third image, we can see that even though the lighting condition is poor, the Long U-Net model performed well in segmenting most of the objects, except for the bus, which can be traced to the lack of training images with busses in them. The red segmentations instead of pink are due to the bus's height, which causes the models to mistake it for a building.
V. CONCLUSIONS AND FUTURE DIRECTIONS

As the adoption of autonomous vehicles with different levels of autonomy increases, the need for precise and accurate perception systems increases drastically to ensure the safety of the passengers, the pedestrians, and the drivers of the surrounding vehicles. Based on our extensive experiments presented in this project, we can conclude that U-Net can precisely classify and localize a wide range of objects in a complex driving environment and outperform previously used well-known models in terms of mIoU, F1-Score, and accuracy.

In the future, we will train the U-Net models with different data augmentation techniques and with better computing power. To tackle the problem of poorly segmenting certain classes, such as the bus in the third image or the traffic poles, we will increase the weights on such classes and decrease the weights of other, less necessary classes to improve the model's overall performance. We will also train and test different variations of the U-Net model on larger datasets and compare it with other state-of-the-art semantic segmentation models. On the other hand, we will implement ensemble learning algorithms by combining multiple state-of-the-art models to achieve the best performance possible. We will also take advantage of unlabeled datasets to train unsupervised learning algorithms, and those models can automatically learn complex road features with minimal human input.
REFERENCES

[1] H. Kim, Y. Lee, B. Yim, E. Park and H. Kim, "On-road object detection using deep neural network," in IEEE International Conference on Consumer Electronics-Asia (ICCE-Asia), Seoul, Korea (South), 2016.
[2] J. Yang, C. Wang, H. Wang and Q. Li, "A RGB-D based real-time multiple object detection and ranging system for autonomous driving," IEEE Sensors Journal, vol. 20, pp. 11959-11966, 2020.
[3] G. Prabhakar, B. Kailath, S. Natarajan and R. Kumar, "Obstacle detection and classification using deep learning for tracking in high-speed autonomous driving," in IEEE Region 10 Symposium (TENSYMP), Cochin, India, 2017.
[4] M. Cordts et al., "The Cityscapes dataset for semantic urban scene understanding," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[5] Y. G. Naresh, S. Little and N. E. O'Connor, "A residual encoder-decoder network for semantic segmentation in autonomous driving scenarios," in European Signal Processing Conference (EUSIPCO), 2018.
[6] B. Chen, C. Gong and J. Yang, "Importance-aware semantic segmentation for autonomous vehicles," IEEE Transactions on Intelligent Transportation Systems, vol. 20, pp. 137-148, 2019.
[7] K. L. Lim, T. Drage and T. Braunl, "Implementation of semantic segmentation for road and lane detection on an autonomous ground vehicle with LIDAR," in International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI), 2017.
[8] S. Chen, Z. Zhang, R. Zhong, L. Zhang, H. Ma and L. Liu, "A dense feature pyramid network-based deep learning model for road marking instance segmentation using MLS point clouds," IEEE Transactions on Geoscience and Remote Sensing, vol. 59, pp. 784-800, 2021.
[9] C. Lowphansirikul, K.-S. Kim, P. Vinayaraj and S. Tuarob, "3D semantic segmentation of large-scale point-clouds in urban areas using deep learning," in 11th International Conference on Knowledge and Smart Technology (KST), Phuket, Thailand, 2019.
[10] M. H. Hamian, A. Beikmohammadi, A. Ahmadi and B. Nasersharif, "Semantic segmentation of autonomous driving images by the combination of deep learning and classical segmentation," in CSI International Computer Conference (CSICC), 2021.
[11] W. Zhou, J. S. Berrio, S. Worrall and E. Nebot, "Automated evaluation of semantic segmentation robustness for autonomous driving," IEEE Transactions on Intelligent Transportation Systems, 2018.
[12] J. S. L. and T. H. Park, "Semantic segmentation with improved edge detail for autonomous vehicles," in IEEE 16th International Conference on Automation Science and Engineering (CASE), Hong Kong, China, 2020.
[13] G. Cheng, J. Y. Zheng and M. Kilicarslan, "Semantic segmentation of road profiles for efficient sensing in autonomous driving," in IEEE Intelligent Vehicles Symposium, 2019.
[14] L. Sun, K. Yang, X. Hu, W. Hu and K. Wang, "Real-time fusion network for RGB-D semantic segmentation incorporating unexpected obstacle detection for road-driving images," IEEE Robotics and Automation Letters, vol. 5, no. 4, 2020.
[15] O. Ronneberger, P. Fischer and T. Brox, "U-Net: Convolutional networks for biomedical image segmentation," LNCS, vol. 9351, pp. 234-241, 2015.
[16] V. Badrinarayanan, A. Kendall and R. Cipolla, "SegNet: A deep convolutional encoder-decoder architecture for image segmentation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 12, pp. 2481-2495, Dec. 2017.
[17] J. Cheng, L. Ye, Y. Guo, J. Zhang and H. An, "Ground crack recognition based on fully convolutional network with multi-scale input," IEEE Access, vol. 8, pp. 53034-53048, 2020.
[18] C. Kaymak and A. Ucar, "Semantic image segmentation for autonomous driving using fully convolutional networks," in International Artificial Intelligence and Data Processing Symposium (IDAP), 2019, pp. 1-8.
[19] J. Long, E. Shelhamer and T. Darrell, "Fully convolutional networks for semantic segmentation," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 3431-3440.
[20] G. J. Brostow, J. Shotton, J. Fauqueur and R. Cipolla, "Segmentation and recognition using structure from motion point clouds," in ECCV, 2008.
[21] Z. Karami and R. Kashef, "Smart transportation planning: Data, models, and algorithms," Transportation Engineering, vol. 2, 100013, 2020.
[22] R. Kashef, "A boosted SVM classifier trained by incremental learning and decremental unlearning approach," Expert Systems with Applications, vol. 167, 114154, 2021.
[23] R. Kashef, "Scattering-based quality measures," in 2021 IEEE International IOT, Electronics and Mechatronics Conference (IEMTRONICS), 2021, pp. 1-8.
[24] R. Kashef, "Enhancing the role of large-scale recommendation systems in the IoT context," IEEE Access, vol. 8, pp. 178248-178257, 2020.
[25] D. Nawara and R. Kashef, "Context-aware recommendation systems in the IoT environment (IoT-CARS): A comprehensive overview," IEEE Access, 2021.