
Engineering Applications of Artificial Intelligence 127 (2024) 107260

Contents lists available at ScienceDirect

Engineering Applications of Artificial Intelligence

journal homepage: www.elsevier.com/locate/engappai

Image semantic segmentation approach based on DeepLabV3 plus network with an attention mechanism
Yanyan Liu a, Xiaotian Bai b, Jiafei Wang a, Guoning Li b, **, Jin Li c, *, Zengming Lv b
a Department of Electronics and Information Engineering, Changchun University of Science and Technology, Changchun, 130022, China
b Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences (CIOMP), Changchun, 130033, China
c School of Instrumentation and Optoelectronic Engineering, Beihang University, Beijing, 100191, China

ABSTRACT

Image semantic segmentation is a technique that distinguishes different kinds of objects in an image by assigning each pixel a category label based on its "semantics". The DeepLabv3+ image semantic segmentation method currently in use has high computational complexity and large memory consumption, making it difficult to deploy on embedded platforms with limited computational power. When extracting image feature information, DeepLabv3+ struggles to fully utilize multiscale information, which can result in a loss of detail and reduced segmentation accuracy. An improved image semantic segmentation method based on the DeepLabv3+ network is proposed, with the lightweight MobileNetV2 serving as the model's backbone. The ECA-Net channel attention mechanism is applied to low-level features, reducing computational complexity and improving target boundary clarity. The polarized self-attention mechanism is introduced after the ASPP module to improve the spatial feature representation of the feature map. Validated on the VOC2012 dataset, the experimental results indicate that the improved model achieved an MIoU of 69.29% and a mAP of 80.41%; it predicts finer semantic segmentation results and effectively optimizes model complexity and segmentation accuracy.

1. Introduction

The emergence of artificial intelligence (AI) has dramatically changed every aspect of our lives. The concept of semantic segmentation is easy to understand: when people see a picture, they readily understand its content, and semantic segmentation allows a machine to do the same. Its practical applications are increasingly extensive, for example, scene recognition in autonomous driving, surgical navigation in medical image segmentation, and advertising recommendation. This wide applicability gives image semantic segmentation high practical value (Iftikhar et al., 2022, 2023).

To date, many different semantic segmentation algorithms have been proposed, including traditional and deep learning methods. Traditional methods range from thresholding (Otsu, 1979), histogram-based bundling, region growing (Nock and Nielsen, 2004), k-means clustering (Dhanachandra et al., 2015), and watersheds (Najman et al., 1994) to more advanced algorithms such as active contours (Dhanachandra et al., 2015), graph cut (Najman et al., 1994), conditional and Markov random fields (Kass et al., 2004), and sparsity-based methods (Boykov et al., 2001; Plath et al., 2009). To compensate for the shortcomings of traditional methods, deep learning semantic segmentation methods fall into two types according to model structure: those based on information fusion and those based on an encoder-decoder (Minaee et al., 2021). Information fusion methods improve model utilization by increasing the number of network layers (Starck et al., 2005; Minaee et al., 2017). Representative algorithms include the fully convolutional network (FCN) and a series of improved algorithms (Biao et al., 2018), such as FCN–32S, FCN–16S, and FCN–8S. Encoder-decoder methods (Liu et al., 2018; Fu et al., 2022) improve network accuracy by adopting different backbone networks and pyramid pooling modules. Representative algorithms include the pyramid scene parsing network (PSPNet) (Sun and Wang, 2018) and the DeepLab series. The current method based on DeepLabv3+ has high computational complexity and large memory consumption, and it is difficult to deploy on embedded platforms with limited computational power. DeepLabv3+ cannot fully utilize multiscale information when extracting image features, which easily causes a loss of detail and reduces segmentation accuracy. To further improve the ability

* Corresponding author.
** Corresponding author.
E-mail addresses: [email protected] (Y. Liu), [email protected] (X. Bai), [email protected] (J. Wang), [email protected]
(G. Li), [email protected] (J. Li), [email protected] (Z. Lv).

https://doi.org/10.1016/j.engappai.2023.107260
Received 3 May 2023; Received in revised form 15 September 2023; Accepted 3 October 2023
Available online 10 October 2023
0952-1976/© 2023 Elsevier Ltd. All rights reserved.

Fig. 1. Deeplabv3 plus model.

Fig. 2. Improved DeepLabv3 plus.

of the DeepLabv3 plus network to obtain key category information, improvements are made based on DeepLabv3 plus. The main contributions of this paper are summarized as follows.

1. The DeepLabv3+ network is improved to suit the needs of realistic scenarios. Because the original feature extraction network has too many parameters, the model adopts the lightweight MobileNetV2 as the backbone network, which is further optimized to solve the problems of spatial detail loss and insufficient feature extraction.
2. In DeepLabv3+, the polarized self-attention mechanism (PSA-P, PSA-S) is added after the ASPP module to strengthen the feature map's ability to capture detailed information and thereby improve the accuracy of semantic segmentation. A channel attention mechanism (ECA-Net) is added after the MobileNetV2 low-level features to recover clearer segmentation boundaries.
3. Strip pooling is used in the ASPP module instead of the original global average pooling to effectively capture long-range dependencies, and hybrid pooling is used instead of the original global average pooling to effectively capture both short-range and long-range interdependencies between different locations, thus improving the efficiency and reliability of the system.

2. DeepLabv3 plus network

The DeepLabv3 plus network (Yang et al., 2020) is shown in Fig. 1. The role of the backbone network is to extract semantic feature information (Zhao et al., 2017). The function of ASPP is to further extract feature information from the backbone output to obtain sufficient feature information. DCNN denotes a deep convolutional neural network. The ASPP module is composed of five parallel parts: a 1 × 1 convolution, three 3 × 3 convolutions with dilation rates of 6, 12, and 18, respectively, and global average pooling. The low-level features of the backbone network pass through a 1 × 1 convolution. The ASPP output is upsampled by a factor of 4 and fused with these low-level features, and the result passes through a 3 × 3 convolution and a further 4× upsampling to recover the size of the image.

3. Improved DeepLabv3 plus network

The DeepLabv3 plus model is taken as the main body for improvement. For image semantic segmentation based on the DeepLabv3 plus network, this paper uses the lightweight MobileNetV2 as the backbone network. Then, ASPP is used to extract multiscale information from the


Fig. 3. Structure of strip pooling.

Fig. 4. PSA in parallel.

Fig. 5. PSA in series.


feature maps obtained in the backbone network, while strip pooling replaces global pooling to retain more detailed information. An attention mechanism is introduced: a polarized self-attention mechanism weights the feature maps obtained by the ASPP module, and ECA-Net is added to fuse the shallow features of MobileNetV2 and improve image segmentation performance. The improved model is shown in Fig. 2.

Fig. 6. ECA-Net diagram.

Table 1
Comparison results of ASPP improvement experiments.

Algorithm            Backbone       MIoU      mAP
Deeplabv3 plus       MobileNetV2    66.16%    78.75%
Deeplabv3 plus-SP    MobileNetV2    67.6%     78.6%

Table 2
Comparison of different attention mechanisms.

Backbone       Attention    MIoU      mAP
MobileNetV2    ECA-Net      66.95%    79.64%
MobileNetV2    PSA_p        67.3%     80.34%
MobileNetV2    PSA_s        67.74%    81.3%

Table 3
Comparison of network segmentation accuracy by integrating different modules.

Group    SP    PSA_p    PSA_s    ECA-Net    MIoU      mAP
①        ×     ×        ×        ×          66.16%    78.75%
②        ✓     ✓        ×        ×          68.67%    80.34%
③        ✓     ×        ✓        ×          69.05%    79.65%
④        ✓     ✓        ×        ✓          68.74%    79.01%
⑤        ✓     ×        ✓        ✓          69.77%    79.29%

3.1. Strip pooling

The pooling window of global average pooling is square, which has certain limitations: it is difficult to capture correlations across different scales in different directions. Strip pooling has advantages over global average pooling. Its pooling window is rectangular, so it can gather global information along the horizontal and vertical dimensions, expanding the range over which feature information is obtained (Hou et al., 2020).

Unlike global average pooling, strip pooling is performed simultaneously along the horizontal and vertical spatial dimensions, and along each dimension the feature values of a row or a column are averaged. The model structure is shown in Fig. 3.

Consider an input $X \in \mathbb{R}^{C \times H \times W}$, where C is the number of channels and H and W are the height and width, respectively, and let $x_{i,j}$ denote the entries of one channel. The row vector output is calculated as follows:

$$y^h_i = \frac{1}{W} \sum_{0 \le j < W} x_{i,j} \tag{1}$$

The column vector output is calculated as follows:

$$y^v_j = \frac{1}{H} \sum_{0 \le i < H} x_{i,j} \tag{2}$$

X enters the horizontal and vertical paths for pooling, and the outputs of the two paths are $y^h \in \mathbb{R}^{C \times H}$ and $y^v \in \mathbb{R}^{C \times W}$, respectively. After combining the two, the output is calculated as follows:

$$y_{c,i,j} = y^h_{c,i} + y^v_{c,j} \tag{3}$$

A convolution and a sigmoid function are then applied to obtain a weight map, which is fused with the original input to obtain the output z:

$$z = \mathrm{Scale}(X, \sigma(f(y))) \tag{4}$$

In the above formula, Scale(·,·) represents element-wise multiplication, σ represents the sigmoid function, and f represents a 1 × 1 convolution.
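As an illustration of Eqs. (1) to (4) in Section 3.1, the following minimal Python sketch applies strip pooling to a single H × W channel. It is not the authors' implementation: the function name strip_pool and the scalar weight and bias, which stand in for the 1 × 1 convolution f on one channel, are assumptions made for this example.

```python
import math


def strip_pool(x, weight=1.0, bias=0.0):
    """Strip pooling for one H x W channel, following Eqs. (1)-(4).

    `weight` and `bias` are hypothetical stand-ins for the 1 x 1
    convolution f, which reduces to a scalar affine map on one channel.
    """
    h, w = len(x), len(x[0])
    # Eq. (1): average each row over its W entries -> H values
    y_h = [sum(row) / w for row in x]
    # Eq. (2): average each column over its H entries -> W values
    y_v = [sum(x[i][j] for i in range(h)) / h for j in range(w)]
    # Eq. (3): expand both strips back to H x W and add them
    y = [[y_h[i] + y_v[j] for j in range(w)] for i in range(h)]
    # Eq. (4): gate the input element-wise with sigmoid(f(y))
    z = [
        [x[i][j] / (1.0 + math.exp(-(weight * y[i][j] + bias))) for j in range(w)]
        for i in range(h)
    ]
    return y_h, y_v, y, z


# Usage on a toy 2 x 3 channel
y_h, y_v, y, z = strip_pool([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
print(y_h)  # row averages
print(y_v)  # column averages
```

Each output position thus depends on its entire row and its entire column, which is how the strip-shaped window gathers long-range context that a square window does not.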

Fig. 7. Comparison chart of category segmentation accuracy.
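The per-category comparison in Fig. 7 rests on the per-class intersection over union, whose mean is the MIoU of Eq. (7); the mean pixel accuracy follows Eq. (8). The sketch below shows one way to compute both from a confusion matrix in plain Python. The function names and the toy labels are hypothetical, and skipping classes whose denominator is zero is an assumption of this sketch, not something stated in the paper.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """p[i][j] counts pixels whose true class is i and predicted class is j."""
    p = [[0] * num_classes for _ in range(num_classes)]
    for t, q in zip(y_true, y_pred):
        p[t][q] += 1
    return p


def per_class_iou(p):
    """Inner term of Eq. (7): p_ii / (sum_j p_ij + sum_j p_ji - p_ii)."""
    n = len(p)
    ious = []
    for i in range(n):
        row = sum(p[i])                       # pixels whose true class is i
        col = sum(p[j][i] for j in range(n))  # pixels predicted as class i
        union = row + col - p[i][i]
        ious.append(p[i][i] / union if union else None)
    return ious


def per_class_pa(p):
    """Inner term of Eq. (8): p_ii / sum_j p_ij."""
    return [p[i][i] / sum(p[i]) if sum(p[i]) else None for i in range(len(p))]


def mean_defined(values):
    """Average over classes with a defined score (assumption: skip undefined)."""
    kept = [v for v in values if v is not None]
    return sum(kept) / len(kept)


# Usage: six pixels, three classes (toy data)
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
p = confusion_matrix(y_true, y_pred, 3)
miou = mean_defined(per_class_iou(p))  # (1/3 + 2/3 + 1/2) / 3, about 0.5
mpa = mean_defined(per_class_pa(p))    # (1/2 + 1 + 1/2) / 3, about 0.667
print(miou, mpa)
```

Plotting the values returned by per_class_iou for two models side by side would give a chart of the same kind as Fig. 7.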


Fig. 8. Comparison of PASCAL VOC 2012 dataset segmentation results.

3.2. Polarized self-attention mechanism

We are all familiar with the concept of attention (Zeng et al., 2020). When people look at a picture, they cannot attend to all of it at once; the eyes are drawn to the parts of interest, and the rest is ignored. The attention mechanism in neural networks exploits this characteristic to screen out effective information from complex information (Chen et al., 2017a). In image processing, attention locks onto one part of the image while ignoring other areas, which improves processing efficiency and avoids unnecessary computation. With the rapid development of attention mechanisms, an increasing number of neural network models have added them (Zhang et al., 2020; Honarbakhsh et al., 2023) to improve model efficiency, with good effect. This paper adds polarized self-attention and channel attention mechanisms to the DeepLabv3 plus network. The two attention mechanisms are added at different locations in the network, and both show good performance.

The polarized self-attention mechanism (Hridoy et al., 2021; Liu et al., 2021) has two main forms, series and parallel. In the series form, the channel self-attention mechanism and the spatial self-attention mechanism are applied one after the other. In the parallel form, the two are applied side by side. The two forms together constitute the polarized self-attention mechanism. With the polarized self-attention mechanism inserted after the ASPP module (Yang, 2020; Zhu et al., 2019), the model can extract more important information and improve the utilization of the model. PSA_p and PSA_s can maintain high resolution in the channel and spatial dimensions, which is why they are


increasingly widely used in deep learning networks. The model diagrams are shown in Figs. 4 and 5.

Both the series and parallel forms of the polarized self-attention mechanism are divided into two branches: a channel branch and a spatial branch.

The channel weight calculation formula is as follows:

$$A^{ch}(X) = F_{SG}\left[ W_{z|\theta_1}\left( \sigma_1(W_v(X)) \times F_{SM}(\sigma_2(W_q(X))) \right) \right] \tag{5}$$

where $W_q$ and $W_v$ are 1 × 1 convolutions, $\sigma_1$ and $\sigma_2$ are tensor reshape operators (Liu et al., 2021), and $F_{SM}$ represents the softmax function. $W_{z|\theta_1}$ represents a 1 × 1 convolution followed by layer normalization (LN), which raises the channel dimension from C/2 to C. $F_{SG}$ represents the sigmoid function.

The spatial weight calculation formula is as follows:

$$A^{sp}(X) = F_{SG}\left[ \sigma_3\left( F_{SM}(\sigma_1(F_{GP}(W_q(X)))) \times \sigma_2(W_v(X)) \right) \right] \tag{6}$$

where $W_q$ and $W_v$ are 1 × 1 convolutions, and $\sigma_1$, $\sigma_2$, and $\sigma_3$ are tensor reshape operators (Liu et al., 2021). $F_{SM}$ represents the softmax function, $F_{GP}$ represents global pooling, and $F_{SG}$ represents the sigmoid function.

The above formulas give the weights of the two branches, and the polarized self-attention mechanism fuses the features based on these branch weights. The parallel and series forms are simply two ways of combining the branch weights, analogous to addition and multiplication.

3.3. ECA attention mechanism

The advantage of ECA-Net (Liu, 2020) is that it uses global pooling to transform spatial matrices into one-dimensional vectors (see Fig. 6). The size of the one-dimensional convolution kernel is then determined from the number of network channels. This adaptively sized kernel is used for the convolution operation, which yields a weight for each channel of the input feature map. Finally, the input feature map is multiplied by these weights to extract the information of interest. Because the backbone network is pretrained, inserting ECA-Net inside MobileNetV2 would damage the network structure of the backbone. Therefore, ECA-Net is applied to the shallow features of MobileNetV2, which improves the segmentation effect without damaging the network.

4. Experiments

4.1. Datasets

The PASCAL VOC2012 dataset is widely used in image processing and is suitable for image semantic segmentation. It covers four main types of objects: indoor furniture, people, vehicles, and common animals, with 21 categories in total. A total of 3200 images are randomly selected and divided in a ratio of approximately 9:1:1: 2616 images are used as the training set, 292 as the validation set, and 292 as the testing set.

4.2. Experimental equipment and evaluation indicators

The operating system is Ubuntu 20.04, with the PyTorch 1.2.0 open source deep learning framework and CUDA 10.0. The programming language is Python 3.6, and the hardware configuration is as follows: the CPU is an i7-9600, and the GPU is an NVIDIA 3060-Ti. The mean intersection over union (MIoU) and mean pixel accuracy (mPA, reported as mAP in the tables) are used as performance evaluation metrics for image semantic segmentation. Here k represents the number of categories, $p_{ij}$ is the number of pixels whose true value is i and whose predicted value is j, $p_{ji}$ is the number of pixels whose true value is j and whose predicted value is i, and $p_{ii}$ is the number of pixels whose true and predicted values are both i. The calculation formulas for MIoU and mPA are:

$$\mathrm{MIoU} = \frac{1}{k+1} \sum_{i=0}^{k} \frac{p_{ii}}{\sum_{j=0}^{k} p_{ij} + \sum_{j=0}^{k} p_{ji} - p_{ii}} \tag{7}$$

$$\mathrm{mPA} = \frac{1}{k+1} \sum_{i=0}^{k} \frac{p_{ii}}{\sum_{j=0}^{k} p_{ij}} \tag{8}$$

4.3. Experimental comparison

The algorithm proposed in this paper is based on the original DeepLabv3 plus model (Sun et al., 2019; Badrinarayanan et al., 2017). The ASPP module is redesigned, and an attention mechanism is introduced so that the shallow and deep features of the model pay more attention to important semantic information (He et al., 2016; Chen et al., 2017b, 2018; Sehar and Naseem, 2022). The model converges after training for 100 epochs with the Adam optimizer. Training is divided into two phases: the freezing phase and the unfreezing phase. A learning rate of 0.005 and a batch size of 8 are used in the freezing phase; a learning rate of 0.0005 and a batch size of 4 are used in the unfreezing phase. To prevent overfitting, the weight decay rate is set to 0.005. An epoch is one pass of all the data through the network, completing forward computation and backpropagation once; the number of epochs is set to 100, with 50 in the freezing phase and 50 in the unfreezing phase. This article adopts the MIoU and mAP evaluation metrics and conducts ASPP module optimization, attention mechanism addition, and module fusion experiments on PASCAL VOC2012 to verify the performance of the model before and after improvement.

4.3.1. ASPP improvement experiment

The strip pooling module (SP) is introduced in the ASPP module, where Deeplabv3 plus-SP denotes using strip pooling instead of global pooling in the ASPP module. Demonstrating the applicability of strip pooling, the MIoU of the DeepLabv3 plus network improved by 1.09% after the change, as shown in Table 1.

4.3.2. Introduction of different attention experiments

Based on the MobileNetV2 backbone network and ASPP module, different attention mechanisms are introduced. The polarized self-attention mechanism in series and parallel forms is introduced after the ASPP module, and ECA-Net is introduced after the shallow layer of MobileNetV2. MIoU increased by at least 0.79% after adding PSA or ECA-Net. PSA_s performs better than PSA_p; in particular, MIoU increased by 1.68% after adding PSA_s, as shown in Table 2.

4.3.3. Comparative experiments of different models

To demonstrate the effectiveness of the strip pooling module, the polarized self-attention module, and the ECA-Net module and to verify the accuracy of the improved algorithm, five control experiments were established. ① refers to the DeepLabv3 plus network. ② refers to changing global average pooling to strip pooling in the ASPP module of DeepLabv3 plus and adding the parallel form of the polarized self-attention mechanism after the ASPP module. ③ refers to changing global pooling to strip pooling in the ASPP module of DeepLabv3 plus and adding the series form of the polarized self-attention mechanism after the ASPP module. ④ refers to changing global pooling to strip pooling in the ASPP module of DeepLabv3 plus, adding the parallel form of the polarized self-attention mechanism after the ASPP module, and adding the ECA-Net module after the shallow features of MobileNetV2. ⑤ refers to changing global pooling to strip pooling in the ASPP module of DeepLabv3 plus, adding the series form of the polarized self-attention mechanism after the ASPP module, and


adding the ECA-Net module after the shallow features of MobileNetV2. Table 3 compares ① with ② and ① with ③: by using strip pooling instead of global average pooling and introducing a polarized self-attention mechanism, MIoU improved by 2.51% and 2.89%, respectively. Comparing ① with ④ and ① with ⑤, where strip pooling replaces global average pooling in the ASPP module and both the polarized self-attention mechanism and ECA-Net are introduced, MIoU increased by 2.58% and 3.61%, respectively. Table 3 thus verifies that every module plays a role and that the improvements above raise the accuracy of the algorithm.

4.4. Comparison of segmentation results for different categories

The most important accuracy indicator in semantic segmentation is the mean intersection over union, which is shown for each of the 21 categories in Fig. 7. The modified model is lower than the original algorithm in only 6 categories, and in those categories the accuracy is not significantly different from the original algorithm. The remaining 15 categories are all higher than those of the original algorithm, with categories such as horses, dogs, cats, trains, and sheep showing the clearest advantages. After adding the attention mechanism, the accuracy of key categories is improved, which to some extent improves the accuracy of the original algorithm. The category segmentation results are shown in Fig. 7.

To see the effects before and after the improvement more clearly, the segmentation prediction maps of the DeepLabv3 plus network and the improved DeepLabv3 plus network were compared, where (a) represents the original image, (b) represents the image label, (c) represents the DeepLabv3 plus segmentation image, and (d) represents the improved DeepLabv3 plus segmentation image. The results show that the model that integrates strip pooling and the attention mechanism produces smoother and more complete segmentation. The original DeepLabv3 plus network suffers from misclassification and discontinuous segmentation. The optimized network improves the semantic segmentation effect, refines the segmentation boundary of the target, and achieves better accuracy. The selected segmentation prediction maps are shown in Fig. 8.

5. Summary

This article proposes a DeepLabv3 plus network based on the attention mechanism. Changing global pooling to strip pooling in the ASPP module captures global contextual information, while the addition of the polarized self-attention mechanism enhances the utilization of image spatial features. Finally, adding ECA-Net after the low-level features of MobileNetV2 improves the acquisition of shallow features. The experimental results show that embedding the attention modules into the DeepLabv3 plus network improves the accuracy of key categories and the segmentation accuracy of objects in images. The objective indicator MIoU improved by approximately 2%. Our work improves the performance of image semantic segmentation, which provides new ideas for autonomous driving, medical imaging, and other fields and provides direction for the field of computer vision.

Although the improved algorithm performs better, shortcomings remain. Since the introduction of the attention mechanism increases the model complexity to some extent, further research is needed on model complexity and parameter quantity. In the future, we will consider using model compression methods to optimize the network so that the model can balance high accuracy and light weight.

CRediT authorship contribution statement

Yanyan Liu: Conceptualization, Methodology, Experiments. Xiaotian Bai: Experimental results analysis, Writing – review & editing. Jiafei Wang: Conceptualization, Methodology, Experiments. Guoning Li: Supervision. Jin Li: Supervision, Writing – review & editing. Zengming Lv: Experimental results analysis.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

No data was used for the research described in the article.

References

Badrinarayanan, V., Kendall, A., Cipolla, R., 2017. Segnet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 39 (12), 2481–2495. https://doi.org/10.1109/TPAMI.2016.2644615.
Biao, W., Yali, G., Qingchuan, Z., 2018. Research on image semantic segmentation algorithm based on fully convolutional HED-CRF. In: 2018 Chinese Automation Congress (CAC). IEEE, pp. 3055–3058. https://doi.org/10.1109/CAC.2018.8623459.
Boykov, Yuri, Veksler, Olga, Zabih, Ramin, 2001. Fast approximate energy minimization via graph cuts. In: Proceedings of the Seventh IEEE International Conference on Computer Vision 1, pp. 377–384.
Chen, L.C., Papandreou, G., Kokkinos, I., et al., 2017a. Deeplab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 40 (4), 834–848. https://doi.org/10.1109/TPAMI.2017.2699184.
Chen, L.C., Papandreou, G., Schroff, F., et al., 2017b. Rethinking atrous convolution for semantic image segmentation. arXiv preprint arXiv:1706.05587. https://doi.org/10.48550/arXiv.1706.05587.
Chen, L.C., Zhu, Y., Papandreou, G., et al., 2018. Encoder-decoder with atrous separable convolution for semantic image segmentation. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 801–818.
Dhanachandra, Nameirakpam, Manglem, Khumanthem, Yambem Jina Chanu, 2015. Image segmentation using K-means clustering algorithm and subtractive clustering algorithm. Procedia Comput. Sci. 54, 764–771.
Fu, J., Yi, X., Wang, G., et al., 2022. Research on ground object classification method of high resolution remote-sensing images based on improved DeeplabV3+. Sensors 22 (19), 7477. https://doi.org/10.3390/S22197477.
He, K., Zhang, X., Ren, S., et al., 2016. Deep residual learning for image recognition. Proc. IEEE Conf. on Comput. Vision and Pattern Recogn. 770–778. https://doi.org/10.3390/APP12188972.
Honarbakhsh, V., Siahkoohi, H.R., Rezghi, M., et al., 2023. SeisDeepNET: an extension of Deeplabv3+ for full waveform inversion problem. Expert Syst. Appl. 213, 118848. https://doi.org/10.1016/J.ESWA.2022.118848.
Hou, Qibin, Zhang, Li, Cheng, Ming-Ming, Feng, Jiashi, 2020. Strip pooling: rethinking spatial pooling for scene parsing. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 4003–4012.
Hridoy, R.H., Habib, T., Jabiullah, I., et al., 2021. Early recognition of betel leaf disease using deep learning with depthwise separable convolutions. In: 2021 IEEE Region 10 Symposium (TENSYMP). IEEE, pp. 1–7. https://doi.org/10.1109/TENSYMP52854.2021.9551009.
Iftikhar, S., Asim, M., Zhang, Z., et al., 2022. Advance generalization technique through 3D CNN to overcome the false positives pedestrian in autonomous vehicles. Telecommun. Syst. 80, 545–557. https://doi.org/10.1007/s11235-022-00930-1.
Iftikhar, Sundas, Asim, Muhammad, Zhang, Zuping, Muthanna, Ammar, Chen, Junhong, El-Affendi, Mohammed, Ahmed, Sedik, Ahmed, A., Abd El-Latif, 2023. Target detection and recognition for traffic congestion in smart cities using deep learning-enabled UAVs: a review and analysis. Appl. Sci. 13 (6), 3995. https://doi.org/10.3390/app13063995.
Kass, Michael, Witkin, Andrew P., Terzopoulos, Demetri, 2004. Snakes: active contour models. Int. J. Comput. Vis. 1, 321–331.
Liu, M.Z., 2020. Research on Image Semantic Segmentation Algorithm Based on Self-Attention Mechanism. Dalian Univ. Technol., Dalian, 20–35. https://doi.org/10.26991/d.cnki.gdllu.2020.001777.
Liu, A., Yang, Y., Sun, Q., Xu, Q., 2018. A deep fully convolution neural network for semantic segmentation based on adaptive feature fusion. In: 2018 5th International Conference on Information Science and Control Engineering (ICISCE), Zhengzhou, China, pp. 16–20. https://doi.org/10.1109/ICISCE.2018.00013.
Liu, H., Liu, F., Fan, X., et al., 2021. Polarized self-attention: toward high-quality pixelwise regression. arXiv preprint arXiv:2107.00782. https://doi.org/10.48550/arXiv.2107.00782.


Minaee, Shervin, Wang, Yao, 2017. An ADMM approach to masked signal decomposition using subspace representation. IEEE Trans. Image Process. 28, 3192–3204.
Minaee, S., Boykov, Y., Porikli, F., et al., 2021. Image segmentation using deep learning: a survey. IEEE Trans. Pattern Anal. Mach. Intell. 44 (7), 3523–3542.
Najman, Laurent, Schmitt, Michel, 1994. Watershed of a continuous function. Signal Process. 38, 99–112.
Nock, R., Nielsen, F., 2004. Statistical region merging. IEEE Trans. Pattern Anal. Mach. Intell. 26 (11), 1452–1458. https://doi.org/10.1109/TPAMI.2004.110.
Otsu, N., 1979. A threshold selection method from gray-level histograms. IEEE Trans. Syst. Man, and Cybern. 9 (1), 62–66. https://doi.org/10.1109/TSMC.1979.4310076.
Plath, Nils, Toussaint, Marc, Nakajima, Shinichi, 2009. Multiclass image segmentation using conditional random fields and global classification. International Conference on Machine Learning.
Sehar, U., Naseem, M.L., 2022. How deep learning is empowering semantic segmentation: traditional and deep learning techniques for semantic segmentation: a comparison. Multimed. Tool. Appl. 81 (21), 30519–30544. https://doi.org/10.1007/S11042-022-12821-3.
Starck, J.-L., Elad, M., Donoho, D.L., 2005. Image decomposition via the combination of sparse representations and a variational approach. IEEE Trans. Image Process. 14 (10), 1570–1582. https://doi.org/10.1109/TIP.2005.852206.
Sun, W., Wang, R., 2018. Fully convolutional networks for semantic segmentation of very high resolution remotely sensed images combined with DSM. Geosci. Rem. Sens. Lett. IEEE 15 (3), 474–478. https://doi.org/10.1109/LGRS.2018.2795531.
Sun, Y., Jiang, Q., Hu, J., et al., 2019. Attention mechanism based pedestrian trajectory prediction generation model. J. Comput. Appl. 39 (3), 668. https://doi.org/10.13203/j.whugis20200159.
Yang, X., 2020. An overview of the attention mechanisms in computer vision. In: Journal of Physics: Conference Series. IOP Publishing, 1693 (1), 012173. https://doi.org/10.1088/1742-6596/1693/1/012173.
Yang, Z., Peng, X., Yin, Z., 2020. Deeplab_v3_plus-net for image semantic segmentation with channel compression. In: 2020 IEEE 20th International Conference on Communication Technology (ICCT). IEEE, pp. 1320–1324. https://doi.org/10.1109/ICCT50939.2020.9295748.
Zeng, H., Peng, S., Li, D., 2020. Deeplabv3+ semantic segmentation model based on feature cross attention mechanism. In: Journal of Physics: Conference Series. IOP Publishing, 1678 (1), 012106. https://doi.org/10.1088/1742-6596/1678/1/012106.
Zhang, Z., Huang, J., Jiang, T., et al., 2020. Semantic segmentation of very high-resolution remote sensing image based on multiple band combinations and patchwise scene analysis. J. Appl. Remote Sens. 14 (1), 016502. https://doi.org/10.1117/1.JRS.14.016502.
Zhao, Hengshuang, Shi, Jianping, Qi, Xiaojuan, Wang, Xiaogang, Jia, Jiaya, 2017. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2881–2890.
Zhu, Z.L., Rao, Y., Wu, Y., et al., 2019. Research progress of attention mechanism in deep learning. J. Chin. Inf. Process. 33 (6), 1–11. https://doi.org/10.13374/j.issn2095-9389.2021.01.30.005.
