Research Paper
The architecture described in these steps appears to be a complex neural network composed of several components: an Encoder-Decoder architecture followed by a Vision Transformer, Dropout, the GELU activation function, MLANeck, ReLU, SETRMLAHead, cross-entropy loss, ReLU, FCNHead, CrossEntropyLoss, a second FCNHead, and a second CrossEntropyLoss. The Encoder-Decoder architecture is commonly used in image segmentation tasks: the encoder extracts features from the input image, while the decoder generates a segmentation mask that divides the image into different regions. The Vision Transformer is an attention-based model that learns to selectively attend to different regions of the image. Dropout is used for regularization, to reduce overfitting. GELU is a non-linear activation function that introduces non-linearity into the network. MLANeck is the component responsible for fusing the multi-level features and performing attention operations on them. ReLU is a non-linear activation function commonly used in neural networks. SETRMLAHead is a segmentation head that lets the network use features from different levels of the network and selectively attend to the most informative ones. Cross-entropy loss is a commonly used loss function for pixel-wise classification.

Stdc
Stdc is an image segmentation method based on deep structured learning. It uses a set of deep convolutional neural networks to capture the context information of an image. This architecture appears to be a specific implementation for image segmentation tasks. The encoder part of the architecture extracts features from the input image, while the decoder part generates a segmentation mask that divides the image into different regions. The encoder is typically a convolutional neural network (CNN) that reduces the spatial resolution of the image while increasing the number of feature maps, which allows it to capture increasingly abstract features of the image.

The architecture includes specific components such as STDCNet and STDCContextPathNet: STDCNet is the Short-Term Dense Concatenate backbone that extracts multi-scale features, and STDCContextPathNet builds a context path on top of it to aggregate context information. The decoder includes several layers, including a ReLU activation function, FCNHead, OHEMPixelSampler, and CrossEntropyLoss. FCNHead is a fully convolutional head that produces a segmentation mask from the feature maps generated by the encoder; the mask is then upsampled to the input resolution. OHEMPixelSampler performs online hard example mining: instead of using every pixel of the predicted segmentation mask, it samples the hardest pixels (those with the highest loss or the lowest confidence in the correct class) for the loss computation. CrossEntropyLoss measures the dissimilarity between the predicted segmentation mask and the ground-truth mask.
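The interaction between the pixel sampler and the cross-entropy loss can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the paper's code: it assumes the OHEM variant that keeps the `min_kept` pixels with the highest loss (mmsegmentation's OHEMPixelSampler can alternatively select pixels by a confidence threshold), and the function names `pixel_cross_entropy` and `ohem_cross_entropy` are hypothetical.

```python
import math
from typing import List, Sequence


def pixel_cross_entropy(logits: Sequence[float], label: int) -> float:
    """Cross-entropy of one pixel: -log(softmax(logits)[label])."""
    m = max(logits)  # subtract the max logit for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def ohem_cross_entropy(pixel_logits: List[Sequence[float]],
                       labels: List[int],
                       min_kept: int) -> float:
    """Mean cross-entropy over the `min_kept` hardest pixels (highest loss).

    Hypothetical helper illustrating online hard example mining; it is not
    the OHEMPixelSampler API.
    """
    if min_kept < 1:
        raise ValueError("min_kept must be at least 1")
    losses = [pixel_cross_entropy(z, y) for z, y in zip(pixel_logits, labels)]
    hardest = sorted(losses, reverse=True)[:min_kept]  # keep the largest losses
    return sum(hardest) / len(hardest)


# Four pixels, two classes; each row holds one pixel's class logits.
logits = [[2.0, 0.0], [0.0, 2.0], [0.0, 0.0], [1.0, 0.0]]
labels = [0, 0, 1, 0]
print(ohem_cross_entropy(logits, labels, min_kept=2))  # about 1.41
```

With `min_kept` equal to the number of pixels this reduces to ordinary mean cross-entropy; smaller values concentrate the gradient on the misclassified or uncertain pixels (here the second and third pixels), which is the point of hard example mining.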
Later in the architecture there are further OHEMPixelSampler and ReLU layers, and it concludes with a Dice loss function, which measures the dissimilarity between the predicted segmentation mask and the ground-truth mask based on their overlap.

Dataset
In our case, we compared two datasets: ADE20K and Cityscapes.

In the ADE20K dataset, the images contain a variety of objects, scenes, and backgrounds; its scene-parsing benchmark provides about 20K training images annotated with 150 semantic categories.

Experiments
The table compares the performance of different models (Pspnet, Resnet, Setr, Stdc, and Unet) on three graphics cards: 3080, 3050 Ti, and 3050. The evaluation metrics are Average Accuracy (aAcc, the fraction of all pixels classified correctly), Mean Intersection over Union (mIoU), and Mean Accuracy (mAcc, the per-class pixel accuracy averaged over classes). The table reports the performance of each model on each graphics card; higher values indicate better performance. Overall, the 3080 graphics card appears to give the best performance across all models and metrics, while the 3050 gives the lowest.
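The three evaluation metrics, and the Dice loss used at the end of the Stdc architecture, all compare a predicted label map with the ground truth. The sketch below computes them from per-class pixel counts on flattened label maps, assuming the usual definitions: aAcc is correctly classified pixels over all pixels, while mIoU and mAcc average the per-class IoU and accuracy over the classes for which they are defined. The function names are illustrative rather than taken from the paper, and ignore labels used by some datasets are not handled.

```python
from typing import Dict, Sequence


def segmentation_metrics(pred: Sequence[int], gt: Sequence[int],
                         num_classes: int) -> Dict[str, float]:
    """aAcc, mIoU and mAcc from flattened predicted and ground-truth label maps."""
    intersect = [0] * num_classes  # pixels predicted as c whose label is also c
    area_pred = [0] * num_classes  # pixels predicted as class c
    area_gt = [0] * num_classes    # pixels labelled as class c
    for p, g in zip(pred, gt):
        area_pred[p] += 1
        area_gt[g] += 1
        if p == g:
            intersect[p] += 1

    ious, accs = [], []
    for c in range(num_classes):
        union = area_pred[c] + area_gt[c] - intersect[c]
        if union > 0:       # IoU is undefined for classes absent from both maps
            ious.append(intersect[c] / union)
        if area_gt[c] > 0:  # per-class accuracy needs ground-truth pixels of c
            accs.append(intersect[c] / area_gt[c])

    return {
        "aAcc": sum(intersect) / sum(area_gt),  # correct pixels / all pixels
        "mIoU": sum(ious) / len(ious),
        "mAcc": sum(accs) / len(accs),
    }


def dice_loss(prob: Sequence[float], target: Sequence[int],
              eps: float = 1e-6) -> float:
    """Soft Dice loss for one class: 1 - (2*sum(p*t) + eps) / (sum(p) + sum(t) + eps)."""
    inter = sum(p * t for p, t in zip(prob, target))
    return 1.0 - (2.0 * inter + eps) / (sum(prob) + sum(target) + eps)


pred = [0, 0, 1, 1, 2, 2]
gt = [0, 1, 1, 1, 2, 0]
print(segmentation_metrics(pred, gt, num_classes=3))  # aAcc ~0.667, mIoU 0.5, mAcc ~0.722
print(dice_loss([1.0, 1.0, 0.0, 0.0], [1, 0, 1, 0]))  # ~0.5
```

For a single class with hard masks, Dice and IoU are monotonically related (Dice = 2 * IoU / (1 + IoU)), so a model trained with Dice loss is optimizing an overlap measure closely tied to the reported mIoU.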