
Marine Environmental Research 183 (2023) 105829


Deep learning based approach for automated characterization of large marine microplastic particles

Xiao-Le Han a,b, Ning-Jun Jiang a,*, Toshiro Hata c, Jongseong Choi d, Yan-Jun Du a, Yi-Jie Wang b

a Institute of Geotechnical Engineering, Southeast University, Nanjing, Jiangsu, China
b Department of Civil and Environmental Engineering, University of Hawaii at Manoa, Honolulu, HI, USA
c Department of Engineering, Hiroshima University, Hiroshima, Japan
d Department of Mechanical Engineering, The State University of New York, SUNY Korea, Incheon, South Korea

ARTICLE INFO

Keywords: Deep learning; Microplastics; Mask R–CNN; U-Net; Instance segmentation

ABSTRACT

The rapidly growing concern over marine microplastic pollution has drawn attention globally. Microplastic particles are normally subjected to visual characterization prior to more sophisticated chemical analyses. However, the misidentification rate of current visual inspection approaches remains high. This study proposed a state-of-the-art deep learning-based approach, Mask R–CNN, to locate, classify, and segment large marine microplastic particles with various shapes (fiber, fragment, pellet, and rod). A microplastic dataset including 3000 images was established to train and validate this Mask R–CNN algorithm, which was backboned by a Resnet 101 architecture and could be tuned in less than 8 h. The fully trained Mask R–CNN algorithm was compared with U-Net in characterizing microplastics against various backgrounds. The results showed that the algorithm could achieve Precision = 93.30%, Recall = 95.40%, F1 score = 94.34%, APbb (average precision of bounding box) = 92.7%, and APm (average precision of mask) = 82.6% on a 250-image test dataset. The algorithm could also achieve a processing speed of 12.5 FPS. The results obtained in this study implied that the Mask R–CNN algorithm is a promising microplastic characterization method that can potentially be used in the future for large-scale surveys.

1. Introduction

According to Mason et al. (2016), the United Nations Environment Programme has listed plastic pollution as one of the top environmental issues. Due to limited disposal capability, most waste plastic is accumulated by landfilling. During this period, the plastic chunks degrade or fragment into small pieces through mechanical, thermal oxidation, hydrolysis, and biological processes. These various processes give small plastic particles unique shapes (Bertoldi et al., 2021). When their sizes are less than 5 mm, the plastic particles are defined as microplastics (MPs) (Arthur et al., 2009). So far, MPs have been found in terrestrial and aquatic ecosystems, from the inland water system to the soil profile, from the ocean surface to the sediment, and from pole to pole (Ivar do Sul and Costa, 2014). Once MPs enter the oceans, they can be transported a long distance by ocean currents and wind. Moreover, MPs can remain in the water for decades, which makes them perfect carriers of toxic hydrophobic compounds absorbed from the ocean (Nobre et al., 2015) and pathogenic organisms (Barboza et al., 2018). Besides, due to their small size, these plastic pieces can be easily swallowed by various marine organisms and transferred into the food chain, ultimately ending in human beings (Sharma and Chatterjee, 2017). Therefore, microplastic pollution has attracted an intensive spotlight from experts in the marine environment, marine biology, and coastal ecology.

In efforts to maintain a safe marine environment, the European Union (EU) passed the Marine Strategy Framework Directive (MSFD), which requires all member nations to monitor and classify MPs based on their physical characteristics, including color, brightness, size, morphology, and polymer type (Gago et al., 2016; Gauci et al., 2019). Since these characteristics are closely related to the origin, degradation rate, transportation processes, environmental impacts, and destinations of MPs (Hidalgo-Ruz et al., 2012; Zhang et al., 2017; Kooi and Koelmans, 2019), it is essential to characterize and classify them (Wang et al., 2019). Currently, the image capture methods for MPs with relatively large sizes (up to 5 mm) mainly include the naked eye with a digital camera (1 mm–5 mm) (Lorenzo-Navarro et al., 2021), the stereo microscope (0.3–5 mm) (Hanvey et al., 2017), and digital holographic imaging (0.1–5 mm) (Zhu et al., 2021a).

* Corresponding author.
E-mail address: [email protected] (N.-J. Jiang).

https://doi.org/10.1016/j.marenvres.2022.105829
Received 13 June 2022; Received in revised form 5 October 2022; Accepted 15 November 2022
Available online 18 November 2022

For small MPs at a micrometer level, Raman micro-spectroscopy (10–300 μm) (Lenz et al., 2015), fluorescent dyes (20–100 μm) (Lv et al., 2019), and staining (Hata and Jiang, 2021) have been utilized. Images captured with relatively fewer MPs are typically processed using naked eyes to obtain information about morphology. Then MPs will be classified, analyzed, and recorded (Shim et al., 2017).

Benefiting from the rapid development of computer and information technologies, optical microscopy, scanning electron microscopy, fluorescence microscopy, and spectral analysis from Fourier transform infrared (FT-IR) spectroscopy, Raman spectroscopy, pyrolysis gas-chromatography mass-spectrometry, and energy dispersive X-ray spectroscopy are increasingly deployed to analyze and interpret MPs data (Cowger et al., 2020). Mukhanov et al. (2019) used ImageJ to convert RGB images into binary images and then classify the microplastics into four categories (rounded, irregular, elongated, and fiber) based on their morphometry parameters. In addition, with the help of a hyperspectral image capture system employing infrared spectrometry and the Raman technique, Serranti et al. (2018) applied Multivariate Image Analysis and partial least squares discriminant analysis to identify the plastic material and shape. However, the high cost and limited accessibility of such expensive chemometric tools prohibit broader applications. Gauci et al. (2019) developed an algorithm using the least-squares method on the MATLAB platform to conduct dimension measurements and surface roughness evaluation of MPs collected on the island of Malta in the Central Mediterranean. The color of the microplastic was recorded by calculating the Mean Square Error (MSE) of values in the Red (R), Green (G), and Blue (B) channels. Although these image processing methods can independently complete required tasks like border detection and area calculation, they still rely on pre-designed algorithms and are regulated by human insight. Therefore, it is necessary to develop new methods for the MPs detection and classification processes with sufficient accuracy and generalization ability.

In the 1950s, with the developing understanding of the human learning process, the definition of artificial intelligence (AI) was first initiated (Russell and Norvig, 1995). AI involved creating machines that could solve problems requiring human intelligence. Machine learning methods partially solved this task through giving data, generating models, and making decisions. At this stage, a series of famous algorithms emerged, such as linear regression, the support vector machine (SVM), K-nearest neighbors (KNN), and Naive Bayes (Hart et al., 2000). Kedzierski et al. (2019) adopted the KNN algorithm to detect the chemical nature of MPs based on FTIR-ATR spectra; the automated classification method could improve the efficiency of inspecting these spectra. Bianco et al. (2020) applied SVM to differentiate diatoms and MPs based on 3D coherent images using the holographic imaging technique. Meyers et al. (2022) adopted the Nile red staining method and decision tree algorithms to classify seven different MPs based on their materials. As the need for more powerful algorithms proliferates, deep learning algorithms have begun to stand out in the machine learning field.

Inspired by the vision mechanisms of the visual cortex, a neural network focused on feature extraction was developed and defined as the convolutional neural network (CNN or ConvNet) (Fukushima and Miyake, 1982; LeCun et al., 1989). After development over the past decades, CNN has become one of the most widely used architectures in deep learning. A CNN module normally consists of a series of convolutional and pooling layers, which can effectively capture the grid-like topology features of images while requiring less computational effort. Compared with traditional neural networks, CNNs offer more flexibility to users with different accuracy demands since they can be built by combining CNN modules (Cha et al., 2017). CNNs can consist of more than 100 layers to build a deeper network. So far, CNNs have brought about breakthroughs in the processing of image, video, speech, and audio data by improving accuracy and efficiency to a new level, which was not well addressed in traditional AI studies (LeCun et al., 2015).

Girshick (2015) developed the Fast R–CNN by changing multi-stage training into multi-task training. Following the Fast R–CNN algorithm, Ren et al. (2015) developed the Faster R–CNN to improve the algorithm efficiency: a lightweight Region Proposal Network (RPN) was adopted to replace the selective search function inherited from the R–CNN algorithm (Choi et al., 2022). The advancement of CNN algorithms has enabled new object detection algorithms with more functions. He et al. (2017) proposed the mask region convolutional neural network (Mask R–CNN) based on Faster R–CNN by attaching a binary mask to the detected object, which could help highlight each target from the background with masks in varying colors. Mask R–CNN could achieve instance segmentation, in which every single object within the same category could be recognized and distinguished. Therefore, Mask R–CNN could simultaneously achieve a threefold function including object classification, object detection, and segmentation. The state-of-the-art pixel-level instance segmentation of Mask R–CNN was soon applied in medical science, industrial robots, animal husbandry, structural health monitoring, etc. (Johnson, 2018; Yu et al., 2019; Zhao et al., 2020; Xu et al., 2020). For instance, Johnson (2018) demonstrated that Mask R–CNN could be used for cell nuclei segmentation in microscopic images. Zhao et al. (2020) developed a tunnel image capture system using the Mask R–CNN algorithm to detect moisture marks in shield tunnel lining. However, regarding the characterization of MPs, only a few studies using CNN-based deep learning methods have been reported. Lorenzo-Navarro et al. (2021) developed a deep learning method by combining U-Net and VGG16 neural networks, counting and classifying common MPs in laboratory conditions. However, detection results were obtained against a pure white background under good illumination conditions; no MPs embedded in complex backgrounds were tested in the study. Wegmayr et al. (2020) compared the instance segmentation accuracy of microplastic fiber microscopic images by applying Mask R–CNN and Deep Pixel Embeddings (DPE). The study showed that in complex tangled cases, DPE showed better performance. However, their work only focused on fiber MPs detection and segmentation performance; no MPs classification function was built into the algorithm. Zhu et al. (2021b) developed an HC-CNN algorithm to classify microplastics using a hologram image dataset. The lightweight algorithm achieved high accuracy and efficiency, but it could not offer more classification information. So far, no other studies have been reported on using the Mask R–CNN algorithm to characterize MPs.

In this study, a Mask R–CNN based deep learning model was developed and used for MPs (1–5 mm diameter) classification, localization, and segmentation. A deep learning dataset of MPs was constructed and utilized to train and validate the Mask R–CNN model using only optical cameras and available image processing software. The developed model's classification, localization, segmentation, and computational performances were evaluated. Then, the validity of the trained model was tested for the classification, localization, and segmentation of real MPs with different morphologies and at various scales. Meanwhile, the MPs were also presented against white and real-world backgrounds to demonstrate the Mask R–CNN model's validity.

2. Methodologies

2.1. Mask R–CNN architecture

The Mask R–CNN model adopted in this study is backboned by the Resnet 101 network (Wu et al., 2019), a 101-layer residual neural network. Resnet 101 is divided into five stages and serves as the convolutional layer to extract features directly from images of MPs. Each stage contains a convolution block and an identity block, and each identity block consists of several convolution layers.

Using the PyTorch platform, the training of this model is initiated via transfer learning, which loads pre-trained model weights based on the COCO dataset (0.3 million images and 80 categories) (Ren et al., 2015).
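To make this transfer-learning setup concrete, the following minimal sketch (not the authors' released code) shows how a COCO pre-trained Mask R–CNN can be adapted to the four MP classes plus background. It assumes a recent torchvision, whose stock implementation ships a ResNet-50-FPN backbone by default rather than the Resnet 101 used in this study:

    import torchvision
    from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
    from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

    def build_mp_model(num_classes=5):  # fiber, fragment, pellet, rod + background
        # Load Mask R-CNN with COCO pre-trained weights (transfer learning)
        model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
        # Swap the box classification/regression head for our class count
        in_features = model.roi_heads.box_predictor.cls_score.in_features
        model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
        # Swap the mask prediction head accordingly
        in_channels = model.roi_heads.mask_predictor.conv5_mask.in_channels
        model.roi_heads.mask_predictor = MaskRCNNPredictor(in_channels, 256, num_classes)
        return model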

Transfer learning could save training time since the pre-trained weights can be directly utilized to initialize the training process. Therefore, it has been widely adopted in deep learning, especially in cases where the number of images for training is limited (Pan and Yang, 2010).

The architecture of the Mask R–CNN model is shown in Fig. 1. Firstly, feature maps are generated after the backbone Resnet 101 CNN processes the input raw images. Then, the feature maps are fed into lightweight region proposal networks (RPNs) to detect candidate objects with a sliding window and to generate regions of interest (RoIs) with bounding boxes (anchors). The actual sizes of these boxes are determined by the anchor scales and ratios, which are essential hyperparameters in the model tuning process. The RoIs are further classified as foreground or background with scores, and NMS (non-maximum suppression) is applied to keep only high-score ones (Neubeck and Van Gool, 2006). Then, the RoI Align network is utilized to adjust the dimensions of the RoIs generated by the RPN and produce a fixed-size feature map. In the meantime, through a three-branch-paralleled prediction network, the functions of classification, object localization, and instance segmentation are achieved. The Fully Connected (FC) layers pass the feature map to a normalized exponential function (SoftMax) and a bounding box regression, giving the classification and object detection results, respectively. The SoftMax function is widely applied in machine learning classification problems and gives the probability of target objects belonging to a specific category. A Fully Convolutional Network (FCN) is utilized to generate a binary mask, which highlights the detected objects. The generated masks can be further used to determine the dimensions of objects, which is out of the scope of the current study.

Fig. 1. Overview of the Mask R–CNN architecture.

2.2. Dataset preparation and preprocessing

Under the influence of both wind and ocean currents, the windward (east) coast of Oahu Island has suffered from MPs pollution in its beach areas. In total, 400 onsite MPs images (JPG format with a resolution of 3008 × 1688 pixels) were taken with a Sony α7 Mark II digital camera at five different beaches on Oahu Island, as can be observed in Fig. 2. These images were taken under daylight and natural backgrounds (sand, grass, and water on the beaches in Fig. 2).

Then MPs were sieved through US No.4 (4.76 mm) and No.20 (0.85 mm) sieves, and MPs left on the No.20 sieve were sampled for laboratory image capturing. The laboratory MP image capturing was carried out within an LED-illuminated mini photo studio using the same digital camera. The LED lights have a color temperature of 6500 K, the same as daylight. Before image capturing, the MPs samples were washed, disinfected, and oven-dried at a low temperature. These samples had been humanly inspected to confirm they were microplastics, and suspect particles had been tested with Raman spectroscopy to eliminate plant and marine creature residues. The Raman spectrum of a low-density polyethylene (LDPET) MP particle is shown in Fig. 3.

Each time, at least 20 individual MP particles were placed on a white background within the mini-studio. Once the image was taken, these MPs were discarded and new MP particles were placed in the mini-studio, until 100 laboratory-captured images were obtained. All 500 raw images from onsite and laboratory capturing were cut into small patches with a uniform resolution of 512 × 512 via a Python program, and 1500 patches with intact MPs were finally selected (Han et al., 2022). Previous studies showed that crucial features of MPs could be kept and the training efficiency was superior at this resolution (Yu et al., 2019). According to the classification methods mentioned by Hartmann et al. (2019), Mukhanov et al. (2019), and Lorenzo-Navarro et al. (2021), the MPs were classified as fiber, fragment, pellet, and rod based on their morphological characteristics, as shown in Fig. 4. Among these 1500 patches (512 × 512 resolution), there were 386 fibers, 1015 fragments, 844 pellets, and 322 rods. The 1500 patches were duplicated by randomly choosing one of four common data augmentation methods (left-right flipping, up-down flipping, 90° rotation, and scaling) (Han et al., 2022). Thus, the final MP dataset possessed 3000 images of 512 × 512 resolution.

Before the dataset was randomly divided into three subsets (training, validation, and test), the MPs dataset was annotated using the VGG Image Annotator (Dutta and Zisserman, 2019). Besides, a unique dataset named test-complex was prepared to test the algorithm performance on images with different scales and backgrounds. The images in the test-complex dataset had never been used for training and validation. The details for each subset are displayed in Table 1. The resolutions selected for the test-complex dataset were similar to the general image sizes obtained from digital cameras, mobile phones, and social media platforms. The test-complex dataset included images captured under complex backgrounds and illumination conditions to simulate the real-world soil and aqueous environments where microplastics could be spotted.
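As an illustration of the patching and augmentation steps described above, a minimal sketch is given below. The helper names and the use of the Pillow library are assumptions; the paper only states that a Python program was used:

    from pathlib import Path
    from PIL import Image
    import random

    AUGMENTATIONS = [
        lambda im: im.transpose(Image.FLIP_LEFT_RIGHT),   # left-right flip
        lambda im: im.transpose(Image.FLIP_TOP_BOTTOM),   # up-down flip
        lambda im: im.transpose(Image.ROTATE_90),         # 90-degree rotation
        lambda im: im.resize((int(im.width * 0.8), int(im.height * 0.8))),  # scaling
    ]

    def cut_patches(image_path, out_dir, size=512):
        # Tile one raw photo into non-overlapping 512 x 512 patches.
        img = Image.open(image_path)
        out = Path(out_dir)
        out.mkdir(parents=True, exist_ok=True)
        for top in range(0, img.height - size + 1, size):
            for left in range(0, img.width - size + 1, size):
                patch = img.crop((left, top, left + size, top + size))
                patch.save(out / f"{Path(image_path).stem}_{top}_{left}.png")

    def augment(patch):
        # Duplicate a patch with one randomly chosen augmentation.
        return random.choice(AUGMENTATIONS)(patch)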

Fig. 2. Microplastics sampling locations: (a) Kahuku beach; (b) Kahana bay beach; (c) Kailua beach; (d) Waimanalo beach; (e) Sandy beach.

2.3. Mask R–CNN model training and validation

The Mask R–CNN model was trained and tuned on Google Colaboratory (Colab) with a Tesla P100 GPU (16 GB graphic memory). Synchronous stochastic gradient descent (SGD) was used to train the model, and the weight decay and momentum were set as 0.0001 and 0.9, respectively. The batch size of the training dataset was set as 4. In order to ensure the validity of the deep learning training parameters, the parameter selection referred to Table 2, which summarizes recent research adopting the same algorithm structure (Hou et al., 2020; Chen et al., 2020; Kim and Cho, 2019; Nie et al., 2020; Politikos et al., 2021).

All the backbone CNN layers were frozen for the first 20 epochs. Only the network head, which contained the classifier, bounding box generator, and mask generator, was trained from the pre-trained COCO weights using an initial learning rate of 0.001. For the following 20 epochs, the learning rate was reduced by ten times and the first three stages of the Resnet 101 were activated to continue the training process. Finally, the learning rate was decreased by ten times again, and all five stages of the Resnet 101 were activated to train the model for another 20 epochs. All hyperparameters (learning rate, weight decay, anchor scale, anchor ratio, and NMS ratio) were fine-tuned to achieve higher accuracy and faster processing speed. The whole training process took less than 8 h to complete.
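The staged schedule above can be outlined as follows. This is a simplified sketch under the assumption of a torchvision-style model whose parameter names start with "backbone" and "roi_heads"; the stage-to-layer mapping is illustrative, not the authors' exact configuration:

    import torch

    def make_optimizer(model, lr):
        # SGD with the momentum and weight decay reported in Section 2.3
        params = [p for p in model.parameters() if p.requires_grad]
        return torch.optim.SGD(params, lr=lr, momentum=0.9, weight_decay=1e-4)

    def freeze_backbone_except(model, trainable_keywords):
        # Freeze every backbone parameter not matching a trainable keyword.
        for name, param in model.named_parameters():
            if name.startswith("backbone"):
                param.requires_grad = any(k in name for k in trainable_keywords)

    # Stage 1, epochs 1-20: heads only, lr = 1e-3
    # Stage 2, epochs 21-40: first ResNet stages unfrozen, lr = 1e-4
    # Stage 3, epochs 41-60: whole network, lr = 1e-5
    schedule = [
        ([], 1e-3),
        (["layer1", "layer2", "layer3"], 1e-4),
        (["stem", "layer1", "layer2", "layer3", "layer4"], 1e-5),
    ]
    # for keywords, lr in schedule:
    #     freeze_backbone_except(model, keywords)
    #     optimizer = make_optimizer(model, lr)
    #     ... train for 20 epochs ...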

3. Model evaluation

To evaluate the performance of the proposed Mask R–CNN algorithm in microplastic classification, localization, and segmentation, the U-Net algorithm (Ronneberger et al., 2015) was also trained and compared with the Mask R–CNN. The U-Net was designed based on the Fully Convolutional Network (FCN) as a famous semantic segmentation algorithm, and its classification and segmentation function has proven very effective in microplastic segmentation (Lorenzo-Navarro et al., 2021; Lee et al., 2022).

3.1. Loss function

The loss function is a direct metric measuring the deviation between the prediction results and the ground truths labeled by the dataset maker. Properly selecting the loss function can benefit the training process by updating the model weights effectively, contributing to the final model performance. Therefore, the target of deep learning algorithm training is to decrease this deviation as much as possible.

Since the Mask R–CNN algorithm achieves localization, classification, and segmentation functions, assessing the algorithm from these three aspects is necessary. The loss function employed in Mask R–CNN is a multi-task function, shown in Eq. (1), which includes three parts: the position regression loss of the bounding box L_box (localization error), the classification loss L_cls (classification error), and the segmentation loss of the mask L_mask (segmentation error), calculated by pixel accuracy (He et al., 2017):

L = L_{box} + L_{cls} + L_{mask}    (1)

The calculation of the classification loss (L_cls), the regression loss of the bounding box (L_box), and the segmentation loss of the mask (L_mask) is given in Eq. (2) to Eq. (5):

L_{box}(t_i, t_i^*) = L1_{smooth}(t_i − t_i^*)    (2)

L1_{smooth}(x) = { 0.5x^2, if |x| < 1; |x| − 0.5, otherwise }    (3)

L_{cls}(p_i, p_i^*) = −p_i^* log(p_i) − (1 − p_i^*) log(1 − p_i)    (4)

L_{mask}(s_{ij}, s_{ij}^*) = −(1/m^2) Σ_{ij} [ s_{ij}^* log(s_{ij}^k) + (1 − s_{ij}^*) log(1 − s_{ij}^k) ]    (5)

where t_i is a vector containing the bounding box location and size (x coordinate, y coordinate, width, and height), while t_i^* is the vector for the ground truth bounding box; m^2 is the mask resolution used in the algorithm, here 28 by 28 pixels; k represents the kth object class in the dataset, and here k is four since fiber, fragment, pellet, and rod shall be classified; s_{ij} and s_{ij}^* are the binary values (0 or 1) in the predicted mask and the ground truth mask, respectively.
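For readers who want to trace these terms in code, here is a minimal PyTorch sketch of the multi-task loss. It illustrates Eqs. (1)–(5) only; torchvision's Mask R–CNN computes these terms internally with additional proposal-sampling logic:

    import torch
    import torch.nn.functional as F

    def multi_task_loss(box_pred, box_gt, cls_pred, cls_gt, mask_pred, mask_gt):
        # Eq. (2)-(3): smooth L1 loss on bounding-box offsets
        l_box = F.smooth_l1_loss(box_pred, box_gt)
        # Eq. (4): binary cross-entropy on class probabilities
        l_cls = F.binary_cross_entropy(cls_pred, cls_gt)
        # Eq. (5): per-pixel binary cross-entropy on the 28 x 28 predicted masks,
        # evaluated for the ground-truth class of each RoI
        l_mask = F.binary_cross_entropy(mask_pred, mask_gt)
        # Eq. (1): the three terms are summed into one training objective
        return l_box + l_cls + l_mask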

Table 1
Image details in the MPs dataset.

Dataset name   Image amount   Image resolution                                Image background
Training       2100           512 × 512                                       white, sand, natural soil, and water
Validation     600            512 × 512                                       white, sand, natural soil, and water
Test           300            512 × 512                                       white, sand, natural soil, and water
Test-complex   180            512 × 512, 600 × 400, 1504 × 844, 3008 × 1688   sand, natural soil, and water

Fig. 3. Raman spectra of low-density polyethylene (LDPET) MP particle.

Fig. 4. Sample images from the training dataset: (a) fiber, (b) fragment, (c) pellet, (d) rod.


Table 2
Parameters summary from research papers with the same algorithm structure.

Author                    Minimum image dimension   Weight decay   Momentum   Learning rate         Batch size   Training epochs   GPU amount
This study                512                       1 × 10⁻⁴       0.9        1 × 10⁻³ ~ 1 × 10⁻⁵   4            60                1
Kim and Cho (2019)        800                       1 × 10⁻⁴       0.9        1 × 10⁻³              1            50                4
Chen et al. (2020)        648                       4 × 10⁻⁵       0.9        5 × 10⁻¹¹             32           75                1
Hou et al. (2020)         512                       1 × 10⁻⁵       0.9        2 × 10⁻³              8            160               1
Nie et al. (2020)         768                       1 × 10⁻⁴       0.9        1 × 10⁻³              1            –                 1
Politikos et al. (2021)   448                       1 × 10⁻³       –          1 × 10⁻³              4            40                1

As for the U-Net, the dice loss was selected to assess its segmentation (Sudre et al., 2017). The calculation is shown in Eq. (6):

L = 1 − (2 × |Prediction result ∩ Ground truth| + λ) / (|Prediction result| + |Ground truth| + λ)    (Dice coefficient)    (6)

The dice coefficient is a widely used metric in the computer vision community to calculate the similarity between two images (Jadon, 2020). The λ is a minor constant to avoid the numerical issue of dividing by zero; in this case, λ was set as 10⁻⁷.

The loss curves of both Mask R–CNN and U-Net during the training and validation process are shown in Fig. 5. The green lines represent the training and validation loss of Mask R–CNN. The training loss was initially 0.5746 and then dropped sharply to 0.3068 after the first five epochs. The validation loss decreased even faster and reached 0.1507 after five epochs. As the training process continued, the training loss finally fluctuated slightly around 0.12 and the validation loss around 0.10. The U-Net loss can be read from the blue lines in Fig. 5. The training loss started from 0.5129 and decreased to 0.10 in about ten epochs; it then smoothly decreased to 0.07 by the end of training. Compared with the training loss, the validation loss experienced several fluctuations during the decreasing process and finally ended up at around 0.05.

Fig. 5. Loss curves during the model training process.

3.2. Localization performance

For Mask R–CNN, the algorithm generates bounding boxes to highlight the locations of the targets to be found, and the accuracy of the bounding box can be used to evaluate the localization performance. However, before evaluating the proposed algorithm, it is necessary to define what a correct localization is. In this study, the intersection over union (IoU), a measurement based on the Jaccard index, was used as the criterion to determine whether a localization is correct (Padilla et al., 2020). The calculation process is illustrated in Fig. 6.

The area of overlap is the intersection of the ground truth bounding box and the prediction bounding box, while the area of union is the total area covered by both bounding boxes. Therefore, the IoU value falls between zero and one: zero represents prediction results that have no overlap with ground truths, whereas one represents prediction results that reach a 100% overlap with ground truths. The most common metric to measure localization accuracy is the average precision of the bounding box (APbb). However, with different IoU thresholds, the APbb can vary a lot. In order to evaluate the algorithm performance comprehensively, three different average precision values are adopted here: APbb50, APbb75, and APbb. The APbb50 indicates the average precision when IoU = 50%; this value is usually regarded as the localization accuracy under a loose localization criterion. In comparison, APbb75 represents the average precision when IoU = 75%, which adopts a stricter localization criterion than APbb50. The APbb is the mean of ten average precision values where the IoU threshold continuously increases from 50% to 95% with a step of 5%.

Overall, the Mask R–CNN achieved APbb50 = 99.94%, APbb75 = 99.63%, and APbb = 91.36% on the training dataset in this study. The results were much better than a similar application of Mask R–CNN to marine litter detection (Politikos et al., 2021), where the APbb50 and the APbb75 were around 76%. On the other hand, the U-Net does not provide a localization module to locate targets; thus, the localization performance could not be compared between the U-Net and Mask R–CNN.

Fig. 6. Diagram of intersection over union (IoU).
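To make these criteria concrete, a short sketch of the IoU test in Fig. 6 and of the precision/recall/F1 metrics defined in Eqs. (7)–(9) of the next subsection is given below (an illustrative implementation, not the authors' evaluation code):

    def bbox_iou(a, b):
        # Boxes given as (x1, y1, x2, y2); returns IoU in [0, 1] as in Fig. 6.
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        union = ((a[2] - a[0]) * (a[3] - a[1])
                 + (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / union if union > 0 else 0.0

    def precision_recall_f1(tp, fp, fn):
        # Eqs. (7)-(9); the counts come from matching predictions to ground
        # truths, e.g. with bbox_iou(...) >= 0.5 as the correctness criterion.
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        return p, r, f1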

3.3. Classification performance

For Mask R–CNN and U-Net, the precision (P), recall (R), and F1 score were used to evaluate the classification performance. These metrics were calculated as follows:

P = TP / (TP + FP)    (7)

R = TP / (TP + FN)    (8)

F1 score = 2 × precision × recall / (precision + recall)    (9)

For example, during the classification of a microplastic fiber, TP (true positive) denotes the number of fiber pixels that are classified as fiber correctly. FP (false positive) represents the number of pixels of other MP types (fragment, pellet, or rod) classified as fiber, while FN (false negative) denotes the number of fiber pixels classified as other MP types (fragment, pellet, or rod). Once the precisions for all four categories (fiber, fragment, pellet, and rod) are calculated, these precisions are averaged to obtain the final precision of the algorithm. Similarly, the final recall can be calculated, and the F1 score can be acquired.

In this study, the Mask R–CNN obtained a precision, recall, and F1 score of 92.40%, 94.40%, and 93.39%, respectively, based on the training dataset. In comparison, the U-Net reached a precision of 93.70%, a recall of 96.32%, and an F1 score of 94.99% using the same training dataset; the U-Net appeared to have a slightly better classification performance on this dataset. Comparing these classification metrics with other research on the classification of MPs, Lorenzo-Navarro et al. (2021) achieved a precision of 98.17% and a recall of 98.11% in MPs classification using a 2-stage modified U-Net algorithm.

3.4. Segmentation performance

Similar to the localization performance, the IoU was also adopted to determine a correct segmentation for Mask R–CNN. The metrics APm50, APm75, and APm were used to evaluate the segmentation performance of the algorithm.

To evaluate the microplastic segmentation performance of U-Net, the mean intersection over union (MIoU) was selected as an indicator in this study (Garcia-Garcia et al., 2018):

MIoU = (1/(n+1)) Σ_{i=0}^{n} [ p_{ii} / ( Σ_{j=0}^{n} p_{ij} + Σ_{j=0}^{n} p_{ji} − p_{ii} ) ]    (10)

where n is the total number of classes annotated in the dataset, which is four in this study; p_{ii} is the number of pixels correctly recognized; p_{ij} equals the number of pixels that should be category i but have been incorrectly predicted as category j; and p_{ji} equals the number of pixels in category j that have been mislabeled as category i.

In the current study, the Mask R–CNN was able to complete the segmentation of MPs with APm50 = 99.60%, APm75 = 88.40%, and APm = 79.90% based on the training dataset. The MIoU obtained from U-Net on the training dataset was 90.75%, higher than the MIoU reported by Wegmayr et al. (2020) in an MP fiber segmentation task using the Mask R–CNN.

3.5. Computation performance

To check the computational performance of the trained Mask R–CNN model, the test dataset with 250 images was fed into both networks. With the acceleration of the Tesla P100 GPU, the Mask R–CNN algorithm completed the detection in 20.0 s, which is about 12.5 frames per second (FPS). Meanwhile, the U-Net completed the task in 14.9 s and achieved an FPS of 16.8. The overall processing speeds of the two algorithms were acceptable in this study.

4. Classification, localization, and segmentation results

Two cases were analyzed to assess the performance of the Mask R–CNN and the U-Net algorithms for automated MPs localization, classification, and segmentation. The images used in Case 1 came from the "Test" dataset (see Table 1), which contained 250 MPs images with white backgrounds. This was an ideal scenario where the boundaries and texture details could be easily observed. The images used in Case 2 were from the "Test-complex" dataset (see Table 1), where the MPs had more complex backgrounds and larger resolutions. It was apparent that Case 2 was more complicated and represented more real-world scenarios where MPs are often present (such as beach areas). These two cases represented gradually more sophisticated cases for the localization, classification, and segmentation of MPs. The algorithm performance of both Mask R–CNN and U-Net was displayed and compared.

4.1. MPs characterization against white background (Case 1)

Case 1 demonstrated the MPs characterization results against a white background using Mask R–CNN and U-Net. In total, nine sets of typical prediction outputs from both algorithms and the original images are listed in Fig. 7(a)-(i). Among these nine images, Fig. 7(a)-(d) show MPs with similar particle sizes, whereas Fig. 7(e)-(i) show those with varying particle sizes. From visual evaluation, both algorithms achieved satisfying performance in characterizing MPs. Then, quantitative metrics were calculated to compare the performances of these two algorithms, as shown in Table 3.

For the Mask R–CNN outputs in Case 1, the algorithm had an outstanding performance in the localization, classification, and segmentation of all four types of MPs. Regarding the localization performance, bounding boxes were adopted to show the locations of the MP particles, as can be seen in Fig. 7(a)-(i). All MP particle locations were successfully highlighted, except for one noticeable flaw in Fig. 7(h), where only part of the large pellet was enclosed in the bounding box. In addition, it seemed the MP particle size had little influence on the localization performance. Table 3 shows that the algorithm achieved APbb = 92.7%, APbb50 = 99.30%, and APbb75 = 99.30%, demonstrating superior precision in locating MPs; the minor errors were not even noticeable to the naked eye.

Regarding the classification performance, the Mask R–CNN generated labels with classification confidences on top of the bounding boxes.

Fig. 7. Localization, classification, and segmentation result of multiple MPs against a white background (Case 1): (a) fibers of similar sizes; (b) fragments of similar sizes; (c) pellets of similar sizes; (d) rods of similar sizes; (e) fibers of varying sizes; (f) fragments of varying sizes; (g) pellets of varying sizes; (h) rods of varying sizes; (i) mixed particles of varying sizes.

All MPs were classified successfully with high classification confidences, as shown in Fig. 7(a)-(i). The algorithm achieved over 99% confidence in its classification results. More specifically, the precision was 93.30% and the recall was 95.40%, which resulted in an F1 score of 94.34%, as shown in Table 3.

As for the segmentation performance, masks in various colors were applied to segment each MP particle. Most masks were placed precisely over the MPs even though the particle boundaries were hard to predict. The notable errors appeared in Fig. 7(e), (f), and (h), where the edge of the fiber particle, the end of the rod, and the upper part of the pellet were not precisely segmented. From Table 3, the segmentation performance metrics showed that APm = 82.6%, APm50 = 98.50%, and APm75 = 83.60%.

Fig. 7. (continued).
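For context, outputs like those in Fig. 7 can be produced with a short inference routine such as the sketch below (a hypothetical helper assuming a torchvision-style model; the paper does not publish its inference script):

    import torch
    from PIL import Image
    import torchvision.transforms.functional as TF

    CLASSES = ["background", "fiber", "fragment", "pellet", "rod"]

    @torch.no_grad()
    def detect(model, image_path, score_thresh=0.5):
        # Run the trained Mask R-CNN on one image; keep confident detections only.
        model.eval()
        img = TF.to_tensor(Image.open(image_path).convert("RGB"))
        out = model([img])[0]  # dict with "boxes", "labels", "scores", "masks"
        keep = out["scores"] >= score_thresh
        return {k: v[keep] for k, v in out.items()}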

Understandably, as the segmentation criterion increased from IoU = 50% to IoU = 75%, the average segmentation precision decreased from 98.50% to 83.60%. Still, the metrics were sufficiently superior to segment all MPs against a white background. Furthermore, Fig. 8 illustrates the detailed segmentation performance for each MP category. The fiber particles had the lowest segmentation precision (51.51%), which was attributed to the thin fibers and their unpredictable shapes (L-shape, U-shape, I-shape, and S-shape). The second-lowest was from rod particles, where the segmentation precision was 86.08%; these errors could also be attributed to the thin shape of the rod particles. As for fragment and pellet particles, the segmentation precisions were higher than 97%, undoubtedly deemed outstanding.

For the U-Net outputs in Case 1, the algorithm could also accurately classify and segment MPs. Fiber, fragment, pellet, and rod pixels were colored in red, green, yellow, and blue, respectively. In terms of classification performance, the U-Net could classify most particles with similar sizes, as can be seen in Fig. 7(b), (c), and (d). Nevertheless, when classifying fiber particles of similar sizes, the ends of several fibers were misclassified as rods; the misclassification in Fig. 7(a) can be seen in the blue parts. The situation seemed to worsen when the MP particle sizes increased, because more errors in classification were found in Fig. 7(e), (f), and (h).

Fig. 7. (continued).

Table 3
Performance evaluation for both Mask R–CNN and U-Net in Case 1.

              Localization (%)              Classification (%)             Segmentation (%)
Mask R–CNN    APbb    APbb50   APbb75       Precision   Recall   F1        APm     APm50   APm75
              92.7    99.3     99.3         93.30       95.40    94.34     82.60   98.50   83.60
U-Net         –                             Precision   Recall   F1        MIoU
                                            90.50       96.10    93.20     87.30


More specifically, in Fig. 7(e), the straight parts of one fiber particle were misidentified as rods and were colored blue. Then in Fig. 7(f), parts of a large rod were classified into fragments and fiber due to the texture of the rod surface. Moreover, the central part of a large pellet was identified as fragments where uneven surface texture appeared on the pellet in Fig. 7(h). The U-Net overall achieved a precision of 90.50%, a recall of 96.10%, and an F1 score of 93.20%. Compared with the classification results of Mask R–CNN, the performance of the U-Net was slightly worse; however, the differences in classification metrics between the two algorithms were minor. As for the segmentation, since the U-Net algorithm classifies and segments targets at a pixel level, the segmentation errors were the same as the classification ones in Fig. 7. The U-Net gained a segmentation performance of MIoU = 87.30%, shown in Table 3. More specifically, the detailed segmentation performance for each category is listed in Fig. 8. Interestingly, the fragment had the highest segmentation performance, followed by pellet, rod, and fiber, demonstrating the same trend as Mask R–CNN. Overall, the results showed that fiber would be the most challenging category for the U-Net to characterize in the current study.

Fig. 8. Detailed segmentation performance per microplastic category of Case 1.

4.2. MPs characterization against natural background (Case 2)

Case 2 demonstrated the MPs characterization results against natural backgrounds when the Mask R–CNN and U-Net algorithms were applied. Three sets of prediction outputs and their original images are displayed in Fig. 9(a)-(c), demonstrating MPs characterization on sand, on natural soil, and in water. Fig. 9(a) shows twenty-nine MP particles composed of all four categories present on the surface of natural beach sand. In Fig. 9(b), eight MP particles are present on the surface of natural soil sediment. In Fig. 9(c), nine MP particles float on the water surface in a clear container. Based on visual evaluation, the Mask R–CNN had a better performance than the U-Net, since most particles from the U-Net outputs were neither classified nor segmented. The detailed evaluation metrics of both Mask R–CNN and U-Net for Case 2 are summarized in Table 4.

For the Mask R–CNN performance in Case 2, the algorithm generally did well in locating, classifying, and segmenting all four types of MPs. However, compared with Case 1, several particles could not be located, classified, or segmented correctly due to the increased complexity. For the localization performance, the Mask R–CNN performed well when the background was beach sand, since all MP particles were identified precisely in Fig. 9(a). Then, when the MPs appeared on the surface of natural soil sediment, the Mask R–CNN could locate most of them, and the results can be seen in Fig. 9(b). Even though these large soil particles had outlines similar to fragment or pellet MPs, they did not cause any trouble in MPs localization by Mask R–CNN. It should be noted, however, that two white MP particles in the top left corner could not be located. In Fig. 9(c), the localization results for MP particles on the water surface showed that Mask R–CNN could locate most of the particles except one white fiber particle. Based on the evaluation metrics in Table 4, the localization performance of the Mask R–CNN was slightly impaired compared with the excellent performance in Case 1. More specifically, the APbb dropped from 92.7% to 67.50%. Besides, both the APbb50 and APbb75 decreased from 99.30% to 84.30%. It is worth mentioning that, though the metrics were impaired, most MP particles could still be located precisely.

For the classification performance, all MP particles that were located successfully were classified correctly, as shown in Fig. 9(a)-(c). Nevertheless, based on the classification metrics calculated from the 180 images in the test-complex dataset, the classification accuracy deteriorated somewhat. The precision, recall, and F1 scores were 78.70%, 80.90%, and 79.78%, respectively, which were about 15% lower than those in Case 1. As for the segmentation results, most of the MP particles were covered with high-accuracy masks. However, some edges or ends of the MP particles were not fully segmented. Besides, in Fig. 9(a), the masks looked slightly wider for the thin rods and fibers. Since the sand background blurred the MPs' boundaries, the Mask R–CNN algorithm had difficulty completing the pixel-level segmentation along the edges. Therefore, the segmentation metrics shown in Table 4 were lower than those in Case 1. Similarly, when the segmentation criterion increased from IoU = 50% to IoU = 75%, the average segmentation precision decreased sharply from 84.30% (APm50) to 63.70% (APm75). Compared with that in Case 1, the APm value in Case 2 dropped to 59.50%. More detailed segmentation performance can be found in Fig. 10. More specifically, fragments still had the highest segmentation precision, which was related to the relatively regular outlines of the fragments. Nevertheless, the segmentation performance of fibers was still the lowest, which could be attributed to their thin and irregular shapes. Furthermore, the segmentation precisions of pellets and rods were only about 60%.

The performance of the U-Net in Case 2 was much worse than that in Case 1. Moreover, Mask R–CNN demonstrated a dominant advantage over the U-Net in Case 2, evident even without referring to Table 4 for detailed metrics. In Fig. 9(a)-(c), only limited MP particles were classified correctly, with most background areas like sand, natural soil, and water mistakenly classified and segmented as MP particles. The misclassification of the background accounted for the low metrics in Table 4, where the precision, recall, and F1 scores were merely 30.60%, 48.50%, and 37.50%, respectively. In addition, the segmentation of the classified particles was not accurate enough, and many segmentations were much larger than the actual sizes. Thus, the segmentation metric MIoU was only 10.80%. In Fig. 10, the segmentation accuracy of fiber and fragment particles was less than 10%, while the accuracy for the pellet and rod was slightly over 20%. Given the abovementioned observations, the U-Net algorithm might need some modification before it can be applied to real-world tasks of MP characterization.

5. Discussion

5.1. Error analysis

By classifying, locating, and segmenting MPs against various backgrounds, the proposed Mask R–CNN algorithm has demonstrated an overall high-accuracy performance in both cases. Unlike the U-Net, with its apparent performance decay, the Mask R–CNN has shown an outstanding capability to deal with complicated real-world scenarios. Furthermore, MP images taken by microscopes could be processed by Mask R–CNN and U-Net, since microscopic photos usually have a transparent background; the performance would be close to Case 1. However, Mask R–CNN might be more appealing since it could provide extra information on MP numbers via its various colors.
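As a side note on extracting particle counts from the instance-level outputs, a trivial sketch is given below (a hypothetical helper building on the detect() example above):

    from collections import Counter

    def count_particles(detections, classes):
        # Tally detected MP particles per shape class for one image.
        labels = [classes[i] for i in detections["labels"].tolist()]
        return Counter(labels)  # e.g. Counter({"pellet": 12, "fiber": 5, ...})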

Fig. 9. Localization, classification, and segmentation result of multiple MPs against a natural background (Case 2): (a) sand background; (b) natural soil background;
(c) water background.

Table 4
Performance of both Mask R–CNN and U-Net in Case 2.

              Localization (%)              Classification (%)                Segmentation (%)
Mask R–CNN    APbb    APbb50   APbb75       Precision   Recall   F1 score     APm     APm50   APm75
              67.50   84.30    84.30        78.70       80.90    79.78        59.50   84.30   63.70
U-Net         –                             Precision   Recall   F1           MIoU
                                            30.60       48.50    37.50        10.80

However, some classification, localization, or segmentation errors appeared in the current study, which were likely due to (1) the generalization ability of the dataset and (2) bad image quality. These factors are analyzed in more detail in the following paragraphs.

(1) Generalization ability of the dataset: As mentioned in the dataset preparation and preprocessing section, the current dataset was prepared with white and natural backgrounds (soil, sand, and water).

Lorenzo-Navarro et al. (2021) mentioned that it would be tough to get enough training images if collecting MPs and the annotation process require plenty of time to complete, even for experienced researchers. It is easy to obtain the boundary annotation on a white background, since the high contrast between the MPs and the white background helps highlight the particle boundaries. Nevertheless, the annotation process for MPs on natural backgrounds requires about four times as long as on a white background. At the end of the dataset preparation stage, 70% of the dataset was MPs on a white background. The dataset preparation method adopted in this research could speed up the annotation process and save model training time. However, the generalization ability of the dataset seems limited compared with a fully complex-background training set. Nevertheless, the Mask R–CNN outputs demonstrate that the performance deterioration is not evident, owing to the robust feature extraction ability of the algorithm. For the U-Net, it would be necessary to include more images with complex backgrounds to boost the classification and segmentation performance.

Furthermore, no overlapped or clustered MPs are included in the current dataset. However, there is a high chance that MPs overlap or are clustered in real-world scenarios, as shown in Fig. 2. The outlines of these MPs would be different from those of individual particles. More complex images with overlapping or touching MP particles should be added to the existing microplastic dataset to solve this issue. In addition, during the annotation stage of the dataset preparation, the outlines of MPs in an overlapping or clustered microplastic image should be marked clearly along the boundary, and no particles should be ignored.

Fig. 10. Detailed segmentation performance per microplastic category of Case 2.

(2) Bad image quality: image quality involves two main issues. The first one is focus. When the target is not focused well or is out of focus, the object's boundaries cannot be captured precisely; thus, the image is likely to be blurred. As mentioned in the methodology section, the CNN algorithm extracts the shape features of objects from the given image. If such a blurry image is fed into the algorithm, it will be challenging to determine the exact location using one bounding box, and the unclear boundaries of the object will also reduce the accuracy of classification and segmentation. In Fig. 9(b), the white pellet at the top of the image has a blurred outline, making it unable to be appropriately located. The transparent water made objects on the bottom and the water surface seem to be on the same plane.

The second issue is focal length. By adjusting the focal length, the same target occupies a different portion of the image, as demonstrated in Fig. 11. When the MP particle is shot at a short focal length (20 mm), the image gives a broader view; however, each MP particle occupies relatively fewer pixels (563 × 287 pixels), so only limited shape information can be used. In contrast, if a long focal length (140 mm) is used, the MP particles become more prominent and occupy more pixels (1558 × 658 pixels) in the image, and a detailed shape contour can be easily obtained. However, the maximum and minimum object detection pixel sizes are fixed for a well-trained Mask R–CNN algorithm (He et al., 2017). In Fig. 7(h), the largest pellet particle was not fully covered by the mask, which could be attributed to the particle size being beyond the algorithm limit. Thus, a proper focal length should be carefully selected. In addition, the minimum and maximum pixel size settings within the Mask R–CNN should be adjusted based on engineering practice. It is also recommended to keep the MP particles occupying a proper portion of the image.

Fig. 11. The same MP particle image shot under different focal lengths.

5.2. Implications for practice

Currently, most MPs identification (chemical composition) and classification (morphology and color) has to be conducted in a laboratory (Cowger et al., 2020). It is difficult and time-consuming to deploy large-scale in-situ MPs investigations. Even though large plastic chunks can be recognized from satellite images, detecting microplastics and nanoplastics with satellite images could be challenging (Corbari et al., 2020). The proposed algorithm provides a potential solution for quick field investigations along coastal areas. Thanks to the development of commercial drones, 4K resolution images or videos can be obtained for research purposes, offering highly detailed MPs images with GPS coordinates. Besides, these drones can survey a 3–5 km coastline within half an hour, indicating that a multi-round MPs survey can be completed within a day. Once the images and videos are transmitted to a nearby laptop, the quantification analysis of the MPs can be executed automatically. Compared with human inspection or satellite images, a site survey using a drone is more affordable and time-efficient. Furthermore, the algorithm could be deployed on local workstations or cloud computing platforms. The accessibility of the algorithm could make it a promising solution for analyzing MPs with images.

images. Department of Transportation (No. 2020-4R–SUPP). We would also like


to thank the Hawaiian local nonprofit organization “Sustainable Coast­
6. Conclusions lines Hawaii” for offering suggestions for microplastic sampling and
organizing the beach cleanup campaigns on the Hawaiian Islands.
In the current study, a deep learning based approach, Mask R–CNN,
was modified and implemented to achieve the automated localization, References
classification, and segmentation of large marine microplastic particles
based on their morphologies. The performance of the Mask R–CNN has Arthur, C., Baker, J.E., Bamford, H.A., 2009. Effects, and fate of microplastic marine
debris. In: Proceedings of the International Research Workshop on the Occurrence,
been compared with that of the U-Net to achieve a comprehensive September 9-11, 2008. University of Washington Tacoma, Tacoma, WA, USA.
evaluation. The following conclusions could be obtained from the cur­ Barboza, L.G.A., Vethaak, A.D., Lavorante, B.R., Lundebye, A.K., Guilhermino, L., 2018.
rent study: Marine microplastic debris: an emerging issue for food security, food safety and
human health. Mar. Pollut. Bull. 133, 336–348.
Bertoldi, C., Lara, L.Z., Gomes, A.A., Fernandes, A.N., 2021. Microplastic abundance
(1) The Mask R–CNN algorithm can be trained and validated with a quantification via a computer-vision-based chemometrics-assisted approach.
dataset that includes both MPs on white and real-world back­ Microchem. J. 160, 105690.
Bianco, V., Memmolo, P., Carcagnì, P., Merola, F., Paturzo, M., Distante, C., Ferraro, P.,
grounds, simplifying the dataset preparation and annotation 2020. Microplastic identification via holographic imaging and machine learning.
process. After the training process, the algorithm can be imple­ Adv. Intell. Syst 2 (2).
mented on white and complex backgrounds (sand, natural soil, Cha, Y.J., Choi, W., Büyüköztürk, O., Cha, Y.J., Choi, W., Büyüköztürk, O., 2017. Deep
learning-based crack damage detection using convolutional neural networks.
and water). The evaluation metrics show a competitive result
Comput. Aided Civ. Infrastruct. Eng. 32 (5), 361–378, 2017.
compared to similar MPs characterization applications with the Chen, J., Zhang, D., Huang, H., Shadabfar, M., Zhou, M., 2020. Image-based
U-Net algorithm. segmentation and quantification of weak interlayers in rock tunnel face via deep
(2) When characterizing MPs against the white background, the learning Automation in Construction Image-based segmentation and quantification
of weak interlayers in rock tunnel face via deep learning. Autom. ConStruct. 120
Mask R–CNN and U-Net could achieve high accuracy in (December), 103371.
describing MPs particles. The Mask R–CNN could provide more Choi, J., Toumanidis, L., Yeum, C.M., Charalampos, P., Lenjani, A., Liu, X., Kasnesis, P.,
outstanding outputs than U-Net results by segmenting each MP Ortiz, R., Jiang, N.J., Dyke, S.J., 2022. Automated graffiti detection: a novel
approach to maintaining historical architecture in communities. Appl. Sci. 12 (6),
particle. More specifically, the Mask R–CNN can achieve APbb = 2983.
91.36%, APm = 79.90%, Precision = 92.40%, Recall = 94.40% Corbari, L., Maltese, A., Capodici, F., Mangano, M.C., Sar, G., Ciraolo, G., 2020. Indoor
and F1 score = 93.39%, while the U-Net has Precision = 90.50%, spectroradiometric characterization of plastic litters commonly polluting the
Mediterranean Sea: toward the application of multispectral imagery. Sci. Rep. 10,
Recall = 96.10% and F1 score = 93.20%. 19850.
(3) When characterizing MPs against complex backgrounds, the Cowger, W., Gray, A., Christiansen, S.H., DeFrond, H., Deshpande, A.D.,
Mask R–CNN can still maintain high accuracy, while the U-Net Hemabessiere, L., et al., 2020. Critical review of processing and classification
techniques for images and spectra in microplastic research. Appl. Spectrosc. 74 (9),
performance deteriorates significantly. The Mask R–CNN shows 989–1010.
better adaptability since the APbb and APm only drops to 67.50% Dutta, A., Zisserman, A., 2019. The VIA annotation software for images, audio and video.
and 59.50%, respectively. Furthermore, the Precision, Recall, and In: Proceedings of the 27th ACM International Conference on Multimedia,
pp. 2276–2279.
F1 score of the Mask R–CNN are 78.70%, 80.90%, and F1 score =
Fukushima, K., Miyake, S., 1982. Neocognitron: a self-organizing neural network model
79.78%. In comparison, the U-Net can only maintain a Precision for a mechanism of visual pattern recognition. In: Competition and Cooperation in
of 30.60, a Recall of 48.50, and an F1 score of 37.50%. Neural Nets. Springer, Berlin, Heidelberg, pp. 267–285.
(4) The Mask R–CNN has demonstrated a satisfying performance in Gago, J., Galgani, F., Maes, T., Thompson, R.C., 2016. Microplastics in seawater:
recommendations from the marine strategy framework directive implementation
current scenarios and showed its potential for large-scale onsite process. Front. Mar. Sci. 3, 219.
MPs survey. However, only four types of MPs are characterized in Garcia-Garcia, A., Orts-Escolano, S., Oprea, S., Villena-Martinez, V., Martinez-
this study, and in reality there are more types of MPs in the real Gonzalez, P., Garcia-Rodriguez, J., 2018. A survey on deep learning techniques for
image and video semantic segmentation. Appl. Soft Comput. 70, 41–65.
world. Therefore, it is necessary to enlarge the current dataset Gauci, A., Deidun, A., Montebello, J., Abela, J., Galgani, F., 2019. Automating the
and improve the algorithm’s performance. characterisation of beach microplastics through the application of image analyses.
Ocean Coast Manag. 182, 104950.
Girshick, R., 2015. Fast r-cnn. In: Proceedings of the IEEE International Conference on
CRediT authorship contribution statement
Xiao-Le Han: Methodology, Software, Writing – original draft. Ning-Jun Jiang: Conceptualization, Writing – review & editing. Toshiro Hata: Conceptualization, Writing – review & editing. Jongseong Choi: Writing – review & editing. Yan-Jun Du: Conceptualization. Yi-Jie Wang: Methodology, Data preparation.
Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Data availability

Data will be made available on request.
Acknowledgments

This study was financially supported by the National Natural Science Foundation of China (No. 42007246), the Fundamental Research Funds for the Central Universities (2242022k30055), and the Hawaii