Han Et Al. - 2023 - Deep Learning Based Approach For Automated Charact
ARTICLE INFO
Keywords: Deep learning; Microplastics; Mask R–CNN; U-Net; Instance segmentation

ABSTRACT
The rapidly growing concern of marine microplastic pollution has drawn attention globally. Microplastic particles are normally subjected to visual characterization prior to more sophisticated chemical analyses. However, the misidentification rate of current visual inspection approaches remains high. This study proposed a state-of-the-art deep learning-based approach, Mask R–CNN, to locate, classify, and segment large marine microplastic particles with various shapes (fiber, fragment, pellet, and rod). A microplastic dataset including 3000 images was established to train and validate this Mask R–CNN algorithm, which was backboned by a Resnet 101 architecture and could be tuned in less than 8 h. The fully trained Mask R–CNN algorithm was compared with U-Net in characterizing microplastics against various backgrounds. The results showed that the algorithm could achieve Precision = 93.30%, Recall = 95.40%, F1 score = 94.34%, APbb (average precision of bounding box) = 92.7%, and APm (average precision of mask) = 82.6% on a 250-image test dataset. The algorithm could also achieve a processing speed of 12.5 FPS. The results obtained in this study implied that the Mask R–CNN algorithm is a promising microplastic characterization method that can potentially be used in the future for large-scale surveys.
1. Introduction

The United Nations Environment Programme has reported that plastic pollution has been listed as one of the top environmental issues by Mason et al. (2016). Due to the limited disposal capability, most waste plastic is accumulated by landfilling. During this period, the plastic chunk will degrade or fragment into small pieces through mechanical, thermal oxidation, hydrolysis, and biological processes. These various processes give small plastic particles unique shapes (Bertoldi et al., 2021). When their sizes are less than 5 mm, the plastic particles are defined as microplastics (MPs) (Arthur et al., 2009). So far, MPs have been found in terrestrial and aquatic ecosystems, from the inland water system to the soil profile, from the ocean surface to the sediment, and from pole to pole (Ivar do and Costa, 2014). Once the MPs enter the oceans, they can be transported a long distance by ocean currents and wind. Moreover, MPs can remain in the water for decades, which makes them perfect carriers of toxic hydrophobic compounds absorbed from the ocean (Nobre et al., 2015) and pathogenic organisms (Barboza et al., 2018). Besides, due to their small size, these plastic pieces could be easily swallowed by various marine organisms and transferred into the food chain, ultimately ending in human beings (Sharma and Chatterjee, 2017). Therefore, microplastic pollution has attracted intensive attention from experts in the marine environment, marine biology, and coastal ecology.
In efforts to maintain a safe marine environment, the European Union (EU) passed the Marine Strategy Framework Directive (MSFD), which requires all member nations to monitor and classify MPs based on their physical characteristics, including color, brightness, size, morphology, and polymer type (Gago et al., 2016; Gauci et al., 2019). Since these characteristics are closely related to their origin, degradation rate, transportation processes, impacts on the environment, and destinations (Hidalgo-Ruz et al., 2012; Zhang et al., 2017; Kooi and Koelmans, 2019), it is essential to characterize and classify MPs (Wang et al., 2019). Currently, the image capture methods for MPs with relatively large sizes (up to 5 mm) mainly include naked eye with a digital camera (1 mm–5 mm) (Lorenzo-Navarro et al., 2021), stereo microscope (0.3–5 mm)
* Corresponding author.
E-mail address: [email protected] (N.-J. Jiang).
https://doi.org/10.1016/j.marenvres.2022.105829
Received 13 June 2022; Received in revised form 5 October 2022; Accepted 15 November 2022
Available online 18 November 2022
0141-1136/© 2022 Elsevier Ltd. All rights reserved.
X.-L. Han et al. Marine Environmental Research 183 (2023) 105829
(Hanvey et al., 2017), and digital holographic imaging (0.1–5 mm) (Zhu et al., 2021a). For small MPs at a micrometer level, Raman micro-spectroscopy (10–300 μm) (Lenz et al., 2015), fluorescent dyes (20–100 μm) (Lv et al., 2019), and staining (Hata and Jiang., 2021) have been utilized. Images containing relatively few MPs are typically inspected with the naked eye to obtain information about morphology. The MPs are then classified, analyzed, and recorded (Shim et al., 2017).
Benefiting from the rapid development of computer and information technologies, optical microscopy, scanning electron microscopy, fluorescence microscopy, and spectral analysis from Fourier transform infrared (FT-IR) spectroscopy, Raman spectroscopy, pyrolysis gas-chromatography mass-spectrometry, and energy dispersive X-ray spectroscopy are increasingly deployed to analyze and interpret MPs data (Cowger et al., 2020). Mukhanov et al. (2019) used ImageJ to convert RGB images into binary images and then classify the microplastic into four categories: rounded, irregular, elongated, and fiber, based on their morphometry parameters. In addition, with the help of a hyperspectral image capture system employing infrared spectrometry and the Raman technique, Serranti et al. (2018) applied Multivariate Image Analysis and partial least squares discriminant analysis to identify the plastic material and shape. However, the high cost and limited accessibility of such chemometric tools prohibit broader applications. Gauci et al. (2019) developed an algorithm using the least-squares method based on the MATLAB platform to conduct dimension measurements and surface roughness evaluation of MPs collected on the island of Malta in the Central Mediterranean. The color of the microplastic was recorded by calculating the Mean Square Error (MSE) of the values in the Red (R), Green (G), and Blue (B) channels. Although these image processing methods can independently complete required tasks like border detection and area calculation, they still rely on pre-designed algorithms and are regulated by human insight. Therefore, it is necessary to develop new methods for the MPs detection and classification processes with sufficient accuracy and generalization ability.
In the 1950s, with the development of recognition in the human learning process, the definition of artificial intelligence (AI) was first initiated (Russell and Norvig., 1995). AI involved creating machines that could solve problems requiring human intelligence. The machine learning method partially solved the task, which included giving data, generating models, and making decisions. Within this stage, a series of famous algorithms emerged, such as linear regression, support vector machine (SVM), K-nearest neighbors (KNN), and Naive Bayes (Hart et al., 2000). Kedzierski et al. (2019) adopted the KNN algorithm to detect the chemical nature of the MPs based on the FTIR-ATR spectra. The automated classification method could improve the efficiency of inspecting these spectra. Bianco et al. (2020) applied SVM to differentiate diatom and MPs based on 3D coherent images using the holographic imaging technique. Meyers et al. (2022) adopted the Nile red staining method and decision tree algorithms to classify seven different MPs based on their materials. As the need for more powerful algorithms proliferates, deep learning algorithms begin to stand out in the machine learning field. Inspired by the vision mechanisms of the human visual cortex, a neural network focused on feature extraction was developed and defined as the convolutional neural network (CNN or ConvNet) (Fukushima and Miyake, 1982; LeCun et al., 1989). After development in the past decades, CNN has become one of the most widely used architectures in deep learning. A CNN module normally consists of a series of convolutional and pooling layers, which can effectively capture the grid-like topology features of images while requiring less computational effort. Compared with traditional neural networks, CNNs offer more flexibility to users with different accuracy demands since they can be built by combining CNN modules (Cha et al., 2017). CNNs could consist of more than 100 layers to build a deeper network. So far, CNNs have brought about breakthroughs in the processing of image, video, speech, and audio by improving the accuracy and efficiency to a new level, which was not well addressed in traditional AI studies (LeCun et al., 2015).
Girshick (2015) developed the Fast R–CNN by changing multi-stage training into multi-task training. Following the Fast R–CNN algorithm, Ren et al. (2015) developed the Faster R–CNN to improve the algorithm efficiency. A lightweight Region Proposal Network (RPN) was adopted to replace the selective search function inherited from the R–CNN algorithm (Choi et al., 2022). The advancement of the CNN algorithms has given new object detection algorithms more functions. He et al. (2017) proposed the mask region-based convolutional neural network (Mask R–CNN) based on Faster R–CNN by attaching a binary mask to the detected object, which could help highlight each target from the background with masks in varying colors. Mask R–CNN could achieve instance segmentation, in which every single object within the same category could be recognized and distinguished. Therefore, Mask R–CNN could simultaneously achieve a threefold function including object classification, object detection, and segmentation. The state-of-the-art pixel-level instance segmentation of Mask R–CNN was soon applied in medical science, industrial robots, animal husbandry, structure health monitoring, etc. (Johnson, 2018; Yu et al., 2019; Zhao et al., 2020; Xu et al., 2020). For instance, Johnson (2018) demonstrated that Mask R–CNN could be used for cell nuclei segmentation in microscopic images. Zhao et al. (2020) developed a tunnel image capture system using the Mask R–CNN algorithm to detect moisture marks in shield tunnel lining. However, regarding the characterization of MPs, only a few studies using CNN-based deep learning methods have been reported. Lorenzo-Navarro et al. (2021) developed a deep learning method by combining U-Net and VGG16 neural networks, counting and classifying common MPs in laboratory conditions. However, detection results were obtained from a pure white background and good illumination conditions. No MPs embedded in a complex background were tested in the study. Wegmayr et al. (2020) compared the instance segmentation accuracy of microplastic fiber microscopic images by applying Mask R–CNN and Deep Pixel Embeddings (DPE). The study showed that in complex tangled cases, DPE showed better performance. However, their work only focused on fiber MPs detection and segmentation performance. No MPs classification function was built in the algorithm. Zhu et al. (2021b) developed an HC-CNN algorithm to classify microplastic using a hologram image dataset. The lightweight algorithm achieved high accuracy and efficiency. However, the algorithm could not offer more classification information. So far, no other studies have been reported about using the Mask R–CNN algorithm to characterize MPs.
In this study, a Mask R–CNN based deep learning model was developed and used for MPs (1–5 mm diameter) classification, localization, and segmentation. A deep learning dataset of MPs was constructed and utilized to train and validate the Mask R–CNN model using only optical cameras and available image processing software. The developed model's classification, localization, segmentation, and computational performances were evaluated. Then, the validity of the trained model was tested for the classification, localization, and segmentation of real MPs with different morphologies and at various scales. Meanwhile, the MPs were also presented against white and real-world backgrounds to demonstrate the Mask R–CNN model's validity.

2. Methodologies

2.1. Mask R–CNN architecture

The Mask R–CNN model adopted in this study is backboned by the Resnet 101 network (Wu et al., 2019), a 101-layer residual neural network. Resnet 101 is divided into five stages and serves as the convolutional layer to extract features directly from images of MPs. Each stage contains convolution and an identity block. Furthermore, each identity block consists of several convolution layers.
Using the Pytorch platform, the training of this model is initiated via transfer learning, which loads a pre-trained model weight based on the COCO dataset (0.3 million images and 80 categories) (Ren et al., 2015).
Transfer learning could save training time since the pre-trained weights can be directly utilized to initialize the training process. Therefore, it has been widely adopted in deep learning, especially in cases where the number of images for training is limited (Pan and Yang, 2010).
The architecture of the Mask R–CNN model is shown in Fig. 1. Firstly, feature maps are generated after the backbone Resnet 101 CNN processes the input raw images. Then, the feature maps are fed into lightweight region proposal networks (RPNs) to detect candidate objects with a sliding window and to generate regions of interest (RoIs) with bounding boxes (anchors). The actual sizes of these boxes are determined by the anchor scales and ratios, which are essential hyperparameters in the model tuning process. The RoIs are further classified as foreground or background with scores, and non-maximum suppression (NMS) is applied to keep only high-score ones (Neubeck and Van Gool, 2006). Then, the RoI Align network is utilized to adjust the dimension of the RoIs generated by RPN and produce a fixed-size feature map. In the meantime, followed by a three-branch-paralleled prediction network, the functions of classification, object localization, and instance segmentation are achieved. The Fully Connected (FC) layers will pass the feature map to a normalized exponential function (SoftMax) and a bounding box regression, giving the classification and object detection results, respectively. The SoftMax function is widely applied in machine learning classification problems and gives the probability of target objects belonging to a specific category. A Fully Convolutional Network (FCN) is utilized to generate a binary mask, which could highlight the detected objects. The generated masks can be further used to determine the dimensions of objects, which is out of the scope of the current study.

2.2. Dataset preparation and preprocessing

With the influence of both wind and ocean currents, the Oahu island's windward side (east coast) has suffered from the MPs pollution on the beach areas. In total, 400 onsite MPs images (JPG format with a resolution of 3008 × 1688 pixels) were taken by a Sony α7 Mark II digital camera from five different beaches on Oahu Island, which can be observed in Fig. 2. These images were taken under daylight and natural background (sand, grass, and water on the beaches in Fig. 2).
Then MPs were sieved through US No.4 (4.76 mm) and No.20 (0.85 mm) sieves, and MPs left on the No.20 sieve were sampled for laboratory image capturing. The laboratory MP image capturing was carried out within an LED-illuminated mini photo studio using the same digital camera. The LED lights have a color temperature of 6500 K, the same as daylight. Before image capturing, the MPs samples were washed, disinfected, and oven-dried at a low temperature. These samples had been manually inspected to confirm they were microplastics. Suspect particles had been tested with Raman spectroscopy to eliminate the plant and marine creature residues. The Raman spectrum of a low-density polyethylene (LDPET) MP particle is shown in Fig. 3.
Each time at least 20 individual MP particles were placed on a white background within the mini-studio. Once the image was taken, these MPs were discarded and new MP particles were placed in the mini-studio until 100 laboratory captured images were obtained. All 500 raw images from onsite and laboratory capturing were cut into small patches with a uniform resolution of 512 × 512 via a Python program, and 1500 patches with intact MPs were finally selected (Han et al., 2022). Previous studies showed that crucial features of MPs could be kept and the training efficiency was superior at this resolution (Yu et al., 2019). According to the classification methods mentioned by Hartmann et al. (2019), Mukhanov et al. (2019), and Lorenzo-Navarro et al. (2021), the MPs were classified as fiber, fragment, pellet, and rod based on their morphological characteristics as shown in Fig. 4. Among these 1500 patches (512 × 512 resolution), there were 386 fibers, 1015 fragments, 844 pellets, and 322 rods. The 1500 patches were duplicated by randomly choosing one of four common data augmentation methods (left-right flipping, up-down flipping, 90° rotation, and scaling) (Han et al., 2022). Thus, the final MP dataset possessed 3000 images of 512 × 512 resolution.
Before the dataset was randomly divided into three subsets (training, validation, and test), the MPs dataset was annotated using the VGG Image Annotator (Dutta and Zisserman, 2019). Besides, a unique dataset named test-complex was prepared to test the algorithm performance on images with different scales and backgrounds. The images in the test-complex dataset had never been used for training and validation. The details for each subset are displayed in Table 1. The resolutions selected in the test-complex dataset were similar to the general image size obtained from digital cameras, mobile phones, and social media platforms. The test-complex dataset included images captured under complex backgrounds and illumination conditions to simulate real-world soil and an aqueous environment where microplastic could be spotted.

2.3. Mask R–CNN model training and validation

The Mask R–CNN model was trained and tuned on Google Colaboratory (Colab) with a Tesla P100 GPU (16 GB graphics memory). Synchronous stochastic gradient descent (SGD) was used to train the model and the weight decay and momentum were set as 0.0001 and 0.9, respectively. The batch size of the training dataset was set as 4. In order
Fig. 2. Microplastics sampling locations: (a) Kahuku beach; (b) Kahana bay beach; (c) Kailua beach; (d) Waimanalo beach; (e) Sandy beach.
to ensure the validity of the deep learning training parameters, the parameter selection refers to Table 2, which summarizes recent research adopting the same algorithm structure (Hou et al., 2020; Chen et al., 2020; Kim and Cho, 2019; Nie et al., 2020; Politikos et al., 2021).
All the backbone CNN layers were frozen for the first 20 epochs. Only the network head, which contained the classifier, bounding box generator, and mask generator, was trained with the pre-trained COCO dataset weights using an initial learning rate of 0.001. For the following 20 epochs, the learning rate was reduced by a factor of ten and the first three stages of the Resnet 101 were activated to continue the training process. Finally, the learning rate was decreased by a factor of ten again, and all five stages of the Resnet 101 were activated to train the model for another 20 epochs. All hyperparameters (learning rate, weight decay, scale of anchor, anchor ratio, and NMS ratio) were fine-tuned to achieve higher accuracy and faster processing speed. The whole training process took less than 8 h to complete.

3. Model evaluation

To evaluate the performance of the proposed Mask R–CNN algorithm in microplastic classification, localization, and segmentation, the U-Net algorithm (Ronneberger et al., 2015) was also trained and compared with the Mask R–CNN. The U-Net was designed based on the Fully Convolutional Network (FCN) and is a well-known semantic segmentation algorithm. The U-Net classification and segmentation function has proven very effective in microplastic segmentation (Lorenzo-Navarro et al., 2021; Lee et al., 2022).

3.1. Loss function

The loss function was a direct metric to measure the deviation between the prediction results and the ground truths labeled by the dataset maker. Properly selecting the loss function can benefit the training process by updating the model weights effectively, contributing to the final model performance. Therefore, the target of deep learning algorithm training was to decrease the deviation as much as possible.
Since the Mask R–CNN algorithm could achieve localization, classification, and segmentation functions, assessing the algorithm from these three aspects is necessary. The loss function employed in Mask R–CNN is a multi-task function shown in Eq. (1), which includes three parts: the position regression loss of the bounding box Lbox (localization error), the classification loss Lcls (classification error), and the segmentation loss of mask Lmask (segmentation error) calculated by pixel accuracy (He et al., 2017).
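For orientation, the original Mask R–CNN formulation (He et al., 2017) combines the three terms named above as an unweighted sum. The block below restates that standard form only; it is not a reproduction of this paper's Eq. (1), whose notation may differ.

```latex
\[
L = L_{cls} + L_{box} + L_{mask}
\]
```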
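$$L_{mask}\left(s_{ij}, s_{ij}^{*}\right) = -\frac{1}{m^{2}} \sum_{ij} \left[ s_{ij}^{*} \log\left(s_{ij}^{k}\right) + \left(1 - s_{ij}^{*}\right) \log\left(1 - s_{ij}^{k}\right) \right] \qquad (5)$$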
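The mask term in Eq. (5) is an average binary cross-entropy over the m × m mask of the relevant class. A minimal NumPy sketch follows, assuming the prediction is given as per-pixel probabilities; the function name `mask_bce_loss` and the clipping constant `eps` are illustrative choices, not taken from the paper.

```python
import numpy as np

def mask_bce_loss(pred, target, eps=1e-7):
    """Average binary cross-entropy over an m x m mask, in the form of Eq. (5).

    pred   : predicted per-pixel probabilities for the class of interest
    target : ground-truth binary mask (0 or 1)
    eps    : assumed clipping constant that keeps log() finite
    """
    pred = np.clip(np.asarray(pred, dtype=float), eps, 1.0 - eps)
    target = np.asarray(target, dtype=float)
    per_pixel = target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred)
    return float(-per_pixel.mean())  # mean over m*m pixels equals the 1/m^2 sum

# Toy 28 x 28 example (the mask resolution quoted in the text).
m = 28
truth = np.zeros((m, m))
truth[8:20, 8:20] = 1.0
confident = np.where(truth == 1.0, 0.9, 0.1)  # mostly right prediction
uninformative = np.full((m, m), 0.5)          # no information
loss_confident = mask_bce_loss(confident, truth)
loss_uninformative = mask_bce_loss(uninformative, truth)
```

In this toy case the confident prediction yields the lower loss, which is the behavior the training procedure relies on.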
Fig. 3. Raman spectra of low-density polyethylene (LDPET) MP particle.

Table 1
Image details in the MPs dataset.
Dataset name | Image amount | Image resolution | Image background
Training | 2100 | 512 × 512 | white, sand, natural soil, and water
Validation | 600 | 512 × 512 | white, sand, natural soil, and water
Test | 300 | 512 × 512 | white, sand, natural soil, and water
Test-complex | 180 | 512 × 512, 600 × 400, 1504 × 844 and 3008 × 1688 | sand, natural soil, and water
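The dataset assembly described in Section 2.2 (one random augmentation per patch, then a random split into the subset sizes of Table 1) can be sketched as follows. This is an illustration under assumptions: the scale factor is not stated in the text, so `_zoom2x` uses a hypothetical 2× center zoom, and the function names are invented here. In practice the annotation masks would need the same transform as the image.

```python
import numpy as np

def _zoom2x(patch):
    """Hypothetical scaling: crop the central half and upsample 2x (nearest neighbor)."""
    h, w = patch.shape[:2]
    crop = patch[h // 4:h // 4 + h // 2, w // 4:w // 4 + w // 2]
    return crop.repeat(2, axis=0).repeat(2, axis=1)

def augment(patch, rng):
    """Apply one randomly chosen augmentation out of the four named in the text."""
    choice = rng.integers(4)
    if choice == 0:
        return np.fliplr(patch)   # left-right flip
    if choice == 1:
        return np.flipud(patch)   # up-down flip
    if choice == 2:
        return np.rot90(patch)    # 90 degree rotation
    return _zoom2x(patch)         # scaling (assumed factor)

def split_indices(n_total, sizes, rng):
    """Randomly partition range(n_total) into consecutive subsets of the given sizes."""
    if sum(sizes) != n_total:
        raise ValueError("subset sizes must add up to n_total")
    order = rng.permutation(n_total)
    bounds = np.cumsum(sizes)[:-1]
    return np.split(order, bounds)

rng = np.random.default_rng(0)
patch = np.zeros((512, 512, 3), dtype=np.uint8)
augmented = augment(patch, rng)
train_idx, val_idx, test_idx = split_indices(3000, (2100, 600, 300), rng)
```

Passing the subset sizes as integers, rather than fractions, keeps the split exactly at the 2100/600/300 counts listed in Table 1.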
Fig. 4. Sample images from the training dataset: (a) fiber, (b) fragment, (c) pellet, (d) rod.
Table 2
Parameters summary from research papers with the same algorithm structure.
Author Minimum Image dimension Weight decay Momentum Learning rate Batch size Training epochs GPU amount
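For comparison with the settings summarized in Table 2, the hyperparameters and three-phase fine-tuning schedule reported in Section 2.3 can be collected into a small lookup. This is a sketch of the reported values only; the dictionary keys, phase labels, and the helper `phase_for_epoch` are hypothetical names, and no training framework is invoked.

```python
# Values as reported in Section 2.3; key names are illustrative.
SGD_CONFIG = {"momentum": 0.9, "weight_decay": 0.0001, "batch_size": 4}
INITIAL_LR = 0.001

# (number of epochs, trainable parts, learning rate)
PHASES = [
    (20, "heads only", INITIAL_LR),
    (20, "first three ResNet-101 stages + heads", INITIAL_LR / 10),
    (20, "all five ResNet-101 stages + heads", INITIAL_LR / 100),
]
TOTAL_EPOCHS = sum(n_epochs for n_epochs, _, _ in PHASES)

def phase_for_epoch(epoch):
    """Return (trainable parts, learning rate) for a zero-based epoch index."""
    if epoch < 0:
        raise ValueError("epoch must be non-negative")
    start = 0
    for n_epochs, trainable, lr in PHASES:
        if epoch < start + n_epochs:
            return trainable, lr
        start += n_epochs
    raise ValueError("epoch is beyond the 60-epoch schedule")

first_phase = phase_for_epoch(0)
last_phase = phase_for_epoch(TOTAL_EPOCHS - 1)
```

Keeping the schedule as data makes it easy to check that each phase lowers the learning rate by a factor of ten while unfreezing more of the backbone.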
where t_i is a vector containing the bounding box location and size (x coordinate, y coordinate, width, and height), while t_i^* is the vector for the ground truth bounding box. m^2 is the mask resolution used in the algorithm, here 28 by 28 pixels. k represents the kth class object in the dataset, and here k is four because fiber, fragment, pellet, and rod are to be classified. s_ij and s_ij^* are the binary values (0 or 1) in the predicted mask and ground truth mask, respectively.
As for the U-Net, the dice loss was selected to assess the segmentation of the U-Net (Sudre et al., 2017). The calculation is shown in Eq. (6), where the fraction is the Dice coefficient:

$$L = 1 - \frac{2 \times |\text{Prediction result} \cap \text{Ground truth}| + \lambda}{|\text{Prediction result}| + |\text{Ground truth}| + \lambda} \qquad (6)$$

The dice coefficient is a widely used metric in the computer vision community to calculate the similarity between two images (Jadon, 2020). The λ is a small constant to avoid the numerical issue of dividing by zero. In this case, λ was set as $10^{-7}$.
The loss curves of both Mask R–CNN and U-Net during the training and validation process are shown in Fig. 5. The green lines represent the training and validation loss of Mask R–CNN. The training loss was initially 0.5746 and then dropped sharply to 0.3068 after the first five epochs. The validation loss decreased even faster and reached 0.1507 after five epochs. As the training process continued, the training loss finally fluctuated slightly around 0.12 and the validation loss around 0.10. The U-Net loss is shown by the blue lines in Fig. 5. The training loss started from 0.5129 and decreased to 0.10 in about ten epochs. Then the training loss smoothly decreased to 0.07 at the end of training. Compared with the training loss, the validation loss experienced several fluctuations during the decreasing process. The validation loss finally ended up at around 0.05.

3.2. Localization performance

For Mask R–CNN, the algorithm could generate bounding boxes to highlight the location of the targets to be found. The accuracy of the bounding box could be used to evaluate the localization performance. However, before presenting the evaluation results of the proposed algorithm, it is necessary to define what a correct localization is. In this study, the intersection over union (IoU), a measurement based on the Jaccard Index, was used as a criterion to determine if the localization is correct (Padilla et al., 2020). The calculation process is illustrated in Fig. 6.
The area of overlap is the intersection of the ground truth bounding box and the prediction bounding box. The area of union is the total area covered by both bounding boxes. Therefore, the IoU value should fall between zero and one. Zero represents prediction results that have no overlap with ground truths. In contrast, one represents that prediction results and ground truths reach a 100% overlap. The most common metric to measure localization accuracy is the average precision of the bounding box (APbb). However, with different IoU values, the APbb can vary a lot. In order to evaluate the algorithm performance comprehensively, three different average precision values are adopted here: APbb50, APbb75, and APbb. The APbb50 indicates the average precision when IoU = 50%. This value is commonly regarded as localization accuracy under a loose localization criterion. In comparison, APbb75 represents the average precision when IoU = 75%, which adopts a stricter localization criterion than APbb50. The APbb is the mean value of ten average precision values where IoU increases from 50% to 95% with a step of 5%.
Overall, the Mask R–CNN achieved APbb50 = 99.94%, APbb75 = 99.63%, and APbb = 91.36% on the training dataset in this study. The results were much better than a similar application of Mask R–CNN on marine litter detection (Politikos et al., 2021), where the APbb50 and the APbb75 were around 76%. On the other hand, the U-Net could not provide the localization module to locate targets. Thus the localization performance could not be compared between the U-Net and Mask R–CNN.

3.3. Classification performance

For Mask R–CNN and U-Net, the precision (P), recall (R) and F1 score were used to evaluate the classification performance. These metrics were calculated as follows:

$$P = \frac{TP}{TP + FP} \qquad (7)$$
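The localization criterion and classification metrics of Sections 3.2 and 3.3 can be illustrated with a short sketch. Two assumptions apply: boxes are given in corner form (x_min, y_min, x_max, y_max) rather than the (x, y, width, height) vector used in the text, and recall and F1 follow their standard definitions, since only the precision formula appears above. All function names are illustrative.

```python
def box_iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def f1_score(precision, recall):
    """Harmonic mean of precision and recall (standard definition)."""
    total = precision + recall
    return 2.0 * precision * recall / total if total > 0 else 0.0

def precision_recall_f1(tp, fp, fn):
    """Precision as in Eq. (7); recall = TP / (TP + FN) is the standard form."""
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    return precision, recall, f1_score(precision, recall)

# A prediction covering 80% of a 10 x 10 ground-truth box.
iou = box_iou((0, 0, 10, 10), (0, 0, 10, 8))
counts_as_hit_at_50 = iou >= 0.50   # loose criterion (AP at IoU = 50%)
counts_as_hit_at_75 = iou >= 0.75   # stricter criterion (AP at IoU = 75%)

# Consistency check against the reported precision and recall.
reported_f1 = f1_score(0.9330, 0.9540)
```

With the reported precision of 93.30% and recall of 95.40%, the harmonic mean rounds to 94.34%, which agrees with the F1 score stated in the paper.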
Wegmayr et al. (2020) in an MP fiber segmentation task using the Mask R–CNN.
perior precision in locating MPs. The minor errors were not even notable to the naked eye.
Regarding the classification performance, the Mask R–CNN generated labels with classification confidences on the top of the bounding
Fig. 7. Localization, classification, and segmentation result of multiple MPs against a white background (Case 1): (a) fibers of similar sizes; (b) fragments of similar
sizes; (c) pellets of similar sizes; (d) rods of similar sizes; (e) fibers of varying sizes; (f) fragments of varying sizes; (g) pellets of varying sizes; (h) rods of varying sizes;
(i) mixed particles of varying sizes.
boxes for classification performance. All MPs were classified successfully with high classification confidences, which were shown in Fig. 7 (a)–(i). The algorithm achieved over 99% confidence in classification results. More specifically, the precision was 93.30% and recall was 95.40%, which resulted in an F1 score of 94.34%, shown in Table 3.
As for the segmentation performance, the masks in various colors were applied to segment each MP particle. Most masks were placed precisely over the MPs even though the particle boundaries were hard to predict. The notable errors appeared in Fig. 7 (e), (f), and (h), where the edge of the fiber particle, the end of the rod, and the upper part of the pellet were not precisely segmented. From Table 3, the segmentation performance metrics showed that APm = 82.6%, APm50 = 98.50% and
APm75 = 83.60%. Understandably, as the segmentation criterion increased from IoU = 50% to IoU = 75%, the average segmentation precision decreased from 98.50% to 83.60%. Still, the metrics were sufficiently superior to segment all MPs against a white background. Furthermore, Fig. 8 illustrates the detailed segmentation performance for each MP category. The fiber particles had the lowest segmentation precision (51.51%), which was attributed to the thin fiber and unpredictable shapes (L-shape, U-shape, I-shape, and S-shape). Then, the second-lowest was from rod particles, where the segmentation precision was 86.08%. The errors could also be attributed to the thin shape of the rod particles. As for fragment and pellet particles, the segmentation precisions were higher than 97%, undoubtedly deemed outstanding.
For the U-Net outputs in Case 1, the algorithm could also accurately classify and segment MPs. Fiber, fragment, pellet, and rod pixels were colored in red, green, yellow, and blue, respectively. In terms of classification performance, the U-Net could classify most particles with similar sizes, which can be seen in Fig. 7(b), (c), and Fig. 7(d). Nevertheless, when classifying fiber particles of similar sizes, the ends of several fibers were misclassified as rods. The misclassification in Fig. 7(a) can be seen in the blue part. The situation seemed to be worse when the MP particle sizes increased because more errors in classification were found in Fig. 7(e) and (f), and Fig. 7(h). More specifically, in Fig. 7
Table 3
Performance evaluation for both Mask R–CNN and U-Net in Case 1.
Localization (%) Classification (%) Segmentation (%)
(c), the localization results for MP particles on the water surface showed that Mask R–CNN could locate most of the particles except one white fiber particle. Based on the evaluation metrics in Table 4, the localization performance of the Mask R–CNN was slightly impaired compared with the excellent performance in Case 1. More specifically, the APbb dropped from 82.6% to 67.50%. Besides, both the APbb50 and APbb75
Fig. 9. Localization, classification, and segmentation result of multiple MPs against a natural background (Case 2): (a) sand background; (b) natural soil background;
(c) water background.
Table 4
Performance of both Mask R–CNN and U-Net in Case 2.
Metrics Localization (%) Classification (%) Segmentation (%)
appeared in the current study, which was likely due to (1) the generalization ability of the dataset and (2) bad image quality. These factors will be analyzed in more detail in the following paragraphs.
(1) Generalization ability of dataset: As mentioned in the dataset preparation and preprocessing section, the current dataset has been prepared in white and natural backgrounds (soil, sand, and water).
Fig. 11. The same MP particle image shot under different focal lengths.
Jadon, Shruti, 2020. A survey of loss functions for semantic segmentation. 2020 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB). https://doi.org/10.1109/CIBCB48159.2020.9277638.
Johnson, J.W., 2018. Adapting Mask-RCNN for Automatic Nucleus Segmentation. arXiv preprint arXiv:1805.00500.
Kedzierski, M., Falcou-Préfol, M., Kerros, M.E., Henry, M., Pedrotti, M.L., Bruzaud, S., 2019. A machine learning algorithm for high throughput identification of FTIR spectra: application on microplastics collected in the Mediterranean Sea. Chemosphere 234, 242–251.
Kim, B., Cho, S., 2019. Image-based concrete crack assessment using mask and region-based convolutional neural network. Struct. Control Health Monit. 26 (8), 1–15. https://doi.org/10.1002/stc.2381.
Kooi, M., Koelmans, A.A., 2019. Simplifying microplastic via continuous probability distributions for size, shape, and density. Environ. Sci. Technol. Lett. 6, 551–557.
LeCun, Y., Bengio, Y., Hinton, G., 2015. Deep learning. Nature 521 (7553), 436–444.
LeCun, Y., Boser, B., Denker, J.S., Henderson, D., Howard, R.E., Hubbard, W., Jackel, L.D., 1989. Backpropagation applied to handwritten zip code recognition. Neural Comput. 1 (4), 541–551.
Lee, Ka Shing, Chen, Hui Ling, Ng, Yong Sin, et al., 2022. U-Net skip-connection architectures for the automated counting of microplastics. Neural Comput. Appl. https://doi.org/10.1007/s00521-021-06876-w.
Lenz, R., Enders, K., Stedmon, C.A., Mackenzie, D.M., Nielsen, T.G., 2015. A critical assessment of visual identification of marine microplastic using Raman spectroscopy for analysis improvement. Mar. Pollut. Bull. 100 (1), 82–91.
Lorenzo-Navarro, J., Castrillón-Santana, M., Sánchez-Nielsen, E., Zarco, B., Herrera, A., Martínez, I., Gómez, M., 2021. Deep learning approach for automatic microplastics counting and classification. Sci. Total Environ. 765, 142728.
Lv, L., Qu, J., Yu, Z., Chen, D., Zhou, C., Hong, P., Sun, S., Li, C., 2019. A simple method for detecting and quantifying microplastics utilizing fluorescent dyes-Safranine T, fluorescein isophosphate, Nile red based on thermal expansion and contraction property. Environ. Pollut. 255, 113283.
Mason, S.A., Garneau, D., Sutton, R., Chu, Y., Ehmann, K., Barnes, J., Fink, P., Papazissimos, D., Rogers, D.L., 2016. Microplastic pollution is widely detected in US municipal wastewater treatment plant effluent. Environ. Pollut. 218, 1045–1054.
Meyers, N., Catarino, A.I., Declercq, A.M., Brenan, A., Devriese, L., Vandegehuchte, M., De Witte, B., Janssen, C., Everaert, G., 2022. Microplastic detection and identification by Nile red staining: towards a semi-automated, cost- and time-effective technique. Sci. Total Environ. 823, 153441.
Mukhanov, V.S., Litvinyuk, D.A., Sakhon, E.G., Bagaev, A.V., Veerasingam, S., Venkatachalapathy, R., 2019. A new method for analyzing microplastic particle size distribution in marine environmental samples. Ecol. Montenegrina. 23, 77–86.
Neubeck, A., Van Gool, L., 2006. Efficient non-maximum suppression. In: 18th International Conference on Pattern Recognition (ICPR’06), vol. 3. IEEE, pp. 850–855.
Nie, X., Duan, M., Ding, H., Hu, B., Wong, E.K., 2020. Attention Mask R-CNN for ship detection and segmentation from remote sensing images. IEEE Access 8, 9325–9334.
Nobre, C.R., Santana, M.F.M., Maluf, A., Cortez, F.S., Cesar, A., Pereira, C.D.S., Turra, A., 2015. Assessment of microplastic toxicity to embryonic development of the sea urchin Lytechinus variegatus (Echinodermata: Echinoidea). Mar. Pollut. Bull. 92 (1–2), 99–104.
Padilla, R., Netto, S.L., da Silva, E.A., 2020. A survey on performance metrics for object-detection algorithms. In: 2020 International Conference on Systems, Signals and Image Processing (IWSSIP). IEEE, pp. 237–242.
Pan, S.J., Yang, Q., 2010. A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 22 (10), 1345–1359.
Politikos, D.V., Fakiris, E., Davvetas, A., Klampanos, I.A., Papatheodorou, G., 2021a. Automatic detection of seafloor marine litter using towed camera images and deep learning. Mar. Pollut. Bull. 164, 111974.
Ren, S., He, K., Girshick, R., Sun, J., 2015. Faster R-CNN: towards Real-Time Object Detection with Region Proposal Networks. arXiv preprint arXiv:1506.01497.
Ronneberger, Olaf, Fischer, Philipp, Brox, Thomas, 2015. U-Net: Convolutional Networks for Biomedical Image Segmentation. International Conference on Medical Image Computing and Computer-Assisted Intervention.
Russell, S., Norvig, P., 1995. Artificial Intelligence: A Modern Approach. Prentice-Hall, Englewood Cliffs.
Serranti, S., Palmieri, R., Bonifazi, G., Cózar, A., 2018. Characterization of microplastic litter from oceans by an innovative approach based on hyperspectral imaging. Waste Manage. (Tucson, Ariz.) 76, 117–125.
Sharma, S., Chatterjee, S., 2017. Microplastic pollution, a threat to marine ecosystem and human health: a short review. Environ. Sci. Pollut. Res. 24 (27), 21530–21547.
Shim, W.J., Hong, S.H., Eo, S.E., 2017. Identification methods in microplastic analysis: a review. Anal. Methods 9 (9), 1384–1391.
Sudre, Carole H, Li, Wenqi, Vercauteren, Tom, Ourselin, Sebastien, Jorge Cardoso, M., 2017. Generalised Dice Overlap as a Deep Learning Loss Function for Highly Unbalanced Segmentations. Deep Learn Med Image Anal Multimodal Learn Clin Decis Support. https://doi.org/10.1007/978-3-319-67558-9_28.
Wang, T., Zou, X., Li, B., Yao, Y., Zang, Z., Li, Y., Yu, W., Wang, W., 2019. Preliminary study of the source apportionment and diversity of microplastics: taking floating microplastics in the South China Sea as an example. Environ. Pollut. 245, 965–974.
Wegmayr, V., Sahin, A., Saemundsson, B., Buhmann, J., 2020. Instance segmentation for the quantification of microplastic fiber images. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 2210–2217.
Wu, Z., Shen, C., Van Den Hengel, A., 2019. Wider or deeper: revisiting the ResNet model for visual recognition. Pattern Recogn. 90, 119–133.
Xu, B., Wang, W., Falzon, G., Kwan, P., Guo, L., Chen, G., Tait, A., Schneider, D., 2020. Automated cattle counting using Mask R-CNN in quadcopter vision system. Comput. Electron. Agric. 171, 105300.
Yu, Y., Zhang, K., Yang, L., Zhang, D., 2019. Fruit detection for strawberry harvesting robot in non-structural environment based on Mask-RCNN. Comput. Electron. Agric. 163, 104846.
Zhang, W., Zhang, S., Wang, J., Wang, Y., Mu, J., Wang, P., Lin, X., Ma, D., 2017. Microplastic pollution in the surface waters of the Bohai Sea, China. Environ. Pollut. 231, 541–548.
Zhao, S., Zhang, D.M., Huang, H.W., 2020. Deep learning–based image instance segmentation for moisture marks of shield tunnel lining. Tunn. Undergr. Space Technol. 95, 103156.
Zhu, Y., Yeung, C.H., Lam, E.Y., 2021a. Digital holographic imaging and classification of microplastics using deep transfer learning. Appl. Opt. 60 (4), A38–A47.
Zhu, Y., Yeung, C.H., Lam, E.Y., 2021b. Microplastic pollution monitoring with holographic classification and deep learning. J. Phys. Photon. 3 (2), 024013.