Machine Learning For Microalgae Detection and Utilization
*CORRESPONDENCE
Teng Zhou
[email protected]
SPECIALTY SECTION
This article was submitted to Marine Biotechnology and Bioproducts, a section of the journal Frontiers in Marine Science
RECEIVED 18 May 2022
ACCEPTED 29 June 2022
PUBLISHED 26 July 2022
CITATION
Ning H, Li R and Zhou T (2022) Machine learning for microalgae detection and utilization. Front. Mar. Sci. 9:947394. doi: 10.3389/fmars.2022.947394
COPYRIGHT
© 2022 Ning, Li and Zhou. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

Microalgae are essential parts of marine ecology, and they play a key role in species balance. Microalgae also have significant economic value. However, microalgae are tiny, and there are many different kinds of microalgae in a single drop of seawater. It is therefore challenging to identify microalgae species and monitor microalgae changes. Machine learning techniques have achieved massive success in object recognition and classification, and have attracted a wide range of attention. Many researchers have introduced machine learning algorithms into microalgae applications and have obtained similarly significant results. The paper summarizes recent advances based on various machine learning algorithms in microalgae applications, such as microalgae classification, bioenergy generation from microalgae, environment purification with microalgae, and microalgae growth monitoring. Finally, we discuss the prospects for machine learning algorithms in microalgae treatment in the future.

KEYWORDS
microalgae, machine learning, environment protection, biodiesel, convolutional neural network

Introduction
Microalgae in the ocean are usually single-celled organisms that play a crucial part in
marine ecology (Chew et al., 2017). Microalgae are primary organic matter producers in
the sea. Microalgae absorb carbon dioxide and convert it into organic matter, while
releasing oxygen through photosynthesis (Chakdar et al., 2021). As a result, microalgae
are crucial food sources for organisms in the ocean, and they could reduce the greenhouse
effect (Mochdia and Tamaki, 2021). In addition, microalgae have considerable social and
commercial value. Microalgae are capable of purifying sewage, because they can absorb
nitrogen and phosphorus. The high content of oil and fat in microalgae makes them an
ideal raw material for biodiesel production (Adamczak et al., 2009; Chowdhury and
Loganathan, 2019; Mofijur et al., 2019).
Microalgae species recognition and growth monitoring are crucial steps in actual
applications (Gomez-Espinoza et al., 2018). Microalgae are commonly microscopic, and
there are usually many different kinds of microalgae species in a single sample (Ferro et al.,
2018) (Figure 1). These characteristics make the identification, classification, and analysis of microalgae a very challenging task (Andersen and Kawachi, 2005). Traditional manual methods are not only time-consuming, they also require much skill and experience from the operators (Peniuk et al., 2016; Saputro et al., 2019). As a result, the efficiency and scope of microalgae applications are greatly limited. Faster and more efficient methods for the classification, identification, and analysis of microalgae are needed (Sá et al., 2013; Wei et al., 2017).
Machine learning is, in essence, a collection of data-driven algorithms (Rosenblatt, 1958; Rumelhart et al., 1986). In recent years, data resources and computing power have increased significantly. Machine learning has achieved great success and is applied widely in many fields (El Naqa and Murphy, 2015; Jordan and Mitchell, 2015; Liakos et al., 2018). In particular, machine learning has greatly facilitated the development of digital image processing and speech recognition (McCulloch and Pitts, 1990; He et al., 2015). Many researchers have introduced machine learning techniques into the field of microalgae processing to identify microalgae species and monitor the growth process of microalgae, also with outstanding results (Carleo et al., 2019).
FIGURE 1
Microscopic images of microalgae: (A) Glycophilic Chlorella or Chlorella saccharophilus; (B) Chlorella sorokiniana; (C) Chlorella vulgaris; (D) Coelastrella; (E) Desmodesmus; (F) Desmodesmus; (G) Scenedesmus obliquus; (H) Scenedesmus (reproduced with permission from Ferro et al., 2018).
This paper summarizes the state of machine learning algorithms used in microalgae treatment, with a focus on the advances made in recent years. Firstly, the article explains the basic principles of machine learning algorithms such as support vector machine, decision tree, random forest, and neural network. The development of microalgae classification, the conversion from microalgae to bioenergy, microalgae for environmental protection, and the monitoring of microalgae growth stage with machine learning algorithms are then explained in detail. Based on these summaries, we list how machine learning methods differ from traditional manual operation in microalgae treatment. This provides a useful reference for subsequent researchers and practitioners in the field.

The basic machine learning process is the analysis and learning of data with algorithms, after which judgments and predictions about actual situations are made automatically (Wei et al., 2019) (Figure 2A). A framework with many parameters is first built, and then the prepared data is fed into the model. The parameters are continuously adjusted until the outputs match or come close to the correct result (Bishop, 2013). Machine learning contains supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning, depending on the training model. Many different models can be used for machine learning training, and several representative models are described in the following (Mahesh, 2020).
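
The loop described above (build a parameterised model, feed it data, adjust the parameters until the outputs approach the correct results) can be sketched in a few lines. This is a minimal illustration rather than a method from any of the cited works: it assumes a one-variable linear model, a mean squared error loss, and plain gradient descent, and the names `fit_line`, `lr`, and `epochs` as well as the toy data are hypothetical.

```python
def fit_line(xs, ys, lr=0.05, epochs=2000):
    """Fit y ~ w*x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0  # the adjustable parameters of the model
    n = len(xs)
    for _ in range(epochs):
        # Prediction errors with the current parameters.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Move the parameters a small step against the gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (assumed values, for illustration only).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_line(xs, ys)
print(round(w, 3), round(b, 3))  # approaches the generating values 2 and 1
```

The models discussed in the following sections differ in the form of the model and in how the parameters are adjusted, but they share this overall pattern of fitting parameters to prepared data.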
FIGURE 2
(A) Flowchart for machine learning. (reproduced from an open access article). (B) Schematic diagram of SVM. (reproduced with permission from
Deka, 2014). (C) A decision tree for identification based on iris. (reproduced from an open access article).
farthest distance from the two types of data and the plane can be represented as:

$$w^{T} \cdot x + b = 0$$

w denotes the coefficient vector that determines the direction of the hyperplane, and b represents the bias term, which describes the distance between the hyperplane and the data sample set (Deka, 2014) (Figure 2B).
The hyperplanes passing through the closest positive and negative samples can then be expressed as:

$$w^{T} \cdot x + b = 1 \quad \text{and} \quad w^{T} \cdot x + b = -1$$

The hyperplanes between different sample data can then be uniformly expressed as:

$$f(x) = w^{T} \cdot x + b$$

The correctness of the sample classification is converted to the interval distance between the sample data and the hyperplane, $\gamma = \frac{2}{\|w\|}$. The coefficients w and b for the optimal hyperplane can be found by searching for the maximum value of this expression, $\max_{w,b} \frac{2}{\|w\|}$, which is equivalent to $\min_{w,b} \frac{1}{2}\|w\|^{2}$.
This is a convex quadratic programming problem, the solution of which can be obtained by introducing the Lagrange multipliers $\alpha_i$:

$$w = \sum_{i=1}^{n} \alpha_i^{*} y_i x_i, \qquad b = y_j - \sum_{i=1}^{n} \alpha_i^{*} y_i (x_i \cdot x_j)$$

$\alpha_i^{*}$ is the solution to the dual optimization problem, and the subscript j satisfies $\alpha_j^{*} > 0$.

Nonlinear model
SVM handles nonlinear data by introducing a kernel function to raise the dimensionality of the feature space (Pradhan, 2012). Suppose that the mapping $\phi(x)$ represents the feature vector of a sample after it is mapped into the higher-dimensional space; the hyperplane can then be denoted as:

$$f(x) = w^{T} \cdot \phi(x) + b$$

After introducing the Lagrange multipliers $\alpha_i$ and using the kernel function $k(x_i, x_j) = \langle \phi(x_i), \phi(x_j) \rangle$ to represent the inner product $\phi(x_i)^{T} \phi(x_j)$, the solution of the hyperplane equation can be obtained:

$$f(x) = \sum_{i=1}^{n} \alpha_i y_i k(x, x_i) + b$$

The type of the kernel function determines the distribution of the original samples in the higher-dimensional space (Widodo and Yang, 2007). The kernel function is the most significant choice for a nonlinear support vector machine framework. Much research reveals that the efficiency of the framework relies greatly on the kernel function (Meyer et al., 2003; Shahid et al., 2015). The common kernel functions are the linear kernel function, polynomial kernel function, radial basis kernel function, and sigmoid kernel function (Wang et al., 2008).

Decision tree
The decision tree algorithm is also a classification and regression method, and it belongs to supervised learning (Quinlan, 1986). An internal node in a decision tree represents an attribute, a branch is a chosen path toward the final result, and each leaf node indicates a category, such as a species in the iris example (Li et al., 2019) (Figure 2C). To construct a decision tree model, a training dataset is essential.
Decision tree learning essentially generalizes features in the training dataset and derives rules that partition the sample data into smaller subsets. Depending on how the partition is made, many different decision trees can be obtained from the same training dataset (Myles et al., 2004). A decision tree with excellent performance depends less on the particular training dataset, which means that the tree has good generalization ability. The desired tree fits the training dataset well and also predicts subsequent unknown data well. The process of decision tree construction has three steps: feature selection, decision tree generation, and decision tree pruning.

Feature selection
Feature selection refers to choosing, from the training dataset, the features by which the current dataset is divided into parts, each of which corresponds to a leaf in the decision tree (Pal and Mather, 2003). The dataset is divided recursively until the sample points can be classified into their respective categories and the complete tree is constructed. Many criteria can be used to choose features, and each choice leads to a different decision tree algorithm. The commonly employed tree construction algorithms are the ID3 algorithm, the C4.5 algorithm, and the CART algorithm. These algorithms utilize information gain, information gain rate, and Gini index, respectively, to determine the features that divide the dataset (Patel and Prajapati, 2018).
The essence of the ID3 algorithm is to choose attributes by the information gain criterion: the attribute with the maximum information gain is used to divide the dataset recursively (Wellner et al., 2017). The smaller the expected information remaining after a split, the greater the information gain and the higher the purity of the resulting subsets. To describe the information gain clearly, the entropy and conditional entropy need to be explained first.
The entropy of the random variable X is expressed as:

$$H(X) = -\sum_{i=1}^{n} p(x_i) \log p(x_i)$$

n is the number of different discrete values of X, and $p(x_i)$ is the probability that X takes the value $x_i$.
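
As a small worked illustration of the entropy formula above, the sketch below computes H(X) from a list of class labels. It is not taken from the cited works: the helper name `entropy`, the base-2 logarithm (the text leaves the base unspecified), and the toy label lists are assumptions made for the example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy H(X) = -sum p(x_i) * log2 p(x_i) of a label list."""
    n = len(labels)
    counts = Counter(labels)
    # Each distinct label contributes -p * log2(p); only observed labels
    # appear in the Counter, so p > 0 and the logarithm is always defined.
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())


# Hypothetical label lists: a pure sample set and an evenly mixed one.
pure = ["chlorella"] * 8
mixed = ["chlorella"] * 4 + ["scenedesmus"] * 4

print(entropy(pure))   # a single class: no uncertainty, entropy 0
print(entropy(mixed))  # two equally likely classes: entropy of 1 bit
```

A lower value indicates a purer sample set, which is the property that the information gain criterion rewards when it selects a splitting attribute.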
To describe the uncertainty of a random variable Y given that the variable X is known, the conditional entropy is introduced:

$$H(Y|X) = \sum_{x \in X} p(x) H(Y|X=x)$$

The ID3 algorithm evaluates the information gain of feature A on the sample set D, which is computed as:

$$\mathrm{Gain}(D,A) = H(D) - H(D|A)$$

After the information gains of all features are calculated, the feature with the largest information gain is used to divide the sample set D.
The disadvantage of information gain is that it is biased toward features that take many distinct values. The more distinct values a feature takes, the more likely the feature is to be used as a split point. In the most extreme case, every sample has a different value of the feature; the conditional entropy is then 0 and the information gain is maximized. The C4.5 algorithm was derived to remedy this defect of the ID3 algorithm.
The C4.5 algorithm utilizes the information gain rate to measure the classification ability of features (Elomaa, 1994). The information gain rate is described in the following:

$$\mathrm{GainRatio}(D,A) = \frac{\mathrm{Gain}(D,A)}{H(A)}$$

Gain(D,A) represents the information gain generated by dividing the dataset using feature A, and H(A) is the information entropy of feature A. The C4.5 algorithm selects the attribute with the maximum information gain rate as the division attribute to partition the dataset (Sharma et al., 2013; Nugraha et al., 2020).
The CART algorithm uses the Gini index, which reflects the impurity of the dataset, as the splitting criterion (Ayyagari, 2020). The smaller the Gini index is, the lower the impurity and the better the selected feature. The Gini index is defined as:

$$\mathrm{Gini}(D) = \sum_{i=1}^{n} p(x_i)\,[1 - p(x_i)] = 1 - \sum_{i=1}^{n} p^{2}(x_i)$$

Generation of decision tree
The decision tree generation process starts from the root node and generates sub-nodes recursively from top to bottom according to the chosen features until the dataset can no longer be divided (Zhou and Chen, 2002). Whichever algorithm is used to generate the decision tree, we traverse the data samples from the root node downward to search for the most influential feature in the current feature vector as the child node of the layer. Then, we continue to traverse downward, take the child node just obtained as the new parent node, and keep recursing until the traversal stops at a leaf node (Swain and Hauska, 1977).

Pruning of decision tree
Decision trees are prone to overfitting and often require pruning, which reduces the size of the tree by actively removing some branches and thereby lowers the risk of overfitting. Pruning is one of the methods used to stop the branching of a decision tree. There are two pruning approaches: pre-pruning and post-pruning. Pre-pruning sets a metric during tree growth and stops growing when that metric is reached. During post-pruning, the tree first grows fully until all leaf nodes reach minimum impurity, and branches are removed afterwards. Post-pruning is often more computationally costly than pre-pruning, particularly on very large datasets, but it is still superior to pre-pruning on small datasets (Friedl and Brodley, 1997).

Random forest
Random forest is an ensemble learning approach, and the decision tree is the primary component unit of the random forest algorithm (Breiman, 2001). Since a single decision tree suffers from low accuracy and overfitting, random forest overcomes these limitations by bringing numerous decision trees together. Compared to the decision tree algorithm, the random forest algorithm has better classification and regression performance. Compared with other machine learning algorithms such as SVM and deep learning algorithms, for example the convolutional neural network, the random forest algorithm often has quicker prediction speed and, in some tasks, superior accuracy with relatively lower computing power.
The random forest algorithm is a combination of the Bagging algorithm and the decision tree algorithm, which commonly utilizes a decision tree as the base learner for classification training. Finally, it classifies samples with unknown outcomes by a voting method (Belgiu and Drăguţ, 2016) (Figure 3).

FIGURE 3
(A) Training steps of random forest. (B) Classification application of random forest. (reproduced with permission from Belgiu and Drăguţ, 2016)

Sample to generate train sets
Each tree in a random forest is different, and different datasets are needed to generate the different trees. Therefore, different sub-datasets are extracted from the original dataset and used to train different decision trees (Paul et al., 2018). A standard method for extracting subsets of data is the Bagging method, which makes the trees largely independent of one another and thus reduces the risk of overfitting. The Bagging algorithm is a typical parallelized ensemble learning method with no strong dependencies between the individual learners.
The Bagging method randomly draws a sample from the original dataset and then puts the extracted sample back into the original dataset before the next random draw. Thus, the dataset would be divided into many training subsets, which are used to construct different decision trees (Shi and Horvath, 2006).

Construct of decision trees
Once the training set for each tree is determined, the decision trees are constructed (Qi, 2012). During the construction of each decision tree for the random forest, some features from the feature set of the sub-dataset are randomly selected as candidates for each node split, and no pruning is done for the generated decision trees. The detailed decision tree construction process can be found in the Decision tree section (section 2.2).

Result confirm by vote
The sampling and construction steps above (sections 2.3.1 and 2.3.2) are repeated until the number of trees reaches the required quantity. In this way, many different decision trees are built, and these trees are combined (Bonissone et al., 2010; Boulesteix et al., 2012). The classification result of each tree is voted on according to specific rules. The final classification result of the random forest algorithm is the result that gains the most votes. In the final vote, there are generally three methods: absolute majority vote, relative majority vote, and weighted vote.
Under the absolute majority vote, an option is chosen as the predicted outcome only if it receives more than half of all votes (Farnaaz and Jabbar, 2016). Under the relative majority vote, the result with the most votes is selected as the predicted outcome, and if more than one result ties for the most votes, the final result is chosen randomly among them (Speiser et al., 2019). The weighted vote method assigns a weight to every result, which is equivalent to a weighted averaging process. The classification result of each decision tree is multiplied by its weight, and the weighted votes for each category are added. The category with the maximum total is taken as the final result (Rodriguez-Galiano et al., 2012).

Neural network
The neural network technique is one of the machine learning methods, and it is good at dealing with non-linear data. A neural network is a simulation of the nervous system in the human brain, and the basic building blocks are neurons. Different arrangements of neurons divide neural networks into many types, for example, the convolutional neural network, which is ideal for processing image and waveform data (McCulloch and Pitts, 1943).

Neuron
Neurons are the essential components of various neural networks and are mathematical models of biological neurons (Mohammed et al., 1995; Bakirtzis et al.,
1996). By feeding training data into the neuron, a corresponding output can be obtained through some mathematical calculations in the neuron. A neuron is also called a perceptron.
A neuron has many inputs, but only one output (Sagheer et al., 2019) (Figure 4A). $x_1, x_2, \ldots, x_n$ are the input data of a neuron, and $w_1, w_2, \ldots, w_n$ are the weights for the input data; b is the bias, and f() denotes the activation function. The inputs are multiplied by the weights and summed. Then the bias is added and the sum is passed to the activation function for processing (Tian and Noore, 2004; Cheng et al., 2015). The result of the activation function is the output of the perceptron. The whole process can be represented with the following equation:

$$y = f\left(\sum_{i=1}^{n} w_i x_i + b\right)$$

If $x = (x_1, x_2, \ldots, x_n)^{T}$ and $w = (w_1, w_2, \ldots, w_n)^{T}$, the above equation can be written as $y = f(w^{T} x + b)$.
Common activation functions include $f(x) = \frac{1}{1 + e^{-x}}$, $f(x) = \max(0, x)$, $f(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$, etc.

Artificial neural network
A single neuron has a simple structure and can only deal with linear problems, whereas neural networks are generally used to handle non-linear problems. Neural networks can solve complex non-linear input-output applications. The network is composed of numerous simple neurons. In fact, a neural network is a combination of a large number of neurons according to specific rules. Such a network is called an artificial neural network (ANN) (Abiodun et al., 2018) (Figure 4B).
The typical ANN has three layers, namely, the input layer, the hidden layer, and the output layer, according to their positions at the left, middle, and right of the network. The input layer receives input data, the hidden layer is invisible to the outside world and calculates object features, and the output layer gives the final result. Each neuron in layer N is connected with all neurons in layer N-1, which is why such a network is also called a fully connected neural network (Hsu et al., 1990).

Deep neural network
The more hidden layers an ANN has, the more powerful its analysis ability. If a neural network contains more than two hidden layers, it is called a deep neural network (DNN) (Montavon et al., 2018) (Figure 4C). In principle, a neural network that includes just one hidden layer can fit any required mapping, but the hidden layer needs a large number of neurons. A deep network performs the same role with fewer neurons.
DNN determines the relevance of features better through the mapping relations among the input and output data. Any data from the input layer is sent to every neuron in the hidden layer. When the size of the input data is too large, it is easy to over-fit the model with too many parameters. When the input data is unduly small, it is difficult for the model to learn helpful information from the limited data, resulting in underfitting (Cichy and Kaiser, 2019).

Convolutional neural network
Traditional neural network representations are constructed with one-dimensional vectors, which miss the spatial information of the objects. Researchers have devised the convolutional neural network (CNN) by introducing convolutional and pooling operations (LeCun and Bengio, 1995). The CNN can capture the local spatial semantic characteristics of an image, and pooling operations are able to extend the receptive field to obtain more advanced image features for object recognition. Convolutional neural networks are composed of a cascade of convolutional layers with a local receptive field and pooling layers with a down-sampling effect. The CNN is able to extract hierarchical, multi-scale features from images. The main application area of the convolutional neural network is image recognition, but it can also be used in video analysis and natural language processing (Albawi et al., 2017).
The convolution layer implements feature detection, extracts crucial information from the input data, and adds non-linear factors to the feature information through the activation function. Convolution is a regional operation in which local information of an image is acquired by applying a convolution kernel of a specific size to the image (Sarıgül et al., 2019) (Figure 4D). In the convolution layer of a CNN, the convolutional kernel extracts local features by sliding over the image matrix. The process is called the convolution operation. The operation can be described with the following formula:

$$y(i,j) = \sum_{u=0}^{M-1} \sum_{v=0}^{N-1} W(u,v)\, X(i-u, j-v) + b$$

M is the width of the convolution kernel, N is the height of the convolution kernel, W is the weight of the convolution kernel, X is the input data, and b is the bias.
The pooling layer follows the convolutional layer to further extract meaningful information from the image, reducing the parameters in the network and the amount of computation by reducing the spatial size. The pooling layer also reduces the overfitting of the model and improves the fault tolerance of the model (Kuo, 2016). There are two main types of pooling, maximum pooling and mean pooling. Maximum pooling is a typical pooling operation that reduces the amount of data by keeping the maximum value of each pooling field. Mean pooling, on the other hand, takes the average value of a pooling field as the pooling value for that region.
In a CNN, one or more fully connected layers are connected to pooling layers after multiple convolutional layers. Among the layers, each neuron in layer N connects to all neurons in layer N-1, but not to neurons in the same layer. The fully connected layers play two roles in the overall convolutional neural network. Firstly, they classify the features based on the different details extracted by the convolutional layers. Secondly, they reduce the impact of feature position shifts on the classification. The fully connected layer acts as a classifier (Acharya et al., 2017; Gu et al., 2018).

FIGURE 4
(A) Structure of a perceptron. (reproduced from an open access article). (B) Schematic diagram of an artificial neural network. (reproduced from an open access article). (C) Data forward propagation and error back propagation of a deep neural network. (reproduced from an open access article). (D) Diagram of convolution operation. (reproduced with permission from Sarıgül et al., 2019).

Microalgae detection and classification with machine learning
As unicellular organisms, microalgae are not only microscopic, but also differ little from one species to another. Combined with the fact that thousands of species of microalgae may be present in a tiny sample, microalgae classification is a very challenging job. Traditional manual classification under a microscope is not only laborious, but also requires a high level of skill and experience from the operators. Therefore, the manual classification of microalgae is usually inefficient and unsatisfactory in terms of accuracy (Barsanti et al., 2021).
Machine learning algorithms based on data-driven models are very advantageous in dealing with different types of unstructured data (Rani et al., 2021). Much progress has been made in introducing machine learning algorithms into microalgae detection and classification work. The scheme allows computers to automatically learn the characteristics of different algae based on existing data and give classification results for new data. The data processed by machine learning algorithms are microalgae images obtained through microscopy, so label-free and non-invasive data acquisition can be achieved. This avoids the tedious traditional staining and labeling steps and the damage to the microalgae growth environment (Zheng et al., 2021).
A label-free analysis model was devised by Claire Lifan Chen et al. to perform a rapid classification of microalgae (Chen et al., 2016). High-throughput imaging technology allowed the acquisition of 100,000 images of microalgae per second, while capturing rich information about microalgae and summarizing it into 16 features. They employed multiple machine learning algorithms to classify the unlabeled data. Practical experiments showed that the method was 17% more accurate than the traditional method and was well suited for high-throughput label-free microalgae classification. Çağatay Işıl et al. devised a novel portable cytometer to conduct label-free identification and analysis of microalgae (Işıl et al., 2021b). The device could analyze chemical perturbation in the external environment based on spectral features in microalgae images and classify microalgae based on deep learning technologies. In addition, the device could count the number of microalgae and analyze the interactions between them. The group also utilized a convolutional neural network to analyze the label-free spatial and spectral characteristics of microalgae in order to determine the composition and growth status of microalgae (Işıl et al., 2021a). The ultimate goal was to confirm the interactions between microalgae and the response of microalgae to external contamination. The effect of the method was demonstrated with single-species and mixed cultures of microalgae in copper-containing solutions. Thanks to the label-free
technique, the tested microalgae samples could be directly put back into the original solution without contamination.
Iago Corrêa et al. utilized a convolutional neural network with five convolutional layers and three pooling layers, eight layers in total, to classify microalgae (Correa et al., 2017). The dataset consisted of microalgae images and labels. The microalgae images were obtained by the team from South Atlantic seawater using FlowCAM equipment, and the labels were assigned manually by multiple experts. The input data of the model could be low-resolution raw microalgae images, without prior feature extraction. The fully automated classification accuracy of the model, without image preprocessing or human intervention, reached 88.59%. The performance could be further improved if a data augmentation technique was added. D.P. Yadav et al. improved the traditional ResNeXt convolutional neural network for the recognition and classification of microalgae (Yadav et al., 2020). The dataset was sourced from the Physiology Research Center and the Internet, and the initial dataset contained only 100 images. After data augmentation, the dataset was expanded to 80,000 images, 80% employed for model training and the remaining 20% for model validation. The scheme was experimentally validated to achieve 99.97% classification accuracy. P. Otálora et al. presented two frameworks with neural networks to classify microalgae (Otálora et al., 2021). The first framework handled microalgae data from the FlowCAM device, which provided 30 features of microalgae. The artificial neural network included 30 neurons in the input layer and 25 neurons in the hidden layer, and the final output was two types. The second framework processed microalgae images with a convolutional neural network of 25 layers. The model could achieve 96% accuracy in training and 93.5% accuracy even in actual tests.
Mesut Ersin Sonmez et al. used convolutional neural networks with multiple structures and a coupled support vector machine algorithm to classify microalgae, all of which yielded excellent results (Sonmez et al., 2022). All microalgae images were obtained from an inverted microscope, with only 20 images of each microalga in the initial dataset. To ensure the final result, the dataset was extended by applying data augmentation. The final recognition accuracy of the convolutional neural network based on the AlexNet structure with various modifications was as high as 99.66%. In addition, the microalgae features identified by the convolutional neural network were utilized as input parameters for the support vector machine algorithm to improve its recognition accuracy to the same level as the convolutional neural network. Jeffrey Harmon et al. conducted a classification study of spherical microalgae based on a support vector machine approach (Harmon et al.,
accuracy of the method reached 99.8% in experiments, which was higher than the convolutional neural network algorithm in some specific cases. An additional advantage of the model was that it provided morphological information on the microalgal populations. Anaahat Dhindsa et al. designed a new scheme to classify microalgae (Dhindsa et al., 2021). The microalgae images were segmented, 25 features were extracted by a generalized segmentation algorithm, and then various machine learning algorithms were applied for classification. The classification accuracy increased from 96.1% to 98.2% after the modification of the support vector machine algorithm. The authors mentioned that the introduction of transfer learning into the classification process was expected to improve the accuracy in the future. Zhanpeng Xu et al. introduced a spectral imager that allowed classification and growth cycle analysis of microalgae (Xu et al., 2020). The device acquired spectral images of microalgae and then analyzed them with a support vector machine algorithm. The final classification accuracy was 94.4%. Based on the random forest algorithm, the growth of microalgae could be predicted from the above data, and the accuracy could reach 98.1%. The accuracy and effectiveness of the model were confirmed by the identification of a mixture of microalgae. Paulo Drews-Jr et al. applied semi-supervised learning in their work on microalgae classification (Drews et al., 2013). The dataset was microalgae data obtained through the FlowCAM device in the Atlantic Ocean. Experiments confirmed that the method could get better results than SVM, and the final recognition accuracy could reach 92% if the active learning algorithm was added. The performance of the method could be further enhanced by improving the dataset and optimizing the image segmentation algorithm.
Sansoen Promdaen et al. performed in-depth research on the classification of microalgae with unclear boundaries and blurred textures (Promdaen et al., 2014) (Figure 5A). To deal with the issue of vague boundaries, the authors utilized a method of microalgae segmentation based on the image background. To handle the situation of blurred textures, the authors proposed a new texture description method. The dataset of 720 images had multiple sources, including universities, waterworks authorities, networks, etc. The accuracy of the method reached 97.22% in the experiment. Hui Huang et al. employed multiple machine learning algorithms to classify microalgae and microplastics in seawater (Huang et al., 2021). The data processed by the various algorithms were the image data acquired by spectral microscopy. The image stitching technique was introduced to expand the imaging range of images. The effectiveness of each algorithm was verified by testing in a real-world environment. Jhony-Heriberto Giraldo
2020). They first utilized fluorescence imaging technology to −Zuluaga et al. utilized a digital microscope to take images of the
obtain trichromatic images of microalgae to quantify the microalgae and obtained the microalgae species through the
morphological characteristics of microalgae. Then, the image process (Giraldo-Zuluaga et al., 2016). Images were
morphological features of microalgae were analyzed and finally characterized by statistical features, which were derived from
classified based on the support vector machine algorithm. The the calculation and analysis of texture features. The dataset used
FIGURE 5
(A) Images of twelve species of microalgae. (reproduced with permission from Promdaen et al., 2014). (B) Procedure of microalgae classification and
recognition based on machine learning. (reproduced with permission from Reimann et al., 2020). (C) Procedure of microalgae classification and
recognition based on machine learning. (reproduced with permission from Wang et al., 2021).
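Several of the classification studies in this section enlarge a small image set by augmentation and then hold out part of it for validation, for example 100 images expanded to 80,000 with an 80%/20% split. The bookkeeping behind such a workflow can be sketched as follows. The flip and rotation transforms and the function names are illustrative assumptions, not the augmentation pipeline of any cited study.

```python
import random

def augment(image):
    """Return the image plus three simple variants (assumed transforms)."""
    flipped_lr = [row[::-1] for row in image]            # horizontal flip
    flipped_ud = image[::-1]                             # vertical flip
    rotated = [list(r) for r in zip(*image[::-1])]       # 90 degree clockwise rotation
    return [image, flipped_lr, flipped_ud, rotated]

def train_validation_split(samples, train_fraction=0.8, seed=0):
    """Shuffle the samples reproducibly and cut them into two parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]

# A tiny 2x2 "image" stands in for a micrograph (hypothetical data).
image = [[1, 2], [3, 4]]
variants = augment(image)

# Indices stand in for the 80,000 augmented images mentioned in the text.
train, val = train_validation_split(range(80_000))
```

One design point worth keeping in mind: if the split is made after augmentation, variants of the same original image can land in both sets, which tends to inflate validation accuracy. Splitting the original images first and augmenting each part separately avoids this.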
for training was obtained by processing the original images acquired by the digital microscope. The experiments showed that the support vector machine algorithm performed better than the artificial neural network algorithm, reaching 98.63%. Zepeng Zhuo et al. constructed a dataset containing 35 species of microalgae specifically for microalgal classification (Zhuo et al., 2022). The dataset consisted of polarized light scattering data of microalgae. They investigated the performance of many machine learning algorithms on the dataset, and the final result showed that the non-linear support vector machine algorithm achieved the best performance of 80%. The research work had significant implications for the search for better light polarization.

A model for automatic classification of live and dead cells in Chlorella was proposed by Ronny Reimann et al. (Reimann et al., 2020) (Figure 5B). Microalgae images were acquired by fluorescence microscopy, and features were extracted. Multiple machine learning algorithms were used to classify live and dead cells of microalgae, and the random forest algorithm gave the best result with a precision of 96.6%. The model could classify not only individual microalgae, but also the whole microalgae population in terms of live and dead cells with an accuracy of 82%. The dead microalgal cells were significantly larger in diameter and area than the live cells. Yanyan Wang et al. used machine learning algorithms to classify live and dead microalgae in the ocean as well (Wang et al., 2021) (Figure 5C). The microalgae features used for machine learning algorithm analysis were all acquired through digital holographic microscopy, which eliminated the need for tedious staining and labeling of the microalgae. The method also ensured that there was no impact on the growth environment of the microalgae. The framework achieved an accuracy of 94.8% in the laboratory and reached the same accuracy as the conventional staining method even when validated in practice. B.M. Franco et al. classified a variety of microalgae simultaneously based on an artificial neural network (Franco et al., 2019). The input data for the model were spectral features of microalgae, and the model was trained using 550 samples. In the experiment, the model achieved 98% accuracy in the identification of single microalgae. Even for mixtures of multiple microalgae, the model could identify the species of microalgae and analyze the proportion of the total.

The YOLOv3 network was applied to the detection of microalgae by Jungsu Park et al. (Park et al., 2021). A dataset of 1114 microalgae images collected using microscopy was assembled. Depending on the quantity of extracted microalgae attributes, the dataset was divided into four parts. After the YOLOv3 network was trained on these four datasets, the measured recognition accuracy reached more than 80%. The result demonstrated the effectiveness of the approach in recognizing microalgae. Further research by the group showed that the accuracy of recognition could be further improved by replacing the images in the dataset and recognizing objects with
color ones. An improved framework based on the YOLOv3 network was proposed by Mengying Cao et al. to identify microalgae (Cao et al., 2021). Features were extracted by the MobileNet network, and the elements could be fused in later operations of this model. The dataset was generated manually by the team with a camera and contained a total of 10,000 images after data augmentation. The experiments showed that the correctness of the model for microalgae identification improved by 8.59% over the original model, reaching 98.90%. Daniele Gaetano Sirico et al. reported a novel scheme to detect the movement of microalgae in 3D space (Sirico et al., 2022). It was often challenging to obtain complete data on microalgae movement with a mechanical scanning microscope, so a digital holographic microscope was employed in the framework to track the trajectory of microalgae movement. Computer software and digital image processing algorithms synthesized 3D images of microalgae movements and completed the tracking of their trajectories. Finally, the model visualized the activity of the microalgae.

Many machine learning algorithms have been widely used in the detection and classification of microalgae, such as support vector machine, random forest, and neural network (Table 1). In particular, deep learning technology represented by the convolutional neural network is the most widely utilized. The datasets applied in deep learning are mostly acquired with the FlowCAM device. The YOLOv3 network is a deep learning model that is widely used in microalgae detection because of its strong recognition performance on small targets.

Conversion from microalgae to energy

Fossil fuels such as oil have insurmountable problems: they are non-renewable and cause environmental pollution (Brennan and Owende, 2010). Renewable biofuels are an excellent solution to these problems. Microalgae can be used to make biofuels, and the technology has long been used in practice. However, the conversion of microalgae to biofuels has faced issues such as the complexity of the microalgae culture and the uncertainty of the conversion (Enamala et al., 2018; Aghbashlo et al., 2021). Machine learning algorithms can play an essential role in microalgae culture and conversion to biofuels to solve the above problems (Georgianna and Mayfield, 2012). By analysing the existing data, machine learning algorithms can estimate the optimal environmental and light conditions in the microalgae culture process, predict the biofuel output rate, verify the quality of biofuels, etc. Machine learning algorithms can make the conversion of energy more efficient and the quality of biofuels more assured (Rock et al., 2021; Wang et al., 2022).

Unlike previous methods that analysed the lipid content of microalgae populations, Baoshan Guo et al. analysed the lipid content of individual microalgae by combining optofluidic microscopy images with machine learning (Guo et al., 2017). The method obtained analytical results in a non-invasive way, without destroying the microalgal structure. The authors demonstrated the effectiveness of the approach through practical experiments with slender-eyed worms and E. coli. They predicted that better results could be achieved if deep learning or unsupervised learning techniques were introduced. Ahmet Coşgun et al. used a machine learning approach to explore the optimal growth conditions and lipid production factors for microalgae to generate biofuels (Coşgun et al., 2021) (Figure 6). The dataset and potential influence factors were derived from a summary of 102 scientific studies. Through decision tree analysis, they found 11 combinations of influence conditions for high microalgal production and 13 combinations of influence factors that could lead to increased lipid content. Rakesh Chandra Joshi et al. reported a new way to estimate the oil content of microalgae with a machine learning method (Joshi et al., 2021). They first obtained images of the microalgae through microscopy. Then, the oil-containing particles in the microalgae images were segmented and analysed for lipid content. A comparison between the traditional method and the model confirmed that the model significantly reduced the computation time and that its predictions were more accurate. Ehecatl Antonio del Rio Chanona et al. devised a novel
TABLE 1 Machine learning algorithms and models used in microalgae classification and detection.
Support Vector Machine | Best Hyper Parameters: gamma: 92.0, C: 4.3 | Remove extreme values in each attribute | Data balance | More computation | Dhindsa et al., 2021
Random Forest | Statistical model | Voting for the last result | Prevent overfitting | Poor effect on data with few features | Xu et al., 2020
Neural Network | Fully connected feed-forward neural network (3 layers) | Data were normalized | No need for much time or chemical analysis | Less accurate for mixed microalgae | Franco et al., 2019
Deep Learning | YOLOv3 network | A lightweight network as the backbone network | Reduce the position error when detecting small objects | Dataset is inadequate | Cao et al., 2021
FIGURE 6
The flowchart of production from microalgae to biodiesel. (reproduced with permission from Coşgun et al., 2021).
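Table 1 reports gamma = 92.0 and C = 4.3 as the best hyperparameters for the support vector machine. Assuming these refer to a radial basis function (RBF) kernel, which the source does not state, the sketch below shows what gamma controls: the kernel value k(x, y) = exp(-gamma * ||x - y||^2) falls off faster with distance as gamma grows. The feature vectors are hypothetical.

```python
import math

def rbf_kernel(x, y, gamma):
    """RBF kernel value exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-gamma * squared_distance)

# Hypothetical normalized feature vectors of two samples.
a = [0.20, 0.50]
b = [0.30, 0.50]   # Euclidean distance 0.1 from a

same_point = rbf_kernel(a, a, gamma=92.0)       # identical samples
near_high_gamma = rbf_kernel(a, b, gamma=92.0)  # similarity drops quickly
near_low_gamma = rbf_kernel(a, b, gamma=1.0)    # similarity stays near 1
```

With a gamma this large, only samples that are very close in feature space influence each other, so the decision boundary can follow fine detail in the training data. The parameter C, not shown here, sets how strongly misclassified training samples are penalized.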
framework combined with deep learning technology to investigate optimal conditions for microalgae growth and the conversion from microalgae to biofuels (del Rio-Chanona et al., 2019). The behavior of the underlying organisms was studied by coupling hydrodynamic and biodynamic techniques, and the dataset for deep learning was constructed. The framework reduced the calculation time from months to days and predicted more appropriate light conditions for microalgae growth and the configuration requirements for the conversion to biofuels. In order to obtain low-cost bio-oil, Bin Long et al. applied machine learning algorithms to the cultivation process of microalgae, hoping to get cheaper microalgae (Long et al., 2022). Factors such as algal density and light condition were thoroughly analysed to provide optimal conditions for the growth of microalgae. A better culture environment and minimal light shading were also considered. The results of the study were equally applicable to the calculation of conditions for the growth of microalgae in large-scale cultures of algae plants in industry and other types of installations.

Due to the high price of biodiesel produced from microalgae, some people mixed cheap cooking oil such as canola oil into the biodiesel. Mahdi Rashvand et al. introduced SVM and ANN algorithms in biodiesel quality identification (Rashvand et al., 2019). The SVM algorithm was used to analyse the biodiesel phase shift coefficient and voltage coefficient obtained through the capacitive sensor. In contrast, the ANN was used to analyse the image characteristics of the biodiesel. The experimental data showed that the combination of the two methods led to optimal identification results. Hossein Moayedi et al. compared several machine learning algorithms that could evaluate the purity of biodiesel obtained from microalgae conversion (Moayedi et al., 2020). The training sample data were obtained from existing biodiesel research results, and eight factors including reaction temperature and catalyst type were used as input parameters for the analysis. The alternating model tree algorithm performed best on the available metrics. A numerical model containing an ANN, used to evaluate the combustion and emission behaviour of biodiesel produced from microalgae, was designed by Satishchandra Salam et al. (Salam and Verma, 2019). The ANN was trained on data obtained from a software package called Diesel-RK. The model accurately predicted the combustion and emission factors of the internal combustion engine under different response conditions. The redundancy of some system parameters indicated that the model had the potential for further optimization. Hao Chen et al. researched the viscosity of microalgae slurry used in the biofuel manufacturing process with an ANN (Chen et al., 2021). The dataset was derived from 1691 experimental data points, and the considered parameters included temperature, microalgal mass fraction, shear rate, etc. Experiments demonstrated that this method predicted better than the already widely used curve-fitting method. Abhijeet Pathy et al. predicted the yield and composition of biochar produced from microalgae based on machine learning algorithms (Pathy et al., 2020). After the model was trained on the training data, it was further refined by comparing experimental data on 13 different parameter combinations. The analysis results revealed that temperature played a dominant role in the final yield of biochar. Fangwei Cheng et al. assessed the energy productivity and carbon capture capacity of microalgae through hydrothermal reaction based on machine learning algorithms (Cheng et al., 2020). The dataset contained 800 items, all extracted from the existing literature. Numerous experiments confirmed that the random forest algorithm was better at the task than the multiple linear regression algorithm and the regression tree model. Hydrothermal reaction methods typically had higher energy
production efficiency and carbon capture capacity than conventional methods.

Jie Li et al. introduced machine learning to the hydrothermal liquefaction process that converts microalgae into bio-oil, in order to produce high-quality, low-nitrogen bio-oil (Li et al., 2021). Experiments confirmed that the random forest algorithm was the optimal choice for this multi-task prediction process. Both predicted results and experimental data showed that the lipid content in microalgae and temperature had the most significant effect on oil production. The nitrogen content of microalgae and temperature played a decisive role in the nitrogen content of the final bio-oil. Weijin Zhang et al. researched optimal conditions for producing bio-fuel from different types of microalgae with machine learning methods in the hydrothermal liquefaction process (Zhang et al., 2021). The input variables included the composition of the microalgae and the primary conditions of hydrothermal liquefaction. After several validations, it was finally shown that the gradient boosting regression algorithm was better than the random forest algorithm for both single-task and multi-task estimation. So far, the whole process has a lot of potential for improvement. The adaptive neuro-fuzzy inference system is an inference scheme that organically connects fuzzy logic and neural networks. The system employs an integrated algorithm containing the back propagation technique and the least squares method to adjust the model parameters. The algorithm was introduced into the conversion of microalgae to biodiesel by Momir Milić et al. The training data were obtained from experimental data in the published literature (Milić et al., 2021). The method was a fundamental guide for the conversion of microalgae to biodiesel. Sashi Sonkar et al. used machine learning algorithms to study the drying of microalgae pulp for biodiesel production (Sonkar and Mallick, 2020). A regularized logistic regression algorithm was trained on the collected experimental data to build a binary classification model. Factors such as blower speed and temperature were used as input parameters to evaluate the residual water content and lipid yield of the microalgal slurry. The predictions obtained by the method fundamentally improved the drying speed of microalgae pulp. Nahid Sultana et al. proposed a new model with an ANN and a support vector regression algorithm to predict biodiesel produced from microalgae (Sultana et al., 2022). Parameters such as catalyst dosage, reaction time, and reaction temperature were used as inputs, and the hyperparameters were automatically adjusted by combining the Bayesian algorithm with the ANN. The model was validated using a large amount of published data, which proved its effectiveness. These numerical simulations used to estimate the yield of biodiesel from microalgae were not only more accurate but also more time and cost efficient.

In addition, traditional machine learning algorithms are often applied in the transformation process from microalgae to bioenergy, while deep learning is rarely applied (Table 2). An important reason for this phenomenon is that there are fewer datasets available for deep learning.

Important role of microalgae in environment protection

Microalgae play an important role in environment protection and pollution prevention. Their absorption of carbon dioxide through photosynthesis can mitigate the global greenhouse effect (Sundui et al., 2021). In addition, they have an irreplaceable role in wastewater treatment. The addition of
TABLE 2 Machine learning algorithms and models used in biofuel generation from microalgae.
Support Vector Machine | Using the general search algorithm to create the final model | Based on statistical theory | Multi algorithms such as Linear, Quadratic, Cubic and Gaussian | Higher training cost | Rashvand et al., 2019
Decision Tree | “fitctree” function and CART algorithm | Minimize the validation error | Classify any new data correctly | Extra training to get generalization ability | Coşgun et al., 2021
Random Forest | Binary splitting | Results using predictions derived from multiple decision trees | Better fitting | Complicated to interpret | Cheng et al., 2020
Gradient boosting regression | Gradient boost strategy | Ensemble learning algorithm | Good compatibility for unbalanced datasets | Complicated operations | Zhang et al., 2021
Neural Network | 4 neurons in input layer, 18 in hidden layer, 1 in output layer | Hidden layer neuron numbers can be varied during training | Better representation ability | Higher training cost | Chen et al., 2021
Deep Learning | A CNN consisting of two hidden layers | Capable of tolerating noise and uncertainty | Prevent overfitting | Neurons increasing but accuracy not increasing | del Rio-Chanona et al., 2019
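Table 2 lists a neural network with 4 neurons in the input layer, 18 in the hidden layer, and 1 in the output layer (Chen et al., 2021). As a rough sketch of what a 4-18-1 feed-forward network of that shape computes, the example below runs one forward pass in plain Python. The tanh activation, the random untrained weights, and the sample input are assumptions made for illustration; the source does not specify them.

```python
import math
import random

def init_layer(n_in, n_out, rng):
    """Create a weight matrix (n_out rows of n_in values) and zero biases."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def dense(inputs, weights, biases, activation):
    """Apply one fully connected layer followed by an activation."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def forward(x, hidden, output):
    """Forward pass: tanh hidden layer (assumed), linear output for regression."""
    h = dense(x, *hidden, activation=math.tanh)
    return dense(h, *output, activation=lambda z: z)[0]

def parameter_count(*layers):
    """Count all weights and biases in the given layers."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in layers)

rng = random.Random(0)
hidden = init_layer(4, 18, rng)
output = init_layer(18, 1, rng)

# Hypothetical normalized inputs; the text names temperature, microalgal mass
# fraction, and shear rate among the parameters, the fourth slot is a placeholder.
sample = [0.3, 0.1, 0.5, 0.2]
prediction = forward(sample, hidden, output)
n_params = parameter_count(hidden, output)
```

The network has 4*18 + 18 weights and biases in the hidden layer and 18 + 1 in the output layer, 109 trainable parameters in total, which is small relative to a dataset of more than a thousand measurements.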
machine learning algorithms makes the role of microalgae even more apparent (Cruz et al., 2021) (Figure 7A).

Microalgae could be used to treat E. coli in wastewater, and M. Žitnik et al. applied a machine learning algorithm to search for the conditions that worked best for treatment (Žitnik et al., 2019). Parameters such as microalgae concentration, E. coli concentration, pH, and conductivity were analysed by the decision tree algorithm. The results showed that conductivity had the most important effect on the treatment of E. coli. Based on the results, targeted optimization of the wastewater treatment system could be carried out. Vishal Singh et al. also used machine learning methods to research ways to increase microalgal production and enhance the ability of microalgae to treat wastewater (Singh and Mishra, 2022). The dataset was derived from publicly available results from recent years and was analysed with the decision tree algorithm for parameters such as temperature, CO2 content, and pH value. The authors gave different combinations of parameters for improving microalgal production. In experimental validation, the approach treated wastewater with high nitrogen and high phosphorus content well. They also applied the decision tree algorithm to the prediction of microalgae growth conditions and wastewater treatment conditions (Singh and Mishra, 2021) (Figure 7B). Parameters that were less involved in other algorithms, such as initial inoculum, reactor type, and nutrient concentration in the wastewater, were fully considered in this method. Different combinations of parameters suitable for high yield, high phosphorus removal performance, and high nitrogen removal capability were calculated by this method. The method provided solid theoretical support for the large-scale treatment of wastewater. S. M. Zakir Hossain et al. provided an in-depth analysis of the ability of microalgae to treat municipal sewage (Hossain et al., 2022a). They aimed to use microalgae to remove both nitrogen and phosphorus from sewage. The impact of factors such as temperature and light and dark cycles on the final results was well demonstrated. The final results revealed that the support vector regression algorithm predicted more accurate and efficient results. They also combined the support vector regression algorithm with the crow search approach for single and multi-objective optimization to further improve the effect of microalgae in removing nitrogen and phosphorus from wastewater (Hossain et al., 2022b). Experimental data confirmed the best treatment of wastewater by microalgae at a temperature of 29.3 degrees Celsius, 24 hours of uninterrupted light, and a nitrogen to phosphorus ratio of 6:1.

Muzhen Xu et al. investigated the treatment of heavy metals in wastewater by microalgae based on artificial intelligence technology (Xu et al., 2021a). They used microscopy to take images of individual microalgae and analysed their behaviour to determine their removal effect on heavy metals. The effect
FIGURE 7
(A) Block diagram of Shellfish contamination prediction based on machine learning. (reproduced with permission from an open access article).
(B) Process diagram of wastewater treatment with microalgae. (reproduced with permission from Singh and Mishra, 2021).
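Several of the wastewater studies in this section rely on decision trees to rank parameters such as conductivity, pH, or temperature. The core step of a CART-style tree is choosing, for one feature, the threshold that best separates the outcomes. A minimal sketch of that step using Gini impurity follows; the conductivity readings and outcome labels are invented for illustration and do not come from any cited study.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0 means a pure group)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_threshold(values, labels):
    """Return (threshold, weighted Gini) of the best binary split on one feature."""
    pairs = sorted(zip(values, labels))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for v, lab in pairs if v <= threshold]
        right = [lab for v, lab in pairs if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical readings: conductivity and whether the treatment target was met.
conductivity = [0.4, 0.6, 0.9, 1.5, 1.8, 2.2]
target_met = ["no", "no", "no", "yes", "yes", "yes"]
threshold, impurity = best_threshold(conductivity, target_met)
```

A full tree repeats this search over every feature and recurses into each side of the split. A feature that yields low-impurity splits near the root, as conductivity does in this toy data, is the kind of variable such an analysis reports as most influential.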
of parameters such as eccentricity and compactness was specifically examined. Copper ion experiments proved that this method achieved more efficient heavy metal removal. The team also used machine learning algorithms to study the morphology of microalgae in more depth to obtain the characteristics of microalgae that could efficiently treat heavy metals in wastewater (Xu et al., 2021b). The process used microscopy to acquire images of microalgae, enabling the assessment of the efficiency of heavy metal removal by microalgae in a non-invasive and label-free way. The experimental results showed that the morphology of E. gracilis cells was more conducive to the efficient removal of heavy metals. Microalgae can mitigate the greenhouse effect by absorbing carbon dioxide from the atmosphere through photosynthesis. Domenico D’Alelio et al. studied the impact of microalgae on global warming based on machine learning algorithms (D’Alelio et al., 2020). They trained the model on known data downloaded from the Web, and then analysed their own data collected over 27 years in the North Atlantic. The final analysis showed that as seawater temperature increased, the number of microalgae decreased, and distant marine areas faced nutrient deficiencies in seawater.

At present, few machine learning algorithms are used to assist microalgae in wastewater treatment and other environmental protection work, mainly the support vector machine algorithm and the decision tree algorithm (Table 3). No application of deep learning in this field has been found yet.

Application of machine learning in the growth phase of microalgae

The yield of microalgae and the ease of harvesting can directly affect their cost. There are many factors influencing the growth and morphology of microalgae, and various investigations have been conducted previously to obtain lower-cost microalgae. However, these approaches are either laborious or give poor predictions, making it difficult to provide valuable suggestions for actual production. Numerous factors have been analyzed with machine learning algorithms to predict the growth and final yield of microalgae in recent years.

Susanne Dunker et al. proposed a deep learning scheme for identifying microalgae species and growth cycles (Dunker et al., 2018). The model was trained on 47,000 high-throughput microscope images at 60x magnification. The model achieved 97% accuracy in natural experiments. The framework offered great help for the rapid assessment of water quality. D. M. J. Purnomo et al. studied the growth behavior of microalgae in solutions with different pH values based on an extreme learning machine (Purnomo et al., 2015). The team observed the growth of microalgae for 20 consecutive days and normalized the data to construct the dataset. A cross-validation method was introduced to prevent overfitting during model training. Experiments showed that the method had a high accuracy rate, which could be further improved if used in conjunction with a genetic algorithm. Xiaolin Bi et al. investigated the impact of pH on the growth of microalgae by analyzing hyperspectral images with a machine learning algorithm (Bi et al., 2019). The spectra of all microalgae were represented by 900 pixels; 300 pixels were randomly selected as the training set, and another 300 pixels were randomly chosen as the validation set. The experimental data revealed that the support vector machine algorithm was the most effective method for identifying microalgae and could reflect their growth conditions. The study provided excellent technical support for monitoring the growth process of microalgae and analyzing their directional movements. Wendie Levasseur et al. studied the effect of light on the growth of microalgae, especially in an environment with alternating light and dark periods, based on a machine learning algorithm (Levasseur et al., 2022). Medium and high light levels and light-dark switching frequencies were the focus of the analysis. The growth data of low-density green microalgae under different light switch frequencies were compiled and analyzed by inferential statistics. Finally, the authors described different experimental setups to observe the growth of microalgae. Shixuan He et al. analyzed the growth of microalgae based on the support vector machine algorithm to assess the degree of eutrophication in water bodies (He et al., 2018). They first obtained the characteristics of microalgae by Raman spectroscopy to determine their growth stages. Then they analyzed the relationship between algal growth and environmental changes. The authors presented a full paper on the effectiveness of the method. A framework that brought together
Support Vector Machine | Using the general search algorithm to create the final model | Based on statistical theory | k-fold cross-validation is applied against overfitting | Prone to overfitting | Hossain et al., 2022b
Decision Tree | Governed by if-then rules | The ‘cvpartition’ function and ‘HoldOut’ validation procedure split the dataset | Variable combinations are easy to change | High training cost | Singh and Mishra, 2021
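The table above mentions k-fold cross-validation as a guard against overfitting. The idea can be sketched in a few lines: the samples are shuffled once, cut into k folds, and each fold serves once as the validation set while the remaining folds are used for training. The function name, the fold count, and the sample count are illustrative assumptions, not settings taken from the cited work.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]   # k nearly equal slices
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation

# Example with 20 hypothetical samples and 5 folds.
splits = list(k_fold_indices(20, k=5))
```

A model would be fitted on each `train` list and scored on the matching `validation` list, and the k scores averaged. A large gap between training and validation scores is the usual sign of overfitting. A hold-out procedure, also named in the table, corresponds to using a single such split.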
multiple machine learning algorithms was employed by S. M. Zakir Hossain et al. to study the growth process of microalgae and the amount of CO2 fixation. Factors such as temperature, nitrogen to phosphorus ratio, and frequency of light and dark cycles were used as input parameters for the whole framework (Hossain et al., 2022c) (Figure 8). All algorithms were used together with Bayesian optimization for various predictions. The advantages and disadvantages of each algorithm in prediction were listed in detail in the article.

A model used to estimate the daily productivity and final production of microalgae in open ponds was studied by Supriyanto et al. Based on an existing dataset, a decision tree method was employed to calculate the effect of temperature, solar radiation, and other conditions on the growth and final yield of microalgae (Supriyanto et al., 2018). The efficiency of the model was validated by practical evaluation. Its performance could be further improved in the future if more parameters were added to the model. The group also investigated the production of mixed microalgae in semi-continuous open ponds based on an artificial neural network (Supriyanto et al., 2019) (Figure 9). The neural network included a hidden layer and an output layer, and the input layer contained eight parameters such as algae concentration, temperature, solar radiation, and pH value. The network was trained on a mature dataset. The final prediction was the concentration of microalgae. The data showed that the three-layer neural network model worked well for various input parameters. Victor Pozzobon et al. constructed a machine learning scheme to check nitrate and nitrite levels in the microalgae growth environment (Pozzobon et al., 2021). First, samples with different concentrations of nitrate and nitrite were extracted from the microalgal growth environment and analyzed with a spectrometer. These data were then analyzed with a least squares regression algorithm and ultimately used to predict nitrate and nitrite concentrations. The method not only significantly reduced the time required for detection but also ensured a sufficiently high accuracy. A novel framework coupling the support vector machine algorithm and the random forest algorithm was reported by Patricio López Expósito et al. to measure microalgae concentration (Expósito et al., 2017). A laser irradiated the suspended particles of microalgae to obtain reflectance spectra, which could be analyzed to obtain the concentration of microalgae. The team constructed a dataset of 76 vectors through practical experiments to determine the hyperparameters in the model. The results demonstrated that the model could quickly and accurately estimate the concentration of microalgae. The team also analyzed the floc length and geometric shape during the growth of microalgae in a random forest regression model to reduce the cost of harvesting microalgae (Lopez-Exposito et al., 2019). A set of lengths collected from virtual flocs generated by computer software after focused reflection operations was employed as the data set for training the model. The trained and optimized model achieved very high accuracy in actual tests. An additional advantage of the model was that it could be quickly adapted to the floc structure according to the actual requirements.

Machine learning algorithms can analyze the effect of microscopic factors such as DNA, in addition to macroscopic factors that affect the growth of microalgae (Teng et al., 2020). Appropriate editing of microalgae genes could rapidly increase the production and oil content of microalgae. However, the genomes of microalgae were not only long, but also particularly complex. Therefore, they could not be rapidly localized and analyzed by conventional methods. Likai Wang et al. applied a logistic regression algorithm to learn the 32 known characteristic expressions of stress genes to predict the function of the remaining stress genes (Wang et al., 2018). The authors’ study showed that the method had high accuracy. If more feature
FIGURE 8
Process diagram of microalgae cultivation environment. (reproduced with permission from Hossain et al., 2022c).
FIGURE 9
Schematic representation of a framework used to estimate microalgae growth with ANN. (reproduced with permission from Supriyanto et al., 2019).
expressions were learned from known data with deep learning techniques, the performance of the framework could be further improved. Supreeta Vijayakumar et al. analyzed the genomes of microalgae with machine learning algorithms to explore the response of microalgae to changes in light and salinity in the environment (Vijayakumar et al., 2020). They first collected data on photosynthesis and genome-based energy metabolism of microalgae. These data were then analyzed by methods such as k-means clustering. The team’s further results showed that the combination of machine learning algorithm and genomic model accomplished the work well. Victor Pozzobon et al. analyzed the viability of microalgae by processing flow cytometer readings with a machine learning algorithm (Pozzobon et al., 2020). The microalgal activity was obtained by studying the integrity of the cell wall of microalgae after double staining. The validity of the model was verified by freezing the microalgae and observing their activity. The results showed that the data predicted by the model were consistent with those listed in the published literature.
Among all the microalgae treatment modules in this paper, the microalgae growth status detection module uses the most machine learning algorithms (Table 4). Although deep learning has not been widely used in this work, it has achieved very significant results.
Conclusion
This paper provides a detailed summary of machine learning techniques used in microalgae treatment. The overview covers the classification and identification of microalgae, the conversion of microalgae into bioenergy, the treatment of waste by microalgae, and the growth of microalgae. Microalgae are a critical part of the marine ecological cycle and have very significant economic value. However, the classification, identification, and purification of microalgae have always been a problem for practitioners because they are so small and diverse. Machine learning techniques are good at operations such as classification and regression, and have been highly successful in digital image processing and speech recognition. The introduction of machine learning techniques to microalgae applications has been equally fruitful. This paper illustrates how data-driven machine learning techniques process input data and calculate output results, with algorithms such as support vector machine, decision tree, random forest, and neural network as examples. It then summarizes how machine learning algorithms have been applied, and the results that have been achieved, in areas such as microalgae classification, conversion of microalgae into bioenergy, microalgae purification of the environment, and microalgae growth. The paper has tremendous implications for future extensions of machine learning in microalgae applications.
Prospect
Many achievements have been made in the application of machine learning in microalgae identification and treatment.
TABLE 4 Machine learning algorithms and models used in microalgae growth monitoring.
Logistic Regression | 23 variables | Detectability for certain differentially expressed genes | Best performance in all models | Nonlinear expression ability | Wang et al., 2018
Support Vector Machine | RBF kernel function | Small sample | Generalization error minimum | Needs actual case verification | He et al., 2018
Decision Tree | 25 decision outputs, 49 leaves, tree depth of 6 | Suitable for the Open Raceway Pond | Easy to evaluate and use | Cannot be built without a sufficient dataset | Supriyanto et al., 2018
Random Forest | Four hyperparameters | Averaging the output of the regression trees | Handles high-dimensional input in a short time | Difficult to produce training data | Lopez-Exposito et al., 2019
k-means | k=6 | Clustering algorithm | Avoids an increase in data dimensionality | High calculation cost | Vijayakumar et al., 2020
Neural Network | 8 input neurons, 11 hidden neurons, 1 output neuron | Multilayer backpropagation neural network | High prediction accuracy | High training cost | Supriyanto et al., 2019
Deep Learning | A feed-forward CNN | Transfer learning | Powerful | Needs many images | Dunker et al., 2018
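To make the neural network row of Table 4 concrete, the following is a minimal sketch, not the implementation of Supriyanto et al., of a multilayer backpropagation network with 8 input neurons, 11 hidden neurons, and 1 output neuron. Only the layer sizes come from the table. The tanh activation, the learning rate, the number of epochs, the synthetic training data, and all function names are assumptions introduced purely for illustration.

```python
import math
import random

N_IN, N_HID = 8, 11  # layer sizes from Table 4; the single output is implicit


def init_network(rng):
    """Small random weights; the last entry of each row is the bias."""
    w_hid = [[rng.uniform(-0.5, 0.5) for _ in range(N_IN + 1)] for _ in range(N_HID)]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(N_HID + 1)]
    return w_hid, w_out


def forward(x, w_hid, w_out):
    """Forward pass: tanh hidden layer (assumed), one linear output neuron."""
    hidden = [math.tanh(sum(w * v for w, v in zip(row, x)) + row[-1]) for row in w_hid]
    y = sum(w * h for w, h in zip(w_out, hidden)) + w_out[-1]
    return hidden, y


def train_step(x, target, w_hid, w_out, lr):
    """One stochastic gradient step on the squared error (backpropagation)."""
    hidden, y = forward(x, w_hid, w_out)
    err = y - target
    # Error signal for each hidden unit, computed before w_out is changed.
    delta_hid = [err * w_out[j] * (1.0 - hidden[j] ** 2) for j in range(N_HID)]
    for j in range(N_HID):
        w_out[j] -= lr * err * hidden[j]
    w_out[-1] -= lr * err
    for j, row in enumerate(w_hid):
        for i in range(N_IN):
            row[i] -= lr * delta_hid[j] * x[i]
        row[-1] -= lr * delta_hid[j]


def mean_squared_error(data, w_hid, w_out):
    return sum((forward(x, w_hid, w_out)[1] - t) ** 2 for x, t in data) / len(data)


def make_synthetic_data(rng, n):
    """Synthetic stand-in for pond records: 8 scaled inputs -> 1 target."""
    data = []
    for _ in range(n):
        x = [rng.random() for _ in range(N_IN)]
        # Arbitrary smooth function; NOT a model of real microalgae growth.
        target = 0.6 * x[0] + 0.3 * math.sin(3.0 * x[1]) + 0.2 * x[2] * x[3]
        data.append((x, target))
    return data


rng = random.Random(0)
data = make_synthetic_data(rng, 200)
w_hid, w_out = init_network(rng)
mse_before = mean_squared_error(data, w_hid, w_out)
for _ in range(50):  # epochs (assumed value)
    for x, t in data:
        train_step(x, t, w_hid, w_out, lr=0.05)
mse_after = mean_squared_error(data, w_hid, w_out)
print(f"MSE before training: {mse_before:.4f}, after: {mse_after:.4f}")
```

In a real application the eight inputs would be measured quantities such as algae concentration, temperature, solar radiation, and pH value, scaled to a comparable range, and the error would be reported on held-out data rather than on the training set.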
However, there are still many aspects that can be further optimized. First, very few datasets are available for machine learning algorithms in microalgae processing. Even though some datasets have been widely used in some specific areas, they still face the problem of over-fitting (Rani et al., 2021). Secondly, when the performance of a single machine learning algorithm is limited, multiple algorithms can be coupled to build a hybrid model. The dataset that accompanies the hybrid model also needs to be studied in depth (Sundui et al., 2021). In addition, improving the performance of existing machine learning models is a key task for the future. For example, the current cost of biodiesel converted from microalgae is significantly higher than that of diesel derived from fossil fuels. Both the modification of existing models and the construction of new models to assist the conversion of microalgae to biodiesel require a lot of work (Chowdhury and Loganathan, 2019).
Funding
This work is supported by the National Natural Science Foundation of China (Grant No. 52075138), the Anhui Provincial Natural Science Foundation (Grant No. 2008085QF329), the Natural Science Foundation for the Higher Education Institutions of Anhui Province (Grant No. KJ2020A0061), and the Natural Science General Project of Anhui Science and Technology University (Grant No. 2021zryb26).
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Andersen, R. A., and Kawachi, M. (2005). Microalgae isolation techniques. Algal culturing techniques, 83–100. doi: 10.1016/b978-012088426-1/50007-x
Ayyagari, M. R. (2020). Classification of imbalanced datasets using one-class SVM, k-nearest neighbors and CART algorithm. Int. J. Advanced Comput. Sci. Appl. 11, 1–5. doi: 10.14569/IJACSA.2020.0111101
Bakirtzis, A. G., Petridis, V., Kiartzis, S. J., Alexiadis, M. C., and Maissis, A. H. (1996). A neural network short term load forecasting model for the Greek power system. IEEE Trans. Power Syst. 11, 858–863. doi: 10.1109/59.496166
Barsanti, L., Birindelli, L., and Gualtieri, P. (2021). Water monitoring by means of digital microscopy identification and classification of microalgae. Environ. Science: Processes Impacts 23, 1443–1457. doi: 10.1039/D1EM00258A
Belgiu, M., and Drăguţ, L. (2016). Random forest in remote sensing: A review of applications and future directions. ISPRS J. photogrammetry Remote Sens. 114, 24–31. doi: 10.1016/j.isprsjprs.2016.01.011
Bi, X., Lin, S., Zhu, S., Yin, H., Li, Z., and Chen, Z. (2019). Species identification and survival competition analysis of microalgae via hyperspectral microscopic images. Optik 176, 191–197. doi: 10.1016/j.ijleo.2018.09.077
Bishop, C. M. (2013). Model-based machine learning. Philos. Trans. R. Soc. A: Mathematical Phys. Eng. Sci. 371, 20120222. doi: 10.1098/rsta.2012.0222
Bonissone, P., Cadenas, J. M., Garrido, M. C., and Díaz-Valladares, R. A. (2010). A fuzzy random forest. Int. J. Approximate Reasoning 51, 729–747. doi: 10.1016/j.ijar.2010.02.003
Boser, B. E., Guyon, I. M., and Vapnik, V. N. (1992). A training algorithm for optimal margin classifiers. In Proceedings of the 5th Annual Workshop on Computational Learning Theory ACM Press, 144–152. doi: 10.1145/130385.130401
Boulesteix, A. L., Janitza, S., Kruppa, J., and König, I. R. (2012). Overview of random forest methodology and practical guidance with emphasis on computational biology and bioinformatics. Wiley Interdiscip. Reviews: Data Min. Knowledge Discovery 2, 493–507. doi: 10.1002/widm.1072
Breiman, L. (2001). Random forests. Mach. Learn. 45, 5–32. doi: 10.1023/A:1010933404324
Brennan, L., and Owende, P. (2010). Biofuels from microalgae–a review of technologies for production, processing, and extractions of biofuels and co-products. Renewable Sustain. Energy Rev. 14, 557–577. doi: 10.1016/j.rser.2009.10.009
Cao, M., Wang, J., Chen, Y., and Wang, Y. (2021). Detection of microalgae objects based on the improved YOLOv3 model. Environ. Science: Processes Impacts 23, 1516–1530. doi: 10.1039/D1EM00159K
Carleo, G., Cirac, I., Cranmer, K., Daudet, L., Schuld, M., Tishby, N., et al. (2019). Machine learning and the physical sciences. Rev. Modern Phys. 91, 045002. doi: 10.1103/RevModPhys.91.045002
Chakdar, H., Hasan, M., Pabbi, S., Nevalainen, H., and Shukla, P. (2021). High-throughput proteomics and metabolomic studies guide re-engineering of metabolic pathways in eukaryotic microalgae: A review. Bioresource Technol. 321, 124495. doi: 10.1016/j.biortech.2020.124495
Chen, H., Fu, Q., Liao, Q., Zhu, X., and Shah, A. (2021). Applying artificial neural network to predict the viscosity of microalgae slurry in hydrothermal hydrolysis process. Energy AI 4, 100053. doi: 10.1016/j.egyai.2021.100053
Cheng, C., Cheng, X., Dai, N., Jiang, X., Sun, Y., and Li, W. (2015). Prediction of facial deformation after complete denture prosthesis using BP neural network. Comput. Biol. Med. 66, 103–112. doi: 10.1016/j.compbiomed.2015.08.018
Cheng, F., Porter, M. D., and Colosi, L. M. (2020). Is hydrothermal treatment coupled with carbon capture and storage an energy-producing negative emissions technology? Energy Conversion Manage. 203, 112252. doi: 10.1016/j.enconman.2019.112252
Chen, P.-H., Lin, C.-J., and Schölkopf, B. (2005). A tutorial on ν-support vector machines. Appl. Stochastic Models Business Industry 21, 111–136. doi: 10.1002/asmb.537
Chen, C. L., Mahjoubfar, A., Tai, L.-C., Blaby, I. K., Huang, A., Niazi, K. R., et al. (2016). Deep learning in label-free cell classification. Sci. Rep. 6, 21471. doi: 10.1038/srep21471
Chew, K. W., Yap, J. Y., Show, P. L., Suan, N. H., Juan, J. C., Ling, T. C., et al. (2017). Microalgae biorefinery: high value products perspectives. Bioresource Technol. 229, 53–62. doi: 10.1016/j.biortech.2017.01.006
Chowdhury, H., and Loganathan, B. (2019). Third-generation biofuels from microalgae: a review. Curr. Opin. Green Sustain. Chem. 20, 39–44. doi: 10.1016/j.cogsc.2019.09.003
Cichy, R. M., and Kaiser, D. (2019). Deep neural networks as scientific models. Trends Cogn. Sci. 23, 305–317. doi: 10.1016/j.tics.2019.01.009
Correa, I., Drews, P., Botelho, S., Souza, M. S. D., and Tavano, V. M. (2017). “Deep learning for microalgae classification,” in 2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA): Cancun, Mexico, 18-21 Dec IEEE(USA) Vol. 2017, 20–25.
Coşgun, A., Günay, M. E., and Yıldırım, R. (2021). Exploring the critical factors of algal biomass and lipid production for renewable fuel production by machine learning. Renewable Energy 163, 1299–1317. doi: 10.1016/j.renene.2020.09.034
Cruz, R. C., Costa, P. R., Vinga, S., Krippahl, L., and Lopes, M. B. (2021). A review of recent machine learning advances for forecasting harmful algal blooms and shellfish contamination. J. Mar. Sci. Eng. 9, 283–99. doi: 10.3390/jmse9030283
D’Alelio, D., Rampone, S., Cusano, L. M., Morfino, V., Russo, L., Sanseverino, N., et al. (2020). Machine learning identifies a strong association between warming and reduced primary productivity in an oligotrophic ocean gyre. Sci. Rep. 10, 3287. doi: 10.1038/s41598-020-59989-y
Deka, P. C. (2014). Support vector machine applications in the field of hydrology: a review. Appl. soft computing 19, 372–386. doi: 10.1016/j.asoc.2014.02.002
del Rio-Chanona, E. A., Wagner, J. L., Ali, H., Fiorelli, F., Zhang, D., and Hellgardt, K. (2019). Deep learning-based surrogate modeling and optimization for microalgal biofuel production and photobioreactor design. AIChE J. 65, 915–923. doi: 10.1002/aic.16473
Dhindsa, A., Bhatia, S., Agrawal, S., and Sohi, B. S. (2021). An improvised machine learning model based on mutual information feature selection approach for microbes classification. Entropy 23, 257–271. doi: 10.3390/e23020257
Dietterich, T. G. (1997). Machine-learning research. AI magazine 18, 97–97. doi: 10.1609/aimag.v18i4.1324
Drews, P., Colares, R. G., Machado, P., De Faria, M., Detoni, A., and Tavano, V. (2013). Microalgae classification using semi-supervised and active learning based on Gaussian mixture models. J. Braz. Comput. Soc. 19, 411–422. doi: 10.1007/s13173-013-0121-y
Dunker, S., Boho, D., Wäldchen, J., and Mäder, P. (2018). Combining high-throughput imaging flow cytometry and deep learning for efficient species and life-cycle stage identification of phytoplankton. BMC Ecol. 18, 51. doi: 10.1186/s12898-018-0209-5
El Naqa, I., and Murphy, M. J. (2015). What is machine learning?. In: I. El Naqa and M. J. Murphy. (eds) Machine Learning in Radiation Oncology: Theory and Applications (Cham: Springer International Publishing(Switzerland)).
Elomaa, T. (1994). “In defense of C4.5: Notes on learning one-level decision trees,” In: W Cohen and H Hirsh. (eds) Machine Learning Proceedings 1994. San Francisco (CA): Morgan Kaufmann.
Enamala, M. K., Enamala, S., Chavali, M., Donepudi, J., Yadavalli, R., Kolapalli, B., et al. (2018). Production of biofuels from microalgae - a review on cultivation, harvesting, lipid extraction, and numerous applications of microalgae. Renewable Sustain. Energy Rev. 94, 49–68. doi: 10.1016/j.rser.2018.05.012
Expósito, P. L., Suárez, A. B., and Álvarez, C. N. (2017). Laser reflectance measurement for the online monitoring of chlorella sorokiniana biomass concentration. J. Biotechnol. 243, 10–15. doi: 10.1016/j.jbiotec.2016.12.020
Farnaaz, N., and Jabbar, M. A. (2016). Random forest modeling for network intrusion detection system. Proc. Comput. Sci. 89, 213–217. doi: 10.1016/j.procs.2016.06.047
Ferro, L., Gentili, F. G., and Funk, C. (2018). Isolation and characterization of microalgal strains for biomass production and wastewater reclamation in northern Sweden. Algal Res. 32, 44–53. doi: 10.1016/j.algal.2018.03.006
Franco, B. M., Navas, L. M., Gómez, C., Sepúlveda, C., and Acién, F. G. (2019). Monoalgal and mixed algal cultures discrimination by using an artificial neural network. Algal Res. 38, 101419. doi: 10.1016/j.algal.2019.101419
Friedl, M. A., and Brodley, C. E. (1997). Decision tree classification of land cover from remotely sensed data. Remote Sens. Environ. 61, 399–409. doi: 10.1016/S0034-4257(97)00049-7
Georgianna, D. R., and Mayfield, S. P. (2012). Exploiting diversity and synthetic biology for the production of algal biofuels. Nature 488, 329–335. doi: 10.1038/nature11479
Giraldo-Zuluaga, J. H., Diez, G., Gomez, A., Martinez, T., Vasquez, M. P., Bonilla, J., et al. (2016). Automatic identification of scenedesmus polymorphic microalgae from microscopic images. Pattern Analysis and Applications 21, 601–612. doi: 10.1007/s10044-017-0662-3
Gomez-Espinoza, O., Guerrero-Barrantes, M., Meneses-Montero, K., and Núñez-Montero, K. (2018). Identification of a microalgae collection isolated from Costa Rica by 18S rDNA sequencing. Acta Biológica Colombiana 23, 199. doi: 10.15446/abc.v23n2.68088
Guo, B., Lei, C., Kobayashi, H., Ito, T., Yalikun, Y., Jiang, Y., et al. (2017). High-throughput, label-free, single-cell, microalgal lipid screening by machine-learning-equipped optofluidic time-stretch quantitative phase microscopy. Cytometry Part A 91, 494–502. doi: 10.1002/cyto.a.23084
Gu, J., Wang, Z., Kuen, J., Ma, L., Shahroudy, A., Shuai, B., et al. (2018). Recent advances in convolutional neural networks. Pattern Recognition 77, 354–377. doi: 10.1016/j.patcog.2017.10.013
Harmon, J., Mikami, H., Kanno, H., Ito, T., and Goda, K. (2020). Accurate classification of microalgae by intelligent frequency-division-multiplexed fluorescence imaging flow cytometry. OSA Continuum 3, 430–440. doi: 10.1364/OSAC.387523
Hearst, M. A., Dumais, S. T., Osuna, E., Platt, J., and Scholkopf, B. (1998). Support vector machines. IEEE Intelligent Syst. their Appl. 13, 18–28. doi: 10.1109/5254.708428
He, S., Fang, S., Xie, W., Zhang, P., Li, Z., Zhou, D., et al. (2018). Assessment of physiological responses and growth phases of different microalgae under environmental changes by raman spectroscopy with chemometrics. Spectrochimica Acta Part A: Mol. Biomolecular Spectrosc. 204, 287–294. doi: 10.1016/j.saa.2018.06.060
He, K., Zhang, X., Ren, S., and Sun, J. (2015). Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. 2015 IEEE International Conference on Computer Vision (ICCV), 7–13 Dec. IEEE(USA) 2015, 1026–1034. doi: 10.1109/ICCV.2015.123
Hossain, S. M. Z., Sultana, N., Jassim, M. S., Coskuner, G., Hazin, L. M., Razzak, S. A., et al. (2022a). Soft-computing modeling and multiresponse optimization for nutrient removal process from municipal wastewater using microalgae. J. Water Process Eng. 45, 102490. doi: 10.1016/j.jwpe.2021.102490
Hossain, S. M. Z., Sultana, N., Mohammed, M. E., Razzak, S. A., and Hossain, M. M. (2022b). Hybrid support vector regression and crow search algorithm for modeling and multiobjective optimization of microalgae-based wastewater treatment. J. Environ. Manage. 301, 113783. doi: 10.1016/j.jenvman.2021.113783
Hossain, S. M. Z., Sultana, N., Razzak, S. A., and Hossain, M. M. (2022c). Modeling and multi-objective optimization of microalgae biomass production and CO2 biofixation using hybrid intelligence approaches. Renewable Sustain. Energy Rev. 157, 112016. doi: 10.1016/j.rser.2021.112016
Hsu, K. Y., Li, H. Y., and Psaltis, D. (1990). Holographic implementation of a fully connected neural network. Proc. IEEE 78, 1637–1645. doi: 10.1109/5.58357
Huang, H., Sun, Z., Zhang, Z., Chen, X., Di, Y., Zhu, F., et al. (2021). The identification of spherical engineered microplastics and microalgae by micro-hyperspectral imaging. Bull. Environ. Contamination Toxicol. 107, 764–769. doi: 10.1007/s00128-021-03131-9
Işıl, Ç., de Haan, K., Göröcs, Z., Koydemir, H. C., Peterman, S., Baum, D., et al. (2021b). “Label-free imaging flow cytometry for phenotypic analysis of microalgae populations using deep learning,” In: CPTAR Mazzali and R Kaindl Frontiers in Optics + Laser Science Washington, DC: Optica Publishing Group(America). 2021/11/01 2021b.
Işıl, Ç., de Haan, K., Göröcs, Z., Koydemir, H. C., Peterman, S., Baum, D., Song, F., et al. (2021a). Phenotypic analysis of microalgae populations using label-free imaging flow cytometry and deep learning. ACS Photonics 8, 1232–1242. doi: 10.1021/acsphotonics.1c00220
Jordan, M. I., and Mitchell, T. M. (2015). Machine learning: Trends, perspectives, and prospects. Science 349, 255–260. doi: 10.1126/science.aaa8415
Joshi, R. C., Dhup, S., Kaushik, N., and Dutta, M. K. (2021). An efficient oil content estimation technique using microscopic microalgae images. Ecol. Inf. 66, 101468. doi: 10.1016/j.ecoinf.2021.101468
Kuo, C. C. J. (2016). Understanding convolutional neural networks with a mathematical model. J. Visual Communication Image Representation 41, 406–413. doi: 10.1016/j.jvcir.2016.11.003
LeCun, Y., and Bengio, Y. (1995). “Convolutional networks for images, speech, and time series,” in The handbook of brain theory and neural networks Cambridge, MA, USA: MIT Press, vol. 3361, 255–258.
Levasseur, W., Pozzobon, V., and Perré, P. (2022). Green microalgae in intermittent light: a meta-analysis assisted by machine learning. J. Appl. Phycology 34, 135–158. doi: 10.1007/s10811-021-02603-z
Liakos, K. G., Busato, P., Moshou, D., Pearson, S., and Bochtis, D. (2018). Machine learning in agriculture: A review. Sensors 18, 2674. doi: 10.3390/s18082674
Li, M., Xu, H., and Deng, Y. (2019). Evidential decision tree based on belief entropy. Entropy 21, 897–910. doi: 10.3390/e21090897
Li, J., Zhang, W., Liu, T., Yang, L., Li, H., Peng, H., et al. (2021). Machine learning aided bio-oil production with high energy recovery and low nitrogen content from hydrothermal liquefaction of biomass with experiment verification. Chem. Eng. J. 425, 130649. doi: 10.1016/j.cej.2021.130649
Long, B., Fischer, B., Zeng, Y., Amerigian, Z., Li, Q., Bryant, H., et al. (2022). Machine learning-informed and synthetic biology-enabled semi-continuous algal cultivation to unleash renewable fuel productivity. Nat. Commun. 13, 541. doi: 10.1038/s41467-021-27665-y
Lopez-Exposito, P., Negro, C., and Blanco, A. (2019). Direct estimation of microalgal flocs fractal dimension through laser reflectance and machine learning. Algal Res. 37, 240–247. doi: 10.1016/j.algal.2018.12.007
Mahesh, B. (2020). Machine learning algorithms-a review. Int. J. Sci. Res. (IJSR) 9, 381–386. doi: 10.21275/ART20203995
McCulloch, W. S., and Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. Bull. Math. biophysics 5, 115–133. doi: 10.1007/BF02478259
McCulloch, W. S., and Pitts, W. (1990). A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biol. 52, 99–115. doi: 10.1016/S0092-8240(05)80006-0
Meyer, D., Leisch, F., and Hornik, K. (2003). The support vector machine under test. Neurocomputing 55, 169–186. doi: 10.1016/S0925-2312(03)00431-4
Milić, M., Petković, B., Selmi, A., Petković, D., Jermsittiparsert, K., Radivojević, A., et al. (2021). Computational evaluation of microalgae biomass conversion to biodiesel. Biomass Conversion Biorefinery 11, 1–8. doi: 10.1007/s13399-021-01314-2
Moayedi, H., Aghel, B., Foong, L. K., and Bui, D. T. (2020). Feature validity during machine learning paradigms for predicting biodiesel purity. Fuel 262, 116498. doi: 10.1016/j.fuel.2019.116498
Mochdia, K., and Tamaki, S. (2021). Transcription factor-based genetic engineering in microalgae. Plants 10, 1602–09. doi: 10.3390/plants10081602
Mofijur, M., Rasul, M. G., Hassan, N. M. S., and Nabi, M. N. (2019). Recent development in the production of third generation biodiesel from microalgae. Energy Proc. 156, 53–58. doi: 10.1016/j.egypro.2018.11.088
Mohammed, O., Park, D., Merchant, R., Dinh, T., Tong, C., Azeem, A., et al. (1995). Practical experiences with an adaptive neural network short-term load forecasting system. IEEE Trans. Power Syst. 10, 254–265. doi: 10.1109/59.373948
Montavon, G., Samek, W., and Müller, K.-R. (2018). Methods for interpreting and understanding deep neural networks. Digital Signal Process. 73, 1–15. doi: 10.1016/j.dsp.2017.10.011
Myles, A. J., Feudale, R. N., Liu, Y., Woody, N. A., and Brown, S. D. (2004). An introduction to decision tree modeling. J. Chemometrics: A J. Chemometrics Soc. 18, 275–285. doi: 10.1002/cem.873
Nugraha, W., Maulana, M. S., and Sasongko, A. (2020). Clustering based undersampling for handling class imbalance in C4.5 classification algorithm. Journal of Physics: Conference Series (UK: IOP Publishing) 1641, 012014.
Otálora, P., Guzmán, J. L., Acién, F. G., Berenguel, M., and Reul, A. (2021). Microalgae classification based on machine learning techniques. Algal Res. 55, 102256. doi: 10.1016/j.algal.2021.102256
Pal, M., and Mather, P. M. (2003). An assessment of the effectiveness of decision tree methods for land cover classification. Remote Sens. Environ. 86, 554–565. doi: 10.1016/S0034-4257(03)00132-9
Park, J., Baek, J., You, K., Nam, S. W., and Kim, J. (2021). Microalgae detection using a deep learning object detection algorithm, YOLOv3. J. Korean Soc. Water Environ. 37, 275–285. doi: 10.15681/KSWE.2021.37.4.275
Patel, H. H., and Prajapati, P. (2018). Study and analysis of decision tree based classification algorithms. Int. J. Comput. Sci. Eng. 6, 74–78. doi: 10.26438/ijcse/v6i10.7478
Pathy, A., Meher, S., and Balasubramanian, P. (2020). Predicting algal biochar yield using eXtreme gradient boosting (XGB) algorithm of machine learning methods. Algal Res. 50, 102006. doi: 10.1016/j.algal.2020.102006
Paul, A., Mukherjee, D. P., Das, P., Gangopadhyay, A., Chintha, A. R., and Kundu, S. (2018). Improved random forest for classification. IEEE Trans. Image Process. 27, 4012–4024. doi: 10.1109/TIP.2018.2834830
Peniuk, G. T., Schnurr, P. J., and Allen, D. G. (2016). Identification and quantification of suspended algae and bacteria populations using flow cytometry: applications for algae biofuel and biochemical growth systems. J. Appl. phycology 28, 95–104. doi: 10.1007/s10811-015-0569-6
Pozzobon, V., Levasseur, W., Guerin, C., and Perré, P. (2021). Nitrate and nitrite as mixed source of nitrogen for chlorella vulgaris: fast nitrogen quantification using spectrophotometer and machine learning. J. Appl. Phycology 33, 1389–1397. doi: 10.1007/s10811-021-02422-2
Pozzobon, V., Levasseur, W., Viau, E., Michiels, E., Clément, T., and Perré, P. (2020). Machine learning processing of microalgae flow cytometry readings: illustrated with chlorella vulgaris viability assays. J. Appl. Phycology 32, 2967–2976. doi: 10.1007/s10811-020-02180-7
Pradhan, A. (2012). Support vector machine-a survey. Int. J. Emerging Technol. Advanced Eng. 2, 82–85. doi: 10.1007/978-3-662-47926-1_26
Promdaen, S., Wattuya, P., and Sanevas, N. (2014). Automated microalgae image classification. Proc. Comput. Sci. 29, 1981–1992. doi: 10.1016/j.procs.2014.05.182
Purnomo, D. M. J., Purbarani, S. C., Wibisono, A., Hendrayanti, D., Bowolaksono, A., Mursanto, P., et al. (2015). Genetic algorithm optimization for extreme learning machine based microalgal growth forecasting of chlamydomonas sp. 2015 International Conference on Advanced Computer Science and Information Systems (ICACSIS), 10-11 Oct. 2015, 243–248. doi: 10.1109/ICACSIS.2015.7415189
Qi, Y. (2012). “Random forest for bioinformatics,” In: C. Zhang and Y. Ma. (eds.) Ensemble Machine Learning: Methods and Applications (Boston, MA: Springer US).
Quinlan, J. R. (1986). Induction of decision trees. Mach. Learn. 1, 81–106. doi: 10.1007/BF00116251
Rani, P., Kotwal, S., Manhas, J., Sharma, V., and Sharma, S. (2021). Machine learning and deep learning based computational approaches in automatic microorganisms image recognition: Methodologies, challenges, and developments. Arch. Comput. Methods Eng 29, 641–677. doi: 10.1007/s11831-021-09639-x
Rashvand, M., Zenouzi, A., and Abbaszadeh, R. (2019). Potential of image processing, dielectric spectroscopy and intelligence methods in order to authentication of microalgae biodiesel. Measurement 148, 106962. doi: 10.1016/j.measurement.2019.106962
Reimann, R., Zeng, B., Jakopec, M., Burdukiewicz, M., Petrick, I., Schierack, P., et al. (2020). Classification of dead and living microalgae chlorella vulgaris by bioimage informatics and machine learning. Algal Res. 48, 101908. doi: 10.1016/j.algal.2020.101908
Rock, A., Lucie, N., and Green, D. H. (2021). Synthetic biology is essential to unlock commercial biofuel production through hyper lipid-producing microalgae: a review. J. Appl. Phycology 2, 41–59. doi: 10.1080/26388081.2021.1886872
Rodriguez-Galiano, V. F., Ghimire, B., Rogan, J., Chica-Olmo, M., and Rigol-Sanchez, J. P. (2012). An assessment of the effectiveness of a random forest classifier for land-cover classification. ISPRS J. photogrammetry Remote Sens. 67, 93–104. doi: 10.1016/j.isprsjprs.2011.11.002
Rosenblatt, F. (1958). The perceptron: a probabilistic model for information storage and organization in the brain. Psychol. Rev. 65, 386. doi: 10.1037/h0042519
Rumelhart, D. E., Hinton, G. E., and Williams, R. J. (1986). Learning representations by back-propagating errors. Nature 323, 533–536. doi: 10.1038/323533a0
Sagheer, A., Zidan, M., and Abdelsamea, M. M. (2019). A novel autonomous perceptron model for pattern classification applications. Entropy 21, 763–786. doi: 10.3390/e21080763
Sain, S. R. (1996). The nature of statistical learning theory. Technometrics 38, 409–409. doi: 10.1080/00401706.1996.10484565
Salam, S., and Verma, T. N. (2019). Appending empirical modelling to numerical solution for behaviour characterisation of microalgae biodiesel. Energy Conversion Manage. 180, 496–510. doi: 10.1016/j.enconman.2018.11.014
Sá, C., Leal, M. C., Silva, A., Nordez, S., André, E., Paula, J., et al. (2013). Variation of phytoplankton assemblages along the Mozambique coast as revealed by HPLC and microscopy. J. Sea Res. 79, 1–11. doi: 10.1016/j.seares.2013.01.001
Saputro, T. B., Purwani, K. I., Ermavitalini, D., and Saifullah, A. F. (2019). Isolation of high lipids content microalgae from wonorejo rivers, Surabaya, Indonesia and its identification using rbcL marker gene. Biodiversitas J. Biol. Diversity 20, 1380–1388. doi: 10.13057/biodiv/d200530
Sarıgül, M., Ozyildirim, B. M., and Avci, M. (2019). Differential convolutional neural network. Neural Networks 116, 279–287. doi: 10.1016/j.neunet.2019.04.025
Shahid, N., Naqvi, I. H., and Qaisar, S. B. (2015). One-class support vector machines: analysis of outlier detection for wireless sensor networks in harsh environments. Artif. Intell. Rev. 43, 515–563. doi: 10.1007/s10462-013-9395-x
Sharma, S., Agrawal, J., and Sharma, S. (2013). Classification through machine learning technique: C4.5 algorithm based on various entropies. Int. J. Comput. Appl. 82, 28–32. doi: 10.5120/14249-2444
Speiser, J. L., Miller, M. E., Tooze, J., and Ip, E. (2019). A comparison of random forest variable selection methods for classification prediction modeling. Expert Syst. Appl. 134, 93–101. doi: 10.1016/j.eswa.2019.05.028
Sultana, N., Hossain, S. M. Z., Abusaad, M., Alanbar, N., Senan, Y., and Razzak, S. A. (2022). Prediction of biodiesel production from microalgal oil using Bayesian optimization algorithm-based machine learning approaches. Fuel 309, 122184. doi: 10.1016/j.fuel.2021.122184
Sundui, B., Calderon, O. A. R., Abdeldayem, O. M., Lázaro-Gil, J., Rene, E. R., and Sambuu, U. (2021). Applications of machine learning algorithms for biological wastewater treatment: Updates and perspectives. Clean Technol. Environ. Policy 23, 127–143. doi: 10.1007/s10098-020-01993-x
Supriyanto, Noguchi, R., Ahamed, T., Mikihide, D., and Watanabe, M. M. (2018). “A decision tree approach to estimate the microalgae production in open raceway pond,” in IOP Conference Series: Earth and Environmental Science, 3rd International conference on biomass: Accelerating the technical development and commercialization for sustainable bio-based products and energy, Bogor, Indonesia: IOP Publishing (UK), 209. 012050.
Supriyanto, Noguchi, R., Ahamed, T., Rani, D. S., Sakurai, K., Nasution, M. A., et al. (2019). Artificial neural networks model for estimating growth of polyculture microalgae in an open raceway pond. Biosyst. Eng. 177, 122–129. doi: 10.1016/j.biosystemseng.2018.10.002
Suykens, J. A. K., and Vandewalle, J. (1999). Least squares support vector machine classifiers. Neural Process. Lett. 9, 293–300. doi: 10.1023/A:1018628609742
Swain, P. H., and Hauska, H. (1977). The decision tree classifier: Design and potential. IEEE Trans. Geosci. Electron. 15, 142–147. doi: 10.1109/TGE.1977.6498972
Teng, S. Y., Yew, G. Y., Sukačová, K., Show, P. L., Máša, V., and Chang, J.-S. (2020). Microalgae with artificial intelligence: A digitalized perspective on genetics, systems and products. Biotechnol. Adv. 44, 107631. doi: 10.1016/j.biotechadv.2020.107631
Tian, L., and Noore, A. (2004). Short-term load forecasting using optimized neural network with genetic algorithm. 2004 International Conference on Probabilistic Methods Applied to Power Systems, 12-16 Sept. 2004, 135–140. doi: 10.1109/PMAPS.2004.243045
Vapnik, V. N. (1999). An overview of statistical learning theory. IEEE Trans. Neural Networks 10, 988–999. doi: 10.1109/72.788640
Vijayakumar, S., Rahman, P. K. S. M., and Angione, C. (2020). A hybrid flux balance analysis and machine learning pipeline elucidates metabolic adaptation in cyanobacteria. iScience 23, 101818. doi: 10.1016/j.isci.2020.101818
Wang, Y., Ju, P., Wang, S., Su, J., Zhai, W., and Wu, C. (2021). Identification of living and dead microalgae cells with digital holography and verified in the East China Sea. Mar. Pollut. Bull. 163, 111927. doi: 10.1016/j.marpolbul.2020.111927
Wang, W., Men, C., and Lu, W. (2008). Online prediction model based on support vector machine. Neurocomputing 71, 550–558. doi: 10.1016/j.neucom.2007.07.020
Wang, Z., Peng, X., Xia, A., Shah, A. A., Huang, Y., Zhu, X., et al. (2022). The role of machine learning to boost the bioenergy and biofuels conversion. Bioresource Technol. 343, 126099. doi: 10.1016/j.biortech.2021.126099
Wang, L., Xi, Y., Sung, S., and Qiao, H. (2018). RNA-Seq assistant: machine learning based methods to identify more transcriptional regulated genes. BMC Genomics 19, 546. doi: 10.1186/s12864-018-4932-2
Wei, J., Chu, X., Sun, X.-Y., Xu, K., Deng, H.-X., Chen, J., et al. (2019). Machine learning in materials science. InfoMat 1, 338–358. doi: 10.1002/inf2.12028
Wei, L., Su, K., Zhu, S., Yin, H., Li, Z., Chen, Z., et al. (2017). Identification of microalgae by hyperspectral microscopic imaging system. Spectrosc. Lett. 50, 59–
Shi, T., and Horvath, S. (2006). Unsupervised learning with random 63. doi: 10.1080/00387010.2017.1287094
forest predictors. J. Comput. Graphical Stat 15, 118–138. doi: 10.1198/106186006 Wellner, B., Grand, J., Canzone, E., Coarr, M., Brady, P. W., Simmons, J., et al.
X94072 (2017). Predicting unplanned transfers to the intensive care unit: a machine
Singh, V., and Mishra, V. (2021). Exploring the effects of different combinations learning approach leveraging diverse clinical elements. JMIR Med. Inf. 5, e8680.
of predictor variables for the treatment of wastewater by microalgae and biomass doi: 10.2196/medinform.8680
production. Biochem. Eng. J. 174, 108129. doi: 10.1016/j.bej.2021.108129 Widodo, A., and Yang, B.-S. (2007). Support vector machine in machine
Singh, V., and Mishra, V. (2022). Evaluation of the effects of input variables on condition monitoring and fault diagnosis. Mechanical Syst. Signal Process. 21,
the growth of two microalgae classes during wastewater treatment. Water Res. 213, 2560–2574. doi: 10.1016/j.ymssp.2006.12.007
118165. doi: 10.1016/j.watres.2022.118165 Xu, M., Harmon, J., Hasunuma, T., Isozaki, A., and Goda, K. (2021a). “Ai on a chip
Sirico, D. G., Cavalletti, E., Miccio, L., Bianco, V., Memmolo, P., Sardo, A., et al. for identifying microalgal cells with high heavy metal removal efficiency,” in 2021 21st
(2022). Kinematic analysis and visualization of tetraselmis microalgae 3Dmotility International Conference on Solid-State Sensors, Actuators and Microsystems
by digital holography. Appl. Optics 61, B331–B338. doi: 10.1364/AO.444976 (Transducers), Orlando, FL, USA, 20-24 June 2021. IEEE(USA) 385–388.
Sonkar, S., and Mallick, N. (2020). Application of machine learning for Xu, M., Harmon, J., Yuan, D., Yan, S., Lei, C., Hiramatsu, K., et al. (2021b).
development of a drying protocol for microalga chlorella minutissima in a single Morphological indicator for directed evolution of euglena gracilis with a high heavy
rotary drum dryer for biodiesel production. Authorea 26, 2020. doi: 10.22541/ metal removal efficiency. Environ. Sci. Technol. 55, 7880–7889. doi: 10.1021/
au.160372833.38766717/v1 acs.est.0c05278
Sonmez, M. E., Eczacioglu, N., Gumuş, N. E., Aslan, M. F., Sabanci, K., and Xu, Z., Jiang, Y., Ji, J., Forsberg, E., Li, Y., and He, S. (2020). Classification,
Aşikkutlu, B. (2022). Convolutional neural network - support vector machine identification, and growth stage estimation of microalgae based on transmission
based approach for classification of cyanobacteria and chlorophyta microalgae hyperspectral microscopic imaging and machine learning. Optics Express 28,
groups. Algal Res. 61, 102568. doi: 10.1016/j.algal.2021.102568 30686–30700. doi: 10.1364/OE.406036
Yadav, D. P., Jalal, A. S., Garlapati, D., Hossain, K., Goyal, A., and Pant, G. Zhou, Z.-H., and Chen, Z.-Q. (2002). Hybrid decision tree. Knowledge-based
(2020). Deep learning-based ResNeXt model in phycological studies for future. Syst. 15, 515–528. doi: 10.1016/S0950-7051(02)00038-2
Algal Res. 50, 102018. doi: 10.1016/j.algal.2020.102018 Zhuo, Z., Wang, H., Liao, R., and Ma, H. (2022). Machine learning powered
Zhang, W., Li, J., Liu, T., Leng, S., Yang, L., Peng, H., et al. (2021). Machine learning microalgae classification by use of polarized light scattering data. Appl. Sci. 12,
prediction and optimization of bio-oil production from hydrothermal liquefaction of 3422. doi: 10.3390/app12073422
algae. Bioresource Technol. 342, 126011. doi: 10.1016/j.biortech.2021.126011 Ž itnik, M., Š unta, U., Torkar, K.Godič, Klemenčič, A.K., Atanasova, N., and
Zheng, X., Duan, X., Tu, X., Jiang, S., and Song, C. (2021). The fusion of Bulc, T.G. (2019). The study of interactions and removal efficiency of escherichia
microfluidics and optics for on-chip detection and characterization of microalgae. coli in raw blackwater treated by microalgae chlorella vulgaris. J. Cleaner
Micromachines 12, 1137–1156. doi: 10.3390/mi12101137 Production 238, 117865. doi: 10.1016/j.jclepro.2019.117865