Machine Learning Methods for Forest Image Analysis and Classification
May 3, 2022.
Digital Object Identifier 10.1109/ACCESS.2022.3170049
ABSTRACT The advent of modern remote sensors, alongside the development of advanced parallel computing, has significantly transformed both the theoretical and practical aspects of remote sensing. Several algorithms for detecting objects of interest in remote sensing images and for their subsequent classification have been devised; these include template-matching-based methods, machine learning methods, and knowledge-based methods. Knowledge-driven approaches have received much attention from the remote sensing fraternity. They do, however, face challenges in terms of the sensory gap, duality of expression, vagueness and ambiguity, geographic concepts expressed in multiple modes, and the semantic gap. This paper aims to review and provide an up-to-date survey of machine learning and knowledge-driven approaches to remote sensing forest image analysis. It is envisaged that this work will assist researchers in developing efficient models that accurately detect and classify forest images. There is a mismatch between what domain experts expect from remote sensing data and what remote sensing science produces; such a disparity can be reduced by adopting an ontology-based methodology. Ontologies should be used to support the future of remote sensing in forest object classification. The paper is presented in five parts: (1) a review of methods used for forest image detection and classification; (2) challenges faced by object detection methods; (3) analysis of segmentation techniques employed; (4) feature extraction and classification; and (5) performance of the state-of-the-art methods employed in forest image detection and classification.
TABLE 1. Ontological framework for RSI concepts.

The semantic gap is reduced because of the established relationship between symbolic information (e.g., "HIGHNDVI") and numerical knowledge (e.g., NDVI > 0.7).

IV. ONTOLOGICAL FRAMEWORK FOR RSI (REMOTE SENSING IMAGE)
[53] proposed a novel framework for RSI. The framework is made up of important terms or concepts, including satellite, sensor, image, spatial resolution, and spectral resolution. The elements are shown in Table 1. The slot is mainly concerned with the spatial and spectral resolutions, which relate to the scope, although there are no related elements in the range component. Spectral resolution is one of the most important concepts in the framework. It follows a top-down approach, where the concept is parceled into two sub-components, i.e. the visible part and the infrared part. The visible part is made up of three color segments, i.e. RGB (red, green, and blue). The infrared part is also made up of three segments, i.e. thermal infrared, near infrared, and far infrared. The parameters suited for the slot are explicitly defined and include has_spatial_resolution, has_spectral_resolution, etc. [47] developed a simple ontological approach for remote sensing image classification. The prototype was built upon the expert remote sensing knowledge expressed in [54].

A. ONTOLOGICAL FRAMEWORK FOR OBJECT FEATURE EXTRACTION
After an image goes through a segmentation process, each region is characterised by a set of features. The feature extraction process in the eCognition software follows the general upper ontology defined using the top-down method [55]. The features are divided into six categories, namely LayerProperty, GeometryProperty, PositionProperty, TextureProperty, ClassProperty, and ThematicProperty. The selection of features of interest is performed by an expert to allow object detection. Figure 13 shows a hierarchical breakdown of object features from the six categories. GeometryProperty, TextureProperty, and ThematicProperty are important features in detecting forest objects [56].

B. ONTOLOGY MODEL OF THE LAND COVER CLASS HIERARCHY
The upper-level ontology is developed using concepts from land cover classification systems (LCCS) [36]. Figure 13 shows a hierarchically simplified way of representing classes of interest emanating from the main land cover class [53]. [55] designed an upper-level ontology for the Chinese Geographic Condition Census Project [57]. Figure 14 depicts the design of an eight-class land cover ontology. The procedure was as follows:
1) The first step was to establish a set of important terms, in this case Fields, Woodland, Grassland, Orchards, Bare land, Roads, Buildings, and Water.
2) Classes and class hierarchies were then defined; a land cover class was defined through a top-down approach.

1) ONTOLOGY MODEL OF THE DECISION TREE CLASSIFIER
Ontologies typically express two algorithms, namely decision trees and semantic rules [55]. [58] and [59] used decision trees in the field of ontologies to cluster and classify image objects. Findings proved that decision trees enhance ontologies to granulate information, thereby increasing image classification accuracy. [59] uses decision trees to solve the problem of inconsistency between overlapping ontologies. [47] use decision trees for ontology matching; the matching process is purely based on derived decision tree rules for an ontology, which are compared with rules from external ontologies. [55] designed an ontology model for the decision tree classifier.
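As a toy illustration of the general idea behind the works cited above (deriving decision tree rules from image object features so that the rules can later be attached to ontology concepts), consider the following Python sketch. It is not the implementation of [55], [58], or [59]; the feature names, values, and class labels are hypothetical.

```python
from sklearn.tree import DecisionTreeClassifier, export_text
import numpy as np

# Hypothetical per-object features: [mean NDVI, texture homogeneity]
X = np.array([[0.82, 0.35], [0.75, 0.40], [0.15, 0.80],
              [0.10, 0.75], [0.45, 0.55], [0.50, 0.60]])
# Hypothetical labels for ontology concepts: 1 = Woodland, 0 = non-forest cover
y = np.array([1, 1, 0, 0, 0, 1])

# Fit a shallow decision tree on the segmented objects' features
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Export the induced rules; each root-to-leaf path can be rewritten as a
# symbolic restriction (e.g. "mean_ndvi > 0.6 -> Woodland") on an ontology class
print(export_text(tree, feature_names=["mean_ndvi", "texture_homogeneity"]))
```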
M1. If T ∈ M, then T ∪ T′ is a conservative extension of every T′ such that Sig(T′) ∩ Loc(T) = ∅.
M2. If T1, T2 ∈ M, then T = T1 ∪ T2 ∈ M, with Loc(T) = Loc(T1) ∪ Loc(T2).

Falomir et al. [66] proposed three levels of knowledge that are imperative for designing a modular ontological approach: the reference conceptualisation (which provides a description of images and image objects), the contextual knowledge (a set of rules defined by a domain expert), and the image facts (semantic descriptions of image content). Figure 19 illustrates how the reasoner assigns image objects to their corresponding concepts based on facts drawn from the reference conceptualisation and the contextual knowledge.

(a) The reference conceptualisation
This is a general model for describing image objects in remote sensing. It consists of two packages, namely (1) the image structure package and (2) the image processing package [47]. The image structure package is superimposed with the ImageObjects concepts, which describe objects according to their characteristics, and the ImageObjectFeature concept, which links related concepts with associations such as "hasfeature". The image processing package is composed of the PseudoSpectralIndex and SpectralBand concepts. These concepts help remote sensing experts describe contextual knowledge. Concepts such as spectral bands and texture are used by remote sensing experts to interpret remote sensing images.

(b) The Contextual Knowledge
The purpose of contextual knowledge is to represent remote sensing expert knowledge using DL, hence the name "contextual knowledge." The basis of this knowledge comes from the remote sensing science expert. As a result, it is a "subjective" description of image rules rather than an "objective" depiction of image structure. Figure 20 shows the concepts, relations, and instances in conceptual knowledge.

(c) The Image Facts
These are facts extracted from image analysis, and they are stored in the ABox [47]. The TBox contains the reference conceptualisation and the contextual knowledge [47]. Facts in the ABox provide semantic descriptions of image objects.
FIGURE 20. Conceptual knowledge showing concepts, relations and instances in an ontology [47].
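The TBox/ABox split described in (c) can be illustrated with a small RDF graph. The following sketch uses rdflib with a hypothetical namespace, feature values, and property names, and it hard-codes the class assertion that a DL reasoner would normally derive from the contextual-knowledge rules; it is an illustration only, not the model of [47].

```python
from rdflib import Graph, Namespace, Literal, RDF, RDFS

EX = Namespace("http://example.org/forest#")  # hypothetical namespace
g = Graph()

# TBox: terminological axioms (reference conceptualisation / contextual knowledge)
g.add((EX.Woodland, RDFS.subClassOf, EX.LandCover))
g.add((EX.ImageObject, RDF.type, RDFS.Class))

# ABox: assertional facts extracted from image analysis
g.add((EX.region_17, RDF.type, EX.ImageObject))
g.add((EX.region_17, EX.hasNDVI, Literal(0.74)))
g.add((EX.region_17, RDF.type, EX.Woodland))  # a reasoner would normally infer this

print(g.serialize(format="turtle"))
```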
V. VEGETATION DETECTION
Unsupervised and supervised classification algorithms are crucial for identifying vegetation areas.
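A common way to obtain a vegetation mask of the kind shown in Figure 22 is to threshold a spectral index such as the NDVI. The sketch below assumes the red and near-infrared bands are already available as NumPy arrays; the 0.4 threshold is illustrative and is not a value taken from [69].

```python
import numpy as np

def ndvi_vegetation_mask(red: np.ndarray, nir: np.ndarray, threshold: float = 0.4) -> np.ndarray:
    """Compute NDVI = (NIR - Red) / (NIR + Red) and return a boolean vegetation mask."""
    red = red.astype(np.float64)
    nir = nir.astype(np.float64)
    ndvi = (nir - red) / (nir + red + 1e-9)  # small epsilon avoids division by zero
    return ndvi > threshold

# Example with random reflectance-like values standing in for real bands
rng = np.random.default_rng(0)
red_band = rng.uniform(0.0, 0.3, size=(4, 4))
nir_band = rng.uniform(0.2, 0.8, size=(4, 4))
print(ndvi_vegetation_mask(red_band, nir_band))
```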
FIGURE 22. (a) SI image; (b) vegetation mask obtained with the NDVI and SI indexes [69].
FIGURE 24. Flowchart of the model that harmonises RF and kNN [73].
Here σ2 is the standard deviation of class 2. If M < 1, it signifies overlap of the classes; if M > 1, it denotes that the classes are well separable. The segmentation process considered both semantic properties and radiometric information. Large-scale mean shift (LSMS) segmentation was used in the study because of its ability to perform tile-wise segmentation of large VHR imagery [72]. The OTB LSMS segmentation process followed the steps of LSMS smoothing, LSMS segmentation, LSMS merging, and LSMS vectorisation. Classification was performed for five different land cover classes, namely grass, cork oak, soil, shrubs, and shadows. Two supervised learning algorithms, RF and SVM, were used to perform the classification. SVM performs linear separation in a hyperspace using a µ(·) mapping function. Where objects are not linearly separable, the kernel method is used, which takes into account projections of the feature space [72]. RF applies bagging to decision trees to produce different subsets and a variety of trees. Every decision tree in the RF participates in the classification process, and the classification label returned is the class with the most votes.

Another study [73] analysed the performance of kNN and RF classifiers for mapping forest fire areas. The authors [73] implemented kNN and RF to classify forest areas and explained the effects of different satellite images on both classifiers. Figure 24 shows the flow chart of the model. The model, being a supervised approach, was implemented using multi-spectral images obtained from Landsat-8, Sentinel-2, and Terra sensors. The classification accuracy was determined by confusion matrices. The machine learning classifier based on kNN and RF produced excellent results with k set to 5 for kNN and 400 trees for RF. The results from the hybrid model achieved a very high classification accuracy, with an overall accuracy (OA) > 89% and a Dice coefficient (DC) > 0.8. Other studies [74], [75] have also implemented non-parametric algorithms such as kNN and RF in remote sensing applications.

VI. IMAGE SEGMENTATION
An input image is partitioned (or subdivided) into meaningful image objects (segments). Image segmentation can be classified into two categories: supervised (empirical discrepancy methods) and unsupervised (empirical goodness methods) [76]. Unsupervised approaches evaluate a segmentation result based on how well the image objects match a human perception of the desired set of segmented images, and they use quality criteria that are typically created in accordance with human perceptions of what constitutes a good segmentation. Supervised methods compare a segmentation result with a ground truth [2]. If ground truth can be reliably established, supervised methods are preferred.

A. TYPES OF IMAGE SEGMENTATION
Pixel-, edge-, and region-based image segmentation methods are the three primary types of traditional image segmentation [77].

(a) Pixel Based Methods
This method involves two important processes: (1) image thresholding and (2) segmentation in feature space. For image thresholding, image pixels are divided according to their intensity level [78]. There are three types of thresholding [79], [80]:
(1) Global thresholding - with T being the appropriate threshold value, the output image q(x, y) based on T is obtained from an original image p(x, y) as

$$q(x, y) = \begin{cases} 1, & \text{if } p(x, y) > T \\ 0, & \text{if } p(x, y) \leq T \end{cases}$$
1) AlexNet
AlexNet is made up of five convolutional layers and three fully connected layers [89]. In between the convolutional layers is a pooling layer whose role is to reduce dimensionality and computational complexity. AlexNet's pooling strategy is max pooling, which retains the largest value covered by the filter and is used to remove noisy components [88]. Filters of sizes 11 × 11 and 5 × 5 are used in the first and second convolutional layers, respectively. The last three layers use small-sized filters of 3 × 3. The whole process is described in Figure 27. The primary purpose of such filters is feature extraction; varying filter sizes accommodate objects of different scales. AlexNet has the following characteristics:
1) It supports the application of the non-saturating Rectified Linear Unit (ReLU), whose output is defined by F(x) = max(x, 0).
2) It employs the overlapping max pooling strategy (which means that each filtering operation's step size (stride) is smaller than the filter's overall size).
3) To reduce over-fitting, it uses the dropout approach in the fully connected layers.

2) VGGNet
The network is made up of three fully connected layers and a varying number of convolutional layers, as shown in Figure 28. Unlike AlexNet, VGGNet has fixed small-size filters of 3 × 3 in the convolutional layers [90]. The number of weights in the network is reduced by using small filters, which minimizes the training complexity. Just like AlexNet, VGGNet uses max pooling over a 2 × 2 window with a stride of 2 pixels. The advantage of simplifying the convolutional layers to a greater extent is that it increases network depth, thereby improving the accuracy of the network. The network's performance in tasks like semantic segmentation and target detection is improved by using features extracted from the CNN that are structured in a hierarchy of scales [91]. Other classifiers, such as SVMs, can use the features without fine-tuning [92].

3) GoogLeNet
The architecture is different from the other three in that it involves three aspects, namely the inception module, an auxiliary classifier required at the training stage, and one fully connected layer [93]. Output results from these filters are concatenated with the maximum pooling result. Between the inception modules, maximum pooling is employed, and after the last inception module, average pooling that employs dropout is used [94]. The flow chart diagram is shown in Figure 29. The network is very deep because it is made up of nine inception modules and up to three convolutional layers. Because of the depth of the network, the smooth flow of gradients from layer to layer becomes an issue. Figure 30 shows the inception module. The issue is addressed by adding auxiliary classifiers in the middle of the convolutional layers, whose role is to process the outputs from the inception modules. The loss from these classifiers is added to the overall loss of the network during training. Auxiliary classifiers are prohibited from making decisions during the prediction phase.

FIGURE 30. Inception Module [69].
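The three AlexNet characteristics listed above (ReLU activations, overlapping max pooling with a stride smaller than the kernel, and dropout in the fully connected layers) can be seen in the following condensed PyTorch sketch. It is a simplified stand-in for the full architecture in [89], with arbitrary channel counts and number of output classes.

```python
import torch
import torch.nn as nn

class MiniAlexNet(nn.Module):
    """Condensed AlexNet-style network: ReLU, overlapping max pooling, dropout."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2),  # large first filter
            nn.ReLU(inplace=True),                                   # non-saturating activation
            nn.MaxPool2d(kernel_size=3, stride=2),                   # overlapping pooling (stride < kernel)
            nn.Conv2d(64, 192, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(192, 256, kernel_size=3, padding=1),           # small 3x3 filters
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d((6, 6)),
        )
        self.classifier = nn.Sequential(
            nn.Dropout(p=0.5),                                       # dropout reduces over-fitting
            nn.Linear(256 * 6 * 6, 512),
            nn.ReLU(inplace=True),
            nn.Linear(512, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        return self.classifier(torch.flatten(x, 1))

logits = MiniAlexNet()(torch.randn(1, 3, 224, 224))  # one RGB image of size 224 x 224
print(logits.shape)  # torch.Size([1, 10])
```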
The MLP classification algorithm produced coarsely segmented images. A watershed segmentation algorithm was applied to refine the segmentation process. The UNet architecture, originally used for medical image segmentation [96], is also very useful for remote sensing images. The UNet architecture was trained with data and pixel-wise annotation patches. The segmentation process follows a number of steps: (1) mosaic images were split into patches for processing, (2) a UNet model was trained to predict patch segmentation, and (3) patch joining was used to obtain semantic segmentation for the entire mosaic image. The model achieved an effective learning transfer with a 12.48% improvement over random weights. Overall, the model reached a high accuracy of nearly 95%.

Another study [104] proposed a Residual Neural Network (ResNet) architecture for classifying tree species acquired using a camera mounted on a UAV platform. In temperate forests, UAV images have been successfully used to distinguish between living and dead forest species [107]. The motivation of the study was that most of the existing methods for tree species classification are cost-sensitive because they require very large data sets and are restricted to specific tree species [108]. The study proposed a model based on a CNN to classify tree species at an individual level by analysing high resolution RGB images obtained from the UAV. A CNN was chosen in the study because of its ability to learn highly descriptive features from tree canopies. The study proposed a CNN model with 50 convolutional layers, referred to as ResNet50. Figure 38 shows the architecture of ResNet50. The procedure for performing tree crown delineation was based on the iterative local maxima filtering technique, which was used to identify probable tree tops. Tree tops were designated as markers, hence a marker-controlled watershed segmentation was performed as a means of complementing the DSM for segmenting the tree crowns. Figure 37 shows a tree crown segmented polygon. The tree crown delineation process enables tree crown identification and labelling. In the training phase, images were shuffled in unison with their corresponding labels to randomise the input data so that the neural network generalises. The model achieved an overall classification accuracy of 80%.
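The three-step patch pipeline described above (split the mosaic into patches, predict each patch with a trained UNet, then stitch the predictions back together) can be sketched as follows. The `model` callable stands in for any trained segmentation network and is assumed, not taken from the cited study; the patch size and image dimensions are illustrative.

```python
import numpy as np

def segment_mosaic(mosaic: np.ndarray, model, patch: int = 256) -> np.ndarray:
    """Split a (H, W, C) mosaic into patches, predict each one, and re-join the masks."""
    h, w, _ = mosaic.shape
    mask = np.zeros((h, w), dtype=np.uint8)
    for top in range(0, h - patch + 1, patch):
        for left in range(0, w - patch + 1, patch):
            tile = mosaic[top:top + patch, left:left + patch]
            # model is assumed to return a (patch, patch) label map for the tile
            mask[top:top + patch, left:left + patch] = model(tile)
    return mask

# Dummy "model" that labels a pixel as forest (1) when its green channel dominates
dummy_model = lambda tile: (tile[..., 1] > tile[..., [0, 2]].max(axis=-1)).astype(np.uint8)
mosaic = np.random.rand(512, 512, 3)
print(segment_mosaic(mosaic, dummy_model).shape)  # (512, 512)
```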
The network also included two pooling layers and one fully connected layer; the final layer was used to detect bamboo coverage in Google Earth images. Input images were randomly shuffled to avoid overlap between the training and validation data. 72% of the data was used for training and 25% for testing. The model achieved an average classification accuracy of 97.52%.
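A shuffled train/test split of the kind described above (random shuffling before splitting, so that training and validation data do not overlap) is shown below using scikit-learn. The 72%/25% proportions are reproduced purely for illustration and leave a small holdout unused; the data itself is synthetic.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in data: 200 flattened image patches with binary bamboo/non-bamboo labels
X = np.random.rand(200, 64 * 64)
y = np.random.randint(0, 2, size=200)

# Shuffle, then keep 72% for training and 25% for testing (the remainder is unused)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.72, test_size=0.25, shuffle=True, random_state=42
)
print(len(X_train), len(X_test))  # 144 50
```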
4) LOCALLY LINEAR EMBEDDING (LLE)
LLE is built on a foundation of manifold learning. A manifold is a D-dimensional object that is embedded in a higher-dimensional space. A manifold is considered as an integration of small linear patches, which is done through piece-wise linear regression [118]. To do the integral operation, [119] proposed the construction of a kNN graph similar to an isomap. Then every data sample is represented by a weighted summation of its k nearest neighbors. Considering $w_i$ to be row $i$ of the $n \times k$ weight matrix $W$, the solution to the goal is found by:

$$w_i = \frac{G_i^{-1}\mathbf{1}}{\mathbf{1}^{T} G_i^{-1}\mathbf{1}} \qquad (15)$$

$$G_i := (x_i \mathbf{1}^{T} - V_i)^{T}(x_i \mathbf{1}^{T} - V_i) \qquad (16)$$

where $G_i$ is called a Gram matrix and $V_i$ is an $n \times k$ matrix. After the process of representing samples as a weighted summation of their neighbors, LLE represents the samples in the lower-dimensional space by their neighbors with the same obtained weights. The method has been successfully used in feature extraction of Motor Imagery Electroencephalography (MI-EEG), where it outperformed methods such as the Discrete Wavelet Transform (DWT) in classification accuracy with fewer feature dimensions [120].

5) T-DISTRIBUTED STOCHASTIC NEIGHBOR EMBEDDING (t-SNE)
t-SNE is an improvement of Stochastic Neighbor Embedding (SNE) [121], which is used for data visualisation. The main goal is to preserve the joint distribution of data samples in the original and embedding spaces. Considering $p_{ij}$ and $q_{ij}$ to denote the probabilities that $x_i$ and $x_j$ are neighbors and that $y_i$ and $y_j$ are neighbors, it follows that:

$$p_{ij} = \frac{p_{j|i} + p_{i|j}}{2n} \qquad (17)$$

$$p_{j|i} = \frac{\exp(-\|x_i - x_j\|_2^2 / 2\sigma_i^2)}{\sum_{k \neq i} \exp(-\|x_i - x_k\|_2^2 / 2\sigma_i^2)} \qquad (18)$$

$$q_{ij} = \frac{(1 + \|y_i - y_j\|_2^2)^{-1}}{\sum_{k \neq l} (1 + \|y_k - y_l\|_2^2)^{-1}} \qquad (19)$$

Embedded samples are then obtained by applying gradient descent to minimise the Kullback-Leibler divergence [122] between the p and q distributions. The main advantage of t-SNE is its ability to deal with the problem of visualising "crowded" high-dimensional data in a low-dimensional space (e.g., 2D or 3D) [122], [123].

A. FEATURE EXTRACTION STATE OF THE ART
In image retrieval, calibration, classification, and clustering, it is critical to extract useful features or characteristics from the image [124]. The color histogram is the most significant method to represent color features [125]. [126] provided a state-of-the-art feature extraction model that consists of two parts: (a) adaptive color region extraction via the definition circle (DC) model, and (b) corner feature extraction via the edge detection model, which includes a suppression mechanism.

The purpose of the algorithm was to produce a clear and precise forest saliency map. The algorithm is broken down into three parts: (a) the color feature extraction part; (b) the determination of the center of the DC model; and (c) an accurate description of color. The algorithm is expressed in Figure 36.

(A) Colour feature extraction
The model appropriate for the extraction of color features is the DC model, which comprises the following steps: (1) using the histogram of the G channel of the RGB picture to calculate the DC model's center; (2) mapping the image to the HSI or Lab color space; (3) using the k-means procedure to find the DC model's radius. The flow chart of the DC model is shown in Figure 41.

(1) Determine the center of the DC model
While the DC model can describe color fluctuations under specific gradients, the forest region's dominant hue is generally green, implying that the "greenish" pixels in the forest area must be filtered out. As a result, the G (green) channel in the RGB three-channel system will be the focal point for filtering out pixels that fall within a given range and calculating the mean value within that range. That value will be regarded as the center of the circle.

(2) Color description
It is critical to note that the purity of the green is determined by the circle's center, thus the radius must be adjusted to account for a variety of color variations and fault tolerance. The RGB channels, on the other hand, do not function well for color adjustments. The RGB color system is therefore converted to the Hue, Saturation, and Intensity (HSI) or Lab color space to fix the problem. The color can be defined more correctly using only two channels, namely hue and saturation, rather than the RGB color space.

(3) Adjustment of the DC model radius
To improve the accuracy and adaptability of forest region extraction, the center and the entire remote sensing picture acquired in the first phase are mapped or converted to the HSI color space. Each pixel's Euclidean distance to the RSI center is calculated. The k-means clustering algorithm subdivides the forest into clusters and determines the Euclidean distance between the cluster center and the DC model's center, which is then used as the DC model's radius.

$$\mathbf{h} = [h, s, i] \qquad (20)$$

$$R = (h - h_0)^2 + (s - s_0)^2 + (i - i_0)^2 \qquad (21)$$

$$\delta(i) = \sum_{k=1}^{K} |U_k^i - U_k^{i-1}| \qquad (22)$$

P denotes the center of the DC model, and its value would have been obtained by the histogram model in the RGB-to-HSI color scheme. R is the Euclidean distance.
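A minimal sketch of the DC-model-style color test in equations (20)-(21) is given below: each pixel is converted to an HSV representation (used here as a stand-in for the HSI/Lab spaces mentioned above), its squared distance to a green center color is computed, and pixels within a radius R are kept. The center and radius values are illustrative, not those derived by the histogram and k-means steps of [126].

```python
import numpy as np
import colorsys

def forest_color_mask(rgb: np.ndarray, center_hsv: tuple, radius: float) -> np.ndarray:
    """Keep pixels whose squared distance to the center color (eq. 21) is within the radius."""
    h, w, _ = rgb.shape
    mask = np.zeros((h, w), dtype=bool)
    h0, s0, v0 = center_hsv
    for y in range(h):
        for x in range(w):
            r, g, b = rgb[y, x] / 255.0
            hh, ss, vv = colorsys.rgb_to_hsv(r, g, b)  # HSV as a stand-in for HSI
            dist = (hh - h0) ** 2 + (ss - s0) ** 2 + (vv - v0) ** 2
            mask[y, x] = dist <= radius
    return mask

# Illustrative green center (hue near 0.33) and an arbitrary radius
rgb_image = np.random.randint(0, 256, size=(8, 8, 3), dtype=np.uint8)
print(forest_color_mask(rgb_image, center_hsv=(0.33, 0.6, 0.5), radius=0.15).sum(), "pixels kept")
```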
TABLE 3. Segmentation results based on PDI and ADI.
TABLE 5. Performance of CNN.
REFERENCES
[35] B. Bennett, "Foundations for an ontology of environment and habitat," in Proc. FOIS, 2010, pp. 31-44.
[36] A. Mayamba, R. M. Byamungu, B. V. Broecke, H. Leirs, P. Hieronimo, A. Nakiyemba, M. Isabirye, D. Kifumba, D. N. Kimaro, M. E. Mdangi, and L. S. Mulungu, "Factors influencing the distribution and abundance of small rodent pest species in agricultural landscapes in eastern Uganda," J. Vertebrate Biol., vol. 69, no. 2, Oct. 2020, Art. no. 020002.
[37] H. G. Lund, "When is a forest not a forest?" J. Forestry, vol. 100, no. 8, pp. 21-28, 2002.
[38] E. Romijn, J. H. Ainembabazi, A. Wijaya, M. Herold, A. Angelsen, L. Verchot, and D. Murdiyarso, "Exploring different forest definitions and their impact on developing REDD+ reference emission levels: A case study for Indonesia," Environ. Sci. Policy, vol. 33, pp. 246-259, Nov. 2013.
[39] C. E. Woodcock and A. H. Strahler, "The factor of scale in remote sensing," Remote Sens. Environ., vol. 21, no. 3, pp. 311-332, Apr. 1987.
[40] A. W. M. Smeulders, M. Worring, S. Santini, A. Gupta, and R. Jain, "Content-based image retrieval at the end of the early years," IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 12, pp. 1349-1380, Dec. 2000.
[41] C. Unger and P. Cimiano, "Pythia: Compositional meaning construction for ontology-based question answering on the semantic web," in Lecture Notes in Computer Science, vol. 6716. Germany: Bielefeld Univ., 2011, pp. 153-160.
[42] R. H. Kilmann and K. W. Thomas, "Developing a forced-choice measure of conflict-handling behavior: The 'MODE' instrument," Educ. Psychol. Meas., vol. 37, no. 2, pp. 309-325, 1977.
[43] B. Bachimont, A. Isaac, and R. Troncy, "Semantic commitment for designing ontologies: A proposal," in Proc. Int. Conf. Knowl. Eng. Knowl. Manage. Berlin, Germany: Springer, 2002, pp. 114-121.
[44] E. F. Fama and M. C. Jensen, "Separation of ownership and control," J. Law Econ., vol. 26, no. 2, pp. 301-325, 1983.
[45] K. Satoh, Nonmonotonic Reasoning by Minimal Belief Revision. Tokyo, Japan: ICOT Research Center (Institute for New Generation Computer Technology), 1988.
[46] T. Gruber, "What is an ontology," Stanford Univ., Stanford, CA, USA, Tech. Rep. KSL92-71, 1993.
[47] S. Andrés, D. Arvor, I. Mougenot, T. Libourel, and L. Durieux, "Ontology-based classification of remote sensing images using spectral rules," Comput. Geosci., vol. 102, pp. 158-166, May 2017.
[48] D. Mallenby, "Handling vagueness in ontologies of geographical information," Ph.D. dissertation, School Comput., Univ. Leeds, Leeds, U.K., 2008. [Online]. Available: http://etheses.whiterose.ac.U.K./1373/
[49] N. Eric Maillot and M. Thonnat, "Ontology based complex object recognition," Image Vis. Comput., vol. 26, no. 1, pp. 102-113, Jan. 2008.
[50] C. Eschenbach and M. Grüninger, "Formal ontology in information systems," in Proc. 5th Int. Conf. (FOIS), vol. 110, 2008, pp. 68-71.
[51] M. Davis, S. King, N. Good, and R. Sarvas, "From context to content: Leveraging context to infer media metadata," in Proc. 12th Annu. ACM Int. Conf. Multimedia, 2004, pp. 188-195.
[52] F. Nack, C. Dorai, and S. Venkatesh, "Computational media aesthetics: Finding meaning beautiful," IEEE Multimedia Mag., vol. 8, no. 4, pp. 10-12, Oct. 2001.
[53] H. Gu, H. Li, L. Yan, Z. Liu, T. Blaschke, and U. Soergel, "An object-based semantic classification method for high resolution remote sensing imagery using ontology," Remote Sens., vol. 9, no. 4, p. 329, 2017.
[54] A. Baraldi, V. Puzzolo, P. Blonda, L. Bruzzone, and C. Tarantino, "Automatic spectral rule-based preliminary mapping of calibrated Landsat TM and ETM+ images," IEEE Trans. Geosci. Remote Sens., vol. 44, no. 9, pp. 2563-2586, Sep. 2006.
[55] A. M. Arifjanov, S. B. Akmalov, T. U. Apakhodjaeva, and D. S. Tojikhodjaeva, "Comparison of pixel to pixel and object based image analysis using WorldView-2 satellite images of Vangiobod village of Syndria province," Remote Methods Earth Res., vol. 26, no. 2, pp. 313-321, 2020.
[56] N. Durand, S. Derivaux, G. Forestier, C. Wemmert, P. Gançarski, O. Boussaid, and A. Puissant, "Ontology-based object recognition for remote sensing image interpretation," in Proc. 19th IEEE Int. Conf. Tools Artif. Intell. (ICTAI), vol. 1, Oct. 2007, pp. 472-479.
[57] S. R. Phinn, C. M. Roelfsema, and P. J. Mumby, "Multi-scale, object-based image analysis for mapping geomorphic and ecological zones on coral reefs," Int. J. Remote Sens., vol. 33, no. 12, pp. 3768-3797, Jun. 2012.
[58] B. Gajderowicz, "Using decision trees for inductively driven semantic integration and ontology matching," M.S. thesis, Dept. Comput. Sci., Ryerson Univ., Toronto, ON, Canada, 2011.
[59] B. Gajderowicz and A. Sadeghian, "Ontology granulation through inductive decision trees," in Proc. URSW, 2009, pp. 39-50.
[60] N. Kartha and A. Novstrup, "Ontology and rule based knowledge representation for situation management and decision support," Proc. SPIE, vol. 7352, May 2009, Art. no. 73520P.
[61] J. C. Giarratano and G. D. Riley, Expert Systems: Principles and Programming. Pacific Grove, CA, USA: Brooks/Cole, 2005.
[62] D. A. Waterman, D. B. Lenat, and F. Hayes-Roth, Building Expert Systems. Reading, MA, USA: Addison-Wesley, 1983.
[63] P. He, "Counter cyber attacks by semantic networks," in Emerging Trends in ICT Security. Amsterdam, The Netherlands: Elsevier, 2014, pp. 455-467.
[64] G. De Giacomo, D. Lembo, M. Lenzerini, A. Poggi, and R. Rosati, "Using ontologies for semantic data integration," in A Comprehensive Guide Through the Italian Database Research Over the Last 25 Years. Cham, Switzerland: Springer, 2018, pp. 187-202.
[65] F. Baader, The Description Logic Handbook: Theory, Implementation and Applications. Cambridge, U.K.: Cambridge Univ. Press, 2003.
[66] B. C. Grau, I. Horrocks, B. Motik, B. Parsia, P. Patel-Schneider, and U. Sattler, "OWL 2: The next step for OWL," J. Web Semantics, vol. 6, no. 4, pp. 309-322, Nov. 2008.
[67] B. C. Grau, I. Horrocks, Y. Kazakov, and U. Sattler, "A logical framework for modularity of ontologies," in Proc. IJCAI, 2007, pp. 298-303.
[68] S. Ghilardi, C. Lutz, and F. Wolter, "Did I damage my ontology," in Proc. KR, 2006, pp. 187-197.
[69] S. Roy and I. J. Cox, "A maximum-flow formulation of the N-camera stereo correspondence problem," in Proc. 6th Int. Conf. Comput. Vis., Jan. 1998, pp. 492-499.
[70] R. Geerken, B. Zaitchik, and J. P. Evans, "Classifying rangeland vegetation type and coverage from NDVI time series using Fourier filtered cycle similarity," Int. J. Remote Sens., vol. 26, no. 24, pp. 5535-5554, Dec. 2005.
[71] Y. Guo, S. Han, Y. Li, C. Zhang, and Y. Bai, "K-nearest neighbor combined with guided filter for hyperspectral image classification," Proc. Comput. Sci., vol. 129, pp. 159-165, Jan. 2018.
[72] G. De Luca, J. M. N. Silva, S. Cerasoli, J. Araújo, J. Campos, S. Di Fazio, and G. Modica, "Object-based land cover classification of cork oak woodlands using UAV imagery and Orfeo ToolBox," Remote Sens., vol. 11, no. 10, p. 1238, May 2019.
[73] A. D. P. Pacheco, J. A. D. S. Junior, A. M. Ruiz-Armenteros, and R. F. F. Henriques, "Assessment of K-nearest neighbor and random forest classifiers for mapping forest fire areas in central Portugal using Landsat-8, Sentinel-2, and Terra imagery," Remote Sens., vol. 13, no. 7, p. 1345, Apr. 2021.
[74] P. T. Noi and M. Kappas, "Comparison of random forest, K-nearest neighbor, and support vector machine classifiers for land cover classification using Sentinel-2 imagery," Sensors, vol. 18, no. 1, p. 18, 2018.
[75] E. Tomppo, M. Haakana, M. Katila, and J. Peräsaari, Multi-Source National Forest Inventory: Methods and Applications, vol. 18. Springer, 2008.
[76] L. Tlig, M. Bouchouicha, M. Tlig, M. Sayadi, and E. Moreau, "A fast segmentation method for fire forest images based on multiscale transform and PCA," Sensors, vol. 20, no. 22, p. 6429, Nov. 2020.
[77] S. M. De Jong and F. D. Van der Meer, Remote Sensing Image Analysis: Including the Spatial Domain, vol. 5. Springer, 2007.
[78] D. Kaur and Y. Kaur, "Various image segmentation techniques: A review," Int. J. Comput. Sci. Mobile Comput., vol. 3, no. 5, pp. 809-814, 2014.
[79] Y.-J. Zhang, "An overview of image and video segmentation in the last 40 years," in Advances in Image and Video Segmentation. Dordrecht, The Netherlands, 2006, pp. 1-16.
[80] T. Lindeberg and M.-X. Li, "Segmentation and classification of edges using minimum description length approximation and complementary junction cues," Comput. Vis. Image Understand., vol. 67, no. 1, pp. 88-98, Jul. 1997.
[81] S. Yuheng and Y. Hao, "Image segmentation algorithms overview," 2017, arXiv:1707.02051.
[82] N. Senthilkumaran and R. Rajesh, "Image segmentation: A survey of soft computing approaches," in Proc. Int. Conf. Adv. Recent Technol. Commun. Comput., Oct. 2009, pp. 844-846.
[83] M. K. Kundu and S. K. Pal, "Thresholding for edge detection using human psychovisual phenomena," Pattern Recognit. Lett., vol. 4, no. 6, pp. 433-441, 1986.
[84] M. R. Khokher, A. Ghafoor, and A. M. Siddiqui, "Image segmentation using multilevel graph cuts and graph development using fuzzy rule-based system," IET Image Process., vol. 7, no. 3, pp. 201-211, 2013.
[85] T. Blaschke, C. Burnett, and A. Pekkarinen, "Image segmentation methods for object-based analysis and classification," in Remote Sensing Image Analysis: Including the Spatial Domain. Dordrecht, The Netherlands: Springer, 2004, pp. 211-236.
[86] T. Lei, X. Jia, Y. Zhang, S. Liu, H. Meng, and A. K. Nandi, "Superpixel-based fast fuzzy C-means clustering for color image segmentation," IEEE Trans. Fuzzy Syst., vol. 27, no. 9, pp. 1753-1766, Sep. 2019.
[87] P. Neubert and P. Protzel, "Compact watershed and preemptive SLIC: On improving trade-offs of superpixel segmentation algorithms," in Proc. 22nd Int. Conf. Pattern Recognit., Aug. 2014, pp. 996-1001.
[88] X. Yuan, J. Shi, and L. Gu, "A review of deep learning methods for semantic segmentation of remote sensing imagery," Expert Syst. Appl., vol. 169, May 2021, Art. no. 114417.
[89] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Proc. Adv. Neural Inf. Process. Syst., vol. 25, 2012, pp. 1097-1105.
[90] B. Liu, X. Yu, A. Yu, and G. Wan, "Deep convolutional recurrent neural network with transfer learning for hyperspectral image classification," J. Appl. Remote Sens., vol. 12, no. 2, 2018, Art. no. 026028.
[91] L. A. Gatys, A. S. Ecker, and M. Bethge, "Image style transfer using convolutional neural networks," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 2414-2423.
[92] O. A. B. Penatti, K. Nogueira, and J. A. dos Santos, "Do deep features generalize from everyday objects to remote sensing and aerial scenes domains?" in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. Workshops (CVPRW), Jun. 2015, pp. 44-51.
[93] M. Volpi and V. Ferrari, "Semantic segmentation of urban scenes by learning local class interactions," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. Workshops (CVPRW), Jun. 2015, pp. 1-9.
[94] M. Lin and Q. Yan, "Network in network," in Proc. Int. Conf. Learn. Represent. (ICLR), 2014, pp. 1-4.
[95] E. Shelhamer, J. Long, and T. Darrell, "Fully convolutional networks for semantic segmentation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 4, pp. 640-651, 2016.
[96] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional networks for biomedical image segmentation," in Proc. Int. Conf. Med. Image Comput. Comput.-Assist. Intervent. Springer, 2015, pp. 234-241.
[97] J. E. Ball, D. T. Anderson, and C. S. Chan, "Comprehensive survey of deep learning in remote sensing: Theories, tools, and challenges for the community," J. Appl. Remote Sens., vol. 11, no. 4, 2017, Art. no. 042609.
[98] S. Chilamkurthy. (2017). A 2017 Guide to Semantic Segmentation with Deep Learning. [Online]. Available: https://blog.qure.ai/notes/semantic-segmentation-deep-learning-review
[99] J. Le. (2017). How to do Semantic Segmentation Using Deep Learning. [Online]. Available: https://nanonets.com/blog/how-to-do-semantic-segmentation-using-deep-learning/
[100] A. Mittal. (2019). Introduction to U-Net and Res-Net for Image Segmentation. [Online]. Available: https://aditi-mittal.medium.com/introduction-to-u-net-and-res-net-for-image-segmentation-9afcb432ee2f
[101] S. Kentsch, M. L. Lopez Caceres, D. Serrano, F. Roure, and Y. Diez, "Computer vision and deep learning techniques for the analysis of drone-acquired forest images, a transfer learning study," Remote Sens., vol. 12, no. 8, p. 1287, Apr. 2020.
[102] M. Šulc, D. Mishkin, and J. Matas, "Very deep residual networks with maxout for plant identification in the wild," in Proc. Work. Notes CLEF, 2016, pp. 1-8.
[103] M. Onishi and T. Ise, "Automatic classification of trees using a UAV onboard camera and deep learning," 2018, arXiv:1804.10390.
[104] S. Natesan, C. Armenakis, and U. Vepakomma, "ResNet-based tree species classification using UAV images," Int. Arch. Photogramm., Remote Sens. Spatial Inf. Sci., vol. 42, pp. 475-481, Jun. 2019.
[105] M. Dyrmann, H. Karstoft, and H. S. Midtiby, "Plant species classification using deep convolutional neural network," Biosyst. Eng., vol. 151, pp. 72-80, Nov. 2016.
[106] M. Onishi and T. Ise, "Automatic classification of trees using a UAV onboard camera and deep learning," 2018, arXiv:1804.10390.
[107] O. Brovkina, E. Cienciala, P. Surový, and P. Janata, "Unmanned aerial vehicles (UAV) for assessment of qualitative classification of Norway spruce in temperate forest stands," Geo-spatial Inf. Sci., vol. 21, no. 1, pp. 12-20, Jan. 2018.
[108] M. J. Zimmer-Gembeck and M. Helfand, "Ten years of longitudinal research on U.S. adolescent sexual behavior: Developmental correlates of sexual intercourse, and the importance of age, gender and ethnic background," Developmental Rev., vol. 28, no. 2, pp. 153-224, 2008.
[109] S. Watanabe, K. Sumi, and T. Ise, "Identifying the vegetation type in Google Earth images using a convolutional neural network: A case study for Japanese bamboo forests," BMC Ecol., vol. 20, no. 1, pp. 1-14, Dec. 2020.
[110] E. Guirado, S. Tabik, D. Alcaraz-Segura, J. Cabello, and F. Herrera, "Deep-learning versus OBIA for scattered shrub detection with Google Earth imagery: Ziziphus lotus as case study," Remote Sens., vol. 9, no. 12, p. 1220, Nov. 2017.
[111] P. P. Ippolito. (2019). Feature Extraction Techniques. Accessed: Apr. 29, 2020. [Online]. Available: https://towardsdatascience.com/feature-extraction-techniques-d619b56e31be
[112] A. Ghodsi, "Dimensionality reduction: A short tutorial," Ph.D. dissertation, Dept. Statist. Actuarial Sci., Univ. Waterloo, Waterloo, ON, Canada, 2006, vol. 37, no. 38.
[113] B. Ghojogh, M. N. Samad, S. Asif Mashhadi, T. Kapoor, W. Ali, F. Karray, and M. Crowley, "Feature selection and feature extraction in pattern analysis: A literature review," 2019, arXiv:1905.02845.
[114] C. Citro, "rules. In Proc. 20th Int. Conf. very large data bases, VLDB, volume 1215, pages 487-499, 1994. [5] Alfred V. Aho, Ravi Sethi and Jeffrey D. Ullman. Compilers: Principles, techniques, and tools. Boston, MA: Addison-Wesley, 1986. [6] Adrian Akmajian, Ann K. Farmer, Lee Bickmore, Richard A. Demers and," Learning, vol. 5, no. 1, pp. 71-99, 1990.
[115] C. A. Brooks and K. Iagnemma, "Vibration-based terrain classification for planetary exploration rovers," IEEE Trans. Robot., vol. 21, no. 6, pp. 1185-1191, Dec. 2005.
[116] F. Subhan, S. Saleem, H. Bari, W. Z. Khan, S. Hakak, S. Ahmad, and A. M. El-Sherbeeny, "Linear discriminant analysis-based dynamic indoor localization using Bluetooth low energy (BLE)," Sustainability, vol. 12, no. 24, p. 10627, Dec. 2020.
[117] Y. Mo, Z. Zhang, Y. Lu, W. Meng, and G. Agha, "Random forest based coarse locating and KPCA feature extraction for indoor positioning system," Math. Problems Eng., vol. 2014, Oct. 2014, Art. no. 850926.
[118] L. C. Marsh and D. R. Cormier, Spline Regression Models, no. 137. Newbury Park, CA, USA: Sage, 2001.
[119] L. K. Saul and S. T. Roweis, "Think globally, fit locally: Unsupervised learning of low dimensional manifolds," J. Mach. Learn. Res., vol. 4, pp. 119-155, Jun. 2003.
[120] M. Li, X. Luo, J. Yang, and Y. Sun, "Applying a locally linear embedding algorithm for feature extraction and visualization of MI-EEG," J. Sensors, vol. 2016, Aug. 2016, Art. no. 7481946.
[121] G. Hinton and S. T. Roweis, "Stochastic neighbor embedding," in Proc. NIPS, vol. 15, 2002, pp. 833-840.
[122] S. Kullback, Information Theory and Statistics. Chelmsford, MA, USA: Courier Corporation, 1997.
[123] L. Van der Maaten and G. Hinton, "Visualizing data using t-SNE," J. Mach. Learn. Res., vol. 9, no. 11, 2008.
[124] D. Gu, Z. Han, and Q. Wu, "Feature extraction to polar image," J. Comput. Commun., vol. 5, no. 11, pp. 16-26, 2017.
[125] F. Alamdar and M. Keyvanpour, "A new color feature extraction method based on QuadHistogram," Proc. Environ. Sci., vol. 10, pp. 777-783, Jan. 2011.
[126] H. Du and Y. Zhuang, "Optical remote sensing images feature extraction of forest regions," in Proc. IEEE Int. Conf. Signal, Inf. Data Process. (ICSIDP), Dec. 2019, pp. 1-5.
[127] H. Luo, L. Li, H. Zhu, X. Kuai, Z. Zhang, and Y. Liu, "Land cover extraction from high resolution ZY-3 satellite imagery using ontology-based method," ISPRS Int. J. Geo-Inf., vol. 5, no. 3, p. 31, 2016.
[128] O. Oke Alice, O. Omidiora Elijah, A. Fakolujo Olaosebikan, S. Falohun Adeleye, and S. Olabiyisi, "Effect of modified Wiener algorithm on noise models," Int. J. Eng. Technol., vol. 2, no. 8, pp. 1024-1033, 2012.
[129] G. Hay and G. Castilla, "Object-based image analysis: Strengths, weaknesses, opportunities and threats (SWOT)," in Proc. 1st Int. Conf. (OBIA), 2006, pp. 4-5.
[130] D. C. Duro, S. E. Franklin, and M. G. Dubé, "A comparison of pixel-based and object-based image analysis with selected machine learning algorithms for the classification of agricultural landscapes using SPOT-5 HRG imagery," Remote Sens. Environ., vol. 118, pp. 259-272, Mar. 2012.
CLOPAS KWENDA received the B.Sc. degree (Hons.) in computer science from the Bindura University of Science Education (BUSE), Zimbabwe, and the M.Sc. degree in computer science from the University of Zimbabwe (UZ), Zimbabwe. He is currently pursuing the Ph.D. degree with the University of KwaZulu-Natal (UKZN), South Africa. His research interests include image processing, artificial intelligence, machine learning, deep learning, and ontology building.

JEAN VINCENT FONOU DOMBEU received the B.Sc. degree (Hons.) in computer science from the University of Yaoundé I, Cameroon, the M.Sc. degree in computer science from the University of KwaZulu-Natal, South Africa, and the Ph.D. degree in computer science from North-West University, South Africa. He is a Senior Lecturer with the Department of Computer Science, University of KwaZulu-Natal (UKZN). His research interests include ontology engineering, the semantic web, and machine learning, specifically ontology building, learning, modularization, ranking, summarization and visualization, artificial intelligence, machine learning and data mining methods for the semantic web, knowledge representation and reasoning on the web, and knowledge graphs and deep semantics.

MANDLENKOSI GWETU received the Ph.D. degree in computer science (CS), specializing in medical image processing, from the University of KwaZulu-Natal (UKZN), South Africa. He is a Senior Lecturer with UKZN, where he is currently the Academic Leader of CS. He is the Principal Investigator of the UKZN node in the Erasmus+-funded Living Laboratories for Climate Change multi-national project and an alumnus of the Heidelberg Laureate Forum. His research interests include deep learning, pattern recognition, and computer vision.