
Received March 9, 2022, accepted April 15, 2022, date of publication April 25, 2022, date of current version May 3, 2022.
Digital Object Identifier 10.1109/ACCESS.2022.3170049

Machine Learning Methods for Forest Image Analysis and Classification: A Survey of the State of the Art

CLOPAS KWENDA, MANDLENKOSI GWETU, AND JEAN VINCENT FONOU DOMBEU
School of Computer Science, Statistics and Mathematics, University of KwaZulu-Natal, Pietermaritzburg 3209, South Africa
Corresponding author: Clopas Kwenda ([email protected])

ABSTRACT The advent of modern remote sensors alongside the development of advanced parallel computing has significantly transformed both the theoretical and practical aspects of remote sensing. Several algorithms have been devised for detecting objects of interest in remote sensing images and for their subsequent classification; these include template matching-based, machine learning-based, and knowledge-based methods. Knowledge-driven approaches have received much attention from the remote sensing fraternity. They do, however, face challenges in terms of the sensory gap, duality of expression, vagueness and ambiguity, geographic concepts expressed in multiple modes, and the semantic gap. This paper aims to provide an up-to-date survey of machine learning and knowledge-driven approaches to remote sensing forest image analysis. It is envisaged that this work will assist researchers in developing efficient models that accurately detect and classify forest images. There is a mismatch between what domain experts expect from remote sensing data and what remote sensing science produces; such a disparity can be reduced by adopting an ontology-based methodology, and ontologies should therefore be used to support the future of remote sensing in forest object classification. The paper is presented in five parts: (1) a review of methods used for forest image detection and classification; (2) challenges faced by object detection methods; (3) analysis of segmentation techniques employed; (4) feature extraction and classification; and (5) performance of the state-of-the-art methods employed in forest image detection and classification.

INDEX TERMS Feature extraction, ontology, segmentation, remote sensing.

I. INTRODUCTION
Remote sensing science is rapidly growing. The evolution of high spatial resolution remote sensors, in conjunction with advanced computing, has significantly transformed the specification and practice of remote sensing [1]. Remote sensing images are characterized by high spatial resolution and provide more explicit information on the earth's surface compared to middle and coarser resolution images [2]. Machine learning methods for analyzing and classifying forest images are continuously evolving to provide more advanced automatic land cover pattern recognition on aerial images. This paper surveys existing methods for forest ecosystem image classification; in particular, machine learning classifiers and deep learning techniques for forest image classification are reviewed.

The associate editor coordinating the review of this manuscript and approving it for publication was Stefania Bonafoni.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
45290 VOLUME 10, 2022

There are several algorithms geared towards detecting objects of interest in remote sensing images for further regional analysis and classification. These algorithms are categorized into three groups, namely template matching based, knowledge based, and machine learning based methods [3]. The taxonomy of image classification methods is depicted in Figure 1.

FIGURE 1. Methods for object detection in optical sensing images [3].

(a) Template matching based detection methods
The template matching method determines whether a picture or an image contains a previously defined object, that is, whether a predefined sub-image (template) has an exact match in the image. Although this method provided one of the first approaches for object analysis [3], its dependence on handcrafted matching criteria limited its applicability to complex object recognition. Once a suitable template is determined, a measure of matching between the template and every possible location in the image is calculated, and a classification decision is made based on the measure of certainty. The most popular metric-based measures are the Euclidean distance, squared difference, and cross correlation, defined in Equations (1) to (3):

Euclidean distance:

E(m, n) = \sqrt{\sum_{i} \sum_{j} [g(i, j) - t(i - m, j - n)]^2} \quad (1)

Squared difference measure:

E^2(m, n) = \sum_{i} \sum_{j} [g(i, j)^2 - 2 g(i, j) t(i - m, j - n) + t(i - m, j - n)^2] \quad (2)

Cross correlation measure:

R(m, n) = \sum_{i} \sum_{j} g(i, j) t(i - m, j - n) \quad (3)

There are two types of templates, namely global and local templates. When a template is used to reference the whole (global) object in an image, it is referred to as a global template. However, when object features (local features of an object) in an image are referenced with multiple templates, these templates are referred to as local templates [4]. Figure 2 shows the stages to be followed to determine the best templates for object detection.

FIGURE 2. Template matching based criteria [3].

The challenge with this approach is that the method does not cater for the scale and orientation of the template [5]. It fails under occlusions and distortions on the boundary [6], and it is very sensitive to shape and viewpoint changes. The suggested solution was to maintain a separate representation of the template for each orientation and scale, but this becomes computationally expensive.

(b) Machine Learning based approaches
An input image is subjected to an initial phase in which regions or objects are extracted. Then, for each object, features of interest are computed using Convolutional Neural Networks (CNN). Optimal features are obtained after a subsequent series of feature fusion and dimension reduction processes. Finally, classifiers such as Support Vector Machines (SVM), k-Nearest Neighbor (kNN), Sparse Representation based Classification (SRC), AdaBoost, Conditional Random Fields (CRF), and others are used to classify each region/object. Figure 3 shows the most important phases of machine learning object detection, i.e., feature extraction, feature fusion, and dimensionality reduction, as well as
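As a concrete illustration of Equations (1) to (3), the sketch below scores a template against every valid offset of a toy image and keeps the offset with the smallest squared difference. The function names and arrays are our own illustrations, not code from the surveyed work, and the patch-based indexing is an equivalent reformulation of the g(i, j), t(i - m, j - n) notation.

```python
# Illustrative sketch of the template-matching measures in Equations (1)-(3).
# g is the search image, t the template, and (m, n) a candidate offset of the
# template inside g.
import math

def match_scores(g, t, m, n):
    """Return (euclidean, squared_difference, cross_correlation) at offset (m, n)."""
    sq = cross = 0.0
    for i, row in enumerate(t):
        for j, tv in enumerate(row):
            gv = g[m + i][n + j]
            sq += (gv - tv) ** 2          # accumulates Eq. (2)
            cross += gv * tv              # accumulates Eq. (3)
    return math.sqrt(sq), sq, cross      # Eq. (1) is the square root of Eq. (2)

def best_offset(g, t):
    """Slide t over every valid offset and keep the smallest squared difference."""
    h, w = len(t), len(t[0])
    H, W = len(g), len(g[0])
    offsets = ((m, n) for m in range(H - h + 1) for n in range(W - w + 1))
    return min(offsets, key=lambda mn: match_scores(g, t, *mn)[1])

g = [[0, 0, 0, 0],
     [0, 9, 8, 0],
     [0, 7, 6, 0],
     [0, 0, 0, 0]]
t = [[9, 8],
     [7, 6]]
print(best_offset(g, t))  # the template matches exactly at offset (1, 1)
```

In practice the cross correlation of Equation (3) is usually normalised, since the raw sum grows with overall image brightness.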


FIGURE 3. Machine learning methods [3].

the classification phase. Machine learning-based approaches, coupled with innovative algorithms and higher performance computing, seem to have gained popularity in remote sensing science because they produce better results in terms of the accuracy of the created maps [1]. As a result, they are used in large land cover applications that rely on pixel-based statistical analysis of massive image data sets [7]-[9]. Pixel-based approaches pose challenges in the analysis of high spatial resolution images [10] because they consider only spectral information as a backdrop for analyzing and classifying high spatial resolution images, neglecting spatial and temporal information, which are of paramount importance. These methods are less efficient in dealing with symbolic knowledge, that is, when concepts are characterized by symbols, for instance, ''vegetation is made of grass'' [11]. They do not offer the facility of creating a super class once classes have been defined. Suppose one has defined the following classes of interest: ''trees'', ''grass'', ''road'', and ''building''. It will then be impossible for the user to define a vegetation class unless it has been defined beforehand as a super class. The methods also do not offer the facility to add spatial rules [12], for instance, ''grass'' cannot be found inside a building, but can be found in a field. Because of this reasoning limitation, data-driven approaches are unsuitable for use in application areas, such as ecology, that deal with earth observation.

(c) Knowledge based detection methods
These methods have been applied to landslides, crops, urban land change and forests [13]-[16]. Figure 4 shows the process whereby an input image goes through a hypothesis generation phase, and the hypothesis is then validated and tested against the established knowledge and rules. The post-processed validation results are subjected to machine learning for final object detection. Knowledge and rules from geometric information and context information are used to test the validity of the hypothesis generated from an input image; if the hypothesis is valid, it is subjected to machine learning for object detection [17].

FIGURE 4. Knowledge based object detection systems [3].

Generally, there are two types of knowledge that have been used on target objects: geometric knowledge and context knowledge.

(a) Geometric Knowledge
This type of knowledge is the most important in a knowledge-based approach and is widely used for object detection. It encompasses generic shape models or parametric specifics. For instance, it is proposed in [18] that buildings are square or composed of rectangular segments, and such shapes are utilized as conventional shape models to distinguish buildings.

(b) Context knowledge
Context knowledge is very important for key objects, and it is expressed by rules derived from relationships between objects of interest and their respective backgrounds [14], [19], [20]. For instance, shadow evidence has been used for building detection [21]; the correlation between artificial structures such as buildings and their respective shadows has been used to project the locations and shapes of buildings [22].

Recently, knowledge-driven approaches seem to be the direction taken by the remote sensing science community [3], since they incorporate domain expert knowledge. Geographic Object Based Image Analysis (GEOBIA), which classifies image objects based on a priori domain expert knowledge, is proving to be a key trend in remote sensing image analysis. GEOBIA is a classification technique that divides a remote sensing image into objects of interest and evaluates the objects based on their spectral, temporal and spatial characteristics. The generation of objects of interest is done using different segmentation approaches such as random walker, Canny, and histogram-based segmentation. An algorithm is deemed effective in segmentation if and only if a segmented
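The classification stage of the machine-learning pipeline described earlier (features per region, then a classifier such as kNN) can be sketched as follows. The two-dimensional feature vectors (mean NDVI, mean brightness) and all numbers are invented for illustration; a real system would use CNN-derived features and a library classifier.

```python
# Hand-rolled k-nearest-neighbour rule over per-region feature vectors.
# Feature values and labels are toy data, not from the surveyed work.
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """train: list of (feature_vector, label) pairs; query: feature_vector."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# (mean NDVI, mean brightness) per segmented region -- illustrative numbers
train = [((0.80, 0.30), "forest"), ((0.75, 0.35), "forest"),
         ((0.72, 0.32), "forest"), ((0.20, 0.70), "building"),
         ((0.15, 0.75), "building"), ((0.55, 0.50), "grass")]

print(knn_predict(train, (0.78, 0.33)))  # -> forest
```

Swapping in an SVM or Random Forest only changes the classifier step; the region-to-feature-vector structure of the pipeline stays the same.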


image object completely matches the corresponding Actual Image Region (AIR) of a scene object. A blend of area coincidence methods and boundary coincidence methods for assessing segmentation quality was proposed in [2]. The area coincidence methods select the image object that has the dominant or largest area of intersection with the AIR. The boundary coincidence methods calculate the distance between a point of interest in a segmented image and its corresponding point on the AIR; segmentation quality is high when the measured distance is close to zero. Segmentation evaluation methods can be either unsupervised or supervised. Supervised techniques evaluate a segmented image against a ground truth image, also referred to as the reference image. The evaluation of unsupervised methods depends solely on the segmented image, as it has to assess the extent to which the image matches the desirable features of a well segmented image. [23] proposed four metrics (Equations 4-7) for assessing supervised segmentation quality, namely the F-measure, SUM (which should be less than 2), ED (the distance to the point (0, 0) in precision-recall space) and ED' (the distance to the point (1, 1) in that space):

f = \left( \frac{\alpha}{precision} + \frac{1 - \alpha}{recall} \right)^{-1} \quad (4)

sum = precision + recall \quad (5)

ed = \sqrt{precision^2 + recall^2} \quad (6)

ed' = \sqrt{(1 - precision)^2 + (1 - recall)^2} \quad (7)

Two other metrics that take into account over- and under-segmentation errors, GOSE and GUSE respectively, are proposed in [24]. Rand Error (RE) is another widely used metric for evaluating supervised approaches, and is defined in Equation 8 [25]. Let R1 and R2 be segmentation regions of image S with t pixels, and let the following hold:
• n corresponds to the number of pixels in image S that appear in both R1 and R2
• m corresponds to the number of pixels in image S that are in neither R1 nor R2

RE = \frac{n + m}{\binom{t}{2}} \quad (8)

A criterion for unsupervised techniques that balances intra-segment homogeneity and inter-segment heterogeneity is proposed by Wang et al. [26] in Equation 9:

Z = T + \lambda D \quad (9)

where T and D represent intra-segment homogeneity and inter-segment heterogeneity, respectively. Another metric for unsupervised techniques, proposed by Gao et al. [27], is the Global Score (GS). GS incorporates weighted variance (WV) and Moran's I and is defined in Equation 10:

GS = V_{norm} + I_{norm} \quad (10)

FIGURE 5. GEOBIA WORKFLOW [28].

The final step of GEOBIA is image classification. The common image classifiers for GEOBIA are Random Forest (RF), Support Vector Machines (SVM), k-Nearest Neighbor (kNN) and Naive Bayes (NB) [28]. Figure 5 shows a GEOBIA workflow [28] that implemented three different segmentation algorithms, namely Large Scale Mean Shift (LSMS) in OTB, the Shepherd segmentation algorithm in RSGISLib and the Multi-resolution segmentation (MRS) algorithm in eCognition. However, GEOBIA solutions do not give answers to every segmentation problem. Even though GEOBIA is more efficient than pixel-based approaches, segmenting a multi-spectral image made up of thousands of megapixels remains a challenging task [29]. Another drawback of GEOBIA is that it approximates, to some extent, computer-aided photo interpretation, which has been criticized as being highly subjective [5]. Nevertheless, in the last decade, knowledge-driven techniques like GEOBIA have gained traction as a means of bridging the gap between implicit data representation and end-user needs. Knowledge-driven approaches consist of translating symbolic knowledge, expressed in a format understandable by humans, into numerical knowledge.

Vegetation indices obtained from satellite images provide valuable information which is essential for the mapping of vegetation. The Normalised Difference Vegetation Index (NDVI) has proven to be a valuable tool, particularly in tropical dry forests, where it serves as a foundation for estimating overall green biomass, tree density, and species diversity [30]-[32]. NDVI is an indicator that determines the greenish component of the analyzed satellite images and expresses a balance between the energy received and the energy emitted by objects on the earth's surface [32]. In the context of plant communities, it indicates how green an area is, which is influenced by the quantity of vegetation in that particular area and its state of health. NDVI values range from -1 to +1. Values less than 0.1 correspond to water bodies and bare ground, while higher values indicate the presence of agricultural activities, temperate forests, and rain
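The supervised quality metrics of Equations (4) to (7) can be written out directly from precision and recall; a minimal sketch, with α weighting precision against recall in the F-measure (all names are ours):

```python
# The four supervised segmentation-quality metrics of Equations (4)-(7).
import math

def f_measure(p, r, alpha=0.5):
    return 1.0 / (alpha / p + (1 - alpha) / r)   # Eq. (4); harmonic mean at alpha = 0.5

def sum_metric(p, r):
    return p + r                                 # Eq. (5); ideal value is 2

def ed(p, r):
    return math.hypot(p, r)                      # Eq. (6); distance to (0, 0)

def ed_prime(p, r):
    return math.hypot(1 - p, 1 - r)              # Eq. (7); distance to (1, 1)

p, r = 0.9, 0.8
print(round(f_measure(p, r), 3))   # 0.847
print(round(sum_metric(p, r), 3))  # 1.7
print(round(ed(p, r), 3))          # 1.204
print(round(ed_prime(p, r), 3))    # 0.224
```

A high-quality segmentation drives SUM towards 2, ED towards sqrt(2) and ED' towards 0.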


forests [32]. The NDVI values can be used to group vegetation ecosystems into four major categories as follows [33]: forests made up of semi-deciduous and evergreen species have NDVI ≥ 0.7; woodlands are defined by the range 0.6 ≤ NDVI < 0.71; a mixed class composed of a) shrub land, b) woodland/shrub land/exposed lands, and c) cactus forest has the range 0.49 ≤ NDVI < 0.61; and dwarf woodland and shrub land assume the range NDVI < 0.49 [33]. For instance, a forest concept made up of semi-deciduous and evergreen species is described by high NDVI values, and when translated into numerical knowledge it is implemented by the classification rule set Forest = (NDVI ≥ 0.71). Figure 6 shows the symbolic to numerical knowledge conversion.

FIGURE 6. Symbolic to numerical knowledge conversion.

II. OVERVIEW OF THE CHALLENGES FACED WITH KNOWLEDGE BASED APPROACHES

A. DIFFERENT MODES OF DEFINING GEOGRAPHIC CONCEPTS
A geographical concept can be defined from different perspectives; the definition might be based on a physical, historical, functional, or conventional mode [34]. Different methods of defining the same geographic concept bring about alternative perspectives on its definition; for example, a concept can be characterized by alternative definitions that are not essentially the same, despite the fact that they are normally correlated [35]. From a functional viewpoint, the forest primarily acts as a repository for storing carbon, which is correlated with Net Primary Productivity (NPP) values. Forests can also be defined based on physical attributes such as vegetation cover, phenology, vegetation age, etc. A tremendous effort is still in place to standardize land cover classes in land cover classification systems (LCCS) [36]. The term ''forest'' is defined differently by different organizations and countries; for instance, in Brazil an area that is regarded as a forest exceeds one hectare, is characterised by a 30% canopy, and is composed of trees with a minimum height of 5 m [37]. A forest in China is defined as an area larger than 0.67 hectares in size, with at least 20% crown cover and trees standing at least 2 meters tall. The Food and Agriculture Organization (FAO) standardised the definition of forest to refer to a land area spanning over 0.5 hectares enveloped by trees of 5 m and above, with a canopy cover of 10%; this definition excludes land under agricultural or urban land use [38].

B. DUALITY OF GEOGRAPHIC CONCEPTS
Two major terms arise from the concept of duality: scene and image. A scene is real and exists on the ground, whereas an image is an assortment of spatially arranged measurements drawn from the scene [1]. Components obtained from images are regarded as abstractions of real objects in the ground scene [39]. Forest concepts can be viewed either from a real world perspective (a forest concept is characterized by high NPP values) or from image properties (a forest concept is defined by high NDVI values). In the case of forests, the assertion that NDVI is correlated with NPP is not always valid, because NPP lacks information on attributes such as vegetation height, vegetation cover, and so forth. This anomaly is also referred to as the sensory gap. In this sense, sensory gaps cause uncertainty in describing geographic objects [40].

C. VAGUENESS AND AMBIGUITY OF GEOGRAPHIC CONCEPTS
The process of connecting an attribute (e.g. NDVI) range of values (for instance, ''high'') to geographic concepts (e.g. the forest concept) is not easy [1]. The reason is that the associated value ''high'' is qualitative in nature, so the resulting classification rule becomes vague and ambiguous. Some pixels (image objects) inside the image in Figure 7 are not classified as forest, though in nature they constitute forest areas. When qualitative terms like ''high'' are employed to identify objects with sharp, crisp boundaries, threshold ambiguity occurs. Qualitative description of geographic objects also raises partiality issues. For instance, the symbolic classification rule ''high NDVI'' only partially describes the forest because:
(a) It is very difficult to single out only forests in areas that contain other crops with the same NDVI values, as in Figure 9.
(b) It is also impossible to classify all the varying types of forests because, in some cases, there are forests that have ''low NDVI'' values, such as the degraded forests in Figure 8.
Ambiguity arises in all cases where a natural language expression can have various meanings [41]. The usual form is lexical ambiguity, which emerges because of homonymy in natural language expressions, that is, an expression with more than one meaning, such that each meaning points unambiguously to only one ontological concept [41]. More than 800 different definitions of forest concepts are provided in [42]. Deep ambiguity, also referred to as open texture, exists where there is no clear boundary between concepts or terms, or in cases where the meaning of a concept changes over time, for
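The NDVI computation and the four-category grouping quoted from [33] above can be sketched as follows. The band reflectances are invented for illustration, and where the quoted thresholds overlap slightly (0.7 vs. 0.71, 0.6 vs. 0.61) the sketch assumes the lower bound of each range:

```python
# NDVI from red and near-infrared reflectance, plus the four vegetation
# categories of [33]. Band values and helper names are illustrative only.
def ndvi(nir, red):
    return (nir - red) / (nir + red)

def vegetation_category(v):
    if v >= 0.7:
        return "semi-deciduous/evergreen forest"
    if v >= 0.6:
        return "woodland"
    if v >= 0.49:
        return "mixed shrub land / cactus forest"
    return "dwarf woodland and shrub land"

v = ndvi(nir=0.50, red=0.08)   # dense canopy reflects strongly in the NIR band
print(round(v, 2), vegetation_category(v))
```

The crisp threshold in the rule set Forest = (NDVI ≥ 0.71) is exactly the kind of boundary that the vagueness discussion above problematizes.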


instance, when a new technology appears or the physical or social context of the term evolves.

FIGURE 7. Concepts of vagueness.
FIGURE 8. Forests having low NDVI values.
FIGURE 9. Other crops having high NDVI as forests.

D. SEMANTIC GAP
The semantic gap arises from the vagueness and ambiguity of geographic concepts. It is defined as a mismatch between data extracted on the basis of visual information and the interpretation drawn from the same data in a given situation [40]. This is because converting visual data (from human perception) into a computational representation is a very difficult task. The translation first requires expressing the perception of visual data in a symbolic knowledge representation format (e.g. ''forests have high NDVI values''). Such a conversion is very difficult, since some concepts have vague meanings when expressed in natural language [43]. For instance, color may be considered a significantly important biophysical property [44], but its perception varies amongst humans and is difficult to express.

III. INTRODUCTION TO ONTOLOGIES
Sharing knowledge among people is feasible only if people speak a common language [42]. The traditional definition of knowledge is ''a subset of true beliefs'' [45]; it is the intersection between truth and beliefs, as represented in Figure 10.

FIGURE 10. Definition of knowledge [45].

Ontologies enable formal (machine-understandable) representation of knowledge. In computer science, an ontology is defined as an explicit, formal specification of a shared conceptualization [46]. An ontology is a systematic description of existence, a term drawn from philosophy. What ''exists'' for Artificial Intelligence (AI) systems is that which can be represented. The following properties, with corresponding definitions, should be observed: (1) conceptualization means that an ontology is an abstract model of a real world phenomenon; (2) explicit implies that all ontology concepts must be clearly defined; (3) formal implies that an ontology is machine understandable; and (4) shared means that there should be consensus amongst a community of people about the knowledge represented by the ontology.

A. FORMAL ONTOLOGIES
Remote sensing science experts are conversant with working on numerical knowledge that has been derived from an image viewpoint [47]. Numerical knowledge representation by nature suffers from the problems of partiality and implicit knowledge representation; hence it becomes difficult to share the knowledge with other scientists, such as ecologists and agronomists, who are used to working with symbolic knowledge when describing a geographic concept, for instance, ''a forest concept is defined by high NDVI values''. Formal ontologies provide a road map that caters for the representation of both symbolic and numerical knowledge. Formal ontologies can be utilized to unequivocally portray a perception or observation from different perspectives; for instance, the extensible observation ontology (OBOE) is utilized to portray the semantics of scientific observations. An observation of an entity encompasses the characteristics (e.g. biomass) of the entity based on a measurement standard

(grams). Ontologies for remote sensing science applications based on description logic offer the following advantages:
• Symbolic language - they bind/associate concepts with relevant sensing data and also promote the binding of related concepts.
• Knowledge sharing - they advocate a common conceptualization and the adoption of a standard ontology language such as the Web Ontology Language.
• Reasoning - description logic in an ontology allows new knowledge to be inferred from explicit descriptions.

B. ONTOLOGY KNOWLEDGE BASE AS A SOLUTION
This section outlines how the adoption of ontologies in knowledge-based approaches helps to alleviate the problems addressed in Section II.

1) DUALITY OF GEOGRAPHIC CONCEPTS
Ontologies incorporate the concept of perspectivalism. That is, they allow the separate description of a field point of view of a forest concept. For instance, a forest concept can be specified in terms of attributes such as ''high'' NPP, leaf type, and so on. The other angle of description is from the point of view of an image of geographic objects; that is, a forest can be defined in terms of attributes such as ''high'' NDVI, texture, and wavelength. In general, this allows for the separate description of geographic entities and geographic objects alongside their characteristics. Figure 11 shows the dual representation of a geographic feature: it can be described either from the perspective of a geographic entity or from the perspective of a geographic object.

FIGURE 11. Dual representation of concepts.

2) VAGUENESS AND AMBIGUITY OF GEOGRAPHIC CONCEPTS
Fuzzy logic is the most popular way of handling the vagueness of geographic concepts [48]. Processing of data is done using partial set membership rather than strict set membership. For example, a forest concept is not considered to be strictly ''green'', but rather is considered to belong partially, to some degree, to the set of things that are green. Two reject thresholds are defined in [49], i.e. the ambiguity reject threshold and the distance reject threshold. The ambiguity reject threshold is defined by the rule α_amb ∈ [0.5, 1] and defines the degree of confidence required to recognise an object. The distance reject threshold is defined by the rule α_dist ∈ [0, 1]; this means that an object x that is unlikely to belong to both classes C_k and ¬C_k might belong to a concept not yet learnt. Vagueness can also be addressed by adopting probabilistic ontologies, which use probability sets to define concepts of interest. Attributes in the set of properties have probabilities attached to them, and the statistical measure of the probability value of the geographic concept [50] is used to determine whether a geographic concept is a member of a class. Ambiguity in ontologies can be reduced by limiting the information that describes a concept [41].

3) SENSORY GAP
The discrepancy between real objects and their depiction in images is known as the sensory gap. As noted in [40], the sensory gap can be reduced by explicitly defining the domain and world knowledge in the system. Knowledge about physical laws, laws governing the behavior of objects, and how people perceive them can all be incorporated into the system in the hope of enhancing recognisers and thereby assisting machines in bridging the sensory gap [51]. In ontologies, the real world description of forest entities is correlated with the matching image description of forest objects, i.e., NDVI is correlated with NPP [2]. Figure 12 shows how a real world description of a forest concept can be mapped to an object description in an image; an object description of an image is easily formalised on a computer.

FIGURE 12. Solving sensory gap challenge.

4) SEMANTIC GAP
The semantic gap is the discrepancy between the high level descriptions of images by humans and the low-level features used by machines to detect images [52]. Adding captions and annotations to images can address the problem [50], but this method is time-consuming and costly because it requires a lot of effort, machine algorithm tweaking, and close attention to vocabulary and content to ensure that photos are appropriately labeled [50]. In ontologies, however, an image feature (e.g. NDVI) and its associated value (e.g. ''HIGH NDVI'') are used to define a pixel (image object) of a forest concept. The ''HIGH NDVI'' concept is formalized as a result

FIGURE 13. Object features hierarchy in ontology [55].

TABLE 1. Ontological framework for RSI concepts.

of the established relationship between symbolic information (e.g., ''HIGHNDVI'') and numerical knowledge (e.g., NDVI > 0.7), hence the semantic gap is reduced.

IV. ONTOLOGICAL FRAMEWORK FOR RSI (REMOTE SENSING IMAGE)
[53] proposed a novel framework for RSI. The framework is made up of important terms or concepts, including satellite, sensor, image, spatial resolution, and spectral resolution. The elements are shown in Table 1. The Slot is mainly concerned with the spatial and spectral resolutions, which relate to the scope, although there are no related elements in the range component. Spectral resolution is one of the most important concepts in the framework. It follows a top-down approach, where the concept is parceled into two subcomponents, i.e., the visible part and the infrared part. The visible part is made up of three color segments, i.e., RGB (red, green, and blue). The infrared part is also made up of three segments, i.e., thermal infrared, near infrared, and far infrared. The parameters suited for the slot are explicitly defined and include has_spatial_resolution, has_spectral_resolution, etc. [47] developed a simple ontological approach for remote sensing image classification; the prototype was built upon the expert remote sensing knowledge expressed in [54].

A. ONTOLOGICAL FRAMEWORK FOR OBJECT FEATURE EXTRACTION
After an image goes through a segmentation process, each region is characterised by a set of features. The feature extraction process in the eCognition software follows the general upper ontology defined using the top-down method [55]. The features are divided into six categories, namely LayerProperty, GeometryProperty, PositionProperty, TextureProperty, ClassProperty, and ThematicProperty. The selection of features of interest is performed by an expert to allow object detection. Figure 13 shows a hierarchical breakdown of object features from the six categories. GeometryProperty, TextureProperty, and ThematicProperty are important features in detecting forest objects [56].

B. ONTOLOGY MODEL OF THE LAND COVER CLASS HIERARCHY
The upper-level ontology is developed using concepts from land cover classification systems (LCCS) [36]. Figure 13 shows a hierarchically simplified way of representing classes of interest emanating from the main land cover class [53]. [55] designed an upper-level ontology for the Chinese Geographic Condition Census Project [57]. Figure 14 depicts the design of an eight-class land cover ontology. The procedure was as follows:
1) The first step was to establish a set of important terms, in this case: Fields, Woodland, Grassland, Orchards, Bare land, Roads, Buildings and Water.
2) Classes and class hierarchies were then defined; a land cover class was defined through a top-down approach.

1) ONTOLOGY MODEL OF THE DECISION TREE CLASSIFIER
Ontologies typically express two algorithms, namely decision trees and semantic rules [55]. [58] and [59] used decision trees in the field of ontologies to cluster and classify image objects. Their findings proved that decision trees enhance the ability of ontologies to granulate information, thereby increasing image classification accuracy. [59] uses decision trees to solve the problem of inconsistency between overlapping ontologies. [47] use decision trees for ontology matching; the matching process is purely based on decision tree rules derived for one ontology that are compared with the rules of external ontologies. [55] designed an ontology model for a decision tree classifier that


FIGURE 14. Land cover class hierarchy in ontology [55].

consists of three parts: (1) a set of decision trees composed of all essential terms and concepts, for instance, a node and a leaf; (2) a slot defined by the inequality symbols >, ≥, <, ≤; and (3) the creation of the nodes. Figure 15 shows the elements of the decision tree classifier.

FIGURE 15. Ontology model of the decision tree classifier [55].

2) ONTOLOGY MODEL OF THE SEMANTIC RULES
[55] followed a two-phased approach to designing an ontology model for semantic rules: the first phase is the establishment of mark rules, followed by decision rules. Mark rules convert low-level features to semantic concepts. Decision rules, on the other hand, are inferred from mark rules and a priori knowledge.
• Ontology model for mark rules
The morphology of semantic notions is classified into strip and planar; the shape is regular and irregular; the texture is smooth and rough; the brightness is light and dark; the height is high, medium, and low; and the position relationship is adjacent, disjunct, and contained. The ontology model of the mark rules is shown in Figure 16.

FIGURE 16. The mark rules in ontology [55].

• Ontology model of the decision rules
Ontologies explicitly represent concepts in the same way humans describe concepts in their domain of interest. However, ontologies developed in disregard of decision rules have proved to be computationally expensive [60]. This is due to their inability to capture the kinds of decision-making knowledge that arise in practice, such as those involving multiple ontologies. Decision rules on ontologies help in three ways [61], [62]: (a) they take into cognisance primitives from multiple ontologies as well as primitives that are not part of the rule framework; (b) they are time dependent; and (c) they incorporate default assumptions. Eight types of land cover obtained from the Chinese Geographic Census Project [57] were defined in terms of rules, as outlined in Figure 17.

FIGURE 17. Decision rules based on ontology [55].

C. SEMANTIC NETWORK MODEL
Semantic networks graphically represent knowledge in the form of nodes and links, whereby links provide hierarchical relationships between objects [63]. The semantic network model explicitly expresses knowledge through concepts and their corresponding semantic relations [55]. This is shown in Figure 18. The network bridges the gap between low-level characteristics and high-level semantics, reducing the semantic gap.

D. ONTOLOGIES FOR KNOWLEDGE MANAGEMENT
Framework ontologies and domain ontologies are the two most important types of ontologies. Framework, or foundation, ontologies consist of concepts explicitly expressed as high-level knowledge (for human understandability), and they are not designed for a specific domain. A domain ontology has knowledge tailor-made for a specific domain, e.g., remote sensing; a domain ontology borrows from a framework ontology. Domain ontologies have a hierarchical structure of two levels: the first level is called the ABox, and the second level is called the TBox. The ABox contains assertions (or rules) that comprise the theory that the ontology describes in its domain of application [64]. The TBox is where experts conceptualise their knowledge in a specific scientific domain [47]. There are vast paradigms for modelling ontologies, but chief amongst them are Description Logics (DL) [65] and rule formalism. The DL formalism serves as a foundation for building ontologies using the Web Ontology Language (OWL) [66]. New knowledge can be inferred from ontologies using DL, which makes ontologies machine understandable.
FIGURE 18. The semantic network model [55].


E. MODULAR ONTOLOGICAL APPROACH
The modular approach is the best way of building complex ontologies from simpler (modular) ontologies in a constant and well-defined way [67]. Such an approach allows collaborative development, with many different domain experts building a single ontology through the integration of independently developed ontologies. The approach is carried out in such a way that the meaning of a TBox T' is not changed when its elements are reused in another TBox T. Formalisation of such a property follows the conservative extension theorem [68].
Definition 1 (Conservative Extension): Let T and T' be TBoxes, Sig(α) be the signature of an axiom α, and Sig(T') be the signature of the TBox T'. Then T ∪ T' is a conservative extension of T' if for every axiom α with Sig(α) ⊆ Sig(T') we have T ∪ T' ⇒ α iff T' ⇒ α [67]. In addition, if two independent parts T1 and T2 of an ontology T are constructed in a modular way, then T remains modular as well. These properties are formalised as follows [67]:
Definition 2 (Modularity): Let Loc(T) be the local signature of T and Ext(T) be its external signature. A set M of TBoxes T with Sig(T) = Loc(T) ⊎ Ext(T) is a modular class if the following conditions hold:


FIGURE 19. Structure of knowledge base [47].

M1. If T ∈ M, then T ∪ T' is a conservative extension of every T' such that Sig(T') ∩ Loc(T) = ∅.
M2. If T1, T2 ∈ M, then T = T1 ∪ T2 ∈ M with Loc(T) = Loc(T1) ∪ Loc(T2).
Falomir et al. [66] proposed three levels of knowledge that are imperative for designing a modular ontological approach: the reference conceptualisation (which provides a description of images and image objects), the contextual knowledge (a set of rules defined by a domain expert) and the image facts (semantic descriptions of image content). Figure 19 illustrates how the reasoner assigns image objects to their corresponding concepts based on facts drawn from the reference conceptualisation and the contextual knowledge.
(a) The Reference Conceptualisation
It is a general model for describing image objects in remote sensing. It consists of two packages, namely, (1) the image structure package and (2) the image processing package [47]. The image structure package comprises the ImageObjects concept, which describes objects according to their characteristics, and the ImageObjectFeature concept, which links related concepts with associations such as ''hasfeature''. The image processing package is composed of the PseudoSpectralIndex and SpectralBand concepts. These concepts help remote sensing experts describe contextual knowledge; concepts such as spectral bands and texture are used by remote sensing experts to interpret remote sensing images.
(b) The Contextual Knowledge
The purpose of contextual knowledge is to represent remote sensing expert knowledge using DL, hence the name ''contextual knowledge.'' The basis of this knowledge comes from the remote sensing science expert. As a result, it is a ''subjective'' description of image rules rather than an ''objective'' depiction of image structure. Figure 20 shows the concepts, relations, and instances in conceptual knowledge.
(c) The Image Facts
These are facts extracted from image analysis, and they are stored in the ABox [47]; the TBox contains the reference conceptualisation and the contextual knowledge [47]. Facts in the ABox provide semantic descriptions of image objects, and


FIGURE 20. Conceptual knowledge showing concepts, relations and instances in an ontology [47].

the description is done with the help of the reference conceptualisation and the conceptual knowledge.

V. VEGETATION DETECTION
Unsupervised and supervised classification algorithms are crucial in identifying vegetation areas.

A. UNSUPERVISED CLASSIFICATION INDICES
These methods use spectral indices to detect vegetation areas. The Normalized Difference Vegetation Index (NDVI), which is calculated for each pixel in an image, is one of the indices utilized. The NDVI image is represented as a gray-scale image, as shown in Figure 21: image (a) is a representation of an image using the RGB channels; image (b) is the representation of the same image in NDVI format using the gray scale.

FIGURE 21. (a) RGB input image (b) NDVI image [69].

NDVI = (ψIR − ψR) / (ψIR + ψR)   (11)

Equation 11 illustrates the calculation of the NDVI, where ψIR and ψR are the pixel values in the infrared and the red band, respectively. The formula defines vegetation as areas that have a higher reflective index in the infrared band than in the red band. The formula was then refined to take into account the spectral index (SI) [70].

SI = (ψR − ψB) / (ψR + ψB)   (12)

Equation 12 illustrates the calculation of the SI, where ψB is the pixel value in the blue band and ψR is the pixel value in the red band. An NDVI value and an SI value are binarized to create a vegetation mask. This is shown in Figure 22.
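Equations 11 and 12 and the binarization step can be sketched in a few lines of NumPy. The binarization thresholds (0.3 for the NDVI, 0.0 for the SI) are illustrative assumptions, not values taken from the studies reviewed here:

```python
import numpy as np

def ndvi(ir, r):
    """Equation 11: NDVI = (IR - R) / (IR + R), computed per pixel."""
    ir = ir.astype(float)
    r = r.astype(float)
    return (ir - r) / (ir + r)

def spectral_index(r, b):
    """Equation 12: SI = (R - B) / (R + B)."""
    r = r.astype(float)
    b = b.astype(float)
    return (r - b) / (r + b)

def vegetation_mask(ir, r, b, ndvi_thresh=0.3, si_thresh=0.0):
    """Binarize the NDVI and SI images and combine them into a
    vegetation mask; the threshold values are illustrative."""
    return (ndvi(ir, r) > ndvi_thresh) & (spectral_index(r, b) > si_thresh)

# Toy 2x2 bands: top row vegetation-like (IR >> R), bottom row soil-like.
ir = np.array([[200, 180], [90, 80]])
r  = np.array([[ 60,  50], [85, 90]])
b  = np.array([[ 30,  25], [70, 80]])
print(vegetation_mask(ir, r, b))
```

Combining the two binary masks with a logical AND reflects the refinement described above: a pixel is kept as vegetation only when both indices agree.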


FIGURE 22. Image (a) SI image (b) vegetation mask obtained with the NDVI and SI indexes [69].

B. SUPERVISED CLASSIFICATION INDICES
Detection of vegetation by spectral indices is highly dependent on spectral characteristics. In other cases, supervised classification methods are primarily based on Support Vector Machines (SVMs). The feature vector that defines each pixel in the training data set contains four characteristics, namely the reflectance value of the pixel in the infrared, red, green, and blue bands. Supervised methods do well in distinguishing between non-vegetation and vegetation areas through spectral indices; this necessitates an SVM capable of determining the best linear separator. Random Forest (RF), k-Nearest Neighbour (kNN), SVM and sparse representations are among the pixel-wise classifiers that have been used over the last decade [71]. These traditional methods only consider spectral information as the basis of the classification process, disregarding spatial contextual information, which contributes significantly to classification performance [71]. Several researchers have proposed hybrid spectral-spatial classification that takes into account both the spatial context and spectral information, based on the assumption that pixels from a local region have similar spectral information. [71] proposed a hybrid model of kNN combined with a guided filter for hyper-spectral image (HSI) classification of forest trees. The joint hybrid model of kNN and guided filter (PGF-kNN) was used to optimise the hyper-spectral images produced by kNN. The optimised hyper-spectral images were taken as input by the joint kNN and processed to produce the classification maps. Each class map was converted into a probability value, and the class map with the highest probability value was chosen as the classification result. [72] conducted a study to determine the reliability of RF and SVM algorithms in the classification of very high resolution (VHR) images obtained from oak woodlands of a Mediterranean ecosystem. The first stage was data acquisition, where images were subjected to a Structure-from-Motion (SfM) technique to identify common features in overlapping images. Each image was then orthorectified through the interpolated digital surface model (DSM). Finally, all the images were combined into an orthomosaic. The workflow of the study followed 4 main steps, namely, preprocessing, segmentation, classification and accuracy assessment. Figure 23 shows the workflow of the proposed model.

FIGURE 23. Workflow that presents the stages of preprocessing, segmentation, classification and accuracy assessment [72].

In the preprocessing stage, each input layer was subjected to a linear band stretch covering a range of 8 bits, that is, from a minimum of 0 to a maximum of 255. The process was done to normalise each band and to suppress the effect of possible outliers on the segmentation. A layer stacking process was performed on the images containing the R-G-NIR (Red, Green, Near-Infrared) bands, obtained during the spring and summer seasons, integrating the NDVI and DSM data, to obtain the final two five-band orthomosaics. Such a process was of significant importance because OTB segmentation requires a single raster image as input data. Spectral separability is of significant importance when it comes to image classification. The M-statistic defined in Equation 13 was employed [72] to measure the separability of the NDVI and DSM layers for varying types of vegetation.

M = |µ1 − µ2| / (σ1 + σ2)   (13)

where µ1 is the mean value of class 1 and µ2 is the mean value of class 2; σ1 is the standard deviation of class 1 and


FIGURE 24. Flowchart of the model that harmonises RF and kNN [73].

σ2 is the standard deviation of class 2. If M < 1, the classes overlap; if M > 1, the classes are well separable. The segmentation process considered both semantic properties and radiometric information. Large-scale mean shift (LSMS) segmentation was used in the study because of its ability to perform tile-wise segmentation of large VHR imagery [72]. The OTB LSMS segmentation process followed the steps of LSMS smoothing, LSMS segmentation, LSMS merging and LSMS vectorisation. Classification was performed for five different land cover classes, namely, grass, cork oak, soil, shrubs and shadows. Two supervised learning algorithms, RF and SVM, were used to perform the classification. SVM performs linear separation in a hyperspace using a µ(.) mapping function. In the case where objects are not linearly separable, the kernel method is used, which takes into account projections of the feature space [72]. RF uses bagging over decision trees, producing different data subsets and thus a variety of trees. Every decision tree in the RF participates in the classification process, and the classification label returned is the class with the most votes.
Another study [73] analysed the performance of kNN and RF classifiers for mapping forest fire areas. The authors [73] implemented kNN and RF to classify forest areas and explained the effects of different satellite images on both classifiers. Figure 24 shows the flow chart of the model. The model, being a supervised approach, was implemented using multi-spectral images obtained from Landsat8, Landsat-2, and Terra sensors. The classification accuracy was determined by confusion matrices. The machine learning classifier based on kNN and RF produced excellent results with k set to 5 for kNN and 400 trees for RF. The results from the hybrid model achieved a very high classification accuracy, with an overall accuracy (OA) > 89% and Dice coefficient (DC) > 0.8. Other studies [74], [75] have also implemented non-parametric algorithms such as kNN and RF in remote sensing applications.

VI. IMAGE SEGMENTATION
An input image is partitioned (or subdivided) into meaningful image objects (segments). Image segmentation can be classified into two categories: supervised (empirical discrepancy methods) and unsupervised (empirical goodness methods) [76]. Unsupervised approaches evaluate a segmentation result based on how well the image objects match a human perception of the desired set of segmented images, using quality criteria typically created in accordance with human perceptions of what constitutes a good segmentation. Supervised methods compare a segmentation result with a ground truth [2]. If ground truth can be reliably established, supervised methods are preferred.

A. TYPES OF IMAGE SEGMENTATION
Pixel-, edge-, and region-based image segmentation methods are the three primary types of traditional image segmentation [77].
(a) Pixel Based Methods
This method involves two important processes: (1) image thresholding and (2) segmentation in feature space. For image thresholding, image pixels are divided according to their intensity level [78]. There are three types of thresholding [79], [80]:
(1) Global thresholding - with T being the appropriate threshold value, the output image q(x, y) based on T is obtained from an original image p(x, y) as

q(x, y) = 1, if p(x, y) > T; 0, if p(x, y) ≤ T.
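Both the M-statistic of Equation 13 and the global-thresholding rule above reduce to a few lines of NumPy. The sample values below are illustrative, not data from the studies reviewed:

```python
import numpy as np

def m_statistic(class1, class2):
    """Equation 13: M = |mu1 - mu2| / (sigma1 + sigma2).
    M > 1 indicates well-separable classes; M < 1 indicates overlap."""
    c1 = np.asarray(class1, dtype=float)
    c2 = np.asarray(class2, dtype=float)
    return abs(c1.mean() - c2.mean()) / (c1.std() + c2.std())

def global_threshold(p, T):
    """Global thresholding: q(x, y) = 1 if p(x, y) > T, else 0."""
    return (np.asarray(p) > T).astype(int)

grass_ndvi = [0.70, 0.72, 0.74]   # illustrative per-class samples
soil_ndvi  = [0.10, 0.12, 0.14]
print(m_statistic(grass_ndvi, soil_ndvi) > 1)   # classes well separable
print(global_threshold([[30, 200], [90, 10]], T=100))
```

Note that the M-statistic uses only first- and second-order class statistics, which is why it is cheap to evaluate for every pair of layers before committing to a classification scheme.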


(2) Variable thresholding - This is when the value of T varies over an image, and it comes in two flavours:
• Local threshold - T depends on the neighborhood of x and y.
• Adaptive threshold - T's value is a function of x and y.
(3) Multiple thresholding - It has multiple values of T. The output image is computed as follows:

q(x, y) = m, if p(x, y) > T1; n, if T0 < p(x, y) ≤ T1; 0, if p(x, y) ≤ T0.

However, these methods suffer from incomplete segmentation, so the output regions need to be merged. Also, these methods are appropriate for images with objects lighter than the background.

FIGURE 25. Region splitting.
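The three-level thresholding rule above maps directly onto boolean masking in NumPy. The condition T0 < p ≤ T1 for the middle band is the standard reading of the piecewise rule, and the levels m = 2, n = 1 are arbitrary output labels:

```python
import numpy as np

def multiple_threshold(p, T0, T1, m=2, n=1):
    """Three-level thresholding: q = m where p > T1,
    q = n where T0 < p <= T1, and q = 0 where p <= T0."""
    p = np.asarray(p)
    q = np.zeros(p.shape, dtype=int)
    q[p > T1] = m
    q[(p > T0) & (p <= T1)] = n
    return q

image = np.array([[250, 120], [60, 180]])
print(multiple_threshold(image, T0=100, T1=200))
```

Global thresholding is the special case with a single threshold; adding more (T0, T1, ...) pairs generalizes the same masking pattern to any number of intensity bands.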
(b) Edge Segmentation Methods
Edge-detecting operators are employed to detect all possible edges found in an image. Adjacent regions are clearly separated by a sharp gray edge, but there can be cases where the gray value is not continuous [81]. Edges are represented by discontinuities in gray level, color, texture, etc. Such a discontinuity is detected by using derivative operations such as differential operators [82]. The Prewitt, Roberts, and Sobel operators are the most frequently utilized first-order differential operators [83]. There are also a number of other edge detection operators, such as template matching edge detectors. One challenge with edge-based segmentation is that it sometimes presents edges in locations where there is no border. Filtering, enhancement, and detection are the three processes in edge segmentation algorithms [77]. The purpose of the filtering process is to reduce the amount of noise present in the imagery. The enhancement uses high-pass filtering to detect and reveal local changes in intensity. Finally, the edges detected (using threshold techniques) are combined or linked together to form the boundaries of the image objects.
(c) Region Based Segmentation
Images are segmented into regions with similar properties using region-based approaches [81]. There are two types of region-based segmentation, namely: (1) region-growing segmentation and (2) region-splitting and merging segmentation [82], [84].
(1) Region Growing Segmentation
It starts from a seed point, to which a rule that joins surrounding pixels to the starting region is applied; the procedure is repeated until a particular threshold is met and there are no more pixels to ascribe, i.e., the entire image is segmented [81]. These algorithms, on the other hand, suffer from a lack of control over the region-growth break-off criterion [85].
(2) Region Splitting and Region Merging
The original image is split or subdivided into sub-images. Each sub-image is recursively divided into its own sub-images based on the condition or predicate given. If the condition is not satisfied, further splitting ceases [82]. Figure 25 shows the splitting process.

FIGURE 26. Image segmentation state of the art [76].

B. IMAGE SEGMENTATION STATE OF THE ART
Reference [76] proposed a segmentation process that improves segmentation accuracy by modifying the super-pixel extraction methodology so as to increase robustness to added noise. The segmentation method is based on Gabor filtering and Principal Component Analysis (PCA). Figure 26 presents the state-of-the-art segmentation process. The method depends on two principal tasks: (1) pre-segmentation (super-pixel extraction) and (2) clustering of the previously extracted pixels.
(a) Pre-segmentation
An input image is subdivided into a number of regions of interest, each made up of pixels with similar features. The Watershed Transform (WT) clustering-based super-pixel algorithm has previously been considered for super-pixel extraction [86], [87].
(b) Gabor Filter
Gabor filters are used to extract spatially localized spectral features [76]. They have been advocated for because they are based on principles found in the human visual system and have key features that can be utilized to segment images.
Before the introduction of deep learning, machine learning techniques such as SVM, K-means clustering, Random Forest, etc., were the chief algorithms for image segmentation. Semantic segmentation using deep learning has proven to work better than the aforementioned techniques because they classify each pixel of an image rather than the entire


FIGURE 28. VGGNet [89].

FIGURE 27. AlexNet [89].

image object. The next section gives an overview of semantic segmentation techniques.

VII. SEMANTIC SEGMENTATION USING DEEP LEARNING


This section introduces fundamental ideas of CNNs and sub-
sequent variants for semantic segmentation, as well as their
network structures [88].

A. AlexNet, VGGNet AND GoogLeNet

FIGURE 29. GoogLeNet [89].


These are the three chief deep neural networks for image
classification, which formed the major foundations of later
developments. The networks support network architectures
for semantic segmentation.

1) AlexNet
AlexNet is made up of five convolutional layers and three fully connected layers [89]. In between the convolutional layers are pooling layers whose role is to reduce dimensionality and computational complexity. AlexNet's pooling strategy is max pooling, which keeps the biggest value covered by the filter and is used to remove noisy components [88]. Filters of sizes 11 × 11 and 5 × 5 are used in the first and second convolutional layers, respectively. The last three layers use small-sized filters of 3 × 3. The whole process is described in Figure 27. The primary purpose of such filters is feature extraction; varying filter sizes accommodate objects of different scales. AlexNet has three notable characteristics:
1) It supports the application of the non-saturating Rectified Linear Unit (ReLU), whose output is defined by F(x) = max(x, 0).
2) It employs the overlapping max pooling strategy (which means that each filtering operation's step size (stride) is smaller than the filter's overall size).
3) To reduce over-fitting, it uses the dropout approach in the fully-connected layers.

2) VGGNet
The network is made up of three fully connected layers and a varying number of convolutional layers. This is shown in Figure 28. Unlike AlexNet, VGGNet has fixed small-size filters of 3 × 3 in the convolutional layers [90]. The number of weights in the network is reduced by using small filters, which minimizes the training complexity. Just like AlexNet, VGGNet uses max pooling over a 2 × 2 window with a stride of 2 pixels. The advantage of simplifying the convolutional layers to a greater extent is that it increases network depth, thereby improving the accuracy of the network. The network's performance in tasks like semantic segmentation and target detection is improved by using features extracted from the CNN that are structured in a hierarchy of scales [91]. Other classifiers, such as SVMs, can use the features without fine-tuning [92].

3) GoogLeNet
The architecture is different from the other two in three aspects, namely the inception module, the auxiliary classifiers required at the training stage, and the single fully connected layer [93]. Output results from the inception filters are concatenated with the maximum pooling result. Between the inception modules, maximum pooling is employed, and after the last inception module, average pooling that employs dropout is used [94]. The flow chart diagram is shown in Figure 29. The network is very deep because it is made up of nine inception modules and up to three convolutional layers.

FIGURE 30. Inception Module [69].

Because of the profundity of the network, the smooth flow of the gradient from layer to layer becomes an issue. Figure 30 shows the inception module. The issue is addressed by adding auxiliary classifiers in the middle of the convolutional layers, whose role is to process the outputs from the inception modules. The loss from these classifiers is added to the overall loss of the network during training; the auxiliary classifiers take no part in predictions at inference time.
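Two of the AlexNet characteristics listed above, the ReLU activation F(x) = max(x, 0) and overlapping max pooling (window larger than stride), can be sketched in NumPy. The 5 × 5 input and the 3/2 window/stride pair are illustrative; AlexNet applies the same operations to much larger feature maps:

```python
import numpy as np

def relu(x):
    """Non-saturating activation used by AlexNet: F(x) = max(x, 0)."""
    return np.maximum(x, 0)

def max_pool(x, size=3, stride=2):
    """Overlapping max pooling: the stride (2) is smaller than the
    window (3), so neighbouring windows share pixels; each output
    keeps the largest value under its window."""
    h, w = x.shape
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = x[i * stride:i * stride + size,
                          j * stride:j * stride + size].max()
    return out

x = np.arange(25).reshape(5, 5) - 12   # toy 5x5 feature map
pooled = max_pool(relu(x))
print(pooled.shape)   # (2, 2)
```

Because the windows overlap, adjacent outputs are correlated, which is the property AlexNet exploited to slightly reduce over-fitting compared with non-overlapping pooling.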

FIGURE 31. FCN Network [95].

B. FULLY CONVOLUTIONAL NETWORK


Fully convolutional networks (FCNs) for semantic image segmentation are an extension of AlexNet, VGGNet, and GoogLeNet [95]. Multi-convolution, deconvolution, and fusion are the three steps that define the network. The flow chart is shown in Figure 31. The fully connected layers are substituted with convolutional layers, with the specification that each image's score be computed using a 1 × 1 convolution. Because of pooling, the output image from the convolutional layers is smaller than the input image. The deconvolutional process is used to restore the image. It uses the same methods as the convolutional process, but cushions the framework (by padding the matrix) and joins the elements inside a deconvolution filter to increase the input size. The process of recovering the original image through deconvolution has some side effects; for example, some details are lost as a result of the dilution of class scores. To circumvent the side effects, the skip architecture combines semantic information obtained from deep layers with location details obtained from earlier, shallow layers: by element-wise addition, the up-sampled deep layer is fused with the output of a shallow layer.

C. UNET
The building blocks of Unet are the convolutional and deconvolutional layers. The network works well with small images; hence the paramount step is the downsizing of input images [96]. The convolutional layers use filters of size 3 × 3, whose outputs are subsequently subjected to ReLU processing, followed by max pooling (which uses a stride of two). Max pooling generates downsized outputs. The number of feature channels in the convolutional layers doubles at each step. The deconvolutional layer does upsampling, but a 2 × 2 convolution is used to limit the number of features to the required standard. The network generates the segmentation result by applying a 1 × 1 convolution on the feature map and labeling the pixels. The interconnection of layers in Unet is shown in Figure 32.

FIGURE 32. Unet [89].

D. SegNet
The network is composed of two subnetworks, namely the encoder and decoder networks. The encoder network's mandate is the downsizing of feature maps. It consists of a varying number of convolutions and subsequent max pooling operations for feature extraction [97]. However, the features produced have vague or ambiguous spatial information. The issue is solved by saving the pooling indices, which are used later in the decoder network's up-sampling procedure. Convolutions map low-resolution features to high-resolution features in the decoder network. A 2 × 2 low-resolution feature, for example, is up-sampled to a 4 × 4 matrix. This process may result in the loss of spatial information; therefore, reusing the pooling indices from the encoder network recovers the lost information. The SegNet network is depicted in Figure 33.

E. DeepLab
It is a variant of FCN that employs dilated convolution to broaden the scope of filters to include image context in a larger neighborhood while also allowing for flexibility over feature response resolution [17]. DeepLab uses Atrous


FIGURE 33. SegNe [95].

FIGURE 35. Residual block [95].

FIGURE 34. Residual Net [89].


Spatial Pyramid Pooling (ASPP) for up-sampling. Several atrous convolutions operating on the same kernel but with various sampling rates are used in the scheme. An additional operator combines the outputs of all the convolutions. Down-sampling processes and subsequent max-pooling operators make segmentation results lose some fine details. To solve this problem, conditional random fields (CRFs) are employed to improve the spatial localization of the segmentation. CRF models contribute to a smooth segmentation process based on the underlying image intensities [98]. They boost the accuracy score by 1% to 2%.

F. ResNet
The residual network is well recognized for its 152-layer depth and its introduction of the residual block [99]. The residual block is presented in Figure 35. In traditional neural networks, the greater the number of layers, the better the performance of the network is expected to be. However, because of the vanishing gradient problem, the weights of the first layers are not updated correctly by the backpropagation algorithm [100]. As the error gradient is propagated to earlier layers, it goes through a repeated multiplication process, so the gradient becomes very small; the network performance saturates and then starts to decrease. This problem is solved by using the identity function, whereby the gradient is multiplied by one so as to preserve the input and avoid any loss of information. The network is made up of the following components: 3 × 3 filters, CNN downsampling layers with a stride of 2, global average pooling, and a 1000-way fully connected layer with softmax at the end. ResNet employs a skip connection, which means that the original input is also added to the convolution block's output. This helps solve the vanishing gradient problem by allowing the gradient to flow along an alternative path. The network diagram of the residual network is shown in Figure 34.

G. APPLICATION OF DEEP LEARNING TECHNIQUES
New emerging technologies such as deep learning have gained ground in the remote sensing science fraternity, because the automatic processing of images previously depended chiefly on human expert knowledge; this has changed the way land surveys are done [101]. The main advantage of deep learning approaches is the automatic computational extraction of features, unlike other machine learning algorithms, where feature extraction is typically manual [102]. The strength of deep learning algorithms lies in learning from examples. The learning process consists of a number of steps: first, an architecture of a network of nodes is clearly defined. The nodes that form an Artificial Neural Network (ANN) are arranged into layers. An ANN with many layers is referred to as a Deep Neural Network (DNN). The behaviour of the DNN is determined by the type and number of nodes as well as the connections between the nodes [101]. If an existing DNN is to be customized for a new application context, its weights are recursively updated to achieve the new desired response. This process is referred to as ‘‘transfer learning’’. Deep learning was originally used for locating and classifying different tree species in a mosaic built from UAV-acquired images [103], [104]. [105] devised a deep learning technique to detect and identify tree species. The objective of the study was to classify patches corresponding to tree species. The authors developed a Deep Learning (DL) architecture, a hybrid of ResNet and UNet, to obtain a semantic segmentation algorithm for tree species that is precise and efficient. Seven orthomosaic images were collected using a UAV in the winter, and one orthomosaic image was collected using a UAV in the summer. The algorithm pipeline is presented in Figure 36. The first step of the technique identified the classes corresponding to each mosaic patch. The focus was on classifying the pixels in each mosaic patch. The incorporation of the ResNet architecture into the DL network enhanced the accuracy and efficiency of classifying forest images [104], [106]. Images were divided into patches according to the prescribed annotations, and each patch was assigned to a list corresponding to the classes that matched it. Patches could belong to more than one class, resulting in patches being labelled repeatedly. Because of this repeated labelling of patches, the algorithm is referred to as a Multi-label Patch (MLP) based classifier. The ResNet architecture went through the training

VOLUME 10, 2022 45307


C. Kwenda et al.: Machine Learning Methods for Forest Image Analysis and Classification

FIGURE 36. Algorithm pipeline [106].

phase so that it would be able to classify the patches. The MLP classification algorithm produced coarsely segmented images. A watershed segmentation algorithm was applied to refine the segmentation process. The UNet architecture, originally used for medical image segmentation [96], is also very useful for remote sensing images. The UNet architecture was trained with data and pixel-wise annotation patches. The segmentation process follows a number of steps: (1) mosaic images were split into patches for processing, (2) a UNet model was trained to predict patch segmentation, and (3) patch joining was used to obtain semantic segmentation for the entire mosaic image. The model achieved an effective learning transfer with a 12.48% improvement over random weights. Overall, the model reached a higher accuracy of nearly 95%.

Another study [104] proposed a Residual Neural Network (ResNet) architecture for classifying tree species acquired using a camera mounted on a UAV platform. In temperate forests, UAV images have been successfully used to distinguish between living and dead forest species [107]. The motivation of the study was that most of the existing methods for tree species classification are cost-sensitive, because they require very large data sets and are restricted to specific tree species [108]. The study proposed a model based on a CNN to classify tree species at an individual level by analysing high resolution RGB images obtained from the UAV. A CNN was chosen in the study because of its ability to learn highly descriptive features from tree canopies. The study proposed a CNN model with 50 convolutional layers, referred to as ResNet50. Figure 38 shows the architecture of ResNet50. The procedure for performing tree crown delineation was based on an iterative local maxima filtering technique that was used to identify probable tree tops. Tree tops were designated as markers, hence a marker-controlled watershed segmentation was performed as a means of complementing the DSM for segmenting the tree crowns. Figure 37 shows a tree crown segmented polygon. The tree crown delineation process enables tree crown identification labelling. In the training phase, images were shuffled in unison with their corresponding labels to randomise the input data so that the neural network becomes generalised. The model achieved an overall classification accuracy of 80%. The study concluded


that classification accuracy increases with an increase in the number of training images.

FIGURE 37. Tree crown delineation [104].

FIGURE 38. CNN model architecture [104].

The task of classifying and mapping vegetation images has been difficult because the conventional methods employed are highly labour-intensive. Deep learning and CNNs came as solutions to the problems posed by traditional methods, but they are still not efficient in detecting ambiguous objects [109]. There is little research that employs CNNs to detect and classify vegetation in remote sensing images [109]. A study by Guirado [110] successfully used a CNN to detect wild shrubs from Google Earth images. The author demonstrated that a CNN is much better than traditional object detection methods. Another study [109] used a deep learning model and the chopped picture method to detect vegetation from Google Earth images. The study was carried out against the backdrop that existing work still faces huge challenges in classifying vegetation that has ambiguous and amorphous shapes, such as clonal plants. The training data was prepared using the chopped picture method, and images were put into two sets: one set with images completely covered with bamboo trees and the other set without bamboo trees. Images were then chopped into small squares and subsequently used as training images. A classical deep learning model in the form of a LeNet network was employed by the study because it is efficient in processing small-sized images. The network is composed of two convolution layers, two pooling layers, and one fully connected layer. The final layer was used to detect bamboo coverage in Google Earth images. Input images were randomly shuffled to alleviate overlapped training and validation data. 72% of the data was used for training and 25% of the data for testing. The model achieved an average classification accuracy of 97.52%.

VIII. FEATURE EXTRACTION TECHNIQUES
This section delves into the main techniques for feature extraction, which include (1) Principal Component Analysis (PCA); (2) Independent Component Analysis (ICA); (3) Linear Discriminant Analysis (LDA); (4) Locally Linear Embedding (LLE); and (5) t-Distributed Stochastic Neighbor Embedding (t-SNE).

1) PRINCIPAL COMPONENT ANALYSIS (PCA)
PCA is popularly used as a dimensionality reduction technique [111]. It was first proposed by [54]. From the original data input, the PCA method tries combinations of input features in order to determine the best features that summarise the original data. This is accomplished by looking at pair-wise distances to maximize variance and minimize reconstruction error [112]. Since PCA is an unsupervised learning algorithm, it leads to misclassification of data in some cases [111]. Distortion errors arise when the data is reconstructed, because the samples will have been projected onto a subspace [113].

2) INDEPENDENT COMPONENT ANALYSIS (ICA)
ICA, like PCA, is a linear dimensionality reduction method that combines discrete components to produce the input data, with the goal of correctly identifying each of them [111]. It is based on the principle that two features are deemed independent if their linear and nonlinear dependence are both zero [114]. Independent Component Analyses are extensively used in medical applications such as Electroencephalography (EEG) and Functional Magnetic Resonance Imaging (FMRI) analysis to differentiate useful from unhelpful signals [111].

3) LINEAR DISCRIMINANT ANALYSIS (LDA)
LDA is a supervised learning dimensionality reduction technique and a machine learning classifier [111]. The method is similar to PCA in the sense that it calculates the projection of the data along a direction; however, instead of maximising the variance of the data, LDA uses label information to determine a projection by maximising the ratio of between-class variance to within-class variance [113]. The goal of LDA is formulated as the Fisher criterion [115]:

J(u) := (u^T S_B u) / (u^T S_W u)   (14)

Recently, this technique has been used in indoor positioning or localisation systems for the purpose of obtaining superior and higher accuracy [116]. The performance of LDA in the construction of data using independent variables is directly proportional to the number of data patterns [116]. However, its performance is yet to be confirmed in the context of non-linearity [117].
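The contrast drawn above between unsupervised PCA and supervised LDA can be made concrete with a short sketch. The toy data and variable names below are hypothetical (NumPy only); the closed-form direction S_W^{-1}(m1 − m0) used for LDA is the standard two-class maximiser of the Fisher criterion of Eq. (14), not a method taken from the surveyed papers.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical two-class toy data (e.g. 2-D spectral features of two tree species).
X0 = rng.standard_normal((100, 2))                         # class 0
X1 = rng.standard_normal((100, 2)) + np.array([3.0, 3.0])  # class 1, shifted
X = np.vstack([X0, X1])

# PCA direction: top eigenvector of the label-free covariance matrix,
# i.e. the direction of maximum projected variance.
Xc = X - X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
pca_dir = eigvecs[:, np.argmax(eigvals)]

# LDA direction: maximises J(u) = (u' S_B u) / (u' S_W u) from Eq. (14);
# for two classes the maximiser is S_W^{-1} (m1 - m0).
m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
Sw = np.cov(X0, rowvar=False) + np.cov(X1, rowvar=False)   # within-class scatter
lda_dir = np.linalg.solve(Sw, m1 - m0)
lda_dir /= np.linalg.norm(lda_dir)

def fisher_J(u):
    """Projected between-class separation over within-class scatter."""
    d = u @ (m1 - m0)
    return (d * d) / (u @ Sw @ u)

j_pca, j_lda = fisher_J(pca_dir), fisher_J(lda_dir)
```

Because the LDA direction is the exact maximiser of the Fisher criterion, its score `j_lda` is never below the score of the label-blind PCA direction, which is the practical difference between the two projections.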


4) LOCALLY LINEAR EMBEDDING (LLE)
LLE is built on a foundation of manifold learning. A manifold is a D-dimensional object that is embedded in a higher-dimensional space. A manifold is considered as an integration of small linear patches, which is done through piece-wise linear regression [118]. To do the integration, [119] proposed the construction of a kNN graph similar to an isomap. Then every data sample is represented by a weighted summation of its k nearest neighbors. Considering W_i to be row i of the n × k weight matrix W, the solution to this goal is found by:

W_i = (G_i^{-1} 1) / (1^T G_i^{-1} 1)   (15)
G_i := (x_i 1^T − V_i)^T (x_i 1^T − V_i)   (16)

where G_i is called a Gram matrix and V_i is an n × k matrix. After representing the samples as weighted summations of their neighbors, LLE represents the samples in the lower-dimensional space by their neighbors with the same obtained weights. The method has been successfully used in feature extraction of Motor Imagery Electroencephalography (MI-EEG), where it outperformed methods such as the Discrete Wavelet Transform (DWT) in classification accuracy with fewer feature dimensions [120].

5) T-DISTRIBUTED STOCHASTIC NEIGHBOR EMBEDDING (t-SNE)
t-SNE is an improvement of Stochastic Neighbor Embedding (SNE) [121], which is used for data visualisation. The main goal is to preserve the joint distribution of data samples in the original and embedding spaces. Letting p_ij and q_ij denote the probabilities that x_i and x_j are neighbors and that y_i and y_j are neighbors, respectively, it follows that:

p_ij = (p_{j|i} + p_{i|j}) / (2n)   (17)
p_{j|i} = exp(−||x_i − x_j||² / 2σ_i²) / Σ_{k≠i} exp(−||x_i − x_k||² / 2σ_i²)   (18)
q_ij = (1 + ||y_i − y_j||²)^{−1} / Σ_{k≠l} (1 + ||y_k − y_l||²)^{−1}   (19)

The embedded samples are then obtained by gradient descent, minimizing the Kullback-Leibler divergence [122] of the p and q distributions. The main advantage of t-SNE is its ability to deal with the problem of visualising ‘‘crowded’’ high-dimensional data in a low-dimensional space (e.g., 2D or 3D) [122], [123].

A. FEATURE EXTRACTION STATE OF THE ART
In image retrieval, calibration, classification, and clustering, it is critical to extract useful features or characteristics from the image [124]. The color histogram is the most significant method to represent color features [125]. [126] provided a state-of-the-art feature extraction model that consists of two parts: (a) adaptive color region extraction via the definition circle (DC) model, and (b) corner feature extraction via the edge detection model, which includes a suppression mechanism.

The purpose of the algorithm was to produce a clear and precise forest saliency map. The algorithm is broken down into three parts, and those are: (a) the color feature extraction part; (b) the determination of the center of the DC model; and (c) an accurate description of color. The algorithm is expressed in Figure 41.

(A) Colour Feature Extraction
The model appropriate for the extraction of color features is the DC model, which comprises the following steps: (1) using the histogram of the RGB picture's G channel to calculate the DC model's center; (2) mapping the image to the HSI or Lab color space; (3) using the k-means procedure to find the DC model's radius. The flow chart of the DC model is shown in Figure 41.

(1) Determine the center of the DC model
While the DC model can describe color fluctuations under specific gradients, the forest region's dominating hue is generally green, implying that the 'greenish' pixels in the forest area must be filtered off. As a result, the G channel (green) in the RGB three-channel system is the focal point for filtering out pixels that fall within a given range and calculating the mean value within that range. That value is regarded as the center of the circle.

(2) Color description
It is critical to note that the purity of the green is determined by the circle's center; thus the radius must be adjusted to account for a variety of color variations and fault tolerance. The RGB channel, on the other hand, does not function well for color adjustments. The RGB color system is converted to the Hue, Saturation, and Intensity (HSI) or Lab color space to fix the problem. The color can be defined more correctly using only two channels, namely hue and saturation, rather than the RGB color space.

(3) Adjustment of the DC model radius
To improve the accuracy and adaptability of forest region extraction, the center and the entire remote sensing picture acquired in the first phase are mapped or converted to the HSI color space. Each pixel's Euclidean distance to the RSI center is calculated. The k-means clustering algorithm subdivides the forest into clusters and determines the Euclidean distance between each cluster center and the DC model's center, which is then used as the DC model's radius.

h = [h, s, i]   (20)
R = (h − h_0)² + (s − s_0)² + (i − i_0)²   (21)
δ(i) = Σ_{k=1}^{K} |U_k^i − U_k^{i−1}|   (22)

P denotes the center of the DC model, whose value is obtained by the histogram model in the RGB-to-HSI color scheme; R is the Euclidean distance; and δ(i) represents


the change between successive iterations of the clustering algorithm. Figure 39 shows the color extraction feature of the DC model.

FIGURE 39. DC model in color extraction feature [126].

FIGURE 41. Flow diagram of the algorithm [126].

(B) Edge Feature Extraction
The goal of this procedure is to successfully eliminate non-forest areas. [126] proposed the Canny operator as the edge detection operator because of its better performance than other operators in terms of edge feature detection. In particular, denoising is key for image processing, and in this particular instance a Gaussian filter was employed to smoothen the image while preserving the edges. The amplitude and direction of the gradient are then calculated using the finite difference of the first-order derivative. The Canny edge detector returns only the maximum values and uses the non-maximum suppression operation to suppress the field's conspicuous points, resulting in corner points with high precision and clear vision. Finally, by using a dual-threshold setting, discrete edges are linked together to form a continuous edge. Figure 40 shows the stages of an edge feature extractor.

FIGURE 40. Edge feature extractor.

IX. PERFORMANCE EVALUATION METRICS
The major metrics used to measure the performance of a model in forest image classification are: False Positive Rate (FPR), Accuracy (Acc), F1 score, the Precision-Recall Curve, and Average Precision (AP).

F1 = 2 · (Precision × Recall) / (Precision + Recall)   (23)
FPR = (Number of misclassified forest images / Number of images) × 100%   (24)
Acc = (Number of correctly classified images / Number of images) × 100%   (25)
AP = area under the precision-recall curve (PRC)   (26)

Measurements for image segmentation area evaluation are presented in Table 2. The Area Fitness Index (AFI) was proposed by [63] and the remaining measurements by [2]. The average distance between a reference object and its matching image object is described by the Position Discrepancy Index (PDI). The Overall PDI is the average of the PDI:

let a = (X(k) − X_r)² + (Y(k) − Y_r)²   (27)
let b = Σ_{i=1}^{M} √((X(i) − X_r)² + (Y(i) − Y_r)²)   (28)
PDI = (1/(N + M)) · (Σ_{k=1}^{N} √a + b)   (29)
PDI_Overall = (1/n) Σ_{i=1}^{n} PDI(i)   (30)

X. PERFORMANCE ANALYSIS OF THE STATE OF THE ART
Results are based on a CNN with hyperparameter settings of patch size L = 15, regularization strength α = 0.001, and


C = 32 filter kernels in the first convolutional layer, up to a maximum of C′ = 128 kernels. Using TensorFlow and Keras, the final CNN classifiers used the hyperspectral imagery to outperform the RGB subset image, as indicated by precision, recall, and F-score. The results are presented in Table 3.

Table 4 displays state-of-the-art segmentation results obtained using a supervised segmentation method and the following measurements: AFI, OE, OE_Overall, CE, CE_Overall, ADI, and PDI. Object fate analysis and the method proposed by [63] do not objectively express segmentation quality results. Table 5 indicates that AFI ranges from 0.561 to −0.280 when shape and compactness are both at 0.1 and the scale parameter is changed from 60 to 120.

TABLE 2. Evaluation Matrix.

TABLE 3. Segmentation results based on PDI and ADI.

TABLE 4. Edge feature extractor.

TABLE 5. Performance of CNN.

XI. RECOMMENDATION
Pixel-based techniques have been commonly used for image analysis and classification for a very long time. However, due to the massive growth of high spatial resolution imagery and the fact that pixel-based methods work only with spectral information, the technique could not be fully utilized, because it does not incorporate spatial, texture, and shape information [127]. Previous studies have also shown that such approaches cause noise in the output map, otherwise known as the ‘‘salt and pepper effect’’ [128]. Due to the limitations of traditional pixel-based methods in coping with high-resolution imagery, OBIA methods have become increasingly popular because they have a high degree of information utilization, strong anti-interference, a high degree of data integration, and high classification accuracy [129], [130]. However, GEOBIA techniques are made up of knowledge and rules drawn purely from domain expert knowledge, which increases the subjectivity of image interpretation processes. Given the evolution of remote sensing science as a result of artificial intelligence, this study suggests that we pay more attention to Good Old-Fashioned Artificial Intelligence (GOFAI), which is based on sound mathematics and logic to construct symbolic representations of abstract notions [1]. This research highly recommends a shift towards remote sensing image analysis with ontologies, because such technology allows the management, aggregation, and sharing of the knowledge of remote sensing and domain experts. Formal ontologies explicitly define the expert knowledge that is used to interpret remote sensing images. This improves the sharing and reuse of formalized remote sensing expert knowledge.
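To make the recommendation concrete, the sketch below encodes one explicit, machine-readable definition of the concept ‘‘forest’’ and applies it to segmented-region attributes. All attribute names and thresholds are illustrative (FAO-style criteria), not a prescription from the surveyed literature; in practice such knowledge would be expressed in OWL and evaluated by a reasoner rather than hand-coded in Python.

```python
# A minimal sketch of knowledge-driven classification: an explicit,
# shareable definition of "Forest" applied to segmented image regions.
# Attribute names and thresholds are hypothetical, FAO-style criteria.

FOREST_DEFINITION = {
    "min_canopy_cover": 0.10,   # fraction of ground covered by tree crowns
    "min_area_ha": 0.5,         # minimum mapped area in hectares
    "min_tree_height_m": 5.0,   # minimum tree height in metres
}

def classify_region(region: dict, definition: dict = FOREST_DEFINITION) -> str:
    """Apply the symbolic definition to a region's measured attributes."""
    is_forest = (
        region["canopy_cover"] >= definition["min_canopy_cover"]
        and region["area_ha"] >= definition["min_area_ha"]
        and region["tree_height_m"] >= definition["min_tree_height_m"]
    )
    return "Forest" if is_forest else "NonForest"

regions = [
    {"id": 1, "canopy_cover": 0.65, "area_ha": 2.0, "tree_height_m": 12.0},
    {"id": 2, "canopy_cover": 0.05, "area_ha": 3.0, "tree_height_m": 2.0},
]
labels = {r["id"]: classify_region(r) for r in regions}
```

Because the definition is a separate, declarative artifact rather than weights buried in a model, it can be inspected, shared, and swapped for another community's forest definition, which is precisely the reuse benefit argued for above.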


XII. CONCLUSION
This paper is a critical and analytical survey of methods for forest image detection and classification. It is a comprehensive review of the techniques used to detect objects of interest in an image that will be analysed for the classification of forests. These techniques cover semantic segmentation techniques, feature extraction methods, and finally classification techniques. Knowledge-based approaches in the form of GEOBIA were analysed, along with how their shortcomings in terms of the dual mode of defining geographic concepts, the vagueness and ambiguity of geographic concepts, and semantic gaps were addressed by ontology-based knowledge approaches. The performance of the state-of-the-art TensorFlow and Keras for image classification was analysed. Formal ontology knowledge representation was recommended as the state-of-the-art approach for detecting objects of interest. CNN methods for semantic segmentation were critically analysed, namely AlexNet, VGGNet, GoogLeNet, FCN, UNet, SegNet, DeepNet and ResNet.

REFERENCES
[1] D. Arvor, M. Belgiu, Z. Falomir, I. Mougenot, and L. Durieux, ‘‘Ontologies to interpret remote sensing images: Why do we need them?’’ GIScience Remote Sens., vol. 56, no. 6, pp. 911–939, Aug. 2019.
[2] J. Cheng, Y. Bo, Y. Zhu, and X. Ji, ‘‘A novel method for assessing the segmentation quality of high-spatial resolution remote-sensing images,’’ Int. J. Remote Sens., vol. 35, no. 10, pp. 3816–3839, May 2014.
[3] G. Cheng and J. Han, ‘‘A survey on object detection in optical remote sensing images,’’ ISPRS J. Photogramm. Remote Sens., vol. 177, pp. 11–28, Jul. 2016.
[4] R. M. Dufour, E. L. Miller, and N. P. Galatsanos, ‘‘Template matching based object recognition with unknown geometric parameters,’’ IEEE Trans. Image Process., vol. 11, no. 12, pp. 1385–1396, Dec. 2002.
[5] K. Bahareh, M. Shattri, S. Helmi, and H. Alfian, ‘‘Integration of template matching and object-based image analysis for semi-automatic oil palm tree counting in UAV images,’’ in Proc. 37th Asian Conf. Remote Sens. (ACRS), vol. 3, 2016, pp. 2333–2340.
[6] I. A. Aljarrah and A. S. Ghorab, ‘‘Object recognition system using template matching based on signature and principal component analysis,’’ Int. J. Digit. Inf. Wireless Commun., vol. 2, no. 2, pp. 156–163, 2012.
[7] I. Jordi, V. Aurthur, M. Arias, B. Tardy, D. Morin, and I. Rodes, ‘‘Operational high resolution land cover map production at the country scale using satellite image time series,’’ Remote Sens., vol. 9, no. 1, p. 95, 2017.
[8] M. Cristina, A. Picoli, G. Camara, I. Sanches, R. Simões, A. Carvalho, A. Maciel, A. Coutinho, J. Esquerdo, J. Antunes, R. Anzolin, D. Arvor, and C. Almeida, ‘‘Big earth observation time series analysis for monitoring Brazilian agriculture,’’ ISPRS J. Photogramm. Remote Sens., vol. 145, pp. 328–339, Nov. 2018.
[9] M. Papadomanolaki, M. Vakalopoulou, S. Zagoruyko, and K. Karantzalos, ‘‘Benchmarking deep learning frameworks for the classification of very high resolution satellite multispectral data,’’ ISPRS Ann. Photogramm., Remote Sens. Spatial Inf. Sci., vol. 7, pp. 83–88, Jun. 2016.
[10] J. Campbell, Introduction to Remote Sensing, 3rd ed. New York, NY, USA: Guilford Press, 2002.
[11] G. Marcus, ‘‘Deep learning: A critical appraisal,’’ 2018, arXiv:1801.00631.
[12] A. L. Ali, Z. Falomir, F. Schmid, and C. Freksa, ‘‘Rule-guided human classification of volunteered geographic information,’’ ISPRS J. Photogramm. Remote Sens., vol. 127, pp. 3–15, May 2017, doi: 10.1016/j.isprsjprs.2016.06.003.
[13] T. R. Martha, N. Kerle, C. J. V. Westen, V. Jetten, and K. V. Kumar, ‘‘Segment optimization and data-driven thresholding for knowledge-based landslide detection by object-based image analysis,’’ IEEE Trans. Geosci. Remote Sens., vol. 49, no. 12, pp. 4928–4943, Dec. 2011.
[14] D. Chaudhuri, N. K. Kushwaha, and A. Samal, ‘‘Semi-automated road detection from high resolution satellite images by directional morphological enhancement and segmentation techniques,’’ IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 5, no. 5, pp. 1538–1544, Oct. 2012.
[15] A. H. S. Solberg, ‘‘Contextual data fusion applied to forest map revision,’’ IEEE Trans. Geosci. Remote Sens., vol. 37, no. 3, pp. 1234–1243, May 1999.
[16] A. Hanif, A. B. Mansoor, and A. S. Imran, Performance Analysis of Vehicle Detection Techniques: A Concise Survey, vol. 746. Cham, Switzerland: Springer, 2018, doi: 10.1007/978-3-319-77712-2_46.
[17] G. Chen, X. Zhang, Q. Wang, F. Dai, Y. Gong, and K. Zhu, ‘‘Symmetrical dense-shortcut deep fully convolutional networks for semantic segmentation of very-high-resolution remote sensing images,’’ IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 11, no. 5, pp. 1633–1644, May 2018.
[18] R. Mohan and R. Nevatia, ‘‘Using perceptual organization to extract 3D structures,’’ IEEE Trans. Pattern Anal. Mach. Intell., vol. 11, no. 11, pp. 1121–1139, Nov. 1989.
[19] H. G. Akcay and S. Aksoy, ‘‘Building detection using directional spatial constraints,’’ in Proc. IEEE Int. Geosci. Remote Sens. Symp., Jul. 2010, pp. 1932–1935.
[20] J. Peng and Y. Liu, ‘‘Model and context-driven building extraction in dense urban aerial images,’’ Int. J. Remote Sens., vol. 26, no. 7, pp. 1289–1307, 2005.
[21] A. O. Ok, C. Senaras, and B. Yuksel, ‘‘Automated detection of arbitrarily shaped buildings in complex environments from monocular VHR optical satellite imagery,’’ IEEE Trans. Geosci. Remote Sens., vol. 51, no. 3, pp. 1701–1717, Mar. 2013.
[22] Y.-T. Liow and T. Pavlidis, ‘‘Use of shadows for extracting buildings in aerial images,’’ Comput. Vis., Graph., Image Process., vol. 49, no. 2, pp. 242–277, Feb. 1990.
[23] X. Zhang, X. Feng, P. Xiao, G. He, and L. Zhu, ‘‘Segmentation quality evaluation using region-based precision and recall measures for remote sensing images,’’ ISPRS J. Photogramm. Remote Sens., vol. 102, pp. 73–84, Apr. 2015.
[24] T. Su and S. Zhang, ‘‘Object-based crop classification in hetao plain using random forest,’’ Earth Sci. Informat., vol. 14, no. 1, pp. 119–131, Mar. 2021.
[25] R. Unnikrishnan, C. Pantofaru, and M. Hebert, ‘‘Toward objective evaluation of image segmentation algorithms,’’ IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, no. 6, pp. 929–944, Jun. 2007.
[26] Y. Wang, Q. Qi, and Y. Liu, ‘‘Unsupervised segmentation evaluation using area-weighted variance and Jeffries-Matusita distance for remote sensing images,’’ Remote Sens., vol. 10, no. 8, p. 1193, Jul. 2018.
[27] H. Gao, Y. Tang, L. Jing, H. Li, and H. Ding, ‘‘A novel unsupervised segmentation quality evaluation method for remote sensing images,’’ Sensors, vol. 17, no. 10, p. 2427, Oct. 2017.
[28] G. Modica, G. De Luca, G. Messina, and S. Praticò, ‘‘Comparison and assessment of different object-based classifications using machine learning algorithms and UAVs multispectral imagery: A case study in a citrus orchard and an onion crop,’’ Eur. J. Remote Sens., vol. 54, no. 1, pp. 431–460, Jan. 2021.
[29] G. J. Hay and G. Castilla, ‘‘Geographic object-based image analysis (GEOBIA): A new name for a new discipline,’’ in Object-Based Image Analysis. Berlin, Germany: Springer, 2008, pp. 75–89.
[30] J. Krishnaswamy, M. C. Kiran, and K. N. Ganeshaiah, ‘‘Tree model based eco-climatic vegetation classification and fuzzy mapping in diverse tropical deciduous ecosystems using multi-season NDVI,’’ Int. J. Remote Sens., vol. 25, no. 6, pp. 1185–1205, Mar. 2004.
[31] K. J. Feeley, T. W. Gillespie, and J. W. Terborgh, ‘‘The utility of spectral indices from Landsat ETM+ for measuring the structure and composition of tropical dry forests,’’ Biotropica: J. Biol. Conservation, vol. 37, no. 4, pp. 508–519, 2005.
[32] G. A. Sanchez-Azofeifa, K. L. Castro, B. Rivard, M. R. Kalascka, and R. C. Harriss, ‘‘Remote sensing research priorities in tropical dry forest environments,’’ Biotropica, vol. 35, no. 2, pp. 134–142, Jun. 2003.
[33] S. Martinuzzi, W. A. Gould, O. M. Ramos González, A. Martínez Robles, P. Calle Maldonado, N. Pérez Buitrago, and J. J. Fumero Cabán, ‘‘Mapping tropical dry forest habitats integrating landsat NDVI, ikonos imagery, and topographic information in the Caribbean island of Mona,’’ Revista de Biología Tropical, pp. 625–639, Nov. 2006.
[34] B. Bennett, ‘‘What is a forest? On the vagueness of certain geographic concepts,’’ in Proc. TOPOI, 2002, pp. 2–17.


[35] B. Bennett, ‘‘Foundations for an ontology of environment and habitat,’’ in Proc. FOIS, 2010, pp. 31–44.
[36] A. Mayamba, R. M. Byamungu, B. V. Broecke, H. Leirs, P. Hieronimo, A. Nakiyemba, M. Isabirye, D. Kifumba, D. N. Kimaro, M. E. Mdangi, and L. S. Mulungu, ‘‘Factors influencing the distribution and abundance of small rodent pest species in agricultural landscapes in eastern Uganda,’’ J. Vertebrate Biol., vol. 69, no. 2, Oct. 2020, Art. no. 020002.
[37] H. G. Lund, ‘‘When is a forest not a forest?’’ J. Forestry, vol. 100, no. 8, pp. 21–28, 2002.
[38] E. Romijn, J. H. Ainembabazi, A. Wijaya, M. Herold, A. Angelsen, L. Verchot, and D. Murdiyarso, ‘‘Exploring different forest definitions and their impact on developing REDD+ reference emission levels: A case study for Indonesia,’’ Environ. Sci. Policy, vol. 33, pp. 246–259, Nov. 2013.
[39] C. E. Woodcock and A. H. Strahler, ‘‘The factor of scale in remote sensing,’’ Remote Sens. Environ., vol. 21, no. 3, pp. 311–332, Apr. 1987.
[40] A. W. M. Smeulders, M. Worring, S. Santini, A. Gupta, and R. Jain, ‘‘Content-based image retrieval at the end of the early years,’’ IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 12, pp. 1349–1380, Dec. 2000.
[41] C. Unger and P. Cimiano, ‘‘Pythia: Compositional meaning construction for ontology-based question answering on the semantic web,’’ in Lecture Notes in Computer Science, vol. 6716. Germany: Bielefeld Univ., 2011, pp. 153–160.
[42] R. H. Kilmann and K. W. Thomas, ‘‘Developing a forced-choice measure of conflict-handling behavior: The ‘MODE’ instrument,’’ Educ. Psychol. Meas., vol. 37, no. 2, pp. 309–325, 1977.
[43] B. Bachimont, A. Isaac, and R. Troncy, ‘‘Semantic commitment for designing ontologies: A proposal,’’ in Proc. Int. Conf. Knowl. Eng. Knowl. Manage. Berlin, Germany: Springer, 2002, pp. 114–121.
[44] E. F. Fama and M. C. Jensen, ‘‘Separation of ownership and control,’’ J. Law Econ., vol. 26, no. 2, pp. 301–325, 1983.
[45] K. Satoh, Nonmonotonic Reasoning by Minimal Belief Revision. Tokyo, Japan: ICOT Research Center (Institute for New Generation Computer Technology), 1988.
[46] T. Gruber, ‘‘What is an ontology,’’ Stanford Univ., Stanford, CA, USA, Tech. Rep. KSL92-71, 1993.
[47] S. Andrés, D. Arvor, I. Mougenot, T. Libourel, and L. Durieux, ‘‘Ontology-based classification of remote sensing images using spectral rules,’’ Comput. Geosci., vol. 102, pp. 158–166, May 2017.
[48] D. Mallenby, ‘‘Handling vagueness in ontologies of geographical information,’’ Ph.D. dissertation, School Comput., Univ. Leeds, Leeds, U.K., 2008. [Online]. Available: http://etheses.whiterose.ac.uk/1373/
[59] B. Gajderowicz and A. Sadeghian, ‘‘Ontology granulation through inductive decision trees,’’ in Proc. URSW, 2009, pp. 39–50.
[60] N. Kartha and A. Novstrup, ‘‘Ontology and rule based knowledge representation for situation management and decision support,’’ Proc. SPIE, vol. 7352, May 2009, Art. no. 73520P.
[61] J. C. Giarratano and G. D. Riley, Expert Systems: Principles and Programming. Pacific Grove, CA, USA: Brooks/Cole, 2005.
[62] D. A. Waterman, D. B. Lenat, and F. Hayes-Roth, Building Expert Systems. Reading, MA, USA: Addison-Wesley, 1983.
[63] P. He, ‘‘Counter cyber attacks by semantic networks,’’ in Emerging Trends in ICT Security. Amsterdam, The Netherlands: Elsevier, 2014, pp. 455–467.
[64] G. De Giacomo, D. Lembo, M. Lenzerini, A. Poggi, and R. Rosati, ‘‘Using ontologies for semantic data integration,’’ in A Comprehensive Guide Through Italian Database Res. Over Last 25 Years. Cham, Switzerland: Springer, 2018, pp. 187–202.
[65] F. Baader, The Description Logic Handbook: Theory, Implementation and Applications. Cambridge, U.K.: Cambridge Univ. Press, 2003.
[66] B. C. Grau, I. Horrocks, B. Motik, B. Parsia, P. Patel-Schneider, and U. Sattler, ‘‘OWL 2: The next step for OWL,’’ J. Web Semantics, vol. 6, no. 4, pp. 309–322, Nov. 2008.
[67] B. C. Grau, I. Horrocks, Y. Kazakov, and U. Sattler, ‘‘A logical framework for modularity of ontologies,’’ in Proc. IJCAI, 2007, pp. 298–303.
[68] S. Ghilardi, C. Lutz, and F. Wolter, ‘‘Did I damage my ontology,’’ in Proc. KR, 2006, pp. 187–197.
[69] S. Roy and I. J. Cox, ‘‘A maximum-flow formulation of the N-camera stereo correspondence problem,’’ in Proc. 6th Int. Conf. Comput. Vis., Jan. 1998, pp. 492–499.
[70] R. Geerken, B. Zaitchik, and J. P. Evans, ‘‘Classifying rangeland vegetation type and coverage from NDVI time series using Fourier filtered cycle similarity,’’ Int. J. Remote Sens., vol. 26, no. 24, pp. 5535–5554, Dec. 2005.
[71] Y. Guo, S. Han, Y. Li, C. Zhang, and Y. Bai, ‘‘K-nearest neighbor combined with guided filter for hyperspectral image classification,’’ Proc. Comput. Sci., vol. 129, pp. 159–165, Jan. 2018.
[72] G. De Luca, J. M. N. Silva, S. Cerasoli, J. Araújo, J. Campos, S. Di Fazio, and G. Modica, ‘‘Object-based land cover classification of cork oak woodlands using UAV imagery and orfeo toolbox,’’ Remote Sens., vol. 11, no. 10, p. 1238, May 2019.
[73] A. D. P. Pacheco, J. A. D. S. Junior, A. M. Ruiz-Armenteros, and R. F. F. Henriques, ‘‘Assessment of K-nearest neighbor and random forest classifiers for mapping forest fire areas in central Portugal using Landsat-8, Sentinel-2, and Terra imagery,’’ Remote Sens., vol. 13, no. 7, p. 1345,
[49] N. Eric Maillot and M. Thonnat, ‘‘Ontology based complex object recog- Apr. 2021.
nition,’’ Image Vis. Comput., vol. 26, no. 1, pp. 102–113, Jan. 2008. [74] P. T. Noi and M. Kappas, ‘‘Comparison of random forest, K-nearest
[50] C. Eschenbach and M. Grüninger, ‘‘Formal ontology in information neighbor, and support vector machine classifiers for land cover clas-
systems,’’ in Proc. 5th Int. Conf. (FOIS), vol. 110, 2008, pp. 68–71. sification using Sentinel-2 imagery,’’ Sensors, vol. 18, no. 1, p. 18,
[51] M. Davis, S. King, N. Good, and R. Sarvas, ‘‘From context to content:
2018.
Leveraging context to infer media metadata,’’ in Proc. 12th Annu. ACM [75] E. Tomppo, M. Haakana, M. Katila, and J. Peräsaari, Multi-Source
Int. Conf. Multimedia, 2004, pp. 188–195. National Forest Inventory: Methods and Applications, vol. 18. Springer,
[52] F. Nack, C. Dorai, and S. Venkatesh, ‘‘Computational media aesthet-
2008.
ics: Finding meaning beautiful,’’ IEEE MultimediaMag., vol. 8, no. 4,
[76] L. Tlig, M. Bouchouicha, M. Tlig, M. Sayadi, and E. Moreau, ‘‘A fast
pp. 10–12, Oct. 2001.
[53] H. Gu, H. Li, L. Yan, Z. Liu, T. Blaschke, and U. Soergel, ‘‘An object- segmentation method for fire forest images based on multiscale transform
based semantic classification method for high resolution remote sensing and PCA,’’ Sensors, vol. 20, no. 22, p. 6429, Nov. 2020.
imagery using ontology,’’ Remote Sens., vol. 9, no. 4, p. 329, 2017. [77] S. M. De Jong and F. D. Van der Meer, Remote Sensing Image Analysis:
[54] A. Baraldi, V. Puzzolo, P. Blonda, L. Bruzzone, and C. Tarantino, ‘‘Auto- Including the Spatial Domain, vol. 5. Springer, 2007.
matic spectral rule-based preliminary mapping of calibrated Landsat TM [78] D. Kaur and Y. Kaur, ‘‘Various image segmentation techniques: A
and ETM+ images,’’ IEEE Trans. Geosci. Remote Sens., vol. 44, no. 9, review,’’ Int. J. Comput. Sci. Mobile Comput., vol. 3, no. 5, pp. 809–814,
pp. 2563–2586, Sep. 2006. 2014.
[55] A. M. Arifjanov, S. B. Akmalov, T. U. Apakhodjaeva, and [79] Y.-J. Zhang, ‘‘An overview of image and video segmentation in the last
D. S. Tojikhodjaeva, ‘‘Comparison of pixel to pixel and object based 40 years,’’ in Advances in Image and Video Segmentation. Dordrecht, The
image analysis using worldview-2 satellite images of vangiobod village Netherlands: 2006, pp. 1–16.
of syndria province,’’ Remote Methods Earth Res. vol. 26, no. 2, [80] T. Lindeberg and M.-X. Li, ‘‘Segmentation and classification of edges
pp. 313–321, 2020. using minimum description length approximation and complemen-
[56] N. Durand, S. Derivaux, G. Forestier, C. Wemmert, P. Gançarski, tary junction cues,’’ Comput. Vis. Image Understand., vol. 67, no. 1,
O. Boussaid, and A. Puissant, ‘‘Ontology-based object recognition for pp. 88–98, Jul. 1997.
remote sensing image interpretation,’’ in Proc. 19th IEEE Int. Conf. Tools [81] S. Yuheng and Y. Hao, ‘‘Image segmentation algorithms overview,’’ 2017,
Artif. Intell. (ICTAI), vol. 1, Oct. 2007, pp. 472–479. arXiv:1707.02051.
[57] S. R. Phinn, C. M. Roelfsema, and P. J. Mumby, ‘‘Multi-scale, object- [82] N. Senthilkumaran and R. Rajesh, ‘‘Image segmentation—A survey of
based image analysis for mapping geomorphic and ecological zones soft computing approaches,’’ in Proc. Int. Conf. Adv. Recent Technol.
on coral reefs,’’ Int. J. Remote Sens., vol. 33, no. 12, pp. 3768–3797, Commun. Comput. Stockholm, Sweden: KTH (Roy. Inst. Technol.),
Jun. 2012. Oct. 2009, pp. 844–846.
[58] B. Gajderowicz, ‘‘Using decision trees for inductively driven semantic [83] M. K. Kundu and S. K. Pal, ‘‘Thresholding for edge detection using
integration and ontology matching,’’ M.S. thesis, Dept. Comput. Sci., human psychovisual phenomena,’’ Pattern Recognit. Lett., vol. 4, no. 6,
Ryerson Univ., Toronto, ON, Canada, 2011. pp. 433–441, 1986.

45314 VOLUME 10, 2022
C. Kwenda et al.: Machine Learning Methods for Forest Image Analysis and Classification
[84] M. R. Khokher, A. Ghafoor, and A. M. Siddiqui, ‘‘Image segmentation using multilevel graph cuts and graph development using fuzzy rule-based system,’’ IET Image Process., vol. 7, no. 3, pp. 201–211, 2013.
[85] T. Blaschke, C. Burnett, and A. Pekkarinen, ‘‘Image segmentation methods for object-based analysis and classification,’’ in Remote Sensing Image Analysis: Including the Spatial Domain. Dordrecht, The Netherlands: Springer, 2004, pp. 211–236.
[86] T. Lei, X. Jia, Y. Zhang, S. Liu, H. Meng, and A. K. Nandi, ‘‘Superpixel-based fast fuzzy C-means clustering for color image segmentation,’’ IEEE Trans. Fuzzy Syst., vol. 27, no. 9, pp. 1753–1766, Sep. 2019.
[87] P. Neubert and P. Protzel, ‘‘Compact watershed and preemptive SLIC: On improving trade-offs of superpixel segmentation algorithms,’’ in Proc. 22nd Int. Conf. Pattern Recognit., Aug. 2014, pp. 996–1001.
[88] X. Yuan, J. Shi, and L. Gu, ‘‘A review of deep learning methods for semantic segmentation of remote sensing imagery,’’ Expert Syst. Appl., vol. 169, May 2021, Art. no. 114417.
[89] A. Krizhevsky, I. Sutskever, and G. E. Hinton, ‘‘ImageNet classification with deep convolutional neural networks,’’ in Proc. Adv. Neural Inf. Process. Syst., vol. 25, 2012, pp. 1097–1105.
[90] B. Liu, X. Yu, A. Yu, and G. Wan, ‘‘Deep convolutional recurrent neural network with transfer learning for hyperspectral image classification,’’ J. Appl. Remote Sens., vol. 12, no. 2, 2018, Art. no. 026028.
[91] L. A. Gatys, A. S. Ecker, and M. Bethge, ‘‘Image style transfer using convolutional neural networks,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 2414–2423.
[92] O. A. B. Penatti, K. Nogueira, and J. A. dos Santos, ‘‘Do deep features generalize from everyday objects to remote sensing and aerial scenes domains?’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. Workshops (CVPRW), Jun. 2015, pp. 44–51.
[93] M. Volpi and V. Ferrari, ‘‘Semantic segmentation of urban scenes by learning local class interactions,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. Workshops (CVPRW), Jun. 2015, pp. 1–9.
[94] M. Lin and Q. Yan, ‘‘Network in network,’’ in Proc. Int. Conf. Learn. Represent. (ICLR), 2014, pp. 1–4.
[95] E. Shelhamer, J. Long, and T. Darrell, ‘‘Fully convolutional networks for semantic segmentation,’’ IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 4, pp. 640–651, 2016.
[96] O. Ronneberger, P. Fischer, and T. Brox, ‘‘U-Net: Convolutional networks for biomedical image segmentation,’’ in Proc. Int. Conf. Med. Image Comput. Comput.-Assist. Intervent. Springer, 2015, pp. 234–241.
[97] J. E. Ball, D. T. Anderson, and C. S. Chan, ‘‘Comprehensive survey of deep learning in remote sensing: Theories, tools, and challenges for the community,’’ J. Appl. Remote Sens., vol. 11, no. 4, 2017, Art. no. 042609.
[98] S. Chilamkurthy. (2017). A 2017 Guide to Semantic Segmentation with Deep Learning. [Online]. Available: https://blog.qure.ai/notes/semantic-segmentation-deep-learning-review
[99] J. Le. (2017). How to Do Semantic Segmentation Using Deep Learning. [Online]. Available: https://nanonets.com/blog/how-to-do-semantic-segmentation-using-deep-learning/
[100] A. Mittal. (2019). Introduction to U-Net and Res-Net for Image Segmentation. [Online]. Available: https://aditi-mittal.medium.com/introduction-to-u-net-and-res-net-for-image-segmentation-9afcb432ee2f
[101] S. Kentsch, M. L. Lopez Caceres, D. Serrano, F. Roure, and Y. Diez, ‘‘Computer vision and deep learning techniques for the analysis of drone-acquired forest images, a transfer learning study,’’ Remote Sens., vol. 12, no. 8, p. 1287, Apr. 2020.
[102] M. Šulc, D. Mishkin, and J. Matas, ‘‘Very deep residual networks with maxout for plant identification in the wild,’’ in Proc. Work. Notes CLEF, 2016, pp. 1–8.
[103] M. Onishi and T. Ise, ‘‘Automatic classification of trees using a UAV onboard camera and deep learning,’’ 2018, arXiv:1804.10390.
[104] S. Natesan, C. Armenakis, and U. Vepakomma, ‘‘ResNet-based tree species classification using UAV images,’’ Int. Arch. Photogramm., Remote Sens. Spatial Inf. Sci., vol. 42, pp. 475–481, Jun. 2019.
[105] M. Dyrmann, H. Karstoft, and H. S. Midtiby, ‘‘Plant species classification using deep convolutional neural network,’’ Biosyst. Eng., vol. 151, pp. 72–80, Nov. 2016.
[106] M. Onishi and T. Ise, ‘‘Automatic classification of trees using a UAV onboard camera and deep learning,’’ 2018, arXiv:1804.10390.
[107] O. Brovkina, E. Cienciala, P. Surový, and P. Janata, ‘‘Unmanned aerial vehicles (UAV) for assessment of qualitative classification of Norway spruce in temperate forest stands,’’ Geo-spatial Inf. Sci., vol. 21, no. 1, pp. 12–20, Jan. 2018.
[108] M. J. Zimmer-Gembeck and M. Helfand, ‘‘Ten years of longitudinal research on U.S. adolescent sexual behavior: Developmental correlates of sexual intercourse, and the importance of age, gender and ethnic background,’’ Developmental Rev., vol. 28, no. 2, pp. 153–224, 2008.
[109] S. Watanabe, K. Sumi, and T. Ise, ‘‘Identifying the vegetation type in Google Earth images using a convolutional neural network: A case study for Japanese bamboo forests,’’ BMC Ecol., vol. 20, no. 1, pp. 1–14, Dec. 2020.
[110] E. Guirado, S. Tabik, D. Alcaraz-Segura, J. Cabello, and F. Herrera, ‘‘Deep-learning versus OBIA for scattered shrub detection with Google Earth imagery: Ziziphus lotus as case study,’’ Remote Sens., vol. 9, no. 12, p. 1220, Nov. 2017.
[111] P. P. Ippolito. (2019). Feature Extraction Techniques. Accessed: Apr. 29, 2020. [Online]. Available: https://towardsdatascience.com/feature-extraction-techniques-d619b56e31be
[112] A. Ghodsi, ‘‘Dimensionality reduction: A short tutorial,’’ Ph.D. dissertation, Dept. Statist. Actuarial Sci., Univ. Waterloo, Waterloo, ON, Canada, 2006, vol. 37, no. 38.
[113] B. Ghojogh, M. N. Samad, S. A. Mashhadi, T. Kapoor, W. Ali, F. Karray, and M. Crowley, ‘‘Feature selection and feature extraction in pattern analysis: A literature review,’’ 2019, arXiv:1905.02845.
[114] C. Citro, ‘‘rules. In Proc. 20th Int. Conf. very large data bases, VLDB, volume 1215, pages 487–499, 1994. [5] Alfred V. Aho, Ravi Sethi and Jeffrey D. Ullman. Compilers: Principles, techniques, and tools. Boston, MA: Addison-Wesley, 1986. [6] Adrian Akmajian, Ann K. Farmer, Lee Bickmore, Richard A. Demers and,’’ Learning, vol. 5, no. 1, pp. 71–99, 1990.
[115] C. A. Brooks and K. Iagnemma, ‘‘Vibration-based terrain classification for planetary exploration rovers,’’ IEEE Trans. Robot., vol. 21, no. 6, pp. 1185–1191, Dec. 2005.
[116] F. Subhan, S. Saleem, H. Bari, W. Z. Khan, S. Hakak, S. Ahmad, and A. M. El-Sherbeeny, ‘‘Linear discriminant analysis-based dynamic indoor localization using Bluetooth low energy (BLE),’’ Sustainability, vol. 12, no. 24, p. 10627, Dec. 2020.
[117] Y. Mo, Z. Zhang, Y. Lu, W. Meng, and G. Agha, ‘‘Random forest based coarse locating and KPCA feature extraction for indoor positioning system,’’ Math. Problems Eng., vol. 2014, Oct. 2014, Art. no. 850926.
[118] L. C. Marsh and D. R. Cormier, Spline Regression Models, no. 137. Newbury Park, CA, USA: Sage, 2001.
[119] L. K. Saul and S. T. Roweis, ‘‘Think globally, fit locally: Unsupervised learning of low dimensional manifolds,’’ J. Mach. Learn. Res., vol. 4, pp. 119–155, Jun. 2003.
[120] M. Li, X. Luo, J. Yang, and Y. Sun, ‘‘Applying a locally linear embedding algorithm for feature extraction and visualization of MI-EEG,’’ J. Sensors, vol. 2016, Aug. 2016, Art. no. 7481946.
[121] G. Hinton and S. T. Roweis, ‘‘Stochastic neighbor embedding,’’ in Proc. NIPS, vol. 15, 2002, pp. 833–840.
[122] S. Kullback, Information Theory and Statistics. Chelmsford, MA, USA: Courier Corporation, 1997.
[123] L. van der Maaten and G. Hinton, ‘‘Visualizing data using t-SNE,’’ J. Mach. Learn. Res., vol. 9, no. 11, 2008.
[124] D. Gu, Z. Han, and Q. Wu, ‘‘Feature extraction to polar image,’’ J. Comput. Commun., vol. 5, no. 11, pp. 16–26, 2017.
[125] F. Alamdar and M. Keyvanpour, ‘‘A new color feature extraction method based on QuadHistogram,’’ Proc. Environ. Sci., vol. 10, pp. 777–783, Jan. 2011.
[126] H. Du and Y. Zhuang, ‘‘Optical remote sensing images feature extraction of forest regions,’’ in Proc. IEEE Int. Conf. Signal, Inf. Data Process. (ICSIDP), Dec. 2019, pp. 1–5.
[127] H. Luo, L. Li, H. Zhu, X. Kuai, Z. Zhang, and Y. Liu, ‘‘Land cover extraction from high resolution ZY-3 satellite imagery using ontology-based method,’’ ISPRS Int. J. Geo-Inf., vol. 5, no. 3, p. 31, 2016.
[128] O. Oke Alice, O. Omidiora Elijah, A. Fakolujo Olaosebikan, S. Falohun Adeleye, and S. Olabiyisi, ‘‘Effect of modified Wiener algorithm on noise models,’’ Int. J. Eng. Technol., vol. 2, no. 8, pp. 1024–1033, 2012.
[129] G. Hay and G. Castilla, ‘‘Object-based image analysis: Strengths, weaknesses, opportunities and threats (SWOT),’’ in Proc. 1st Int. Conf. (OBIA), 2006, pp. 4–5.
[130] D. C. Duro, S. E. Franklin, and M. G. Dubé, ‘‘A comparison of pixel-based and object-based image analysis with selected machine learning algorithms for the classification of agricultural landscapes using SPOT-5 HRG imagery,’’ Remote Sens. Environ., vol. 118, pp. 259–272, Mar. 2012.
CLOPAS KWENDA received the B.Sc. degree (Hons.) in computer science from the Bindura University of Science Education (BUSE), Zimbabwe, and the M.Sc. degree in computer science from the University of Zimbabwe (UZ), Zimbabwe. He is currently pursuing the Ph.D. degree with the University of KwaZulu-Natal (UKZN), South Africa. His research interests include image processing, artificial intelligence, machine learning, deep learning, and ontology building.

MANDLENKOSI GWETU received the Ph.D. degree in computer science (CS), specializing in medical image processing, from the University of KwaZulu-Natal (UKZN), South Africa. He is a Senior Lecturer with UKZN, where he is currently the Academic Leader of CS. He is the Principal Investigator of the UKZN node in the Erasmus+-funded Living Laboratories for Climate Change multi-national project and an alumnus of the Heidelberg Laureate Forum. His research interests include deep learning, pattern recognition, and computer vision.

JEAN VINCENT FONOU DOMBEU received the B.Sc. degree (Hons.) in computer science from the University of Yaoundé I, Cameroon, the M.Sc. degree in computer science from the University of KwaZulu-Natal, South Africa, and the Ph.D. degree in computer science from North-West University, South Africa. He is a Senior Lecturer with the Department of Computer Science, University of KwaZulu-Natal (UKZN). His research interests include ontology engineering, the semantic web, and machine learning, specifically ontology building, learning, modularization, ranking, summarization, and visualization; artificial intelligence, machine learning, and data mining methods for the semantic web; knowledge representation and reasoning on the web; and knowledge graphs and deep semantics.