Volume 8, Issue 8, August 2023 International Journal of Innovative Science and Research Technology
ISSN No:-2456-2165
Environmental Exploration and Monitoring
of Vegetation Cover using Deep Convolutional
Neural Network in Gombe State
Okere Chidiebere Emmanuel, Abdulrauf Abdulrasheed
Department of Computer Science
Federal Polytechnic Kaltungo
Gombe state, Nigeria
Mustapha Abdulrahman Lawal
Department of Mathematical Sciences
Abubakar Tafawa Balewa University
Bauchi, Nigeria

Ismail Zahraddeen Yakubu
Department of Computing Technologies
SRM Institute of Science and Technology
Kattankulathur, Chennai, India, 603203
Abstract:- In previous years, human assessment of satellite data was possible because of the relatively modest volume of imagery available; this is no longer the case. Additionally, traditional software packages such as ArcGIS, ERDAS, ILWIS, and other time-consuming and inefficient tools are heavily used by environmental organizations in Nigeria, such as the National Centre for Remote Sensing. With today's large volumes of data, extracting relevant information from images has become a challenge. Differentiating between classes with comparable visual qualities is another problem, as seen when attempting to categorize a green pixel as grass, shrub, or tree. However, as in other computer vision fields, machine learning approaches have proven to be a strong answer here. As a solution to these issues, we propose a novel three-dimensional architecture that is specialized to multispectral images and addresses the majority of the challenging aspects of deep learning for remote sensing. This work's major goal is to create a deeper CNN architecture (U-Net) model that is efficient for semantically segmenting remote sensing imagery with additional multispectral feature classes. In comparison to traditional remote sensing software, our 3D CNN architecture can analyze the spatial and spectral components simultaneously with true 3D convolutions, resulting in better, faster, and more efficient segmentation.

Keywords:- Computer Vision, Deep Learning, Semantic Segmentation of Satellite Imagery.

I. INTRODUCTION

The process of classifying each pixel in an image into one of several predetermined classes or categories is known as semantic segmentation of remote sensing imagery. This method is frequently used to interpret and understand satellite or aerial pictures of natural habitats, urban environments, agricultural fields, and landscapes. Semantic segmentation is an essential technique for applications like land use classification, environmental monitoring, disaster assessment, and urban planning, since it gives precise information about the distribution of the various land cover categories inside an image.

Due to the rapid advancement of remote sensing (RS) technology, high-resolution remote sensing satellites (such as IKONOS, SPOT-5, WorldView, and QuickBird) produce imagery that is richer in information than low-resolution remote sensing imagery, making it easier to extract features and identify ground objects. Today, it is possible to detect a wide variety of artificial objects that were previously challenging to identify [1]. Many applications use semantic segmentation of remote sensing imagery, which has been a major area of study for decades. In computer vision (CV) and remote sensing, semantic segmentation has received a lot of attention, long relying on shallow features manually engineered by experts [2]. As a consequence, a framework that works for one task may not work for another, resulting in a costly and lengthy rewrite of the entire feature extraction process. These drawbacks prompted researchers in the field to look for a more reliable and successful strategy [3]. The deep learning research community has since reached state-of-the-art performance in automating visual labeling tasks.

Researchers have worked hard to carry the improved performance of deep learning techniques over to remote sensing image processing, in light of their success in the field of computer vision [1]. In the last ten years, deep learning techniques have proven to perform significantly better on many common computer vision tasks, such as semantic segmentation and object categorization. Deep learning approaches are better suited to handling complex situations since they automatically produce features appropriate to a specific classification task. Their enormous success in other fields encouraged their adoption and extension to remote sensing problems [1]. Despite decades of work, the literature review in [1] reveals the following significant open issues that call for research and the creation of fresh approaches: a shortage of training examples, the need for pixel-level accuracy, and the handling of unusual data. This leaves considerable room for further research to address these problems.
II. RELATED WORK

In this study, semantic segmentation and pixel-wise classification are used interchangeably. The term "semantic segmentation" is now common in computer vision, and it is also used more and more frequently in remote sensing. Modern convolution and segmentation sub-networks are used in end-to-end trained semantic segmentation frameworks for RGB imagery. Owing to its superior performance compared with conventional learning algorithms, deep learning (DL) has recently emerged as the fastest-growing trend in big data analysis. It has been widely and successfully applied to a variety of application fields, including speech recognition, natural language processing, sequential data, and image classification [4]. Machine learning techniques are gaining importance as science shifts its focus to data-intensive research, and deep learning in particular has made significant advances in this sector.

For the mapping of trees, shade, buildings, and roads, Praveena and Singh [5] presented a hybrid clustering technique and a feed-forward neural network classifier. The suggested method outperformed every existing algorithm used as a benchmark, and the outcomes show that Moving KFCM outperforms the currently used algorithms for classifying shade regions. The performance of a powerful deep neural network was also investigated by [6]; compared with SIFT, SURF, SAR-SIFT, and PSO-SIFT, the experiments reveal that transfer learning increases accuracy and lowers training costs. The main drawback of this research, however, is that distinct source photos do not share a common feature representation.

In order to detect water in satellite images for flood assessment, Jony, Woodley [7] uses an ensemble classifier. Evaluated on MediaEval 2017, the method produces good classification accuracy both for a seen location when spectral bands are used and for an unseen location when NDWI is used. The study's biggest flaw, though, was that it produced poorer results on the unseen site.

Furthermore, [8] put forth a new convolutional neural network (CNN) to categorize snow and clouds at the object level. In particular, a novel CNN structure that can learn multiscale semantic features of clouds and snow from high-resolution multispectral data is described. The authors extend a simple linear iterative clustering approach for segmenting high-resolution multispectral images into superpixels to address the "salt-and-pepper" problem in pixel-level predictions. Results showed that the proposed method outperforms previous methods in terms of accuracy and robustness when separating snow and clouds in high-resolution images. The study, however, does not apply the suggested CNN-based techniques to other objectives in remote sensing, such as urban water extraction and ship detection.

By proposing an end-to-end segmentation model that combines convolution and pooling operations and is capable of learning global relationships between object classes more effectively than conventional classification methods, the study [9] illustrated the utility of FCN architectures for the semantic segmentation of remote sensing MSI. The results showed that superior classification performance was achieved for fourteen of the eighteen classes in RIT-18 using an end-to-end semantic segmentation approach. However, the results could be enhanced by investigating more intricate ResNet and U-Net models. Additionally, adding more distinct classes to the synthetic data should aid the creation of frameworks that are more discriminative and produce better results.

Remote sensing has previously proven to be a very useful and effective technology for mapping slums. The work in [10] examines how well FCNs can learn to map slums in various satellite images. Sentinel-2 and TerraSAR-X data are modeled using QuickBird's very high-resolution optical satellite imagery. Although Sentinel-2 data is freely available, slum mapping with it is a difficult task due to its considerably poorer resolution. On the other hand, TerraSAR-X data is thought to be an effective data source for intra-urban structure study because of its greater resolution. However, transferring the model did not increase the performance of semantic segmentation, owing to the different image characteristics of SAR compared with optical data, although extremely high accuracies were obtained for slums mapped in the optical data.
To overcome the performance issues of VHR image semantic segmentation, a Superpixel-enhanced Deep Neural Forest (SDNF) was suggested by [11]. In order to balance the classification capacity and representation learning capability of DCNNs, a fully differentiable forest is implemented to take control of deep convolutional layer representation learning. A Superpixel-enhanced Region Module (SRM) is also proposed to reduce classification noise and improve the boundaries of ground objects. The ISPRS 2D labeling benchmark is used to gauge SDNF's effectiveness, and the experiments show that the technique achieves new state-of-the-art performance.

Similarly, [3] suggested an FCN-based model to perform pixel-wise classification of remote sensing images in an end-to-end manner. Additionally, [3] proposed an adaptive threshold approach to alter the threshold of the Jaccard index in each class. The suggested method generates accurate classifications in a comprehensive manner, according to tests on the DSTL dataset. Results indicate that the adaptive threshold approach can improve segmentation accuracy, raising the average Jaccard index score from 0.614 to 0.636. The research's main inadequacy is that the model is not trained under weak supervision, which would increase its applicability.

A weed/crop segmentation network proposed in 2020 [12] delivers higher performance for accurately recognizing weeds of arbitrary shape in challenging environmental conditions and offers excellent support for autonomous robots in minimizing weed density. By incorporating four extra components and testing the network's performance on the two difficult Stuttgart and Bonn datasets, the deep neural network (DNN)-based segmentation model achieves sustained improvements. The cutting-edge performance on the two datasets demonstrates that each additional component has significant potential to improve segmentation accuracy. The research does not, however, model the network to learn more correlated spatial information or exploit domain expertise in weed detection.

Due to their excellent spatial resolution and adaptability in image capture, drones have recently revolutionized the mapping of wetlands. Accordingly, [13] suggested utilizing image segmentation to map wetland vegetation. To do this, ML and DL algorithms were tested on a collection of drone photographs of Ireland's Clara Bog, a raised bog. Overall, the DL approaches' accuracy was around 4% better than that of the ML techniques, and the DL technique requires no color adjustments or additional textural features. However, DL needs a lot of initial labeled training data (roughly 48 x 10^6 pixels) to get started.

This analysis reveals that almost all of the CNN designs currently in use for the semantic segmentation of multispectral images are based on 1D or 2D architectures. The context throughout the height and width of a slice can be used by 2D convolutional kernels to create predictions; however, as 2D CNNs only accept one slice as input, they cannot by default use the context of adjacent slices. For the prediction of segmentation maps, voxel information from neighboring slices may be helpful. This problem is addressed by 3D CNNs, which use 3D convolutional kernels to predict segmentation for a volumetric patch of an image; performance can be enhanced by using inter-slice context effectively [14]. In this study, semantic segmentation of a multispectral image with seven channels and additional feature classes is performed using a modern end-to-end DCNN segmentation framework, a 3D U-Net convolutional neural network. It is believed that these methods will contribute to the creation of frameworks that are more discriminating and yield greater performance.

III. MATERIALS AND METHOD

In contrast to the methodologies discussed above, the 3D CNN architecture proposed in this study simultaneously processes the spatial and spectral components with genuine 3D convolutions, resulting in better use of the limited sample sets with fewer trainable parameters. Under this idea, the problem is broken down into processing a number of volumetric representations of the image: each pixel is connected to n spatial neighbors and f spectral bands, and is therefore handled as an n x n x f volume. The architecture's key idea is to combine the conventional CNN design with a modification that applies 3D convolution operations rather than 1D convolution operators that only examine the spectral content of the data. To obtain deep, accurate representations of the image, several CNN layer blocks are stacked on top of one another. First, a series of 3D-convolution layers is introduced to deal with the three-dimensional input voxels; each of these layers consists of a number of volumetric kernels that convolve along the input's width, height, and depth axes at once. Such a stack of 3D convolutions is followed by a set of 1x1 convolution (1D) layers that ignore the spatial neighborhood and a series of fully connected layers. The suggested architecture essentially takes 3D voxels as input, first creates 3D feature maps, and then gradually reduces them into 1D feature vectors across the layers. This is ensured by selecting particular strides and padding configurations for the convolution filters.

A. Proposed Framework
There are 19,069,955 parameters in the architecture altogether. By expanding the number of channels already before max pooling, we avoid bottlenecks; this scheme is also used in the synthesis path. An N x N x F voxel tile of the image with 7 channels serves as the network's input, and the final layer contains N x N x F voxels arranged in the x, y, and z directions, respectively. The approximate receptive field of each voxel in the predicted segmentation is 155 x 155 x 180 um^3 for a voxel size of 1.76 x 1.76 x 2.04 um^3. As a result, each output voxel has access to enough information to learn effectively.
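As a rough illustration only, the MATLAB layer stack below sketches this design: a 3D-convolution front end over an n x n x f voxel, followed by 1x1 convolutions and fully connected layers. The patch size, filter counts, and pooling choices are assumptions for illustration, not the trained 19M-parameter configuration.

```matlab
% Sketch of the described design: 3D convolutions over spatial and spectral
% axes, then 1x1 convolutions, then fully connected layers. Patch size and
% filter counts are illustrative assumptions, not the trained configuration.
numClasses = 10;                                  % 9 labels plus other/border
layers = [
    image3dInputLayer([32 32 7 1])                % n x n spatial, f = 7 bands

    convolution3dLayer(3, 16, 'Padding', 'same')  % volumetric kernels convolve
    batchNormalizationLayer                       % width, height, and depth;
    reluLayer                                     % BN before each ReLU
    maxPooling3dLayer([2 2 1], 'Stride', [2 2 1]) % pool the spatial axes only

    convolution3dLayer(3, 32, 'Padding', 'same')
    batchNormalizationLayer
    reluLayer

    convolution3dLayer(1, 64)                     % 1x1x1 (1D) convolutions that
    reluLayer                                     % ignore the neighborhood

    fullyConnectedLayer(numClasses)               % reduce to a 1D class vector
    softmaxLayer
    classificationLayer];
analyzeNetwork(layers)                            % inspect the resulting stack
```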
Prior to each ReLU, we also introduce batch normalization (BN). During training, each batch is normalized using its mean and standard deviation, and these values are used to update global statistics. A layer that explicitly learns a scale and bias is added after that. At test time, data is normalized using the computed global statistics together with the learned scale and bias. However, we have only one batch and a small number of samples, so utilizing the most recent statistics at test time is ideal in these applications. The weighted SoftMax loss function is a crucial component of the architecture that enables us to train on sparse annotations: by setting the weights of unlabeled pixels to zero, it is possible to learn from only the labeled pixels and therefore generalize.

B. Model Training
In this study, a 3D U-Net network is used to produce the 3D U-Net layers. The initial set of convolutional layers in U-Net is interspersed with max-pooling layers, gradually lowering the resolution of the input image. These layers are followed by a series of convolutional layers interleaved with up-sampling operators, gradually raising the resolution again. The network can be drawn with a symmetric shape similar to the letter U, hence the name U-Net. In order to make the input and output of the convolutions equal in size, this research alters the U-Net to use zero-padding in the convolutions. A U-Net is then built using a few pre-selected hyperparameters.

C. Data Source and Choice of Metrics
In this study, the network is trained using a high-resolution multispectral data set. A drone was used to capture the image collection over the research area in Gombe Metropolis. The data's training, validation, and test sets carry nine object class labels, and the data file is 3.0 GB in size. Accuracy is used to assess how well the proposed model performs, alongside precision and recall. These metrics, which deal with the model's correct predictions, are expressed in terms of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) as:

Accuracy = (TP + TN) / (TP + TN + FP + FN)   (1)

Precision = TP / (TP + FP)   (2)

Recall = TP / (TP + FN)   (3)

IV. IMPLEMENTATION OF THE SYSTEM

MATLAB was used for our experiment. This study trains the network using the high-resolution multispectral data collection and then implements the suggested model. A drone was used to capture the image collection over the study area in the Gombe metropolis. As depicted in Fig. 1, the data includes labeled training, validation, and test sets with nine object class labels; the data file is 3.0 GB in size.

The experiment was carried out using MATLAB 2021a on a 64-bit version of Microsoft Windows 10 with 8 GB of RAM and an Intel(R) Core(TM) i7-4000M @ 2.4 GHz processor.
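The weighted SoftMax loss described in Section III-A can be made concrete, under assumptions, as a class-weighted pixel classification layer. A minimal sketch, assuming MATLAB's Computer Vision Toolbox and the Table 1 class names; the text sets unlabeled-pixel weights to zero, and a near-zero weight is used here as a conservative stand-in:

```matlab
% Sketch: down-weight the other/border class so unlabeled pixels contribute
% (almost) nothing to the weighted SoftMax loss. Class names follow Table 1;
% the near-zero value stands in for the zero weight described in the text.
classes = ["OtherBorder" "Road" "Vehicle" "Rocks" "Settlement" ...
           "DenseVegetation" "SparseVegetation" "WaterBody" ...
           "FarmLand" "BareSurface"];
weights = ones(1, numel(classes));
weights(1) = 1e-6;                      % unlabeled/border pixels carry ~no loss
pxLayer = pixelClassificationLayer( ...
    'Classes', classes, ...
    'ClassWeights', weights);
```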
Fig. 1: Multispectral image with 9 object class labels
The multispectral image data is organized as arrays with dimensions numChannels-by-width-by-height, so the data is rearranged so that the channels lie along the third dimension. The third, second, and first image channels are the RGB color channels; Figure 2 shows the color components of the training, validation, and test images as a montage.
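In MATLAB this rearrangement is a one-line permute; a sketch with assumed variable names:

```matlab
% Sketch: move channels from the first to the third array dimension, i.e.
% numChannels-by-width-by-height -> width-by-height-by-numChannels.
% The variable name trainData is an assumption for the loaded array.
trainData = permute(trainData, [2 3 1]);
rgbTrain  = trainData(:, :, [3 2 1]);   % channels 3, 2, 1 are R, G, B
montage({rescale(rgbTrain)})            % view the color component
```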
Fig. 2: 3 RGB component of training image (left), validation image (center) and test image (right)
Fig. 3: IR channel 1 (left), 2 (center) and test image (right)
Fig. 4: Mask of training image (left), validation image (center) and test image (right)
The ground truth data for the segmentation is contained in the labeled images, where each pixel has been assigned to one of the classes. The nine classes and their IDs are listed in Table 1.
Table 1: Image Classes and IDs for the Dataset
IDs Class Name
0. Other Class/Image Border
1. Road
2. Vehicle (Car, Truck, or Bus)
3. Rocks
4. Settlement
5. Dense Vegetation
6. Sparse Vegetation
7. Water Body
8. Farm Land
9. Bare Surface
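A pixel label datastore wired to the Table 1 IDs might look like the sketch below; the directory and the exact class-name spellings are assumptions:

```matlab
% Sketch: attach the Table 1 class names and IDs to the ground-truth PNGs.
% labelDir and the class-name spellings are assumptions.
classNames = ["OtherBorder" "Road" "Vehicle" "Rocks" "Settlement" ...
              "DenseVegetation" "SparseVegetation" "WaterBody" ...
              "FarmLand" "BareSurface"];
pixelLabelIDs = 0:9;                            % IDs exactly as in Table 1
labelDir = fullfile("data", "labels");          % assumed label-image folder
pxds = pixelLabelDatastore(labelDir, classNames, pixelLabelIDs);
tbl = countEachLabel(pxds)                      % per-class pixel counts
```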
Fig. 5: Training labels from the datasets
This work's final objective is to determine the extent of vegetation cover in the multispectral image by dividing the total number of vegetation pixels by the total number of valid pixels. Table 2 shows the settings for the hyperparameters and training options.
Table 2: Hyperparameter Settings and Training Options
Parameters Settings
Initial Learning Rate 0.05
Max Epochs 150
Mini batch Size 16
l2reg 0.0001
Momentum 0.9
Learn Rate Schedule Piecewise
Shuffle every-epoch
Gradient Threshold Method l2norm
Gradient Threshold 0.05
Verbose Frequency 20
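Expressed as MATLAB training options, Table 2 corresponds roughly to the following; the SGDM solver is an assumption inferred from the momentum entry:

```matlab
% Table 2 expressed as MATLAB training options. The 'sgdm' solver is an
% assumption inferred from the Momentum setting.
options = trainingOptions('sgdm', ...
    'InitialLearnRate', 0.05, ...
    'MaxEpochs', 150, ...
    'MiniBatchSize', 16, ...
    'L2Regularization', 1e-4, ...
    'Momentum', 0.9, ...
    'LearnRateSchedule', 'piecewise', ...
    'Shuffle', 'every-epoch', ...
    'GradientThresholdMethod', 'l2norm', ...
    'GradientThreshold', 0.05, ...
    'VerboseFrequency', 20);
% net = trainNetwork(dsTrain, lgraph, options); % dsTrain and lgraph assumed
```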
As a result, we can now semantically segment the multispectral image using the constructed 3D U-Net. The following section presents the prediction outcomes in terms of accuracy, precision, and recall.

A. Results
Segmentation of image patches is carried out using the semanticseg function during the forward pass on the trained network. Figure 6 displays a representative sample of the segmented image.
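The forward pass and the Fig. 7-style overlay can be sketched as follows; net, valImage, and valRGB are assumed variable names:

```matlab
% Sketch: forward pass on the trained network, then overlay the predicted
% labels on the RGB validation image. net, valImage, and valRGB are assumed.
C = semanticseg(valImage, net);                 % categorical label per pixel
imwrite(uint8(double(C)) - 1, "segmented.png"); % store class IDs 0-9 as PNG
overlay = labeloverlay(valRGB, C, 'Transparency', 0.4);
imshow(overlay)                                 % cf. Fig. 7
```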
Fig. 6: Segmented Image.
The ground truth labels and the segmented image are both stored as PNG files. As illustrated in Fig. 7, they were used to calculate accuracy measures by superimposing the segmented image over the histogram-equalized RGB validation image.
Fig. 7: Labeled Validation Image
A training accuracy assessment was conducted after 100 iterations of the model, as shown in Table 3. An accuracy of 90.698% was achieved after training. Generally, the better the model, the higher its categorization accuracy.

By dividing the total number of vegetation pixels by the total number of valid pixels, this study determines how much vegetation is present in the multispectral image. As shown in the segmented image, the overall plant coverage for the present study is 24.42%.

B. Validation
In this section, we evaluate the performance of the proposed model against other classification frameworks using the collected datasets. Table 3 reports the mean class accuracy (AA) on the test set compared with other existing approaches.
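The vegetation-cover figure is then a ratio of label counts. A sketch, assuming the predicted categorical array C from above; which classes count as vegetation is an assumption (the two Table 1 vegetation classes are used here):

```matlab
% Sketch: vegetation cover as vegetation pixels over valid pixels. Treating
% only the two Table 1 vegetation classes as "vegetation" is an assumption.
vegMask   = C == "DenseVegetation" | C == "SparseVegetation";
validMask = ~isundefined(C) & C ~= "OtherBorder";
vegCover  = 100 * nnz(vegMask) / nnz(validMask);
fprintf("Vegetation cover: %.2f%%\n", vegCover);  % 24.42% reported above
```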
Table 3: Performance comparison against classical approaches
Model Mean Accuracy (%) Precision (%) Recall (%)
Proposed 91 95 97
MLP 31 41 50
KNN 28 32 29
SVM 30 39 31
Sharp Mask 58 68 61
Refine-Net 60 59 66
Table 3 makes it very clear that, in terms of accuracy, precision, and recall, the proposed deep learning model surpasses the traditional techniques. To make the performance gap easier to see, the comparison is also presented graphically in Fig. 8.
Fig. 8: Performance Comparison with classical approaches
V. CONCLUSION

In a classification task, precision and recall values within the range of 91% to 100% can be regarded as excellent performance that supports the accuracy score of a given classification model. From Table 3, it can be noticed that the proposed model achieved a precision of 95% and a recall of 97%. Generally, we can conclude that the proposed model significantly improves on the classification performance of the other existing approaches used in this context. This study has demonstrated that semantic segmentation of a multispectral image with seven channels can be performed using newer end-to-end DCNN segmentation frameworks via a 3D U-Net convolutional neural network, which can address the majority of the difficulties in DL for RS. It is thought that these methods can assist in the creation of frameworks with greater discrimination and better performance. In contrast to other research, this study also determines the segmented image's vegetation cover. The findings demonstrate that this upgraded model has great potential for image processing in the context of remote sensing and the enhancement of satellite images.

ACKNOWLEDGMENT

This study was supported by the Tertiary Education Trust Fund (TETFund) Institutional Based Research (IBR) Fund for Federal Polytechnic Kaltungo, Gombe State, Nigeria.

REFERENCES

[1.] Yuan, X., J. Shi, and L. Gu, A review of deep learning methods for semantic segmentation of remote sensing imagery. Expert Systems with Applications, 2020: p. 114417.
[2.] Girshick, R., et al. Rich feature hierarchies for accurate object detection and semantic segmentation. in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014.
[3.] Wu, Z., et al., Semantic segmentation of high-resolution remote sensing images using fully convolutional network with adaptive threshold. Connection Science, 2019. 31(2): p. 169-184.
[4.] Abdel-Hamid, O., et al. Applying convolutional neural networks concepts to hybrid NN-HMM model for speech recognition. in 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 2012. IEEE.
[5.] Praveena, S. and S. Singh. Hybrid clustering algorithm and neural network classifier for satellite image classification. in 2015 International Conference on Industrial Instrumentation and Control (ICIC). 2015. IEEE.
[6.] Shuang, W., et al., A deep learning framework for remote sensing image registration. ISPRS Journal of Photogrammetry and Remote Sensing, 2018. 145(1): p. 148-164.
[7.] Jony, R.I., et al. Ensemble classification technique for water detection in satellite images. in 2018 Digital Image Computing: Techniques and Applications (DICTA). 2018. IEEE.
[8.] Wang, L., et al., Object-based convolutional neural networks for cloud and snow detection in high-resolution multispectral imagers. Water, 2018. 10(11): p. 1666.
[9.] Kemker, R., C. Salvaggio, and C. Kanan, Algorithms for semantic segmentation of multispectral remote sensing imagery using deep learning. ISPRS Journal of Photogrammetry and Remote Sensing, 2018. 145: p. 60-77.
[10.] Wurm, M., et al., Semantic segmentation of slums in satellite images using transfer learning on fully convolutional neural networks. ISPRS Journal of Photogrammetry and Remote Sensing, 2019. 150: p. 59-69.
[11.] Mi, L. and Z. Chen, Superpixel-enhanced deep neural forest for remote sensing image semantic segmentation. ISPRS Journal of Photogrammetry and Remote Sensing, 2020. 159: p. 140-152.
[12.] You, J., W. Liu, and J. Lee, A DNN-based semantic segmentation for detecting weed and crop. Computers and Electronics in Agriculture, 2020. 178: p. 105750.
[13.] Bhatnagar, S., L. Gill, and B. Ghosh, Drone image segmentation using machine and deep learning for mapping raised bog vegetation communities. Remote Sensing, 2020. 12(16): p. 2602.
[14.] Hamida, A.B., et al. Deep learning approach for remote sensing image analysis. in Big Data from Space (BiDS'16). 2016. Publications Office of the European Union.