Automatic Detection of Oil Palm Tree From UAV Images Based On The Deep Learning Method
Automatic Detection of Oil Palm Tree From UAV Images Based On The Deep Learning Method
An International Journal
Xinni Liu, Kamarul Hawari Ghazali, Fengrong Han & Izzeldin Ibrahim
Mohamed
To cite this article: Xinni Liu, Kamarul Hawari Ghazali, Fengrong Han & Izzeldin
Ibrahim Mohamed (2021) Automatic Detection of Oil Palm Tree from UAV Images
Based on the Deep Learning Method, Applied Artificial Intelligence, 35:1, 13-24, DOI:
10.1080/08839514.2020.1831226
ABSTRACT
Palm oil is a major contributor to Malaysia’s GDP in the agricul
ture sector. The sheer vastness of oil palm plantations requires
a huge effort to administer. An oil palm plantation in regards to
the irrigation process, fertilization, and planning for planting
new trees require an audit process to correctly count the oil
palm trees. Currently, the audit is done manually using aerial
view images. Therefore, an effective and efficient method is
imperative. This paper proposes a new automatic end-to-end
method based on deep learning (DL) for detection and counting
oil palm trees from images obtained from unmanned aerial
vehicle (UAV) drone. The acquired images were first cropped
and sampled into small size of sub-images, which were divided
into a training set, a validation set, and a testing set. A DL
algorithm based on Faster-RCNN was employed to build the
model, extracts features from the images and identifies the oil
palm trees, and gives information on the respective locations.
The model was then trained and used to detect individual oil
palm tree based on data from the testing set. The overall
accuracy of oil palm tree detection was measured from three
different sites with 97.06%, 96.58%, and 97.79% correct oil palm
detection. The results show that the proposed method is more
effective, accurate detection, and correctly counts the number
of oil palm trees from the UAV images.
Introduction
The most important economic crop in Malaysia and other tropical countries
such as Indonesia, Nigeria, and Thailand are the oil palm trees. It is one of the
main contributors to Malaysia’s Gross Domestic Product (GDP) and foreign
exchange incomes. Palm oil has also become the main source of vegetable oil
due to its high production yield compared to rapeseed, sunflower, and soybean
(Hansen et al. 2015). Malaysia has a total of 4.49 million hectares of oil palm
plantations, which produces approximately 17.73 million tons of palm oil
each year (Nambiappan et al. 2018). Hence, one of the key tasks in plantation
management is to count the number of oil palm trees, also known as the oil
palm tree audit. The process allows the plantation manager to estimate the
CONTACT Xinni Liu [email protected] Faculty of Electrical and Electronics Engineering, Universiti
Malaysia Pahang, Pekan, Malaysia
© 2020 Taylor & Francis
14 X. LIU ET AL.
yield of plantation and plan the weeding and fertilization strategy (Huth et al.
2014). The audit also facilitates the irrigation improvement process, planting
of new trees, and managing old and dead trees (Kattenborn et al. 2014).
Recent studies show that there are three main methods of classification and
detection of oil palm trees which are image processing, machine learning, and
deep learning (DL). Implementation of the image processing method is more
direct in which the algorithm is used to extract the features of an object such as
shape, edge, and color, then perform direct manipulation or transformation of
the pixel values of the images. The challenging part of image processing in
object detection is when the object is crowded in one place, overlapped with
other objects or the background is complex (Sheng et al. 2012). In reference
(Nam et al. 2016), Scale Invariant Feature Transform (SIFT) based on image
processing methods was used to detect oil palm trees. SIFT key points were
extracted together with local binary patterns to analyze the texture of the
overall region from aerial images. The method achieved 91.11% accuracy of
oil palm tree detection. More progressive image processing method has been
used to mask areas of palm oil plantations (Korom et al. 2014). The features of
the oil palm trees were derived using the vegetation indices and kappa
statistics. Optimal thresholding was used to distinguish between a tree and
non-tree region. The method yielded a 77% accuracy of oil palm tree detection.
In reference (Manandhar, Hoegner, and Stilla 2016), the oil palm tree shape
feature was represented by the result of the circular autocorrelation of the
polar shape matrix. The shape feature was extracted using the sliding window
method. An algorithm to detect the local maximum was used to detect the
palm trees in different tough scenarios. The average detection accuracy
obtained was 84%.
For the machine learning methods, hand-crafted features are still needed by
the simple trainable classifier to segment the oil palm tree image as individual
tree crowns and extract the feature before applying a predefined classier for
detection and classification. In reference (Rizeei et al. 2018), object-based
image analysis (OBIA) and support vector machine (SVM) method were
used for oil palm tree counting in Worldview-3 satellite image and range
(LiDAR) airborne imagery. The oil palm tree detection result was 98% on
the testing site, but only satellite and LiDAR airborne imageries were used in
this method. Machine learning methods are highly dependent on hand-crafted
features. When dealing with new data, new hand-crafted features based on the
new data are required, which can be costly. The extraction method is then
repeated with the new features; however, the process proves to be tedious
(Fassnacht et al. 2016).
Presently, DL methods have been widely used in many applications, parti
cularly in image detection and classification (Alliez et al. 2017; Kemker,
Salvaggio, and Kanan 2018; Rezaee et al. 2018). DL is essentially a subset of
machine learning which belongs to the broader family of artificial intelligence.
APPLIED ARTIFICIAL INTELLIGENCE 15
Faster RCNN
The Faster RCNN model consisted of the convolution network, region pro
posal network (RPN), region of interest (ROI) pooling, softmax classification,
and bounding box regression. Figure 3 shows the training process of the
method using the datasets described in Section 2.1. Once the network was
trained, the performance of the network model was tested using the validation
and testing dataset.
For the network training, the labeled training dataset was introduced to the
convolution layer, which will extract the deep features of the image to produce
the feature map of the input image. The feature map was then fed into the RPN
layer. The RPN was trained to yield the region with probable oil palm trees.
The probable region is marked by an orange box which contains the prob
ability score of oil palm trees. The obtained feature map and the proposed
region were fed into the ROI layer to generate proposed regions based on the
previously obtained feature map. The information is then fed to the fully
connected layer and by using the softmax classification and bounding box
regression, the final tree detection was obtained. The final network was
validated and tested.
The loss function during the training contains the classification loss, Lcls and
the region loss, Lreg is shown by Equation (1), where i indicates the bounding
box index, pi is the foreground softmax probability, and p�i is the probability
for ground truth box, ti indicates the predicted bounding box, ti� is the corre
sponding ground truth box for the foreground bounding box. The training
aims to minimize the loss. Ncls indicates the number of class and Nreg indicates
the number of the proposal regions, parameter μ is used to balance the
difference between Ncls and Nreg . The classification loss Lcls was calculated as
the loss after softmax classification, the regression loss Lreg for bounding box
regression was calculated by Smooth L1 loss, �shown as� Equation (2).
Parameter ti� is the ground truth box while ti ¼ tx ; ty ; tw; th indicates the
parameter of the predicted bounding box. The non-maximum suppression
was used to further obtain the proposed region containing the oil palm trees.
1 X � 1 X � �
Lðfpi g; fti gÞ ¼ Lcls pi ; p�i þ μ pi Lreg ti; ti� (1)
Ncls i Nreg i
� X �
Lreg ti ; ti� ¼ SmoothL1 ti ti� (2)
i2fx;y;w;hg
�
0:5X 2 if jXj < 1
SmoothL1ðXÞ ¼ (3)
jXj 0:5otherwise
Experimental Analysis
Dataset
In this research, the aerial view image data were acquired by a drone services
company. The images were of a 22 hectares oil palm tree plantation. The drone
has a band order of red, blue, and green (RGB) and its ground sample distance
(GSD) is 5 cm per pixel. The plantation contains different age groups of trees
between 2 and 16 years old. The oil palm trees are crowded in some areas,
overlapped with some other vegetation, or sparsely distributed in other areas,
as shown in Figure 4. The original image is 7837 × 23608, which was cropped
into 700 sub-images with 600 × 500 pixels; 500 of these images were set as the
training data, 100 images were chosen as the validation dataset and another
100 images as the testing dataset. For the DL network to achieve optimum
accuracy, the training dataset should consist of all possible conditions of the oil
The popular classical VGG16 network trained from ImageNet was used as the
backbone of the convolution network (Karen Simonyan 2014). The weights of
first 1–5 convolutional layer remain unchanged while the weights of the 6th-
7th convolutional layer were converted to the fully connected layers. The
weights were changed accordingly, where 6th full connected layer has 512
kernels of 7 × 7 and 7th full connected layer has 4096 kernels of 1 × 1, the ROI
pooling size is 7, all the dropout layers and the 8th full connected layer were
removed. The parameter configuration during Faster RCNN network training
is shown in Table 1. The RPN network yielded 300 proposal boxes using the
initial parameter values shown in Table 1. The softmax function was used to
classify the proposal feature maps as an object or backgrounds. The bounding
box regression method was used to obtain a more accurate proposal box.
There are two classes in this study, namely the object and the background. The
object is the oil palm tree while the background is the vegetation, empty land,
and boundary.
20 X. LIU ET AL.
(c) Sparely distributed palm tree (d) Boundary with few palm trees
Figure 6. Detection results on four samples. (a) Crowded palm tree, (b) overlapped with vegeta
tions, (c) sparely distributed palm tree, (d) boundary with few palm trees.
APPLIED ARTIFICIAL INTELLIGENCE 21
were not detected, and the confident score four small detected oil palm trees
are 0.70, 0.73, 0.86, and 0.63, respectively, while for the other detected oil palm
trees it is more than 0.9. Figure 6d shows young oil palm trees on the boundary
which were all successfully detected, and the confidence score is more than
0.92. It can be seen that Figure 6b, d have few palm trees, but the background
of Figure 6d is clearer and simpler than 6(b) and 6(c), so all the oil palm trees
are successfully detected in 6(c). Based on the characteristic of the detection
image and the detection result, it is observed that oil palm trees that appear
small in the images and oil palm trees that overlapped with other vegetation
are more difficult to detect.
Accuracy Assessment
Accuracy assessment was done by comparing the detected and the non-
detected tree to the number of actual trees in the image to measure the
precision, recall, and overall accuracy. The process was conducted manually.
The precision, recall, and overall accuracy (OA) are described by Equations
(4)-(6), where TP is the true positive detection result, indicating the total
number of correctly detected oil palm trees, FP is the false positive detection
result, which gives the number of objects incorrectly detected as oil palm trees,
and FN is the false negative detection result, which expresses the number of oil
palm trees that were not detected.
TP
Precision ¼ (4)
TP þ FP
TP
Recall ¼ (5)
TP þ FN
Precision þ Recall
OA ¼ (6)
2
The result of 100 testing dataset is given in Table 2. A total of 2326 oil palm
trees were correctly detected, 56 other vegetation were detected as oil palm
trees, and 60 oil palm trees were not detected. The precision, recall and OA are
97.65%, 97.49%, and 97.57%, respectively.
The performance of the proposed method was tested against the traditional
machine learning methods, i.e. ANN-based (Fadilah et al. 2012) and SVM-
based methods (Dalponte et al. 2015). Three other sites were selected for the
validation purposes; site A where most of the palm trees overlap with other
vegetations, site B where the palm tree in the plantation has small size, and site
C with most of the mature oil palm trees crowded. For the ANN- and SVM-
based methods, the images were cropped into individual oil palm trees with
background and then labeled as the training set which was used to train the
ANN and SVM network to obtain parameters to develop the ANN and SVM
models. The models were then implemented to detect the oil palm trees in the
test images using a sliding window method (Pibre et al. 2017). For the
proposed method, the input images of each site were cropped to 600 × 500
pixels or less than 600 × 500 pixels sub-images. These sub-images were then
fed into the aforementioned pretrained Faster RCNN model. The detection
results for site A, site B, and site C are shown in Tables 3, 4 and 5, respectively.
The detection results of all three sites show a similar pattern; the
proposed method has the highest TP, and the lowest FP and FN com
pared to the results of the ANN- and SVM-based methods. The stark
difference between the FP results of the proposed method compared to
the two other methods is the most interesting result. The FP produced by
the traditional methods is tenfold more compared to the proposed
method. The huge number of FP could have easily contributed to the
low percentage of precision and OA of the ANN- and SVM-based
method. The detection results from all three sites show that the proposed
method has the highest precision number and OA percentage with
a difference of approximately 10% to 17%.
Conclusions
This study proposed a novel approach for detecting oil palm trees from UAV
images. It employed state-of-the-art DL based on the Faster RCNN and image
pre-processing method to detect oil palm trees from drone images of the
plantation. The performance of the proposed method was compared to tradi
tional machine learning-based ANN and SVM methods on three different
sites. The overall accuracies obtained for the proposed method were more than
96%. The proposed method outperforms the other two machine learning
methods in terms of precision and overall accuracy. The results also showed
that the proposed method was effective and robust in the detection of oil palm
trees in various plantation conditions. The object detection was unsuccessful
when the size of the oil palm trees in the images was too small, there are
overlapping background and low resolution of the object. Nonetheless, based
on the model pretrained in this paper, it takes 1.5 hour to detect and count the
22 hectares oil palm tree plantation in this study, the results were promising
which suggests that it has the potential to be used in practical applications.
Funding
This work was supported by the University Malaysia Pahang, under Grant [number
PGRS1903142].
References
Dalponte, M., L. T. Ene, M. Marconcini, T. Gobakken, and E. Næsset. 2015. Semi-supervised
SVM for individual tree crown species classification. ISPRS Journal of Photogrammetry and
Remote Sensing 110:77–87. doi:10.1016/j.isprsjprs.2015.10.010.
Fadilah, N., J. M. Saleh, H. Ibrahim, and Z. A. Halim. 2012. Oil palm fresh fruit bunch ripeness
classification using artificial neural network. Paper presented at the 2012 4th International
Conference on Intelligent and Advanced Systems (ICIAS2012), Kuala Lumpur, Malaysia,
June 12–14.
Fassnacht, F. E., H. Latifi, K. Stereńczak, A. Modzelewska, M. Lefsky, L. T. Waser, and
A. Ghosh. 2016. Review of studies on tree species classification from remotely sensed data.
Remote Sensing of Environment 186:64–87. doi:10.1016/j.rse.2016.08.013.
Guo, Y., Y. Liu, A. Oerlemans, S. Lao, S. Wu, and M. S. Lew. 2016. Deep learning for visual
understanding: A review. Neurocomputing 187:27–48. doi:10.1016/j.neucom.2015.09.116.
Hansen, S. B., R. Padfield, K. Syayuti, S. Evers, Z. Zakariah, and S. Mastura. 2015. Trends in
global palm oil sustainability research. Journal of Cleaner Production 100:140–49.
doi:10.1016/j.jclepro.2015.03.051.
Huth, N. I., M. Banabas, P. N. Nelson, and M. Webb. 2014. Development of an oil palm
cropping systems model: Lessons learned and future directions. Environmental Modelling
and Software 62:411–19. doi:10.1016/j.envsoft.2014.06.021.
Kattenborn, T., M. Sperlich, K. Bataua, and B. Koch. 2014. Automatic single tree detection in
plantations using UAV-based photogrammetric point clouds. The International Archives of
24 X. LIU ET AL.