
IAES International Journal of Artificial Intelligence (IJ-AI)
Vol. 13, No. 1, March 2024, pp. 941~947
ISSN: 2252-8938, DOI: 10.11591/ijai.v13.i2.pp941-947

Implementation of deep neural networks learning on unmanned aerial vehicle based remote-sensing

Shouket Abdulrahman Ahmed1, Hazry Desa1, Abadal-Salam T. Hussain1,2, Taha A. Taha3

1 Centre of Excellence for Unmanned Aerial Systems (COEUAS), Universiti Malaysia Perlis, Perlis, Malaysia
2 Department of Medical Instrumentation Engineering Techniques, Alkitab University, Kirkuk, Iraq
3 Unit of Renewable Energy, Northern Technical University, Kirkuk, Iraq

Article Info

Article history:
Received Dec 20, 2022
Revised Jan 13, 2023
Accepted Mar 10, 2023

Keywords:
Deep learning
ENet
Neural network
Remote sensing
Unmanned aerial vehicle

ABSTRACT

Due to its efficient and adaptable data collection, the unmanned aerial vehicle (UAV) has been a popular topic in computer vision (CV) and remote sensing (RS) in recent years. Inspired by the recent success of deep learning (DL), several enhanced object identification and tracking methods have been broadly applied to a variety of UAV-related applications, including environmental monitoring, precision agriculture, and traffic management. In this research, we present the efficient neural network (ENet), a deep neural network architecture designed specifically for tasks demanding low-latency operation. ENet is substantially faster, requires fewer floating-point operations (FLOPs), has fewer parameters, and offers accuracy comparable to or better than that of previous models. We have tested it on street and Cityscapes datasets and report comparisons with current state-of-the-art approaches, as well as the trade-offs between a network's processing speed and accuracy. We give measurements of the proposed architecture's performance on embedded devices and describe software enhancements that could make ENet even faster.

This is an open access article under the CC BY-SA license.

Corresponding Author:
Shouket Abdulrahman Ahmed
Centre of Excellence for Unmanned Aerial Systems (COEUAS), Universiti Malaysia Perlis
Kangar-Alor Setar Street, 01000 Kangar, Perlis, Malaysia
Email: [email protected]

1. INTRODUCTION
Technological advancements have driven recent interest in home automation devices, augmented reality wearables, and self-driving cars, all of which need real-time algorithms that can understand visual scenes on low-power mobile devices. These semantic-segmentation algorithms assign every pixel in a picture a class label so that it can be placed in the right group, and the demand for them has created a significant gap in the market. Deep convolutional neural networks (CNNs) [1]–[4] have been able to outperform many standard computer vision methods [5]–[7] in recent times owing to greater access to large datasets and advances in computational capability. When applied to the pixel-by-pixel labelling of pictures, however, CNNs produce results with poor spatial resolution, even though they are becoming more effective at classification and categorization tasks. Because of this, they are often cascaded with other algorithms, such as conditional random fields and colour-based segmentation [8], [9], to make the results more accurate.
To spatially identify and finely segment pictures, numerous neural network designs have been developed, such as fully convolutional networks and SegNet [10]–[12]. All of these works are based on the visual geometry group-16 (VGG16) architecture [13], a large model made for multi-class categorization. The networks that these sources propose have many parameters and long inference times; because of this, they cannot be used for many mobile or battery-powered applications that need to process pictures at rates above 10 frames/sec.
In this research, a novel neural network design aimed at achieving high accuracy while maintaining a high inference speed is proposed. We decided against implementing any post-processing stages, which, although compatible with our methodology and usable alongside it, would reduce the effectiveness of an end-to-end CNN method. Section 3 describes efficient neural network (ENet), an encoder-decoder architecture that is both fast and space-efficient, designed following guidelines and ideas from the research literature discussed in section 2. The performance of the proposed network was evaluated on street and Cityscapes for real-time driving test-cases, while indoor environments were tested using the scene understanding (SUN) dataset [14]–[16]. The findings are summarized in the concluding section.

2. RELATED WORK
Understanding the content of pictures and locating the items of interest both require semantic segmentation as a necessary step. This method is of the highest significance in practical contexts like augmented reality and driving-assistance systems, which are additionally required to operate in real time; this highlights the need for careful consideration in the design of CNNs. Deep neural networks (NNs) are among the common approaches to tasks such as semantic segmentation and are now one of the most widely used techniques in modern computer vision applications. Because this study presents a novel neural network design, our goal is to compare it to previously published works that carry out the vast majority of inference in the same manner.
Encoders and decoders, both distinct forms of neural network architecture, are integrated in CNNs to produce the final result. Probabilistic auto-encoders [17], [18] are the basis for the design of the encoder-decoder network first described in SegNet-basic [10] and extended in SegNet [11]. Being a standard CNN (just like VGG16 [13]), the encoder is trained to perform input classification, while the decoder is a separate network used for upsampling the encoder's output [12], [19]–[22]. However, because of their complex topologies and vast numbers of parameters, these networks are sluggish during inference. Unlike fully convolutional networks (FCN) [12], the latest SegNet iteration does not incorporate the fully connected VGG16 layers, the aim being to reduce the number of floating-point operations and the network's memory utilization, thereby increasing its compactness. Even so, they are still unable to function in real time.
Other current designs make use of more straightforward classifiers and then, as a post-processing step, cascade them with a conditional random field (CRF) [9], [23]. These methods need laborious post-processing, as shown in [11], and they mostly fail to identify classes that occupy a small number of pixels in a frame; there is further evidence of these shortcomings in [11]. CNNs can also be combined with recurrent neural networks [20] to improve accuracy, but at the cost of speed. It is also worth noting that a recurrent neural network (RNN) incorporated as a post-processing step is compatible with any other approach, including the technique described in this research.

3. NETWORK ARCHITECTURE
The ENet architecture suggests potential boundary boxes in images using local proposal techniques. The boundary boxes are then further categorised and processed, and identical ones are eliminated. Finally, the boundary boxes are reassessed in light of the final set of nearby items. This procedure needs to be applied across various places and contexts, and areas with a higher score are categorised as obstacles. The method is repeated a number of times until the specified detection threshold is attained. This approach is accurate and extensively used, but it requires a lot of computing power and is difficult to tune and reproduce to the point where it can be used for fully unmanned aerial vehicle (UAV) functions [24]. On the other hand, the semantic segmentation method utilising OpenCV uses a separate neural network to divide an image into sectors, evaluate each sector, and determine its boundary boxes; these boundary boxes are categorised based on the results of their evaluations. The advantage of this approach is that the full image, although divided into portions, is still processed and analysed [25].
With OpenCV, semantic segmentation aims to recognise objects through tensor regression. The procedure begins with the algorithm receiving an image as input, whose dimensions must follow the configuration n×m×3, where 3 stands for the number of colour channels. A 512×512×3 input was utilised in all the tests listed in Table 1 because initial readings from the trials revealed that the ideal image size is 512×512. A uniformly sized grid (S×S) is overlaid on the image after it has been reduced in size. The steps of detection, categorisation, and decision-making are shown in Figure 1 [26].
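As an illustrative sketch of this pipeline (not code from the paper), a pre-trained ENet model can be run through OpenCV's dnn module on a 512×512×3 input as follows; the model file name and the input frame are placeholders:

```python
# Minimal sketch: semantic segmentation with a pre-trained ENet model via
# OpenCV's dnn module, using the 512x512x3 configuration from the paper.
import cv2
import numpy as np

net = cv2.dnn.readNet("enet-model.net")    # placeholder weights file
image = cv2.imread("frame.jpg")            # n x m x 3 input frame

# Resize to 512x512 and normalise into a 4-D blob for the network.
blob = cv2.dnn.blobFromImage(image, scalefactor=1 / 255.0,
                             size=(512, 512), swapRB=True, crop=False)
net.setInput(blob)
output = net.forward()                     # shape: (1, C, 512, 512)

# Assign each pixel the class with the highest score.
class_map = np.argmax(output[0], axis=0).astype(np.uint8)
```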


Table 1. ENet architecture; output sizes are given for an example input of 512×512 [26]
Name                Type            Output size
Initial                             16×256×256
Bottleneck1.0       Downsampling    64×128×128
4× bottleneck1.x                    64×128×128
Bottleneck2.0       Downsampling    128×64×64
Bottleneck2.1                       128×64×64
Bottleneck2.2       Dilated 2       128×64×64
Bottleneck2.3       Asymmetric 5    128×64×64
Bottleneck2.4       Dilated 4       128×64×64
Bottleneck2.5                       128×64×64
Bottleneck2.6       Dilated 8       128×64×64
Bottleneck2.7       Asymmetric 5    128×64×64
Bottleneck2.8       Dilated 16      128×64×64
Repeat section 2, without bottleneck2.0
Bottleneck4.0       Upsampling      64×128×128
Bottleneck4.1                       64×128×128
Bottleneck4.2                       64×128×128
Bottleneck5.0       Upsampling      16×256×256
Bottleneck5.1                       16×256×256
Fullconv                            C×512×512
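To make the Table 1 entries concrete, the following is a simplified, illustrative PyTorch sketch of a non-downsampling bottleneck with the dilated and asymmetric variants listed above; it follows the general structure described in [26] (1×1 projection, main convolution, 1×1 expansion, element-wise addition) and omits regularisation details such as dropout:

```python
# Simplified ENet-style bottleneck (illustrative sketch, per [26]).
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    def __init__(self, channels, dilation=1, asymmetric=False, reduce=4):
        super().__init__()
        mid = channels // reduce
        if asymmetric:
            # 5x5 convolution decomposed into 5x1 followed by 1x5.
            conv = nn.Sequential(
                nn.Conv2d(mid, mid, (5, 1), padding=(2, 0)),
                nn.Conv2d(mid, mid, (1, 5), padding=(0, 2)),
            )
        else:
            conv = nn.Conv2d(mid, mid, 3, padding=dilation, dilation=dilation)
        self.ext = nn.Sequential(
            nn.Conv2d(channels, mid, 1), nn.PReLU(),   # 1x1 projection
            conv, nn.PReLU(),
            nn.Conv2d(mid, channels, 1),               # 1x1 expansion
        )
        self.act = nn.PReLU()

    def forward(self, x):
        return self.act(x + self.ext(x))               # element-wise addition

# e.g. bottleneck2.2 "Dilated 2" and bottleneck2.3 "Asymmetric 5" of Table 1
x = torch.randn(1, 128, 64, 64)
print(Bottleneck(128, dilation=2)(x).shape)            # (1, 128, 64, 64)
print(Bottleneck(128, asymmetric=True)(x).shape)       # (1, 128, 64, 64)
```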

The system is built to identify, categorise, and make decisions about certain fields and objects. The application applies a variety of approaches depending on the situation, such as biomedical treatment distribution for plants like rice fields and/or avoiding homes, people, and animals. The drone system's semantic segmentation algorithm uses OpenCV to look through a stream of footage only once and make decisions in real time. A third party, such as a person, might monitor the entire system and take action in the most urgent situations [26]. A single block, seen in Figure 1(a), makes up the first stage. We adopt a residual networks (ResNet) model [26] in which extensions branch out from the main branch through convolutional filters and re-join it via element-wise addition, as depicted in Figure 1(b). Figure 2 shows the different roads and streets provided as inputs to ENet, together with the segmented output images. A sketch of the initial block follows Figure 1 below.

Figure 1. Module structure diagram, (a) ENet initial block and (b) ENet bottleneck module
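For concreteness, the initial block of Figure 1(a) can be sketched in PyTorch as below; the structure (a strided 3×3 convolution run in parallel with a 2×2 max-pool and concatenated) and the 13/3 channel split are taken from [26] rather than from this paper, so treat the exact numbers as assumptions:

```python
# Illustrative sketch of the ENet initial block from Figure 1(a), per [26].
import torch
import torch.nn as nn

class InitialBlock(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 13, kernel_size=3, stride=2, padding=1)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.bn = nn.BatchNorm2d(16)
        self.act = nn.PReLU()

    def forward(self, x):
        # Concatenate conv features (13) with the pooled input (3) -> 16 channels.
        out = torch.cat([self.conv(x), self.pool(x)], dim=1)
        return self.act(self.bn(out))

x = torch.randn(1, 3, 512, 512)
print(InitialBlock()(x).shape)   # torch.Size([1, 16, 256, 256]), as in Table 1
```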

Filters specifically created for this purpose are used for detection, and the ENet architecture is used for clustering and classification to produce real-time decisions. The hybrid system's use of semantic segmentation with OpenCV and the ENet architecture increases the system's robustness, reliability, and intelligence in decision-making while decreasing the number of critical decisions [26], as shown in Figure 2. ResNet is a popular NN architecture used to perform numerous computer vision tasks, which is why it was recognized as the ImageNet challenge winner in 2015 [27]–[30]. ResNet was introduced as a game-changer that allowed the successful training of extraordinarily deep neural networks (DNNs) with 150+ layers [31], [32]. Before ResNet, however, the vanishing gradient problem made the training of very deep DNNs a difficult task. The skip connection concept was first presented by ResNet, as depicted in Figure 3. The graphic on the left shows convolution layers stacked on each other; on the right, the convolution layers are stacked as before, but with the original input now added to the input of the convolution block. This is termed a skip connection, and skip connections work for the major reasons shown in Figure 4. They prevent the vanishing gradient problem by allowing the gradient to flow through an alternative shortcut path.

Figure 2. Input images and the corresponding segmentation outputs from ENet

They also aid the model's ability to learn an identity function, ensuring that the performance of a higher layer is at least on par with that of a lower layer. The difficulty of training very deep networks has been reduced by the development of ResNet, which is made up of residual blocks [26]–[30]. When there are differences between the input and output dimensions, as occurs with pooling and convolutional layers, a plain skip connection cannot perform optimally. In this case, two approaches can be implemented when the dimensions of f(x) differ from those of x: padding the skip connection with additional zeros to match the dimensions, or matching the dimension using the projection approach, achieved by applying a 1×1 convolutional layer to the input. In the projection case, the output becomes as in (1) [33]; an additional parameter w1 is introduced here, whereas none is added in the first approach.

H(x) = f(x) + w1·x    (1)
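As an illustrative sketch only (not code from the paper), the projection shortcut of (1) can be written in PyTorch as follows; the channel sizes are hypothetical:

```python
# Projection shortcut, H(x) = f(x) + w1*x: when f(x) changes the channel
# count, a 1x1 convolution (w1) matches dimensions before the addition.
import torch
import torch.nn as nn

class ProjectionResidualBlock(nn.Module):
    def __init__(self, in_ch=64, out_ch=128):
        super().__init__()
        self.f = nn.Sequential(                       # residual branch f(x)
            nn.Conv2d(in_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(),
            nn.Conv2d(out_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch),
        )
        self.w1 = nn.Conv2d(in_ch, out_ch, 1)         # projection shortcut
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.f(x) + self.w1(x))      # H(x) = f(x) + w1*x

x = torch.randn(1, 64, 32, 32)
print(ProjectionResidualBlock()(x).shape)             # (1, 128, 32, 32)
```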

Figure 3. Skip connection

Shallow and deeper network structures are shown in Figure 5. A shallow network, also known as a shallow neural network or a shallow model, typically consists of a small number of layers. It usually has a simple structure with only a few hidden layers between the input and output layers. Shallow networks are often used for tasks that involve relatively simple patterns or when the available training data is limited. They are computationally less complex and can be trained relatively quickly compared to deeper networks. However, shallow networks may have limited capacity to learn complex patterns and may struggle with tasks that require hierarchical representations or feature abstraction.
Assume we have a shallow and a deep network in which a function H converts an input 'x' into an output 'y', i.e. y = H(x); here, it is expected that the deep network should work at least as well as the shallow network, without the negative impact on performance seen with ordinary NNs that contain no residual blocks [33]. The only way to achieve this is to have the extra deep-network layers learn the identity function, in such a manner that their output equals their input, thereby maintaining optimal performance even with the extra layers, as shown in Figure 5. Residual blocks make it easy for the layers to learn identity functions, as seen in (1). The output for plain networks is given by (2) [34],

H(x) = f(x)    (2)

so learning an identity function demands that f(x) be equivalent to x; this is harder to achieve. In the case of ResNet, however, the output is given by (3) [35],

H(x) = f(x) + x
f(x) = 0
H(x) = x    (3)

so the only requirement is to make f(x) = 0, which is easier, and x will be realized as the output, which is equal to the input. In the best-case scenario, additional DNN layers can offer a better mapping of 'x' to output 'y' than the shallow NN and can also reduce the error significantly. Hence, ResNet is expected to perform similarly to or better than ordinary DNNs.
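As a quick numerical illustration of (3), the following minimal PyTorch check (a sketch, not from the paper) forces the residual branch to zero and confirms that the block then reduces to the identity mapping:

```python
# If the residual branch f is forced to zero, a residual block reduces to
# the identity, H(x) = x, so extra layers cannot hurt the mapping.
import torch
import torch.nn as nn

conv = nn.Conv2d(16, 16, 3, padding=1)
nn.init.zeros_(conv.weight)        # force f(x) = 0
nn.init.zeros_(conv.bias)

x = torch.randn(1, 16, 8, 8)
h = conv(x) + x                    # H(x) = f(x) + x
print(torch.allclose(h, x))        # True: the block acts as the identity
```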

Figure 4. Residual learning: a building block
Figure 5. G and M act as identity functions; both networks give the same output

4. CONCLUSION
Deep learning is still considered a "black-box" approach for many issues; however, current studies are working to diminish this perception at a rapid pace. Nevertheless, in the field of remote sensing, most applications have benefited from contributions by deep learning-based methods. The aim of this literature survey was to detail the implementation of these approaches in UAV-related image data computation. For this reason, a thorough examination of the field was provided, outlining modern strategies and considerations pertaining to their utilization. This survey is intended to serve as a broad study that epitomizes the implementation of deep neural networks in UAV-related operations.
Within the scope of UAV-based remote sensing, the majority of the authored literature has focused on object identification approaches utilizing RGB imagery, despite the fact that many applications, such as forest-related tasks and controlled agriculture, can be monitored more accurately through multispectral or hyperspectral data formats. It is essential to expand the collection of openly accessible datasets obtained through UAV imagery for use in the configuration and evaluation of related systems. As such, this study shared a collection of UAV-based data obtained in both environmental and agricultural settings.
Convolutional neural networks (CNNs) are currently the most widely embraced network architectures in the field; however, various models developed on the foundation provided by CNNs, such as the long short-term memory (LSTM) network and generative adversarial networks (GANs), are rising in prevalence in the image segmentation field and may be promising for upcoming UAV-related image processing applications. Graphical processing units (GPUs) may boost the performance of deep learning algorithms for real-time processing operations; hence, it is essential to explore the possibility of onboard processing systems on UAV devices. There are numerous forthcoming techniques, for instance multitask learning, few-shot learning, open-set recognition, and attention-based models, that can be combined to produce revolutionary UAV-related image processing models. These techniques can also assist in greatly improving the generalization capability of deep neural networks.

ACKNOWLEDGEMENTS
The authors would like to acknowledge the financial support in the form of Pre-commercialization grant 9001-00639 from Universiti Malaysia Perlis (UniMAP).

REFERENCES
[1] O. Dahmane, M. Khelifi, M. Beladgham, and I. Kadri, “Pneumonia detection based on transfer learning and a combination of
VGG19 and a CNN Built from scratch,” Indones. J. Electr. Eng. Comput. Sci., vol. 24, no. 3, pp. 1469–1480, Dec. 2021, doi:
10.11591/ijeecs.v24.i3.pp1469-1480.
[2] Y. Xin, J. Yi, K. Zhang, C. Chen, and J. Xiong, “Offline Selective Harmonic Elimination With (2N+1) Output Voltage Levels in
Modular Multilevel Converter Using a Differential Harmony Search Algorithm,” IEEE Access, vol. 8, pp. 121596–121610, 2020,
doi: 10.1109/ACCESS.2020.3007022.
[3] M. S. AL-Huseiny and A. S. Sajit, “Transfer learning with GoogLeNet for detection of lung cancer,” Indones. J. Electr. Eng.
Comput. Sci., vol. 22, no. 2, pp. 1078–1086, May 2021, doi: 10.11591/ijeecs.v22.i2.pp1078-1086.
[4] H. K. Sagiraju and S. Mogalla, “Application of multilayer perceptron to deep reinforcement learning for stock market trading and
analysis,” Indones. J. Electr. Eng. Comput. Sci., vol. 24, no. 3, pp. 1759–1771, Dec. 2021, doi: 10.11591/ijeecs.v24.i3.pp1759-1771.
[5] H. El Hamdaoui et al., “High precision brain tumor classification model based on deep transfer learning and stacking concepts,”
Indones. J. Electr. Eng. Comput. Sci., vol. 24, no. 1, pp. 167–177, Oct. 2021, doi: 10.11591/ijeecs.v24.i1.pp167-177.
[6] R. F. Olanrewaju, S. N. Ibrahim, A. L. Asnawi, and H. Altaf, “Classification of ECG signals for detection of arrhythmia and
congestive heart failure based on continuous wavelet transform and deep neural networks,” Indones. J. Electr. Eng. Comput. Sci.,
vol. 22, no. 3, pp. 1520–1528, Jun. 2021, doi: 10.11591/ijeecs.v22.i3.pp1520-1528.
[7] A. M. Mahmoud and H. Karamti, “Classifying a type of brain disorder in children: an effective fMRI based deep attempt,” Indones.
J. Electr. Eng. Comput. Sci., vol. 22, no. 1, pp. 260–269, Apr. 2021, doi: 10.11591/ijeecs.v22.i1.pp260-269.
[8] C. Farabet, C. Couprie, L. Najman, and Y. LeCun, “Learning Hierarchical Features for Scene Labeling,” IEEE Trans. Pattern Anal.
Mach. Intell., vol. 35, no. 8, pp. 1915–1929, Aug. 2013, doi: 10.1109/TPAMI.2012.231.
[9] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, “DeepLab: Semantic Image Segmentation with Deep
Convolutional Nets, Atrous Convolution, and Fully Connected CRFs,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 40, no. 4, pp.
834–848, Apr. 2018, doi: 10.1109/TPAMI.2017.2699184.
[10] V. Badrinarayanan, A. Handa, and R. Cipolla, “SegNet: A Deep Convolutional Encoder-Decoder Architecture for Robust Semantic
Pixel-Wise Labelling,” Comput. Sci. Comput. Vis. Pattern Recognit., vol. 1, 2015, doi: 10.48550/arXiv.1505.07293.
[11] D. T. Bui, P. Tsangaratos, V.-T. Nguyen, N. Van Liem, and P. T. Trinh, “Comparing the prediction performance of a Deep Learning
Neural Network model with conventional machine learning models in landslide susceptibility assessment,” CATENA, vol. 188, p.
104426, May 2020, doi: 10.1016/j.catena.2019.104426.
[12] J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” in 2015 IEEE Conference on
Computer Vision and Pattern Recognition (CVPR), IEEE, Jun. 2015, pp. 3431–3440. doi: 10.1109/CVPR.2015.7298965.
[13] K. Simonyan and A. Zisserman, “Very Deep Convolutional Networks for Large-Scale Image Recognition,” Comput. Sci. Comput.
Vis. Pattern Recognit., vol. 1, 2015, doi: 10.48550/arXiv.1409.1556.
[14] M. Cordts et al., “The Cityscapes Dataset for Semantic Urban Scene Understanding,” Comput. Sci. Comput. Vis. Pattern Recognit.,
vol. 1, 2016, doi: 10.48550/arXiv.1604.01685.
[15] G. J. Brostow, J. Shotton, J. Fauqueur, and R. Cipolla, “Segmentation and Recognition Using Structure from Motion Point Clouds,”
in Computer Vision – ECCV 2008, 2008, pp. 44–57. doi: 10.1007/978-3-540-88682-2_5.
[16] S. Li, “Simulation Study of Selected Harmonic Eliminated PWM Methods for Cascaded Multilevel Inverters,” in Proceedings of
the 2015 International Power, Electronics and Materials Engineering Conference, 2015. doi: 10.2991/ipemec-15.2015.56.
[17] M. Ranzato, F. J. Huang, Y.-L. Boureau, and Y. LeCun, “Unsupervised Learning of Invariant Feature Hierarchies with Applications
to Object Recognition,” in 2007 IEEE Conference on Computer Vision and Pattern Recognition, IEEE, Jun. 2007, pp. 1–8. doi:
10.1109/CVPR.2007.383157.
[18] J. Ngiam, A. Khosla, M. Kim, J. Nam, H. Lee, and A. Y. Ng, “Multimodal Deep Learning,” Proc. 28 th Int. Conf. Mach. Learn.,
2011, [Online]. Available: https://people.csail.mit.edu/khosla/papers/icml2011_ngiam.pdf
[19] H. Noh, S. Hong, and B. Han, “Learning Deconvolution Network for Semantic Segmentation,” in 2015 IEEE International
Conference on Computer Vision (ICCV), IEEE, Dec. 2015, pp. 1520–1528. doi: 10.1109/ICCV.2015.178.
[20] S. Zheng et al., “Conditional Random Fields as Recurrent Neural Networks,” in 2015 IEEE International Conference on Computer
Vision (ICCV), IEEE, Dec. 2015, pp. 1529–1537. doi: 10.1109/ICCV.2015.179.
[21] D. Eigen and R. Fergus, “Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-scale Convolutional
Architecture,” in 2015 IEEE International Conference on Computer Vision (ICCV), IEEE, Dec. 2015, pp. 2650–2658. doi:
10.1109/ICCV.2015.304.
[22] S. Hong, H. Noh, and B. Han, “Decoupled Deep Neural Network for Semi-supervised Semantic Segmentation,” Comput. Sci.
Comput. Vis. Pattern Recognit., vol. 1, 2015, doi: 10.48550/arXiv.1506.04924.
[23] P. Sturgess, K. Alahari, L. Ladický, and P. H. S. Torr, “Combining Appearance and Structure from Motion Features for Road Scene
Understanding,” Comb. Appear. SFM Featur., 2009, [Online]. Available: https://www.robots.ox.ac.uk/~lubor/bmvc09.pdf
[24] N. Ammour, H. Alhichri, Y. Bazi, B. Benjdira, N. Alajlan, and M. Zuair, “Deep Learning Approach for Car Detection in UAV
Imagery,” Remote Sens., vol. 9, no. 4, p. 312, Mar. 2017, doi: 10.3390/rs9040312.
[25] J. G. A. Barbedo, L. V. Koenigkan, T. T. Santos, and P. M. Santos, “A Study on the Detection of Cattle in UAV Images Using Deep
Learning,” Sensors, vol. 19, no. 24, p. 5436, Dec. 2019, doi: 10.3390/s19245436.
[26] A. Paszke, A. Chaurasia, S. Kim, and E. Culurciello, “ENet: A Deep Neural Network Architecture for Real-Time Semantic
Segmentation,” Comput. Sci. Comput. Vis. Pattern Recognit., vol. 1, 2016, doi: 10.48550/arXiv.1606.02147.
[27] J. Hou et al., “Identification of animal individuals using deep learning: A case study of giant panda,” Biol. Conserv., vol. 242, p.
108414, Feb. 2020, doi: 10.1016/j.biocon.2020.108414.
[28] W. Liu and L. Zhang, “Aerial Traffic Statistics Based on YOLOv5+DeepSORT,” Acad. J. Sci. Technol., vol. 3, no. 3, pp. 198–201,
Nov. 2022, doi: 10.54097/ajst.v3i3.2981.
[29] N. Horning, E. Fleishman, P. J. Ersts, F. A. Fogarty, and M. Wohlfeil Zillig, “Mapping of land cover with open‐source software
and ultra‐high‐resolution imagery acquired with unmanned aerial vehicles,” Remote Sens. Ecol. Conserv., vol. 6, no. 4, pp. 487–
497, Dec. 2020, doi: 10.1002/rse2.144.
[30] Z. M. Hamdi, M. Brandmeier, and C. Straub, “Forest Damage Assessment Using Deep Learning on High Resolution Remote
Sensing Data,” Remote Sens., vol. 11, no. 17, p. 1976, Aug. 2019, doi: 10.3390/rs11171976.
[31] K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition,” in 2016 IEEE Conference on Computer
Vision and Pattern Recognition (CVPR), IEEE, Jun. 2016, pp. 770–778. doi: 10.1109/CVPR.2016.90.
[32] N. Kussul, M. Lavreniuk, S. Skakun, and A. Shelestov, “Deep Learning Classification of Land Cover and Crop Types Using Remote
Sensing Data,” IEEE Geosci. Remote Sens. Lett., vol. 14, no. 5, pp. 778–782, May 2017, doi: 10.1109/LGRS.2017.2681128.
[33] J. S. Soderholm, M. R. Kumjian, N. McCarthy, P. Maldonado, and M. Wang, “Quantifying hail size distributions from the sky –
application of drone aerial photogrammetry,” Atmos. Meas. Tech., vol. 13, no. 2, pp. 747–754, Feb. 2020, doi: 10.5194/amt-13-747-
2020.
[34] L. Ichim and D. Popescu, “Segmentation of Vegetation and Flood from Aerial Images Based on Decision Fusion of Neural
Networks,” Remote Sens., vol. 12, no. 15, p. 2490, Aug. 2020, doi: 10.3390/rs12152490.
[35] S. Nezami, E. Khoramshahi, O. Nevalainen, I. Pölönen, and E. Honkavaara, “Tree Species Classification of Drone Hyperspectral
and RGB Imagery with Deep Learning Convolutional Neural Networks,” Remote Sens., vol. 12, no. 7, p. 1070, Mar. 2020, doi:
10.3390/rs12071070.

BIOGRAPHIES OF AUTHORS

Shouket Abdulrahman Ahmed obtained a Bachelor's degree (full time) in electrical and electronics engineering, Grade 3, from Thames Polytechnic, UK in 1989, and was awarded an M.Sc. in Electrical System Engineering from Universiti Malaysia Perlis (UniMAP), Perlis, Malaysia in 2016, with a thesis titled "Electrical Intelligence System Based on Microcontroller for Power Station". His expertise is in the area of power electronics, and he has published more than three journal articles and conference papers. He can be contacted at email: [email protected].

Hazry Desa received his bachelor of mechanical engineering from Tokushima University, Japan. He was a Senior Design Mechanical Engineer in R&D at Sony Technology Malaysia in 1997, and in 2001 he was employed by Tamura Electronics Malaysia. He pursued his Ph.D. at the Artificial Life and Robotics Laboratory, Oita University, Japan, in 2003. He has supervised a number of postgraduate students at both master's and doctoral levels. His research interests include mobile robots, unmanned systems, and aerial robotics. Many achievements have been made over the years, including journal publications, patents, and prototype developments in the UAS area. He is registered with the Engineering Council United Kingdom as a Chartered Engineer (C.Eng.), as a Professional Engineer (Ir.) with the Board of Engineers Malaysia (BEM), and as a Professional Technologist (Ts.) with the Malaysia Board of Technologists (MBOT); he is a senior member of IEEE and a member of IET. He can be contacted at email: [email protected].

Abadal-Salam T. Hussain is currently a Ph.D. faculty staff member, Assistant Professor, and head of the Department of Medical Instrumentation Techniques Engineering, Alkitab University. He was previously a faculty staff member of the School of Electrical System Engineering, Universiti Malaysia Perlis (UniMAP), Malaysia (2013-2016). Before that, he was affiliated with the University of Sharjah, UAE (2004-2007) and the Sharjah Institute of Technology (SIT), Sharjah, UAE (2007-2011). He holds a GCE from the University of London and a Bachelor of Engineering from the Royal Naval Engineering College, Manadon, Plymouth, UK (1989). He also holds a Master of Science awarded by the University of Science & Technology, Oran, Algeria in 2000 and a doctorate (Ph.D.) in mechatronic engineering awarded by Universiti Malaysia Perlis (UniMAP) in 2013. He can be contacted at email: [email protected] or [email protected].

Taha A. Taha received a mixed-mode master's degree (M.Sc.) in Electrical Power Engineering from Universiti Malaysia Perlis (2019), Malaysia. He also holds a bachelor's degree from the School of Electrical Systems at Universiti Malaysia Perlis (UniMAP) (2017) and a diploma in IT skills from the University of Cambridge, UK, in 2012. He can be contacted at email: [email protected].
