Deep Learning Applications
In Computer Vision, Signals and Networks
Edited by
Qi Xuan
Yun Xiang
Dongwei Xu
Zhejiang University of Technology, China
World Scientific
NEW JERSEY • LONDON • SINGAPORE • BEIJING
Published by
World Scientific Publishing Co. Pte. Ltd.
5 Toh Tuck Link, Singapore 596224
USA office: 27 Warren Street, Suite 401-402, Hackensack, NJ 07601
UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE
For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance
Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to photocopy
is not required from the publisher.
Printed in Singapore
© 2023 World Scientific Publishing Company
https://doi.org/10.1142/9789811266911_fmatter
Preface
weight of each edge in the molecule via an edge attention layer for molecular networks. Hu et al.11 presented a novel deep learning-based scam detection framework using an attention mechanism. For internet topology data, TopoScope12 used ensemble learning and Bayesian networks to reduce observation bias and reconstruct the internet topology by discovering hidden links. Existing techniques for conflict detection in social media mainly focus on macro-level topic controversy detection.13,14
In this book, we mainly focus on the applications of deep learning to vision, signals, and networks. In the past few years, various deep learning models have been proposed to solve academic and industrial problems in these areas. In real-world applications, the complexity of practical scenarios and restrictions on equipment cost limit algorithm performance. This book therefore introduces the latest applications of deep learning, focusing in particular on three areas: computer vision, signal processing, and graph networks.
Computer vision aims to derive meaningful information from visual data, e.g., images and videos, to automate tasks or aid human decisions. For humans, vision accounts for the majority of our access to information and dominates our decisions. In some visual tasks, such as image classification, object detection, and behavior recognition, neural networks have outperformed humans.15 Most computer vision algorithms are based on convolutional neural networks (CNNs). One of the most popular CNNs is ResNet,3 which solved the problem of training very deep neural networks and achieved state-of-the-art performance in the 2015 ImageNet Large Scale Visual Recognition Challenge.16 Though many models were first proposed for image classification, they can also serve as powerful feature extractors. For example, in object detection, region-CNN17 uses a CNN to extract features of latent regions where objects may exist and then determines the category of the objects. Segmentation and classification are quite different tasks, but their models can share similar components.18 In this book, these algorithms are evaluated on standard benchmark data.
In Chapter 1, Zhang, Chen, and Xiang introduce a method to estimate particulate matter 2.5 (PM2.5). They deploy a sampling system and collect PM2.5 data and images for verification. The experimental results demonstrate the effectiveness and reliability of their approach.
In Chapter 10, Li and Zhang use both dynamic and static features to obtain precise social network node embeddings. They develop a new feature calculation technique to integrate several node features, which significantly enhances the performance of controversy detection.
In Chapter 11, Jin, Zhou, Chen, Sheng, and Xuan construct a multidimensional graph neural network detection model for Ethereum Ponzi scheme detection. It employs three main feature types: contract code, transaction, and network structure features.
In Chapter 12, Zhou, Tan, Zhang, and Zhao build a reliable prediction model for molecular biological activity. It can be applied to molecular multi-feature fusion and adaptively fuses multiple features through the self-attention mechanism. The authors also use focal loss and the gradient harmonizing mechanism (GHM) to address the imbalance between positive and negative samples in molecular biological activity data.
References
Contents
Preface v
About the Editors xiii
Introduction xix
Index 275
© 2023 World Scientific Publishing Company
https://doi.org/10.1142/9789811266911_fmatter
Introduction
are six feature maps extracted from the original image, includ-
ing refined dark channel, max local contrast, max local saturation,
min local color attenuation, hue disparity, and chroma. The model
extracts the haze information from the input feature maps and out-
puts the final PM2.5 concentrations.
Chapter 2 will introduce a ship plate identification technique
based on R2CNN and ASTER, which utilizes a two-stage end-to-end
network, including text detection, rectification, and recognition. The
network is demonstrated on a dataset built from Xiangshan port. It
achieves high performance and can identify the ship in real time.
Chapter 3 will introduce two methods to identify the surface
defects with different detection granularity. The first one utilizes a
deep learning network for dichotomy and uses a generative adversar-
ial network (GAN) for data enhancement. The experimental results
show that the deep learning-based method has a high detection effi-
ciency and GAN improves the detection performance of the network.
The second method is a fine-grained defect detection network, which
mainly focuses on the detection of subtle scratches. This method is
divided into two stages. The first stage consists of a Faster R-CNN
network and the second stage uses DeepLabV3+ network to detect
scratches.
Chapter 4 will explore the research of deep learning in agriculture
crop stress analysis. To better illustrate the issue, this chapter begins
with the stress types and challenges of identifying stress. Then it
introduces the deep neural networks used in agriculture. Finally, it
concludes with a summary of the current situation, limitations and
future work.
Vision Applications
© 2023 World Scientific Publishing Company
https://doi.org/10.1142/9789811266911_0001
Chapter 1
Vision-Based Particulate Matter Estimation
1. Introduction
estimates that 2.4 million people die annually from causes associated with air pollution.6 The most common air pollutants are particulate matter (PM), sulfur dioxide, and nitrogen dioxide. This work focuses on PM2.5, which can increase the rate of cardiovascular, respiratory, and cerebrovascular diseases.7,8 Air monitoring stations are now used to estimate PM2.5 from measurements correlated with pollutant concentrations.9 However, the limited number of sensors, and therefore low spatial density, makes them inaccurate.
Low measurement spatial density makes it especially difficult to estimate human exposure. PM has heterogeneous sources,10 e.g., automobile exhaust, dust, cooking, manufacturing, and building construction. PM concentrations are correlated with source distributions, and numerous factors, including wind, humidity, and geography,11,12 are related to PM distributions. Therefore, air pollution concentration varies within a relatively short distance: relying on existing sparse, stationary monitoring stations can lead to inaccurate estimation of the high-resolution pollution field.
Increasing sensor density or adding image sensors that support high-spatial-resolution capture can increase estimation accuracy and resolution. PM2.5 can be estimated by analyzing the visual haze effect caused by particles and gases.13 The image data may be derived from several sources, such as social media14 and digital cameras.15 The ground truth data are typically derived from the nearest air quality station and have low spatial resolution. Existing approaches are generally inaccurate except near the sparsely deployed sensing stations. Moreover, they generally assume homogeneous distributions of particles and gases within images, implying consistent light attenuation. However, in reality, pollution concentration varies rapidly in space. Thus, accurate evaluation requires vision-based estimation algorithms.
In this chapter, we present a vision-based PM2.5 estimation algorithm and collect an air pollution dataset containing images to evaluate it. The algorithm consists of two steps, haze feature extraction and PM concentration estimation, where the haze feature is represented by six haze-relevant image features and the PM concentration is estimated by a deep neural network. Our dataset contains images captured by a drone and ground PM concentrations measured by particle sensors. The main contents of this chapter are summarized as follows:
2. Related Work
3. Methodology
Fig. 1: An example of the extracted feature maps for two raw haze images (the PM value of image 1 is 6 µg/m³, the PM value of image 2 is 118 µg/m³): (a) original input image; (b) refined dark channel; (c) max local contrast; (d) max local saturation; (e) min local color attenuation; (f) hue disparity; (g) chroma.
some pixels with intensity very low in at least one color channel.23
Therefore, the dark channel can roughly reflect the thickness of
the haze.
For better estimation of the haze value, the refined dark channel17 is proposed, applying a filter G to the estimated medium transmission t24 to identify sharp discontinuous edges and draw the haze profile. Note that by applying the minimum operation to Eq. (1), the dark channel of J tends to zero, i.e.,

t = 1 − D(x; I).

The refined dark channel is

D^R(x; I) = 1 − G( 1 − min_{y∈Ω(x)} min_{c∈{r,g,b}} I^c(y) / L^c_∞ ),  (3)
(4) Min local color attenuation: The scene depth is positively correlated with the difference between the image brightness and saturation,27 and it is represented by the color attenuation prior.
where I^h is the hue channel of the image. Figure 1(f) shows the hue disparity feature for an image with haze.
(6) Chroma: In the CIELab color space, one of the most representative features to describe color degradation in the atmosphere is the chroma. Assuming [L(x; I), a(x; I), b(x; I)]^T is the representation of the haze image I in the CIELab space, the chroma feature is defined as

C(x; I) = sqrt( a(x; I)^2 + b(x; I)^2 ).
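To make the feature definitions concrete, here is a minimal NumPy sketch of two of the six features, the refined dark channel of Eqs. (1)-(3) and the CIELab chroma. The assumptions are ours, not the chapter's exact choices: the guided filter G is approximated by a box blur, the atmospheric light L∞ by the per-channel maximum, and the window sizes are illustrative.

```python
import numpy as np
from scipy.ndimage import minimum_filter, uniform_filter
from skimage.color import rgb2lab

def refined_dark_channel(img, patch=15, blur=41):
    """img: H x W x 3 RGB array in [0, 1]. Sketch of Eq. (3)."""
    l_inf = img.reshape(-1, 3).max(axis=0)               # atmospheric light estimate
    norm = img / np.maximum(l_inf, 1e-6)
    dark = minimum_filter(norm.min(axis=2), size=patch)  # min over channels and Omega(x)
    t = 1.0 - dark                                       # estimated transmission
    return 1.0 - uniform_filter(t, size=blur)            # box blur standing in for G

def chroma(img):
    """CIELab chroma: sqrt(a^2 + b^2) per pixel."""
    lab = rgb2lab(img)
    return np.hypot(lab[..., 1], lab[..., 2])
```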
fully connected to the output layer. We choose the leaky ReLU function as the activation function, which helps avoid the dying ReLU phenomenon.
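As an illustration of such an estimation head, a hedged PyTorch sketch with fully connected layers and LeakyReLU activations follows; the six-feature-map input is from the chapter, but the spatial size and hidden width are placeholders.

```python
import torch.nn as nn

class PMHead(nn.Module):
    def __init__(self, in_features=6 * 64 * 64, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(in_features, hidden),
            nn.LeakyReLU(0.01),   # small negative slope keeps "dead" units trainable
            nn.Linear(hidden, hidden),
            nn.LeakyReLU(0.01),
            nn.Linear(hidden, 1), # scalar PM2.5 concentration
        )

    def forward(self, x):         # x: (batch, 6, 64, 64) stacked feature maps
        return self.net(x)
```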
4. Experiment
4.1. Devices
(1) PM device: Nova PM sensor modules and a Jetson Nano are combined for PM2.5 collection. The equipment collects the PM2.5 value at intervals of several seconds.
(2) UAV: A DJI Air 2S is utilized for video capture (https://www.dji.com/air-2s/specs).
Nova PM sensor
Sensor range: PM2.5 0.0–999.9 µg/m³; PM10 0–1999.9 µg/m³
Operating temperature: −10–50 °C
Operating humidity: maximum 70%
Response time: 1 s
Serial port data output frequency: 1 Hz
Minimum resolution particle size: 0.3 µm
Relative error: max. ±15% and ±10 µg/m³ (at 25 °C, 50% RH)
Standard certification: CE/FCC/RoHS
UAV camera
Sensor: 1″ CMOS; effective pixels: 20 MP; 2.4 µm pixel size
Lens: FOV 88°; 35 mm format equivalent: 22 mm; aperture: f/2.8; shooting range: 0.6 m to ∞
Video format: MP4 (H.264/MPEG-4 AVC, H.265/HEVC)
Lens angle of depression: 17°–19°
Fig. 5: The horizontal axis represents the number of iterations and the vertical axis RMSE: (a) training error and test error across iterations in a single training run; (b) test error across iterations under 12-fold cross-validation.
Fig. 6: The horizontal axis represents the number of iterations and the vertical axis RMSE: (a) training error and test error across iterations in a single training run; (b) test error across iterations under 12-fold cross-validation.
5. Conclusion
References
Chapter 2
Automatic Ship Plate Recognition Using Deep Learning Techniques
1. Introduction
2. Related Work
3. Dataset
Fig. 1: Some samples from ZJUTSHIP-3656. (a) Ship images shot by different cameras. (b) Ship images shot by the same camera from different angles. (c) Ship images at different times. (d) A case of missing characters in the ship license.
4. Method
[Figure: The detection network. A ResNet101 feature extractor and an RPN feed ROI pooling and fully connected layers, which output the classification score, an axis-aligned box, and an inclined box.]

The detection loss is a weighted sum of a classification term and two box-regression terms:

L = L_cls + λ1 L_loch + λ2 L_locr,  (1)

where L_cls is the classification loss, and L_loch and L_locr are the regression losses of the horizontal box and the rotated box, respectively.
We optimize the module parameters for our task. In the region proposal stage of the RPN module, we set 7 anchor scales (1, 2, 4, 8, 16, 32, 64) and 13 anchor ratios (1, 1/2, 2, 1/3, 3, 1/4, 4, 1/5, 5, 1/6, 6, 1/7, and 7). In the Fast R-CNN network, the ROI pooling size is 7 × 7.
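To make the anchor configuration concrete, the sketch below enumerates the 7 × 13 = 91 anchor shapes per location implied by these scales and ratios; the base size is our assumption.

```python
def anchor_shapes(base=16):
    scales = [1, 2, 4, 8, 16, 32, 64]
    ratios = [1, 1/2, 2, 1/3, 3, 1/4, 4, 1/5, 5, 1/6, 6, 1/7, 7]
    shapes = []
    for s in scales:
        area = (base * s) ** 2
        for r in ratios:                # r = height / width
            w = (area / r) ** 0.5
            shapes.append((w, r * w))   # (width, height), constant area per scale
    return shapes

print(len(anchor_shapes()))  # 91 anchor shapes per location
```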
5. Experiment
of the passing ship from a radar and controls the video camera. It delivers images and ship information to the recognition server, where our algorithm is implemented. The server performs ship license plate detection, text rectification, and text recognition.
Fig. 7: The devices used in the ship location stage. The left is the camera and the right is the radar.
lines, the server's radar read frequency is set to 1 Hz. If the ship travels into the area surrounded by blue lines, the read frequency increases to 4 Hz. When the ship travels into the area surrounded by red lines, the server directs the camera to take a picture of the ship. According to the distance fed back by the radar, the camera adjusts its focal length automatically and captures the image of the ship. Images are then transmitted to the ship plate recognition server for subsequent processing.
When the image of the ship is taken, the detection server sends
the image, image shape, and number of images to the ship recognition
system.
is cut out. The picture of the ship plate is transferred to the ship plate recognition network, which recognizes the ship plate. Finally, the ship license is matched against the database and the final result is returned.
Database checking is deployed mainly to prevent erroneous results when identifying the licenses of outgoing ships. We record a ship in the database when it enters the port. When the ship is detected leaving the port, we match it against the database; if the ship information exists, the ship is recorded as having left the port.
P = TP / (TP + FP),  (2)

R = TP / (TP + FN),  (3)

F1 = 2 × P × R / (P + R).  (4)
To calculate AP, assume there are M positive examples among N samples, giving M recall values. For each recall value r, we compute the maximal precision among all recalls r′ ≥ r, and then average the M precision values to obtain the final AP. Generally, a classifier with higher AP performs better.
Since bounding boxes detected by the algorithm cannot completely match the manually labeled data, we use IoU (Intersection over Union) to evaluate the positioning accuracy of the bounding box. It measures the overlap of two boxes and is calculated as follows:
IoU(A, B) = |A ∩ B| / |A ∪ B|.  (5)
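For concreteness, a minimal sketch of Eq. (5) for axis-aligned boxes in (x1, y1, x2, y2) form:

```python
def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # overlap area
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 = 0.142...
```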
Unlike general object detection, where partial object features may be sufficient for subsequent recognition, ship plate detection must capture all characters in an image. Therefore, we require IoU > 0.8 for the classification to count as correct and for the ship plate to be regarded as a successful sample.
We use accuracy and average edit distance (AED)28 as our evaluation metrics for ship license plate recognition. For an input ship license plate image, S_g is defined as the ground-truth label string of the image, i.e., the true name on the ship license plate, and S_p as the output of the ship plate recognition algorithm, i.e., the predicted name. For any two strings S_g and S_p, the minimal number of editing operations required to convert one into the other is called the edit distance; the larger the edit distance, the lower the similarity between the two strings. The editing operations include insertion, deletion, and replacement of characters.
Accuracy is calculated by comparing S_g and S_p directly: if the character length of S_p equals that of S_g and the character at each position of S_p matches S_g, then S_p is considered a correct prediction; otherwise, it is a wrong prediction.
The corresponding formulas are as follows:
AED = ( Σ_{i=1}^{N} ED(S_g^i, S_p^i) ) / N,  (6)

ACC = N_correct / N × 100%,  (7)
where ED is the edit distance, N is the number of samples, and N_correct is the number of correctly recognized samples.
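A sketch of both metrics, with the edit distance computed by the standard dynamic program (function names are ours):

```python
def edit_distance(s, t):
    d = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        prev, d[0] = d[0], i
        for j, ct in enumerate(t, 1):
            prev, d[j] = d[j], min(d[j] + 1,            # deletion
                                   d[j - 1] + 1,        # insertion
                                   prev + (cs != ct))   # substitution
    return d[len(t)]

def aed_and_acc(truths, preds):
    n = len(truths)
    aed = sum(edit_distance(g, p) for g, p in zip(truths, preds)) / n  # Eq. (6)
    acc = 100.0 * sum(g == p for g, p in zip(truths, preds)) / n       # Eq. (7)
    return aed, acc

print(aed_and_acc(["ZHEYU123"], ["ZHEYU128"]))  # (1.0, 0.0)
```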
and testing set are divided at a ratio of 9:1. We train our detection model using stochastic gradient descent with momentum. The number of training iterations is 100,000. The learning rate starts at 0.0003 and decreases to 0.00003 and 0.000003 at 30,000 and 60,000 iterations, respectively.
As mentioned in the previous section, at the detection stage of our task a high IoU is necessary for text recognition, and we count a classification as correct if IoU > 0.8. We evaluate the performance of our detection network under different λ1 and λ2 in Eq. (1). Setting λ1 = 1 and λ2 = 0 during training means we only regress axis-aligned bounding boxes; we set λ1 = 0 and λ2 = 1 to test the regression accuracy without axis-aligned bounding boxes. As shown in Table 2, the network with both regression branches performs better than the others.
We also compare our method with Faster R-CNN and CTPN16 under the same experimental settings. The performance is shown in Fig. 8. Our network outperforms both Faster R-CNN and CTPN in the ship license plate detection task.
6. Conclusion
References
Chapter 3
1. Introduction
2. Related Work
The first kind uses only one network, whose outputs are the target category probability and position coordinates. Mainstream one-stage detection algorithms include SSD21 and YOLO.22 Although they are fast, their accuracy is usually lower than that of two-stage algorithms. Two-stage detection algorithms first extract multiple candidate regions containing the target, and then perform region classification and location refinement.
Representative algorithms include region proposal with convolutional neural network (R-CNN),23 Fast R-CNN,24 and Faster R-CNN.25 To ensure defect detection accuracy, we use a two-stage detection method.
R-CNN23 first extracts a set of independent region proposals (object candidate boxes) by selective search.26 Each region proposal is then scaled to a fixed-size image and fed into a CNN model (e.g., AlexNet) to extract features. In the end, a linear SVM classifier is used to classify the object in each region. On public datasets, R-CNN achieves a mean average precision (mAP) of 53.7% on PASCAL VOC 2010.23 In 2015, Ross Girshick modified R-CNN and proposed Fast R-CNN, which trains the detector and the bounding box regressor simultaneously, making it capable of using very deep detection networks; it is up to nine times faster than R-CNN. However, both R-CNN and Fast R-CNN depend on external region proposal algorithms, which are very time-consuming. Ren et al. proposed Faster R-CNN, which replaces the previous region proposal method with a region proposal network (RPN). The RPN is a fully convolutional network that takes images of any size as input and outputs a set of rectangular candidates. The anchor is the core concept of the RPN: an anchor point is the central point of the current sliding window on the feature map generated by the backbone network, and it is further processed into a candidate box.
3. Dataset
Fig. 1: Main types of defects on the side, end face, and chamfering. (a) to (d)
show that the defects on the side are mainly scratches, wear, burns, and discol-
oration; (e) to (h) show that the defects on the end face are mainly scratches,
inner diameter sags, wear, and outer diameter corner crack; (i) to (l) show that
the defects on the chamfering are corrosion, scratches, broken corners, and wear.
images into a training set and a testing set at a ratio of 8:2. The training and testing sets evenly contain qualified and defective samples. Finally, we quadruple each category in the training set by data augmentation. The original dataset is shown in Table 2 and the final augmented training set in Table 3.
For our first method, i.e., the binary classification algorithm, we can use the augmented dataset directly. The second method requires data annotation; we use Labelme to annotate the scratched parts.
4. Method
Fig. 2: The framework of the first defect detection method based on GAN.
The classification network includes four basic neural network options, which are
AlexNet, VGGNet, GoogLeNet, and ResNet.
Fig. 3: The framework of the second defect detection method, which consists of two stages. In the first stage, Module A and Module B are trained independently, where Module A is a Faster R-CNN network and Module B is the binary classifier (the one evaluated in our first method). In the second stage, small scratches are detected through the DeepLabV3+ network.
[Figure: The DCGAN generator. Random noise z is reshaped to 4×4×1024 and upsampled through four convolutional layers (8×8×512, 16×16×256, 32×32×128) to a 64×64×3 output G(z).]
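As a concrete reading of the generator diagram above, here is a hedged PyTorch sketch that reproduces the recovered tensor shapes (z to 4×4×1024, then 8×8×512, 16×16×256, 32×32×128, and finally 64×64×3); the noise dimension and kernel sizes are assumptions.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, z_dim=100):
        super().__init__()
        self.fc = nn.Linear(z_dim, 4 * 4 * 1024)
        def up(cin, cout):  # doubles spatial resolution
            return nn.Sequential(
                nn.ConvTranspose2d(cin, cout, 4, stride=2, padding=1),
                nn.BatchNorm2d(cout),
                nn.ReLU(inplace=True),
            )
        self.net = nn.Sequential(
            up(1024, 512),                        # 8 x 8 x 512
            up(512, 256),                         # 16 x 16 x 256
            up(256, 128),                         # 32 x 32 x 128
            nn.ConvTranspose2d(128, 3, 4, 2, 1),  # 64 x 64 x 3
            nn.Tanh(),
        )

    def forward(self, z):
        x = self.fc(z).view(-1, 1024, 4, 4)       # the "reshape" step
        return self.net(x)

g = Generator()
print(g(torch.randn(2, 100)).shape)  # torch.Size([2, 3, 64, 64])
```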
5. Experimental Results
6. Conclusion
References
Chapter 4
Application of Deep Learning in Crop Stress
For farmers, the variety of crop diseases and symptoms causes great difficulty in diagnosing plant diseases. To address this problem, researchers have recently begun using deep learning algorithms to analyze crop stress and have achieved good results. This chapter aims to explore the research on deep learning in crop stress. To better illustrate the issue, the chapter begins with the types of crop stress and the challenges of identifying them, followed by an introduction to the deep neural networks used, and concludes with a summary of the current progress, limitations, and future work of crop stress research.
1. Introduction
2.2. Autoencoder
The autoencoder, proposed by Hinton et al. in 2006,18 is an unsupervised learning method that uses the input data themselves as supervision to learn a mapping relationship and obtain a reconstructed output. Its variants include the sparse autoencoder, denoising autoencoder, and contractive autoencoder, among others. They can be used for feature dimensionality reduction and feature extraction. A simple autoencoder can be structured as a three-layer neural network.
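A minimal PyTorch sketch of such a three-layer autoencoder follows (layer sizes are placeholders); training would minimize a reconstruction loss, e.g., MSE between the output and the input itself.

```python
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, n_in=784, n_hidden=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_in, n_hidden), nn.Sigmoid())
        self.decoder = nn.Sequential(nn.Linear(n_hidden, n_in), nn.Sigmoid())

    def forward(self, x):
        code = self.encoder(x)      # compressed feature (dimensionality reduction)
        return self.decoder(code)   # reconstruction supervised by x itself
```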
Fig. 1: The distribution of the year in which the selected papers were published.
Fig. 2: Sample images from PlantVillage dataset: (a) common rust, (b) gray leaf
spot, (c) northern leaf blight, (d) healthy.
Note: Cited from Ahila et al. Maize leaf disease classification using deep convo-
lutional neural networks.
[Figure: image acquisition platform, with the motor box, camera, and power supply box labeled.]
Fig. 4: Predictions of the proposed model on the test dataset: (a)–(d) are images correctly detected as bacterial and (e)–(h) are correctly detected as healthy leaves.
Note: Cited from Yadav et al. Identification of disease using deep learning and evaluation of bacteriosis in peach leaf.
Fig. 5: Schematic diagram of the flow of work for identification of stressed crop.
Note: Cited from Chandel et al. Identifying crop water stress using deep learning
models.
References
22. S. Ren, K. He, R. Girshick, and J. Sun, Faster R-CNN: Towards real-
time object detection with region proposal networks, Advances in Neu-
ral Information Processing Systems. 28, 91–99 (2015).
23. W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg, SSD: Single shot multibox detector, in European Conference on Computer Vision, 2016, pp. 21–37.
24. J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, You only look once:
Unified, real-time object detection, in Proceedings of the IEEE Confer-
ence on Computer Vision and Pattern Recognition, 2016, pp. 779–788.
25. J. Redmon and A. Farhadi, YOLO9000: Better, faster, stronger, in
Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition, 2017, pp. 7263–7271.
26. J. Redmon and A. Farhadi, YOLOv3: An incremental improvement,
arXiv preprint arXiv:1804.02767 (2018).
27. A. Bochkovskiy, C.-Y. Wang, and H.-Y. M. Liao, YOLOv4: Optimal
speed and accuracy of object detection, arXiv preprint arXiv:2004.
10934 (2020).
28. A. K. Singh, B. Ganapathysubramanian, S. Sarkar, and A. Singh, Deep
learning for plant stress phenotyping: Trends and future perspectives,
Trends in Plant Science. 23(10), 883–898 (2018).
29. J. Shin, Y. K. Chang, B. Heung, T. Nguyen-Quang, G. W. Price, and
A. Al-Mallahi, A deep learning approach for RGB image-based pow-
dery mildew disease detection on strawberry leaves, Computers and
Electronics in Agriculture. 183, 106042 (2021).
30. R. Ahila Priyadharshini, S. Arivazhagan, M. Arun, and A. Mirnalini,
Maize leaf disease classification using deep convolutional neural net-
works, Neural Computing and Applications. 31(12), 8887–8895 (2019).
31. R. G. De Luna, E. P. Dadios, and A. A. Bandala, Automated image
capturing system for deep learning-based tomato plant leaf disease
detection and recognition, in TENCON 2018—2018 IEEE Region 10
Conference, 2018, pp. 1414–1419.
32. S. Nickolas et al., Deep learning based betelvine leaf disease detection (Piper betle L.), in 2020 IEEE 5th International Conference on Computing Communication and Automation (ICCCA), 2020, pp. 215–219.
33. J. M. Duarte-Carvajalino, D. F. Alzate, A. A. Ramirez, J. D. Santa-
Sepulveda, A. E. Fajardo-Rojas, and M. Soto-Suárez, Evaluating late
blight severity in potato crops using unmanned aerial vehicles and
machine learning algorithms, Remote Sensing. 10(10), 1513 (2018).
34. S. Yadav, N. Sengar, A. Singh, A. Singh, and M. K. Dutta, Identifi-
cation of disease using deep learning and evaluation of bacteriosis in
peach leaf, Ecological Informatics. 61, 101247 (2021).
Signal Applications
© 2023 World Scientific Publishing Company
https://doi.org/10.1142/9789811266911_0005
Chapter 5
A Mixed Pruning Method for Signal Modulation Recognition Based on CNN
1. Introduction
obtain a smaller model with less performance loss and faster inference
speed.
Depending on whether entire neurons or filters are deleted, pruning can be divided into unstructured pruning and structured pruning. Unstructured pruning considers each element in the weight of each filter and deletes individual weight parameters (zeroing them out), thereby obtaining a sparse weight matrix.18–20 Unstructured pruning has the highest flexibility and generalization performance, and it can generally achieve a higher compression ratio. Through regularization methods (the L1 norm, Group Lasso, hierarchical group-level sparsity, etc.), the neural network model can adaptively adjust structures at multiple granularities during training (weight/filter level, channel level, layer level, etc.) to achieve structured sparsity. Weight pruning24 can be accelerated on dedicated software or hardware, but on general hardware or BLAS libraries it is difficult for the pruned model to achieve a substantial performance improvement. Structured pruning methods21–23 are roughly divided into filter pruning, channel pruning, and layer pruning according to the granularity at which the model structure is pruned. Since entire filters and channels of some layers are removed, the model structure stays regular: it is not limited by hardware, significantly reduces the number of model parameters, and obtains a significant acceleration effect. However, due to the coarse pruning granularity, structured pruning has a greater impact on the performance of the fine-tuned model, causing an irreparable loss of accuracy on classification tasks. The core of structured pruning lies in the selection criteria for model structures at different granularities, and the goal should be the lowest accuracy loss in exchange for the highest compression ratio.
To better achieve this goal, in this chapter we propose a mixed pruning method based on structured pruning, which combines filter-level and layer-level pruning to construct a simple and effective neural network pruning scheme. This method provides a new approach for scenarios where large-scale neural network models must be deployed under limited resources. We then apply the mixed pruning method to the signal modulation recognition task: according to a given pruning ratio, unimportant filters or layers are identified and pruned, and fine-tuning is then used to compensate for the loss of
accuracy in our method. Compared with the original model, the final network is greatly reduced in terms of model size, running memory, and computational operations. At the same time, compared with a single pruning method, our mixed pruning achieves a higher compression ratio in parameters and FLOPs with a more reasonable loss of model accuracy.
The rest of this chapter is organized as follows. In Section 2,
we introduce the related work of neural network pruning in detail.
In Section 3, we introduce our proposed mixed pruning method in
more detail. In Section 4, we conduct experiments on the filter prun-
ing methods, the layer pruning methods, and the mixed pruning
methods, respectively, and further analyze the experimental results
under different pruning methods. In Section 5, we briefly summarize
our work.
2. Related Work
summarize, there are three main ideas for signal modulation recog-
nition based on deep learning at present:
(1) Based on one-dimensional convolutional neural networks: O'Shea et al.2 used a one-dimensional convolutional neural network to directly perform feature extraction and recognition on signal sequences.
(2) Based on recurrent neural networks: Rajendran et al.25 used an RNN (recurrent neural network) and LSTM (long short-term memory network) to directly extract features from and recognize time-domain signals.
(3) Based on two-dimensional convolutional neural networks:1 the original modulated input signal is mapped into a two-dimensional input similar to an image, and two-dimensional convolution is then used for feature extraction and classification, making full use of the advantages of deep learning in the image domain, i.e., the abundance of neural network models and their powerful performance.
mixed pruning algorithm can still perform well on this type of neural
network model.
3. Methods
can we prune the model in two directions at the same time, so that the pruned model is in a healthier state and the pruning effect is improved? To this end, we propose a mixed pruning method based on convolutional neural networks for signal modulation recognition. Combining the two pruning methods, the model is pruned in both width and depth, ensuring good adaptability in both dimensions. At the same time, the method further reduces the number of parameters and FLOPs of the neural network at the cost of little accuracy loss.
[Figure: Overview of mixed pruning. The importance score of each filter (filter indices 1 to 16) and of each convolutional layer (layer indices 1 to 10) is evaluated, and low-scoring filters and layers are pruned.]
3.2. Notations
In this section, we define the symbols used below. Suppose we build a convolutional neural network model whose convolutional layers are C = {C^1, C^2, ..., C^L}, where C^i represents the ith convolutional layer and L is the total number of convolutional layers. The set of filter counts is N = (n_1, n_2, ..., n_L), where n_i is the number of filters in C^i. Specifically, the filters of C^i are expressed as F_{C^i} = {F_1^i, F_2^i, ..., F_{n_i}^i} ∈ R^{n_i × n_{i−1} × k_i^1 × k_i^2}, where F_j^i ∈ R^{n_{i−1} × k_i^1 × k_i^2} represents the jth filter of the ith layer and k_i^1 × k_i^2 is the kernel size of the ith layer. The output of layer i is defined as o^i = {o_1^i, ..., o_{n_i}^i} ∈ R^{n_i × g × h_i × w_i}, where o_j^i ∈ R^{g × h_i × w_i} is the feature map generated by F_j^i, g is the size of the input batch, and h_i and w_i are the height and width of the feature map, respectively.
We define the pre-trained neural network model as M with weights W, and the model after pruning and fine-tuning as M′ with weights W′. For the pruned model M′(C′, F′), the pruning rates in FLOPs and parameters are
PR_FLOPs = 1 − FLOPs(M′(C′, F′)) / FLOPs(M(C, F)),  (2)

PR_Param = 1 − Param(M′(C′, F′)) / Param(M(C, F)),  (3)
4. Experiment
4.1. Datasets
The dataset used in the experiments is the public electromagnetic sig-
nal dataset RML2016.10a. It contains 11 modulation categories, i.e.,
8PSK, AM-DSB, AM-SSB, BPSK, CPFSK, GFSK, PAM4, QAM16,
QAM64, QPSK, WBFM. Each modulation category includes 20
kinds of signal-to-noise ratio (SNR) signals ranging from –20 dB to
18 dB in 2 dB steps. There are 1,100 signals per SNR for a total of
220,000 signals in the dataset.
In the experiments, a total of 11,000 signals with the highest
signal-to-noise ratio of 18 dB in the original dataset are used. And
the training set and test set are split as follows: 80% is used as the
training set, with a total of 8,800 signals, and the remaining 2,200
signals are used as the test set.
4.2. Baselines
In this chapter, the proposed model is compared with the following
models: (1) the original ResNet56 model, (2) the model after filter
96 S. Gao et al.
pruning based on the original model with the pruning rate of 0.2,
0.5, 0.7, (3) the model after layer pruning on the original model with
the pruning rate of 0.2, 0.5, 0.7, (4) the model with the maximum
pruning rate of each pruning method, in which the accuracy does not
drop by more than 3%.
PR_xl = (PR_FLOPs + PR_Param) / 2 × 100%.  (7)
to describe the pros and cons of the pruned model on the efficiency–
quality curve.
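As a quick sketch, the three compression metrics can be computed directly from FLOPs and parameter counts (the numbers in the usage line are made up for illustration):

```python
def pruning_rates(flops, flops_pruned, params, params_pruned):
    pr_flops = 1 - flops_pruned / flops        # Eq. (2)
    pr_param = 1 - params_pruned / params      # Eq. (3)
    pr_xl = (pr_flops + pr_param) / 2 * 100    # Eq. (7), in percent
    return pr_flops, pr_param, pr_xl

print(pruning_rates(126e6, 50e6, 0.85e6, 0.40e6))
```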
pruned model are 100.54 and 100.99, respectively, which are higher than those of the filter-pruned and layer-pruned models.
4.4.3. Analysis
Our mixed pruning performs relatively better on the 18 dB signal dataset. In terms of accuracy, mixed pruning can adjust the model in two directions, making the structure after pruning more diverse. In terms of compression rate, mixed pruning jointly considers the filter level and the layer level, achieving a higher compression rate than single-level pruning.
5. Conclusion
References
Chapter 6
BLS Based on GAF for Time Series Classification
1. Introduction
prove that the algorithm proposed in this chapter can extract feature
information more effectively, as well as effectively improve the accu-
racy and efficiency of time series data recognition.
2. Preliminary
This chapter combines GAF and BLS to classify time series. There-
fore, this part introduces the prior knowledge of these two parts,
mainly including feedforward neural network, sparse autoencoder,
and Gram Matrix.
The input u_k^{(2)} and output O_k^{(2)} of the kth node of the output layer are shown in Eqs. (1) and (2), respectively, where k = 1, ..., K:

u_k^{(2)}(x; w) = Σ_{j=1}^{M} w_{kj}^{(2)} h( Σ_{i=1}^{n} w_{ji}^{(1)} x_i + b_j^{(1)} ) + b_k^{(2)},  (1)

O_k(x; w) = σ( u_k^{(2)}(x; w) ),  (2)

where w_{ji}^{(1)} represents the weight connecting the ith neuron of the input layer to the jth neuron of the hidden layer, b_j^{(1)} represents the bias of the jth neuron, and the superscript (1) indicates that the parameter belongs to the first layer of the neural network. h(·) and σ(·) are the nonlinear activation functions of the hidden layer and the output layer, respectively. h(·) usually adopts a sigmoidal function, such as the hyperbolic tangent. σ(·) usually uses the logistic sigmoid activation in binary classification problems. The sigmoid function maps the input variable to the interval (0, 1), so it can represent the conditional probability p(C_k | x).13 In the case of binary classification:

p(C_k | x) = sigmoid(u_k) = 1 / (1 + exp(−u_k)).  (3)

Equation (3) can be extended to the multi-class case (K > 2) through the normalization function:

p(C_k | x) = exp(u_k) / Σ_j exp(u_j).  (4)

Equation (4) is also called the softmax activation function.
E(w) = − ln L(x, y) = − ln P(O = y | x; w)
     = − Σ_{n=1}^{N} [ y_n ln O_n + (1 − y_n) ln(1 − O_n) ].  (5)
w^{τ+1} = w^τ − η ∇E(w^τ).  (8)
From Eqs. (12) and (13), another equation for solving the optimal
weight β can be obtained:
3. Method
are connected to the output layer. The difference between BLS and the RVFL neural network is that in the RVFL network the input data are not mapped into MF but connected directly to the output layer. Therefore, BLS is structurally more flexible than the RVFL network, has a stronger ability to handle high-dimensional data, and generalizes better. In BLS, the input data first undergo the nonlinear mapping of formula (22), giving the MF nodes Z_i:

Z_i = φ(X W_{ei} + β_{ei}),  i = 1, ..., n,  (22)
where Z^n = [Z_1, ..., Z_n] represents all MF nodes, φ(·) is a nonlinear function, and W_{ei} and β_{ei} are randomly generated weights and biases. Then Z^n is nonlinearly mapped to H_j through formula (23):

H_j = σ(Z^n W_{hj} + β_{hj}),  j = 1, ..., m,  (23)
where H_j stands for an EN, σ(·) is a nonlinear function, H^m = [H_1, ..., H_m] stands for all EN, and W_{hj} and β_{hj} are randomly generated weights and biases. Chen et al.10 use linear sparse autoencoders to fine-tune W_{ei} to enhance the feature expression ability of MF. The output layer of BLS is connected to MF and EN at the same time, and the calculation formula is

O = [Z^n | H^m] W^o,  (24)
where O is the output response of BLS and W^o is the weight of the output layer. Furthermore, the objective function of BLS is

min_{W^o} ||O − Y||_2^2 + λ ||W^o||_2^2.  (25)
There are two terms in formula (25). The first is the empirical risk term: Y is the given supervision information, and this term reduces the difference between the output of BLS and Y. The second is the structural risk term, which improves the generalization ability of BLS and reduces the risk of over-fitting; λ is its coefficient.
Training a broad learning system involves two components: a linear sparse autoencoder and ridge regression.
Linear sparse autoencoder (LSAE)10 : To improve the sparse expression ability of the input data, BLS uses an LSAE to fine-tune the weights W_{ei}. For the input data X, the LSAE solves the following optimization problem:

arg min_{W*} ||Z W* − X||_2^2 + λ ||W*||_1,  (26)
3.3. GAF–BLS
As shown in Fig. 4, based on the above BLS and GAF algorithms, this chapter proposes a time series classification method based on GAF and BLS: (1) convert each sample into a feature matrix according to the GAF encoding; (2) feed the feature matrix into the broad learning system and tune the parameters so that it adaptively extracts the relevant information in the feature matrix, learns the information of each category, and updates the network weights; (3) use the trained broad learning network to classify time series data.
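A compact NumPy sketch of the BLS learning step follows, under simplifying assumptions of ours: the GAF matrices are flattened into the rows of X, a single group of mapped features is used, the LSAE fine-tuning of W_ei is omitted, and the output weights solve Eq. (25) in closed form by ridge regression.

```python
import numpy as np

def bls_fit(X, Y, n_map=10, n_enh=100, lam=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    We = rng.standard_normal((X.shape[1], n_map))
    be = rng.standard_normal(n_map)
    Z = np.tanh(X @ We + be)                    # mapped feature nodes, Eq. (22)
    Wh = rng.standard_normal((n_map, n_enh))
    bh = rng.standard_normal(n_enh)
    H = np.tanh(Z @ Wh + bh)                    # enhancement nodes, Eq. (23)
    A = np.hstack([Z, H])                       # [Z^n | H^m]
    # W_o = (A^T A + lam I)^(-1) A^T Y minimizes ||A W_o - Y||^2 + lam ||W_o||^2
    Wo = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ Y)
    return We, be, Wh, bh, Wo
```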
deal with sparse data has been fully confirmed. Therefore, mapping
univariate time series to polar coordinates and generating GAF, and
then using BLS to learn it, can effectively improve the accuracy of
classification.
5. Conclusion
References
Chapter 7
Denoising of Radio Modulation Signal Based on Deep Learning
1. Introduction
Signals are widely used in the field of communication, but radio com-
munication channels are usually complex and will cause certain inter-
ference to communication. Therefore, signal denoising has always
been a hot research topic. The existing denoising methods are mainly
divided into two categories: traditional methods and learning-based
methods. The traditional methods in the past mainly included non-
learning methods such as filtering, wavelet transform, and empirical
mode decomposition. Linear denoising methods that rely on filter-
ing, such as Wiener filters,1 work well in the presence of stationary
noise, but they show limitations when the signal and noise share the
same frequency spectrum.2 In the wavelet transform, the signal is divided into different scales according to frequency range, and the noise is filtered out with a suitable threshold,3–6 but the method lacks an adaptive choice of decomposition level.7 Finally, empirical mode
decomposition8 is a data-driven method, suitable for stationary and
non-stationary signals, but it is difficult to decompose the signal into
unique frequency components, resulting in mode mixing.9
Learning-based denoising methods are particularly popular in the image field, mainly including encoder-decoder networks,10 deep denoising networks based on residual learning,11 and multi-level wavelet neural networks.12 Similarly, in audio and speech processing, deep neural networks13–17 have also made progress. In addition, there are learning-based studies on seismic signals,18,19 electrocardiogram and motion signals,20–22 and gravitational wave signals,23 but research on radio modulation signals remains quite limited.24
Based on the study of deep learning methods for image denoising
and speech denoising, we propose a deep denoising network structure
based on a generative adversarial network for adaptive denoising. The
main contributions of our work are as follows:
2. Preliminary
2.1. GAN
GAN contains two network models, a generative model and a discriminative model, which are trained alternately in an adversarial fashion. The generative model learns how to map from a distribution Z, from which a sample z is drawn, to the distribution X of another sample x, where x is a training sample and z seeds a generated pseudo sample. The neural network that performs this mapping is called the generator (G); its main task is to learn an effective mapping that imitates the real data distribution to generate new samples related to those in the training set. Importantly, the generator does not work by memorizing input-output pairs, but by learning the characteristics of the data distribution and mapping the sample z accordingly.
The generator learns this mapping through adversarial training against another neural network called the discriminator (D). The discriminator is usually a classifier whose input has two cases: a real sample, from the dataset the generator is imitating, or a pseudo sample produced by the generator.
2.2. CGAN
CGAN is an extension of GAN. The main difference is that both the generator and the discriminator take additional information y as a condition, where y can be any information, such as category information or other data. The reason for adding conditional information is that GAN learns directly on the data distribution, so the optimization direction is difficult to control. CGAN feeds the prior noise z and the condition y into the generator and discriminator for training. In the generator, the prior noise z and the condition y are concatenated to form a joint implicit representation, which guides the signal generated by the generator.
Both the generated data and the generated labels must pass the judgment of D, so the loss function of CGAN can be expressed as

min_G max_D V(D, G) = E_{x∼p_data(x)} [ log D(x | y) ] + E_{z∼p_z(z)} [ log(1 − D(G(z | y) | y)) ].
2.3. DCGAN
DCGAN mainly optimizes the GAN network structure by using convolutions. Before the specific introduction, note that the function of convolution is to extract specific features from the data, and different convolution kernels extract different features. The main improvements of DCGAN are that spatial pooling layers are replaced with strided convolutional layers, the generator and discriminator both use batch normalization (BN), the fully connected layers are removed, the generator uses ReLU as the activation function (except for the output layer), and all layers of the discriminator use Leaky ReLU.
The reasons for these improvements are as follows. First, BN brings the data to a unified scale, which makes it easier for the model to learn regularities in the data and helps the network converge. Furthermore, transposed convolution allows the network to learn to upsample in an optimal way, and strided convolution means downsampling is no longer a fixed discarding of pixel values at certain locations but something the network learns by itself. Replacing the fully connected layer with global average pooling improves model stability, although it reduces convergence speed. Finally, the Leaky ReLU activation function is fast to compute and alleviates GAN's tendency toward vanishing gradients.
3. Methods
3.1. SDGAN
The specific experimental flow of the radio modulation signal deep
denoising method based on supervised learning is shown in Fig. 1.
The radio modulation signal deep denoising model based on supervised learning is a generative adversarial network for signal denoising, which fuses the characteristics of the different GANs described above. The deep denoising model we
[Fig. 1 (architecture): random noise z and the noisy signal x_n enter the generator G, whose encoder and decoder produce the denoised signal G(z, x_n); the discriminator D scores the clean pair toward 1 and the generated pair toward 0.]

[Fig. 2 (generator structure): Conv1d + PReLU encoder stages downsample the input from length L to L/64 over kernels K1–K6, and mirrored ConvTranspose1d + PReLU stages with a final Tanh reconstruct the output.]

Fig. 3: The structure diagram of the discriminator in the deep denoising model: stacked Conv1d + VBN + LeakyReLU blocks downsample from L/2 to L/128 over kernels K1–K7, followed by a linear layer and a sigmoid output.
min_D V_LSGAN(D) = (1/2) E_{x,x_n ∼ p_data(x,x_n)} [ (D(x, x_n) − 1)^2 ]
                 + (1/2) E_{z ∼ p_z(z), x_n ∼ p_data(x_n)} [ D(G(z, x_n), x_n)^2 ],  (4)

min_G V_LSGAN(G) = (1/2) E_{z ∼ p_z(z), x_n ∼ p_data(x_n)} [ (D(G(z, x_n), x_n) − 1)^2 ].  (5)

With an additional L1 reconstruction term and a regularization term, the generator objective becomes

min_G V_LSGAN(G) = (1/2) E_{z ∼ p_z(z), x_n ∼ p_data(x_n)} [ (D(G(z, x_n), x_n) − 1)^2 ]
                 + λ1 ||G(z, x_n) − x||_1 + λ2 ||∇ω||_p^p.  (6)
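Read as code, the least-squares objectives above might look like the following PyTorch sketch; D and G are assumed to be callables over (signal, condition) pairs, lam1 is a placeholder, and the regularizer of Eq. (6) is omitted.

```python
import torch

def discriminator_loss(D, x, x_n, fake):
    real_term = 0.5 * ((D(x, x_n) - 1.0) ** 2).mean()   # real pairs toward 1
    fake_term = 0.5 * (D(fake, x_n) ** 2).mean()        # generated pairs toward 0
    return real_term + fake_term                        # Eq. (4)

def generator_loss(D, G, z, x, x_n, lam1=100.0):
    fake = G(z, x_n)
    adv = 0.5 * ((D(fake, x_n) - 1.0) ** 2).mean()      # Eq. (5)
    rec = lam1 * torch.abs(fake - x).mean()             # L1 term of Eq. (6)
    return adv + rec
```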
4. Experiment
(1) From the clean signals, select BPSK-modulated data as the interference source and QPSK-modulated data as the transmitted signal, 1,000 samples each.
(2) Multiply the BPSK data by coefficients 0.2 and 0.3, and superimpose them on the QPSK data, obtaining 2,000 samples.
(3) In actual transmission, the two modulation types may not be transmitted at exactly the same time, so we also superimpose at different positions. Since the sample length in the public dataset is 128, we superimpose the BPSK data starting at offsets of 0, 16, 32, and 64 samples onto the QPSK data, obtaining 8,000 samples in total, as sketched below.
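A hedged NumPy sketch of this construction, assuming qpsk and bpsk arrays of shape (1,000, 2, 128); the scales and offsets follow the steps above (2 scales x 4 offsets x 1,000 = 8,000 samples).

```python
import numpy as np

def superimpose(qpsk, bpsk, scales=(0.2, 0.3), offsets=(0, 16, 32, 64)):
    """qpsk, bpsk: arrays of shape (n, 2, 128) holding I/Q samples."""
    out = []
    for s in scales:
        for off in offsets:
            noisy = qpsk.copy()
            noisy[:, :, off:] += s * bpsk[:, :, :128 - off]  # shifted interference
            out.append(noisy)
    return np.concatenate(out)
```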
Fig. 5: Constellation diagram of the comparison of the received signal, the inter-
ference signal, and the transmitted signal.
(1) the filtered signal barely reflects any denoising effect, while the denoised signal clearly does; and
(2) in the overall constellation diagram, the denoised signal is more similar to the target signal than the filtered signal.
It can be seen from Fig. 10 that in the low-frequency domain the denoised signal exhibits better denoising performance than the low-pass filter, with the peak reduced by about half. The denoised signal is closer to the target signal in the frequency domain, while the low-pass filter barely shows any denoising effect there.
The purity index line chart of different signals under different interferences is shown in Fig. 11. The purity index of the denoised signal
5. Conclusion
References
Chapter 8
A GNN Modulation Recognition Framework Based on LLPVG
1. Introduction
Time series are ubiquitous, and I/Q radio signals, as typical and essential time series in communication, have attracted widespread attention recently. Moreover, the successful application of deep learning to Euclidean data has led to its rapid development on non-Euclidean data, i.e., graphs. Mature complex network theory and the powerful feature extraction capabilities of graph neural networks (GNNs) motivate combining I/Q radio signals with GNNs.
In the field of communication, the purpose of modulation recognition is to determine what a signal is, after which other signal processing can proceed smoothly. Traditional machine learning modulation recognition is usually completed in two steps: feature extraction and classification. Soliman et al.1 and Subasi et al.2 use the Fourier transform3 and wavelet transform4 to preprocess the signal, and then extract the ordered cyclic spectrum, high-order cumulants, cyclostationary characteristics, power spectrum,5 and other characteristics of the signal. Based on these features, traditional classification methods in machine learning, such as decision trees,6 random forests,7 and support vector machines (SVMs),8 can be used to classify time series.
As a branch of machine learning, deep learning combines powerful automatic feature extractors with efficient classifiers. It replaces the process of hand-selecting features and effectively uses the available features to complete classification tasks. Deep learning methods have developed increasingly rapidly in the field of radio communications. Generally, there are two main ways of achieving I/Q radio signal classification with deep learning. The first takes signals directly as input to train an improved recurrent neural network (RNN) classification model. For instance, Hochreiter and Schmidhuber9 proposed the long short-term memory (LSTM) framework based on RNN, which is designed to retain time-related features and deepen the RNN model to capture high-level features. Although LSTM can obtain satisfactory accuracy on time series classification tasks and the model is very small, it takes a long time to train. With the rapid development of convolutional neural networks (CNNs), mapping signals into images and then utilizing 1DCNN10
was designed, GNN has shown its powerful capability on graph tasks. GNN can embed nodes according to their neighbor information, that is, aggregate and optimize node features. In the past few years, in order to deal with complex graph data, the generalization and definition of important operators have developed at a quick pace. In addition to graph convolutional networks (GCNs), many new GNN models have emerged, such as GraphSAGE,15 Diffpool,16 and GAT.28 These GNN models have many applications across fields, such as computer vision,29–31 traffic,32–34 and chemistry.35–38 However, there are almost no applications that combine modulation signals with GNNs to realize automatic classification.
In this chapter, to realize modulation recognition utilizing the powerful feature extraction capability of GNNs while matching the performance of CNNs, we first introduce a visibility slider to the LPVG model to make it suitable for I/Q radio signals, expand the underlying features based on communication knowledge, and design an I/Q radio signal classification framework with GNNs. To validate the effectiveness of the proposed framework, we first compare the improved VG model with the original VG model in a similar traditional machine learning framework and explore the impact of different visibility slider lengths on classification accuracy. Furthermore, the classification performance and model size of our proposed GNN deep learning framework are tested against LSTM and some CNN models.
The main contributions of this chapter are as follows:
2. Related Work
Fig. 1: Schematic diagram of LPVG (M = 1); the dashed line is the penetrated line.
3. Methods
A = sqrt( I^2 + Q^2 ),  (7)

W = arctan( Q / I ),  (8)
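A small NumPy sketch of this feature expansion; np.arctan2 is used instead of arctan(Q/I) to avoid division by zero, which is our implementation choice, not the chapter's.

```python
import numpy as np

def expand_features(iq):
    """iq: array of shape (n, 2, L) with channels I and Q.
    Returns an array of shape (n, 4, L) with channels I, Q, A, W."""
    i, q = iq[:, 0], iq[:, 1]
    a = np.sqrt(i ** 2 + q ** 2)      # amplitude, Eq. (7)
    w = np.arctan2(q, i)              # phase, Eq. (8)
    return np.stack([i, q, a, w], axis=1)
```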
[Figure: The proposed framework. The time-domain channels I and Q are expanded to amplitude A and phase W; each channel is converted into a graph (G_I, G_Q, G_A, G_W); the graphs pass through layers of GNNs; the graph-level pooled embeddings are concatenated; and an FC classifier outputs the modulation type.]
where x_j^i represents the jth sampling value of the i-channel signal; the graph can then be described as G(V, E, X).
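Since only the tail of the construction algorithm survived extraction, here is a hedged sketch of limited penetrable visibility graph construction consistent with the chapter's description; the optional window w mimics the visibility slider of LLPVG, and the function name is ours.

```python
def lpvg_edges(x, m=1, w=None):
    """Edges of the limited penetrable visibility graph over series x.
    Two samples are connected if at most m intermediate samples block the
    straight line between them (m = 0 gives the ordinary visibility graph);
    w, if given, restricts connections to a slider of length w."""
    n, edges = len(x), []
    for i in range(n - 1):
        j_max = n if w is None else min(n, i + 1 + w)
        for j in range(i + 1, j_max):
            blocked = sum(
                x[k] >= x[i] + (x[j] - x[i]) * (k - i) / (j - i)
                for k in range(i + 1, j)
            )
            if blocked <= m:
                edges.append((i, j))
    return edges

print(lpvg_edges([1.0, 3.0, 0.5, 2.0], m=0))  # [(0, 1), (1, 2), (1, 3), (2, 3)]
```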
hG = GN N s(G). (10)
a_v^{(k)} = Aggregate^{(k)}( { h_u^{(k−1)} : u ∈ N(v) } ),  (11)

where N(v) is the neighborhood of node v and h_u^{(k−1)} is node u's feature vector from the previous update layer. Aggregate(·) can be summation, averaging, weighted summation, etc. For example, the Aggregate(·) in GraphSAGE can be formulated as

a_v^{(k)} = MAX( { ReLU( W · h_u^{(k−1)} ), ∀u ∈ N(v) } ),  (12)

h_v^{(k)} = Update^{(k)}( h_v^{(k−1)}, a_v^{(k)} ),  (13)

where h_v^{(k)} represents the kth layer feature vector of node v. Update(·) can be summation, averaging, or a linear mapping after concatenation. For example, one kind of Update in GraphSAGE can be formulated as

h_v^{(k)} = W · [ h_v^{(k−1)}, a_v^{(k)} ].  (14)
h_G = Readout( { h_v^{(K)} | v ∈ G } ).  (15)
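A minimal PyTorch sketch of one GraphSAGE-style layer implementing Eqs. (12) and (14); adjacency is given as per-node neighbor index lists, and each node is assumed to have at least one neighbor.

```python
import torch
import torch.nn as nn

class SageLayer(nn.Module):
    def __init__(self, d_in, d_out):
        super().__init__()
        self.pool = nn.Linear(d_in, d_in)          # W inside the ReLU of Eq. (12)
        self.update = nn.Linear(2 * d_in, d_out)   # W of Eq. (14)

    def forward(self, h, neighbors):
        """h: (n, d_in) node features; neighbors: list of index lists."""
        agg = torch.stack([
            torch.relu(self.pool(h[nbrs])).max(dim=0).values  # Eq. (12)
            for nbrs in neighbors
        ])
        return self.update(torch.cat([h, agg], dim=1))        # Eq. (14)

layer = SageLayer(8, 16)
h = torch.randn(5, 8)
neighbors = [[1, 2], [0], [0, 3], [2, 4], [3]]
print(layer(h, neighbors).shape)  # torch.Size([5, 16])
```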
4. Experiment
4.1. Datasets
The dataset used in the experiments is an open synthetic radio mod-
ulation dataset RML2016.10a.11 It is a high-quality radio signal sim-
ulation dataset generated by GNU Radio, which was first released
at the 6th GNU Annual Radio Conference. This signal dataset con-
tains 11 modulation types (8 digital and 3 analog). Digital mod-
ulations include BPSK, QPSK, 8PSK, 16QAM, 64QAM, BFSK,
CPFSK, and PAM4. Analog modulations include WB-FM, AM-SSB,
and AM-DSB. Each modulation type contains 20 different signal-to-
noise ratios (SNRs), and each SNR contains 1,000 samples. Each
sample consists of an in-phase signal I and a quadrature signal Q.
And each signal contains 128 sampling points. So the size of the com-
plete dataset is 220,000 × 2 × 128. In the experiment, we divide the
training set and the test set in a ratio of 4:1. Considering the balance
of the SNR signal samples, we randomly select 80% samples of each
SNR in each modulation type as the training set, and the rest as the
test set.
4.2. Baselines
In this subsection, the proposed model is compared with the following models: (1) gated recurrent unit network (GRU),43 (2) LSTM recurrent neural network,9 (3) 1DCNN in Ref. 10, (4) 2DCNN in Ref. 44, (5) limited penetrable visibility graph with Graph2vec (LPVG-G2V, M = 1), and (6) limited penetrable visibility graph with GraphSAGE (LPVG-GraphSAGE, M = 1).
the baselines and our proposed model have the same general hyper-
parameters, such as batch size, training epoch, and learning rate.
Batch size, epoch, and learning rate are set to 128, 100, and 0.001,
respectively. For RNN models, layers of both GRU and LSTM are
set to 2 and hidden units are set to 128. For CNN models, 1DCNN is
composed of six residual stack units and two fully connected layers,
and 2DCNN is composed of two convolution layers and two fully con-
nected layers. For GNN model, we use a three layer GraphSAGE and
two fully connected layers for comparison, and the hidden feature is
set to 64. For VG models, limited penetrable distance M is set to 1
in both LPVG and LLPVG.
[Table: classification accuracy (%) at each SNR (dB) for LLPVG-G2V with slider lengths w = 4, 8, and 16, compared with the LPVG-G2V baseline.]
The smaller the model, the better it suits terminal deployment. How to balance model size and classification performance is also worthy of further exploration.
5. Conclusion
References
Network Applications
Chapter 9
Study of AS Business Types Based on GNNs
1. Introduction
2. Related Work
In this section, we briefly review the background and related work on AS classification and AS relationships.
2.1. AS classification
Researchers have developed techniques decomposing the AS topol-
ogy into different levels or tiers based on connectivity properties of
BGP-derived AS graphs. Govindan et al.21 proposed to classify ASes
into four levels based on their AS degree. Ge et al.22 classified ASes
into seven tiers based on inferred customer-to-provider relationships.
Their classification exploited the idea that provider ASes should be
in higher tiers than their customers. Subramanian et al.23 classified
ASes into five tiers based on inferred customer-to-provider as well as
peer-to-peer relationships.
Dimitropoulos et al.24 observed that an AS "node" can represent a wide variety of organizations, e.g., large ISPs, small ISPs, customer ASes, universities, internet exchange points (IXPs), and network information centers (NICs). They introduced a machine learning approach based on AdaBoost to map all the ASes in the internet into a natural AS taxonomy, and successfully classified 95.3% of ASes with an expected accuracy of 78.1%.
Dhamdhere et al.25 attempted to measure and understand the evolution of the internet ecosystem over the preceding 12 years. They used a decision tree to classify ASes into a number of types depending on their function and business, using observable topological properties of those ASes. The AS types they considered were large transit providers, small transit providers, content/access/hosting providers, and enterprise networks. They were able to classify ASes into these types with an accuracy of 75–80%.
CAIDA used a ground-truth dataset from PeeringDB and trained a decision tree classifier using a number of features.
2.2. AS relationship
The internet topology at the AS level is typically modeled as a simple graph where each node is an AS and each link represents a business relationship between two ASes. These relationships reflect who pays whom when traffic is exchanged between the ASes, and they are key to the normal operation of the internet ecosystem.
Traditionally, these relationships are categorized into (1) customer–
provider (C2P), (2) peer–peer (P2P), and (3) sibling relationships
(S2S).26 However, other forms of relationships exist as well. In a
C2P relationship, the customer is billed for using the provider to
reach the rest of the internet. The other two types of relationships
are in general settlement-free. In other words, no money is exchanged
between the two parties involved in a P2P or S2S relationship.
Understanding AS relationships is vital to technical research on, and economic analysis of, the inter-domain structure of the internet. These relationships are regarded as private information by various organizations, institutions, and operators, and are not published on open platforms.
on the open platform. By considering the internet as a complex net-
work, various AS relationship inference algorithms have been pro-
posed to predict the AS-level structural relationship of the internet,
which is of particular significance for internet security.
Gao12 first proposed to enrich the representation of the AS graph by defining multiple business relationships and put forward the assumption that valid BGP paths are valley-free,27 i.e., $[\mathrm{C2P/S2S}]^{n}\,[\mathrm{P2P}]^{(0,1)}\,[\mathrm{P2C/S2S}]^{m}$, $n \geq 0$, $m \geq 0$: a path consists of zero or more C2P or S2S links, followed by zero or one P2P link, followed by zero or more P2C or S2S links, so that its shape is an uphill path followed by a downhill path, or one of the two.
3. Datasets
where $f_{AS1}$ and $f_{AS2}$ are the feature values of AS1 and AS2, respectively, and $\Delta$ is their difference. This method is used by default in the feature analysis that follows.
4.2.3. Assign VP
Vantage points (VPs, intuitively the first nodes of AS paths) are typically distributed across many different geographic locations, especially at the upper tiers of the internet hierarchy. Meanwhile, the number of VPs is very limited compared with the scale of the complete internet structure. We analyze the number of VPs that can detect the same AS link, and visualize how this feature discriminates among different types of AS relationships in Fig. 3(b). From the figure, we can observe that more than 97% of P2P links are detected by fewer than 100 VPs (in line with previous work18), while more than half of the P2C links are seen by more than 110 VPs. Hence, under the single feature assign VP, the two types of AS relationships (i.e., P2C and P2P) are clearly separable. This result once again confirms the quality of the features selected by the previous algorithm.
4.2.5. Distance to VP
Different from the distance to clique, we also pay attention to the distance from each node (AS) to the first node (VP) in each BGP path.
Fig. 1: Analysis of the distance to clique, assign VP, and common neighbor ratio. (a) CDF of the absolute distance from ASes to the clique for different relationships. (b) The distribution of the number of VPs (with a threshold of 110) that can detect each relationship's links. (c) The distribution of common neighbor rates for different relationships.
With this feature, we count the set of distances from the target AS to the VP over all BGP paths, reflecting the position of a link across many paths. Because the same node can appear in several paths, the distance-to-VP value of a node is expressed as a set of integers. For these integer sets, the mean value represents the universality of the node position, while the maximum and minimum values represent its specificity. As shown in Fig. 2, using the mean value of the set is more discriminative between the two types. The feature importance analysis that follows (see Fig. 2) also shows that the importance of the mean value is higher than that of the maximum and minimum values.
4.2.7. AS type
An organization has related business types due to its functions. AS type has been considered a very important feature, since it has a direct impact on the AS relationship. We obtain the AS classification dataset from CAIDA.38 The ground-truth data are extracted from the self-reported business type of each AS listed in PeeringDB.36 AS types can then be summarized into three main categories: (1) Transit/Access: ASes inferred to be transit and/or access providers. (2) Content: ASes that provide content hosting and distribution systems. (3) Enterprise: various organizations, universities, and companies. Furthermore, we also add a fourth type: (4) Unknown, which contains ASes without a clear type and neutral ASes that do not belong to the first three categories. We take the type of the source node of each edge as the feature of the edge; the results for the two types of edges are shown in Fig. 3(b).
Fig. 3: (a) Three components of the internet based on K-Shell. (b) The distribution of the two AS types between P2P and P2C.
5. Methodology
$\hat{A} = \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2}$,  (2)
where $\tilde{A} = A + I$ and $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$, with $A$ being the adjacency matrix and $I$ the identity matrix. $\hat{A}$ can be regarded as a graph shift operator.
GNN carries out the convolution operation in the spectral domain, and each operation aggregates one additional layer of features. The spectral convolution is formulated as
$H^{(l+1)} = \sigma\big(\hat{A} H^{(l)} W^{(l)}\big)$,  (3)
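A minimal PyTorch sketch of Eqs. (2)-(3) (our own illustration, not the ASGNN implementation) may clarify the two steps: normalizing the adjacency matrix once, then stacking spectral convolution layers:

    import torch
    import torch.nn as nn

    def normalized_adjacency(A):
        # Eq. (2): A_hat = D~^{-1/2} (A + I) D~^{-1/2}
        A_tilde = A + torch.eye(A.size(0))
        d_inv_sqrt = A_tilde.sum(dim=1).pow(-0.5)
        return torch.diag(d_inv_sqrt) @ A_tilde @ torch.diag(d_inv_sqrt)

    class GCNLayer(nn.Module):
        # Eq. (3): H^{(l+1)} = sigma(A_hat H^{(l)} W^{(l)})
        def __init__(self, in_dim, out_dim):
            super().__init__()
            self.W = nn.Linear(in_dim, out_dim, bias=False)

        def forward(self, A_hat, H):
            return torch.relu(self.W(A_hat @ H))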
6. Evaluation
$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$,  (6)
$\mathrm{Precision} = \frac{TP}{TP + FP}$,  (7)
$\mathrm{Recall} = \frac{TP}{TP + FN}$,  (8)
$F1 = 2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$,  (9)
where TP, TN, FP, and FN refer to true positive, true negative, false
positive, and false negative, respectively.
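For reference, these four metrics follow directly from the confusion-matrix counts; a small sketch (our own illustration):

    def classification_metrics(tp, tn, fp, fn):
        # Eqs. (6)-(9)
        accuracy = (tp + tn) / (tp + tn + fp + fn)
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        return accuracy, precision, recall, f1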
Our model is implemented using PyTorch.42 The parameters are updated by the Adam algorithm.43 Each experiment runs for 1,000 epochs in total. In the AS-type classification, all results are obtained by training the ASGNN using Adam with a weight decay of 5 × 10−4 and an initial learning rate of 0.1. We use two blocks, where each block has two standard GNN layers (i.e., the ASGNN setting can be represented as 2 × 2), to learn the graph structure.
In summary, we select the best parameter configuration based on
performance on the validation set and evaluate the configuration on
the testing set.
[Table: Acc., Pre., Rec., and F1 of the compared models using 11 features and 19 features; bold entries mark the best value for each metric.]
References
Chapter 10
Social Media Opinions Analysis
With the rapid growth of the internet over the last decade, online social media has become one of the major channels for information spreading. Individuals can exchange their opinions on various news items on platforms such as Chinese Toutiao. However, such free-flowing information can also provide grounds for violent behavior. Most existing studies ignore the interaction between comments and the corresponding replies. In this chapter, we propose an end-to-end model, PathMerge, for controversy detection. PathMerge takes both dynamic and static features into consideration, and integrates structural information from the social network graph with dynamic information. Experiments on a real-world dataset demonstrate that our model outperforms existing methods, and analysis of the results shows that it has significant generalization ability.
1. Introduction
a https://www.toutiao.com.
even wars of words. This pollutes the online environment. The causes of these controversies can be political debates1,2 or other topics.3 The content of such comments reflects public sentiment, which provides opportunities to address major problems in network governance, such as news topic selection, influence assessment, and the alleviation of polarized views.4
Therefore, controversy detection in social media has drawn attention.2,5 Existing methods mainly focus on macro-level topic controversy detection. For example, certain topics on Twitter can raise large-scale controversial debates among different users,6,7 and for data collected from news portals, researchers pay much attention to whether certain news items are likely to raise conflicts among users.8 In other words, existing methods mainly detect and analyze conflicts from a macro perspective. In contrast, we concentrate on detecting conflicts between the comments under news items on a given topic.
In this chapter, we detect micro-level controversy among comments from social media. According to recent research,9 controversial comments always have debatable content and express an idea or an opinion that generates argument in the responses, i.e., an opposing opinion in disagreement with the current comment. Figure 1 gives an example of a controversy over a certain piece of news: the news N belongs to topic T and is followed by multiple comments.
Fig. 1: A piece of news under the topic of Huawei, and the comments under the news. Each comment is labeled as positive, negative, or neutral, depending on its attitude to the news.
Fig. 2: The architecture of graph construction. T refers to the topic of the news.
N represents one news under the topic. C and U refer to the comment and the
user. The red line from comment node to comment node means controversy while
the blue line means non-controversy.
PathMerge merges the feature vectors of nodes along a specific path in the comment tree, where the specific path is the route from the root node to a leaf node. We obtain the final feature vectors of edges through three different formulations that combine several node feature vectors. This process fuses the feature information of the nodes along the specific path. Experimental results demonstrate that our model outperforms the baselines. The main contributions of this chapter are as follows:
1. We build a Chinese dataset for controversy detection consisting of 511 news items, 71,579 users, and 103,787 comments under the same topic, collected from Chinese Toutiao; each comment is labeled as controversial or non-controversial.
2. We propose a random walk-based model, PathMerge, for comment-level detection. The model integrates information from the nodes on the path from the root node to the current node in the target comment tree; in particular, PathMerge can further fuse dynamic features.
3. Extensive experiments on the Toutiao dataset demonstrate that temporal and structural information can effectively improve the embedding vectors, yielding better AUC and AP results. Our model also generalizes well under different ratios of training samples.
The rest of this chapter is organized as follows. In Section 2, we review previous studies on controversy detection and several graph embedding methods. In Section 3, we describe the Toutiao dataset. In Section 4, we describe the PathMerge method in detail and explain the construction of the heterogeneous graphs. In Section 5, we evaluate the AUC and AP metrics of PathMerge against other baselines. In Section 6, we conclude this chapter and highlight some future research directions.
2. Related Work
3. Toutiao Dataset
Fig. 4: A controversial comment tree under one news item about Huawei, with one of the comments having been deleted.
of the first-level comments has been deleted and its content is unknown. In this situation, we can only infer its label from its child comments. In the case of Fig. 4, most of the child comments under the deleted comment are against Huawei and do not follow that comment's view; we therefore conclude that the deleted comment presents a supportive attitude toward the current Huawei topic.
3.2. Label
We use Huawei-related news from Toutiao and bring in a third-party organization to label the comment data. The principles are as follows:
(1) Each comment receives at most five labels, given by five different people.
(2) The five individuals label the comment in order. If two people in a row give the same opinion on a comment, its labeling is finished; such comments are considered clear and do not require all five people to judge them.
(3) If two consecutive people give different marks, the labeling process continues. The comment tendency is finally calculated from the collected scores by the following formulas:
$\mathrm{result} = \frac{1}{n}\sum_{i=0}^{n} s_i$,  (1)
$L = \begin{cases} -1 & \text{if } r < -0.3 \\ 0 & \text{if } -0.3 \le r \le 0.3 \\ 1 & \text{if } r > 0.3 \end{cases}$  (2)
Here $s$ is the set of all scores given for the comment, $s_i$ is the $i$th item of this set, $n$ is the size of the set, and $r$ is the value of result. $L$ is the final label: $-1$ represents a negative comment, indicating a negative opinion on Huawei under the current topic; $0$ represents a neutral comment, indicating an unclear opinion; and $1$ represents a positive comment, indicating a supportive attitude toward Huawei under the current topic.
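As a small worked example of Eqs. (1)-(2) (our own sketch; names are hypothetical):

    def label_comment(scores):
        # Eq. (1): average the annotator scores
        r = sum(scores) / len(scores)
        # Eq. (2): threshold the result at +/-0.3
        if r < -0.3:
            return -1  # negative
        if r > 0.3:
            return 1   # positive
        return 0       # neutral

    # e.g., scores (1, 1) -> r = 1.0 -> label 1;
    #       scores (1, -1, 0, 0) -> r = 0.0 -> label 0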
3.3. Pre-processing
Based on the original dataset, we first extract all news, comment, and user IDs, and re-encode every node to rebuild a new network. We also calculate the time difference between the posting times of each child comment and its parent comment; for a top comment in the comment tree, we use the time difference between the creation time of the news and the posting time of the top comment. Some comments contain no information about their text, posting time, or posting user; for such comments, we locate the child and parent comments and use their time difference to infer the missing posting time.
We perform extensive experiments on three subsets of the real-world dataset. Table 1 shows the statistics of the Toutiao dataset. The details are as follows:
Toutiao dataset: We built a Chinese dataset for controversy detection from the Toutiao website for this work. In total, the dataset contains 511 news items, 103,787 comments, and 71,579 users; after preprocessing, 55,994 comments, 494 news items, and 28,908 users remain.
Given such a large amount of data, we focus on the controversies under hot news in this chapter and sample three subsets of the data for our experiments. Specifically, we first find the two most active users, who posted the most comments under different news items, and denote them as u1 and u2. The news items commented on by u1 and the corresponding comments form one subset, namely Toutiao#1; another subset, namely Toutiao#2, consists of the news items commented on by u2 and the corresponding comments.
Table 1: Statistics of the Toutiao dataset.
Item                               Amount
News                               511
Comments                           103,787
Users                              71,579
Positive comments                  54,994
Negative comments                  25,557
Neutral comments                   23,236
Controversial comment pairs        22,184
Non-controversial comment pairs    16,339
                              Toutiao#1   Toutiao#2   Toutiao#3
Num. news                     11          11          1
Num. user                     5,940       3,496       1,573
Num. comment                  10,580      5,570       2,466
Num. controversy-replies      4,570       2,610       1,166
Num. non-controversy-replies  2,976       1,309       584
Num. replies                  9,504       4,995       2,294
Num. edges                    19,685      10,418      4,644
Fig. 5: The three datasets sampled from the total Toutiao dataset. (a) The heterogeneous network of Toutiao#1. (b) The heterogeneous network of Toutiao#3. (c) The heterogeneous network of Toutiao#2.
4. Method
5. Experiment
5.1. Baseline
To validate the effectiveness of our method, we implement several representative baselines, including node2vec, metapath2vec, and CTDNE. Their basic settings are as follows:
• Node2vec13 preserves the neighborhood of vertices to learn vertex representations, achieving a balance between homophily and structural equivalence. In this chapter, the ranges of its hyperparameters are set to p, q ∈ {0.5, 1, 2}.
• CTDNE35 is a general framework for incorporating temporal information into network embedding. It is based on random walks and stipulates that the timestamp of the next edge in a walk must be larger than that of the current edge.
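A minimal sketch of this temporal-walk constraint (our own illustration; the adjacency format is hypothetical):

    import random

    def temporal_walk(adj, start, length):
        # adj maps a node to a list of (neighbor, timestamp) pairs.
        # Each step must traverse an edge whose timestamp is larger
        # than the previous edge's timestamp, as CTDNE requires.
        walk, t_prev, node = [start], float("-inf"), start
        for _ in range(length - 1):
            candidates = [(v, t) for v, t in adj.get(node, []) if t > t_prev]
            if not candidates:
                break
            node, t_prev = random.choice(candidates)
            walk.append(node)
        return walk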
[Table: average AUC and AP of each method on the three Toutiao subsets.]
6. Conclusion
References
Chapter 11
Ethereum's Ponzi Scheme Detection Work Based on Graph Ideas
1. Introduction
a https://www.investor.gov/protect-your-investments/fraud/types-fraud/ponzi-scheme.
b https://cn.etherscan.com/address/0x311f71389e3de68f7b2097ad02c6ad7b2dde4c71#code.
Fig. 1: Three types of smart contract code and the transformations (compilation and decompilation) between them.
contracts are diverse, and some contracts are similar to Ponzi ones. In summary, there is a certain degree of distinction between Ponzi and non-Ponzi contracts in their opcode characteristics. These results suggest that manual features can explain the behavior of smart contracts from different perspectives; we therefore extract these manual transaction characteristics as initial features to facilitate Ponzi scheme detection.
4. Method
c
https://etherscan.io/.
d
http://xblock.pro/ethereum/.
e
goo.gl/CvdxBp.
4.2. Model
In this section, we first introduce the graph convolutional network
(GCN) model and then illustrate how to use it for Ponzi scheme
detection.
GCN is a semi-supervised convolutional neural network, which
can work directly on graphs and use their structural information to
help feature learning. The graph convolutional layer is defined as
follows:
$H^{(i+1)} = \sigma\big(\hat{A} H^{(i)} W^{(i)}\big)$,  (1)
5. Experiments
5.1. Data
Some Ponzi contracts have only creation records but no transaction records, which does not meet our requirements. To build a connected transaction network containing as many Ponzi contracts as possible, we screen the contracts and keep those with no fewer than five transaction records, leaving 191 Ponzi contracts that meet the requirement. While many fraudulent activities are active on Ethereum, only a fraction of them are Ponzi contracts, so there is an extreme imbalance between the numbers of Ponzi and normal contracts.
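The balanced sets used below (Data_ba1-Data_ba3) can be thought of as pairing the 191 Ponzi contracts with randomly undersampled normal contracts; a hypothetical sketch of one such construction (the chapter does not spell out its exact recipe):

    import random

    def make_balanced_subset(ponzi, normal, ratio=1, seed=0):
        # Undersample the normal contracts so that the subset contains
        # ratio * len(ponzi) normal contracts alongside all Ponzi ones.
        random.seed(seed)
        return ponzi + random.sample(normal, ratio * len(ponzi))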
Note: |V| and |E| are the numbers of nodes and edges, respectively; K is the average degree.
f
OpenNE: github.com/thunlp/openne.
g
GCN: https://github.com/tkipf/gcn.
[Table: precision, recall, and F1 (%) of the eight compared classifiers on the four datasets; column order as in the original table.]
Data_ba1   Precision  72.08 70.21 69.89 67.09 72.43 75.32 73.79 69.51
           Recall     75.37 62.72 64.57 76.34 70.48 75.51 78.38 65.70
           F1         73.11 65.66 66.65 70.60 70.62 74.03 75.28 67.05
Data_ba2   Precision  75.70 71.85 70.89 75.14 70.85 79.69 76.25 68.18
           Recall     78.26 63.11 63.87 79.34 76.04 76.83 77.52 68.17
           F1         76.01 66.00 66.24 76.07 72.76 76.48 75.93 66.89
Data_ba3   Precision  72.19 70.03 71.80 70.19 70.58 75.17 72.90 66.97
           Recall     73.19 63.06 64.84 72.28 69.71 74.63 75.90 65.66
           F1         71.97 65.80 67.58 70.45 68.97 74.16 73.65 65.75
Data_unba  Precision  66.56 68.36 70.78 65.37 47.91 75.85 64.91 41.29
           Recall     41.31 32.48 39.13 28.04 48.03 38.08 35.23 10.86
           F1         50.70 42.99 49.39 38.75 47.63 50.29 45.03 16.28
Note: The bold entries represent the best results among the classifiers.
[Table: precision, recall, and F1 (%) of the eight classifiers on the four datasets; column order as in the original table.]
Data_ba1   Precision  87.54 75.51 82.47 82.59 84.44 90.73 89.17 77.66
           Recall     88.33 84.54 80.39 85.24 86.82 86.76 85.53 85.12
           F1         87.57 79.14 80.65 83.48 84.92 88.48 87.14 80.32
Data_ba2   Precision  88.93 80.18 83.33 83.31 80.42 90.44 91.51 80.31
           Recall     87.13 86.47 84.42 83.61 83.73 86.73 87.90 86.40
           F1         87.49 82.70 83.58 82.84 81.27 88.34 89.26 82.55
Data_ba3   Precision  88.65 75.97 86.18 86.92 80.12 90.08 89.24 79.35
           Recall     86.17 83.09 84.71 85.01 85.30 83.88 86.75 84.17
           F1         87.10 78.57 85.15 85.47 82.28 86.69 87.66 80.98
Data_unba  Precision  90.34 84.09 89.48 84.94 72.95 94.22 89.83 81.28
           Recall     75.35 69.78 74.93 73.13 74.76 73.70 74.36 66.22
           F1         81.47 75.94 81.01 78.08 73.12 82.23 80.46 72.76
Note: The bold entries represent the best results among the classifiers.
Obs. 4. Mixing multiple types of manual features does not guarantee better performance: in some cases, using both feature types yields worse results than using opcode features alone. A possible reason is the similarity between the statistical transaction characteristics of Ponzi and non-Ponzi accounts.
Fig. 7: The performance of different embedding methods on the top three best
classifiers.
learned network features (N) into the three ensemble learning classifiers (XGB, RFC, GBC). Figure 7 shows the performance comparison of the network features extracted by the three embedding methods. Since the structural features mentioned earlier are obtained by an embedding method, we use (E) to denote structural features in what follows.
[Figure: F1-scores (%) of the feature combinations T, O, E, TO, TE, and TOE on Data_ba1, Data_ba2, Data_ba3, and Data_unba for the RFC and XGB classifiers.]
Fig. 9: The performance of all the features on the imbalanced dataset with the eXtreme Gradient Boosting classifier.
Table 5: Performance of the GCN model and the RFC using T and TO features on Data_ba1, Data_ba2, and Data_ba3.
Note: The bold terms are the best performances compared with those of the other methods.
6. Conclusion
References
Chapter 12
Research on Prediction of Molecular Biological Activity Based on GCN
The drug development cycle is long and expensive. Using computer algorithms to screen lead compounds can effectively improve its efficiency, and quantitative structure–activity relationship (QSAR) modeling methods can be used to predict the biological activity of molecules; this has become a major research focus in the field of drug development. However, due to limitations of methods and computing power, existing machine learning-based modeling techniques cannot meet the requirements of big data-driven drug development. The purpose of this chapter is to construct a more reliable prediction model for molecular biological activity. The instability and unreliability caused by manually engineered features are avoided by learning molecular graph features directly. During modeling, we address problems such as adaptive learning in feature fusion and sample balance, thus improving the overall performance of the model.
1. Introduction
The drug development cycle is long and the cost is high, and the attrition of candidate drugs in clinical research leads to a huge waste of resources. At present, 9 out of every 10 candidate drugs fail in phase-I clinical trials or regulatory approval.1 To improve the efficiency of the drug development process, we aim to shorten the cycle of new drug research and improve its success rate. Pharmaceutical chemists introduced the concept of quantitative structure–activity relationships (QSAR). QSAR aims to quantitatively determine the biological activity of a series of derivatives of known lead compounds, analyze the relationship between the main physical and chemical parameters of the derivatives and their biological activity, and establish a mathematical model between structure and biological activity to guide drug molecular design.2 Machine learning methods are common in chemical informatics. Since traditional machine learning methods can only deal with fixed-size inputs, most early QSAR modeling used manually generated molecular descriptors for different tasks. Common molecular descriptors include: (1) molecular fingerprints, which encode the molecular structure through a series of binary digits representing specific substructures;2 (2) one/two-dimensional molecular descriptors, derived from molecular physical chemistry and differential topology by statisticians and chemists.3 Common modeling methods include linear methods (such as linear regression) and nonlinear methods (such as support vector machines and random forests). Recently, deep learning has become the leading technique for QSAR modeling.
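As an aside on descriptor (1), molecular fingerprints are easy to generate with standard cheminformatics tooling; a minimal sketch using RDKit (our own illustration, assuming RDKit is installed):

    from rdkit import Chem
    from rdkit.Chem import AllChem

    mol = Chem.MolFromSmiles("CCO")  # ethanol
    # 2048-bit Morgan (circular) fingerprint of radius 2: each bit flags
    # the presence of a particular substructure.
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048)
    bits = list(fp)  # fixed-size 0/1 vector usable by fixed-input models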
In the past decade, deep learning has become a popular modeling method in various fields, especially in medicine, where it is used for predicting biological activities and physicochemical properties, drug discovery, medical image analysis, synthesis prediction, and so on. Convolutional neural networks (CNNs) are a special class of deep learning models and have successfully solved many problems involving structured data (such as images).4 However, graph data have irregular shapes and sizes; node positions have no spatial order; and the neighbors of a node are also unrelated to its position. Hence, traditional CNNs cannot be directly applied to graphs. For such non-Euclidean data, the graph convolutional network (GCN) was proposed, along with various derivative architectures. In 2005, Gori et al.5 proposed the first graph neural network model.
2. Method
(1) By directly learning the molecular graph, the model avoids the errors caused by manually screened features and their impact on robustness and reliability.
(2) The generated attention weight matrix depends on the local neighborhood characteristics of a node instead of global characteristics, and the weights are shared across all graphs; local characteristics of the data can therefore be extracted through the shared features.
2.4. Datasets
The datasets come from the public chemistry database PubChem.16 Following the analysis and screening methods in the literature,14 we selected different types of biological activity datasets and restricted the screening targets, e.g., screening multiple series of cytochrome P450 enzymes. Finally, we selected four cytochrome P450 datasets from the 1851 target family, two inhibitor datasets, and a molecular series identifying compounds binding r(CAG) RNA repeats. Table 3 lists the relevant information and the filtering conditions of the selected datasets.
3. Experimental Results
[Table 3: for each selected dataset, the PubChem AID, screening conditions, and the numbers of active and inactive molecules.]
Random forest
  Ntrees (50, 100, 150, ..., 500): number of trees
  max_depth (1, 5, 10, ..., 45, 50): maximum depth per tree
  max_features (1, 5, 10, ..., 45, 50): maximum number of features per split
Support vector machines
  Kernel = RBF: kernel function
  C (1, 10, 100): penalty coefficient
  γ (0.1, 0.001, 0.0001, 0.00001, 1, 10, 100): controls how the data are mapped to the new feature space
Deep neural networks
  Epochs = 100: number of iterations
  Batch size = 100: number of training samples per batch
  Hidden layers (2, 3, 4): number of hidden layers
  Number of neurons (10, 50, 100, 500, 700, 1000): number of neurons per layer
  Activation function = ReLU: neuron activation function
  Loss function = binary cross-entropy
EAGCN
  Batch size = 64: number of samples per training step
  Epochs = 100: number of iterations
  weight_decay = 0.00001: weight decay rate
  dropout = 0.5: random inactivation rate
  Activation function = ReLU
  Loss function = binary cross-entropy
  kernel_size = 1: convolution kernel size
  stride = 1: convolution kernel sliding step
  n_sgcn1 = (30, 10, 10, 10, 10): numbers of convolution output channels of the multi-feature map
MF_EAGCN
  Batch size = 64: number of samples per training step
  Epochs = 100: number of iterations
  weight_decay = 0.00001: weight decay rate
  dropout = 0.5: random inactivation rate
  Activation function = ReLU
  Loss function = binary cross-entropy
  kernel_size = 1: convolution kernel size
  stride = 1: convolution kernel sliding step
  n_sgcn1 = (20, 20, 20, 20, 20): numbers of convolution output channels of the multi-feature map
$F1 = 2 \cdot \frac{\mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}$.  (5)
The F1-score takes into account both the precision and the recall of the model; only when both values are high is F1 high and the model performance good.
[Table: performance (mean ± standard deviation) of the compared classifiers on the seven bioactivity datasets; column headers as in the original table.]
1851(1a2):  0.824±0.005 0.8±0.02 0.835±0.015 0.85±0.01 0.859±0.012 0.792±0.01 0.78±0.008 0.8±0.007 0.83±0.012 0.841±0.01
1851(2c19): 0.776±0.01 0.75±0.009 0.79±0.002 0.802±0.007 0.815±0.003 0.8±0.004 0.77±0.005 0.823±0.01 0.84±0.01 0.852±0.008
1851(2d6):  0.849±0.006 0.83±0.007 0.84±0.002 0.843±0.005 0.851±0.003 0.828±0.013 0.8±0.004 0.82±0.003 0.83±0.01 0.834±0.006
1851(3a4):  0.77±0.006 0.737±0.004 0.792±0.008 0.817±0.006 0.825±0.01 0.73±0.003 0.701±0.006 0.74±0.01 0.791±0.008 0.807±0.005
492992:     0.713±0.004 0.705±0.006 0.745±0.005 0.757±0.01 0.762±0.01 0.683±0.005 0.674±0.006 0.692±0.009 0.74±0.01 0.75±0.009
651739:     0.753±0.004 0.753±0.006 0.814±0.014 0.83±0.006 0.843±0.003 0.8±0.003 0.776±0.009 0.88±0.006 0.882±0.007 0.891±0.002
652065:     0.75±0.004 0.7±0.005 0.755±0.015 0.77±0.006 0.774±0.005 0.73±0.008 0.67±0.009 0.796±0.012 0.787±0.01 0.792±0.01
Fig. 3: Distribution of the ACC index, representing the performance of the five classifiers on the seven bioactivity datasets.
(1) From the data perspective, we can modify and balance the distribution of the categories.26 The most common methods are expanding the dataset and balanced sampling,26 where sampling mainly includes oversampling and undersampling.27
(2) From the algorithm perspective, we can modify the model or the loss function. Down-weighting the loss of the many easy samples, and thereby increasing the proportion of hard samples' loss in the overall loss, can alleviate the imbalance problem; this is also the starting point of this chapter.28
$L = -y \log \hat{y} - (1 - y) \log(1 - \hat{y}) = \begin{cases} -\log \hat{y} & y = 1 \\ -\log(1 - \hat{y}) & y = 0 \end{cases}$  (6)
Taking a binary label as an example, $\hat{y}$ represents the prediction probability produced by the model and $y$ represents the label of the sample. When $y$ is 1 and $\hat{y}$ is close to 1, the loss is small; when the label is 0 and the prediction probability $\hat{y}$ is close to 0, the loss is also small. However, the weights of all samples in the cross-entropy function are the same: if positive/negative or difficult/easy samples are imbalanced, the numerous negative or easy samples dominate the loss and degrade accuracy.
Therefore, by reducing the weight of easy samples, the model focuses more on learning difficult samples during training, which alleviates the problems caused by sample imbalance. Focal loss modifies the cross-entropy as follows (Eq. (7)):
$L_{fl} = \begin{cases} -(1 - \hat{y})^{\gamma} \log \hat{y} & y = 1 \\ -\hat{y}^{\gamma} \log(1 - \hat{y}) & y = 0 \end{cases}$  (7)
$L_{fl} = \begin{cases} -\alpha (1 - \hat{y})^{\gamma} \log \hat{y} & y = 1 \\ -(1 - \alpha)\, \hat{y}^{\gamma} \log(1 - \hat{y}) & y = 0 \end{cases}$  (8)
$g = |p - p^{*}| = \begin{cases} 1 - p & p^{*} = 1 \\ p & p^{*} = 0 \end{cases}$  (9)
When the GD(g) value is large, β is small, and vice versa. The easily divided samples and the particularly difficult samples are both very densely distributed, i.e., their GD(g) values are very large, so the parameter β reduces the weight of these two groups and raises the relative weight of the remaining samples. Applying β thus weights the loss of each sample.
Based on the β parameter, applying the idea of GHM to classification yields a new classification loss function, GHM-C, defined in Eq. (12):
$L_{\mathrm{GHM\text{-}C}} = \frac{1}{N}\sum_{i=1}^{N} \beta_i\, L_{CE}(p_i, p_i^{*}) = \sum_{i=1}^{N} \frac{L_{CE}(p_i, p_i^{*})}{GD(g_i)}$  (12)
$GD(g) = \frac{R_{ind(g)}}{\varepsilon} = R_{ind(g)}\, m$,  (13)
$\hat{\beta}_i = \frac{N}{GD(g_i)}$.  (14)
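A compact sketch of this binned gradient-density weighting (our own illustration; bins = 5 follows the best setting reported below):

    import torch

    def ghm_weights(g, bins=5):
        # g: per-sample gradient norms in [0, 1] (Eq. (9)).
        # Eq. (13): GD(g) = R_ind(g) * m, with m = bins;
        # Eq. (14): beta_i = N / GD(g_i).
        n = g.numel()
        edges = torch.linspace(0, 1, bins + 1)
        edges[-1] += 1e-6  # make sure g == 1 falls in the last bin
        idx = (torch.bucketize(g, edges, right=True) - 1).clamp(0, bins - 1)
        counts = torch.bincount(idx, minlength=bins).float()
        gd = counts[idx] * bins
        return n / gd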
In general, focal loss attenuates the loss according to the confidence p, while GHM attenuates the loss according to the number of samples within a certain confidence range. Both suppress the loss of easily divided samples well; however, GHM also suppresses the loss of particularly difficult samples more effectively.
MF_EAGCN_FL
  γ (0.5, 1, 2, 5): modulation factor
  α (0.1, 0.2, 0.5, 0.7): equilibrium parameter
MF_EAGCN_GHM
  Bins (1, 2, 3, ..., 10): number of intervals
  Momentum = 0.1: momentum coefficient
We replace the original cross-entropy loss function with focal loss and GHM-C, giving two models. As shown in Table 7, the other parameters of these models are the same as those in Table 5; the table lists the hyperparameters that focal loss and GHM-C each need to determine. As before, the dataset is divided by the 20% cross-validation method, the algorithms are executed three times with different random seeds, and the results are reported as the average of the three runs together with the standard deviation.
In GHM-C, the gradient density is approximated through the following two mechanisms:
(1) Split the range of gradient values into bins and count the number of gradients falling into each bin.
(2) Use an exponentially weighted moving average, controlled by a momentum coefficient, to approximate the gradient density. After analysis, Li et al.30 found that the model is not sensitive to the momentum parameter, so it is set to 0.1 here.
difficult and easy samples, and α is used to balance positive and negative samples. To tune α, we fix γ at 1 and try α ∈ (0.1, 0.2, 0.3, 0.5, 0.7); the experimental results show that the model performs best when α is 0.3. To tune γ, we fix α at 0.3 and try γ ∈ (1, 2, 5); Figure 5 shows the effect of γ on focal loss, and the model performs best when γ is 1.
In the MF_EAGCN model based on GHM-C, we set the value of bins to (1, 2, ..., 10), i.e., increasing in steps of 1 from 1 to 10 (Fig. 6). The experimental results show that the model performs best when the bin value is 5.
5. Conclusion
References
Index
controversial or non-controversial, 197
controversy detection, 194
convex problem, 112
convolution in GAN, 129
convolution kernel, 97
convolution neural networks, 146
convolutional layer, 92
convolutional neural networks (CNN), 92, 244
coordinate system, 136
correction linear units, 131
correlation, 169
cracks, 46
crop diseases, 61
crop stress, 62
cross-entropy loss (CE), 132–133, 184, 262
cross-layer link, 109
cryptocurrency, 216
CTDNE, 208
customer cone, 176
customer, provider, and peer degrees, 176
cyclic spectrum, 146
cyclostationary characteristics, 146
cytochrome P450 enzymes, 253

D
dark channel, 7
data analysis, 103
data augmentation, 50
data information, 128
dataset of ship license, 23
debatable content, 194
decentralization, 219
decentralized applications, 218
decision boundaries, 106
decision surfaces, 106
decision tree, 170, 184
deep autoencoder, 151
deep belief networks, 105
deep Boltzmann machine, 105
deep convolutional generative adversarial networks (DCGANs), 50
deep learning, 20, 62, 84, 103, 142, 146, 244
deep neural networks, 126
DeepLabV3+, 52
degree distribution, 203
denoised signal, 137
denoising effect, 138
denoising model, 137, 141
denoising network, 127
denoising technology, 142
deployment requirements, 99
depth-first search, 237
derivative architectures, 244
detect and analyze conflicts, 194
detecting controversial articles, 197
detection, 73
detection performance, 56
different SNRs, 161
difficult samples, 262
Diffpool, 148
digital modulation, 136
dimensions of network embedding, 237
discover rumors, 195
discrete static snapshot graphs, 199
discriminative model, 127
discriminator loss, 133
diseases, 66
distribution of sensors, 12
distribution of various categories, 261
Dragon, 254
drought, 66, 72
drug development cycle, 243
drug molecular design, 244
dynamic features, 207

E
easy samples, 262
ECG5000, 119
edge attention graph convolutional network (EAGCN), 245
EEG signal classification, 147
effective signal power, 137
electrocardiograms, 126