Chapter · December 2021

DOI: 10.1007/978-3-030-92270-2_57


JStrack: Enriching Malicious JavaScript
Detection Based on AST Graph Analysis
and Attention Mechanism

Muhammad Fakhrur Rozi1,2(B), Tao Ban1, Seiichi Ozawa2, Sangwook Kim2,
Takeshi Takahashi1, and Daisuke Inoue1

1 National Institute of Information and Communications Technology, Koganei,
Tokyo, Japan
{fakhrurrozi95,bantao,takeshi_takahashi,dai}@nict.go.jp
2 Kobe University, Kobe, Hyogo, Japan
[email protected], [email protected]

Abstract. Malicious JavaScript is one of the most common tools attackers use
to exploit vulnerabilities in web applications. It carries risks such as
spreading malware, phishing, or collecting sensitive information. Although
there are numerous types of malicious JavaScript that are difficult to detect,
generalizing the malicious script's signature can help catch more complex
scripts that use obfuscation techniques. This paper aims to detect malicious
JavaScript based on structure and attribute analysis of abstract syntax trees
(ASTs), which capture the generalized semantic meaning of the source code. We
apply a graph convolutional neural network (GCN) to process the AST features
and obtain a graph representation via neural message passing with neighborhood
aggregation. An attention layer enriches our method by tracking pertinent
parts of scripts that may contain the signature of malicious intent. We
comprehensively evaluate the performance of our proposed approach on a
real-world dataset to detect malicious websites. The proposed method
demonstrates promising performance in terms of detection accuracy and
robustness against obfuscated samples.

Keywords: Cyber security · Malicious JavaScript · Abstract syntax tree ·
Graph neural network

1 Introduction

JavaScript payload injection into legitimate or fake websites has been one of
the largest attacks on the web. A malicious script can exploit vulnerabilities
in web applications to perform a drive-by download attack [2] or cross-site
scripting (XSS) [19]. When the attack is successful, attackers distribute
malware to clients, which can cause damage such as sensitive data leakage,
wire transfers, or enlistment in distributed denial-of-service (DDoS) attacks
[3]. For instance, one of the most famous examples of an XSS vulnerability is
the Myspace Samy worm written by Samy Kamkar in 2005 [9]. He exploited a
vulnerability on the target that gave him the privilege to store a JavaScript
payload on his Myspace profile. Moreover, improvements in web technology help
attackers use the latest methods, such as obfuscation techniques, to avoid
detection.

© Springer Nature Switzerland AG 2021
T. Mantoro et al. (Eds.): ICONIP 2021, LNCS 13109, pp. 669–680, 2021.
https://doi.org/10.1007/978-3-030-92270-2_57

Researchers have identified the malicious JavaScript payloads that attackers
typically use as part of a web security attack, and a variety of detection
systems have been proposed that use JavaScript features to detect malicious
intent. Many features can serve as the basis of such a detection system,
including strings, function calls, bytecode sequences, abstract syntax trees
(ASTs), and the outputs of dynamic analysis tools. Among these features, the
AST has given notably strong performance. Fass et al. [6] use this feature for
static analysis and apply an N-gram model to detect malicious obfuscated
JavaScript. However, their work focuses on the frequency of specific patterns
and ignores the connections between the syntactic units of the AST. To capture
the semantic meaning of the code, we have to analyze the AST at the tree level
instead of the sequence level.
We propose JStrack, a malicious JavaScript detection system that applies a
graph-based approach to AST features to capture the whole semantic meaning of
the code, which has not been considered in previous works. We hypothesize that
malicious code tends to be more regularly structured, rather than having an
arbitrary structure, because of the decryption or deobfuscation process that
must exist inside the code. Analyzing the whole AST as a graph structure also
gives us more information about the actual intent of the source code. To
capture that information thoroughly, we use a supervised graph neural network
(GNN) known as the graph convolutional neural network (GCN). This model
captures the connections between nodes in the graph structure and formulates
them as vectorial features for use in a neural network model. Moreover, we
combine it with an attention layer to identify which parts of the AST carry
significant information for detecting malicious JavaScript code.
To summarize, our contributions are as follows:

– We introduce JStrack, a static analysis method that detects malicious
JavaScript using the AST features as a graph. We apply a GCN to capture the
typical structure and attributes of the AST representation of malicious
JavaScript samples. The GCN model is built by stacking multiple convolutional
layers to be used as a layer-wise linear model in our detection system.
– We track the suspicious part of the AST graph, which corresponds to the
actual JavaScript code, by using the attention layer in our proposed model.
The attention scores point to significant code segments that can lead us to
the signature of a malicious script.
– We evaluate our proposed approach using real-world malicious samples and,
as benign samples, JavaScript files collected from a top domain list. We show
that our graph-based approach can accurately detect malicious JavaScript even
in the presence of obfuscation techniques meant to evade the detection system.
Moreover, our approach detects the obfuscation pattern of the AST graph by
observing the similarity of graph structures and attributes among malicious
or benign samples.
The rest of the paper is organized as follows. Section 2 provides the
background of JavaScript-based attacks and related works. Section 3 explains
our proposed approach, including how we parse JavaScript code to obtain the
AST representation, how we construct the graph from it, and how a graph-based
model extracts the AST features. Section 4 presents our experiments, and
Sect. 5 presents the evaluation results of JStrack. Finally, Sect. 6 provides
our concluding remarks.

2 Background and Related Works

In this section, we explain the background of JavaScript-based attacks and how
attackers use obfuscation techniques to hide their malicious intent. We also give
an overview of the AST feature as an abstract representation of JavaScript and
the derivation of the graph from the characteristics of the AST features.

2.1 JavaScript-Based Attack

According to Web Technology Surveys [16], JavaScript is the most used
client-side programming language, appearing on about 97.4% of websites.
Because of that, malicious JavaScript code is one of the most common web
security threats and is frequently found in buttons, text, images, or pop-up
pages. For instance, if a website does not sanitize angle brackets (< >),
attackers can insert <script></script> tags to inject a payload, because these
tags instruct the browser to execute the JavaScript between them [21]. The
injected script can be triggered either when a single HTTP request delivers
and runs the malicious payload without the payload being stored anywhere on
the website, or when the site saves the payload and later renders it
unsanitized [21].
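As a minimal sketch of the sanitization step mentioned above, the following Python snippet uses the standard-library function `html.escape` to neutralize angle brackets before user input is placed into a page. The helper name `render_comment` and the sample payload are hypothetical illustrations and are not taken from the systems studied in this paper.

```python
import html


def render_comment(user_input: str) -> str:
    """Wrap user input in a paragraph after escaping it (hypothetical helper)."""
    # html.escape replaces <, >, & and quotes with HTML entities, so an
    # injected <script> tag is displayed as text instead of being executed.
    return "<p>" + html.escape(user_input) + "</p>"


payload = "<script>alert(1)</script>"  # illustrative payload
print(render_comment(payload))
# prints: <p>&lt;script&gt;alert(1)&lt;/script&gt;</p>
```

A site that renders `payload` without such escaping would hand the browser a live script tag, which is the situation the paragraph above describes.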
Malicious JavaScript code generally contains function calls that attackers
use to execute their intended action. Examples include document.write(),
eval(), unescape(), SetCookie(), GetCookie(), and new ActiveXObject() [7].
Attackers activate the malicious payload by altering the document object model
(DOM) to drop malware or steal users' sensitive data. Because many malicious
samples contain these functions, we can assume that these parts of the code
carry more important information about the maliciousness of the code. In
practice, however, attackers hide the malicious code by particular means to
take advantage of a security flaw, and such payloads are not easy to detect
because they can bypass the detection system. In addition, attackers use
obfuscation techniques to hide their malicious code, making the signature
harder to find.
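To illustrate why a purely textual search for these function calls is fragile, here is a small sketch in Python. It counts occurrences of three of the names listed above with a regular expression. The list `SUSPICIOUS_CALLS`, the helper `count_suspicious_calls`, and both sample scripts are hypothetical and only serve the illustration; this is not the detection method proposed in this paper.

```python
import re

# A subset of the function names listed in the text (illustrative choice).
SUSPICIOUS_CALLS = ["document.write", "eval", "unescape"]


def count_suspicious_calls(source: str) -> dict:
    """Count textual occurrences of each name directly followed by '('."""
    counts = {}
    for name in SUSPICIOUS_CALLS:
        # The lookbehind avoids matching inside longer identifiers
        # such as 'myeval(' or 'obj.eval('.
        pattern = r"(?<![\w.])" + re.escape(name) + r"\s*\("
        counts[name] = len(re.findall(pattern, source))
    return counts


# Hypothetical samples: the second builds the same names from string pieces.
plain = 'eval(unescape("%61%6c%65%72%74%28%31%29"));'
obfuscated = 'window["ev" + "al"](window["unes" + "cape"]("%61"));'

print(count_suspicious_calls(plain))
print(count_suspicious_calls(obfuscated))
```

The plain sample yields one match each for `eval` and `unescape`, while the obfuscated variant yields no matches at all, which is the kind of evasion that motivates structural features.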

2.2 Related Works

Previous studies have thoroughly explored machine learning-based methods for
detecting malicious JavaScript. They used various features of JavaScript and
applied different approaches to increase performance. Ndichu et al. [12]
applied the FastText model to detect malicious JavaScript based on AST
features. They deobfuscated the source code before modeling so that identical
underlying malicious payloads could be caught. However, their approach handles
only short-range relationships between syntactic units in the AST and does not
consider the edge connections. Similarly, Fass et al. [6] proposed a syntactic
analysis approach, a low-overhead solution that combines sequences extracted
from the AST with a random forest classifier. In contrast, Rozi et al. [14]
used bytecode sequences, an intermediate language between machine code and
high-level code, as the main feature of JavaScript code. Because bytecode
sequences are very long, they used a deep pyramid convolutional neural network
(DPCNN), whose pyramid-shaped network yields a simpler representation. The
limitation is that all possible DOM objects have to be declared in every
sample to generate the sequences.
Moreover, Song et al. [15] and Fang et al. [5] used recurrent neural network
(RNN) architectures to capture the semantic meaning of JavaScript. Song
et al. [15] used the program dependency graph (PDG), the AST, and the control
flow graph (CFG), which preserve the semantic information of JavaScript. Fang
et al. [5], however, relied only on AST features to capture patterns in
sequences of syntactic units. They applied bidirectional long short-term
memory (BiLSTM) and long short-term memory (LSTM) networks, respectively, to
learn long-term dependencies.

3 Proposed Approach
To overcome the challenges posed by malicious JavaScript, we propose a
detection system that predicts whether a given source code is malicious or
benign. Our approach uses the AST as the feature of JavaScript that defines
the style and semantic meaning of the source code. By analyzing this feature,
we can capture malicious intent based on the typical structure and attributes
of the AST graph. We use a GCN to learn from the graphs and obtain a
generalization of malicious and benign samples.

3.1 Overview

Figure 1 shows the entire detection system framework. It begins with a
JavaScript file whose malicious intent we want to predict. We parse the file
with a parser to get the AST representation, which describes how programmers
write the code. The output is a JSON file in which each record is a syntactic
unit object following the ESTree specification [4]. From the JSON file, we
construct graph objects as a simplification of its data structure. The graph
generator creates nodes from the syntactic unit types and edges from the
hierarchical connections among nodes. Next, we create two matrices, the
feature matrix X and the adjacency matrix A, which represent the feature
values of each node and all edge connections, respectively. The GCN is similar
to the convolutional neural network (CNN) in that it consists of two main
layer types, convolutional and pooling layers. The difference is that the GCN
applies these layers to a graph structure to obtain a suitable vector
representation of the graph. The output is a prediction score that determines
the label of the JavaScript file.

Fig. 1. Overview of the proposed approach. (a) The original architecture
consists of three layers of convolutional and pooling layers. (b) The
combination of GCN and attention mechanism used to locate the suspicious code
in JavaScript. To retain the information of all nodes, we put the pooling
layer after the attention layer and before the fully-connected layer.

3.2 AST Graph Construction

Many systems around us use graph representations to solve problems. A graph
representation can give a complex system more structure so that the problem
becomes easier to solve. A graph is a ubiquitous data structure and universal
language consisting of a collection of objects together with a set of
interactions between pairs of objects [8].
Formally, we define a graph $G(V, E)$ by a set of nodes $v \in V$ and a set of
edges $e \in E$, where $(u, v)$ denotes an edge going from node $u \in V$ to
node $v \in V$ [8]. We can represent a finite graph $G$ by a square matrix
called the adjacency matrix $A \in \mathbb{R}^{|V| \times |V|}$, in which each
row and each column corresponds to a node of $G$. Edges are represented as
entries in $A$, where $A[u, v] = 1$ if $(u, v) \in E$ and $A[u, v] = 0$
otherwise. The matrix $A$ is not necessarily symmetric if $G$ has directed
edges. Some graphs also have weighted edges, in which case the entries of the
adjacency matrix are real values. In addition, a graph may have attribute or
feature information for each node, represented by a real-valued matrix
$X \in \mathbb{R}^{|V| \times m}$, where $m$ is the feature size of the nodes
and the ordering of the nodes is consistent with the adjacency matrix $A$. In
some cases, edges also have real-valued features in addition to discrete edge
types.
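The definitions above can be made concrete with a short Python sketch that builds the adjacency matrix of a four-node graph from an edge list and checks the symmetry property. The helper names `adjacency_matrix` and `is_symmetric`, the edge list, and the feature values in `X` are all hypothetical and chosen only for illustration.

```python
def adjacency_matrix(num_nodes, edges, directed=True):
    """Build A with A[u][v] = 1 if (u, v) is an edge and 0 otherwise."""
    A = [[0] * num_nodes for _ in range(num_nodes)]
    for u, v in edges:
        A[u][v] = 1
        if not directed:
            A[v][u] = 1  # an undirected edge is stored in both directions
    return A


def is_symmetric(A):
    """Return True if A equals its transpose."""
    n = len(A)
    return all(A[i][j] == A[j][i] for i in range(n) for j in range(n))


edges = [(0, 1), (0, 2), (2, 3)]  # illustrative tree-shaped edge list
A_directed = adjacency_matrix(4, edges, directed=True)
A_undirected = adjacency_matrix(4, edges, directed=False)

# Feature matrix X with m = 2; row i describes node i, in the same node
# order as the rows of A (the values are arbitrary).
X = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]

print(is_symmetric(A_directed), is_symmetric(A_undirected))
# prints: False True
```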
We can use a graph-based approach to represent the AST feature as a
tree-structured graph. The AST is a top-down parsing structure in which each
syntactic unit has at least one hierarchical connection and the root is always
of type 'program'. Accordingly, we consider each syntactic unit as a node and
each hierarchical link as an edge. The graph representation puts the AST
feature into a fixed form that helps the feature extraction process. It also
lets us capture the big picture of the source code, which reveals both its
complexity and the programmer's obfuscation style.
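The construction described above can be sketched in Python as a recursive traversal of an ESTree-style AST. The dictionary `ast` below is a hand-written, simplified AST for the statement `eval(x)` and stands in for real parser output. The function `ast_to_graph` and the one-hot encoding of node types are assumptions made for this illustration, since the exact node features used by JStrack are not specified here.

```python
# Hand-written, simplified ESTree-style AST for the statement: eval(x)
ast = {
    "type": "Program",
    "body": [{
        "type": "ExpressionStatement",
        "expression": {
            "type": "CallExpression",
            "callee": {"type": "Identifier", "name": "eval"},
            "arguments": [{"type": "Identifier", "name": "x"}],
        },
    }],
}


def ast_to_graph(root):
    """Return (node_types, edges) with edges as (parent_index, child_index)."""
    node_types, edges = [], []

    def visit(node, parent):
        index = len(node_types)
        node_types.append(node["type"])  # each syntactic unit becomes a node
        if parent is not None:
            edges.append((parent, index))  # hierarchical link becomes an edge
        for value in node.values():
            children = value if isinstance(value, list) else [value]
            for child in children:
                if isinstance(child, dict) and "type" in child:
                    visit(child, index)

    visit(root, None)
    return node_types, edges


node_types, edges = ast_to_graph(ast)

# Assumed node features: a one-hot vector over the node types seen so far.
vocab = sorted(set(node_types))
X = [[1 if t == v else 0 for v in vocab] for t in node_types]

print(node_types)
print(edges)
```

For this sample the traversal yields five nodes and four edges, so the graph is a tree rooted at the `Program` node, as the text describes.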

3.3 Learning AST Graph Feature

Suppose we have $\mathcal{G} = \{G_1, G_2, G_3, \ldots, G_N\}$, the set of all
graphs in our dataset. Each graph $G_i(V_i, E_i)$ consists of nodes $V_i$ and
edges $E_i$. The target of our model is $t \in \{0, 1\}$, where 0 denotes
benign and 1 denotes malicious.
Graph Convolutional Neural Networks. The basic idea of the GCN comes from
convolutional neural networks (CNNs): it also uses convolution and pooling
functions to obtain feature information for each node in the graph.
Originally, Kipf et al. proposed the GCN for semi-supervised classification, a
task previously addressed with graph Laplacian regularization methods,
including label propagation [23], manifold regularization [1], and deep
semi-supervised embedding [20]. The basic idea is to generate node embeddings
via neural message passing, which aggregates information from each node's
neighborhood. A GCN consists of a stack of graph convolution layers, each
followed by a point-wise non-linearity. The number of layers is the farthest
distance that node features can travel, and it also influences the
performance. More layers do not guarantee a better result, because the
aggregation becomes less meaningful the further it reaches.
The multi-layer network in GCN follows the layer-wise propagation rule:

$$H^{(l+1)} = \sigma\left(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} H^{(l)} W^{(l)}\right). \tag{1}$$

Here $\tilde{A} = A + I_N$ is the adjacency matrix of the undirected graph $G$
with added self-connections, $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$, and
$W^{(l)}$ is the trainable weight matrix of layer $l$.
$H^{(l)} \in \mathbb{R}^{N \times m}$ is the matrix of activations in the
$l$th layer, where $m$ is the feature size of the nodes and $H^{(0)} = X$.
$\sigma(\cdot)$ stands for an activation function, such as
$\mathrm{ReLU}(\cdot) = \max(0, \cdot)$.
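As a worked instance of the propagation rule in Eq. (1), the following pure-Python sketch computes one GCN layer for a two-node graph. The helper names `matmul` and `gcn_layer` are hypothetical, and the identity weight matrix and one-hot features are chosen only so that the result can be checked by hand; in a trained model the weights are learned.

```python
import math


def matmul(P, Q):
    """Multiply two matrices given as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)]
            for row in P]


def gcn_layer(A, H, W):
    """One propagation step: ReLU(D~^(-1/2) (A + I) D~^(-1/2) H W)."""
    n = len(A)
    # Add self-connections: A~ = A + I.
    A_tilde = [[A[i][j] + (1 if i == j else 0) for j in range(n)]
               for i in range(n)]
    # Degree of each node in A~ (always at least 1 because of the self-loop).
    deg = [sum(row) for row in A_tilde]
    # Symmetric normalization: entry (i, j) is divided by sqrt(d_i * d_j).
    A_hat = [[A_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
             for i in range(n)]
    Z = matmul(matmul(A_hat, H), W)
    return [[max(0.0, z) for z in row] for row in Z]  # ReLU


A = [[0, 1], [1, 0]]          # two nodes joined by one undirected edge
X = [[1.0, 0.0], [0.0, 1.0]]  # H^(0) = X, one-hot features (illustrative)
W = [[1.0, 0.0], [0.0, 1.0]]  # identity weights for readability only

H1 = gcn_layer(A, X, W)
print(H1)
# prints: [[0.5, 0.5], [0.5, 0.5]]
```

Each node ends up with the average of its own features and its neighbor's features, which is the neighborhood aggregation that the text describes.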
Attention Mechanism. This mechanism pays more focus to the components that
significantly influence the system. Precisely, an attention function maps a
query and a set of key-value pairs to an output, where the query, keys,
values, and output are all vectors [17]. The attention function is computed as
follows:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right) V \tag{2}$$

where $Q$, $K$, and $V$ are the query, key, and value matrices, respectively,
and $d_k$ is the dimension of the keys.
In this work, the attention mechanism supports the learning process of the
GCN by assigning attention weights that concentrate selectively on discrete
aspects of the output of the graph convolutional layers. We use a
self-attention layer because it handles long-range dependencies and has lower
complexity than other layer types (e.g., convolutional or recurrent).
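The scaled dot-product attention of Eq. (2) can be sketched in pure Python as follows. In this illustration the node embeddings `H` are used directly as queries, keys, and values, which is a simplifying assumption: a practical self-attention layer usually applies learned projections first, and the exact configuration used in JStrack is not specified here. The function names are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax of a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """Return (output, weights) of scaled dot-product attention."""
    d_k = len(K[0])
    # scores[i][j] = (q_i . k_j) / sqrt(d_k)
    scores = [[sum(q * k for q, k in zip(q_row, k_row)) / math.sqrt(d_k)
               for k_row in K] for q_row in Q]
    weights = [softmax(row) for row in scores]
    # Each output row is the weighted sum of the value rows.
    output = [[sum(w * v_row[j] for w, v_row in zip(w_row, V))
               for j in range(len(V[0]))] for w_row in weights]
    return output, weights


# Three node embeddings of size 2 (illustrative values).
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
output, weights = attention(H, H, H)

for i, row in enumerate(weights):
    print(i, [round(w, 3) for w in row])
```

Each row of `weights` sums to one and can be read as a score per node, which is the quantity one would inspect to see which nodes receive the most attention.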

4 Experiments
In this section, we present the experiments that evaluate our proposed
approach for detecting malicious JavaScript samples. We evaluated the
performance of our framework while adjusting the maximum number of nodes in
each graph. Then, we compared our results with related works that address a
similar task. Finally, we discuss the results and identify the limitations of
our approach.

4.1 Setup

Dataset. We collected malicious and benign JavaScript datasets. Because
real-world malicious samples are difficult to obtain, the malicious samples
come from two different sources: we mixed the datasets of Rozi et al. [14] and
Ndichu et al. [12], whose files carry time stamps from 2015 to 2017. We also
confirmed with the VirusTotal scanner [18] that all of these samples are
dangerous scripts. Meanwhile, we collected benign JavaScript code by scraping
the top domain list on the Majestic website [10] and combined it with the
benign dataset from SRILAB [13]. We consider all JavaScript code inside
popular websites to be safe code without any attacking intent.
We split our dataset into two parts, training and testing. We used the
training dataset to train our graph learning model and the testing dataset to
evaluate it; the proportion between training and testing data is 80% and 20%,
respectively. We also conducted 10-fold cross-validation to see the average
performance of our model and how it generalizes to an independent dataset.
Table 1 summarizes the number of JavaScript files used in our experiments.
Hyper-parameters and Setup. We set the hyper-parameters that control the
learning process as follows. We used the Adam optimization algorithm with a
learning rate of 0.01 and a batch size of 32. The feature size of each
convolutional layer in the GCN is 32, with the rectified linear unit (ReLU) as
the activation function. For the pooling layer, we used a ratio of 50% to
downsample the node matrix.

Table 1. Description of the dataset used for the training and testing processes.

Label       Dataset
            Training    Testing    Total
Benign        97,361     24,341    121,702
Malicious     31,560      7,890     39,450
Total        128,921     32,231    161,152

Unlike in the usual deep learning model, adding more layers does not
necessarily improve the performance. A GNN significantly loses its ability to
learn if it has too many layers, a problem called over-smoothing [22]. The
main idea of over-smoothing is that, after too many rounds of message passing
caused by too many layers, all node representations look identical and
uninformative. Zhou et al. [22] recommended using between 2 and 4 layers to
achieve an optimal solution. Therefore, we used the middle of that range,
three layers, in our experiments.
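The over-smoothing effect can be observed in a toy setting with a few lines of Python. The sketch below repeatedly applies the normalized aggregation step to a single scalar feature on a four-node cycle and prints how far apart the node values remain. It deliberately leaves out the weight matrices and non-linearities, so it is a simplified illustration under stated assumptions rather than a simulation of the trained model; all names and the graph are hypothetical.

```python
def normalized_adjacency(A):
    """Return D~^(-1/2) (A + I) D~^(-1/2) for an adjacency matrix A."""
    n = len(A)
    A_tilde = [[A[i][j] + (1 if i == j else 0) for j in range(n)]
               for i in range(n)]
    deg = [sum(row) for row in A_tilde]
    return [[A_tilde[i][j] / (deg[i] * deg[j]) ** 0.5 for j in range(n)]
            for i in range(n)]


def propagate(A_hat, h, rounds):
    """Apply the aggregation step `rounds` times to one scalar feature per node."""
    for _ in range(rounds):
        h = [sum(a * x for a, x in zip(row, h)) for row in A_hat]
    return h


def spread(h):
    """Difference between the largest and smallest node value."""
    return max(h) - min(h)


# Four-node cycle 0-1-2-3-0 (illustrative graph).
A = [[0, 1, 0, 1],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [1, 0, 1, 0]]
A_hat = normalized_adjacency(A)
h0 = [1.0, 0.0, 0.0, 0.0]  # only node 0 carries a signal at the start

for rounds in (1, 3, 10):
    print(rounds, spread(propagate(A_hat, h0, rounds)))
```

The printed spread shrinks quickly as the number of rounds grows, so after many rounds the nodes become almost indistinguishable, which matches the description of over-smoothing above.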
Moreover, we applied a data loader in disjoint mode to create mini-batches of
data for graph learning. It represents a batch of graphs by their disjoint
union, which gives us one big graph [11]. Figure 2 illustrates how the
disjoint loader works.

Fig. 2. The disjoint loader loads the dataset during graph learning by
representing a batch of graphs via their disjoint union. It uses zero-based
indices to keep track of the different graphs.
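The idea of the disjoint union can be sketched in pure Python. The function `disjoint_batch` below is a hypothetical helper written for this illustration and is not the API of the library referenced in [11]. It stacks the adjacency matrices of a batch into one block-diagonal matrix, concatenates the node features, and records a zero-based graph index for every node.

```python
def disjoint_batch(graphs):
    """Merge (A, X) pairs into one graph: (A_batch, X_batch, graph_index)."""
    total = sum(len(A) for A, _ in graphs)
    A_batch = [[0] * total for _ in range(total)]
    X_batch, graph_index = [], []
    offset = 0
    for g, (A, X) in enumerate(graphs):
        n = len(A)
        # Copy A into its own diagonal block, so no edges cross graphs.
        for i in range(n):
            for j in range(n):
                A_batch[offset + i][offset + j] = A[i][j]
        X_batch.extend(X)
        graph_index.extend([g] * n)  # zero-based id of the source graph
        offset += n
    return A_batch, X_batch, graph_index


# Two small illustrative graphs with one feature per node.
g1 = ([[0, 1], [1, 0]], [[1.0], [2.0]])
g2 = ([[0, 1, 0], [1, 0, 1], [0, 1, 0]], [[3.0], [4.0], [5.0]])

A_batch, X_batch, graph_index = disjoint_batch([g1, g2])
print(graph_index)
# prints: [0, 0, 1, 1, 1]
```

A pooling layer can then use `graph_index` to aggregate node features per graph, so that each graph in the batch receives its own prediction.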

5 Evaluation and Discussion

Because of memory capacity limits, we could not include all nodes in the
learning process. We therefore evaluated six different maximum numbers of
nodes for the AST graph: 50, 100, 200, 500, 1000, and 2000. This experiment
aims to find the number of nodes sufficient to detect the maliciousness of
JavaScript. Table 2 shows the performance (accuracy, precision, recall, F1
score, and AUC) for each maximum-node setting. The performance of our method
generally increases in line with the number of nodes of the AST graph that we
capture. This result accords with our hypothesis that AST nodes give an
abstraction of the source code in which all nodes carry essential information.
Nevertheless, using 2000 nodes still gives high performance even though we did
not include all information, because the AST has a hierarchical structure in
which each node summarizes its successors.

Table 2. Overall performance of our detection system using the graph-based
approach in terms of accuracy, precision, recall, F1 score, and AUC.

Max nodes   Accuracy   Precision   Recall   F1 score   AUC
50          0.9864     0.9872      0.9878   0.9875     0.9878
100         0.9877     0.9881      0.9901   0.9891     0.9901
200         0.9906     0.9929      0.9937   0.9933     0.9937
500         0.9933     0.9940      0.9956   0.9948     0.9956
1000        0.9941     0.9953      0.9965   0.9959     0.9965
2000        0.9940     0.9956      0.9971   0.9963     0.9971
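As a quick consistency check on Table 2, the F1 score is the harmonic mean of precision and recall. The short Python sketch below recomputes it for the first two rows; the function name `f1_score` and the dictionary `rows` are introduced here only for illustration.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from Table 2, keyed by maximum node count.
rows = {50: (0.9872, 0.9878), 100: (0.9881, 0.9901)}

for max_nodes, (p, r) in rows.items():
    print(max_nodes, round(f1_score(p, r), 4))
# prints:
# 50 0.9875
# 100 0.9891
```

Both values agree with the F1 column reported in the table.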

Table 3 compares previous works with our proposed method. On our dataset, the
GCN reaches an F1 score of around 98% with a maximum of 50 nodes of the AST
graph, and adding attention layers before the fully connected layers improves
the F1 score to about 99%. Our approaches outperform the previous work that
uses the FastText model based on frequency analysis of syntactic AST units.
Even though the difference is relatively small, our proposed method can also
indicate which parts of the source code receive more attention when detecting
malicious intent. This information is valuable for further analysis of
malicious code. Figure 4 shows one of the malicious samples in our dataset
with the attention score of each node in its graph. Moreover, the bytecode
sequence feature cannot be extracted from every JavaScript sample because all
possible DOM objects have to be declared.
Moreover, we found in our experiments that malicious JavaScript has its own
obfuscation techniques to hide the actual source code. Figure 3(a) shows the
graph visualization of malicious JavaScript code. The AST graph of malicious
JavaScript contains many repetitions of a subgraph, which we rarely find in
benign samples. Similar styles appear many times within the same time range,
indicating that attackers consistently use their own obfuscation functions,
which normal programmers do not use. On the other hand, most benign samples,
as in Fig. 3(b), have an arbitrary AST structure and inconsistent subgraph
patterns. This result is in line with our hypothesis that benign JavaScript
mostly does not use obfuscation techniques or, if it has obfuscated parts,
uses more complicated methods to protect against reverse engineering.

Table 3. Performance comparison with closely related works.

Model                              Feature             F1
DPCNN [14]                         Bytecode sequence   0.9684
DPCNN+LSTM [14]                    Bytecode sequence   0.9657
DPCNN+BiLSTM [14]                  Bytecode sequence   0.9683
LSTM [12]                          AST                 0.9234
FastText [12]                      AST                 0.9873
GCN (3 layers; max 50 nodes)       AST                 0.9875
GCN (w/ attention; max 50 nodes)   AST                 0.9935

Fig. 3. Samples of AST graphs constructed from a benign (a) and a malicious
(b) JavaScript file.

Fig. 4. (a) A malicious sample in which the highlighted parts are the vital
parts for executing the code. (b) The AST representation of the malicious
code, in which the color of each node represents its attention score. Some
nodes have high scores that correlate with the vital parts of the malicious
code.

However, our proposed method has two limitations that we are considering.
First, we lose detailed information about the malicious code because we use
the AST feature to represent JavaScript. In the AST graph, we use only the
syntactic units and omit the component details of each unit, which may contain
essential information for our detection system. Second, deep and machine
learning models do not always account for uncertainty in the prediction task.
They rely on statistical assumptions about the distribution of the dataset
used to train the model. Consequently, adversarial attacks can exploit the
machine learning model to disrupt the analysis process and cause false
detections.

6 Conclusions and Future Works

In this paper, we proposed an alternative approach to detecting malicious
JavaScript based on the analysis of its AST representation. The syntactic
structure of JavaScript gives comprehensive information about the semantic
meaning of the source code, which lets us capture a generalization of
malicious signatures and cope with future attacks. The GCN successfully
encodes the whole AST graph via neural message passing over local
neighborhoods, which leads to high detection performance. Additionally, the
attention layers help us locate suspicious parts of the malicious samples and
contribute significantly to the detection system. In future work, we plan to
extend our research to detect malicious websites based on encoded JavaScript
information. We will also explore other JavaScript features that may increase
the performance.

Acknowledgements. This research was partially supported by the Ministry of
Education, Science, Sports, and Culture, Grant-in-Aid for Scientific Research
(B) 21H03444.

References
1. Belkin, M., Niyogi, P., Sindhwani, V.: Manifold regularization: a geometric
framework for learning from labeled and unlabeled examples. J. Mach. Learn.
Res. 7, 2399–2434 (2006)
2. Cova, M., Kruegel, C., Vigna, G.: Detection and analysis of
drive-by-download attacks and malicious JavaScript code. In: Proceedings of
the 19th International Conference on World Wide Web, WWW 2010, pp. 281–290.
Association for Computing Machinery, New York (2010).
https://doi.org/10.1145/1772690.1772720
3. Douligeris, C., Mitrokotsa, A.: DDoS attacks and defense mechanisms:
classification and state-of-the-art. Comput. Netw. 44(5), 643–666 (2004)
4. The estree spec. https://github.com/estree/estree. Accessed 20 Jan 2021
5. Fang, Y., Huang, C., Liu, L., Xue, M.: Research on malicious JavaScript
detection technology based on LSTM. IEEE Access 6, 59118–59125 (2018)
6. Fass, A., Krawczyk, R.P., Backes, M., Stock, B.: JaSt: fully syntactic
detection of malicious (obfuscated) JavaScript. In: Giuffrida, C., Bardin, S.,
Blanc, G. (eds.) DIMVA 2018. LNCS, vol. 10885, pp. 303–325. Springer, Cham
(2018). https://doi.org/10.1007/978-3-319-93411-2_14
7. Gupta, S., Gupta, B.: Enhanced XSS defensive framework for web applications
deployed in the virtual machines of cloud computing environment. Procedia
Technol. 24, 1595–1602 (2016). https://doi.org/10.1016/j.protcy.2016.05.152.
https://www.sciencedirect.com/science/article/pii/S2212017316302419.
International Conference on Emerging Trends in Engineering, Science and
Technology (ICETEST - 2015)

8. Hamilton, W.L.: Graph representation learning. In: Synthesis Lectures on
Artificial Intelligence and Machine Learning, vol. 14, no. 3, pp. 1–159 (2020)
9. Kamkar, S.: phpwn: attacking sessions and pseudo-random numbers in PHP. In:
Blackhat (2010)
10. Majestic. https://majestic.com/. Accessed 26 Jan 2021
11. Data modes. https://graphneural.network/data-modes/. Accessed 17 Apr 2021
12. Ndichu, S., Kim, S., Ozawa, S.: Deobfuscation, unpacking, and decoding of
obfuscated malicious JavaScript for machine learning models detection
performance improvement. CAAI Trans. Intell. Technol. 5, 184–192 (2020)
13. Raychev, V., Bielik, P., Vechev, M., Krause, A.: Learning programs from
noisy data. SIGPLAN Not. 51(1), 761–774 (2016)
14. Rozi, M.F., Kim, S., Ozawa, S.: Deep neural networks for malicious
JavaScript detection using bytecode sequences. In: 2020 International Joint
Conference on Neural Networks (IJCNN), pp. 1–8 (2020)
15. Song, X., Chen, C., Cui, B., Fu, J.: Malicious JavaScript detection based
on bidirectional LSTM model. Appl. Sci. 10(10), 3440 (2020).
https://doi.org/10.3390/app10103440. https://www.mdpi.com/2076-3417/10/10/3440
16. Usage statistics of JavaScript as client-side programming language on
websites. https://w3techs.com/technologies/details/cp-javascript. Accessed
08 May 2021
17. Vaswani, A., et al.: Attention is all you need. In: Proceedings of the
31st International Conference on Neural Information Processing Systems, NIPS
2017, pp. 6000–6010. Curran Associates Inc., Red Hook (2017)
18. Virustotal. https://www.virustotal.com/gui/. Accessed 15 Jan 2021
19. Wassermann, G., Su, Z.: Static detection of cross-site scripting
vulnerabilities. In: 2008 ACM/IEEE 30th International Conference on Software
Engineering, pp. 171–180 (2008). https://doi.org/10.1145/1368088.1368112
20. Weston, J., Ratle, F., Collobert, R.: Deep learning via semi-supervised
embedding. In: Proceedings of the 25th International Conference on Machine
Learning, ICML 2008, pp. 1168–1175. Association for Computing Machinery, New
York (2008). https://doi.org/10.1145/1390156.1390303
21. Yaworski, P.: Real-world bug hunting: a field guide to web hacking 14(3)
(2019)
22. Zhou, K., et al.: Understanding and resolving performance degradation in
graph convolutional networks. arXiv e-prints arXiv:2006.07107, June 2020
23. Zhu, X., Ghahramani, Z., Lafferty, J.D.: Semi-supervised learning using
gaussian fields and harmonic functions. In: Fawcett, T., Mishra, N. (eds.)
Proceedings of the Twentieth International Conference on Machine Learning
(ICML 2003), Washington, DC, USA, 21–24 August 2003, pp. 912–919. AAAI Press
(2003). http://www.aaai.org/Library/ICML/2003/icml03-118.php


