Deep Learning and Visualization for Identifying Malware Families
Guosong Sun and Quan Qian
Abstract—The growing threat of malware is becoming more and more difficult to ignore. In this paper, a malware feature-image
generation method is used to combine the static analysis of malicious code with recurrent neural networks (RNN) and convolutional
neural networks (CNN). By using an RNN, our method considers not only the original information of the malware but also the
association between the original code and its timing characteristics; furthermore, the process reduces the dependence on the category
labels of malware. We then use minhash to generate feature images from the fusion of the original codes and the predictive codes
produced by the RNN. Finally, we train a CNN to classify the feature images. When we trained with very few samples (the ratio of the
training dataset to the validation dataset was 1:30), we obtained accuracy over 92 percent. When we adjusted the proportion to 3:1, the
accuracy exceeded 99.5 percent. As shown in the confusion matrices, our method obtains good results: the worst false positive rate
among all the malware families is 0.0147, and the average false positive rate is 0.0058.
Index Terms—Malware family identification, malware feature image, recurrent neural network, convolutional neural network
1 INTRODUCTION
Fig. 3. Illustration of (a) LSTM and (b) GRU. (a) i, f and o are the input, forget and output gates, respectively; c and c̃ denote the memory cell and the new memory cell content. (b) r and z are the reset and update gates, and h and h̃ are the activation and the candidate activation [44].
Fig. 4. Using sliding windows to train the RNN and generate predictive codes. Each time, we use the opcodes in the rectangle but not in the circle to predict the opcode in the circle. Finally, we obtain a predictive code sequence.
analysis. Yuan et al. [12] proposed an ML-based method and were the first to use neural networks; however, they only used traditional neural networks, and the only role of the neural network was to replace traditional classifiers. Pascanu et al. [13] and Tobiyama et al. [14] used the more complex RNN, but they extracted information from the hidden layers, which may be too abstract to interpret. Furthermore, the way in which they used the RNN may lose some useful information. When generating feature images, their method [14] effectively stretches and transforms large images to bring the information into the same dimensions; for large images, only limited useful information is kept in this simple way.
For these reasons, we use locality-sensitive hashing, which solves the problem of feature images not having the same size. It can also extract locality-sensitive information from opcodes, so that identical local information appears as identical visual information in the feature images. Compared with a method that uses only a single source of information, adding the information processed by the RNN improves the classification effect.
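To make this locality-sensitive hashing step concrete, the following Python sketch computes a minhash signature [16], [45] over opcode n-gram shingles. It is only an illustration of the idea: the shingle length, the number of hash functions, and the helper names are our own assumptions, not the exact procedure used later in the paper.

    import hashlib

    def opcode_shingles(opcode_ids, n=4):
        # Build the set of n-gram shingles from an opcode id sequence (n=4 is an assumed value).
        return {tuple(opcode_ids[i:i + n]) for i in range(len(opcode_ids) - n + 1)}

    def minhash_signature(shingles, num_hashes=128):
        # One seeded hash function per signature position; keep the minimum hash
        # value over all shingles, so similar opcode sets get similar signatures.
        signature = []
        for seed in range(num_hashes):
            signature.append(min(
                int(hashlib.md5(repr((seed, s)).encode()).hexdigest(), 16)
                for s in shingles))
        return signature

Two opcode sequences that share many local shingles agree on many signature positions, which is what lets the same local information show up as the same visual information in the feature image.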
Finally, we send the feature images into a CNN for training and classification. The CNN can discover the local features of code blocks; through the trained shallow convolution kernels, it is possible to analyze the local features of different malware and infer which features reflect the maliciousness of the code. Using a CNN to extract the local features of images is, so far, a good choice for feature images generated by different visualization methods.
3.2 Extracting opcodes
Using a disassembler, we can obtain the disassembly code. For simplicity, we only consider opcodes. There are many opcodes, so we only keep the 255 types that are used most frequently, and the rest are classified as the 256th type. The experimental data is obtained from the 2015 Kaggle Microsoft Malware Classification Challenge: Classify malware into families based on file content and characteristics [43]. For instance, among the data we have collected, there are 735 types of opcode; the 255 most frequent types account for 99.98 percent of the total, and the remaining 480 types account for only 0.02 percent.
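As a concrete illustration of this filtering step, the sketch below keeps the 255 most frequent opcodes and maps every other opcode to a single "other" id. The function and variable names are our own, purely for illustration.

    from collections import Counter

    def build_opcode_vocab(opcode_sequences, keep=255):
        # Map the `keep` most frequent opcodes to ids 0..keep-1;
        # everything else shares the last id (the 256th type).
        counts = Counter(op for seq in opcode_sequences for op in seq)
        vocab = {op: idx for idx, (op, _) in enumerate(counts.most_common(keep))}
        other_id = keep  # id 255, i.e., the 256th type
        return vocab, other_id

    def encode_sequence(opcodes, vocab, other_id):
        # Replace each mnemonic with its integer id, folding rare opcodes together.
        return [vocab.get(op, other_id) for op in opcodes]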
3.3 Training RNN
After the malicious code is processed, it contains only 256 kinds of opcodes. We cannot put the opcodes directly into a neural network, so we use an integer from 0 to 255 to represent each opcode, which can therefore take one of 256 (= 2^8) possible values.

We convert opcodes into 1-hot vectors. For 256 types of opcode, the encoding length is 256. The 1-hot vector increases the differences between pieces of opcode in the RNN. Each 1-hot encoding contains only one 1, and the other positions in the vector are 0. The integer number of the opcode determines the location of the 1 in the 1-hot vector. For example, the 1-hot code of operation code 56 is as follows:

    [\underbrace{0, 0, \ldots, 0}_{55},\; 1,\; \underbrace{0, 0, \ldots, 0}_{200}]    (1)

We construct a bidirectional RNN with one input layer, three hidden layers (each with 386 GRUs) and one softmax output layer. Considering the problems of gradient vanishing and gradient explosion in RNN training, we do not use LSTM (Fig. 3a) but GRU (Fig. 3b). Compared with LSTM, GRU contains fewer training parameters, and each iteration requires less time and space. Using either LSTM or GRU is better than using the traditional tanh unit; however, it is difficult to definitively conclude which of the two is better [44]. We discuss the structure of the RNN in the experimental part. After weighing computing speed against accuracy, we adopted GRU.

When training the BRNN, as shown in Fig. 4, we set a sliding window with a length of K. The hyperparameter K determines the learning effect of the BRNN and can be neither too small nor too big. If K is too small, the information contained in the feature may not be enough to make a correct prediction; on the other hand, a K that is too large increases the pressure to learn long-term dependencies in the BRNN, which makes training more difficult and less accurate. Unlike traditional RNNs, BRNNs are based on the idea that the prediction depends not only on the previous input but on the whole input sequence. For example, to predict a missing word in a sequence, you should look at both the left and the right contexts. The RNN predicts the Mth opcode in each sliding window. In other words, each time, we use the former (M - 1) opcodes in the sliding window and the latter (K - M) opcodes in the window to predict the Mth opcode. The parameter M determines how much information before and after the target affects the prediction: if the prediction depends on less previous information, we can set a smaller M; on the contrary, we can set a larger M if the prediction requires less future information. In summary, the input and output specification for the RNN is as in Eqs. (2) and (3).
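To make the windowing concrete, here is a minimal numpy sketch that turns an integer opcode sequence into 1-hot vectors and (context, target) training pairs. K and M follow the notation above (the defaults K = 14, M = 9 are the values chosen later in Section 4.3); the code and its names are our own illustration, not the authors' implementation.

    import numpy as np

    def one_hot(opcode_id, vocab_size=256):
        # 1-hot vector of length 256; the opcode id fixes the position of the single 1.
        v = np.zeros(vocab_size, dtype=np.float32)
        v[opcode_id] = 1.0
        return v

    def sliding_windows(opcode_ids, K=14, M=9, vocab_size=256):
        # opcode_ids is a plain list of integer ids. For each window of length K,
        # the (M - 1) opcodes before and the (K - M) opcodes after the Mth position
        # form the input; the Mth opcode is the prediction target.
        inputs, targets = [], []
        for start in range(len(opcode_ids) - K + 1):
            window = opcode_ids[start:start + K]
            context = window[:M - 1] + window[M:]  # K - 1 context opcodes
            inputs.append([one_hot(op, vocab_size) for op in context])
            targets.append(one_hot(window[M - 1], vocab_size))
        return np.array(inputs), np.array(targets)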
Fig. 10. Structure of the CNN designed for training on feature images.
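The exact layer configuration of Fig. 10 is not reproduced in this text, so the Keras sketch below is only a generic placeholder for a CNN that classifies fixed-size feature images into malware families; the input size, filter counts, and kernel sizes are our own assumptions, not the architecture from the figure.

    from keras.models import Sequential
    from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

    def build_feature_image_cnn(input_shape=(64, 64, 1), num_families=6):
        # Placeholder CNN: two conv/pool stages, then a dense classifier
        # over the six families listed in Table 1.
        model = Sequential()
        model.add(Conv2D(32, (3, 3), activation='relu', input_shape=input_shape))
        model.add(MaxPooling2D((2, 2)))
        model.add(Conv2D(64, (3, 3), activation='relu'))
        model.add(MaxPooling2D((2, 2)))
        model.add(Flatten())
        model.add(Dense(128, activation='relu'))
        model.add(Dropout(0.5))
        model.add(Dense(num_families, activation='softmax'))
        model.compile(optimizer='adam', loss='categorical_crossentropy',
                      metrics=['accuracy'])
        return model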
fraction of the total samples so as to evaluate the generalization ability of the proposed method with very few samples. These samples cover all of the malicious families in the test; some families have more samples (10 samples), and some families have only 3 samples. This split raises the difficulty of classification. The detailed distribution of each malware family is described in Table 1. Dataset(300) and dataset(600) are also made up of samples from dataset1.

TABLE 1
Malware Datasets for the Experiment

Dataset        Family 1  Family 2  Family 3  Family 4  Family 5  Family 6
dataset1          299       377       104       196       270       224
dataset2          600       377       194       304       728       460
dataset3          360        69       108        80       183        94
dataset(49)         9        10         3        10         9         8
dataset(300)       50        50        50        50        50        50
dataset(600)      100       100       100       100       100       100

In general, the purposes of dataset1 are as follows:
1) The samples in dataset1 are learned without labels by the BRNN.
2) In the very-few-training-samples test, it serves as the testing data to measure the classification effect on samples that have been learned by the BRNN.
3) It is used as the training data, and then the trained model is used to test dataset3.
Dataset2 has only one function: it is used as the training set, and then the trained model is used to test dataset3.
Dataset3 has two functions:
1) In the very-few-sample training tests, it serves as the test data to measure the classification effect on samples that have not been learned by the BRNN.
2) It is used as the testing data when dataset1 and dataset2 are used for training.
4.2 Implementation
We implemented our models in Python using keras-2.0.8 with tensorflow-1.0.1 as the backend. The hardware configuration of the experiment platform is: Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40 GHz * 2, Tesla P100 16 GB * 2, and 128 GB RAM.

4.3 Training RNN
After trying different network structures, we found that LSTM and GRU perform differently with different numbers of layers. As shown in Table 2, it is best to use a two-layer LSTM structure or a three-layer GRU structure. The GRU structure is faster to compute, so we designed a 3-layer GRU RNN.

TABLE 2
RNNs with Different Structures

We also compared 1-hot encoding with word embedding [46]. Word embedding is indeed a very effective and interesting idea that seems to provide more information than 1-hot encoding, so we ran a comparative test between the two. However, the experimental results show that, for malicious code in particular, word embedding is not superior to 1-hot encoding: the convergence speed is slightly slower, and the accuracy is almost the same. Compared to natural language, the number of distinct opcodes in malicious code is obviously much smaller, and there are also clear differences between how opcodes and natural language are used. In our opinion, the parameters between the 1-hot input layer and the first hidden layer can also be seen as a form of word embedding: opcodes in the same window receive similar updates, these updates accumulate, and opcodes with similar patterns accumulate such similar updates to a considerable degree.

When using the RNN sliding window, we need to consider the values of K and M. As shown in Fig. 11, when the size of the sliding window is fixed (K is fixed), the RNN learns the characteristics of malicious families better when M is near K/2. Therefore, the opcode to be predicted should be set near the center of the sliding window. As for the size of the sliding window (K), a smaller K brings faster training; however, an appropriate increase in the window size fits the data better.

Finally, when training the BRNN, we select dataset1 to learn the malicious code without category labels. The BRNN contains 3 hidden layers, and there are 384 GRUs in each hidden layer. We set the size of the sliding window to K = 14 and M = 9. After all the samples in dataset1 are processed into the BRNN, we obtain 4,276,033 sliding windows in total. Different from traditional RNNs, BRNNs are based on the idea that the prediction depends not only on the previous input but also on the entire input sequence. Intuitively, previous information in the sequence is more important than future information. To keep the balance between previous and future information as much as possible while weighting previous information slightly more, we use the first 8 opcodes and the last 5 opcodes to predict the 9th opcode in each window.
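A minimal Keras sketch of a model in this spirit, reconstructed from the description above (three bidirectional GRU layers of 384 units over K - 1 = 13 one-hot context opcodes, with a 256-way softmax output); the layer arguments and optimizer are our own assumptions rather than the authors' exact code.

    from keras.models import Sequential
    from keras.layers import GRU, Bidirectional, Dense

    def build_brnn(window_len=13, vocab_size=256, units=384):
        # window_len = K - 1 context opcodes, each encoded as a 1-hot vector.
        model = Sequential()
        model.add(Bidirectional(GRU(units, return_sequences=True),
                                input_shape=(window_len, vocab_size)))
        model.add(Bidirectional(GRU(units, return_sequences=True)))
        model.add(Bidirectional(GRU(units)))
        model.add(Dense(vocab_size, activation='softmax'))
        model.compile(optimizer='adam', loss='categorical_crossentropy',
                      metrics=['accuracy'])
        return model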
During training, we increase the batch size to make the BRNN converge to a better result. At the beginning of the training stage, we use a small batch (768) to jump out of local optima. When the accuracy converges to a good result for the current batch size, we increase the batch size gradually (1024, 2048, 3072, 4096) to accelerate the convergence. The process is shown in Fig. 12. Finally, we save the trained model. Initially, we tried using a 4-layer unidirectional RNN for training, but the final accuracy was only 0.6549. After switching to a bidirectional RNN, the final accuracy increased to 0.8697, as shown in Fig. 12.

We also tried training sets of different sizes to train the RNN, as shown in Fig. 13. As the size of the RNN training dataset increases, the classification accuracy also increases. The experiment supports our view that using RNN predictive information can indeed increase classification accuracy compared with not using it. This unsupervised learning technique works because malicious code in the same malware family has similar characteristics that cannot be found in other families. A larger training dataset makes it easier to find the characteristics of the same malicious family. Once the RNN finds common characteristics of a family, it makes the malicious code in that family more similar by generating a predictive sequence. RNN predictive information can therefore reduce the dependence on category labels to some degree.

Fig. 12. BRNN training process. The batch size is increased gradually (768, 1024, 2048, 3072, 4096) to accelerate convergence.
Fig. 13. Training the RNN with training sets of different sizes.
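A sketch of this batch-size schedule, assuming the model and (x_train, y_train) arrays from the earlier sketches; the number of epochs per stage and the file name are placeholders of our own, not values reported in the paper.

    def train_with_growing_batches(model, x_train, y_train,
                                   batch_sizes=(768, 1024, 2048, 3072, 4096),
                                   epochs_per_stage=5):
        # Start with a small batch to escape local optima, then enlarge the batch
        # step by step to speed up convergence, as described for Fig. 12.
        for bs in batch_sizes:
            model.fit(x_train, y_train, batch_size=bs,
                      epochs=epochs_per_stage, shuffle=True)
        model.save('brnn_opcode_model.h5')  # hypothetical file name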
TABLE 3
Confusion Matrix of Training Dataset(49) to Predict Dataset3
TABLE 4
Confusion Matrix of Training Dataset1 to Predict Dataset3
TABLE 5
Confusion Matrix of Training Dataset2 to Predict Dataset3
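The per-family false positive rates quoted in the abstract can be computed from confusion matrices such as those in Tables 3, 4, and 5. A small numpy sketch of that computation (written by us as an illustration; the matrix values themselves are not reproduced here):

    import numpy as np

    def per_family_false_positive_rates(confusion):
        # confusion[i, j] = number of samples of true family i predicted as family j.
        # For family k: FPR_k = FP_k / (FP_k + TN_k).
        confusion = np.asarray(confusion, dtype=np.float64)
        total = confusion.sum()
        rates = []
        for k in range(confusion.shape[0]):
            fp = confusion[:, k].sum() - confusion[k, k]  # other families predicted as k
            tn = total - confusion[k, :].sum() - confusion[:, k].sum() + confusion[k, k]
            rates.append(fp / (fp + tn))
        return rates

The worst and average values over all families then give the 0.0147 and 0.0058 figures reported in the abstract.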
Fig. 17. The performance of different methods under various scales of training sets.

5 CONCLUSION
In this paper, we propose the RMVC method, which analyzes malicious code statically through visual images by applying two deep learning techniques, RNN and CNN. The RNN is applied to associate the current data with similar information and to improve the anti-interference ability of the analysis. The CNN is applied to classify the feature images. This method provides both high accuracy and good generalization. Because of the RNN, RMVC learns more about malware classification without being given category labels, and using the RNN improves the results in all experiments. Even if a small training dataset is used, the accuracy of this method can still exceed 92 percent, which improves on the accuracy of the traditional method by more than 10 percent. If the training dataset size is increased, the accuracy exceeds 99.5 percent.

ACKNOWLEDGMENTS
This work is partially sponsored by the National Key Research and Development Program of China (2018YFB0704400, 2016YFB0700504), the Shanghai Municipal Science and Technology Commission (15DZ2260301), and the Natural Science Foundation of Shanghai (16ZR1411200). The authors gratefully appreciate the anonymous reviewers for their valuable comments.
REFERENCES
[1] Symantec, "2016 internet security threat report highlights," 2016. [Online]. Available: https://www.symantec.com/security-center/threat-report
[2] B. Brenner, "Wannacry: The ransomware worm that didn't arrive on a phishing hook," May 18, 2017. [Online]. Available: https://nakedsecurity.sophos.com/2017/05/17/wannacry-the-ransomware-worm-that-didnt-arrive-on-a-phishing-hook/
[3] BBC News, "Cyber-attack: Europol says it was unprecedented in scale," May 13, 2017. [Online]. Available: http://www.bbc.com/news/world-europe-39907965
[4] A. Liptak, "The wannacry ransomware attack has spread to 150 countries," May 14, 2017. [Online]. Available: https://www.theverge.com/2017/5/14/15637888/authorities-wannacry-ransomware-attack-spread-150-countries
[5] E. Gandotra, D. Bansal, and S. Sofat, "Malware analysis and classification: A survey," J. Inf. Secur., vol. 5, no. 3, p. 56, 2014.
[6] A. Moser, E. Kirda, and C. Kruegel, "Limits of static analysis for malware detection," in Proc. Annu. Comput. Secur. Appl. Conf., Dec. 2007, pp. 421–430.
[7] D. E. Rumelhart, G. E. Hinton, R. J. Williams, et al., "Learning representations by back-propagating errors," Cognitive Model., vol. 5, no. 3, 1988, Art. no. 1.
[8] T. Mikolov, M. Karafiat, L. Burget, J. Cernocky, and S. Khudanpur, "Recurrent neural network based language model," in Proc. 11th Annu. Conf. Int. Speech Commun. Assoc., Sep. 2010, pp. 1045–1048.
[9] A. Graves, "Generating sequences with recurrent neural networks," arXiv:1308.0850, 2013.
[10] A. Graves, A.-r. Mohamed, and G. Hinton, "Speech recognition with deep recurrent neural networks," in Proc. IEEE Int. Conf. Acoust. Speech Signal Process., May 2013, pp. 6645–6649.
[11] Y. LeCun, "Generalization and network design strategies," in Connectionism in Perspective, R. Pfeifer, Z. Schreter, F. Fogelman, and L. Steels, Eds. Elsevier, 1989.
[12] Z. Yuan, Y. Lu, Z. Wang, and Y. Xue, "Droid-sec: Deep learning in android malware detection," ACM SIGCOMM Comput. Commun. Rev., vol. 44, no. 4, pp. 371–372, 2015.
[13] R. Pascanu, J. W. Stokes, H. Sanossian, M. Marinescu, and A. Thomas, "Malware classification with recurrent networks," in Proc. IEEE Int. Conf. Acoust. Speech Signal Process., Apr. 2015, pp. 1916–1920.
[14] S. Tobiyama, Y. Yamaguchi, H. Shimada, T. Ikuse, and T. Yagi, "Malware detection with deep neural network using process behavior," in Proc. IEEE 40th Annu. Comput. Softw. Appl. Conf., Jun. 2016, pp. 577–582.
[15] W. Hu and Y. Tan, "Black-box attacks against RNN based malware detection algorithms," in Proc. 32nd AAAI Conf. Artif. Intell., New Orleans, USA, Feb. 2018, pp. 1–10.
[16] A. Z. Broder, "On the resemblance and containment of documents," in Proc. Int. Conf. Compression Complexity Sequences, Jun. 1997, pp. 21–29.
[17] R. Tian, L. M. Batten, and S. Versteeg, "Function length as a tool for malware classification," in Proc. 3rd Int. Conf. Malicious Unwanted Softw., Oct. 2008, pp. 69–76.
[18] M. Z. Shafiq, S. A. Khayam, and M. Farooq, "Embedded malware detection using markov n-grams," in Proc. Int. Conf. Detection Intrusions Malware Vulnerability Assessment, 2008, pp. 88–107.
[19] M. Schultz, E. Eskin, F. Zadok, and S. J. Stolfo, "Data mining methods for detection of new malicious executables," in Proc. IEEE Symp. Secur. Privacy, May 2001, pp. 38–49.
[20] M. F. Zolkipli and A. Jantan, "An approach for malware behavior identification and classification," in Proc. 3rd Int. Conf. Comput. Res. Develop., Mar. 2011, pp. 191–194.
[21] D. Kong and G. Yan, "Discriminant malware distance learning on structural information for automated malware classification," in Proc. 19th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, Aug. 2013, pp. 1357–1365.
[22] U. Bayer, TTAnalyze: A Tool for Analyzing Malware, Technical University of Vienna, Dec. 2005.
[23] C. Willems, T. Holz, and F. Freiling, "Toward automated dynamic malware analysis using cwsandbox," IEEE Secur. Privacy, vol. 5, no. 2, pp. 32–39, Mar./Apr. 2007.
[24] J. Z. Kolter and M. A. Maloof, "Learning to detect malicious executables in the wild," in Proc. 10th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, Aug. 2004, pp. 470–478.
[25] B. Anderson, D. Quist, J. Neil, C. Storlie, and T. Lane, "Graph-based malware detection using dynamic analysis," J. Comput. Virology, vol. 7, no. 4, pp. 247–258, 2011.
[26] L. Nataraj, S. Karthikeyan, G. Jacob, and B. S. Manjunath, "Malware images: Visualization and automatic classification," in Proc. 8th Int. Symp. Vis. Cyber Secur., Jul. 2011, pp. 1–7.
[27] K. Kancherla and S. Mukkamala, "Image visualization based malware detection," in Proc. IEEE Symp. Comput. Intell. Cyber Secur., Apr. 2013, pp. 40–44.
[28] K. S. Han, J. H. Lim, B. Kang, and E. G. Im, "Malware analysis using visualized images and entropy graphs," Int. J. Inf. Secur., vol. 14, no. 1, pp. 1–14, 2015.
[29] A. Makandar and A. Patrot, "Malware class recognition using image processing techniques," in Proc. Int. Conf. Data Manage. Analytics Innovation, Feb. 2017, pp. 76–80.
[30] S. Z. M. Shaid and M. A. Maarof, "Malware behavior image for malware variant identification," in Proc. Int. Symp. Biometrics Secur. Technol., Aug. 2014, pp. 238–243.
[31] T. Wang and N. Xu, "Malware variants detection based on opcode image recognition in small training set," in Proc. IEEE 2nd Int. Conf. Cloud Comput. Big Data Anal., Apr. 2017, pp. 328–332.
[32] M. Yang and Q. Wen, "Detecting android malware by applying classification techniques on images patterns," in Proc. IEEE 2nd Int. Conf. Cloud Comput. Big Data Anal., Apr. 2017, pp. 344–347.
[33] J. Zhang, Z. Qin, H. Yin, L. Ou, and Y. Hu, "Irmd: Malware variant detection using opcode image recognition," in Proc. IEEE 22nd Int. Conf. Parallel Distrib. Syst., Dec. 2016, pp. 1175–1180.
[34] L. Liu and B. Wang, "Malware classification using gray-scale images and ensemble learning," in Proc. 3rd Int. Conf. Syst. Informat., Nov. 2016, pp. 1018–1022.
[35] M. Schuster and K. K. Paliwal, "Bidirectional recurrent neural networks," IEEE Trans. Signal Process., vol. 45, no. 11, pp. 2673–2681, Nov. 1997.
[36] Y. Bengio, P. Frasconi, and P. Simard, "The problem of learning long-term dependencies in recurrent networks," in Proc. IEEE Int. Conf. Neural Netw., Mar. 1993, pp. 1183–1188.
[37] S. Hochreiter, "Untersuchungen zu dynamischen neuronalen Netzen," Diploma thesis, Technische Universität München, München, Germany, vol. 91, 1991.
[38] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Comput., vol. 9, no. 8, pp. 1735–1780, 1997.
[39] K. Cho, B. van Merriënboer, D. Bahdanau, and Y. Bengio, "On the properties of neural machine translation: Encoder-decoder approaches," http://arxiv.org/abs/1409.1259, 2014.
[40] J. Huang, "Accelerating AI with GPUs: A new computing model," Jan. 12, 2016. [Online]. Available: https://blogs.nvidia.com/blog/2016/01/12/accelerating-ai-artificial-intelligence-gpus/
[41] E. C. R. Shin, D. Song, and R. Moazzezi, "Recognizing functions in binaries with neural networks," in Proc. 24th USENIX Conf. Secur. Symp., Aug. 2015, pp. 611–626.
[42] T. Bao, J. Burket, M. Woo, R. Turner, and D. Brumley, "Byteweight: Learning to recognize functions in binary code," in Proc. 23rd USENIX Conf. Secur. Symp., Aug. 2014, pp. 845–860.
[43] Microsoft, "Microsoft malware classification challenge (BIG 2015)." [Online]. Available: https://www.kaggle.com/c/malware-classification, Accessed on: 2015.
[44] J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, "Empirical evaluation of gated recurrent neural networks on sequence modeling," http://arxiv.org/abs/1412.3555, 2014.
[45] A. Z. Broder, M. Charikar, A. M. Frieze, and M. Mitzenmacher, "Min-wise independent permutations," J. Comput. Syst. Sci., vol. 60, no. 3, pp. 630–659, 2000.
[46] Y. Gal and Z. Ghahramani, "A theoretically grounded application of dropout in recurrent neural networks," in Proc. 30th Int. Conf. Neural Inf. Process. Syst., Barcelona, Spain, 2016, pp. 1027–1035.
[47] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial nets," in Proc. Neural Inf. Process. Syst., 2014, pp. 2672–2680.
[48] A. Shrivastava, T. Pfister, O. Tuzel, J. Susskind, W. Wang, and R. Webb, "Learning from simulated and unsupervised images through adversarial training," http://arxiv.org/abs/1612.07828, 2016.

Guosong Sun received the bachelor's degree in computer science & technology from Huazhong University of Science and Technology, China. He is working toward the master's degree in the School of Computer Engineering & Science, Shanghai University, China. His research interests include cloud computing, big data analysis, and computer and network security, especially malware analysis.

Quan Qian received the PhD degree in computer science from the University of Science and Technology of China (USTC) in 2003 and conducted postdoctoral research at USTC from 2003 to 2005. After that, he joined Shanghai University, where he is now the lab director of network and information security and the director of the center of materials data and informatics. He is a full professor in the School of Computer Engineering & Science, Shanghai University, China. His main research interests concern computer networks and network security, especially cloud computing, big data analysis, and wide-scale distributed network environments.