A Novel Approach For Code Smells Detection Based On Deep Learning
1 Introduction
Modern agile software development increasingly produces code smells, because code
changes are far more frequent and occur daily in large software companies and major
open-source communities.
Although many testing approaches exist for detecting code smells, these methods have
shortcomings. With such frequent changes, it becomes increasingly likely that code
smells, and the overhead they bring, are introduced. Code smells, like software bugs,
are a serious problem in modern software.
Code smells are now studied by many practitioners. Nevertheless, software developers
are often unaware of what a code smell is, although they are aware of software bugs
thanks to integrated development environments that provide instant suggestions and
notifications when bugs are present.
The questions here are how to detect code smells effectively and what motivates
detecting them. There are two main categories of deep learning networks: recurrent
neural networks and convolutional neural networks.
Corresponding Authors: Tao Lin (Amazon, USA), Luqun Li (Shanghai Normal University,
China), Fu Chen (Central University of Finance and Economics, China), Xue Fu
(Shanghai Normal University, China)
Thanks to recent developments in software engineering, software bugs are easy to find
with a range of methods, from compilation to run time. Our work builds on
state-of-the-art deep learning methods for detection and recognition tasks.
First, we transform the software source code into an XML file so that it can be
processed by deep learning models [2]. Two steps follow: code segment proposal and
classification. The code segment proposal step uses heuristic search to generate the
candidate segments that serve as the subsequent inputs. These segments are then
processed by a CNN classifier. We avoid R-CNN, mainly because it relies on the
selective search algorithm, which is time-consuming.
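The two preprocessing steps can be illustrated with a minimal sketch. It is not the implementation used in this work: it assumes Python source code and uses the standard `ast` module as a stand-in parser, and the helper names `ast_to_xml` and `propose_segments`, the `max_nodes` threshold, and the "one candidate per small function" heuristic are all hypothetical choices made for illustration.

```python
import ast
import xml.etree.ElementTree as ET


def ast_to_xml(node: ast.AST) -> ET.Element:
    """Recursively mirror a Python AST node as an XML element."""
    elem = ET.Element(type(node).__name__)
    lineno = getattr(node, "lineno", None)
    if lineno is not None:
        elem.set("line", str(lineno))
    name = getattr(node, "name", None)
    if isinstance(name, str):
        elem.set("name", name)
    for child in ast.iter_child_nodes(node):
        elem.append(ast_to_xml(child))
    return elem


def propose_segments(root: ET.Element, max_nodes: int = 50) -> list:
    """Hypothetical heuristic: propose each function subtree that has
    at most max_nodes XML elements as one candidate segment."""
    segments = []
    for elem in root.iter("FunctionDef"):
        size = sum(1 for _ in elem.iter())
        if size <= max_nodes:
            segments.append(elem)
    return segments


# Illustrative input only; not taken from the evaluated data set.
source = """
def add(a, b):
    return a + b

def noisy(x):
    print(x)
    return x
"""

xml_root = ast_to_xml(ast.parse(source))
segments = propose_segments(xml_root)
xml_text = ET.tostring(xml_root, encoding="unicode")

for seg in segments:
    print(seg.get("name"), "starts at line", seg.get("line"))
```

Each proposed segment is an XML subtree that a classifier could then consume. A cheap structural heuristic of this kind avoids an exhaustive region search, which is the cost concern raised about selective search.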
We use the following equation to compute the segments:
$$\mathrm{Seg} = \left( \max_{r} \frac{p^{2}}{D} + \min_{r} q \right) \cdot \frac{I}{D}$$
where Seg denotes the segments, r is the rule of limits, p is the process variable, q is
the next graph input, I is the interception, and D is the next destination.
As Table 1 shows, this work achieves a high F2 score, especially for the uncontrolled
side effects and contrived complexity categories.
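For readers unfamiliar with the metric, the F2 score is the F-beta score with beta = 2, which weights recall more heavily than precision. The sketch below uses the standard F-beta definition; the function name `f_beta` and the precision and recall values are hypothetical and are not taken from Table 1.

```python
def f_beta(precision: float, recall: float, beta: float = 2.0) -> float:
    """Standard F-beta score; beta = 2 gives the F2 score."""
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when nothing is detected correctly
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical values chosen only to show the asymmetry of the metric.
high_recall = f_beta(precision=0.5, recall=1.0)     # about 0.83
high_precision = f_beta(precision=1.0, recall=0.5)  # about 0.56
print(round(high_recall, 2), round(high_precision, 2))
```

The same pair of values yields a much higher score when recall is the larger one, which suits smell detection when missing a smell is considered more costly than raising a false alarm.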
4 Conclusion
In this work, we studied code smell detection based on deep learning. Our solution
trains a convolutional neural network to detect several common code smells in software
engineering. The solution achieves a satisfactory F2 score, with an average above 0.75.
Acknowledgement
Part of this work is from the author’s PhD study [1], before the author joined
Amazon. Professor Fu Chen of Central University of Finance and Economics
provided many constructive suggestions and perspectives for this work during the
author’s PhD study. Professor Fu Chen and this work were supported in part by the
National Science Foundation of China under Grant No. 61672104.
References
[1] T. Lin, “A Data Triage Retrieval System for Cyber Security Operations Center,”
Pennsylvania State Univ. Thesis, 2018.
[2] T. Lin, “A Container-Destructor-Explorer Paradigm to Code Smells
Detection,” J. Chinese Comput. Syst., vol. 37, no. 3, 2016.
[3] T. Lin and X. Fu, “Flame Detection Based on SIFT Algorithm and One Class
Classifier with Undetermined Environment,” Comput. Sci., vol. 42, no. 6, 2015.
[4] T. Lin, C. Zhong, J. Yen, and P. Liu, “Retrieval of Relevant Historical Data
Triage Operations in Security Operation Centers,” in From Database to Cyber
Security, Springer, Cham, 2018, pp. 227–243.
[5] T. Lin, “A Novel Image Matching Algorithm Based on Graph Theory,”
Comput. Appl. Softw., vol. 33, no. 12, 2016.
[6] T. Lin, “Graphic User Interface Testing Based on Petri Net,” Appl. Res.
Comput., vol. 33, no. 3, 2016.
[7] T. Lin, “A Novel Direct Small World Network Model,” J. Shanghai Norm.
Univ., vol. 45, no. 5, 2016.
[8] T. Lin, J. Gao, X. Fu, and Y. Lin, “A Novel Bug Report Extraction Approach,”
in International Conference on Algorithms and Architectures for Parallel
Processing, 2015, pp. 771–780.
[9] C. Zhong, T. Lin, P. Liu, J. Yen, and K. Chen, “A cyber security data triage
operation retrieval system,” Comput. Secur., vol. 76, pp. 12–31, 2018.
[10] T. Lin, “Deep Learning for IoT,” in 39th IEEE International Performance
Computing and Communications Conference, 2020.
[11] T. Lin, “Security Operations Center Retrieval,” 2021.
https://github.com/ltaocs/SecurityOperationsCenterRetrieval.