
MATEC Web of Conferences 232, 01060 (2018) https://doi.org/10.1051/matecconf/201823201060
EITCE 2018

Research on Software Multiple Fault Localization Method Based on Machine Learning

Meng Gao 1,2, Pengyu Li 1,2, Congcong Chen 2 and Yunsong Jiang 1,2
1 Beijing Institute of Control Engineering, No. 16, South Third Street, Zhongguancun, Haidian District, Beijing, China
2 Beijing Sunwise Information Technology Ltd., No. 16, South Third Street, Zhongguancun, Haidian District, Beijing, China

Abstract. Fault localization is one of the most time-consuming and labor-intensive activities in the debugging process. Consequently, there is a strong demand for techniques that can guide software developers to the locations of faults in a program with high accuracy and minimal human intervention. Although research on neural networks and decision trees has made some progress in software multiple fault localization, there is still a lack of systematic research covering the full range of machine learning algorithms. Therefore, a novel machine-learning-based multiple fault localization method is proposed in this paper. First, the basic concepts of software multiple fault localization are introduced, followed by the research status and development trends in this area. Next, the principles of machine learning classification algorithms are explained. Then, a research framework for software multiple fault localization based on machine learning is proposed. Taking the mid function as an example, the framework is used to compare and analyze the performance of 22 machine learning models in software multiple fault localization. Finally, the best-performing machine learning method is verified on the multiple fault localization of the Siemens suite dataset. The experimental results show that the model based on the Random Forest algorithm achieves higher accuracy and significantly better localization efficiency. This paper addresses the problems of large program spectrum datasets and multi-coupled fault localization, which is very helpful for improving the efficiency of debugging programs with multiple faults.

1 Introduction

Software is fundamental to our lives today, and with its ever-increasing usage and adoption, its influence is practically ubiquitous. At present, software is critical to many security- and safety-critical systems in industries such as medicine, aeronautics, and nuclear energy. Not surprisingly, this trend has been accompanied by a drastic increase in the scale and complexity of software. Unfortunately, this has also resulted in more software bugs, which often lead to execution failures with huge losses [1,2]. Furthermore, software faults in safety-critical systems have significant ramifications, including not only financial loss but also potential loss of life, which is an alarming prospect [3].
Developing software is universally acknowledged as an error-prone task. The major bottleneck in software debugging is identifying where the bugs are [4]; this is known as the fault localization problem. Although faults in software are discovered through erroneous behavior or some other manifestation of the faults, finding and fixing them is an entirely different matter. Fault localization, i.e., identifying the locations of faults, has historically been a manual task that is recognized to be time consuming, tedious, and prohibitively expensive [5], given the size and complexity of large-scale software systems today. Furthermore, manual fault localization relies heavily on the software developer's experience, judgment, and intuition to identify and prioritize code that is likely to be faulty. These limitations have led to a surge of interest in developing techniques that can partially or fully automate the localization of faults in software while reducing human input. Though some techniques are similar and others are very different, each tries to attack the problem of fault localization from a unique perspective, and each typically offers both advantages and disadvantages relative to the others. Many techniques already exist, others are continually being proposed, and advances are being made from both theoretical and practical perspectives. The huge demand for technologies that can help programmers effectively locate errors has stimulated the proposal of many fault localization techniques from a wide range of perspectives. As more and more researchers have devoted themselves to the area of software fault localization over the last 10 years, the number of papers published on software fault localization from 1977 to 2018 has also been increasing. This growth trend again supports the claim that software fault localization is not just an important but also a popular research topic, as it has been discussed very heavily in top-quality software engineering journals and conferences over the last 10 years [6].
The remainder of this article is organized as follows. Section 2 describes software multiple fault localization techniques, their research status and development trend, and machine learning classification models. Section 3 summarizes the specific process of the software multiple fault localization method based on machine learning. Experimental results and analysis are described in Section 4. Finally, conclusions are presented in Section 5.

a Corresponding author: [email protected]


© The Authors, published by EDP Sciences. This is an open access article distributed under the terms of the Creative Commons Attribution License 4.0 (http://creativecommons.org/licenses/by/4.0/).

2 Background

This section describes software multiple fault localization techniques, their research status and development trend, and machine learning classification models.

2.1 Software Multiple Fault Localization

Software multiple fault localization aims to find the wrong instructions, procedures, or data definitions hidden in the source code of a program. The granularity of multiple fault localization can be program statements, basic blocks, branches, functions, or classes. Software multiple fault localization is mainly divided into static analysis methods and dynamic testing methods. Multiple fault localization based on static analysis mainly uses program dependencies, constraint solving, and theorem proving to analyze possible error locations in the program. Multiple fault localization based on dynamic testing mainly uses test cases to collect program execution information and calculate possible error locations in the program. The process by which defects cause multiple software failures is shown in Figure 1.

Figure 1. The process of software multiple failures caused by defects

First, during the operation of the system, defects may be activated, causing the system to malfunction. Second, the fault propagates as the system runs and is continuously converted into errors that are passed between subsystems. Finally, as the errors continue to spread, they eventually reach the user interface of the system, causing the system's incorrect behavior to be perceived by the user, which leads to system failure. It can be seen that there is a multi-coupled fault action chain between the defect and the failure, and based on this chain a backtracking method can be used to find the position where the defect occurs and thus locate the defect. From the above failure mechanism it can also be seen that software defects do not necessarily lead to software failures: software defects are converted into runtime software failures only when specific conditions are met, namely when software errors accumulate, propagate effectively, and eventually invalidate the software.
According to the multiple fault mechanism described in Figure 1, the basic principle of traditional fault localization is to re-run the defective program with the same input after setting a breakpoint, check the corresponding program state, perform reverse reasoning, and repeat this process until the defect is found. The basic techniques used for traditional fault localization are breakpoints, single-step execution, printing debug information, logging, event tracking, dump files, stack trace back, disassembly, observation and modification of data, and control of the debugged processes and threads.

2.2 Research Status and Development Trend of Fault Localization Techniques

In 2016, W. Eric Wong et al. [6] summarized the milestone development history of software fault localization technology in the paper "A Survey on Software Fault Localization", drawing on a publication repository of 331 papers published from 1977 to 2014. In this survey, fault localization techniques were classified into eight categories: slice-based techniques, program spectrum-based techniques, statistics-based techniques, program state-based techniques, machine learning-based techniques, data mining-based techniques, model-based techniques, and miscellaneous techniques.
In 2015, Chen Xiang et al. [7] offered a systematic overview of recent research achievements of domestic and foreign researchers in the paper "Review of Dynamic Fault Localization Approaches Based on Program Spectrum". The survey proposed a research framework for dynamic defect localization based on program spectra and identified important factors that can affect the effectiveness of fault localization. These factors include program spectrum construction, test suite maintenance and composition, the number of faults, the test case oracle, user feedback, and fault removal cost.
With the development of electronic technology, the scale and functionality of software keep growing, the internal logic relationships become more and more complicated, the number of defects in a system also increases, and the difficulty of software fault localization increases day by day. Traditional spectrum-based and slice-based software fault localization solve the problem of localizing a single fault; currently there still exist the problems of low accuracy in software multiple fault localization and very large program spectrum datasets. Against the background of artificial intelligence and big data, the development trend among domestic and foreign researchers in recent years is as follows.

2.2.1 Software Multiple Fault Localization Techniques

Software fault localization techniques usually assume that only one defect is contained in the faulty program, which is often not the case. The presence of multiple faults in a program can inhibit the ability of fault localization techniques to locate the faults. This problem occurs for two reasons: first, when a program fails, the number of faults is generally unknown; second, certain faults may mask or obfuscate other faults. In recent years, researchers have studied how to locate faults in programs that contain multiple faults.
In 2007, Jones et al. [8] presented a parallel debugging approach to this problem that leverages the well-known advantages of parallel work flows to reduce the time-to-release of a program. It consists of a technique that enables more effective debugging in the presence of multiple faults and a methodology that enables multiple developers to simultaneously debug multiple faults. Unlike Jones et al., who use only program feature behavior, Abreu et al. [9] proposed a hybrid framework with logical reasoning in 2010. They use both feature information from program execution and Bayesian inference to infer multiple fault instances. One characteristic of Bayesian inference in the presence of defects of differing suspiciousness is that it can explain well why multiple defects occur intermittently and cause program errors.


2.2.2 Machine Learning-Based Techniques

Machine learning is the study of computer algorithms that improve through experience. Machine learning techniques are adaptive and robust and can produce models based on data. In the context of fault localization, the problem can be framed as learning or deducing the location of a fault from input data such as the statement coverage and the execution result of each test case.
Briand et al. [10] use the C4.5 decision tree algorithm to construct rules that classify test cases into various partitions. The statement coverage of the failed and successful test cases in each partition is used to rank the statements with a heuristic similar to Tarantula [11]. These individual rankings are then consolidated to form a final statement ranking, which can be examined to locate the faults. This technique is effective for bug locating, as only a relatively small amount of code needs to be examined to find the bugs compared with other state-of-the-art contemporary techniques.
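For reference, the Tarantula heuristic [11] mentioned above assigns each statement a suspiciousness score from the proportions of failed and passed test cases that execute it; statements executed mostly by failed tests score close to 1. The following is a minimal sketch in Python, with counts that are made up purely for illustration:

def tarantula(failed_s, passed_s, total_failed, total_passed):
    # failed_s / passed_s: number of failed / passed test cases that execute statement s
    fail_ratio = failed_s / total_failed if total_failed else 0.0
    pass_ratio = passed_s / total_passed if total_passed else 0.0
    if fail_ratio + pass_ratio == 0:
        return 0.0  # statement never executed by any test case
    return fail_ratio / (fail_ratio + pass_ratio)

# Example: a statement executed by 3 of 4 failed tests and 1 of 6 passed tests.
print(tarantula(3, 1, 4, 6))  # about 0.82, i.e. highly suspicious

Ranking all statements by this score yields the kind of per-partition ranking that Briand et al. consolidate into a final statement ranking.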


Wong et al. [12] proposed a fault localization technique based on a back-propagation (BP) neural network. The coverage data of each test case and the corresponding execution results are collected and used to train a BP neural network so that the network can learn the relationship between them. Then, the coverage of a suite of virtual test cases, each of which covers only one statement in the program, is input to the trained BP network, and the outputs can be regarded as the likelihood of each statement containing the bug. In 2009, Ascari et al. [13] extended the BP-based technique [14] to object-oriented programs. As BP neural networks are known to suffer from issues such as paralysis and local minima, Wong et al. [15] subsequently proposed, in 2012, an improved approach based on radial basis function (RBF) networks, which are less susceptible to these problems and have a faster learning rate [16], [17]. The RBF network is trained using an approach similar to that used for the BP network and, once training is completed, its outputs are used in the same way.
In 2013, He J. L. et al. [18] proposed a novel neural-network-based multiple fault location model based on the support degree of the input for each fault. The model learns the relationship between the faults and the candidate fault locations using the constructed neural network. By constructing an ideal input and feeding it to the learned network, the model can calculate the suspiciousness of each candidate fault location, obtain a ranking sorted by suspiciousness, and thus complete the task of multiple fault localization.
The research above is based only on neural networks and decision trees. It has made some progress in software multiple fault localization; however, there is still a lack of systematic research on the various machine learning algorithms available for software multiple fault localization. For the coupling, correlation, and nonlinearity of software multiple faults, machine learning offers strong generalization ability, adaptability, and robustness: it can learn the inherent laws implicit in the data from a finite sample. In summary, there is an urgent need to use machine learning to train on large program spectrum datasets in order to solve the problem of software multiple fault localization.

2.3 Machine learning classification model

Machine learning classification is a supervised learning approach in which a computer program learns from the data given to it and then uses this learning to classify new observations. A machine learning classification model attempts to construct a classifier from known, observed values and to predict the category of an object whose category is unknown. Machine learning classification algorithms include Bayesian methods, neural networks, support vector machines, rule-based methods, decision trees, and ensemble learning.

2.3.1 Bayesian Network

A Bayesian network [19] is a probabilistic network, that is, a graphical network based on probabilistic reasoning, with the Bayesian formula as its foundation. It is suitable for expressing and analyzing uncertain and probabilistic events, and reasoning can be performed from incomplete, inaccurate, or uncertain knowledge. The main goal of Bayesian inference is to estimate the value of a hidden node given the values of the observed nodes. Bayesian-based classification algorithms include Bayes Net, Naive Bayes, Naive Bayes Multinomial, Naive Bayes Multinomial Text, Naive Bayes Multinomial Updateable, and Naive Bayes Updateable.

2.3.2 Neural Network

A neural network is a mathematical model that mimics the structure and function of a biological neural network [20]. A neural network consists of a large number of nodes connected to each other. Each node implements a specific output function called an excitation (activation) function, and the connection between every two nodes carries a numerical weight. The weights can be adjusted through training, which enables the neural network to adapt to the input and to learn. The network connections, weight values, and excitation functions determine the output of the network. The most common classification algorithm based on neural networks is the BP (back propagation) neural network.

2.3.3 Support Vector Machine

The SVM [21] is a supervised learning method widely used in statistical classification and regression analysis. The SVM is characterized by its ability to minimize the empirical error while maximizing the geometric margin; therefore, the support vector machine is also called a maximum margin classifier. Support vector machine technology has a solid theoretical foundation in statistics, and there are many successful cases in practice. SVMs work well for high-dimensional data and avoid the curse of dimensionality. A unique feature is that a subset of the training instances, called the support vectors, is used to represent the decision boundary. The Sequential Minimal Optimization (SMO) algorithm is the most common classification algorithm based on support vector machines.

2.3.4 Rule-based

Rule-based classification [22] is a technique for classifying records by using a set of judgment rules. In order to build a rule-based classifier, a set of rules needs to be extracted that identifies the key relationships between dataset attributes and category labels. There are two ways to extract classification rules. The first is the direct method, which extracts the classification rules directly from the data. The second is the indirect method, which extracts them from other models such as decision trees: it first builds the decision tree, then extracts a rule from each leaf node and merges the rules that can be merged according to some optimization criteria. Rule-based classification algorithms include Decision Table, JRip, OneR, PART, and ZeroR.

2.3.5 Decision Trees

A decision tree makes a decision mechanically by dividing the input into a series of smaller decisions. Decision tree classification involves three parts: decision nodes, branches, and leaf nodes [23]. A decision node represents a test, usually on an attribute of the sample to be classified; different outcomes of the test on that attribute correspond to different branches, each representing a different value of the decision node. Each leaf node stores a category label indicating a possible classification result, and the number in parentheses indicates the number of instances that arrive at that leaf node. Decision tree classification algorithms include Decision Stump, Hoeffding Tree, J48, LMT, Random Tree, and REP Tree.

2.3.6 Ensemble Learning

Ensemble learning improves classification accuracy by aggregating the prediction results of multiple classifiers. The ensemble method constructs a set of base classifiers from the training data and then classifies new samples by the votes predicted by each base classifier. The performance of the ensemble classifier is better than that of any single classifier because collective decision making is superior to individual decision making in terms of overall reliability and accuracy. There are two types of methods for building ensemble classifiers. The first type processes the training data: multiple training sets are obtained by subsampling according to a sampling distribution that determines the probability of selecting a given sample, and a specific learning algorithm is then used to build a classifier for each training set. The second type processes the input features: multiple training sets are obtained by selecting multiple subsets of the input features, which is especially suitable for data with a large number of redundant features. Ensemble learning algorithms include AdaBoostM1, LogitBoost, and Random Forest [24].
The logical view of ensemble learning is shown in Figure 2. The basic idea is to build multiple classifiers on the original data, let them predict the categories of unknown samples separately, and finally aggregate the prediction results.

Figure 2. The logical view of ensemble learning

The ensemble learning algorithm framework is as follows:

For i = 1 to k do
    Create training set Di from D
    Construct a base classifier Ci from Di
End For
For every test sample x ∈ T do
    C*(x) = Vote(C1(x), C2(x), ..., Ck(x))
End For

where D represents the original training dataset, k represents the number of base classifiers, and T represents the test dataset.
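To make the framework above concrete, the sketch below implements the first construction scheme (resampling the training data) with a simple majority vote. It is only an illustration in Python, using scikit-learn decision trees as the base classifiers and assuming NumPy arrays with integer class labels; it is not the exact implementation used in this paper.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def ensemble_vote(D_X, D_y, T_X, k=10, seed=0):
    # Train k base classifiers C1..Ck on bootstrap samples Di drawn from D,
    # then classify every test sample x in T by majority vote.
    rng = np.random.default_rng(seed)
    n = len(D_X)
    classifiers = []
    for _ in range(k):
        idx = rng.integers(0, n, size=n)  # create training set Di from D
        classifiers.append(DecisionTreeClassifier().fit(D_X[idx], D_y[idx]))  # base classifier Ci
    votes = np.array([Ci.predict(T_X) for Ci in classifiers])  # shape: k x |T|
    # C*(x) = Vote(C1(x), C2(x), ..., Ck(x))
    return np.array([np.bincount(col.astype(int)).argmax() for col in votes.T])

Random Forest and the boosting algorithms listed above refine this basic scheme, for example by also sampling feature subsets or by weighting the training sets and the votes.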


3 Multiple Fault Localization Method

The specific process of the software multiple fault localization method based on machine learning is summarized as follows; the block diagram is shown in Figure 3.
(1) Input the program spectrum dataset. The program spectrum data consists of the statement coverage vector executed by each test case and the execution result of that test case.
(2) Train on the program spectrum dataset to obtain an optimal model. The machine learning models (Bayesian, neural network, support vector machine, rule-based, decision tree, and ensemble learning) are used to train on the program spectrum dataset, and the trained model is optimized by adjusting the machine learning parameters.
(3) Obtain a statement suspiciousness ranking by testing the optimal model with the virtual unit matrix. The optimal machine learning model is tested using the virtual unit matrix, the suspiciousness value of each statement is obtained from the test results, and the statements are sorted from high to low suspiciousness. The suspiciousness ranking helps the debugger check the suspected lines one by one according to the suspiciousness of each statement, thereby improving fault localization efficiency.

Figure 3. Block diagram of the software multiple fault localization method based on machine learning
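As an illustration of steps (1) to (3), the sketch below builds a small program spectrum, trains a classifier on it, and then feeds a virtual unit (identity) matrix, one virtual test case per statement, to obtain a statement suspiciousness ranking. It uses scikit-learn's RandomForestClassifier only as an example of the trained model, and the coverage data is invented for illustration; a real spectrum has one column per executable statement of the program under test.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Step (1): program spectrum. Rows are test cases, columns are statements;
# a 1 means the statement was executed by that test case.
coverage = np.array([[1, 1, 0, 1, 0],
                     [1, 0, 1, 1, 0],
                     [1, 1, 0, 1, 1],
                     [1, 0, 1, 0, 1]])
results = np.array([1, 0, 1, 0])  # execution result per test case: 1 = failed, 0 = passed

# Step (2): train the model on the spectrum data.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(coverage, results)

# Step (3): the virtual unit matrix has one row per statement, covering only that statement.
virtual_tests = np.eye(coverage.shape[1])
suspiciousness = model.predict_proba(virtual_tests)[:, 1]  # probability of the "failed" class
ranking = np.argsort(-suspiciousness) + 1                  # statement numbers, most suspicious first
print(ranking)

The debugger then examines the statements in this order; it is this ranking that is compared across the 22 models in the following subsections.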


The mid function is a classic example for demonstrating software fault localization: it returns the middle (median) value of three numbers. The mid program is taken as an example to illustrate the software fault localization method with 22 machine learning models (RF is Random Forest, BP is Back Propagation neural network, LB is LogitBoost, AB is AdaBoostM1, NB is Naive Bayes, NBU is Naive Bayes Updateable, NBM is Naive Bayes Multinomial, NBMU is Naive Bayes Multinomial Updateable, DS is Decision Stump, SMO is Sequential Minimal Optimization, HT is Hoeffding Tree, BN is Bayes Net, RT is Random Tree, NBMT is Naive Bayes Multinomial Text, DT is Decision Table, and REPT is REP Tree).

3.1 Software Single Fault Localization

For single fault localization with a fault at line 4 of the mid function, the 22 machine learning models were trained on the program spectrum data.

Figure 4. Single fault localization spectrum data

As can be seen from Figure 5, the best machine learning models for single fault localization are RF, BP, LB, AB, NB, NBU, NBM, NBMU, DS, SMO, HT, and BN: the statement they rank 1st in suspiciousness is the fault statement. The models RT, NBMT, DT, JRip, OneR, PART, ZeroR, J48, LMT, and REPT have poor resolution and provide no discrimination on the spectrum data of the mid function.

Figure 5. Comparison of the single fault suspiciousness data calculated by the machine learning algorithms

3.2 Software Double Fault Localization

For double fault localization with faults at lines 6 and 9 of the mid function, the machine learning models were trained on the program spectrum data.

Figure 6. Double fault localization spectrum data

As can be seen from Figure 7, the machine learning models with the best double fault localization are RF, BP, LB, and AB: the statements they rank 1st and 2nd in suspiciousness are the two fault statements. For the models NB, NBU, NBM, DS, and RT, the statement ranked 1st in suspiciousness is one of the fault statements. The models HT, BN, and SMO are not effective for double fault localization and provide no discrimination.

Figure 7. Comparison of the double fault suspiciousness data calculated by the machine learning algorithms

3.3 Software Three Fault Localization

For three-fault localization with faults at lines 4, 6, and 9 of the mid function, the machine learning models were trained on the program spectrum data.

Figure 8. Three fault localization spectrum data

As can be seen from Figure 9, the machine learning models with the best three-fault localization effect are RF, BP, and LB: the statements they rank 1st, 2nd, and 3rd in suspiciousness are the three fault statements. For the models AB, NB, NBU, NBM, and NBMU, the statements ranked 1st and 2nd in suspiciousness are two of the fault statements, and for DS and RT the statement ranked 1st in suspiciousness is one of the fault statements.

Figure 9. Comparison of the three fault suspiciousness data calculated by the machine learning algorithms

3.4 Software Four Fault Localization

For four-fault localization with faults at lines 4, 6, 9, and 11 of the mid function, the machine learning models were trained on the program spectrum data.

Figure 10. Four fault localization spectrum data

As can be seen from Figure 11, the machine learning models with the best four-fault localization effect are RF, BP, and LB: the statements they rank 1st, 2nd, 3rd, and 4th in suspiciousness are the four fault statements. For AB, the statements ranked 1st and 2nd in suspiciousness are two of the fault statements. The Random Forest and LogitBoost models based on ensemble learning, together with the BP neural network, excel in software multiple fault localization.

Figure 11. Comparison of the four fault suspiciousness data calculated by the machine learning algorithms
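For reference, the mid function used as the running example throughout this section is commonly written as follows in the fault localization literature. This is a sketch of the fault-free version in Python; the faulty versions used above seed defects into individual lines (for example, assigning m = y where m = x is intended), and the line numbers 4, 6, 9, and 11 cited above refer to the paper's own listing of the program, which is not reproduced here.

def mid(x, y, z):
    # Return the median of the three input numbers.
    m = z
    if y < z:
        if x < y:
            m = y
        elif x < z:
            m = x
    else:
        if x > y:
            m = y
        elif x > z:
            m = x
    return m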


4 Experimental Results and Analysis

This article uses the Siemens Suite test dataset [25] as the experimental object. It contains seven programs, each with a correct version, several faulty versions, and test cases. All of these resources can be downloaded from the Software-artifact Infrastructure Repository (SIR) at the University of Nebraska-Lincoln, as summarized in Table 1.

Table 1. Summary of the Siemens Suite

The entire experiment goes through the following steps: data collection, feature extraction, comparison of results, input data integration, machine learning model training and testing, application of the virtual unit matrix, and comparison of multiple fault localization efficiency. The specific process is shown in Figure 12.

Figure 12. Basic process of the multiple fault localization experiment on the Siemens Suite

The main efficiency criterion for comparing fault localization algorithms is to examine as few statements as possible under the premise that the fault has been located. Here, the accuracy value [26] is used to describe the localization efficiency:

Accuracy = (suspiciousness rank of the fault statement) / (total number of executed statements)   (1)

Here, accuracy indicates the percentage of statements that need to be examined before the real faulty statement is found. The smaller the value, the higher the efficiency of fault localization.
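For example (with hypothetical numbers), if a faulty statement is ranked 5th by suspiciousness in a version with 25 executed statements, then Accuracy = 5 / 25 = 20%, i.e. one fifth of the executed statements must be examined before the fault is reached.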


Figure 13. Machine learning model single fault localization accuracy

Figure 14. Machine learning model double fault localization accuracy

Figure 15. Machine learning model three fault localization accuracy

Figure 16. Machine learning model four fault localization accuracy

Figure 17. Machine learning model multiple fault localization accuracy

As shown in Figures 13 to 17, under the conditions of a single fault, double faults, three faults, and four faults, the machine learning models based on Random Forest and LogitBoost (ensemble learning) and on the BP neural network perform well in terms of accuracy in software multiple fault localization.
This article also introduces the standard Imp percentage [27], which is the ratio of the total number of statements that have to be examined to the total number of statements over all faulty versions, in the case where the faults of every faulty version of a single program are found. Obviously, the lower the Imp percentage, the less time the fault localization algorithm takes and the higher its efficiency.

Figure 18. Comparison of the Imp percentages of RF, BP, and LB on the seven programs

As can be seen from Figure 18, on the seven programs of the Siemens Suite the Random Forest algorithm consumes the least search time and is more efficient than the LogitBoost and BP neural network algorithms.

5 Conclusion

A good software fault localization technique can save funds and human resources and accelerate the progress of software development. Therefore, achieving localization with high accuracy and minimal or no human intervention is of great significance, and various techniques have been proposed to meet this requirement. However, the interactions between multiple faults, which have not been fully considered in previous work, make fault localization more complicated. To solve this problem, this paper proposed a multiple fault localization method, comparing 22 kinds of machine learning models and applying them to the Siemens suite dataset. The experimental results show that the Random Forest and LogitBoost models based on ensemble learning and the BP neural network model are excellent in software multiple fault localization. Among them, Random Forest is well suited to the high-dimensional features of program spectrum data, and the random forest model has strong generalization ability and high training speed. Therefore, the machine learning model based on the Random Forest algorithm achieves a more accurate multiple fault localization effect and more significant localization efficiency.

References

1. G. J. Pai and J. B. Dugan, "Empirical analysis of software fault content and fault proneness using Bayesian methods," IEEE Trans. Softw. Eng., 33(10), pp. 675–686 (2007)
2. C. S. Wright and T. A. Zia, "A quantitative analysis into the economics of correcting software bugs," in Proc. Int. Conf. Comput. Intell. Security Inf. Syst., Torremolinos (2011)
3. W. E. Wong, V. Debroy, A. Surampudi, H. Kim, and M. F. Siok, "Recent catastrophic accidents: Investigating how software was responsible," in Proc. 4th Int. Conf. Secure Softw. Integr. Rel. Improvement (2010)
4. I. Vessey, "Expertise in debugging computer programs: A process analysis," International Journal of Man-Machine Studies, 23(5), pp. 459–494 (1985)
5. T. Wang, "Post-mortem dynamic analysis for software debugging," Ph.D. dissertation, Fudan Univ. (2007)
6. W. E. Wong, R. Gao, Y. Li, R. Abreu, and F. Wotawa, "A survey on software fault localization," IEEE Trans. Softw. Eng., 42, pp. 707–740 (2016)
7. X. Chen, X. Ju, W. Wen, and Q. Gu, "Review of dynamic fault localization approaches based on program spectrum," Journal of Software, 26(2) (2015)
8. J. A. Jones, J. Bowring, and M. J. Harrold, "Debugging in parallel," in Proc. ACM/SIGSOFT Int. Symp. Softw. Testing Anal. (2007)
9. R. Abreu, A. González, and A. J. Gemund, "Exploiting count spectra for Bayesian fault localization," presented at the 6th Int. Conf. Predictive Models Softw. Eng. (2010)
10. L. C. Briand, Y. Labiche, and X. Liu, "Using machine learning to support debugging with Tarantula," in Proc. IEEE Int. Symp. Softw. Rel. (2007)
11. J. A. Jones and M. J. Harrold, "Empirical evaluation of the Tarantula automatic fault-localization technique," in Proc. Int. Conf. Autom. Softw. Eng. (2005)
12. W. E. Wong, T. Sugeta, Y. Qi, and J. C. Maldonado, "Smart debugging software architectural design in SDL," J. Syst. Softw., 76(1), pp. 15–28 (2005)
13. L. C. Ascari, L. Y. Araki, A. R. T. Pozo, and S. R. Vergilio, "Exploring machine learning techniques for fault localization," in Proc. 10th Latin Am. Test Workshop (2009)
14. W. E. Wong and Y. Qi, "BP neural network-based effective fault localization," Int. J. Softw. Eng. Knowl. Eng., 19(4), pp. 573–597 (2009)
15. W. E. Wong, V. Debroy, R. Golden, X. Xu, and B. Thuraisingham, "Effective software fault localization using an RBF neural network," IEEE Trans. Rel., 61(1), pp. 149–169 (2012)


16. C. C. Lee, P. C. Chung, J. R. Tsai, and C. I. Chang, "Robust radial basis function neural networks," IEEE Trans. Syst., Man, Cybern. B, Cybern., 29(6), pp. 674–685 (1999)
17. P. D. Wasserman, "Advanced Methods in Neural Computing" (1993)
18. J. L. He and H. Zhang, "Application of artificial neural network in software multiple faults location," Journal of Computer Research and Development, 50(3) (2013)
19. T. Hastie, R. Tibshirani, and J. Friedman, "The Elements of Statistical Learning: Data Mining, Inference and Prediction," Springer (2002)
20. T. M. Khoshgoftaar, E. B. Allen, J. P. Hudepohl, and S. J. Aud, "Application of neural networks to software quality modeling of a very large telecommunications system," IEEE Trans. Neural Networks, 8(4), pp. 902–909 (1997)
21. T. V. Gestel, J. A. K. Suykens, B. Baesens, S. Viaene, J. Vanthienen, G. Dedene, B. De Moor, and J. Vandewalle, "Benchmarking least squares support vector machine classifiers," Machine Learning, 54(1), pp. 5–32 (2004)
22. T. M. Khoshgoftaar and N. Seliya, "Analogy-based practical classification rules for software quality estimation," Empirical Software Eng., 8(4), pp. 325–350 (2003)
23. R. W. Selby and A. A. Porter, "Learning from examples: Generation and evaluation of decision trees for software resource analysis," IEEE Trans. Software Eng., 14(12), pp. 1743–1756 (1988)
24. L. Breiman, "Random forests," Machine Learning, 45(1) (2001)
25. H. Do, S. Elbaum, and G. Rothermel, "Supporting controlled experimentation with testing techniques: An infrastructure and its potential impact," Empirical Software Engineering, 10(4) (2005)
26. Y. Lei, X. G. Mao, Z. Y. Dai, and C. S. Wang, "Effective statistical fault localization using program slices," in Proc. 2012 IEEE 36th Int. Conf. Comp. Softw. and Appl. (2012)
27. V. Debroy, W. E. Wong, X. Xu, and B. Choi, "A grouping-based strategy to improve the effectiveness of fault localization techniques," in Proc. 10th Int. Conf. Qual. Softw. (2010)
