Comparison of Big Data Analyses for Reliable Open Source Software

Yoshinobu Tamura1 and Shigeru Yamada2


1 Graduate School of Sciences and Technology for Innovation, Yamaguchi University, Ube, Japan
2 Graduate School of Engineering, Tottori University, Tottori, Japan


Abstract—Open source software is used in a wide range of areas of software system development because of its standardization, cost reduction, and quick delivery. Many open source software products help software developers and managers develop software systems quickly. Open source software is also characterized by the bug tracking system (BTS). BTSs such as Bugzilla are managed by almost all open source projects, and many data sets are recorded in the BTS by project members and software users. In this paper, we compare methods of big fault data analysis based on deep learning and on a neural network. Moreover, we show several numerical examples of big fault data analysis for an actual open source software project. To analyze the effectiveness of the proposed method, this paper shows the recognition rates of the proposed method and of the conventional method.

Keywords—Big data, deep learning, neural network, open source software, software reliability

I. INTRODUCTION

Many open source software products are useful to software engineers and general users around the world. However, there is no established standard method for assessing the quality and reliability of open source software. The bug tracking system (BTS) is well known as a useful system for improving the quality of open source software. In recent years, BTSs have been operated in various open source projects, and various fault data sets are registered in the BTS database.

In the past, various software reliability growth models [1]–[3] have been applied to analyze reliability for quality management in the system testing phase of software development. Several software reliability models for open source software have also been proposed [4]. However, it is difficult for software developers to select the optimal software reliability growth model for the testing phase of an actual software project. Moreover, software reliability models require fault count data; that is, software managers need to convert the raw fault data recorded on the BTS into fault count data, and this conversion from the raw data can be troublesome. Therefore, it is difficult for software developers to analyze the reliability and quality of open source software by using software reliability models. On the other hand, software developers could take prompt action for quality management if they could use the raw data recorded on the BTS directly.

In this paper, we discuss a method for recognizing the severity level of software faults, and we propose a method of big fault data analysis using deep learning. We also present several numerical examples of the proposed method using actual big fault data. Moreover, the proposed method based on deep learning is compared with a method based on a neural network. Then, the proposed method is applied with different amounts of learning data in order to assess the effectiveness of deep learning.

II. BIG FAULT DATA ANALYSIS BY USING NEURAL NETWORK

Considering the general method of a neural network trained by back-propagation, the input-output rules of each unit in each layer are given as follows:

$$H_j = f\left(\sum_{i=1}^{I} \omega^{1}_{ij} I_i\right), \tag{1}$$

$$R_k = f\left(\sum_{j=1}^{J} \omega^{2}_{jk} H_j\right), \tag{2}$$

where H_j is the output of the j-th unit in the hidden layer, R_k is the output of the k-th unit in the response layer, and ω^1_ij and ω^2_jk are the connection weights. The following logistic function f(·), widely known as a sigmoid function, is used:

$$f(x) = \frac{1}{1 + e^{-\theta x}}, \tag{3}$$

where θ is the gain. In this paper, a multi-layered neural network [5] trained by back-propagation is used to learn the fault severity levels from the big fault data recorded on the BTS. The error function is given as follows:

$$E = \frac{1}{2} \sum_{k=1}^{K} \left(R_k - L_k\right)^2, \tag{4}$$

where L_k (k = 1, 2, ..., K) are the target values of the outputs R_k.

In terms of the characteristics of the fault data recorded on the BTSs, the following information is used as the input data I_i (i = 1, 2, ..., I).

978-1-5090-3665-3/16/$31.00 ©2016 IEEE 1345


Proceedings of the 2016 IEEE IEEM

[Figure: 1st, 2nd, ..., m-th input and output layers with hidden layers; pre-training units Z_l (l = 1, ..., L), Z_α (α = 1, ..., A), and Z_m (m = 1, ..., M), leading to the compressed characteristics O_n (n = 1, ..., N).]
Fig. 1. The flow of deep learning.

Input data used in the neural network:
- Recorded date
- Nickname of assignee
- Nickname of reporter
- Fault status
- Operating system
- Software component name
- Software product name
- Software version name

TABLE I
THE INDEXED FAULT LEVELS.

Index Number    Software Fault Level
1               Trivial
2               Enhancement
3               Minor
4               Normal
5               Regression
6               Blocker
7               Major
8               Critical
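Equations (1)–(4) can be made concrete with a small pure-Python sketch. It is illustrative only and rests on assumptions the paper does not state: a single hidden layer of J units, small random initial weights, a learning rate eta, and one-hot target values L_k over the K = 8 fault levels of Table I. All function names and numbers are hypothetical. The back-propagation step uses the derivative of Eq. (3), f'(x) = θ f(x)(1 − f(x)).

```python
import math
import random


def sigmoid(x, theta=1.0):
    """Logistic function of Eq. (3) with gain theta."""
    return 1.0 / (1.0 + math.exp(-theta * x))


def forward(inputs, w1, w2, theta=1.0):
    """Eqs. (1)-(2): hidden outputs H_j and response outputs R_k.

    w1[i][j] plays the role of omega^1_ij and w2[j][k] of omega^2_jk.
    """
    n_hidden, n_out = len(w1[0]), len(w2[0])
    hidden = [sigmoid(sum(w1[i][j] * x for i, x in enumerate(inputs)), theta)
              for j in range(n_hidden)]
    response = [sigmoid(sum(w2[j][k] * h for j, h in enumerate(hidden)), theta)
                for k in range(n_out)]
    return hidden, response


def error(response, targets):
    """Eq. (4): E = 1/2 * sum_k (R_k - L_k)^2."""
    return 0.5 * sum((r - t) ** 2 for r, t in zip(response, targets))


def backprop_step(inputs, targets, w1, w2, theta=1.0, eta=0.5):
    """One in-place gradient-descent update of w1 and w2; returns E before the update."""
    hidden, response = forward(inputs, w1, w2, theta)
    # dE/d(net_k) for each response unit, using f'(x) = theta * f(x) * (1 - f(x))
    delta_out = [(r - t) * theta * r * (1.0 - r) for r, t in zip(response, targets)]
    # dE/d(net_j) for each hidden unit, back-propagated through the not-yet-updated w2
    delta_hid = [theta * h * (1.0 - h) * sum(w2[j][k] * d for k, d in enumerate(delta_out))
                 for j, h in enumerate(hidden)]
    for j, h in enumerate(hidden):
        for k, d in enumerate(delta_out):
            w2[j][k] -= eta * d * h
    for i, x in enumerate(inputs):
        for j, d in enumerate(delta_hid):
            w1[i][j] -= eta * d * x
    return error(response, targets)


# Hypothetical usage: I = 8 encoded inputs, J = 5 hidden units, K = 8 fault levels.
random.seed(0)
I, J, K = 8, 5, 8
w1 = [[random.uniform(-0.5, 0.5) for _ in range(J)] for _ in range(I)]
w2 = [[random.uniform(-0.5, 0.5) for _ in range(K)] for _ in range(J)]
x = [0.30, 0.10, 0.05, 0.60, 0.40, 0.20, 0.70, 0.10]  # made-up input values I_i
targets = [0.0] * K
targets[8 - 1] = 1.0  # one-hot target for "Critical" (index number 8 in Table I)

e_before = error(forward(x, w1, w2)[1], targets)
for _ in range(200):
    backprop_step(x, targets, w1, w2)
e_after = error(forward(x, w1, w2)[1], targets)

_, response = forward(x, w1, w2)
predicted_level = 1 + max(range(K), key=lambda k: response[k])  # back to Table I numbering
```

Training on a single record only demonstrates that the update reduces E. A real run would iterate over all learning records.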
Eight kinds of fault level, i.e., Critical, Major, Blocker, Regression, Normal, Minor, Enhancement, and Trivial, are selected as the compressed characteristics. Accordingly, the number of units in the response layer, K, is set to 8, because there are 8 fault levels.

III. BIG FAULT DATA ANALYSIS BY USING DEEP LEARNING

The general flow of deep learning is shown in Fig. 1. In Fig. 1, Z_l (l = 1, 2, ..., L), Z_α (α = 1, 2, ..., A), and Z_m (m = 1, 2, ..., M) are the pre-training units. Moreover, the compressed characteristics are denoted by O_n (n = 1, 2, ..., N). Several deep learning algorithms have been proposed by researchers in the past [6]–[11], and several software reliability assessment methods based on deep learning have also been proposed [12], [13]. In this paper, a deep neural network is applied to learn the fault data registered on the BTSs of open source software projects.

The fault levels listed in Table I are used as the objective variable. Therefore, the compressed characteristics are the 8 kinds of fault level, i.e., Critical, Major, Blocker, Regression, Normal, Minor, Enhancement, and Trivial. As in the case of the neural network, the following input data are applied as the explanatory variables Z_l (l = 1, 2, ..., L).

Input data as the explanatory variables:
- Recorded date
- Nickname of assignee
- Nickname of reporter
- Fault status
- Operating system
- Software component name
- Software product name
- Software version name

These 8 kinds of explanatory variables serve as the characteristics for the pre-training units. For the deep learning analysis, the character data contained in each explanatory variable are converted into numerical values such as occurrence ratios.
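The conversion of character data into occurrence ratios can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' procedure:

- Each categorical value is replaced by its relative frequency within its own column.
- The ratios are computed on the learning data only.
- Values never seen there map to 0.0.
- The field names (`os`, `component`, `severity`) and the records are hypothetical, because the paper does not describe the BTS export format.

The fault level label is mapped to its index number from Table I.

```python
from collections import Counter

# Index numbers of the fault levels, as listed in Table I.
FAULT_LEVEL_INDEX = {
    "Trivial": 1, "Enhancement": 2, "Minor": 3, "Normal": 4,
    "Regression": 5, "Blocker": 6, "Major": 7, "Critical": 8,
}


def fit_occurrence_ratios(records, columns):
    """For each column, map every observed value to (count of value) / (number of records)."""
    n = len(records)
    ratios = {}
    for col in columns:
        counts = Counter(rec[col] for rec in records)
        ratios[col] = {value: count / n for value, count in counts.items()}
    return ratios


def encode(record, ratios, columns):
    """Replace each character value with its occurrence ratio (0.0 if never seen)."""
    return [ratios[col].get(record[col], 0.0) for col in columns]


# Made-up records; a real run would use all 8 explanatory variables listed above.
learning = [
    {"os": "Linux", "component": "core", "severity": "Critical"},
    {"os": "Linux", "component": "mod_ssl", "severity": "Normal"},
    {"os": "All", "component": "core", "severity": "Minor"},
    {"os": "Linux", "component": "core", "severity": "Normal"},
]
columns = ["os", "component"]
ratios = fit_occurrence_ratios(learning, columns)
features = encode(learning[0], ratios, columns)                          # [0.75, 0.75]
label = FAULT_LEVEL_INDEX[learning[0]["severity"]]                       # 8
unseen = encode({"os": "Windows", "component": "core"}, ratios, columns)  # [0.0, 0.75]
```

Fitting the ratios on the learning portion only keeps information from the testing records out of the inputs.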


[Figs. 2 and 3: FAULT LEVEL (0–9) versus DATA (LINES) (0–5,000), actual fault levels against those estimated by deep learning (Fig. 2) and by the neural network (Fig. 3).]
Fig. 2. The estimated software fault levels by using deep learning based on 50% (5,000 lines) learning data.
Fig. 3. The estimated software fault levels by using neural network based on 50% (5,000 lines) learning data.

IV. NUMERICAL EXAMPLES

From the standpoint of standardization, quick delivery, and cost reduction, many open source software products are used in software development. To assess the performance of our methods, this paper focuses on the Apache HTTP server [14], and several performance examples based on its data sets are shown. We use the data recorded on the BTS website of the Apache Software Foundation project; 10,000 fault data records are obtained from the Apache Software Foundation website.
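The evaluation protocol of this section can be sketched as follows. It learns on the first 50%–90% of the 10,000 records, tests on the remainder, and scores the result with a recognition rate. Two assumptions are made because the paper does not spell them out:

- The split keeps the recorded order of the records.
- The recognition rate is the fraction of testing records whose estimated fault level equals the actual one.

The function names are hypothetical.

```python
def split_learning_testing(records, learning_fraction):
    """Use the first learning_fraction of the records for learning and the rest for testing."""
    if not 0.0 < learning_fraction < 1.0:
        raise ValueError("learning_fraction must be strictly between 0 and 1")
    cut = round(len(records) * learning_fraction)  # round() guards against float error, e.g. 0.7 * 10000
    return records[:cut], records[cut:]


def recognition_rate(actual_levels, estimated_levels):
    """Fraction of testing records whose estimated fault level matches the actual level."""
    if len(actual_levels) != len(estimated_levels) or not actual_levels:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(a == e for a, e in zip(actual_levels, estimated_levels))
    return hits / len(actual_levels)


# 10,000 placeholder records split at the rates used in this section.
records = list(range(10_000))
sizes = {}
for fraction in (0.5, 0.6, 0.7, 0.8, 0.9):
    learning, testing = split_learning_testing(records, fraction)
    sizes[fraction] = (len(learning), len(testing))
# sizes -> {0.5: (5000, 5000), 0.6: (6000, 4000), 0.7: (7000, 3000), 0.8: (8000, 2000), 0.9: (9000, 1000)}

rate = recognition_rate([8, 4, 3, 4], [8, 4, 4, 4])  # 0.75 (hypothetical levels from Table I)
```

A random rather than ordered split would be an equally plausible reading of the text; only `split_learning_testing` would change.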

A. Comparison results of big fault data analyses

The 8 fault levels obtained from the BTS are selected as the objective variable. In this section, 50%, 60%, 70%, 80%, and 90% of the obtained data are applied as the learning data, i.e., 5,000, 6,000, 7,000, 8,000, and 9,000 lines of learning fault data, respectively.

The fault severity levels estimated by deep learning and by the neural network are shown in Figs. 2–11. As an example, Fig. 2 shows the results estimated by deep learning for the 5,000-line testing data set based on the 5,000-line learning data set. Moreover, Fig. 11 shows the results estimated by the neural network for the 1,000-line testing data set based on the 9,000-line learning data set.

Figs. 2–11 show that the estimates by deep learning fit the actual fault levels better than those by the neural network. As a characteristic finding, the neural network fails to recognize the critical fault severity level in almost all cases. Identifying critical faults is very important for software developers and managers in terms of the quality of open source software. The proposed method will help software developers recognize critical faults quickly.

[Figs. 4 and 5: FAULT LEVEL (0–9) versus DATA (LINES) (0–4,000), actual fault levels against those estimated by deep learning (Fig. 4) and by the neural network (Fig. 5).]
Fig. 4. The estimated software fault levels by using deep learning based on 60% (6,000 lines) learning data.
Fig. 5. The estimated software fault levels by using neural network based on 60% (6,000 lines) learning data.

B. Comparison results of recognition rate

Fig. 12 shows the comparison results of recognition rate for deep learning and the neural network. We find that the recognition rates of deep learning are higher than those of the neural network, as shown in Fig.
12. In particular, the recognition rates of the proposed method based on deep learning are more than four times higher than those of the method based on the neural network for all rates of used learning data. The proposed method will help software developers predict, from the fault data, the fault severity level as a measure of importance, and thus quickly find software faults according to their level. By using the proposed method, software developers will be able to take quick debugging action by judging whether a software fault is major or not.

[Figs. 6–11: FAULT LEVEL (0–9) versus DATA (LINES), actual fault levels against the estimated ones; the testing data span 0–3,000 lines (Figs. 6 and 7), 0–2,000 lines (Figs. 8 and 9), and 0–1,000 lines (Figs. 10 and 11).]
Fig. 6. The estimated software fault levels by using deep learning based on 70% (7,000 lines) learning data.
Fig. 7. The estimated software fault levels by using neural network based on 70% (7,000 lines) learning data.
Fig. 8. The estimated software fault levels by using deep learning based on 80% (8,000 lines) learning data.
Fig. 9. The estimated software fault levels by using neural network based on 80% (8,000 lines) learning data.
Fig. 10. The estimated software fault levels by using deep learning based on 90% (9,000 lines) learning data.
Fig. 11. The estimated software fault levels by using neural network based on 90% (9,000 lines) learning data.

Also, we find that the recognition rate increases depending on
the amount of learning data. Therefore, we will need to assess the recognition rate using more fault data sets in future study.

[Fig. 12: RECOGNITION RATE (0.0–0.6) versus RATE OF USED LEARNING DATA (0.5–0.9) for deep learning and the neural network.]
Fig. 12. The comparison results of recognition rate for deep learning and neural network.

V. CONCLUSION

In many open source projects, BTSs such as Bugzilla are helpful for software developers, managers, and users. However, users as well as developers and managers can easily register fault data by clicking on a dialog screen. Therefore, it is difficult to judge the fault level quickly from the data. Software managers could take quick action in the debugging phase if software developers could quickly judge whether an observed fault is at a major level or not.

This paper has discussed a method for recognizing software fault levels and has proposed a method of big fault data analysis using deep learning. In particular, it is difficult for software developers to judge the fault type quickly from the BTS data alone, because general software users as well as the main open source project members can report fault contents to the BTS. Therefore, this paper has proposed a method for recognizing software fault severity levels by using deep learning based on big fault data. Moreover, several performance examples of the proposed method using big fault data from an actual open source software project have been shown. Furthermore, the estimation methods based on deep learning and on the neural network have been compared using actual big software fault data. As a result, we have found that our deep learning method achieves high recognition rates.

We have also found that the recognition rate depends on the amount of fault data. It will be necessary to assess the recognition rate using various training data sets in the future.

ACKNOWLEDGMENTS
This work was supported in part by the Telecommunications Advancement Foundation in Japan, the Okawa Foundation for Information and Telecommunications in Japan, and the JSPS KAKENHI Grant No. 15K00102 and No. 16K01242 in Japan.

REFERENCES
[1] M. R. Lyu, ed., Handbook of Software Reliability Engineering, IEEE Computer Society Press, Los Alamitos, CA, 1996.
[2] S. Yamada, Software Reliability Modeling: Fundamentals and Applications, Springer-Verlag, Tokyo/Heidelberg, 2014.
[3] P. K. Kapur, H. Pham, A. Gupta, and P. C. Jha, Software Reliability Assessment with OR Applications, Springer-Verlag, London, 2011.
[4] S. Yamada and Y. Tamura, OSS Reliability Measurement and Assessment, Springer-Verlag, London, 2016.
[5] E. D. Karnin, "A simple procedure for pruning back-propagation trained neural networks," IEEE Transactions on Neural Networks, vol. 1, pp. 239–242, June 1990.
[6] D. P. Kingma, D. J. Rezende, S. Mohamed, and M. Welling, "Semi-Supervised Learning with Deep Generative Models," Proceedings of Neural Information Processing Systems, 2014, pp. 3581–3589.
[7] A. Blum, J. Lafferty, M. R. Rwebangira, and R. Reddy, "Semi-supervised learning using randomized mincuts," Proceedings of the International Conference on Machine Learning, 2004, pp. 1–13.
[8] G. E. Dahl, D. Yu, L. Deng, and A. Acero, "Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition," IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 1, pp. 30–42, 2012.
[9] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P. A. Manzagol, "Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion," Journal of Machine Learning Research, vol. 11, no. 2, pp. 3371–3408, 2010.
[10] H. P. Martinez, Y. Bengio, and G. N. Yannakakis, "Learning deep physiological models of affect," IEEE Computational Intelligence Magazine, vol. 8, no. 2, pp. 20–33, 2013.
[11] B. Hutchinson, L. Deng, and D. Yu, "Tensor Deep Stacking Networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1944–1957, 2013.
[12] Y. Tamura, S. Ashida, M. Matsumoto, and S. Yamada, "Identification method of fault level based on deep learning for open source software," Software Engineering Research, Management and Applications, Studies in Computational Intelligence, Springer, 2016, pp. 65–76.
[13] Y. Tamura, M. Matsumoto, and S. Yamada, "Software reliability model selection based on deep learning," Proceedings of the International Conference on Industrial Engineering, Management Science and Application 2016, Korea, May 23–26, 2016, pp. 77–81.
[14] The Apache Software Foundation, The Apache HTTP Server Project, http://httpd.apache.org/
