
APPLICATION OF DEEP LEARNING IN SOFTWARE TESTING AND

QUALITY ASSURANCE: A LITERATURE REVIEW

Nanyangwe Rachael 202208075


Mulungushi University

1.0 Introduction

Software testing is performed to uncover bugs in software systems. Because testing is expensive, there has been considerable interest in applying deep learning to automate it (Durelli et al. 2019).

The objective of this study is to review deep learning (DL) techniques used in testing software systems. The software industry has grown substantially, driven primarily by advances in technology. Software has become an essential tool in modern society, so it is important to ensure that the software produced is reliable. Software testing plays a very important role in the software development cycle, as it is the primary method used to evaluate the quality of software. Testing is usually done to ensure that the software is able to do what it was designed to do; however, it is expensive and consumes considerable resources, including time and money (Durelli et al. 2019).

Testing larger systems is particularly difficult, because testing all modules in a system takes a great amount of time. Deep learning offers effective techniques that can learn patterns from data automatically (Hasanpour et al. 2020). The goal of every software developer is to deliver a software product that is free of bugs; to achieve this, software products need to be well tested.

Software testing research addresses test debt, agile testing, security testing, uncertainty in testing, separation of concerns, software system evolution, and testing as a service. Software developers usually race against time to deliver software, mainly in response to users who require fast automation of tasks they have been managing manually. The sooner these tasks are automated, the sooner software developers reap financial gains. This has driven remarkable growth in the software industry, with software businesses competing to reach users with the promise of deliverables as quickly as possible.

Quality software requires exhaustive testing. The process is time-consuming, but at the end of it stands reliable, good-quality software (Mohanty, Mohanty, and Balakrishnan 2017). The world has gone digital, and almost everyone owns a device that enables them to access internet services and perform daily activities using various installed applications. Software testing aims to deliver quality software that meets users' expectations.

2.0 Literature Review

Qiao et al. (2020) proposed a deep-learning-neural-network-based defect prediction (DPNN) model, which predicts defects in software systems by learning patterns from historical data and using them to capture defects automatically, resulting in a more accurate prediction model. The objective was to address the problem that larger systems are more complex and thus more challenging to keep free of defects. Tools that automatically predict the number of defects in software systems can help developers locate defects at a lower cost. The DPNN model was used to build algorithms that predict the number of defects in software systems, which helps developers concentrate their testing effort on defect-prone modules and accelerate the release schedule. Such prediction also indicates whether a software system is likely to be faulty.
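The practical value of such a defect predictor lies in ranking modules so that testing effort goes to the most defect-prone code first. The sketch below illustrates that idea with a single sigmoid neuron scoring modules by code metrics; the module names, metrics, weights, and bias are hypothetical, not taken from Qiao et al.'s implementation:

```python
import math

def predict_defect_score(metrics, weights, bias):
    """Score one module's defect-proneness with a single sigmoid
    neuron (an illustrative stand-in for a trained DPNN)."""
    z = sum(w * x for w, x in zip(weights, metrics)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def rank_modules(modules, weights, bias):
    """Sort modules so testing effort goes to the most defect-prone first."""
    return sorted(modules,
                  key=lambda m: predict_defect_score(m["metrics"], weights, bias),
                  reverse=True)

# Hypothetical modules described by [lines_of_code/1000, cyclomatic_complexity/10]
modules = [
    {"name": "parser.c", "metrics": [2.1, 1.8]},
    {"name": "utils.c",  "metrics": [0.3, 0.2]},
    {"name": "engine.c", "metrics": [4.5, 3.1]},
]
weights, bias = [0.6, 0.9], -1.0  # assumed values, not learned here
ranking = [m["name"] for m in rank_modules(modules, weights, bias)]
print(ranking)  # most defect-prone module first
```

In a real DPNN the weights would be learned from labelled defect data rather than fixed by hand, but the downstream use, prioritizing test effort by predicted defect-proneness, is the same.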

Manjula and Florence (2019) proposed a hybrid approach that combines a genetic algorithm (GA) with a deep neural network (DNN) to handle feature optimization and classification for detecting defects in software systems. Software testing helps to improve software quality and reliability. The hybrid approach achieved a higher classification rate for defect prediction, which supports early prediction of defects in software systems.
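The GA half of such a hybrid searches over subsets of software metrics, encoded as bit masks, and keeps the subsets that best separate defective from clean modules. The sketch below shows a plain GA with a simple correlation-style fitness function standing in for the expensive step of training a DNN per candidate subset; the data, fitness proxy, and parameters are illustrative assumptions, not Manjula and Florence's implementation:

```python
import random

def fitness(mask, features, labels):
    """Proxy fitness: reward masks whose selected features separate
    defective (label 1) from clean (label 0) modules. In the real
    hybrid, this step would train and evaluate a DNN per mask."""
    score = 0.0
    for feats, label in zip(features, labels):
        s = sum(f for f, keep in zip(feats, mask) if keep)
        score += s if label else -s
    return score - 0.1 * sum(mask)  # small penalty for large feature sets

def evolve(features, labels, n_feats, pop=20, generations=30, seed=0):
    """Plain GA: elitist selection, one-point crossover, point mutation."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_feats)] for _ in range(pop)]
    for _ in range(generations):
        population.sort(key=lambda m: fitness(m, features, labels), reverse=True)
        survivors = population[:pop // 2]
        children = []
        while len(survivors) + len(children) < pop:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n_feats) if n_feats > 1 else 0
            child = a[:cut] + b[cut:]           # one-point crossover
            child[rng.randrange(n_feats)] ^= 1  # mutate one bit
            children.append(child)
        population = survivors + children
    return max(population, key=lambda m: fitness(m, features, labels))

# Hypothetical module metrics: feature 0 tracks defectiveness, feature 1 is noise.
features = [[1.0, 0.0], [0.9, 0.1], [0.1, 0.9], [0.0, 1.0]]
labels = [1, 1, 0, 0]
best = evolve(features, labels, n_feats=2)
print(best)  # the GA should keep feature 0 and drop feature 1
```

The classifier trained on the GA-selected features then performs the actual defect prediction.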

Wang and Zhang (2018) showed that a deep neural network (NN) model is suitable for prediction tasks such as identifying which modules may become defective in the future. The deep neural network captures characteristics of the training data and learns patterns that can help locate faulty modules. A deep learning model based on the recurrent neural network (RNN) can also predict the number of faults in software and assess software reliability. With the rapid development of software technology, modern software systems have become larger, and more functionality must be delivered to satisfy client requirements. Assessing software reliability is therefore an important issue in the modern software development process, and researchers have developed various software reliability growth models to address it.

The Bayesian network is another technique used to assess software reliability. Bayesian network models use subjective probability to analyse larger systems that have more data and require complex calculations. The deep NN model is able to capture stable and accurate features of software faults, and it shows better prediction performance than traditional NNs and parametric models. A deep NN can approach a good solution through greedy layer-wise training and can automatically learn better feature representations of software faults than other methods.

Liu et al. (2017) proposed a novel deep-learning-based approach to the challenge of test data generation. Their approach consists of two phases: in the training phase, a monkey-testing procedure learns testers' manual inputs and statistically associates them with contexts, such as the action history and the textbox label; in the prediction phase, the monkey automatically predicts text inputs based on the observed contexts.
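The two-phase structure can be pictured with a toy frequency model: record which strings testers type for a given textbox label, then replay the most likely one when the monkey meets that label again. The class, labels, and fallback string below are illustrative assumptions, not Liu et al.'s actual model, which uses deep learning over richer context:

```python
from collections import Counter, defaultdict

class TextInputPredictor:
    """Toy version of the two-phase idea: learn which strings testers
    type for a given textbox label, then replay the most likely one."""
    def __init__(self):
        self.history = defaultdict(Counter)

    def train(self, label, typed_text):
        """Training phase: associate a context (here, just the label)
        with a manually typed input."""
        self.history[label][typed_text] += 1

    def predict(self, label, fallback="test"):
        """Prediction phase: return the most frequent input seen for
        this context, or a fallback for unseen contexts."""
        if self.history[label]:
            return self.history[label].most_common(1)[0][0]
        return fallback

p = TextInputPredictor()
p.train("email", "user@example.com")
p.train("email", "user@example.com")
p.train("email", "admin@test.org")
print(p.predict("email"))    # "user@example.com"
print(p.predict("zipcode"))  # unseen label, falls back to "test"
```

The appeal of the learned approach over random monkey input is exactly this: contextually plausible text (e.g., a valid email address in an email field) drives the app into deeper states.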

Yang et al. (2015) applied the classification function of the deep belief network to predict whether an error would occur when updating a software version (just-in-time defect prediction).

Ma et al. (2018) proposed a mutation testing framework specialized for deep learning systems to measure the quality of test data. Sharing the spirit of mutation testing in traditional software, they first defined a set of source-level mutation operators that inject faults into the sources of deep learning (i.e., the training data and the training program). They then designed a set of model-level mutation operators that inject faults directly into deep learning models without retraining. The quality of test data is then evaluated by analysing to what extent the injected faults can be detected. The usefulness of the proposed mutation testing techniques was demonstrated on two public datasets, MNIST and CIFAR-10, with three deep learning models. Deep learning (DL) defines a new data-driven programming paradigm in which the internal system logic is largely shaped by the training data. The standard way to evaluate deep learning models is to examine their performance on a test dataset, so the quality of that dataset is of great importance for gaining confidence in the trained models.
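To make the two ingredients concrete, the sketch below shows one source-level mutation operator in the spirit of DeepMutation's data-level operators (flipping a fraction of training labels) together with the mutation score that summarizes how many injected faults the test data detects. The operator, rate, and sample values are illustrative, not Ma et al.'s exact definitions:

```python
import random

def label_error_mutant(labels, rate, seed=0):
    """Source-level mutation operator: flip a fraction of the training
    labels (binary labels assumed), producing a mutant training set."""
    rng = random.Random(seed)
    mutated = list(labels)
    for i in range(len(mutated)):
        if rng.random() < rate:
            mutated[i] = 1 - mutated[i]
    return mutated

def mutation_score(test_results):
    """Fraction of mutants whose behavioural change the test data
    detects ('kills'); higher means better test data."""
    killed = sum(1 for detected in test_results if detected)
    return killed / len(test_results)

labels = [0, 1, 0, 1, 1, 0, 1, 0]
mutant = label_error_mutant(labels, rate=0.5)
print(mutant)                                     # a corrupted copy of labels
print(mutation_score([True, True, False, True]))  # 0.75
```

A model retrained on the mutant data should behave differently from the original; test data that fails to expose that difference leaves the corresponding mutant alive, lowering the score.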

2.1 What is Software Testing?

Software testing is the process of executing a software system to determine whether it matches its specification and executes in its intended environment (Whittaker 2000). Testing therefore requires a running executable and a specification against which behaviour can be judged. The specification defines correct behaviour so that incorrect behaviour is easier to identify. Incorrect behaviour is a software failure; failures are caused by faults in the source code, often referred to as defects or bugs. Software can fail, for example, if the code takes up too much memory or executes too slowly (Whittaker 2000).
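A minimal illustration of this definition: a test executes the code and compares observed behaviour against the specification, and any mismatch is a failure pointing to a fault. The function and test below are a hypothetical example, not drawn from Whittaker:

```python
def average(values):
    """Specified behaviour: return the arithmetic mean of a
    non-empty list of numbers."""
    return sum(values) / len(values)

def test_average():
    """A test compares observed behaviour against the specification;
    a mismatch raises AssertionError, i.e., signals a failure."""
    assert average([2, 4, 6]) == 4
    assert average([5]) == 5

test_average()  # passes silently; a failure would raise AssertionError
```

Note that the test only reveals the failure; locating the underlying fault in the source code is a separate debugging activity.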

2.2 Levels of Software Testing

Testing is performed at several levels and stages, as shown in Figure 2. The main levels are (Hourani, Hammad, and Lafi 2019):

• Development Testing: consists of the following types:
o Unit Testing: tests basic units such as a method or class, focusing on functionality; it sometimes requires the construction of throwaway driver code and stubs and is often performed in a debugger.
o Component Testing: integrates software units and tests them, focusing on the components' interfaces.
o Integration testing: tests multiple components that have each received prior and
separate unit testing. In general, the focus is on the subset of the domain that
represents communication between the components.
o System Testing: integrates components from different teams, reusable code, and third-party code, then tests the whole system as a deliverable product. Usually, the entire domain must be considered to satisfy the criteria for a system test.
o Functional testing: requires the selection of test scenarios without regard to source code structure. Test selection methods and test data adequacy criteria must therefore be based on attributes of the specification or operational environment, not on attributes of the code or data structures. Functional testing is also called specification-based testing, behavioural testing, and black-box testing.

o Structural testing: requires that inputs be based solely on the structure of the source code or its data structures. Structural testing is also called code-based testing and white-box testing.
• Release Testing: consists of the following types:
o Requirements Testing: derives test cases from each requirement.
o Scenario Testing: devises usage scenarios of the system and tests against them.
o Performance Testing: checks that the system can process its intended load.
• User Testing: consists of the following types:
o Alpha Testing: is done in the development environment, to check that the system works according to the user's specifications and contains no unnecessary features or functions.
o Beta Testing: is done in the user's environment to determine whether the system provides the functions specified by the user.
o Acceptance Testing: is performed by customers to determine whether the system meets their requirements.

Figure 2: Software testing levels. The diagram groups the levels into Development Testing (unit, component, integration, system, functional, and structural testing), Release Testing (requirements, scenario, and performance testing), and User Testing (alpha, beta, and acceptance testing).

2.3 Deep learning models

The deep learning (DL) models most commonly used in software testing are:

• Multi-Layer Perceptron (MLP) is the fundamental structure of feedforward neural networks and has multiple layers (Goodfellow et al., 2016). The input layer contains
the vectorized input data; hidden layers of interconnected nodes allow the structure to
learn transformations on the data. The MLP model learns weights and biases using the
backpropagation mechanism and nonlinear activation functions, extracting more
advanced features. Finally, the output layer produces output vectors that correspond to
the model’s prediction of the input’s class.
• Deep Neural Networks (DNNs) are ANNs devised to learn through multiple connected layers (Montavon et al., 2018). The architecture of a DNN includes one input layer, one output layer, and one or more hidden layers between them. The input feature space of the data constitutes the input layer of the DNN; the input can be constructed with feature extraction methods. The output layer has one node in binary classification and as many nodes as there are classes in multi-class classification. DNNs use the standard backpropagation algorithm with a nonlinear activation function such as sigmoid or ReLU (Apicella et al., 2021). With this architecture, DNNs extract
features from the input data. Then, the model is trained to optimize weight and bias
values in the neural network structure. Finally, the trained model is used to predict the
class of the new input.
• Convolutional Neural Network (CNN) uses convolution operations to extract input features. Sliding kernels (also called filters), matrices of a specified shape and size, are multiplied elementwise with the corresponding input data, yielding information about various features of the input. The CNN structure is very commonly used in image processing, while 1D convolution operations are applicable in diverse areas, including sequential data, text, and time-series data, to find patterns in the data (Rao and McMahan, 2019).
• Recurrent Neural Networks (RNNs) have feedback loops in their architecture, allowing information to be memorized over short terms. Due to this property, RNNs can analyse sequential input, such as speech, audio, weather, and financial data: an RNN's output at each step depends on the previous output and the current input. While a CNN shares parameters across space, an RNN shares them across time. However, RNNs' memory is short-term, their computation can be slow, and they suffer from the vanishing or exploding gradients problem (Hochreiter and Schmidhuber, 1997). Hochreiter and Schmidhuber (1997) developed the Long Short-Term Memory (LSTM) model to overcome these issues by introducing additional neural networks, called gates, which handle the information stream in the network.
• Gated Recurrent Units (GRUs) are a type of RNN proposed by Chung et al. (2014). In GRUs, the gated approach solves the RNN's information flow problems in long sequences with a simpler architecture introducing two gates: the update gate and the reset gate. Since GRUs have fewer gate structures than LSTMs, they have fewer parameters to train, which makes them faster.
• Deep Belief Networks (DBNs) are feedforward neural networks (NNs) with many layers (Golovko et al., 2014). A DBN is not the same as the traditional types of DNNs discussed so far; rather, it is a particular DNN with undirected connections between some layers. These undirected layers are Restricted Boltzmann Machines and can be trained using unsupervised learning algorithms.
• Encoder–Decoder models (also known as Sequence-to-Sequence or Seq2Seq models) are commonly used DNN architectures for converting input data in one domain into output data in another domain via a two-stage network (Cho et al., 2014; Chollampatt and Ng, 2018). First, the encoder takes a variable-length sequence in a specific domain and compresses it to a fixed-length representation; then, the decoder maps the encoded data to a variable-length output in another domain. Due to these features, encoder–decoder models are widely used in application areas such as machine translation (Cho et al., 2014; Chollampatt and Ng, 2018).
• Autoencoders may be considered a specific type of encoder–decoder model (Zhu et al., 2020). An autoencoder is an unsupervised ANN that learns an efficient encoding of unlabelled data: it learns how to compress and encode the input, ignore noise, and reduce data dimensions, and then, using the encoded representation, how to reconstruct the data as closely as possible to the original input. Autoencoders are used in many deep learning tasks such as anomaly and face detection, and modified versions serve specific tasks; for example, sparse and denoising autoencoders are used to learn representations for subsequent classification tasks, while variational autoencoders are used in generative tasks to produce outputs similar to the input data. In software defect prediction (SDP), autoencoders are mainly used to extract features of input data automatically (Tong et al., 2018; Zhu et al., 2020; Wu et al., 2021; Zhang et al., 2021b).
• Extreme Learning Machines (ELMs) are special feedforward neural networks introduced by Huang et al. (2006). An ELM architecture includes single or multiple layers of hidden nodes whose parameters need not be tuned: the hidden nodes can be assigned randomly and never updated, or inherited from their predecessors without change. Generally, the output weights are learned in a single step, converging to a linear model, so these models can be much faster than backpropagation-based neural networks while producing results comparable with SVMs in classification and regression tasks (Liu et al., 2005, 2012).
• Generative Adversarial Networks (GANs) are another approach to generative modelling, designed by Goodfellow et al. (2014). Generative modelling is an unsupervised task. GANs use deep learning methods, such as CNNs, to produce new outputs similar to the data in the original dataset. A GAN consists of two neural networks, a generator and a discriminator; typically the generator is a de-convolutional NN and the discriminator a CNN. These networks compete in a game where one agent's gain is another agent's loss: the generator produces artificial data similar to the real data, and the discriminator tries to distinguish the artificially generated data from the original data. As the game continues, the generator produces better artificial outputs and the discriminator detects them better. In this way, GANs learn to generate new data with the same statistics as the training set.
• Siamese Neural Networks (SNNs) contain two or more subnetworks whose configurations, parameters, and weights are the same and are updated in the same way. An SNN compares the feature vectors of its inputs and learns a similarity function, so it can be trained to check whether two inputs (for example, images of a person) are the same. This architecture enables classification of new data without retraining the network, making SNNs suitable for one-shot learning problems. Furthermore, SNNs are robust to class imbalance and learn semantic similarity, so they have been used in several studies in the SDP domain, although they require more training time than ordinary NNs (Zhao et al., 2018, 2019).
• Hierarchical Neural Network (HNN) is a special NN consisting of multiple loosely coupled subnets arranged in the form of an acyclic graph. The subnets can be single neurons or complex NNs, and each subnet tries to capture a specific feature of the input data (Mavrovouniotis and Chang, 1992). HNNs are used in various deep-learning-based tasks such as classification (Wang et al., 2012) and image interpretation (Behnke, 2003), and have been used in SDP to provide better fault predictions (Wang et al., 2021; Yu et al., 2021a).
• Graph Neural Networks (GNNs) are NNs designed to leverage the structure and properties of graphs. GNNs perform inference on data described by graphs using deep learning methods, and can therefore perform node-level, edge-level, and graph-level predictions. GNNs are an active research topic in many domains, such as social networks, knowledge graphs, and recommender systems (Chen et al., 2021; Kumar et al., 2022). GNNs are also used in the SDP domain to take full advantage of the tree structure of source code; to this end, they are exploited to acquire the inherent defect information of faulty subtrees, which are excluded based on a fix-inducing change (Xu et al., 2021a; Giray et al., 2023).
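Most of the architectures above build on the same primitive: layers of weighted sums passed through nonlinear activations. The sketch below shows a forward pass of a minimal MLP in pure Python, with ReLU hidden layers and a sigmoid output for binary classification; the toy weights are assumed for illustration, not learned:

```python
import math

def relu(v):
    """Elementwise ReLU nonlinearity."""
    return [max(0.0, x) for x in v]

def dense(v, weights, biases):
    """One fully connected layer: weights is a list of rows, one per output node."""
    return [sum(w * x for w, x in zip(row, v)) + b
            for row, b in zip(weights, biases)]

def mlp_forward(x, layers):
    """Forward pass of a minimal MLP: ReLU on hidden layers,
    sigmoid on the output layer (binary classification)."""
    for weights, biases in layers[:-1]:
        x = relu(dense(x, weights, biases))
    weights, biases = layers[-1]
    z = dense(x, weights, biases)
    return [1.0 / (1.0 + math.exp(-v)) for v in z]

# Assumed toy parameters: 2 inputs -> 2 hidden units -> 1 output
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),  # hidden layer
    ([[1.0, 1.0]], [-0.5]),                   # output layer
]
out = mlp_forward([1.0, 0.0], layers)
print(out)  # a single probability in (0, 1)
```

Training (the backpropagation mechanism mentioned above) would adjust these weights and biases to minimize a loss; the forward pass itself is unchanged at prediction time.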

2.4 Advantage of Deep Learning Software Testing Over Traditional Software Testing

Building deep-learning-based systems differs from building traditional software systems. Traditional software concentrates on logic flows crafted by developers in the form of source code, as shown in Figure 1 below, which can be decomposed into units (e.g., classes, methods, statements, branches). Each unit specifies some logic and can be tested as a target of software quality measurement (e.g., statement coverage, branch coverage). After the source code is written, it is compiled into executable form, which runs in its runtime environment to fulfil the requirements of the system. For example, in object-oriented programming, developers analyse the requirements and design the corresponding software architecture; each architectural unit (e.g., a class) represents specific functionality, and the overall goal is achieved through the collaboration and interaction of the units.

Deep learning, on the other hand, follows a data-driven programming paradigm, which programs the core logic through a model training process using a large amount of training data. The logic is encoded in a deep neural network, represented by sets of weights fed into non-linear activation functions. To obtain deep learning software F for a specific task M, a deep learning developer needs to collect training data, which specifies the desired behaviour of F on M, and prepare a training program, which describes the structure of the deep neural network and its runtime training behaviour, as shown in Figure 1 below.

The deep neural network is the deep learning model built by running the training program on the training data. The major effort for a deep learning developer is to prepare the training data and design the model structure; the deep learning logic is determined automatically through the training procedure. In contrast to traditional software, deep learning models are often difficult to decompose or interpret, making them unamenable to most existing software testing techniques. Moreover, it is challenging to find high-quality training and test data that represent the problem space and cover the models well enough to evaluate their generality.
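The data-driven paradigm can be seen in miniature by training a single neuron: no decision rule is written by hand; the weight and bias that encode the logic emerge from fitting the training data. The data, learning rate, and epoch count below are illustrative assumptions:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_neuron(data, epochs=500, lr=0.5):
    """The core logic is not hand-written: the weight and bias are
    shaped by gradient descent on the training data."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in data:
            grad = sigmoid(w * x + b) - y  # gradient of the log-loss
            w -= lr * grad * x
            b -= lr * grad
    return w, b

# The training data, not source code, specifies the desired behaviour:
# inputs at or above 0.5 belong to class 1.
data = [(0.0, 0), (0.2, 0), (0.4, 0), (0.6, 1), (0.8, 1), (1.0, 1)]
w, b = train_neuron(data)
print(1 if sigmoid(w * 0.9 + b) >= 0.5 else 0)  # behaviour learned from data
```

This also illustrates the testing difficulty described above: the learned behaviour lives in the numeric values of w and b, which cannot be decomposed into inspectable units the way hand-written branching logic can.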

Figure 1: Design of traditional software and deep learning software (Ma et al. 2018)

3.0 Conclusion

Deep learning (DL) defines a new data-driven programming paradigm in which the internal system logic is largely shaped by the training data. Quality software requires exhaustive testing, and the process is time-consuming (Mohanty et al. 2017). The world has gone digital, and almost everyone owns a device that enables them to access internet services and perform daily activities using various installed applications. Software testing aims to deliver quality, reliable software that meets users' expectations.

The reviewed papers show that deep learning techniques are widely used to support software testing. Deep-learning-neural-network-based defect prediction (DPNN) is among the most common deep learning approaches used to predict the number of defects or bugs in software systems. This helps software developers concentrate their testing effort on defect-prone modules and accelerate the release schedule. Such prediction also indicates whether a software system is likely to be faulty.

References

Abdu, Ahmed, Zhengjun Zhai, Redhwan Algabri, Hakim A. Abdo, Kotiba Hamad, and
Mugahed A. Al-antari. 2022. ‘Deep Learning-Based Software Defect Prediction via
Semantic Key Features of Source Code—Systematic Survey’. Mathematics
10(17):3120. doi: 10.3390/math10173120.

Basili, V. R., and R. W. Selby. 1987. ‘Comparing the Effectiveness of Software Testing
Strategies’. IEEE Transactions on Software Engineering SE-13(12):1278–96. doi:
10.1109/TSE.1987.232881.

Battina, Dhaya Sindhu. 2019. ‘Artificial Intelligence in Software Test Automation: A Systematic Literature Review’.

Durelli, Vinicius H. S., Rafael S. Durelli, Simone S. Borges, Andre T. Endo, Marcelo M. Eler,
Diego R. C. Dias, and Marcelo P. Guimaraes. 2019. ‘Machine Learning Applied to
Software Testing: A Systematic Mapping Study’. IEEE Transactions on Reliability
68(3):1189–1212. doi: 10.1109/TR.2019.2892517.

Everett, Gerald D., and Raymond McLeod. 2007. Software Testing: Testing Across the Entire
Software Development Life Cycle. 1st ed. Wiley.

Giray, Görkem, Kwabena Ebo Bennin, Ömer Köksal, Önder Babur, and Bedir Tekinerdogan.
2023. ‘On the Use of Deep Learning in Software Defect Prediction’. Journal of Systems
and Software 195:111537. doi: 10.1016/j.jss.2022.111537.

Hasanpour, Ahmad, Pourya Farzi, Ali Tehrani, and Reza Akbari. 2020. ‘Software Defect
Prediction Based On Deep Learning Models: Performance Study’.

Hourani, Hussam, Ahmad Hammad, and Mohammad Lafi. 2019. ‘The Impact of Artificial
Intelligence on Software Testing’. Pp. 565–70 in 2019 IEEE Jordan International Joint
Conference on Electrical Engineering and Information Technology (JEEIT). Amman,
Jordan: IEEE.

Khaliq, Zubair, Sheikh Umar Farooq, and Dawood Ashraf Khan. 2022. ‘Artificial Intelligence in Software Testing: Impact, Problems, Challenges and Prospect’.

Liu, Peng, Xiangyu Zhang, Marco Pistoia, Yunhui Zheng, Manoel Marques, and Lingfei Zeng.
2017. ‘Automatic Text Input Generation for Mobile Testing’. Pp. 643–53 in 2017
IEEE/ACM 39th International Conference on Software Engineering (ICSE).

Ma, Lei, Fuyuan Zhang, Jiyuan Sun, Minhui Xue, Bo Li, Felix Juefei-Xu, Chao Xie, Li Li,
Yang Liu, Jianjun Zhao, and Yadong Wang. 2018. ‘DeepMutation: Mutation Testing of
Deep Learning Systems’. Pp. 100–111 in 2018 IEEE 29th International Symposium on
Software Reliability Engineering (ISSRE).

Mahapatra, Sumit, and Subhankar Mishra. 2020. ‘Usage of Machine Learning in Software
Testing’. Pp. 39–54 in Automated Software Engineering: A Deep Learning-Based
Approach, Learning and Analytics in Intelligent Systems, edited by S. C. Satapathy, A.
K. Jena, J. Singh, and S. Bilgaiyan. Cham: Springer International Publishing.

Manjula, C., and Lilly Florence. 2019. ‘Deep Neural Network Based Hybrid Approach for
Software Defect Prediction Using Software Metrics’. Cluster Computing 22(4):9847–
63. doi: 10.1007/s10586-018-1696-z.

Mohanty, Hrushikesha, J. R. Mohanty, and Arunkumar Balakrishnan, eds. 2017. Trends in Software Testing. Singapore: Springer.

Qiao, Lei, Xuesong Li, Qasim Umer, and Ping Guo. 2020. ‘Deep Learning Based Software
Defect Prediction’. Neurocomputing 385:100–110. doi:
10.1016/j.neucom.2019.11.067.

Wang, Jinyong, and Ce Zhang. 2018. ‘Software Reliability Prediction Using a Deep Learning
Model Based on the RNN Encoder–Decoder’. Reliability Engineering & System Safety
170:73–82. doi: 10.1016/j.ress.2017.10.019.

Whittaker, J. A. 2000. ‘What Is Software Testing? And Why Is It so Hard?’ IEEE Software
17(1):70–79. doi: 10.1109/52.819971.

Yang, Xinli, David Lo, Xin Xia, Yun Zhang, and Jianling Sun. 2015. ‘Deep Learning for Just-
in-Time Defect Prediction’. Pp. 17–26 in 2015 IEEE International Conference on
Software Quality, Reliability and Security.

