Artificial Intelligence for Automated
Software Testing
Lionel Briand
ISSTA/ECOOP Summer School 2018
Objectives
• Applications of main AI techniques in test automation
• Not a short university course
• Overview (partial) with pointers for further information
• Industrial research projects
• Challenging as there is a lot of material and many techniques
involved
• Disclaimer: Inevitably biased presentation based on personal
experience
2
Acknowledgments
• Annibale Panichella, for introductory slides on Search-Based Software
Testing
• P. Repoussis, “Metaheuristic Algorithms: A brief introduction on the
basics you need to know” (Presentation)
• Ethem Alpaydin, “Introduction to Machine Learning”, MIT Press
• Amy Davis, “Overview of Natural Language Processing” (Presentation)
• Research projects: Shiva Nejati, Raja Ben Abdessalem, Annibale
Panichella, Sadeeq Jan, Andrea Arcuri, Dennis Appelt, Fabrizio Pastore,
Chunhui Wang … (sorry for those I forgot)
3
Biography
• 24 years of post-PhD research experience
• IEEE Fellow, Harlan Mills IEEE CS award
• Canada Research Chair, ERC Advanced grant
• ICSE PC co-chair in 2014
• EiC of Empirical Software Engineering (Springer) for 13 years
• Graduated 27 PhD students
• Worked with >30 industry partners (aerospace, automotive, health care, finance …)
• I like research driven by industrial problems
• H-index = 73, around 24K citations (for those interested in the “number game”)
4
Collaborative Research @ SnT
5
• Research in context
• Addresses actual needs
• Well-defined problem
• Long-term collaborations
• Our lab is the industry
SVV Dept.
6
• Established in 2012, part of the SnT centre
• Requirements Engineering, Security Analysis, Design Verification,
Automated Testing, Runtime Monitoring
• ~ 25 lab members
• Partnerships with industry
• ERC Advanced grant
Outline
• Introduction to software testing
• Introduction to relevant AI techniques
• Introduction to Search-Based Software Testing (SBST)
• Industrial research projects where AI was applied to testing
problems
• Lessons learned and the road ahead
7
Introduction to Software
Testing
8
Outline
• Quick overview of software testing
• The role of AI in automated software testing
• Metaheuristic search
• Machine learning
• Natural Language Processing (NLP)
9
Definition of Software Testing
• International Software Testing Qualifications Board:
“Software testing is a process of executing a program or
application with the intent of finding the software bugs. It can
also be stated as the process of validating and verifying that
a software program or application or product meets the
business and technical requirements that guided its design and
development.”
10
Software Testing Overview
11
(Flowchart: from a SW representation (e.g., specifications) and the SW code, derive test cases; execute the test cases and get the test results; compare them against the test oracle (expected results or properties): [Test Result == Oracle] / [Test Result != Oracle])
Automation!
Main Challenge
• The main challenge in testing software systems is
scalability
• Scalability: the extent to which a technique can be applied
on large or complex artifacts (e.g., input spaces, code,
models) and still provide useful, automated support with
acceptable effort, CPU, and memory
• Effective automation is a prerequisite for scalability
12
Importance of Software Testing
• Software testing is the most prevalent verification and validation
technique in practice
• It represents a large percentage of software development costs,
e.g., >50% is not rare
• Testing services are a USD 9-billion market
• The cost of software failures was estimated to be at least
USD 1.1 trillion in 2016
• Inadequate tools and technologies are among the most important
sources of testing costs and inefficiencies
13
Search-Based Software Testing
• Express test generation problem
as a search or optimization
problem
• Search for test input data with
certain properties, e.g., achieving
source code coverage
• Non-linearity of software (if, loops,
…): complex, discontinuous, non-
linear search spaces
• Many search algorithms
(metaheuristics), from local
search to global search, e.g., Hill
Climbing, Simulated Annealing
and Genetic Algorithms
Hill Climbing evaluates the search space neighbouring the current candidate solution for fitness. If a better candidate is found, Hill Climbing moves to that new point and evaluates the neighbourhood of that candidate solution. The climb continues until the neighbourhood of the current candidate offers no better candidate solutions, i.e., a local optimum. If the local optimum is not the global optimum (Figure 3a), the search may benefit from being restarted, performing a climb from a new position in the landscape (Figure 3b).
A popular alternative to simple Hill Climbing is Simulated Annealing. Simulated Annealing is similar to Hill Climbing, except that movement around the search space is less restricted: moves may also be made to points of lower fitness, with the aim of escaping local optima. This is dictated by a probability value that is dependent on a parameter called the ‘temperature’, which decreases as the search progresses (Figure 4). The lower the temperature, the less likely the chances of moving to a poorer point in the search space, until ‘freezing point’ is reached, after which the algorithm behaves identically to Hill Climbing. Simulated Annealing is named so because it was inspired by the physical process of annealing in materials.
Figure 3. Hill Climbing follows the curve of the fitness landscape until a local optimum is found. The final position may not represent the global optimum (part (a)), and restarts may be required (part (b)).
Figure 4. Simulated Annealing may temporarily move to points of poorer fitness in the search space.
Figure 5. Genetic Algorithms are global searches, sampling many points in the fitness landscape at once.
“Search-Based Software Testing: Past, Present and Future”, Phil McMinn
14
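As a concrete illustration of the local-search ideas on this slide, here is a minimal hill-climbing sketch with random restarts; the one-dimensional toy landscape `f` and the integer step size are made up for illustration.

```python
import random

def hill_climb(fitness, start, step=1, max_iters=1000):
    """Greedy local search: move to a better neighbour until none exists."""
    x = start
    for _ in range(max_iters):
        best = max([x - step, x + step], key=fitness)
        if fitness(best) <= fitness(x):
            return x  # local optimum: no better neighbour
        x = best
    return x

def restarting_hill_climb(fitness, candidates, restarts=10, seed=0):
    """Random restarts help escape local optima (Figure 3b)."""
    rng = random.Random(seed)
    return max((hill_climb(fitness, rng.choice(candidates))
                for _ in range(restarts)), key=fitness)

# Toy landscape: a local optimum at x = 2, the global optimum at x = 10.
def f(x):
    return -abs(x - 10) if x > 5 else -abs(x - 2) - 5

print(hill_climb(f, 0))                                  # stuck at 2
print(restarting_hill_climb(f, list(range(20))))         # usually finds 10
```

A single climb from the left basin stops at the local optimum, which is exactly the failure mode restarts mitigate.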
The paper then discusses future directions for Search-Based Software Testing, comprising issues involving execution environments, testability, automated oracles, reduction of human oracle cost and multi-objective optimisation.
SEARCH-BASED OPTIMIZATION ALGORITHMS
The simplest form of an optimization algorithm, and the easiest to implement, is random search. In test data generation, inputs are generated at random until the goal of the test (for example, the coverage of a particular program statement or branch) is fulfilled. Random search is very poor at finding solutions when those solutions occupy a very small part of the overall search space. Such a situation is depicted in Figure 2, where the inputs covering a particular goal are very few in number compared to the size of the input domain. Test data may be found faster and more reliably if the search is given some guidance. In metaheuristic searches, this guidance can be provided in the form of a problem-specific fitness function, which scores points in the search space with respect to their suitability for solving the problem at hand.
Figure 2. Random search may fail to fulfil low-probability test goals
Machine Learning and Testing
• ML supports decision making
based on data
• Test planning
• Test cost estimation
• Test case management
• Test case prioritization
• Test case design
• Test case refinement
• Test case evaluation
15
• Debugging
• Fault localization
• Bug prioritization
• Fault prediction
• “Machine Learning-based Software
Testing: Towards a Classification
Framework.” SEKE 2011
NLP and Testing
• Natural language is prevalent in software development
• User documentation, procedures, natural language
requirements, etc.
• Natural Language Processing (NLP)
• Can it be used to help automate testing?
• Derive test cases, including oracles
• Traceability between requirements and system test
cases (required by many standards)
16
Introduction to Relevant AI
Techniques
17
Outline
• Metaheuristic search
• Machine learning
• Natural Language Processing
18
Metaheuristic Search
• Stochastic optimization through search
• They efficiently explore the search space in order to find good
(near-optimal) feasible solutions
• They can address both discrete- and continuous-domain
optimization problems
• Applicable to many practical situations
• They provide no guarantee of global or local optimality
19
Search Problem
Local Optimum
Global Optimum
Find a value x* which minimises (or maximises) the objective function f over
a search space X:
∀ x ∈ X : f(x*) ⩽ f(x)
A. Panichella 20
Example Problem
• Let’s consider the problem of finding the best visiting sequence (route) to
serve 14 customers.
• Traveling Salesman Problem (TSP) – Combinatorial optimization
• How many possible routes?
• (n-1)! = (15-1)! = 14! ≈ 8.7178 × 10^10, i.e., about 87 billion possible routes
• Exhaustive search is feasible within a day for the above problem
• But what type of algorithm would you pick with 13,508 cities and roughly 10^49,933
feasible solutions?
• Combinatorial explosion
21
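The route counts above are easy to verify; a quick sketch, where the second computation only estimates the order of magnitude for the larger instance.

```python
import math

def num_routes(n_cities):
    """Distinct visiting sequences from a fixed starting city: (n-1)!."""
    return math.factorial(n_cities - 1)

print(num_routes(15))  # 14! = 87,178,291,200, about 8.7e10

# For 13,508 cities the tour count has roughly 50,000 digits:
print(math.log10(math.factorial(13507)))
```

Anything beyond a few dozen cities makes exhaustive enumeration hopeless, which is exactly why metaheuristics are used.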
Example Metaheuristics
• Genetic Algorithms
• Simulated Annealing
• Tabu Search
• Ant Colony Optimization
• Particle Swarm Optimization
• Iterated Local Search
22
Remarks
• Metaheuristics are non-deterministic
• They usually incorporate mechanisms to avoid getting trapped in
confined areas of the search space
• They are not problem specific
• They may use some form of memory to better guide the search
• They are a relatively new field (since the ‘80s or so)
• They have become possible because we can now afford vast
amounts of computation
23
Different Categories of Heuristics
24
Trajectory Population-based
From P. Repoussis
Genetic Algorithms (GAs)
Genetic Algorithm: population-based search algorithm
inspired by the theory of evolution
Natural selection: individuals that best
fit the natural environment survive
Reproduction: surviving individuals
generate offspring (the next generation)
Mutation: offspring inherit
properties of their parents, with some
mutations
Iteration: generation after generation,
the new offspring fit the
environment better than their parents
From A. Panichella 25
GA: Algorithm
26
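The algorithm figure on this slide did not survive extraction; it can be sketched as a minimal GA loop. The tournament selection, rates, and toy fitness below are illustrative choices, not taken from the slides.

```python
import random

def genetic_algorithm(fitness, gene_bounds, pop_size=20, generations=100,
                      p_crossover=0.8, seed=42):
    """Minimal GA loop: selection, one-point crossover, per-gene mutation."""
    rng = random.Random(seed)
    n = len(gene_bounds)
    pop = [[rng.randint(lo, hi) for lo, hi in gene_bounds] for _ in range(pop_size)]
    for _ in range(generations):
        def select():  # binary tournament: keep the fitter of two random individuals
            a, b = rng.sample(pop, 2)
            return list(max(a, b, key=fitness))
        children = []
        while len(children) < pop_size:
            p1, p2 = select(), select()
            if rng.random() < p_crossover:  # one-point crossover
                cut = rng.randrange(1, n)
                p1, p2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            for child in (p1, p2):
                i = rng.randrange(n)  # mutate one randomly chosen gene
                child[i] = rng.randint(*gene_bounds[i])
                children.append(child)
        pop = children[:pop_size]
    return max(pop, key=fitness)

# Toy maximisation problem: number of equal adjacent genes in a 6-gene chromosome.
best = genetic_algorithm(lambda g: sum(x == y for x, y in zip(g, g[1:])),
                         gene_bounds=[(0, 3)] * 6)
print(best)
```

The same skeleton is reused later in the deck: only the chromosome encoding and the fitness function change per testing problem.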
Machine Learning
• Machine learning is programming computers to optimize a
performance criterion using example data or past experience.
• Learning general models from data capturing particular
examples.
• Data is increasingly cheap and abundant; knowledge is
expensive and scarce.
• Build a model that is a good and useful approximation to the
data.
27
28
Learning Strategies
• Supervised Learning, e.g., decision trees
• Classification
• Regression
• Unsupervised Learning, e.g., clustering
• Reinforcement Learning
29
Classification
• Example: Credit
scoring
• Differentiating
between low-risk and
high-risk customers
from their income and
savings
Discriminant: IF income > θ1 AND savings > θ2
THEN low-risk ELSE high-risk
From E. Alpaydin
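A minimal sketch of the learned discriminant on this slide; the threshold values θ1 and θ2 below are illustrative, not from the slide.

```python
def credit_risk(income, savings, theta1=30_000, theta2=10_000):
    """Discriminant learned by a classifier:
    IF income > θ1 AND savings > θ2 THEN low-risk ELSE high-risk."""
    return "low-risk" if income > theta1 and savings > theta2 else "high-risk"

print(credit_risk(50_000, 20_000))  # low-risk
print(credit_risk(50_000, 5_000))   # high-risk
```

Classification learns such decision boundaries from labelled examples; regression would instead predict a numeric score.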
Natural Language Processing
• Natural Language Understanding
• Natural Language Generation
• In software testing:
  • Analyze NL requirements or other forms of documentation to interpret
them and translate them into a form supporting verification and testing
  • Generating specifications, test cases (inputs and oracles), and scripts from
NL requirements or other forms of documentation
  • Traceability between requirements and other artifacts
30
Understanding Sentences
Parsing and Grammar
How is a sentence composed? (Syntactic analysis)
Lexicons
How is a word composed? (Morphological analysis)
Prefixes, suffixes, and root forms, e.g., short-ness
Ambiguity
Disambiguation: Finding the correct interpretation
31
A Parsing Example
Grammar:
The Sentence: The boy went home.
S → NP VP
NP → Article N | Proper
VP → Verb NP
N → home | boy | store
Proper → Betty | John
Verb → go | give | see
Article → the | an | a
From A. Davis 32
A Parsing Example: The answer
The boy went home.
From A. Davis 33
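A small sketch of parsing with the toy grammar above. Note two illustrative liberties: "went" is added to the lexicon as the past tense of "go", and a bare-noun NP alternative is added so that "home" can close the sentence, as the slide's answer tree implies.

```python
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["Article", "N"], ["Proper"], ["N"]],  # bare noun added (assumption)
    "VP": [["Verb", "NP"]],
}
LEXICON = {
    "N": {"home", "boy", "store"},
    "Proper": {"Betty", "John"},
    "Verb": {"go", "give", "see", "went"},  # "went" added as past tense of "go"
    "Article": {"the", "an", "a"},
}

def parse(symbol, words, i):
    """Return all positions j such that words[i:j] derives `symbol`."""
    if symbol in LEXICON:
        ok = i < len(words) and words[i].lower() in {w.lower() for w in LEXICON[symbol]}
        return [i + 1] if ok else []
    ends = []
    for production in GRAMMAR[symbol]:
        positions = [i]
        for part in production:  # thread each partial parse through the production
            positions = [j2 for j in positions for j2 in parse(part, words, j)]
        ends.extend(positions)
    return ends

words = "The boy went home".split()
print(len(words) in parse("S", words, 0))  # True: the sentence is grammatical
```

Real NLP parsers work from statistically learned grammars, but the mechanics of matching productions against word spans are the same.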
Syntactic Analysis Challenges
• Singular vs plural, gender
• Adjectives, adverbs …
• Handling ambiguity
• Syntactic ambiguity: “fruit flies like a banana”
• Having to parse syntactically incorrect sentences
34
Semantic Disambiguation
Example: “with”
Sentence Relation
I ate spaghetti with meatballs. (ingredient of spaghetti)
I ate spaghetti with salad. (side dish of spaghetti)
I ate spaghetti with abandon. (manner of eating)
I ate spaghetti with a fork. (instrument of eating)
I ate spaghetti with a friend. (accompanier of eating)
Disambiguation is probabilistic!
35
Introduction to Search-
Based Software Testing
(SBST)
36
Outline
• Definition of Search-Based Software Engineering (SBSE)
• Definition of Search-Based Software Testing (SBST)
• SBST applied to coverage testing
• Multiple-target techniques
37
Definitions
• Search-Based Software Engineering (SBSE): «The application
of meta-heuristic search-based optimization techniques to
find near-optimal solutions in software engineering
problems.»
• Problem Reformulation: Reformulating typical SE problems as
optimization problems
• Search-Based Software Testing (SBST): metaheuristics have been shown
to be particularly useful for addressing many testing problems.
38
Example
Unit Testing
Minimize: execution cost, number of test cases
Maximize: code coverage, detected bugs
Search space = set of all possible test suites
39
Why SBST?
Exhaustive and exact search are not feasible
Meta-heuristics: Random Algorithms, Tabu Search, Hill Climbing, Ant Colony,
Simulated Annealing, Particle Swarm Optimization, Genetic Algorithms
Issues:
1. Large search space
2. Complex problem (NP-Complete)
3. Often required data are (partly)
available only upon test execution
40
SBST Example
41
class Triangle {
int a, b, c; //sides
int type = NOT_A_TRIANGLE;
Triangle (int a, int b, int c){…}
void checkRightAngle() {…}
void computeTriangleType() {…}
boolean isTriangle() {…}
public static void main (String args[]) {…}
}
Goal: Automatic generation of test cases using genetic algorithms in
order to achieve the maximum statement coverage
Genetic
Algorithms:
1) Solution
Representation
2) Fitness function
3) Selection
4) Reproduction
(crossover and
mutation)
Solution Representation
42
class Triangle {
int a, b, c; //sides
int type = NOT_A_TRIANGLE;
Triangle (int a, int b, int c){…}
void checkRightAngle() {…}
void computeTriangleType() {…}
boolean isTriangle() {…}
public static void main (String args[]) {…}
}
The chromosome used for test case generation is the input vector (sequence
of input values used by the test case to run the program), which may be fixed
length or variable length
In our running example there are only three input parameters: a, b, c
X = (a, b, c): a fixed-length chromosome
Fitness Function?
class Triangle {
  void computeTriangleType() {
 1.   if (a == b) {
 2.     if (b == c)
 3.       type = "EQUILATERAL";
        else
 4.       type = "ISOSCELES";
 5.   } else if (a == c) {
 6.     type = "ISOSCELES";
      } else {
 7.     if (b == c)
 8.       type = "ISOSCELES";
        else
 9.       checkRightAngle();
      }
10.   System.out.println(type);
  }
}
(Figures: control flow graph and dependency graph over the numbered statements 1-10)
43
Fitness Function?
x1 = (2, 2, 2) Path(x1) = <1, 2, 3, 10>
45
Fitness Function?
x1 = (2, 2, 2) Path(x1) = <1, 2, 3, 10>
x2 = (2, 3, 4) Path(x2) = <1, 5, 7, 9, 10>
What is the closest TC to
cover the statement 8?
46
Approach Level
Approach_level(P(x), t)
Given the execution trace obtained by running program P with input vector x,
the approach level is the minimum number of control nodes between an
executed statement and the coverage target t.
x1 = (2, 2, 2) Path(x1) = <1, 2, 3, 10> AL=2
x2 = (2, 3, 4) Path(x2) = <1, 5, 7, 9, 10> AL=0
47
Fitness Function?
x1 = (2, 2, 2) Path(x1) = <1, 2, 3, 10> AL=2
x2 = (2, 3, 4) Path(x2) = <1, 5, 7, 9, 10> AL=0
x3 = (2, -2, 10) Path(x3) = <1, 5, 7, 9, 10> AL=0
	
What is the closest TC to
cover the statement 8?
48
Fitness Function?
x1 = (2, 2, 2) Path(x1) = <1, 2, 3, 10>
x2 = (2, 3, 4) Path(x2) = <1, 5, 7, 9, 10> if (3==4)
x3 = (2, -2, 10) Path(x3) = <1, 5, 7, 9, 10> if (10==-2)
49
Fitness Function?
x1 = (2, 2, 2) Path(x1) = <1, 2, 3, 10>
x2 = (2, 3, 4) Path(x2) = <1, 5, 7, 9, 10> abs (3-4) = 1
x3 = (2, -2, 10) Path(x3) = <1, 5, 7, 9, 10> abs (-2-10) = 12	
50
Branch Distance
Branch_distance(P(x), t)
Given the first control node where the execution diverges from the target t, the predicate
at that node is converted to a distance (from taking the desired branch), normalised
between 0 and 1 (so that it weighs less than the approach level).
Such a distance measures how far the test case is from taking the desired branch. For
boolean and numerical variables a, b:
51
Branch Distance
Branch_distance(P(x), t)
For string variables a and b, the branch distance is computed using the following rules:
where j is the position of the first differing character, i.e., a[j] != b[j] while a[i] == b[i]
for all i < j; (a[j] - b[j]) is set to zero if a == b. Example of edit distance:
edit_dist(“abcd”, “abbb”) = 2
52
M. Alshraideh, L. Bottaci, “Search-
based software test data
generation for string data using
program-specific search
Operators”, STVR, 2006
Branch distance rules for composite predicate
Branch Distance
Alternative normalisations of d:
BD(c) = 1 - α^(-d)
BD(c) = d/(d+β)
with α > 1 and β > 0
53
Fitness Function
For statement and branch coverage, given a specific coverage target t, a widely
used fitness function (to be minimised) is:
f(x) = approach_level(P(x), t) + branch_distance(P(x), t)
Approach_level(P(x), t)
Given the execution trace obtained by running program P with input vector x, the
approach level is the minimum number of control nodes between an executed
statement and the coverage target t.
Branch_distance(P(x), t)
Given the first control node where the execution diverges from the target t, the
predicate at such node is converted to a distance (from taking the desired
branch), normalised between 0 and 1.
54
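This fitness can be sketched for the running Triangle example (target: statement 8, the ISOSCELES assignment in the final else branch), using the d/(d + 1) normalisation; node numbers follow the control flow graph used in the slides.

```python
K = 1  # constant added when the desired branch is not taken

def norm(d):
    """Normalise a raw branch distance into [0, 1): d / (d + 1)."""
    return d / (d + 1)

def fitness(a, b, c):
    """Fitness (to minimise) for covering statement 8 of computeTriangleType,
    i.e., the path a != b, a != c, b == c. The approach level counts the
    control nodes still separating the trace from the target."""
    if a == b:            # diverged at node 1; nodes 5 and 7 remain
        return 2 + norm(K)               # d(a != b) = K when a == b
    if a == c:            # diverged at node 5; node 7 remains
        return 1 + norm(K)               # d(a != c) = K when a == c
    d = 0 if b == c else abs(b - c) + K  # node 7: d(b == c)
    return 0 + norm(d)

print(fitness(2, 2, 2))  # x1: AL=2, BD=0.5, f=2.5
print(fitness(2, 3, 4))  # x2: AL=0, BD=2/3, f≈0.66
print(fitness(3, 4, 4))  # covers the target: f=0.0
```

Note how the fitness decreases smoothly as inputs approach the target, which is what gives the search its gradient.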
Fitness Function?
X1 = (2, 2, 2) Path(x1) = <1, 2, 3, 10> AL=2 f=2.5
X2 = (2, 3, 4) Path(x2) = <1, 5, 7, 9, 10> AL=0
d(a != b) = K = 1
BD (a != b) = 1/ (1+1) = 0.5
f(X1) = 2 + 0.5 = 2.5
55
Fitness Function?
d(b == c) = abs(b-c) + K = 2
BD (b == c) = 2 / (2+1) = 0.66
f(X2) = 0 + 0.66 = 0.66
X1 = (2, 2, 2) Path(x1) = <1, 2, 3, 10> AL=2 f=2.5
X2 = (2, 3, 4) Path(x2) = <1, 5, 7, 9, 10> AL=0 f=0.66
56
Roulette Wheel Selection
x6 = (3,4,5) P ≈ 1/f = 1/0.66 ≈ 1.51
x2 = (2,3,4) P ≈ 1/f = 1/0.66 ≈ 1.51
x7 = (3,5,7) P ≈ 1/f = 1/0.75 ≈ 1.33
x8 = (6,8,4) P ≈ 1/f = 1/0.83 ≈ 1.20
x1 = (2,2,2) P ≈ 1/f = 1/2.50 ≈ 0.40
x5 = (2,2,3) P ≈ 1/f = 1/2.50 ≈ 0.40
x3 = (-2,3,6) P ≈ 1/f ≈ 0
x4 = (2,3,7) P ≈ 1/f ≈ 0
Roulette wheel selection
1) Assign to each test case a probability equal to 1/f (inverse of the fitness score)
2) Normalise the obtained probability
3) Each test case has a probability to be selected that is proportional to its slice in the
roulette wheel
Tot. = 6.35; normalised probabilities: 0.23, 0.23, 0.20, 0.18, 0.06, 0.06
Roulette wheel slices: x6 24%, x2 24%, x7 21%, x8 19%, x1 6%, x5 6%, x3 0%, x4 0%
Roulette Wheel Selection
x6 = (3,4,5) P ≈ 0.23
x2 = (2,3,4) P ≈ 0.23
x7 = (3,5,7) P ≈ 0.20
x8 = (6,8,4) P ≈ 0.18
x1 = (2,2,2) P ≈ 0.06
x5 = (2,2,3) P ≈ 0.06
x3 = (-2,3,6) P ≈ 0
x4 = (2,3,7) P ≈ 0
Roulette wheel selection
1) Assign to each test case a probability equal to 1/f (inverse of the fitness score)
2) Normalise the obtained probability
3) Each test case has a probability to be selected that is proportional to its slice in the
roulette wheel
x6 = (3,4,5)
x2 = (2,3,4)
x6 = (3,4,5)
x1 = (2,2,2)
x8 = (-3,0,-2)
x2 = (2,3,4)
x7 = (3,5,2)
x7 = (3,5,2)
Roulette Wheel Selection
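The three selection steps can be sketched as follows, using the fitness scores from the slides; representing the near-zero selection probabilities of x3 and x4 via a huge fitness value is an illustrative shortcut.

```python
import random

def roulette_select(population, fitness_of, rng, eps=1e-9):
    """Fitness-proportionate selection for a minimisation problem:
    weight each individual by 1/f, normalise, then spin the wheel."""
    weights = [1.0 / (fitness_of[x] + eps) for x in population]
    total = sum(weights)
    probs = [w / total for w in weights]
    return rng.choices(population, weights=probs, k=len(population))

fitness_of = {  # fitness values from the slides (lower is better)
    "x1": 2.50, "x2": 0.66, "x5": 2.50, "x6": 0.66,
    "x7": 0.75, "x8": 0.83, "x3": 1e9, "x4": 1e9,  # x3, x4: 1/f ~ 0
}
rng = random.Random(1)
parents = roulette_select(list(fitness_of), fitness_of, rng)
print(parents)  # fitter individuals (x2, x6) appear more often on average
```

Because selection is with replacement, the same fit individual can be drawn several times, exactly as in the slide's selected list.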
One-point crossover (probability = 0.8)
It takes two parents, cuts their chromosome strings at a randomly chosen position, and
swaps the resulting substrings to produce two new full-length chromosomes.
Parents (selected pairs and cut-points):
x1 = (2,2,2) not selected
x2 = (2,3,4) paired with x5, cut-point 1
x3 = (-2,3,6) paired with x8, cut-point 2
x4 = (2,3,7) paired with x6, cut-point 2
x5 = (2,2,3) paired with x2, cut-point 1
x6 = (3,4,5) paired with x4, cut-point 2
x7 = (3,5,7) not selected
x8 = (6,8,4) paired with x3, cut-point 2
Offspring:
x1 = (2,2,2)
x2 = (2,2,3)
x3 = (-2,3,4)
x4 = (2,3,5)
x5 = (2,3,4)
x6 = (3,4,7)
x7 = (3,5,7)
x8 = (6,8,6)
Reproduction (Crossover)
Mutation: randomly change some genes (elements within each chromosome)
Mutation probability: 1/n, where n = chromosome length
Offspring
x1 = (2,2,2)
x2 = (2,2,4)
x3 = (-2,8,6)
x4 = (2,4,7)
x5 = (2,3,3)
x6 = (3,3,5)
x7 = (3,5,7)
x8 = (6,3,4)
Mutated Offspring
x1 = (2,2,2)
x2 = (2,-1,4)
x3 = (-2,8,0)
x4 = (2,4,7)
x5 = (2,3,3)
x6 = (3,12,5)
x7 = (3,5,7)
x8 = (6,3,4)
Reproduction (Mutation)
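Both reproduction operators are easy to sketch; the crossover call below reproduces the slide's x3/x8 pair at cut-point 2, while the mutated values are random rather than the slide's.

```python
import random

def one_point_crossover(p1, p2, cut):
    """Swap the tails of two parent chromosomes after position `cut`."""
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

def mutate(chromosome, rng, low=-10, high=12):
    """Each gene is replaced with probability 1/n (n = chromosome length)."""
    n = len(chromosome)
    return [rng.randint(low, high) if rng.random() < 1 / n else g
            for g in chromosome]

# Slide example: crossing x3 = (-2,3,6) and x8 = (6,8,4) at cut-point 2.
c1, c2 = one_point_crossover((-2, 3, 6), (6, 8, 4), cut=2)
print(c1, c2)  # (-2, 3, 4) (6, 8, 6)

rng = random.Random(7)
print(mutate([2, 2, 4], rng))
```

Crossover recombines building blocks already present in the population; mutation injects new gene values that crossover alone could never produce.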
(GA loop: Initial Population → Fitness computation → Selection → Crossover → Mutation, repeated until the end condition is met)
One-Target Approach:
1) Select one target (statement or branch) to cover
2) Run the GA until the maximum search budget (max iterations) is reached or the target is
covered (fitness function = 0)
3) Repeat from step (1) for a new target (statement or branch)
(Triangle code as before, with one statement highlighted as the coverage target)
62
Limitations
Some coverage targets may be infeasible
Some coverage targets may be very difficult to achieve
Since a limited search budget is available for test case generation:
• Infeasible targets may use up the search budget without reaching any target
• Difficult targets may use most of the search budget, leaving many easier
coverage targets uncovered
• The order in which targets are considered affects the final results
How can multiple targets be solved at once?
63
Techniques for Multiple Targets
G. Fraser, A. Arcuri
“Whole Test Suite Generation”
IEEE Transactions on Software Engineering, 2013
A. Panichella, F. M. Kifetew, P. Tonella
“Reformulating Branch Coverage as Many-Objective
Optimization Problems”
IEEE International Conference on Software Testing, Verification, and
Validation (ICST), 2015
64
Industrial Examples
65
Outline
• Example research projects with industry partners:
• Vulnerability testing (Banking)
• Testing advanced driver assistance systems
• Testing controllers (automotive)
• Stress testing critical task deadlines (Energy)
66
Vulnerability Testing
[Appelt et al.], [Jan et al.]
67
• Code Injection: 42%
• Manipulated data structures: 32%
• Collect and analyze information: 9%
• Indicator: 4%
• Employ probabilistic techniques: 3%
• Manipulate system resources: 3%
• Subvert access control: 3%
• Abuse existing functionality: 2%
• Engage in deceptive…: 2%
X-Force Threat Intelligence Index
2017
68
https://www.ibm.com/security/xforce/
More than 40% of all
attacks were injection
attacks (e.g., SQLi)
Web Applications
69
Client → Server → SQL Database
Web Applications
70
Web form
str1
str2
Username
Password
OK
SQL query
SELECT *
FROM Users WHERE
(usr = ‘str1’ AND psw = ‘str2’)
Name Surname …
John Smith …
Result
Client → Server → SQL Database
Injection Attacks
71
SQL query
Name Surname …
Aria Stark …
John Snow …
… … …
Query result
SELECT *
FROM Users
WHERE (usr = ‘’ AND
psw = ‘’) OR 1=1 --
Client → Server → SQL Database
Web form
‘) OR 1=1 --
Username
Password
OK
Protection Layers
Server
SQL
Database
Client
Data input
Validation
and
Sanitization
Database
Firewall
Web
Application
Firewall
72
Web Application Firewalls (WAFs)
73
(Figure: the WAF, placed in front of the server, blocks malicious requests and lets legitimate ones through)
WAF Rule Set
74
Rule set of Apache ModSecurity
https://github.com/SpiderLabs/ModSecurity
Misconfigured WAFs
75
Legitimate request BLOCKED: false positive
Malicious request ALLOWED: false negative
Grammar-based Attack
Generation
• BNF grammar for SQLi attacks
• Random strategy: randomly selected production rules are
applied recursively until only terminals are left
• The random strategy is not efficient at generating bypassing attacks, which
are rare and difficult to find
• Machine learning? Search?
• How to guide the search? How can ML help?
76
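The random strategy on this slide can be sketched as follows; the tiny SQLi grammar fragment is illustrative and not the actual BNF used in the project.

```python
import random

# Illustrative fragment of an SQLi attack grammar (not the project's full BNF).
GRAMMAR = {
    "<attack>": [["'", " ", "<boolExpr>", "<cmt>"]],
    "<boolExpr>": [["OR ", "<true>"], ["OR ", "<true>", " AND ", "<true>"]],
    "<true>": [['"a"="a"'], ["1=1"]],
    "<cmt>": [["#"], ["-- "]],
}

def derive(symbol, rng):
    """Recursively apply randomly selected production rules
    until only terminal symbols are left."""
    if symbol not in GRAMMAR:
        return symbol  # terminal
    production = rng.choice(GRAMMAR[symbol])
    return "".join(derive(part, rng) for part in production)

rng = random.Random(0)
for _ in range(3):
    print(derive("<attack>", rng))
```

Every derivation yields a syntactically valid attack string, but whether it bypasses a given WAF is only known after sending it, which is why pure random sampling is wasteful.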
Anatomy of SQLi attacks
77
Bypassing Attack: ‘ OR“a”=“a”#
<START>
<sq> <wsp> <sqliAttack> <cmt>
<boolAttack>
<opOR> <boolTrueExpr>
OR <bynaryTrue>
<dq> <ch> <dq> <opEq> <dq> <ch> <dq>
“ a ” = “ a ”
<sQuoteContext>
‘ #_
Derivation Tree
Attack slices: S = { ‘ , OR”a”=“a” , # }
Learning Attack Patterns
78
S1 S2 S3 S4 … Sn Outcome
A1 1 1 0 0 … 0 Passed
A2 0 1 0 0 … 0 Blocked
… … … … … … … …
Am 1 1 1 1 … 1 Blocked
Training Set
(Decision tree: internal nodes test the presence of attack slices S4, S3, S2, Sn, S1, …; leaves are labelled Passed / Blocked)
• Random trees
• Random forest
Learning Attack Patterns
79
Attack Pattern
S2 ∧ ¬ Sn ∧ S1
Machine Learning
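A minimal sketch of how a decision-tree learner separates "Passed" from "Blocked" attacks given slice vectors: compute the information gain of splitting on each slice and pick the best one as the root test. The four-row training set is made up for illustration.

```python
import math

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    total = len(labels)
    counts = {l: labels.count(l) for l in set(labels)}
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def info_gain(rows, labels, slice_idx):
    """Gain from splitting on presence (1) / absence (0) of one attack slice."""
    gain = entropy(labels)
    for value in (0, 1):
        subset = [l for r, l in zip(rows, labels) if r[slice_idx] == value]
        if subset:
            gain -= len(subset) / len(labels) * entropy(subset)
    return gain

# Columns are slices S1..S4; labels say whether the WAF let the attack through.
rows = [(1, 1, 0, 0), (0, 1, 0, 0), (1, 1, 1, 1), (1, 0, 1, 0)]
labels = ["Passed", "Blocked", "Blocked", "Passed"]
best = max(range(4), key=lambda i: info_gain(rows, labels, i))
print(f"Root split on S{best + 1}")
```

Repeating this split recursively yields a tree whose "Passed" branches read off attack patterns such as S2 ∧ ¬Sn ∧ S1.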
Generating Attacks via ML and
EAs
80
Evolutionary Algorithm (EA)
Iteratively refine successful attack conditions
(The decision tree learned from attack data guides the evolutionary search)
Some Results
Apache ModSecurity
81
DistinctAttacks
Industrial WAFs
DistinctAttacks
Machine Learning-driven attack generation led to more
distinct, successful attacks being discovered faster
Related Work
• Automated repair of WAFs
• Automated testing targeting XML and SQL injections in web
applications
• Automated detection of malicious SQL statements
82
Testing Advanced Driving
Assistance Systems
[Ben Abdessalem et al.]
83
Cyber-Physical Systems
• A system of collaborating computational elements controlling
physical entities
84
Advanced Driver Assistance
Systems (ADAS)
85
Automated Emergency Braking (AEB)
Pedestrian Protection (PP)
Lane Departure Warning (LDW)
Traffic Sign Recognition (TSR)
Automotive Environment
• Highly varied environments, e.g., road topology, weather, buildings,
and pedestrians …
• Huge number of possible scenarios, e.g., determined by
trajectories of pedestrians and cars
• ADAS play an increasingly critical role
• A challenge for testing
86
Advanced Driver Assistance
Systems (ADAS)
Decisions are made over time based on sensor data
87
Sensors
Controller
Actuators Decision
Sensors
/Camera
Environment
ADAS
A General and Fundamental Shift
• Increasingly, it is easier to learn behavior from data using
machine learning than to specify and code it
• Deep learning, reinforcement learning …
• Example: Neural networks (deep learning)
• Millions of weights learned
• No explicit code, no specifications
• Verification, testing?
88
CPS Development Process
89
Model-in-the-Loop Stage:
• Functional modeling (controllers, plant, decision) with continuous and
discrete Simulink models; model simulation and testing
• System engineering modeling (SysML): architecture modelling (structure,
behavior, traceability)
• Analysis: model execution and testing, model-based testing, traceability
and change impact analysis, ...
Software-in-the-Loop Stage:
• (partial) Code generation; deployed executables on target platform
Hardware-in-the-Loop Stage:
• Hardware (sensors ...), analog simulators; testing (expensive)
Automotive Environment
• Highly varied environments, e.g., road topology, weather, buildings,
and pedestrians …
• Huge number of possible scenarios, e.g., determined by
trajectories of pedestrians and cars
• ADAS play an increasingly critical role
• A challenge for testing
90
Our Goal
• Developing an automated testing technique
for ADAS
91
• To help engineers efficiently and
effectively explore the complex test input
space of ADAS
• To identify critical (failure-revealing) test
scenarios
• Characterize input conditions that lead to the most critical
situations, e.g., safety violations
92
Automated Emergency Braking
System (AEB)
(Architecture: the vision (camera) sensor sends objects’ position/speed to the brake controller; its decision making issues a “brake-request” when braking is needed to avoid collisions)
Example Critical Situation
• “AEB properly detects a pedestrian in front of the car with a
high degree of certainty and applies braking, but an accident
still happens where the car hits the pedestrian with a
relatively high speed”
93
Testing ADAS
94
A simulator based on
Physical/Mathematical models
On-road testing
Simulation-based (model) testing
Testing via Physics-based
Simulation
95
ADAS
(SUT)
Simulator (Matlab/Simulink)
Model
(Matlab/Simulink)
▪ Physical plant (vehicle / sensors / actuators)
▪ Other cars
▪ Pedestrians
▪ Environment (weather / roads / traffic signs)
Test input
Test output
time-stamped output
AEB Domain Model
- visibility:
VisibilityRange
- fog: Boolean
- fogColor:
FogColor
Weather
- frictionCoeff:
Real
Road1
- v0 : Real
Vehicle
- : Real
- : Real
- : Real
- :Real
Pedestrian
- simulationTime:
Real
- timeStep: Real
Test
Scenario
1
1
- ModerateRain
- HeavyRain
- VeryHeavyRain
- ExtremeRain
«enumeration»
RainType- ModerateSnow
- HeavySnow
- VeryHeavySnow
- ExtremeSnow
«enumeration»
SnowType
- DimGray
- Gray
- DarkGray
- Silver
- LightGray
- None
«enumeration»
FogColor
1
WeatherC
{{OCL} self.fog=false
implies self.visibility = “300”
and self.fogColor=None}
Straight
- height:
RampHeight
Ramped
- radius:
CurvedRadius
Curved
- snowType:
SnowType
Snow
- rainType:
RainType
Rain
Normal
- 5 - 10 - 15 - 20
- 25 - 30 - 35 - 40
«enumeration»
CurvedRadius (CR)
- 4 - 6 - 8 - 10 - 12
«enumeration»
RampHeight (RH)
- 10 - 20 - 30 - 40 - 50
- 60 - 70 - 80 - 90 - 100
- 110 - 120 - 130 - 140
- 150 - 160 - 170 - 180
- 190 - 200 - 210 - 220
- 230 - 240 - 250 - 260
- 270 - 280 - 290 - 300
«enumeration»
VisibilityRange
- : TTC: Real
- : certaintyOfDetection:
Real
- : braking: Boolean
AEB Output
- : Real
- : Real
Output functions
Mobile
object
Position
vector
- x: Real
- y: Real
Position
1 11
1
1
Static input
1
Output
1
1
Dynamic input
xp
0
yp
0
vp
0
✓p
0
vc
0
v3
v2
v1
F1
F2
ADAS Testing Challenges
• Test input space is large, complex and multidimensional
• Explaining failures and fault localization are difficult
• Execution of physics-based simulation models is computationally
expensive
97
Our Approach
98
• We use decision tree classification models
• We use a multi-objective search algorithm (NSGA-II)
• Each search iteration calls the simulator to compute the objective functions
• Objective functions:
  1. Minimum distance between the pedestrian and the field of view
  2. The car speed at the time of collision
  3. The probability that the detected object is a pedestrian
• Input values required to perform the simulation: precipitation, fogginess, road shape, visibility range, car speed, person speed, person position, person orientation
Multiple Objectives: Pareto Front
99
• Individual A Pareto-dominates individual B if A is at least as good as B in every objective and better than B in at least one objective.
[Figure: objective space (F1, F2) showing the Pareto front and the region dominated by a point x]
• A multi-objective optimization algorithm (e.g., NSGA-II) must:
  • Guide the search towards the global Pareto-optimal front.
  • Maintain solution diversity in the Pareto-optimal front.
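Pareto dominance as defined above translates directly into code. This is a generic sketch (both objectives minimized), not code from the tool itself:

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]

# Example: (2, 2) dominates (3, 3) and (4, 4); the front keeps the trade-offs.
front = pareto_front([(1, 5), (2, 2), (5, 1), (3, 3), (4, 4)])
```

NSGA-II layers the population into successive such fronts (non-dominated sorting) and adds a crowding-distance measure to keep the front diverse.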
Search-based Testing Process
100
Inputs: input data ranges/dependencies + simulator + fitness functions defined based on oracles
• Test input generation (NSGA-II): select the best tests and generate new tests
• Evaluating test inputs: simulate every (candidate) test and compute the fitness functions
• (Candidate) test inputs and fitness values are exchanged between these two steps
Output: test cases revealing worst-case system behaviors
Search: Genetic Evolution
101
Initial input → Fitness computation → Selection → Breeding → (back to fitness computation)
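The evolution cycle above can be sketched as a minimal genetic algorithm. Every choice here (population size, elitist truncation selection, one-point crossover, Gaussian mutation) is a generic illustration, not the configuration used in the project:

```python
import random

def evolve(fitness, bounds, pop_size=30, generations=80):
    """Minimize `fitness` over box-constrained real vectors with a simple GA."""
    dim = len(bounds)
    new_ind = lambda: [random.uniform(lo, hi) for lo, hi in bounds]
    pop = [new_ind() for _ in range(pop_size)]          # initial input
    for _ in range(generations):
        pop.sort(key=fitness)                           # fitness computation + ranking
        parents = pop[: pop_size // 2]                  # selection (elitist truncation)
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, dim) if dim > 1 else 1
            child = a[:cut] + b[cut:]                   # breeding: one-point crossover
            i = random.randrange(dim)                   # breeding: Gaussian mutation
            lo, hi = bounds[i]
            child[i] = min(hi, max(lo, child[i] + random.gauss(0, 0.1 * (hi - lo))))
            children.append(child)
        pop = parents + children
    return min(pop, key=fitness)
```

For test generation, `fitness` would wrap a simulation run and return, for instance, how far the execution was from violating a safety requirement.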
Better Guidance
• Fitness computations rely on simulations and are very
expensive
• Search needs better guidance
102
Decision Trees
103
Partition the input space into homogeneous regions. Example tree from the AEB case study:
• Root (all points): count 1200 — “non-critical” 79%, “critical” 21%
• First split on the initial pedestrian speed vp0 (vp0 < 7.2 km/h vs. vp0 >= 7.2 km/h), producing regions of counts 564 and 636 with “non-critical”/“critical” proportions of 98%/2% and 59%/41%
• Further splits on the initial pedestrian orientation θp0 (< 218.6 vs. >= 218.6) and on road topology (CR = 5, Straight, RH = [4 12] (m) vs. CR = [10 40] (m)) produce regions of counts 412, 152, 230 and 182 with “non-critical”/“critical” proportions of 49%/51%, 84%/16%, 31%/69% and 72%/28%
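A one-level version of such a partitioning (a decision stump) fits in a few lines of plain Python. Real decision-tree learners such as CART recurse on each side and use impurity measures like Gini instead of raw error counts; this sketch is only illustrative, not the study's implementation:

```python
def best_split(points, labels):
    """Find the single (feature, threshold) split that best separates
    "critical" (1) from "non-critical" (0) points, by misclassification count."""
    n_features = len(points[0])
    best = (None, None, len(points) + 1)
    for f in range(n_features):
        values = sorted({p[f] for p in points})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2                      # candidate threshold between points
            left = [lab for p, lab in zip(points, labels) if p[f] < t]
            right = [lab for p, lab in zip(points, labels) if p[f] >= t]
            # each side is classified by its majority label
            err = (min(left.count(0), left.count(1))
                   + min(right.count(0), right.count(1)))
            if err < best[2]:
                best = (f, t, err)
    return best[0], best[1]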
Genetic Evolution Guided by Classification
104
Initial input → Fitness computation → Classification → Selection → Breeding → (back to fitness computation)
Search Guided by Classification
105
Inputs: input data ranges/dependencies + simulator + fitness functions defined based on oracles
• Test input generation (NSGA-II): build a classification tree, select/generate tests in the fittest regions, apply genetic operators
• Evaluating test inputs: simulate every (candidate) test and compute the fitness functions
• (Candidate) test inputs and fitness values are exchanged between these two steps
Output: test cases revealing worst-case system behaviors + a characterization of critical input regions
NSGAII-DT vs. NSGAII
106
NSGAII-DT outperforms NSGAII
[Figure: hypervolume (HV), generational distance (GD) and spread (SP) for NSGAII-DT vs. NSGAII over search time (6–24 h)]
Automatic Generation of
System Test Cases
from Requirements
in Natural Language
[Wang et al.]
107
Problem
Automatically verify the compliance of
software systems
with their functional requirements
in a cost-effective way
108
Context
Automotive Embedded Systems
109
Working Assumption
Use Case
Specifications
Domain
Model
110
Automated Generation
111–112
Use Case Specifications (RUCM template) + Domain Model + Concise Mapping Table → Executable Test Cases
Concise mapping table (regex → mapping):
• weight=[d+] → Sensor.setWeight
• initialized=true → System.start
Key step: NL sentences to formulae
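The concise mapping table can be sketched as regexes over condition text mapped to driver operations. The operation names (Sensor.setWeight, System.start) come from the slide; the matching logic and signatures below are assumptions for illustration:

```python
import re

# Hypothetical mapping table: regex over a condition -> driver call to issue.
MAPPING = [
    (re.compile(r"weight=(\d+)"), lambda m: f"Sensor.setWeight({m.group(1)})"),
    (re.compile(r"initialized=true"), lambda m: "System.start()"),
]

def to_driver_calls(condition_text):
    """Translate a test-input condition into concrete driver calls."""
    calls = []
    for pattern, make_call in MAPPING:
        for m in pattern.finditer(condition_text):
            calls.append(make_call(m))
    return calls
```

For example, the condition "initialized=true and weight=25" would yield the two calls needed to put the system into the required state.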
Use Case Specifications
Example
BodySense: embedded system that determines the occupancy status of
seats in a car
113
Use Case Specifications
Example
Precondition: The system has been initialized
Basic Flow
1. The SeatSensor SENDS the weight TO the system.
2. INCLUDE USE CASE Self Diagnosis.
3. The system VALIDATES THAT no error has been detected.
4. The system VALIDATES THAT the weight is above 20 Kg.
5. The system sets the occupancy status to adult.
6. The system SENDS the occupancy status TO AirbagControlUnit.
--written according to RUCM template--
114
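Because RUCM restricts phrasing to template keywords, each step's type can be recovered mechanically. The sketch below is a hypothetical classifier based only on the keywords visible in this example (SENDS … TO, VALIDATES THAT, INCLUDE USE CASE), not the actual tool:

```python
import re

def step_kind(step):
    """Classify an RUCM step by its template keywords (illustrative only)."""
    if step.startswith("INCLUDE USE CASE"):
        return "include"
    if "VALIDATES THAT" in step:
        return "condition"
    if re.search(r"SENDS .* TO the system", step):
        return "input"        # an actor sends data to the system
    if re.search(r"SENDS .* TO ", step):
        return "output"       # the system sends data to an actor
    return "internal"
```

This is the kind of step typing (input, condition, internal, output, include) that drives the construction of the flow model shown on the next slide.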
115
Precondition: The system has been initialized
Basic Flow
1. The SeatSensor SENDS the weight TO the system.
2. INCLUDE USE CASE Self Diagnosis.
3. The system VALIDATES THAT no error has been detected.
4. The system VALIDATES THAT the weight is above 20 Kg.
5. The system sets the occupancy status to adult.
6. The system SENDS the occupancy status TO AirbagControlUnit.
Alternative Flow
RFS 4.
1. IF the weight is above 1 Kg THEN
2. The system sets the occupancy status to child.
3. ENDIF.
4. RESUME STEP 6.
[Figure: the use case specification is translated into a flow model with typed steps]
• UseCaseStart — Precondition: The system has been initialized.
• Input — The SeatSensor SENDS the weight TO the system.
• Include — INCLUDE USE CASE Self Diagnosis.
• Condition — The system VALIDATES THAT no error has been detected.
• Condition — The system VALIDATES THAT the weight is above 20 Kg.
• Condition — IF the weight is above 1 Kg THEN
• Internal — The system sets the occupancy status to adult.
• Internal — The system sets the occupancy status to child.
• Output — The system SENDS the occupant class TO AirbagControlUnit.
• Exit
Model-based
Test Case Generation
driven by
coverage criteria
116
Domain Model:
Formalizing Conditions
Manually written OCL constraint:
“The system VALIDATES THAT no error has been detected.”
Error.allInstances()->forAll( i | i.isDetected = false)
117
118
[Figure: the same flow model, now with OCL constraints attached to its precondition and condition steps]
Path condition:
System.allInstances()->forAll( s | s.initialized = true )
AND System.allInstances()->forAll( s | s.initialized = true )
AND Error.allInstances()->forAll( e | e.isDetected = false)
AND System.allInstances()
->forAll( s | s.occupancyStatus = Occupancy::Adult )
Constraint solving derives test inputs from the path condition.
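The constraint-solving step can be illustrated with a toy restatement of the path condition over scalar state variables, solved by brute force over a small discretized domain. The real approach solves the OCL constraints directly; this sketch only shows the "path condition → test inputs" idea:

```python
from itertools import product

def path_condition(state):
    """Toy restatement of the slide's path condition (illustrative variables)."""
    return (state["initialized"]
            and not state["error_detected"]
            and state["weight"] > 20)        # drives occupancyStatus to Adult

def find_test_input():
    """Enumerate a small discretized domain until the path condition holds."""
    for initialized, error, weight in product([True, False], [True, False],
                                              range(0, 101, 5)):
        state = {"initialized": initialized, "error_detected": error,
                 "weight": weight}
        if path_condition(state):
            return state
    return None
```

Any returned state (e.g., initialized, no error, weight above 20 kg) is a concrete test input exercising the "adult occupant" path.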
Automated Generation of OCL
Expressions
“The system VALIDATES THAT
no error has been detected.”
Error.allInstances()->forAll( i | i.isDetected = false)
OCLgen
119
Pattern:
Error.allInstances()->forAll( i | i.isDetected = false)
where “Error” is the EntityName, “i.isDetected” the left-hand side (a variable), “=” the operator, and “false” the right-hand side (a variable or value).
120
OCLgen solution
“The system sets the occupancy status to adult.”
(“the system” = actor, “the occupancy status” = affected by the verb, “adult” = final state)
1. determine the role of words in a sentence — based on Semantic Role Labeling and lexicons that describe the sets of roles typically …
2. match words in the sentence with concepts in the domain model — based on string similarity
3. generate the OCL constraint using a verb-specific transformation rule
Result:
BodySense.allInstances()
->forAll( i | i.occupancyStatus = Occupancy::Adult)
121–124
Constraints Generation Process
use case sentence → Execute SRL → text with SRL labels → Select and apply verb-specific transformation rule
• All rules share a common algorithmic structure
• Rules differ in the SRL role labels they consider
→ EntityName.allInstances()->forAll( i | i.LHS <Operator> RHS)
125
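A verb-specific transformation rule can be sketched as a template filled from SRL output. The SRL role names below (A0 = actor, A1 = affected, A2 = final state) and the dictionary-based concept matching are illustrative assumptions; the actual approach uses string similarity against the domain model:

```python
def to_ocl(srl, domain_model):
    """Build an OCL constraint from SRL roles (hypothetical 'set' rule)."""
    entity = domain_model.get(srl["A0"], srl["A0"])      # "the system" -> entity
    attribute = domain_model.get(srl["A1"], srl["A1"])   # affected -> attribute
    value = domain_model.get(srl["A2"], srl["A2"])       # final state -> value
    return (f"{entity}.allInstances()"
            f"->forAll( i | i.{attribute} = {value})")

# Toy concept mapping standing in for string-similarity matching.
domain_model = {
    "the system": "BodySense",
    "the occupancy status": "occupancyStatus",
    "adult": "Occupancy::Adult",
}
```

Feeding it the roles of “The system sets the occupancy status to adult.” reproduces the constraint shown on the slide.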
Schedulability Analysis and
Stress Testing
[Di Alesio et al.]
126
Problem and Context
• Schedulability analysis encompasses techniques that try to
predict whether (critical) tasks are schedulable, i.e., meet
their deadlines
• Stress testing runs carefully selected test cases that have
a high probability of leading to deadline misses
• Stress testing is complementary to schedulability analysis
• Testing is typically expensive, e.g., hardware in the loop
• Finding stress test cases is difficult
127
Finding Stress Test Cases is Hard
128
[Figure: two timelines (t = 0…9) showing jobs j0, j1, j2 with arrival times at0, at1, at2 and deadlines dl0, dl1, dl2]
• j0, j1, j2 arrive at at0, at1, at2 and must finish before dl0, dl1, dl2
• j1 can miss its deadline dl1 depending on when at2 occurs!
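The effect shown on the timelines can be reproduced with a toy preemptive fixed-priority scheduler. The arrival times, durations, priorities and deadline below are invented for illustration, and the brute-force loop over at2 plays the role that the search plays in the real approach:

```python
def simulate(arrivals, durations, priorities):
    """Single-core preemptive fixed-priority scheduling in unit time steps.
    Lower priority number = higher priority. Returns each job's finish time."""
    remaining = list(durations)
    finish = [None] * len(arrivals)
    t = 0
    while any(r > 0 for r in remaining):
        ready = [i for i, r in enumerate(remaining) if r > 0 and arrivals[i] <= t]
        if ready:
            i = min(ready, key=lambda k: priorities[k])
            remaining[i] -= 1
            if remaining[i] == 0:
                finish[i] = t + 1
        t += 1
    return finish

# Stress testing = searching for the arrival time at2 that maximizes j1's
# finish time (here by brute force; the paper combines GA and CP instead).
dl1 = 6
finishes = {at2: simulate([0, 1, at2], [2, 3, 2], [0, 2, 1])[1]
            for at2 in range(7)}
worst_at2 = max(finishes, key=finishes.get)
```

With these toy parameters, j1 finishes at time 5 or 7 depending on at2, so it meets or misses dl1 = 6, mirroring the figure's point that the deadline miss hinges on when at2 occurs.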
Challenges and Solutions
• Ranges for arrival times form a very large input space
• Task interdependencies and properties constrain what
parts of the space are feasible
• Solution: We re-expressed the problem as a constraint
optimization problem and used a combination of constraint
programming (IBM CPLEX) and meta-heuristic search (GA)
129
Constraint Optimization
130
Formulation as a constraint optimization problem:
• Static properties of tasks → constants
• Dynamic properties of tasks → variables
• Performance requirement → objective function
• OS scheduler behaviour → constraints
Combining CP and GA
131
[Figure, from Di Alesio et al., Fig. 3: “Overview of GA+CP: the solutions x, y and z in the initial population of GA evolve into …”]
Case Study
132
The system monitors gas leaks and fire in oil extraction platforms.
[Figure: architecture — Control Modules; Drivers (software-hardware interface); Real-Time Operating System; Multicore Architecture; Alarm Devices (hardware)]
Summary
• We provided a solution for generating stress test cases by combining
meta-heuristic search and constraint programming
• Meta-heuristic search (GA) identifies high risk regions in the
input space
• Constraint programming (CP) finds provably worst-case
schedules within these (limited) regions
• Achieve (nearly) GA efficiency and CP effectiveness
• Our approach can be used both for stress testing and
schedulability analysis (assumption free)
133
Summary
134
Other Industrial Projects
• Delphi: Testing and verification of CPS Simulink models
(e.g., controllers) [Matinnejad et al.]
• SES: Hardware-in-the-Loop, acceptance testing of CPS
[Shin et al.]
• IEE: Testing timing properties in embedded systems
[Wang et al.]
• Luxembourg government: Generating representative,
synthetic test data for information systems [Soltana et
al.]
135
Role of AI
• Metaheuristic search:
• Many test automation problems can be re-expressed into
search and optimization problems
• Machine learning:
• Automation can be better guided and effective when
learning from data: test execution results, fault detection …
• Natural Language Processing:
• Natural language is commonly used and is an obstacle to
automated analysis and therefore test automation
136
Search-Based Solutions
• Versatile
• Helps relax assumptions compared to exact approaches
• Helps decrease modeling requirements
• Scalability, e.g., easy to parallelize
• Requires massive empirical studies
• Search is rarely sufficient by itself
137
Multidisciplinary Approach
• Single-technology approaches rarely work in practice
• Combined search with:
• Machine learning
• Solvers, e.g., CP, SMT
• Statistical approaches, e.g., sensitivity analysis
• System and environment modeling and simulation
138
The Road Ahead
• We need to develop techniques that strike a balance among
scalability, practicality, and applicability while offering the
highest possible level of dependability guarantees
• We need more multi-disciplinary research involving AI
• In most industrial contexts, offering absolute guarantees
(correctness, safety, or security) is illusory
• The best trade-off between cost and level of guarantees is
necessarily context-dependent
• Research in this field cannot be oblivious to context (domain
…)
139
References
140
Some Leading Researchers
• A. Arcuri, Westerdals and U. of Luxembourg, Norway & Luxembourg
• R. Feldt, Chalmers U., Sweden
• G. Fraser, U. of Passau, Germany
• M. Harman, Facebook and UCL, UK
• T. Menzies, North Carolina State U., USA
• P. McMinn, U. Sheffield, UK
• A. Panichella, Delft U., NL
• M. Pezze, P. Tonella, U. of Lugano, Switzerland
• A. Zeller, CISPA and Saarland U., Germany
141
Selected SBST References
• McMinn, “Search-Based Software Testing: Past, Present and Future”, ICST 2011
• Harman et al., “Search-based software engineering: Trends, techniques and applications”, ACM
Computing Surveys, 2012
• Fraser, Arcuri, “Whole Test Suite Generation”, IEEE Transactions on Software Engineering, 2013
• A. Panichella et al., “Reformulating Branch Coverage as Many-Objective Optimization Problems”,
ICST 2015
• Ali et al., “Generating Test Data from OCL Constraints with Search Techniques”, IEEE Transactions
on Software Engineering, 2013
• Hemmati et al., “Achieving Scalable Model-based Testing through Test Case Diversity”, ACM
TOSEM, 2013
142
Selected ML-driven Testing
• Noorian et al., “Machine Learning-based Software Testing:
Towards a Classification Framework”, SEKE 2011
• Briand et al., “Using machine learning to refine category-partition
test specifications and test suites”, Information and Software
Technology (Elsevier), 2009
• Appelt et al., “A Machine Learning-Driven Evolutionary Approach
for Testing Web Application Firewalls”, IEEE Transactions on
Reliability, 2018
• Machine learning session at ISSTA 2018!
143
NLP-driven Testing
• Wang et al., “Automatic generation of system test cases from use case
specifications”, ISSTA 2015
• Wang et al., “Automated Generation of Constraints from Use Case
Specifications to Support System Testing”, ICST 2018
• Mai et al., “A Natural Language Programming Approach for
Requirements-based Security Testing”, ISSRE 2018
• Blasi et al., “Translating Code Comments to Procedure Specifications”,
ISSTA 2018
• Arnaoudova et al., “The use of text retrieval and natural language
processing in software engineering”, ICSE 2015
144
Selected Industrial Examples
• Matinnejad et al., “MiL Testing of Highly Configurable Continuous
Controllers: Scalable Search Using Surrogate Models”, ASE 2014
• Di Alesio et al. “Combining genetic algorithms and constraint
programming to support stress testing of task deadlines”, ACM
Transactions on Software Engineering and Methodology, 2015
• Ben Abdessalem et al., "Testing Vision-Based Control Systems Using
Learnable Evolutionary Algorithms”, ICSE 2018
• Soltana et al., “Synthetic Data Generation for Statistical
Testing”, ASE 2017.
• Shin et al., “Test case prioritization for acceptance testing of cyber-
physical systems”, ISSTA 2018
145
Selected Industrial Examples
• Appelt et al., “A Machine Learning-Driven Evolutionary Approach for
Testing Web Application Firewalls”, IEEE Transactions on Reliability, 2018
• Jan et al., “Automatic Generation of Tests to Exploit XML Injection
Vulnerabilities in Web Applications”, IEEE Transactions on Software
Engineering, 2018
• Wang et al., “System Testing of Timing Requirements Based on Use Cases
and Timed Automata”, ICST 2017
• Wang et al., “Automated Generation of Constraints from Use Case
Specifications to Support System Testing”, ICST 2018
146
Artificial Intelligence for Automated Software Testing

  • 1. .lusoftware verification & validation VVS Artificial Intelligence for Automated Software Testing Lionel Briand ISSTA/ECOOP Summer School 2018
  • 2. Objectives • Applications of main AI techniques in test automation • Not a short university course • Overview (partial) with pointers for further information • Industrial research projects • Challenging as there is a lot of material and many techniques involved • Disclaimer: Inevitably biased presentation based on personal experience 2
  • 3. Acknowledgments • Annibale Panichella, for introductory slides on Search-Based Software Testing • P. Repoussis, “Metaheuristic Algorithms: A brief introduction on the basics you need to know” (Presentation) • Ethem Alpaydim, “Introduction to Machine Learning”, MIT press • Amy Davis, “Overview of Natural Language Processing” (Presentation) • Research projects: Shiva Nejati, Raja Ben Abdessalem, Annibale Panichella, Sadeeq Jan, Andrea Arcuri, Dennis Appelt, Fabrizio Pastore, Chunhui Wang … (sorry for those I forgot) 3
  • 4. Biography • 24 years of post-PhD research experience • IEEE Fellow, Harlan Mills IEEE CS award • Canada Research Chair, ERC Advanced grant • ICSE PC co-chair in 2014 • EiC of Empirical Software Engineering (Springer) for 13 years • Graduated 27 PhD students • Worked with >30 industry partners (aerospace, automotive, health care, finance …) • I like research driven by industrial problems • H-index = 73, around 24K citations (for those interested in the “number game”) 4
  • 5. Collaborative Research @ SnT 5 • Research in context • Addresses actual needs • Well-defined problem • Long-term collaborations • Our lab is the industry
  • 6. SVV Dept. 6 • Established in 2012, part of the SnT centre • Requirements Engineering, Security Analysis, Design Verification, Automated Testing, Runtime Monitoring • ~ 25 lab members • Partnerships with industry • ERC Advanced grant
  • 7. Outline • Introduction to software testing • Introduction to relevant AI techniques • Introduction to Search-Based Software Testing (SBST) • Industrial research projects where AI was applied to testing problems • Lessons learned and the road ahead 7
  • 9. Outline • Quick overview of software testing • The role of AI in automated software testing • Metaheuristic search • Machine learning • Natural Language Processing (NLP) 9
  • 10. Definition of Software Testing • International Software Testing Qualifications Board: “Software testing is a process of executing a program or application with the intent of finding the software bugs. It can also be stated as the process of validating and verifying that a software program or application or product meets the business and technical requirements that guided its design and development.” 10
  • 11. Software Testing Overview 11 SW Representation (e.g., specifications) SW Code Derive Test cases Execute Test cases Compare Expected Results or properties Get Test Results Test Oracle [Test Result==Oracle][Test Result!=Oracle] Automation!
  • 12. Main Challenge • The main challenge in testing software systems is scalability • Scalability: The extent to which a technique can be applied on large or complex artifacts (e.g., input spaces, code, models) and still provide useful, automated support with acceptable effort, CPU, and memory? • Effective automation is a prerequisite for scalability 12
  • 13. Importance of Software Testing • Software testing is the most prevalent verification and validation technique in practice • It represents a large percentage of software development costs, e.g., >50% is not rare • Testing services are a USD 9-Billion market • The cost of software failures was estimated to be (a very minimum of) USD 1.1 trillion in 2016 • Inadequate tools and technologies is one of the most important factors of testing costs and inefficiencies 13
  • 14. Search-Based Software Testing • Express test generation problem as a search or optimization problem • Search for test input data with certain properties, i.e., source code coverage • Non-linearity of software (if, loops, …): complex, discontinuous, non- linear search spaces • Many search algorithms (metaheuristics), from local search to global search, e.g., Hill Climbing, Simulated Annealing and Genetic Algorithms e search space neighbouring the for fitness. If a better candidate mbing moves to that new point, rhood of that candidate solution. the neighbourhood of the current fers no better candidate solutions; If the local optimum is not the gure 3a), the search may benefit performing a climb from a new cape (Figure 3b). le Hill Climbing is Simulated Simulated Annealing is similar to ement around the search space is be made to points of lower fitness he aim of escaping local optima. bability value that is dependent ‘temperature’, which decreases ogresses (Figure 4). The lower kely the chances of moving to a ch space, until ‘freezing point’ is the algorithm behaves identically d Annealing is named so because hysical process of annealing in curve of the fitness landscape until a local optimum is found. The fina position may not represent the global optimum (part (a)), and restarts ma be required (part (b)) Fitness Input domain Figure 4. Simulated Annealing may temporarily move to points of poore fitness in the search space Fitness Input domain Figure 5. Genetic Algorithms are global searches, sampling many poin in the fitness landscape at once “Search-Based Software Testing: Past, Present and Future” Phil McMinn Genetic Algorithm 14 cusses future directions for Search-Based g, comprising issues involving execution estability, automated oracles, reduction of st and multi-objective optimisation. Finally, udes with closing remarks. -BASED OPTIMIZATION ALGORITHMS form of an optimization algorithm, and mplement, is random search. 
In test data s are generated at random until the goal of mple, the coverage of a particular program nch) is fulfilled. Random search is very poor ns when those solutions occupy a very small ll search space. Such a situation is depicted re the number of inputs covering a particular are very few in number compared to the ut domain. Test data may be found faster ly if the search is given some guidance. c searches, this guidance can be provided a problem-specific fitness function, which points in the search space with respect to or their suitability for solving the problem Input domain portion of input domain denoting required test data randomly-generated inputs Figure 2. Random search may fail to fulfil low-probability test goals Fitness Input domain (a) Climbing to a local optimum
  • 15. Machine Learning and Testing • ML supports decision making based on data • Test planning • Test cost estimation • Test case management • Test case prioritization • Test case design • Test case refinement • Test case evaluation 15 • Debugging • Fault localization • Bug prioritization • Fault prediction • “Machine Learning-based Software Testing: Towards a Classification Framework.” SEKE 2011
  • 16. NLP and Testing • Natural language is prevalent in software development • User documentation, procedures, natural language requirements, etc. • Natural Language Processing (NLP) • Can it be used to help automate testing? • Derive test cases, including oracles • Traceability between requirements and system test cases (required by many standards) 16
  • 17. Introduction to Relevant AI Techniques 17
  • 18. Outline • Metaheuristic search • Machine learning • Natural Language Processing 18
  • 19. Metaheuristic Search • Stochastic optimization through search • They efficiently explore the search space in order to find good (near-optimal) feasible solutions • They can address both discrete- and continuous-domain optimization problems • Applicable to many practical situations • They provide no guarantee of global or local optimality 19
  • 20. Search Problem Local Optimum Global Optimum MinimizeFind a value x* which minimises (maximises) the objective function f over a search space X: ∀ x ∈ X : f(x*) ⩽ f(x) A. Panichella20
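The minimisation formulation above can be illustrated with the simplest metaheuristic, hill climbing. A minimal sketch, assuming a toy convex objective f(x) = (x − 7)² over the integers; the function, the ±1 neighbourhood, and the class name are illustrative assumptions, not from the slides:

```java
// Minimal hill-climbing sketch for minimising f(x) = (x - 7)^2 over integers.
// The objective and neighbourhood are illustrative assumptions.
public class HillClimb {
    static double f(int x) { return (double) (x - 7) * (x - 7); }

    // Move to the best neighbour (x-1 or x+1) while it improves fitness;
    // stop at a local optimum (here also global, since f is convex).
    static int climb(int start) {
        int x = start;
        while (true) {
            int best = x;
            for (int n : new int[]{x - 1, x + 1})
                if (f(n) < f(best)) best = n;
            if (best == x) return x;  // no better neighbour: local optimum
            x = best;
        }
    }

    public static void main(String[] args) {
        System.out.println(climb(100));  // prints 7
    }
}
```

On a multimodal landscape the same loop would stop at the nearest local optimum, which is why the slides pair Hill Climbing with restarts or Simulated Annealing.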
  • 21. Example Problem • Let’s consider the problem of finding the best visiting sequence (route) to serve 14 customers from a depot. • Traveling Salesman Problem (TSP) – Combinatorial optimization • How many possible routes? • (n−1)! = (15−1)! = 14! ≈ 8.7178 × 10^10, i.e., about 87 billion solutions • Exhaustive search is feasible within a day for the above problem • But what type of algorithm would you pick with 13,508 cities and 10^49,933 feasible solutions? • Combinatorial explosion 21
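The factorial growth behind this combinatorial explosion is easy to reproduce. A small sketch (class and method names are illustrative) that counts the (n − 1)! routes exactly:

```java
import java.math.BigInteger;

// Illustrates the combinatorial explosion of the TSP:
// fixing the start node, the number of routes over n nodes is (n - 1)!.
public class RouteCount {
    static BigInteger routes(int nodes) {
        BigInteger r = BigInteger.ONE;
        for (int i = 2; i < nodes; i++)          // product 2 * 3 * ... * (nodes - 1)
            r = r.multiply(BigInteger.valueOf(i));
        return r;
    }

    public static void main(String[] args) {
        System.out.println(routes(15));  // 14! = 87178291200, the slide's ~87 billion
    }
}
```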
  • 22. Example Metaheuristics • Genetic Algorithms • Simulated Annealing • Tabu Search • Ant Colony Optimization • Particle Swarm Optimization • Iterated Local Search 22
  • 23. Remarks • Metaheuristics are non-deterministic • They usually incorporate mechanisms to avoid getting trapped in confined areas of the search space • They are not problem specific • They may use some form of memory to better guide the search • They are a relatively new field (since the ‘80s or so) • They have become possible because we can now afford vast amounts of computation 23
  • 24. Different Categories of Heuristics 24 Trajectory Population-based From P. Repoussis
  • 25. Genetic Algorithms (GAs) Genetic Algorithm: population-based search algorithm inspired by evolution theory Natural selection: individuals that best fit the natural environment survive Reproduction: surviving individuals generate offspring (the next generation) Mutation: offspring inherit properties of their parents, with some mutations Iteration: generation after generation, the new offspring fit the environment better than their parents From A. Panichella 25
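The selection / reproduction / mutation loop above can be sketched on a toy problem. This is a minimal illustration, not the deck's algorithm: the "OneMax" objective (maximise the number of 1-bits), tournament selection, the population size, and the fixed seed are all assumptions chosen for brevity.

```java
import java.util.Random;

// Toy generational GA on OneMax (maximise the count of 1-bits).
// Problem, parameters, and tournament selection are illustrative assumptions.
public class ToyGA {
    static final Random RNG = new Random(42);

    static int fitness(int[] c) { int s = 0; for (int g : c) s += g; return s; }

    // tournament selection: pick the fitter of two random individuals
    static int[] tournament(int[][] pop) {
        int[] a = pop[RNG.nextInt(pop.length)], b = pop[RNG.nextInt(pop.length)];
        return fitness(a) >= fitness(b) ? a : b;
    }

    static int[] evolve(int len, int popSize, int generations) {
        int[][] pop = new int[popSize][len];
        for (int[] c : pop) for (int i = 0; i < len; i++) c[i] = RNG.nextInt(2);
        int[] best = pop[0].clone();
        for (int[] c : pop) if (fitness(c) > fitness(best)) best = c.clone();
        for (int g = 0; g < generations; g++) {
            int[][] next = new int[popSize][];
            next[0] = best.clone();                  // elitism: keep best so far
            for (int i = 1; i < popSize; i++) {
                int[] p1 = tournament(pop), p2 = tournament(pop);
                int cut = RNG.nextInt(len);          // one-point crossover
                int[] child = new int[len];
                for (int j = 0; j < len; j++) child[j] = j < cut ? p1[j] : p2[j];
                for (int j = 0; j < len; j++)        // per-gene mutation, prob 1/len
                    if (RNG.nextInt(len) == 0) child[j] ^= 1;
                next[i] = child;
            }
            pop = next;
            for (int[] c : pop) if (fitness(c) > fitness(best)) best = c.clone();
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(fitness(evolve(10, 20, 100)));
    }
}
```

Elitism makes the best fitness monotonically non-decreasing across generations, which is also what makes the loop easy to reason about.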
  • 27. Machine Learning • Machine learning is programming computers to optimize a performance criterion using example data or past experience. • Learning general models from data capturing particular examples. • Data is increasingly cheap and abundant; knowledge is expensive and scarce. • Build a model that is a good and useful approximation to the data. 27
  • 28. 28 Learning Strategies • Supervised Learning, e.g., decision trees • Classification • Regression • Unsupervised Learning, e.g., clustering • Reinforcement Learning
  • 29. 29 Classification • Example: Credit scoring • Differentiating between low-risk and high-risk customers from their income and savings Discriminant: IF income > θ1 AND savings > θ2 THEN low-risk ELSE high-risk From E. Alpaydim
  • 30. Natural Language Processing • Natural Language Understanding • Natural Language Generation • In software testing: • Analyze NL requirements or other forms of documentation to interpret them and translate them into a form supporting verification and testing • Generating specifications, test cases (inputs and oracles), and scripts from NL requirements or other forms of documentation • Traceability between requirements and other artifacts 30
  • 31. Understanding Sentences Parsing and Grammar How is a sentence composed? (Syntactic analysis) Lexicons How is a word composed? (Morphological analysis) Prefixes, suffixes, and root forms, e.g., short-ness Ambiguity Disambiguation: Finding the correct interpretation 31
  • 32. A Parsing Example Grammar: The Sentence: The boy went home. S → NP VP NP → Article N | Proper VP → Verb NP N → home | boy | store Proper → Betty | John Verb → go | give | see Article → the | an | a From A. Davis 32
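A grammar this small can be parsed by hand with recursive descent. The sketch below follows the slide's grammar, slightly extended so the example sentence actually parses: the past-tense form "went" is added to the verb lexicon and a bare-noun NP (for "home") is allowed — both extensions are assumptions, not part of A. Davis's grammar.

```java
import java.util.Arrays;
import java.util.List;

// Tiny recursive-descent parser for (a slightly extended version of) the
// slide's grammar: S -> NP VP, NP -> Article N | Proper | N, VP -> Verb NP.
// The extensions ("went" as Verb, bare-noun NP) are illustrative assumptions.
public class TinyParser {
    static final List<String> ART    = Arrays.asList("the", "an", "a");
    static final List<String> N      = Arrays.asList("home", "boy", "store");
    static final List<String> PROPER = Arrays.asList("betty", "john");
    static final List<String> VERB   = Arrays.asList("go", "give", "see", "went");

    static String[] toks; static int pos;

    static boolean parse(String sentence) {
        toks = sentence.toLowerCase().replace(".", "").split("\\s+");
        pos = 0;
        return np() && vp() && pos == toks.length;  // S -> NP VP, all input consumed
    }
    static boolean np() {                           // NP -> Article N | Proper | N
        if (pos + 1 < toks.length && ART.contains(toks[pos]) && N.contains(toks[pos + 1])) {
            pos += 2; return true;
        }
        if (pos < toks.length && (PROPER.contains(toks[pos]) || N.contains(toks[pos]))) {
            pos++; return true;
        }
        return false;
    }
    static boolean vp() {                           // VP -> Verb NP
        if (pos < toks.length && VERB.contains(toks[pos])) { pos++; return np(); }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(parse("The boy went home."));  // true
        System.out.println(parse("Went home the boy."));  // false
    }
}
```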
  • 33. A Parsing Example: The answer The boy went home. From A. Davis 33
  • 34. Syntactic Analysis Challenges • Singular vs plural, gender • Adjectives, adverbs … • Handling ambiguity • Syntactic ambiguity: “fruit flies like a banana” • Having to parse syntactically incorrect sentences 34
  • 35. Semantic Disambiguation Example: “with” Sentence Relation I ate spaghetti with meatballs. (ingredient of spaghetti) I ate spaghetti with salad. (side dish of spaghetti) I ate spaghetti with abandon. (manner of eating) I ate spaghetti with a fork. (instrument of eating) I ate spaghetti with a friend. (accompanier of eating) Disambiguation is probabilistic! 35
  • 36. Introduction to Search- Based Software Testing (SBST) 36
  • 37. Outline • Definition of Search-Based Software Engineering (SBSE) • Definition of Search-Based Software Testing (SBST) • SBST applied to coverage testing • Multiple-target techniques 37
  • 38. Definitions • Search-Based Software Engineering (SBSE): «The application of meta-heuristic search-based optimization techniques to find near-optimal solutions in software engineering problems.» • Problem Reformulation: Reformulating typical SE problems as optimization problems • Search-Based Software Testing: Metaheuristics have been shown to be particularly useful for addressing many testing problems. 38
  • 39. Example: Unit Testing • Minimize: execution cost, number of test cases • Maximize: code coverage, detected bugs • Search space = set of all possible test suites 39
  • 40. Why SBST? • No exhaustive search • No exact search • Meta-heuristics: Random Algorithms, Tabu Search, Hill Climbing, Ant Colony, Simulated Annealing, Particle Swarm Optimization, Genetic Algorithms • Issues: 1. Large search space 2. Complex problem (NP-Complete) 3. Often, required data are (partly) available only upon test execution 40
  • 41. SBST Example 41 Class Triangle { int a, b, c; //sides int type = NOT_A_TRIANGLE; Triangle (int a, int b, int c){…} void checkRightAngle() {…} void computeTriangleType() {…} boolean isTriangle() {…} public static void main (String args[]) {…} } Goal: Automatic generation of test cases using genetic algorithms in order to achieve the maximum statement coverage Genetic Algorithms: 1) Solution Representation 2) Fitness function 3) Selection 4) Reproduction (crossover and mutation)
  • 42. Solution Representation 42 Class Triangle { int a, b, c; //sides int type = NOT_A_TRIANGLE; Triangle (int a, int b, int c){…} void checkRightAngle() {…} void computeTriangleType() {…} boolean isTriangle() {…} public static void main (String args[]) {…} } The chromosome used for test case generation is the input vector (sequence of input values used by the test case to run the program), which may be fixed length or variable length In our running example there are only three input parameters: a, b, c a b cX = Fixed length chromosome
  • 43. Fitness Function? class Triangle { void computeTriangleType() { if (a == b) { if (b == c) type = "EQUILATERAL"; else type = "ISOSCELES"; } else if (a == c) { type = "ISOSCELES"; } else { if (b == c) type = "ISOSCELES"; else checkRightAngle(); } System.out.println(type); } } 1 25 6 7 3 98 10 4 1 25 6 7 3 98 4 Control flow graph Dependency graph 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 43
  • 44. Fitness Function? class Triangle { void computeTriangleType() { if (a == b) { if (b == c) type = "EQUILATERAL"; else type = "ISOSCELES"; } else if (a == c) { type = "ISOSCELES"; } else { if (b == c) type = "ISOSCELES"; else checkRightAngle(); } System.out.println(type); } } 1 25 6 7 3 98 10 4 1 25 6 7 3 98 4 Control flow graph Dependency graph 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 44
  • 45. Fitness Function? class Triangle { void computeTriangleType() { if (a == b) { if (b == c) type = "EQUILATERAL"; else type = "ISOSCELES"; } else if (a == c) { type = "ISOSCELES"; } else { if (b == c) type = "ISOSCELES"; else checkRightAngle(); } System.out.println(type); } } 1 25 6 7 3 98 10 4 1 25 6 7 3 98 4 Control flow graph Dependency graph 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. x1 = (2, 2, 2) Path(x1) = <1, 2, 3, 10> 45
  • 46. Fitness Function? class Triangle { void computeTriangleType() { if (a == b) { if (b == c) type = "EQUILATERAL"; else type = "ISOSCELES"; } else if (a == c) { type = "ISOSCELES"; } else { if (b == c) type = "ISOSCELES"; else checkRightAngle(); } System.out.println(type); } } 6 1 25 7 3 98 10 4 1 25 6 7 3 98 4 Control flow graph Dependency graph 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. x1 = (2, 2, 2) Path(x1) = <1, 2, 3, 10> x2 = (2, 3, 4) Path(x2) = <1, 5, 7, 9, 10> What is the closest TC to cover the statement 8? 46
  • 47. Approach Level Approach_level(P(x), t) Given the execution trace obtained by running program P with input vector x, the approach level is the minimum number of control nodes between an executed statement and the coverage target t. x1 = (2, 2, 2) Path(x1) = <1, 2, 3, 10> AL=2 x2 = (2, 3, 4) Path(x2) = <1, 5, 7, 9, 10> AL=0 1 25 7 3 98 10 46 47
  • 48. Fitness Function? class Triangle { void computeTriangleType() { if (a == b) { if (b == c) type = "EQUILATERAL"; else type = "ISOSCELES"; } else if (a == c) { type = "ISOSCELES"; } else { if (b == c) type = "ISOSCELES"; else checkRightAngle(); } System.out.println(type); } } 6 1 25 7 3 98 10 4 1 25 6 7 3 98 4 Control flow graph Dependency graph 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. x1 = (2, 2, 2) Path(x1) = <1, 2, 3, 10> AL=2 x2 = (2, 3, 4) Path(x2) = <1, 5, 7, 9, 10> AL=0 x3 = (2, -2, 10) Path(x3) = <1, 5, 7, 9, 10> AL=0 What is the closest TC to cover the statement 8? 48
  • 49. Fitness Function? class Triangle { void computeTriangleType() { if (a == b) { if (b == c) type = "EQUILATERAL"; else type = "ISOSCELES"; } else if (a == c) { type = "ISOSCELES"; } else { if (b == c) type = "ISOSCELES"; else checkRightAngle(); } System.out.println(type); } } 6 1 25 7 3 98 10 4 1 25 6 7 3 98 4 Control flow graph Dependency graph 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. x1 = (2, 2, 2) Path(x1) = <1, 2, 3, 10> x2 = (2, 3, 4) Path(x2) = <1, 5, 7, 9, 10> if (3==4) x3 = (2, -2, 10) Path(x3) = <1, 5, 7, 9, 10> if (10==-2) 49
  • 50. Fitness Function? class Triangle { void computeTriangleType() { if (a == b) { if (b == c) type = "EQUILATERAL"; else type = "ISOSCELES"; } else if (a == c) { type = "ISOSCELES"; } else { if (b == c) type = "ISOSCELES"; else checkRightAngle(); } System.out.println(type); } } 6 1 25 7 3 98 10 4 1 25 6 7 3 98 4 Control flow graph Dependency graph 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. x1 = (2, 2, 2) Path(x1) = <1, 2, 3, 10> x2 = (2, 3, 4) Path(x2) = <1, 5, 7, 9, 10> abs (3-4) = 1 x3 = (2, -2, 10) Path(x3) = <1, 5, 7, 9, 10> abs (-2-10) = 12 50
  • 51. Branch Distance Branch_distance(P(x), t) Given the first control node where the execution diverges from the target t, the predicate at that node is converted to a distance (from taking the desired branch), normalised between 0 and 1 (so that it is less important than the approach level). This distance measures how far the test case is from taking the desired branch. For boolean and numerical variables a, b: 51
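The numeric branch-distance rules can be written down directly. A minimal sketch using K = 1 and the d/(d + 1) normalisation that the deck applies on the Triangle example (class and method names are illustrative):

```java
// Branch-distance rules for numeric predicates, with K = 1 and
// normalisation d / (d + 1) as used on the Triangle slides.
public class BranchDistance {
    static final int K = 1;

    // distance to making "a == b" true
    static double dEq(int a, int b)  { return a == b ? 0 : Math.abs(a - b) + K; }

    // distance to making "a != b" true
    static double dNeq(int a, int b) { return a != b ? 0 : K; }

    // distance to making "a < b" true (one of the standard rules)
    static double dLt(int a, int b)  { return a < b ? 0 : (a - b) + K; }

    // normalise into [0, 1) so branch distance never outweighs the approach level
    static double norm(double d)     { return d / (d + 1); }

    public static void main(String[] args) {
        System.out.println(norm(dEq(3, 4)));  // |3-4| + 1 = 2, normalised to 2/3
    }
}
```

The guided search then minimises this distance: inputs that almost take the desired branch score lower than inputs that miss it by a wide margin, which is exactly the gradient random search lacks.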
  • 52. Branch Distance Branch_distance(P(x), t) For string variables a and b, the branch distance is computed using the following rules: where j is the position of the first differing character, i.e., a[j] != b[j] while a[i] == b[i] for all i < j ((a[j] − b[j]) is set to zero if a == b). Example of edit distance: edit_dist(“abcd”, “abbb”) = 2 52 M. Alshraideh, L. Bottaci, “Search-based software test data generation for string data using program-specific search operators”, STVR, 2006
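The edit distance used in the string rules is the standard Levenshtein distance; a minimal dynamic-programming sketch (class name is illustrative) reproduces the slide's example edit_dist("abcd", "abbb") = 2:

```java
// Levenshtein edit distance, as used in string branch-distance rules
// (Alshraideh & Bottaci, STVR 2006).
public class EditDistance {
    static int editDist(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;  // delete all of a
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;  // insert all of b
        for (int i = 1; i <= a.length(); i++)
            for (int j = 1; j <= b.length(); j++) {
                int sub = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1,      // deletion
                                            d[i][j - 1] + 1),     // insertion
                                   d[i - 1][j - 1] + sub);        // substitution
            }
        return d[a.length()][b.length()];
    }

    public static void main(String[] args) {
        System.out.println(editDist("abcd", "abbb"));  // 2
    }
}
```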
  • 53. Branch Distance • Branch distance rules for composite predicates • Alternative normalisations of d: BD(c) = 1 − α^(−d) or BD(c) = d/(d+β), with α > 1 and β > 0 53
  • 54. For statement and branch coverage, given a specific coverage target t, a widely used fitness function (to be minimised) is: f(x) = approach_level(P(x), t) + branch_distance(P(x),t) Approach_level(P(x), t) Given the execution trace obtained by running program P with input vector x, the approach level is the minimum number of control nodes between an executed statement and the coverage target t. Branch_distance(P(x), t) Given the first control node where the execution diverges from the target t, the predicate at such node is converted to a distance (from taking the desired branch), normalised between 0 and 1. Fitness Function? 54
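Putting the two components together, f(x) = approach_level + normalised branch distance can be hard-coded for the Triangle example's target (the ISOSCELES assignment guarded by a != b, then a != c, then b == c). This sketch is specific to that one target and uses K = 1 with d/(d + 1) normalisation; it reproduces the fitness values computed on the Triangle slides:

```java
// Combined fitness for one coverage target of computeTriangleType:
// the ISOSCELES assignment reached via a != b, a != c, b == c.
// K = 1, normalisation d / (d + 1), as on the slides.
public class TriangleFitness {
    static final int K = 1;
    static double norm(double d) { return d / (d + 1); }

    static double fitness(int a, int b, int c) {
        if (a == b) return 2 + norm(K);                    // diverged at a==b: need a != b, AL = 2
        if (a == c) return 1 + norm(K);                    // diverged at a==c: need a != c, AL = 1
        if (b != c) return 0 + norm(Math.abs(b - c) + K);  // diverged at b==c: need b == c, AL = 0
        return 0;                                          // target covered
    }

    public static void main(String[] args) {
        System.out.println(fitness(2, 2, 2));  // 2 + 1/(1+1) = 2.5
        System.out.println(fitness(2, 3, 4));  // 0 + 2/(2+1), about 0.66
    }
}
```

Note how the approach level dominates: any input that gets one control node closer to the target beats every input stuck at the previous node, regardless of branch distance.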
  • 55. Fitness Function? class Triangle { void computeTriangleType() { if (a == b) { if (b == c) type = "EQUILATERAL"; else type = "ISOSCELES"; } else if (a == c) { type = "ISOSCELES"; } else { if (b == c) type = "ISOSCELES"; else checkRightAngle(); } System.out.println(type); } } 1 25 6 7 3 98 4 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. X1 = (2, 2, 2) Path(x1) = <1, 2, 3, 10> AL=2 f=2.5 X2 = (2, 3, 4) Path(x2) = <1, 5, 7, 9, 10> AL=0 d(a != b) = K = 1 BD (a != b) = 1/ (1+1) = 0.5 f(X1) = 2 + 0.5 = 2.5 a!=b a==b 55
  • 56. Fitness Function? class Triangle { void computeTriangleType() { if (a == b) { if (b == c) type = "EQUILATERAL"; else type = "ISOSCELES"; } else if (a == c) { type = "ISOSCELES"; } else { if (b == c) type = "ISOSCELES"; else checkRightAngle(); } System.out.println(type); } } 1 25 6 7 3 98 4 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. d(b == c) = abs(b-c) + K = 2 BD (b == c) = 2 / (2+1) = 0.66 f(X2) = 0 + 0.66 = 0.66 a!=b a==b X1 = (2, 2, 2) Path(x1) = <1, 2, 3, 10> AL=2 f=2.5 X2 = (2, 3, 4) Path(x2) = <1, 5, 7, 9, 10> AL=0 f=0.66 56
  • 57. x6 = (3,4,5) P ≈ 1/f = 1/0.66 ≈ 1.51 x2 = (2,3,4) P ≈ 1/f = 1/0.66 ≈ 1.51 x7 = (3,5,7) P ≈ 1/f = 1/0.75 ≈ 1.33 x8 = (6,8,4) P ≈ 1/f = 1/0.83 ≈ 1.20 x1 = (2,2,2) P ≈ 1/f = 1/2.50 ≈ 0.40 x5 = (2,2,3) P ≈ 1/f = 1/2.50 ≈ 0.40 x3 = (-2,3,6) P ≈ 1/f ≈ 0 x4 = (2,3,7) P ≈ 1/f ≈ 0 Roulette wheel selection 1) Assign to each test case a probability equal to 1/f (inverse of the fitness score) Roulette Wheel Selection
  • 58. x6 = (3,4,5) P ≈ 1/f = 1/0.66 ≈ 1.51 x2 = (2,3,4) P ≈ 1/f = 1/0.66 ≈ 1.51 x7 = (3,5,7) P ≈ 1/f = 1/0.75 ≈ 1.33 x8 = (6,8,4) P ≈ 1/f = 1/0.83 ≈ 1.20 x1 = (2,2,2) P ≈ 1/f = 1/2.50 ≈ 0.40 x5 = (2,2,3) P ≈ 1/f = 1/2.50 ≈ 0.40 x3 = (-2,3,6) P ≈ 1/f ≈ 0 x4 = (2,3,7) P ≈ 1/f ≈ 0 Roulette wheel selection 1) Assign to each test case a probability equal to 1/f (inverse of the fitness score) 2) Normalise the obtained probability 3) Each test case has a probability to be selected that is proportional to its slice in the roulette wheel Tot. = 6.35 0.23 0.23 0.20 0.18 0.06 0.06 x6 24% x2 24% x7 21% x8 19% x1 6% x5 6% x3 0% x4 0% Roulette wheel Roulette Wheel Selection
  • 59. x6 = (3,4,5) P ≈ 0.23 x2 = (2,3,4) P ≈ 0.23 x7 = (3,5,7) P ≈ 0.20 x8 = (6,8,4) P ≈ 0.18 x1 = (2,2,2) P ≈ 0.06 x5 = (2,2,3) P ≈ 0.06 x3 = (-2,3,6) P ≈ 0 x4 = (2,3,7) P ≈ 0 Roulette wheel selection 1) Assign to each test case a probability equal to 1/f (inverse of the fitness score) 2) Normalise the obtained probability 3) Each test case has a probability to be selected that is proportional to its slice in the roulette wheel x6 = (3,4,5) x2 = (2,3,4) x6 = (3,4,5) x1 = (2,2,2) x8 = (-3,0,-2) x2 = (2,3,4) x7 = (3,5,2) x7 = (3,5,2) Roulette Wheel Selection
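The three roulette-wheel steps above can be sketched compactly: score each test case by 1/f (we minimise f), normalise the scores into probabilities, then spin the wheel. The class name and the spin API are illustrative assumptions.

```java
import java.util.Random;

// Roulette-wheel selection: selection probability proportional to 1/f
// (fitness f is minimised, so lower f means a bigger slice).
public class Roulette {
    static double[] probabilities(double[] fitness) {
        double[] p = new double[fitness.length];
        double total = 0;
        for (int i = 0; i < fitness.length; i++) { p[i] = 1.0 / fitness[i]; total += p[i]; }
        for (int i = 0; i < p.length; i++) p[i] /= total;  // normalise to sum 1
        return p;
    }

    // spin the wheel once: walk the cumulative distribution
    static int spin(double[] probs, Random rng) {
        double r = rng.nextDouble(), acc = 0;
        for (int i = 0; i < probs.length; i++) { acc += probs[i]; if (r < acc) return i; }
        return probs.length - 1;
    }

    public static void main(String[] args) {
        double[] f = {2.5, 0.66, 0.75, 0.83};  // fitness values from the slide (x1, x2, x7, x8)
        double[] p = probabilities(f);
        System.out.println(p[1] > p[0]);       // fitter (lower f) test cases get bigger slices
    }
}
```

Infeasible test cases whose fitness tends to infinity (x3, x4 on the slide) would get a probability near 0, matching the slide's 0% slices.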
  • 60. One-point crossover (probability = 0.8) It takes two parents, cuts their chromosome strings at a randomly chosen position, and swaps the produced substrings to produce two new full-length chromosomes. x1 = (2,2,2) x2 = (2,3,4) x3 = (-2,3,6) x4 = (2,3,7) x5 = (2,2,3) x6 = (3,4,5) x7 = (3,5,7) x8 = (6,8,4) Parents - x2, x5 x3, x8 x4, x6 SEL SEL - SEL Offsprings x1 = (2,2,2) x2 = (2,2,3) x3 = (-2,3,4) x4 = (2,3,5) x5 = (2,3,4) x6 = (3,4,7) x7 = (3,5,7) x8 = (6,8,6) Cut-point - 1 2 2 1 2 - 2 Reproduction (Crossover)
  • 61. Mutation: randomly change some genes (elements within each chromosome) Mutation probability: 1/n where n = chromosome length Offsprings x1 = (2,2,2) x2 = (2,2,4) x3 = (-2,8,6) x4 = (2,4,7) x5 = (2,3,3) x6 = (3,3,5) x7 = (3,5,7) x8 = (6,3,4) Mutated Offsprings x1 = (2,2,2) x2 = (2,-1,4) x3 = (-2,8,0) x4 = (2,4,7) x5 = (2,3,3) x6 = (3,12,5) x7 = (3,5,7) x8 = (6,3,4) Reproduction (Mutation)
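Both reproduction operators on the (a, b, c) chromosomes can be sketched in a few lines; the crossover matches the cut-point semantics of the slide's table, while the mutation delta (a small random offset) is an illustrative assumption:

```java
import java.util.Arrays;
import java.util.Random;

// One-point crossover and per-gene mutation on fixed-length integer
// chromosomes (a, b, c), as on the slides.
public class Reproduction {
    // cut at position `cut`: the children swap the tails of the two parents
    static int[][] onePointCrossover(int[] p1, int[] p2, int cut) {
        int[] c1 = p1.clone(), c2 = p2.clone();
        for (int i = cut; i < p1.length; i++) { c1[i] = p2[i]; c2[i] = p1[i]; }
        return new int[][]{c1, c2};
    }

    // each gene mutates with probability 1/n, n = chromosome length;
    // the +/-5 delta is an illustrative assumption
    static int[] mutate(int[] c, Random rng) {
        int[] m = c.clone();
        for (int i = 0; i < m.length; i++)
            if (rng.nextInt(m.length) == 0) m[i] += rng.nextInt(11) - 5;
        return m;
    }

    public static void main(String[] args) {
        // reproduces the slide: parents x2 = (2,3,4), x5 = (2,2,3), cut-point 1
        int[][] kids = onePointCrossover(new int[]{2, 3, 4}, new int[]{2, 2, 3}, 1);
        System.out.println(Arrays.toString(kids[0]));  // [2, 2, 3]
        System.out.println(Arrays.toString(kids[1]));  // [2, 3, 4]
    }
}
```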
  • 62. Initial Population Mutation Crossover Selection End? YESNO One target approach: 1) Select one target (statement or branch) to cover 2) Run GAs until reaching the maximum search budget (max iterations) or when the target is covered (fitness function = 0) 3) Repeat from step (1) for a new target (statement or branch) One-Target Approach class Triangle { void computeTriangleType() { if (a == b) { if (b == c) type = "EQUILATERAL"; else type = "ISOSCELES"; } else if (a == c) { type = "ISOSCELES"; } else { if (b == c) type = "ISOSCELES"; else checkRightAngle(); } System.out.println(type); } } 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. target 62
  • 63. Limitations Some coverage targets may be infeasible Some coverage targets may be very difficult to achieve Since a limited search budget is available for test case generation: • Infeasible targets may consume the search budget without reaching any target • Difficult targets may use most of the search budget, leaving many easier coverage targets uncovered • The order in which targets are considered affects the final results How can multiple targets be solved at once? 63
  • 64. Techniques for Multiple Targets G. Fraser, A. Arcuri “Whole Test Suite Generation” IEEE Transactions on Software Engineering, 2013 A. Panichella, F. M. Kifetew, P. Tonella “Reformulating Branch Coverage as Many-Objective Optimization Problems” IEEE International Conference on Software Testing, Verification, and Validation (ICST), 2015 64
  • 66. Outline • Example research projects with industry partners: • Vulnerability testing (Banking) • Testing advanced driver assistance systems • Testing controllers (automotive) • Stress testing critical task deadlines (Energy) 66
  • 67. Vulnerability Testing [Appelt et al.], [Jan et al.] 67
  • 68. X-Force Threat Intelligence Index 2017 • [Bar chart of attack categories: Code injection 42%, Manipulated data structures 32%, Collect and analyze information 9%, Indicator 4%, Employ probabilistic techniques 3%, Manipulate system resources 3%, Subvert access control 3%, Abuse existing functionality 2%, Engage in deceptive… 2%] 68 https://www.ibm.com/security/xforce/ • More than 40% of all attacks were injection attacks (e.g., SQLi)
  • 70. Web Applications 70 Web form str1 str2 Username Password OK SQL query SELECT * FROM Users WHERE (usr = ‘str1’ AND psw = ‘str2’) Name Surname … John Smith … Result Server SQL DatabaseClient
  • 71. Injection Attacks 71 SQL query Name Surname … Aria Stark … John Snow … … … … Query result SELECT * FROM Users WHERE (usr = ‘’ AND psw = ‘’) OR 1=1 -- Server SQL DatabaseClient Web form ‘) OR 1=1 -- Username Password OK
  • 73. Web Application Firewalls (WAFs) 73 Servermalicious malicious malicious legitimate WAF
  • 74. WAF Rule Set 74 Rule set of Apache ModSecurity https://github.com/SpiderLabs/ModSecurity
  • 76. Grammar-based Attack Generation • BNF grammar for SQLi attacks • Random strategy: randomly selected production rules are applied recursively until only terminals are left • Random strategy not efficient for bypassing attacks that are difficult to find • Machine learning? Search? • How to guide the search? How can ML help? 76
  • 77. Anatomy of SQLi Attacks 77 • Bypassing attack: ‘ OR“a”=“a”# • [Derivation tree: <START> derives the attack through non-terminals such as <sqliAttack>, <sQuoteContext> (‘), <wsp> (_), <boolAttack>, <opOR> (OR), <boolTrueExpr>, <binaryTrue> (<dq><ch><dq> <opEq> <dq><ch><dq>, i.e., “a”=“a”), and <cmt> (#)] • The attack ‘ _ OR”a”=“a” # is decomposed into a set S = { … } of attack slices
  • 78. Learning Attack Patterns 78 S1 S2 S3 S4 … Sn Outcome A1 1 1 0 0 … 0 Passed A2 0 1 0 0 … 0 Blocked … … … … … … … … Am 1 1 1 1 … 1 Blocked Training Set PassedBlocked S4 YesNo YesNo YesNo S3 S2 Decision Tree Sn S1 … • Random trees • Random forest
  • 79. Learning Attack Patterns 79 S1 S2 S3 S4 … Sn Outcome A1 1 1 0 0 … 0 Passed A2 0 1 0 0 … 0 Blocked … … … … … … … … Am 1 1 1 1 … 1 Blocked PassedBlocked S4 YesNo YesNo YesNo S3 S2 Sn S1 … Training Set Decision Tree Attack Pattern S2 ∧ ¬ Sn ∧ S1
  • 80. Machine Learning Generating Attacks via ML and EAs 80 Evolutionary Algorithm (EA) Iteratively refine successful attack conditions PassedBlocked S4 YesNo YesNo YesNo S3 S2 Sn S1 …
  • 81. Some Results 81 • [Plots: number of distinct attacks found over time, for Apache ModSecurity and for industrial WAFs] • Machine learning-driven attack generation led to more distinct, successful attacks being discovered faster
  • 82. Related Work • Automated repair of WAFs • Automated testing targeting XML and SQL injections in web applications • Automated detection of malicious SQL statements 82
  • 83. Testing Advanced Driving Assistance Systems [Ben Abdessalem et al.] 83
  • 84. Cyber-Physical Systems • A system of collaborating computational elements controlling physical entities 84 84
  • 85. Advanced Driver Assistance Systems (ADAS) 85 Automated Emergency Braking (AEB) Pedestrian Protection (PP) Lane Departure Warning (LDW) Traffic Sign Recognition (TSR)
  • 86. Automotive Environment • Highly varied environments, e.g., road topology, weather, building and pedestrians … • Huge number of possible scenarios, e.g., determined by trajectories of pedestrians and cars • ADAS play an increasingly critical role • A challenge for testing 86
  • 87. Advanced Driver Assistance Systems (ADAS) Decisions are made over time based on sensor data 87 Sensors Controller Actuators Decision Sensors /Camera Environment ADAS
  • 88. A General and Fundamental Shift • Increasingly so, it is easier to learn behavior from data using machine learning, rather than specify and code • Deep learning, reinforcement learning … • Example: Neural networks (deep learning) • Millions of weights learned • No explicit code, no specifications • Verification, testing? 88
  • 89. CPS Development Process 89 Functional modeling: • Controllers • Plant • Decision Continuous and discrete Simulink models Model simulation and testing Architecture modelling • Structure • Behavior • Traceability System engineering modeling (SysML) Analysis: • Model execution and testing • Model-based testing • Traceability and change impact analysis • ... (partial) Code generation Deployed executables on target platform Hardware (Sensors ...) Analog simulators Testing (expensive) Hardware-in-the-Loop Stage Software-in-the-Loop Stage Model-in-the-Loop Stage 89
  • 90. Automotive Environment • Highly varied environments, e.g., road topology, weather, building and pedestrians … • Huge number of possible scenarios, e.g., determined by trajectories of pedestrians and cars • ADAS play an increasingly critical role • A challenge for testing 90
  • 91. Our Goal • Developing an automated testing technique for ADAS 91 • To help engineers efficiently and effectively explore the complex test input space of ADAS • To identify critical (failure-revealing) test scenarios • Characterization of input conditions that lead to most critical situations, e.g., safety violations
  • 92. 92 Automated Emergency Braking System (AEB) 92 “Brake-request” when braking is needed to avoid collisions Decision making Vision (Camera) Sensor Brake Controller Objects’ position/speed
  • 93. Example Critical Situation • “AEB properly detects a pedestrian in front of the car with a high degree of certainty and applies braking, but an accident still happens where the car hits the pedestrian with a relatively high speed” 93
  • 94. Testing ADAS 94 A simulator based on Physical/Mathematical models On-road testing Simulation-based (model) testing
  • 95. Testing via Physics-based Simulation 95 ADAS (SUT) Simulator (Matlab/Simulink) Model (Matlab/Simulink) ▪ Physical plant (vehicle / sensors / actuators) ▪ Other cars ▪ Pedestrians ▪ Environment (weather / roads / traffic signs) Test input Test output time-stamped output
  • 96. AEB Domain Model • [UML class diagram: a Test Scenario (simulationTime: Real, timeStep: Real) aggregates a static input part — Weather (visibility: VisibilityRange, fog: Boolean, fogColor: FogColor; subclasses Normal, Rain with rainType: RainType, Snow with snowType: SnowType) and Road (frictionCoeff: Real; subclasses Straight, Curved with radius: CurvedRadius, Ramped with height: RampHeight) — a dynamic input part — the Vehicle (v_c0) and the Pedestrian (x_p0, y_p0, v_p0, θ_p0), both mobile objects with position vectors (x, y) — and an output part — AEB Output (TTC: Real, certaintyOfDetection: Real, braking: Boolean) and output functions F1, F2 • OCL constraint on Weather: {self.fog=false implies self.visibility = “300” and self.fogColor=None} • Enumerations: RainType (ModerateRain … ExtremeRain), SnowType (ModerateSnow … ExtremeSnow), FogColor (DimGray, Gray, DarkGray, Silver, LightGray, None), CurvedRadius (5–40 m), RampHeight (4–12 m), VisibilityRange (10–300 m)]
  • 97. ADAS Testing Challenges • Test input space is large, complex and multidimensional • Explaining failures and fault localization are difficult • Execution of physics-based simulation models is computationally expensive 97
  • 98. Our Approach • We use decision tree classification models • We use a multi-objective search algorithm (NSGAII) • Objective Functions: 1. Minimum distance between the pedestrian and the field of view 2. The car speed at the time of collision 3. The probability that the detected object is a pedestrian • Each search iteration calls the simulation to compute the objective functions • Input values required to perform the simulation: Precipitation, Fogginess, Road shape, Visibility range, Car speed, Person speed, Person position, Person orientation 98
  • 99. Multiple Objectives: Pareto Front 99 Individual A Pareto dominates individual B if A is at least as good as B in every objective and better than B in at least one objective. Dominated by x F1 F2 Pareto front x • A multi-objective optimization algorithm (e.g., NSGA II) must: • Guide the search towards the global Pareto-Optimal front. • Maintain solution diversity in the Pareto-Optimal front.
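The dominance relation defined on this slide is a small predicate; a minimal sketch for minimisation problems (class and method names are illustrative):

```java
// Pareto dominance for minimisation: A dominates B if A is at least as good
// as B in every objective and strictly better in at least one.
public class Pareto {
    static boolean dominates(double[] a, double[] b) {
        boolean strictlyBetter = false;
        for (int i = 0; i < a.length; i++) {
            if (a[i] > b[i]) return false;          // worse in some objective
            if (a[i] < b[i]) strictlyBetter = true; // strictly better somewhere
        }
        return strictlyBetter;
    }

    public static void main(String[] args) {
        System.out.println(dominates(new double[]{1, 2}, new double[]{2, 2}));  // true
        System.out.println(dominates(new double[]{1, 3}, new double[]{2, 2}));  // false: a trade-off
    }
}
```

Solutions dominated by no other solution form the Pareto front; NSGA-II ranks the population by repeatedly peeling off such non-dominated fronts while preserving diversity within each front.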
  • 100. Search-based Testing Process 100 Test input generation (NSGA II) Evaluating test inputs - Select best tests - Generate new tests (candidate) test inputs - Simulate every (candidate) test - Compute fitness functions Fitness values Test cases revealing worst case system behaviors Input data ranges/dependencies + Simulator + Fitness functions defined based on Oracles
  • 101. Search: Genetic Evolution 101 Initial input Fitness computation Selection Breeding
  • 102. Better Guidance • Fitness computations rely on simulations and are very expensive • Search needs better guidance 102
  • 103. Decision Trees 103 Partition the input space into homogeneous regions All points Count 1200 “non-critical” 79% “critical” 21% “non-critical” 59% “critical” 41% Count 564 Count 636 “non-critical” 98% “critical” 2% Count 412 “non-critical” 49% “critical” 51% Count 152 “non-critical” 84% “critical” 16% Count 230 Count 182 vp 0 >= 7.2km/h vp 0 < 7.2km/h ✓p 0 < 218.6 ✓p 0 >= 218.6 RoadTopology(CR = 5, Straight, RH = [4 12](m)) RoadTopology (CR = [10 40](m)) “non-critical” 31% “critical” 69% “non-critical” 72% “critical” 28%
  • 104. Genetic Evolution Guided by Classification 104 Initial input Fitness computation Classification Selection Breeding
  • 105. Search Guided by Classification 105 Test input generation (NSGA II) Evaluating test inputs Build a classification tree Select/generate tests in the fittest regions Apply genetic operators Input data ranges/dependencies + Simulator + Fitness functions defined based on Oracles (candidate) test inputs - Simulate every (candidate) test - Compute fitness functions Fitness values Test cases revealing worst case system behaviors + A characterization of critical input regions
  • 106. NSGAII-DT vs. NSGAII 106 • [Plots of the HV, GD, and SP quality indicators over time (h), comparing NSGAII-DT and NSGAII] • NSGAII-DT outperforms NSGAII
  • 107. Automatic Generation of System Test Cases from Requirements in Natural Language [Wang et al.] 107
  • 108. Problem Automatically verify the compliance of software systems with their functional requirements in a cost-effective way 108
  • 111. Use Case Specifications (RUCM template) Concise Mapping Table Domain Model Automated Generation Regex Mapping weight=[d+] Sensor.setWeight initialized=true System.start 111 Executable Test Cases
  • 112. Use Case Specifications (RUCM template) Concise Mapping Table Domain Model Automated Generation Regex Mapping weight=[d+] Sensor.setWeight initialized=true System.start 112 Executable Test Cases NL sentences to formulae
  • 113. Use Case Specifications Example BodySense: embedded system that determines the occupancy status of seats in a car 113
  • 114. Use Case Specifications Example Precondition: The system has been initialized Basic Flow 1. The SeatSensor SENDS the weight TO the system. 2. INCLUDE USE CASE Self Diagnosis. 3. The system VALIDATES THAT no error has been detected. 4. The system VALIDATES THAT the weight is above 20 Kg. 5. The system sets the occupancy status to adult. 6. The system SENDS the occupancy status TO AirbagControlUnit. --written according to RUCM template-- 114
  • 115. 115 Precondition: The system has been initialized Basic Flow 1. The SeatSensor SENDS the weight TO the system. 2. INCLUDE USE CASE Self Diagnosis. 3. The system VALIDATES THAT no error has been detected. 4. The system VALIDATES THAT the weight is above 20 Kg. 5. The system sets the occupancy status to adult. 6. The system SENDS the occupancy status TO AirbagControlUnit. Alternative Flow RFS 4. 1. IF the weight is above 1 Kg THEN 2. The system sets the occupancy status to child. 3. ENDIF. 4. RESUME STEP 6.
  • 116. UseCaseStart Input Condition Condition Output Exit Condition Internal Internal Include INCLUDE USE CASE Self Diagnosis. IF the weight is above 1 Kg THEN The SeatSensor SENDS the weight TO the system. The system sets the occupancy status to adult. The system SENDS the occupant class TO AirbagControlUnit. The system VALIDATES THAT no error has been detected. The system sets the occupancy status to child. The system VALIDATES THAT the weight is above 20 Kg. Precondition: The system has been initialized. Model-based Test Case Generation driven by coverage criteria 116
  • 117. Domain Model: Formalizing Conditions Manually written OCL constraint: “The system VALIDATES THAT no error has been detected.” Error.allInstances()->forAll( i | i.isDetected = false) 117
  • 118. UseCaseStart Input Condition Condition Output Exit Condition Internal Internal Include INCLUDE USE CASE Self Diagnosis. IF the weight is above 1 Kg THEN The SeatSensor SENDS the weight TO the system. The system sets the occupancy status to adult. The system SENDS the occupant class TO AirbagControlUnit. The system VALIDATES THAT no error has been detected. The system sets the occupancy status to child. The system VALIDATES THAT the weight is above 20 Kg. Precondition: The system has been initialized. OCL OCL OCL OCL System.allInstances()->forAll( s | s.initialized = true ) AND System.allInstances()->forAll( s | s.initialized = true ) AND Error.allInstances()->forAll( e | e.isDetected = false) AND System.allInstances() ->forAll( s | s.occupancyStatus = Occupancy::Adult ) Path condition: Constraint Solving Test inputs: 118
  • 119. Automated Generation of OCL Expressions “The system VALIDATES THAT no error has been detected.” Error.allInstances()->forAll( i | i.isDetected = false) OCLgen 119
  • 120. Error.allInstances()->forAll( i | i.isDetected = false) EntityName left-hand side (variable) right-hand side (variable/value) operator Pattern 120
  • 121. OCLgen solution “The system sets the occupancy status to adult.” actor affected by the verb final state 1. determine the role of words in a sentence 121
  • 122. OCLgen solution “The system sets the occupancy status to adult.” actor affected by the verb final state 2. match words in the sentence with concepts in the domain model 1. determine the role of words in a sentence 122
  • 123. OCLgen solution “The system sets the occupancy status to adult.” BodySense.allInstances() ->forAll( i | i.occupancyStatus = Occupancy::Adult) actor affected by the verb final state 2. match words in the sentence with concepts in the domain model 3. generate the OCL constraint using a verb-specific transformation rule 1. determine the role of words in a sentence 123
  • 124. OCLgen solution “The system sets the occupancy status to adult.” BodySense.allInstances() ->forAll( i | i.occupancyStatus = Occupancy::Adult) 1. determine the role of words in a sentence (based on Semantic Role Labeling and on lexicons that describe the sets of roles typically associated with each verb) 2. match words in the sentence with concepts in the domain model (based on string similarity) 3. generate the OCL constraint using a verb-specific transformation rule 124
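Step 2, matching sentence phrases to domain-model concepts via string similarity, can be sketched as follows. The concept list, the threshold, and the use of `difflib` are illustrative assumptions, not OCLgen's actual implementation.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Normalized string similarity in [0, 1], case-insensitive."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Hypothetical domain-model concepts (names taken from the slides).
MODEL_CONCEPTS = ["BodySense", "occupancyStatus", "isDetected", "weight"]

def match_concept(phrase: str, threshold: float = 0.6):
    """Return the model concept most similar to a sentence phrase,
    or None if no concept is similar enough."""
    best = max(MODEL_CONCEPTS, key=lambda c: similarity(phrase, c))
    return best if similarity(phrase, best) >= threshold else None

print(match_concept("occupancy status"))  # -> occupancyStatus
```

Here the noun phrase “occupancy status” from the use case step maps to the `occupancyStatus` attribute even though the spellings differ.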
  • 125. Constraints Generation Process Execute SRL use case sentence text with SRL labels Select and apply verb-specific transformation rule • All rules share a common algorithmic structure • Rules differ for the SRL roles labels considered EntityName.allInstances()->forAll( i | i.LHS <Operator> RHS)125
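The shared algorithmic structure of the transformation rules amounts to instantiating the OCL template on the slide. A minimal sketch (the function name and the mapping for the “sets X to Y” rule are assumptions):

```python
def generate_ocl(entity: str, lhs: str, operator: str, rhs: str) -> str:
    """Instantiate the common OCL pattern from the slides:
    EntityName.allInstances()->forAll( i | i.LHS <Operator> RHS )"""
    return f"{entity}.allInstances()->forAll( i | i.{lhs} {operator} {rhs} )"

# 'sets X to Y' rule: SRL actor -> entity, affected attribute -> LHS,
# final state -> RHS, with '=' as the operator.
print(generate_ocl("BodySense", "occupancyStatus", "=", "Occupancy::Adult"))
```

A verb-specific rule thus only decides which SRL roles fill which template slots; the surrounding OCL scaffolding never changes.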
  • 126. Schedulability Analysis and Stress Testing [Di Alesio et al.] 126
  • 127. Problem and Context • Schedulability analysis encompasses techniques that try to predict whether (critical) tasks are schedulable, i.e., meet their deadlines • Stress testing runs carefully selected test cases that have a high probability of leading to deadline misses • Stress testing is complementary to schedulability analysis • Testing is typically expensive, e.g., hardware in the loop • Finding stress test cases is difficult 127
  • 128. Finding Stress Test Cases is Hard 128 [Figure: two schedules on a 0-9 timeline; tasks j0, j1, j2 arrive at at0, at1, at2 and must finish before deadlines dl0, dl1, dl2; j1 can miss its deadline dl1 depending on when at2 occurs!]
  • 129. Challenges and Solutions • Ranges for arrival times form a very large input space • Task interdependencies and properties constrain what parts of the space are feasible • Solution: We re-expressed the problem as a constraint optimization problem and used a combination of constraint programming (IBM CPLEX) and meta-heuristic search (GA) 129
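A toy version of the search side of this solution can be sketched as follows. The three-task set and all numbers are invented, the fitness is a simple preemptive fixed-priority simulation, and the GA is deliberately basic; the actual approach formulates a constraint optimization problem solved with IBM CPLEX combined with a GA, which this sketch does not reproduce.

```python
import random

# Illustrative task set: (execution time, relative deadline, priority),
# where a lower priority number means higher priority.
TASKS = [
    (2, 4, 0),  # j0: highest priority
    (3, 6, 1),  # j1
    (2, 5, 2),  # j2: lowest priority, most exposed to interference
]

def lateness(arrivals):
    """Simulate preemptive fixed-priority scheduling on one core and
    return the worst lateness (finish time - absolute deadline)."""
    remaining = [TASKS[i][0] for i in range(3)]
    finish = [None] * 3
    t = 0
    while any(f is None for f in finish):
        ready = [i for i in range(3) if arrivals[i] <= t and finish[i] is None]
        if not ready:
            t += 1
            continue
        i = min(ready, key=lambda k: TASKS[k][2])  # highest-priority task runs
        remaining[i] -= 1
        t += 1
        if remaining[i] == 0:
            finish[i] = t
    return max(finish[i] - (arrivals[i] + TASKS[i][1]) for i in range(3))

def search(generations=50, pop_size=20, horizon=8, seed=1):
    """Toy GA over arrival times; fitness = worst lateness (maximized).
    A positive best lateness means a deadline miss was found."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, horizon) for _ in range(3)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lateness, reverse=True)
        survivors = pop[:pop_size // 2]            # elitist selection
        children = []
        for _ in range(pop_size - len(survivors)):
            a, b = rng.sample(survivors, 2)
            child = [rng.choice(g) for g in zip(a, b)]  # uniform crossover
            if rng.random() < 0.3:                      # mutation
                child[rng.randrange(3)] = rng.randint(0, horizon)
            children.append(child)
        pop = survivors + children
    best = max(pop, key=lateness)
    return best, lateness(best)

arrivals, worst = search()
print(arrivals, worst)
```

Even on this tiny example the fitness landscape depends on interleavings of arrivals, which hints at why the real problem needs CP to certify worst cases within the regions the GA identifies.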
  • 130. Constraint Optimization 130 Constraint Optimization Problem Static Properties of Tasks (Constants) Dynamic Properties of Tasks (Variables) Performance Requirement (Objective Function) OS Scheduler Behaviour (Constraints)
  • 131. Combining CP and GA 131 [Fig. 3 from Di Alesio et al.: overview of GA+CP, showing how solutions x, y, and z in the initial GA population evolve]
  • 132. Case Study 132 System monitors gas leaks and fire in oil extraction platforms [Architecture: drivers (software-hardware interface), control modules, alarm devices (hardware), multicore architecture, real-time operating system]
  • 133. Summary • We provided a solution for generating stress test cases by combining meta-heuristic search and constraint programming • Meta-heuristic search (GA) identifies high risk regions in the input space • Constraint programming (CP) finds provably worst-case schedules within these (limited) regions • Achieve (nearly) GA efficiency and CP effectiveness • Our approach can be used both for stress testing and schedulability analysis (assumption free) 133
  • 135. Other Industrial Projects • Delphi: Testing and verification of CPS Simulink models (e.g., controllers) [Matinnejad et al.] • SES: Hardware-in-the-Loop, acceptance testing of CPS [Seung et al.] • IEE: Testing timing properties in embedded systems [Wang et al.] • Luxembourg government: Generating representative, synthetic test data for information systems [Soltana et al.] 135
  • 136. Role of AI • Metaheuristic search: • Many test automation problems can be re-expressed into search and optimization problems • Machine learning: • Automation can be better guided and effective when learning from data: test execution results, fault detection … • Natural Language Processing: • Natural language is commonly used and is an obstacle to automated analysis and therefore test automation 136
  • 137. Search-Based Solutions • Versatile • Helps relax assumptions compared to exact approaches • Helps decrease modeling requirements • Scalability, e.g., easy to parallelize • Requires massive empirical studies • Search is rarely sufficient by itself 137
  • 138. Multidisciplinary Approach • Single-technology approaches rarely work in practice • Combine search with: • Machine learning • Solvers, e.g., CP, SMT • Statistical approaches, e.g., sensitivity analysis • System and environment modeling and simulation 138
  • 139. The Road Ahead • We need to develop techniques that strike a balance among scalability, practicality, and applicability, while offering a maximum level of dependability guarantees • We need more multi-disciplinary research involving AI • In most industrial contexts, offering absolute guarantees (correctness, safety, or security) is illusory • The best trade-off between cost and level of guarantees is necessarily context-dependent • Research in this field cannot be oblivious to context (domain …) 139
  • 141. Some Leading Researchers • A. Arcuri, Westerdals and U. of Luxembourg, Norway & Luxembourg • R. Feldt, Chalmers U., Sweden • G. Fraser, U. of Passau, Germany • M. Harman, Facebook and UCL, UK • T. Menzies, North Carolina State U., USA • P. McMinn, U. of Sheffield, UK • A. Panichella, Delft U., NL • M. Pezzè, P. Tonella, U. of Lugano, Switzerland • A. Zeller, CISPA and Saarland U., Germany 141
  • 142. Selected SBST References • McMinn, “Search-Based Software Testing: Past, Present and Future”, ICST 2011 • Harman et al., “Search-based software engineering: Trends, techniques and applications”, ACM Computing Surveys, 2012 • Fraser, Arcuri, “Whole Test Suite Generation”, IEEE Transactions on Software Engineering, 2013 • A. Panichella et al., “Reformulating Branch Coverage as Many-Objective Optimization Problems”, ICST 2015 • Ali et al., “Generating Test Data from OCL Constraints with Search Techniques”, IEEE Transactions on Software Engineering, 2013 • Hemmati et al., “Achieving Scalable Model-based Testing through Test Case Diversity”, ACM TOSEM, 2013 142
  • 143. Selected ML-driven Testing • Noorian et al., “Machine Learning-based Software Testing: Towards a Classification Framework”, SEKE 2011 • Briand et al., “Using machine learning to refine category-partition test specifications and test suites”, Information and Software Technology (Elsevier), 2009 • Appelt et al., “A Machine Learning-Driven Evolutionary Approach for Testing Web Application Firewalls”, IEEE Transactions on Reliability, 2018 • Machine learning session at ISSTA 2018! 143
  • 144. NLP-driven Testing • Wang et al., “Automatic generation of system test cases from use case specifications”, ISSTA 2015 • Wang et al., “Automated Generation of Constraints from Use Case Specifications to Support System Testing”, ICST 2018 • Mai et al., “A Natural Language Programming Approach for Requirements-based Security Testing”, ISSRE 2018 • Blasi et al., “Translating Code Comments to Procedure Specifications”, ISSTA 2018 • Arnaoudova et al., “The use of text retrieval and natural language processing in software engineering”, ICSE 2015 144
  • 145. Selected Industrial Examples • Matinnejad et al., “MiL Testing of Highly Configurable Continuous Controllers: Scalable Search Using Surrogate Models”, ASE 2014 • Di Alesio et al., “Combining genetic algorithms and constraint programming to support stress testing of task deadlines”, ACM Transactions on Software Engineering and Methodology, 2015 • Ben Abdessalem et al., “Testing Vision-Based Control Systems Using Learnable Evolutionary Algorithms”, ICSE 2018 • Soltana et al., “Synthetic Data Generation for Statistical Testing”, ASE 2017 • Shin et al., “Test case prioritization for acceptance testing of cyber-physical systems”, ISSTA 2018 145
  • 146. Selected Industrial Examples • Appelt et al., “A Machine Learning-Driven Evolutionary Approach for Testing Web Application Firewalls”, IEEE Transactions on Reliability, 2018 • Jan et al., “Automatic Generation of Tests to Exploit XML Injection Vulnerabilities in Web Applications”, IEEE Transactions on Software Engineering, 2018 • Wang et al., “System Testing of Timing Requirements Based on Use Cases and Timed Automata”, ICST 2017 • Wang et al., “Automated Generation of Constraints from Use Case Specifications to Support System Testing”, ICST 2018 146
  • 147. Artificial Intelligence for Automated Software Testing Lionel Briand ISSTA/ECOOP Summer School 2018