SEN4013 Software Verification and Validation: A Framework For Test and Analysis
Lecture 2
A Framework for Test and Analysis
Learning Objectives
• Introduce dimensions of, and trade-offs between, test
and analysis activities
• Distinguish validation from verification activities
• Understand limitations and possibilities of test and
analysis
2
Software Test
• Purpose of software test and analysis
• Assess software quality
• Improve software by finding defects
• Dependability of software
• Correctness
• Consistency of implementation with specification
• Reliability
• Likelihood of correct functioning
• Robustness
• Acceptable behavior in unusual circumstances
• Safety
• Absence of unacceptable behaviors
3
Software Test
• There is no perfect test or analysis technique.
• There is no single “best” technique for all circumstances.
• Testing techniques
• exist in a complex space of trade-offs
• have complementary strengths and weaknesses.
4
First Computer Bug
• In 1947, Harvard University was operating a room-sized computer called the Mark II.
• Mechanical relays
• Glowing vacuum tubes
• Technicians programmed the computer by reconfiguring it
• Technicians had to change the occasional vacuum tube.
• A malfunction was traced to a moth trapped in one of the relays; the technicians taped it into the logbook as the “first actual case of bug being found.”
5
Bugs – Errors – Faults
• Pac-Man (1980)
• Shows a garbled “split screen” at level 256
• Cause: integer overflow
• The level counter is stored in 8 bits, so the maximum representable value is 255; level 256 wraps around
6
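The wrap-around can be sketched in a few lines of Java (a hypothetical model of the counter, not the original arcade code; the method name nextLevel is invented for illustration):

```java
public class LevelCounter {
    // Model an 8-bit level counter: only the low 8 bits survive,
    // so advancing past 255 wraps back to 0.
    static int nextLevel(int level) {
        return (level + 1) & 0xFF;
    }

    public static void main(String[] args) {
        System.out.println(nextLevel(254)); // 255: the last representable level
        System.out.println(nextLevel(255)); // 0: the "level 256" overflow
    }
}
```

In the game, the wrapped counter confuses the routine that draws the level fruit, which is what corrupts the right half of the screen.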
Bugs – Errors – Faults
• AT&T (1990)
• One switching system in New York City experienced an intermittent failure that caused a major service outage
• The first major network problem in AT&T’s 114-year history
• Cause: a misplaced break statement in C code
• Complete code coverage could have revealed this bug during testing
7
Bugs – Errors – Faults
• Ariane 5 flight 501 (1996)
• Destroyed 37 seconds after launch (cost: $370M)
• Cause: Arithmetic overflow
• Data conversion from a 64-bit floating point to 16-bit
signed integer value caused an exception
• The software from Ariane 4 was re-used for Ariane 5
without retesting
8
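The failure mode can be illustrated with Java’s narrowing conversion (a sketch only: the flight software was written in Ada, where the unguarded conversion raised an exception; the guard method toSigned16 is hypothetical, showing the range check the reused code lacked):

```java
public class Conversion {
    // Range-checked conversion from a 64-bit float to a 16-bit signed
    // integer: the kind of guard missing from the reused alignment code.
    static short toSigned16(double value) {
        if (value > Short.MAX_VALUE || value < Short.MIN_VALUE) {
            throw new ArithmeticException("out of 16-bit range: " + value);
        }
        return (short) value;
    }

    public static void main(String[] args) {
        System.out.println(toSigned16(30000.0)); // fits: prints 30000
        System.out.println((short) 100000.0);    // unchecked narrowing wraps to -31072
    }
}
```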
Bugs – Errors – Faults
• Mars Climate Orbiter (1998)
• Sent to Mars to relay signals from the Mars Polar Lander
• Crashed into the planet
• Cause: failure to convert between imperial and metric units
• The software that calculated the total impulse presented results in pound-seconds
• The system using these results expected its inputs to be in newton-seconds
9
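The missing step amounts to a one-line unit conversion (1 pound-force = 4.448222 newtons). A sketch with invented names, not any real interface:

```java
public class Impulse {
    static final double NEWTONS_PER_POUND_FORCE = 4.448222;

    // Convert thruster impulse from pound-seconds (what the ground software
    // produced) to newton-seconds (what the navigation software expected).
    static double poundSecToNewtonSec(double lbfSec) {
        return lbfSec * NEWTONS_PER_POUND_FORCE;
    }

    public static void main(String[] args) {
        // Consuming the raw number as if it were already newton-seconds
        // understates the impulse by a factor of about 4.45.
        System.out.println(poundSecToNewtonSec(100.0)); // 444.8222
    }
}
```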
Bugs – Errors – Faults
• THERAC-25 Radiation Therapy (1985)
• 3 cancer patients received fatal overdoses
• Cause:
• Mishandling of a race condition in the equipment’s software
10
Software Failure, Fault & Error
• Fault
• An incorrect portion of code
• May involve missing code as well as incorrect code
• A necessary (but not sufficient) condition for the occurrence of a failure
• Failure
• An observable incorrect behavior of a program
• Error
• The cause of a fault: something bad a programmer did (a conceptual mistake, a typo, etc.)
• Bug
• An informal term for a fault or failure
11
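The four terms can be pinned to a concrete (hypothetical) faulty method:

```java
public class Terminology {
    // Error: the programmer's mistaken belief that the loop should
    //        include index n.
    // Fault: the "<=" that resulted (it should be "<").
    static int sumFirstN(int[] a, int n) {
        int s = 0;
        for (int i = 0; i <= n; i++) { // fault: off-by-one
            s += a[i];
        }
        return s;
    }

    public static void main(String[] args) {
        // Failure: observable wrong output, 10 instead of the intended 1+2+3 = 6.
        System.out.println(sumFirstN(new int[] {1, 2, 3, 4}, 3));
        // No failure here: the extra element read is 0, so the output happens
        // to be correct. The fault is necessary but not sufficient for a failure.
        System.out.println(sumFirstN(new int[] {1, 2, 0, 4}, 2));
    }
}
```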
Approaches to Reduce Faults
• Manual code review
• Manually review the code to detect faults
• Limitations:
• Hard to evaluate your progress
• Can miss many faults/bugs
12
Approaches to Reduce Faults
• Verification:
• Does the software system meet the requirements specification?
• Are we building the software right?
18
Validation
• Does the software meet the user’s real needs? (Are we building the right software?)
• Fulfilling requirements is not the same as conforming to
a requirements specification.
• A specification is a statement about a particular
proposed solution to a problem, and that proposed
solution may or may not achieve its goals.
• Specifications are written by people, and therefore
contain mistakes.
• A system that meets its actual goals is useful, while a
system that is consistent with its specification is
dependable.
19
Verification
• Checking the consistency of an implementation
with a specification.
• Verification is a check of consistency between
implementation and specification, in contrast to
validation which compares a description (whether a
requirements specification, a design, or a running
system) against actual needs.
20
Validation and Verification
• Validation: compares actual requirements against the SW system
• Includes usability testing, user feedback
• Verification: compares the specification (specs) against the system
• Includes testing, inspections, static analysis
21
Validation and Verification
• Whether checking a property is verification or validation depends on how the specification states it.
• Example: an elevator that must respond “promptly” to a call can only be validated, while an elevator that must arrive “within 30 seconds” of a call can be verified.
23
V Model
• Verification activities check consistency between
descriptions (design and specifications) at adjacent
levels of detail, and between these descriptions
and code.
• Validation activities attempt to gauge whether the
system actually satisfies its intended purpose.
24
Dependability
• A system that meets its actual goals is useful, while
a system that is consistent with its specification is
dependable.
• Dependability properties include correctness,
reliability, robustness, and safety.
• Correctness: Absolute consistency with a
specification, always and in all circumstances.
28
Dependability
• Reliability: Statistical approximation to correctness,
expressed as the likelihood of correct behavior in
expected use.
• Robustness: Weighs properties as more and less
critical. Distinguishes which properties should be
maintained even under exceptional circumstances
in which full functionality cannot be maintained.
• Safety: A kind of robustness in which the critical
property to be maintained is avoidance of
hazardous behaviors.
29
Degrees of Freedom
• Given a precise specification and a program, it seems that
one ought to be able to arrive at some logically sound
argument or proof that a program satisfies the specified
properties.
• After all, if a civil engineer can perform mathematical
calculations to show that a bridge will carry a specified
amount of traffic, shouldn’t we be able to similarly apply
mathematical logic to verification of programs?
30
Verification and Undecidability
• Program testing is a verification technique and is as
vulnerable to undecidability as other techniques.
• Exhaustive testing, that is, executing and checking
every possible behavior of a program, would be a
“proof by cases,” which is a perfectly legitimate way to
construct a logical proof.
• BUT How long would this take?
• If we ignore implementation details such as the size of
the memory holding a program and its data, the
answer is “forever.”
• That is, for most programs, exhaustive testing cannot
be completed in any finite amount of time.
31
Exhaustive Testing
• Programs are executed on real machines with finite representations of
memory values.
• Consider the following trivial Java class:
class Trivial {
    static int sum(int a, int b) {
        return a + b;
    }
}
32
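To make the “forever” of the previous slide concrete: sum takes two 32-bit ints, so there are 2^32 × 2^32 = 2^64 input pairs. A back-of-the-envelope sketch (the test rate is an assumption for illustration):

```java
public class ExhaustiveCost {
    // Years needed to run all 2^64 test cases of Trivial.sum
    // at an optimistic rate of one test per nanosecond.
    static double yearsToTestExhaustively() {
        double inputs = Math.pow(2, 32) * Math.pow(2, 32); // 2^64 input pairs
        double testsPerSecond = 1e9;                        // one test per nanosecond
        return inputs / testsPerSecond / (365.0 * 24 * 3600);
    }

    public static void main(String[] args) {
        System.out.printf("%.0f years%n", yearsToTestExhaustively()); // roughly 585 years
    }
}
```

So even a one-line method defeats exhaustive testing in practice, let alone programs whose inputs are unbounded.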
Getting What You Need ...
• Verification techniques range from heavyweight and precise to lightweight and approximate:
• Theorem proving: perfect verification of arbitrary properties by logical proof or exhaustive testing; unbounded (potentially infinite) effort to verify general properties
• Model checking: decidable but possibly intractable checking of simple temporal properties
• Data flow analysis
• Typical testing techniques
• Precise analysis of simple syntactic properties
• Practical techniques trade precision for effort in three ways:
• Optimistic inaccuracy: we may accept some programs that do not possess the property (i.e., not all violations are detected), as in testing
• Pessimistic inaccuracy: the analysis is not guaranteed to accept a program even if the program does possess the property being analyzed, as in automated program analysis techniques
• Simplified properties: reduce the degrees of freedom by simplifying the property to check
34
Don’t know?
• Some analysis techniques may give a third possible
answer, “don’t know.”
• We can consider these techniques to be either
optimistic or pessimistic depending on how we
interpret the “don’t know” result.
• Perfection is unobtainable, but one can choose
techniques that err in only a particular direction.
35
Pessimistic
• A software verification technique that errs only in the
pessimistic direction is called a conservative analysis.
• It might seem that a conservative analysis would always be
preferable to one that could accept a faulty program.
• However, a conservative analysis will often produce a very
large number of spurious error reports, in addition to a few
accurate reports.
• A human may, with some effort, distinguish real faults from
a few spurious reports, but cannot cope effectively with a
long list of purported faults of which most are false alarms.
• Often only a careful combination of complementary optimistic and pessimistic techniques can reduce their respective weaknesses and produce acceptable results.
36
Third Dimension
• Substituting a property that is more easily checked or
constraining the class of programs that can be checked.
• Suppose we want to verify a property S, but we are not willing to accept the optimistic inaccuracy of testing for S, and a conservative analysis of S would produce so many spurious error messages that they are worthless.
• Suppose we know some property S’ that is a sufficient, but not necessary, condition for S (i.e., the validity of S’ implies S, but not the converse).
• Maybe S’ is so much simpler than S that it can be
analyzed with little or no pessimistic inaccuracy.
• If we check S’ rather than S, then we may be able to
provide precise error messages that describe a real
violation of S’ rather than a potential violation of S.
37
Substitution Example
• Each variable should be initialized with a value
before its value is used in an expression.
• In the C language, a compiler cannot provide a
precise static check for this property, because of
the possibility of code like the following:
38
Substitution Example (cont.)
• It is impossible in general to determine whether
each control flow path can be executed, and while
a human will quickly recognize that the variable
sum is initialized on the first iteration of the loop, a
compiler or other static analysis tool will typically
not be able to rule out an execution in which the
initialization is skipped on the first iteration.
• Java neatly solves this problem by making code like this illegal; the rule is that a variable must be initialized on all program control paths, whether or not those paths can ever be executed.
39
Summary
• Most interesting properties are undecidable, thus
in general we cannot count on tools that work
without human intervention
• Assessing program qualities comprises two
complementary sets of activities: validation (does
the software do what it is supposed to do?) and
verification (does the system behave as specified?)
• There is no single technique for all purposes: test
designers need to select a suitable combination of
techniques
40