
SE320

Software Verification &


Validation
Week 1: Overview

Fall 2024
First…
• Lecture is recorded!
• CCI auto-records all in-person class sections, which occasionally surprises
someone in week 6
Why Are We Here?
• (in this class)
• Many/most of you will go on to write software professionally
• You will be expected to produce software that (mostly) works as
expected
• How do you do that?
• What is “expected”?
• What is “works”?
• What is “mostly”?
Introduction
• Software Testing is a critical element of developing quality software
systems
• It is a systematic approach to judging quality and discovering bugs
• This course presents the theory and practice of software testing
• Topics covered include:
• Black-box and white-box testing, and related test data generation techniques
• Tools for software testing
• Performance testing basics
• Security testing basics
Course Objectives (Officially)
• Understand the concepts and theory related to software testing
• Understand the relationship between black-box and white-box
testing, and know how to apply as appropriate
• Understand different testing techniques used for developing test
cases and evaluating test adequacy
• Learn to use automated testing tools in order to measure code
coverage
Course Objectives (Unofficially)
• Give you a toolbox of approaches to thinking about software quality
• Make sure you understand their trade-offs, to pick the right tool for
the job
• Give you a preview of software quality techniques that are seeing
increasing adoption or increasing relevance
• Including just a taste of more speculative techniques
Course Trivia
• Instructor: Dr. Colin S. Gordon
• Email: [email protected]
• Office: 1174
• Office Hours:
• TBD
• By appointment (do not hesitate to do this!)
Why Am I Teaching This Class?
• Because I’m obsessed with things not working
Why Am I Teaching This Class?
Over time, I’ve wondered about:
• “Why doesn’t my garbage collector work?”
• “Why doesn’t my OS kernel work?”
• “Why don’t (any) concurrent programs work?”
• “Why doesn’t my compiler work?”
• Now I wonder about lots of things, but mostly about general
approaches to prevent or detect software defects
Teaching Assistant
• Binh Le ([email protected]) &
Raunaq Malhotra ([email protected])
• TA Hours:
• 6-8pm Tuesdays (Raunaq)
• 6-8pm Wednesdays (Binh)
• 4-6pm Thursdays (Binh)
• If you have slides from prior runs, please delete them
• They’re out of date
• They sometimes result in people emailing *last year’s TAs* for help…
??? Textbook
• The originally intended text was Effective
Software Testing (2021) by Mauricio Aniche
• Roughly half the course is well-covered by
this book
• The other half of the course is not well-covered by any generalist testing book
• Some material will only be in the lecture
notes
• Library link in syllabus; not the official text
due to… software bugs
Logistics
• Submissions either through BBLearn or Gradescope – TBD!
• Please make sure you hand in all files!
• Need help?
• Office hours: TAs or professor
• General questions about course material, assignments: Discord
• Other questions, like trouble specific to your code: don’t share in Discord!
• Email both the TAs and professor! This improves response time.
• Discord is just for our section of SE320
Lecture Logistics
• Questions during lecture?
• Raise hand
• Can just talk! “Excuse me, can you explain…”
• Please try to avoid eating during lecture if possible
• Generally distracting to peers
• Sometimes really is necessary if your schedule is extreme
Intended Audience
• This course is intended for undergraduate students in Software Engineering
and Computer Science
• If you’re from another major (DS, IS, Math, Game Design, etc.), welcome!
• Pre-requisite: CS260 Data Structures, CS181 Software Engineering
• If you’re here without one of these, email me ASAP
• If you need to brush up on Java, do so now
• Your first assignment goes out next week, though it only uses basic Java
• Every year someone who needs to do this, and knows it, doesn’t do it and sets
themselves up for a rough term.
• This includes Java generics.
Attendance
• The University requires attendance
• But, DO NOT come to class if you’re sick!
• I realize these are conflicting messages. But really:
• By default come to class
• If you’re feeling under the weather, be responsible and stay home
• Drop me an email and I’ll make sure you can access the lecture recordings
Grading (Tentative!)
• 5 Assignments (80%)
• Assignment 1 (20%): Open/Closed-box Review & Extension
• Assignment 2 (20%): Design by Contract, Property-Based Testing, & Coverage
• Assignment 3 (20%): Object-Oriented Testing with Mocks (Mockito)
• Assignment 4 (10%): Performance Testing
• Assignment 5 (10%): GUI Testing with Selenium
• Final Report (20%)

Term Grades
• Points-to-letter conversion in the syllabus
• I do not intend to curve the course, homework, or final
Grading Rules
• All grades are final
• There will be no extra credit assignments
• All late work will receive a reduced grade
• -10% per week late
• Maximum of 2 weeks late (after that, no credit)
• Last two assignments are close to the end of the term, and therefore have reduced
late periods
• If you hand in before the deadline, you may not hand in again after
• No extensions will be given beyond the end of the term
• No collaboration is permitted during the exam, and assignments are
individual
Extensions
• I generally prefer not to grant extensions
• But:
• If you have a good reason (e.g., presenting at a conference, student research
competition, interview) and give sufficient notice, I’m open to extensions.
• The tentative deadlines (on Thursdays) are on the syllabus right now.
• If you ask for a last-minute extension for something you would have known about for
a while, you’ll not get an extension
• Of course there are always emergencies.
• My view is that there are many good reasons an extension may be needed,
and I don’t want you to be penalized for unexpected life events. If you think
you have a legitimate reason, ask! The worst I’ll say is no.
Extensions
• To give you a little wiggle room: you have one no-questions-asked 3-day extension on any homework assignment.
• You use it by simply informing me that you’re planning to use it.
• No permission required!
• Applies to any assignment, even at end of term, and the late policy
starts from the end of the 3 days.
Academic Honesty
• The University’s academic honesty policy is in effect in this course.
Please consult the student handbook for details.
• Higher order bit: Do not hand in work that is not your own, or not
solely your own (modulo help from the professor and TA)
• If you’re not sure if something is cheating, ASK FIRST!
• You’re welcome to help each other understand assignments, but you
shouldn’t be working out pseudocode / test inputs together.
• Cheating is easier to catch than you think
• Even first-time TAs catch it

If you cheat in this class, you will fail the class.


Cheating vs. Extensions
I recognize that most cheating is not mere laziness, but a combination of:
• Didn’t realize how much work it was, started too late.
• Close to the deadline, things are a mess, better to cheat and get the grade
• Viewing things as: grades are most important because they unlock future
opportunities, and you can always learn stuff later
I recognize that these pressures exist, which is why there is a late policy, and
a fairly flexible extension policy.
My intent is for you to have enough legitimate flexibility that these shouldn’t
motivate you to cheat; hence the strict penalty.
Malicious Cheating
Of course, some cheating is intentional.

A few previous years’ students had their code public on GitHub.

• I have downloaded their code.
• I have emailed them to remove it.
• GitHub stats say some of those repositories had already been cloned!
All student submissions will be run through MOSS, including
comparisons to prior years’ homeworks, to find cheating.
• If you downloaded that code, best to just delete it.
AI Assistants: No
• Use of AI assistants in this class is considered cheating
• Reason: It is not indicative of your personal skills
• But this shouldn’t bother you: they actually suck at testing
• They are very big, complicated, interesting(!), expensive(!!) next-token-predictors
• They do not “understand” your code or your spec
• All they can do is spit out code that is “similar to” other code near words “similar to” what you ask it to do
• You’ll see that testing code thoroughly requires careful thought and
consideration about subtle boundary cases, what the code is supposed to do,
and how. These tools can’t do that.
But I used ChatGPT/Tabnine/etc. to test my code
in another class / on co-op and it worked!
• I’m not saying these tools are useless; in fact they can be quite useful
• But you can only spot them going off the rails if you know how to do
these things manually
• Plus, human beings are notoriously bad at code review!
• They can do pretty well for simple examples
• E.g., all the student course projects on GitHub used as training data
• They start to do poorly when faced with novel domains or properties
• Also: they sometimes memorize and regurgitate code. If you use an
LLM to generate code that happens to match public code on GitHub,
on my end this is indistinguishable from plagiarism
• This is not hypothetical; I have multiple colleagues who have encountered
this!
So Why Not Teach Us to Use Them
Effectively?
• We will loop back to these later in the term
• This course is unlike many other CS and SE courses
• The hard part is not writing the code
• The hard part is understanding how to think carefully about
specifications
• Jumping straight to AI assistants short-circuits that.
• Here’s a marketing video of someone using one of these tools to
generate a test that doesn’t make sense. They emailed me this today.
• https://www.youtube.com/watch?v=Xn0vPH0S3Tw&t=36s
• https://docs.tabnine.com/main/software-development-with-tabnine/accelerate-unit-testing
Course Overview
What is This Course About?
Verification
How do we ensure software satisfies its requirements?

Validation
How do we ensure the software requirements satisfy the
system’s intended use?
In other words…
Are we building the product correctly?
Are we building the correct product?
Software
A software product is composed of more than just code:
• Administrator manuals
• End-user guides
• Training materials
• Data (databases, audio, video, localization. . . )
When we talk about validating software, we really mean all of these
things.
Software (cont.)
We’ll focus on just the software component: It’s the most technically
challenging.
In the real world, these other components matter as much or more
than the software, and they are not trivial! Respect the work it takes to
do those well!
• Can a non-technical user distinguish between incorrect
documentation, unusable interfaces, and broken functionality?
Who is “We”?
I’ve been speaking for some time now about things “we” can do. . .
who is this “we”?
• It’s not the royal “we”
• It’s not the academic “we”
• It’s US! As developers and testers. . .
Personnel Roles in Software
• Historically, software
development has been rigidly
structured
• Separate roles for manager, architect, programmers, testers, …
• Increasingly not the case
• *Especially* in smaller teams and
startups!
• So what’s a tester now?
You May Be A Tester If…
• Your job requirements include identifying software defects
• Your job requirements include producing working code
• Your job requirements include producing secure code
• Your job requirements include. . . code.

Today
This is most of a development team
How Important Is Software Testing?

What do you think?


Does testing catch bugs that matter?
Medical Systems
Some of the most serious software failures have occurred in medical
settings:
• The Therac-25 radiotherapy machine malfunctioned, causing massive
overdoses of radiation to patients. (More in a moment)
• Pacemakers and several hundred other medical devices have been
recalled due to faulty firmware/software
• Some contained security flaws that would allow a malicious passer-by to e.g.,
shut off or overload a pacemaker. . .
• Medication records used for distributing medication throughout a
hospital become inaccessible if e.g., the pharmacy database goes
down. . .
Therac-25 Radiation Therapy
• Between 1985 and 1987, multiple cancer patients received overdoses of more than 100 times the intended dose
• A number of them died
Therac-25 (cont.)
The cause?
• Earlier hardware versions had a hardware interlock that shut off the
machine if software requested a dangerous dose.
• Software on the earlier version never checked dosage safety;
hardware checks masked the software bug
• Newer hardware removed the check (cost…)
• The software was not properly tested on the new hardware
• Basis: it “worked” on the earlier hardware, which was almost the same
• Other issues contributed as well
• Nancy Leveson wrote the canonical report
Boeing 737 MAX
• Two Boeing 737 MAX planes crashed, in 2018 and 2019
• Many things went wrong, but software was one aspect
• MCAS (Maneuvering Characteristics Augmentation System) responds to a
sensor indicating plane’s angle
• If the plane angles up too far (risking a stall), MCAS automatically tilts the horizontal
tail to push the plane down
• Doing this repeatedly is obviously bad, so there is a limit on how much MCAS can
change this
• BUT: software reset the limit tracking every time a pilot made any response, allowing
MCAS to tilt the plane again
• Combined with faulty sensors....
• (There should have also been 2 sensors for redundancy, not 1)
Mars Climate Orbiter
• In 1999, NASA launched the Mars Climate Orbiter
• It cost $125 million (>$222 million in 2022 USD)
• The spacecraft spent 286 days traveling to Mars
• Then it overshot...
• Lockheed Martin used English units
• NASA JPL used metric units
• The spec didn’t specify units, and nobody checked that the teams
agreed.
Safety-Critical Software
• These examples are, or are close to, being safety-critical systems
• Systems where malfunctions can harm people’s safety, directly
• Aerospace and nuclear reactors have clear safety consequences
• Design software for nuclear plants isn’t quite safety-critical, but close
• Phone systems are critical (think: emergency calls)
• Lots of other software still has major bugs
Knight Capital
• On August 1, 2012, Knight Capital deployed untested code to their
production high frequency trading servers.
• Well, 7 out of 8
• The update reused an old setting that previously enabled some code
to simulate market movements in testing
• When the “new” setting was enabled, it made the server with the old
code act as if the markets were highly volatile
• The resulting trades lost the company $440 million immediately
• They barely stayed in business after recruiting new investors
Heartbleed
• Classic buffer overrun found in 2014
• OpenSSL accepted heartbeat requests that asked
for too much data
• Server returned, e.g., private encryption keys
• Affected nearly every version of Linux (including
Android) — most computers on the internet
• Don’t worry, Mac got Shellshock a few months later
• And shortly thereafter, Windows suffered similar bugs
• Now all major bugs come with logos and catchy
names :-)
Ethereum “DAO Heist”
• Heard of cryptocurrency (e.g., Bitcoin)?
• Ethereum includes smart contracts — objects whose state and code is
stored in the blockchain
• Accounts can expend small amounts to interact with smart contracts
• Smart contracts can manage ether (currency)
• Someone built an automated investment contract
• Someone else figured out how to withdraw more than they invested, and stole ~$150 million
• Cause: allowing recursive calls to transfer funds before deducting from the available client balance (see the sketch below)
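A minimal sketch of that defect pattern, written in Java purely for illustration (the real contract was Solidity; the Wallet interface and class names here are hypothetical). Because the balance is zeroed only after the external call, a malicious wallet that re-enters withdraw() from inside receive() is paid repeatedly:

import java.util.HashMap;
import java.util.Map;

public class ReentrancySketch {
    // The callee is untrusted code that may call back into us.
    interface Wallet { void receive(int amount); }

    private final Map<Wallet, Integer> balances = new HashMap<>();

    public void deposit(Wallet w, int amount) {
        balances.merge(w, amount, Integer::sum);
    }

    public void withdraw(Wallet caller) {
        int balance = balances.getOrDefault(caller, 0);
        if (balance > 0) {
            caller.receive(balance);  // defect: external call happens first...
            balances.put(caller, 0);  // ...deduction happens second
        }
        // A re-entrant call to withdraw() from inside receive() sees the
        // old, undeducted balance and gets paid again.
    }
}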
fMRI Bugs
• Eklund et al. discovered the statistics software used in most fMRI
studies and diagnoses was never properly tested
• Eklund, Nichols, and Knutsson. Cluster Failure: Why fMRI Inferences for
Spatial Extent have Inflated False-Positive Rates. PNAS July 2016.
• They found that errors in statistics packages (multiple) caused a high
number of false positives.
• This calls into question 25 years of fMRI research — over 40,000 studies! Not to mention patient treatments…
Time Zones are Hard
Post Office Horizon Scandal
• In the UK, the postal system introduced Fujitsu’s Horizon system for various
bookkeeping tasks
• Among other things it tracked balance sheets
• And it reported a lot of shortfalls in accounts for local post offices!
• People claimed these were incorrect, that the accounts were correct, but were
ignored
• Nearly 1000 postmasters were prosecuted(!) for the shortfalls
• Turns out the UK prosecuted nearly 1000 people because the software was
buggy
• And parts of the post office and Fujitsu knew and covered it up
• Yesterday it became public that the system is still screwing up balances.
Benefits are Hard
• TennCare Connect was a system to make it easier to apply for
Tennessee’s Medicaid
• Except it denied an awful lot of people who were supposed to be
eligible
• This past summer (2024!) a judge ruled that this violated the law
Discussion…
Have you heard of other software bugs?
• In the media?
• From personal experience?
Does this embarrass you as a likely-future-software-engineer?
Defective Software
• We develop software that contains defects.
• It is likely that the software we (including you!) develop in the near future will not be significantly better.
Back To Our Focus

What are things we — as testers — can do to ensure that the software we develop will satisfy its requirements, and that when the user uses the software, it will meet their actual needs?
Fundamental Factors in Software Quality
• Sound requirements
• Sound design
• Good programming practices
• Static analysis (code inspections, or via tools)
• Unit testing
• Integration testing
• System testing
Direct Impacts
Requirements and the three major testing forms (unit, integration, system) have a direct impact on final quality
Sources of Problems
• Requirements Definition: Erroneous, incomplete, inconsistent
requirements
• Design: Fundamental design flaws
• Implementation: Mistakes in programming, or bugs in dependencies
• Support Systems: Poor programming languages, faulty compilers and
debuggers, misleading development tools
• Did you know compilers and operating systems have bugs, too?
• Inadequate Testing of Software: Incomplete testing, poor verification,
mistakes while debugging
• Evolution: Sloppy redevelopment or maintenance, introducing new flaws
while fixing old flaws, incrementally
Requirements
• The quality of the requirements plays a critical role in the final
product’s quality
• Remember verification and validation?
• Important questions to ask:
• What do we know about the requirements’ quality?
• What should we look for to make sure the requirements are good?
• What can we do to improve the quality of the requirements?
• We’ll say little about requirement engineering in this course
Specification
If you can’t say it, you can’t do it
You have to know what your product is before you can say
whether it has a bug

Have you heard…?


It’s a feature, not a bug!
Specification
A specification defines the product being created, and
includes:
• Functional Requirements that describe the features the
product will support.
• e.g., for a word processor, save, print, spell-check, font, etc.
capabilities
• Non-functional Requirements that constrain how the
product behaves
• Security, reliability, usability, platform
Software Bugs Occur When…
. . . at least one of these is true:
• The software does not do something that the specification
says it should
• The software does something the specification says it should
not do
• The software does not do something that the specification
does not mention, but should
• The software is difficult to understand, hard to use, slow, . . .
Many Bugs are Not From Coding Errors!
• Wrong specification?
• No way to write correct code
• Poor design?
• Good luck debugging
• Bad assumptions about your platform (OS), threat
model, network speed. . .
The Requirements Problem:
Standish Report (1995)

Major Source of Failure


Poor requirements engineering: roughly 50% of responses.
The Requirements Problem:
European Survey (1996)
• Coverage: 3800 European organizations, 17 countries
• Main software problems perceived to be in
• Requirements Specification: > 50%
• Requirements Evolution Management: 50%
The Requirements Problem Persists…

J. Maresco, IBM developerWorks, 2007


Relative Cost of Bugs
• Cost to fix a bug is said to increase exponentially over time (10^t)
• i.e., roughly tenfold at each later stage of development
• E.g., a bug found during specification costs $1 to fix
• … if found in design it costs $10 to fix
• … if found in coding it costs $100 to fix
• … if found in released software it costs $1000 to fix
These are rules of thumb: different studies find different ratios, but the average cost always increases.
Bug Free Software
Software is in the news for all the wrong reasons
• Security breaches, hackers getting credit card information,
hacked political emails, etc.
Why can’t developers just write software that works?
• As software gets more features and supports more platforms, it becomes increasingly difficult to make it bug-free.
Discussion
• Do you think bug free software is unattainable?
• Are there technical barriers that make this impossible?
• Is it just a question of time before we can do this?
• Are we missing technology or processes?
Formal Verification
• Use lots of math to prove properties about programs!
• Lots of math, but aided by computer reasoning
• The good:
• It can in principle eliminate any class of bugs you care to specify
• It works on real systems now (OS, compiler, distributed systems)
• The bad:
• Requires far more time/expertise than most have
• Verification tools are still software
• Verified software is only as good as your spec!
• Still not a good financial decision for most software
• Exceptions: safety-critical, reusable infrastructure
(You’ll see demos this term.)
So, What Are We Doing?
• In general, it’s not yet practical to prove software
correct
• So what do we do instead?
• We collect evidence that software is correct
• Behavior on representative/important inputs (tests)
• Behavior under load (stress/performance testing)
• Stare really hard (code review)
• Run lightweight analysis tools without formal guarantees,
but which are effective at finding issues
Evidence
Why is this okay?
We can do this in a principled way that allows us to gather strong evidence.
Goals of a Software Tester
• To find bugs
• To find them as early as possible
• To make sure they get fixed

Note
Doesn’t say eliminate all bugs. This would be wildly
unrealistic for the foreseeable future.
Software Engineering Process &
Testing Recap
Discussion
• What is software engineering?
• Where/when does testing occur in the software
development process?
Development Styles
• Code and Fix
• Waterfall
• Spiral
• Agile
• Scrum
• XP
• Test-Driven Development
• Behavior-Driven Development
Waterfall
Spiral
Corporate
Agile
A Grain of Salt
• Between Waterfall, Spiral, Agile, XP, Scrum, TDD, BDD, and dozens of
other approaches:
• Everyone says they do X
• Few do exactly X
• Most borrow main ideas from X and a few others, then adapt as needed to
their team, environment, or other influences
• But knowing the details of X is still important for communication, planning,
and understanding trade-offs
• Key element of success: adaptability
• The approaches that work well tend to assume bugs and requirements
changes will occur, so plan on revisiting old code
The Original Waterfall Picture

Royce, W. Managing the Development of Large Software Systems. IEEE WESCON, 1970.
Describing the Original Waterfall Diagram
Immediately below that figure, in the original paper, is this:

Key Sentence
I believe in this concept, but the implementation described
above is risky and invites failure.
Waterfall Improved
Agile, TDD, Waterfall, Whatever: Same Goals
• Testing has the same goals in any process:
• Check that software behavior matches the spec
• Software needs to do the right thing at release
• Software needs to do (part of) the right thing for intermediate goals and
client check-ins
• Make sure software is a solid basis for further development
• Could be next test in TDD
• Could be next sprint in general agile approaches
• Could be next release if you do literal waterfall
Testing Vocabulary
An Overview of Testing
• We’ve already mentioned many types of testing in passing
• Unit tests
• Integration tests
• System tests
• Usability tests
• Performance tests
• Functional tests
• Nonfunctional tests
• What do these (and more) all mean?
• How do they fit together?
• To talk about these, we need to set out some terminology
Errors, Defects, and Failures
• Many software engineers use the following language to
distinguish related parts of software issues:
• An error is a mistake made by the developer, leading them to
produce incorrect code
• A defect is the problem in the code.
• This is what we commonly call a bug.
• A failure occurs when a defect/bug leads the software to exhibit incorrect behavior (see the example after this list)
• Crashes
• Wrong output
• Leaking private information
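A tiny hypothetical Java example of the whole chain: the developer’s error is believing array indices run 1..length; the resulting defect is an off-by-one; the failure is the exception a user eventually sees.

public class OffByOne {
    // Defect: the developer's error (thinking the last index is length)
    // produced this incorrect code. It should be xs[xs.length - 1].
    static int last(int[] xs) {
        return xs[xs.length];
    }

    public static void main(String[] args) {
        // Failure: the defect surfaces as an ArrayIndexOutOfBoundsException.
        System.out.println(last(new int[] {1, 2, 3}));
    }
}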
Errors, Defects, and Failures (cont.)
• Not every defect leads to a failure!
• Some silently corrupt data for weeks and months, and maybe
eventually cause a failure
• Some teams use a distinct term for when a mistake leads to
incorrect internal behavior, separately from external behavior
• Some failures are not caused by defects!
• If you hold an incandescent light bulb next to a computer’s memory chips, random bits start flipping…
Forcing Failures with a Light Bulb

From Govindavajhala & Appel’s IEEE Security and Privacy 2003 paper, Using Memory Errors to Attack a Virtual Machine.
Alternative Language
• This error/defect/failure terminology is not universal
• It is common
• What terminology you use isn’t really important, as long as
your team agrees
• The point of this terminology isn’t pedantry
• The point of this terminology is communication, which is more
important than particular terms
• In this course, we’ll stick to error/defect/failure
Open-box and Closed-box Testing
Open Box Testing (aka white-box)
• Testing software with knowledge of its internals
• A developer-centric perspective
• Testing implementation details

Closed Box Testing (aka black-box)
• Testing software without knowledge of its internals
• A user-centric (external) perspective
• Testing the external interface contract

These are complementary; we’ll discuss more next week.


Classifying Tests
There are two primary “axes” by which tests can be categorized:
• Test Levels describes the “level of detail” for a test: small
implementation units, combining subsystems, complete product
tests, or client-based tests for accepting delivery of software
• Test Types describe the goal for a particular test: to check
functionality, performance, security, etc.
Classifying Tests
Why Classify?
Before we get into the details of that table, why even care?
Having a systematic breakdown of the testing space helps:
• Planning — it provides a list of what needs to happen
• Different types of tests require different infrastructure
• Determines what tests can be run on developer machines, on
every commit, nightly, weekly, etc.
• Division of labor
• Different team members might be better at different types of
testing
Why Classify? (cont.)
• Exposes the option to skip some testing
• Never ideal, but under time constraints it provides a menu of
options
• Checking
• Can’t be sure at the end you’ve done all the testing you wanted, if
you didn’t know the options to start!
Test Levels
Four standard levels of detail:
• Unit
• Integration
• System
• Acceptance

Have you heard of these before?
(SE181 is a pre-req, so I hope so)
Unit Tests
• Testing smallest “units of functionality”
• Intended to be fast (quick to run a single test)
• Goal is to run all unit tests frequently (e.g., every commit)
• Run by a unit testing framework
• Consistently written by developers, even when dedicated
testers exist
• Typically open box, but not always
• Any unit test of an internal interface is open box
• Testing external APIs can be closed box
Units of Functionality
Unit tests target small “units of functionality.” What’s that?
• Is it a method? A class?
• What if the method/class depends on other
methods/classes?
• Do we “stub them out”? (more on this later)
• Do we just let them run?
• What if a well defined piece of functionality depends on
multiple classes?
There is no single right answer to these questions.
Guidelines for Unit Tests
• Well-defined single piece of functionality
• Functionality independent of environment
• Can be checked independently of other functionality
• i.e., if the test fails, you know precisely which functionality is
broken
Examples of “Definite” Unit Tests
• Insert an element into a data structure, check that it’s
present
• Pass invalid input to a method, check the error code or exception is appropriate (see the sketch after this list)
• Specific way the input is invalid: out of bounds, wrong state, …
• Sort a collection, check that it’s sorted
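As a sketch of that second case (parseAge is a hypothetical method under test; assertThrows is available in JUnit 4.13+ and JUnit 5):

import static org.junit.Assert.assertThrows;
import org.junit.Test;

public class InvalidInputTest {
    // Hypothetical method under test, with a documented valid range.
    static int parseAge(String s) {
        int age = Integer.parseInt(s);
        if (age < 0 || age > 150)
            throw new IllegalArgumentException("age out of range: " + age);
        return age;
    }

    @Test
    public void rejectsOutOfRangeAge() {
        // Invalid (out-of-bounds) input should raise the documented exception.
        assertThrows(IllegalArgumentException.class, () -> parseAge("-3"));
    }
}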
Gray Areas
Larger pieces of functionality can still be unit tests, but may shade into integration tests. Unfortunately, there is some “I know it when I see it” here.
Concrete Unit Test
import static java.lang.Math.min;  // stand-in for the method under test
import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class MinTest {
    @Test
    public void testMin02() {
        int a = 0, b = 2;
        int m = min(a, b);
        assertEquals("min(0,2) is 0", 0, m); // assertEquals compares values, not references
    }
}
Challenges for Unit Tests
• Size and scope (as discussed)
• Speed
• How do you know you have enough?
• More on this with openbox testing / code coverage
• Might need stubs
• Might need to “mock up” expensive resources like disks, databases, network (see the sketch below)
• Might need a way to test control logic without physical side effects
• More on this with dependencies / mocking
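A minimal Mockito sketch of that idea (PaymentGateway and OrderService are hypothetical names, assumed for illustration): the unit test exercises the service’s logic without touching a real payment network.

import static org.junit.Assert.assertTrue;
import static org.mockito.Mockito.*;
import org.junit.Test;

public class OrderServiceTest {
    // An expensive external dependency we don't want to be real in a unit test.
    interface PaymentGateway { boolean charge(String account, int cents); }

    static class OrderService {
        private final PaymentGateway gateway;
        OrderService(PaymentGateway gateway) { this.gateway = gateway; }
        boolean placeOrder(String account, int cents) {
            return gateway.charge(account, cents);
        }
    }

    @Test
    public void placeOrderChargesGateway() {
        // Stub the dependency: no network, no disk, instant and deterministic.
        PaymentGateway gateway = mock(PaymentGateway.class);
        when(gateway.charge("acct-1", 500)).thenReturn(true);

        assertTrue(new OrderService(gateway).placeOrder("acct-1", 500));
        verify(gateway).charge("acct-1", 500); // the interaction we expect
    }
}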
System Tests
• Testing overall system functionality, for a complete system
• Assumes all components already work well
• Reconciles software against top-level requirements
• Tests stem from concrete use cases in the requirements

But wait — we skipped a level!


Integration Tests
• A test checking that two “components” of a system work together
• Yes, this is vague
• Emphasizes checking that components implement their interfaces
correctly
• Not just Java interfaces, but the expected behavior of the component
• Testing combination of components that are larger than unit test
targets
• Not testing the full system
• Many tests end up as integration tests by process of elimination — not a unit test, not a system test, and therefore an integration test (see the sketch below)
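A hedged sketch of the contrast (class names hypothetical): unlike a unit test with stubs, this exercises two real components together and checks that they honor each other’s interfaces.

import static org.junit.Assert.assertEquals;
import org.junit.Test;
import java.util.ArrayList;
import java.util.List;

public class CartIntegrationTest {
    // Two real components: a price catalog and a cart that depends on it.
    static class InMemoryCatalog {
        int priceOf(String sku) { return "apple".equals(sku) ? 50 : 100; }
    }

    static class Cart {
        private final InMemoryCatalog catalog;
        private final List<String> skus = new ArrayList<>();
        Cart(InMemoryCatalog catalog) { this.catalog = catalog; }
        void add(String sku) { skus.add(sku); }
        int total() { return skus.stream().mapToInt(catalog::priceOf).sum(); }
    }

    @Test
    public void totalCombinesCartAndCatalog() {
        Cart cart = new Cart(new InMemoryCatalog()); // real catalog, no stub
        cart.add("apple");
        cart.add("bread");
        assertEquals(150, cart.total()); // 50 + 100
    }
}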
Two Approaches to Integration
Big Bang
• Build everything.
• Test individually.
• Put it all together.
• Do system tests.

Incremental
• Test individual components
• Then pairs
• Then threes…
• Until you finish the system.
Acceptance Tests
• Performed by a customer / client / end user
• Testing to see if the customer believes the software is what
they wanted
• Also tied to requirements and use cases
Test Types
Test “types” classify the purpose of the test, rather than its scope or
mechanism. Let’s talk about:
• Functional testing
• Non-functional testing
• Performance testing
• Security testing
• Regression testing
Note these aren’t all mutually exclusive!
Functional Testing
• The default assumption in testing
• Functional testing verifies the software’s behavior matches
expectations.
• Also includes testing bad inputs, to check implicit
assumptions
• i.e., given nonsense input, the program should do “something reasonable”
• e.g., the compiler shouldn’t delete your code if you have a type error
• Cuts across all levels of testing, but heavy on unit testing
Nonfunctional Testing
• Roughly, testing things that are not “functionality”
• Generally, tests quality of the software, also called the “-ilities”
• Usability
• Reliability
• Maintainability
• Security
• Performance
Nonfunctional testing is an umbrella term for many test types.
Functional vs. Nonfunctional
Functional testing concerns what the software does.
Nonfunctional testing concerns how it does it.
Performance Testing (Nonfunctional)
Performance testing is, broadly, how quickly the software works. But it
includes a variety of subtypes:
• Performance Testing without further qualification test how fast the
software performs certain tasks
• Load Testing checks how the system performs with a high number of
users
• Stress Testing checks how the system handles having more
users/requests than it was designed for
• Spike Testing checks how the system handles high stress that arrives suddenly
This is a complex area. We’ll spend a lecture on the absolute basics
later this term, but it’s possible to run a whole course on this.
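To make the distinction concrete, here is a deliberately naive timing sketch in plain Java (a real performance test would use a proper benchmarking harness; the workload here is a placeholder):

public class TimingSketch {
    // Placeholder for the operation whose performance we care about.
    static long task() {
        long sum = 0;
        for (int i = 0; i < 1_000_000; i++) sum += i;
        return sum;
    }

    public static void main(String[] args) {
        // Warm up so the JIT compiles the hot path before measuring.
        for (int i = 0; i < 10; i++) task();

        int runs = 100;
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) task();
        long elapsed = System.nanoTime() - start;
        System.out.printf("avg %.3f ms per run%n", elapsed / (double) runs / 1e6);
    }
}

Load, stress, and spike testing follow the same measure-and-compare idea, but vary the number of concurrent users/requests rather than timing a single task.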
Security Testing (Nonfunctional)
Security testing ensures the system is secure, which has a more nuanced meaning than most assume. A common model of “secure” is the “CIA security triad”:
• Confidentiality
• Data confidentiality (private information stays private)
• Privacy (control over private data)
• Integrity
• Data integrity (reliable data storage)
• System integrity (system cannot be compromised/hacked)
• Availability
• Resources are available to authorized users and no one else
Another topic that could fill a whole course (or several).
Regression Testing
Making sure that code changes haven’t broken existing functionality,
performance, security, etc.

The Need for Regression Testing


It’s common to introduce new bugs while changing existing
code, whether fixing an earlier bug or adding a new feature.
• In practice, this means re-running tests after a code change (see the sketch after this list)
• With good test automation and good unit/integration/system/etc.
tests, this is literally running tests again after a change.
• Next week we’ll talk about continuous integration, which directly
addresses this
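A small sketch of what one such re-run test looks like in practice (slugify and the earlier bug are hypothetical): once a defect is fixed, a test pinning the fixed behavior stays in the suite so the bug cannot quietly return.

import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class SlugRegressionTest {
    // Earlier defect: trailing whitespace produced slugs ending in "-".
    // The fix trims the input first; this test keeps the fix honest.
    static String slugify(String title) {
        return title.trim().toLowerCase().replaceAll("\\s+", "-");
    }

    @Test
    public void trailingWhitespaceDoesNotProduceTrailingDash() {
        assertEquals("hello-world", slugify("Hello World  "));
    }
}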
Testing Costs
We haven’t discussed the cost of tests — only a bit about their logistics
and purposes.
• Ideally we’d write tests for every conceivable thing, and re-run every
test on every change.
• Then we know immediately whether functionality was broken
• But nobody does this — why?
Testing Costs
• In general, there are always more bugs, but we can’t write tests
forever.
• Must prioritize likely scenarios (common use patterns) and high-risk
scenarios (e.g., security)
• Some exceedingly rare cases may not be tested! Maybe in V2...
• For large systems, running all tests takes too long.
• Running all tests for Microsoft Windows, end to end, on one machine, would
take months.
• This is infeasible to do for every change.
• A subset of fast tests (e.g., unit tests) is run on every change.
• Other tests are run nightly or weekly depending on cost.
Next Week
• Some systematic closed-box test data generation approaches
• Details of code coverage and its limits
• First homework goes out
• Project template uses Gradle
• We will be using Gradescope
• The project template has CI setups for GitHub & GitLab
