Software QA and Testing Frequently-Asked-Questions, Part 1: What Is 'Software Quality Assurance'?
What are some recent major computer system failures caused by software bugs?
• A May 2005 newspaper article reported that a major hybrid car manufacturer had
to install a software fix on 20,000 vehicles due to problems with invalid engine
warning lights and occasional stalling. In the article, an automotive software
specialist indicated that the automobile industry spends $2 billion to $3 billion per
year fixing software problems.
• Media reports in January of 2005 detailed severe problems with a $170 million
high-profile U.S. government IT systems project. Software testing was one of the
five major problem areas according to a report of the commission reviewing the
project. In March of 2005 it was decided to scrap the entire project.
• In July 2004 newspapers reported that a new government welfare management
system in Canada costing several hundred million dollars was unable to handle a
simple benefits rate increase after being put into live operation. Reportedly the
original contract allowed for only 6 weeks of acceptance testing and the system
was never tested for its ability to handle a rate increase.
• Millions of bank accounts were impacted by errors due to installation of
inadequately tested software code in the transaction processing system of a major
North American bank, according to mid-2004 news reports. Articles about the
incident stated that it took two weeks to fix all the resulting errors, that additional
problems resulted when the incident drew a large number of e-mail phishing
attacks against the bank's customers, and that the total cost of the incident could
exceed $100 million.
• A bug in site management software utilized by companies with a significant
percentage of worldwide web traffic was reported in May of 2004. The bug
resulted in performance problems for many of the sites simultaneously and
required disabling of the software until the bug was fixed.
• According to news reports in April of 2004, a software bug was determined to be
a major contributor to the 2003 Northeast blackout, the worst power system
failure in North American history. The failure involved loss of electrical power to
50 million customers, forced shutdown of 100 power plants, and economic losses
estimated at $6 billion. The bug was reportedly in one utility company's vendor-
supplied power monitoring and management system, which was unable to
correctly handle and report on an unusual confluence of initially localized events.
The error was found and corrected after examining millions of lines of code.
• In early 2004, news reports revealed the intentional use of a software bug as a
counter-espionage tool. According to the report, in the early 1980s one nation
surreptitiously allowed a hostile nation's espionage service to steal a version of
sophisticated industrial software that had intentionally-added flaws. This
eventually resulted in major industrial disruption in the country that used the
stolen flawed software.
• A major U.S. retailer was reportedly hit with a large government fine in October
of 2003 due to web site errors that enabled customers to view one another's online
orders.
• News stories in the fall of 2003 stated that a manufacturing company recalled all
their transportation products in order to fix a software problem causing instability
in certain circumstances. The company found and reported the bug itself and
initiated the recall procedure in which a software upgrade fixed the problems.
• In August of 2003 a U.S. court ruled that a lawsuit against a large online
brokerage company could proceed; the lawsuit reportedly involved claims that the
company was not fixing system problems that sometimes resulted in failed stock
trades, based on the experiences of 4 plaintiffs during an 8-month period. A
previous lower court's ruling that "...six miscues out of more than 400 trades does
not indicate negligence." was invalidated.
• In April of 2003 it was announced that a large student loan company in the U.S.
made a software error in calculating the monthly payments on 800,000 loans.
Although borrowers were to be notified of an increase in their required payments,
the company will still reportedly lose $8 million in interest. The error was
uncovered when borrowers began reporting inconsistencies in their bills.
• News reports in February of 2003 revealed that the U.S. Treasury Department
mailed 50,000 Social Security checks without any beneficiary names. A
spokesperson indicated that the missing names were due to an error in a software
change. Replacement checks were subsequently mailed out with the problem
corrected, and recipients were then able to cash their Social Security checks.
• In March of 2002 it was reported that software bugs in Britain's national tax
system resulted in more than 100,000 erroneous tax overcharges. The problem
was partly attributed to the difficulty of testing the integration of multiple
systems.
• A newspaper columnist reported in July 2001 that a serious flaw was found in off-
the-shelf software that had long been used in systems for tracking certain U.S.
nuclear materials. The same software had been recently donated to another
country to be used in tracking their own nuclear materials, and it was not until
scientists in that country discovered the problem, and shared the information, that
U.S. officials became aware of the problems.
• According to newspaper stories in mid-2001, a major systems development
contractor was fired and sued over problems with a large retirement plan
management system. According to the reports, the client claimed that system
deliveries were late, the software had excessive defects, and it caused other
systems to crash.
• In January of 2001 newspapers reported that a major European railroad was hit by
the aftereffects of the Y2K bug. The company found that many of their newer
trains would not run due to their inability to recognize the date '31/12/2000'; the
trains were started by altering the control system's date settings.
• News reports in September of 2000 told of a software vendor settling a lawsuit
with a large mortgage lender; the vendor had reportedly delivered an online
mortgage processing system that did not meet specifications, was delivered late,
and didn't work.
• In early 2000, major problems were reported with a new computer system in a
large suburban U.S. public school district with 100,000+ students; problems
included 10,000 erroneous report cards and students left stranded by failed class
registration systems; the district's CIO was fired. The school district decided to
reinstate its original 25-year-old system for at least a year until the bugs were
worked out of the new system by the software vendors.
• In October of 1999 the $125 million NASA Mars Climate Orbiter spacecraft was
believed to be lost in space due to a simple data conversion error. It was
determined that spacecraft software used certain data in English units that should
have been in metric units. Among other tasks, the orbiter was to serve as a
communications relay for the Mars Polar Lander mission, which failed for
unknown reasons in December 1999. Several investigating panels were convened
to determine the process failures that allowed the error to go undetected.
• Bugs in software supporting a large commercial high-speed data network affected
70,000 business customers over a period of 8 days in August of 1999. Among
those affected was the electronic trading system of the largest U.S. futures
exchange, which was shut down for most of a week as a result of the outages.
• In April of 1999 a software bug caused the failure of a $1.2 billion U.S. military
satellite launch, the costliest unmanned accident in the history of Cape Canaveral
launches. The failure was the latest in a string of launch failures, triggering a
complete military and industry review of U.S. space launch programs, including
software integration and testing processes. Congressional oversight hearings were
requested.
• A small town in Illinois in the U.S. received an unusually large monthly electric
bill of $7 million in March of 1999. This was about 700 times larger than its
normal bill. It turned out to be due to bugs in new software that had been
purchased by the local power company to deal with Y2K software issues.
• In early 1999 a major computer game company recalled all copies of a popular
new product due to software problems. The company made a public apology for
releasing a product before it was ready.
• The computer system of a major online U.S. stock trading service failed during
trading hours several times over a period of days in February of 1999 according to
nationwide news reports. The problem was reportedly due to bugs in a software
upgrade intended to speed online trade confirmations.
• In April of 1998 a major U.S. data communications network failed for 24 hours,
crippling a large part of some U.S. credit card transaction authorization systems as
well as other large U.S. bank, retail, and government data systems. The cause was
eventually traced to a software bug.
• January 1998 news reports told of software problems at a major U.S.
telecommunications company that resulted in no charges for long distance calls
for a month for 400,000 customers. The problem went undetected until customers
called up with questions about their bills.
• In November of 1997 the stock of a major health industry company dropped 60%
due to reports of failures in computer billing systems, problems with a large
database conversion, and inadequate software testing. It was reported that more
than $100,000,000 in receivables had to be written off and that multi-million
dollar fines were levied on the company by government agencies.
• A retail store chain filed suit in August of 1997 against a transaction processing
system vendor (not a credit card company) due to the software's inability to
handle credit cards with year 2000 expiration dates.
• In August of 1997 one of the leading consumer credit reporting companies
reportedly shut down their new public web site after less than two days of
operation due to software problems. The new site allowed web site visitors instant
access, for a small fee, to their personal credit reports. However, a number of
initial users ended up viewing each other's reports instead of their own, resulting
in irate customers and nationwide publicity. The problem was attributed to
"...unexpectedly high demand from consumers and faulty software that routed the
files to the wrong computers."
• In November of 1996, newspapers reported that software bugs caused the 411
telephone information system of one of the U.S. RBOCs to fail for most of a day.
Most of the 2000 operators had to search through phone books instead of using
their 13,000,000-listing database. The bugs were introduced by new software
modifications and the problem software had been installed on both the production
and backup systems. A spokesman for the software vendor reportedly stated that
'It had nothing to do with the integrity of the software. It was human error.'
• On June 4, 1996 the first flight of the European Space Agency's new Ariane 5
rocket failed shortly after launching, resulting in an estimated uninsured loss of a
half billion dollars. It was reportedly due to the lack of exception handling for an
overflow in a conversion from a 64-bit floating-point value to a 16-bit signed
integer.
• Software bugs caused the bank accounts of 823 customers of a major U.S. bank to
be credited with $924,844,208.32 each in May of 1996, according to newspaper
reports. The American Bankers Association claimed it was the largest such error
in banking history. A bank spokesman said the programming errors were
corrected and all funds were recovered.
• Software bugs in a Soviet early-warning monitoring system nearly brought on
nuclear war in 1983, according to news reports in early 1999. The software was
supposed to filter out false missile detections caused by Soviet satellites picking
up sunlight reflections off cloud-tops, but failed to do so. Disaster was averted
when a Soviet commander, based on what he said was a '...funny feeling in my
gut', decided the apparent missile attack was a false alarm. The filtering software
code was rewritten.
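Several of the failures above, such as the Ariane 5 and Mars Climate Orbiter incidents, came down to an unchecked data conversion. As a minimal sketch only (not the code of any system mentioned above), the following shows a conversion to a 16-bit signed integer that raises an error instead of silently producing a wrong value. The function names and the fallback behavior are hypothetical and chosen purely for illustration.

```python
import math

# Range of a 16-bit signed integer.
INT16_MIN = -32768
INT16_MAX = 32767


def to_int16_checked(value: float) -> int:
    """Convert a float to a 16-bit signed integer, refusing out-of-range input.

    Hypothetical helper for illustration: it truncates toward zero and raises
    rather than wrapping around or returning garbage.
    """
    if not math.isfinite(value):
        raise ValueError(f"cannot convert non-finite value {value!r}")
    truncated = int(value)  # int() truncates toward zero
    if not INT16_MIN <= truncated <= INT16_MAX:
        raise OverflowError(f"{value!r} does not fit in a 16-bit signed integer")
    return truncated


def read_as_int16(raw: float, fallback: int = 0) -> int:
    """Handle the conversion error explicitly (assumed policy: use a fallback)."""
    try:
        return to_int16_checked(raw)
    except (OverflowError, ValueError):
        return fallback
```

Whether a fallback value, a retry, or a controlled shutdown is the right response depends on the system; the point of the sketch is only that the out-of-range case is detected and handled deliberately, and that a test can exercise it.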
In some cases an IT organization may be too small or new to have a testing staff even if
the situation calls for it. In these circumstances it may be appropriate to instead use
contractors or outsourcing, or adjust the project management and development approach
(by switching to more senior developers and agile test-first development, for example).
Inexperienced managers sometimes gamble on the success of a project by skipping
thorough testing or having programmers do post-development functional testing of their
own work, a decidedly high-risk gamble.
For non-trivial-size projects or projects with non-trivial risks, a testing staff is usually
necessary. As in any business, the use of personnel with specialized skills enhances an
organization's ability to be successful in large, complex, or difficult tasks. It allows for
both a) deeper and stronger skills and b) the contribution of differing perspectives. For
example, programmers typically have the perspective of 'what are the technical issues in
making this functionality work?'. A test engineer typically has the perspective of 'what
might go wrong with this functionality, and how can we ensure it meets expectations?'.
Technical people who can be highly effective in approaching tasks from both of those
perspectives are rare, which is why, sooner or later, organizations bring in test specialists.
• A lot depends on the size of the organization and the risks involved. For large
organizations with high-risk (in terms of lives or property) projects, serious
management buy-in is required and a formalized QA process is necessary.
• Where the risk is lower, management and organizational buy-in and QA
implementation may be a slower, step-at-a-time process. QA processes should be
balanced with productivity so as to keep bureaucracy from getting out of hand.
• For small groups or projects, a more ad-hoc process may be appropriate,
depending on the type of customers and projects. A lot will depend on team leads
or managers, feedback to developers, and ensuring adequate communications
among customers, managers, developers, and testers.
• The most value for effort will often be in (a) requirements management processes,
with a goal of clear, complete, testable requirement specifications embodied in
requirements or design documentation, or, in 'agile'-type environments, extensive
continuous coordination with end-users; (b) design inspections and code
inspections; and (c) post-mortems/retrospectives.
• Other possibilities include incremental self-managed team approaches such as
'Kaizen' methods of continuous process improvement, the Deming-Shewhart
Plan-Do-Check-Act cycle, and others.
Also see 'How can QA processes be implemented without reducing productivity?' in the
LFAQ section.
(See the Bookstore section's 'Software QA', 'Software Engineering', and 'Project
Management' categories for useful books with more information.)
What is a 'walkthrough'?
A 'walkthrough' is an informal meeting for evaluation or informational purposes. Little or
no preparation is usually required.
What's an 'inspection'?
An inspection is more formalized than a 'walkthrough', typically with 3-8 people
including a moderator, reader, and a recorder to take notes. The subject of the inspection
is typically a document such as a requirements spec or a test plan, and the purpose is to
find problems and see what's missing, not to fix anything. Attendees should prepare for
this type of meeting by reading through the document; most problems will be found during
this preparation. The result of the inspection meeting should be a written report.
Thorough preparation for inspections is difficult, painstaking work, but is one of the most
cost-effective methods of ensuring quality. Employees who are most skilled at
inspections are like the 'eldest brother' in the parable in 'Why is it often hard for
organizations to get serious about quality assurance?'. Their skill may have low visibility
but they are extremely valuable to any software development organization, since bug
prevention is far more cost-effective than bug detection.
• black box testing - not based on any knowledge of internal design or code. Tests
are based on requirements and functionality.
• white box testing - based on knowledge of the internal logic of an application's
code. Tests are based on coverage of code statements, branches, paths, conditions.
• unit testing - the most 'micro' scale of testing; to test particular functions or code
modules. Typically done by the programmer and not by testers, as it requires
detailed knowledge of the internal program design and code. Not always easily
done unless the application has a well-designed architecture with tight code; may
require developing test driver modules or test harnesses.
• incremental integration testing - continuous testing of an application as new
functionality is added; requires that various aspects of an application's
functionality be independent enough to work separately before all parts of the
program are completed, or that test drivers be developed as needed; done by
programmers or by testers.
• integration testing - testing of combined parts of an application to determine if
they function together correctly. The 'parts' can be code modules, individual
applications, client and server applications on a network, etc. This type of testing
is especially relevant to client/server and distributed systems.
• functional testing - black-box type testing geared to functional requirements of an
application; this type of testing should be done by testers. This doesn't mean that
the programmers shouldn't check that their code works before releasing it (which
of course applies to any stage of testing).
• system testing - black-box type testing that is based on overall requirements
specifications; covers all combined parts of a system.
• end-to-end testing - similar to system testing; the 'macro' end of the test scale;
involves testing of a complete application environment in a situation that mimics
real-world use, such as interacting with a database, using network
communications, or interacting with other hardware, applications, or systems if
appropriate.
• sanity testing or smoke testing - typically an initial testing effort to determine if a
new software version is performing well enough to accept it for a major testing
effort. For example, if the new software is crashing systems every 5 minutes,
bogging down systems to a crawl, or corrupting databases, the software may not
be in a 'sane' enough condition to warrant further testing in its current state.
• regression testing - re-testing after fixes or modifications of the software or its
environment. It can be difficult to determine how much re-testing is needed,
especially near the end of the development cycle. Automated testing tools can be
especially useful for this type of testing.
• acceptance testing - final testing based on specifications of the end-user or
customer, or based on use by end-users/customers over some limited period of
time.
• load testing - testing an application under heavy loads, such as testing of a web
site under a range of loads to determine at what point the system's response time
degrades or fails.
• stress testing - term often used interchangeably with 'load' and 'performance'
testing. Also used to describe such tests as system functional testing while under
unusually heavy loads, heavy repetition of certain actions or inputs, input of large
numerical values, large complex queries to a database system, etc.
• performance testing - term often used interchangeably with 'stress' and 'load'
testing. Ideally 'performance' testing (and any other 'type' of testing) is defined in
requirements documentation or QA or Test Plans.
• usability testing - testing for 'user-friendliness'. Clearly this is subjective, and will
depend on the targeted end-user or customer. User interviews, surveys, video
recording of user sessions, and other techniques can be used. Programmers and
testers are usually not appropriate as usability testers.
• install/uninstall testing - testing of full, partial, or upgrade install/uninstall
processes.
• recovery testing - testing how well a system recovers from crashes, hardware
failures, or other catastrophic problems.
• failover testing - typically used interchangeably with 'recovery testing'.
• security testing - testing how well the system protects against unauthorized
internal or external access, willful damage, etc.; may require sophisticated testing
techniques.
• compatibility testing - testing how well software performs in a particular
hardware/software/operating system/network/etc. environment.
• exploratory testing - often taken to mean a creative, informal software test that is
not based on formal test plans or test cases; testers may be learning the software
as they test it.
• ad-hoc testing - similar to exploratory testing, but often taken to mean that the
testers have significant understanding of the software before testing it.
• context-driven testing - testing driven by an understanding of the environment,
culture, and intended use of software. For example, the testing approach for life-
critical medical equipment software would be completely different than that for a
low-cost computer game.
• user acceptance testing - determining if software is satisfactory to an end-user or
customer.
• comparison testing - comparing software weaknesses and strengths to competing
products.
• alpha testing - testing of an application when development is nearing completion;
minor design changes may still be made as a result of such testing. Typically done
by end-users or others, not by programmers or testers.
• beta testing - testing when development and testing are essentially completed and
final bugs and problems need to be found before final release. Typically done by
end-users or others, not by programmers or testers.
• mutation testing - a method for determining if a set of test data or test cases is
useful, by deliberately introducing various code changes ('bugs') and retesting
with the original test data/cases to determine if the 'bugs' are detected. Proper
implementation requires large computational resources.
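As a rough illustration of the mutation testing idea described in the list above, the sketch below hand-writes a single 'mutant' of a tiny function and checks whether two sets of test cases detect it. Real mutation testing tools generate many mutants automatically; everything here (the function, the mutant, the case lists, and the helper names) is hypothetical and kept deliberately small.

```python
from typing import Callable, List, Tuple

# A test case is an (input, expected result) pair.
Case = Tuple[int, bool]


def is_adult(age: int) -> bool:
    """Original code under test (hypothetical example)."""
    return age >= 18


def is_adult_mutant(age: int) -> bool:
    """Mutant: '>=' deliberately changed to '>' to simulate a bug."""
    return age > 18


def suite_passes(impl: Callable[[int], bool], cases: List[Case]) -> bool:
    """Return True if every test case passes against the given implementation."""
    return all(impl(arg) == expected for arg, expected in cases)


def mutant_killed(cases: List[Case]) -> bool:
    """A mutant is 'killed' if the suite passes on the original but fails on the mutant."""
    return suite_passes(is_adult, cases) and not suite_passes(is_adult_mutant, cases)


# Weak cases miss the boundary value, so the mutant survives.
weak_cases: List[Case] = [(30, True), (5, False)]
# Adding the boundary value 18 lets the suite detect the mutant.
strong_cases: List[Case] = weak_cases + [(18, True)]

if __name__ == "__main__":
    print("weak suite kills mutant:", mutant_killed(weak_cases))
    print("strong suite kills mutant:", mutant_killed(strong_cases))
```

A surviving mutant suggests a gap in the test cases, here the missing boundary value. The computational cost mentioned above comes from repeating this check for every generated mutant against the full test suite.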
(See the Bookstore section's 'Software Testing' category for useful books on Software
Testing.)
• poor requirements - if requirements are unclear, incomplete, too general, or not
testable, there will be problems.
• unrealistic schedule - if too much work is crammed in too little time, problems are
inevitable.
• inadequate testing - no one will know whether or not the program is any good
until the customer complains or systems crash.
• featuritis - requests to pile on new features after development is underway;
extremely common.
• miscommunication - if developers don't know what's needed or customers have
erroneous expectations, problems are guaranteed.
(See the Bookstore section's 'Software QA', 'Software Engineering', and 'Project
Management' categories for useful books with more information.)
• the program should act in a way that least surprises the user
• it should always be evident to the user what can be done next and how to exit
• the program shouldn't let the users do something stupid without warning them.