Catalogue of
Common Software Errors
8.4 ASSUMPTION THAT INTERRUPTS WILL NOT OCCUR DURING A BRIEF INTERVAL
8.5 RESOURCE RACES: THE RESOURCE HAS JUST BECOME UNAVAILABLE
8.6 ASSUMPTION THAT A PERSON, DEVICE, OR PROCESS WILL RESPOND QUICKLY
8.7 OPTIONS OUT OF SYNCH DURING A DISPLAY CHANGE
8.8 TASK STARTS BEFORE ITS PREREQUISITES ARE MET
8.9 MESSAGES CROSS OR DO NOT ARRIVE IN THE ORDER SENT
9. LOAD CONDITIONS
9.1 REQUIRED RESOURCE NOT AVAILABLE
9.2 DOES NOT RETURN A RESOURCE
1) Does not indicate that it is done with a device
2) Does not erase old files from mass storage
3) Does not return unused memory
4) Wastes computer time
9.3 NO AVAILABLE LARGE MEMORY AREAS
9.4 INPUT BUFFER OR QUEUE NOT DEEP ENOUGH
9.5 DOES NOT CLEAR ITEMS FROM QUEUE, BUFFER, OR STACK
9.6 LOST MESSAGES
9.7 PERFORMANCE COSTS
9.8 RACE CONDITION WINDOWS EXPAND
9.9 DOES NOT ABBREVIATE UNDER LOAD
9.10 DOES NOT RECOGNIZE THAT ANOTHER PROCESS ABBREVIATES OUTPUT UNDER LOAD
9.11 LOW PRIORITY TASKS NOT PUT OFF
9.12 LOW PRIORITY TASKS NEVER DONE
10. HARDWARE
10.1 WRONG DEVICE
10.2 WRONG DEVICE ADDRESS
10.3 DEVICE UNAVAILABLE
10.4 DEVICE RETURNED TO WRONG TYPE OF POOL
10.5 DEVICE USE FORBIDDEN TO CALLER
10.6 SPECIFIES WRONG PRIVILEGE LEVEL FOR A DEVICE
10.7 NOISY CHANNEL
10.8 CHANNEL GOES DOWN
10.9 TIME-OUT PROBLEMS
10.10 WRONG STORAGE DEVICE
10.11 DOES NOT CHECK DIRECTORY OF CURRENT DISK
10.12 DOES NOT CLOSE A FILE
10.13 UNEXPECTED END OF FILE
10.14 DISK SECTOR BUGS AND OTHER LENGTH-DEPENDENT ERRORS
10.15 WRONG OPERATION OR INSTRUCTION CODES
10.16 MISUNDERSTOOD STATUS OR RETURN CODE
10.17 DEVICE PROTOCOL ERROR
10.18 UNDERUTILIZES DEVICE INTELLIGENCE
10.19 PAGING MECHANISM IGNORED OR MISUNDERSTOOD
10.20 IGNORES CHANNEL THROUGHPUT LIMITS
10.21 ASSUMES DEVICE IS OR IS NOT, OR SHOULD BE OR SHOULD NOT BE INITIALIZED
10.22 ASSUMES PROGRAMMABLE FUNCTION KEYS ARE PROGRAMMED CORRECTLY
11. SOURCE, VERSION AND ID CONTROL
11.1 OLD BUGS MYSTERIOUSLY REAPPEAR
11.2 FAILURE TO UPDATE MULTIPLE COPIES OF DATA OR PROGRAM FILES
11.3 NO TITLE
11.4 NO VERSION ID
11.5 WRONG VERSION NUMBER ON THE TITLE SCREEN
1) Excessive functionality
This is an error. It is the hardest one to convince people not to make. Systems that try to do too much are hard
to learn and easy to forget how to use. They lack conceptual unity. They require too much documentation, too
many help screens, and too much information per topic. Performance is poor. User errors are likely but the
error messages are too general. Here is our rule of thumb: A system’s level of functionality is out of control if
the presence of rarely used features significantly complicates the use of basic features.
4) Missing function
A function was not implemented even though it was in the external specification or is “obviously” desirable.
5) Wrong function
A function that should do one thing (perhaps defined in a specification) does something else.
1) Missing information
Anything you must know should be available onscreen. Onscreen access to any other information that the
average user would find useful is also desirable.
2) No onscreen instructions
How do you find out the name of the program, how to exit it, and what key(s) to press for Help? If it uses a
command language, how do you find the list of commands? The program might display this information only
when it starts. However it does it, you should not have to look in a manual to find the answers to questions like
these.
4) Undocumented features
If most features or commands are documented onscreen, all should be. Skipping only a few causes much
confusion. Similarly, if the program describes “special case” behavior for many commands, it should document
them all.
6) No cursor
People rely on the cursor. It points to the place on the screen where they should focus attention. It can also
show that the computer is still active and “listening.” Every interactive program should show the cursor and
display a salient message when turning the cursor off.
10) Failure to check for the same document being opened more than once
A program that allows the user to open multiple documents must check whether the same document is being opened more
than once. Otherwise, the user will not be able to keep track of the changes made to the documents, since they
all have the same name. For example, if the file My_Doc is open and the user attempts to open My_Doc again,
there must be a way for the user to tell the first My_Doc from the second. A typical method for
keeping track is to append a number to the file name, such as My_Doc:1 and My_Doc:2 for the first and
second copies respectively. An alternative is not to allow the same file to be opened twice.
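The numbering approach described above could be sketched as follows. This is only an illustration in Python: the class OpenDocuments and its open method are hypothetical names, not part of any real program.

```python
from collections import defaultdict


class OpenDocuments:
    """Tracks open documents and labels repeat openings (hypothetical helper)."""

    def __init__(self):
        # How many times each file name has been opened so far.
        self._counts = defaultdict(int)

    def open(self, name):
        """Return a distinct label such as 'My_Doc:1' for each opening of `name`."""
        self._counts[name] += 1
        return f"{name}:{self._counts[name]}"


docs = OpenDocuments()
first = docs.open("My_Doc")
second = docs.open("My_Doc")
print(first, second)
```

A program that takes the alternative route would instead refuse the second call when the count for that name is already nonzero.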
23) Verbosity
Messages must be short and simple. Harried readers are infuriated by chatty technobabble. When some users
need much more information than others, it is common to give access to further information by a menu. Let
people choose where and how much further they want to investigate.
1) Inconsistencies
Increasing the number of always-true rules shortens learning time and documentation and makes the program
more professional-looking. Inconsistencies are so common because it takes planning and agony to choose a rule
of operation that can always be followed. It is so tempting to do things differently now and again. Each minor
inconsistency seems insignificant, but together they quickly make an otherwise well conceived product hard to
use. It is good testing practice to flag all inconsistencies, no matter how minor.
Common Software Errors (Part 1):
Interface Errors
2) Optimizations
Programmers deliberately introduce inconsistencies to optimize a program. Optimizations are tempting since
they tailor the program to your most likely present need. But each new inconsistency brings complexity with it.
Make the programmer aware of the tradeoff in each case. Is saving a keystroke or two worth the increase in
learning time or the decrease in trust? Usually not.
3) Inconsistent syntax
Syntactic details should be easily learned. You should be able to stop thinking about them. Syntax of all
commands should be consistent throughout the program. Syntax includes such things as:
The order in which you specify source and destination locations (copy from source to destination or
copy to destination from source).
The type of separators used (spaces, commas, semicolons, slashes, etc.).
The location of operators (infix (A+B), prefix (+AB), postfix (AB+)).
5) Inconsistent abbreviations
Without clear-cut abbreviation rules, abbreviations cannot be easily remembered. Abbreviating delete to
del but list to ls makes no sense. Each choice is fine individually, but the collection is an ill-conceived
mess of special cases.
9) Inconsistent capitalization
If command entry is case sensitive, first letters of all commands should all be capitalized or none should be.
First letters of embedded words in commands should always or never be capitalized.
data and <F2> deletes, other times <F1> deletes and <F2> saves) are unacceptable.
15) Time-wasters
Some programs seem designed only to waste your time, doing nothing but displaying flashy animation and the like.
20) Menus
Menus should be simple, but they become complex when there are poor icons or command names and when
choices hide under non-obvious topic headings. The more commands a menu covers, the more complex it will
be no matter how well planned it is. But without planning, complex menus can become disasters.
1) State transitions
Most programs move from state to state. The program is in one state before you choose a menu item or issue a
command. It moves into another state in response to your choice. Programmers usually test their code well
enough to confirm that you can reach any state that you should be able to reach. They do not always let you
change your mind, once you have chosen a state.
5) Cannot pause
Some programs limit the time you have to enter data. When the time is up, the program changes state. It might
display help text or accept a displayed “default” value, or it may log you off. Although time limits can be useful,
people do get interrupted. You should be able to tell it that you are taking a break, and when you get back you
will want it in the same state it is in now.
6) Disaster prevention
System failures and user errors happen. Programs should minimize the consequences of them.
7) No backup facility
It should be easy to make an extra copy of a file. If you are changing a file, the computer should keep a copy of
the original (or make it easy for you to tell it to keep it) so you have a known good version to return to if your
changes go awry.
8) No undo
Undo lets you retract a command, typically any command, or a group of them. Undelete is a restricted case of
undo that lets you recover data deleted in error. Undo is desirable. Undelete is essential.
1) User tailorability
You should be able to change minor and arbitrary aspects of the program’s user interface with a minimum of
fuss and bother.
1) Slow program
Many design and code errors can slow a program. The program might do unnecessary work, such as initializing
an area of memory that will be overwritten before being read. It might repeat work unnecessarily, such as doing
something inside a loop that could be done outside of it. Design decisions also slow the program, often more
than the obvious errors.
Whatever the reason for the program being slow, if it is, it is a problem. Delays as short as a quarter of a
second can break your concentration, and substantially increase your time to finish a task.
2) Slow echoing
The program should display inputs immediately. If you notice a lag between the time you type a letter and the
time you see it, the program is too slow. You will be much more likely to make mistakes. Fast feedback is
essential for any input event, including moving mice, trackballs, and light pens.
significance.
4) Poor responsiveness
A responsive program does not force you to wait before issuing your next command. It constantly scans for
keyboard (or other) input, acknowledges commands quickly, and assigns them high priority. For example, type
a few lines of text while your word processor is reformatting the screen. It should stop formatting, echo the
input, format the display of these lines as you enter them, and execute your editing commands. It should keep
the area of the screen near the cursor up to date. The rest is lower priority since you are not working with it at
this instant. The program can update the rest of the display when you stop typing. Responsive programs feel
faster.
5) No type-ahead
A program that allows type-ahead lets you keep typing while it goes about other business. It remembers what
you typed and displays and executes it later. You should not have to wait to enter the next command.
7) No progress reports
For long tasks or delays, it is very desirable to indicate how much has been done and how much longer the
machine will be tied up.
2. ERROR HANDLING
Errors in dealing with errors are among the most common bugs. Error handling errors include failure to
anticipate the possibility of errors and protect against them, failure to notice error conditions, and failure to deal
with detected errors in a reasonable way. Note that error messages were discussed above.
1) Ignores overflow
An overflow condition occurs when the result of a numerical calculation is too big for the program to handle.
Overflows arise from adding and multiplying large numbers and from dividing by zero or by tiny fractions.
Overflows are easy to detect, but the program does have to check for them, and some do not.
6) Data comparisons
When you try to balance your checkbook, you have the number you think is your balance and the number the
bank tells you is your balance. If they do not agree after you allow for service charges, recent checks, and so
forth, there is an error in your records, the bank’s, or both. Similar opportunities frequently arise to check two
sets of data or two sets of calculations against each other. The program should take advantage of them.
5) Aborting errors
You stop the program or it stops itself when it detects an error. Does it close any open output files? Does it log
the cause of the exit on its way down? In the most general terms, does it tidy up before dying or does it just die
and maybe leave a big mess?
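One way a program might tidy up before dying is sketched below in Python. The function run_job, its parameters, and the in-memory output file are assumptions made for this illustration; a real program would log to its own facility and close its own files.

```python
import io


def run_job(rows, out, log):
    """Write each row as an integer to `out`.

    On bad data, record the cause in `log` and re-raise.
    In every case, close `out` before leaving.
    """
    try:
        for row in rows:
            out.write(f"{int(row)}\n")  # int() raises ValueError on bad data
    except ValueError as exc:
        log.append(f"aborted: {exc}")   # log the cause of the exit
        raise
    finally:
        out.close()                     # runs on success and on failure


out = io.StringIO()   # stands in for an open output file
log = []
try:
    run_job(["1", "2", "oops"], out, log)
except ValueError:
    pass
print(out.closed, log)
```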
3. BOUNDARY-RELATED ERRORS
A boundary describes a change-point for a program. The program is supposed to work one way for anything on
one side of the boundary. It does something different for anything on the other side.
The classic “things” on opposite sides of boundaries are data values. There are three standard boundary bugs:
Mishandling of the boundary case: If a program adds any two numbers that are less than 100, and
rejects any greater than 100, what does it do when you enter exactly 100? What is it supposed to do?
Wrong boundary: The specification says the program should add any two numbers less than 100 but
it rejects anything greater than 95.
Mishandling of cases outside the boundary: Values on one side of the boundary are impossible,
unlikely, unacceptable, or unwanted. No code was written for them. Does the program successfully
reject values greater than 100 or does it crash when it gets one?
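A tester can probe the three boundary bugs listed above by trying the value just below the limit, the limit itself, and the value just above it. The Python sketch below assumes, purely for illustration, that the specification means "strictly less than 100"; add_small and probe are hypothetical names.

```python
LIMIT = 100  # assumed boundary; the specification must say whether 100 is allowed


def add_small(a, b):
    """Add two numbers, accepting only values strictly less than LIMIT."""
    for value in (a, b):
        if value >= LIMIT:
            raise ValueError(f"{value} is not less than {LIMIT}")
    return a + b


def probe(value):
    """Report what the routine does with `value`: a sum or a clean rejection."""
    try:
        return add_small(value, 1)
    except ValueError:
        return "rejected"


# Test just below, exactly at, and just above the boundary.
results = {v: probe(v) for v in (LIMIT - 1, LIMIT, LIMIT + 1)}
print(results)
```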
We treat the concept of boundaries more broadly. Boundaries describe a way of thinking about a program
and its behavior around its limits. There are many types of limits: largest, oldest, latest, longest, most recent,
first time, etc. The same types of bugs can happen with any of them so why not think of them in the same
terms?
The program keeps printing and adding 1 to COUNT_VARIABLE until the counter finally reaches 45. Then the
program quits. 45 bounds the loop. Loops can have lower as well as upper bounds (IF COUNT_VARIABLE is
less than 45 and greater than 10).
4. CALCULATION ERRORS
The program calculates a number and gets the wrong result. This can happen for one of three types of reasons:
Bad logic: There can be a typing error, like A-B instead of A+B. Or the programmer might break a
complex expression into a set of simpler ones, but get the simplification wrong. Or he might use an
incorrect formula, or one inapplicable to the data at hand. This third case is a design error. The code
does what the programmer intended; it is his conception of what the code should do that is wrong.
Bad arithmetic: There might be an error in the coding of a basic function, such as addition,
multiplication, or exponentiation. The error might show up whenever the function is used (2 + 2 = -5)
or it might be restricted to rare special cases. In either case, any program that uses the function can
fail.
Imprecise calculation: If the program uses floating point arithmetic, it loses precision as it calculates,
because of round off and truncation errors. After many intermediate errors, it may claim that 2 + 2
works out to -5 even though none of the steps in the program contains a logical error.
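A small Python illustration of accumulated round-off error, using the standard double-precision floats: adding 0.1 ten times does not give exactly 1.0, even though no step contains a logical error.

```python
import math

total = 0.0
for _ in range(10):
    total += 0.1  # 0.1 has no exact binary representation

exactly_one = (total == 1.0)             # exact comparison fails
close_to_one = math.isclose(total, 1.0)  # comparison with a tolerance succeeds
print(total, exactly_one, close_to_one)
```

Comparing with a tolerance, as math.isclose does, is one common way to keep such errors from turning into wrong decisions.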
For example, suppose the program stores all numbers in fixed point format, with one byte per number. It works
with numbers from 0 to 255. It cannot add 255 + 255 because the result is too large to fit in one byte.
Overflows also occur in floating point arithmetic, when the exponent is too large.
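The one-byte example can be sketched in Python, whose own integers do not overflow, by modelling the 0 to 255 range explicitly. Both function names are hypothetical; the wrapping version shows what typically happens when the overflow is ignored, the checked version shows the overflow being detected.

```python
BYTE_MAX = 255  # largest value that fits in one unsigned byte


def add_u8_wrapping(a, b):
    """Ignore overflow: keep only the low 8 bits of the sum."""
    return (a + b) & 0xFF


def add_u8_checked(a, b):
    """Detect overflow: refuse any sum that does not fit in one byte."""
    result = a + b
    if result > BYTE_MAX:
        raise OverflowError(f"{a} + {b} does not fit in one byte")
    return result


wrapped = add_u8_wrapping(255, 255)
print(wrapped)
```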
Underflows occur only in floating point calculations. In floating point, a number is represented by a pair of
values, one for the exponent and the other for a fraction. For example, 255 is 0.255 times 10^3. 255,000 is 0.255
times 10^6. The exponent changes, but the fractional part (0.255) is the same in both cases. Now, suppose the
program allocates a byte for the exponent, and stores values of 0 to 255. What happens if the exponent is -1?
For example, 0.255 * 10^-1 is 0.0255. This is too small to be stored (because the smallest exponent we can store in
this scheme is 0), so we have an underflow. Underflows are usually converted to 0 (0.255 * 10^-1 becomes 0),
without an error message. This is usually appropriate, but it can lead to computational errors:
Is 100 * 0.255 * 10^-1 zero or 2.55?
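The same question can be asked of ordinary double-precision floats in Python. The numbers below are chosen only for illustration: they are much smaller than in the one-byte-exponent scheme, but the effect is the same. Whether the answer is zero depends on the order in which the multiplications happen.

```python
tiny = 1e-300

# The intermediate product 1e-400 is too small to store, so it silently becomes 0.0.
flushed = (tiny * 1e-100) * 1e200

# Same three factors, different grouping: no intermediate value underflows.
kept = tiny * (1e-100 * 1e200)

print(flushed, kept)
```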
are also traditional. They will keep appearing in textbooks and programs for years. As a common example, if
you are graphing a set of data and want to fit a curve to them that has the form Y = ax^b, it is traditional to take
logarithms of everything in sight, estimating a and b by fitting a line to the new function,
log Y = log a + b log x.
This is easy to program, quick to run, and inaccurate. When you return from logarithms and plot ax^b, the curve
fits data on the left of the figure (small values of x) better than data toward the right.
Many programs use bad approximation methods and other incorrect mathematical procedures. They might
print impressive output, but it is wrong output. You cannot test for these types of problems unless you
understand a fair bit of the mathematics yourself. If you are testing a statistical package or other mathematical
package, it is essential that you or another tester have a detailed understanding of the functions being
programmed.
Common Software Errors (Part 5):
Initial and Later States
1) GOTO somewhere
GOTO transfers control to another part of the program. The program jumps to the specified routine, but this is
obviously the wrong place. The program may lock, the screen display may be inappropriate, etc.
The GOTO command is unfashionable. The structured programming movement is centered on a belief that
GOTO encourages sloppy thinking and coding.
Errors involving GOTO are especially likely when:
The program branches backward, going somewhere it has been before. For example, the GOTO may
jump to a point just past validity checking or initialization of data or devices.
The GOTO is indirect, going to an address stored in a variable. When the variable’s value changes, the
GOTO takes the program somewhere else. It is hard to tell, when reading the code, whether the variable
has the right value at the right time.
4) Executing data
You cannot tell from a byte’s contents whether it holds a character, part of a number, part of a memory location,
or a program instruction. The program keeps these different types of information in different places in memory
to keep straight which byte holds what type of data. If the program interprets data as instructions, it will try to
execute them and will probably lock. It may print odd things on the screen first. Some computers detect
execution of “impossible” commands and stop the program with an error message (usually a hexadecimal
message flagging program termination or reference to an illegal machine code.)
The program will treat data as if they were instructions under two conditions:
(a) Data are copied into a memory area reserved for code. The code is overwritten. Examples of how to
do this:
Pointers are variables which store memory addresses. A pointer might hold the starting address
of an array; the programmer could put a value in the fourth element of the array by saying store
it in the fourth location after the address stored in this pointer. If the address in the pointer is
wrong, the data go to the wrong place. If the address is in the code space, the new data overwrite
the program.
Some languages do not check array limits. Suppose you have an array MYARRAY, with three
elements, MYARRAY[1], MYARRAY[2], and MYARRAY[3]. What happens if the program
tries to store a value in MYARRAY[2044]? If the language does not catch this error, the data
will be stored in the spot that would have been MYARRAY[2044] if that MYARRAY element
existed. This memory location is a few thousand bytes past the end address of MYARRAY. It
might be reserved for code, data, or hardware I/O, but not for MYARRAY.
(b) The program jumps to an area of memory that is reserved for data, and treats it like an area
containing code.
A bad table entry in a table-driven program can lead the program to jump into a data area.
Some computers divide memory into segments. The computer interprets anything in a code
segment as instructions, and anything in a data segment as numbers or characters. If the program
misstates a segment’s starting address, what the computer interprets as a code segment will
probably be a combination of code and data.
6) Re-entrance
A re-entrant program can be used concurrently by two or more processes. A re-entrant subroutine can call itself
or be called by any other routine while it is executing. Some languages do not support re-entrant subroutine
calls: if a routine tries to call itself, the program crashes. Even if the language allows re-entrance, a given
program or routine might not be re-entrant. If a routine is serving two processes, how does it keep its data separate, so that
what it does for one process does not corrupt what it does for the other?
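The question of keeping data separate can be illustrated with a routine that calls itself. In this Python sketch (both function names are hypothetical), the first version keeps its working data in one shared list, so a nested call destroys the outer call's data; the second keeps its data in local variables and is safe to re-enter.

```python
_scratch = []  # shared by every call: this is what makes collect_shared unsafe


def collect_shared(items):
    """Not re-entrant: every call works in the same module-level list."""
    _scratch.clear()
    for item in items:
        if isinstance(item, list):
            collect_shared(item)   # the nested call wipes the outer call's data
        else:
            _scratch.append(item)
    return list(_scratch)


def collect_local(items):
    """Re-entrant: each call keeps its working data in its own local list."""
    out = []
    for item in items:
        if isinstance(item, list):
            out.extend(collect_local(item))
        else:
            out.append(item)
    return out


nested = [1, [2, 3], 4]
broken = collect_shared(nested)   # the value 1 is lost
correct = collect_local(nested)
print(broken, correct)
```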
Common Software Errors (Part 6):
Control Flow Errors
stored on the stack, which takes it back to subroutine 1. This is rarely intentional.
To avoid this error, subroutine 2 might POP (remove) its return address from the stack when it does its GOTO
back to routine 1. Used incorrectly, this can cause stack underflows, returns to the wrong calling routine, and
attempts to return to data values stored on the stack with the return addresses.
14) Interrupts
An interrupt is a special signal that causes the computer to stop the program in progress and branch to an
interrupt handling routine. Later, the program restarts from where it was interrupted. Input/output events,
including signals from the clock that a specified interval of time has passed, are typical causes of interrupts.
1) Dead crash
In a dead crash, the computer stops responding to keyboard input, stops printing, and leaves lights on or off
(but does not change them). It usually locks without issuing any warnings that it is about to crash. The only
way to regain control is to turn off the machine or press the reset key.
Dead crashes are usually due to infinite loops. One common loop keeps looking for acknowledgment or data
from another device (printer, another computer, disk, etc.). If the program missed the acknowledgment, or
never gets one, it may stay in this wait loop forever.
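A wait loop need not run forever if it carries a deadline. The Python sketch below is one possible shape for such a loop; wait_for_ack, its poll argument, and the timing values are assumptions for illustration, not a description of any particular device protocol.

```python
import time


def wait_for_ack(poll, timeout_s=1.0, interval_s=0.01):
    """Call `poll()` until it returns True or `timeout_s` seconds have passed.

    Returns True if the acknowledgment arrived, False if the wait timed out.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if poll():
            return True
        time.sleep(interval_s)  # do not spin at full speed while waiting
    return False


def never_acks():
    """Stands in for a device that never answers."""
    return False


got_ack = wait_for_ack(never_acks, timeout_s=0.05)
print(got_ack)
```

When the function returns False, the caller can report the problem or retry instead of locking up.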
6.3 LOOPS
There are many ways to code a loop, but they all have some things in common. Here is one example:
1 SET LOOP_CONTROL = 1
2 REPEAT
3 SET VAR = 5
4 PRINT VAR * LOOP_CONTROL
5 SET LOOP_CONTROL = LOOP_CONTROL + 1
6 UNTIL LOOP_CONTROL > 5
7 PRINT VAR
The program sets LOOP_CONTROL to 1, sets VAR to 5, prints the product of VAR and LOOP_CONTROL,
increments LOOP_CONTROL then checks whether LOOP_CONTROL is greater than 5. Since LOOP_CONTROL
is only 2, it repeats the code inside the loop (lines 3, 4, and 5). The loop keeps repeating until LOOP_CONTROL
reaches 6. Then the program executes the next command after the loop, printing the value of VAR.
LOOP_CONTROL is called the loop control variable. Its value determines how many times the loop is
executed. If the expression written after the UNTIL is complex, involving many different variables, it is a loop
control expression, rather than a loop control variable. The same types of errors arise in both cases.
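For readers more used to a current language, the example loop might be written in Python roughly as follows. Python has no REPEAT ... UNTIL, so this sketch uses while True with a break at the bottom. The list named printed is an addition for illustration; it simply records what the pseudocode would print.

```python
printed = []           # records each value the pseudocode would PRINT

loop_control = 1
while True:            # REPEAT: the body always runs at least once
    var = 5
    printed.append(var * loop_control)
    loop_control += 1
    if loop_control > 5:   # UNTIL LOOP_CONTROL > 5
        break
printed.append(var)    # the PRINT VAR after the loop

print(printed)
```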
1) Infinite loop
If the condition that terminates the loop is never met, the program will loop forever. Modify the example so that
it loops until LOOP_CONTROL is less than 0 (which never happens). It will loop forever.
For example:
IF VAR > 5
THEN SET VAR_2 = 20
ELSE SET VAR_2 = 10
The THEN clause (SET VAR_2 = 20) is only executed if the condition (VAR > 5) is met. If the condition is not
met, the ELSE clause (SET VAR_2 = 10) is executed. Some IF statements only specify what to do if the
condition is met. They do not include an ELSE clause. If the condition is not met (VAR <= 5), the program
skips the THEN clause and moves on to the next line of code.
usually does not matter. If he includes it only inside one clause, it will be missed whenever the other clause
(ELSE or THEN) is executed.
1) Missing default
A programmer who thinks VAR can only take on the values listed may not write a default case. Because of a
bug or later modifications to the code, VAR can take on other values. A default case could catch these, and print
any unexpected value of VAR.
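A default case of this kind might look like the following Python sketch. The function describe and the three expected values are hypothetical; the point is only the final else branch, which reports the unexpected value instead of ignoring it.

```python
def describe(var):
    """Handle the values the programmer expects, and catch everything else."""
    if var == 1:
        return "one"
    elif var == 2:
        return "two"
    elif var == 3:
        return "three"
    else:
        # Default case: make the unexpected value visible.
        raise ValueError(f"unexpected value of VAR: {var!r}")


print(describe(2))
```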
2) Wrong default
Suppose the programmer expects VAR to have only four possible values. He explicitly deals with the first three
possibilities, and buries the other one as the “default.” Will this default be correct for VAR’s unanticipated fifth
and sixth values?
3) Missing cases
VAR can take on five possible values but the programmer forgot to write a CASE statement covering the fifth
case.
5) Overlapping cases
The CASE statements are equivalent to this:
IF VAR > 5 then do TASK_1
IF VAR > 7 then do TASK_2
etc.
The first and second cases overlap. If VAR is 9, it fits in both cases. Which should be executed? The first task
is the usual choice. Sometimes both are. Sometimes the second one is the correct choice.
The program calls a subroutine and passes it data, perhaps like so:
DO SUB (VAR_1, VAR_2, VAR_3)
The three variables, VAR_1, VAR_2, and VAR_3 are passed from the program to the subroutine. They are
called the subroutine’s parameters. The subroutine itself might refer to these variables by different names. The
statement at the start of the subroutine definition might look like this:
SUB(INPUT_1, INPUT_2, INPUT_3)
The subroutine receives the first variable in the list passed by the program (VAR_1) and calls it INPUT_1. It
calls the second variable in the list (VAR_2) INPUT_2. INPUT_3 is its name for the last variable (VAR_3).
The program’s and subroutine’s definitions of these variables must match. If VAR_1 is an integer, INPUT_1
should be as well. If VAR_2 is a floating point value, that is what the subroutine had better expect to find in
INPUT_2.
The program may use the wrong starting or ending address of a set of data.
A buffer is an area of memory used for temporary storage. Messages between processes often include buffers:
the message includes a pointer to the start of the buffer and, in effect, says “for more details, read this.” When
the process receiving the message is done with it, it “releases” the buffer which becomes “free memory” again,
ready for other uses as needed by the operating system.
The receiving routine might get the address or the length of the message buffer wrong. It could start reading
memory locations that precede the start of the buffer or it could keep reading data out of locations past the
buffer’s end.
programmer might also store data on the stack. A stack that holds only data and no return addresses is called a
value stack.
Suppose a stack can hold 256 bytes and the programmer tries to store 300 bytes on it. The stack overflows:
the last 256 bytes stored are usually kept, and the first 44 values lost, overwritten by the others. When the
program tries to retrieve these data from the stack, it can only get the last 256. When it tries to pop the 257th
value off the stack (the 44th pushed onto the stack) there is an underflow condition: the program is trying to
retrieve a value from a stack that is now empty.
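The arithmetic of this example can be checked with a short Python simulation. It assumes one value per byte and models the stack as a fixed-size container that silently discards its oldest entry when full. Real stacks can overflow in other ways, so treat this as a sketch of the scenario above only.

```python
from collections import deque

STACK_SIZE = 256
stack = deque(maxlen=STACK_SIZE)  # when full, each push silently drops the oldest value

for value in range(1, 301):       # push the values 1..300
    stack.append(value)

popped = [stack.pop() for _ in range(STACK_SIZE)]
print(popped[0], popped[-1])      # 300 45: values 1..44 were overwritten

try:
    stack.pop()                   # the 257th pop
    underflow = False
except IndexError:
    underflow = True              # nothing is left to retrieve
print(underflow)                  # True
```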
The safest way for two processes to communicate is via messages. If they pass data through shared memory
areas instead, a bug in one process can trash data used by both, no matter how defensively the other process
was written. The most prevalent problems arising out of messaging architectures are race conditions, which are
discussed in the next section. There are also errors in sending and receiving the data in a message.
The data are stored on disk, tape, punch cards, whatever. The process corrupts stored data by putting bad values
into these files.
1) Overwritten changes
Imagine two processes working with the same data. Both read the data from disk at about the same time. One
saves some changes. The second does not know anything about changes made by the first. When it saves its
changes, it overwrites the data saved by the first process. Some programs use field, record, or file locking to
prevent processes from changing fields, records, or files that another process is changing. These locks are not
always present, and they do not always work.
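The overwrite can be reproduced in a few lines of Python. The sketch below is illustrative: the Record class and its field names are hypothetical, and the protective variant uses a version check rather than the locking mentioned above. The version check is a different technique, chosen here only because it is short to demonstrate.

```python
class Record:
    """Stand-in for a record on disk, with a version counter."""
    def __init__(self, fields):
        self.fields = dict(fields)
        self.version = 0

    def read(self):
        return dict(self.fields), self.version

    def save_blind(self, fields):
        """Overwrite whatever is stored, as in the failure described above."""
        self.fields = dict(fields)
        self.version += 1

    def save_checked(self, fields, version_read):
        """Save only if nobody else has saved since `version_read`."""
        if version_read != self.version:
            return False
        self.fields = dict(fields)
        self.version += 1
        return True

def two_editors(record, checked):
    copy_1, ver_1 = record.read()      # both processes read at about the same time
    copy_2, ver_2 = record.read()
    copy_1["phone"] = "555-0199"       # first process changes the phone number
    copy_2["name"] = "Anne"            # second process changes the name
    if checked:
        return record.save_checked(copy_1, ver_1), record.save_checked(copy_2, ver_2)
    record.save_blind(copy_1)
    record.save_blind(copy_2)
    return True, True

blind = Record({"name": "Ann", "phone": "555-0100"})
two_editors(blind, checked=False)
print(blind.fields)    # {'name': 'Anne', 'phone': '555-0100'}: the phone change is lost

guarded = Record({"name": "Ann", "phone": "555-0100"})
print(two_editors(guarded, checked=True))  # (True, False): the second save is refused
```

In the checked case the second process learns that its copy is stale and can reread the record before trying again.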
8. RACE CONDITIONS
In the classic race, there are two possible events, call them EVENT_A and EVENT_B. Both events will happen.
The issue is which comes first. EVENT_A almost always precedes EVENT_B. There are logical grounds for
expecting EVENT_A to precede EVENT_B. However, under rare and restricted conditions, EVENT_B can “win
the race,” and occur just before EVENT_A. We have a race condition whenever EVENT_B precedes EVENT_A.
We have a race condition bug if the program fails when this happens. Usually the program fails because the
programmer did not anticipate the possibility of EVENT_B preceding EVENT_A, so he did not write any code
to deal with it.
Few testers look for race conditions. If they find an “irreproducible” bug, few think about timing issues
(races) when trying to reproduce it. Many people find timing issues hard to conceptualize or hard to understand.
We provide more than our usual amount of detail in the examples below, hoping that this will make the overall
concept easier to understand.
8.2 ASSUMPTION THAT ONE EVENT OR TASK HAS FINISHED BEFORE ANOTHER BEGINS
The previous and the next sections provide examples of this type of problem.
8.3 ASSUMPTION THAT INPUT WILL NOT OCCUR DURING A BRIEF PROCESSING INTERVAL
You type a character. The editing program you are testing receives it, moves other displayed characters around
on the screen so it can display this one at the cursor location, echoes the received character, then looks for your
next input. Naturally, since the computer is faster than the finger, the program should get everything done and
be ready for the next input long before you are ready to type it. Accordingly, the program does not allow for the
possibility that other characters will arrive before it is done with this one. However, a fast typist might enter
two, three, or more characters before the editor is ready for them. The editor catches the last one typed and
misses the others, which were typed while it was in the middle of dealing with the first one.
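The editor's behavior can be imitated with a small deterministic Python simulation. It assumes the editor handles exactly one character per processing cycle and that unbuffered input is held in a single slot, so a newer keystroke replaces an older one. The function name and the cycle model are illustrative, not taken from any real editor.

```python
from collections import deque

def run_editor(keys_per_cycle, buffered):
    """Return the text the editor ends up displaying.

    keys_per_cycle: one string per processing cycle, holding the characters
    that arrived while the editor was busy with that cycle.
    """
    # A one-slot holder keeps only the most recent key; a queue keeps them all.
    pending = deque() if buffered else deque(maxlen=1)
    shown = []
    for typed in keys_per_cycle:
        pending.extend(typed)                # keys arrive while the editor is busy
        if pending:
            shown.append(pending.popleft())  # editor handles one key per cycle
    shown.extend(pending)                    # drain anything still waiting
    return "".join(shown)

fast_typist = ["c", "at", ""]   # "a" and "t" both arrive during one cycle
print(run_editor(fast_typist, buffered=False))  # ct
print(run_editor(fast_typist, buffered=True))   # cat
```

A slow typist, modeled as ["c", "a", "t"], gets "cat" either way, which is why the bug survives casual testing.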
The programmer realizes that these operations take very little time. Since it is so unlikely for an
interrupt-triggering event to happen in this brief interval, why take the time to block interrupts during it?
Usually all goes well, but every now and again the program will be interrupted.
Failure to block interrupts was raised earlier (“Program runs amok: Interrupts”). There the focus was on the
problems of interrupts. Here the point is one of timing. Even if part of a program is brief, if it lasts long enough
that an interrupt-triggering event can happen during this interval, then some day an interrupt-triggering event
will happen during the interval.
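To make the timing point concrete, the Python sketch below simulates an interrupt that arrives in the middle of a two-step update. Everything in it is an assumption made for illustration: the Device class, the shared counter, and the modeling of "blocking interrupts" as deferring the handler until the update is finished.

```python
class Device:
    """Holds a counter that both the main program and an interrupt handler update."""
    def __init__(self):
        self.count = 0

def interrupt_handler(device):
    device.count += 1                  # the handler records one event

def record_event(device, interrupt_arrives, block_interrupts):
    """Main-program update: read the counter, then write back counter + 1."""
    snapshot = device.count            # start of the brief interval
    deferred = False
    if interrupt_arrives:
        if block_interrupts:
            deferred = True            # the interrupt is held until we finish
        else:
            interrupt_handler(device)  # the handler runs in mid-update
    device.count = snapshot + 1        # end of the brief interval
    if deferred:
        interrupt_handler(device)      # the handler runs after the update

unprotected = Device()
record_event(unprotected, interrupt_arrives=True, block_interrupts=False)
protected = Device()
record_event(protected, interrupt_arrives=True, block_interrupts=True)
print(unprotected.count, protected.count)  # 1 2: the unprotected run lost an event
```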
does not receive a response within the specified period, it will never receive a response. What happens if the
response arrives milliseconds after the time-out interval has ended? The program might interpret this as a
response to some other message, or it might just crash. This is a classic race because it is unlikely, but not
impossible, for the response to occur after the time-out period is over.
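One common defense is to tag each request with an identifier and to discard any response whose request has already timed out. The Python sketch below shows the idea. The Requester class and its methods are hypothetical, and time is passed in explicitly so that the example is repeatable.

```python
class Requester:
    """Tracks outstanding requests by ID so that late replies can be recognized."""
    def __init__(self, timeout):
        self.timeout = timeout
        self.outstanding = {}          # request ID -> time the request was sent
        self.next_id = 1

    def send(self, now):
        request_id = self.next_id
        self.next_id += 1
        self.outstanding[request_id] = now
        return request_id

    def expire(self, now):
        """Give up on every request older than the time-out interval."""
        for request_id, sent in list(self.outstanding.items()):
            if now - sent > self.timeout:
                del self.outstanding[request_id]

    def receive(self, request_id, now):
        self.expire(now)
        if request_id in self.outstanding:
            del self.outstanding[request_id]
            return "accepted"
        return "discarded"             # late or unknown: not mistaken for another reply

requester = Requester(timeout=1.0)
first = requester.send(now=0.0)
second = requester.send(now=1.1)            # sent after the first request timed out
print(requester.receive(first, now=1.2))    # discarded
print(requester.receive(second, now=1.5))   # accepted
```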
9. LOAD CONDITIONS
Programs misbehave when overloaded. A program may fail when working under high volume (lots of work
over a long period) or under stress (maximum amount of work all at once). It may fail when it runs out of
memory, printers or other “resources.” It may fail because it is required to do too much in too little time. All
programs have limits. The issues are whether a program can meet its stated limits and how horribly it fails
when those limits are exceeded.
Also, some programs create their own load problems, or, in multi-processing situations, make problems for
others. They hog computer time or resources or create unnecessary extra work to such an extent that other
processes (or themselves later) cannot do their tasks.
The program does not erase outdated backups and internal-use temporary files. There are limits on how much
erasure should be done automatically, but the process does not get rid of files that obviously should go.
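For internal-use temporary files, one way to guarantee cleanup is to tie the files to a scope that removes them on exit. The Python sketch below uses the standard library's tempfile.TemporaryDirectory for this. The function name and file name are made up for the example.

```python
import os
import tempfile

def process_with_scratch_file(payload):
    """Do some work using a scratch file, and leave nothing behind."""
    with tempfile.TemporaryDirectory() as scratch_dir:
        scratch_path = os.path.join(scratch_dir, "work.tmp")
        with open(scratch_path, "wb") as f:
            f.write(payload)
        size = os.path.getsize(scratch_path)
    # On leaving the `with` block, the directory and its contents are removed.
    return size, scratch_dir

size, scratch_dir = process_with_scratch_file(b"intermediate results")
print(size, os.path.exists(scratch_dir))  # 20 False
```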
10. HARDWARE
Programs send bad data to devices, ignore error codes coming back, try to use devices that are not there, and so
on. Even if the problem is truly due to a hardware failure, there is also a software error if the software does not
recognize that the hardware is no longer working correctly.
11.3 NO TITLE
The program should identify itself when it starts. You should know right away that you are now running Joe
Blow’s Super Spreadsheet, not Jane Doe’s Deluxe Database.
11.4 NO VERSION ID
The program should display its version identification when it starts or when you give it a display version
command. Customers should be able to find this ID easily, so they can tell it to you when they call to complain
about the program. You should be able to find the ID easily so that you can tell it to the programmer when you
find bugs.
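A minimal Python sketch of a program that identifies itself follows. The title, the version string, and the "version" command are placeholders invented for the example.

```python
PROGRAM_TITLE = "Super Spreadsheet"   # hypothetical product name
PROGRAM_VERSION = "2.1.3"             # hypothetical version ID

def startup_banner():
    """Text shown when the program starts."""
    return f"{PROGRAM_TITLE}, version {PROGRAM_VERSION}"

def handle_command(command):
    """Answer a 'version' command; anything else is out of scope here."""
    if command.strip().lower() == "version":
        return startup_banner()
    return f"unknown command: {command!r}"

print(startup_banner())           # Super Spreadsheet, version 2.1.3
print(handle_command("VERSION"))  # Super Spreadsheet, version 2.1.3
```

Keeping the title and version in one place means the startup display and the version command cannot disagree.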
If the program is made of many independently developed pieces, it pays to be able to identify the version of
each piece. These IDs may not display automatically; you may have to use a debugger or a special editor to find
them. They are useful if they exist and if they are kept up to date by the programmers. However, unless you
have firm management backing, do not insist that programmers compile separate version IDs for each module.
11.7 ARCHIVED SOURCE DOES NOT COMPILE INTO A MATCH FOR SHIPPING CODE
Before releasing a product to any customer, archive the source code. If the customer finds a bug, your company
must be able to recompile this code and regenerate the product that the customer has. Without this starting point,
you will have major problems addressing that customer’s difficulties.
This should be obvious, but it seems not to be. Many companies cannot recreate products they sell. They
may have archival copies of source code, but the code in their vaults is a bit different from the code in the
product they shipped. This is begging for a disaster.
If you report that archives are not up to date with code that is about to be shipped, and are rebuffed, take it to
a higher level. The president and the company lawyer might be much more concerned by this problem than
mid-level engineering or marketing managers.
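One way to check the archive is to rebuild from it and compare a cryptographic hash of the result with a hash of the shipping file. The Python sketch below shows the comparison only. It assumes the build is reproducible byte for byte, which many toolchains do not guarantee without extra care (embedded timestamps, for example). The file names and contents are stand-ins.

```python
import hashlib
import os
import tempfile

def sha256_of_file(path):
    """Hash a file in chunks so that large binaries need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def archive_matches_shipping(rebuilt_path, shipped_path):
    return sha256_of_file(rebuilt_path) == sha256_of_file(shipped_path)

# Demonstration with stand-in files in a scratch directory.
with tempfile.TemporaryDirectory() as d:
    contents = {
        "shipped.bin": b"release build",
        "rebuilt_good.bin": b"release build",
        "rebuilt_stale.bin": b"release build with an unarchived fix",
    }
    for name, data in contents.items():
        with open(os.path.join(d, name), "wb") as f:
            f.write(data)
    shipped = os.path.join(d, "shipped.bin")
    good_match = archive_matches_shipping(os.path.join(d, "rebuilt_good.bin"), shipped)
    stale_match = archive_matches_shipping(os.path.join(d, "rebuilt_stale.bin"), shipped)

print(good_match, stale_match)  # True False
```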
These are not acceptable reasons. If you are not sure whether something is a problem, say so in the report.
If you are criticized for reporting minor or politically inconvenient bugs, appeal to higher management. It is
your responsibility to report every problem you find. Deliberate suppression of bug reports leads to confusion,
poorer tester morale, and a poorer product. It can also bring you into the middle of nasty office politics,
possibly as a scapegoat.
out, or running some tests only on every second or third cycle of testing.
You have combined too much into one test. If one test is buried inside another, or depends on another,
then if that other test fails, this test probably will not be executed. Overly complex combinations of
test cases can lead to missed tests because they confuse you.
1) Illegible reports
If the programmer finds it hard to read a report, he will ignore it for as long as possible. Many reports are hard
to read because you pack too much information into them. Put separable problems on separate report forms. If a
single problem requires a long description, type it on a separate page and attach it to the Problem Report.
7) Concentration on trivia
Do not make big issues over small problems. Do not get drawn too far into long arguments over wording or
style of presentation. Do not exaggerate the severity of bugs. Be wary of getting a reputation as a nitpicker.
8) Abusive language
If you refer to work as “unprofessional,” “sloppy,” or “incompetent,” expect the programmer who did it to get
angry. Do not bet that he will fix the bug, even if it is serious. It can be useful to shock a programmer
occasionally, but be conscious of what you are doing. Do it rarely (once a year!).
It is not enough to just report a bug. You have got to make sure that it is noticed and not forgotten. Otherwise,
bugs will “slip through the cracks” and make it into the shipping product.