Therac Software
The manufacturer said that the hardware and software were "tested and exercised
separately or together over many years." In his deposition for one of the lawsuits, the
quality assurance manager explained that testing was done in two parts. A "small
amount" of software testing was done on a simulator, but most testing was done as a
system. It appears that unit and software testing was minimal, with most effort directed at
the integrated system test. At a Therac-25 user group meeting, the same quality assurance
manager said that the Therac-25 software was tested for 2,700 hours. Under questioning
by the users, he clarified this as meaning "2,700 hours of use."
The programmer left CMC in 1986. In a lawsuit connected with one of the accidents, the
lawyers were unable to obtain information about the programmer from CMC. In the
depositions connected with that case, none of the CMC employees questioned could
provide any information about his educational background or experience. Although an
attempt was made to obtain a deposition from the programmer, the lawsuit was settled
before this was accomplished. We have been unable to learn anything about his
background.
CMC claims proprietary rights to its software design. However, from voluminous
documentation regarding the accidents, the repairs, and the eventual design changes, we
can build a rough picture of it.
The software is responsible for monitoring the machine status, accepting input about the
treatment desired, and setting the machine up for this treatment. It turns the beam on in
response to an operator command (assuming that certain operational checks on the status
of the physical machine are satisfied) and also turns the beam off when treatment is
completed, when an operator commands it, or when a malfunction is detected. The
operator can print out hard-copy versions of the CRT display or machine setup
parameters.
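The beam on/off rules in the paragraph above can be sketched as a small state model. This is only an illustration of the stated conditions, not the actual Therac-25 code (which was written in assembly language); the class name `BeamController` and its method and parameter names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BeamController:
    """Toy model of the beam on/off rules; all names here are hypothetical."""

    beam_on: bool = False

    def request_beam_on(self, operator_command: bool, checks_ok: bool) -> bool:
        # The beam turns on only in response to an operator command, and only
        # if the operational checks on the physical machine are satisfied.
        if operator_command and checks_ok:
            self.beam_on = True
        return self.beam_on

    def update(self, treatment_complete: bool, operator_stop: bool,
               malfunction: bool) -> bool:
        # Any one of the three conditions is enough to turn the beam off.
        if treatment_complete or operator_stop or malfunction:
            self.beam_on = False
        return self.beam_on
```

For example, `BeamController().request_beam_on(True, False)` leaves the beam off because the status checks failed, while a later detected malfunction passed to `update` turns an active beam off.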
The treatment unit has an interlock system designed to remove power to the unit when
there is a hardware malfunction. The computer monitors this interlock system and
provides diagnostic messages. Depending on the fault, the computer either prevents a
treatment from being started or, if the treatment is in progress, creates a pause or a
suspension of the treatment.
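A minimal sketch of the fault response described above follows. The text says only that the response depends on the fault, so the `serious` flag used here to choose between a pause and a suspension is an assumption made for illustration, as are the names `Response` and `respond_to_fault`.

```python
from enum import Enum


class Response(Enum):
    NONE = "none"
    PREVENT_START = "prevent start"
    PAUSE = "pause"
    SUSPEND = "suspend"


def respond_to_fault(fault_detected: bool, treatment_in_progress: bool,
                     serious: bool) -> Response:
    """Pick a response to an interlock fault (hypothetical decision rule)."""
    if not fault_detected:
        return Response.NONE
    if not treatment_in_progress:
        # A fault before treatment begins blocks the start.
        return Response.PREVENT_START
    # ASSUMPTION: the source does not say which faults cause a pause and
    # which cause a suspension; `serious` stands in for that distinction.
    return Response.SUSPEND if serious else Response.PAUSE
```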
The software, written in PDP-11 assembly language, has four major components: stored
data, a scheduler, a set of critical and noncritical tasks, and interrupt services. The stored
data includes calibration parameters for the accelerator setup as well as patient-treatment
data. The interrupt routines include:
The scheduler controls the sequences of all noninterrupt events and coordinates all
concurrent processes. Tasks are initiated every 0.1 second, with the critical tasks
executed first and the noncritical tasks executed in any remaining cycle time. Critical
tasks include the following:
• The treatment monitor (Treat) directs and monitors patient setup and treatment via
eight operating phases. These phases are called as subroutines, depending on the value of
the Tphase control variable. Following the execution of a particular subroutine,
Treat reschedules itself. Treat interacts with the keyboard processing task, which
handles operator console communication. The prescription data is cross-checked
and verified by other tasks (for example, the keyboard processor and the
parameter setup sensor) that inform the treatment task of the verification status via
shared variables.
• The servo task controls gun emission, dose rate (pulse-repetition frequency),
symmetry (beam steering), and machine motions. The servo task also sets up the
machine parameters and monitors the beam-tilt-error and the flatness-error
interlocks.
• The housekeeper task takes care of system-status interlocks and limit checks, and
puts appropriate messages on the CRT display. It decodes some information and
checks the setup verification.
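The scheduling scheme and the Tphase dispatch described above can be sketched as follows. This is an illustrative model under stated assumptions, not a reconstruction of the real software: the cycle budget is expressed in made-up cost units rather than real time, the task costs are invented, and the names `run_cycle`, `Treat`, and `CYCLE_BUDGET` are hypothetical. The eight phase handlers are placeholders because this section does not name the phases.

```python
CYCLE_BUDGET = 100  # hypothetical cost units standing in for one 0.1-second cycle


def run_cycle(critical, noncritical, budget=CYCLE_BUDGET):
    """Run one scheduler cycle and return the names of the tasks that ran.

    Each task is a (name, cost, callable) tuple.
    """
    ran = []
    remaining = budget
    # Critical tasks always run, and they run first.
    for name, cost, task in critical:
        task()
        ran.append(name)
        remaining -= cost
    # Noncritical tasks run only in whatever cycle time is left over.
    for name, cost, task in noncritical:
        if cost > remaining:
            break
        task()
        ran.append(name)
        remaining -= cost
    return ran


class Treat:
    """Treatment monitor: calls one of eight phase subroutines per cycle."""

    def __init__(self):
        self.tphase = 0  # shared control variable selecting the phase
        self.log = []    # records which phase ran, for demonstration only
        # Placeholder handlers; the real phases are not described here.
        self.handlers = [self._make_phase(i) for i in range(8)]

    def _make_phase(self, index):
        def phase():
            self.log.append(index)
        return phase

    def __call__(self):
        # Dispatch on the current value of Tphase. Because Treat stays in the
        # critical task list, it effectively reschedules itself every cycle.
        self.handlers[self.tphase]()
```

In this model, any other task that writes `tphase` between cycles changes which subroutine Treat runs next, which is the kind of shared-variable coupling the text describes.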
It is clear from the CMC documentation on the modifications that the software allows
concurrent access to shared memory, that there is no real synchronization aside from data
stored in shared variables, and that the "test" and "set" for such variables are not
indivisible operations. Race conditions resulting from this implementation of
multitasking played an important part in the accidents.
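
To show why a divisible "test" and "set" is dangerous, the sketch below simulates two tasks that each test a shared flag and then set it. It is a deterministic illustration using Python generators to model a task switch between the two steps, not the actual Therac-25 code; the function `interleave`, the flag name `busy`, and the task names are hypothetical.

```python
def interleave(order, atomic=False):
    """Step two tasks in the given order; return the tasks that claimed the flag."""
    shared = {"busy": False}  # the shared variable both tasks test and set
    entered = []

    def claim(name):
        if atomic:
            # Indivisible test-and-set: no task switch between the two steps.
            if not shared["busy"]:
                shared["busy"] = True
                entered.append(name)
            yield
        else:
            free = not shared["busy"]  # test
            yield                      # another task may run at this point
            if free:
                shared["busy"] = True  # set, based on a possibly stale test
                entered.append(name)

    tasks = {"A": claim("A"), "B": claim("B")}
    for name in order:
        # Advance one step; the default suppresses StopIteration at the end.
        next(tasks[name], None)
    return entered
```

When each task runs to completion before the other starts (`["A", "A", "B", "B"]`), only A claims the flag. When the steps interleave (`["A", "B", "A", "B"]`), both tasks pass the test before either sets the flag, so both proceed. With `atomic=True` the same interleaving lets only A through.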