Chapter 5
Program Security
Many different types of programs may need to be secure programs. Some common types are:
Application programs used as viewers of remote data. Programs used as viewers (such as
word processors or file format viewers) are often asked to view data sent remotely by an
untrusted user (this request may be automatically invoked by a web browser). Clearly, the
untrusted user’s input should not be allowed to cause the application to run arbitrary
programs. It’s usually unwise to support initialization macros (run when the data is displayed);
if you must, then you must create a secure sandbox (a complex and error-prone task that
almost never succeeds, which is why you shouldn’t support macros in the first place). Be
careful of issues such as buffer overflow which might allow an untrusted user to force the
viewer to run an arbitrary program.
Application programs used by the administrator (root). Such programs shouldn’t trust
information that can be controlled by non-administrators.
Local servers (also called daemons).
Network-accessible servers (sometimes called network daemons).
Web-based applications (including CGI scripts). These are a special case of network-accessible
servers, but they’re so common they deserve their own category. Such programs are invoked
indirectly via a web server, which filters out some attacks but nevertheless leaves many
attacks that must be withstood.
Applets (i.e., programs downloaded to the client for automatic execution). This is something
Java is especially famous for, though other languages (such as Python) support mobile code
as well. There are several security viewpoints here; the implementer of the applet
infrastructure on the client side has to make sure that the only operations allowed are “safe”
ones, and the writer of an applet has to deal with the problem of hostile hosts (in other words,
you can't normally trust the client).
setuid/setgid programs. These programs are invoked by a local user and, when executed, are immediately granted the privileges of the program's owner and/or owner's group. In many ways these are the hardest programs to secure, because so many of their inputs are under the control of the untrusted user and some of those inputs are not obvious.
Buffer Overflows
A buffer overflow is the computing equivalent of trying to pour two liters of water into a one-
liter pitcher: Some water is going to spill out and make a mess. And in computing, what a mess
these errors have made.
A buffer (or array or string) is a space in which data can be held. A buffer resides in memory.
Because memory is finite, a buffer's capacity is finite. For this reason, in many programming
languages the programmer must declare the buffer's maximum size so that the compiler can
set aside that amount of space.
Let us look at an example to see how buffer overflows can happen. Suppose a C language
program contains the declaration:
char sample[10];
The compiler sets aside 10 bytes to store this buffer, one byte for each of the ten elements of
the array, sample[0] through sample[9]. Now we execute the statement:
sample[10] = 'A';
The subscript is out of bounds (that is, it does not fall between 0 and 9), so we have a problem.
The nicest outcome (from a security perspective) is for the compiler to detect the problem
and mark the error during compilation. However, if the statement were
sample[i] = 'A';
we could not identify the problem until i was set during execution to a too-big subscript. It
would be useful if, during execution, the system produced an error message warning of a
subscript out of bounds. Unfortunately, in some languages, buffer sizes do not have to be
predefined, so there is no way to detect an out-of-bounds error. More importantly, the code
needed to check each subscript against its potential maximum value takes time and space
during execution, and the resources are applied to catch a problem that occurs relatively
infrequently. Even if the compiler were careful in analyzing the buffer declaration and use,
this same problem can be caused with pointers, for which there is no reasonable way to define
a proper limit. Thus, some compilers do not generate the code to check for exceeding bounds.
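To make the risk concrete, here is a short C sketch (the function names are invented for illustration) showing how copying untrusted input into the ten-byte buffer declared above overflows it when no length check is made, and how an explicit check prevents the overflow:

#include <stdio.h>
#include <string.h>

/* Illustration only: copying untrusted input into a fixed-size buffer
   without checking its length first. */
void unsafe_copy(const char *untrusted)
{
    char sample[10];
    strcpy(sample, untrusted);    /* overflows sample if untrusted holds
                                     ten or more characters */
    printf("%s\n", sample);
}

/* A bounded alternative: refuse anything that will not fit. */
int safe_copy(const char *untrusted)
{
    char sample[10];
    if (strlen(untrusted) >= sizeof(sample))
        return -1;                /* reject oversized input */
    strcpy(sample, untrusted);
    printf("%s\n", sample);
    return 0;
}

The test in safe_copy is exactly the kind of bounds check discussed above: it costs a small amount of time on every call, but it closes the overflow.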
Where does the extra character go? In the first case, if it overflows into the user's data space, it simply overwrites an existing variable value (or it may be written into an as-yet unused location), perhaps affecting the program's result, but affecting no other program or data.
In the second case, the 'A' goes into the user's program area. If it overlays an already executed instruction (which will not be executed again), the user should perceive no effect. If it overlays an instruction that is not yet executed, the machine will try to execute an instruction with operation code 0x41, the internal code for the character 'A'. If there is no instruction with operation code 0x41, the system will halt on an illegal instruction exception. Otherwise, the machine will use subsequent bytes as if they were the rest of the instruction, with success or failure depending on the meaning of the contents. Again, only the user is likely to experience an effect.
Incomplete Mediation
http://www.somesite.com/subpage/userinput&parm1=(808)555-1212&parm2=2004Jan01
The two parameters look like a telephone number and a date. Probably the client's (user's)
web browser enters those two values in their specified format for easy processing on the
server's side. What would happen if parm2 were submitted as 1800Jan01? Or 1800Feb30? Or
2048Min32? Or 1Aardvark2Many?
Something would likely fail. As with buffer overflows, one possibility is that the system would
fail catastrophically, with a routine's failing on a data type error as it tried to handle a month
named "Min" or even a year (like 1800) which was out of range. Another possibility is that the
receiving program would continue to execute but would generate a very wrong result. (For
example, imagine the amount of interest due today on a billing error with a start date of 1 Jan
1800.) Then again, the processing server might have a default condition, deciding to treat
1Aardvark2Many as 3 July 1947. The possibilities are endless.
One way to address the potential problems is to try to anticipate them. For instance, the
programmer in the examples above may have written code to check for correctness on
the client's side (that is, the user's browser). The client program can search for and screen out
errors. Or, to prevent the use of nonsense data, the program can restrict choices only to valid
ones. For example, the program supplying the parameters might have solicited them by using
a drop-down box or choice list from which only the twelve conventional months would have
been possible choices. Similarly, the year could have been tested to ensure that the value was
between 1995 and 2005, and date numbers would have to have been appropriate for the
months in which they occur (no 30th of February, for example). Using these verification
techniques, the programmer may have felt well insulated from the possible problems a
careless or malicious user could cause.
However, the program is still vulnerable. By packing the result into the return URL, the
programmer left these data fields in a place accessible to (and changeable by) the user. In
particular, the user could edit the URL line, change any parameter values, and resend the line.
On the server side, there is no way to tell whether the resubmitted line came unaltered from the client's browser or was modified (or fabricated outright) by a malicious user.
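Because the parameters travel in the URL, the only check that counts is the one the server performs after the request arrives. The following C sketch (the function name, the exact field format, and the accepted year range are assumptions made for illustration) revalidates parm2 on the server side instead of trusting whatever the client sent:

#include <stdio.h>
#include <string.h>

/* Hypothetical server-side check for a parameter expected to look like
   "2004Jan01". Nothing the client sends is trusted until it passes. */
static const char *months[]  = { "Jan", "Feb", "Mar", "Apr", "May", "Jun",
                                 "Jul", "Aug", "Sep", "Oct", "Nov", "Dec" };
static const int   days_in[] = {  31,    28,    31,    30,    31,    30,
                                  31,    31,    30,    31,    30,    31 };

int valid_parm2(const char *p)
{
    int year, day, m;
    char mon[4];

    if (strlen(p) != 9)                        /* expect 4 + 3 + 2 characters */
        return 0;
    if (sscanf(p, "%4d%3s%2d", &year, mon, &day) != 3)
        return 0;
    if (year < 1995 || year > 2005)            /* assumed acceptable range    */
        return 0;
    for (m = 0; m < 12; m++)
        if (strcmp(mon, months[m]) == 0)
            break;
    if (m == 12)                               /* month name not recognized   */
        return 0;
    if (day < 1 || day > days_in[m])           /* no 30th of February         */
        return 0;                              /* (leap years ignored here)   */
    return 1;
}

An identical check running in the browser would not help, because the user can bypass the browser entirely and submit a handcrafted URL; only the copy of the check that runs on the server can be relied on.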
Time-of-Check to Time-of-Use Errors
Access control is a fundamental part of computer security; we want to make sure that only
those who should access an object are allowed that access. (We explore the access control
mechanisms in operating systems in greater detail in Chapter 4.) Every requested access must
be governed by an access policy stating who is allowed access to what; then the request must
be mediated by an access policy enforcement agent. But an incomplete mediation problem
occurs when access is not checked universally. The time-of-check to time-of-use (TOCTTOU)
flaw concerns mediation that is performed with a "bait and switch" in the middle. It is also
known as a serialization or synchronization flaw.
To understand the nature of this flaw, consider a person's buying a sculpture that costs $100.
The buyer removes five $20 bills from a wallet, carefully counts them in front of the seller,
and lays them on the table. Then the seller turns around to write a receipt. While the seller's
back is turned, the buyer takes back one $20 bill. When the seller turns around, the buyer
hands over the stack of bills, takes the receipt, and leaves with the sculpture. Between the
time when the security was checked (counting the bills) and the access (exchanging the
sculpture for the bills), a condition changed: what was checked is no longer valid when the
object (that is, the sculpture) is accessed.
A similar situation can occur with computing systems. Suppose a request to access a file were
presented as a data structure, with the name of the file and the mode of access presented in
the structure. An example of such a structure is shown in the figure.
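Since that figure is not reproduced here, a minimal C sketch of such a request structure (the field names and sizes are assumptions) can stand in for it:

/* Hypothetical layout of the access-request "work ticket": the caller
   fills in the file name and the desired kind of access, and the
   structure is then queued for the access control mediator to stamp. */
struct access_request {
    char filename[64];     /* e.g., "my_file"                            */
    int  mode;             /* requested access: read, write, delete      */
    int  authorized;       /* set by the mediator once access is granted */
};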
The data structure is essentially a "work ticket," requiring a stamp of authorization; once authorized, it will be put on a queue of things to be done. Normally the access control mediator would receive the data structure, check the request against the access policy, and either deny the access or approve it and pass the structure on to the file handler.
To carry out this authorization sequence, the access control mediator would have to look up
the file name (and the user identity and any other relevant parameters) in tables. The
mediator could compare the names in the table to the file name in the data structure to
determine whether access is appropriate. More likely, the mediator would copy the file name
into its own local storage area and compare from there. Comparing from the copy leaves the
data structure in the user's area, under the user's control.
It is at this point that the incomplete mediation flaw can be exploited. While the mediator is
checking access rights for the file my_file, the user could change the file name descriptor to
your_file, the value shown in the figure.
Having read the work ticket once, the mediator would not be expected to reread the ticket
before approving it; the mediator would approve the access and send the now-modified
descriptor to the file handler.
The problem is called a time-of-check to time-of-use flaw because it exploits the delay
between the two times. That is, between the time the access was checked and the time the
result of the check was used, a change occurred, invalidating the result of the check.
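On Unix-like systems the same flaw often appears as a separate permission check with access() followed later by open(). The sketch below (the function names are invented, and the repair shown is one common idiom rather than the only one) illustrates both the race and a safer form in which the open itself is the authoritative check:

#include <fcntl.h>
#include <unistd.h>

/* Racy version: between the access() check and the open(), the file the
   name refers to can be swapped, for example for a symbolic link to a
   file the check would never have approved. */
int open_racy(const char *path)
{
    if (access(path, R_OK) != 0)     /* time of check */
        return -1;
    return open(path, O_RDONLY);     /* time of use: path may have changed */
}

/* Safer idiom: open first, then make any further tests on the descriptor
   actually obtained, so there is no window between the check and the use. */
int open_checked(const char *path)
{
    int fd = open(path, O_RDONLY | O_NOFOLLOW);
    if (fd < 0)
        return -1;
    /* further checks, if needed, use fstat(fd, ...) on this descriptor,
       never a second lookup of the path */
    return fd;
}

The design point is the same as in the work-ticket example: once the request has been checked, nothing the user controls should be consulted again before the request is carried out.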
Types of Malicious Code
A virus is a program that can replicate itself and pass on malicious code to other, nonmalicious programs by modifying them. A virus can be either transient or resident. A transient virus has a life that depends on the life
of its host; the virus runs when its attached program executes and terminates when its
attached program ends. (During its execution, the transient virus may have spread its
infection to other programs.) A resident virus locates itself in memory; then it can remain
active or be activated as a stand-alone program, even after its attached program ends.
A Trojan horse is malicious code that, in addition to its primary effect, has a second, nonobvious malicious effect. As an example of a computer Trojan horse, consider a login script that solicits a user's identification and password, passes the identification information on to the rest of the system for login processing, but also retains a copy of the information for later, malicious use.
A logic bomb is a class of malicious code that "detonates" or goes off when a specified
condition occurs. A time bomb is a logic bomb whose trigger is a time or date.
A trapdoor or backdoor is a feature in a program by which someone can access the program
other than by the obvious, direct call, perhaps with special privileges. For instance, an
automated bank teller program might allow anyone entering the number 990099 on the
keypad to process the log of everyone's transactions at that machine. In this example, the
trapdoor could be intentional, for maintenance purposes, or it could be an illicit way for the
implementer to wipe out any record of a crime.
A worm is a program that spreads copies of itself through a network. The primary difference
between a worm and a virus is that a worm operates through networks, and a virus can spread
through any medium (but usually uses copied program or data files). Additionally, the worm
spreads copies of itself as a stand-alone program, whereas the virus spreads copies of itself
as a program that attaches to or embeds in other programs.
For example, consider the SETUP program that you initiate on your computer. It may call dozens or hundreds
of other programs, some on the distribution medium, some already residing on the computer, some in
memory. If any one of these programs contains a virus, the virus code could be activated. Let us see how.
Suppose the virus code were in a program on the distribution medium, such as a CD; when executed, the
virus could install itself on a permanent storage medium (typically, a hard disk), and also in any and all
executing programs in memory. Human intervention is necessary to start the process; a human being puts
the virus on the distribution medium, and perhaps another initiates the execution of the program to which the
virus is attached. (It is possible for execution to occur without human intervention, though, such as when
execution is triggered by a date or the passage of a certain amount of time.) After that, no human intervention
is needed; the virus can spread by itself.
A more common means of virus activation is as an attachment to an e-mail message. In this attack, the virus
writer tries to convince the victim (the recipient of an e-mail message) to open the attachment. Once the viral
attachment is opened, the activated virus can do its work. Some modern e-mail handlers, in a drive to "help"
the receiver (victim), will automatically open attachments as soon as the receiver opens the body of the e-
mail message. The virus can be executable code embedded in an executable attachment, but other types of attachments can be dangerous too; for example, a document attachment can carry macros that run as soon as the document is opened.
Appended Viruses
A program virus attaches itself to a program; then, whenever the program is run, the virus is activated. This
kind of attachment is usually easy to program.
In the simplest case, a virus inserts a copy of itself into the executable program file before the first
executable instruction. Then, all the virus instructions execute first; after the last virus instruction, control
flows naturally to what used to be the first program instruction. Such a situation is shown in the figure.
This kind of attachment is simple and usually effective. The virus writer does not need to know anything about
the program to which the virus will attach, and often the attached program simply serves as a carrier for the
virus. The virus performs its task and then transfers to the original program. Typically, the user is unaware of
the effect of the virus if the original program still does all that it used to. Most viruses attach in this manner.
In the extreme case, the virus can replace the entire target, either mimicking the effect of the target or
ignoring the expected effect of the target and performing only the virus effect. In this case,
the user is most likely to perceive the loss of the original program.
Currently, the most popular virus type is what we call the document virus, which is
implemented within a formatted document, such as a written document, a database, a slide
presentation, or a spreadsheet. These documents are highly structured files that contain both
data (words or numbers) and commands (such as formulas, formatting controls, links). The
commands are part of a rich programming language, including macros, variables and
procedures, file accesses, and even system calls. The writer of a document virus uses any of
the features of the programming language to perform malicious actions.
The ordinary user usually sees only the content of the document (its text or data), so the virus
writer simply includes the virus in the commands part of the document, as in the integrated
program virus.
Memory-Resident Viruses
Some parts of the operating system and most user programs execute, terminate, and
disappear, with their space in memory being available for anything executed later. For very
frequently used parts of the operating system and for a few specialized user programs, it
would take too long to reload the program each time it was needed. Such code remains in
memory and is called "resident" code. Examples of resident code are the routine that
interprets keys pressed on the keyboard, the code that handles error conditions that arise
during a program's execution, or a program that acts like an alarm clock, sounding a signal at
a time the user determines. Resident routines are sometimes called TSRs or "terminate and
stay resident" routines.
Virus writers also like to attach viruses to resident code because the resident code is activated
many times while the machine is running. Each time the resident code runs, the virus does
too. Once activated, the virus can look for and infect uninfected carriers. For example, after
activation, a boot sector virus might attach itself to a piece of resident code. Then, each time
the virus was activated it might check whether any removable disk in a disk drive was infected
and, if not, infect it. In this way the virus could spread its infection to all removable disks used
during the computing session.