Issues in Kernel Design
by
Abstract
Considerable activity recently has been devoted to the design and development of
operating system kernels, as part of efforts to provide much more reliably secure
systems than heretofore available. The resulting kernel architectures differ sub-
stantially from more traditional systems of similar function and, in particular, ap-
pear superior with respect to reliability, simplicity, and security.
1. Introduction
As operating systems became larger and more complex, interest increased in seg-
menting that software in a rational fashion. With respect to operating systems, a
number of efforts were made to develop a low level base that would provide basic sys-
tem functions in a highly reliable way. On top of this skeletal software base, ex-
tensive operating system functions and user supports would be built. Early efforts
in this direction, although significantly different in scale and goals, include IBM's
virtual machine system CP-67 [IBM 72], Brinch-Hansen's RC-4000 nucleus [Bri 73], and
CAL-TSS [Lam 76].
However, the greatest impetus to this activity grew recently out of the desire
for secure operating systems -- those that could assure that the access to data
stored within was controlled and protected in an uncircumventable way. Efforts have
been made by a number of groups to design and develop operating system bases that, in
addition to providing necessary primitive functions, were wholly responsible for the
security of the operating system, including whatever was built on top of that nu-
cleus. Part of the goal of these efforts typically was to minimize the size and com-
plexity of the resulting nucleus, in the well founded belief that to do so would
greatly increase the likelihood that the resulting software would be correctly imple-
mented.

*This research was supported by the Advanced Research Projects Agency of the Depart-
ment of Defense under Contract DAHC-73-C-0368.
It should be noted that these efforts have taken place in an environment where
reliable security has become of increasing concern. No general purpose operating sys-
tem of a more traditional design has successfully provided a satisfactory degree of
assurance for those who are seriously concerned about control over their data. Known
flaws are so numerous that it is generally recognized that a highly systematic approach is
required. Since it appears that kernel based architectures will make considerable
strides in improving that situation, an understanding of their characteristics is
useful.
There are several considerations that make a kernel based operating system ar-
chitecture significantly different from a more traditional architecture developed
without security as a major design criterion. Reliable security demands that the
software upon which security depends be as small and simple as practical. As a
result, functions included in operating system bases to enhance performance or in-
crease the convenience of writing software above that base are not needed for secu-
rity purposes and consequently not included in a kernel. Kernel designs would typi-
cally exclude that software despite potentially increased performance overhead or
greater non-kernel software complexity. Naturally, the architecture is structured
to avoid these costs as much as possible, and it appears that such penalties can gen-
erally be minimized.
Conversely however, the kernel must contain all security related software if
correct enforcement is not to depend on non-kernel software. Therefore, as a
countering influence, functions are forced into the kernel that might otherwise have
been omitted from an operating system base and implemented at an outer level. The body of this
paper characterizes the architectural principles that lead to these changes. Examples
of the relocation of system functions in an architecture are also given.
Kernel architectures have been directed primarily at the goal of reliable "data
security": assuring that it is possible for users only to access data to which they
are specifically entitled. Issues such as confinement or denial of service have gen-
erally not been directly addressed, although care usually has been taken to minimize
such problems. Clearly, the more sophisticated the security policy that it is
desired to enforce, the more kernel mechanism will likely be needed.
Next, what is the object grain: the size of the objects protected? Are
processes the active objects, or can a procedure call within a process coincide with
a domain change, as in Hydra [Wul 75]? Perhaps, in a system that supports a family
tree of processes for a given user, the entire family is the active object, with no
distinction made among processes. That is, all processes in the tree might be re-
quired to have the same access rights. This design was used in the TENEX operating
system.
What are the rules governing the alteration of system data that records the
security policy? That is, what is the policy grain? Is control over objects and
groups of objects hierarchically distributed, as in Multics [Sal 74], or is it strictly
centralized, as in the IBM operating systems? What is the precision with which one
can specify access control decisions? Can a file be marked with the exact list of
users to be given access, or can one only set the access mode as either public or
private, to illustrate two extremes?
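To make the two extremes concrete, the sketch below (in C) contrasts an exact
per-user access list with a single public/private bit. The structures and names are
illustrative assumptions, not drawn from any of the systems discussed.

    /* Hypothetical sketch: two extremes of access specification. */
    #include <string.h>

    #define MAX_ACL 16

    struct acl_entry { char user[8]; int mode; };    /* mode bits: R=1, W=2 */

    struct file_desc {
        int is_public;                 /* coarse grain: one bit of policy  */
        char owner[8];
        int n_acl;
        struct acl_entry acl[MAX_ACL]; /* fine grain: explicit user rights */
    };

    /* Fine grain: access allowed only if the user appears on the list. */
    int acl_allows(const struct file_desc *f, const char *user, int mode)
    {
        int i;
        for (i = 0; i < f->n_acl; i++)
            if (strncmp(f->acl[i].user, user, 8) == 0)
                return (f->acl[i].mode & mode) == mode;
        return 0;
    }

    /* Coarse grain: everyone or nobody, owner excepted. */
    int coarse_allows(const struct file_desc *f, const char *user)
    {
        return f->is_public || strncmp(f->owner, user, 8) == 0;
    }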
Apart from the question of the security policy supported by a kernel, a serious
impact on the actual kernel design and implementation results from the functions and
services to be supported by that kernel. Is it possible, for instance, to create and
destroy objects of all types supported by the kernel, or do certain cases, such as
mini-disks, have their allocation fixed at the time the kernel is assembled? To what
degree is the number of objects of each type supported and protected by the kernel
fully variable? How rich is the set of operators provided to access the kernel pro-
tected objects? Are sophisticated message synchronization primitives built, or is a
simpler, less efficient facility provided, out of which user software must simulate
the desired features?
The hardware base on which a kernel is built can significantly impact the
resulting kernel, including its size, complexity, overall architecture, and details
of implementation. The richness of the hardware base that must be supported will have
considerable effect. A multiple processor system, in which more than one cpu exe-
cutes kernel functions, may require support for processor coordination within the
kernel that single processor systems do not. Certain types of I/O channels may need
detailed cpu software support.
I/O is clearly one area where hardware characteristics considerably affect the
security kernel, but it is not the only one. As a simple example, the time of day
clock on many machines is located in such a way that only privileged software can read
it, despite the fact that the time of day is not relevant to most definitions of data
security. On the PDP-11, both the word size and cpu arithmetic are organized on a 16
bit basis. Absolute addresses are larger, however. I/O device registers (each 16
bits) have the additional address bits located in additional registers, with little
commonality among devices as to where those bits are to be found. Sometimes carry
from the 16th to 17th bit is implemented, sometimes not. As a result, special
software for each device is typically needed to check address bounds, and additional
kernel code is required that could have been eliminated with judicious hardware planning.
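The following sketch illustrates the kind of per-device kernel code just described.
The register layout (two extension bits in an assumed position of a control
register) is hypothetical, chosen only to show the address reassembly and the
software carry check; it is not taken from any particular device manual.

    typedef unsigned long addr18;          /* an 18-bit absolute address */

    /* Reassemble the full transfer address from a 16-bit buffer address
     * register and extension bits held in a device-specific place. */
    addr18 dev_xfer_addr(unsigned short bar, unsigned short csr)
    {
        addr18 ext = (addr18)((csr >> 4) & 03);  /* assumed extension field */
        return (ext << 16) | (addr18)bar;
    }

    /* The kernel must verify that the whole transfer, including any carry
     * out of the low 16 bits, lies inside the memory granted to the caller. */
    int xfer_in_bounds(addr18 start, unsigned long count, addr18 lo, addr18 hi)
    {
        addr18 end = start + count;        /* carry handled here in software */
        return start >= lo && end <= hi && end >= start;
    }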
There are often a number of such hardware characteristics that considerably com-
plicate the task of building a secure kernel, and that could be remedied, greatly
diminishing those difficulties, without significant hardware changes. The previous
examples were largely of this class. More dramatic changes to the hardware base could of course greatly
contribute to reliable system security. Capability based architectures have consid-
erable promise in that respect. Kernel designs such as those being pursued at UCLA
particularly lend themselves to implementation in firmware. The attractive feature
about many of these points is that their adoption should not necessarily imply signi-
ficantly increased fabrication costs, and both the simplicity and reliability of
software would be enhanced.
2.4 Performance
The last significant issues which affect the size and complexity of the kernel
are performance constraints. Stringent performance requirements, as expected, make it
difficult to develop a straightforward design and implementation. For example, while
it may be possible for scheduling decisions to be made through a separate user pro-
cess which is only interrogated by the kernel, the additional process switch overhead
to invoke the scheduler for each expected user process switch may impose unacceptable
delays. Expensive domain changing usually exerts pressures that lead to larger, more
complex kernels, since one is tempted to integrate such functions as scheduling into
kernel code.
The costly effects of this phenomenon are well illustrated by the Multics
experience. [Sal 76] On its original GE 645 hardware base, ring crossings (domain
changes) were quite expensive, since the existence of rings was simulated in
software by a complete switch of segment tables and status at each ring crossing. As
a result, performance pressures led the highly privileged ring zero to grow to
approximately 80,000 lines of PL/I code, containing a number of security errors. Subse-
quent research has shown that a majority of that code can be removed. [Sch 75]
These issues of security policy, system function, hardware impact, and perfor-
mance requirements all affect the size and complexity of a security kernel which can
be built. Within any given set of such constraints, however, certain designs are far
superior to others. The next section of this paper discusses kernel design princi-
ples that generally seem to apply to most of the likely sets of constraints.
Here we first make several general observations regarding kernel designs and
illustrate specific tradeoffs with examples from existing systems. Most of these ob-
servations and illustrations are concerned with the task of developing as small and
simple a kernel as is practical, under the design constraints discussed earlier.
One of the first questions concerns the relationship of the kernel to the rest
of the operating system. There are a number of possibilities. In both the Multics
system and a Unix prototype under development at the Mitre Corporation, the kernel is
essentially part of the user process, even sharing the process address space in the
Multics case. Kernel code and data is protected from user process software by a
variety of mechanisms. Kernel tables are shared among all processes. Interrupts may
occur at nearly any time, even while running kernel code. Typically, at that point
the current process can be suspended and a process switch performed. Therefore ker-
nel data and tables must be designed for parallel access and modification.
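One common discipline for such shared kernel tables is sketched below. The
spl-style primitives follow the general Unix kernel convention and are assumed
here; the table itself is illustrative only.

    extern int spl7(void);        /* raise processor priority, return old */
    extern void splx(int s);      /* restore saved processor priority     */

    struct proc_entry { int pid; int state; };
    struct proc_entry proc_table[64];      /* shared among all processes */

    void set_proc_state(int slot, int state)
    {
        int s = spl7();           /* no interrupt, hence no switch, here */
        proc_table[slot].state = state;
        splx(s);
    }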
One of the most effective ways to reduce the size and complexity of an operating
system kernel is to remove, as much as possible, all of the resource management func-
tions for a particular type of resource and relegate those functions to untrusted
code. The degree to which this goal can be attained, of course, will vary from sys-
tem to system, and from one object type to another within a given system.
The most extreme action possible is to remove the entire resource as well as all
of its supporting software from the base level of the system. As a result, the ker-
nel does not provide protection of these resources as such. Any structure which is
needed to insure the type integrity of the resource is the responsibility of the user
software. The only protection enforced by the kernel is that provided for the kernel
recognized object in which the user implemented resources are placed: segments,
pages, disk blocks and the like.
What this strategy generally means is that a single common pool of resources,
usually maintained for all processes by the operating system, has now been broken up
into a number of smaller subpools, each managed independently. This approach is il-
lustrated by the I/O buffer management in OS/360. Buffer integrity is not assured,
since whatever pointer the user supplies in certain SVCs is used by the operating
system so long as the indicated transfer involves that user's core.
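The subpool idea can be made concrete with a small sketch: a user-level buffer
pool carved out of a single kernel-protected segment. The kernel protects the
segment as a whole; type integrity of the buffers within it is entirely the user
code's concern. All names here are illustrative assumptions.

    #include <stddef.h>

    #define POOL_SIZE 8192
    #define BUF_SIZE  512
    #define N_BUFS    (POOL_SIZE / BUF_SIZE)

    static char pool[POOL_SIZE];      /* the kernel-recognized object */
    static char in_use[N_BUFS];       /* user-maintained bookkeeping  */

    char *buf_get(void)
    {
        int i;
        for (i = 0; i < N_BUFS; i++)
            if (!in_use[i]) { in_use[i] = 1; return pool + i * BUF_SIZE; }
        return NULL;                  /* this subpool is exhausted */
    }

    void buf_free(char *p)
    {
        ptrdiff_t i = (p - pool) / BUF_SIZE;
        if (i >= 0 && i < N_BUFS)
            in_use[i] = 0;
    }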
There are many resources that an operating system typically manages that are not
so obvious as I/O buffers. Name spaces are a good example. There are a number of
cases where unique names are needed for convenience in communication, like socket
numbers in the ARPA net, which are usually assigned and reclaimed by the operating
system. However, they could be preallocated to various domains, with each domain ex-
pected to do its own management. A kernel, using lower level names (see 3.2.3), can
assure that messages are properly delivered to the appropriate domains in the ARPA
net example, but need not be concerned with management of sockets [Gai 76].
A good example of the application of this approach occurs in the UCLA kernel ar-
chitecture. [Pop 74a] There, process scheduling as well as the management of main
memory is the sole responsibility of one particular user level process, containing
untrusted code. It issues kernel calls to switch processes, initiate page swaps, and
so forth. While this scheduling process therefore can move around the objects that
it has the responsibility to manage, it cannot examine any of their contents. This
method supports usual functionality requirements and permits sophisticated schedul-
ing, while it considerably simplifies the kernel. However, certain confinement
problems are not solved, as discussed later.
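The flavor of such an interface might look as follows. This is a hypothetical
sketch, not the actual UCLA kernel call set: the scheduling process names objects
and directs their movement, but the kernel validates each call and never exposes
the objects' contents to the scheduler.

    typedef int proc_id;
    typedef int page_id;
    typedef int frame_id;

    extern int kernel_swap_in(proc_id p, page_id pg, frame_id f);
    extern int kernel_swap_out(proc_id p, page_id pg);
    extern int kernel_run(proc_id p);    /* switch the cpu to process p */

    /* One turn of a deliberately naive round-robin policy, executed
     * entirely in untrusted code. */
    proc_id schedule_next(proc_id *ready, int n, int last)
    {
        proc_id next = ready[(last + 1) % n];
        kernel_run(next);     /* the kernel checks the call's validity */
        return next;
    }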
3.2.3 Naming
The more name management that can be removed from the kernel, the
simpler the resulting kernel software. There are several ways to do so. One can
consider supporting in the kernel only the lowest level naming. The rest, including
mapping software, could be built as part of user domains. This approach generally
amounts to partitioning the higher level name spaces, with separate partitions
managed by separate processes. However, such a design requires care if controlled
sharing is supported by the architecture. User software needs to know the names
maintained by other user software in order to coordinate references to shared ob-
jects. The coordination can be accomplished if each name maintenance package fol-
lows the same conventions, and messages are used to exchange relevant information.
One should note that it is possible to move only a portion of the name support
out of the kernel, too. One might partition the name space, allocating each parti-
tion to a different user process, and then have the kernel merely enforce that parti-
tion. Each user would do his own name management within a given partition. This
view is similar to that of type integrity, discussed earlier, with the resource being
the names, and the operations implemented in the kernel which operate on names being
very limited.
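A minimal sketch of this arrangement follows, assuming (purely for illustration) a
flat name space divided into fixed-size slices, one per domain. The partition check
is the only name-related code left in the kernel; what a name means is entirely the
owning domain's business.

    #define SLICE 1024            /* names per domain; an assumption */

    /* The only name check the kernel performs. */
    int name_owned_by(int name, int domain)
    {
        return name / SLICE == domain;
    }

    /* Every kernel operation on a name starts with the partition check. */
    int kernel_name_op(int caller_domain, int name)
    {
        if (!name_owned_by(name, caller_domain))
            return -1;            /* access denied */
        /* ... perform the operation on the underlying object ... */
        return 0;
    }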
Up until this point we have maintained the fiction that the appropriate system
architecture for highly reliable security placed all security relevant code in the
core of the operating system, the kernel, running on the bare hardware. The rest of
the system software is encapsulated within user processes, running under the control
of the kernel, and making calls to it. In practice, however, some security relevant
code can usefully reside in trusted processes outside the kernel. These processes
architecturally are the same as user processes, except that the information they pass
to the kernel is used by the kernel to take security relevant actions.
A good example of such a trusted process appears both in the Multics system at
M.I.T. as well as in the UCLA Secure Unix development. One process has responsibili-
ty for handling free terminals, logging and authenticating users, and starting up
the appropriate software for them. The Multics logger and the UCLA initiator both
run as standard processes, except that the Multics ring zero software and the UCLA
kernel both accept the authentication decision made by that process, and make securi-
ty relevant decisions using that information.
One sometimes finds that the software composing these trusted processes can be
segregated into security relevant and non-security relevant parts. That is, the
trusted processes can often themselves be structured so as to be composed of a kernel
and other, untrusted software.
A more powerful example of the utility of a second level kernel concerns name
mapping and file systems. There, conditions occur under which the user needs to
refer to system supported objects by names that are from a different name space than
that supported by the kernel. For example, as already mentioned, the kernel may
maintain segment numbers, while the user will invariably wish to refer to those seg-
ments by character string names, especially when setting access control information,
but perhaps also when stating which segment to read or modify. Because of these
usage patterns, mapping between name spaces may be highly security relevant. File
systems provide an important case of this problem. In the UCLA Unix system, file
management is done by a user level process. [Kam 76] The kernel of that file manager
is responsible for elementary directory management, and other untrusted software in
the process can take care of positioning pointers within files, scheduling I/O opera-
tions and maintaining resource allocation limits. The directory management consists
largely of mapping string names to disk block names supported by the kernel, handling
allocation matters, and checking and maintaining protection data.
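The security relevant core of such a directory manager might be sketched as follows.
The structures are illustrative, loosely patterned on early Unix directories rather
than taken from the UCLA file manager itself; the owner-bypass rule is likewise an
assumed convention.

    #include <string.h>

    struct dir_entry {
        char name[14];            /* component name (early Unix style) */
        int  block;               /* kernel-recognized disk block name */
        int  owner;
        int  mode;                /* protection bits                   */
    };

    /* Map a string name to a block name; returns -1 if the entry is
     * absent or the protection check fails. */
    int dir_lookup(struct dir_entry *dir, int n,
                   const char *name, int uid, int want)
    {
        int i;
        for (i = 0; i < n; i++) {
            if (strncmp(dir[i].name, name, 14) != 0)
                continue;
            if (dir[i].owner != uid && (dir[i].mode & want) != want)
                return -1;        /* protection check failed */
            return dir[i].block;
        }
        return -1;
    }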
This multiple level kernel architecture can yield significant advantages in the
design and implementation of the base level kernel. If applied in conjunction with
other design principles discussed above, the base level kernel can become little more
than an implementor of abstract types. It makes segments, processes, messages, and
so forth out of hardware memory, status registers, and the like. For some systems,
this simplification of the base level kernel may permit further improvement. It may
be possible to implement each kernel call (including the interrupt handler, which is
just a hardware forced call) as uninterruptible code, without losing any relevant
performance characteristics. This design virtually eliminates any concern for paral-
lelism in the security relevant code, and therefore enhances its reliability. If the
kernel is simple enough, it is a candidate for implementation in microcode. A packet
switch communications processor that hosts no user programming is an example of an
application that might only require a kernel simple enough for such an implementa-
tion.
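The discipline is easily stated in code. In the sketch below, disable() and
enable() stand for the machine's interrupt control; every kernel entry, the
interrupt handler included, runs to completion before any other kernel activity can
begin, so kernel data needs no locks. The names are assumptions for illustration.

    extern void disable(void);    /* mask interrupts   */
    extern void enable(void);     /* unmask interrupts */

    typedef int (*kcall_fn)(int a, int b);

    int kernel_entry(kcall_fn fn, int a, int b)
    {
        int r;
        disable();                /* from here on, strictly sequential */
        r = fn(a, b);             /* the call body needs no locking    */
        enable();
        return r;                 /* pending interrupts are taken now  */
    }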
A final note on levels of kernels concerns the updating of the protection data
used by trusted software to make access control decisions. If a user at a terminal
is to have confidence in the security enforced by a system, he must be able to view
and change the system's protection data in a highly reliable way. He needs a secure
channel between his terminal and kernel code. One easy method to implement that
channel is to provide a simple facility by which the user can switch his terminal to
a trusted process and back. In this way he need not be concerned with any software
running in his own process(es). While this solution, unless extended, suffers from
the inability to permit program generated changes in protection data, such actions do
not seem to occur often in practice anyway, except as a way of compensating for some
other lack in functionality or flexibility of the protection controls.
Much of the discussion up to this point has concerned the architectural place of
a kernel in the larger system architecture, and the effect of kernel goals on that
structure. Naturally, the internal structure of a kernel is also important. In gen-
eral, the task of constructing a reliable kernel is little different from that of
developing other highly reliable software. However, there are considerations pecu-
liar to this application which deserve mention, including hardware selection, paral-
lelism, and the effect of finite resource limits on attempts to use abstract type
methods for organizing kernel code.
5.2 Parallelism
While there may be some argument over whether algorithms that are expressed as
parallel computations or strictly sequential code are inherently easier to understand
(to some degree it certainly depends on the algorithm), it seems rather clear that
certification or verification of sequential code is presently less difficult. That
is the reason for specifying physical or logical uninterruptibility in the software
architecture. This goal is, of course, one that has been followed at higher levels
in systems since the early days of multiprogramming.
One example of such considerations arises from the interaction of process management
and virtual memory. Since the virtual memory software is itself a sizable program,
it is attractive to run this virtual memory software as a process like other processes. Further, the
process management software can benefit from being written with the assumption that a
large, virtual address space is available. With the obvious circular dependency in
mind, it is not clear whether process types or virtual memory types should be the
lower level, with the other built on top. Reed [Ree 76] outlines a method of defin-
ing multiple levels of types to break the cycle. Lower levels implement a small
fixed number or amount of the resource, using small amounts of resources in so doing.
Higher levels reimplement the resource in large numbers or amounts, by multiplexing
the lower level ones. In medium sized systems such as UCLA Unix, such a multiple
tiered structure is not necessary.
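Reed's layering can be suggested by a small sketch: a fixed, statically allocated
supply of level-zero processes built from real memory only, with a large supply of
level-one processes multiplexed on top of them. The structures, and the reservation
of one level-zero slot for the virtual memory manager, are assumptions for
illustration, not Reed's actual design.

    #define N_LEVEL0   4      /* small fixed supply, no paging needed */
    #define N_LEVEL1 256      /* large supply, built by multiplexing  */

    struct l0_proc { int state; char stack[512]; };  /* real memory only */
    struct l1_proc { int state; int vm_handle; int l0_slot; };

    static struct l0_proc level0[N_LEVEL0];  /* slot 0 runs the VM manager */
    static struct l1_proc level1[N_LEVEL1];

    /* Level 1 binds a pageable process to a level-0 carrier to run it. */
    int l1_dispatch(int i, int l0_slot)
    {
        if (l0_slot <= 0 || l0_slot >= N_LEVEL0)
            return -1;        /* slot 0 is reserved; others must exist */
        level1[i].l0_slot = l0_slot;
        level0[l0_slot].state = 1;   /* running */
        return 0;
    }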
6. Confinement
6.1 Importance
Some have argued that confinement is not a serious issue in practical
applications. While this viewpoint is clearly reasonable when taken in the context
of the current absence of reliable data security, several experiments with the Mul-
tics system have illustrated how real the problem may be. It was found possible in
principle to drive a user terminal through inter-process communication performed by
modulation of paging behavior in one process and the sensing of that modulation by
the other. In a separate experiment, a similar bandwidth was discovered through use
of the directory system. One process repeatedly made and deleted entries in a direc-
tory. This action changed the size of that directory, a value recorded in its
parent directory. A second process read the size of that parent directory. In this
particular test, the first process was not supposed to be able to communicate to the
second.[Sal 76]
These two examples are instructive, for they illustrate a timing dependent and
timing independent channel, respectively, both involving resource management. The
bandwidth is enough to be of concern both to the military (one job might be a "trojan
horse" running with top secret classification, and the other running unclassified)
as well as to industry (consider disgruntled programmers).
From a practical point of view, virtually all the channels mentioned by Lampson
[Lam 75] or illustrated above are related to the allocation or management of
resources, and the changes in system data that occur as a result of resource alloca-
tion decisions.
6.2 Storage and Timing Channels
Millen et al. [Mil 75] partition these channels into storage channels and tim-
ing channels. Storage channels are those which permit the passing of information
through direct modification and reading of cells in the computer memory. Timing chan-
nels are those which permit one user to vary the real time rate at which another re-
ceives service, to a degree that the second user can sense. As the examples earlier
show, the bandwidth of both types of channels can be significant.
It appears, however, that all resource storage channels can be changed into tim-
ing channels, and, on a large system, the bandwidth of the timing channels can be
severely limited. Further, it appears that these actions can be taken without losing
the values of multiprogramming, which is the primary cause of the problem. Resource
storage channels can in general be changed to timing dependent channels in the fol-
lowing way. First, it appears that all direct reads and writes of system storage,
such as that illustrated by the Multics file system problem, can be blocked by proper
definitions of domains and the association of status data in the appropriate domains.
What remains are resource allocation requests and status notifications. The use of
resource availability as a channel can then be prevented merely by blocking the
process requesting the resource until it is available. To be practical, this re-
placement of resource status replies by delays should be accompanied by the use of
some form of a deadlock detection or prevention algorithm to insure that the system
operates efficiently. That code need not necessarily be implemented in a security
relevant way, as shown by the discussion of scheduling below.
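A sketch of the conversion follows. The allocator never returns an availability
status (which would be a storage channel); it simply delays the caller. The
sleep/wakeup primitives follow the classic Unix kernel convention and are assumed
here; deadlock handling, as noted above, is omitted.

    extern void ksleep(void *chan);   /* block caller on a channel   */
    extern void kwakeup(void *chan);  /* unblock sleepers on channel */

    static int free_frames = 64;

    int frame_alloc(void)
    {
        while (free_frames == 0)
            ksleep(&free_frames);     /* a delay, not a status reply */
        return --free_frames;
    }

    void frame_release(void)
    {
        free_frames++;
        kwakeup(&free_frames);        /* deadlock handling omitted   */
    }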
Some believe that the bandwidth of timing channels can be limited by the
scheduling characteristics that can be introduced in a large system. However, the
existence of a timing channel is in principle more difficult to identify than a
storage channel, since in the latter case there are explicit cells which are being
accessed, even if interpretively through system code. While one might argue that if
a process has only virtual time available to it, the timing channels are blocked,
this viewpoint is probably not practical, given how many real time devices (there-
fore, clocks) actually exist on real systems. Nevertheless, once the bandwidth is
limited to an amount comparable to that expected external to the computer system, the
problem has in effect been solved.
However, such restrictions do not prevent collusion among user processes from
providing timing independent channels. That is, a group of cooperating, legally com-
municating processes may be able to communicate with another such group with which
communication is not permitted, even in the face of a scheduler designed to be an
information sink. Whether or not this is possible depends on scheduler characteris-
tics. Whenever it is possible for one process, by its own resource utilization
actions, to affect the relative order in which members of other groups of processes
are served, communication is possible.
That is, let processes A and B be one group, X and Y the other. Inter-group
communication is not to be permitted. (They may be Ford and G.M. programs). Suppose
that A (or B) increases its cpu to I/O ratio so much that the system's adaptive
scheduler stops favoring cpu bound processes and begins favoring I/O bound jobs. X
is cpu bound, Y is I/O bound. The order in which X and Y are run was changed by A or
B. X and Y can easily determine the relative order in which they are run since they
can legally communicate. Thus the A,B group has sent one bit to the X,Y group.
Clearly this mechanism serves as a basis for extended bidirectional communication.
Many of the interesting scheduling algorithms that are used in practice have the
characteristic outlined above, and in fact those that do not seem to ignore precisely
the kind of information concerning user behavior upon which meaningful optimization
is based. However, not all useful algorithms permit collusion. Round robin does
not. The simple, limited algorithm implemented in the new Mitre Unix kernel, in
which each process can only set its own priority, and the kernel merely runs that
user process with the highest priority, also blocks collusion.
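That rule is simple enough to sketch directly; the code below is an illustration of
the stated policy, not the Mitre implementation. Since a process can affect only its
own priority, its resource usage cannot reorder how other processes are served.

    #define NPROC 32

    static int prio[NPROC];
    static int runnable[NPROC];

    int set_priority(int caller, int target, int p)
    {
        if (caller != target)
            return -1;        /* the whole policy: self only */
        prio[target] = p;
        return 0;
    }

    /* The kernel merely runs the runnable process of highest priority. */
    int pick_next(void)
    {
        int i, best = -1;
        for (i = 0; i < NPROC; i++)
            if (runnable[i] && (best < 0 || prio[i] > prio[best]))
                best = i;
        return best;
    }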
7. Conclusion
We have attempted to distill the knowledge gained from various security kernel
research efforts into a form that will be useful in guiding others who wish to
develop highly secure, reliable system bases. It is also hoped that these perspec-
tives are applicable to other environments than operating systems, especially such
higher level software projects as message systems and data management, where privacy
considerations are especially sensitive.
Acknowledgements
This paper would not have been written had it not been for Clark Weissman, who
often asked one of the authors how one designed a kernel. We also wish to thank
Evelyn Walton, who largely built the UCLA kernel, and all the members of the UCLA
Security Group who participated in discussions that helped form and refine these
ideas.
Bibliography
Belady, L. and C. Weissman, "Experiments with Secure Resource Sharing for Virtual
    Machines", Proceedings of IRIA International Workshop on Protection in Operating
    Systems, Rocquencourt, France, August 13-14, 1974, pp 27-34.
Brinch Hansen, P., Operating System Principles, Prentice Hall, 1973, 366 pp.
Gaines, R. S. and C. Sunshine, "A Secure NCP for Kernel Based Systems", RAND Inter-
    nal memo, 1976.
Janson, P. A., "Removing the Dynamic Linker from the Security Kernel of a Computing
    Utility", MIT Masters Thesis, June 1974, MAC TR-132, 128 pp.
Kampe, M., C. Kline, G. Popek, E. Walton, "The UCLA Data Secure Unix Operating Sys-
tem", UCLA Technical Report, 9/76.
Lampson, B., "A Note on the Confinement Problem", Communications of the ACM, Vol. 16,
    No. 10, October 1973, pp 613-615.
Popek, G. and C. Kline "A Verifiable Protection System", Proceedings of the Interna-
tional Conference on Reliable Software, May 1975, Los Angeles, California.
Popek, G. and C. Kline, "The UCLA Secure Unix Design", Internal memo, unpublished.
Ritchie, D. and K. Thompson, "The Unix Timesharing System", Communications of the ACM,
Vol. 17, No. 7, July 1974, pp 365-375.
Robinson, et al., "On Attaining Reliable Software for a Secure Operating System",
1975 International Conference on Reliable Software, April 21-23, 1975, Los
Angeles, California.
Wulf, W., et al., "HYDRA: The Kernel of a Multiprocessor Operating System", Communi-
    cations of the ACM, Vol. 17, No. 6, June 1974, pp 337-345.