Unix DB Notes
Memory Protection:
To secure virtual address translation, it is important to ensure that processes cannot tamper
with the translation mechanism itself. To this end, processors have to provide some protection
primitives. Typically, this is done using the notion of privileged execution modes.
Specifically, 2 modes of CPU execution are introduced: privileged and unprivileged. (Processors
may support multiple levels of privileges, but today's OSes use only two levels.) Certain
instructions, such as those relating to I/O, DMA, interrupt processing, and page table translation
are permitted only in the privileged mode.
OSes rely on the protection mechanism provided by the processor as follows. All user processes
(including root-processes) execute in unprivileged mode, while the OS kernel executes in
privileged mode. Obviously, user-level processes need to access OS kernel functionality from
time to time. Typically, this is done using system calls, which represent a call from
unprivileged code to privileged code. Uncontrolled calls across the privilege boundary can defeat
security mechanisms, e.g., it should not be possible for arbitrary user code to call a kernel
function that changes the page tables. For this reason, privilege transitions need to be carefully
controlled. Usually, “software trap” instructions are used to effect transition from low to high
privilege mode. (Naturally, no protection is needed for transitioning from privileged to
unprivileged mode.) On x86 Linux, software interrupt 0x80 has traditionally been used for this
purpose (modern x86-64 systems use the dedicated syscall instruction instead). When this
instruction is invoked, the processor starts executing the interrupt handler code for this interrupt
in the privileged mode. (Note that changes to the interrupt handlers should themselves be permitted
only in the privileged mode, or else this mechanism could be subverted.) This code should perform
appropriate checks to ensure that the call is legitimate, and then carry it out. This basically means
that the parameters to system calls have to be thoroughly checked.
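As a sketch of this user-to-kernel transition (Python via ctypes; the syscall numbers below are architecture-specific assumptions for x86-64 and aarch64 Linux), a process can invoke the getpid system call directly through libc's syscall() wrapper, which executes the trap instruction:

```python
import ctypes
import os
import platform

# getpid's syscall number differs per architecture (an assumption for
# the two common Linux targets; other machines will differ).
SYS_getpid = {"x86_64": 39, "aarch64": 172}.get(platform.machine(), 39)

# libc's syscall() wrapper executes the trap (int 0x80 / syscall)
# that switches the CPU into privileged mode inside the kernel.
libc = ctypes.CDLL(None, use_errno=True)
pid = libc.syscall(SYS_getpid)

# The high-level os.getpid() ends up at the same kernel entry point,
# where the kernel validates the request before acting on it.
print(pid == os.getpid())
```

Both paths cross the same controlled privilege boundary; the kernel, not the caller, decides what the privileged code does with the request.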
2. Group ID:
A group identifier (GID) is used to represent a specific group. Since a single user can belong to
multiple groups, a single process can have multiple group ids. These are organized as a
“primary” group id, and a list of supplementary gids. The primary gid has 3 flavors (real,
effective and saved), analogous to uids. All objects created by a process will have the
effective gid of the process. Supplementary gids are used only for permission checking.
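A process can inspect its own gids directly (a minimal Python sketch using the standard os module):

```python
import os

# Real, effective, and saved primary gids of the calling process.
rgid, egid, sgid = os.getresgid()

# Supplementary gids, consulted only during permission checks.
supp = os.getgroups()

print("primary gid (real/effective/saved):", rgid, egid, sgid)
print("supplementary gids:", supp)
```
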
3. Group Passwords:
If a user is not listed as belonging to a group G, and there is a password for G, this user
can change her group by providing this group password.
Inter-process Communication:
A process can influence the behavior of another process by communicating with it. From a
security point of view, this is not an issue if the two processes belong to the same user. (Any
damage that can be effected by the second process can be effected by the first process as well, so
there is no incentive for the first process to attack the second --- this is true on standard UNIX
systems, where application-specific access control policies (say, DTE) aren't used.) If not, we
need to be careful. We need to pay particular attention to situations where an unprivileged
process communicates with a privileged process in ways that the privileged process did not
expect.
2. Signals are a mechanism for the OS to notify user-level processes about exceptions, e.g.,
invalid memory access. Their semantics are similar to those of interrupts --- processes typically
install a “signal handler,” which can be different for different signals. (UNIX defines about 30
such signals.) When a signal occurs, process execution is interrupted, and control is transferred to
the handler for that signal. Once the handler finishes execution, the execution of application code
resumes at the point where it was interrupted.
Signals can also be used for communication: one process can send a signal to another process
using the “kill” system call. Due to security considerations, this is permitted only when the
sending process runs with userid zero (root), or its userid matches that of the receiving process.
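A minimal sketch in Python: install a handler for SIGUSR1 and deliver the signal to the process itself (the permission check passes trivially, since sender and receiver here have the same userid):

```python
import os
import signal

received = []

def handler(signum, frame):
    # Runs when the signal is delivered; normal execution resumes after.
    received.append(signum)

# Install a handler for SIGUSR1, a signal reserved for applications.
signal.signal(signal.SIGUSR1, handler)

# Send the signal to ourselves via kill(); execution is interrupted,
# the handler runs, and control returns here.
os.kill(os.getpid(), signal.SIGUSR1)

print(received)
```
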
3. Debugging and Tracing
OSes need to provide some mechanisms for debugging. On Linux, this takes the form of
the ptrace mechanism. It allows a debugger to read or write arbitrary locations in the
memory of a debugged process. It can also read or set the values of registers used by the
debugged process. The interface allows code regions to be written as well --- typically,
code regions are protected and hence the debugged process won't be able to overwrite
code without first using a system call to modify the permissions on the memory page(s)
that contains the code. But the debugger is able to change code as well.
Obviously, the debugging interface enables a debugging process to exert a great deal of
control over the debugged process. As such, UNIX allows debugging only if the
debugger and the debugged process are both run by the same user (or the debugger runs as root).
(On Linux, ptrace can also be used for system call interception. In this case, every time
the debugged process makes a system call, the call is suspended inside the kernel, and the
information delivered to the debugger. At this point, the debugger can change this system
call or its parameters, and then allow the system call to continue. When the system call is
completed, the debugger is notified again, and it can change the return results or modify the
debugged process's memory. It can then let the system call return to the debugged process,
which then resumes execution.)
4. Network Connection
a) Binding: Programs use the socket abstraction for network communication. In order
for a socket, which represents a communication endpoint, to become visible from
outside, it needs to be associated with a port number. (This holds for TCP and UDP,
the two main protocols used for communication on the Internet.) Historically, ports
below 1024 were considered “privileged” ports --- binding to them required root
privileges. The justification arose in the context of large, multi-user systems, where many
user applications run on the same machine as a number of services.
The assumption was that user processes are not trusted, and could try to masquerade
as a server. (For instance, a user process masquerading as a telnet server could
capture passwords of other users and forward them to the attacker.) To prevent this
possibility, trusted servers would use only ports below 1024. Since such ports cannot
be bound by normal user processes, this masquerading won't be possible.
b) Connect:
A client initiates a connection. There are no access controls associated with the
connect operation on most contemporary OSes.
c) Accept :
Accept is used by a server to accept an incoming connection (i.e., in response to a
connect operation invoked by a client). No permission checks are associated with this
operation on most contemporary OSes.
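All three operations appear in this minimal loopback sketch (Python; the kernel picks the port). Binding to port 0 yields an unprivileged ephemeral port; binding to a port below 1024 would instead raise PermissionError for a non-root process. Connect and accept involve no permission checks:

```python
import socket
import threading

# Server: bind to port 0 lets the kernel choose an ephemeral
# (unprivileged, >= 1024) port.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]

def serve():
    conn, addr = server.accept()   # no permission check on accept
    conn.sendall(b"hello")
    conn.close()

t = threading.Thread(target=serve)
t.start()

# Client: no access control on connect either.
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("127.0.0.1", port))
data = client.recv(5)
client.close()
t.join()
server.close()

print(port, data)
```
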
Boot Security:
A number of security-critical services get started up at boot time. It is necessary to understand
this sequence in order to identify the relevant security issues.
1) Loader loads the Kernel
The loader loads the kernel, which then starts the init process. The PID of the init process
is 1 (PID 0 is reserved for the kernel's idle/swapper process).
3) Search Path
A search path is a sequence of directories that a system uses to locate an object
(program, library, or file). Because programs rely on search paths, users must take care to
set them appropriately.
Some systems have many types of search paths. In addition to searching for executables,
a common search path contains directories that are used to search for libraries when the
system supports dynamic loading. If an attacker is able to influence this search path, then
he can induce other users (including root) to execute code of his choice. For instance, suppose
that an attacker A can modify root's path to include /home/A at its beginning. Then, when
root types the command ls, the file /home/A/ls may get executed with root privileges.
Since the attacker created this file, it gives the attacker the ability to run arbitrary code
with root privileges.
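The attack can be simulated safely in a scratch directory (a Python sketch; the trojan ls and the directory are hypothetical stand-ins for the attacker's files):

```python
import os
import shutil
import stat
import tempfile

# The "attacker" plants an executable named ls in a directory he controls.
attacker_dir = tempfile.mkdtemp()
trojan = os.path.join(attacker_dir, "ls")
with open(trojan, "w") as f:
    f.write("#!/bin/sh\necho owned\n")
os.chmod(trojan, os.stat(trojan).st_mode
         | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)

# A victim whose PATH begins with the attacker's directory...
search_path = attacker_dir + os.pathsep + os.environ.get("PATH", "")

# ...resolves "ls" to the trojan, not to /bin/ls.
resolved = shutil.which("ls", path=search_path)
print(resolved)
```
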
4) Capabilities. Modern UNIX systems have introduced some flexibility in places where
policies were hard-coded previously. For instance, the ability to change file ownerships is
now treated as a capability within Linux. (These are not fully transferable, in the sense of
classical capabilities, but they can be inherited across a fork.) A number of similar
capabilities have been defined. (On Linux, try “man capabilities.”)
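A process can inspect its own capability sets on Linux by parsing /proc/self/status (a sketch; each set is a hex bitmask, and the CAP_CHOWN bit index follows the kernel's capability numbering):

```python
# Read the capability sets of the current process (Linux-specific).
caps = {}
with open("/proc/self/status") as f:
    for line in f:
        if line.startswith("Cap"):
            key, _, value = line.partition(":")
            caps[key] = int(value.strip(), 16)

# CAP_CHOWN (bit 0) guards the ability to change file ownership.
CAP_CHOWN = 0
can_chown = bool(caps["CapEff"] & (1 << CAP_CHOWN))
print("CapEff = %#x, can chown arbitrary files: %s"
      % (caps["CapEff"], can_chown))
```

An unprivileged process will normally show CapEff = 0; a root shell shows a mask with most capability bits set.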
5) Network Access
Linux systems provide built-in firewalling capabilities. This is administered using the
iptables program. When this service is enabled, iptables-related scripts are run at boot-
time. You can figure out how to configure this by looking at the relevant scripts and the
documentation on iptables configuration.
In addition to iptables, further mechanisms are available for controlling network
access. The most important ones among these are the hosts.allow and hosts.deny files that
specify which hosts are allowed to connect to the local system.
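As an illustration (hypothetical rules, not a recommended policy), a minimal iptables configuration and corresponding hosts.allow/hosts.deny entries might look like:

```shell
# Default-deny inbound policy, with exceptions for established traffic
# and incoming SSH (illustrative only; adapt to the local policy):
iptables -P INPUT DROP
iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -A INPUT -p tcp --dport 22 -j ACCEPT

# /etc/hosts.allow entry: permit sshd connections only from one subnet
#   sshd : 192.168.1.
# /etc/hosts.deny entry: reject everything else
#   ALL : ALL
```
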
Database security
The main issue in database security is fine granularity --- it is not enough to permit or deny access to
an entire database. SQL supports an access control mechanism that can be used to limit access
to tables in much the same way that access can be limited to specific files using a conventional
access control specification, e.g., a user may be permitted to read a table, another may be
permitted to update it, and so on.
Sometimes, we want to have finer granularity of protection, e.g., suppressing certain columns
and/or rows. This can be achieved using database views. Views are a mechanism in databases to
provide a customized view of a database to particular users. Typically, a view can be defined as
the result of a database query. As a result, rows can be omitted, or columns can be projected out
using a query. Thus, by combining views with SQL access control primitives, we can realize
fairly sophisticated access control objectives.
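For instance (a sketch using SQLite's in-memory database; the table and names are hypothetical, and note that SQLite itself lacks GRANT, which a full DBMS would use to restrict users to the view):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, dept TEXT, salary INTEGER);
    INSERT INTO employees VALUES
        ('alice', 'eng', 120000),
        ('bob',   'eng',  90000),
        ('carol', 'hr',   80000);

    -- The view projects out the sensitive salary column and keeps only
    -- one department's rows; in a full DBMS, users would be granted
    -- SELECT on the view but not on the underlying table.
    CREATE VIEW eng_staff AS
        SELECT name, dept FROM employees WHERE dept = 'eng';
""")

rows = conn.execute("SELECT * FROM eng_staff ORDER BY name").fetchall()
print(rows)
```
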
When dealing with sensitive data, it is often necessary to permit access to aggregate data even
when individual data items may be too sensitive to reveal. For instance, the census bureau
collects a lot of sensitive information about individuals. In general, no one should be able to
access detailed individual records. However, if we don't permit aggregate queries, e.g., the
number of African Americans in a state, then the whole purpose of conducting the
census would be lost.
The catch is that it may be possible to infer sensitive information from the results of one or
more aggregate queries. This is called the inference problem. As an example, consider a database
that contains grade information for this course. We may permit aggregate queries, e.g., average
score on the final exam. But if this average is computed over a small set, then it can reveal
sensitive information. To illustrate this, consider a class that has only a single woman. By
making a query that selects the students whose gender is female, and asking for the average of
these students, one can determine the grade of a single individual in the class.
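The single-woman example can be worked through directly (hypothetical grade data):

```python
# Hypothetical grade table: one row per student (name, gender, score).
grades = [
    ("alice", "F", 91),
    ("bob",   "M", 78),
    ("carl",  "M", 85),
    ("dave",  "M", 62),
]

def average(rows):
    scores = [s for _, _, s in rows]
    return sum(scores) / len(scores)

# An "aggregate" query over a set of size 1 is no aggregate at all:
# this average is exactly alice's individual score.
female_avg = average([r for r in grades if r[1] == "F"])
print(female_avg)
```
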
One can attempt to solve this problem by prescribing a minimum size on the sets on which
aggregates are computed. But an attacker can circumvent this by computing aggregates on the
complement of a set, e.g., by comparing the average of the whole class with the average for male
students, an attacker can compute the female student's grade. Another possibility is to insert
random errors in outputs. For instance, in the above calculation, a small error in the average
grades can greatly increase the error in the inferred grade of the female student. (In particular, a
1% error in the average can translate to a 100% error if the class had 100 students.) Yet another
possibility is to limit the number of related queries a single user can launch. The most powerful
technique is one that computes all possible inferences that can be made from the queries
launched so far, and refuses to accept queries that would allow a user to infer sensitive data. The
problem with all these techniques is that they limit usability and may be difficult to enforce: as
solutions come closer to perfectly protecting sensitive information, they become
far too expensive (or difficult) to use.
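The complement attack and the error-amplification effect of noisy answers can both be checked with the earlier hypothetical scores:

```python
# Continuing the single-woman class example with hypothetical scores;
# alice is the only woman.
scores = {"alice": 91, "bob": 78, "carl": 85, "dave": 62}
n = len(scores)

class_avg = sum(scores.values()) / n
male_avg = (sum(scores.values()) - scores["alice"]) / (n - 1)

# Complement attack: a minimum-set-size rule allows both averages,
# yet together they pin down the single excluded grade exactly:
#   grade = n * avg_all - (n - 1) * avg_male
inferred = n * class_avg - (n - 1) * male_avg
print(inferred)

# Adding a small relative error to a released average multiplies the
# absolute error in the inferred grade by roughly n.
noisy_class_avg = class_avg * 1.01  # 1% error in the class average
noisy_inferred = n * noisy_class_avg - (n - 1) * male_avg
# |noisy_inferred - inferred| equals n * 0.01 * class_avg
```
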