Chapter 02
Concepts Covered
Man pages and Texinfo pages
The UNIX file I/O API
Reading, creating, and writing files
File descriptors
Kernel buffering
Kernel versus user mode and the cost of system calls
Timing programs
Time representation in UNIX
The utmp file
Detecting and reporting errors in system calls
Memory-mapped I/O
Feature test macros
open, creat, close, read, write, lseek, perror, ctime, localtime, utmpname, getutent, setutent, endutent, malloc, calloc, mmap, munmap, memcpy
Filters and regular expressions
2.1 Introduction
This chapter introduces the two primary methods of I/O possible in UNIX: buffered and unbuffered.
By trying to write the who and cp commands, we will explore how to create, open, read, write,
and close arbitrary files. "Arbitrary" in this context means that they are not necessarily text files.
We will write several different versions of the who command, simply to illustrate different approaches
to the problem of reading from a file. They will differ in their performance characteristics and their
portability. The chapter uses this exercise to introduce the UNIX concept of time, and the first
of several important databases provided by the kernel, as well as the kernel's interface to those
databases. We also write two different versions of a simplified cp command, one using read() and
write(), and the other using memory-mapped I/O.
UNIX Lecture Notes Stewart Weiss
Chapter 2 Login Records, File I/O, and Performance
Command programs are located in one of several directories, the most common being /bin, /usr/bin,
and /usr/local/bin. The /usr/local/bin directory is traditionally used as a repository for com-
mands that do not come with the UNIX distribution and have been added as local extras. Many
packages that are installed after the operating system installation are placed in subdirectories of
/usr/local. Administrative commands, such as those for creating and modifying user accounts,
are found in /usr/sbin. Many UNIX systems still retain the old /usr/ucb directory. (The "ucb"
in /usr/ucb stands for the University of California at Berkeley.) The /usr/ucb directory, if it exists,
contains commands that are part of the BSD distributions. Some of the commands in /usr/ucb
are also in /usr/bin and have different semantics. If the same command exists in both /usr/bin
and in some other directory such as /usr/ucb, the PATH environment variable, just like the one used
in Windows and DOS, determines which command will be run. The PATH variable contains a list of
the directories to search when the command is typed without a leading path. Whichever directory
is earliest in the list is the one whose version of the command is used. Thus, if more exists in both
/usr/ucb and /usr/bin, as well as in your working directory, and /usr/bin precedes /usr/ucb
which precedes . in your PATH variable, and if you type
$ more myfile
then /usr/bin/more will run. If you type
$ ./more myfile
then your PATH is not searched and your private more program will run. If you type
$ /usr/ucb/more myfile
then the version of more in /usr/ucb will run.
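The search rule described above can be sketched in a few lines of C. This is only an illustration of the idea, not the code any particular shell uses: find_in_path is a hypothetical helper, and it treats anything that passes access(X_OK) as a match, whereas a real shell also checks that the match is a regular file.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Search the colon-separated directory list `path` for an executable
 * named `cmd`, the way a command typed without a leading path is
 * resolved. On success, copy the full pathname into `out` and return 1;
 * otherwise return 0. find_in_path is a made-up name for this sketch. */
int find_in_path(const char *cmd, const char *path, char *out, size_t outlen)
{
    const char *start = path;

    while (*start != '\0') {
        /* Length of the directory name up to the next ':' or the end. */
        size_t len = strcspn(start, ":");

        /* An empty entry in the list stands for the current directory. */
        if (len == 0)
            snprintf(out, outlen, "./%s", cmd);
        else
            snprintf(out, outlen, "%.*s/%s", (int) len, start, cmd);

        /* The earliest directory that holds an executable wins. */
        if (access(out, X_OK) == 0)
            return 1;

        start += len;
        if (*start == ':')
            start++;
    }
    return 0;
}
```

Calling find_in_path("more", getenv("PATH"), buf, sizeof buf) would report which copy of more a plain "more myfile" is expected to run (getenv() is declared in <stdlib.h>).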
The who command displays information about who is currently using the system. Running who
without command-line options produces a listing such as
Each line represents a single login session. The -H option will print column headings, in case the
data is not obvious. The first column is the username, the second is the terminal line on which the
user is logged in, the third is the time of the login on that terminal, and the last is the source of
the login, either the host name or an X display. For example, sweiss was logged in on terminal
line pts/6, the session started at 13:08 on July 26th of the then current year, and the login was
initiated from a computer identified as 70.ny325.east.verizon.net. Notice that there may be
multiple logins with the same username.
The output of who may vary from one system to another. Some of the reasons have to do with
how systems treat users who have multiple terminal windows open in a single login or are running
terminal multiplexers such as GNU's screen program. The w command, by the way, is approximately
equivalent to the command sequence uptime; who; it shows more information than who does.
In this case, one should use the info command instead. The info command brings up the Texinfo
pages. The Texinfo system is an alternative system for providing on-line documentation. To learn
how to use the Texinfo viewer, type
info info
which will bring up a tutorial on using the Texinfo documentation system. The general idea is that
the information is stored in a tree-like structure, in which an internal node represents a topic area,
and its child nodes are specific to that topic. The space bar will advance within the entire tree using
breadth-first search. To descend into a node's children, d (for down) works. To go back up, u (for
up) works. To traverse the siblings from left to right, n (for next) does the trick, and to go back, p
(for previous) works. Just picture the tree.
Note. On some systems, when you type "info coreutils who", you will see the page for the
whoami command. If you move ahead a few pages, you will find the page for who. On other systems
you may have to type info who or "info coreutils 'who invocation'" to bring up the proper
pages.
The man page for who tells us that the command may be called with zero or more of the command-
line options abdHlmpqrstTu. It can also be called as follows:
$ who am i
sweiss pts/6 Jul 26 13:08 (70.ny325.east.verizon.net)
and, in Linux, if you supply any two words after who, it behaves the same way:
In general, the way to research a UNIX command is to use a combination of these methods:
5. Find and read the header (.h) files relevant to the command.
The DESCRIPTION section gives the details of how the command is used. For example, reading
about who in the man page reveals that who has an optional file name argument, and that if it is not
supplied, who reads the file /var/run/utmp to get the information about current logins. The optional
argument can be /var/log/wtmp. We can infer that the file /var/run/utmp contains information
about who is currently logged in. What about /var/log/wtmp? If you were to try typing
$ man wtmp
you would be pleasantly surprised to discover that, although wtmp is not a command, there is a
man page that describes it. This is because there is a section of the man pages strictly devoted to
the description of system file formats. /var/log/wtmp is a system file, as is /var/run/utmp, and
they are both described on the same man page in section 5 of the manual. There we can learn that
/var/log/wtmp contains information about who has logged in previously. (If we consult the who
Texinfo page, we could learn that as well.)
Before we dig deeper into the man page for the utmp and wtmp files, you should also know that it
is required of all POSIX-compliant UNIX systems that they also contain man pages for all of the
header files that might be included by a function in the kernel's API. To put it more precisely, each
function in the System Interfaces volume of POSIX.1-2008 specifies the headers that an application
must include to use that function, and a POSIX-compliant system must have a man page for each
of those headers. They may not be installed on the system you are using, but they are available.
They will only be installed if the system administrator installed the application development files.
The man pages for the header files have a fixed format. From the POSIX.1-2008 standard:
NAME
This section gives the name or names of the entry and briefly states its purpose.
SYNOPSIS
This section summarizes the use of the entry being described.
DESCRIPTION
This section describes the functionality of the header.
APPLICATION USAGE
This section is informative. This section gives warnings and advice to application
developers about the entry. In the event of conflict between warnings and advice and a
normative part of this volume of POSIX.1-2008, the normative material is to be taken
as correct.
RATIONALE
This section is informative. This section contains historical information concerning the
contents of this volume of POSIX.1-2008 and why features were included or discarded
by the standard developers.
FUTURE DIRECTIONS
This section is informative. This section provides comments which should be used as a
guide to current thinking; there is not necessarily a commitment to adopt these future
directions.
SEE ALSO
This section is informative. This section gives references to related information.
The important sections are NAME, SYNOPSIS, DESCRIPTION, and SEE ALSO.
For example
$ man stdlib.h
will display the man page for the header file <stdlib.h>. This is a useful feature. But if you do not
know the name of the command that you need, nor the names of any files that might be useful or
relevant, then you do not know which man page to read. UNIX systems provide various methods
of overcoming this problem.
$ man -k keyword
This will only work if the whatis database was built when the man pages were installed,
however, so you are at the mercy of the system administrator. (If you are the administrator, issue
the command /usr/sbin/makewhatis to build the database.) For example, typing
$ man -k utmp
will list all man pages that contain the string utmp in their summaries. The command
$ apropos utmp
has the exact same meaning: apropos is equivalent to "man -k". Unfortunately, the implementation
of apropos varies from system to system. On some systems, such as Fedora 15, the most
current stable version at the time of writing, apropos has features that allow multiple keyword
searches as well as regular expression searches. To search for man pages whose page names and/or
NAME sections contain all keywords provided, one can use the -a option, as in
The number in parentheses is the section number. Section 3 contains man pages for library functions.
Notice that we have output in which the string case is a substring of other words. If we wanted
to limit it to those descriptions in which case is a word on its own, we could use the regular
expression matching feature of apropos:
Unfortunately, this powerful apropos is not available on all systems. In particular, it is absent on
the RHEL 6 system installed on our server. This version has no options, so one cannot do such
searches. In this case, to get the same effect, one can use a simple search and pipe the output
through a grep filter. If you are not familiar with grep or regular expressions, see the Appendix.
The equivalent command would be
If the output list is still too long to be useful, you can filter it further with another instance of grep:
The first word is the topic of the man page, the next, the man page title, the third is the section
number of the manual, and the last is a brief description of the topic.
Every UNIX system has a manual volume that deals with the files used by the commands. The
number may vary. From the above output, it appears that the utmp file is described in Section 5 of
the man pages:
shows that the man page describing the wtmp file is the same page as the one describing utmp.
Obviously, there is a man page for utmp in Section 5 of the manual. To display the page from a
specific section, you need to name the section on the command line. The syntax varies; in RedHat
Linux either of these will work:
$ man 5 utmp
$ man -S5 utmp
The <utmpx.h> header file describes a POSIX-compliant interface to the utmp file. This interface
is different from that of the <utmp.h> file. We will use the (outdated) <utmp.h> interface for our
initial attempts, exploring the utmp file in greater depth, starting with the man page that our
system delivers when we type either of the above man commands. After that we will consider using
two other interfaces, the POSIX utmpx interface and a GNU extension, the thread-safe functions
getutent_r() and its cousins.
The beginning of the man page for utmp from RedHat Enterprise Linux Release 4 is displayed below.
NAME
utmp, wtmp - login records
SYNOPSIS
#include <utmp.h>
DESCRIPTION
The utmp file allows one to discover information about who is currently
using the system. There may be more users currently using the system,
because not all programs use utmp logging.
First note that it tells us which header file is relevant: <utmp.h>. This is the header file that the
compiler will use when the include directive #include <utmp.h> is in your program[6]. Next, it issues
a warning to system administrators not to leave this file writable by anyone other than its owner,
the superuser. Then it warns the rest of us, before showing us the contents of the include file, that
the contents may differ from one installation to another.
Since UNIX is a free, community supported operating system, it has been evolving over time. You
may find that what is described in a book, or in these notes, is different from what you observe
on your system. It is not that anything is correct or incorrect, but that UNIX is a moving target,
and that systems can differ in minor ways. For example, the man page for utmp in an older version
of Linux will be very different from the one shown here. Even the location of the utmp file itself
is different. Later versions of UNIX added system functions to provide a data abstraction layer so
that the programmer would not need to know the actual structure of the file. The problem was
that different versions of UNIX had different definitions of the utmp structure, and programs that
accessed the structure directly were failing on different systems.
[6] There may be many files named utmp.h in the file system. Each compiler will have its own method of deciding
which one to use. The GNU compiler collection (gcc) installs its own header files in specific places, and it uses these
by default. The default search path used by gcc is typically
/usr/local/include
target-installdir/include
/usr/include
where target-installdir is the directory in which gcc was installed on the machine. This is explained in more detail
shortly.
The structures displayed in the man page may not be the same as those found on our machine. If
you write code that depends critically on the structure definition, it may work on one machine but
not another. In spite of this, it is valuable to study these structures. Afterward we will write more
portable code. The key to that is to use preprocessor directives to conditionally compile the code
based on the values of macros. The man page continues:
#define UT_UNKNOWN 0
#define RUN_LVL 1
#define BOOT_TIME 2
#define NEW_TIME 3
#define OLD_TIME 4
#define INIT_PROCESS 5
#define LOGIN_PROCESS 6
#define USER_PROCESS 7
#define DEAD_PROCESS 8
#define ACCOUNTING 9
#define UT_LINESIZE 12
#define UT_NAMESIZE 32
#define UT_HOSTSIZE 256
struct exit_status {
short int e_termination; /* process termination status. */
short int e_exit; /* process exit status. */
};
struct utmp {
short ut_type; /* type of login */
pid_t ut_pid; /* pid of login process */
char ut_line[UT_LINESIZE]; /* device name of tty - "/dev/" */
char ut_id[4]; /* init id or abbrev. ttyname */
char ut_user[UT_NAMESIZE]; /* user name */
char ut_host[UT_HOSTSIZE]; /* hostname for remote login */
struct exit_status ut_exit; /* The exit status of a process
marked as DEAD_PROCESS. */
#if __WORDSIZE == 64 && defined __WORDSIZE_COMPAT32
int32_t ut_session; /* Session ID (getsid(2)),
used for windowing */
struct {
int32_t tv_sec; /* Seconds */
int32_t tv_usec; /* Microseconds */
} ut_tv; /* Time entry was made */
#else
long ut_session; /* Session ID */
struct timeval ut_tv; /* Time entry was made */
#endif
int32_t ut_addr_v6[4]; /* IP address of remote host. */
char __unused[20]; /* Reserved for future use. */
};
The page then contains a brief description of the purpose of the structure:
This structure gives the name of the special file associated with the user's
terminal, the user's login name, and the time of login in the form of time(2).
String fields are terminated by '\0' if they are shorter than the size of the
field.
More information about the specific members of the structure is contained in the comments in the
struct definition. The man page does not describe the members in detail beyond that. The rest
of the man page, which is not included here, goes on to describe how the various entries in the
utmp file are created and modified by the different processes involved in logging in and out. We will
return to that topic shortly. It reiterates the warning:
You should have noticed the following line in the man page:
#if __WORDSIZE == 64 && defined __WORDSIZE_COMPAT32
This causes conditional compilation of the code. It means, if the machine's word size is 64 bits and
it is in 32-bit compatibility mode, then use one definition of the ut_session and ut_tv members,
otherwise use a different one. The macros __WORDSIZE and __WORDSIZE_COMPAT32 are defined in
the header file /usr/include/bits/wordsize.h. We will ignore this subtlety for now, and rather
than relying on the man page, we will examine the <utmp.h> header file itself.
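To see the same conditional-compilation pattern in isolation, here is a minimal sketch that selects one of two type definitions at compile time. It deliberately uses the standard LONG_MAX macro from <limits.h> instead of the glibc-specific __WORDSIZE, and the names session_id_t and LONG_IS_WIDE are invented for this illustration; they do not appear in any system header.

```c
#include <limits.h>
#include <stdint.h>

/* Pick one of two definitions at compile time, in the same spirit as
 * the __WORDSIZE test in the man page. Only one branch is ever seen by
 * the compiler proper; the preprocessor discards the other. */
#if LONG_MAX > 2147483647L
/* long is wider than 32 bits here, so pin the type to exactly 32 bits */
typedef int32_t session_id_t;
#define LONG_IS_WIDE 1
#else
/* long is already 32 bits wide, so it can be used directly */
typedef long session_id_t;
#define LONG_IS_WIDE 0
#endif
```

Either way, code that uses session_id_t is written once and compiles on both kinds of machine, which is the reason headers such as <utmp.h> are written this way.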
$ gcc -v empty.c
These lines will show you which directories and in which order gcc searches for included header
files. The above output shows that gcc will search first in /usr/local/include, then in the install
directory, and then in /usr/include. Since there is no <utmp.h> file in the first two directories, it
will use /usr/include/utmp.h.
Returning to the task at hand, if you look at either of the <utmp.h> files mentioned above, you will
see that they are mostly wrappers for a file which is in the corresponding bits subdirectory:
/usr/include/bits/utmp.h,
or
/usr/lib/i386-redhat-linux3E/include/bits/utmp.h.
Taking the liberty of eliminating the 64-bit conditional macros, and the macro names, the important
elements of the header file are as follows:
struct utmp
{
    short int ut_type;                  // Type of login.
    pid_t ut_pid;                       // Process ID of login process.
    char ut_line[UT_LINESIZE];          // Devicename.
    char ut_id[4];                      // Inittab ID.
    char ut_user[UT_NAMESIZE];          // Username.
    char ut_host[UT_HOSTSIZE];          // Hostname for remote login.
    struct exit_status ut_exit;         /* Exit status of a process
                                           marked as DEAD_PROCESS. */
    long int ut_session;                // Session ID, used for windowing.
    struct timeval ut_tv;               // Time entry was made.
    int32_t ut_addr_v6[4];              // Internet address of remote host.
    char __unused[20];                  // Reserved for future use.
};
The point is that login records have ten significant members, and we can write code to extract
their data in order to mimic the who command. In particular, the ut_user char array stores the
username, the ut_line char array stores the name of the terminal device of the login, ut_time
(on Linux, a compatibility macro for ut_tv.tv_sec) stores the login time, and ut_host stores the
name of the remote host from which the connection was made. Unfortunately, we will not be able
to ignore indefinitely the way that time is defined on different architectures, but for the moment,
we will continue to ignore it.
• to display the information from a single utmp structure on the display device in a user-friendly
format.
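As a rough sketch of that second task, the function below formats the four members named above into a text line. It assumes the Linux <utmp.h> layout discussed in this section; format_login_record is a hypothetical name chosen for this illustration, and the time is printed as a raw count of seconds, since time conversion has not been covered yet.

```c
#include <stdio.h>
#include <utmp.h>

/* Format one login record into `out` as
 *     user  line  seconds-since-epoch  (host)
 * The char arrays in struct utmp are not NUL-terminated when they are
 * completely full, so each one is printed with an explicit maximum
 * length taken from the size of the array.
 * Returns whatever snprintf() returns. */
int format_login_record(const struct utmp *rec, char *out, size_t outlen)
{
    return snprintf(out, outlen, "%-8.*s %-8.*s %10ld (%.*s)",
                    (int) sizeof rec->ut_user, rec->ut_user,
                    (int) sizeof rec->ut_line, rec->ut_line,
                    (long) rec->ut_tv.tv_sec,
                    (int) sizeof rec->ut_host, rec->ut_host);
}
```

Writing into a caller-supplied buffer rather than straight to the terminal is a design choice made for the sketch: it keeps the formatting separate from the output, so the same function could feed printf(), a log file, or a test.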
textual input. They are specifically designed for that purpose. Although you could read structures
by reading one char at a time and then reconstructing the structure from the sequence of chars with
a lot of type casts, that would be grossly inefficient and error-prone. Clearly there must be a better
way.
Let us suppose that you do not know the methods of reading from a binary file. You could use a
man page search such as
Remember though that when you use multiple words with the -k option, they are OR-ed together,
so the output includes lines with either word (or both). If you do this search, you will see a list
of perhaps several dozen man pages. If you get a long list you can filter it further by limiting the
output to only sections 2 or 3 of the man pages with a third stage in the pipeline:
In this list will be the page for two prospective functions to use:
The first, fread(), in Section 3, is part of the C Standard I/O Library; it is C's function for reading
binary files. The second, read(), in Section 2, is the prototype of a system call. As we are primarily
interested in what UNIX in particular has to offer us, we will look at the system call. In Chapters 5
and 7, we will revisit the C Standard I/O Library.
We want to see what the man page for read() has to say. If you do not specify the section number
when you type man read, you will get the man page from the first section, and you will discover
that there is also a UNIX command, /usr/bin/read:
$ man read
which will output the man page for the read command in Section 1. You must type
$ man 2 read
to get the man page for the read() system call. I have included the important parts of the man
page below.
NAME
read - read from a file descriptor
SYNOPSIS
#include <unistd.h>
ssize_t read(int fildes, void *buf, size_t nbyte);
DESCRIPTION
To use the read() function, the program must include the header file <unistd.h>. This header file
serves various purposes, the most relevant for us being that it contains the prototypes of
the (POSIX compliant) system calls.
The functions open(), read(), write(), and close() are UNIX system calls. The prototypes
of read(), write(), and close() are declared in <unistd.h>, which is a POSIX header file;
open() is declared in <fcntl.h>. The <unistd.h> header defines miscellaneous symbolic
constants and types, and declares miscellaneous functions, among which are these calls.
These functions exist only in UNIX systems
and they exist no matter what language you use, as long as the system you are using
is POSIX-compliant. POSIX does not specify whether they should be system calls or
library functions, but only that they exist as one or the other. These system calls
operate on file descriptors, not file streams. The UNIX system calls operate on the
kernel directly; the ANSI Standard C I/O Library calls are at a higher level.
The read() function has three arguments. The man page says that the read() function reads from
a file associated with a file descriptor. A file descriptor is a small, non-negative integer. We will
study file descriptors in greater detail in a later chapter. The second parameter is a pointer to a
place in memory into which the bytes that are read are to be stored. The third parameter is the
number of bytes to read. The return value is the number of bytes actually read, which can never
be larger than the number requested but might be smaller, or -1 if something went wrong.
To illustrate, suppose that filedesc is a valid file descriptor that we can use for reading, buffer is
a char array of size 100, and num_bytes_read is an integer variable. The following code fragment
shows how to read 100 bytes of data at a time from this file until the end of data is found.
int done = 0;
while ( !done ) {
    num_bytes_read = read(filedesc, buffer, 100);
    if ( num_bytes_read < 0 ) {
        /* an error code was returned during reading - bail out */
        done = 1;
    }
    else if ( num_bytes_read == 0 ) {
        /* the end of file was reached - stop reading */
        done = 1;
    }
    else {
        /* do whatever has to be done to the data */
    }
}
This is a typical read-loop structure. The read() call does not fail when there is no data; it just
returns 0. This is how to detect the end of the input data.
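For a version of this loop that can actually be compiled, here is a small sketch that does something trivial with the data: it counts it. The function name count_bytes is made up for the example; only read() itself comes from the system.

```c
#include <unistd.h>

/* Read from fd in chunks of up to 100 bytes until the end of the data,
 * and return the total number of bytes seen. Returns -1 if read()
 * reports an error. */
long count_bytes(int fd)
{
    char    buffer[100];
    ssize_t num_bytes_read;
    long    total = 0;

    /* read() returns 0 at end of data and -1 on error, so the loop
     * continues only while real data arrives. */
    while ((num_bytes_read = read(fd, buffer, sizeof buffer)) > 0)
        total += num_bytes_read;    /* may be fewer than 100 bytes */

    return (num_bytes_read < 0) ? -1 : total;
}
```

Folding the call into the loop condition is a common alternative to the done flag; both forms make the same three-way distinction between an error, the end of the data, and a successful read.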
How can a program associate a file descriptor with a file? Look in the SEE ALSO section of the man
page and you will find references to fcntl(), creat(), open() and many other system calls[8]. Most
of these work with file descriptors. The open() system call is the one we need now, because the
open() call opens a file and assigns a file descriptor to it.
The open() system call creates a connection between the process and the file. Think of a connection
as an object that manages the I/O operations on the file from the process. This object contains
things such as the offset in the file for the next operation, various status flags, and pointers to
kernel functions that the process can invoke. It is represented by a file descriptor. A process can
open several files and each will have its own file descriptor. In fact, it can open the same file twice
and each connection will have a different file descriptor[9]. UNIX does not prevent you or anyone
else from opening the same file many times. It is up to the users and their programs to coordinate
accesses to files.
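The claim that each connection keeps its own position in the file can be checked with a short experiment. The sketch below opens one file twice, reads through the first descriptor, and then asks where the second one stands. The function name is invented for this illustration, and lseek(), used here only to query the current offset, is covered properly later.

```c
#include <fcntl.h>
#include <unistd.h>

/* Open the same file twice, read n bytes through the first descriptor,
 * and return the current offset of the second. Because each open()
 * creates a separate connection with its own offset, the expected
 * result is 0. Returns -1 if anything fails.
 * second_offset_after_read is a hypothetical name. */
off_t second_offset_after_read(const char *path, size_t n)
{
    char  buf[64];
    off_t offset = -1;
    int   fd1 = open(path, O_RDONLY);
    int   fd2 = open(path, O_RDONLY);

    if (fd1 != -1 && fd2 != -1 && n <= sizeof buf
            && read(fd1, buf, n) != -1)
        offset = lseek(fd2, 0, SEEK_CUR);   /* where would fd2 read next? */

    if (fd1 != -1) close(fd1);
    if (fd2 != -1) close(fd2);
    return offset;
}
```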
If you look at the man page you will see the following synopsis of the open() call.
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
int open(const char *path, int oflag, /* mode_t mode */...);
[8] All of these are in Section 2 of the man pages.
[9] You might have guessed. The file descriptor is the index into an array of structs. Each of these structs contains,
among other things, a pointer to the next character in the file to be read. A process can read from two different parts
of the same file at the same time in this way.
The first argument is a character string containing the path to the file to be opened. The second
argument is an integer specifying how the file is to be opened: for reading, for writing, for reading
and writing, for appending, and so on. If the call is successful, it returns a file descriptor. More
accurately, it returns the lowest numbered file descriptor not already in use by the process. If the
call is not successful, it returns -1. There are methods of detecting the type of error; these will be
examined later.
It is more complex than this, but this is enough for now. Other values can be bit-wise-OR-ed to
these values.
int fd;
if ( (fd = open("/var/adm/messages.0", O_RDONLY)) < 0 )
    exit (-1);
This attempts to open the file /var/adm/messages.0 for reading. If it fails, it exits. If it is successful,
the file is ready for reading. The file descriptor stored in fd is the one the program must use in the
read() call. Notice that the call is made within a conditional expression and that the return value
of the call is compared to 0 in that condition. The assignment must be enclosed in its own
parentheses, because < binds more tightly than =; without them, fd would receive the result of the
comparison instead of the descriptor. This is a common method of error handling in C programs.
Unlike some other operating systems, UNIX does not prevent a file that is already open by one process
from being opened by another. This is a very important feature to remember about UNIX. It is
why it is possible for multiple users to run the same command or change their passwords at the
same time[10].
After your process is finished reading a file, it should close the connection to the file. The close()
system call has a single argument, filedes, which is the file descriptor of the connection to be
closed. If a file has been
opened by a process via multiple calls to open(), then the other connections will remain open and
only the one corresponding to filedes will be closed. If the kernel cannot close the connection, it
will return -1.
Now you might wonder what could possibly go wrong when closing a file, especially when it has
been opened for reading. Well, first of all, it is possible you passed it a bad file descriptor when
you closed it. Secondly, the kernel, in the middle of the system call, may be given an urgent task to
complete, so urgent that it has to drop the close() call in the middle to deal with it. In this case it
will also return -1. Also, the file may not have been on the local machine or the local drive, and
a network connection might have gone down, in which case the file cannot be closed. Furthermore,
if this file had been opened for writing, there are more reasons why close() might fail, the most
important of which is that the actual write may not take place until close() is called, at which
point the kernel may discover that it cannot complete the write for any number of reasons.
[10] Of course UNIX does provide the means for a process to open a file and lock it so that no other process can read
or write it while it is in use, but this requires actions on the part of the process to make it happen. UNIX does not
do this automatically.
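Since close() can fail, its return value deserves the same check as that of open() or read(). A minimal sketch of such a check follows; close_checked is a hypothetical wrapper name, and perror(), which prints a message describing the most recent error, is explained later in this chapter.

```c
#include <stdio.h>
#include <unistd.h>

/* Close fd and report a failure instead of silently ignoring it.
 * Returns 0 on success and -1 on failure, like close() itself. */
int close_checked(int fd)
{
    if (close(fd) == -1) {
        perror("close");    /* prints "close: " and the reason for the failure */
        return -1;
    }
    return 0;
}
```

Passing a descriptor that was never opened, such as -1, is the easiest way to see the failure path in action.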
Listing 1. who1.c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>     /* read(), close() */
#include <fcntl.h>      /* open(), O_RDONLY */
#include <utmp.h>

void show_info( struct utmp * );    /* remains to be written */

int main()
{
    int         fd;
    struct utmp current_record;
    int         reclen = sizeof(struct utmp);

    fd = open(UTMP_FILE, O_RDONLY);
    if ( fd == -1 ) {
        perror(UTMP_FILE);
        exit(1);
    }

    /* read one fixed-size record at a time until a short read or EOF */
    while ( read(fd, &current_record, reclen) == reclen )
        show_info(&current_record);

    close(fd);
    return 0;
}
First observe that the first argument to the open() call is UTMP_FILE. This is a macro whose
definition is included in the <utmp.h> header file. Its value is system-dependent; it is the path to
the actual utmp file. It is usually "/var/run/utmp". We would not know about it if we did not read
the header file.
Notice which header files are included, and notice that reclen contains the number of bytes in a utmp
struct. The sizeof operator yields the number of bytes in its argument type. reclen will be
used in the read() call to read exactly one utmp structure at a time. The call to read() is given the
file descriptor returned by open(), a pointer to a memory location large enough to hold one utmp
record, and reclen, the number of bytes to be read. If the return value equals reclen then a full
record was read. If it does not, then an incomplete record was read or the end-of-file was reached.
In either case we stop reading. The show_info() function remains to be written. It should display
the contents of the current record. The perror() function is described below.
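For a regular file such as utmp, a short read normally means the end of the file. On pipes, terminals, and network connections, however, read() may return fewer bytes than requested even though more are on the way. The sketch below shows one common way to insist on a whole record in that situation; read_full is a hypothetical helper, not part of the system call interface.

```c
#include <errno.h>
#include <unistd.h>

/* Keep calling read() until `count` bytes have arrived, the end of the
 * data is reached, or an error occurs. Returns the number of bytes
 * stored in buf (less than count only at end of data), or -1 on error. */
ssize_t read_full(int fd, void *buf, size_t count)
{
    size_t got = 0;

    while (got < count) {
        ssize_t n = read(fd, (char *) buf + got, count - got);
        if (n == 0)
            break;                  /* end of data */
        if (n == -1) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: retry */
            return -1;
        }
        got += (size_t) n;
    }
    return (ssize_t) got;
}
```

With this helper, the test in the loop of Listing 1 could be written as read_full(fd, &current_record, reclen) == reclen without changing its meaning for ordinary files.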
The <errno.h> file defines a number of mnemonic constants for error values, such as
Your program can use these symbols directly with code such as
if ( (fd = open("myfile", O_RDONLY)) == -1 ) {
    printf("Cannot open file: ");
    if ( errno == ENOENT )
        printf("No such file or directory\n");
    else if
    ...
}
This would be very tedious, since every program you write would have long switch statements or
cascading if-statements. It is much easier to use the library function perror() to do this for
you. The perror() function, which conforms to POSIX.1-2001, has a single string as a parameter.
It looks up the value of errno and displays the string, followed by a colon, a space, and an
appropriate message based
on the value of errno. It is declared in <stdio.h>, so you do not need to include <errno.h> if you
use it. The code snippet above is simplified by using perror():
if ( (fd = open("myfile", O_RDONLY)) == -1 ) {
    perror("Cannot open file");
    return;
}
In short, the perror() function prints the string you pass it followed by the message that
corresponds to the current value of errno. It is a good idea to create a function to handle errors,
so that you do not have
to type these lines all of the time. Very often, the error is a fatal one, meaning that the program
cannot proceed if the error occurred. In this case, you would want to exit the program, calling
exit() to do so, as in
if ( (fd = open("myfile", O_RDONLY)) == -1 ) {
    perror("Cannot open file");
    exit(1);
}
The exit() function is declared in <stdlib.h>; its man page is in Section 3. A simple function for
handling fatal errors would be
#include <stdio.h>
#include <stdlib.h>
You might also benefit from writing a second function to call when you do not want to terminate
the program, or you could combine the two into a single, general-purpose function that does either,
by passing a parameter to indicate the error's severity.
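A minimal sketch of such a general-purpose function is shown here. The name report_error() and its
is_fatal parameter are illustrative choices made for this example, not part of any library, and the
message format is only one reasonable possibility.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: report the error described by the current errno.
   If is_fatal is nonzero the program exits with status 1; otherwise the
   function returns the errno value it reported so the caller can react. */
int report_error(const char *msg, int is_fatal)
{
    int saved = errno;              /* fprintf() below could change errno */

    fprintf(stderr, "%s: %s\n", msg, strerror(saved));
    if (is_fatal)
        exit(1);
    errno = saved;                  /* leave errno as the caller saw it */
    return saved;
}
```

A caller that cannot continue would write report_error("myfile", 1); one that can recover passes 0
and inspects the return value.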
If this were compiled and run on a system that supported this API, the output would look something
like
$ who1
system b 952601411 ()
952601423 ()
LOGIN console 952601566 ()
acotton ttyp3 964319088 (math-guest04.williams.edu)
ttypc 964319645 ()
This output differs from the output of who in two significant ways. First, there are records in the
output of who1 that do not correspond to user logins, and second, the login times are in some
strange format. Both of these problems are easily fixed.
#define EMPTY 0
#define RUN_LVL 1
#define BOOT_TIME 2
#define OLD_TIME 3
#define NEW_TIME 4
#define INIT_PROCESS 5 /* Process spawned by "init" */
#define LOGIN_PROCESS 6 /* A "getty" process waiting for login */
#define USER_PROCESS 7 /* A user process */
#define DEAD_PROCESS 8
New entries in the utmp file are created by the init process and are initialized with a ut_type of
INIT_PROCESS. Recall from Chapter 1 that what happens when a user logs in depends upon whether
it is a console login, a login on an xterm window, or a login over a network using a protocol such
as SSH. In all cases, the ut_type of the entry is changed from INIT_PROCESS to LOGIN_PROCESS,
either by a getty process or a similar process, depending on the source of the login. The getty
(or similar) process prints the login prompt, collects the user's input to the prompt (which should
be a username) and creates a login process, handing the user's username to the login process. The
login process prompts for the password and authenticates it. If it is valid, it changes the ut_type
to USER_PROCESS. When a user logs out, the ut_type is changed to DEAD_PROCESS.
This implies that the ut_type member of a currently logged-in user record will have the value
USER_PROCESS. No other utmp record will be of type USER_PROCESS and so all we need to do to
suppress non-user records is to print only those records whose ut_type member is USER_PROCESS.
The show_info() function will be modified by the inclusion of this check:
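A minimal sketch of such a check follows. It assumes the glibc layout of struct utmp (members
ut_type, ut_user, ut_line, and ut_tv); the helper name is_user_record() and the output format are
illustrative only.

```c
#include <stdio.h>
#include <utmp.h>

/* Returns 1 if the record describes a currently logged-in user. */
int is_user_record(const struct utmp *rec)
{
    return rec->ut_type == USER_PROCESS;
}

/* Sketch of a filtering show_info(): skip anything that is not a login. */
void show_info(const struct utmp *rec)
{
    if (!is_user_record(rec))
        return;
    /* The name fields are fixed-size arrays that need not be
       null-terminated, so bound each one with a precision. */
    printf("%-8.*s %-12.*s %ld\n",
           (int)sizeof rec->ut_user, rec->ut_user,
           (int)sizeof rec->ut_line, rec->ut_line,
           (long)rec->ut_tv.tv_sec);
}
```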
UNIX represents time as the number of seconds elapsed since 12:00 A.M., January 1, 1970,
Coordinated Universal Time (UTC) [11], known as the Epoch. UTC is essentially like Greenwich Mean
Time except that it includes occasional leap seconds to synchronize with the earth's rotation [12].
UNIX stores time in objects of type time_t, the implementation of which is not standardized. On
many systems time_t is a typedef for a 32-bit signed integer. Such implementations will fail in the
year 2038, when it overflows. Representing time as an integer number of seconds since the Epoch
makes it easy for the kernel to update times, but not very easy for a human to determine the time.
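To make the 2038 limit concrete, the following sketch converts the largest value that a signed
32-bit counter can hold into a calendar date in UTC. The function name last_32bit_second() is made
up for this illustration.

```c
#include <stdint.h>
#include <time.h>

/* Store in *out the broken-down UTC time of the last second that a
   signed 32-bit count of seconds since the Epoch can represent.
   Returns 0 on success, -1 if the conversion fails. */
int last_32bit_second(struct tm *out)
{
    time_t t = (time_t)INT32_MAX;   /* 2147483647 seconds after the Epoch */
    struct tm *p = gmtime(&t);      /* result lives in static storage */

    if (p == NULL)
        return -1;
    *out = *p;                      /* copy it out before it is reused */
    return 0;
}
```

The resulting structure describes 03:14:07 on January 19, 2038; one second later a 32-bit time_t
wraps around to a negative value.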
How can we learn more about UNIX time and the various parts of the API related to it? The
answer again is to do a man page search. If you search on the keyword "time" you will find too
many man pages that refer to time. A second keyword will be needed to refine the search, perhaps
convert or transform or something similar, to capture functions that transform time from one
form to another. With such a refined search
we will see several functions related to time, including ctime() and localtime(). The man page
will also include reference to the header file, <time.h>, which must be included for most of these
functions. These functions share a single man page. Reading this page reveals that ctime() converts
a time_t time into a human-readable string of the form "Www Mmm dd hh:mm:ss yyyy" followed by a
newline. Its prototype is
    char *ctime(const time_t *timep);
Observe that the argument is the address of a time_t value, not the value itself. The return value
is a pointer to a string consisting of a 3-letter day abbreviation, a 3-letter month abbreviation, the
day of the month, the 24-hour time in hours, minutes, and seconds, and the 4-digit year. The string
is allocated statically by ctime() and might be overwritten by other calls, so it is best to copy it
into a local variable if it needs to be available at a later time.
[11] The abbreviation UTC is a compromise between the English and French abbreviations. In English,
it would be CUT and in French, TUC.
[12] The earth's rotation can vary due to astronomical conditions. UNIX systems are not required by
POSIX to represent exact UTC; they are allowed to ignore the leap seconds.
Note 1. ctime() is one of many functions that return a pointer to a string that is allocated statically.
Make sure that you understand what this means. The string itself is allocated by ctime() and a
pointer to that memory is returned to the caller. Subsequent calls to ctime() will overwrite the
previously allocated memory. The caller will be unable to retrieve the old value unless it was copied
to a local. Also, the caller is not responsible for freeing the memory allocated to the string; that
is handled by the library. This is just one of many functions that are not thread-safe, a topic we
discuss below.
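The advice about copying the result can be sketched as follows. The helper copy_ctime() is a
hypothetical name; it makes a private copy of the static string so that a later call to ctime()
cannot disturb it.

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Copy the text produced by ctime() for *t into the caller's buffer,
   dropping the trailing newline. Returns 0 on success, -1 on failure. */
int copy_ctime(const time_t *t, char *buf, size_t size)
{
    const char *s = ctime(t);        /* points into static storage */

    if (s == NULL || size == 0)
        return -1;
    snprintf(buf, size, "%s", s);    /* private copy, always terminated */
    buf[strcspn(buf, "\n")] = '\0';  /* remove the newline, if present */
    return 0;
}
```

With two buffers filled this way for two different times, both strings remain valid at once,
which would not be true of the two pointers returned by ctime() itself.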
The localtime() function also takes the address of a time_t value but returns a pointer to a
struct tm, which is a structure whose members are the various components of time, such as the
day-of-week, the month, day, and year, and so on.
If you read through the man page carefully, which you should, you will find near the end the
conformance section. It states:
CONFORMING TO
POSIX.1-2001. C89 and C99 specify asctime(), ctime(), gmtime(), localtime(),
and mktime(). POSIX.1-2008 marks asctime(), asctime_r(), ctime(), and ctime_r()
as obsolete, recommending the use of strftime(3) instead.
The ctime() function is therefore discouraged in new code. One should instead use strftime(), whose
prototype is
#include <time.h>
size_t strftime(char *s, size_t max, const char *format, const struct tm *tm);
This function, unlike ctime(), allows the calling program to specify the format of the character
string to be created. It is also safer to use in that the string is passed as an argument to the function,
allocated by the caller, instead of allocated statically and returned as the function value. The first
argument is a pointer to the string to be filled, the second is the size of the array of chars to fill,
the third is a format for the string, and the last is the tm structure containing the broken-down
time representation.
The format specification is described in great detail in the man page for the function. It is similar
to the format for the printf() function in that it is a string literal enclosed in double-quotes,
with conversion specifications of the form %x, where x is a character to be replaced. For example,
%M represents minutes as a decimal number in the range 00 to 59, and %b is the abbreviation of
the month name in the current locale. The phrase "in the current locale" means that the locale
settings of the user are used in deciding the exact string that %b will produce. Every user has a
locale in UNIX. The topic of locales will be covered in a later section. The important point now
is that strftime(), unlike ctime(), can use locale information in determining the format of the
output string. In chapter 3 we will use this function to display time with more control. For our
implementation of the who command, we will use ctime().
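Although the who programs in this chapter use ctime(), a small sketch shows how strftime() would
be called. The function name format_login_time() and the choice of format string are assumptions
made for the example; the conversion specifications themselves (%b, %d, %H, %M) are standard.

```c
#include <stddef.h>
#include <time.h>

/* Format a broken-down time as "Mon dd HH:MM" (for example "Aug 11 23:12")
   into the caller's buffer. Returns the number of characters written,
   or 0 if the buffer was too small. */
size_t format_login_time(const struct tm *tm, char *buf, size_t size)
{
    return strftime(buf, size, "%b %d %H:%M", tm);
}
```

Because the caller owns the buffer, two formatted times can coexist without one overwriting the
other.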
The who program only displays the date, hours and minutes. For the above example, it would
display only "Aug 11 23:12". Our implementation of who must extract this substring from the
larger string. In other words, given the full string returned by ctime(),
it needs to print
"Aug 11 23:12"
A simple way to achieve this, perhaps not obvious, is to use pointer arithmetic to print only those
characters of the source string in which we are interested. The first character is 4 characters after
the start of the string, and the length of the substring is exactly 12 characters. Assuming that t is
a time_t variable containing the required time to be printed, the following printf() [13] call will
do the trick:
    printf("%12.12s", ctime(&t) + 4 );
which prints the 12 chars starting at position 4 in the full string. The format %12.12s forces the
string to use exactly 12 characters on the output. The complete program is shown below. You should
study it carefully.
Listing who2.c
// This solves the time display problem and it filters records

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <utmp.h>
#include <fcntl.h>
#include <time.h>

void show_time(long);
void show_info(struct utmp *);

int main(int argc, char *argv[])
{
    struct utmp utbuf;              // read info into here
    int utmpfd;                     // read from this descriptor
    int reclen = sizeof(utbuf);

    if ((utmpfd = open(UTMP_FILE, O_RDONLY)) == -1) {
        perror(UTMP_FILE);
        exit(1);
    }

    while (read(utmpfd, &utbuf, reclen) == reclen)
        show_info(&utbuf);
    close(utmpfd);
    return 0;
}

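The main program above relies on show_time(). One possible way to write it, using the %12.12s
technique just described, is sketched here. The helper format_time() is a hypothetical addition
that makes the formatting step easy to examine on its own.

```c
#include <stdio.h>
#include <time.h>

/* Write the 12-character "Mon dd HH:MM" slice of ctime()'s result
   into out, which must have room for at least 13 bytes. */
void format_time(long timeval, char out[13])
{
    time_t t = (time_t)timeval;
    const char *s = ctime(&t);       /* "Www Mmm dd hh:mm:ss yyyy\n" */

    if (s == NULL) {                 /* time could not be converted */
        out[0] = '\0';
        return;
    }
    snprintf(out, 13, "%12.12s", s + 4);   /* skip the "Www " prefix */
}

/* Display the time in the style used by who. */
void show_time(long timeval)
{
    char buf[13];

    format_time(timeval, buf);
    printf("%s", buf);
}
```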
[13] If you are not familiar with the following C functions, you should take the time to familiarize
yourself with them: printf, fprintf, sprintf, scanf, fscanf, and sscanf. These are all part of C and
hence C++ and any C or C++ book should contain adequate descriptions of them. You can also look at
the man pages for them. Once you know printf and scanf, the others are trivial to understand. The
best way to learn them is, of course, to write a few very simple programs.
We now take a look at what that page has to offer. The beginning of the page contains the following
(depending on what system you have):
SYNOPSIS
#include <utmp.h>
struct utmp *getutent(void);
struct utmp *getutid(struct utmp *ut);
struct utmp *getutline(struct utmp *ut);
struct utmp *pututline(struct utmp *ut);
void setutent(void);
void endutent(void);
int utmpname(const char *file);
DESCRIPTION
New applications should use the POSIX.1-specified "utmpx"
versions of these functions; see CONFORMING TO.
The very first sentence in this man page tells us that these functions are not POSIX.1-compliant,
and that there are utmpx versions of these functions. We will ignore this warning for the moment
and see how to use these non-POSIX functions, simply because there is something that needs to be
explained about the POSIX.1-compliant interface, to which we will return afterward.
The man page basically tells us that there is a simple way of reading the records in a utmp file,
requiring just four steps:
1. Use utmpname() to select the file that should be accessed by the other functions.
2. Call setutent() to rewind the file pointer to the beginning of the file.
3. Repeatedly call getutent() to get the next utmp record from the file; getutent() will return
a NULL pointer after it has read the last record from the file.
4. Call endutent() to close the file when it is no longer needed.
In other words, this interface provides a hidden iterator to the utmp file: setutent() initializes it,
getutent() advances it successively, and endutent() indicates that it is no longer needed. In
addition, the utmpname() function simply needs to be told the pathname to the file, and the other
functions will take care of opening it.
The man page also mentions that _PATH_UTMP is a macro whose value is the path to the utmp file.
We already knew that UTMP_FILE contained that path, but if we dig a little deeper by actually
reading the header files, we will discover that the <paths.h> header file defines _PATH_UTMP and
_PATH_WTMP and that <utmp.h> defines UTMP_FILE as another name for _PATH_UTMP.
We can put all of this together to create a simpler version of who, named who3. In this version we
add the extra feature that the user can optionally supply the word wtmp on the command line if
she wants to see records in the wtmp file instead. The show_info() and show_time() functions are
the same, so we just display the main program in the listing.
Listing who3.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>             // for strcmp()
#include <unistd.h>
#include <utmp.h>
#include <fcntl.h>
#include <time.h>

void show_info(struct utmp *);  // same as in who2.c

int main(int argc, char *argv[])
{
    struct utmp *utbufp;

    if ((argc > 1) && (strcmp(argv[1], "wtmp") == 0))
        utmpname(_PATH_WTMP);
    else
        utmpname(_PATH_UTMP);

    setutent();
    while ((utbufp = getutent()) != NULL)
        show_info(utbufp);
    endutent();
    return 0;
}
This program is not thread-safe. Many functions in the various UNIX libraries use static variables
to store their results. These variables act like global variables within the programs that call these
functions. If a program is multi-threaded, these threads can corrupt each other's data if they use
the unsafe function calls in an overlapping way. Thread-safe functions do not have this problem. A
thread-safe version of the who3 program can use getutent_r(), which is a GNU thread-safe version
of getutent().
The man page tells us that to use the getutent_r() function, we have to set a macro, the
_GNU_SOURCE macro, before including the header file <utmp.h>. That is the purpose of the
following lines from that man page:
The above functions are not thread-safe. Glibc adds reentrant versions
#define _GNU_SOURCE /* or _SVID_SOURCE or _BSD_SOURCE */
#include <utmp.h>
int getutent_r(struct utmp *ubuf, struct utmp **ubufp);
The macro definition of _GNU_SOURCE is required because the <utmp.h> header file contains feature
test macros. Feature test macros can be used to control which definitions are exposed in the system
header files when a program is compiled. This is important for creating portable applications,
because it prevents nonstandard definitions from being exposed in the program. If you remove the
definition of _GNU_SOURCE from your program and try to use getutent_r() you will get a compile
time error because the declaration of this function in the header file is guarded by a conditional
preprocessor directive that is true only if _GNU_SOURCE is defined. It is essentially of the form
#ifdef _GNU_SOURCE
extern int getutent_r (struct utmp *__buffer, struct utmp **__result) __THROW;
/* more stuff here */
#endif
If you put the definition of _GNU_SOURCE after the include directive, it will be useless because it
will not be defined when the header file is preprocessed by gcc, and so in this case too you will get
an error message. Because system header files freely include one another, the safest practice is to
define the macro before including any header file at all.
The feature_test_macros man page describes everything you need to know to use these macros.
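The guarding mechanism can be imitated in a single file. In the sketch below, DEMO_EXTENSIONS
plays the role of a feature test macro and the section between the marker comments plays the role
of a header file; all of the names are invented for the illustration and belong to no real library.

```c
/* The macro must be defined before the "header" text is seen. */
#define DEMO_EXTENSIONS          /* hypothetical feature test macro */

/* ---- what a header file might contain ---- */
int demo_standard(void);         /* always declared */
#ifdef DEMO_EXTENSIONS
int demo_extension(void);        /* declared only on request */
#define DEMO_HAS_EXTENSION 1
#else
#define DEMO_HAS_EXTENSION 0
#endif
/* ---- end of the header ---- */

int demo_standard(void) { return 1; }
#ifdef DEMO_EXTENSIONS
int demo_extension(void) { return 2; }
#endif

/* Reports whether the guarded declaration was exposed at compile time. */
int extension_visible(void)
{
    return DEMO_HAS_EXTENSION;
}
```

Moving the #define below the header section would leave demo_extension() undeclared, which is
exactly what happens when _GNU_SOURCE is defined too late.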
The main program of this thread-safe who, which we call who4.c, is almost the same as that of
who3.c:
Listing who4.c
#define _GNU_SOURCE             // must precede every #include
#include <stdio.h>
#include <stdlib.h>
#include <string.h>             // for strcmp()
#include <unistd.h>
#include <utmp.h>
#include <fcntl.h>
#include <time.h>

void show_info(struct utmp *);  // same as in who2.c

int main(int argc, char *argv[])
{
    struct utmp utbuf, *utbufp;

    if ((argc > 1) && (strcmp(argv[1], "wtmp") == 0))
        utmpname(_PATH_WTMP);
    else
        utmpname(_PATH_UTMP);

    setutent();
    while (getutent_r(&utbuf, &utbufp) == 0)
        show_info(&utbuf);
    endutent();
    return 0;
}
version includes members that may not be present on other systems. In an effort to standardize the
utmp interface, the POSIX standards since 2001 have replaced the definition of the utmp structure
with a utmpx structure. This structure is only guaranteed to have the following members:
ut_user, ut_id, ut_line, ut_pid, ut_type, and ut_tv.
In addition, the functions setutent(), getutent(), and endutent() are replaced by the
corresponding functions setutxent(), getutxent(), and endutxent(). In general, the utmpx structure
may define a different set of members than those found in a utmp structure. Linux systems actually
define the utmpx structure to be the same as the utmp structure, unless the _GNU_SOURCE macro is
defined. In addition, Linux systems define a larger set of allowed values of the ut_type member than
does POSIX. Programs that are meant to be portable can use conditional compilation with feature
test macros to detect which structure is actually on the system at compile time. The who_p.c
program demonstrates how this is done, but is not included in these notes.
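As a small illustration of the utmpx interface (this is not the who_p.c program), the following
sketch counts the login records using only setutxent(), getutxent(), and endutxent(). The function
name count_logged_in_users() is hypothetical.

```c
#include <stddef.h>
#include <utmpx.h>

/* Count the records of type USER_PROCESS in the system's utmpx
   database using only the POSIX iterator functions. */
int count_logged_in_users(void)
{
    struct utmpx *rec;
    int count = 0;

    setutxent();                            /* rewind to the first record */
    while ((rec = getutxent()) != NULL)     /* NULL marks the end */
        if (rec->ut_type == USER_PROCESS)
            count++;
    endutxent();                            /* close the database */
    return count;
}
```

On a system with no utmpx database the loop simply never executes and the function returns 0.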
2.6.9 Summary
The preceding set of implementations of the who command demonstrates that the man pages and
header files can be used to learn enough about a command to implement it. The utmp interface
may not be the same on every UNIX system, and as a result there are several different methods of
approaching the problem. One can use the GNU, non-POSIX, thread-safe version of the interface,
for example, or the POSIX-compliant utmpx interface. One can also use the lower-level system calls,
e.g. read(), to access either the utmpx or the utmp structure directly. A truly portable solution
would use feature test macros to conditionally compile the code depending on what system it is to
be run on. The exercise introduced various concepts along the way, but we are still not finished
with it. Later we will return to the problem with a more efficient solution.
the O_CREAT flag. Instead it is opened with the current position pointer pointing to the start of the
file. The current position pointer is a member of the open file structure, the data structure that is
created by the kernel when a file is opened. It points to the position of the next byte to read or
write in the file.
For example, to open the file whose path is stored in the C-string file_to_open, one could write
if ( ( fd = open(file_to_open, O_RDWR)) == -1 ) {
perror(file_to_open);
// handle error here
}
2. Read the utmp file until it finds the record for the terminal from which the logout took place.
3. Modify a copy of the utmp record in the process's memory, and replace the utmp record in the
file with the modified one, i.e., modify the utmp file.
4. Close the utmp file.
The first and last steps need no discussion. The second step requires being able to identify which
utmp record in the file corresponds to the one logout is trying to modify. It cannot use the ut_user
member because a single user might have several lines open at a time. The piece of information that
is unique is stored in the ut_line. The ut_line member stores the name of the pseudo-terminal
as a string such as "pts/4". Only one person can be using a given terminal at the same time, so it
is sufficient to match the line.
The more interesting part of this task is how to replace the utmp record in the file. The record
may be in the middle of the file, so this operation involves replacing a fixed-size sequence of bytes
starting at some specific position in a file with a sequence of the exact same size.
position and then advances it N bytes. Usually when a file is open for writing the current position
pointer is at the end of the file.
The lseek() system call changes the current position pointer in an open file.
#include <sys/types.h>
#include <unistd.h>
off_t lseek(int fd, off_t dist, int base);
lseek() is given a file descriptor, fd, a distance in bytes, dist, and an integer flag, base. base
can be one of three values. The distance, dist, is used by lseek() to move the current position
pointer. If dist is positive, it moves forward; if it is negative, it moves backwards. The value of
base determines the starting position of the current position pointer from which it is to be moved.
The three values are
SEEK_SET the distance dist is forwards relative to the start of the file,
SEEK_CUR the distance, dist, is relative to the current position pointer and may be positive or
negative,
SEEK_END the distance, dist, is relative to the end of the file and may be positive or negative.
If lseek() is successful, its return value is the resulting offset location as measured in bytes from
the beginning of the file; otherwise it returns -1.
When the value of the offset is positive and the base is SEEK_END, the file pointer is moved beyond
the end of the file. Data can be written to this position, and this in effect creates a hole in the file.
For example, if a file is currently open and has the contents 123456789, and a seek is performed
that moves the file pointer 5000 bytes past the end, after which the characters abcde are written
to the file, then the file size will be 5014 bytes, even though there is a hole of 5000 bytes within it.
More will be said about this in Chapter 3.
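The example with the 5000-byte hole can be reproduced with a few calls. This is a sketch under the
assumption that the caller supplies the path of a scratch file that may be overwritten; the
function name make_file_with_hole() is invented for the illustration.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Write "123456789", seek 5000 bytes past the end, write "abcde",
   and return the resulting file size, or -1 on any error. */
off_t make_file_with_hole(const char *path)
{
    off_t size = -1;
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);

    if (fd == -1)
        return -1;
    if (write(fd, "123456789", 9) == 9 &&
        lseek(fd, 5000, SEEK_END) != (off_t)-1 &&
        write(fd, "abcde", 5) == 5)
        size = lseek(fd, 0, SEEK_END);   /* offset of the end is the size */
    close(fd);
    return size;
}
```

The final lseek(fd, 0, SEEK_END) moves nowhere relative to the end, so its return value is simply
the size of the file: 9 + 5000 + 5 = 5014 bytes.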
The lseek() call can be used to code the third step of the logout procedure.
In the figure, the matching record is numbered k. After it is found, the pointer has been advanced
to the start of record k+1. In order to write the modified record where the original was, we need
to move the current position pointer back with lseek(). The following program demonstrates the
key ideas.
[Figure: the utmp file drawn twice as a row of records: ... utmp record k-1, utmp record k,
utmp record k+1, utmp record k+2 ...]
Listing who5.c
#include ....
Notice that every system call is tested for failure before its result is used (except for the call to
write()). Here, the calls are embedded within the conditional expressions of the if and while
statements above. The first if checks whether the record read in the while condition has the same
terminal line as the one we are looking for (stored in the variable line) and the user member is
not null. If this is successful, the type member ut_type of the record is set to the DEAD_PROCESS
type, the user and host members are set to null strings, and the time member, ut_tv, is updated to
the current time. If this is successful, the lseek() call moves the current pointer back to the start
of the last matched record, so that the write operation that follows will replace the old record. If
the write operation is reached and executes without error (determined by checking that the number
of bytes written is equal to the number requested to be written), then the program returns 0 for
success.
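The steps just described can be sketched as a single function. This is an illustration written
from the description above rather than a copy of the who5.c listing; it assumes the glibc member
names of struct utmp, and logout_tty() and its parameters are hypothetical names.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>
#include <utmp.h>

/* Mark the record for terminal `line` in the utmp-format file `path`
   as logged out. Returns 0 on success, -1 on error or if no record
   for that line belongs to a logged-in user. */
int logout_tty(const char *path, const char *line)
{
    struct utmp rec;
    const ssize_t len = sizeof rec;
    int result = -1;
    int fd = open(path, O_RDWR);

    if (fd == -1)
        return -1;
    while (read(fd, &rec, len) == len) {
        if (strncmp(rec.ut_line, line, sizeof rec.ut_line) == 0 &&
            rec.ut_user[0] != '\0') {
            rec.ut_type = DEAD_PROCESS;
            memset(rec.ut_user, 0, sizeof rec.ut_user);
            memset(rec.ut_host, 0, sizeof rec.ut_host);
            rec.ut_tv.tv_sec = time(NULL);
            rec.ut_tv.tv_usec = 0;
            /* read() left the pointer just past the record: step back */
            if (lseek(fd, -(off_t)len, SEEK_CUR) != (off_t)-1 &&
                write(fd, &rec, len) == len)
                result = 0;
            break;
        }
    }
    if (close(fd) == -1)
        result = -1;
    return result;
}
```

Trying it on a private copy of a utmp file, rather than on the system's own file, is strongly
advisable.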
stores the size of the file into the variable filesize. We will make use of this soon.
concern with most software. To demonstrate the problem a bit more clearly, we will implement a
different command, one whose efficiency or lack thereof will be much more obvious. Then we will
take what we learned from that exercise and apply it to the who program in our final version. The
command of interest is the cp command, which copies one or more files or directories.
$ cp source_file target_file
Whether or not target_file already exists, cp makes a copy of source_file named target_file.
If it does exist, it will be overwritten, an act known as clobbering. This is dangerous, as you cannot
recover the file once you have clobbered it. To prevent accidental overwrites, the interactive option
-i should always be used, as in
$ cp -i source_file target_file
cp: overwrite `target_file'? n
If a new file is created, it will have the permissions of the source file (as modified by the umask)
and will be owned by the user running cp. If an existing file is overwritten, it retains the
permissions and ownership it had before the copy. No other attributes are preserved in a copy. To
preserve the time-stamps and other attributes, you must use the -p (p for preserve) option.
Another form of the cp command is
in which the very last word on the command line, target_dir, is a directory and all preceding
words are non-directory files. In this case, if the directory does not exist, it is an error. Otherwise
all of the source files are copied into the directory with their existing permissions and names. If any
names already exist in the target directory, the rules described above apply.
the sources can include directory names. All of the files and directories specified on the command
line, up to but not including target_dir, are copied into target_dir, which must already exist.
The -r or -R option must be specified; otherwise it is a syntax error. The -r specifies that the
directories will be copied recursively. The -R is essentially the same; the difference has to do with
how they handle pipes, which is unimportant now.
For the remainder of the chapter, we try to understand the implementation of the simple form of
the command, without any options.
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
int open(const char *pathname, int flags);
int open(const char *pathname, int flags, mode_t mode);
int creat(const char *path, mode_t mode);
The creat() system call has two arguments, a C string and a mode_t. The string should contain
the path name of the file to be created and the mode_t specifies the file's mode, i.e., its permission
string, as an octal number. For example,
    fd = creat("prototype", 0751);
creates a file named prototype in the current working directory, if it does not exist, with permission
0751 (owner can read, write, and execute, group can read and execute, others can execute only)
provided that the process's umask does not modify the permission. Umasks are covered in the next
chapter. If the file exists, the mode argument is ignored and the file is truncated [14]. In either
case, upon return from the call, fd is a file descriptor associated with the write-only connection to
the file.
#include <unistd.h>
ssize_t write(int fildes, const void *buf, size_t nbyte);
[14] It is possible to prevent the file from being overwritten in case it exists, but not if you use
the creat() call to try to create it. Instead the open() call must be used. Chapter 4 covers the
various methods of opening a file for writing.
The size_t type stores the sizes of things in bytes. It is usually a typedef of an unsigned long
integer, which may be 32 or 64 bits. The ssize_t type is almost the same as the size_t type.
It differs only in that it is signed, so that it can also store -1. If successful, the write() call
transfers nbyte bytes from the memory location pointed to by buf in the process's address space
to the position of the file pointer in the file associated with fildes, and returns the number of bytes
transferred. If the kernel cannot copy any of the data, write() returns -1.
The word "buffer" is used to describe the second parameter in the read() and write() system
calls. It is declared as a void pointer. It is called a buffer because it is a storage location in the
memory space of the calling process that is used to hold the data to be transferred to or from the
file.
attempts to transfer num_bytes bytes from the memory location pointed to by buffer to the position
of the file pointer in the file opened for writing via the file descriptor fd. (By default, the file
pointer is at the end of the file, unless it has been moved elsewhere.) The reason for the condition
    if (write(fd,buffer,num_bytes) != num_bytes)
is that the return value of write() is the number of bytes actually written and it may not be equal
to the number of bytes that were supposed to be written. The number of bytes successfully written
may be less than num_bytes for any number of reasons. The file might have reached a predefined
maximum size, the disk might be full, or the user's disk quota might be reached. This is why it is
necessary to compare the return value of the write() call with the value of its third parameter.
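The programs in this chapter treat a short write as an error. An alternative, sketched below, is to
keep calling write() until every byte has been transferred or a genuine error occurs. The name
write_all() is hypothetical, and retrying on EINTR is a design choice of this sketch.

```c
#include <errno.h>
#include <unistd.h>

/* Write exactly count bytes from buf to fd, retrying after short
   writes and interrupted calls. Returns 0 on success, -1 on error. */
int write_all(int fd, const void *buf, size_t count)
{
    const char *p = buf;

    while (count > 0) {
        ssize_t n = write(fd, p, count);
        if (n == -1) {
            if (errno == EINTR)
                continue;           /* interrupted before writing: retry */
            return -1;
        }
        p += n;                     /* skip the bytes already written */
        count -= (size_t)n;
    }
    return 0;
}
```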
We know how to open and close files and we know how to read and write them, so this is a relatively
easy program for us at this point. The only points that need explanation are how we create and use
buffers. For example, how big should the buffer be? How do we declare it and pass it to the calls?
 1 Listing cp1.c
 2 // First attempt at cp command, based on a program
 3 // by Bruce Molay in Understanding Unix/Linux Programming, p.53
 4
 5 #include <stdio.h>
 6 #include <unistd.h>
 7 #include <fcntl.h>
 8 #include <stdlib.h>      // for exit()
 9 #define BUFFERSIZE 4096
10 #define COPYMODE 0644
11
12 void die(char *string1, char *string2);   // print error and quit
13
14 int main(int argc, char *argv[])
15 {
16     int source_fd, target_fd, n_chars;
17     char buf[BUFFERSIZE];
18
19     if (argc != 3) {
20         fprintf(stderr, "usage: %s source destination\n",
21                 *argv);
22         exit(1);
23     }
24
25     // try to open files
26     if ((source_fd = open(argv[1], O_RDONLY)) == -1)
27         die("Cannot open ", argv[1]);
28     if ((target_fd = creat(argv[2], COPYMODE)) == -1)
29         die("Cannot creat ", argv[2]);
30
31     // copy from source to target
32     while ((n_chars = read(source_fd, buf, BUFFERSIZE))
33            > 0) {
34         if (n_chars != write(target_fd, buf, n_chars))
35             die("Write error to ", argv[2]);
36     }
37     if (-1 == n_chars)
38         die("Read error from ", argv[1]);
39
40     // close both files
41     if (close(source_fd) == -1 || close(target_fd) == -1)
42         die("Error closing files", "");
43
44     return 0;
45 }
46
47 void die(char *string1, char *string2)
48 {
49     fprintf(stderr, "Error: %s ", string1);
50     perror(string2);     // appends the message for errno
51     exit(1);
52 }
Comments
• The buffer is declared as an array of BUFFERSIZE chars, which is equal to the maximum
number of bytes read in a single read() call.
• The die() function encapsulates the error handling logic and calls the perror() function.
• The main work is in the while loop (lines 32-36). The entry condition is that the read() call
transferred one or more bytes. The body is the call to write the bytes just read to the output
file. The return value of write() is checked to see if the number of bytes transferred equals
the number requested by the call.
If you compile and run this program you will see that it works correctly. But does it run fast? How
long will it take to copy a very large file? How does one time programs in UNIX?
$ time -p command
where command is the command that you wish to know about. The '-p' option tells time to display
the traditional POSIX output, which consists of three values, labeled real, user, and sys, each
measured in seconds to two decimal places.
Elapsed time is the number of seconds from when the command was invoked until it completed.
User time is the total amount of time that the process, and any children executing on its behalf,
spent running in user mode. System time is the total amount of time spent on the process's behalf
running within the kernel, i.e., in privileged mode, including such time spent by its children as well.
Non-POSIX output may be more voluminous; you can read the man page for further details. Also,
shells such as bash typically define their own version of the time command, so it is best to type the
full path name when using it, if you want the non-bash version.
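A program can obtain the same three quantities for itself with the times() system call. The sketch
below is only an illustration: the struct run_times type and the function sample_times() are
invented names, and the times of child processes (the tms_cutime and tms_cstime members) are
left out for brevity.

```c
#include <sys/times.h>
#include <unistd.h>

struct run_times {
    double real;   /* elapsed seconds since an arbitrary point in the past */
    double user;   /* seconds this process has spent in user mode */
    double sys;    /* seconds the kernel has spent working for this process */
};

/* Fill *out with the current readings. Returns 0 on success, -1 on error. */
int sample_times(struct run_times *out)
{
    struct tms t;
    long ticks = sysconf(_SC_CLK_TCK);   /* clock ticks per second */
    clock_t now = times(&t);

    if (ticks <= 0 || now == (clock_t)-1)
        return -1;
    out->real = (double)now / ticks;
    out->user = (double)t.tms_utime / ticks;
    out->sys  = (double)t.tms_stime / ticks;
    return 0;
}
```

Taking one sample before a piece of work and one after it, and subtracting member by member,
gives the real, user, and system time consumed by that work.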
I got the following output on one of the UNIX systems at Hunter College:
real 4.05
user 0.01
sys 0.02
What accounts for the dierence between the sum of user and system times and the elapsed time?
It is the time that the process spent waiting for I/O to complete. When a process issues a request
for I/O, it is blocked until the I/O is complete. The time that it spends in this blocked, or waiting,
state is part of the elapsed (real) time. cp1 spent about 4 seconds waiting for I/O. Although the
amount of time that a process spends waiting for I/O depends heavily on what else the system
is doing, the more calls it makes, the longer it will take, on average. The reason for this will be
explained below.
As we use cp1 on larger and larger files, we will see worse performance. To create a spreadsheet
with the results of the time command I used a different option to it:
/usr/bin/time -f "\t%e\t%U\t%S"
The -f option expects a format string, which I supplied as a tab-separated string of real-time, user-
time, and system-time format symbols. This allowed me to open the output with a spreadsheet
program for analysis:
Notice that the real and system times increase roughly in proportion to the size of the file over this
small sample.
To answer this question, we will first perform a little experiment. We will revise the cp program
so that the buffer size is an input parameter, and run the program on a very large input file with
successively larger buffer sizes, recording the three components of running time reported by the
time command for each run, and tabulating results. The revised program, called cp2.c, is in the
listing below.
For those who have not seen it before, calloc() (in line 30) and its companion, malloc(), are
dynamic memory allocation functions in C. The prototype for calloc() is
#include <stdlib.h>
void *calloc(size_t nelem, size_t elsize);
Unlike malloc(), calloc() takes two arguments: the number of elements and the size in bytes of
each element. It attempts to allocate space for an array of nelem elements, each of size elsize.
If it is successful, it returns a void* pointer to the start of the array and fills the allocated memory
with zeros. This pointer should be cast to the appropriate type before using it.
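As a small illustration of the call, the sketch below allocates a zero-filled copy buffer whose size is chosen at run time. The helper name make_buffer() is an assumption made for this example; it is not part of cp2.c. In C a void* converts implicitly on assignment, so the sketch omits the cast; C++ would require it.

```c
#include <stdio.h>
#include <stdlib.h>

/* Allocate a zero-filled buffer of bufsize bytes.
   Returns NULL (after printing a message) if the allocation fails.
   make_buffer() is a hypothetical helper used only for this sketch. */
char *make_buffer(size_t bufsize)
{
    /* calloc(nelem, elsize): bufsize elements, each sizeof(char) bytes */
    char *buffer = calloc(bufsize, sizeof(char));

    if (buffer == NULL)
        perror("calloc");
    return buffer;      /* the caller must eventually free() it */
}
```

A program like cp2 could call make_buffer() with the buffer size taken from the command line and release the memory with free() once the copy is finished.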
The table below shows the effect of buffer size on the elapsed, user, and system times when copying
a file of size 19MB on a particular host in the Computer Science Department network at Hunter
College running RHEL 4. As you can see, the user and system times roughly decrease in inverse
proportion to the buffer size for most of the sampled range of values. The user time decreases
because the process spends less time in its own code, since there are fewer iterations of the loop and
hence fewer instructions to execute. The system time decreases for the same reason: the read()
and write() system calls are executed fewer times and therefore less time is spent in the kernel.
The elapsed time tends to reach a steady value after the buffer size reaches 16. Since the total of
the user and system time continues to decrease for buffer sizes greater than 16, this suggests that
the limiting factor is the time that the process spends waiting for the I/O operations to complete.
As the buffer gets larger, the kernel is called fewer times to transfer the data: as we stated above, if
$N$ is file size and $B$ is buffer size, the number of calls is $c = \lceil N/B \rceil$. Another way to say this is that
$cB$ is constant. The table shows that, if $s$ is total system time, $sB$ is also approximately constant,
except for $B > 256$. In other words, the total system time is roughly proportional to the number of
calls made for small values of $B$. For larger values of $B$, the total system time is not in proportion
to the number of calls, but is larger than it. Why is this?
There are two components to the running time of an I/O operation: the transfer time and the
overhead. The overhead is largely independent of the number of bytes to be read or written; each
read or write request to the disk has overhead that does not depend much on how much data is
to be transferred. This includes various components of time required by the device to set up and
initiate the transfer. It also includes the cost of the system call itself, which is not always negligible.
The transfer time is the time that it actually takes to copy data between the device and memory and
is a function of the amount of data. The kernel's involvement in this transfer in modern machines
with DMA is minor; it mostly just starts it and does more work when it is finished. Nonetheless,
the kernel's involvement is a function of the amount of data to be transferred. Therefore, if $B$ is
buffer size, $O$ is the overhead of an I/O operation, and $t$ is a constant such that $tB$ is the amount of
time the kernel spends in a single transfer operation, a single read() or write() system call uses
$O + tB$ time units, and the program takes
$$\frac{N}{B} \cdot (O + tB) = \frac{NO}{B} + tN = N \cdot \left(\frac{O}{B} + t\right)$$
time. Since $N$ is the size of our data and does not change, you can see that the system time is
proportional to $\left(\frac{O}{B} + t\right)$. This explains why the system time does not keep diminishing by half.
Eventually the $t$ term is large in proportion to the $\frac{O}{B}$ term. When $\frac{O}{B}$ is very large in comparison
to $t$, doubling $B$ roughly halves the expression, but otherwise it does not.
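To make the model concrete, the sketch below evaluates the number of calls and the modeled system time for given values. The function names and the sample values of O and t are assumptions chosen purely for illustration; they are not measurements from the table.

```c
/* Number of read() calls needed to consume N bytes with a B-byte buffer:
   the ceiling of N / B, computed in integer arithmetic. */
unsigned long num_calls(unsigned long N, unsigned long B)
{
    return (N + B - 1) / B;
}

/* Modeled system time N * (O/B + t), where O is the fixed overhead per
   call and t is the per-byte transfer cost. Units are arbitrary. */
double model_time(double N, double B, double O, double t)
{
    return N * (O / B + t);
}
```

With the assumed values N = 1024, O = 100, and t = 1, model_time() gives 103424 for B = 1 and 52224 for B = 2, which is close to half. For B = 512 it gives 1224 and for B = 1024 it gives 1124, a reduction of only about 8 percent, because by then the t term dominates.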
As we shall see shortly, in UNIX in particular, the design of a buffering system within the I/O
system makes the transfer time on average even smaller.
This is a form of context-switch. A context-switch occurs when the kernel changes the currently
executed memory image (the context). This can happen because a new process is run or because the
kernel runs on behalf of a process, requiring that the memory image be switched. In some versions
of UNIX such as Linux 2.6, a full context switch is not performed when a process changes from user
phase to kernel phase or vice-versa.
The kernel needs to execute in kernel mode because it has to have access to all hardware instructions.
In contrast, user processes must be prevented from executing special instructions. Therefore, when
the system call is made, the machine must change mode twice, at the start and at the end of the
call. It must also change the CPU state, because when the kernel runs, it has a different address
space, different sets of resources, and so on. All of this changing means that a system call adds
overhead to the running time of the program.
UNIX uses this buffering scheme only for certain types of input and output [17], particularly for read
and write operations to and from disks. While it may seem at first that it just adds overhead, in
fact it is a powerful and efficient method of reducing overall time spent performing I/O.
[15] On some UNIX systems, such as Linux 2.6, the user phase and kernel phase are called user mode and kernel
mode respectively.
[16] There is a way to avoid this copying of data back and forth. Memory mapping is a method of I/O in which disk
files are mapped directly into user memory. This topic will be discussed in a later chapter. If you are curious, read
about the mmap() and munmap() system calls.
[17] There are two types of I/O in UNIX: block I/O and character I/O. The block I/O system in UNIX is used for
block devices such as magnetic and optical disks and tapes. Character I/O is used for devices that are inherently
one-character-at-a-time devices, such as the keyboard and terminals in general. Character I/O does not use kernel
buffers for I/O. All block I/O uses the kernel's buffering system.
The buffering scheme for both reading and writing makes it seem as if read operations read directly
from the device and write operations take place immediately. In fact, the kernel hides from the user
an important layer of complexity. To understand this complexity, one needs to know a bit about
how the disk is organized.
The disk is organized as a collection of fixed-size disk blocks. Disk blocks are numbered so that
they can be identified. Each logical disk or disk partition has a unique name in UNIX, such as sd0a
or rsd2b.
The kernel maintains a pool of buffers in kernel memory that can be assigned to each device. Each
buffer is given a name, corresponding to the device to which it is assigned and the particular block
whose contents it holds. For example, a buffer might be assigned block 511 from disk rsd2b.
On a read request by a process, the buffer pool is searched for a buffer whose name matches the
block being sought on the disk. If a buffer is found, the data is read directly from memory without
any physical I/O. If the buffer is not found, the data must be read from disk. A buffer will most
likely have to be reused for this data. A least recently used (LRU) algorithm is used to decide which
buffer to replace. After the buffer is selected, if it is "dirty" its contents are written to disk. Buffers
are dirty if they were modified since the last time they were written to disk. The buffer is renamed
to match the block being read and the read is performed.
Write requests are handled similarly. When a process requests a write to a specific block on a disk,
the buffer pool is searched and if a buffer is not found whose name matches the disk address to be
written, a new buffer is allocated for this write operation. If no buffer is available, a buffer is chosen
using the LRU algorithm and relabeled. The data is stored in the buffer without any physical I/O
(i.e., disk accesses) and the buffer is marked dirty. The write will be performed only when the buffer
is renamed.
Note that this scheme can greatly reduce the need to perform disk I/O, because reads and writes
can take place in memory, which is much faster, and it is completely transparent to the user. But
what happens if the system suddenly comes to an unexpected halt? Unless the system has time to
"flush" its buffers, the updates are lost. This is why one should never halt a system in the wrong
way.
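The read and write handling just described can be modeled by a toy simulation. The sketch below is only an illustration under simplifying assumptions: a pool of three buffers, a single device, buffers named by block number alone, and counters in place of real disk I/O. None of the names come from an actual kernel.

```c
#define POOL_SIZE 3            /* tiny pool so that replacement is easy to trigger */

struct buf {
    int  block;                /* disk block held by this buffer, -1 if unused */
    int  dirty;                /* nonzero if modified since last written to disk */
    long last_used;            /* logical time of the most recent access */
};

static struct buf pool[POOL_SIZE];
static long clock_ticks;       /* logical clock used for LRU ordering */
int disk_reads, disk_writes;   /* counts of simulated physical I/O */

void cache_init(void)
{
    for (int i = 0; i < POOL_SIZE; i++)
        pool[i] = (struct buf){ .block = -1, .dirty = 0, .last_used = 0 };
    clock_ticks = 0;
    disk_reads = disk_writes = 0;
}

/* Return the buffer named by block. On a miss, reuse the least recently
   used buffer; a dirty victim is written to disk before it is renamed. */
static struct buf *get_buffer(int block, int *hit)
{
    struct buf *victim = &pool[0];

    for (int i = 0; i < POOL_SIZE; i++) {
        if (pool[i].block == block) {          /* found: no physical I/O */
            *hit = 1;
            pool[i].last_used = ++clock_ticks;
            return &pool[i];
        }
        if (pool[i].last_used < victim->last_used)
            victim = &pool[i];
    }
    *hit = 0;
    if (victim->dirty) {                       /* flush before reuse */
        disk_writes++;
        victim->dirty = 0;
    }
    victim->block = block;                     /* rename the buffer */
    victim->last_used = ++clock_ticks;
    return victim;
}

void cache_read(int block)
{
    int hit;

    get_buffer(block, &hit);
    if (!hit)
        disk_reads++;          /* data had to be fetched from the disk */
}

void cache_write(int block)
{
    int hit;
    struct buf *b = get_buffer(block, &hit);

    b->dirty = 1;              /* the physical write is deferred */
}
```

Starting from cache_init(), the sequence read 1, read 1, write 2, write 3, read 4, read 5 performs three simulated disk reads and one disk write: the second read of block 1 is satisfied from memory, and the deferred write of block 2 reaches the disk only when its buffer is reused for block 5.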
The advantages of buffering are a reduction in physical I/O and therefore a decrease in the overall
effective disk access time. The disadvantages include that
• I/O error reporting can lag behind the logical I/O and therefore can become meaningless,
• delayed disk writes can cause loss of data and file system inconsistencies in the event of
unexpected system halts, and
• the order in which buffers are written to the external device may not be the same as the
order in which the logical I/O occurs, and unless programs are designed with this in mind,
disk-based data structures can become inconsistent.
Writes to sequential devices such as tape drives generally do not exhibit this problem because the
drivers are only allowed one outstanding write request per drive. In other words, if a logical write
operation is requested for a particular drive, but there is a request that has not yet been satisfied
by a physical write, the second request cannot be satisfied until the first physical write takes place.
A device like a tape drive will reject requests for service until it finishes what it is doing. It is a
one-job-at-a-time device.
In Linux 2.6 and later, the kernel offers a service named direct I/O for processes that wish to bypass
the kernel buffering system for block I/O. Certain types of programs such as database servers need
to implement their own caching schemes for efficiency. Forcing them to also use the kernel buffering
system would slow them down significantly and make the system inefficient, because then there
would be duplicate copies of blocks: those in the database server's cache and those in the kernel's
cache. With direct I/O transfers, the kernel transfers data directly between the disk and user space.
Unfortunately, there are many problems associated with direct I/O, which you can read about in
the man page for the open() system call. An apt conclusion is reached at the bottom of that page,
with a quote from Linus Torvalds:
In summary, O_DIRECT is a potentially powerful tool that should be used with caution. It
is recommended that applications treat use of O_DIRECT as a performance option which
is disabled by default.
"The thing that has always disturbed me about O_DIRECT is that the whole interface is
just stupid, and was probably designed by a deranged monkey on some serious mind-
controlling substances."
Linus
The actual use of the memory mapping system calls, mmap() and munmap(), is a bit more complex
than this. The purpose of munmap(), as its name suggests, is to undo a mapping. The mmap() call
has several parameters. We introduce memory mapping by writing the cp program a third way,
using memory mapped I/O instead of reading and writing.
1. Determine the size of the input file in bytes. Call it filesize.
2. Map the entire input file to a region of memory. Assume it starts at address source_addr.
3. Create an output file with the given name and make it the same size as the input file.
4. Map the output file to a region of memory the exact same size as the file. Assume it starts at
address dest_addr.
5. Do a single memory-to-memory copy of filesize bytes from source_addr to dest_addr
using memcpy().
6. Undo the mappings and close the files.
This causes the input to be copied to the output without any reads or writes. In order to implement
these steps we need to know the prototypes of the mapping functions and memcpy(). The prototypes
of the mapping functions are
#include <sys/mman.h>
void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset);
int munmap(void *addr, size_t length);
The mmap() call creates a new mapping in the virtual address space of the calling process. The
starting address for the new mapping is specified in the first argument, addr. The second argument,
length, specifies the length in bytes of the mapping.
If addr is NULL, then the kernel chooses the address at which to create the mapping; this is the
most portable method of creating a new mapping. If addr is not NULL, then the kernel takes it as
a hint about where to place the mapping; on Linux, the mapping will be created at a nearby page
boundary. The address of the new mapping is returned as the result of the call. It is best to always
use NULL as the first argument.
The third argument describes the memory protection of the mapping; it must not conflict with the
open mode of the file. The possible values are
PROT_EXEC Pages may be executed.
PROT_READ Pages may be read.
PROT_WRITE Pages may be written.
PROT_NONE Pages may not be accessed.
They can be or-ed together. For example, if the file was opened read-only (O_RDONLY), then the
value should be PROT_READ. If it was opened read-write, then it should be set to PROT_READ|PROT_WRITE.
A warning about this follows below.
The fourth argument determines whether updates to the mapping are visible to other processes
mapping the same region, and whether updates are carried through to the underlying file. This
behavior is determined by including exactly one of the following values in flags:
MAP_SHARED Share this mapping. Updates to the mapping are visible to other processes that map
this file, and are carried through to the underlying file. The file may not actually be
updated until msync() or munmap() is called.
MAP_PRIVATE Create a private copy-on-write mapping. Updates to the mapping are not visible to
other processes mapping the same file, and are not carried through to the underlying
file. It is unspecified whether changes made to the file after the mmap() call are visible
in the mapped region.
Because we want to do I/O we need to set the flag to MAP_SHARED, otherwise no changes will appear
in the output file. There are other values that can be or-ed to this flag, but we will not discuss them
at this point.
The next two arguments are the file descriptor of the file to be mapped and the offset in bytes
relative to the start of the file at which to map the file. In other words, if you want to map only
the portion of the file after the first N bytes, you would pass N as the last argument; N must itself
be a multiple of the page size.
What you need to know is that the memory region is always a multiple of the page size of the
machine and must be allocated as such. If the length is not a multiple of page size, the last page
will be partly filled. The starting address must always be a multiple of page size. For now this is
not our concern. After we learn how to get the page size of the machine, we will return to this issue.
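As a preview of that issue, the following sketch rounds a length up to a whole number of pages. It assumes the POSIX call sysconf(_SC_PAGESIZE) as the way to obtain the page size; the function name round_up_to_page() is hypothetical and is not used elsewhere in these notes.

```c
#include <stddef.h>
#include <unistd.h>

/* Round length up to the next multiple of the system page size.
   round_up_to_page() is a hypothetical helper used only for this sketch. */
size_t round_up_to_page(size_t length)
{
    size_t page = (size_t) sysconf(_SC_PAGESIZE);

    return (length + page - 1) / page * page;
}
```

On a machine with 4096-byte pages, for instance, a 5000-byte file would occupy a mapped region of 8192 bytes, with the second page only partly filled.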
A caveat: the documentation on my Linux system states that mmap() has been deprecated in
favor of mmap2(), but mmap2() does not exist on it. In fact, glibc (GNU's C Standard Library)
implements mmap() as a wrapper for the kernel's mmap2() call, so mmap() is actually mmap2().
Our third copy program is in the listing below. It does not include all of the error-checking and
handling that it should, but most is included. It makes use of memcpy() to do the actual transfer of
bytes from the source to the destination, but it does so within memory. The prototype for memcpy()
is
#include <string.h>
void *memcpy(void *dest, const void *src, size_t n);
where src is a pointer to the start of the memory to be copied, dest is the starting address where
the bytes should be written, and n is the number of bytes to copy. The memory areas cannot
overlap. In other words the absolute value of (dest - src) must be at least n.
Listing cp3.c -- a copy program using memory-mapped I/O
#define COPYMODE 0666
    ...
    /* check args */
    if ( argc != 3 ) {
        fprintf(stderr, "usage: %s source destination\n", *argv);
        exit(1);
    }
    /* open files */
    if ( (in_fd = open(argv[1], O_RDONLY)) == -1 )
        die("Cannot open ", argv[1]);
    ...
    if ( (dest_addr = mmap(NULL, filesize, PROT_WRITE,
                           MAP_SHARED, out_fd, 0)) == (void *) -1 )
        die("Error mapping file ", argv[2]);
    ...
    return 0;
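The six steps given earlier can be sketched as a single function. This is a minimal sketch under stated assumptions, not the cp3.c listing itself: the name mmap_copy() is hypothetical, ftruncate() is used here as one way to make the output file the right size, the output is opened O_RDWR and mapped PROT_READ|PROT_WRITE, and errors are reported through the return value instead of die().

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#define COPYMODE 0666

/* Copy the file named src to the file named dst using memory-mapped I/O.
   Returns 0 on success and -1 on failure (errno describes the error).
   mmap_copy() is a hypothetical name used only for this sketch. */
int mmap_copy(const char *src, const char *dst)
{
    int         in_fd, out_fd, result = -1;
    struct stat sb;
    void       *source_addr, *dest_addr;
    size_t      filesize;

    if ((in_fd = open(src, O_RDONLY)) == -1)
        return -1;
    /* O_RDWR, not O_WRONLY: a shared writable mapping needs read access too */
    if ((out_fd = open(dst, O_RDWR | O_CREAT | O_TRUNC, COPYMODE)) == -1) {
        close(in_fd);
        return -1;
    }
    if (fstat(in_fd, &sb) == -1)               /* size of the input file */
        goto done;
    filesize = (size_t) sb.st_size;
    if (filesize == 0) {                       /* mmap() rejects length 0 */
        result = 0;
        goto done;
    }
    if (ftruncate(out_fd, sb.st_size) == -1)   /* make output the same size */
        goto done;

    source_addr = mmap(NULL, filesize, PROT_READ, MAP_SHARED, in_fd, 0);
    if (source_addr == MAP_FAILED)
        goto done;
    dest_addr = mmap(NULL, filesize, PROT_READ | PROT_WRITE,
                     MAP_SHARED, out_fd, 0);
    if (dest_addr == MAP_FAILED) {
        munmap(source_addr, filesize);
        goto done;
    }

    memcpy(dest_addr, source_addr, filesize);  /* the entire copy */

    result = 0;                                /* undo both mappings */
    if (munmap(source_addr, filesize) == -1)
        result = -1;
    if (munmap(dest_addr, filesize) == -1)
        result = -1;
done:
    close(in_fd);
    close(out_fd);
    return result;
}
```

A main() in the style of cp3.c would check argc, call mmap_copy(argv[1], argv[2]), and report a failure with perror().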
In order for who to perform input buffering, it needs a place to store the extra records until it is
ready to use them. The logical place is in an array of records. If it reads 20 records at a time, for
example, then these 20 records will be placed into its internal array. It can maintain a pointer to a
current record. Each time it needs to examine a new record, it checks whether the current record
pointer has exceeded the array bounds. If it has, it attempts to fetch the next 20 records from the
utmp file and fill the array with them. If no records are left in the file, it cannot obtain a new record,
and it is finished. Otherwise, it fetches as many as it can, up to 20, and then gets the current record
from the array and advances the current record pointer.
[18] This idea is borrowed from Bruce Molay, Understanding Unix/Linux Programming, Prentice Hall, 2003.
The logic for input buffering is encapsulated into a separate library of routines for interacting with
the utmp records, called utmp_utils.c. The interface to this library consists of three functions:
open_utmp(), next_utmp(), and close_utmp(). The open_utmp() function opens the given utmp
file, the next_utmp() function delivers the next record, reading a new chunk from the file if the
buffer is empty, and the close_utmp() function closes the file. The interface follows.
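Listing utmp_utils.h
#include <utmp.h>

typedef struct utmp utmp_record;
#define NULL_UTMP_RECORD_PTR ((utmp_record *) NULL)

int open_utmp(char *utmp_file);
// opens the given utmp file for buffered reading
// returns: a valid file descriptor on success, -1 on error
utmp_record *next_utmp();
// returns: a pointer to the next utmp record from the
//          opened file and advances to the next record
//          NULL if no more records are in the file
void close_utmp();
// closes the utmp file and frees the file descriptor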
The implementation of the library is next. It uses global variables (static variables) so that the
functions can communicate. We do not want to pass these as parameters, because then client code
would have to do that as well, breaking the abstraction. If this were written in C++, this library
would be a class instead, and the globals would be member variables.
Listing utmp_utils.c
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>
#include <utmp.h>
#include "utmp_utils.h"

#define NUM_RECORDS 20
#define NULL_UTMP_RECORD_PTR ((utmp_record *) NULL)
#define SIZE_OF_UTMP_RECORD (sizeof(utmp_record))
#define BUFSIZE (NUM_RECORDS * SIZE_OF_UTMP_RECORD)

static char utmpbuf[BUFSIZE];            // buffer of records
static int  number_of_recs_in_buffer;    // num records in buffer
static int  current_record;              // next rec to read
static int  fd_utmp = -1;                // file descriptor for utmp file

// die() is the error-handling function described earlier in the chapter.
// This minimal stand-in is an assumption, included only so that the
// file compiles on its own.
static void die(char *string1, char *string2)
{
    fprintf(stderr, "Error: %s ", string1);
    perror(string2);
    exit(1);
}

int open_utmp(char *utmp_file)
{
    fd_utmp = open(utmp_file, O_RDONLY);
    current_record = 0;
    number_of_recs_in_buffer = 0;
    return fd_utmp;    // either a valid file descriptor or -1
}

int fill_utmp()
{
    int bytes_read;

    // read NUM_RECORDS records from the utmp file into buffer
    // bytes_read is the actual number of bytes read
    bytes_read = read(fd_utmp, utmpbuf, BUFSIZE);
    if ( bytes_read < 0 ) {
        die("Failed to read from utmp file", "");
    }

    // If we reach here, the read was successful
    // Convert the byte count into a number of records
    number_of_recs_in_buffer = bytes_read / SIZE_OF_UTMP_RECORD;

    // reset current_record to start at the buffer start
    current_record = 0;
    return number_of_recs_in_buffer;
}

utmp_record *next_utmp()
{
    utmp_record *recordptr;
    int byte_position;

    if ( fd_utmp == -1 )
        // file was not opened correctly
        return NULL_UTMP_RECORD_PTR;

    if ( current_record == number_of_recs_in_buffer ) {
        // there are no unread records in the buffer
        // need to refill the buffer
        if ( fill_utmp() == 0 )
            // no utmp records left in the file
            return NULL_UTMP_RECORD_PTR;
    }

    // There is at least one record in the buffer,
    // so we can read it
    byte_position = current_record * SIZE_OF_UTMP_RECORD;
    recordptr = (utmp_record *) &utmpbuf[byte_position];

    // advance current_record pointer and return record pointer
    current_record++;
    return recordptr;
}

void close_utmp()
{
    // if file descriptor is a valid one, close the connection
    if ( fd_utmp != -1 )
        close(fd_utmp);
}
Comments
1. In next_utmp(), if
( current_record == number_of_recs_in_buffer )
is true, it means that the number of records read so far is equal to the number of records in
the buffer, which implies that it is time to read from the file again.
2. The statement
recordptr = (utmp_record *) &utmpbuf[byte_position];
sets recordptr to point to the address of the array entry at the given byte position. We have
to cast the address of the linear array of bytes to a utmp_record pointer type.
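Listing who4.c
#include <stdio.h>
#include <stdlib.h>
#include "utmp_utils.h"

void show_info(utmp_record *);    /* defined as in the earlier versions of who */

int main()
{
    utmp_record *utbufp;          /* pointer to the current utmp record */

    if ( open_utmp(UTMP_FILE) == -1 ) {
        perror(UTMP_FILE);
        exit(1);
    }
    while ( (utbufp = next_utmp()) != NULL_UTMP_RECORD_PTR )
        show_info(utbufp);
    close_utmp();
    return 0;
}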
Appendix A
cat reads files first, second, and third in that order and concatenates their contents, sending
them to the standard output, which has been redirected to a file named combinedfile.
The most useful filters are
If your time is limited and you could learn but one of these, the most important would be grep;
the return on your investment will be greatest. Coming in second would be sed, and then awk. The
remaining filters are easy to learn and use and are described briefly first.
A.1.1 sort
$ sort file
will sort the text file named file and print it on standard output. By default it uses collating order,
the order of the characters in the character code of the terminal, which is usually ASCII or UTF-8.
In this case uppercase letters precede lowercase letters. There are versions of sort that ignore case
by default, but if yours does not, you can turn off case-sensitivity with the -f option.
To sort a file of numbers, use
$ sort -n numeric_data
which will sort numbers correctly. Without the -n, 10 will precede 9 because 1 precedes 9 in the
collating sequence. Read the man page for details.
$ head -N
or
$ tail -N
respectively.
A.1.3 cut
cut is a lesser filter. You will rarely use it. It does simple tasks well. It cuts out selected pieces of
lines of the input.
$ cut -c1-10
copies the first 10 characters from every line, removing the rest.
$ cut -f2,4
copies only fields 2 and 4 of every line to the output stream. Fields are delimited by the TAB
character unless the delimiter character is changed using the -d option. Fields are 1-based, so the
first field is field 1. The delimiter must be a single character:
$ cut -d: -f1,5 /etc/passwd
will display fields 1 and 5 of the /etc/passwd file, which are the username and gcos fields.
where <regular expression> is an expression that represents a set of zero or more strings to be
matched. The syntax and interpretation of regular expressions is found in the regex man page in
Volume 7, as well as the man page for grep, so typing
$ man 7 regex
or
$ man grep
will give you everything you need to know on how to use them. The simplest patterns are strings
that do not contain regular expression operators of any kind; those match themselves. For example,
$ grep print file1 file2 file3
prints each line in files file1, file2, and file3 that contains the word "print". It will print these
in the order in which the files are listed, first lines in file1, then file2, then file3. If you want
just a count of those lines, use the -c option; if you want the non-matching lines, use the -v option.
If you want the line numbers, use the -n option. There are many more useful options described in
its man page.
If you want to match a string that contains characters that have special meaning to the shell, such
as white-space, asterisks, slashes, dollar-signs, and so on, it should be enclosed in single-quotes:
will match all lines in the given files that have the exact string 'atomic energy' somewhere in the
line. Note that the lines merely have to contain the string as a substring; they do not have to match
the string exactly. If you want the pattern to match an entire line, you have to bracket it with
operators called anchors. The start of line anchor is the caret ^ and the end of line anchor is the
dollar sign $:
matches lines in the given files that are exactly the string atomic energy.
Regular expressions can be formed with various operators such as the asterisk *, which repeats
the expression to its left 0 or more times, as in
a*
which matches strings with zero or more a's: a, aa, aaa, and the null string. To match a string
like ababab, you have to enclose the repeated part in \(...\), as in
\(ab\)*
In contrast,
(ab)*
does not match ababab as a whole, because in basic regular expressions the parentheses
by themselves are not special characters. It matches the literal string (ab followed by zero or
more right parentheses, as in (ab) or (ab)).
The period matches any character. There are character classes, which are formed by enclosing a list
(or a range) in square brackets []. A character class represents a single character from that class.
Because the special characters in regular expressions typically have special meaning in the shell as
well, it is a good idea to always enclose the pattern in single quotes. In particular, if you give it a
regular expression using an asterisk you must enclose the string in quotes.
A.1.4.1 Examples
In the following examples, the file argument is omitted for simplicity. In this case grep would apply
the pattern against standard input, which means if you actually type this, it will wait for you to
enter text followed by an end-of-file signal, Ctrl-D.
matches lines containing the word 'while' followed by zero or more space characters, followed by a
parenthesized expression.
$ grep '^[a-zA-Z][a-zA-Z0-9_]*'
matches lines that begin with a word that starts with a letter, upper or lowercase, followed by zero
or more letters or digits or underscores.
$ grep '[0-9][0-9]*\.[0-9][0-9]\>'
The pattern selects strings that have 1 or more digits followed by a single period, followed by exactly
two digits. The period must be preceded by a backslash so that grep does not treat the period as
the special character meaning "match any character". The "\>" tells grep to anchor the pattern to
the end of the word. A word is a sequence of letters and/or digits. This forces grep to select only
those words that end in two digits. If I omitted the "\>" grep would have matched strings such as
1.234 or 1.23ab. There is a matching operator, \<, that anchors to the beginning of the word.
$ grep '\/\*.*\*\/'
The slash / is the pattern delimiter in vi and sed, so there I have to escape it with a \ like this: \/.
grep does not treat / as special, so the backslash before it is optional here. Since * is a special
character in regular expressions, \* is how you have to match a single asterisk *. So to match the
two-character sequence /* I can write \/\* and to match /* followed by any number of characters
and then followed by */, I can write
\/\*.*\*\/
in which .* matches zero or more characters of any kind (including the period itself). This finds
lines with C-style comments in them.
Regular expressions also provide a means of remembering matched expressions, for re-use in the
expression. This is very handy in vi and sed, which have substitution operators. The same operator
used for grouping is also used for remembering matching strings. The remembered string is then
referenced using the back-reference \1 (or \2, \3... if there are multiple strings remembered):
$ grep '\([a-z]\)\1\1\1\1'
matches any line that contains a sequence of 5 copies of a letter, such as xxxxx or bbbbb.
$ grep '\([1-9][0-9]\).*\1'
matches any line that has a two digit number that is repeated later in the line. The command
$ grep '\([a-z]\)\([a-z]\)\([a-z]\)\3\2\1'
has three remembered matches in the back-references \1, \2, and \3, but in reverse order. Each
will have a copy of the single lower-case letter that it matched, so this pattern matches palindromes
of length 6 such as xyzzyx.
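The same patterns can be tried from a C program through the POSIX regcomp() and regexec() functions, which use basic regular expressions when REG_EXTENDED is not given. The helper below is a sketch for experimentation; the name bre_matches() is an assumption, and inside C string literals each backslash must be doubled.

```c
#include <regex.h>
#include <stddef.h>

/* Return 1 if text contains a match for the basic regular expression
   pattern, 0 if it does not, and -1 if the pattern fails to compile.
   bre_matches() is a hypothetical helper used only for this sketch. */
int bre_matches(const char *pattern, const char *text)
{
    regex_t re;
    int     status;

    if (regcomp(&re, pattern, 0) != 0)     /* flags 0: basic syntax */
        return -1;
    status = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return status == 0;
}
```

For example, bre_matches("\\([a-z]\\)\\1\\1\\1\\1", "abxxxxxcd") returns 1 because the text contains five copies of x, while the same pattern applied to "abxxxxcd" returns 0. Like grep, the function reports a match anywhere in the text unless the pattern is anchored with ^ and $.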
You are encouraged to read the man page for grep. There is a lot more to regular expressions than
is covered here. The best way to learn them is to experiment. You can open a terminal window
and type grep followed by a pattern. It will then wait for you to type lines on the keyboard. Lines
that match will be repeated. Lines that don't will not. Try it.
egrep (extended grep or expression grep) has a larger set of regular expression meta-symbols than
grep, including '|', '?', '+', and parentheses. It is not a strict superset of grep because it does not
give \( \) and \{ \} their grep meanings; grouping and repetition counts are written () and {} in
egrep. The word anchors \< and \> are still written with backslashes in GNU egrep.
$ egrep 'March|April|May'
and
$ egrep 'M(iss)+ippi'
$ egrep '[a-z]+'
A.1.5.2 fgrep
The fgrep variant of grep does not support regular expressions but does support multiple strings.
It is used to search quickly for many different fixed strings. For example, you can put a list of
frequently misspelled words into a file and then call fgrep to search for them:
$ fgrep -f errors document
will print all lines in document that contain one of the strings in the file named errors.
$ ls *.c
is a command to list all files in the current working directory that have zero or more characters
followed by a .c.
The regular expressions that the shell uses for file-globbing have a different syntax from those used
by vi, grep, and the other filters and commands. They are not really regular expressions. File-globs
are more limited, and the asterisk * does not multiply the character that precedes it. It, by itself,
represents zero or more characters of any kind. Thus,
$ rm *.o
will run unzip on every file in the current working directory whose name starts with hwk2_ and
ends in .gz (in bash and sh and other Bourne-shell-like shells). You must be very careful when
using file globs, especially with dangerous commands such as rm that are not reversible, because
they may represent files that you did not think they did. One disastrous example would be
$ rm -r .*
which a naive user might think removes the hidden files in the given directory and their
descendants. But the pattern .* matches .., which implies that the command will recursively remove
everything in .., the parent directory. There are many other things to know about file globs; the
complete description can be found in the man page in Volume 7:
$ man 7 glob
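Glob patterns can be tested safely, without running any command on real files, using the POSIX fnmatch() function. The sketch below wraps it in a helper whose name, glob_matches(), is an assumption made for this example. It only shows what a pattern matches; whether a particular shell actually expands .* to include .. depends on the shell and its options.

```c
#include <fnmatch.h>

/* Return 1 if name matches the shell-style glob pattern, 0 otherwise.
   FNM_PERIOD makes a leading '.' in name match only an explicit '.'
   in the pattern, which is how the shell treats hidden files.
   glob_matches() is a hypothetical helper used only for this sketch. */
int glob_matches(const char *pattern, const char *name)
{
    return fnmatch(pattern, name, FNM_PERIOD) == 0;
}
```

Here glob_matches(".*", "..") returns 1, which is the danger described above, whereas glob_matches("*", ".bashrc") returns 0 because an asterisk does not match a leading period.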