Chapter 02


UNIX Lecture Notes Stewart Weiss

Chapter 2 Login Records, File I/O, and Performance


Concepts Covered

• Man pages and Texinfo pages
• The UNIX file I/O API
• Reading, creating, and writing files
• File descriptors
• Kernel buffering
• Kernel versus user mode and the cost of system calls
• Timing programs
• Time representation in UNIX
• The utmp file
• Detecting and reporting errors in system calls
• Memory-mapped I/O
• Feature test macros
• Filters and regular expressions
• open, creat, close, read, write, lseek, perror, ctime, localtime, utmpname, getutent, setutent, endutent, malloc, calloc, mmap, munmap, memcpy

2.1 Introduction
This chapter introduces the two primary methods of I/O possible in UNIX: buffered and unbuffered. By trying to write the who and cp commands, we will explore how to create, open, read, write, and close arbitrary files. "Arbitrary" in this context means that they are not necessarily text files. We will write several different versions of the who command, simply to illustrate different approaches to the problem of reading from a file. They will differ in their performance characteristics and their portability. The chapter uses this exercise to introduce the UNIX concept of time and the first of several important databases provided by the kernel, as well as the kernel's interface to those databases. We also write two different versions of a simplified cp command, one using read() and write(), and the other using memory-mapped I/O.

2.2 Commands Are (Usually) Programs


In UNIX, most commands are programs, almost always written in C. Some commands are not programs; they are built into the shell and are therefore called shell builtins. Exactly which commands are builtins varies from one shell to another¹, but some are common to almost all shells, such as cd and exit. When you type cd, for example, the shell does not run a cd program; it jumps to the internal code that implements the cd command itself. You can think of the shell as containing a C switch statement inside a loop: when it sees that the command is a builtin, it jumps to the code that executes it. Some commands, such as pwd, are both shell builtins and programs. By default the shell builtin is executed if the user types pwd; to get the program version, type its full path name, /bin/pwd. (Preceding the command with a backslash, as in \pwd, suppresses alias expansion, but in shells such as bash it does not bypass the builtin.)
¹ The list of built-in commands is usually provided in the shell's man page. For example, the command man builtins will display the bash_builtins man page, and at the very top of that page is the complete list of bash builtins.


Command programs are located in one of several directories, the most common being /bin, /usr/bin, and /usr/local/bin. The /usr/local/bin directory is traditionally used as a repository for commands that do not come with the UNIX distribution and have been added as local extras. Many packages that are installed after the operating system installation are placed in subdirectories of /usr/local. Administrative commands, such as those for creating and modifying user accounts, are found in /usr/sbin. Many UNIX systems still retain the old /usr/ucb directory. (The "ucb" in /usr/ucb stands for the University of California at Berkeley.) The /usr/ucb directory, if it exists, contains commands that are part of the BSD distributions. Some of the commands in /usr/ucb are also in /usr/bin and have different semantics. If the same command exists in both /usr/bin and some other directory such as /usr/ucb, the PATH environment variable, much like the one used in Windows and DOS, determines which command will be run. The PATH variable contains a list of the directories to search when the command is typed without a leading path. Whichever directory is earliest in the list is the one whose version of the command is used. Thus, if more exists in both /usr/ucb and /usr/bin, as well as in your working directory, and /usr/bin precedes /usr/ucb, which precedes . (the current directory) in your PATH variable, and if you type

$ more myfile

then /usr/bin/more will run. If instead you type

$ ./more myfile

then your PATH is not searched and your private more program will run. If you type

$ /usr/ucb/more myfile

then your PATH is not searched and /usr/ucb/more will run.

2.3 The who Command


There are a few different commands for checking which users are currently using the system. The simplest of these is conveniently named who². Other commands that perform similar tasks are w, users, and whodo³. The who command is required by the POSIX standard, so it is the one most likely to be present on a UNIX installation.

The who command displays information about who is currently using the system. Running who
without command-line options produces a listing such as

dsutton pts/1 Jul 23 20:22 (66-108-62-189.nyc.rr.com)
ioannis pts/2 Jul 24 16:53 (freshwin.geo.hunter.cuny.edu)
dplumer pts/3 Jul 26 11:34 (66-65-53-41.nyc.rr.com)
rnoorzad pts/4 Jul 23 09:25 (death-valley.geo.hunter.cuny.edu)
rnoorzad pts/5 Jul 23 09:25 (death-valley.geo.hunter.cuny.edu)
sweiss pts/6 Jul 26 13:08 (70.ny325.east.verizon.net)
² This is unusual. Most UNIX commands have names that are so cryptic that you have to be a wizard to guess them. Would you have guessed, for example, that to view the contents of a directory, you have to type "ls", or that to view the contents of a file you can type "cat"?
³ whodo is not available in Linux. It is found in Solaris, AIX, and other UNIX variants.


Each line represents a single login session. The -H option will print column headings, in case the data is not obvious. The first column is the username, the second is the terminal line on which the user is logged in, the third is the time of the login on that terminal, and the last is the source of the login, either the host name or an X display. For example, sweiss was logged in on terminal line pts/6, the session started at 13:08 on July 26th of the then-current year, and the login was initiated from a computer identified as 70.ny325.east.verizon.net. Notice that there may be multiple logins with the same username.

The output of who may vary from one system to another. Some of the reasons have to do with how systems treat users who have multiple terminal windows open in a single login or who are running terminal multiplexers such as GNU's screen program. The w command, by the way, is roughly equivalent to running uptime followed by who, although it shows more information about each session than who does.

2.4 Researching Commands In UNIX


UNIX is a self-documenting operating system. You can use UNIX itself to learn how it works if you do a thorough exploration of the online documentation. In particular, the man pages can be a source of information about how a command might be implemented. This information is not explicit, but it can be obtained from clues within the page. Sometimes the man page for a command does not have enough content and instead has a message such as the following in the SEE ALSO section at the bottom:

The full documentation for who is maintained as a Texinfo manual.
If the info and who programs are properly installed at your site,
the command
info coreutils 'who invocation'
should give you access to the complete manual.

In this case, one should use the info command instead. The info command brings up the Texinfo
pages. The Texinfo system is an alternative system for providing on-line documentation. To learn
how to use the Texinfo viewer, type

info info

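which will bring up a tutorial on using the Texinfo documentation system. The general idea is that the information is stored in a tree-like structure, in which an internal node represents a topic area and its child nodes are specific to that topic. The space bar scrolls through the current node and, at the end of a node, moves on to the next node of the whole tree in reading order, that is, a depth-first (preorder) traversal, like reading a book from front to back. To descend into one of a node's children, choose it from the node's menu, either by typing m followed by the name of the menu item or by moving the cursor onto the item and pressing Enter. To go back up, u (for up) works. To traverse the siblings from left to right, n (for next) does the trick, and to go back, p (for previous) works. Just picture the tree.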

Note. On some systems, when you type "info coreutils who", you will see the page for the whoami command. If you move ahead a few pages, you will find the page for who. On other systems you may have to type "info who" or "info coreutils 'who invocation'" to bring up the proper pages.

The man page for who tells us that the command may be called with zero or more of the command-
line options abdHlmpqrstTu. It can also be called as follows:


$ who am i
sweiss pts/6 Jul 26 13:08 (70.ny325.east.verizon.net)

and, in Linux, if you supply any two words after who, it behaves the same way:

$ who you think
sweiss pts/6 Jul 26 13:08 (70.ny325.east.verizon.net)

In general, the way to research a UNIX command is to use a combination of these methods:

1. Read the relevant man page.

2. Follow the SEE ALSO links on the page.

3. Read the Texinfo page if the man page refers to it.

4. Search the manual.

5. Find and read the header (.h) files relevant to the command.

2.4.1 Reading Man Pages


There is no standard that defines what must be contained in most man pages; it is implementation-dependent. However, most systems follow a time-honored convention for man pages in general, which is what we describe in these notes. For the purpose of understanding how a command works, the relevant sections of the man page for that command are the DESCRIPTION, SEE ALSO, and FILES sections.

The DESCRIPTION section gives the details of how the command is used. For example, reading about who in the man page reveals that who has an optional file name argument, and that if it is not supplied, who reads the file /var/run/utmp to get the information about current logins. A common choice for the optional argument is /var/log/wtmp. We can infer that the file /var/run/utmp contains information about who is currently logged in. What about /var/log/wtmp? If you were to try typing

$ man wtmp

you would be pleasantly surprised to discover that, although wtmp is not a command, there is a man page that describes it. This is because there is a section of the man pages strictly devoted to the description of system file formats. /var/log/wtmp is a system file, as is /var/run/utmp, and they are both described on the same man page in section 5 of the manual. There we can learn that /var/log/wtmp contains information about who has logged in previously⁴.

Before we dig deeper into the man page for the utmp and wtmp files, you should also know that all POSIX-compliant UNIX systems are required to contain man pages for all of the header files that might be included by a function in the kernel's API. To put it more precisely, each function in the System Interfaces volume of POSIX.1-2008 specifies the headers that an application must include to use that function, and a POSIX-compliant system must have a man page for each of those headers. They may not be installed on the system you are using, but they are available; they are installed only if the system administrator installed the application development files.

The man pages for the header files have a fixed format. From the POSIX.1-2008 standard:

⁴ If we consult the who Texinfo page, we can learn that as well.


NAME
This section gives the name or names of the entry and briefly states its purpose.

SYNOPSIS
This section summarizes the use of the entry being described.

DESCRIPTION
This section describes the functionality of the header.

APPLICATION USAGE
This section is informative. This section gives warnings and advice to application developers about the entry. In the event of conflict between warnings and advice and a normative part of this volume of POSIX.1-2008, the normative material is to be taken as correct.

RATIONALE
This section is informative. This section contains historical information concerning the
contents of this volume of POSIX.1-2008 and why features were included or discarded
by the standard developers.

FUTURE DIRECTIONS
This section is informative. This section provides comments which should be used as a
guide to current thinking; there is not necessarily a commitment to adopt these future
directions.

SEE ALSO
This section is informative. This section gives references to related information.

The important sections are NAME, SYNOPSIS, DESCRIPTION, and SEE ALSO. For example,

$ man stdlib.h

will display the man page for the header file <stdlib.h>. This is a useful feature. But if you know neither the name of the command that you need nor the names of any files that might be useful or relevant, then you do not know which man page to read. UNIX systems provide various methods of overcoming this problem.

2.4.2 Man Page Searching


The most basic solution, guaranteed to work on all systems, is to use the search feature of the man command. To search for all man pages that contain a particular keyword in their one-line summaries in the NAME section, you can type

$ man -k keyword

This will only work if the whatis database was built when the man pages were installed⁵, however, so you are at the mercy of the system administrator. For example, typing

⁵ If you are the administrator, issue the command /usr/sbin/makewhatis to build the database. On systems that use man-db, the corresponding command is mandb.


$ man -k utmp

will list all man pages that contain the string utmp in their summaries. The command

$ apropos utmp

has exactly the same meaning: apropos is equivalent to "man -k". Unfortunately, the implementation of apropos varies from system to system. On some systems, such as Fedora 15 (the most current stable version at the time of writing), apropos has features that allow multiple-keyword searches as well as regular expression searches. To search for man pages whose page names and/or NAME sections contain all of the keywords provided, use the -a option, as in

$ apropos -a convert case


toupper (3) - convert letter to upper or lower case
FcToLower (3) - convert upper case ASCII to lower case
tolower (3) - convert letter to upper or lower case
towlower (3) - convert a wide character to lowercase
towupper (3) - convert a wide character to uppercase
XConvertCase (3) - convert keysyms

The number in parentheses is the section number. Section 3 contains man pages for library functions.
Notice that we have output in which the string case is a substring of other words. If we wanted
to limit it to those descriptions in which case is a word on its own, we could use the regular
expression matching feature of apropos:

$ apropos -ar convert '\<case\>'


toupper (3) - convert letter to upper or lower case
FcToLower (3) - convert upper case ASCII to lower case
tolower (3) - convert letter to upper or lower case

Unfortunately, this more powerful apropos is not available on all systems. In particular, it is absent on the RHEL 6 system installed on our server, whose apropos has no options, so one cannot do such searches. In this case, to get the same effect, one can do a simple search and pipe the output through a grep filter. If you are not familiar with grep or regular expressions, see the Appendix. The equivalent command would be

$ apropos convert | grep '\<case\>'


FcToLower (3) - convert upper case ASCII to lower case
tolower (3) - convert letter to upper or lower case
toupper (3) - convert letter to upper or lower case

If the output list is still too long to be useful, you can filter it further with another instance of grep:

$ apropos convert | grep '\<case\>' | grep '\<ASCII\>'


FcToLower (3) - convert upper case ASCII to lower case


2.5 Digging Deeper into the who Command


The output of the manual search on the utmp file will look something like:

endutent [getutent] (3) - access utmp file entries
getutent (3) - access utmp file entries
getutid [getutent] (3) - access utmp file entries
getutline [getutent] (3) - access utmp file entries
login (3) - write utmp and wtmp entries
logout [login] (3) - write utmp and wtmp entries
pututline [getutent] (3) - access utmp file entries
sessreg (1x) - manage utmp/wtmp entries for non-init clients
setutent [getutent] (3) - access utmp file entries
utmp (5) - login records
utmpname [getutent] (3) - access utmp file entries
utmpx.h [utmpx] (0p) - user accounting database definitions
wtmp [utmp] (5) - login records

The first word is the topic of the man page; the word in brackets, when present, is the title of the man page that documents it; the number in parentheses is the section number of the manual; and the last part is a brief description of the topic.

Every UNIX system has a manual volume that deals with the files used by the commands; its number may vary. From the above output, it appears that the utmp file is described in Section 5 of the man pages:

utmp (5) - login records

Also, the line

wtmp [utmp] (5) - login records

shows that the man page describing the wtmp file is the same page as the one describing utmp. Obviously, there is a man page for utmp in Section 5 of the manual. To display a specific section, you need to give the section number on the command line. The syntax varies; in RedHat Linux either of these will work:

$ man 5 utmp
$ man -S5 utmp

There was also a line of output

utmpx.h [utmpx] (0p) - user accounting database definitions


The <utmpx.h> header file describes a POSIX-compliant interface to the utmp file. This interface is different from that of the <utmp.h> file. We will use the (outdated) <utmp.h> interface for our initial attempts, exploring the utmp file in greater depth, starting with the man page that our system delivers when we type either of the above man commands. After that we will consider two other interfaces: the POSIX utmpx interface and a GNU extension, the thread-safe function getutent_r() and its cousins.

The beginning of the man page for utmp from RedHat Enterprise Linux Release 4 is displayed below.

NAME
utmp, wtmp - login records
SYNOPSIS
#include <utmp.h>
DESCRIPTION
The utmp file allows one to discover information about who is currently
using the system. There may be more users currently using the system,
because not all programs use utmp logging.

Warning: utmp must not be writable, because many system programs
(foolishly) depend on its integrity. You risk faked system logfiles and
modifications of system files if you leave utmp writable to any user.
The file is a sequence of entries with the following structure declared
in the include file (note that this is only one of several definitions
around; details depend on the version of libc):

( lines omitted here )

First note that it tells us which header file is relevant: <utmp.h>. This is the header file that the compiler will use when the include directive #include <utmp.h> is in your program⁶. Next, it warns system administrators not to leave this file writable by anyone other than its owner, the superuser. Then, before showing us the contents of the include file, it warns the rest of us that those contents may differ from one installation to another.

Because UNIX has been developed over many years by many vendors and communities (Linux in particular is free and community supported), it has been evolving over time. You may find that what is described in a book, or in these notes, is different from what you observe on your system. It is not that anything is correct or incorrect, but that UNIX is a moving target, and that systems can differ in minor ways. For example, the man page for utmp in an older version of Linux will be very different from the one shown here. Even the location of the utmp file itself is different. Later versions of UNIX added system functions to provide a data abstraction layer so that the programmer would not need to know the actual structure of the file. The problem was that different versions of UNIX had different definitions of the utmp structure, and programs that accessed the structure directly were failing on different systems.

⁶ There may be many files named utmp.h in the file system. Each compiler has its own method of deciding which one to use. The GNU compiler collection (gcc) installs its own header files in specific places, and it uses these by default. The default search path used by gcc is typically

/usr/local/include
target-installdir/include
/usr/include

where target-installdir is the directory in which gcc was installed on the machine. This is explained in more detail shortly.


The structures displayed in the man page may not be the same as those found on our machine. If you write code that depends critically on the structure definition, it may work on one machine but not another. In spite of this, it is valuable to study these structures. Afterward we will write more portable code. The key to that is to use preprocessor directives to conditionally compile the code based on the values of macros. The man page continues:

#define UT_UNKNOWN 0
#define RUN_LVL 1
#define BOOT_TIME 2
#define NEW_TIME 3
#define OLD_TIME 4
#define INIT_PROCESS 5
#define LOGIN_PROCESS 6
#define USER_PROCESS 7
#define DEAD_PROCESS 8
#define ACCOUNTING 9
#define UT_LINESIZE 12
#define UT_NAMESIZE 32
#define UT_HOSTSIZE 256
struct exit_status {
short int e_termination; /* process termination status. */
short int e_exit; /* process exit status. */
};
struct utmp {
short ut_type; /* type of login */
pid_t ut_pid; /* pid of login process */
char ut_line[UT_LINESIZE]; /* device name of tty - "/dev/" */
char ut_id[4]; /* init id or abbrev. ttyname */
char ut_user[UT_NAMESIZE]; /* user name */
char ut_host[UT_HOSTSIZE]; /* hostname for remote login */
struct exit_status ut_exit; /* Exit status of a process marked as DEAD_PROCESS. */
#if __WORDSIZE == 64 && defined __WORDSIZE_COMPAT32
int32_t ut_session; /* Session ID (getsid(2)),
used for windowing */
struct {
int32_t tv_sec; /* Seconds */
int32_t tv_usec; /* Microseconds */
} ut_tv; /* Time entry was made */
#else
long ut_session; /* Session ID */
struct timeval ut_tv; /* Time entry was made */
#endif
int32_t ut_addr_v6[4]; /* IP address of remote host. */
char __unused[20]; /* Reserved for future use. */
};

The page then contains a brief description of the purpose of the structure:


This structure gives the name of the special file associated with the user's
terminal, the user's login name, and the time of login in the form of time(2).
String fields are terminated by '\0' if they are shorter than the size of the
field.

More information about the specific members of the structure is contained in the comments in the struct definition. The man page does not describe the members in detail beyond that. The rest of the man page, which is not included here, goes on to describe how the various entries in the utmp file are created and modified by the different processes involved in logging in and out. We will return to that topic shortly. It reiterates the warning:

The file format is machine dependent, so it is recommended that it
be processed only on the machine architecture where it was created.

You should have noticed the following line in the man page:

#if __WORDSIZE == 64 && defined __WORDSIZE_COMPAT32

This causes conditional compilation of the code. It means: if the machine's word size is 64 bits and it is in 32-bit compatibility mode, then use one definition of the ut_session and ut_tv members; otherwise use a different one. The macros __WORDSIZE and __WORDSIZE_COMPAT32 are defined in the header file /usr/include/bits/wordsize.h⁷. We will ignore this subtlety for now, and rather than relying on the man page, we will examine the <utmp.h> header file itself.

2.5.1 Reading the Correct Header Files


Which header file to read depends upon the particular installation. For example, on my home office workstation, which is running Fedora 14, gcc will use /usr/include/utmp.h, whereas on the cs82010 server in the Graduate Center, which is running RedHat Enterprise Linux Release 6, gcc will first look for /usr/lib/gcc/x86_64-redhat-linux/4.4.5/include/utmp.h. One method of determining which file gcc will actually use in a particular installation is the following:

1. Create a trivial C program such as

int main() { return 0; }

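and suppose it is named empty.c. The command

echo 'int main() { return 0; }' > empty.c

is an easy way to create it. (The quotes are needed; without them the shell tries to interpret the parentheses and braces itself and reports a syntax error.)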

2. Run the command


⁷ The macro __WORDSIZE_COMPAT32 is defined only on 64-bit machines. One can discover this file by doing a recursive grep on the /usr/include directory hierarchy of the form grep -R WORDSIZE /usr/include/* | grep define, which will list the files in which these macros are defined.

$ gcc -v empty.c

3. In the output produced by gcc, look for lines of the form

#include "..." search starts here:
#include <...> search starts here:
your_current_working_dir/include
/usr/local/include
/usr/lib/gcc/x86_64-redhat-linux/4.4.5/include
/usr/include
End of search list.

These lines show which directories gcc searches for included header files, and in which order. The above output shows that gcc will search /usr/local/include, then its install directory, and then /usr/include (after any directory listed before them, such as the first entry above). Since there is no <utmp.h> file in the first two of these, it will use /usr/include/utmp.h.

Returning to the task at hand, if you look at either of the <utmp.h> files mentioned above, you will see that they are mostly wrappers for a file in the corresponding bits subdirectory:

/usr/include/bits/utmp.h,

or

/usr/lib/i386-redhat-linux3E/include/bits/utmp.h.

Taking the liberty of eliminating the 64-bit conditional macros, and the macro names, the important
elements of the header le are as follows:

/* The structure describing an entry in the database of
   previous logins. */
struct lastlog
{
    __time_t ll_time;
    char ll_line[UT_LINESIZE];
    char ll_host[UT_HOSTSIZE];
};

/* The structure describing the status of a terminated
   process. This type is used in `struct utmp' below. */
struct exit_status
{
    short int e_termination;    /* Process termination status. */
    short int e_exit;           /* Process exit status. */
};

/* The structure describing an entry in the user accounting
   database. */
struct utmp
{
    short int ut_type;              /* Type of login. */
    pid_t ut_pid;                   /* Process ID of login process. */
    char ut_line[UT_LINESIZE];      /* Device name. */
    char ut_id[4];                  /* Inittab ID. */
    char ut_user[UT_NAMESIZE];      /* Username. */
    char ut_host[UT_HOSTSIZE];      /* Hostname for remote login. */
    struct exit_status ut_exit;     /* Exit status of a process
                                       marked as DEAD_PROCESS. */
    long int ut_session;            /* Session ID, used for windowing. */
    struct timeval ut_tv;           /* Time entry was made. */
    int32_t ut_addr_v6[4];          /* Internet address of remote host. */
    char __unused[20];              /* Reserved for future use. */
};

The point is that login records have ten significant members, and we can write code to extract their data in order to mimic the who command. In particular, the ut_user char array stores the username, the ut_line char array stores the name of the terminal device of the login, ut_tv stores the login time (glibc also lets older code refer to its seconds field by the name ut_time), and ut_host stores the name of the remote host from which the connection was made. Unfortunately, we will not be able to ignore indefinitely the way that time is defined on different architectures, but for the moment, we will continue to ignore it.

2.5.2 What Next?


It seems likely that who opens the utmp file and reads the utmp structures from that file in sequence, displaying the appropriate data for each login. We will use this as the basis for our own implementation of the command.

2.6 Writing who


The program that implements the who command has two key tasks:

• to read the utmp structures from a le, and

• to display the information from a single utmp structure on the display device in a user-friendly
format.

We begin by discussing solutions to the first task.

2.6.1 Reading Structures From a File


A binary file consists of a sequence of bytes that are not to be interpreted as characters. It is the most general form of a file. A file consisting of a sequence of structures, such as the utmp file, is a binary file and cannot be read using the C I/O functions with which most programmers are familiar, such as getchar(), getc(), fgets(), and scanf(), nor with the istream methods in C++, because all of these read textual input; they are specifically designed for that purpose. Although you could read structures by reading one char at a time and then reconstructing the structure from the sequence of chars with a lot of type casts, that would be grossly inefficient and error-prone. Clearly there must be a better way.

Let us suppose that you do not know the methods of reading from a binary file. You could use a man page search such as

$ man -k binary file | grep read

Remember, though, that when you use multiple words with the -k option, they are OR-ed together, so the output includes lines with either word (or both). If you do this search, you will see a list of perhaps several dozen man pages. If you get a long list, you can filter it further by limiting the output to only sections 2 or 3 of the man pages with a third stage in the pipeline:

$ man -k binary file | grep read | grep '([23])'

In this list will be the pages for two prospective functions to use:

fread (3) - binary stream input/output
read (2) - read from file descriptor

The first, fread(), in Section 3, is part of the C Standard I/O Library; it is C's function for reading binary files. The second, read(), in Section 2, is a system call. As we are primarily interested in what UNIX in particular has to offer us, we will look at the system call. In Chapters 5 and 7, we will revisit the C Standard I/O Library.

We want to see what the man page for read() has to say. If you do not specify the section number when you type man read, you will get the man page from the first section, and you will discover that there is also a UNIX command, /usr/bin/read:

$ man read

which will output the man page for the read command in Section 1. You must type

$ man 2 read

to get the man page for the read() system call. I have included the important parts of the man
page below.

NAME
read - read from a file descriptor
SYNOPSIS
#include <unistd.h>
ssize_t read(int fd, void *buf, size_t count);
DESCRIPTION
read() attempts to read up to count bytes from file descriptor fd into
the buffer starting at buf.
If count is zero, read() returns zero and has no other results.
If count is greater than SSIZE_MAX, the result is unspecified.
RETURN VALUE
On success, the number of bytes read is returned (zero indicates end
of file), and the file position is advanced by this number. It is not
an error if this number is smaller than the number of bytes requested;
this may happen for example because fewer bytes are actually available
right now (maybe because we were close to end-of-file, or because we
are reading from a pipe, or from a terminal), or because read() was
interrupted by a signal. On error, -1 is returned, and errno is set
appropriately. In this case it is left unspecified whether the
file position (if any) changes.

To use the read() function, the program must include the header file <unistd.h>. This header file serves various purposes, the most relevant for our purposes being that it contains the prototypes of the (POSIX-compliant) system calls.

The difference between <stdio.h> and <unistd.h>.


The functions that begin with "f": fopen(), fread(), fwrite(), fclose(), and so on,
which operate on file stream pointers (FILE pointers), are all part of the ANSI Standard
C I/O Library, whose header file is <stdio.h>. They are C functions that you can use
on any operating system. We used fopen() and fclose() in Chapter 1 to implement
our version of the more command.

The functions open(), read(), write(), and close() are UNIX system calls. The prototypes
of read(), write(), and close() are declared in <unistd.h>, which is a POSIX header file,
while open() is declared in <fcntl.h>. The <unistd.h> header defines miscellaneous symbolic
constants and types, and declares miscellaneous functions, among which are these calls.
These functions exist only on UNIX and other POSIX-compliant systems, and on such a system
they are available no matter what language you use. POSIX does not specify whether they
must be system calls or library functions, only that they exist as one or the other. These
calls operate on file descriptors, not file streams. The UNIX system calls request services
from the kernel directly; the ANSI Standard C I/O Library calls are at a higher level.

The read() function has three parameters. The first is a file descriptor; the man page says that
read() reads from the file associated with it. A file descriptor is a small, non-negative integer. We
will study file descriptors in greater detail in a later chapter. The second parameter is a pointer to
a place in memory into which the bytes that are read are to be stored. The third parameter is the
number of bytes to read. The return value is the number of bytes actually read, which is never
larger than the number requested but might be smaller, or -1 if something went wrong.

To illustrate, suppose that filedesc is a valid file descriptor that we can use for reading, buffer is
a char array of size 100, and num_bytes_read is an integer variable. The following code fragment
shows how to read data from this file 100 bytes at a time until the end of the data is reached:

int done = 0;
while ( !done ) {
    num_bytes_read = read(filedesc, buffer, 100);
    if ( num_bytes_read < 0 ) {
        // an error code was returned during reading - bail out
        exit(1);
    }
    else if ( num_bytes_read == 0 )
        done = 1;    // the end of file was reached - stop reading
    else {
        // do whatever has to be done to the num_bytes_read bytes in buffer
    }
}

This is a typical read-loop structure. At the end of the file, read() does not fail; it simply
returns 0. This is how a program detects the end of the input data.
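The fragment above can be packaged as a complete, compilable function. The sketch below is our own illustration, not code from the man page: count_bytes() is a hypothetical name, and counting the bytes stands in for "whatever has to be done to the data". It opens the file, reads it 100 bytes at a time until read() returns 0, and returns the total, or -1 if open() or read() fails.

```c
#include <fcntl.h>      /* open(), O_RDONLY */
#include <stdio.h>      /* perror() */
#include <unistd.h>     /* read(), close(), ssize_t */

/* Hypothetical helper: returns the number of bytes in the file at
   path, counted with a read() loop, or -1 on error. */
long count_bytes(const char *path)
{
    char buffer[100];
    long total = 0;
    ssize_t num_bytes_read;
    int filedesc = open(path, O_RDONLY);

    if ( filedesc == -1 ) {
        perror(path);
        return -1;
    }

    /* read() returns 0 at end of file and -1 on error */
    while ( (num_bytes_read = read(filedesc, buffer, sizeof buffer)) > 0 )
        total += num_bytes_read;    /* "processing" here is just counting */

    if ( num_bytes_read == -1 )
        perror("read");
    close(filedesc);
    return ( num_bytes_read == -1 ) ? -1 : total;
}
```

Called as count_bytes("/etc/passwd"), for example, it would return the size of that file in bytes.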

How can a program associate a file descriptor with a file? Look in the SEE ALSO section of the man
page and you will find references to fcntl(), creat(), open(), and many other system calls.[8] Most
of these work with file descriptors. The open() system call is the one we need now, because it
opens a file and assigns a file descriptor to it.

2.6.2 The open() and close() System Calls


To read from a binary file, a process must

• open the file for reading,

• read the bytes, and

• close the file.

The open() system call creates a connection between the process and the file. Think of a connection
as an object that manages the I/O operations on the file from the process. This object contains
things such as the offset in the file for the next operation, various status flags, and pointers to
kernel functions that the process can invoke. It is represented by a file descriptor. A process can
open several files, and each will have its own file descriptor. In fact, it can open the same file twice,
and each connection will have a different file descriptor.[9] UNIX does not prevent you or anyone
else from opening the same file many times. It is up to the users and their programs to coordinate
accesses to files.
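Because each call to open() creates its own connection, two descriptors for the same file keep separate offsets. The sketch below uses a hypothetical helper, first_bytes(), which opens one file twice, reads two bytes through the first descriptor and one byte through the second. The second descriptor still starts at the beginning of the file.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical demo: opens path twice. On success stores in out[0] and
   out[1] the two bytes read via fd1 and in out[2] the byte read via fd2,
   and returns 0; returns -1 on any error. */
int first_bytes(const char *path, char out[3])
{
    int fd1 = open(path, O_RDONLY);
    int fd2 = open(path, O_RDONLY);     /* a second, independent connection */
    int status = -1;

    if ( fd1 != -1 && fd2 != -1
         && read(fd1, out, 2) == 2          /* advances fd1's offset to 2 */
         && read(fd2, out + 2, 1) == 1 )    /* fd2's offset was still 0 */
        status = 0;
    else
        perror(path);

    if ( fd1 != -1 )
        close(fd1);
    if ( fd2 != -1 )
        close(fd2);
    return status;
}
```

For a file containing "ABC", the first descriptor yields 'A' and 'B', and the second yields 'A' again.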

If you look at the man page you will see the following synopsis of the open() call.

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
int open(const char *path, int oflag, /* mode_t mode */...);
[8] All of these are in Section 2 of the man pages.
[9] As you might have guessed, the file descriptor is an index into an array of structs. Each of these structs
contains, among other things, the position in the file of the next character to be read. In this way a process can
read from two different parts of the same file at the same time.

The first argument is a character string containing the path to the file to be opened. The second
argument is an integer specifying how the file is to be opened: for reading, for writing, for reading
and writing, for appending, and so on. If the call is successful, it returns a file descriptor; more
accurately, it returns the lowest-numbered file descriptor not already in use by the process. If the
call is not successful, it returns -1. There are methods of detecting the type of error; these will be
examined later.

The value of oflag is one of the following constants defined in <fcntl.h>:

O_RDONLY Open for reading only.

O_WRONLY Open for writing only.

O_RDWR Open for reading and writing.

It is more complex than this, but this is enough for now. Other flags, such as O_APPEND and
O_CREAT, can be bitwise-ORed with one of these values.
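As an illustration of OR-ing flags, the hypothetical function append_line() below opens a file with O_WRONLY | O_CREAT | O_APPEND: every write goes to the end of the file, and the file is created if it does not exist. When O_CREAT is given, open() takes a third argument, the permission mode for a newly created file; the value 0644 is our assumption. The write() call is covered later in this chapter.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: appends text to the file at path, creating the
   file with mode 0644 (modified by the umask) if it does not exist.
   Returns 0 on success and -1 on error. */
int append_line(const char *path, const char *text)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    size_t len = strlen(text);

    if ( fd == -1 ) {
        perror(path);
        return -1;
    }
    if ( write(fd, text, len) != (ssize_t) len ) {
        perror("write");
        close(fd);
        return -1;
    }
    return close(fd);       /* 0 on success, -1 on failure */
}
```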

Example. Consider the following code:

int fd;
if ( (fd = open("/var/adm/messages.0", O_RDONLY)) < 0 )
    exit(1);    /* open() failed */

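This attempts to open the file /var/adm/messages.0 for reading. If it fails, the program exits. If it
succeeds, the file is ready for reading, and the file descriptor stored in fd is the one the program
must use in the read() call. Notice that the call is made within a conditional expression: its return
value is assigned to fd, and the value of that assignment is compared to 0. The parentheses around
the assignment are essential. Because < has higher precedence than =, writing fd = open(...) < 0
would store the result of the comparison, 0 or 1, in fd instead of the file descriptor. This is a
common method of error handling in C programs.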

Unlike some other operating systems, UNIX does not prevent a file that is already open by one process
from being opened by another. This is a very important feature to remember about UNIX. It is
why multiple users can run the same command, or change their passwords, at the same
time.[10]

After your process is finished reading a file, it should close the connection to the file. The close()
system call

int close(int filedes);

has a single argument, the file descriptor of the connection to be closed. If a file has been
opened by a process via multiple calls to open(), then the other connections will remain open and
only the one corresponding to filedes will be closed. If the kernel cannot close the connection,
close() returns -1.

Now you might wonder what could possibly go wrong when closing a file, especially one that has
been opened for reading. Well, first of all, you might have passed close() a bad file descriptor.
Second, the call might be interrupted by a signal before it completes, in which case it also returns -1.
Also, the file may not have been on the local machine or the local drive, and a network connection
might have gone down, in which case the file cannot be closed cleanly. Furthermore, if the file had
been opened for writing, there are more reasons why close() might fail. The most important is that
the kernel may defer the actual writing of buffered data, so an error in writing it (on a network file
system, for example) may be reported only when close() is called.

[10] Of course, UNIX does provide the means for a process to open a file and lock it so that other processes that
respect the lock will not read or write it while it is in use, but this requires actions on the part of the process to make
it happen. UNIX does not do this automatically.
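Since close() can fail, careful programs check its return value. The minimal sketch below uses a hypothetical wrapper, close_checked(), that reports a failure with perror() (described in the next section) and passes the -1 back to the caller. It deliberately does not retry close() after an interrupted call: on Linux the descriptor is released even when close() reports an error, so a retry could close an unrelated file opened in the meantime.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical wrapper: closes fd and reports any failure, labeling
   the message with name. Returns close()'s result, 0 or -1. */
int close_checked(int fd, const char *name)
{
    int result = close(fd);

    if ( result == -1 )
        perror(name);       /* e.g. "name: Bad file descriptor" */
    return result;
}
```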

2.6.3 A First Attempt at Writing who


The main program must open the file and then enter a loop in which it repeatedly reads a single
utmp record and displays it on the screen, until all records have been read. A rough sketch of this
is in the listing below, which we call who1.c.

Listing 1. who1.c
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>     /* read(), close() */
#include <utmp.h>

void show_info(struct utmp *);  /* displays one record; still to be written */

int main()
{
    int fd;
    struct utmp current_record;
    int reclen = sizeof(struct utmp);

    fd = open(UTMP_FILE, O_RDONLY);
    if ( fd == -1 ) {
        perror(UTMP_FILE);
        exit(1);
    }

    while ( read(fd, &current_record, reclen) == reclen )
        show_info(&current_record);

    close(fd);
    return 0;
}

First observe that the first argument to the open() call is UTMP_FILE. This is a macro whose
definition is included in the <utmp.h> header file. Its value is system-dependent; it is the path to
the actual utmp file. It is usually "/var/run/utmp". We would not know about it if we did not read
the header file.

Notice which header files are included, and notice that reclen contains the number of bytes in a utmp
struct. The sizeof operator yields the number of bytes in its operand, here the type struct utmp.
reclen will be used in the read() call to read exactly one utmp structure at a time. The call to read()
is given the file descriptor returned by open(), a pointer to a memory location large enough to hold
one utmp record, and reclen, the number of bytes to be read. If the return value equals reclen, then
a full record was read. If it does not, then an incomplete record was read, the end of the file was
reached, or an error occurred. In every such case we stop reading. The show_info() function remains
to be written; it should display the contents of the current record. The perror() function is
described below.

2.6.4 What to Do with System Call Errors


In UNIX, most system calls simply return the value -1 when something goes wrong. This would be
rather useless if that were all they did, because the calling program would not know what actually
went wrong. In addition to returning -1, a failing system call stores an error code in the global
variable errno, which a program can access by including <errno.h>. Each process has its own
errno, so the errors of one process do not affect another.

The <errno.h> file defines a number of mnemonic constants for error values, such as

#define EPERM 1 /* Operation not permitted */


#define ENOENT 2 /* No such file or directory */

Your program can use these symbols directly with code such as

if ( (fd = open("myfile", O_RDONLY)) == -1 ) {
    printf("Cannot open file: ");
    if ( errno == ENOENT )
        printf("No such file or directory\n");
    else if ( errno == EPERM )
        printf("Operation not permitted\n");
    ...
}

This would be very tedious, since every program you write would need long switch statements or
cascading if-statements. It is much easier to use the UNIX library function perror() to do this for
you. The perror() function, which conforms to POSIX.1-2001, has a single string as a parameter. It
looks up the value of errno and writes the string, followed by a colon, a space, and a message
describing that value, to the standard error stream. It is declared in <stdio.h>, so you do not need
to include <errno.h> if you use it. The code snippet above is simplified by using perror():

if ( (fd = open("myfile", O_RDONLY)) == -1 ) {
    perror("Cannot open file");
    return;
}

and it would print

Cannot open file: No such file or directory

In short, the perror() function prints the string you pass it followed by the system's error message
for the current value of errno. It is a good idea to create a function to handle errors, so that you do
not have to type these lines all of the time. Very often, the error is a fatal one, meaning that the
program cannot proceed if it occurred. In this case, you would want to exit the program, calling
exit() to do so, as in

if ( (fd = open("myfile", O_RDONLY)) == -1 ) {
    perror("Cannot open file");
    exit(1);
}

The exit() function is declared in <stdlib.h>; its man page is in Section 3. A simple function for
handling fatal errors would be

#include <stdio.h>
#include <stdlib.h>

void fatal_error(char *string1, char *string2)
{
    fprintf(stderr, "Error: %s ", string1);
    perror(string2);
    exit(1);
}

You might also benefit from writing a second function to call when you do not want to terminate
the program, or you could combine the two into a single, general-purpose function that does either,
by passing a parameter to indicate the error's severity.
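The combined function suggested above might look like the following sketch. report_error() is a hypothetical name and this is only one possible design: it prints the message with perror() and exits only if fatal is nonzero; otherwise it returns the errno value it reported, so the caller can still act on it.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical error reporter. Prints "msg: <system message>" on
   stderr. If fatal is nonzero, terminates the program with status 1;
   otherwise returns the errno value that was reported. */
int report_error(const char *msg, int fatal)
{
    int saved_errno = errno;    /* save it before another call can change it */

    perror(msg);
    if ( fatal )
        exit(1);
    return saved_errno;
}
```

A caller that can recover would write, for example, if (report_error("open", 0) == ENOENT) and then try another file.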

2.6.5 Displaying login Records


This is the first attempt at show_info():

void show_info( struct utmp *utbufp )
{
    printf("%-8.8s", utbufp->ut_name);          /* the logname */
    printf(" ");
    printf("%-8.8s", utbufp->ut_line);          /* the tty */
    printf(" ");
    printf("%10ld", (long) utbufp->ut_time);    /* login time */
    printf(" ");
    printf("(%s)", utbufp->ut_host);            /* the host */
    printf("\n");                               /* newline */
}

If this were compiled and run on a system that supported this API, the output would look something
like

$ who1
system b 952601411 ()
952601423 ()
LOGIN console 952601566 ()
acotton ttyp3 964319088 (math-guest04.williams.edu)
ttypc 964319645 ()

This output differs from the output of who in two significant ways. First, there are records in the
output of who1 that do not correspond to user logins, and second, the login times are in some
strange format. Both of these problems are easily fixed.

2.6.6 A Second Attempt at Writing who


2.6.6.1 Suppressing Records That Are Not Active Logins
The file /usr/include/utmp.h contains definitions of integer constants used for the ut_type
member. They are

#define EMPTY 0
#define RUN_LVL 1
#define BOOT_TIME 2
#define OLD_TIME 3
#define NEW_TIME 4
#define INIT_PROCESS 5 /* Process spawned by "init" */
#define LOGIN_PROCESS 6 /* A "getty" process waiting for login */
#define USER_PROCESS 7 /* A user process */
#define DEAD_PROCESS 8
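When exploring a utmp file it helps to print ut_type symbolically. The hypothetical helper below maps the constants listed above to their names; it assumes a <utmp.h> that defines all nine of them, as glibc's does.

```c
#include <utmp.h>

/* Hypothetical helper: returns the symbolic name of a ut_type value. */
const char *ut_type_name(short type)
{
    switch ( type ) {
    case EMPTY:         return "EMPTY";
    case RUN_LVL:       return "RUN_LVL";
    case BOOT_TIME:     return "BOOT_TIME";
    case OLD_TIME:      return "OLD_TIME";
    case NEW_TIME:      return "NEW_TIME";
    case INIT_PROCESS:  return "INIT_PROCESS";
    case LOGIN_PROCESS: return "LOGIN_PROCESS";
    case USER_PROCESS:  return "USER_PROCESS";
    case DEAD_PROCESS:  return "DEAD_PROCESS";
    default:            return "UNKNOWN";   /* e.g. system-specific extras */
    }
}
```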

New entries in the utmp file are created by the init process and are initialized with a ut_type of
INIT_PROCESS. Recall from Chapter 1 that what happens when a user logs in depends upon whether
it is a console login, a login on an xterm window, or a login over a network using a protocol such
as SSH. In all cases, the ut_type of the entry is changed from INIT_PROCESS to LOGIN_PROCESS,
either by a getty process or a similar process, depending on the source of the login. The getty
(or similar) process prints the login prompt, collects the user's input to the prompt (which should
be a username), and creates a login process, handing the user's username to the login process. The
login process prompts for the password and authenticates it. If it is valid, it changes the ut_type
to USER_PROCESS. When a user logs out, the ut_type is changed to DEAD_PROCESS.

This implies that the ut_type member of a currently logged-in user record will have the value
USER_PROCESS. No other utmp record will be of type USER_PROCESS, and so all we need to do to
suppress non-user records is to print only those records whose ut_type member is USER_PROCESS.
The show_info() function will be modified by the inclusion of this check:

void show_info( struct utmp *utbufp )
{
    if ( utbufp->ut_type != USER_PROCESS )
        return;
    ...
}

This solves the first problem.
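The filter can be checked in isolation. The sketch below uses a hypothetical function, count_user_records(), that applies the same read-one-record-at-a-time loop to an already-open descriptor and counts only the USER_PROCESS records. Taking a descriptor rather than a path is our design choice; it makes it easy to try the loop on a few synthetic records instead of the live utmp file.

```c
#include <unistd.h>     /* read(), ssize_t */
#include <utmp.h>       /* struct utmp, USER_PROCESS */

/* Hypothetical helper: reads struct utmp records from fd, one record
   per read() call, and returns how many are active user logins. */
int count_user_records(int fd)
{
    struct utmp record;
    int count = 0;

    /* stop at end of file, on an error, or on a partial record */
    while ( read(fd, &record, sizeof record) == (ssize_t) sizeof record )
        if ( record.ut_type == USER_PROCESS )
            count++;
    return count;
}
```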

2.6.6.2 Displaying Login Time in Human-Readable Form


Solving the second problem requires an understanding of how calendar, or universal, time is
represented in UNIX systems and what functions are provided in the API for manipulating time values.

UNIX represents time as the number of seconds elapsed since midnight (00:00:00), January 1, 1970,
Coordinated Universal Time (UTC),[11] a moment known as the Epoch. UTC is essentially like
Greenwich Mean Time except that it includes occasional leap seconds to keep it synchronized with
the earth's rotation.[12]

UNIX stores time in objects of type time_t, the implementation of which is not standardized. On
many systems, particularly older and 32-bit ones, time_t is a typedef for a signed 32-bit integer. Such
implementations will fail in January 2038, when the count overflows. Representing time as an integer
number of seconds since the Epoch makes it easy for the kernel to update times, but not very easy
for a human to determine the time.
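To see where the 32-bit limit falls, the sketch below converts a time_t to a broken-down UTC time with gmtime(), which, unlike localtime(), ignores the local time zone, and formats it with strftime(). utc_string() is a hypothetical name. The largest value a signed 32-bit integer can hold is 2^31 - 1 = 2147483647; that many seconds after the Epoch is 03:14:07 UTC on January 19, 2038, and one second later a 32-bit counter overflows.

```c
#include <time.h>

/* Hypothetical helper: writes t as "YYYY-MM-DD HH:MM:SS" in UTC into
   buf. Returns the number of characters written, or 0 on failure. */
size_t utc_string(time_t t, char *buf, size_t size)
{
    struct tm *tm = gmtime(&t);     /* broken-down time in UTC */

    if ( tm == NULL )
        return 0;
    return strftime(buf, size, "%Y-%m-%d %H:%M:%S", tm);
}
```

With this helper, utc_string(0, buf, sizeof buf) produces "1970-01-01 00:00:00", and utc_string((time_t) 2147483647, buf, sizeof buf) produces "2038-01-19 03:14:07".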

How can we learn more about UNIX time and the various parts of the API related to it? The
answer again is to do a man page search. If you search on the keyword "time", you will find too
many man pages that refer to time. A second keyword will be needed to refine the search, perhaps
"convert" or "transform" or something similar, to capture functions that transform time from one
form to another. Trying

$ man -k time | grep transform

we will see several functions related to time, including ctime() and localtime(). The man page
will also include a reference to the header file, <time.h>, which must be included for most of these
functions. These functions share a single man page. Reading this page reveals that ctime() converts
a time_t time into a human-readable string of the form

"Mon Aug 11 23:12:06 2003\n"

To be precise, the ctime() function is declared as

char *ctime(const time_t *clock);

Observe that the argument is the address of a time_t value, not the value itself. The return value
is a pointer to a string consisting of a 3-letter day abbreviation, a 3-letter month abbreviation, the
day of the month, the 24-hour time in hours, minutes, and seconds, and the 4-digit year, followed by
a newline. The string is allocated statically by ctime() and might be overwritten by later calls, so it
is best to copy it into a local variable if it needs to be available later.

Note 1. ctime() is one of many functions that return a pointer to a string that is allocated statically.
Make sure that you understand what this means. The string itself is allocated by ctime(), and a
pointer to that memory is returned to the caller. Subsequent calls to ctime() will overwrite that same
memory, so the caller will be unable to retrieve the old value unless it was copied to a local variable.
Also, the caller is not responsible for freeing the memory allocated to the string; that is handled by
the library. This is just one of many functions that are not thread-safe, a topic we discuss below.

[11] The abbreviation UTC is a compromise between the English and French abbreviations. In English, it would be
CUT and in French, TUC.
[12] The earth's rotation can vary due to astronomical conditions. UNIX systems are not required to represent exact
UTC: POSIX time ignores leap seconds and counts every day as exactly 86,400 seconds.
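The advice in Note 1 can be captured in a small helper. The sketch below is our own, with the hypothetical name save_ctime(): it copies ctime()'s result into a buffer supplied by the caller, so that later calls to ctime() cannot overwrite the saved copy. The 26-character size assumes the usual 24-character string plus the newline and the terminating null, which holds for 4-digit years.

```c
#include <string.h>
#include <time.h>

/* Hypothetical helper: copies ctime()'s result for t into buf, which
   must have room for 26 chars. Returns buf, or NULL if ctime() fails. */
char *save_ctime(time_t t, char buf[26])
{
    const char *s = ctime(&t);      /* points into ctime()'s static buffer */

    if ( s == NULL )
        return NULL;
    strncpy(buf, s, 25);            /* copy it before anyone calls ctime() again */
    buf[25] = '\0';
    return buf;
}
```

Two saved times, for example the values 0 and 86400, remain distinct strings after both calls, whereas the raw pointers returned by two ctime() calls may refer to the same buffer.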

The localtime() function also takes the address of a time_t value, but it returns a pointer to a struct
tm, a structure whose members are the various components of time, such as the day of the week,
the month, the day, and the year.

If you read through the man page carefully, which you should, you will find near the end the
conformance section. It states:

CONFORMING TO
POSIX.1-2001. C89 and C99 specify asctime(), ctime(), gmtime(), localtime(),
and mktime(). POSIX.1-2008 marks asctime(), asctime_r(), ctime(), and ctime_r()
as obsolete, recommending the use of strftime(3) instead.

The ctime() function is disparaged at this point. One should instead use strftime(), whose
prototype is

#include <time.h>
size_t strftime(char *s, size_t max, const char *format, const struct tm *tm);

This function, unlike ctime(), allows the calling program to specify the format of the character
string to be created. It is also safer to use, in that the string is passed as an argument to the function,
allocated by the caller, instead of being allocated statically and returned as the function value. The
first argument is a pointer to the string to be filled, the second is the size of the array of chars to
fill, the third is a format for the string, and the last is the tm structure containing the broken-down
time representation.
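Putting localtime() and strftime() together, the sketch below produces the same "Aug 11 23:12" style of string that who displays. The helper name format_login_time() is hypothetical, and the format "%b %e %H:%M" is our choice: %b is the abbreviated month name in the current locale, %e is the day of the month padded with a space, and %H:%M is the 24-hour time. In the default "C" locale the result is always 12 characters long.

```c
#include <time.h>

/* Hypothetical helper: formats t in local time as "Mon DD HH:MM"
   (e.g. "Aug 11 23:12") into buf. Returns the length of the result,
   or 0 if t could not be converted or the result did not fit. */
size_t format_login_time(time_t t, char *buf, size_t size)
{
    struct tm *tm = localtime(&t);  /* broken-down local time */

    if ( tm == NULL )
        return 0;
    return strftime(buf, size, "%b %e %H:%M", tm);
}
```

Unlike the ctime() approach, the caller owns buf, and the result does not depend on the fixed layout of ctime()'s string.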

The format specification is described in great detail in the man page for the function. It is similar
to the format for the printf() function in that it is a string literal enclosed in double quotes,
with conversion specifications of the form %x, where x is a character to be replaced. For example,
%M represents minutes as a decimal number in the range 00 to 59, and %b is the abbreviated month
name in the current locale. The phrase "in the current locale" means that the locale settings of the
user are used in deciding the exact string that %b will produce. Every user has a locale in UNIX.
The topic of locales will be covered in a later section. The important point now is that strftime(),
unlike ctime(), can use locale information in determining the format of the output string. In
Chapter 3 we will use this function to display time with more control. For our implementation of the
who command, we will use ctime().

The who program only displays the date, hours, and minutes. For the above example, it would
display only "Aug 11 23:12". Our implementation of who must extract this substring from the
larger string. In other words, given

"Mon Aug 11 23:12:06 2003\n"

it needs to print

"Aug 11 23:12"

A simple way to achieve this, perhaps not obvious, is to use pointer arithmetic to print only those
characters of the source string in which we are interested. The first character we want is 4 characters
after the start of the string, and the substring is exactly 12 characters long. Assuming that t is a
time_t variable containing the time to be printed, the following printf()[13] call will do the trick:

printf("%12.12s", ctime(&t) + 4 );

which prints the 12 characters starting at position 4 in the full string. In the format %12.12s, the
field width 12 pads the output to at least 12 characters and the precision .12 limits it to at most 12,
so exactly 12 characters are printed. The complete program is shown below. You should study it
carefully.

Listing who2.c
// This solves the time display problem and it filters records

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <utmp.h>
#include <fcntl.h>
#include <time.h>

void show_time(time_t);
void show_info(struct utmp *);

int main(int argc, char *argv[])
{
    struct utmp utbuf;          // read info into here
    int utmpfd;                 // read from this descriptor
    int reclen = sizeof(utbuf);

    if ( (utmpfd = open(UTMP_FILE, O_RDONLY)) == -1 ) {
        perror(UTMP_FILE);
        exit(1);
    }

    while ( read(utmpfd, &utbuf, reclen) == reclen )
        show_info(&utbuf);
    close(utmpfd);
    return 0;
}

void show_info(struct utmp *utbufp)
// displays the contents of the utmp struct only if it is a user
// login, with time in human-readable form, and host if
// not null
{
    if ( utbufp->ut_type != USER_PROCESS )
        return;

    printf("%-8.8s", utbufp->ut_name);      /* the logname */
    printf(" ");
    printf("%-8.8s", utbufp->ut_line);      /* the tty */
    printf(" ");
    show_time(utbufp->ut_time);             /* login time */
    printf(" ");
    if ( utbufp->ut_host[0] != '\0' )       /* the host */
        printf("(%s)", utbufp->ut_host);
    printf("\n");
}

void show_time(time_t timeval)
// displays time in a format fit for human consumption
// uses ctime to build a string then picks parts out of it
// Note: %12.12s prints a string 12 chars wide and LIMITS
// it to 12 chars.
{
    char *timestr = ctime(&timeval);
    // string looks like "Sat Sep  3 16:43:29 2011\n"

    // print 12 chars starting at char 4
    printf("%12.12s", timestr + 4);
}

[13] If you are not familiar with the following C functions, you should take the time to familiarize yourself with them:
printf, fprintf, sprintf, scanf, fscanf, and sscanf. These are all part of C, and hence of C++, and any C or C++
book should contain adequate descriptions of them. You can also look at the man pages for them. Once you know
printf and scanf, the others are trivial to understand. The best way to learn them, of course, is to write a few very
simple programs.

2.6.7 A Third Version of who


The preceding versions of who read the data from the utmp file using the read() system call, reading
one utmp struct at a time. An alternative method of accessing the data in the file is to take advantage
of a data abstraction layer that the API makes available. When we did the man page search for
man pages related to the utmp file, we ignored the functions found on the page named getutent:

endutent [getutent] (3) - access utmp file entries
getutent (3) - access utmp file entries
getutid [getutent] (3) - access utmp file entries
getutline [getutent] (3) - access utmp file entries
pututline [getutent] (3) - access utmp file entries
setutent [getutent] (3) - access utmp file entries
utmpname [getutent] (3) - access utmp file entries

We now take a look at what that page has to offer. The beginning of the page contains the following
(depending on what system you have):

SYNOPSIS
#include <utmp.h>
struct utmp *getutent(void);
struct utmp *getutid(struct utmp *ut);
struct utmp *getutline(struct utmp *ut);
struct utmp *pututline(struct utmp *ut);
void setutent(void);
void endutent(void);
int utmpname(const char *file);
DESCRIPTION
New applications should use the POSIX.1-specified "utmpx"
versions of these functions; see CONFORMING TO.

The very first sentence in this man page tells us that these functions are not specified by POSIX.1,
and that there are utmpx versions of these functions that are. We will ignore this warning for the
moment and see how to use these non-POSIX functions, simply because there is something that needs
to be explained about the POSIX.1-compliant interface, to which we will return afterward.

The man page basically tells us that there is a simple way of reading the records in a utmp file,
requiring just four steps:

1. Use utmpname() to select the file that should be accessed by the other functions.

2. Call setutent() to rewind the file pointer to the beginning of the file.

3. Repeatedly call getutent() to get the next utmp record from the file; getutent() will return
a NULL pointer after it has read the last record from the file.

4. Call endutent() when we have read all of the records.

In other words, this interface provides a hidden iterator over the utmp file: setutent() initializes it,
getutent() advances it successively, and endutent() closes the file when it is no longer needed. In
addition, the utmpname() function simply needs to be told the pathname of the file; the other
functions take care of opening it.

The man page also mentions that _PATH_UTMP is a macro whose value is the path to the utmp file.
We already knew that UTMP_FILE contained that path, but if we dig a little deeper by actually
reading the header files, we will discover that the <paths.h> header file defines _PATH_UTMP and
_PATH_WTMP and that <utmp.h> defines UTMP_FILE as another name for _PATH_UTMP.

We can put all of this together to create a simpler version of who, named who3. In this version we
add the extra feature that the user can optionally supply the word wtmp on the command line if
she wants to see records in the wtmp file instead. The show_info() and show_time() functions are
the same, so we just display the main program in the listing.

Listing who3.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>     /* strcmp() */
#include <unistd.h>
#include <utmp.h>
#include <fcntl.h>
#include <time.h>

void show_info(struct utmp *);  /* same as in who2.c */

int main(int argc, char *argv[])
{
    struct utmp *utbufp;

    if ( (argc > 1) && (strcmp(argv[1], "wtmp") == 0) )
        utmpname(_PATH_WTMP);
    else
        utmpname(_PATH_UTMP);

    setutent();
    while ( (utbufp = getutent()) != NULL )
        show_info(utbufp);
    endutent();
    return 0;
}

This program is not thread-safe. Many functions in the various UNIX libraries use static variables
to store their results. These variables act like global variables within the programs that call these
functions. If a program is multithreaded, its threads can corrupt each other's data if they use
these unsafe functions in an overlapping way. Thread-safe functions do not have this problem. A
thread-safe version of the who3 program can use getutent_r(), which is a GNU thread-safe version
of getutent().

The man page tells us that to use the getutent_r() function, we have to define a macro, the
_GNU_SOURCE macro, before including the header file <utmp.h>. That is the purpose of the
following lines from that man page:

The above functions are not thread-safe. Glibc adds reentrant versions
#define _GNU_SOURCE /* or _SVID_SOURCE or _BSD_SOURCE */
#include <utmp.h>
int getutent_r(struct utmp *ubuf, struct utmp **ubufp);

The macro definition of _GNU_SOURCE is required because the <utmp.h> header file contains feature
test macros. Feature test macros can be used to control which definitions are exposed in the system
header files when a program is compiled. This is important for creating portable applications,
because it prevents nonstandard definitions from being exposed in the program. If you remove the
definition of _GNU_SOURCE from your program and try to use getutent_r(), you may get a
compile-time error, depending on which features your compiler enables by default, because the
declaration of this function in the header file is guarded by a conditional preprocessor directive. It
is essentially of the form

#ifdef _GNU_SOURCE
extern int getutent_r (struct utmp *__buffer, struct utmp **__result) __THROW;
/* more stuff here */
#endif

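If you put the definition of _GNU_SOURCE after the include directive, it will be useless, because it will
not be defined when the header file is preprocessed by gcc, and so in this case too you may get an
error message. In fact, a feature test macro must be defined before any header file is included,
because the first system header processed determines which features are exposed.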
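The following sketch shows both points together: _GNU_SOURCE is defined on the very first line, before any #include, and getutent_r() is used in place of getutent(). count_logins_r() is a hypothetical name. getutent_r() copies each record into a buffer supplied by the caller instead of returning a pointer to static storage. Note, as an assumption worth checking on your system, that the current position in the file behind setutent() and getutent_r() is still shared by the whole process.

```c
#define _GNU_SOURCE     /* must precede every #include in the file */
#include <utmp.h>

/* Hypothetical helper: counts the USER_PROCESS entries in the
   utmp-format file at path using getutent_r(). Returns -1 if
   utmpname() fails. */
int count_logins_r(const char *path)
{
    struct utmp buf;        /* each record is copied here */
    struct utmp *result;    /* set by getutent_r() to point at buf */
    int count = 0;

    if ( utmpname(path) == -1 )
        return -1;
    setutent();
    while ( getutent_r(&buf, &result) == 0 )    /* -1 when no records remain */
        if ( buf.ut_type == USER_PROCESS )
            count++;
    endutent();
    return count;
}
```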

The feature_test_macros man page describes everything you need to know to use these macros.

The main program of this thread-safe who, which we call who4.c, is almost the same as that of
who3.c:

Listing who4.c
#define _GNU_SOURCE     /* must come before every #include */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>     /* strcmp() */
#include <unistd.h>
#include <utmp.h>
#include <fcntl.h>
#include <time.h>

void show_info(struct utmp *);  /* same as in who2.c */

int main(int argc, char *argv[])
{
    struct utmp utbuf, *utbufp;

    if ( (argc > 1) && (strcmp(argv[1], "wtmp") == 0) )
        utmpname(_PATH_WTMP);
    else
        utmpname(_PATH_UTMP);

    setutent();
    while ( getutent_r(&utbuf, &utbufp) == 0 )
        show_info(&utbuf);
    endutent();
    return 0;
}

2.6.8 A POSIX-compliant Version


There is yet another version of the who program, named who_p.c, in the demos directory for this
chapter on the server. This version is distinguished by the fact that it uses the POSIX-compliant
utmpx interface. The utmp structure is not standard across all versions of UNIX. The one we
described above is the GNU implementation, which is what is found on Linux systems. This GNU
version includes members that may not be present on other systems. In an effort to standardize the
utmp interface, the POSIX standards since 2001 have replaced the definition of the utmp structure
with a utmpx structure. This structure is only guaranteed to have the following members:

char            ut_user[]   User login name.
char            ut_id[]     Unspecified initialization process identifier.
char            ut_line[]   Device name.
pid_t           ut_pid      Process ID.
short           ut_type     Type of entry.
struct timeval  ut_tv       Time entry was made.

In addition, the functions setutent(), getutent(), and endutent() are replaced by the corresponding
functions setutxent(), getutxent(), and endutxent(). In general, the utmpx structure may define a
different set of members than those found in a utmp structure. Linux systems actually define the
utmpx structure to be the same as the utmp structure, unless the _GNU_SOURCE macro is defined. In
addition, Linux systems define a larger set of allowed values of the ut_type member than does POSIX.
Programs that are meant to be portable can use conditional compilation with feature test macros to
detect which structure is actually on the system at compile time. The who_p.c program demonstrates
how this is done, but is not included in these notes.
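As a rough idea of what the utmpx interface looks like in use, here is a minimal sketch (not the author's who_p.c) of a who-style loop built only from setutxent(), getutxent(), and endutxent(). The function name show_logins_utmpx() is hypothetical, and the sketch assumes the system's default utmpx database; for each USER_PROCESS entry it prints the user name, the line, and the login time in seconds since the Epoch.

```c
#include <stdio.h>
#include <utmpx.h>

/*
 * Sketch of a who-like loop using only the POSIX utmpx interface.
 * Prints one line per USER_PROCESS entry and returns how many it printed.
 */
int show_logins_utmpx(void)
{
    struct utmpx *up;
    int count = 0;

    setutxent();                           /* rewind to the first entry */
    while ((up = getutxent()) != NULL) {
        if (up->ut_type != USER_PROCESS)   /* skip boot, init, dead entries */
            continue;
        /* ut_user and ut_line need not be NUL-terminated: bound them */
        printf("%-8.*s %-12.*s %ld\n",
               (int) sizeof(up->ut_user), up->ut_user,
               (int) sizeof(up->ut_line), up->ut_line,
               (long) up->ut_tv.tv_sec);
        count++;
    }
    endutxent();                           /* close the database */
    return count;
}
```

Calling show_logins_utmpx() from a main() prints one line per login session; on a system with no utmpx database, getutxent() returns NULL immediately, so nothing is printed and the count is 0.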

2.6.9 Summary
The preceding set of implementations of the who command demonstrates that the man pages and
header files can be used to learn enough about a command to implement it. The utmp interface
may not be the same on every UNIX system, and as a result there are several different methods of
approaching the problem. One can use the GNU, non-POSIX, thread-safe version of the interface,
for example, or the POSIX-compliant utmpx interface. One can also use the lower-level system calls,
e.g. read(), to access either the utmpx or the utmp structure directly. A truly portable solution
would use feature test macros to conditionally compile the code depending on what system it is to
be run on. The exercise introduced various concepts along the way, but we are still not finished
with it. Later we will return to the problem with a more efficient solution.

2.7 Using a File in Read/Write Mode


Many applications need to have a file open for both reading and writing. A good example of this is
the logout command. The logout command has to update the utmp file, finding within it the record
to be updated (i.e., reading it) and then modifying that record (writing it). Most I/O libraries allow
a file to be opened for both reading and writing.

2.7.1 Opening a File in Read/Write Mode


Recall that the open() system call's second parameter is a set of flags stored in an integer, and
that the flags must include one of the access mode flags: O_RDONLY, O_WRONLY, and O_RDWR. If the
access mode is set to O_RDWR, then the file is opened in read/write mode. In read/write mode, the
process can read from and write to the file. The file is not truncated as it would be if opened with
the O_TRUNC flag. Instead it is opened with the current position pointer pointing to the start of the
file. The current position pointer is a member of the open file structure, the data structure that is
created by the kernel when a file is opened. It points to the position of the next byte to read or
write in the file.

For example, to open the le whose path is stored in the C-string file_to_open, one could write

if ( (fd = open(file_to_open, O_RDWR)) == -1 ) {
    perror(file_to_open);
    // handle error here
}

2.7.2 Logout Records


When a user logs out of a UNIX system, the kernel does some bookkeeping tasks. One of the tasks is
to update the utmp file to indicate that the user logged out. In particular, it has to change the utmp
record for the login session by changing the ut_type member from USER_PROCESS to DEAD_PROCESS.
It also has to change the ut_time member to the current time and zero out the ut_user and ut_host
members.

In short, the logout process has to do the following:

1. Open the utmp file for reading and writing.

2. Read the utmp file until finding the record for the terminal from which the logout took place.

3. Modify a copy of the utmp record in the process's memory, and replace the utmp record in the
file with the modified one, i.e., modify the utmp file.

4. Close the utmp file.

The first and last steps need no discussion. The second step requires being able to identify which
utmp record in the file corresponds to the one logout is trying to modify. It cannot use the ut_user
member because a single user might have several lines open at a time. The piece of information that
is unique is stored in the ut_line member, which holds the name of the pseudo-terminal as a string
such as "pts/4". Only one person can be using a given terminal at a time, so it is sufficient to
match the line.

The more interesting part of this task is how to replace the utmp record in the file. The record
may be in the middle of the file, so this operation involves replacing a fixed-size sequence of bytes
starting at some specific position in a file with a sequence of the exact same size.

2.7.3 Using lseek to Move the File Pointer


As noted above, when a file is opened and a file descriptor is returned for it, a data structure is
created by the kernel. This data structure represents the connection to the file. The current position
pointer of the data structure is the position of the next byte to read or write in the file. If the file
is open for reading, a read of N bytes starts at this position, and then the current position pointer
is advanced by the number of bytes actually read. If it is open for writing and writes N bytes, it
writes starting at the current position and then advances it N bytes. When a file is first opened,
the current position pointer is at the start of the file; it is at the end only if the file is empty
(as it is right after creat()), or before each write if the file was opened with the O_APPEND flag.

The lseek() system call changes the current position pointer in an open le.

#include <sys/types.h>
#include <unistd.h>

off_t lseek(int fd, off_t dist, int base);

lseek() is given a file descriptor, fd, a distance in bytes, dist, and an integer flag, base. base
can be one of three values. The distance, dist, is used by lseek() to move the current position
pointer. If dist is positive, it moves forward; if it is negative, it moves backwards. The value of
base determines the starting position from which the current position pointer is to be moved.
The three values are

SEEK_SET  the distance, dist, is measured forwards from the start of the file,

SEEK_CUR  the distance, dist, is relative to the current position pointer and may be positive or
          negative,

SEEK_END  the distance, dist, is relative to the end of the file and may be positive or negative.

If lseek() is successful, its return value is the resulting offset location as measured in bytes from
the beginning of the file; otherwise it returns -1.

When the value of the offset is positive and the base is SEEK_END, the file pointer is moved beyond
the end of the file. Data can be written to this position, and this in effect creates a hole in the file.
For example, if a file is currently open and has the contents "123456789", and a seek is performed
that moves the file pointer 5000 bytes past the end, after which the characters "abcde" are written
to the file, then the file size will be 5014 bytes, even though there is a hole of 5000 bytes within it.
More will be said about this in Chapter 3.

The lseek() call can be used to code the third step of the logout procedure.

2.7.4 Updating the utmp File on Logout


The problem with updating the utmp file is the following. We have to find the record that corresponds
to the login record on the line on which the logout occurred. Therefore we need to repeatedly read
a utmp record and check whether the ut_line member matches the line. When we find the record,
which has been read into a local variable in our function, we modify it and then have to write it
back. But at this point, the current position pointer has already been advanced to point to the
record immediately following the one we just read. Figure 2.1 illustrates this.

In the figure, the matching record is numbered k. After it is found, the pointer has been advanced
to the start of record k+1. In order to write the modified record where the original was, we need
to move the current position pointer back with lseek(). The following program demonstrates the
key ideas.

[Figure: two copies of a row of consecutive utmp records: ... record k-1 | record k | record k+1 | record k+2 ...]
Figure 2.1: Updating a utmp record in read/write mode

Listing who5.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <utmp.h>
#include <sys/time.h>

int main(int argc, char *argv[])
{
    struct utmp utbuf;                    // stores a single utmp record
    struct timeval now;                   // current time of day
    int fd;                               // file descriptor for utmp file
    int utsize = sizeof(utbuf);
    int utlinesize = sizeof(utbuf.ut_line);

    if (argc < 3) {                       // check usage
        fprintf(stderr, "usage: %s <utmp-file> <line>\n", argv[0]);
        exit(1);
    }

    // try to open utmp file
    if ((fd = open(argv[1], O_RDWR)) == -1) {
        fprintf(stderr, "Cannot open %s\n", argv[1]);
        exit(1);
    }

    // If the line is longer than a ut_line permits, do not continue
    if (strlen(argv[2]) >= UT_LINESIZE) {
        fprintf(stderr, "Improper argument: %s\n", argv[2]);
        exit(1);
    }

    while (read(fd, &utbuf, utsize) == utsize)
        if ((strncmp(utbuf.ut_line, argv[2], utlinesize) == 0)
            && (utbuf.ut_user[0] != '\0')) {
            utbuf.ut_type = DEAD_PROCESS;
            utbuf.ut_user[0] = '\0';
            utbuf.ut_host[0] = '\0';

            // ut_tv is not always a struct timeval (on some glibc
            // platforms its fields are 32 bits), so copy field by field
            if (gettimeofday(&now, NULL) == 0) {
                utbuf.ut_tv.tv_sec = now.tv_sec;
                utbuf.ut_tv.tv_usec = now.tv_usec;
                // back up one record, then overwrite it
                if (lseek(fd, -utsize, SEEK_CUR) == -1
                    || write(fd, &utbuf, utsize) != utsize) {
                    perror(argv[1]);
                    exit(1);
                }
            }
            else {
                fprintf(stderr, "Error getting time of day\n");
                exit(1);
            }
            break;
        }
    close(fd);
    return 0;
}

Notice that every system call is tested for failure before its result is used (except for the call to
close()). Here, the calls are embedded within the conditional expressions of the if and while
statements above. The first if checks whether the record read in the while condition has the same
terminal line as the one we are looking for (given by argv[2]) and whether the user member is
not an empty string. If so, the type member ut_type of the record is set to DEAD_PROCESS, the
user and host members are set to empty strings, and the time member, ut_tv, is updated to the
current time. If this is successful, the lseek() call moves the current position pointer back to the
start of the matched record, so that the write operation that follows will replace the old record. If
the write operation is reached and executes without error (determined by checking that the number
of bytes written is equal to the number requested to be written), then the program returns 0 for
success.
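The read, modify, seek back, write pattern of who5.c does not depend on utmp at all. The sketch below applies it to a hypothetical file of fixed-size struct rec records; the struct, its key and value fields, and update_record() are invented for illustration, which makes the technique easy to try without touching a real utmp file.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical fixed-size record standing in for struct utmp. */
struct rec {
    char key[8];    /* plays the role of ut_line */
    int  value;     /* plays the role of the fields logout changes */
};

/*
 * Open path in read/write mode, find the first record whose key matches,
 * change its value, and write it back in place.
 * Returns 0 if updated, 1 if no record matched, -1 on error.
 */
int update_record(const char *path, const char *key, int newval)
{
    struct rec r;
    ssize_t n;
    int status = 1;                               /* assume no match */
    int fd = open(path, O_RDWR);

    if (fd == -1)
        return -1;
    while ((n = read(fd, &r, sizeof r)) == (ssize_t) sizeof r) {
        if (strncmp(r.key, key, sizeof r.key) != 0)
            continue;
        r.value = newval;                         /* modify the copy */
        /* read() left the pointer at the next record: back up one */
        if (lseek(fd, -(off_t) sizeof r, SEEK_CUR) == -1
            || write(fd, &r, sizeof r) != (ssize_t) sizeof r)
            status = -1;
        else
            status = 0;
        break;
    }
    if (n == -1)                                  /* read() failed */
        status = -1;
    if (close(fd) == -1)
        status = -1;
    return status;
}
```

As in who5.c, the key step is the lseek() back by one record size with SEEK_CUR, because the preceding read() has already advanced the pointer past the matched record.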

2.7.5 Another Use of lseek


One other use of lseek() is determining an open file's size without having to look at its properties.
Recall that the return value of lseek() is the location of the file pointer after it has been moved,
relative to the beginning of the file, and expressed in bytes. If we move the file pointer to the end
of the file using lseek(), then we get its size as the return value. If fd is a file descriptor for the
given file, then

off_t filesize = lseek(fd, 0, SEEK_END);

stores the size of the file into the variable filesize (or -1 on error). Note that this leaves the file
pointer at the end of the file. We will make use of this soon.
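Because the idiom above leaves the file pointer at the end of the file, a helper often saves and restores it. Here is a sketch of that approach; the names file_size() and path_size() are hypothetical. It remembers the current position with lseek(fd, 0, SEEK_CUR), seeks to the end to learn the size, and then seeks back.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Size of the file open on fd, found with lseek(); the current position
 * pointer is restored before returning.  Returns -1 on error.
 */
off_t file_size(int fd)
{
    off_t here = lseek(fd, 0, SEEK_CUR);      /* remember the position */
    if (here == -1)
        return -1;                            /* e.g. fd is a pipe */

    off_t end = lseek(fd, 0, SEEK_END);       /* offset of end == size */
    if (end == -1 || lseek(fd, here, SEEK_SET) == -1)
        return -1;
    return end;
}

/* Convenience wrapper: size of the file named by path, or -1. */
off_t path_size(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;
    off_t size = file_size(fd);
    close(fd);
    return size;
}
```

On a pipe, lseek() fails with ESPIPE, so file_size() returns -1 there; the idiom only works for descriptors that support seeking, such as regular files.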

2.8 Performance and Efficiency: Writing the cp Command


The who program was an exercise in reading a system data file and extracting information from
it. It was a naive start, in that we did not pay much attention to its efficiency, which is of utmost
concern with most software. To demonstrate the problem a bit more clearly, we will implement a
different command, one whose efficiency or lack thereof will be much more obvious. Then we will
take what we learned from that exercise and apply it to the who program in our final version. The
command of interest is the cp command, which copies one or more files or directories.

2.8.1 What cp Does


If you are familiar with the cp command you can skip this section. There are several different ways
in which the cp command can be used. The simplest is to make a copy of a single file:

$ cp source_file target_file

Whether or not target_file already exists, cp makes a copy of source_file named target_file.
If it does exist, it will be overwritten, an act known as clobbering. This is dangerous, as you cannot
recover the file once you have clobbered it. To prevent accidental overwrites, the interactive option
-i should always be used, as in

$ cp -i source_file target_file
cp: overwrite `target_file'? n

It is a good idea to define an alias in the .bashrc file,

alias cp='cp -i'

so that you never forget to use the interactive mode.

If a new file is created, it gets the permission bits of the source file (subject to the process's umask)
and is owned by the user who ran cp. If an existing file is overwritten, it retains the permissions and
ownership it had before the copy. No other attributes are preserved in a copy. To preserve the
time-stamps and other attributes, you must use the -p (p for preserve) option.

Another form of the cp command is

$ cp source_file ... target_dir

in which the very last word on the command line, target_dir, is a directory and all preceding
words are non-directory files. In this case, if the directory does not exist, it is an error. Otherwise
all of the source files are copied into the directory with their existing permissions and names. If any
names already exist in the target directory, the rules described above apply.

In the last form,

$ cp -r|-R source source ... target_dir

the sources can include directory names. All of the files and directories specified on the command
line, up to but not including target_dir, are copied into target_dir, which must already exist.
The -r or -R option must be specified; otherwise it is an error. The -r specifies that the
directories will be copied recursively. The -R is essentially the same; the difference has to do with
how they handle pipes, which is unimportant now.

For the remainder of the chapter, we try to understand the implementation of the simple form of
the command, without any options.


2.8.2 Opening/Creating Files For Writing


The cp command has to create a file if it does not exist and open it for writing, or overwrite it if it
does exist. To overwrite a file, it is first truncated, i.e., its length is set to 0, and then the new data
is written to the empty file.

2.8.2.1 Creating/Truncating Files


The first task is to learn how to create files and truncate them. In fact, one call accomplishes both.
The creat() system call opens a file for writing: if the file exists, it first truncates it to length 0; if
it does not exist, it creates it. Notice that there is no "e" at the end of creat. If you type man
creat you will get the man page for the open() system call:

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
int open(const char *pathname, int flags);
int open(const char *pathname, int flags, mode_t mode);
int creat(const char *path, mode_t mode);

The creat() system call has two arguments, a C string and a mode_t. The string should contain
the path name of the file to be created and the mode_t specifies the file's mode, i.e., its permission
string, as an octal number. For example,

fd = creat("prototype", 0751);

creates a file named prototype in the current working directory, if it does not exist, with permission
0751 (owner can read, write, and execute, group can read and execute, others can execute only)
provided that the process's umask does not modify the permission. Umasks are covered in the next
chapter. If the file exists, the mode argument is ignored and the file is truncated.[14] In either case,
upon successful termination of the call, fd is a file descriptor associated with the write-only
connection to the file.
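The creat(2) man page describes creat() as equivalent to open() with the flags O_CREAT|O_WRONLY|O_TRUNC. The sketch below spells out that equivalence and checks the truncation behavior: after data is written to a file, a second creat-style call on the same path leaves it with length 0. The names my_creat() and size_after_recreat() are hypothetical.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Same effect as creat(path, mode), per the creat(2) man page. */
int my_creat(const char *path, mode_t mode)
{
    return open(path, O_WRONLY | O_CREAT | O_TRUNC, mode);
}

/*
 * Create path with creat() and write some bytes to it, then "create" it
 * again with my_creat() and return the resulting size (expected: 0),
 * or -1 on error.
 */
off_t size_after_recreat(const char *path)
{
    struct stat sb;
    int fd = creat(path, 0644);

    if (fd == -1)
        return -1;
    if (write(fd, "some data", 9) != 9) {
        close(fd);
        return -1;
    }
    close(fd);

    fd = my_creat(path, 0600);   /* file exists: mode ignored, data discarded */
    if (fd == -1)
        return -1;
    close(fd);
    return stat(path, &sb) == 0 ? sb.st_size : -1;
}
```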

2.8.2.2 Writing to Files


Having opened a file for writing, the next step is to write data into it. The write() system call
is a symmetric counterpart to the read() call. It is used for writing sequences of bytes to the file
specified by a given file descriptor:

#include <unistd.h>
ssize_t write(int fildes, const void *buf, size_t nbyte);
[14] It is possible to prevent the file from being overwritten in case it exists, but not if you use the
creat() call to try to create it. Instead the open() call must be used. Chapter 4 covers the various
methods of opening a file for writing.

The size_t type stores the sizes of things in bytes. It is usually a typedef of an unsigned long
integer, which may be 32 or 64 bits. The ssize_t type is almost the same as the size_t type.
It differs only in that it is signed and can therefore also store -1. If successful, the write() call
transfers up to nbyte bytes from the memory location pointed to by buf in the process's address
space to the position of the file pointer in the file associated with fildes, and returns the number of
bytes transferred. If the kernel cannot copy any of the data, write() returns -1.

The word "buffer" is used to describe the second parameter in the read() and write() system
calls. It is declared as a void pointer. It is called a buffer because it is a storage location in the
memory space of the calling process that is used to hold the data to be transferred to or from the
file.

The code fragment

if (write(fd, buffer, num_bytes) != num_bytes) {
    fprintf(stderr, "Problem writing to file.\n");
}

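attempts to transfer num_bytes bytes from the memory location pointed to by buffer to the position
of the file pointer in the file opened for writing via the file descriptor fd. (Right after creat(), the
file is empty, so the file pointer is at its end; in general, the file pointer of a newly opened file starts
at the beginning of the file unless it has been moved elsewhere or the file was opened with O_APPEND.)
The reason for the condition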

if (write(fd,buffer,num_bytes) != num_bytes)

is that the return value of write() is the number of bytes actually written and it may not be equal
to the number of bytes that were supposed to be written. The number of bytes successfully written
may be less than num_bytes for any number of reasons. The file might have reached a predefined
maximum size, the disk might be full, or the user's disk quota might be reached. This is why it is
necessary to compare the return value of the write() call with the value of its third parameter.
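When a short write is not an error (for example, a signal interrupted write() after some bytes were copied), a common approach is to loop until everything has been written. Here is a sketch of such a helper; write_all() is a hypothetical name, not a standard function, and it treats any -1 return other than EINTR as a real error.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Keep calling write() until all nbyte bytes are written or a real
 * error occurs.  Returns nbyte on success, -1 on error.
 */
ssize_t write_all(int fd, const void *buf, size_t nbyte)
{
    const char *p = buf;
    size_t left = nbyte;

    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n == -1) {
            if (errno == EINTR)       /* interrupted before writing: retry */
                continue;
            return -1;                /* disk full, bad fd, and so on */
        }
        p += n;                       /* skip over what was written */
        left -= (size_t) n;
    }
    return (ssize_t) nbyte;
}
```

A caller can keep the single comparison described above, if (write_all(fd, buffer, num_bytes) != num_bytes), while tolerating partial writes.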

2.8.3 A First Attempt at cp


The structure of the cp command is

open the sourcefile for reading
open the copyfile for writing
while a read of data from the sourcefile to a buffer is not an empty read
    write the data from the buffer to the copyfile
close the sourcefile
close the copyfile

We know how to open and close files and we know how to read and write them, so this is a relatively
easy program for us at this point. The only points that need explanation are how we create and use
buffers. For example, how big should the buffer be? How do we declare it and pass it to the calls?

 1 // Listing cp1.c
 2 // First attempt at cp command, based on a program
 3 // by Bruce Molay in Understanding Unix/Linux Programming, p.53
 4
 5 #include <stdio.h>
 6 #include <unistd.h>
 7 #include <fcntl.h>
 8 #include <stdlib.h>     // exit()
 9 #define BUFFERSIZE 4096
10 #define COPYMODE 0644
11
12 void die(const char *string1, const char *string2); // print error and quit
13
14 int main(int argc, char *argv[])
15 {
16     int source_fd, target_fd, n_chars;
17     char buf[BUFFERSIZE];
18
19     if (argc != 3) {
20         fprintf(stderr, "usage: %s source destination\n",
21                 *argv);
22         exit(1);
23     }
24
25     // try to open files
26     if ((source_fd = open(argv[1], O_RDONLY)) == -1)
27         die("Cannot open", argv[1]);
28     if ((target_fd = creat(argv[2], COPYMODE)) == -1)
29         die("Cannot creat", argv[2]);
30
31     // copy from source to target
32     while ((n_chars = read(source_fd, buf, BUFFERSIZE))
33            > 0) {
34         if (n_chars != write(target_fd, buf, n_chars))
35             die("Write error to", argv[2]);
36     }
37     if (-1 == n_chars)
38         die("Read error from", argv[1]);
39
40     // close both files
41     if (close(source_fd) == -1 || close(target_fd) == -1)
42         die("Error closing files", "");
43
44     return 0;
45 }
46
47 void die(const char *string1, const char *string2)
48 {
49     fprintf(stderr, "Error: %s ", string1);
50     perror(string2);
51     exit(1);
52 }

Comments
• The buffer is declared as an array of BUFFERSIZE chars; BUFFERSIZE is also the maximum
number of bytes requested in each read() call.

• The die() function encapsulates the error handling logic and calls the perror() function.

• Every call to a function in the API is checked for a possible error.

• The main work is in the while loop (lines 32-36). The entry condition is that the read() call
transferred one or more bytes. The body is the call to write the bytes just read to the output
file. The return value of write() is checked to see if the number of bytes transferred equals
the number requested by the call.

If you compile and run this program you will see that it works correctly. But does it run fast? How
long will it take to copy a very large le? How does one time programs in UNIX?

2.8.4 Timing Programs


The time command is a means of measuring the amount of time (and other resources) that a
command uses. The time command has many options, but its simplest form is

$ time -p command

where command is the command that you wish to know about. The '-p' option tells time to display
the traditional POSIX output, which consists of three values, each measured in seconds to two
decimal places:

• Elapsed clock time, denoted "real"

• User time, denoted "user"

• System time, denoted "sys"

Elapsed time is the number of seconds from when the command was invoked until it completed.
User time is the total amount of time that the process, and any children executing on its behalf,
spent running in user mode. System time is the total amount of time spent on the process's behalf
running within the kernel, i.e., in privileged mode, including such time spent by its children as well.
Non-POSIX output may be more voluminous; you can read the man page for further details. Also,
shells such as bash typically define their own version of the time command, so it is best to type the
full path name (e.g., /usr/bin/time) when using it, if you want the non-bash version.

I created a file named bigfile containing about 30 MB of data. When I ran

$ time -p cp1 bigfile copy_of_bigfile

I got the following output on one of the UNIX systems at Hunter College:

real 4.05
user 0.01
sys 0.02

What accounts for the difference between the sum of user and system times and the elapsed time?
It is the time that the process spent waiting for I/O to complete. When a process issues a request
for I/O, it is blocked until the I/O is complete. The time that it spends in this blocked, or waiting,
state is part of the elapsed (real) time. cp1 spent about 4 seconds waiting for I/O. Although the
amount of time that a process spends waiting for I/O depends heavily on what else the system
is doing, the more calls it makes, the longer it will take, on average. The reason for this will be
explained below.

As we use cp1 on larger and larger files, we will see worse performance. To create a spreadsheet
with the results of the time command I used a different option to it:

/usr/bin/time -f "\t%e\t%U\t%S"

The -f option expects a format string, which I supplied as a tab-separated string of real-time, user-
time, and system-time format symbols. This allowed me to open the output with a spreadsheet
program for analysis:

File Size (bytes)   Real (s)   User (s)   Sys (s)
       19,004,256      17.28       0.00      0.05
       38,008,512      39.17       0.01      0.11
       76,017,024      73.69       0.00      0.21

Notice that the real and system times increase roughly in proportion to the size of the file over this
small sample.

2.8.5 Buffering and its Impact on Performance


Consider the cp1 program above. Suppose that N is the size of the file to be copied, measured
in bytes. Then the while loop in lines 32 through 36 iterates $\lceil N/\mathrm{BUFFERSIZE} \rceil$
times, since each iteration copies BUFFERSIZE bytes (except possibly the last). It follows that as
BUFFERSIZE is increased the number of iterations decreases inversely, i.e., if we double the buffer
size, we halve the number of calls to both read() and write(). The question is, how is the total
running time affected as the buffer size increases, in general? Is the amount of time to make a single
call to read() proportional to the number of bytes to be read, or are there other components of its
running time that are not related to the size of the read?
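The iteration count can be computed with integer arithmetic, since $\lceil N/B \rceil = \lfloor (N + B - 1)/B \rfloor$ for positive integers. A tiny sketch follows; copy_iterations() is a hypothetical name, and it assumes every read() except the last returns a full buffer, which is the usual behavior for regular files.

```c
#include <stddef.h>

/*
 * Number of times cp1's while-loop body runs (one read() and one
 * write() each) to copy n bytes with a b-byte buffer: ceil(n / b).
 * Assumes b > 0 and that each read() except the last fills the buffer.
 */
size_t copy_iterations(size_t n, size_t b)
{
    return (n + b - 1) / b;     /* integer ceiling of n / b */
}
```

For example, copy_iterations(19004256, 4096) is 4640, and doubling the buffer to 8192 bytes gives 2320. The loop also makes one extra read() call, the one that returns 0 at end of file.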

To answer this question, we will first perform a little experiment. We will revise the cp program
so that the buffer size is an input parameter, and run the program on a very large input file with
successively larger buffer sizes, recording the three components of running time reported by the
time command for each run, and tabulating results. The revised program, called cp2.c, is in the
listing below.

 1 // Listing cp2.c: a version of cp with buffer size given on the
 2 // command line
 3 // includes and defines omitted here
 4
 5 int main(int argc, char *argv[])
 6 {
 7     int BUFFERSIZE;
 8     char *endptr;               // set by strtol() to first unparsed char
 9     int source_fd, target_fd, n_chars;
10     char *buf;
11
12     // need to check for 3 arguments instead of 2
13     if (argc != 4) {
14         fprintf(stderr,
15                 "usage: %s buffersize source destination\n",
16                 argv[0]);
17         exit(1);
18     }
19     // extract number from string
20     BUFFERSIZE = strtol(argv[1], &endptr, 0);
21     if (BUFFERSIZE <= 0 || *endptr != '\0') {
22         fprintf(stderr,
23                 "usage: buffersize must be a positive number\n");
24         exit(1);
25     }
26
27     // SNIP: code cut out here, including error handling
28
29     /* allocate buffer of size BUFFERSIZE */
30     buf = (char *) calloc(BUFFERSIZE, sizeof(char));
31     if (NULL == buf) {
32         fprintf(stderr,
33                 "Could not allocate memory for buffer.\n");
34         exit(1);
35     }
36
37     // Everything else is the same from this point forward,
38     // and omitted from the listing

For those who have not seen it before, calloc() (in line 30) and its companion, malloc(), are
dynamic memory allocation functions in C. The prototype for calloc() is

void *calloc(size_t nelem, size_t elsize);

Unlike malloc(), calloc() takes two arguments: the number of elements, and the size in bytes of
each element, and it attempts to allocate space for an array of nelem elements, each of size elsize.
If it is successful, it returns a void* pointer to the start of the array and fills the allocated memory
with zeros; otherwise it returns NULL. In C, a void* is converted to any object pointer type
automatically on assignment, so the cast in line 30 is optional (it is required only in C++).
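Since the rest of cp2.c is omitted from the listing, here is a sketch of how its copy loop might look with a buffer obtained from calloc() at run time. copy_file() is a hypothetical name; it reuses cp1's logic and mode 0644, returns the number of bytes copied, and frees the buffer on every path. It is an illustration, not the author's cp2.c.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * cp1's copy loop with a bufsize-byte buffer allocated by calloc(),
 * as in cp2.  Returns the number of bytes copied, or -1 on error.
 */
long copy_file(const char *src, const char *dst, size_t bufsize)
{
    char *buf;
    ssize_t n;
    long total = 0;
    int in, out;

    if (bufsize == 0 || (buf = calloc(bufsize, 1)) == NULL)
        return -1;
    if ((in = open(src, O_RDONLY)) == -1) {
        free(buf);
        return -1;
    }
    if ((out = creat(dst, 0644)) == -1) {
        close(in);
        free(buf);
        return -1;
    }
    while ((n = read(in, buf, bufsize)) > 0) {
        if (write(out, buf, n) != n) {
            n = -1;                   /* treat a short write as an error */
            break;
        }
        total += n;
    }
    close(in);
    if (close(out) == -1)
        n = -1;
    free(buf);                        /* calloc()'d memory must be freed */
    return n == -1 ? -1 : total;
}
```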

The table below shows the effect of buffer size on the elapsed, user, and system times when copying
a file of size 19MB on a particular host in the Computer Science Department network at Hunter
College running RHEL 4. As you can see, the user and system times roughly decrease in inverse
proportion to the buffer size for most of the sampled range of values. The user time decreases
because the process spends less time in its own code, since there are fewer iterations of the loop and
hence fewer instructions to execute. The system time decreases for the same reason: the read()
and write() system calls are executed fewer times and therefore less time is spent in the kernel.
The elapsed time tends to reach a steady value after the buffer size reaches 16. Since the total of
the user and system time continues to decrease for buffer sizes greater than 16, this suggests that
the limiting factor is the time that the process spends waiting for the I/O operations to complete.

Buffer Size (bytes)   Real (s)   User (s)   Sys (s)
                  2      50.19       3.11     28.27
                  4      33.27       1.59     13.09
                  8      24.28       0.76      6.08
                 16      22.56       0.39      3.08
                 32      20.53       0.21      1.57
                 64      21.66       0.10      0.78
                128      20.12       0.04      0.43
                256      18.27       0.02      0.24
                512      19.70       0.00      0.15
               1024      18.86       0.00      0.09

As the buffer gets larger, the kernel is called fewer times to transfer the data: as we stated above, if
$N$ is the file size and $B$ is the buffer size, the number of calls is $c = \lceil N/B \rceil$. Another way
to say this is that $cB$ is (approximately) constant. The table shows that, if $s$ is the total system
time, $sB$ is also approximately constant, except for $B > 256$. In other words, the total system time
is roughly proportional to the number of calls made for small values of $B$. For larger values of $B$,
the total system time is not in proportion to the number of calls, but is larger than it. Why is this?

There are two components to the running time of an I/O operation: the transfer time and the
overhead. The overhead is largely independent of the number of bytes to be read or written; each
read or write request to the disk has overhead that does not depend much on how much data is
to be transferred. This includes various components of time required by the device to set up and
initiate the transfer. It also includes the cost of the system call itself, which is not always negligible.

The transfer time is the time that it actually takes to copy data between the device and memory and
is a function of the amount of data. The kernel's involvement in this transfer in modern machines
with DMA is minor; it mostly just starts it and does more work when it is finished. Nonetheless,
the kernel's involvement is a function of the amount of data to be transferred. Therefore, if $B$ is
the buffer size, $O$ is the overhead of an I/O operation, and $t$ is a constant such that $tB$ is the
amount of time the kernel spends in a single transfer operation, a single read() or write() system
call uses $O + tB$ time units, and the program takes

$$\frac{N}{B}\,(O + tB) = \frac{NO}{B} + tN = N\left(\frac{O}{B} + t\right)$$

time. Since $N$ is the size of our data and does not change, you can see that the system time is
proportional to $\frac{O}{B} + t$. This explains why the system time does not keep diminishing by
half. Eventually the $t$ term is large in proportion to the $\frac{O}{B}$ term. When $O/B$ is very large
in comparison to $t$ (that is, $O \gg tB$), doubling $B$ halves the expression, but otherwise it does not.
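The model can be checked with a couple of lines of arithmetic. The sketch below evaluates $N(O/B + t)$ and the ratio of the predicted time after doubling $B$ to the time before, $\frac{O/(2B) + t}{O/B + t}$. The names model_time() and doubling_ratio() are hypothetical, and no measured values of $O$ or $t$ are assumed; the inputs are whatever one chooses to plug in.

```c
/*
 * Predicted total kernel time for copying N bytes with buffer size B
 * under the model in the text: N/B calls, each costing O + t*B.
 */
double model_time(double N, double B, double O, double t)
{
    return N * (O / B + t);
}

/* Predicted time with buffer 2B divided by predicted time with buffer B. */
double doubling_ratio(double B, double O, double t)
{
    return (O / (2 * B) + t) / (O / B + t);
}
```

With t = 0 the ratio is exactly 1/2; with O = 4, B = 8, and t = 0.5 it is 0.75, so doubling the buffer saves only a quarter of the time.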

As we shall see shortly, in UNIX in particular, the design of a buffering system within the I/O
system makes the transfer time on average even smaller.

2.8.6 System Call Overhead


System calls have overhead. When a user process makes a call to the kernel for some kind of service,
the user process stops executing instructions in its own user space and starts executing instructions
that are physically located in kernel space. Prior to making the call, the process executes the
user program in a non-privileged mode, also known as user mode, and this phase of the process is
called the user phase. During the system call, the process executes system code with the privileges
accorded the kernel, and is said to be in supervisor or kernel mode; this is called the kernel phase of
the process.[15] When the call terminates, this kernel phase terminates and the user phase resumes.

This is a form of context switch. A context switch occurs when the kernel changes the currently executing memory image (the context). This can happen because a new process is scheduled to run or because the kernel runs on behalf of a process, requiring that the memory image be switched. In some versions of UNIX, such as Linux 2.6, a full context switch is not performed when a process changes from the user phase to the kernel phase or vice versa.

The kernel needs to execute in kernel mode because it must have access to all hardware instructions, including the privileged ones. In contrast, user processes must be prevented from executing those privileged instructions. Therefore, when a system call is made, the machine must change mode twice, once at the start and once at the end of the call. It must also change the CPU state, because when the kernel runs, it has a different address space, different sets of resources, and so on. All of this switching means that a system call adds overhead to the running time of the program.
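One rough way to observe this overhead is to time many calls to a system call that does almost no work inside the kernel. The sketch below is an illustration, not a rigorous benchmark: ns_per_syscall() is a hypothetical helper, getppid() is used on the assumption that the C library does not cache its result (so every call really enters the kernel), and elapsed wall-clock time is measured with clock_gettime(), so the figure can include interference from other activity on the machine.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>
#include <unistd.h>

/* Average elapsed time, in nanoseconds, of one trivial system call.
 * Assumes getppid() is not cached by the C library, so each call
 * switches to kernel mode and back, doing almost no other work. */
double ns_per_syscall(long iterations)
{
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < iterations; i++)
        (void) getppid();
    clock_gettime(CLOCK_MONOTONIC, &end);

    double elapsed = (end.tv_sec - start.tv_sec) * 1e9
                   + (end.tv_nsec - start.tv_nsec);
    return elapsed / iterations;
}
```

Comparing the result of, say, ns_per_syscall(1000000) with the cost of an ordinary user-mode function call gives a feel for why reducing the number of system calls matters.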

2.8.7 System Buffering


In addition to the overhead of the system call itself, there is overhead involved with read() and write() system calls. When a user process issues a read request from a disk, for example, the kernel does not transfer the data directly from the disk to the address space of the user process. Instead, it transfers the data from the disk to a buffer in kernel memory, and when all of the data has been transferred, it copies it into the user process's address space. This copying of data from kernel memory to user memory takes additional time. The symmetric situation occurs on writes: the kernel copies the data from the user address space into kernel memory, and from there it transfers it to disk[16].

UNIX uses this buffering scheme only for certain types of input and output[17], particularly for read and write operations to and from disks. While it may seem at first that it just adds overhead, in fact it is a powerful and efficient method of reducing overall time spent performing I/O.

[15] On some UNIX systems, such as Linux 2.6, the user phase and kernel phase are called user mode and kernel mode respectively.
[16] There is a way to avoid this copying of data back and forth. Memory mapping is a method of I/O in which disk files are mapped directly into user memory. This topic is introduced later in this chapter, in Section 2.8.8. If you are curious, read about the mmap() and munmap() system calls.
[17] There are two types of I/O in UNIX: block I/O and character I/O. The block I/O system in UNIX is used for block devices such as magnetic and optical disks and tapes. Character I/O is used for devices that are inherently one-character-at-a-time devices, such as the keyboard and terminals in general. Character I/O does not use kernel buffers for I/O. All block I/O uses the kernel's buffering system.

The buffering scheme for both reading and writing makes it seem as if read operations read directly from the device and write operations take place immediately. In fact, the kernel hides from the user an important layer of complexity. To understand this complexity, one needs to know a bit about how the disk is organized.

The disk is organized as a collection of fixed-size disk blocks. Disk blocks are numbered so that they can be identified. Each logical disk or disk partition has a unique name in UNIX, such as sd0a or rsd2b.

The kernel maintains a pool of buffers in kernel memory that can be assigned to each device. Each buffer is given a name, corresponding to the device to which it is assigned and the particular block whose contents it holds. For example, a buffer might be assigned block 511 from disk rsd2b.

On a read request by a process, the buffer pool is searched for a buffer whose name matches the block being sought on the disk. If a buffer is found, the data is read directly from memory without any physical I/O. If the buffer is not found, the data must be read from disk. A buffer will most likely have to be reused for this data. A least recently used (LRU) algorithm is used to decide which buffer to replace. After the buffer is selected, if it is "dirty" its contents are written to disk. Buffers are dirty if they were modified since the last time they were written to disk. The buffer is renamed to match the block being read and the read is performed.

Write requests are handled similarly. When a process requests a write to a specific block on a disk, the buffer pool is searched, and if no buffer is found whose name matches the disk address to be written, a new buffer is allocated for this write operation. If no buffer is available, a buffer is chosen using the LRU algorithm and relabeled. The data is stored in the buffer without any physical I/O (i.e., disk accesses) and the buffer is marked dirty. The physical write is deferred until the buffer is renamed for reuse, or until the kernel flushes its dirty buffers, for example when sync() is called.
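The read and write algorithms just described can be simulated in a few dozen lines. The sketch below is a toy model, not kernel code: the pool size NBUF, the struct buf layout, and the counters are all hypothetical, "renaming" a buffer just means changing its block field, and a write is assumed to overwrite a whole block, so a write miss needs no physical read. pool_init() must be called before use.

```c
#define NBUF 4   /* hypothetical tiny pool, so replacement happens often */

/* One buffer in the toy model. */
struct buf {
    long block;      /* disk block held; -1 means empty */
    int  dirty;      /* modified since last written to disk? */
    long last_used;  /* time of last use, for LRU */
};

static struct buf pool[NBUF];
static long clock_tick;
long physical_reads, physical_writes;   /* simulated disk accesses */

void pool_init(void)
{
    for (int i = 0; i < NBUF; i++)
        pool[i] = (struct buf){ .block = -1, .dirty = 0, .last_used = 0 };
    clock_tick = physical_reads = physical_writes = 0;
}

/* Find the buffer named `block`; on a miss, reuse the least recently
 * used buffer, writing it to "disk" first if it is dirty. */
static struct buf *getblk(long block, int for_write)
{
    struct buf *victim = &pool[0];

    for (int i = 0; i < NBUF; i++) {
        if (pool[i].block == block) {          /* hit: no physical I/O */
            pool[i].last_used = ++clock_tick;
            return &pool[i];
        }
        if (pool[i].last_used < victim->last_used)
            victim = &pool[i];                 /* least recently used so far */
    }
    if (victim->dirty)
        physical_writes++;                     /* the delayed write happens now */
    victim->block = block;                     /* rename the buffer */
    victim->dirty = 0;
    if (!for_write)
        physical_reads++;                      /* fetch the block from disk */
    victim->last_used = ++clock_tick;
    return victim;
}

void read_block(long block)  { (void) getblk(block, 0); }
void write_block(long block) { getblk(block, 1)->dirty = 1; }
```

Running read_block(1) twice, then write_block(2), then reads of blocks 3 through 6 gives five physical reads (the second read of block 1 is a hit) and a single physical write, which happens only when the dirty buffer holding block 2 is finally chosen for reuse.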

Note that this scheme can greatly reduce the need to perform disk I/O, because reads and writes can take place in memory, which is much faster, and it is completely transparent to the user. But what happens if the system suddenly comes to an unexpected halt? Unless the system has time to "flush" its buffers, the updates are lost. This is why a system should always be shut down properly rather than simply switched off.
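A program that cannot afford to lose data in such a halt can ask the kernel to flush a file's dirty buffers immediately with the fsync() system call. The sketch below shows one way to do this; write_durably() and save_file() are hypothetical helper names, and a successful fsync() means the kernel has passed the data to the device, so caches inside the hardware itself may still matter.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Write all n bytes of buf to fd, then force the file's dirty kernel
 * buffers out to the device with fsync(). Returns 0 on success, -1 on
 * error (errno is set). */
int write_durably(int fd, const void *buf, size_t n)
{
    const char *p = buf;

    while (n > 0) {
        ssize_t w = write(fd, p, n);
        if (w == -1) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: retry */
            return -1;
        }
        p += w;                    /* write() may be partial */
        n -= (size_t) w;
    }
    return fsync(fd);              /* do not leave the writes delayed */
}

/* Create or truncate the file at path and store buf in it durably. */
int save_file(const char *path, const void *buf, size_t n)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        return -1;
    int rc = write_durably(fd, buf, n);
    if (close(fd) == -1)
        rc = -1;
    return rc;
}
```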

The advantages of buffering are a reduction in physical I/O and therefore a decrease in the overall effective disk access time. The disadvantages include that

• I/O error reporting can lag behind the logical I/O and therefore can become meaningless,

• delayed disk writes can cause loss of data and file system inconsistencies in the event of unexpected system halts, and

• the order in which buffers are written to the external device may not be the same as the order in which the logical I/O occurs, and unless programs are designed with this in mind, disk-based data structures can become inconsistent.

Writes to sequential devices such as tape drives generally do not exhibit this problem because the drivers are only allowed one outstanding write request per drive. In other words, if a logical write operation is requested for a particular drive, but there is a request that has not yet been satisfied by a physical write, the second request cannot be satisfied until the first physical write takes place. A device like a tape drive will reject requests for service until it finishes what it is doing. It is a one-job-at-a-time device.

In Linux 2.6 and later, the kernel offers a service named direct I/O for processes that wish to bypass the kernel buffering system for block I/O. Certain types of programs such as database servers need to implement their own caching schemes for efficiency. Forcing them to also use the kernel buffering system would slow them down significantly and make the system inefficient, because then there would be duplicate copies of blocks: those in the database server's cache and those in the kernel's cache. With direct I/O, the kernel transfers data directly between the disk and user space. Unfortunately, there are many problems associated with direct I/O, which you can read about in the man page for the open() system call. An apt conclusion is reached at the bottom of that page, with a quote from Linus Torvalds:

In summary, O_DIRECT is a potentially powerful tool that should be used with caution. It
is recommended that applications treat use of O_DIRECT as a performance option which
is disabled by default.

"The thing that has always disturbed me about O_DIRECT is that the whole interface is
just stupid, and was probably designed by a deranged monkey on some serious mind-
controlling substances."

-- Linus

2.8.8 Memory Mapped I/O


Memory mapping is a way to perform I/O without copying data between kernel buffers and the process's own memory, and it is fully supported on almost all systems. The concept has been around for a long time. The idea in its simplest form is easy to understand: a process can request that a file be mapped to a set of virtual memory addresses. Changes to those addresses are, in effect, writes to the file. Reads of those addresses are reads of the file.
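To make the idea concrete before the full copy program, here is a minimal sketch that maps a file read-only and counts its newline characters simply by scanning memory. The function name count_lines_mmap() is hypothetical; it obtains the file size with lseek(), as cp3.c does below, and uses the mmap() and munmap() calls whose parameters are explained in the rest of this section.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Count the newline characters in the file at `path` by mapping the file
 * into memory and scanning the mapped bytes; no read() calls are made.
 * Returns the count, or -1 on error. */
long count_lines_mmap(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    off_t size = lseek(fd, 0, SEEK_END);   /* file size, as in cp3.c */
    if (size <= 0) {                       /* error, or empty: mmap() rejects length 0 */
        close(fd);
        return size == 0 ? 0 : -1;
    }

    const char *p = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                             /* the mapping outlives the descriptor */
    if (p == MAP_FAILED)
        return -1;

    long lines = 0;
    for (off_t i = 0; i < size; i++)       /* touching the bytes reads the file */
        if (p[i] == '\n')
            lines++;

    munmap((void *) p, size);
    return lines;
}
```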

The actual use of the memory mapping system calls, mmap() and munmap(), is a bit more complex
than this. The purpose of munmap(), as its name suggests, is to undo a mapping. The mmap() call
has several parameters. We introduce memory mapping by writing the cp program a third way,
using memory mapped I/O instead of reading and writing.

The basic idea is to follow the sequence of steps outlined below:

1. Determine the size of the input file in bytes. Call it filesize.
2. Map the entire input file to a region of memory. Assume it starts at address source_addr.
3. Create an output file with the given name and make it the same size as the input file.
4. Map the output file to a region of memory the exact same size as the file. Assume it starts at address dest_addr.
5. Do a single memory-to-memory copy of filesize bytes from source_addr to dest_addr using memcpy().
6. Undo the mappings and close the files.

This copies the input to the output without any read() calls, and with no write() calls beyond the one used to set the output file's size. In order to implement these steps we need to know the prototypes of the mapping functions and memcpy(). The prototypes are

#include <sys/mman.h>

void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset);

int munmap(void *addr, size_t length);

The mmap() call creates a new mapping in the virtual address space of the calling process. The starting address for the new mapping is specified in the first argument, addr. The second argument, length, specifies the length in bytes of the mapping.

If addr is NULL, then the kernel chooses the address at which to create the mapping; this is the most portable method of creating a new mapping. If addr is not NULL, then the kernel takes it as a hint about where to place the mapping; on Linux, the mapping will be created at a nearby page boundary. The address of the new mapping is returned as the result of the call. It is best to always use NULL as the first argument.

The third argument describes the memory protection of the mapping; it must not conflict with the open mode of the file. The possible values are

PROT_EXEC Pages may be executed.

PROT_READ Pages may be read.

PROT_WRITE Pages may be written.

PROT_NONE Pages may not be accessed.

They can be or-ed together. For example, if the file was opened read-only (O_RDONLY), then the value should be PROT_READ. If it was opened read-write, then it should be set to PROT_READ|PROT_WRITE. A warning about this follows below.

The fourth argument determines whether updates to the mapping are visible to other processes mapping the same region, and whether updates are carried through to the underlying file. This behavior is determined by including exactly one of the following values in flags:

MAP_SHARED Share this mapping. Updates to the mapping are visible to other processes that map this file, and are carried through to the underlying file. The file may not actually be updated until msync() or munmap() is called.

MAP_PRIVATE Create a private copy-on-write mapping. Updates to the mapping are not visible to other processes mapping the same file, and are not carried through to the underlying file. It is unspecified whether changes made to the file after the mmap() call are visible in the mapped region.

Because we want the changes made through the mapping to reach the output file, we need to set the flag to MAP_SHARED; otherwise no changes will appear in the output file. There are other values that can be or-ed into this flag, but we will not discuss them at this point.
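The difference between the two flags can be observed directly. In the sketch below, poke_first_byte() is a hypothetical helper that stores a byte through a mapping and then reads the file back with read(); with MAP_SHARED the new byte is expected in the file, and with MAP_PRIVATE the original byte should remain. It assumes an existing file of at least len bytes, and it opens the file O_RDWR because Linux's mmap() refuses a shared writable mapping of a descriptor that is not open for both reading and writing.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Store `c` into the first byte of the file at `path` through a mapping
 * of its first `len` bytes created with `flags` (MAP_SHARED or
 * MAP_PRIVATE), then return the first byte as seen by read() on the file
 * itself. Returns -1 on error. */
int poke_first_byte(const char *path, size_t len, int flags, char c)
{
    int fd = open(path, O_RDWR);           /* shared writable maps need O_RDWR */
    if (fd == -1)
        return -1;

    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, flags, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }
    p[0] = c;                              /* an ordinary memory store ...        */
    munmap(p, len);                        /* ... reaches the file only if shared */

    char first;
    if (read(fd, &first, 1) != 1) {        /* offset is still 0: read byte 0 */
        close(fd);
        return -1;
    }
    close(fd);
    return (unsigned char) first;
}
```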


The next two arguments are the file descriptor of the file to be mapped and the offset in bytes relative to the start of the file at which to map the file. In other words, if you want to map only the portion of the file after the first N bytes, you would pass N as the last argument, provided that N is a multiple of the page size, as explained next.

What you need to know is that the memory region is always a multiple of the page size of the machine and must be allocated as such. If the length is not a multiple of the page size, the last page will be only partly filled by the file, and the rest of that page is filled with zeros. The offset argument must always be a multiple of the page size, and the mapping itself always starts on a page boundary. For now this is not our concern. After we learn how to get the page size of the machine, we will return to this issue.
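Looking ahead, the page size can be obtained with sysconf(_SC_PAGESIZE). The sketch below uses a hypothetical helper, page_align_down(), to round an offset down to a page boundary and report the leftover distance, which is one way a program can map a region that starts at an arbitrary byte N.

```c
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Round `offset` down to a multiple of the page size, as mmap() requires
 * of its offset argument. If `delta` is not NULL, store how far the
 * original offset lies past the aligned one. Assumes offset >= 0. */
off_t page_align_down(off_t offset, size_t *delta)
{
    long pagesize = sysconf(_SC_PAGESIZE);   /* commonly 4096 */
    off_t aligned = offset - offset % pagesize;

    if (delta != NULL)
        *delta = (size_t) (offset - aligned);
    return aligned;
}
```

For example, to map n bytes starting at byte N of a file, one could call page_align_down(N, &delta), pass the result as mmap()'s offset and n + delta as its length, and then find byte N at the returned address plus delta.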

A caveat: the documentation on my Linux system states that mmap() has been deprecated in favor of mmap2(), but mmap2() does not exist on it. In fact, on architectures that provide the mmap2() system call (typically 32-bit ones), glibc (GNU's C Standard Library) implements mmap() as a wrapper for the kernel's mmap2() call, so on those systems mmap() is actually mmap2().

Our third copy program is in the listing below. It does not include all of the error-checking and
handling that it should, but most is included. It makes use of memcpy() to do the actual transfer of
bytes from the source to the destination, but it does so within memory. The prototype for memcpy()
is

#include <string.h>

void *memcpy(void *dest, const void *src, size_t n);

where src is a pointer to the start of the memory to be copied, dest is the starting address where the bytes should be written, and n is the number of bytes to copy. The memory areas cannot overlap. In other words, the absolute value of (dest - src) must be at least n; if the areas might overlap, memmove() must be used instead.
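The overlap condition is easy to state in code. The sketch below uses hypothetical helper names: regions_overlap() tests whether two n-byte regions overlap, and safe_copy() falls back to memmove(), the standard library function defined for overlapping regions, when memcpy() would be unsafe. Converting the pointers to uintptr_t is a common way to compare addresses that may not belong to the same array.

```c
#include <stdint.h>
#include <string.h>

/* Nonzero if the n-byte regions starting at a and b overlap,
 * that is, if |a - b| < n. */
int regions_overlap(const void *a, const void *b, size_t n)
{
    uintptr_t x = (uintptr_t) a, y = (uintptr_t) b;
    return (x < y ? y - x : x - y) < n;
}

/* Copy n bytes, using memcpy() only when the regions do not overlap. */
void *safe_copy(void *dest, const void *src, size_t n)
{
    return regions_overlap(dest, src, n) ? memmove(dest, src, n)
                                         : memcpy(dest, src, n);
}
```

For instance, with char s[] = "abcdef", the call safe_copy(s + 1, s, 3) overlaps, so it uses memmove() and leaves s holding "aabcef".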
Listing cp3.c -- a copy program using memory-mapped I/O

#include <sys/mman.h>
#include <sys/stat.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
#include <fcntl.h>
#include "../utilities/die.h"

#define COPYMODE 0666

int main(int argc, char *argv[])
{
    int    in_fd, out_fd;
    off_t  filesize;
    char   nullbyte = '\0';
    void  *source_addr;
    void  *dest_addr;

    /* check args */
    if (argc != 3) {
        fprintf(stderr, "usage: %s source destination\n", *argv);
        exit(1);
    }

    /* open files */
    if ((in_fd = open(argv[1], O_RDONLY)) == -1)
        die("Cannot open ", argv[1]);

    /* The file to be created must be opened in read/write mode, even
       though we only write to it: mmap() fails with EACCES when a
       MAP_SHARED mapping with PROT_WRITE is requested on a descriptor
       that is not open O_RDWR. (On some architectures, such as i386,
       PROT_WRITE also implies PROT_READ.)
    */
    if ((out_fd = open(argv[2], O_RDWR | O_CREAT | O_TRUNC, COPYMODE)) == -1)
        die("Cannot create ", argv[2]);

    /* get the size of the source file by seeking to the end of it:
       lseek() returns the offset of the file pointer after the seek,
       relative to the beginning of the file, so this is a good way to
       get an opened file's size.
    */
    if ((filesize = lseek(in_fd, 0, SEEK_END)) == -1)
        die("Could not seek to end of file ", argv[1]);

    /* mmap() rejects a length of 0, so an empty source is copied
       simply by having created the empty destination. */
    if (filesize == 0) {
        close(in_fd);
        close(out_fd);
        return 0;
    }

    /* By seeking to filesize-1 in the new file and writing one null
       byte there, the file is grown to exactly filesize bytes. Its size
       does not change until the write occurs.
    */
    if (lseek(out_fd, filesize - 1, SEEK_SET) == -1)
        die("Could not seek in ", argv[2]);
    if (write(out_fd, &nullbyte, 1) != 1)
        die("Could not write to ", argv[2]);

    /* Time to set up the memory maps */
    if ((source_addr = mmap(NULL, filesize, PROT_READ,
                            MAP_SHARED, in_fd, 0)) == MAP_FAILED)
        die("Error mapping file ", argv[1]);

    if ((dest_addr = mmap(NULL, filesize, PROT_READ | PROT_WRITE,
                          MAP_SHARED, out_fd, 0)) == MAP_FAILED)
        die("Error mapping file ", argv[2]);

    /* copy the input to the output with a single memcpy */
    memcpy(dest_addr, source_addr, filesize);

    /* unmap the files */
    munmap(source_addr, filesize);
    munmap(dest_addr, filesize);

    /* close the files */
    close(in_fd);
    close(out_fd);

    return 0;
}

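The lseek() and one-byte write() trick in cp3.c is one way to grow the output file; another is the ftruncate() system call, which sets a file's length directly. The sketch below repackages the same steps as a hypothetical function, mmap_copy(), that uses ftruncate(), returns an error code instead of calling die(), and releases whatever it managed to set up before a failure.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Copy src to dst with memory-mapped I/O, setting dst's size with
 * ftruncate(). Returns 0 on success, -1 on any error. */
int mmap_copy(const char *src, const char *dst)
{
    int in_fd = -1, out_fd = -1, rc = -1;
    void *s = MAP_FAILED, *d = MAP_FAILED;
    off_t size = 0;

    if ((in_fd = open(src, O_RDONLY)) == -1)
        goto out;
    if ((out_fd = open(dst, O_RDWR | O_CREAT | O_TRUNC, 0666)) == -1)
        goto out;
    if ((size = lseek(in_fd, 0, SEEK_END)) == -1)
        goto out;
    if (size == 0) {                        /* nothing to map: copy is done */
        rc = 0;
        goto out;
    }
    if (ftruncate(out_fd, size) == -1)      /* grow dst to exactly size bytes */
        goto out;
    if ((s = mmap(NULL, size, PROT_READ, MAP_SHARED, in_fd, 0)) == MAP_FAILED)
        goto out;
    if ((d = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED,
                  out_fd, 0)) == MAP_FAILED)
        goto out;

    memcpy(d, s, size);                     /* the whole copy */
    rc = 0;
out:
    if (s != MAP_FAILED) munmap(s, size);
    if (d != MAP_FAILED) munmap(d, size);
    if (in_fd != -1) close(in_fd);
    if (out_fd != -1) close(out_fd);
    return rc;
}
```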
2.9 Returning to who


Our previous implementations of who read one utmp record at a time. Each read requires a system call, even though a single utmp record is quite small and there are many of them. We now know that this is inefficient. Just as the cp command can benefit from increasing the buffer size, so too can who. We will modify it so that it reads several utmp records at a time and stores them in an internal array. We are now up to version 5, and this version will be called who5.c[18].

2.9.1 User-Defined Buffering


A process is said to perform input buffering when it requests more data than it can process in an input operation and stores the extra data in its own memory space until it is ready to use it. Input buffering is a way to reduce the cost of input operations because it decreases the amount of time that the process spends in system calls.

In order for who to perform input buffering, it needs a place to store the extra records until it is ready to use them. The logical place is in an array of records. If it reads 20 records at a time, for example, then these 20 records will be placed into its internal array. It can maintain a pointer to a current record. Each time it needs to examine a new record, it checks whether the current record pointer has exceeded the array bounds. If it has, it attempts to fetch the next 20 records from the utmp file and fill the array with them. If no records are left in the file, it cannot obtain a new record, and it is finished. Otherwise, it fetches as many as it can, up to 20, and then gets the current record from the array and advances the current record pointer.

[18] This idea is borrowed from Bruce Molay, Understanding Unix/Linux Programming, Prentice Hall, 2003.

The logic for input buffering is encapsulated into a separate library of routines for interacting with the utmp records, called utmp_utils.c. The interface to this library consists of three functions: open_utmp(), next_utmp(), and close_utmp(). The open_utmp() function opens the given utmp file, the next_utmp() function delivers the next record, reading a new chunk from the file if the buffer is empty, and close_utmp() closes the file. The interface follows.

Listing utmp_utils.h

#ifndef UTMP_UTILS_H
#define UTMP_UTILS_H

#include <utmp.h>

typedef struct utmp utmp_record;

#define NULL_UTMP_RECORD_PTR ((utmp_record *) NULL)

int open_utmp(char *utmp_file);
// opens the given utmp_file for buffered reading
// returns: a valid file descriptor on success
//          -1 on error

utmp_record *next_utmp(void);
// returns: a pointer to the next utmp record from the
//          opened file and advances to the next record
//          NULL if no more records are in the file

void close_utmp(void);
// closes the utmp file and frees the file descriptor

#endif

The implementation of the library is next. It uses global variables (static variables) so that the
functions can communicate. We do not want to pass these as parameters, because then client code
would have to do that as well, breaking the abstraction. If this were written in C++, this library
would be a class instead, and the globals would be member variables.

Listing utmp_utils.c

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>
#include <utmp.h>
#include "utmp_utils.h"
#include "../utilities/die.h"

#define NUM_RECORDS           20
#define NULL_UTMP_RECORD_PTR ((utmp_record *) NULL)
#define SIZE_OF_UTMP_RECORD   (sizeof(utmp_record))
#define BUFSIZE               (NUM_RECORDS * SIZE_OF_UTMP_RECORD)

static char utmpbuf[BUFSIZE];          // buffer of records
static int  number_of_recs_in_buffer;  // num records in buffer
static int  current_record;            // next rec to read
static int  fd_utmp = -1;              // file descriptor for utmp file

int open_utmp(char *utmp_file)
{
    fd_utmp = open(utmp_file, O_RDONLY);
    current_record = 0;
    number_of_recs_in_buffer = 0;
    return fd_utmp;     // either a valid file descriptor or -1
}

static int fill_utmp(void)
{
    ssize_t bytes_read;

    // read NUM_RECORDS records from the utmp file into buffer;
    // bytes_read is the actual number of bytes read
    bytes_read = read(fd_utmp, utmpbuf, BUFSIZE);
    if (bytes_read < 0) {
        die("Failed to read from utmp file", "");
    }

    // If we reach here, the read was successful.
    // Convert the byte count into a number of records.
    number_of_recs_in_buffer = bytes_read / SIZE_OF_UTMP_RECORD;

    // reset current_record to start at the buffer start
    current_record = 0;
    return number_of_recs_in_buffer;
}

utmp_record *next_utmp(void)
{
    utmp_record *recordptr;
    int byte_position;

    if (fd_utmp == -1)
        // file was not opened correctly
        return NULL_UTMP_RECORD_PTR;

    if (current_record == number_of_recs_in_buffer)
        // there are no unread records in the buffer;
        // need to refill the buffer
        if (fill_utmp() == 0)
            // no utmp records left in the file
            return NULL_UTMP_RECORD_PTR;

    // There is at least one record in the buffer,
    // so we can read it
    byte_position = current_record * SIZE_OF_UTMP_RECORD;
    recordptr = (utmp_record *) &utmpbuf[byte_position];

    // advance current_record and return the record pointer
    current_record++;
    return recordptr;
}

void close_utmp(void)
{
    // if the file descriptor is a valid one, close the connection
    if (fd_utmp != -1) {
        close(fd_utmp);
        fd_utmp = -1;
    }
}

Comments
1. In next_utmp(), if

( current_record == number_of_recs_in_buffer )

is true, it means that every record in the buffer has already been returned to the caller, which implies that it is time to read from the file again.

2. In next_utmp(), the line

recordptr = ( utmp_record *) &utmpbuf[byte_position];

sets recordptr to point to the address of the array entry at the given byte position. We have
to cast the address of the linear array of bytes to a utmp_record pointer type.

The main program must be revised to use these functions, as follows.

Listing who4.c

#include <stdio.h>
#include <stdlib.h>
#include <utmp.h>
#include "utmp_utils.h"

void show_info(utmp_record *);   // as defined in the earlier versions of who

int main(int argc, char *argv[])
{
    utmp_record *utbufp;         // pointer to a utmp record

    if (open_utmp(UTMP_FILE) == -1) {
        perror(UTMP_FILE);
        exit(1);
    }
    while ((utbufp = next_utmp()) != NULL)
        show_info(utbufp);

    close_utmp();
    return 0;
}


2.9.2 Final Comments


This last version of the who command improved performance by reading larger amounts of the file at a time, thereby reducing the number of read() system calls and the overhead they carry. It follows that if we could read the entire file all at once with a single read() call, then we would reduce this overhead to the least it could be. In fact, some versions of the who command do precisely this. At this point we cannot write this implementation because it depends upon our knowing how to use the stat() system call and some knowledge of the structure of the file system, which will come later. However, this method has a pitfall: the file may be larger than the available memory for the process. In this case, the program must be able to identify this and adjust how it reads the file. The GNU implementation of who does exactly this.
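Looking ahead to stat(), the following sketch reads an entire file with one read() call after using fstat() to learn its size. read_whole_file() is a hypothetical helper; applied to the utmp file, the returned buffer could be treated as an array of len / sizeof(struct utmp) records. It does not solve the pitfall just described: for a file too large for memory, malloc() fails and the function simply returns NULL.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Read the whole file at `path` with a single read() call, sized with
 * fstat(). Returns a malloc'ed buffer (the caller frees it) and stores
 * the byte count in *len; returns NULL on error. A production version
 * would also loop on short reads. */
void *read_whole_file(const char *path, size_t *len)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return NULL;

    struct stat sb;
    char *buf = NULL;
    if (fstat(fd, &sb) == 0 && (buf = malloc((size_t) sb.st_size + 1)) != NULL) {
        ssize_t n = read(fd, buf, (size_t) sb.st_size);   /* one system call */
        if (n == -1) {
            free(buf);
            buf = NULL;
        } else {
            *len = (size_t) n;
        }
    }
    close(fd);
    return buf;
}
```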


Appendix A

A.1 Filters: An Introduction


A filter is a program that gets its input from the standard input (stdin), transforms it, and sends the transformed input to the standard output (stdout). The data passes "through" the filter, which typically has command-line options that control its behavior. A filter may also perform a "null" transformation, making no change at all to its input (which is what cat does). Filters process text only, either from input files or from the output end of another Unix command (i.e., through a pipe). All filters can be given optional filename arguments, in which case they take their input from the named files rather than from standard input. For example, in the command

$ cat first second third > combinedfile

cat reads files first, second, and third in that order and concatenates their contents, sending them to the standard output, which has been redirected to a file named combinedfile.
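The heart of a null filter such as cat is tiny. The sketch below, using the hypothetical name copy_stream(), copies one stream to another unchanged with the standard I/O functions getc() and putc().

```c
#include <stdio.h>

/* Copy every byte of `in` to `out` unchanged: a "null" transformation.
 * Returns 0 on success, -1 on a read or write error. */
int copy_stream(FILE *in, FILE *out)
{
    int c;

    while ((c = getc(in)) != EOF)
        if (putc(c, out) == EOF)
            return -1;
    return ferror(in) ? -1 : 0;
}
```

A main() that calls copy_stream(stdin, stdout) when no arguments are given, and otherwise opens each named file with fopen() in turn and copies it to stdout, behaves much like the cat command above.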
The most useful filters are

cut          simple text cutting (usually System V only)
grep         simple regular expressions as filtering patterns
egrep        extended (more powerful) regular expressions as filtering patterns
fgrep        fast string matching, with alternation of fixed strings as patterns
sed          line-oriented text editing filter
awk          pattern-matching, field-oriented filter and full-fledged,
             Turing-complete programming language
cat          primitive filter with little transformation
sort         very general sorting filter
head, tail   let only the top or bottom of a stream pass through
fold         wraps each input line to fit in a specified width

If your time is limited and you could learn but one of these, the most important would be grep; the return on your investment will be greatest. Coming in second would be sed, and then awk. The remaining filters are easy to learn and use and are described briefly first.

A.1.1 sort

sort is easy to use:

$ sort file

will sort the text file named file and print it on standard output. By default it uses collating order, the order of the characters in the character code of the terminal, which is usually ASCII or UTF-8. When this order follows ASCII, as it does in the C locale, uppercase letters precede lowercase letters. There are versions of sort that ignore case by default, but if yours does not, you can turn off case-sensitivity with the -f option.

If you want to sort numerically, use the -n option, as in

$ sort -n numeric_data

which will sort numbers correctly. Without the -n, 10 will precede 9 because 1 precedes 9 in the collating sequence. Read the man page for details.
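The difference between the two orderings can be reproduced with the C library's qsort(). In the sketch below, the comparator names are hypothetical: cmp_text() compares byte by byte with strcmp(), roughly what sort does in the C locale, and cmp_numeric() compares numeric values, roughly what sort -n does. Sorting the strings "9", "10", and "2" gives 10, 2, 9 with the first comparator and 2, 9, 10 with the second.

```c
#include <stdlib.h>
#include <string.h>

/* qsort() comparators over arrays of char pointers. */

/* Character by character, like sort without -n in the C locale. */
int cmp_text(const void *a, const void *b)
{
    return strcmp(*(char *const *) a, *(char *const *) b);
}

/* By numeric value, like sort -n. */
int cmp_numeric(const void *a, const void *b)
{
    long x = strtol(*(char *const *) a, NULL, 10);
    long y = strtol(*(char *const *) b, NULL, 10);
    return (x > y) - (x < y);
}
```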

A.1.2 head and tail


Simply put, head displays the first N lines of its input and tail, the last N lines. By default N = 10. To print a different number of lines, use

$ head -N

or

$ tail -N

respectively.

A.1.3 cut

cut is a lesser filter. You will rarely use it. It does simple tasks well. It cuts out selected pieces of lines of the input.

$ cut -c1-10

copies the first 10 characters from every line, removing the rest.

$ cut -f2,4

copies only fields 2 and 4 of every line to the output stream. Fields are delimited by the TAB character unless the delimiter character is changed using the -d option. Fields are 1-based, so the first field is field 1. The delimiter must be a single character:

$ cut -f1,5 -d: /etc/passwd

will display fields 1 and 5 of the /etc/passwd file, which are the username and gcos fields.

A.1.4 Regular Expressions and grep


We focus on grep and regular expressions. The regular expressions used by grep are the same as
those used by sed and the visual text editor, vi. The simplest form of the grep command is

$ grep <regular expression> files

where <regular expression> is an expression that represents a set of zero or more strings to be matched. The syntax and interpretation of regular expressions is found in the regex man page in section 7 of the manual, as well as in the man page for grep, so typing

$ man 7 regex

or

$ man grep

will give you everything you need to know on how to use them. The simplest patterns are strings
that do not contain regular expression operators of any kind; those match themselves. For example,

$ grep print file1 file2 file3

prints each line in files file1, file2, and file3 that contains the string "print" (so lines containing printf match too). It will print these in the order in which the files are listed, first lines in file1, then file2, then file3. If you want just a count of those lines, use the -c option; if you want the non-matching lines, use the -v option. If you want the line numbers, use the -n option. There are many more useful options described in its man page.

If you want to match a string that contains characters that have special meaning to the shell, such
as white-space, asterisks, slashes, dollar-signs, and so on, it should be enclosed in single-quotes:

$ grep 'atomic energy' file1 file2 file3

will match all lines in the given files that have the exact string 'atomic energy' somewhere in the line. Note that the lines merely have to contain the string as a substring; they do not have to match the string exactly. If you want the pattern to match an entire line, you have to bracket it with operators called anchors. The start-of-line anchor is the caret ^ and the end-of-line anchor is the dollar sign $:

$ grep '^atomic energy$' file1 file2 file3

matches lines in the given files that are exactly the string atomic energy.

Regular expressions can be formed with various operators such as the asterisk *, which repeats the expression to its left zero or more times, as in

a*

which matches strings with zero or more a's: a, aa, aaa, and the null string. To match a string
like ababab, you have to enclose it in \(...\), as in

\(ab\)*

which matches 0 or more sequences of ab. Note that

(ab)*

does not match ababab. In these regular expressions, parentheses by themselves are not special characters, so this pattern matches the literal characters (ab followed by zero or more closing parentheses, as in (ab, (ab), or (ab)).
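The POSIX regcomp() and regexec() functions accept the same basic regular expression syntax that grep uses by default, so the claims above can be checked from a C program. bre_match() below is a hypothetical helper; REG_NOSUB is passed because only a yes-or-no answer is needed, and REG_EXTENDED is deliberately omitted so that the pattern is treated as a basic regular expression.

```c
#include <regex.h>
#include <stddef.h>

/* Return 1 if `line` contains a match for the basic regular expression
 * `pattern`, 0 if it does not, and -1 if the pattern is invalid. As with
 * grep, the match may occur anywhere unless ^ or $ anchors it. */
int bre_match(const char *pattern, const char *line)
{
    regex_t re;

    if (regcomp(&re, pattern, REG_NOSUB) != 0)   /* no REG_EXTENDED: BRE */
        return -1;
    int rc = regexec(&re, line, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

For example, bre_match("^\\(ab\\)*$", "ababab") returns 1, while bre_match("^(ab)*$", "ababab") returns 0 (the doubled backslashes are C string escapes for the single backslashes in the pattern).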

The period matches any single character. There are also character classes, which are formed by enclosing a list (or a range) of characters in square brackets []. A character class represents a single character from that class; for example, [0-9] matches any one digit. Because the special characters in regular expressions typically have special meaning in the shell as well, it is a good idea to always enclose the pattern in single quotes. In particular, if you give grep a regular expression containing an asterisk, you must enclose it in quotes[1].

A.1.4.1 Examples
In the following examples, the file argument is omitted for simplicity. In this case grep would apply the pattern against standard input, which means if you actually type this, it will wait for you to enter text followed by an end-of-file signal, Ctrl-D.

$ grep 'while *(.*)'

matches lines containing the word 'while' followed by zero or more space characters, followed by a
parenthesized expression.

$ grep '^[a-zA-Z][a-zA-Z0-9_]*'

matches lines that begin with a word that starts with a letter, upper or lowercase, followed by zero or more letters or digits or underscores.

$ grep '[0-9][0-9]*\.[0-9][0-9]\>'

The pattern selects strings that have 1 or more digits followed by a single period, followed by exactly two digits. The period must be preceded by a backslash so that grep does not treat the period as the special character meaning "match any character". The "\>" tells grep to anchor the pattern to the end of the word. A word is a sequence of letters, digits, and underscores. This forces grep to select only those words that end in two digits. If I omitted the "\>", grep would have matched strings such as 1.234 or 1.23ab. There is a matching operator, \<, that anchors to the beginning of the word.

Now take a look at this one.


[1] Single quotes are better than double quotes. Single quotes prevent the shell from doing any interpretation of the enclosed characters, whereas when the shell sees a double-quoted string, it does a certain amount of interpretation. Until you understand what the shell will attempt to interpret inside double-quoted strings, use single quotes for enclosing grep patterns.

$ grep '\/\*.*\*\/'

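Strictly speaking, / is not a special character in grep's regular expressions, so both \/ and / match
a slash. The habit of escaping it comes from vi and sed, where / delimits the pattern in search and
substitution commands. (Some recent versions of GNU grep may warn about the unnecessary backslash,
in which case you can simply write '/\*.*\*/'.) The asterisk, however, is a special character in
regular expressions, so \* is how you have to match a single asterisk. So to match the two-character
sequence /* I write \/\*, and to match /* followed by any number of characters and then followed
by */, I write

\/\*.*\*\/

in which .* matches zero or more characters of any kind (including the period itself). This finds
lines containing C-style comments that begin and end on the same line.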

Regular expressions also provide a means of remembering matched text for re-use later in the same
expression. This is very handy in vi and sed, which have substitution operators. Enclosing part of
a pattern in \( and \) both groups it and remembers the string that it matched. The remembered
string is then referenced using the back-reference \1 (or \2, \3, and so on, if there are multiple
remembered strings):

$ grep '\([a-z]\)\1\1\1\1'

matches any line that contains five consecutive copies of the same lowercase letter, such as xxxxx or bbbbb.

$ grep '\([1-9][0-9]\).*\1'

matches any line that contains a two-digit number that is repeated later in the line. The command

$ grep '\([a-z]\)\([a-z]\)\([a-z]\)\3\2\1'

has three remembered matches, and the back-references use them in reverse order: \3\2\1. Each
remembered match holds the single lowercase letter that it matched, so this pattern matches lines
containing a six-letter palindrome such as xyzzyx.

You are encouraged to read the man page for grep. There is a lot more to regular expressions than
is covered here, and the best way to learn them is to experiment. Open a terminal window and type
grep followed by a pattern. It will then wait for you to type lines on the keyboard. Lines that
match will be echoed back; lines that don't will not. Try it.

A.1.5 The Rest of the grep Family


A.1.5.1 egrep

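egrep (extended grep, or expression grep) supports a larger set of regular-expression metacharacters
than grep, including |, ?, +, and unescaped parentheses for grouping. It is not a strict superset of
grep, because in egrep grouping and interval expressions are written ( ) and { } rather than \( \)
and \{ \}; in egrep, \( and \{ match a literal parenthesis or brace. (GNU egrep still treats \< and
\> as word anchors.) On current systems egrep is the same as grep -E, which is the preferred form;
recent GNU versions of egrep and fgrep print a warning that they are obsolescent.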

For example, you can write

$ egrep 'March|April|May'


and

$ egrep 'M(iss)+ippi'

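The first command matches lines containing any of the three month names. The second matches
Mississippi as well as Mississississippi, because (iss)+ matches one or more consecutive copies of
iss. The + operator is one of egrep's extensions: a + after a regular expression matches one or
more occurrences of that expression, as in

$ egrep '[a-z]+'

which matches lines containing one or more lowercase letters.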

A.1.5.2 fgrep

The fgrep variant of grep (equivalent to grep -F) does not support regular expressions, but it does
accept multiple strings. It is used to search quickly for many different fixed strings. For example,
you can put a list of frequently misspelled words into a file, one per line, and then call fgrep to
search for them:

$ fgrep -f errors document

prints all lines in document that contain one of the strings in the file named errors.

A.2 File Globs


All UNIX shells can parse patterns that represent sets of files. These patterns are called file
globs, or simply globs, or wildcard expressions. In essence, the shell replaces a file glob by the
list of files that it represents. For example,

$ ls *.c

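is a command to list all files in the current working directory whose names consist of zero or more
characters followed by .c, in other words, whose names end in .c. (By default, * does not match a
leading period, so hidden files such as .old.c are not listed.)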

The patterns that the shell uses for file globbing have a different syntax from the regular
expressions used by vi, grep, and the other filters and commands; they are not really regular
expressions. File globs are more limited, and the asterisk * does not repeat the character that
precedes it. By itself, it represents zero or more characters of any kind. Thus,

$ rm *.o

removes all files whose names end in .o, and

$ for i in hwk2_*.gz ; do gunzip "$i" ; done

will run gunzip on every file in the current working directory whose name starts with hwk2_ and
ends in .gz (in bash, sh, and other Bourne-like shells). You must be very careful when using file
globs, especially with commands such as rm whose effects cannot be reversed, because a glob may
match files that you did not intend. One disastrous example would be

$ rm -r .*

which a naive user might think removes the hidden files in the given directory and their
descendants. But the pattern .* also matches .., so the command could recursively remove everything
in .., the parent directory. Modern POSIX-conforming versions of rm refuse to remove . and .., and
recent versions of bash leave them out when expanding .*, but you should not count on these
safeguards. There are many other things to know about file globs; the complete description can be
found in section 7 of the manual:

$ man 7 glob

will display it.
