BSCS cs-511: Function Name Purpose and Operation
Record
A record is any contiguous sequence of bytes in a file. The UNIX operating system does not impose any record
structure on files. The boundaries of records are defined by the programs that use the files. Within a single file, a
record as defined by one process can overlap partially or completely on a record as defined by some other process.
A read lock keeps a record from changing while one or more processes read the data. If a process holds a read lock,
it may assume that no other process can alter that record at the same time. A read lock is also a shared lock because
more than one process can place a read lock on the same record or on a record that overlaps a read-locked record.
No process, however, can have a write lock that overlaps a read lock.
A write lock is used to gain complete control over a record. A write lock is an exclusive lock because, when a write
lock is in place on a record, no other process may read- or write-lock that record or any data that overlaps it. If a
process holds a write lock it can assume that no other process will read or write that record at the same time.
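The read-lock and write-lock rules above can be sketched with fcntl(). This is a minimal illustration, not code from the original text: the helper name setRegionLock() and its parameters are hypothetical, and it assumes a POSIX system where F_SETLK returns at once instead of sleeping when the lock cannot be granted.

```c
#define _XOPEN_SOURCE 700   /* request POSIX.1-2008 declarations */
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
|| Hypothetical helper: place or remove a lock on the byte range
|| [start, start+len) of an open file without sleeping.
|| type is F_RDLCK (shared), F_WRLCK (exclusive) or F_UNLCK.
|| Returns 0 on success, -1 with errno set on failure.
*/
int setRegionLock(int fd, short type, off_t start, off_t len)
{
    struct flock lck;

    assert(type == F_RDLCK || type == F_WRLCK || type == F_UNLCK);
    lck.l_type = type;
    lck.l_whence = SEEK_SET;   /* start is an absolute offset */
    lck.l_start = start;
    lck.l_len = len;           /* 0 would mean "to end of file" */
    return fcntl(fd, F_SETLK, &lck);
}
```

Several processes can each succeed with F_RDLCK on the same range. An F_WRLCK request succeeds only when no other process holds any lock that overlaps the range.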
Advisory Locking
An advisory lock is visible only when a program explicitly tries to place a conflicting lock. An advisory lock is not
visible to the file I/O system functions such as read() and write(). A process that does not test for an advisory
lock can violate the terms of the lock, for example, by writing into a locked record.
Advisory locks are useful when all processes make an appropriate record lock request before performing any I/O
operation. When all processes use advisory locking, access to the locked data is controlled by the advisory lock
requests. The success of advisory locking depends on the cooperation of all processes in enforcing the locking
protocol; it is not enforced by the file I/O subsystem.
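The cooperative protocol described above amounts to "lock, do the I/O, unlock" in every process that touches the file. The following sketch shows one way to package that discipline. The name lockedWrite() is hypothetical, and the use of pwrite() is an assumption made so that the example does not depend on the file pointer.

```c
#define _XOPEN_SOURCE 700   /* request POSIX.1-2008 declarations */
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
|| Hypothetical helper: write len bytes at offset start, holding an
|| advisory write lock on exactly that range for the duration.
|| Returns the byte count from pwrite(), or -1 on error.
*/
ssize_t lockedWrite(int fd, off_t start, const void *buf, size_t len)
{
    struct flock lck;
    ssize_t n;

    assert(len > 0);           /* l_len == 0 would lock to end of file */
    lck.l_type = F_WRLCK;
    lck.l_whence = SEEK_SET;
    lck.l_start = start;
    lck.l_len = (off_t) len;
    if (fcntl(fd, F_SETLKW, &lck) < 0)   /* sleep until the range is free */
        return -1;

    n = pwrite(fd, buf, len, start);     /* I/O happens only under the lock */

    lck.l_type = F_UNLCK;                /* release the same range */
    (void) fcntl(fd, F_SETLK, &lck);
    return n;
}
```

The protection holds only if every writer goes through a routine like this one. A process that calls write() directly is not stopped by the lock.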
Mandatory Locking
Mandatory record locking is enforced by the file I/O system functions, and so is effective on unrelated processes that
are not part of a cooperating group. Respect for locked records is enforced by the creat(), open(), read(),
and write() system calls. When a record is locked, access to that record by any other process is restricted
according to the type of lock on the record. Cooperating processes should still request an appropriate record lock
before an I/O operation, but an additional check is made by IRIX before each I/O operation to ensure the record
locking protocol is being honored. Mandatory locking offers security against unplanned file use by unrelated
programs, but it imposes additional system overhead on access to the controlled files.
A read lock can be promoted to write-lock status if no other process is holding a read lock in the same record. If
processes with pending write locks are waiting for the same record, the lock promotion succeeds and the other
(sleeping) processes wait. Demoting a write lock to a read lock can be done at any time.
Because the lockf() function does not support read locks, lock promotion is not applicable to locks set with that
call.
#include <stdio.h>
#include <stdlib.h>     /* for exit() */
#include <errno.h>
#include <fcntl.h>

int fd;                 /* file descriptor */
char *filename;

int
main(int argc, char *argv[])
{
    /* get database file name from command line and open the
     * file for read and write access.
     */
    if (argc < 2) {
        (void) fprintf(stderr, "usage: %s filename\n", argv[0]);
        exit(2);
    }
    filename = argv[1];
    fd = open(filename, O_RDWR);
    if (fd < 0) {
        perror(filename);
        exit(2);
    }
    /* locking and I/O on fd follow from here */
    return 0;
}
The file is now open to perform both locking and I/O functions. The next step is to set a lock.
/* Whole-file write lock using fcntl(). */
#include <fcntl.h>
#include <errno.h>
#include <unistd.h>     /* for sginap() */
#define MAX_TRY 10
int
lockWholeFile(int fd, int tries)
{
    int limit = (tries) ? tries : MAX_TRY;
    int try;
    int err = EAGAIN;           /* reported if no attempt is made */
    struct flock lck;

    lck.l_type = F_WRLCK;       /* write (exclusive) lock */
    lck.l_whence = SEEK_SET;    /* l_start is measured from BOF */
    lck.l_start = 0L;           /* lock starts at BOF */
    lck.l_len = 0L;             /* extent is entire file */
    for (try = 0; try < limit; ++try)
    {
        if (0 == fcntl(fd, F_SETLK, &lck))
            return 0;           /* mission accomplished */
        err = errno;            /* save before any other call changes it */
        if ((err != EAGAIN) && (err != EACCES))
            break;              /* mission impossible */
        sginap(1);              /* let lock holder run */
    }
    return err;                 /* 0 is returned only when the lock is held */
}
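/* Whole-file lock using lockf(). */
#include <unistd.h>     /* for F_TLOCK, lseek(), sginap() */
#include <fcntl.h>      /* for O_RDWR */
#include <errno.h>      /* for EAGAIN, EACCES */
#define MAX_TRY 10
int
lockWholeFile(int fd, int tries)
{
    int limit = (tries) ? tries : MAX_TRY;
    int try;
    int err = EAGAIN;           /* reported if no attempt is made */

    lseek(fd, 0L, SEEK_SET);    /* set start of lock range */
    for (try = 0; try < limit; ++try)
    {
        if (0 == lockf(fd, F_TLOCK, 0L))
            return 0;           /* mission accomplished */
        err = errno;            /* save before any other call changes it */
        if ((err != EAGAIN) && (err != EACCES))
            break;              /* mission impossible */
        sginap(1);              /* let lock holder run */
    }
    return err;                 /* 0 is returned only when the lock is held */
}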
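/* Whole-file lock using the BSD flock() function. */
#define _BSD_COMPAT
#include <sys/file.h>   /* includes fcntl.h */
#include <errno.h>      /* for EWOULDBLOCK */
#include <unistd.h>     /* for sginap() */
#define MAX_TRY 10
int
lockWholeFile(int fd, int tries)
{
    int limit = (tries) ? tries : MAX_TRY;
    int try;
    int err = EWOULDBLOCK;      /* reported if no attempt is made */

    for (try = 0; try < limit; ++try)
    {
        if (0 == flock(fd, LOCK_EX | LOCK_NB))
            return 0;           /* mission accomplished */
        err = errno;            /* save before any other call changes it */
        if (err != EWOULDBLOCK)
            break;              /* mission impossible */
        sginap(1);              /* let lock holder run */
    }
    return err;                 /* 0 is returned only when the lock is held */
}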
struct record {
.../* data portion of record */...
long prev; /* index to previous record in the list */
long next; /* index to next record in the list */
};
For the example, assume that the record after which the new record is to be inserted has a read lock on it already.
The lock on this record must be promoted to a write lock so that the record may be edited. Example 7-5 shows a
function that can be used for this.
Example 7-5. Record Locking With Promotion Using fcntl()
/*
|| This function is called with a file descriptor and the
|| offsets to three records in it: this, here, and next.
|| The caller is assumed to hold read locks on both here and next.
|| This function promotes these locks to write locks.
|| If write locks on "here" and "next" are obtained
|| Set a write lock on "this".
|| Return index to "this" record.
|| If any write lock is not obtained:
|| Restore read locks on "here" and "next".
|| Remove all other locks.
|| Return -1.
*/
long set3Locks(int fd, long this, long here, long next)
{
    struct flock lck;

    lck.l_type = F_WRLCK;       /* setting a write lock */
    lck.l_whence = SEEK_SET;    /* offsets are absolute */
    lck.l_len = sizeof(struct record);

    /* Promote the lock on "here" to write lock */
    lck.l_start = here;
    if (fcntl(fd, F_SETLKW, &lck) < 0) {
        return (-1);
    }
    /* Lock "this" with write lock */
    lck.l_start = this;
    if (fcntl(fd, F_SETLKW, &lck) < 0) {
        /* Failed to lock "this"; return "here" to read lock. */
        lck.l_type = F_RDLCK;
        lck.l_start = here;
        (void) fcntl(fd, F_SETLKW, &lck);
        return (-1);
    }
    /* Promote lock on "next" to write lock */
    lck.l_start = next;
    if (fcntl(fd, F_SETLKW, &lck) < 0) {
        /* Failed to promote "next"; return "here" to read lock... */
        lck.l_type = F_RDLCK;
        lck.l_start = here;
        (void) fcntl(fd, F_SETLK, &lck);
        /* ...and remove lock on "this". */
        lck.l_type = F_UNLCK;
        lck.l_start = this;
        (void) fcntl(fd, F_SETLK, &lck);
        return (-1);
    }
    return (this);
}
Example 7-5 uses the F_SETLKW command to fcntl(), with the result that the calling process will sleep if there
are conflicting locks at any of the three points. If the F_SETLK command was used instead, the fcntl() system
calls would fail if blocked. The program would then have to be changed to handle the blocked condition in each of the
error return sections (as in Example 7-2).
It is possible to unlock or change the type of lock on a subsection of a previously set lock; this may cause an
additional lock (two locks for one system call) to be used by the operating system. This occurs if the subsection is
from the middle of the previously set lock.
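The effect of unlocking the middle of a lock can be observed from a second process. The sketch below is an illustration under stated assumptions: all function names are hypothetical, the offsets 0-99 and 40-59 are arbitrary, and a forked child is used as the observer because a process never conflicts with its own locks.

```c
#define _XOPEN_SOURCE 700   /* request POSIX.1-2008 declarations */
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: apply lock type to [start, start+len). */
static int rangeLock(int fd, short type, off_t start, off_t len)
{
    struct flock lck;

    assert(start >= 0);
    lck.l_type = type;
    lck.l_whence = SEEK_SET;
    lck.l_start = start;
    lck.l_len = len;
    return fcntl(fd, F_SETLK, &lck);
}

/*
|| Hypothetical probe: fork a child and let it ask F_GETLK whether a
|| write lock on [start, start+len) would be blocked by the parent.
|| Returns 1 if blocked, 0 if free, -1 on error.
*/
int blockedForOthers(int fd, off_t start, off_t len)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {                      /* child: test, never set */
        struct flock lck;
        lck.l_type = F_WRLCK;
        lck.l_whence = SEEK_SET;
        lck.l_start = start;
        lck.l_len = len;
        if (fcntl(fd, F_GETLK, &lck) < 0)
            _exit(2);
        _exit(lck.l_type == F_UNLCK ? 0 : 1);
    }
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return (WEXITSTATUS(status) == 2) ? -1 : WEXITSTATUS(status);
}

/*
|| Lock bytes 0-99 with one call, then unlock bytes 40-59.
|| The single lock is split into two: 0-39 and 60-99.
*/
int splitLock(int fd)
{
    if (rangeLock(fd, F_WRLCK, 0, 100) < 0)
        return -1;
    return rangeLock(fd, F_UNLCK, 40, 20);
}
```

After splitLock() succeeds, blockedForOthers() reports the ranges 0-39 and 60-99 as blocked and the range 40-59 as free.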
Example 7-6 shows a similar example using the lockf() function. Since it does not support read locks, all (write)
locks are referenced generically as locks.
Example 7-6. Record Locking Using lockf()
/*
|| This function is called with a file descriptor and the
|| offsets to three records in it: this, here, and next.
|| The caller is assumed to hold no locks on any of the records.
|| This function tries to lock "here" and "next" using lockf().
|| If locks on "here" and "next" are obtained
|| Set a lock on "this".
|| Return index to "this" record.
|| If any lock is not obtained:
|| Remove all other locks.
|| Return -1.
*/
long set3Locks(int fd, long this, long here, long next)
{
    /* Set a lock on "here" */
    (void) lseek(fd, here, SEEK_SET);
    if (lockf(fd, F_LOCK, sizeof(struct record)) < 0) {
        return (-1);
    }
    /* Lock "this" */
    (void) lseek(fd, this, SEEK_SET);
    if (lockf(fd, F_LOCK, sizeof(struct record)) < 0) {
        /* Failed to lock "this"; clear "here" lock. */
        (void) lseek(fd, here, SEEK_SET);
        (void) lockf(fd, F_ULOCK, sizeof(struct record));
        return (-1);
    }
    /* Lock "next" */
    (void) lseek(fd, next, SEEK_SET);
    if (lockf(fd, F_LOCK, sizeof(struct record)) < 0) {
        /* Failed to lock "next"; release "here"... */
        (void) lseek(fd, here, SEEK_SET);
        (void) lockf(fd, F_ULOCK, sizeof(struct record));
        /* ...and remove lock on "this". */
        (void) lseek(fd, this, SEEK_SET);
        (void) lockf(fd, F_ULOCK, sizeof(struct record));
        return (-1);
    }
    return (this);
}
Locks are removed in the same manner as they are set; only the lock type is different (F_UNLCK or F_ULOCK). An
unlock cannot be blocked by another process. An unlock can affect only locks that were placed by the unlocking
process.
/*
|| This function takes a file descriptor and prints a report showing
|| all locks currently set on that file. The loop variable is the
|| l_start field of the flock structure. The function asks fcntl()
|| for the first lock that would block a lock from l_start to the end
|| of the file (l_len==0). When no lock would block such a lock,
|| the returned l_type contains F_UNLCK and the loop ends.
|| Otherwise the contending lock is displayed, l_start is set to
|| the end-point of that lock, and the loop repeats.
*/
void printAllLocksOn(int fd)
{
    struct flock lck;

    /* Find and print "write lock" blocked segments of file. */
    (void) printf("sysid   pid type    start   length\n");
    lck.l_whence = SEEK_SET;
    lck.l_start = 0L;
    for (;;)
    {
        lck.l_type = F_WRLCK;
        lck.l_len = 0L;     /* always test from l_start to end of file */
        if (fcntl(fd, F_GETLK, &lck) < 0)
        {
            perror("fcntl(F_GETLK)");
            break;          /* bad argument; l_type was not updated */
        }
        if (lck.l_type == F_UNLCK)
            break;          /* no further contending locks */
        /* cast each field so that it matches the %ld conversion */
        (void) printf("%5ld %5ld %4c %8ld %8ld\n",
            (long) lck.l_sysid,
            (long) lck.l_pid,
            (lck.l_type == F_WRLCK) ? 'W' : 'R',
            (long) lck.l_start,
            (long) lck.l_len);
        if (lck.l_len == 0)
            break;          /* this lock goes to end of file, stop */
        lck.l_start += lck.l_len;
    }
}
fcntl() with the F_GETLK command always returns correctly (that is, it will not sleep or fail) if the values passed
to it as arguments are valid.
The lockf() function with the F_TEST command can also be used to test if there is a process blocking a lock. This
function does not, however, return the information about where the lock actually is and which process owns the
lock. Example 7-8 shows a code fragment that uses lockf() to test for a lock on a file.
Example 7-8. Testing for Contending Lock Using lockf()
/* find a blocked record. */
/* seek to beginning of file */
(void) lseek(fd, 0L, SEEK_SET);
/* set the size of the test region to zero
 * to test until the end of the file address space.
 */
if (lockf(fd, F_TEST, 0L) < 0) {
    switch (errno) {
    case EACCES:
    case EAGAIN:
        (void) printf("file is locked by another process\n");
        break;
    case EBADF:
        /* bad argument passed to lockf */
        perror("lockf");
        break;
    default:
        (void) printf("lockf: unknown error <%d>\n", errno);
        break;
    }
}
When a process forks, the child receives a copy of the file descriptors that the parent has opened. The parent and
child also share a common file pointer for each file. If the parent seeks to a point in the file, the child's file pointer is
also set to that location. Similarly, when a share group of processes is created using sproc(), and
the sproc() flag PR_SFDS is used to keep the open-file table synchronized for all processes (see
the sproc(2) reference page), then there is a single file pointer for each file and it is shared by every process in
the share group.
This feature has important implications when using record locking. The current value of the file pointer is used as the
reference for the offset of the beginning of the lock, in lockf() at all times and in fcntl() when using
an l_whence value of 1. Since there is no way to perform the sequence lseek(); fcntl(); as an atomic operation, there
is an obvious potential for race conditions: a lock might be set using a file pointer that was just changed by another
process.
One solution is to have the child process close and reopen the file. This gives that process a separate open-file entry,
and so a file pointer of its own. Another solution is to always use the fcntl() function for locking with an l_whence
value of 0 or 2. This makes the locking function independent of the file pointer (processes might still contend for the
use of the file pointer for other purposes such as direct-access input).
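The shared file pointer, and the second remedy described above, can be sketched as follows. Both function names are hypothetical, and the example assumes a POSIX system with fork(). The first function shows that a seek made by the child moves the parent's file pointer; the second sets a lock at an absolute offset so that the file pointer plays no part.

```c
#define _XOPEN_SOURCE 700   /* request POSIX.1-2008 declarations */
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
|| Hypothetical demonstration: fork a child that seeks on the inherited
|| descriptor, then report the parent's file pointer afterwards.
|| Returns the parent's offset, or -1 on error.
*/
off_t offsetAfterChildSeek(int fd, off_t childTarget)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return (off_t) -1;
    if (pid == 0) {
        (void) lseek(fd, childTarget, SEEK_SET);  /* moves the shared pointer */
        _exit(0);
    }
    if (waitpid(pid, &status, 0) < 0)
        return (off_t) -1;
    return lseek(fd, 0, SEEK_CUR);   /* query only; does not move */
}

/*
|| Hypothetical helper: write-lock a record at an absolute offset.
|| With l_whence set to SEEK_SET (0) the current file pointer is ignored.
*/
int lockRecordAt(int fd, off_t start, off_t len)
{
    struct flock lck;

    assert(start >= 0);
    lck.l_type = F_WRLCK;
    lck.l_whence = SEEK_SET;
    lck.l_start = start;
    lck.l_len = len;
    return fcntl(fd, F_SETLK, &lck);
}
```

Called on a freshly opened file with a childTarget of 100, offsetAfterChildSeek() returns 100 in the parent, even though the parent itself never called lseek() with that offset.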
Deadlock Handling
A certain level of deadlock detection and avoidance is built into the record locking facility. This deadlock handling
provides the same level of protection granted by the /usr/group standard lockf() call. This deadlock detection is
valid only for processes that are locking files or records on a single system.
Deadlocks can potentially occur only when the system is about to put a record locking system call to sleep. A search
is made for constraint loops of processes that would cause the system call to sleep indefinitely. If such a situation is
found, the locking system call fails and sets errno to the deadlock error number.
If a process wishes to avoid using the system's deadlock detection, it should set its locks using F_SETLK instead of
F_SETLKW. An F_SETLK request never sleeps, so the deadlock search is never made for it.
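A process that does sleep on a lock should be prepared for the deadlock error. The following sketch is one possible shape for that check, not a prescribed method. The name waitForWriteLock() is hypothetical, and it assumes the deadlock error number is reported as EDEADLK, the POSIX name.

```c
#define _XOPEN_SOURCE 700   /* request POSIX.1-2008 declarations */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
|| Hypothetical helper: sleep until a write lock on [start, start+len)
|| is granted.
|| Returns 0 when the lock is held,
||         1 when the system refused because sleeping would deadlock,
||        -1 for any other error (errno is set).
*/
int waitForWriteLock(int fd, off_t start, off_t len)
{
    struct flock lck;

    assert(start >= 0);
    lck.l_type = F_WRLCK;
    lck.l_whence = SEEK_SET;
    lck.l_start = start;
    lck.l_len = len;
    while (fcntl(fd, F_SETLKW, &lck) < 0) {
        if (errno == EINTR)
            continue;       /* interrupted by a signal; ask again */
        if (errno == EDEADLK)
            return 1;       /* caller should release its locks and retry */
        return -1;
    }
    return 0;
}
```

A caller that receives 1 typically releases the locks it already holds before trying again, which breaks the loop of waiting processes.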
The mode bits that select mandatory locking must be set before the file is opened; a change has no effect on a file that is already open.
Example 7-9 shows a fragment of code that sets mandatory lock mode on a given filename.
Example 7-9. Setting Mandatory Locking Permission Bits
#include <stdio.h>      /* for perror() */
#include <sys/types.h>
#include <sys/stat.h>
int setMandatoryLocking(char *filename)
{
    mode_t mode;
    struct stat buf;

    if (stat(filename, &buf) < 0)
    {
        perror("stat(2)");
        return -1;
    }
    mode = buf.st_mode & 07777;     /* keep permission bits only */
    /* ensure group execute permission 0010 bit is off */
    mode &= ~(mode_t) S_IXGRP;
    /* turn on 'set group id bit' in mode */
    mode |= S_ISGID;
    if (chmod(filename, mode) < 0)
    {
        perror("chmod(2)");
        return -1;
    }
    return 0;
}
When IRIX opens a file, it checks to see whether both of two conditions are true:
Set-group-ID bit is 1.
Group execute permission is 0.
When both are true, the file is marked for mandatory locking, and each use of creat(), open(), read(),
and write() tests for contending locks.
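The two conditions can be expressed as a test on a file's mode bits. The sketch below is illustrative only: the names isMandatoryLockMode() and fileIsMandatoryLocked() are hypothetical, and the code only inspects the bits; whether the operating system then enforces the locks depends on the system and the file system in use.

```c
#define _XOPEN_SOURCE 700   /* request POSIX.1-2008 declarations */
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/stat.h>

/*
|| Hypothetical helper: return 1 if mode has the set-group-ID bit on
|| and the group execute bit off, otherwise 0.
*/
int isMandatoryLockMode(mode_t mode)
{
    return (mode & S_ISGID) != 0 && (mode & S_IXGRP) == 0;
}

/*
|| Hypothetical helper: apply the same test to a named file.
|| Returns 1 or 0 as above, or -1 if the file cannot be examined.
*/
int fileIsMandatoryLocked(const char *path)
{
    struct stat buf;

    assert(path != NULL);
    if (stat(path, &buf) < 0)
        return -1;
    return isMandatoryLockMode(buf.st_mode);
}
```

For example, a mode of 02644 satisfies both conditions, while 02654 does not because group execute permission is on.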
Some points to remember about mandatory locking:
Mandatory locking does not protect against file truncation with the truncate() function (see
the truncate(2) reference page), which does not look for locks on the truncated portion of the file.
Mandatory locking protects only those portions of a file that are locked. Other portions of the file that are not
locked may be accessed according to normal UNIX system file permissions.
Advisory locking is more efficient because a record lock check does not have to be performed for every I/O
request.
If the returned value is off, rpc.lockd is not running and locks have local scope only.
To use rpc.lockd, the administrator must turn it on as follows:
% /etc/chkconfig lockd on
Then the system must be rebooted. This must be done on both the NFS file server and on all NFS clients where locks
are requested.
Performance Impact
Normally, the NFS software uses a data cache to speed access to files. Data read from or written to NFS mounted files is
held in a memory cache for some time, and access requests for cached data are satisfied from memory instead of being
read from the server. Data caching has a major effect on the speed of NFS file access.
As soon as any process places a file or record lock on an NFS mounted file, the file is marked as uncachable. All I/O
requests for that file bypass the local memory cache and are sent to the NFS server. This ensures consistent results
and data integrity. However, it means that every read or write to the file, at any offset, and from any process, incurs a
network delay.
The file remains uncachable even when the lock is released. The file cannot use the cache again until it has been
closed by all processes that have it open.