The Linux Device File-System
Richard Gooch
EMC Corporation
rgooch@atnf.csiro.au
Abstract

1 Introduction
Existing major and minor numbers are limited to 8 bits each. This is now
a limiting factor for some drivers, particularly the SCSI disc driver,
which originally consumed a single major number. Since 4 bits were
assigned to the partition index (supporting 15 partitions per disc),
this left 4 bits for the disc index. Thus, only 16 discs were supported.

A subsequent change reserved another 7 major numbers for SCSI discs,
which has increased the num-

Since device nodes are stored on disc media, these must be created by
the system administrator. For standard devices one can usually find a
MAKEDEV programme which creates the thousands of device nodes in common
use. Thus, for a change in any one of the hundreds of device drivers
which requires a device name or number change, a corresponding change is
required in the MAKEDEV programme, or else the system administrator
creates device nodes by hand.
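
The limits quoted for the SCSI disc driver follow directly from how the 8-bit minor number is divided. The sketch below reproduces that arithmetic. The constant and function names are illustrative and do not come from the kernel source. It assumes that partition index 0 denotes the whole disc, and that each additional major number is divided in the same way as the first.

```python
# Sketch of the 8-bit major / 8-bit minor layout described in the text.
# All names here are illustrative, not taken from the kernel.

MAJOR_BITS = 8       # bits available for the major number
MINOR_BITS = 8       # bits available for the minor number
PARTITION_BITS = 4   # low bits of the minor used as the partition index

DISC_BITS = MINOR_BITS - PARTITION_BITS           # 4 bits left for the disc index
DISCS_PER_MAJOR = 1 << DISC_BITS                  # 16 discs per major
# Assumption: index 0 means "the whole disc", leaving 15 real partitions.
PARTITIONS_PER_DISC = (1 << PARTITION_BITS) - 1


def split_minor(minor):
    """Return (disc_index, partition_index) for an 8-bit SCSI disc minor."""
    if not 0 <= minor < (1 << MINOR_BITS):
        raise ValueError("minor does not fit in 8 bits")
    return minor >> PARTITION_BITS, minor & ((1 << PARTITION_BITS) - 1)


def discs_supported(majors):
    """Discs addressable if every major is divided up the same way."""
    return majors * DISCS_PER_MAJOR


print(split_minor(0x21))        # disc index and partition index of minor 0x21
print(discs_supported(1))       # the original single major
print(discs_supported(1 + 7))   # after reserving another 7 majors
```

Every extra disc beyond these totals needs another major number, which is why this scheme scales poorly.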
The fundamental problem is that there are multiple, separate databases
of major and minor numbers and device names. Device numbers are stored
in the following databases:

• in the MAKEDEV programme
• in the /dev directory on many millions of computers
• in the source code of thousands of applications which access device
  nodes.

This would require 8 Mega (1024*1024) inodes if all possible device
nodes were stored. This would result in an impractically large /dev
directory.

If the major table is converted to a list, this would require a list
traversal for each device open. This is undesirable, as it would slow
down device open operations. The effect could be reduced by using a hash
function, but not eliminated.
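
The cost of replacing a directly indexed major table with a list, and the partial relief a hash function gives, can be illustrated with a toy comparison. This is a hypothetical sketch, not kernel code. The driver table, the bucket count of 64, and the modulo hash are all assumptions made for the example.

```python
# Toy comparison of list traversal and hashed lookup for a driver table.
# Everything here is hypothetical; it only counts comparison steps.

def list_lookup(drivers, major):
    """Linear scan of (major, ops) pairs; returns (ops, steps taken)."""
    steps = 0
    for entry_major, ops in drivers:
        steps += 1
        if entry_major == major:
            return ops, steps
    return None, steps


def build_buckets(drivers, n_buckets):
    """Distribute the entries over n_buckets using a simple modulo hash."""
    buckets = [[] for _ in range(n_buckets)]
    for major, ops in drivers:
        buckets[major % n_buckets].append((major, ops))
    return buckets


def hashed_lookup(buckets, major):
    """Scan only the bucket that the hash selects."""
    return list_lookup(buckets[major % len(buckets)], major)


drivers = [(m, "driver-%d" % m) for m in range(1000)]
buckets = build_buckets(drivers, 64)

ops_list, steps_list = list_lookup(drivers, 999)
ops_hash, steps_hash = hashed_lookup(buckets, 999)
print(steps_list, steps_hash)
```

The hashed version still walks a chain on every open, so the overhead shrinks but does not disappear.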
The most common need for changing permissions of device nodes is for
terminal (tty) devices. Thus, it is impractical to mount a CD-ROM as the
root file-system for a production system, since tty permissions will not
be changeable. Similarly, the root file-system cannot reside on a ROM-FS
(often used on embedded systems to save space).

A similar problem exists for systems where the root file-system is
mounted from an NFS server. Multiple systems cannot mount the same NFS
root file-system because there would be a conflict between the machines
as device node permissions need to be changed.

By providing a mechanism for device drivers to register device nodes, it
is possible to send notifications to user-space when these registrations
and unregistrations occur. This allows more sophisticated device
management schemes and policies to be implemented.

Furthermore, a virtual file-system mounted onto /dev opens the
possibility of capturing file-system events and notifying user-space.
For example, opening a device node, or attempting to access a
non-existent device node, can be used to trigger a specific action in
user-space. This further enhances the level of sophistication possible
in device management.
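
The notification idea can be modelled in a few lines. The sketch below is a toy, in-process stand-in for the kernel-to-user-space path. The class, the event names (REGISTER, UNREGISTER, LOOKUP) and the device names are hypothetical and are not meant to describe the actual devfs interface. It shows a listener being told about registrations, and a lookup of a missing node triggering an action that creates it on demand.

```python
# Toy model of driver registration with user-space notification.
# Class, event names and device names are hypothetical.

class DeviceRegistry:
    def __init__(self):
        self._nodes = set()
        self._listeners = []

    def subscribe(self, callback):
        """callback(event, name) stands in for a user-space daemon."""
        self._listeners.append(callback)

    def _notify(self, event, name):
        for callback in self._listeners:
            callback(event, name)

    def register(self, name):
        self._nodes.add(name)
        self._notify("REGISTER", name)

    def unregister(self, name):
        self._nodes.discard(name)
        self._notify("UNREGISTER", name)

    def lookup(self, name):
        """Report a miss first, giving listeners a chance to react."""
        if name not in self._nodes:
            self._notify("LOOKUP", name)
        return name in self._nodes


events = []
registry = DeviceRegistry()
registry.subscribe(lambda event, name: events.append((event, name)))


def load_on_demand(event, name):
    # Stand-in for a policy such as loading a driver on first access.
    if event == "LOOKUP" and name == "sound/dsp":
        registry.register(name)


registry.subscribe(load_on_demand)

registry.register("tty1")
found = registry.lookup("sound/dsp")   # missing at first, created on demand
registry.unregister("tty1")
print(found, events)
```

Keeping the policy in the listener rather than in the registry mirrors the split the text describes: the mechanism lives in the kernel, while the policy is left to user-space.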
These problems can be worked around by creating a RAMDISC at boot time,
making an ext2 file-system in it, mounting it somewhere and copying the
contents of /dev into it, then un-mounting it and mounting it over /dev.

In section 3, the Linux devfs daemon is presented, which supports
advanced device management.

2.3.4 Speculative Device Scanning
• device number size (8 bits each for major and minor) is a real
  limitation, and must be fixed somehow. Systems with large numbers of
  SCSI devices, for example, will continue to consume the remaining
  unallocated major numbers. Hot-plug busses such as USB will also need
  to push beyond the 8-bit minor limitation
• simply increasing the device number size is insufficient. Besides
  breaking many applications (no libc 5 application can handle larger
  device numbers), it doesn't solve the management issues of a /dev with
  thousands or more device nodes
• ignoring the problem of a huge /dev will not make it go away, and
  dismisses the legitimacy of a large number of people who want a
  dynamic /dev
• it does not address the problems of managing hot-plug devices
• the standard response then becomes: “write a device management
  daemon”, which brings us back to the proposal of section 4.1.

Devfs has been available and widely used since 1998. It has attracted a
user-base numbering in the several thousands (possibly far greater), and
forms a critical technology in SGI Pro-Pack (a modified version of

In larger systems, however, discs are often moved between different
controllers (the interface between the computer and groups of discs).
This is often done when a system is being reconfigured for the addition
of more storage capacity. If discs are mounted using their locations,
the administrator must manually update the configuration file which
specifies the locations (usually /etc/fstab). Thus, some means of
addressing the disc, irrespective of where it is located, is required.

Ideally, each device would have a unique identifier to facilitate
tracking it. This unique identifier is defined by the SCSI 3 standard,
and is termed a WWN (world-wide number). If a disc is mounted by
specifying its WWN, then it may be moved to a different controller
without requiring further work by the administrator. This is important
for a system with a large number of discs.

The SCSI sub-system in the Linux kernel needs to be modified to query
devices for their WWNs, which can then be used to register a device
entry which includes the WWN. All WWN entries would be placed in a
single directory (such as /dev/volumes/wwn or /dev/scsi/wwn).

For administrative reasons, some devices may be divided into a number of
“logical volumes”. This is often used for very large storage devices
where different departments of an organisation are each given
a set of logical volumes for their private use. In this case, the
storage may be presented as a single physical device, and thus would
have a single WWN.

As with physical discs, logical volumes may need to be re-arranged for
administrative reasons. Here, some mechanism which can address volumes
by their contents is required. By storing a volume label on each volume,
it is possible to address volumes by content.

Existing and planned logical volume managers need to be modified to
support storing volume labels and must provide a common programming
interface so that this information may be used in a generic way. Once
these steps have been taken, volume labels may be exposed in the device
name-space in a similar fashion to WWNs, placing entries in a directory
such as /dev/volumes/labels.

5.3 Mounting via physical path

Prior to mounting via WWN or volume label, the initial location of a
device is required. Once the device is located, the WWN may be obtained,
or a volume label may be written. In order to initially locate the
device, the physical path to the device must be used. To support this,
device names which represent the physical location of devices are
required, and two new naming schemes are proposed.

gies. Thus, designing a detailed structure to support different
topologies is not feasible.

The solution I propose is to define a /dev/hw hierarchy, which is to be
completely vendor-specific. This hierarchy will be created and managed
by vendor-specific code, giving vendors complete flexibility in their
design. The /dev/hw hierarchy will effectively be a wiring diagram of
the system. The only imposed standard is that the vendor remaps the
generic Linux bus directories into the /dev/hw tree. For example,
/dev/bus/pci0 would become a symbolic link to a directory somewhere in
the /dev/hw tree.

The combination of these two naming schemes should provide sufficient
flexibility for a wide variety of applications. The /dev/bus hierarchy
will suffice for uncomplicated systems which do not change their
topology (such as embedded and desktop machines, which dominate the
market). In addition, /dev/bus provides a convenient place in which to
search for all system busses, which is of use for the system
administrator as well as some system management programmes. Also,
because /dev/bus is managed by the generic Linux bus management
sub-system, it is always available, even on systems with complex
topologies. A vendor need not implement a /dev/hw hierarchy if it
considers the benefits to be marginal, or if time does not permit prior
to product shipment. Implementing a /dev/hw tree will add value, but is
not required for basic operation of a system.
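
The remapping of a generic bus directory into a vendor tree can be tried out in a scratch directory. The sketch below is only an illustration under stated assumptions. The cabinet0/board2 layout is invented, the tree is built under a temporary directory rather than the real /dev, and in practice the vendor's own code would create these entries rather than a user-space script.

```python
import os
import tempfile

# Build a throwaway tree; nothing under the real /dev is touched.
root = tempfile.mkdtemp()
dev = os.path.join(root, "dev")

# Vendor-specific "wiring diagram". This layout is invented for illustration.
hw_pci0 = os.path.join(dev, "hw", "cabinet0", "board2", "pci0")
os.makedirs(hw_pci0)

# Generic bus directory, remapped into the vendor tree by a symbolic link.
bus_dir = os.path.join(dev, "bus")
os.makedirs(bus_dir)
link = os.path.join(bus_dir, "pci0")
os.symlink(os.path.relpath(hw_pci0, bus_dir), link)

# Generic tools keep scanning dev/bus; the link leads into dev/hw.
buses = sorted(os.listdir(bus_dir))
resolved = os.path.realpath(link)
print(buses, os.readlink(link))
```

A relative link target is used so that the tree stays valid wherever it is mounted. Generic tools only ever need to list the bus directory, while topology-aware tools can follow the link into the vendor tree.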