Ch7 File Operations

Uploaded by yueli98354
7 File Operations

7.1 File Operation Levels


File operations consist of six levels, from low to high, as shown in the
following hierarchy.
(1). Hardware Level: File operations at hardware level include
fdisk : divide a hard disk, USB or SDC drive into partitions.
mkfs : format disk partitions to make them ready for file systems.
fsck : check and repair file system.
defragmentation: compact files in a file system.
Most of these are system-oriented utility programs.
(2). File System Functions in OS Kernel: Every operating system kernel
provides support for basic file operations. In a Unix-like system kernel
they are implemented by kernel functions such as kopen(), kread(),
kwrite(), klseek() and kclose(), where the prefix k denotes kernel
functions.
(3). System Calls: User mode programs use system calls to access kernel functions. As
an example, the following program reads the second 1024 bytes of a file.
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char *argv[])   // run as a.out filename
{
    int fd;
    ssize_t n;
    char buf[1024];

    if (argc < 2 || (fd = open(argv[1], O_RDONLY)) < 0)   // no filename or open() fails
        exit(1);
    lseek(fd, 1024, SEEK_SET);     // lseek to byte 1024
    n = read(fd, buf, 1024);       // try to read 1024 bytes
    printf("read %zd bytes\n", n);
    close(fd);
    return 0;
}
• The functions open(), read(), lseek() and close() are C library
functions.
• Each of these library functions issues a system call, which causes the
process to enter kernel mode to execute a corresponding kernel
function, e.g. open goes to kopen(), read goes to kread(), etc.
• When the process finishes executing the kernel function, it returns to
user mode with the desired results.
• Switching between user mode and kernel mode requires many actions
(and time). Data transfer between kernel and user spaces is therefore
quite expensive.
• Although it is permissible to issue a read(fd, buf, 1) system call to read
only one byte of data, it is not wise to do so, since that one byte
would come at a high cost.
• The kernel reads/writes files by block size, which ranges from 1KB to
8KB.
• For instance, in Linux, the default block size is 4KB for hard disks and
1KB for floppy disks.
• So each read/write system call should also try to transfer one block of
data at a time.
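To put numbers on this cost, here is a minimal sketch that reads an already-opened file to EOF with a given buffer size and counts the read() system calls it issues. The function name count_reads() is hypothetical and not part of any library.

```c
#include <stdlib.h>
#include <unistd.h>

/* Read an open file to EOF using bufsize-byte read() calls and return the
 * number of read() system calls issued (including the final call that
 * returns 0 at end of file), or -1 on error. */
long count_reads(int fd, size_t bufsize)
{
    char *buf = malloc(bufsize);
    long calls = 0;
    ssize_t n;

    if (buf == NULL)
        return -1;
    do {
        n = read(fd, buf, bufsize);   /* each call enters kernel mode once */
        calls++;
    } while (n > 0);
    free(buf);
    return n < 0 ? -1 : calls;
}
```

On a 10000-byte file, count_reads(fd, 1) issues 10001 read() calls (the last one returns 0 at EOF), while count_reads(fd, 4096) issues only 4: 4096 + 4096 + 1808 bytes, then EOF.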
(4). Library I/O Functions: System calls allow the user to read/write
chunks of data, which are just a sequence of bytes.
• They do not know, nor care, about the meaning of the data.
• A user often needs to read/write individual chars, lines or data
structure records, etc.
• With only system calls, a user mode program must do these operations
from/to a buffer area by itself.
• The C library provides a set of standard I/O functions for convenience,
as well as for run-time efficiency. Library I/O functions include:
FILE mode I/O: fopen(), fread(); fwrite(), fseek(), fclose(), fflush()
char mode I/O: getc(), getchar(), ungetc(); putc(), putchar()
line mode I/O: gets(), fgets(); puts(), fputs()
formatted I/O: scanf(), fscanf(), sscanf(); printf(), fprintf(), sprintf()

With the exception of sscanf()/sprintf(), which read/write memory
locations, all other library I/O functions are built on top of system calls,
i.e. they ultimately issue system calls for actual data transfer through the
system kernel.
(5). User Commands: Instead of writing programs, users may use
Unix/Linux commands to do file operations. Examples of user
commands are
mkdir, rmdir, cd, pwd, ls, link, unlink, rm, cat, cp, mv, chmod, etc.
Each user command is in fact an executable program (except cd), which
typically calls library I/O functions, which in turn issue system calls to
invoke the corresponding kernel functions.
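The processing sequence of a user command is either
Command => Library I/O functions => System calls => Kernel functions
or
Command => System calls => Kernel functions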
(6). Sh Scripts: Sh scripts are programs written in the sh programming
language, which can be executed by the command interpreter sh.
The sh language includes all valid Unix/Linux commands.
It also supports variables and control statements, such as if, do, for,
while, case, etc.
In practice, sh scripts are used extensively in Unix/Linux systems
programming.
In addition to sh, many other script languages, such as Perl and Tcl, are
also in wide use.
7.2 File I/O Operations
Figure 7.1 shows the diagram of file I/O operations. The diagram shows
the sequence of actions when a process reads/writes a file stream.
(1). A process in User mode executes
FILE *fp = fopen("file", "r"); or FILE *fp = fopen("file", "w");
which opens a file stream for READ or WRITE.
(2). fopen() creates a FILE structure in user (heap) space containing a
file descriptor, fd, an fbuf[BLKSIZE] and some control variables.
It issues a fd = open("file", flags=READ or WRITE) syscall to kopen()
in kernel, which constructs an OpenTable to represent an instance of the
opened file.
The OpenTable’s mptr points to the file’s INODE in memory.
For non-special files, the INODE’s i_block array points to data blocks on
the storage device.
On success, fp points to the FILE structure, in which fd is the file
descriptor returned by the open() syscall.
(3). fread(ubuf, size, nitem, fp): READ nitem of size each to ubuf by
• copy data from FILE structure’s fbuf to ubuf, if enough, return;
• if fbuf has no more data, then execute (4a).
(4a). issue read(fd, fbuf, BLKSIZE) system call to read a file block from
kernel to fbuf, then copy data to ubuf until enough or file has no more
data.
(4b). fwrite(ubuf, size, nitem, fp): copy data from ubuf to fbuf;
• if (fbuf has room): copy data to fbuf, return;
• if (fbuf is full) : issue write(fd, fbuf, BLKSIZE) system call to write a
block to kernel, then write to fbuf again.
Thus, fread()/fwrite() issue read()/write() syscalls to kernel, but they do
so only if necessary and they transfer data in chunks of block size for
better efficiency.
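The buffering in steps (3) and (4a) can be modeled with a tiny stream type. This is a simplified sketch, not glibc's real FILE implementation: MYFILE, my_init() and my_getc() are hypothetical names, BLKSIZE is assumed to be 4096, and the syscalls counter exists only to show how rarely read() is called.

```c
#include <unistd.h>

#define BLKSIZE 4096

typedef struct {
    int  fd;              /* file descriptor from open() */
    char fbuf[BLKSIZE];   /* user-space buffer */
    int  pos;             /* next byte of fbuf to hand out */
    int  len;             /* number of valid bytes in fbuf */
    long syscalls;        /* read() calls issued so far (illustration only) */
} MYFILE;

void my_init(MYFILE *fp, int fd)
{
    fp->fd = fd;
    fp->pos = fp->len = 0;
    fp->syscalls = 0;
}

/* Return the next byte as an unsigned char value, or -1 at EOF/error.
 * A read() syscall is issued only when fbuf has no more data. */
int my_getc(MYFILE *fp)
{
    if (fp->pos >= fp->len) {                      /* fbuf is empty: refill */
        ssize_t n = read(fp->fd, fp->fbuf, BLKSIZE);
        fp->syscalls++;
        if (n <= 0)
            return -1;
        fp->len = (int)n;
        fp->pos = 0;
    }
    return (unsigned char)fp->fbuf[fp->pos++];     /* copy from fbuf */
}
```

Reading a 10000-byte file byte by byte with my_getc() issues just 4 read() calls (three that return data and one that returns 0 at EOF), instead of one system call per byte.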
Similarly, other Library I/O Functions, such as fgetc/fputc,fgets/fputs,
fscanf/fprintf, etc. also operate on fbuf in the FILE structure, which is
in user space.
(5). File system functions in kernel:
Assume a read(fd, fbuf, BLKSIZE) system call on a non-special file.
(6). In a read() system call, fd is an opened file descriptor, which is an
index in the running PROC’s fd array, which points to an OpenTable
representing the opened file.
(7). The OpenTable contains the file’s open mode, a pointer to the file’s
INODE in memory and the current byte offset into the file for
read/write.
From the OpenTable’s offset,
• Compute logical block number, lbk;
• Convert logical block number to physical block number, blk, via
INODE.i_block[ ] array
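Computing the logical block number and the starting byte within that block is plain integer arithmetic. A minimal sketch, assuming a 1KB BLKSIZE; locate() is a hypothetical helper name:

```c
#define BLKSIZE 1024

/* Split a byte offset into a logical block number (lbk) and the byte
 * position inside that block. */
void locate(long offset, long *lbk, int *start)
{
    *lbk   = offset / BLKSIZE;          /* which logical block of the file */
    *start = (int)(offset % BLKSIZE);   /* where to start inside that block */
}
```

For example, offset 1500 gives lbk = 1 and start = 476, i.e. the read starts 476 bytes into the file's second block.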
(8). Minode contains the in-memory INODE of the file. The
INODE.i_block[ ] array contains pointers to physical disk blocks.
A file system may use the physical block numbers to read/write data from/to
the disk blocks directly, but these would incur too much physical disk I/O.
(9). In order to improve disk I/O efficiency, the OS kernel usually uses a set
of I/O buffers as a cache memory to reduce the number of physical I/O.
(9a). For a read(fd, buf, BLKSIZE) system call, determine the needed (dev,
blk) number, then consult the I/O buffer cache to
get a buffer = (dev, blk);
if (buffer’s data are invalid){
start_io on buffer;
wait for I/O completion;
}
copy data from buffer to fbuf;
release buffer to buffer cache;
(9b). For a write(fd, fbuf, BLKSIZE) system call, determine the needed
(dev, blk) number, then consult the I/O buffer cache to
get a buffer = (dev, blk);
write data to the I/O buffer;
mark buffer as dataValid and DIRTY (for delay-write to disk);
release the buffer to buffer cache;
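The logic of (9a) and (9b) can be tried out in user space. The sketch below is a hypothetical, greatly simplified model, not kernel code: a 4-buffer cache with linear lookup and round-robin replacement, a memory array standing in for a single disk and for start_io(), and no locking, free list or LRU. A DIRTY buffer is written to the disk only when its buffer is reused, which is the delayed write described above.

```c
#include <string.h>

#define NBUF  4
#define BSIZE 1024
#define NBLKS 64

typedef struct {
    int  dev, blk;        /* which block this buffer holds */
    int  valid, dirty;    /* dataValid and DIRTY flags */
    char data[BSIZE];
} BUFFER;

static char   disk[NBLKS][BSIZE];   /* simulated device (blk < NBLKS) */
static BUFFER cache[NBUF];
static int    next_victim;
static long   phys_reads, phys_writes;

static void flush_buf(BUFFER *bp)   /* delayed write of a DIRTY buffer */
{
    if (bp->valid && bp->dirty) {
        memcpy(disk[bp->blk], bp->data, BSIZE);
        phys_writes++;
        bp->dirty = 0;
    }
}

/* get a buffer = (dev, blk): return the cached one or reuse a victim */
static BUFFER *getblk(int dev, int blk)
{
    for (int i = 0; i < NBUF; i++)
        if (cache[i].valid && cache[i].dev == dev && cache[i].blk == blk)
            return &cache[i];                    /* cache hit */
    BUFFER *bp = &cache[next_victim];            /* round-robin replacement */
    next_victim = (next_victim + 1) % NBUF;
    flush_buf(bp);
    bp->dev = dev;
    bp->blk = blk;
    bp->valid = bp->dirty = 0;
    return bp;
}

/* (9a): physical read only if the buffer's data are invalid */
void bread(int dev, int blk, char *ubuf)
{
    BUFFER *bp = getblk(dev, blk);
    if (!bp->valid) {
        memcpy(bp->data, disk[blk], BSIZE);      /* stands in for start_io + wait */
        phys_reads++;
        bp->valid = 1;
    }
    memcpy(ubuf, bp->data, BSIZE);
}

/* (9b): write into the buffer, mark it dataValid and DIRTY */
void bwrite(int dev, int blk, const char *ubuf)
{
    BUFFER *bp = getblk(dev, blk);
    memcpy(bp->data, ubuf, BSIZE);
    bp->valid = 1;
    bp->dirty = 1;
}
```

Reading the same block twice costs one physical read, and a bwrite() reaches the simulated disk only when its buffer is evicted.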
(10). Device I/O: Physical I/O on the I/O buffers ultimately goes through
the device driver, which consists of start_io() in the upper half and the disk
interrupt handler in the lower half of the driver.
7.3 Low Level File Operations
7.3.1 Partitions
A block storage device, e.g. hard disk, USB drive, SD card, etc. can be
divided into several logical units, called partitions.
Each partition can be formatted as a specific file system and may be
installed with a different operating system.
Most booting programs, e.g. GRUB, LILO, etc. can be configured to
boot up different operating systems from the various partitions.
The partition table is at the byte offset 446 (0x1BE) in the very first
sector, which is called the Master Boot Record (MBR) of the device.
The table has 4 entries, each defined by a 16-byte partition structure, which is
struct partition {
u8 drive; // 0x80 - active
u8 head; // starting head
u8 sector; // starting sector
u8 cylinder; // starting cylinder
u8 sys_type; // partition type
u8 end_head; // end head
u8 end_sector; // end sector
u8 end_cylinder; // end cylinder
u32 start_sector; // starting sector counting from 0
u32 nr_sectors; // number of sectors in partition
};
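A minimal sketch of decoding the partition table from a copy of sector 0, e.g. the first 512 bytes of mydisk read with read(). It decodes the little-endian fields byte by byte instead of casting to struct partition, so it does not depend on the host's byte order or on compiler padding. PART and read_ptable() are hypothetical names, and the check for the 0x55 0xAA boot signature in the last two bytes of the sector is an addition not described above.

```c
#include <stdint.h>

/* Decoded form of one 16-byte partition table entry */
typedef struct {
    uint8_t  drive;          /* 0x80 = active */
    uint8_t  sys_type;       /* partition type, e.g. 5 = EXTEND */
    uint32_t start_sector;   /* counting from 0 */
    uint32_t nr_sectors;
} PART;

/* Little-endian 32-bit load, independent of host byte order/alignment */
static uint32_t le32(const uint8_t *p)
{
    return p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Decode the 4 entries of the table at byte 0x1BE of an MBR sector.
 * Returns 1 if the sector ends with the 0x55 0xAA signature, else 0. */
int read_ptable(const uint8_t mbr[512], PART p[4])
{
    for (int i = 0; i < 4; i++) {
        const uint8_t *e = mbr + 0x1BE + 16 * i;   /* entry i */
        p[i].drive        = e[0];
        p[i].sys_type     = e[4];
        p[i].start_sector = le32(e + 8);
        p[i].nr_sectors   = le32(e + 12);
    }
    return mbr[510] == 0x55 && mbr[511] == 0xAA;
}
```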
If a partition is EXTEND type (type number=5), it can be divided into
more partitions.
Assume that partition P4 is EXTEND type and it is divided into extend
partitions P5, P6, P7.
The extend partitions form a linked list in the extend partition area, as
shown in Fig. 7.2.
The first sector of each extend partition is a local MBR.
Each local MBR also has a partition table at the byte offset 0x1BE,
which contains only two entries.
The first entry defines the start sector and size of the extend partition;
its start sector is relative to the local MBR's own sector.
The second entry points to the next local MBR; its start sector is relative
to P4's start sector, and an all-zero second entry ends the list.
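The chain of local MBRs can be followed with a simple loop. The sketch below assumes the whole disk image is already in memory and uses 512-byte sectors; walk_extend() is a hypothetical name. It follows the standard layout in which entry 1 is relative to the current local MBR and entry 2 is relative to P4's start sector.

```c
#include <stdint.h>

#define SECTOR 512

static uint32_t le32(const uint8_t *p)
{
    return p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Walk the local MBRs of an extend partition starting at sector p4_start of
 * an in-memory disk image. Stores up to max (start, size) pairs of the
 * extend partitions, in absolute sectors, and returns how many were found. */
int walk_extend(const uint8_t *disk, uint32_t p4_start,
                uint32_t start[], uint32_t size[], int max)
{
    uint32_t ebr = p4_start;                 /* first local MBR = P4's start */
    int n = 0;

    while (n < max) {
        const uint8_t *t = disk + (uint64_t)ebr * SECTOR + 0x1BE;
        start[n] = ebr + le32(t + 8);        /* entry 1: relative to this local MBR */
        size[n]  = le32(t + 12);
        n++;
        uint32_t next = le32(t + 16 + 8);    /* entry 2: relative to P4's start */
        if (next == 0)                       /* no next local MBR: end of list */
            break;
        ebr = p4_start + next;
    }
    return n;
}
```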
In a partition table, the CHS values are valid only for disks smaller than
8GB.
For disks larger than 8GB but with fewer than 4G sectors (2TB), only the last
2 fields, start_sector and nr_sectors, are meaningful.
For devices larger than 2TB, the GUID Partition Table (GPT) is used
instead. GUID stands for globally unique identifier, and GPT is part of the
Unified Extensible Firmware Interface (UEFI) standard.
Since working with real disks of a computer is very risky, we shall use a
virtual disk image, which is just an ordinary file but looks like a real
disk.
(1). Under Linux, e.g. Ubuntu, create a virtual disk image file named
“mydisk”.
dd if=/dev/zero of=mydisk bs=1024 count=1440
dd is a program which writes 1440 (1KB) blocks of zeros to the target
file mydisk.
We choose count=1440 because it is the number of 1KB blocks of old
floppy disks.
(2). Run fdisk on the disk image file:
fdisk mydisk
The following figure shows the result of a fdisk session, which
partitions the disk image into 3 primary partitions (P1 to P3) and an
extend partition P4. The extend partition P4 is further divided into more
partitions (P5 to P7), all within the extend partition area.
7.3.2 Format Partitions
In order to store files, a partition must first be made ready for a specific
file system. This operation is traditionally called formatting a disk or disk
partition.
In Linux, it is called mkfs, which stands for Make File System.
In Linux, the command
mkfs -t TYPE [-b bsize] device nblocks
makes a file system of TYPE on a device of nblocks, each block of
bsize bytes.
If bsize is not specified, the default block size is 1KB.
We shall assume the EXT2/3 file system, which used to be the default
file system of Linux. Thus,
mkfs -t ext2 vdisk 1440 OR mke2fs vdisk 1440
formats vdisk to an EXT2 file system with 1440 (bsize=1KB) blocks.
A formatted disk should be an empty file system containing only the
root directory.
However, mkfs of Linux always creates a default lost+found directory
under the root directory.
In Linux, a new file system is not yet accessible. It must be mounted to
an existing directory in the root file system.
The /mnt directory is usually used to mount other file systems.
Since virtual disks are not real devices, they must be mounted as
loop devices, as in
sudo mount -o loop vdisk /mnt
which mounts vdisk onto the /mnt directory.
After mounting, the mounting point /mnt becomes equivalent to
the root directory of the mounted device.
To un-mount the device, detaching it from the root file system, enter
sudo umount /mnt OR sudo umount vdisk
The Linux mount command can mount partitions of real devices or an
entire virtual disk, but it can not mount partitions of a virtual disk.
If a virtual disk contains partitions, the partitions must be associated
with loop devices first.
7.3.3 Mount Partitions
(1). Create a virtual disk image by the dd command:
dd if=/dev/zero of=vdisk bs=1024 count=32768 #32K (1KB) blocks
(2). Run fdisk on vdisk to create a partition P1 using default start and
last sector numbers.
fdisk vdisk
vdisk should contain a partition P1=[start=2048, end=65535]. The
partition is 63488 sectors.
(3). Create a loop device on partition 1 of vdisk using sector numbers
losetup -o $(expr 2048 \* 512) --sizelimit $(expr 63488 \* 512) /dev/loop1 vdisk
losetup needs the start byte (start_sector*512) and the size in bytes
(nr_sectors*512) of the partition.
After creating a loop device, the reader may use the command
losetup -a
which displays all loop devices as /dev/loopN.
(4). Format /dev/loop1 as an EXT2 file system
mke2fs -b 4096 /dev/loop1 7936 # mke2fs with 7936 4KB blocks
The size of the partition is 63488 sectors. Since each 4KB block holds
eight 512-byte sectors, the number of 4KB blocks is 63488 / 8 = 7936.
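The numbers used in steps (3) and (4) follow from the partition's first and last sectors. A minimal sketch of that arithmetic, assuming 512-byte sectors; note that losetup's --sizelimit option takes a size in bytes rather than an end byte. LOOPPARAMS and loop_params() are hypothetical names.

```c
#include <stdint.h>

/* Loop-device and mke2fs parameters for one partition */
typedef struct {
    uint64_t offset;      /* losetup -o: start byte of the partition */
    uint64_t sizelimit;   /* losetup --sizelimit: partition size in bytes */
    uint64_t nblocks;     /* mke2fs block count for block size bsize */
} LOOPPARAMS;

/* start and end are the partition's first and last sector numbers. */
LOOPPARAMS loop_params(uint64_t start, uint64_t end, uint64_t bsize)
{
    LOOPPARAMS lp;
    uint64_t nsectors = end - start + 1;      /* 65535 - 2048 + 1 = 63488 */

    lp.offset    = start * 512;
    lp.sizelimit = nsectors * 512;
    lp.nblocks   = nsectors * 512 / bsize;    /* whole blocks only */
    return lp;
}
```

loop_params(2048, 65535, 4096) gives offset 1048576, sizelimit 32505856 and 7936 blocks, matching the values above.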
(5). Mount the loop device
mount /dev/loop1 /mnt # mount as loop device
(6). Access mounted device as part of the file system
(cd /mnt; mkdir bin boot dev etc user) # populate with DIRs
(7). When finished with the device, umount it.
umount /mnt
(8). When finished with a loop device, detach it by the command
losetup -d /dev/loop1 # detach a loop device.
7.4 Introduction to EXT2 File System
7.4.1 EXT2 File System Data Structures
Under Linux, we can create a virtual disk containing a simple EXT2 file
system as follows.
(1). dd if=/dev/zero of=mydisk bs=1024 count=1440
(2). mke2fs -b 1024 mydisk 1440
We choose 1440 blocks because it is the number of blocks of (old)
floppy disks.
The layout of such an EXT2 file system is shown in Figure 7.4.
Block#0: Boot Block: B0 is the boot block, which is not used by the file
system.
It is used to contain a booter program for booting up an OS from the
disk.
7.4.2 Superblock
Block#1: Superblock: B1 is the superblock, which contains information
about the entire file system.
Most of its fields are self-explanatory; only a few deserve more
explanation.
• s_first_data_block = 0 for 4KB block size and 1 for 1KB block size. It
is used to determine the start block of group descriptors, which is
s_first_data_block + 1.
• s_log_block_size determines the file block size, which is
1KB*(2**s_log_block_size). The most often used block size is 1KB
for small file systems and 4KB for large file systems.
• s_mnt_count = number of times the file system has been mounted.
When the mount count reaches the max_mount_count, a fsck session
is forced to check the file system for consistency.
• s_magic is the magic number which identifies the file system type. For
EXT2/3/4 file systems, the magic number is 0xEF53.
7.4.3 Group Descriptor
Block#2: Group Descriptor Block (s_first_data_block+1 on hard disk): EXT2 divides disk
blocks into groups.
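Each group contains as many blocks as there are bits in one bitmap block:
8192 blocks with 1KB blocks, or 32768 (32K) blocks with the 4KB blocks
typical of hard disks. Each group is described by a group
descriptor structure.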
struct ext2_group_desc {
u32 bg_block_bitmap; // Bmap block number
u32 bg_inode_bitmap; // Imap block number
u32 bg_inode_table; // Inodes begin block number
u16 bg_free_blocks_count; // THESE are OBVIOUS
u16 bg_free_inodes_count;
u16 bg_used_dirs_count;
u16 bg_pad; // ignore these
u32 bg_reserved[3];
};
Since the floppy disk (FD) image has only 1440 blocks, B2 contains only
1 group descriptor. The rest are 0's.
On hard disks with a large number of groups, group descriptors may
span many blocks.
The most important fields in a group descriptor are bg_block_bitmap,
bg_inode_bitmap and bg_inode_table, which point to the group's blocks
bitmap, inodes bitmap and inodes start block, respectively.
For the Linux formatted EXT2 file system, blocks 3 to 7 are reserved.
So bmap=8, imap=9 and inode_table=10.
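A minimal sketch of decoding one group descriptor from a copy of the group descriptor block, assuming the 32-byte struct ext2_group_desc layout shown above and little-endian fields; GD and get_gd() are hypothetical names.

```c
#include <stdint.h>

/* Fields of interest in one group descriptor */
typedef struct {
    uint32_t block_bitmap;   /* Bmap block number */
    uint32_t inode_bitmap;   /* Imap block number */
    uint32_t inode_table;    /* inodes begin block number */
    uint16_t free_blocks, free_inodes, used_dirs;
} GD;

static uint32_t le32(const uint8_t *p)
{
    return p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static uint16_t le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | p[1] << 8);
}

/* Decode group descriptor number g from the group descriptor block
 * (block s_first_data_block + 1). Each descriptor is 32 bytes. */
GD get_gd(const uint8_t *gdblock, int g)
{
    const uint8_t *d = gdblock + 32 * g;
    GD gd;

    gd.block_bitmap = le32(d + 0);
    gd.inode_bitmap = le32(d + 4);
    gd.inode_table  = le32(d + 8);
    gd.free_blocks  = le16(d + 12);
    gd.free_inodes  = le16(d + 14);
    gd.used_dirs    = le16(d + 16);
    return gd;
}
```

For the 1440-block disk above, get_gd(buf, 0) would be expected to give block_bitmap = 8, inode_bitmap = 9 and inode_table = 10.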
7.4.4 Bitmaps
Block#8: Block Bitmap (Bmap): (bg_block_bitmap): A bitmap is a
sequence of bits used to represent some kind of items, e.g. disk blocks
or inodes.
In a bitmap, a 0 bit means the corresponding item is FREE, and a 1 bit
means the corresponding item is IN_USE.
A floppy disk (FD) has 1440 blocks, but block#0 is not used by the file
system. So the Bmap has only 1439 valid bits. Invalid bits are treated as
IN_USE and set to 1's.
Block#9: Inode Bitmap (Imap): (bg_inode_bitmap): An inode is a data
structure used to represent a file.
An EXT2 file system is created with a finite number of inodes.
The status of each inode is represented by a bit in the Imap in B9.
In an EXT2 FS, the first 10 inodes are reserved.
So the Imap of an empty EXT2 FS starts with ten 1’s, followed by 0’s.
Invalid bits are again set to 1’s.
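Bit i of a bitmap is bit (i % 8) of byte (i / 8). The helpers below are a minimal sketch of testing, setting and clearing bits, plus a hypothetical alloc_first_free() that finds and marks the first free inode, assuming bit i stands for inode number i + 1 (inode numbers count from 1).

```c
#include <stdint.h>

/* Bit i lives in byte i/8, at bit position i%8 (low bit first). */
int  tst_bit(const uint8_t *map, int i) { return (map[i / 8] >> (i % 8)) & 1; }
void set_bit(uint8_t *map, int i)       { map[i / 8] |= (uint8_t)(1 << (i % 8)); }
void clr_bit(uint8_t *map, int i)       { map[i / 8] &= (uint8_t)~(1 << (i % 8)); }

/* Return the inode number of the first FREE inode among n, marking it
 * IN_USE, or 0 if none is free. */
int alloc_first_free(uint8_t *map, int n)
{
    for (int i = 0; i < n; i++)
        if (!tst_bit(map, i)) {
            set_bit(map, i);
            return i + 1;        /* positions count from 0, numbers from 1 */
        }
    return 0;
}
```

On an imap whose first ten bits are 1, alloc_first_free() returns 11.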
7.4.5 Inodes
The essential inode fields are described below.
Block#10: Inodes (begin) Block: (bg_inode_table): Every file is
represented by a unique inode structure of 128 (256 in EXT4) bytes.
In the inode structure, i_mode is a u16 or 2-byte unsigned integer.
The leading 4 bits specify the file type, e.g. 0100 for a directory, 1000
for a regular file and 1010 for a symbolic link.
The next 3 bits ugs indicate the file's special usage (setuid, setgid and
sticky).
The last 9 bits are the rwx permission bits for file protection.
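A minimal sketch of decoding i_mode into its three parts. The type codes 0x4 (directory), 0x8 (regular file) and 0xA (symbolic link) are the standard Unix values; file_type(), perm_string() and special_bits() are hypothetical helper names.

```c
#include <stdint.h>

/* Leading 4 bits: file type */
const char *file_type(uint16_t mode)
{
    switch (mode >> 12) {
    case 0x4: return "DIR";
    case 0x8: return "REG";
    case 0xA: return "LNK";
    default:  return "OTHER";
    }
}

/* Last 9 bits: fill out[10] with an ls-style string such as "rwxr-xr-x" */
void perm_string(uint16_t mode, char out[10])
{
    const char *rwx = "rwxrwxrwx";
    for (int i = 0; i < 9; i++)        /* bit 8 (owner r) down to bit 0 (other x) */
        out[i] = (mode & (1 << (8 - i))) ? rwx[i] : '-';
    out[9] = '\0';
}

/* Middle 3 bits ugs: setuid, setgid, sticky */
int special_bits(uint16_t mode) { return (mode >> 9) & 07; }
```

For i_mode = 0x41ED (octal 040755), file_type() returns "DIR" and perm_string() produces "rwxr-xr-x".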
The i_size field is the file size in bytes.
The various time fields are the number of seconds elapsed since 0 hour, 0
minute, 0 second of January 1, 1970.
Each time field is a 32-bit unsigned integer. It can be converted
to calendar form by the library function
char *ctime(const time_t *t)
which takes a pointer to a time_t value and returns a string in calendar
form. Since time_t may be wider than the 32-bit inode field (it is 64 bits
on most current Linux systems), copy the field into a time_t variable
first. For example,
time_t t = inode.i_atime; // copy the u32 field into a time_t
printf("%s", ctime(&t)); // note: pass & of the time_t variable
prints i_atime in calendar form.
The i_block[15] array contains pointers to disk blocks of a file, which
are
Direct blocks: i_block[0] to i_block[11], which point to direct disk
blocks.
Indirect blocks: i_block[12] points to a disk block, which contains 256
(for 1KB BLKSIZE) block numbers, each points to a disk block.
Double Indirect blocks: i_block[13] points to a block, which points to
256 blocks, each of which points to 256 disk blocks.
Triple Indirect blocks: i_block[14] is the triple-indirect block. We may
ignore this for "small" EXT2 file systems.
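Finding which i_block[ ] entry holds a logical block is the step after computing lbk. A minimal sketch, assuming 1KB blocks (256 four-byte block numbers per block) and ignoring triple-indirect blocks as the text does; map_lbk() is a hypothetical name.

```c
#define N_PER_BLK 256   /* 1KB block / 4-byte block numbers */

/* Where is logical block lbk found via i_block[]?
 *   returns 0: i_block[*idx1]
 *   returns 1: entry *idx1 of the block pointed to by i_block[12]
 *   returns 2: entry *idx2 of the block found at entry *idx1 of the
 *              block pointed to by i_block[13]
 *   returns -1: lbk needs the triple-indirect block (not handled) */
int map_lbk(long lbk, int *idx1, int *idx2)
{
    if (lbk < 12) {                       /* direct blocks */
        *idx1 = (int)lbk;
        return 0;
    }
    lbk -= 12;
    if (lbk < N_PER_BLK) {                /* indirect blocks */
        *idx1 = (int)lbk;
        return 1;
    }
    lbk -= N_PER_BLK;
    if (lbk < (long)N_PER_BLK * N_PER_BLK) {   /* double indirect blocks */
        *idx1 = (int)(lbk / N_PER_BLK);
        *idx2 = (int)(lbk % N_PER_BLK);
        return 2;
    }
    return -1;
}
```

For example, lbk 300 is in the double-indirect range: 300 - 12 - 256 = 32, so it is entry 32 of the first block pointed to by i_block[13]. Together the three levels cover 12 + 256 + 65536 = 65804 blocks, about 64MB with 1KB blocks.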
The inode size (128 or 256) is designed to divide the block size (1KB or
4KB) evenly, so that every inode block contains an integral number of
inodes.
In the simple EXT2 file system, the number of inodes is 184 (a Linux
default). With 1KB blocks and 128-byte inodes, each block holds
1024/128 = 8 inodes, so the number of inode blocks is 184/8 = 23.
So the inode blocks are B10 to B32.
Each inode has a unique inode number, which is the inode’s position in
the inode blocks plus 1.
Note that inode positions count from 0, but inode numbers count from
1. A 0 inode number means no inode.
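Because inode numbers count from 1 while positions count from 0, locating an inode on disk takes one subtraction, one division and one modulo. A minimal sketch; inode_location() is a hypothetical name, and inode_table, bsize and isize would come from the group descriptor and superblock.

```c
/* Locate inode number ino (counting from 1): which block holds it and at
 * what byte offset within that block. */
void inode_location(int ino, int inode_table, int bsize, int isize,
                    int *blk, int *offset)
{
    int per_block = bsize / isize;                  /* 1024/128 = 8 */
    *blk    = (ino - 1) / per_block + inode_table;  /* position = ino - 1 */
    *offset = (ino - 1) % per_block * isize;
}
```

With inode_table = 10, 1KB blocks and 128-byte inodes, the root inode (number 2) is at byte 128 of block 10, and inode 184 is at byte 896 of block 32, the last inode block.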
The root directory’s inode number is 2.
Similarly, disk block numbers also count from 1 since block 0 is never
used by a file system.
A zero block number means no disk block.
Data Blocks: Immediately after the inodes blocks are data blocks for file
storage.
Assuming 184 inodes, the first real data block is B33, which is
i_block[0] of the root directory /.
7.4.6 Directory Entries
EXT2 Directory Entries: A directory contains dir_entry structures, which are
struct ext2_dir_entry_2{
u32 inode; // inode number; count from 1, NOT 0
u16 rec_len; // this entry’s length in bytes
u8 name_len; // name length in bytes
u8 file_type; // not used
char name[EXT2_NAME_LEN]; // name: 1-255 chars, no ending NULL
};
The dir_entry is an open-ended structure. The name field contains 1 to 255 chars
without a terminating NULL byte. So the dir_entry’s rec_len also varies.
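Because the name has no terminating NULL, an entry's minimum size depends on name_len. EXT2 keeps rec_len a multiple of 4, so a minimal sketch of the smallest possible rec_len (a hypothetical helper, ideal_len()) is:

```c
/* Minimum rec_len of a dir_entry whose name has name_len chars:
 * 8 fixed bytes (inode 4 + rec_len 2 + name_len 1 + file_type 1) plus
 * the name, rounded up to a multiple of 4. */
int ideal_len(int name_len)
{
    return 4 * ((8 + name_len + 3) / 4);
}
```

So "." and ".." need 12 bytes each and "lost+found" needs 20. The last entry in a block has a rec_len that reaches to the end of the block, so it is usually larger than its ideal length.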
7.5 Programming Examples
We shall show how to access and display the contents of EXT2 file
systems by example programs.
In order to compile and run these programs, the system must have the
ext2fs.h header file installed, which defines the data structures of
EXT2/3/4 file systems.
Ubuntu Linux users may get and install the ext2fs development package
by
sudo apt-get install e2fslibs-dev
7.5.1 Display Superblock
The following C program in Example 7.1 displays the superblock of an
EXT2 file system.
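The program of Example 7.1 is not included in this copy. As a stand-in, here is a minimal sketch (not the original program) that reads the superblock at byte offset 1024 of an image and prints a few fields. It decodes fields by their byte offsets in struct ext2_super_block instead of including <ext2fs/ext2_fs.h>, so it builds without the e2fslibs-dev package; show_super() is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

static uint32_t le32(const uint8_t *p)
{
    return p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static uint16_t le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | p[1] << 8);
}

/* Print some superblock fields of the EXT2 image opened as fd.
 * Returns 0 if s_magic is 0xEF53, else -1. */
int show_super(int fd)
{
    uint8_t sb[1024];

    if (pread(fd, sb, sizeof sb, 1024) != (ssize_t)sizeof sb)  /* B1 at byte 1024 */
        return -1;
    uint16_t magic = le16(sb + 56);
    if (magic != 0xEF53) {
        printf("magic = %x is not an EXT2 FS\n", (unsigned)magic);
        return -1;
    }
    printf("s_inodes_count      = %u\n", (unsigned)le32(sb + 0));
    printf("s_blocks_count      = %u\n", (unsigned)le32(sb + 4));
    printf("s_free_blocks_count = %u\n", (unsigned)le32(sb + 12));
    printf("s_free_inodes_count = %u\n", (unsigned)le32(sb + 16));
    printf("s_first_data_block  = %u\n", (unsigned)le32(sb + 20));
    printf("block size          = %u\n", 1024u << le32(sb + 24));  /* s_log_block_size */
    printf("s_blocks_per_group  = %u\n", (unsigned)le32(sb + 32));
    printf("s_inodes_per_group  = %u\n", (unsigned)le32(sb + 40));
    printf("s_mnt_count         = %u\n", (unsigned)le16(sb + 52));
    printf("s_magic             = %x\n", (unsigned)magic);
    return 0;
}
```

A caller would open the image with open("mydisk", O_RDONLY) (declared in <fcntl.h>) and pass the descriptor to show_super().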
7.5.2 Display Bitmaps
The C program in Example 7.2 displays the inodes bitmap (imap) in HEX form.
The program prints each byte of the inodes bitmap as 2 HEX digits. The
output looks like the following.

In the imap, bits are stored linearly from low to high address. The first 16 bits
(from low to high) are b'11111111 11100000', but they are printed as ff 07 in
HEX, which is not very informative since, within each byte, the bits are printed
in reverse order, i.e. from high to low.
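A more readable alternative to HEX is to print the bits in the order they are stored, bit 0 of each byte first. A minimal sketch; print_bits() is a hypothetical name.

```c
#include <stdint.h>
#include <stdio.h>

/* Print the first nbits of a bitmap as 0/1 characters in storage order
 * (bit 0 of byte 0 first), with a space after every 8 bits. */
void print_bits(FILE *out, const uint8_t *map, int nbits)
{
    for (int i = 0; i < nbits; i++) {
        fputc(((map[i / 8] >> (i % 8)) & 1) ? '1' : '0', out);
        if (i % 8 == 7)
            fputc(' ', out);
    }
    fputc('\n', out);
}
```

For the bytes ff 07, print_bits() prints 11111111 11100000, which is the bit order described in the text.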
7.5.3 Display root Inode
In an EXT2 file system, the number 2 (count from 1) inode is the inode
of the root directory /.
The program in Example 7.3 displays the INODE information of the
root directory of an EXT2 file system.
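The program of Example 7.3 is not included in this copy. A minimal sketch in its spirit (not the original) reads inode number ino from an image and prints i_mode, i_size, i_links_count, i_mtime and the nonzero i_block[ ] entries, using the byte offsets of struct ext2_inode (i_mode at 0, i_size at 4, i_mtime at 16, i_links_count at 26, i_block at 40). show_inode() is a hypothetical name; the block size and bg_inode_table must come from the superblock and group descriptor.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

static uint32_t le32(const uint8_t *p)
{
    return p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static uint16_t le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | p[1] << 8);
}

/* Print some fields of inode number ino (counting from 1).
 * Returns i_mode, or -1 if the inode cannot be read. */
long show_inode(int fd, int ino, int bsize, int inode_table, int isize)
{
    uint8_t ip[256];                             /* one 128 (or 256) byte inode */
    int per_block = bsize / isize;
    off_t pos = (off_t)((ino - 1) / per_block + inode_table) * bsize
              + (off_t)((ino - 1) % per_block) * isize;

    if (isize > (int)sizeof ip || pread(fd, ip, (size_t)isize, pos) != isize)
        return -1;

    uint16_t mode  = le16(ip + 0);
    time_t   mtime = (time_t)le32(ip + 16);      /* copy u32 field into a time_t */

    printf("i_mode  = %x\n", (unsigned)mode);
    printf("i_size  = %u\n", (unsigned)le32(ip + 4));
    printf("i_links = %u\n", (unsigned)le16(ip + 26));
    printf("i_mtime = %s", ctime(&mtime));       /* ctime() string ends with '\n' */
    for (int i = 0; i < 15; i++) {               /* i_block[15] starts at byte 40 */
        uint32_t blk = le32(ip + 40 + 4 * i);
        if (blk)
            printf("i_block[%d] = %u\n", i, (unsigned)blk);
    }
    return mode;
}
```

For the 1440-block disk above, show_inode(fd, 2, 1024, 10, 128) would display the root directory's inode.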
7.5.4 Display Directory Entries
Each data block of a directory INODE contains dir_entries, which are
struct ext2_dir_entry_2 {
u32 inode; // inode number; count from 1, NOT 0
u16 rec_len; // this entry’s length in bytes
u8 name_len; // name length in bytes
u8 file_type; // not used
char name[EXT2_NAME_LEN]; // name: 1-255 chars, no ending NULL
};
Thus, the contents of each data block of a directory have the form
[inode rec_len name_len NAME] [inode rec_len name_len NAME] ......
where NAME is a sequence of name_len chars (without a terminating
NULL).
The following algorithm shows how to step through the dir_entries in a
directory data block.
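The algorithm itself is not reproduced in this copy. A minimal sketch of the usual loop: start at the beginning of the block, print the current entry, then advance by its rec_len until reaching the end of the block. It assumes a 1KB BLKSIZE, decodes the fields by byte offset, and stops on an entry whose rec_len is not plausible; list_dir_block() is a hypothetical name.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define BLKSIZE 1024

static uint32_t le32(const uint8_t *p)
{
    return p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static uint16_t le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | p[1] << 8);
}

/* Step through the dir_entries of one directory data block, printing
 * inode, rec_len, name_len and name. Returns the number of entries seen. */
int list_dir_block(const uint8_t *buf)
{
    const uint8_t *cp = buf;          /* start of the current dir_entry */
    char name[256];
    int count = 0;

    while (cp < buf + BLKSIZE) {
        uint32_t ino     = le32(cp);
        uint16_t rec_len = le16(cp + 4);
        uint8_t  nlen    = cp[6];
        if (rec_len < 8 || rec_len % 4 != 0 || 8 + nlen > rec_len)
            break;                    /* corrupt entry: stop */
        memcpy(name, cp + 8, nlen);   /* name has no ending NULL ... */
        name[nlen] = '\0';            /* ... so add one before printing */
        printf("%4u %4u %4u %s\n", (unsigned)ino, (unsigned)rec_len,
               (unsigned)nlen, name);
        count++;
        cp += rec_len;                /* advance to the next dir_entry */
    }
    return count;
}
```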
