System-Level I/O: 15-213/18-243: Introduction to Computer Systems, 16th Lecture, Mar. 15, 2010

This document discusses system-level I/O and Unix I/O. It describes how all I/O devices are represented as files in Unix, and the different types of Unix files. It explains the basic Unix I/O operations like opening, closing, reading and writing files. It provides examples of reading from standard input and writing to standard output using these system calls. It also discusses dealing with short read/write counts and introduces the RIO robust I/O package.

Uploaded by

shodocozhou
Copyright
© Attribution Non-Commercial (BY-NC)

Carnegie Mellon

System-Level I/O

15-213/18-243: Introduction to Computer Systems
16th Lecture, Mar. 15, 2010
Instructors: Gregory Kesden and Anthony Rowe

Today

Unix I/O
RIO (robust I/O) package
Metadata, sharing, and redirection
Standard I/O
Conclusions and examples

Unix Files

A Unix file is a sequence of m bytes:
    B0, B1, ..., Bk, ..., Bm-1

All I/O devices are represented as files:
    /dev/sda2   (/usr disk partition)
    /dev/tty2   (terminal)

Even the kernel is represented as a file:
    /dev/kmem   (kernel memory image)
    /proc       (kernel data structures)

Unix File Types

Regular file
    File containing user/app data (binary, text, whatever)
    OS does not know anything about the format other than "sequence of bytes", akin to main memory

Directory file
    A file that contains the names and locations of other files

Character special and block special files
    Terminals (character special) and disks (block special)

FIFO (named pipe)
    A file type used for inter-process communication

Socket
    A file type used for network communication between processes

Unix I/O

Key Features
    Elegant mapping of files to devices allows kernel to export simple interface called Unix I/O
    Important idea: All input and output is handled in a consistent and uniform way

Basic Unix I/O operations (system calls):
    Opening and closing files
        open() and close()
    Reading and writing a file
        read() and write()
    Changing the current file position (seek)
        Indicates next offset into file to read or write
        lseek()

[Figure: a file drawn as bytes B0, B1, ..., Bk-1, Bk, Bk+1, ..., with current file position = k]
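
The seek operation listed above can be sketched as follows. This is only an illustration: read_byte_at is a hypothetical helper name, not part of the lecture code or csapp.c, and it assumes fd refers to a seekable file (not a terminal, pipe, or socket).

```c
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from csapp.c): read the single byte stored at
 * offset k of an open, seekable file.
 * Returns 1 on success, 0 if k is at or past end of file, -1 on error.
 */
ssize_t read_byte_at(int fd, off_t k, char *out)
{
    /* Set the current file position to k, measured from the start. */
    if (lseek(fd, k, SEEK_SET) == (off_t)-1)
        return -1;

    /* read() copies the byte at position k and advances the position. */
    return read(fd, out, 1);
}
```

After a successful call the current file position is k + 1, so the next read() continues with byte Bk+1.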

Opening Files

Opening a file informs the kernel that you are getting ready to access that file

    int fd;   /* file descriptor */

    if ((fd = open("/etc/hosts", O_RDONLY)) < 0) {
        perror("open");
        exit(1);
    }

Returns a small identifying integer file descriptor
    fd == -1 indicates that an error occurred

Each process created by a Unix shell begins life with three open files associated with a terminal:
    0: standard input
    1: standard output
    2: standard error

Closing Files

Closing a file informs the kernel that you are finished accessing that file

    int fd;       /* file descriptor */
    int retval;   /* return value */

    if ((retval = close(fd)) < 0) {
        perror("close");
        exit(1);
    }

Closing an already closed file is a recipe for disaster in threaded programs (more on this later)

Moral: Always check return codes, even for seemingly benign functions such as close()

Reading Files

Reading a file copies bytes from the current file position to memory, and then updates file position

    char buf[512];
    int fd;       /* file descriptor */
    int nbytes;   /* number of bytes read */

    /* Open file fd ... */
    /* Then read up to 512 bytes from file fd */
    if ((nbytes = read(fd, buf, sizeof(buf))) < 0) {
        perror("read");
        exit(1);
    }

Returns number of bytes read from file fd into buf
    Return type ssize_t is signed integer
    nbytes < 0 indicates that an error occurred
    Short counts (nbytes < sizeof(buf)) are possible and are not errors!

Writing Files

Writing a file copies bytes from memory to the current file position, and then updates current file position

    char buf[512];
    int fd;       /* file descriptor */
    int nbytes;   /* number of bytes written */

    /* Open the file fd ... */
    /* Then write up to 512 bytes from buf to file fd */
    if ((nbytes = write(fd, buf, sizeof(buf))) < 0) {
        perror("write");
        exit(1);
    }

Returns number of bytes written from buf to file fd
    nbytes < 0 indicates that an error occurred
    As with reads, short counts are possible and are not errors!

Simple Unix I/O example

Copying standard in to standard out, one byte at a time

    #include "csapp.h"

    int main(void)
    {
        char c;

        while (Read(STDIN_FILENO, &c, 1) != 0)
            Write(STDOUT_FILENO, &c, 1);
        exit(0);
    }

cpstdin.c

Note the use of error handling wrappers for read and write (Appendix A).

Dealing with Short Counts

Short counts can occur in these situations:
    Encountering (end-of-file) EOF on reads
    Reading text lines from a terminal
    Reading and writing network sockets or Unix pipes

Short counts never occur in these situations:
    Reading from disk files (except for EOF)
    Writing to disk files

One way to deal with short counts in your code:
    Use the RIO (Robust I/O) package from your textbook's csapp.c file (Appendix B)


The RIO Package

RIO is a set of wrappers that provide efficient and robust I/O in apps, such as network programs, that are subject to short counts

RIO provides two different kinds of functions
    Unbuffered input and output of binary data
        rio_readn and rio_writen
    Buffered input of binary data and text lines
        rio_readlineb and rio_readnb
        Buffered RIO routines are thread-safe and can be interleaved arbitrarily on the same descriptor

Download from http://csapp.cs.cmu.edu/public/code.html
    src/csapp.c and include/csapp.h

Unbuffered RIO Input and Output

Same interface as Unix read and write
Especially useful for transferring data on network sockets

    #include "csapp.h"

    ssize_t rio_readn(int fd, void *usrbuf, size_t n);
    ssize_t rio_writen(int fd, void *usrbuf, size_t n);

    Return: num. bytes transferred if OK, 0 on EOF (rio_readn only), -1 on error

rio_readn returns short count only if it encounters EOF
    Only use it when you know how many bytes to read
rio_writen never returns a short count
Calls to rio_readn and rio_writen can be interleaved arbitrarily on the same descriptor

Implementation of rio_readn

    /*
     * rio_readn - robustly read n bytes (unbuffered)
     */
    ssize_t rio_readn(int fd, void *usrbuf, size_t n)
    {
        size_t nleft = n;
        ssize_t nread;
        char *bufp = usrbuf;

        while (nleft > 0) {
            if ((nread = read(fd, bufp, nleft)) < 0) {
                if (errno == EINTR) /* interrupted by sig handler return */
                    nread = 0;      /* and call read() again */
                else
                    return -1;      /* errno set by read() */
            }
            else if (nread == 0)
                break;              /* EOF */
            nleft -= nread;
            bufp += nread;
        }
        return (n - nleft);         /* return >= 0 */
    }

csapp.c
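
The writing side can be handled with the same loop structure. The block below is a minimal sketch of that idea, not the csapp.c source: the name writen_sketch is hypothetical, and treating a zero return from write() as an error is an assumption made here to keep the loop from spinning.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * writen_sketch - hypothetical counterpart to rio_readn (not csapp.c code).
 * Keeps calling write() until all n bytes are written.
 * Returns n on success, -1 on error; never returns a short count.
 */
ssize_t writen_sketch(int fd, const void *usrbuf, size_t n)
{
    size_t nleft = n;
    const char *bufp = usrbuf;

    while (nleft > 0) {
        ssize_t nwritten = write(fd, bufp, nleft);

        if (nwritten < 0) {
            if (errno == EINTR)   /* interrupted by a signal handler */
                continue;         /* call write() again */
            return -1;            /* errno set by write() */
        }
        if (nwritten == 0)        /* assumption: treat as an error */
            return -1;

        nleft -= (size_t)nwritten;   /* short count: write the rest */
        bufp += nwritten;
    }
    return (ssize_t)n;
}
```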

Buffered I/O: Motivation

Applications often read/write one character at a time
    getc, putc, ungetc
    gets, fgets
        Read line of text one character at a time, stopping at newline

Implementing as Unix I/O calls expensive
    read and write require Unix kernel calls
        > 10,000 clock cycles

Solution: Buffered read
    Use Unix read to grab block of bytes
    User input functions take one byte at a time from buffer
        Refill buffer when empty

[Figure: buffer divided into an "already read" portion followed by an "unread" portion]

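Buffered I/O: Implementation

For reading from file
File has associated buffer to hold bytes that have been read from file but not yet read by user code

[Figure: buffer starting at rio_buf, divided into "already read" and "unread" portions; rio_bufptr marks the start of the unread portion and rio_cnt is its length]

Layered on Unix file:

[Figure: file divided into "not in buffer", the buffered portion ("already read" followed by "unread"), and "unseen"; the current file position sits between "unread" and "unseen"]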

Buffered I/O: Declaration

All information contained in struct

[Figure: buffer starting at rio_buf, divided into "already read" and "unread" portions, labeled with rio_bufptr and rio_cnt]

    typedef struct {
        int rio_fd;                /* descriptor for this internal buf */
        int rio_cnt;               /* unread bytes in internal buf */
        char *rio_bufptr;          /* next unread byte in internal buf */
        char rio_buf[RIO_BUFSIZE]; /* internal buffer */
    } rio_t;
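
The buffered-read idea can be sketched with a struct of the same shape. Everything named sketch_* below is hypothetical and is not the csapp.c implementation; the buffer size of 8192 is likewise an assumption. The point is only to show a single read() refilling the buffer, with bytes then handed out one at a time without further kernel calls.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

#define SKETCH_BUFSIZE 8192   /* assumed size, chosen for illustration */

typedef struct {
    int fd;                     /* descriptor for this internal buf */
    int cnt;                    /* unread bytes in internal buf */
    char *bufptr;               /* next unread byte in internal buf */
    char buf[SKETCH_BUFSIZE];   /* internal buffer */
} sketch_rio_t;

/* Associate an empty buffer with descriptor fd. */
void sketch_init(sketch_rio_t *rp, int fd)
{
    rp->fd = fd;
    rp->cnt = 0;
    rp->bufptr = rp->buf;
}

/*
 * Hand out one byte from the buffer, refilling it with one read() when
 * it is empty. Returns 1 if a byte was stored in *c, 0 on EOF, -1 on error.
 */
int sketch_getbyte(sketch_rio_t *rp, char *c)
{
    while (rp->cnt <= 0) {                      /* refill if buffer empty */
        ssize_t n = read(rp->fd, rp->buf, sizeof(rp->buf));

        if (n < 0) {
            if (errno != EINTR)                 /* real error */
                return -1;
            /* interrupted by a signal handler: retry the read */
        } else if (n == 0) {
            return 0;                           /* EOF */
        } else {
            rp->cnt = (int)n;
            rp->bufptr = rp->buf;               /* reset to start of buffer */
        }
    }
    *c = *rp->bufptr++;                         /* no system call needed */
    rp->cnt--;
    return 1;
}
```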

Buffered RIO Input Functions

Efficiently read text lines and binary data from a file partially cached in an internal memory buffer

    #include "csapp.h"

    void rio_readinitb(rio_t *rp, int fd);
    ssize_t rio_readlineb(rio_t *rp, void *usrbuf, size_t maxlen);

    Return: num. bytes read if OK, 0 on EOF, -1 on error

rio_readlineb reads a text line of up to maxlen bytes from file fd and stores the line in usrbuf
    Especially useful for reading text lines from network sockets
    Stopping conditions
        maxlen bytes read
        EOF encountered
        Newline ('\n') encountered

Buffered RIO Input Functions (cont)

    #include "csapp.h"

    void rio_readinitb(rio_t *rp, int fd);
    ssize_t rio_readlineb(rio_t *rp, void *usrbuf, size_t maxlen);
    ssize_t rio_readnb(rio_t *rp, void *usrbuf, size_t n);

    Return: num. bytes read if OK, 0 on EOF, -1 on error

rio_readnb reads up to n bytes from file fd
    Stopping conditions
        n bytes read
        EOF encountered

Calls to rio_readlineb and rio_readnb can be interleaved arbitrarily on the same descriptor
    Warning: Don't interleave with calls to rio_readn

RIO Example

Copying the lines of a text file from standard input to standard output

    #include "csapp.h"

    int main(int argc, char **argv)
    {
        int n;
        rio_t rio;
        char buf[MAXLINE];

        Rio_readinitb(&rio, STDIN_FILENO);
        while ((n = Rio_readlineb(&rio, buf, MAXLINE)) != 0)
            Rio_writen(STDOUT_FILENO, buf, n);
        exit(0);
    }

cpfile.c


File Metadata

Metadata is data about data, in this case file data
Per-file metadata maintained by kernel
    accessed by users with the stat and fstat functions

    /* Metadata returned by the stat and fstat functions */
    struct stat {
        dev_t         st_dev;      /* device */
        ino_t         st_ino;      /* inode */
        mode_t        st_mode;     /* protection and file type */
        nlink_t       st_nlink;    /* number of hard links */
        uid_t         st_uid;      /* user ID of owner */
        gid_t         st_gid;      /* group ID of owner */
        dev_t         st_rdev;     /* device type (if inode device) */
        off_t         st_size;     /* total size, in bytes */
        unsigned long st_blksize;  /* blocksize for filesystem I/O */
        unsigned long st_blocks;   /* number of blocks allocated */
        time_t        st_atime;    /* time of last access */
        time_t        st_mtime;    /* time of last modification */
        time_t        st_ctime;    /* time of last change */
    };

Example of Accessing File Metadata

    /* statcheck.c - Querying and manipulating a file's meta data */
    #include "csapp.h"

    int main (int argc, char **argv)
    {
        struct stat stat;
        char *type, *readok;

        Stat(argv[1], &stat);
        if (S_ISREG(stat.st_mode))
            type = "regular";
        else if (S_ISDIR(stat.st_mode))
            type = "directory";
        else
            type = "other";
        if ((stat.st_mode & S_IRUSR)) /* OK to read? */
            readok = "yes";
        else
            readok = "no";

        printf("type: %s, read: %s\n", type, readok);
        exit(0);
    }

statcheck.c

    unix> ./statcheck statcheck.c
    type: regular, read: yes
    unix> chmod 000 statcheck.c
    unix> ./statcheck statcheck.c
    type: regular, read: no
    unix> ./statcheck ..
    type: directory, read: yes
    unix> ./statcheck /dev/kmem
    type: other, read: yes
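
The fstat function returns the same metadata for a file that is already open. A minimal sketch, assuming a hypothetical helper name size_of_open_file that does not appear in the lecture code:

```c
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not from csapp.c): size in bytes of the file
 * behind an open descriptor, or -1 if fstat() fails.
 */
off_t size_of_open_file(int fd)
{
    struct stat st;

    if (fstat(fd, &st) < 0)   /* query metadata by descriptor, not by name */
        return -1;
    return st.st_size;        /* total size, in bytes */
}
```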

Accessing Directories

Only recommended operation on a directory: read its entries
    dirent structure contains information about a directory entry
    DIR structure contains information about directory while stepping through its entries

    #include <sys/types.h>
    #include <dirent.h>

    {
        DIR *directory;
        struct dirent *de;
        ...
        if (!(directory = opendir(dir_name)))
            error("Failed to open directory");
        ...
        while (0 != (de = readdir(directory))) {
            printf("Found file: %s\n", de->d_name);
        }
        ...
        closedir(directory);
    }

How the Unix Kernel Represents Open Files

Two descriptors referencing two distinct open disk files. Descriptor 1 (stdout) points to terminal, and descriptor 4 points to open disk file

[Figure: three tables. Descriptor table (one table per process): stdin fd 0, stdout fd 1, stderr fd 2, fd 3, fd 4. Open file table (shared by all processes): entries File A (terminal) and File B (disk), each holding a file pos and refcnt=1. v-node table (shared by all processes): one entry per file holding file access, file size, file type, ... (the info in the stat struct).]

File Sharing

Two distinct descriptors sharing the same disk file through two distinct open file table entries
    E.g., Calling open twice with the same filename argument

[Figure: descriptor table (one table per process) with two descriptors pointing to two open file table entries, File A (disk) and File B (disk), each with its own file pos and refcnt=1; both entries point to the same v-node table entry (file access, file size, file type, ...).]
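
Because each open call gets its own open file table entry, each descriptor has its own file position. A small sketch of that behavior follows; read_first_via_two_opens is a hypothetical name used only for this illustration.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical demo (not lecture code): open the same file twice and read
 * one byte through each descriptor. Each open() creates a separate open
 * file table entry with its own position, so both reads start at offset 0.
 * Returns 0 on success, -1 on failure.
 */
int read_first_via_two_opens(const char *path, char *c1, char *c2)
{
    int fd1 = open(path, O_RDONLY);
    int fd2 = open(path, O_RDONLY);
    int ok = fd1 >= 0 && fd2 >= 0
             && read(fd1, c1, 1) == 1    /* advances fd1's position only */
             && read(fd2, c2, 1) == 1;   /* fd2 still starts at offset 0 */

    if (fd1 >= 0)
        close(fd1);
    if (fd2 >= 0)
        close(fd2);
    return ok ? 0 : -1;
}
```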

How Processes Share Files: Fork()

A child process inherits its parent's open files
    Note: situation unchanged by exec functions (use fcntl to change)

Before fork() call:

[Figure: descriptor table (one table per process) with stdin fd 0, stdout fd 1, stderr fd 2, fd 3, fd 4; open file table (shared by all processes) with entries File A (terminal) and File B (disk), each with file pos and refcnt=1; v-node table (shared by all processes) with file access, file size, file type, ... for each file.]

How Processes Share Files: Fork()

A child process inherits its parent's open files

After fork():
    Child's table same as parent's, and +1 to each refcnt

[Figure: parent and child each have their own descriptor table (fd 0 through fd 4) pointing to the same open file table entries, File A (terminal) and File B (disk), each now with refcnt=2; the v-node table entries (file access, file size, file type, ...) are unchanged.]

I/O Redirection

Question: How does a shell implement I/O redirection?
    unix> ls > foo.txt

Answer: By calling the dup2(oldfd, newfd) function
    Copies (per-process) descriptor table entry oldfd to entry newfd

[Figure: descriptor table before dup2(4,1): fd 1 refers to a, fd 4 refers to b. Descriptor table after dup2(4,1): fd 1 refers to b, fd 4 refers to b.]

I/O Redirection Example

Step #1: open file to which stdout should be redirected
    Happens in child executing shell code, before exec

[Figure: descriptor table (one table per process) with stdin fd 0, stdout fd 1, stderr fd 2, fd 3, fd 4; open file table (shared by all processes) with entries File A and File B, each with file pos and refcnt=1; v-node table (shared by all processes) with file access, file size, file type, ... for each file.]

I/O Redirection Example (cont.)

Step #2: call dup2(4,1)
    Causes fd=1 (stdout) to refer to disk file pointed at by fd=4

[Figure: same three tables as in step #1; fd 1 and fd 4 now both point to the open file table entry for File B, which has refcnt=2, while the entry for File A has refcnt=0.]
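
The two steps above can be sketched in a single function. This is an illustration only: redirect_to_file is a hypothetical helper name, and the open flags and permission bits are assumptions rather than what any particular shell uses.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper (not lecture code): make descriptor target_fd refer
 * to the file named path, creating or truncating it.
 * Returns 0 on success, -1 on error.
 */
int redirect_to_file(int target_fd, const char *path)
{
    /* Step 1: open the file that output should be redirected to. */
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR);
    if (fd < 0)
        return -1;

    /* Step 2: copy descriptor table entry fd to entry target_fd. */
    if (dup2(fd, target_fd) < 0) {
        close(fd);
        return -1;
    }

    /* The extra descriptor is no longer needed once target_fd refers
       to the same open file table entry. */
    if (fd != target_fd)
        close(fd);
    return 0;
}
```

A shell-like program would call something like redirect_to_file(STDOUT_FILENO, "foo.txt") in the child after fork and before exec.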

Fun with File Descriptors (1)

    #include "csapp.h"

    int main(int argc, char *argv[])
    {
        int fd1, fd2, fd3;
        char c1, c2, c3;
        char *fname = argv[1];

        fd1 = Open(fname, O_RDONLY, 0);
        fd2 = Open(fname, O_RDONLY, 0);
        fd3 = Open(fname, O_RDONLY, 0);
        Dup2(fd2, fd3);
        Read(fd1, &c1, 1);
        Read(fd2, &c2, 1);
        Read(fd3, &c3, 1);
        printf("c1 = %c, c2 = %c, c3 = %c\n", c1, c2, c3);
        return 0;
    }

ffiles1.c

What would this program print for file containing "abcde"?

Fun with File Descriptors (2)

    #include "csapp.h"

    int main(int argc, char *argv[])
    {
        int fd1;
        int s = getpid() & 0x1;
        char c1, c2;
        char *fname = argv[1];

        fd1 = Open(fname, O_RDONLY, 0);
        Read(fd1, &c1, 1);
        if (fork()) { /* Parent */
            sleep(s);
            Read(fd1, &c2, 1);
            printf("Parent: c1 = %c, c2 = %c\n", c1, c2);
        } else {      /* Child */
            sleep(1-s);
            Read(fd1, &c2, 1);
            printf("Child: c1 = %c, c2 = %c\n", c1, c2);
        }
        return 0;
    }

ffiles2.c

What would this program print for file containing "abcde"?

Fun with File Descriptors (3)

    #include "csapp.h"

    int main(int argc, char *argv[])
    {
        int fd1, fd2, fd3;
        char *fname = argv[1];

        fd1 = Open(fname, O_CREAT|O_TRUNC|O_RDWR, S_IRUSR|S_IWUSR);
        Write(fd1, "pqrs", 4);
        fd3 = Open(fname, O_APPEND|O_WRONLY, 0);
        Write(fd3, "jklmn", 5);
        fd2 = dup(fd1); /* Allocates descriptor */
        Write(fd2, "wxyz", 4);
        Write(fd3, "ef", 2);
        return 0;
    }

ffiles3.c

What would be the contents of the resulting file?


Standard I/O Functions

The C standard library (libc.so) contains a collection of higher-level standard I/O functions
    Documented in Appendix B of K&R.

Examples of standard I/O functions:
    Opening and closing files (fopen and fclose)
    Reading and writing bytes (fread and fwrite)
    Reading and writing text lines (fgets and fputs)
    Formatted reading and writing (fscanf and fprintf)

Standard I/O Streams

Standard I/O models open files as streams
    Abstraction for a file descriptor and a buffer in memory.
    Similar to buffered RIO

C programs begin life with three open streams (defined in stdio.h)
    stdin (standard input)
    stdout (standard output)
    stderr (standard error)

    #include <stdio.h>

    extern FILE *stdin;   /* standard input (descriptor 0) */
    extern FILE *stdout;  /* standard output (descriptor 1) */
    extern FILE *stderr;  /* standard error (descriptor 2) */

    int main()
    {
        fprintf(stdout, "Hello, world\n");
    }

Buffering in Standard I/O

Standard I/O functions use buffered I/O

    printf("h");
    printf("e");
    printf("l");
    printf("l");
    printf("o");
    printf("\n");

    fflush(stdout);

[Figure: the six printf calls fill buf with h, e, l, l, o, \n; fflush(stdout) then results in a single write(1, buf, 6).]

Buffer flushed to output fd on '\n' or fflush() call

Standard I/O Buffering in Action

You can see this buffering in action for yourself, using the always fascinating Unix strace program:

    #include <stdio.h>
    #include <stdlib.h>

    int main()
    {
        printf("h");
        printf("e");
        printf("l");
        printf("l");
        printf("o");
        printf("\n");
        fflush(stdout);
        exit(0);
    }

    linux> strace ./hello
    execve("./hello", ["hello"], [/* ... */]).
    ...
    write(1, "hello\n", 6) = 6
    ...
    exit_group(0) = ?


Unix I/O vs. Standard I/O vs. RIO

Standard I/O and RIO are implemented using low-level Unix I/O

[Figure: layered diagram. The C application program sits on top of the Standard I/O functions (fopen, fdopen, fread, fwrite, fscanf, fprintf, sscanf, sprintf, fgets, fputs, fflush, fseek, fclose) and the RIO functions (rio_readn, rio_writen, rio_readinitb, rio_readlineb, rio_readnb). Both sit on top of the Unix I/O functions (open, read, write, lseek, stat, close), which are accessed via system calls.]

Which ones should you use in your programs?

Pros and Cons of Unix I/O

Pros
    Unix I/O is the most general and lowest overhead form of I/O.
        All other I/O packages are implemented using Unix I/O functions.
    Unix I/O provides functions for accessing file metadata.
    Unix I/O functions are async-signal-safe and can be used safely in signal handlers.

Cons
    Dealing with short counts is tricky and error prone.
    Efficient reading of text lines requires some form of buffering, also tricky and error prone.
    Both of these issues are addressed by the standard I/O and RIO packages.

Pros and Cons of Standard I/O

Pros:
    Buffering increases efficiency by decreasing the number of read and write system calls
    Short counts are handled automatically

Cons:
    Provides no function for accessing file metadata
    Standard I/O functions are not async-signal-safe, and not appropriate for signal handlers.
    Standard I/O is not appropriate for input and output on network sockets
        There are poorly documented restrictions on streams that interact badly with restrictions on sockets (CS:APP2e, Sec 10.9)

Choosing I/O Functions

General rule: use the highest-level I/O functions you can
    Many C programmers are able to do all of their work using the standard I/O functions

When to use standard I/O
    When working with disk or terminal files

When to use raw Unix I/O
    Inside signal handlers, because Unix I/O is async-signal-safe.
    In rare cases when you need absolute highest performance.

When to use RIO
    When you are reading and writing network sockets.
    Avoid using standard I/O on sockets.

Aside: Working with Binary Files

Binary File Examples
    Object code, Images (JPEG, GIF)

Functions you shouldn't use on binary files
    Line-oriented I/O such as fgets, scanf, printf, rio_readlineb
        Different systems interpret 0x0A ('\n') (newline) differently:
            Linux and Mac OS X: LF (0x0a) ['\n']
            HTTP servers & Windows: CR+LF (0x0d 0x0a) ['\r\n']
        Use rio_readn or rio_readnb instead
    String functions
        strlen, strcpy
        Interprets byte value 0 (end of string) as special
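
The problem with string functions on binary data can be seen in a few lines. The 5-byte record and the helper names below are made up for illustration.

```c
#include <string.h>

/* Hypothetical 5-byte binary record with a zero byte in the middle. */
static const char record[5] = { 'a', 'b', '\0', 'c', 'd' };

/* strlen() stops at the first zero byte, so it sees only "ab". */
size_t string_view_len(void)
{
    return strlen(record);
}

/* For binary data, pass the byte count explicitly and use memcpy(),
   which copies n bytes regardless of their values. */
void copy_bytes(char *dst, const char *src, size_t n)
{
    memcpy(dst, src, n);
}
```

Here string_view_len() reports 2 even though the record holds 5 bytes, while copy_bytes(dst, record, sizeof(record)) copies all 5.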

For Further Information

The Unix bible:
    W. Richard Stevens & Stephen A. Rago, Advanced Programming in the Unix Environment, 2nd Edition, Addison Wesley, 2005
        Updated from Stevens's 1993 classic text.

Stevens is arguably the best technical writer ever.
    Produced authoritative works in:
        Unix programming
        TCP/IP (the protocol that makes the Internet work)
        Unix network programming
        Unix IPC programming

Tragically, Stevens died Sept. 1, 1999
    But others have taken up his legacy