File IO
Files and Directories in Unix-based systems
Today’s File Systems Extended from UFS (Unix File System)
• BSD Fast File System
• Linux ext2
• Linux ext3
• Linux ext4 – Structure is the same as ext3 with some additions (only two are listed here)
• Larger file systems (volumes up to 1 exabyte vs 32 TB in ext3, and files up to 16 TB vs 2 TB in ext3)
• Larger directories: 64,000 subdirectories vs 32,000 in ext3
Unix File System (UFS) inode layout in disks
Long listing with the inode number in the first column:
• 9699466 -rw-rw-r-- 1 faculty faculty 94319943 Mar 21 2017 2.4.13
• 12323363 drwxrwxr-x 3 faculty faculty 4096 Aug 29 13:29 ApacheSpark
• 10094452 drwxrwxr-x 3 faculty faculty 4096 Jun 28 12:03 BackupUSB
• 9699526 -rw------- 1 faculty faculty 49233 Dec 21 13:23 .bash_history
• 9699331 -rw-r--r-- 1 faculty faculty 220 Feb 24 2017 .bash_logout
• 9705098 -rw-r--r-- 1 faculty faculty 4330 Nov 24 14:17 .bashrc
Inode numbers and names only:
• 11540615 architectureanalysis
• 9704475 bashrcn
• 9704543 bashrcn~
• 9705036 hosts
• 9704703 hosts~
• 9699741 ml
• 9704714 mpi_pi.c
• 9704716 mpi_pi.out
virtualenv/architectureanalysis:
• 9704509 ..
• 11407369 archanalyze_decisiontree.ipynb
• 11540737 archanalyze_linearregression.ipynb
• 11541142 archanalyze_mlp-Copy1.ipynb
• 11540670 archanalyze_mlp.ipynb
• 9312887 archanalyze_mlp.py
• 11540730 archanalyze_partialleastsquare.ipynb
Directories
• Directories are special files that keep
track of other files
• the collection of files is systematically
organized
• first, disks are split into partitions that
create logical volumes (can be thought of as
“virtual disks”)
• second, each partition contains information
about the files within
• this information is kept in entries in a
device directory (or volume table of
contents)
• the directory is a symbol table that
translates file names into their entries in
the directory
• it has a logical structure
• it has an implementation structure (linked list,
table, etc.)
Linux File System Hierarchy
Directory entries pointing to a file with a link: two directory entries refer to the same file but with different names
• A hard link is a directory entry that points directly to the inode of a file (most systems do not let users create hard links to directories)
• Both names share one inode, so changes to the content or metadata are visible through either name
• Renaming or deleting the original name does not affect the hard link; the data is kept until the last hard link is removed
• Cannot span file systems
• A symbolic (soft) link is a directory entry for a file or directory that stores the path of another directory entry, which is either the main entry or itself a hard or soft link to the main entry
• Renaming or deleting the original file does affect the symbolic link: the link is left dangling
• Can span file systems
Create Symbolic Links (ln command)
File_IO\stat_example.c
Useful Library Functions – most are also available as shell commands
#include <unistd.h>
int access(const char *pathname, int mode); // check whether the calling process may access the file (mode is F_OK, or a combination of R_OK, W_OK and X_OK)
int chown(const char *pathname, uid_t owner, gid_t group); // change the owner and group of the file to the values given in the owner and group parameters
int truncate(const char *pathname, off_t length); // set the file size to length bytes
int link(const char *existingpath, const char *newpath); // create hard link newpath to the existing file at existingpath
int unlink(const char *pathname); // remove a hard link (directory entry)
int symlink(const char *actualpath, const char *sympath); // create symbolic (soft) link sympath pointing to actualpath
ssize_t readlink(const char *restrict pathname, char *restrict buf, size_t bufsize); // read the content of a symbolic (soft) link, performing the work of open(), read() and close() in one call; buf is not null-terminated
int rmdir(const char *pathname); // remove an empty directory
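The link functions above can be tried out with a small sketch like the following. The function name make_links and the 1024-byte buffer are illustrative assumptions, not part of the slides; the calls themselves are the standard POSIX ones.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Create a hard link and a symbolic link to `target`, then report what
 * each new name refers to. Returns 0 on success, -1 on the first failure.
 * (make_links is a hypothetical helper name.) */
int make_links(const char *target, const char *hard, const char *soft)
{
    struct stat st_target, st_hard, st_soft;
    char buf[1024];
    ssize_t n;

    if (link(target, hard) == -1) { perror("link"); return -1; }
    if (symlink(target, soft) == -1) { perror("symlink"); return -1; }

    /* Both names of a hard link resolve to the same inode. */
    if (stat(target, &st_target) == -1 || stat(hard, &st_hard) == -1) {
        perror("stat");
        return -1;
    }
    /* lstat() describes the symbolic link itself, not its target. */
    if (lstat(soft, &st_soft) == -1) { perror("lstat"); return -1; }

    /* readlink() does not append a terminating null byte. */
    n = readlink(soft, buf, sizeof buf - 1);
    if (n == -1) { perror("readlink"); return -1; }
    buf[n] = '\0';

    printf("hard link shares inode: %s\n",
           st_target.st_ino == st_hard.st_ino ? "yes" : "no");
    printf("soft link is a symlink: %s, stored path: %s\n",
           S_ISLNK(st_soft.st_mode) ? "yes" : "no", buf);
    return 0;
}
```

Calling make_links("notes.txt", "notes_hard.txt", "notes_soft.txt") on an existing notes.txt should report a shared inode for the hard link and the stored path for the soft link; a second call fails because the new names already exist.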
Useful Library Functions – most are also available as shell commands
#include <stdio.h>
int remove(const char *pathname); // delete a file or an empty directory; on a symbolic link it removes the link itself
int rename(const char *oldname, const char *newname); // rename a file or directory
#include <sys/stat.h>
int chmod(const char *pathname, mode_t mode); // change file access permissions
int mkdir(const char *pathname, mode_t mode); // create a new directory
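As a rough illustration of how these calls fit together, the sketch below creates a directory, tightens its permissions, renames it and removes it again. The function name dir_lifecycle and the chosen modes are assumptions made for the example.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Create directory `dir`, restrict it to its owner, rename it to
 * `newname`, then delete it. Returns 0 on success, -1 on error.
 * (dir_lifecycle is a hypothetical helper name.) */
int dir_lifecycle(const char *dir, const char *newname)
{
    if (mkdir(dir, 0755) == -1) { perror("mkdir"); return -1; }
    if (chmod(dir, 0700) == -1) { perror("chmod"); return -1; }
    if (rename(dir, newname) == -1) { perror("rename"); return -1; }
    /* remove() acts like rmdir() on a directory and unlink() on a file. */
    if (remove(newname) == -1) { perror("remove"); return -1; }
    return 0;
}
```

The modes are written in octal (leading 0); the mode passed to mkdir() is additionally filtered by the process umask.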
dirent.h – format of directory entries
• The internal format of directories is unspecified.
• The <dirent.h> header defines the following data type through
typedef:
• DIR : A type representing a directory stream.
• It also defines the structure dirent which includes the following
members:
• ino_t d_ino file inode (aka serial) number
• char d_name[] name of entry
• The type ino_t is defined as described in <sys/types.h>.
• The character array d_name is of unspecified size, but the number of bytes
preceding the terminating null byte will not exceed {NAME_MAX}.
Directory Functions
#include <dirent.h>
DIR *opendir(const char *pathname); // open a directory stream and return a pointer to a DIR stream object (DIR is similar to the FILE stream object)
struct dirent *readdir(DIR *dp); // return a pointer to a struct dirent (inode number and file name) for the next entry in the DIR stream, or NULL at the end of the directory or on error
void rewinddir(DIR *dp); // rewind the DIR stream to the beginning of the directory
int closedir(DIR *dp); // close the DIR stream
long telldir(DIR *dp); // return the current position of the DIR stream
void seekdir(DIR *dp, long loc); // seek to the position given by loc (a value returned earlier by telldir)
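A minimal sketch of the usual opendir/readdir/closedir loop is shown below. The function name list_dir is an assumption for illustration; it prints each entry's inode number and name.

```c
#include <dirent.h>
#include <stdio.h>

/* Print the inode number and name of every entry in `path`.
 * Returns the number of entries read, or -1 if the directory
 * cannot be opened. (list_dir is a hypothetical helper name.) */
int list_dir(const char *path)
{
    DIR *dp = opendir(path);
    struct dirent *entry;
    int count = 0;

    if (dp == NULL) {
        perror("opendir");
        return -1;
    }
    /* readdir() returns one entry per call and NULL when none are left. */
    while ((entry = readdir(dp)) != NULL) {
        printf("%lu %s\n", (unsigned long)entry->d_ino, entry->d_name);
        count++;
    }
    closedir(dp);
    return count;
}
```

For example, list_dir(".") lists the current directory; the order of entries is whatever the file system returns, not alphabetical.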
Implementation of pwd (print working directory) using Directory Functions
Implement the functionality of the pwd shell command, which displays the path of the current directory.
File_IO\mypwd.c
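The contents of mypwd.c are not reproduced here, so the following is only one possible sketch of the idea, not the course's implementation. It assumes the classic approach: compare the inode of the current directory with the entries of its parent to find its name, then repeat one level up until the root (which is its own parent) is reached. The name my_pwd and the fixed 1024-byte buffers are assumptions.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>   /* getcwd(), only needed to compare results */

/* Store the absolute path of the current directory in `out`.
 * Returns 0 on success, -1 on error. (Hypothetical sketch.) */
int my_pwd(char *out, size_t size)
{
    char here[1024] = ".";   /* ".", "./..", "./../..", ... */
    char path[1024] = "";    /* result, built from right to left */
    char up[1024], tmp[2048];
    struct stat cur, par, st;

    if (stat(here, &cur) == -1) return -1;
    for (;;) {
        DIR *dp;
        struct dirent *e;
        int found = 0;

        if (snprintf(up, sizeof up, "%s/..", here) >= (int)sizeof up) return -1;
        if (stat(up, &par) == -1) return -1;
        /* The root directory is its own parent: stop there. */
        if (par.st_ino == cur.st_ino && par.st_dev == cur.st_dev) break;

        dp = opendir(up);
        if (dp == NULL) return -1;
        while (!found && (e = readdir(dp)) != NULL) {
            if (strcmp(e->d_name, ".") == 0 || strcmp(e->d_name, "..") == 0)
                continue;
            if (snprintf(tmp, sizeof tmp, "%s/%s", up, e->d_name) >= (int)sizeof tmp)
                continue;
            /* The entry with our inode and device is our own name. */
            if (lstat(tmp, &st) == 0 &&
                st.st_ino == cur.st_ino && st.st_dev == cur.st_dev) {
                if (snprintf(tmp, sizeof tmp, "/%s%s", e->d_name, path)
                        >= (int)sizeof path) {
                    closedir(dp);
                    return -1;
                }
                strcpy(path, tmp);   /* prepend "/name" to the result */
                found = 1;
            }
        }
        closedir(dp);
        if (!found) return -1;
        strcpy(here, up);            /* move one level up */
        cur = par;
    }
    if (path[0] == '\0') strcpy(path, "/");
    if (strlen(path) >= size) return -1;
    strcpy(out, path);
    return 0;
}
```

When every parent directory is readable, the result should match what getcwd() reports. Using lstat() on each entry rather than d_ino is a deliberate choice so that the comparison also works when the current directory is a mount point.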
Buffered vs Unbuffered IO
• Unbuffered I/O: each read or write invokes a system call in the kernel.
• read, write, open, close, lseek
• Data unit: raw bytes
• Buffered I/O: data is read from and written to disk in optimal-sized chunks through streams
• standard I/O library written by Dennis Ritchie
• Data unit: C data types
Unbuffered IO
Open a File
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
int open(const char *pathname, int flags);
int open(const char *pathname, int flags, mode_t mode);
• Parameters
• pathname : name of the file, given as an absolute or relative path
• flags:
• O_RDONLY : read-only access
• O_WRONLY : write-only access
• O_RDWR : read-write access
• O_CREAT : create the file if it doesn’t exist
• O_APPEND : in write mode, append new content to the end instead of overwriting
• O_TRUNC : in write mode, truncate the file to length zero before writing new content
• mode:
• 0600 (i.e. -rw-------) : read-write access for the owner, no access for group users or other users
• 0644 (i.e. -rw-r--r--) : read-write access for the owner, read-only access for group users and other users
Open a File
• open() returns an integer:
• -1 means error, i.e. the file could not be opened
• >= 0 : this is the “file descriptor” of an open file. Save it in a variable; you will need to pass it to all subsequent file-related functions such as read, write etc.
• You don’t need to specify the file mode unless you will be creating a new file
• fd = open("this_file_already_exists", O_RDONLY);
• You can combine multiple flags together
• fd = open("foo", O_RDWR | O_CREAT, 0644); (the mode must be octal, hence the leading 0)
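The two forms of open() can be sketched as below. The helper names open_for_write and open_for_read are assumptions for illustration; the flags and the octal mode 0644 come from the list above.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Open `path` for writing: create it with mode 0644 if it is missing,
 * truncate it otherwise. Returns the file descriptor, or -1 on error. */
int open_for_write(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        perror("open");
    return fd;
}

/* Open an existing file read-only; no mode argument is needed
 * because O_CREAT is not used. Returns the descriptor, or -1. */
int open_for_read(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        perror("open");
    return fd;
}
```

Both helpers return the descriptor to the caller, who is responsible for passing it to close() when done.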
Operating System Tables
• If the file exists
• Create an entry in the System-Wide Open File Table
• Create an entry in the Process File Descriptor Table pointing to the System-Wide Table entry
• If the file needs to be created
• The on-disk FCB (i-node) is created first and then loaded into the System-Wide i-node Table
• Then the same two steps as above
dup and dup2
#include <unistd.h>
int dup(int oldfd); → creates a copy of the file descriptor oldfd, using the lowest-numbered unused file descriptor for the new descriptor
int dup2(int oldfd, int newfd); → makes newfd a copy of oldfd, so that both refer to the same open file; if newfd is already open, it is closed first
Returns:
New file descriptor on success
-1 on error, with the errno variable set to indicate the exact error
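A common use of dup() and dup2() is redirecting standard output, roughly the way a shell handles `> file`. The sketch below is one way to do it under the assumption that stdout is open; the function name print_to_file is hypothetical.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Print `msg` (plus a newline) into `path` by temporarily pointing
 * descriptor 1 at the file. Returns 0 on success, -1 on error.
 * (print_to_file is a hypothetical helper name.) */
int print_to_file(const char *path, const char *msg)
{
    int fd, saved;

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1) return -1;

    fflush(stdout);                  /* empty the stdio buffer before switching */
    saved = dup(STDOUT_FILENO);      /* keep a copy of the original stdout */
    if (saved == -1) { close(fd); return -1; }
    if (dup2(fd, STDOUT_FILENO) == -1) { close(fd); close(saved); return -1; }
    close(fd);                       /* descriptor 1 now refers to the file */

    printf("%s\n", msg);
    fflush(stdout);                  /* push the buffered text into the file */

    if (dup2(saved, STDOUT_FILENO) == -1) { close(saved); return -1; }
    close(saved);                    /* original stdout is restored */
    return 0;
}
```

The fflush() calls matter because printf() is buffered: without them the text could be written after stdout has already been switched back.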
Read From a File
#include <sys/types.h>
#include <unistd.h>
ssize_t read(int fd, void *buf, size_t count); → difference between size_t and ssize_t? (size_t is unsigned; ssize_t is signed so that it can also hold -1)
• Parameters:
• fd : file descriptor of the file which you want to read from
• buf : buffer where the file content will be stored after reading
• count : maximum number of bytes to read
• read() returns an integer:
• -1 means error reading the file
• 0 : End of File has been reached
• > 0 : number of bytes that were actually read from the file. This may be less than count, for example near the end of the file or when reading from a pipe or terminal, so a short read alone does not mean End of File
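A typical read loop keeps calling read() until it returns 0. The sketch below counts the bytes in a file this way; the function name count_bytes and the 4096-byte buffer are illustrative assumptions.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Count the bytes in `path` by reading until read() returns 0 (EOF).
 * Returns the total, or -1 on error. (Hypothetical helper name.) */
long count_bytes(const char *path)
{
    char buf[4096];
    ssize_t n;
    long total = 0;
    int fd = open(path, O_RDONLY);

    if (fd == -1) return -1;
    /* Each call may return fewer bytes than requested; only 0 means EOF. */
    while ((n = read(fd, buf, sizeof buf)) > 0)
        total += n;
    close(fd);
    return n == -1 ? -1 : total;
}
```

Note that n is declared as ssize_t, not size_t, so that the -1 error value can be detected.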
Difference between open and dup
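One observable difference: a descriptor returned by dup() shares the open file table entry, and therefore the file offset, with the original, while a second open() of the same file gets its own entry and its own offset. The sketch below illustrates this; the function names are assumptions made for the example.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Read one byte through `first`, then return the offset seen by `second`. */
static off_t offset_after_read(int first, int second)
{
    char c;
    if (read(first, &c, 1) != 1) return (off_t)-1;
    return lseek(second, 0, SEEK_CUR);
}

/* dup(): both descriptors share one offset, so the result is 1. */
off_t offset_via_dup(const char *path)
{
    int a = open(path, O_RDONLY);
    int b = (a == -1) ? -1 : dup(a);
    off_t off = (b == -1) ? (off_t)-1 : offset_after_read(a, b);
    if (a != -1) close(a);
    if (b != -1) close(b);
    return off;
}

/* open() twice: each descriptor has its own offset, so the result is 0. */
off_t offset_via_open(const char *path)
{
    int a = open(path, O_RDONLY);
    int b = (a == -1) ? -1 : open(path, O_RDONLY);
    off_t off = (b == -1) ? (off_t)-1 : offset_after_read(a, b);
    if (a != -1) close(a);
    if (b != -1) close(b);
    return off;
}
```

Both functions expect `path` to name a readable file containing at least one byte and return -1 otherwise.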
Write Into a File
#include <sys/types.h>
#include <unistd.h>
ssize_t write(int fd, const void *buf, size_t count);
• Parameters:
• fd : file descriptor of the file which you want to write to
• buf : buffer from where the content will be written to the file
• count : number of bytes to write
• write() returns an integer:
• -1 means error writing the file
• >= 0 : number of bytes that were actually written into the file, which should normally be the same as the value in count. If the return value is less than count, only part of the data was written (for example because of insufficient disk space) and the rest still has to be written.
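Because write() may write fewer bytes than requested, callers often wrap it in a loop. The following is a minimal sketch of such a wrapper; the name write_all is an assumption, and retrying on EINTR is a common convention rather than something the slides prescribe.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Write exactly `count` bytes from `buf` to `fd`, continuing after
 * short writes. Returns 0 on success, -1 on error.
 * (write_all is a hypothetical helper name.) */
int write_all(int fd, const void *buf, size_t count)
{
    const char *p = buf;

    while (count > 0) {
        ssize_t n = write(fd, p, count);
        if (n == -1) {
            if (errno == EINTR)
                continue;        /* interrupted by a signal: try again */
            return -1;
        }
        p += n;                  /* skip the part that was written */
        count -= (size_t)n;
    }
    return 0;
}
```

For example, write_all(fd, "hello", 5) either writes all five bytes or reports an error, so the caller does not have to inspect partial counts.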
Close a File
#include <unistd.h>
int close(int fd);
• Parameters:
• fd : file descriptor of the file which you want to close
• close() returns an integer:
• -1 means error closing the file
• 0 : OK (i.e. file successfully closed)
• Don’t forget to close the file once you have finished using it, otherwise you leak file descriptors. (Normally the OS closes all open files when program execution ends, but we should still close them in our program.)
#include <stdio.h>
int fflush(FILE *fp);
• Writes any buffered output of the stream fp to the underlying file
• All output streams are flushed if fp == NULL
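The effect of fflush() can be observed by checking the file size before and after the call. The sketch below assumes a regular file and a message smaller than the stream buffer; the names file_size and flush_demo are illustrative.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Size of `path` in bytes, or -1 on error. */
static long file_size(const char *path)
{
    struct stat st;
    return stat(path, &st) == -1 ? -1 : (long)st.st_size;
}

/* Write `msg` through a stream and record the size of the file
 * before and after fflush(). Returns 0 on success, -1 on error.
 * (flush_demo is a hypothetical helper name.) */
int flush_demo(const char *path, const char *msg, long *before, long *after)
{
    FILE *fp = fopen(path, "w");

    if (fp == NULL) return -1;
    if (fputs(msg, fp) == EOF) { fclose(fp); return -1; }
    *before = file_size(path);   /* typically 0: data is still in the buffer */
    if (fflush(fp) == EOF) { fclose(fp); return -1; }
    *after = file_size(path);    /* buffered bytes have been written out */
    return fclose(fp) == EOF ? -1 : 0;
}
```

With a short message, `before` is typically 0 and `after` equals the message length, which shows that fputs() only filled the stream buffer until fflush() was called.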
Opening a Stream
• #include <stdio.h>
• FILE *fopen(const char *pathname, const char *type);
r O_RDONLY
r+ O_RDWR