Unit-6 File Management

Chapter Outcomes
• Explain structure of the given file system with example.
• Describe mechanism of the given file access method.
• Explain procedure to create and access directories and assign the given files access
permissions.
• Explain features of the given Raid level structure of hard disk.

Learning Objectives
• To understand Basic Concepts of File and File System
• To learn various File Access Methods and File Allocation Methods
• To study Directory and its different Structures
• To become familiar with Disk Structure
• To learn Concept of RAID and its Levels
File
• A file is a named collection of related information that is recorded on secondary storage
such as magnetic disks, magnetic tapes and optical disks.

• Commonly, files represent programs (source and object forms) and data. Data files may be
numeric, alphabetic, alphanumeric or binary.

• In general, a file is a sequence of bits, bytes, lines or records whose meaning is defined by
the file’s creator and user.

• Many different types of information may be stored in a file: Source programs, Object
programs, Executable programs, Numeric Data, Text, Payroll records, Graphic Images,
Sound recording and so on.

File Attributes
• Name: The symbolic file name is the only information kept in human readable form.
• Type: This information is needed for those systems that support different types.
• Location: This information is a pointer to a device and to the location of the file on that
device.
• Size: The current size of the file (in bytes, words or blocks) and possibly the maximum
allowed size are included in this attribute.
• Protection: Access control information determines who can read, write, execute
and so on.
• Time, Date and User Identification: This information may be kept for creation, last
modification and last use. These data can be useful for protection, security and usage
monitoring.
• Identifier: The file system assigns a unique tag or number that identifies the file within the
file system and is used to refer to the file internally.
• Creator or Owner: A creator is a user or a person who has created that file and the owner
is a person who owns that file currently.
File Operations
• Creating a file: Two steps are necessary to create a file. First, space in the file system must be
found for the file. Second, an entry for the new file must be made in the directory. The
directory entry records the name of the file and its location in the file system.

• Writing a file: To write a file, we make a system call specifying both the name of the file and
the information to be written to the file. Given the name of the file, the system searches the
directory to find the location of the file. The write pointer must be updated whenever a
write occurs.

• Reading a file: To read from a file, we use a system call that specifies the name of the file
and where (in memory) the next block of the file should be put. System needs to keep a
read pointer to location in the file where the next read is to take place. Once the read has
taken place, the read pointer is updated.

• Repositioning within a file: The directory is searched for the appropriate entry, and the
current file position is set to a given value. Repositioning within a file does not need to involve
any actual I/O. This file operation is also known as a file seek.

• Deleting a file: To delete a file, we search the directory for the named file. Having found the
associated directory entry, we release all file space and erase the directory entry.

• Truncating a file: When the user wants to erase the contents of a file but keep its attributes,
truncation avoids deleting the file and then recreating it. All attributes remain unchanged,
but the file is reset to length zero.

• Other common operations include appending new information to the end of an existing file,
and renaming an existing file.
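
The operations above map closely onto ordinary file calls. The following is a minimal sketch in Python, not a description of any particular operating system's internals; the file name `payroll.dat` and the record contents are assumptions made up for illustration.

```python
import os
import tempfile

# Hypothetical file in a throwaway directory (names are illustrative only).
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "payroll.dat")

# Create and write: space is found and a directory entry is made.
with open(path, "wb") as f:
    f.write(b"record-1\nrecord-2\n")

with open(path, "rb") as f:
    first = f.read(9)                  # read advances the file position
    position_after_read = f.tell()
    f.seek(0)                          # reposition (seek): only the pointer moves
    again = f.read(9)

# Append: new information goes to the end of the existing file.
with open(path, "ab") as f:
    f.write(b"record-3\n")
size_after_append = os.path.getsize(path)

# Truncate: the file and its attributes stay, the length becomes zero.
os.truncate(path, 0)
size_after_truncate = os.path.getsize(path)

# Rename, then delete: the directory entry is changed, then erased.
new_path = os.path.join(workdir, "payroll.old")
os.rename(path, new_path)
os.remove(new_path)
exists_after_delete = os.path.exists(new_path)
os.rmdir(workdir)
```

Note that the seek transfers no data; it only changes where the next read or write will happen.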

File types
A file name is split into two parts: the name of the file and an extension, usually separated by a period.
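
As a small illustration, Python's standard `os.path.splitext` performs this split. The file names used here are made-up examples.

```python
import os.path

# Split hypothetical file names into (name, extension).
name, extension = os.path.splitext("report.txt")
archive_name, archive_ext = os.path.splitext("backup.tar.gz")  # only the last part counts
no_ext_name, no_ext = os.path.splitext("README")               # no extension at all
```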

[Figure: File Types]

[Figure: File Structure]

[Figure: File System Tree]

Serial File
• The least complicated form of file organization is the serial file or pile. Data are collected
in the order in which they arrive.

• The purpose of this file is simply to accumulate the mass of data and save it. Records may have
different fields, or similar fields in different orders. Thus, each field should be self-describing,
including a field name as well as a value.

Advantages of Serial File:
1. Simple organization.
2. Data is usually stored prior to processing.
3. Low complexity and good efficiency for variable-sized records.
4. Uses space very well for varying data structures.

Disadvantages of Serial File:
1. Because these files have no structure, record access is very difficult.
2. Requires more searching time.
3. Records are not arranged in any particular order.

Sequential File Access
• The simplest access method is sequential access. Information in the file is processed in
order, one record after the other.
• The bulk of the operations on a file are reads and writes. The read and write operations
on the sequential file are done in sequential order.
• A read operation reads the next portion of the file and automatically advances a file
pointer, which tracks the I/O location. Similarly a write appends to the end of the file and
advances to the end of the newly written material.
• Such a file can be reset to the beginning and on some systems a program may be able to
skip forward or backward ‘n’ records for some integer ‘n’.
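
The behaviour described above can be sketched with an in-memory file. This is only an illustration: the fixed record size of 4 bytes and the record contents are assumptions, and `io.BytesIO` stands in for a real sequential file.

```python
import io

RECORD_SIZE = 4  # assumed fixed record length, in bytes

tape = io.BytesIO()
for record in (b"AAAA", b"BBBB", b"CCCC"):
    tape.write(record)              # each write appends and advances the pointer

tape.seek(0)                        # reset to the beginning
first = tape.read(RECORD_SIZE)      # "read next": the pointer advances by itself
second = tape.read(RECORD_SIZE)

tape.seek(-1 * RECORD_SIZE, io.SEEK_CUR)   # skip backward n = 1 record
second_again = tape.read(RECORD_SIZE)
third = tape.read(RECORD_SIZE)
at_end = tape.read(RECORD_SIZE)     # nothing left to read
```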

Advantages of Sequential File:
1. Easy to access the next record.
2. Data organization is very simple.
3. Absence of data structures.
4. They are easily stored on tapes as well as disks.
5. Automatic backup copy is created.

Disadvantages of Sequential File:
1. Wastage of memory space, because both a master file and a transaction file are kept.
2. For interactive applications that involve queries and/or updates of individual records, the
sequential file provides poor performance.
3. It is more time consuming, since reading, writing and searching always start from the
beginning of the file.

Index Sequential File
The index sequential method involves the construction of an index for the file. The
index, like an index in the back of a book, contains pointers to the various blocks. To find an
entry in the file, we first search the index and then use the pointer to access the file directly
and find the desired entry. From this search we know exactly which block contains the
desired entry, and we access that block. This structure allows us to search a large file with very
little input/output.

With large files, the index file itself may become too large to be kept in memory. One
solution is then to create an index for the index file. The primary index file would contain
pointers to secondary index files, which then point to the actual data items.
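
A minimal sketch of a single-level index follows. The keys, the block size of three records and the helper name `lookup` are assumptions for illustration; a real index sequential file would also have to handle overflow areas.

```python
import bisect

BLOCK_SIZE = 3  # assumed number of records per block

# Hypothetical records, sorted by key and stored block by block.
records = [(k, f"rec-{k}") for k in (3, 8, 12, 17, 21, 25, 30, 42, 57)]
blocks = [records[i:i + BLOCK_SIZE] for i in range(0, len(records), BLOCK_SIZE)]

# The index holds the first key of every block; its position is the block pointer.
index_keys = [block[0][0] for block in blocks]

def lookup(key):
    """Search the index first, then read only the one block it points to."""
    block_no = bisect.bisect_right(index_keys, key) - 1
    if block_no < 0:
        return None
    for k, value in blocks[block_no]:
        if k == key:
            return block_no, value
    return None

found = lookup(21)
missing = lookup(5)
```

Only one block is scanned per lookup, which is why the index keeps input/output low even for a large file.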

Advantages:
• Variable length records are allowed.
• Indexed sequential file may be updated in sequential or random mode
• Very fast operation

Disadvantages:
• The major disadvantage of the index sequential file is that as the file grows,
performance deteriorates rapidly because of overflows, and consequently periodic
reorganization is needed. Reorganization is an expensive process, and the file
becomes unavailable during reorganization.
• When a new record is added to the main file, all of the index files must be updated.
• Consumes large memory space for maintaining index files.

Direct Access
For direct access, the file is viewed as a numbered sequence of blocks or records. A block is
generally a fixed-length quantity defined by the operating system. A block may be a byte, 512
words, 1024 bytes or some other quantity, depending upon the system.
A direct access file allows arbitrary blocks to be read or written. Thus we may read block 14,
then read block 50 and then write block 7. There are no restrictions on the order of reading
or writing for a direct access file.

Direct access files are of great use for immediate access to large amounts of information.
When a query concerning a particular subject arrives, we compute which block contains the
answer and then read the block directly to provide the desired information. Not all
operating systems support both sequential and direct access to files. Some systems allow only
sequential file access; others allow only direct access.
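
The "read block 14, then block 50, then write block 7" pattern can be sketched as below. The 16-byte block size, the 64-block file and the helper names `read_block` and `write_block` are assumptions chosen to keep the example small.

```python
import io

BLOCK_SIZE = 16  # assumed block size in bytes

# Hypothetical file of 64 blocks; block n is filled with the byte value n.
disk_file = io.BytesIO(b"".join(bytes([n]) * BLOCK_SIZE for n in range(64)))

def read_block(f, n):
    """Jump straight to block n; no earlier block is read."""
    f.seek(n * BLOCK_SIZE)
    return f.read(BLOCK_SIZE)

def write_block(f, n, data):
    """Overwrite block n in place."""
    if len(data) != BLOCK_SIZE:
        raise ValueError("data must be exactly one block long")
    f.seek(n * BLOCK_SIZE)
    f.write(data)

b14 = read_block(disk_file, 14)
b50 = read_block(disk_file, 50)
write_block(disk_file, 7, b"x" * BLOCK_SIZE)
b7 = read_block(disk_file, 7)
```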

Advantages of Direct File Access:
1. Using this method, we can access any record randomly.
2. It gives the fastest retrieval of records.

Disadvantages of Direct File Access:
1. Storage space is wasted if the hashing algorithm is not chosen properly.
2. This method is complex and expensive.

Hashing
• The basic idea of hash addressing is that each record is placed in the database at a location
whose address (the SRA, Stored Record Address) is computed as some function (called
the hash function) of a value, usually the primary key value.
• Thus, to store the record initially, the system computes the SRA and instructs the access
method to place the occurrence at that position. To retrieve the occurrence, the DBMS
performs the same computation as before and then requests the access method to fetch
the occurrence at the computed position.
• A simple hashing function is h(k) = k mod s, where k is the key value and s is the number of
available addresses.

Example: SRA = S No. mod 13

The SRA (Stored Record Address) is the remainder of this division. For example, for
S No. 100 the SRA is 9; the following records in the example hash to 5, 1 and 10.
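
The arithmetic can be checked with a short sketch. S No. 100 comes from the example; the keys 200, 300 and 400 are an assumption, chosen because they reproduce the addresses 5, 1 and 10 quoted above. The function name is likewise illustrative.

```python
TABLE_SIZE = 13  # the divisor used in the example

def stored_record_address(s_no, table_size=TABLE_SIZE):
    """Hash function h(k) = k mod s."""
    return s_no % table_size

# 100 is taken from the text; 200, 300 and 400 are assumed keys.
addresses = [stored_record_address(k) for k in (100, 200, 300, 400)]
```

Retrieval repeats the same computation on the key, so no search through the file is needed.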

Swapping
• Swapping is a mechanism in which a process can be moved temporarily out of main
memory to secondary storage (disk), making that memory available to other
processes. At some later time, the system swaps the process back from secondary
storage to main memory.

• The resident monitor memory management scheme may seem of little use since it
appears to be inherently single user. Even so, some systems that served several users
used a resident monitor, with the remainder of memory available to the currently executing user.

• When they switched to the next user, the current contents of user memory were written
out to a backing store (a disk or drum) and the memory of the next user was read in. This
scheme is called swapping.

[Figure: Swapping]

Allocation methods
• From the user’s point of view, a file is an abstract data type. It can be created, opened,
written, read, closed and deleted without any real concern for its implementation. The
implementation of a file is a problem for the operating system.
• The main problem is how to allocate space to these files so that disk space is effectively
utilized and files can be quickly accessed.
• Three major methods of allocating disk space are in wide use:
1. Contiguous
2. Linked
3. Indexed
• Each method has its advantages and disadvantages.

Contiguous Allocation
• The contiguous allocation method requires each file to occupy a set of contiguous
addresses on the disk. Disk addresses define a linear ordering on the disk. Contiguous
allocation of a file is defined by the disk address of the first block and its length. If the file is
‘n’ blocks long and starts at location ‘b’, then it occupies blocks b, b+1, b+2, ..., b+n-1.
The directory entry for each file indicates the address of the starting block and the length
of the area allocated for this file.
• Contiguous allocation supports both sequential and direct access.
• For direct access to block ‘i’ of a file, which starts at block ‘b’, we can immediately access
block b+i.
• The difficulty with contiguous allocation is finding space for a new file.
• If the file to be created is ‘n’ blocks long, we must search the free space list for ‘n’ free
contiguous blocks.
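
The search for ‘n’ free contiguous blocks and the b+i address rule can be sketched as follows. The free-space map, the first-fit strategy and the function names are assumptions for illustration; the text does not prescribe a particular search strategy.

```python
def find_contiguous(free_map, n):
    """First fit: return the start of the first run of n free blocks, or None."""
    run_start, run_len = None, 0
    for block, is_free in enumerate(free_map):
        if is_free:
            if run_len == 0:
                run_start = block
            run_len += 1
            if run_len == n:
                return run_start
        else:
            run_len = 0
    return None

def block_address(b, i):
    """Disk address of block i (counting from 0) of a file starting at block b."""
    return b + i

# Hypothetical free-space map: True means the block is free.
free_map = [False, True, True, False, True, True, True, True, False, True]
start = find_contiguous(free_map, 3)
too_big = find_contiguous(free_map, 5)
free_total = sum(free_map)
```

In this map seven blocks are free in total, yet a five-block file cannot be placed. That is external fragmentation in miniature.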
[Figure: Contiguous Allocation]

Advantages of Contiguous File Allocation Method:
1. Supports both sequential and direct access methods.
2. Contiguous allocation is the best form of allocation for sequential files. Multiple blocks can be
brought in at a time to improve I/O performance for sequential processing.
3. It is also easy to retrieve a single block from a file. For example, if a file starts at block ‘b’ and
block ‘i’ of the file is wanted (counting from 0), its location on secondary storage is simply b + i.
4. Reading all blocks belonging to each file is very fast.
5. Provides good performance.

Disadvantages of Contiguous File Allocation Method:
1. Suffers from external fragmentation.
2. It is very difficult to find contiguous blocks of space for new files.
3. With pre-allocation, the size of the file must be declared at the time of creation,
which is often difficult to estimate.
4. Compaction may be required, and it can be very expensive.
Linked File Allocation (Chained File Allocation)
With linked allocation, each file is a linked list of disk blocks; the disk blocks may be scattered
anywhere on the disk. The directory contains a pointer to the first (and last) blocks of the file. For
example, a file of 5 blocks that starts at block 9 might continue at block 16, then block 1, block
10, and finally block 25. Each block contains a pointer to the next block. These pointers are not
made available to the user; thus, if a sector is 512 words and a disk address (the pointer) requires
two words, then the user sees blocks of 510 words.

Creating a file is easy. We simply create a new entry in the device directory. A write to a file
removes the first free block from the free space list and writes to it. This new block is then linked to
the end of the file. To read a file, we simply read the blocks by following the pointers from block to
block. There is no external fragmentation with linked allocation.
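
The five-block example (9, 16, 1, 10, 25) can be modelled with a dictionary standing in for the pointer stored in each block. This is a sketch only; the names `next_block`, `file_blocks` and `nth_block` are illustrative.

```python
# next_block[b] is the pointer stored in block b; None marks the end of the file.
next_block = {9: 16, 16: 1, 1: 10, 10: 25, 25: None}

def file_blocks(first):
    """Read the whole file by following pointers from the first block."""
    chain = []
    b = first
    while b is not None:
        chain.append(b)
        b = next_block[b]
    return chain

def nth_block(first, i):
    """Reaching block i (counting from 0) means following i pointers."""
    b = first
    for _ in range(i):
        b = next_block[b]
    return b

WORDS_PER_SECTOR = 512
POINTER_WORDS = 2
usable_words = WORDS_PER_SECTOR - POINTER_WORDS   # what the user sees per block

chain = file_blocks(9)
fourth = nth_block(9, 3)
```

The loop in `nth_block` shows why direct access is slow here: every earlier block has to be visited first.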

[Figure: Linked File Allocation (Chained File Allocation)]

Advantages of Linked File Allocation Method:
1. Any free blocks can be added to a chain.
2. There is no external fragmentation.
3. Best suited for sequential files that are to be processed sequentially.
4. No need to know the size of the file in advance.
5. The disk address of first block can be used to locate the rest of the blocks.
6. Blocks are fully utilized, so the disk never needs to be defragmented to recover free space.
7. No need to compact or relocate files.
Disadvantages of Linked File Allocation Method:
1. There is no accommodation of the principle of locality; that is, reading a file may require a
series of accesses to different parts of the disk.
2. Space is required for the pointers: 1.5% of the disk is used for pointers and not for
information. If a pointer is lost or damaged, through an operating system bug or a disk hardware
failure, the wrong pointer may be picked up.
3. This method cannot support efficient direct access, because reaching block i requires
following the pointers from the start of the file.
Indexed File Allocation
• Linked allocation solves the external fragmentation and size declaration problems of
contiguous allocation. However, linked allocation cannot support efficient direct access, since the
pointers to the blocks are scattered, along with the blocks themselves, all over the
disk.
• Indexed allocation solves this problem by bringing all of the pointers together into one
location the Index Block.
• Each file has its own index block, which is an array of disk block addresses. The ith entry in
the index block points to the ith block of the file. The directory contains the address of the
index block.
• To read the ith block, we use the pointer in the ith index block entry to find and read the desired
block. When the file is created, all pointers in the index block are set to nil. When the ith
block is first written, a block is removed from the free space list and its address is put in the
ith index block entry.

Indexed allocation supports direct access without suffering from external fragmentation.
Indexed allocation does suffer from wasted space. The pointer overhead of the index block is
generally greater than the pointer overhead of linked allocation. Assume we have a file of only one
or two blocks. With linked allocation, we lose the space of only one pointer per block. With indexed
allocation, an entire index block must be allocated even if only one or two pointers will be non-nil.
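
A minimal sketch of an index block follows. The index capacity of 8 entries, the free-list contents and the class name `IndexedFile` are assumptions for illustration; `None` plays the role of nil.

```python
INDEX_ENTRIES = 8  # assumed capacity of one index block

class IndexedFile:
    def __init__(self):
        # On creation, every pointer in the index block is nil.
        self.index_block = [None] * INDEX_ENTRIES

    def write_block(self, i, free_list):
        """On the first write to block i, take a block from the free space list."""
        if self.index_block[i] is None:
            self.index_block[i] = free_list.pop(0)
        return self.index_block[i]

    def locate(self, i):
        """Direct access: entry i of the index block gives the disk address."""
        return self.index_block[i]

free_list = [19, 4, 27, 11]   # hypothetical free disk blocks
f = IndexedFile()
f.write_block(0, free_list)
f.write_block(3, free_list)
f.write_block(3, free_list)   # already allocated, so nothing new is taken
unused_entries = f.index_block.count(None)
```

The six unused entries illustrate the wasted space: the whole index block exists even though the file has only two data blocks.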

[Figure: Indexed File Allocation]

Advantages of Indexed File Allocation Method:
1. Does not suffer from external fragmentation.
2. Support both sequential and direct access to the file.
3. No need for user to know size of the file in advance.
4. Free space can be tracked by means of a bit map.
5. Each data block is entirely available for data, since the pointers are kept in the index block.

Disadvantages of Indexed File Allocation Method:
1. It requires a lot of space for keeping pointers, so storage space is wasted.
2. Indexed allocation is more complex and time consuming.
3. The overhead of an index block is not justified for a very small file.
4. Index blocks are also awkward for a very big file, because it is difficult to manage
multiple levels of indices.
5. Keeping the index in memory requires space.
Directory Structure
• A large number of files is stored on the disk. To keep track
of files, file systems normally have directories or
folders, which in many systems are themselves files. To
manage all these files, we organize them into a
structure called the directory structure.

• Directories are basically symbol tables of files. A single flat directory can contain a list of
all files in a system.

• The directory structure is the organization of files into a hierarchy of folders.

• A directory can be defined as a way of grouping files together.

Single Level Directory Structure
• The simplest form of directory system has one directory containing all the files.
Sometimes it is called the root directory.

• In a single level directory structure, all files are contained in the same directory, so
a unique name must be assigned to each file in the directory.

• Single level directory structure was implemented in the older versions of single user
systems.

• The world’s first supercomputer, the CDC 6600, had only a single directory for all files,
even though it was used by many users at once.

[Figure: Single Level Directory Structure]

Advantages of Single Level Directory Structure:
1. Single level directory structure is easy to implement and maintain.
2. It is simple directory structure.
3. In a single level directory structure, operations like creation, searching, deletion and updating
are very easy and fast.

Disadvantages of Single Level Directory Structure:
1. Because there is only one directory in the system, name collisions are likely: two
files cannot have the same name.
2. In a single level directory structure, it is difficult to keep track of the files as the number of
files increases.
3. This directory structure is not used on multi-user systems but could be used on a small
embedded system.
4. It is inconvenient for keeping different kinds of files, such as graphics and text, apart.
5. Unique names are also constrained by name length: the MS-DOS operating system allows only
11-character file names; UNIX, in contrast, allows 255 characters.
Two Level Directory Structure
• The two level directory structure is divided into two levels of directories,
namely a master directory and user directories. The user directories are
subdirectories of the master directory.

• In a two level directory structure, a separate directory is provided to each user, and all
these directories are contained and indexed in the master directory.

• The user directory represents a list of files of a specific user. In this directory structure,
each user has a private directory known as the User File Directory (UFD).

• The user directories themselves must be created and deleted as necessary. A special
system program is run with the appropriate user name and account information. The
program creates a new UFD and adds an entry for it to the Master File Directory (MFD).
Thus the two level directories solve the name collision problem.
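
The MFD and UFD relationship can be sketched with nested dictionaries. The user names, file names and block numbers below are made up for illustration, as are the helper names `create_user` and `create_file`.

```python
mfd = {}   # Master File Directory: user name -> that user's UFD

def create_user(user):
    """Create a new, empty UFD and add an entry for it to the MFD."""
    mfd[user] = {}

def create_file(user, name, location):
    """File names only need to be unique inside one user's UFD."""
    ufd = mfd[user]
    if name in ufd:
        raise FileExistsError(f"{name} already exists for user {user}")
    ufd[name] = location

create_user("asha")
create_user("ravi")
create_file("asha", "notes.txt", 120)
create_file("ravi", "notes.txt", 305)   # same name, different user: no collision
```

A second `notes.txt` for the same user would still be rejected, since uniqueness is enforced per UFD.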
[Figure: Two Level Directory Structure]

Advantages of Two Level Directory Structure:
1. It solves the file name collision problem by giving each user their own directory.
2. This method isolates one user from another and protects each user’s files.
3. Different users may have files with the same name.

Disadvantages of Two Level Directory Structure:
1. It is still not very scalable: a user cannot group files of the same type together within the
user directory.
2. Sharing of files by different users is difficult.

Hierarchical Directory Structure (Tree Structure)
• The two level hierarchy eliminates name conflicts among users but is not satisfactory for
users with a large number of files. We need a general hierarchy, i.e., a tree of directories.
• The tree structured directory allows users to create their own subdirectories and
to organize their files accordingly. A subdirectory contains a set of files or subdirectories. A
directory is simply another file, but it is treated in a special way.
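
A tree of directories can be sketched as nested dictionaries, with path lookup walking one level per path component. The directory names and the `resolve` helper are assumptions for illustration; here a dictionary stands for a directory and a string stands for a file.

```python
# Hypothetical tree: dict = directory, str = file contents marker.
tree = {
    "home": {
        "asha": {"os": {"unit6.txt": "file"}},
        "ravi": {},
    },
    "etc": {},
}

def resolve(path):
    """Walk the tree one directory at a time, starting from the root."""
    node = tree
    for part in [p for p in path.split("/") if p]:
        if not isinstance(node, dict) or part not in node:
            raise FileNotFoundError(path)
        node = node[part]
    return node

found = resolve("/home/asha/os/unit6.txt")
empty_dir = resolve("/home/ravi")
```

Each path component costs one directory lookup, which is the source of the longer search times noted for deep trees.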

[Figure: Hierarchical Directory Structure (Tree Structure)]

Advantages of Tree Structured Directory:
1. User can create directory as well as subdirectory.
2. Users can be provided access to a sub directory rather than the entire directory.
3. It provides a better structure to file system.
4. Managing millions of files is easy with tree structured directory.

Disadvantages of Tree Structured Directory:
1. The tree structure can lead to duplicate copies of files.
2. Users cannot share files or directories.
3. It is inefficient, because accessing a file may require traversing multiple directories.
4. Search time may become unnecessarily long.

[Figure: Disk Management in Linux]

[Figure: Disk Management in Windows OS]

[Figure: Physical Disk Structure]

[Figure: Moving Head Disk Mechanism]

[Figure: Structure of Hard Disk]

Measurement of speed of the disk
• Transfer Rate is the rate at which the data moves from disk to the computer.

• Random Access Time is the sum of the seek time and rotational latency.

• The seek time is the time for the disk arm to move the head to the required
cylinder containing the desired track.

• The rotational latency is the additional time for the disk to rotate the desired
sector to the disk head.

• The disk bandwidth is the total number of bytes transferred, divided by the total
time between the first request for service and the completion of the last transfer.
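
These definitions translate directly into arithmetic. The sketch below assumes the average rotational latency is half of one revolution, and the figures used (a 9 ms seek, a 7200 rpm disk, 4,000,000 bytes in 2 seconds) are invented for illustration only.

```python
def rotational_latency_ms(rpm):
    """Average rotational latency, assumed to be half of one revolution."""
    ms_per_revolution = 60_000 / rpm
    return 0.5 * ms_per_revolution

def random_access_time_ms(seek_ms, rpm):
    """Random access time = seek time + rotational latency."""
    return seek_ms + rotational_latency_ms(rpm)

def disk_bandwidth(total_bytes, total_seconds):
    """Bytes transferred divided by the time from first request to last transfer."""
    return total_bytes / total_seconds

# Assumed example figures, not measurements of any real disk.
latency = rotational_latency_ms(7200)
access = random_access_time_ms(9.0, 7200)
bandwidth = disk_bandwidth(4_000_000, 2.0)
```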

Logical Structure of Hard Disk
We can divide the logical structure of the hard disk into the following five parts:
1. MBR (Master Boot Record).
2. DBR (DOS Boot Record).
3. FAT (File Allocation Tables).
4. Root Directory.
5. Data Area.

Master Boot Record:
• It contains a small program to load and start the active partition from the hard disk.
• The MBR is created on the hard disk drive by executing the FDISK.EXE command of DOS.
• It is located at absolute sector 0, that is, at cylinder 0, head 0 and sector 1.
• If we have more than one partition, then there are Extended Master Boot Records,
located at the beginning of each extended partition volume.

DOS Boot Record (DBR):
• The DOS Boot Record (DBR), sometimes called the DOS Boot Sector, is the second most
important information on your hard disk.
• It contains some important information about disk geometry, such as bytes per sector,
sectors per cluster, reserved sectors, etc.
• The DBR is created by the FORMAT command of DOS.
• All DOS partitions contain the program code to boot the machine, but the MBR gives
control only to the partition that is marked as the active partition.

FAT (File Allocation Table):
• It was developed to fulfil the requirements of a fast and flexible system for managing
data on both removable and fixed media.
• FAT keeps a map of the complete surface of the disk drive, recording which area is
free, which area is taken up by which file, etc. When some data stored on the disk is
to be accessed, DOS consults the FAT to find the areas of the hard disk that
contain the data.
• The FAT manages the disk area in groups of sectors called clusters.

Root Directory:
• The Root Directory is like a table of contents for the information stored on the hard
disk drive. The directory area keeps the information about the file name, date and
time of the file creation, file attribute, file size and starting cluster of the particular
file.
• The number of files that one can store in the root directory depends on the FAT
type being used.

Data Area OR Files Area:
• The remainder of the volume after Root Directory is the Data Area.
• The data area contains the actual data stored on the disk surfaces.
• When we format a hard disk, the FORMAT command of DOS does not
destroy or overwrite the data in the data area. The FORMAT command
only removes the directory entries and FAT entries; it does not touch
the actual data area. This makes the recovery of an accidentally formatted
hard disk drive possible.

RAID (Redundant Array of Independent Disks)
• RAID (Redundant Array of Independent Disks, originally Redundant Array of Inexpensive
Disks) is a way of storing the same data in different places on multiple hard disks to protect
data in the case of a drive failure.

• RAID organizes multiple disks into a large, high-performance logical disk. In other words, if
you have three hard drives, you can configure them to look like one large drive.

• RAID is a set of physical disk drives, and the operating system views it as a single logical
drive. Data are distributed across physical drives in a way that enables simultaneous access
to data from multiple drives.

• RAID is a data storage virtualization technology that combines multiple physical disk drive
components into a single logical unit for the purposes of data redundancy, performance
improvement, or both.

[Figure: RAID Levels]

RAID-0 (Striped Disk Array without fault Tolerance)
• Simple striping is used in this level to gain in performance.
• This level does not offer any redundancy.
• Data is broken into stripes of a user-defined size, and successive stripes are written to
different drives in the array.
• A minimum of two disks is required. It uses 100% of the storage capacity since no
redundant information is written.
• Web servers, graphics design, audio and video editing, and online gaming are some
example applications that might benefit from this level.
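
Striping can be sketched as a mapping from a logical block number to a drive and a position on that drive. The sketch assumes a stripe unit of exactly one block and simple round-robin placement; the function name `raid0_location` is illustrative.

```python
def raid0_location(logical_block, num_disks):
    """Return (disk, block_on_disk) for round-robin striping, one block per stripe unit."""
    return logical_block % num_disks, logical_block // num_disks

# Where six consecutive logical blocks land on a two-disk array.
layout = [raid0_location(b, 2) for b in range(6)]
```

Consecutive blocks land on alternating drives, so a large sequential transfer keeps both drives busy at once.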

[Figure: RAID-0]

RAID-1 (Mirroring and Duplexing)
• This level mirrors the data on drive 1 to drive 2. It offers 100% redundancy, as the
array will continue to work even if either disk fails.
• This level uses mirroring and data is duplicated on two drives.
• If either fails, the other continues to function until the failed drive is replaced.
• A minimum of 2 drives is required.

[Figure: RAID-1]

RAID-2 (Hamming Code Error Correcting Code)
• This level uses bit-level data striping rather than block-level striping.
• To use RAID 2, make sure the selected disks have no built-in error checking
mechanism, as this level uses an external Hamming code for error detection and correction.
• This is one of the reasons RAID 2 is not used in practice: most of the disks
used these days come with built-in error detection. It also uses extra disks to store the
error-correcting code information.

[Figure: RAID-2]

RAID-3 (Parallel Transfer with parity)
• In RAID 3, the data block is striped and written on the data disks. This requires a
minimum of three drives to implement.
• This level uses byte-level striping along with parity. One dedicated drive stores
the parity information, and if any data drive fails, its contents are rebuilt using this
extra drive.
• If the parity drive itself crashes, the array loses its redundancy until that drive is
replaced, so this level is not widely used in organizations.

[Figure: RAID-3]

RAID-4 (Independent Data Disks with shared parity disk)
• This level is very similar to RAID 3, except that RAID 4 uses block-level striping
rather than byte-level striping.
• In a RAID-4 system, if any one of the disks fails, the data on the remaining disks can be
used to reconstruct the data that was on the failed disk. Even if the parity disk fails, the
other disks are still intact. Thus RAID-4 can survive the failure of any of its disks.

[Figure: RAID-4]

RAID-5 (Independent Data Disks with Distributed parity blocks)
• Parity information is written to a different disk in the array for each stripe. In case of single disk failure
data can be recovered with the help of distributed parity without affecting the operation and other read
write operations.
• One of the most popular RAID techniques, it uses block striping of data along with parity and writes
them to all drives. RAID-5 systems require a minimum of 3 disks.
• If any one drive fails, the array is said to be degraded, and the data blocks residing on that drive can be
derived from the parity and data on the remaining drives.
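
Recovery from parity can be sketched with byte-wise XOR, which is how RAID-5 parity is normally computed. The three data blocks are made-up values, and the rotation used in `parity_disk` is just one possible layout, assumed here for illustration.

```python
from functools import reduce
from operator import xor

def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(xor, column) for column in zip(*blocks))

def parity_disk(stripe, num_disks):
    """Disk that holds parity for a stripe; a simple rotation (assumed layout)."""
    return (num_disks - 1 - stripe) % num_disks

# Hypothetical data blocks of one stripe.
d0 = bytes([0x0F, 0xA0, 0x33])
d1 = bytes([0xF0, 0x0A, 0x55])
d2 = bytes([0x3C, 0xC3, 0x99])
parity = xor_blocks([d0, d1, d2])

# Suppose the drive holding d1 fails: rebuild it from the survivors and parity.
rebuilt_d1 = xor_blocks([d0, d2, parity])
rotation = [parity_disk(s, 4) for s in range(5)]
```

Because the parity position rotates from stripe to stripe, no single drive becomes a bottleneck for parity writes.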

[Figure: RAID-5]

RAID-6 (Independent Data Disks with Two independent parity schemes)
• This level is an enhanced version of RAID 5 that adds the benefit of dual parity. It uses
block-level striping with dual distributed parity, which gives extra redundancy.
• The advantages of RAID-6 become even more pronounced as the capacity of SATA drives goes up and
rebuilds take longer to finish.
• RAID 6 requires a minimum of four drives, and the usable capacity is always two drives less
than the number of disk drives in the RAID set. Applications suited for this level are the
same as those of level 5.
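
The capacity rules for the levels covered above can be collected in one small sketch. It assumes all drives are the same size and treats RAID 1 as a plain mirror that stores one drive's worth of data; the minimum of three drives for RAID 4 is an assumption by analogy with RAID 3, and the names are illustrative.

```python
# Minimum drives per level (RAID 4's minimum is assumed, by analogy with RAID 3).
MIN_DRIVES = {0: 2, 1: 2, 3: 3, 4: 3, 5: 3, 6: 4}
# Drives' worth of space given up to parity.
PARITY_DRIVES = {0: 0, 3: 1, 4: 1, 5: 1, 6: 2}

def usable_drives(level, n):
    """Number of drives' worth of usable capacity in an n-drive array."""
    if n < MIN_DRIVES[level]:
        raise ValueError(f"RAID {level} needs at least {MIN_DRIVES[level]} drives")
    if level == 1:
        return 1            # mirroring: every drive holds the same data
    return n - PARITY_DRIVES[level]

raid0 = usable_drives(0, 4)
raid1 = usable_drives(1, 2)
raid5 = usable_drives(5, 4)
raid6 = usable_drives(6, 4)
```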

[Figure: RAID-6]

Thank You

Vijay Patil
Department of Computer Engineering (NBA Accredited)
Vidyalankar Polytechnic
Vidyalankar College Marg, Wadala(E), Mumbai 400 037
E-mail: [email protected]