Module 5 OS
File Management
File management in an operating system is formally defined as the manipulation of files in a computer system, which includes creating, modifying, and deleting files. File management is therefore one of the simple but crucial services offered by the operating system. The operating system's file management function entails the software that handles and maintains the files (binary, text, PDF, document, audio, video, etc.) stored on the computer.
The operating system's file system can manage single files and groups of files in a computer system. The operating system's file management handles all of the files on the computer system, with their different extensions (such as .exe, .pdf, .txt, .docx, etc.).
File System
A file system is a method an operating system uses to store, organize, and manage files and
directories on a storage device. Some common types of file systems include:
* FAT (File Allocation Table): An older file system used by older versions of Windows and
other operating systems.
* NTFS (New Technology File System): A modern file system used by Windows. It
supports features such as file and folder permissions, compression, and encryption.
* ext (Extended File System): A file system commonly used on Linux and Unix-based
operating systems.
* HFS (Hierarchical File System): A file system used by macOS.
* APFS (Apple File System): A new file system introduced by Apple for their Macs and
iOS devices.
Although file systems differ in features, they share some common drawbacks:
* Compatibility issues: Different file systems may not be compatible with each other, making it difficult to transfer data between different operating systems.
* Disk space overhead: File systems may use some disk space to store metadata and other
overhead information, reducing the amount of space available for user data.
* Vulnerability: File systems can be vulnerable to data corruption, malware, and other
security threats, which can compromise the stability and security of the system.
The name of a file is divided into two parts, separated by a period:
* name
* extension
FILE DIRECTORIES:
A file directory is a collection of files. The directory contains information about the files, including attributes, location, and ownership. Much of this information, especially that concerned with storage, is managed by the operating system. The directory is itself a file, accessible by various file management routines. A typical directory entry contains:
* Name
* Type
* Address
* Current length
* Maximum length
* Date last accessed
* Date last updated
* Owner id
* Protection information
SINGLE-LEVEL DIRECTORY
In a single-level directory, all files are contained in the same directory, which leads to:
* Naming problem: Users cannot have the same name for two files.
* Grouping problem: Users cannot group files according to their needs.
TWO-LEVEL DIRECTORY
* Path name: Due to the two levels, every file has a path name used to locate it.
* Different users can now have files with the same name.
* Searching is efficient in this method.
TREE-STRUCTURED DIRECTORY :
The directory is maintained in the form of a tree. Searching is efficient and grouping capability is also provided. Every file has an absolute or relative path name.
FILE ALLOCATION METHODS :
1. Contiguous Allocation –
A single contiguous set of blocks is allocated to a file at the time of file creation. Thus, this is a
pre-allocation strategy, using variable size portions. The file allocation table needs just a single
entry for each file, showing the starting block and the length of the file. This method is best from
the point of view of the individual sequential file. Multiple blocks can be read in at a time to
improve I/O performance for sequential processing. It is also easy to retrieve a single block. For
example, if a file starts at block b, and the ith block of the file is wanted, its location on
secondary storage is simply b+i-1.
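A minimal C sketch of this address calculation is shown below; the directory-entry structure and the sample values are hypothetical, chosen only to illustrate the formula b + i - 1.

#include <stdio.h>

/* Hypothetical directory entry for a contiguously allocated file. */
struct dir_entry {
    long start;   /* first block b of the file              */
    long length;  /* number of blocks allocated to the file */
};

/* Return the disk block holding the i-th block of the file (1-based),
   or -1 if the request is out of range: block = b + i - 1. */
long block_of(const struct dir_entry *f, long i) {
    if (i < 1 || i > f->length)
        return -1;
    return f->start + i - 1;
}

int main(void) {
    struct dir_entry file = { .start = 14, .length = 3 };
    printf("2nd block of the file is disk block %ld\n", block_of(&file, 2)); /* prints 15 */
    return 0;
}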
Disadvantage –
* External fragmentation will occur, making it difficult to find contiguous blocks of space of sufficient length. A compaction algorithm will be necessary to free up additional space on the disk.
* Also, with pre-allocation, it is necessary to declare the size of the file at the time of
creation.
2. Linked Allocation –
Allocation is on an individual block basis. Each block contains a pointer to the next block in the chain. Again, the file allocation table needs just a single entry for each file, showing the starting block and the length of the file. Although pre-allocation is possible, it is more common simply to allocate blocks as needed. Any free block can be added to the chain, and the blocks need not be contiguous. An increase in file size is always possible if a free disk block is available. There is no external fragmentation because only one block at a time is needed; there can be internal fragmentation, but only in the last disk block of the file.
Disadvantage –
* The blocks are scattered, so only sequential access is efficient; reaching a given block requires following the chain from the beginning.
* The pointers consume space in every block, and a damaged pointer makes the rest of the file unreachable.
3. Indexed Allocation –
It addresses many of the problems of contiguous and chained allocation. In this case, the file
allocation table contains a separate one-level index for each file: The index has one entry for
each block allocated to the file. Allocation may be on the basis of fixed-size blocks or
variable-sized blocks. Allocation by blocks eliminates external fragmentation, whereas allocation
by variable-size blocks improves locality. This allocation technique supports both sequential and
direct access to the file and thus is the most popular form of file allocation.
Disk Free Space Management :
Just as the space that is allocated to files must be managed ,so the space that is not currently
allocated to any file must be managed. To perform any of the file allocation techniques,it is
necessary to know what blocks on the disk are available. Thus we need a disk allocation table in
addition to a file allocation table.The following are the approaches used for free space
management.
* Bit Tables: This method uses a vector containing one bit for each block on the disk. A 0 entry corresponds to a free block and a 1 corresponds to a block in use. A bit table has the advantage that it is relatively easy to find one free block or a contiguous group of free blocks, so it works well with any of the file allocation methods. Another advantage is that it is as small as possible. (A sketch of the free-block scan appears after this list.)
* Free Block List : In this method, each block is assigned a number sequentially and the list
of the numbers of all free blocks is maintained in a reserved block of the disk.
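The C sketch below illustrates the bit-table scan, using this section's convention that a 0 bit marks a free block and a 1 bit marks a block in use; the 8-bit word size and the mapping of bit 0 to the lowest-numbered block in each word are assumptions made for the example.

#include <stdint.h>
#include <stdio.h>

/* Find the first free block in a bit table where bit = 0 means free and
   bit = 1 means in use. Words of 8 bits are scanned; a word equal to 0xFF
   contains no free block, so it can be skipped with one comparison. */
int first_free_block(const uint8_t *bitmap, int nblocks) {
    for (int word = 0; word * 8 < nblocks; word++) {
        if (bitmap[word] != 0xFF) {                 /* at least one 0 bit here */
            for (int bit = 0; bit < 8; bit++) {
                int block = word * 8 + bit;         /* bit 0 = lowest block of the word */
                if (block < nblocks && !(bitmap[word] & (1u << bit)))
                    return block;
            }
        }
    }
    return -1;                                      /* no free block on the disk */
}

int main(void) {
    uint8_t map[2] = { 0xFF, 0xF9 };                /* blocks 9 and 10 are free */
    printf("first free block: %d\n", first_free_block(map, 16));
    return 0;
}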
* File Structure
We have seen various data structures in which the file can be stored. The task of the file system
is to maintain an optimal file structure.
Whenever a file is deleted from the hard disk, free space is created on the disk. Many such spaces may need to be reclaimed so that they can be reallocated to other files.
The major concern is deciding where to store the files on the hard disk. There are various disk scheduling algorithms, which are covered later.
A file may not fit within a single block; it can be stored in non-contiguous blocks on the disk, so we need to keep track of all the blocks on which parts of the file reside.
File concepts
What is the file?
The file can be explained as the smallest unit of storage on a computer system. The user can
perform file operations like open, close, read, write, and modify.
File concept
Computers store information on storage media such as disks, tape drives, and optical disks. The operating system provides a logical view of the information stored on these devices; this logical storage unit is the file. The information stored in files is non-volatile, meaning it is not lost during power failures. A file is a named collection of related information that is stored on physical storage.
Data cannot be written to secondary storage unless it is written to a file. A file, in general, is a sequence of bits, bytes, lines, or records whose structure is defined by its owner or creator and depends on the file type.
A file has a single, editable name given by its creator. The name does not change unless a user with the necessary permissions renames it. Names are used because they are humanly readable.
The properties of a file can differ from system to system; however, some of them are common:
* Name
* Identifier
* Type
* Location
* Size
* Protection
* Time, date, and user identification
File Operations
The operating system performs the following file operations using various system calls:
1. Create
2. Read
3. Write
4. Delete
5. Reposition
6. Truncate files
Create Files – The user must have the necessary space on the file system to create a file. A directory entry is also required for the newly created file.
Read Files – The system call requires the file name and the memory location into which the next block of the file should be read. The system keeps a read pointer to the location in the file where the next read will take place; this pointer is updated after each read.
Write Files – The system call uses the same file pointer of the process to write to the file, which saves space and reduces complexity.
Deleting a file – We look in the directory for the named file; if it is found, we release the space occupied by the file and remove its directory entry.
Repositioning within a file – The current-file-position pointer is set to a given value (a file seek); this operation does not involve any actual I/O.
Truncating a file – Sometimes the user does not want to delete a file but only to erase its contents. The file-length attribute changes, while the other attributes remain unchanged.
There are other file operations such as appending to a file, renaming a file, and creating a duplicate copy of a file.
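As a concrete illustration, the sketch below exercises these operations through the POSIX system-call interface (open, read, write, lseek, ftruncate, close, unlink); the file name /tmp/demo.txt and the permissions are arbitrary choices for the example.

#include <fcntl.h>    /* open and the O_* flags */
#include <stdio.h>
#include <string.h>
#include <unistd.h>   /* read, write, lseek, ftruncate, close, unlink */

int main(void) {
    /* Create (or truncate) a file, readable and writable by its owner. */
    int fd = open("/tmp/demo.txt", O_CREAT | O_RDWR | O_TRUNC, 0600);
    if (fd < 0) { perror("open"); return 1; }

    /* Write: the kernel advances the file pointer past the written bytes. */
    const char *msg = "hello, file management\n";
    write(fd, msg, strlen(msg));

    /* Reposition the file pointer to the beginning, then read the data back. */
    lseek(fd, 0, SEEK_SET);
    char buf[64];
    ssize_t n = read(fd, buf, sizeof(buf) - 1);
    if (n > 0) { buf[n] = '\0'; printf("read back: %s", buf); }

    ftruncate(fd, 5);           /* truncate: keep the file, cut it to 5 bytes  */
    close(fd);                  /* drop the open-file table entry              */
    unlink("/tmp/demo.txt");    /* delete: free the space and directory entry  */
    return 0;
}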
Open-File Table
File operations would otherwise require searching the directory every time. To avoid repeated searches, the OS provides the open() system call and keeps an open-file table. When a file operation is requested, the system refers to the file via an index into this table. When the file is no longer being actively used, the process closes it and the OS removes its entry from the open-file table.
The open() system call can take mode information such as read-only, write, append, create, and so on. The mode is checked against the file permissions and, if allowed, the file is opened for the process. The open call returns a pointer to the entry in the open-file table, and this pointer, not the file name, is used for all subsequent I/O operations.
Several different processes may open the same file, so the system maintains two file tables: a per-process table and a system-wide table. The per-process table contains information about the files opened by that process.
For example, current file-pointer for each file, access rights, and accounting information for the
files.
Each entry in the process file-table points to a system-wide table, which contains process
independent information such as the location of the file on the disk, access dates, and file size.
When a process opens a file, an entry is added to the per-process open-file table of that process, and it points to the entry in the system-wide table.
The open file table keeps an open count indicating the number of processes that have opened the
file. The close operation decreases the count and when it reaches 0 the file entry is removed from
the open-file table.
File pointer – It is used by the read() and write() operations and is unique to each process.
File open count – When multiple processes open the same file, the open count tracks the number of opens and closes; when the count reaches 0, the system can remove the entry from the open-file table.
Disk location of the file – The location of the file on disk is kept in memory so that the system does not have to read it from disk for each operation.
Access rights – Each process opens the file in an access mode, and this information is kept in the per-process table, based on which the OS can accept or reject subsequent I/O requests.
Apart from the above, some OS provides ways to lock files. This is true in the case of shared
files where only one process can write to the file while other processes read the information on
the file.
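The two tables described above can be pictured with the C sketch below; the structure names and fields are illustrative only and do not follow the layout of any particular operating system.

#include <stdint.h>
#include <sys/types.h>

/* System-wide open-file table entry: process-independent information. */
struct system_file_entry {
    uint64_t disk_location;     /* where the file's data (or FCB) lives on disk */
    uint64_t file_size;         /* current size in bytes                        */
    int      open_count;        /* number of processes that have the file open  */
    int      locked;            /* simple lock flag for shared files            */
};

/* Per-process open-file table entry: information private to one process. */
struct process_file_entry {
    struct system_file_entry *sys_entry;  /* points into the system-wide table */
    off_t file_pointer;                   /* current read/write offset         */
    int   access_mode;                    /* read-only, write, append, ...     */
};

/* A close() would decrement sys_entry->open_count and, when it reaches 0,
   remove the entry from the system-wide table. */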
Access methods
When a file is used, information is read and accessed into computer memory and there are
several ways to access this information of the file. Some systems provide only one access
method for files. Other systems, such as those of IBM, support many access methods, and
choosing the right one for a particular application is a major design problem.
There are three ways to access a file into a computer system: Sequential-Access, Direct Access,
Index sequential Method.
Sequential Access
Most operating systems access files sequentially; in other words, most files need to be accessed sequentially by the operating system.
In sequential access, the OS reads the file word by word. A pointer is maintained which initially points to the base address of the file. If the user wants to read the first word of the file, the pointer provides that word and then advances by one word. This process continues till the end of the file.
Modern systems do provide direct access and indexed access, but the most used method is sequential access, because most files, such as text files, audio files, and video files, need to be accessed sequentially.
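A minimal sequential-read loop in the spirit of the description above might look like the following C sketch; the file name and buffer size are arbitrary.

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    int fd = open("data.log", O_RDONLY);        /* example file name */
    if (fd < 0) { perror("open"); return 1; }

    char buf[512];
    ssize_t n;
    long total = 0;

    /* Each read() returns the next chunk and the kernel advances the
       file pointer, so the file is consumed from front to back. */
    while ((n = read(fd, buf, sizeof(buf))) > 0)
        total += n;

    printf("read %ld bytes sequentially\n", total);
    close(fd);
    return 0;
}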
Advantages of Sequential Access Method :
* It is simple to implement this file access mechanism.
* It uses lexicographic order to quickly access the next entry.
* It is suitable for applications that require access to all records in a file, in a specific order.
* It is less prone to data corruption as the data is written sequentially and not randomly.
* It is a more efficient method for reading large files, as it only reads the required data and does
not waste time reading unnecessary data.
* It is a reliable method for backup and restore operations, as the data is stored sequentially and
can be easily restored if required.
Disadvantages of Sequential Access Method :
* If the file record that needs to be accessed next is not present next to the current record, this
type of file access method is slow.
* Moving a sizable chunk of the file may be necessary to insert a new record.
* It does not allow for quick access to specific records in the file. The entire file must be
searched sequentially to find a specific record, which can be time-consuming.
* It is not well-suited for applications that require frequent updates or modifications to the file.
Updating or inserting a record in the middle of a large file can be a slow and cumbersome
process.
* Sequential access can also result in wasted storage space if records are of varying lengths.
The space between records cannot be used by other records, which can result in inefficient
use of storage.
Direct Access
Direct access is mostly required in the case of database systems. In most cases we need filtered information from the database, and sequential access can be very slow and inefficient for that.
Suppose every block of the storage stores 4 records and we know that the record we need is stored in the 10th block. In that case, sequential access is wasteful because it traverses all the preceding blocks in order to reach the needed record.
Direct access gives the required result even though the operating system has to perform some extra work, such as determining the desired block number. It is generally used in database applications.
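The sketch below shows the idea with fixed-size records: the record number is turned into a byte offset and lseek() jumps straight to it instead of reading everything before it. The record size and file name are assumptions made for the example.

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define RECORD_SIZE 128            /* fixed-length records (an assumption) */

/* Read record number rec (0-based) directly, without touching the
   records that precede it. */
ssize_t read_record(int fd, long rec, char *out) {
    off_t offset = (off_t)rec * RECORD_SIZE;     /* record -> byte offset */
    if (lseek(fd, offset, SEEK_SET) == (off_t)-1)
        return -1;
    return read(fd, out, RECORD_SIZE);
}

int main(void) {
    int fd = open("records.dat", O_RDONLY);      /* example file */
    if (fd < 0) { perror("open"); return 1; }

    char rec[RECORD_SIZE];
    if (read_record(fd, 39, rec) > 0)            /* e.g. fetch the 40th record */
        printf("fetched record 39 directly\n");

    close(fd);
    return 0;
}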
Indexed Access
If a file can be sorted on any of its fields, then an index can be assigned to a group of certain records, and a particular record can be accessed by its index. The index is simply the address of a record in the file.
With index-based access, searching in a large database becomes quick and easy, but some extra space is needed in memory to store the index values.
4. Relative Record Access –
Relative record access is a file access method used in operating systems where records are
accessed relative to the current position of the file pointer. In this method, records are located
based on their position relative to the current record, rather than by a specific address or key
value.
Key Points of Relative Record Access:
Relative record access is a random access method that allows records to be accessed based on
their position relative to the current record.
This method is efficient for accessing individual records but may not be suitable for files that
require frequent updates or random access to specific records.
Relative record access requires fixed-length records and may not be flexible enough for some
applications.
This method is useful for processing records in a specific order or for files that are accessed
sequentially.
Advantages of Relative Record Access:
Random Access: Relative record access allows random access to records in a file. The system
can access any record at a specific offset from the current position of the file pointer.
Efficient Retrieval: Since the system only needs to read the current record and any records that
need to be skipped, relative record access is more efficient than sequential access for accessing
individual records.
Useful for Sequential Processing: Relative record access is useful for processing records in a
specific order. For example, if the records are sorted in a specific order, the system can access the
next or previous record relative to the current position of the file pointer.
Disadvantages of Relative Record Access:
Fixed Record Length: Relative record access requires fixed-length records. If the records are of
varying length, it may be necessary to use padding to ensure that each record is the same length.
Limited Flexibility: Relative record access is not very flexible. It is difficult to insert or delete
records in the middle of a file without disrupting the relative positions of other records.
Limited Application: Relative record access is best suited for files that are accessed sequentially
or with some regularity, but it may not be appropriate for files that are frequently updated or
require random access to specific records.
5. Content-Addressable Access –
Content-addressable access (CAA) is a file access method used in operating systems that allows
records or blocks to be accessed based on their content rather than their address. In this method,
a hash function is used to calculate a unique key for each record or block, and the system can
access any record or block by specifying its key.
Keys in Content-Addressable Access:
Unique: Each record or block has a unique key that is generated using a hash function.
Calculated based on content: The key is calculated based on the content of the record or block,
rather than its location or address.
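As an illustration of deriving a key from content, the sketch below hashes a record's bytes with the FNV-1a function; FNV-1a is simply one well-known hash function chosen for the example, not something mandated by content-addressable storage.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* FNV-1a: hash the record's content into a 64-bit value that serves as its key. */
uint64_t content_key(const void *data, size_t len) {
    const uint8_t *p = data;
    uint64_t h = 0xcbf29ce484222325ULL;      /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL;               /* FNV prime */
    }
    return h;
}

int main(void) {
    const char *record = "account=42;balance=100";
    uint64_t key = content_key(record, strlen(record));
    printf("key = %016llx\n", (unsigned long long)key);
    /* Identical content always yields the same key; different content almost
       always yields different keys, though collisions remain possible. */
    return 0;
}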
Advantages of Content-Addressable Access:
Efficient Search: CAA is ideal for searching large databases or file systems because it allows for
efficient searching based on the content of the records or blocks.
Flexibility: CAA is more flexible than other access methods because it allows for easy insertion
and deletion of records or blocks.
Data Integrity: CAA ensures data integrity because each record or block has a unique key that is
generated based on its content.
Disadvantages of Content-Addressable Access:
Overhead: CAA requires additional overhead because the hash function must be calculated for
each record or block.
Collision: There is a possibility of collision where two records or blocks can have the same key.
This can be minimized by using a good hash function, but it cannot be completely eliminated.
Limited Key Space: The key space is limited by the size of the hash function used, which can
lead to collisions and other issues.
Key Points of Content-Addressable Access:
Content-addressable access is a file access method that allows records or blocks to be accessed
based on their content rather than their address.
CAA uses a hash function to generate a unique key for each record or block.
CAA is efficient for searching large databases or file systems and is more flexible than other
access methods.
CAA requires additional overhead for calculating the hash function and may have collisions or
limited key space.
Directory structure
A directory can be defined as a listing of the related files on the disk. The directory may store some or all of the file attributes.
To get the benefit of different file systems on different operating systems, a hard disk can be divided into a number of partitions of different sizes. The partitions are also called volumes or minidisks.
Each partition must have at least one directory in which, all the files of the partition can be listed.
A directory entry is maintained for each file in the directory which stores all the information
related to that file.
A directory is a container that is used to contain folders and files. It organizes files and folders in
a hierarchical manner.
There are several logical structures of a directory, these are given below.
* Single-level directory –
The single-level directory is the simplest directory structure. All files are contained in the same directory, which makes it easy to support and understand.
A single-level directory has a significant limitation, however, when the number of files increases or when the system has more than one user. Since all the files are in the same directory, they must have unique names. If two users call their data file test, the unique-name rule is violated.
Advantages:
* Since it is a single directory, so its implementation is very easy.
* If the files are smaller in size, searching will become faster.
* The operations like file creation, searching, deletion, updating are very easy in such a
directory structure.
* Logical Organization: Directory structures help to logically organize files and directories in
a hierarchical structure. This provides an easy way to navigate and manage files, making it
easier for users to access the data they need.
* Increased Efficiency: Directory structures can increase the efficiency of the file system by
reducing the time required to search for files. This is because directory structures are
optimized for fast file access, allowing users to quickly locate the file they need.
* Improved Security: Directory structures can provide better security for files by allowing
access to be restricted at the directory level. This helps to prevent unauthorized access to
sensitive data and ensures that important files are protected.
* Facilitates Backup and Recovery: Directory structures make it easier to backup and recover
files in the event of a system failure or data loss. By storing related files in the same
directory, it is easier to locate and backup all the files that need to be protected.
* Scalability: Directory structures are scalable, making it easy to add new directories and files
as needed. This helps to accommodate growth in the system and makes it easier to manage
large amounts of data.
Disadvantages:
* There is a chance of name collision, because two files cannot have the same name.
* Searching becomes time-consuming if the directory is large.
* Files of the same type cannot be grouped together.
Two-level directory –
As we have seen, a single-level directory often leads to confusion of file names among different users. The solution to this problem is to create a separate directory for each user.
In the two-level directory structure, each user has their own user file directory (UFD). The UFDs have similar structures, but each lists only the files of a single user. The system's master file directory (MFD) is searched whenever a new user ID is created.
File System Mounting
Mounting makes a file system available to processes: it helps the operating system traverse the directory structure and switch among file systems as appropriate. A system may either allow the same file system to be mounted repeatedly at different mount points, or it may allow only one mount per file system.
For example, in the Macintosh operating system, whenever the system encounters a disk for the first time it searches for a file system on the disk; if it finds one, it automatically mounts the file system at the root level and adds a folder icon on the screen labelled with the name of the file system. Microsoft Windows maintains an extended two-level directory structure, with drive letters assigned to devices and volumes.
Mount
The mount command mounts a storage device or filesystem, making it accessible and attaching it
to an existing directory structure.
Unmount
The umount command “unmounts” a mounted filesystem, informing the system to complete any
pending read or write operations, and safely detaching it.
Mount Syntax
mount -t type device dir
Mount Command
mount -t ext4 device dir
Mount Options
-a
Mount all the file systems listed in /etc/fstab
-d
Do everything except for the actual mount system call. This option is useful in conjunction with
the -v flag to determine what mount(8) is actually trying to do.
-f
Force the mount of an unclean file system (dangerous), or the revocation of write access when
downgrading a file system’s mount status from read-write to read-only.
-r
Mount the file system read-only. This is identical to using -o ro.
-u
Update mount options on the file system.
-v
Be verbose.
-w
Mount the file system read-write.
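On Linux, the mount command is ultimately a wrapper around the mount(2) system call. A hedged C sketch of calling it directly is shown below; the device path and mount point are placeholders, and the program must be run with sufficient privileges.

#include <stdio.h>
#include <sys/mount.h>    /* mount(2) and umount(2) on Linux */

int main(void) {
    /* Attach the ext4 filesystem on /dev/sdb1 (placeholder device) to the
       directory /mnt/data (placeholder mount point), read-only. */
    if (mount("/dev/sdb1", "/mnt/data", "ext4", MS_RDONLY, "") != 0) {
        perror("mount");
        return 1;
    }

    /* ... use the mounted filesystem ... */

    if (umount("/mnt/data") != 0) {    /* detach it again */
        perror("umount");
        return 1;
    }
    return 0;
}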
File Sharing
File sharing allows users to make files available to others, either on the same system or across a network. Common methods include peer-to-peer (P2P) file sharing over a network, cloud-based file sharing through an online storage service, direct file transfer, and removable media file sharing:
* Direct File Transfer − Direct file transfer involves the transfer of files between two
devices through a direct connection such as Bluetooth or Wi-Fi Direct. Direct file transfer
is commonly used for sharing files between mobile devices or laptops.
* Removable Media File Sharing − Removable media file sharing involves the use of
physical storage devices such as USB drives or external hard drives. Users can copy files
onto the device and share them with others by physically passing the device to them.
Each type of file sharing method comes with its own set of risks and challenges. Peer-to-peer
file sharing can expose users to malware and viruses, while cloud-based file sharing can lead to
data breaches if security measures are not implemented properly. Direct file transfer and
removable media file sharing can also lead to data breaches if devices are lost or stolen.
To protect against these risks, users should take precautions such as using encryption, password
protection, secure file transfer protocols, and regularly updating antivirus and antimalware
software. It is also essential to educate users on safe file sharing practices and limit access to
files only to authorized individuals or groups. By taking these steps, users can ensure that their
files remain secure and protected during file sharing.
Risks of File Sharing
File sharing is a convenient and efficient way to share information and collaborate on projects.
However, it comes with several risks and challenges that can compromise the confidentiality,
integrity, and availability of files. In this section, we will explore some of the most significant
risks of file sharing.
* Malware and Viruses − One of the most significant risks of file sharing is the spread of
malware and viruses. Files obtained from untrusted sources, such as peer-to-peer (P2P)
networks, can contain malware that can infect the user's device and compromise the
security of their files. Malware and viruses can cause damage to the user's device, steal
personal information, or even use their device for illegal activities without their
knowledge.
* Data Breaches and Leaks − Another significant risk of file sharing is the possibility of
data breaches and leaks. Cloud-based file sharing services and P2P networks are
particularly vulnerable to data breaches if security measures are not implemented
properly. Data breaches can result in the loss of sensitive information, such as personal
data or intellectual property, which can have severe consequences for both individuals and
organizations.
* Legal Consequences − File sharing copyrighted material without permission can lead to
legal consequences. Sharing copyrighted music, movies, or software can result in
copyright infringement lawsuits and hefty fines.
* Identity Theft − File sharing can also expose users to identity theft. Personal
information, such as login credentials or social security numbers, can be inadvertently
shared through file sharing if security measures are not implemented properly.
Cybercriminals can use this information to commit identity theft, which can have severe
consequences for the victim.
To protect against these risks, users should take precautions such as using trusted sources for file
sharing, limiting access to files, educating users on safe file sharing practices, and regularly
updating antivirus and anti-malware software. By taking these steps, users can reduce the risk of
malware and viruses, data breaches and leaks, legal consequences, and identity theft during file
sharing.
File Sharing Protection Measures
* Encryption − Encryption is the process of converting data into a coded language that can
only be accessed by authorized users with a decryption key. This can help protect files
from unauthorized access and ensure that data remains confidential even if it is
intercepted during file sharing.
* Password protection − Password protection involves securing files with a password that
must be entered before the file can be accessed. This can help prevent unauthorized access
to files and ensure that only authorized users can view or modify the files.
* Secure file transfer protocols − Secure file transfer protocols, such as SFTP (Secure
File Transfer Protocol) and HTTPS (Hypertext Transfer Protocol Secure), provide a
secure way to transfer files over the internet. These protocols use encryption and other
security measures to protect files from interception and unauthorized access during
transfer.
* Firewall protection − Firewall protection involves using a firewall to monitor and
control network traffic to prevent unauthorized access to the user's device or network.
Firewalls can also be configured to block specific file sharing protocols or limit access to
certain users or devices, providing an additional layer of protection for shared files.
Best Practices for Secure File Sharing
* Use trusted sources for file sharing − To reduce the risk of downloading malware or
viruses, it is essential to use trusted sources for file sharing. Users should only download
files from reputable sources and avoid downloading files from unknown or suspicious
websites.
* Limit access to files − To minimize the risk of data breaches or leaks, users should limit
access to files only to authorized individuals or groups. This can be done by using
password protection, encryption, and other access control measures.
* Educate users on safe file sharing practices − Educating users on safe file sharing
practices can help reduce the risk of security incidents. Users should be trained on how to
identify and avoid phishing scams, how to recognize suspicious files or emails, and how
to securely share files.
* Regularly update antivirus and anti-malware software − To ensure maximum
protection against malware and viruses, it is essential to regularly update antivirus and
anti-malware software. This will help to identify and remove any potential threats to the
user's device or network.
1. MULTIPLE USERS:
* When an operating system accommodates multiple users, the issues of file sharing, file
naming and file protection become preeminent.
* The system can either allow a user to access the files of other users by default, or it may require that a user specifically grant access to the files.
* To implement sharing and protection, the system must maintain more file and directory attributes than on a single-user system.
* The owner is the user who may change attributes, grant access, and has the most control over the file or directory.
* The group attribute of a file is used to define a subset of users who may share access to
the file.
* Most systems implement owner attributes by managing a list of user names and
associated user identifiers (user Ids).
* When a user logs in to the system, the authentication stage determines the appropriate user ID for the user. That user ID is associated with all of the user's processes and threads. When user IDs need to be user readable, they are translated back to user names via the user-name list.
* Likewise, group functionality can be implemented as a system wide list of group names
and group identifiers.
Every user can be in one or more groups, depending upon operating system design decisions. The user's group IDs are also included in every associated process and thread.
2. REMOTE FILE SYSTEMS:
· Networking allows the sharing of resources spread across a campus or even around the world. Users can manually transfer files between machines via programs like ftp.
· A distributed file system (DFS), in which remote directories are visible from the local machine.
· The World Wide Web: a browser is needed to gain access to the remote files, and separate operations (essentially a wrapper for ftp) are used to transfer files.
a) The client-server Model:
Remote file systems allow a computer to mount one or more file systems from one or more remote machines.
• A server can serve multiple clients, and a client can use multiple servers,
depending on the implementation details of a given client –server facility.
• Client identification is more difficult. Clients can be specified by their network name or another identifier, such as an IP address, but these can be spoofed (imitated). An unauthorized client can spoof the server into deciding that it is authorized, and could then be allowed access.
b) Distributed Information systems:
· Distributed information systems, also known as distributed naming services, have been devised to provide unified access to the information needed for remote computing.
c) Failure Modes:
· Redundant arrays of inexpensive disks (RAID) can prevent the loss of a disk from
resulting in the loss of data.
Remote file systems have more failure modes. By the nature of the complexity of networking systems and the required interactions between remote machines, many more problems can interfere with the proper operation of remote file systems.
d) Consistency Semantics:
o These semantics should specify when modifications of data by one user are
observable by other users.
o The semantics are typically implemented as code with the file system.
o A series of file accesses (that is reads and writes) attempted by a user to the same
file is always enclosed between the open and close operations.
o The series of access between the open and close operations is a file session.
(i) UNIX Semantics:
1. Writes to an open file by a user are visible immediately to other users that have this
file open at the same time.
2. One mode of sharing allows users to share the pointer of current location into the file.
Thus, the advancing of the pointer by one user affects all sharing users.
(ii) Session Semantics:
The Andrew file system (AFS) uses the following consistency semantics:
1. Writes to an open file by a user are not visible immediately to other users that have
the same file open simultaneously.
2. Once a file is closed, the changes made to it are visible only in sessions starting later.
Already open instances of the file do not reflect this change.
(iii) Immutable-Shared File Semantics:
o Once a file is declared as shared by its creator, it cannot be modified: its name may not be reused and its contents may not be altered.
File System Implementation
A file system is organized in layers:
* I/O control level – Device drivers act as an interface between devices and the OS; they help transfer data between disk and main memory. A driver takes a block number as input and, as output, issues the low-level, hardware-specific instructions.
* Basic file system – It issues generic commands to the device driver to read and write physical blocks on the disk. It manages the memory buffers and caches: a buffer can hold the contents of a disk block, and the cache stores frequently used file-system metadata.
* File organization module – It has information about files, their location, and their logical and physical blocks. Physical block numbers do not match the logical block numbers (numbered from 0 to N), so a translation is needed. It also manages free space, tracking unallocated blocks.
* Logical file system – It manages metadata information about a file, i.e., all details about a file except its actual contents. It maintains this information via file control blocks: a file control block (FCB) holds information about a file, such as its owner, size, permissions, and the location of the file contents.
Advantages:
1. Duplication of code is minimized.
2. Each file system can have its own logical file system.
File system implementation in an operating system also provides several further advantages, including:
3. Efficient Data Storage: File system implementation ensures efficient data storage on a physical storage device. It provides a structured way of organizing files and directories, which makes it easy to find and access files.
4. Data Security: File system implementation includes features for managing file security and permissions. This ensures that sensitive data is protected from unauthorized access.
5. Data Recovery: The file system implementation includes features for recovering from system failures and maintaining data integrity. This helps to prevent data loss and ensures that data can be recovered in the event of a system failure.
6. Improved Performance: File system implementation includes techniques such as buffering and caching to optimize file I/O performance. This results in faster access to data and improved overall system performance.
7. Scalability: File system implementation can be designed to be scalable, making it possible to store and retrieve large amounts of data efficiently.
8. Flexibility: Different file system implementations can be designed to meet specific needs and use cases. This allows developers to choose the best file system implementation for their specific requirements.
9. Cross-Platform Compatibility: Many file system implementations are cross-platform compatible, which means they can be used on different operating systems. This makes it easy to transfer files between different systems.
Disadvantages: If many files are accessed at the same time, performance degrades.
A file system is implemented using two types of data structures, on-disk and in-memory:
1. Boot Control Block – It is usually the first block of a volume and contains the information needed to boot an operating system. In UNIX it is called the boot block; in NTFS it is called the partition boot sector.
2. Volume Control Block – It has information about a particular partition, e.g., free block count, block size, and block pointers. In UNIX it is called the superblock; in NTFS it is stored in the master file table.
3. Directory Structure – It stores file names and the associated inode numbers. In UNIX, it includes file names and associated inode numbers; in NTFS, it is stored in the master file table.
4. Per-File FCB – It contains details about a file and has a unique identifier number to allow association with a directory entry. In NTFS it is stored in the master file table.
5. Mount Table – It contains information about each mounted volume.
6. Directory-Structure cache – This cache holds the directory information of recently accessed
directories.
7. System wide open-file table – It contains the copy of FCB of each open file.
8. Per-process open-file table – It contains information about the files opened by that particular process and maps each of them to the appropriate entry in the system-wide open-file table.
9. Linear List – It maintains a linear list of file names with pointers to the data blocks, which is time-consuming to search. To create a new file, we must first search the directory to be sure that no existing file has the same name, and then add the new file at the end of the directory. To delete a file, we search the directory for the named file and release its space. To reuse a directory entry, we can either mark the entry as unused or attach it to a list of free directory entries.
10. Hash Table – The hash table takes a value computed from the file name and returns a pointer to the file. It decreases the directory search time, and insertion and deletion of files are easy. The major difficulty is that a hash table generally has a fixed size and the hash function depends on that size.
A file system provides efficient access to the disk by allowing data to be stored, located, and retrieved in a convenient way. A file system must be able to store a file, locate it, and retrieve it.
Most operating systems use a layering approach for every task, including file systems; every layer of the file system is responsible for some activities. The layers, and the functionality of each, are described below.
* When an application program asks for a file, the first request is directed to the logical file system. The logical file system contains the metadata of the file and the directory structure. If the application program does not have the required permissions for the file, this layer throws an error. The logical file system is also responsible for verifying the path to the file.
* Generally, files are divided into various logical blocks. Files are stored on the hard disk, which is divided into tracks and sectors, and are retrieved from it. Therefore, in order to store and retrieve files, the logical blocks need to be mapped to physical blocks. This mapping is done by the file organization module, which decides which physical blocks are to be allocated; it is also responsible for free space management.
* Once the file organization module has decided which physical block the application program needs, it passes this information to the basic file system. The basic file system is responsible for issuing the commands to I/O control in order to fetch those blocks.
* I/O control contains the code, known as device drivers, used to access the hard disk. It is also responsible for handling interrupts.
Directory implementation
Directory implementation in the operating system can be done using Singly Linked List and
Hash table. The efficiency, reliability, and performance of a file system are greatly affected by
the selection of directory-allocation and directory-management algorithms. There are numerous
ways in which the directories can be implemented. But we need to choose an appropriate
directory implementation algorithm that enhances the performance of the system.
The implementation of directories using a singly linked list is easy to program but time-consuming to execute. Here we implement a directory as a linear list of file names with pointers to the data blocks (a minimal sketch of such a list appears after the bullets below).
* To create a new file, the entire list has to be checked to make sure that a file with the same name does not already exist.
* The new entry can then be added at the end of the list or at the beginning of the list.
* In order to delete a file, we first search the directory for the name of the file to be deleted. After finding it, we delete the file by releasing the space allocated to it.
* To reuse a directory entry, we can mark that entry as unused or append it to a list of free directory entries.
* To delete a file, the linked list is the best choice as it takes less time.
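A minimal C sketch of such a singly linked directory list follows; the entry layout (name field size, single block pointer) is illustrative only.

#include <stdlib.h>
#include <string.h>

/* One directory entry in a singly linked list implementation. */
struct dir_entry {
    char   name[32];          /* file name (size chosen for the sketch) */
    long   first_block;       /* pointer to the file's data blocks      */
    struct dir_entry *next;   /* next entry in the directory            */
};

/* Linear search: every lookup walks the list from the head. */
struct dir_entry *dir_lookup(struct dir_entry *head, const char *name) {
    for (struct dir_entry *e = head; e != NULL; e = e->next)
        if (strcmp(e->name, name) == 0)
            return e;
    return NULL;
}

/* Create a file only if no entry with that name already exists. */
struct dir_entry *dir_create(struct dir_entry **head, const char *name) {
    if (dir_lookup(*head, name) != NULL)
        return NULL;                          /* name collision */
    struct dir_entry *e = calloc(1, sizeof(*e));
    if (e == NULL)
        return NULL;
    strncpy(e->name, name, sizeof(e->name) - 1);
    e->next = *head;                          /* insert at the beginning */
    *head = e;
    return e;
}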
Disadvantage
The main disadvantage of using a linked list is that when the user needs to find a file the user has
to do a linear search. In today’s world directory information is used quite frequently and linked
list implementation results in slow access to a file. So the operating system maintains a cache to
store the most recently used directory information.
An alternative data structure that can be used for directory implementation is a hash table. It
overcomes the major drawbacks of directory implementation using a linked list. In this method,
we use a hash table along with the linked list. Here the linked list stores the directory entries, but
a hash data structure is used in combination with the linked list.
In the hash table, a key-value pair is generated for each entry in the directory. A hash function applied to the file name determines the key, and this key points to the corresponding file stored in the directory. This method efficiently decreases the directory search time, as the entire list is not searched on every operation: using the key, the hash-table entry is checked, and when the file is found it is fetched.
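The C sketch below shows the idea: a fixed-size table of buckets, where a hash of the file name selects the bucket and a short chain resolves collisions. The table size and the hash function are arbitrary choices made for the example.

#include <stdlib.h>
#include <string.h>

#define TABLE_SIZE 64                  /* fixed size: the known drawback */

struct entry {
    char name[32];
    long inode;                        /* or a pointer to the file's FCB */
    struct entry *next;                /* chaining within one bucket     */
};

static struct entry *table[TABLE_SIZE];

/* Hash the file name into a bucket index. */
static unsigned bucket(const char *name) {
    unsigned h = 0;
    while (*name)
        h = h * 31 + (unsigned char)*name++;
    return h % TABLE_SIZE;
}

/* Lookup touches only one bucket instead of scanning the whole directory. */
struct entry *dir_find(const char *name) {
    for (struct entry *e = table[bucket(name)]; e != NULL; e = e->next)
        if (strcmp(e->name, name) == 0)
            return e;
    return NULL;
}

void dir_add(const char *name, long inode) {
    unsigned b = bucket(name);
    struct entry *e = calloc(1, sizeof(*e));
    if (e == NULL)
        return;
    strncpy(e->name, name, sizeof(e->name) - 1);
    e->inode = inode;
    e->next = table[b];
    table[b] = e;
}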
Disadvantage:
The major drawback of using a hash table is that it generally has a fixed size, and the hash function depends on that size. Even so, this method is usually faster than a linear search through the entire directory using a linked list.
Allocation methods
The allocation methods define how the files are stored in the disk blocks. There are three main
disk space or file allocation methods.
* Contiguous Allocation
* Linked Allocation
* Indexed Allocation
The main idea behind these methods is to provide:
* Efficient disk space utilization.
* Fast access to the file blocks.
All the three methods have their own advantages and disadvantages as discussed below:
Contiguous Allocation
If the blocks are allocated to the file in such a way that all the logical blocks of the file get contiguous physical blocks on the hard disk, then the allocation scheme is known as contiguous allocation.
For example, consider a directory with three files: the directory records the starting block and the length of each file, and contiguous blocks are assigned to each file as per its need.
Advantages
1. It is simple to implement.
2. We will get Excellent read performance.
3. Supports Random Access into files.
Disadvantages
1. The disk will suffer from external fragmentation.
2. The file size must be declared in advance, and a file cannot easily grow beyond its allocated region.
Linked Allocation
Each file is a linked list of disk blocks which may be scattered anywhere on the disk; each block contains a pointer to the next block of the file.
Advantages
1. There is no external fragmentation.
2. A file can grow as long as free blocks are available.
Disadvantages
1. Only sequential access is efficient; reaching a given block requires following the chain.
2. The pointers consume space in every block, and a damaged pointer breaks the rest of the file.
Indexed Allocation
All the pointers to a file's blocks are brought together into an index block, so the directory entry only needs to point to the index block.
Advantages
1. It supports direct access and does not suffer from external fragmentation.
Disadvantages
1. The index block itself is an overhead, which is wasteful for small files.
For larger files, the last entry of the index block is a pointer which points to another index block. This is also called the linked scheme.
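The C sketch below traces the lookup through chained index blocks (the linked scheme); the number of pointers per index block and the sample block numbers are assumptions made for the example.

#include <stdio.h>

#define PTRS_PER_INDEX 4     /* deliberately tiny for the example; real systems use many more */
#define NO_BLOCK       (-1L)

/* Index block: the last slot is used to chain to another index block for large files. */
struct index_block {
    long physical[PTRS_PER_INDEX - 1];   /* pointers to the file's data blocks */
    struct index_block *next;            /* continuation index block, or NULL  */
};

/* Translate the i-th logical block of a file (0-based) to its physical block. */
long logical_to_physical(const struct index_block *ib, long i) {
    const long usable = PTRS_PER_INDEX - 1;
    while (ib != NULL && i >= usable) {  /* skip over full index blocks */
        ib = ib->next;
        i -= usable;
    }
    return ib != NULL ? ib->physical[i] : NO_BLOCK;
}

int main(void) {
    struct index_block second = { { 90, 91, 92 }, NULL };
    struct index_block first  = { { 10, 25, 47 }, &second };
    printf("logical block 4 -> physical block %ld\n",
           logical_to_physical(&first, 4));   /* prints 91 */
    return 0;
}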
Free-space management
Free space management is a critical aspect of operating systems as it involves managing the
available storage space on the hard disk or other secondary storage devices. The operating
system uses various techniques to manage free space and optimize the use of storage devices.
Here are some of the commonly used free space management techniques:
1. Linked Allocation: In this technique, each file is represented by a linked list of disk blocks.
When a file is created, the operating system finds enough free space on the disk and links the
blocks of the file to form a chain. This method is simple to implement but can lead to
fragmentation and wastage of space.
2. Contiguous Allocation: In this technique, each file is stored as a contiguous block of disk
space. When a file is created, the operating system finds a contiguous block of free space and
assigns it to the file. This method is efficient as it minimizes fragmentation but suffers from
the problem of external fragmentation.
3. Indexed Allocation: In this technique, a separate index block is used to store the addresses of
all the disk blocks that make up a file. When a file is created, the operating system creates an
index block and stores the addresses of all the blocks in the file. This method is efficient in
terms of storage space and minimizes fragmentation.
4. File Allocation Table (FAT): In this technique, the operating system uses a file allocation
table to keep track of the location of each file on the disk. When a file is created, the
operating system updates the file allocation table with the address of the disk blocks that
make up the file. This method is widely used in Microsoft Windows operating systems.
5. Volume Shadow Copy: This is a technology used in Microsoft Windows operating systems to
create backup copies of files or entire volumes. When a file is modified, the operating system
creates a shadow copy of the file and stores it in a separate location. This method is useful for
data recovery and protection against accidental file deletion.
Overall, free space management is a crucial function of operating systems, as it ensures that
storage devices are utilized efficiently and effectively.
The system keeps track of the free disk blocks for allocating space to files when they are created. Also, to reuse the space released by deleted files, free space management becomes crucial. The system maintains a free space list which keeps track of the disk blocks that are not allocated to any file or directory. The free space list can be implemented mainly as:
1. Bitmap or Bit vector – A bitmap or bit vector is a series or collection of bits where each bit corresponds to a disk block. The bit can take two values, 0 and 1: 0 indicates that the block is allocated and 1 indicates a free block. For example, a disk of 16 blocks in which blocks 4, 5, 6, 13, and 14 are free can be represented by the 16-bit bitmap 0000111000000110.
Advantages –
* Simple to understand.
* Finding the first free block is efficient. It requires scanning the words (a group of
8 bits) in a bitmap for a non-zero word. (A 0-valued word has all bits 0). The first
free block is then found by scanning for the first 1 bit in the non-zero word.
2. Linked List – In this approach, the free disk blocks are linked together i.e. a free block
contains a pointer to the next free block. The block number of the very first disk block is
stored at a separate location on disk and is also cached in memory.
For example, the free-space list head might point to block 5, which points to block 6, the next free block, and so on. The last free block contains a null pointer indicating the end of the free list. A drawback of this method is the I/O required to traverse the free-space list.
3. Grouping – This approach stores the address of the free blocks in the first free block. The
first free block stores the address of some, say n free blocks. Out of these n blocks, the first
n-1 blocks are actually free and the last block contains the address of next free n blocks.
An advantage of this approach is that the addresses of a group of free disk blocks can be
found easily.
4. Counting – This approach stores the address of the first free disk block and a number n of free contiguous disk blocks that follow the first block. Every entry in the list contains:
1. The address of the first free disk block
2. A number n
(A tiny sketch of this representation appears after the list.)
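As a small illustration of the counting approach, each free-list entry can be kept as a (first block, n) pair; the sample runs of free blocks below are arbitrary.

#include <stdio.h>

/* Counting approach: each entry stores the address of the first free block
   and the number n of free contiguous blocks that follow it. */
struct free_run {
    long first;   /* address of the first free disk block          */
    long n;       /* number of free contiguous blocks that follow  */
};

int main(void) {
    /* e.g. blocks 2-6, 8-9 and 17-25 are currently free */
    struct free_run list[] = { { 2, 4 }, { 8, 1 }, { 17, 8 } };
    long total = 0;
    for (int i = 0; i < 3; i++)
        total += 1 + list[i].n;     /* the first block plus the n that follow */
    printf("%ld free blocks described by %d list entries\n", total, 3);
    return 0;
}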
Here are some advantages and disadvantages of free space management techniques in
operating systems:
Advantages:
1. Efficient use of storage space: Free space management techniques help to optimize the use of
storage space on the hard disk or other secondary storage devices.
2. Easy to implement: Some techniques, such as linked allocation, are simple to implement and
require less overhead in terms of processing and memory resources.
3. Faster access to files: Techniques such as contiguous allocation can help to reduce disk
fragmentation and improve access time to files.
Disadvantages:
1. Fragmentation: Techniques such as linked allocation can lead to fragmentation of disk space,
which can decrease the efficiency of storage devices.
2. Overhead: Some techniques, such as indexed allocation, require additional overhead in terms
of memory and processing resources to maintain index blocks.
3. Limited scalability: Some techniques, such as FAT, have limited scalability in terms of the
number of files that can be stored on the disk.
4. Risk of data loss: In some cases, such as with contiguous allocation, if a file becomes
corrupted or damaged, it may be difficult to recover the data.
Overall, the choice of free space management technique depends on the specific requirements of the operating system and the storage devices being used. While some techniques may offer advantages in terms of efficiency and speed, they may also have limitations and drawbacks that need to be considered.
Efficiency
The efficient use of disk space depends heavily on the disk allocation and directory algorithms
in use. For instance, UNIX inodes are preallocated on a volume. Even an "empty" disk has a
percentage of its space lost to inodes. However, by preallocating the inodes and spreading them
across the volume, we improve the file system's performance. This improved performance
results from the UNIX allocation and free-space algorithms, which try to keep a file's data
blocks near that file's inode block to reduce seek time. As another example, let's reconsider the clustering scheme, which aids in file-seek and file-transfer performance at the cost of internal fragmentation.
To reduce this fragmentation, BSD UNIX varies the cluster size as a file grows. Large clusters
are used where they can be filled, and small clusters are used for small files and the last cluster
of a file. The types of data normally kept in a file's
directory (or inode) entry also require consideration. Commonly, a "last write date" is recorded to supply information to the user and to determine whether the file needs to be backed up.
Some systems also keep a "last access date," so that a user can determine when the file was last
read.
The result of keeping this information is that, whenever the file is read, a field in the directory
structure must be written to. That means the block must be read into memory, a section changed,
and the block written back out to disk, because operations on disks occur only in block (or
cluster) chunks. So any time a file is opened for reading, its directory entry must be read and
written as well. This requirement can be inefficient for frequently accessed files, so we must
weigh its benefit against its performance cost when designing a file system. Generally, every
data item associated with a file needs to be considered for its effect on efficiency and
performance.
As an example, consider how efficiency is affected by the size of the pointers used to access
data. Most systems use either 16- or 32-bit pointers throughout the operating system. These
pointer sizes limit the length of a file to either 2^16 bytes (64 KB) or 2^32 bytes (4 GB). Some systems implement 64-bit pointers to increase this limit to 2^64 bytes, which is a very large number
indeed. However, 64-bit pointers take more space to store and in turn make the allocation and
free-space-management methods (linked lists, indexes, and so on) use more disk space. One of
the difficulties in choosing a pointer size, or indeed any fixed allocation size within an operating
system, is planning for the effects of changing technology. Consider that the IBM PC XT had a
10-MB hard drive and an MS-DOS file system that could support only 32 MB. (Each FAT entry
was 12 bits, pointing to an 8-KB cluster.)
As disk capacities increased, larger disks had to be split into 32-MB partitions, because the file
system could not track blocks beyond 32 MB. As hard disks with capacities of over 100 MB
became common, the disk data structures and algorithms in MS-DOS had to be modified to allow larger
file systems. (Each FAT entry was expanded to 16 bits and later to 32 bits.) The initial
file-system decisions were made for efficiency reasons; however, with the advent of MS-DOS
version 4, millions of computer users were inconvenienced when they had to switch to the new,
larger file system. Sun's ZFS file system uses 128-bit pointers, which theoretically should never
need to be extended. (The minimum mass of a device capable of storing 2^128 bytes using
atomic-level storage would be about 272 trillion kilograms.) As another example, consider the
evolution of Sun's Solaris operating system.
Originally, many data structures were of fixed length, allocated at system startup. These
structures included the process table and the open-file table. When the process table became
full, no more processes could be created. When the file table became full, no more files could be
opened. The system would fail to provide services to users. Table sizes could be increased only
by recompiling the kernel and rebooting the system. Since the release of Solaris 2, almost all
kernel structures have been allocated dynamically, eliminating these artificial limits on system
performance. Of course, the algorithms that manipulate these tables are more complicated, and
the operating system is a little slower because it must dynamically allocate and deallocate table
entries; but that price is the usual one for more general functionality.
Performance
Even after the basic file-system algorithms have been selected, we can still improve
performance in several ways. As will be discussed in Chapter 13, most disk controllers include
local memory to form an on-board cache that is large enough to store entire tracks at a time.
Once a seek is performed, the track is read into the disk cache starting at the sector under the
disk head (reducing latency time). The disk controller then transfers any sector requests to the
operating system. Once blocks make it from the disk controller into main memory, the operating
system may cache the blocks there. Some systems maintain a separate section of main memory
for a buffer cache, where blocks are kept under the assumption that they will be used again
shortly. Other systems cache file data using a page cache.
The page cache uses virtual memory techniques to cache file data as pages rather than as
file-system-oriented blocks. Caching file data using virtual addresses is far more efficient than
caching through physical disk blocks, as accesses interface with virtual memory rather than the
file system. Several systems—including Solaris, Linux, and Windows NT, 2000, and XP—use
page caching to cache both process pages and file data. This is known as unified virtual
memory. Some versions of UNIX and Linux provide a unified buffer cache.
To illustrate the benefits of the unified buffer cache, consider the two alternatives for opening
and accessing a file. One approach is to use memory mapping (Section 9.7); the second is to use
the standard system calls read() and write().
Without a unified buffer cache, we have a situation similar to Figure 11.11. Here, the read() and
write() system calls go through the buffer cache. The memory-mapping call, however, requires
using two caches—the page cache and the buffer cache. A memory mapping proceeds by
reading in disk blocks from the file system and storing them in the buffer cache. Because the
virtual memory system does not interface with the buffer cache, the contents of the file in the
buffer cache must be copied into the page cache.
This situation is known as double caching and requires caching file-system data twice. Not only
does it waste memory but it also wastes significant CPU and I/O cycles due to the extra data
movement within system memory. In addition, inconsistencies between the two caches can
result in corrupt files. In contrast, when a unified buffer cache is provided, both memory
mapping and the read() and write() system calls use the same page cache. This has the benefit
of avoiding double caching, and it allows the virtual memory system to manage file-system
data. The unified buffer cache is shown in Figure 11.12. Regardless of whether we are caching
disk blocks or pages (or both), LRU (Section 9.4.4) seems a reasonable general-purpose
algorithm for block or page replacement. However, the evolution of the Solaris page-caching
algorithms reveals the difficulty in choosing an algorithm. Solaris allows processes and the page
cache to share unused memory.
Versions earlier than Solaris 2.5.1 made no distinction between allocating pages to a process and
allocating them to the page cache. As a result, a system performing many I/O operations used
most of the available memory for caching pages. Because of the high rates of I/O, the page
scanner (Section 9.10.2) reclaimed pages from processes— rather than from the page
cache—when free memory ran low. Solaris 2.6 and Solaris 7 optionally implemented priority
paging, in which the page scanner gives priority to process pages over the page cache. Solaris 8
applied a fixed limit to process pages and the file-system page cache, preventing either from
forcing the other out of memory. Solaris 9 and 10 again changed the algorithms to maximize
memory use and minimize thrashing. This real-world example shows the complexities of
performance optimizing and caching.
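To make the two access paths described above concrete, the sketch below contrasts them at the application level using Python's built-in mmap module. The file name is hypothetical, and the example assumes a POSIX-like system where the mapping is served by the page cache.

```python
import mmap

PATH = "data.bin"  # hypothetical file, assumed to be at least 1 KB

# 1. Standard read() path: bytes are copied from the kernel's cache
#    into a user-space buffer.
with open(PATH, "rb") as f:
    via_read = f.read(1024)

# 2. Memory-mapped path: the file's pages are mapped into the process's
#    address space, so accesses go through the page cache directly.
with open(PATH, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
    via_mmap = mm[:1024]

assert via_read == via_mmap  # both paths see the same file contents
```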
There are other issues that can affect the performance of I/O, such as whether writes to the file
system occur synchronously or asynchronously. Synchronous writes occur in the order in which
the disk subsystem receives them, and the writes are not buffered. Thus, the calling routine must
wait for the data to reach the disk drive before it can proceed. Asynchronous writes are done the
majority of the time. In an asynchronous write, the data are stored in the cache, and control
returns to the caller. Metadata writes, among others, can be synchronous.
Operating systems frequently include a flag in the open system call to allow a process to
request that writes be performed synchronously. For example, databases use this feature for
atomic transactions, to assure that data reach stable storage in the required order. Some systems
optimize their page cache by using different replacement algorithms, depending on the access
type of the file.
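Returning to the synchronous-write flag mentioned above: on POSIX systems this request is typically expressed as O_SYNC at open time. The sketch below is a minimal Python illustration (hypothetical file name; it assumes a Unix-like platform where os.O_SYNC is defined).

```python
import os

# Open for writing with O_SYNC: each write() returns only after the data
# has reached the storage device, rather than just the buffer cache.
fd = os.open("journal.log", os.O_WRONLY | os.O_CREAT | os.O_SYNC, 0o644)
try:
    os.write(fd, b"commit transaction 42\n")   # blocks until on stable storage
finally:
    os.close(fd)
```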
A file being read or written sequentially should not have its pages replaced in LRU order,
because the most recently used page will be used last, or perhaps never
again. Instead, sequential access can be optimized by techniques known as free-behind and
read-ahead. Free-behind removes a page from the buffer as soon as the next page is requested.
The previous pages are not likely to be used again and waste buffer space. With read-ahead, a
requested page and several subsequent pages are read and cached. These pages are likely to be
requested after the current page is processed.
Retrieving these data from the disk in one transfer and caching them saves a considerable
amount of time. One might think a track cache on the controller eliminates the need for
read-ahead on a multiprogrammed system. However, because of the high latency and overhead
involved in making many small transfers from the track cache to main memory, performing a
read-ahead remains beneficial. The page cache, the file system, and the disk drivers have some
interesting interactions. When data are written to a disk file, the pages are buffered in the cache,
and the disk driver sorts its output queue according to disk address. These two actions allow the
disk driver to minimize disk-head seeks and to write data at times optimized for disk rotation.
Unless synchronous writes are required, a process writing to disk simply writes into the cache,
and the system asynchronously writes the data to disk when convenient. The user process sees
very fast writes. When data are read from a disk file, the block I/O system does some
read-ahead; however, writes are much more nearly asynchronous than are reads. Thus, output to
the disk through the file system is often faster than is input for large transfers, counter to
intuition.
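The free-behind/read-ahead idea described above can be sketched in user-level Python; fetch_block and window are illustrative names, and a real implementation lives inside the kernel's block I/O layer.

```python
def sequential_read(fetch_block, n_blocks, window=4):
    """Toy model of read-ahead with free-behind for sequential access.

    fetch_block(i) stands in for reading block i from disk; `window` is
    how many blocks are prefetched ahead of the current position.
    """
    cache = {}
    for i in range(n_blocks):
        cache.pop(i - 1, None)                 # free-behind: drop the block just consumed
        for j in range(i, min(i + window, n_blocks)):
            if j not in cache:                 # read-ahead: prefetch upcoming blocks
                cache[j] = fetch_block(j)
        yield cache[i]

# Example: "read" ten blocks of a hypothetical file sequentially.
blocks = list(sequential_read(lambda i: f"block-{i}", 10))
```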
Security
An operating system provides mechanisms to prevent tampering with its logical and physical
resources; two of these mechanisms are protection and security. Although these terms are
frequently used interchangeably, protection and security are different. Protection entails
preventing unauthorized users from interfering with the user's applications and data. In contrast,
security entails protecting the user's programs and data against disruption by outside parties,
such as unauthorized users of other systems.
A system's security revolves around its external environment and needs a suitable protective
system. Security systems safeguard computer resources against unauthorized access,
manipulation, and inconsistency. In this context, resources might be stored information in
the system, CPU, memory, drives, etc.
The system's security emphasizes the system's authentication process to secure the physical
resources and the integrity of the information contained in the system. Security is a method that
protects the user's programs and data against interference produced by an entity or person outside
the system. For instance, several employees of an organization may access its data, but that data
cannot be accessed by someone who does not belong to the organization or who works for another
organization. Implementing security measures that prevent such unauthorized external access to
its data is one of an organization's primary responsibilities.
Security is a technique used in operating systems to address threats from outside the system to
maintain the system's proper functioning.
The security technique specifies whether or not a specific user is allowed to access the system.
Security techniques include adding and deleting users, determining whether or not a particular
user is authorized, employing anti-malware software, and so on.
Security offers a technique for protecting system and user resources from unauthorized access.
Protection
File protection in an operating system refers to the various mechanisms and techniques used to
secure files from unauthorized access, alteration, or deletion. It involves controlling access to
files, ensuring their security and confidentiality, and preventing data breaches and other security
incidents.
Operating systems provide several file protection features, including file permissions,
encryption, access control lists, auditing, and physical file security. These measures allow
administrators to manage access to files, determine who can access them, what actions can be
performed on them, and how they are stored and backed up. Proper file protection requires
ongoing updates and patches to fix vulnerabilities and prevent security breaches. It is crucial for
data security in the digital age where cyber threats are prevalent. By implementing file
protection measures, organizations can safeguard their files, maintain data confidentiality, and
minimize the risk of data breaches and other security incidents.
Types of File protection
File protection is an essential component of modern operating systems, ensuring that files are
secured from unauthorized access, alteration, or deletion. In this context, there are several types
of file protection mechanisms used in operating systems to provide robust data security.
* File Permissions − File permissions are a basic form of file protection that controls
access to files by setting permissions for users and groups. File permissions allow the
system administrator to assign specific access rights to users and groups, which can
include read, write, and execute privileges. These access rights can be assigned at the file
or directory level, allowing users and groups to access specific files or directories as
needed. File permissions can be modified by the system administrator at any time to
adjust access privileges, which helps to prevent unauthorized access.
* Encryption − Encryption is the process of converting plain text into ciphertext to protect
files from unauthorized access. Encrypted files can only be accessed by authorized users
who have the correct encryption key to decrypt them. Encryption is widely used to secure
sensitive data such as financial information, personal data, and other confidential
information. In an operating system, encryption can be applied to individual files or entire
directories, providing an extra layer of protection against unauthorized access.
* Access Control Lists (ACLs) − Access control lists (ACLs) are lists of permissions
attached to files and directories that define which users or groups have access to them and
what actions they can perform on them. ACLs can be more granular than file
permissions, allowing the system administrator to specify exactly which users or groups
can access specific files or directories. ACLs can also be used to grant or deny specific
permissions, such as read, write, or execute privileges, to individual users or groups.
* Auditing and Logging − Auditing and logging are mechanisms used to track and
monitor file access, changes, and deletions. It involves creating a record of all file access
and changes, including who accessed the file, what actions were performed, and when
they were performed. Auditing and logging can help to detect and prevent unauthorized
access and can also provide an audit trail for compliance purposes.
* Physical File Security − Physical file security involves protecting files from physical
damage or theft. It includes measures such as file storage and access control, backup and
recovery, and physical security best practices. Physical file security is essential for
ensuring the integrity and availability of critical data, as well as compliance with
regulatory requirements.
Overall, these types of file protection mechanisms are essential for ensuring data security and
minimizing the risk of data breaches and other security incidents in an operating system. The
choice of file protection mechanisms will depend on the specific requirements of the
organization, as well as the sensitivity and volume of the data being protected. However, a
combination of these file protection mechanisms can provide comprehensive protection against
various types of threats and vulnerabilities.
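As a small illustration of the file-permission idea described above, the following sketch sets and inspects POSIX permission bits with Python's standard os and stat modules. The file name is hypothetical; ACLs and encryption would require platform-specific tools not shown here.

```python
import os
import stat

path = "report.txt"  # hypothetical file

# Grant read/write to the owner and read-only to the group; others get nothing.
os.chmod(path, stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP)

# Inspect the resulting permission bits.
mode = os.stat(path).st_mode
print(stat.filemode(mode))   # expected output: -rw-r-----
```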
Advantages of File protection
File protection is an important aspect of modern operating systems that ensures data security and
integrity by preventing unauthorized access, alteration, or deletion of files. There are several
advantages of file protection mechanisms in an operating system, including −
* Data Security − File protection mechanisms such as encryption, access control lists, and
file permissions provide robust data security by preventing unauthorized access to files.
These mechanisms ensure that only authorized users can access files, which helps to
prevent data breaches and other security incidents. Data security is critical for
organizations that handle sensitive data such as personal data, financial information, and
intellectual property.
* Compliance − File protection mechanisms are essential for compliance with regulatory
requirements such as GDPR, HIPAA, and PCI-DSS. These regulations require
organizations to implement appropriate security measures to protect sensitive data from
unauthorized access, alteration, or deletion. Failure to comply with these regulations can
result in significant financial penalties and reputational damage.
* Business Continuity − File protection mechanisms are essential for ensuring business
continuity by preventing data loss due to accidental or malicious deletion, corruption, or
other types of damage. File protection mechanisms such as backup and recovery, auditing,
and logging can help to recover data quickly in the event of a data loss incident, ensuring
that business operations can resume as quickly as possible.
* Increased Productivity − File protection mechanisms can help to increase productivity
by ensuring that files are available to authorized users when they need them. By
preventing unauthorized access, alteration, or deletion of files, file protection mechanisms
help to minimize the risk of downtime and data loss incidents that can impact
productivity.
* Enhanced Collaboration − File protection mechanisms can help to enhance
collaboration by allowing authorized users to access and share files securely. Access
control lists, file permissions, and encryption can help to ensure that files are only
accessed by authorized users, which helps to prevent conflicts and misunderstandings that
can arise when multiple users access the same file.
* Reputation − File protection mechanisms can enhance an organization's reputation by
demonstrating a commitment to data security and compliance. By implementing robust
file protection mechanisms, organizations can build trust with their customers, partners,
and stakeholders, which can have a positive impact on their reputation and bottom line.
Disadvantages of File protection
There are also some potential disadvantages of file protection in an operating system, including
−
* Overhead − Some file protection mechanisms such as encryption, access control lists,
and auditing can add overhead to system performance. This can impact system resources
and slow down file access and processing times.
* Complexity − File protection mechanisms can be complex and require specialized
knowledge to implement and manage. This can lead to errors and misconfigurations that
compromise data security.
* Compatibility Issues − Some file protection mechanisms may not be compatible with all
types of files or applications, leading to compatibility issues and limitations in file usage.
* Cost − Implementing robust file protection mechanisms can be expensive, especially for
small organizations with limited budgets. This can make it difficult to achieve full data
protection.
* User Frustration − Stringent file protection mechanisms such as complex passwords,
frequent authentication requirements, and restricted access can frustrate users and impact
productivity.
Security threats
The various security threats in file system of OS are Trojan Horse, Trap Door, Logic Bomb,
Stack and Buffer Overflow and Viruses.
Computer Security is the process of preventing and detecting unauthorized use of your computer.
It involves the process of safeguarding against intruders from using your computer resources for
malicious intents or for their own gains.
A file virus attaches itself to an executable file, causing it to run the virus code first and then
jump to the start of the original program.
These viruses are termed parasitic, because they do not leave any new files on the system, and
the original program is still fully functional.
Trojan Horse:
A Trojan Horse is a program that secretly performs some maliciousness in addition to its visible
actions.
Malware in a Trojan horse does not replicate itself, nor can it propagate without the end user's
assistance since the user is often unaware that he has installed a Trojan horse.
Unexpected changes to computer settings and unusual activity even when the computer should
be idle are strong indications that a Trojan or other malware is residing on a computer.
To avoid being infected by Trojan malware, users should keep their antivirus software up to date,
never download files or programs from untrusted sources, and always scan new files with
antivirus software before opening them.
Trap Door :
A Trap Door is when a designer or a programmer (or hacker) deliberately inserts a security hole
that they can use later to access the system.
Because of the possibility of trap doors, once a system has been in an untrustworthy state, that
system can never be trusted again.
Even the backup tapes may contain a copy of some cleverly hidden back door.
Logic Bomb:
A Logic Bomb is code that is not designed to cause havoc all the time, but only when a certain
set of circumstances occurs, such as when a particular date or time is reached or some other
noticeable event.
Logic Bomb is also called slag code, it is a programming code added to the software of an
application or operating system that lies dormant until a predetermined period of time or event
occurs, triggering the code into action.
Logic bombs typically are malicious in intent, acting in the same ways as a virus or Trojan horse
once activated.
Viruses that are set to be released at a certain time are considered logic bombs.
They can perform such actions as reformatting a hard drive and/or deleting, altering or
corrupting data.
Stack and Buffer Overflow:
This is a classic method of attack, which exploits bugs in system code that allow buffers to
overflow.
Stack-based buffer overflows are one of the most common vulnerabilities. They affect any function
that copies input to memory without doing bounds checking.
A buffer overflow occurs when a function copies data into a buffer without doing bounds
checking. So if the source data size is larger than the destination buffer size, the data will
overflow the buffer toward higher memory addresses and probably overwrite previous data on
the stack.
Viruses:
Viruses are more likely to infect PCs than UNIX or other multi-user systems, because programs
in the latter systems have limited authority to modify other programs or to access critical system
structures such as the boot block.
Viruses are delivered to systems in a virus dropper, usually some form of a Trojan Horse, and
usually via e-mail or unsafe downloads.
Viruses
A virus is a fragment of code embedded in a legitimate program. Viruses are self-replicating and
are designed to infect other programs. They can wreak havoc in a system by modifying or
destroying files causing system crashes and program malfunctions. On reaching the target
machine a virus dropper(usually a trojan horse) inserts the virus into the system.
File Virus:
This type of virus infects the system by appending itself to the end of a file. It changes the start
of a program so that the control jumps to its code. After the execution of its code, the control
returns to the main program, so its execution is not even noticed. It is also called a parasitic
virus because it leaves no new file on the system and leaves the host program functional.
Boot Sector Virus:
It infects the boot sector of the system, executing every time the system is booted and before the
operating system is loaded. It also infects other bootable media such as floppy disks. These
viruses are also known as memory viruses, as they do not infect the file system.
Macro Virus:
Unlike most viruses which are written in a low-level language(like C or assembly language),
these are written in a high-level language like Visual Basic. These viruses are triggered when
a program capable of executing a macro is run. For example, the macro viruses can be contained
in spreadsheet files.
Source Code Virus:
It looks for source code and modifies it to include the virus and to help spread it.
Polymorphic Virus:
A virus signature is a pattern that can identify a virus (a series of bytes that make up the virus
code). So, in order to avoid detection by antivirus software, a polymorphic virus changes each
time it is installed.
The functionality of the virus remains the same but its signature is changed.
Encrypted Virus:
In order to avoid detection by antivirus, this type of virus exists in encrypted form. It carries a
decryption algorithm along with it. So the virus first decrypts and then executes.
Stealth Virus:
It is a very tricky virus as it changes the code that can be used to detect it. Hence, the detection of
viruses becomes very difficult. For example, it can change the read system call such that
whenever the user asks to read a code modified by a virus, the original form of code is shown
rather than infected code.
Tunneling Virus:
This virus attempts to bypass detection by antivirus scanner by installing itself in the interrupt
handler chain. Interception programs, which remain in the background of an operating system
and catch viruses, become disabled during the course of a tunneling virus. Similar viruses install
themselves in device drivers.
Multipartite Virus:
This type of virus is able to infect multiple parts of a system including the boot sector, memory,
and files. This makes it difficult to detect and contain.
Armored Virus:
An armored virus is coded to make it difficult for antivirus to unravel and understand. It uses a
variety of techniques to do so like fooling antivirus to believe that it lies somewhere else than its
real location or using compression to complicate its code.
Browser Hijacker:
As the name suggests this virus is coded to target the user’s browser and can alter the browser
settings. It is also called the browser redirect virus because it redirects your browser to other
malicious sites that can harm your computer system.
Resident Virus:
A resident virus installs itself in RAM and interferes with system operations. It behaves in such a
covert way that it can even attach itself to anti-virus software files.
Direct Action Virus:
The main purpose of this virus is to replicate and take action when it is executed. When a
particular condition is met, the virus goes into action and infects files in the directories specified
in the AUTOEXEC.BAT file path.
Overwrite virus:
This type of virus deletes the information contained in the files it infects, rendering them
partially or totally useless once they have been infected.
Directory Virus:
This virus is also called File System Virus or Cluster Virus. It infects the directory of the
computer by modifying the path that indicates the location of a file.
Companion Virus:
This kind of virus usually uses a similar file name but creates a copy with a different extension.
For example, if there’s a file “Hello.exe”, the virus will create another file named “Hello.com”
and will hide in the new file.
FAT Virus:
The File Allocation Table is the part of the disk used to store all information about the location of
files, available space, unusable space, etc.
This virus affects the FAT section and may damage crucial information.
Cryptography
Trust - How can the system be sure that the messages received are really from the source that
they say they are, and can that source be trusted?
Confidentiality - How can one ensure that the messages one is sending are received only by the
intended recipient?
Cryptography can help with both of these problems, through a system of secrets and keys. In the
former case, the key is held by the sender, so that the recipient knows that only the authentic
author could have sent the message; In the latter, the key is held by the recipient, so that only the
intended recipient can receive the message accurately.
Keys are designed so that they cannot be divined from any public information, and must be
guarded carefully. (Asymmetric encryption involves both a public and a private key.)
Encryption
The basic idea of encryption is to encode a message so that only the desired recipient can decode
and read it. Encryption has been around since before the days of Caesar, and is an entire field of
study in itself. Only some of the more significant computer encryption schemes will be covered
here.
The basic process of encryption is shown in Figure 15.7, and will form the basis of most of our
discussion on encryption. The steps in the procedure and some of the key terminology are as
follows:
The sender first produces a plaintext message, m, that is to be transmitted.
The message is then entered into an encryption algorithm, E, along with the encryption key, Ke.
The encryption algorithm generates the ciphertext c = E(Ke)(m). For any key k, E(k) is an
algorithm for generating ciphertext from a message, and both E and E(k) should be efficiently
computable functions.
The ciphertext can then be sent over an unsecure network, where it may be received by
attackers.
The recipient enters the ciphertext into a decryption algorithm, D, along with the decryption key,
Kd.
The decryption algorithm regenerates the plaintext message m = D(Kd)(c). For any key k,
D(k) is an algorithm for generating a clear text message from a cipher text, and both D and D(k)
should be efficiently computable functions.
The algorithms described here must have this important property: Given a ciphertext c, a
computer can only compute a message m such that c = E(k)(m) if it possesses D(k).
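The E(Ke)/D(Kd) round trip above can be demonstrated with any symmetric scheme. The sketch below uses Fernet from the third-party cryptography package (an assumption; any equivalent library would do), where a single shared key plays the role of both Ke and Kd.

```python
# Assumes: pip install cryptography
from cryptography.fernet import Fernet

key = Fernet.generate_key()          # shared secret: acts as both Ke and Kd
cipher = Fernet(key)

c = cipher.encrypt(b"meet at noon")  # c = E(Ke)(m)
m = cipher.decrypt(c)                # m = D(Kd)(c)
assert m == b"meet at noon"          # without the key, c reveals nothing useful
```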
Symmetric Encryption
With symmetric encryption, the same key is used for both encryption and decryption, and must
be safely guarded. There are a number of well-known symmetric encryption algorithms that have
been used for computer security:
The Data-Encryption Standard, DES, developed by the National Institute of Standards and
Technology (NIST), has been a standard civilian encryption standard for over 20 years. Messages
are broken down into 64-bit chunks, each of which is encrypted using a 56-bit key through a series
of substitutions and transformations. Some of the transformations are hidden (black boxes), and
are classified by the U.S. government.
DES is known as a block cipher, because it works on blocks of data at a time. Unfortunately, this
is a vulnerability if the same key is used for an extended amount of data. Therefore an
enhancement is to not only encrypt each block, but also to XOR it with the previous block, in a
technique known as cipher-block chaining.
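Cipher-block chaining itself is easy to sketch: each plaintext block is XORed with the previous ciphertext block before encryption, so repeated plaintext blocks no longer produce repeated ciphertext. The toy Python sketch below takes any block-encryption function as a stand-in for DES or AES; it is illustrative, not a secure implementation.

```python
def xor_block(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def cbc_encrypt(blocks, encrypt_block, iv):
    """Chain equal-sized blocks: C[i] = E(P[i] XOR C[i-1]), with C[-1] = iv."""
    prev, ciphertext = iv, []
    for block in blocks:
        c = encrypt_block(xor_block(block, prev))
        ciphertext.append(c)
        prev = c
    return ciphertext
```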
As modern computers become faster and faster, the security of DES has decreased, to where it is
now considered insecure because its keys can be exhaustively searched within a reasonable
amount of computer time. An enhancement called triple DES encrypts the data three times using
three separate keys ( actually two encryptions and one decryption ) for an effective key length of
168 bits. Triple DES is in widespread use today.
The Advanced Encryption Standard, AES, developed by NIST in 2001 to replace DES, uses key
lengths of 128, 192, or 256 bits, and encrypts in blocks of 128 bits using 10 to 14 rounds of
transformations on a matrix formed from the block.
The Twofish algorithm uses variable key lengths up to 256 bits and works on 128-bit blocks.
RC5 can vary in key length, block size, and the number of transformations, and runs on a wide
variety of CPUs using only basic computations.
RC4 is a stream cipher, meaning it acts on a stream of data rather than blocks. The key is used to
seed a pseudo-random number generator, which generates a keystream. RC4 is used in
WEP, but has been found to be breakable in a reasonable amount of computer time.
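The keystream idea behind RC4 can be sketched with a toy (insecure) construction: the key seeds a pseudo-random generator whose output is XORed with the data, and applying the same operation again recovers the plaintext. This is not RC4 itself, only an illustration of the principle.

```python
import random

def toy_stream_cipher(key: int, data: bytes) -> bytes:
    """XOR the data with a keystream derived from the key (toy example only)."""
    prng = random.Random(key)                       # key seeds the generator
    keystream = (prng.randrange(256) for _ in data)
    return bytes(b ^ k for b, k in zip(data, keystream))

c = toy_stream_cipher(1234, b"secret")
assert toy_stream_cipher(1234, c) == b"secret"      # same key decrypts
```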
Authentication
Description:-
Authentication involves verifying the identity of the entity that transmitted a message.
For example, if D(Kd)(c) produces a valid message, then we know the sender was in possession
of E(Ke).
This form of authentication can also be used to verify that a message has not been modified.
Authentication revolves around two functions, used for signatures (or signing) and
verification:
o A signing function, S(Ks) that produces an authenticator, A, from any given message m.
o A Verification function, V(Kv,m,A) that produces a value of "true" if A was created from m,
and "false" otherwise.
o More importantly, it must not be possible to generate a valid authenticator, A, without having
possession of S(Ks).
o Furthermore, it must not be possible to divine S(Ks) from the combination of ( m and A ),
since both are sent visibly across networks.
Understanding authenticators begins with an understanding of hash functions:
o Hash functions, H(m), generate a small fixed-size block of data, known as a message digest or
hash value, from any given input data.
o For authentication purposes, the hash function must be collision resistant on m. That is, it
should not be reasonably possible to find an alternate message m' such that H(m') = H(m).
o Popular hash functions are MD5, which generates a 128-bit message digest, and SHA-1,
which generates a 160-bit digest.
Message digests are useful for detecting (accidentally) changed messages, but are not useful as
authenticators, because if the hash function is known, then someone could easily change the
message and then generate a new hash value for the modified message. Therefore, authenticators
take things one step further by encrypting the message digest.
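One standard way to build such an authenticator is an HMAC, a keyed hash of the message. The sketch below uses Python's hmac and hashlib modules with SHA-256; here the signing and verification keys are the same shared secret, unlike a digital signature.

```python
import hashlib
import hmac

key = b"shared-secret"                       # plays the role of Ks = Kv
msg = b"transfer 100 to account 42"

authenticator = hmac.new(key, msg, hashlib.sha256).hexdigest()   # A = S(Ks)(m)

def verify(key: bytes, msg: bytes, tag: str) -> bool:
    expected = hmac.new(key, msg, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, tag)                    # V(Kv, m, A)

assert verify(key, msg, authenticator)
```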
There are three good reasons for having separate algorithms for encryption of messages and
authentication of messages:
2. Authenticators are almost always smaller than the messages, improving space efficiency.
3. Sometimes we want authentication only, and not confidentiality, such as when a vendor
issues a new software patch.
Another use of authentication is non-repudiation, in which a person filling out an electronic form
cannot deny that they were the ones who did so.
Key Distribution
Description:-
Ø Key distribution with symmetric cryptography is a major problem, because all keys must be
kept secret, and they obviously can't be transmitted over unsecure channels. One option is to
send them out-of-band, say via paper or a confidential conversation.
Ø Another problem with symmetric keys, is that a separate key must be maintained and used for
each correspondent with whom one wishes to exchange confidential information.
Ø Asymmetric encryption solves some of these problems, because the public key can be freely
transmitted through any channel, and the private key doesn't need to be transmitted anywhere.
Recipients only need to maintain one private key for all incoming messages, though senders
must maintain a separate public key for each recipient to which they might wish to send a
message. Fortunately the public keys are not confidential, so this key-ring can be easily stored
and managed.
Ø Unfortunately there are still some security concerns regarding the public keys used in
asymmetric encryption. Consider, for example, a man-in-the-middle attack involving phony
public keys: an attacker who intercepts the key exchange can substitute his own public key for the
legitimate one, then decrypt, read, and re-encrypt each message as it passes through.
One solution to the above problem involves digital certificates, which are public keys that have
been digitally signed by a trusted third party. But wait a minute - How do we trust that third
party, and how do we know they are really who they say they are? Certain certificate authorities
have their public keys included within web browsers and other certificate consumers before they
are distributed. These certificate authorities can then vouch for other trusted entities and so on in
a web of trust.
Implementation of Cryptography
Network communications are implemented in multiple layers - Physical, Data Link, Network,
Transport, and Application being the most common breakdown.
Encryption and security can be implemented at any layer in the stack, with pros and cons to each
choice:
o Because packets at lower levels contain the contents of higher layers, encryption at lower
layers automatically encrypts higher layer information at the same time.
o However, security and authorization may be important to higher levels independent of the
underlying transport mechanism or route taken.
At the network layer the most common standard is IPSec, a secure form of the IP layer, which is
used to set up Virtual Private Networks, VPNs.
Introduction:- Protection deals with making sure that only certain users are allowed to perform
certain tasks, i.e., that a user's privileges depend on his or her identity.
Passwords
Ø Passwords are the most common form of user authentication. If the user is in possession of
the correct password, then they are considered to have identified themselves.
Ø In theory, separate passwords could be implemented for separate activities, such as reading
this file, writing that file, etc. In practice most systems use one password to confirm user identity,
and then authorization is based upon that identification. This is a result of the classic trade-off
between security and convenience.
Password Vulnerabilities
o Intelligent guessing requires knowing something about the intended target in specific, or
about people and commonly used passwords in general.
o Brute-force guessing involves trying every word in the dictionary, or every valid combination
of characters. For this reason good passwords should not be in any dictionary ( in any language ),
should be reasonably lengthy, and should use the full range of allowable characters by including
upper and lower case characters, numbers, and special symbols.
Ø "Shoulder surfing" involves looking over people's shoulders while they are typing in their
password.
o Even if the lurker does not get the entire password, they may get enough clues to narrow it
down, especially if they watch on repeated occasions.
o Common courtesy dictates that you look away from the keyboard while someone is typing
their password.
o Passwords echoed as stars or dots still give clues, because an observer can determine how
many characters are in the password.
"Packet sniffing" involves putting a monitor on a network connection and reading data
contained in those packets.
o However, you should still never e-mail a password, particularly not with the word "password"
in the same message or worse yet the subject header.
o Beware of any system that transmits passwords in clear text. ( "Thank you for signing up for
XYZ. Your new account and password information are shown below". ) You probably want to
have a spare throw-away password to give these entities, instead of using the same high-security
password that you use for banking or other confidential uses.
Ø Long, hard-to-remember passwords are often written down, particularly if they are used
seldom or must be changed frequently. Hence there is a security trade-off between passwords that
are easily divined and those that get written down. :-(
Ø Passwords can be given away to friends or co-workers, destroying the integrity of the entire
user-identification system.
Ø Most systems have configurable parameters controlling password generation and what
constitutes acceptable passwords.
o They may need to be changed with a given frequency. (In extreme cases for every session.)
Encrypted Passwords
Modern systems do not store passwords in clear-text form, and hence there is no mechanism to
look up an existing password.
Rather, they are encrypted and stored in that form. When a user enters their password, it too is
encrypted, and if the encrypted versions match, then user authentication passes.
The encryption scheme was once considered safe enough that the encrypted versions were stored
in the publicly readable file "/etc/passwd".
o Modern computers can try every possible password combination in a reasonably short time,
so now the encrypted passwords are stored in files that are only readable by the super user. Any
password-related programs run as setuid root to get access to these files. ( /etc/shadow )
o A random seed is included as part of the password generation process, and stored as part of
the encrypted password. This ensures that if two accounts have the same plain-text password,
they will not have the same encrypted password. However, cutting and pasting encrypted
passwords from one account to another will give them the same plain-text passwords.
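A minimal sketch of this scheme, using Python's standard hashlib (PBKDF2-HMAC-SHA256) with a random per-account salt as the "seed", is shown below; real systems store the salt, the iteration count, and the derived hash together in a protected file such as /etc/shadow.

```python
import hashlib
import hmac
import os

def hash_password(password: str, salt: bytes = None):
    """Return (salt, derived_hash); the salt ensures equal passwords differ on disk."""
    salt = salt if salt is not None else os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 200_000)
    return salt, digest

def check_password(password: str, salt: bytes, stored: bytes) -> bool:
    return hmac.compare_digest(hash_password(password, salt)[1], stored)

salt, stored = hash_password("correct horse battery staple")
assert check_password("correct horse battery staple", salt, stored)
assert not check_password("guess", salt, stored)
```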