UNIT 5 Storage Systems

Evolution of storage technology

Storage devices can be broadly classified into two categories:

• Block Storage Devices
• File Storage Devices

Block Storage Devices


Block storage devices offer raw storage to clients. This raw storage is
partitioned to create volumes.

File Storage Devices


File storage devices offer storage to clients in the form of files and maintain their own
file system. This storage takes the form of Network Attached Storage (NAS).

Cloud Storage Classes

Cloud storage can be broadly classified into two categories:

• Unmanaged Cloud Storage
• Managed Cloud Storage

Unmanaged Cloud Storage

Unmanaged cloud storage means the storage is preconfigured for the customer. The
customer can neither format the storage, nor install their own file system, nor change
drive properties.

Managed Cloud Storage

Managed cloud storage offers online storage space on-demand. The managed cloud
storage system appears to the user to be a raw disk that the user can partition and
format.

Creating Cloud Storage System

The cloud storage system stores multiple copies of data on multiple servers, at multiple
locations. If one system fails, only the pointer to the location where the object is
stored needs to change.
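The pointer change described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how any particular provider implements it: the class `ReplicatedStore`, its methods, and the site names are all hypothetical, and each server is simply an in-memory dictionary.

```python
class ReplicatedStore:
    """Toy object store: every object is copied to several servers,
    and a pointer records which copy currently serves reads."""

    def __init__(self, servers):
        # Each server is modelled as a dict from object key to bytes.
        self.servers = {name: {} for name in servers}
        self.alive = {name: True for name in servers}
        self.pointer = {}  # object key -> name of the server serving reads

    def put(self, key, data):
        # Store a copy on every healthy server, then point at the first one.
        healthy = [name for name in self.servers if self.alive[name]]
        for name in healthy:
            self.servers[name][key] = data
        self.pointer[key] = healthy[0]

    def fail(self, server):
        # Mark the server as failed and repoint the objects it was serving.
        self.alive[server] = False
        for key, current in list(self.pointer.items()):
            if current == server:
                self.pointer[key] = self._healthy_replica(key)

    def _healthy_replica(self, key):
        for name, contents in self.servers.items():
            if self.alive[name] and key in contents:
                return name
        raise RuntimeError(f"no healthy replica left for {key!r}")

    def get(self, key):
        return self.servers[self.pointer[key]][key]


demo_store = ReplicatedStore(["site-a", "site-b", "site-c"])
demo_store.put("report.txt", b"quarterly numbers")
demo_store.fail("site-a")  # only the pointer changes; the data is not copied again
print(demo_store.pointer["report.txt"], demo_store.get("report.txt"))
```

After `site-a` fails, the object is still readable because the pointer now names another location that already holds a copy.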

To aggregate the storage assets into cloud storage systems, the cloud provider can use
storage virtualization software known as StorageGRID. It creates a virtualization layer
that pools storage from different storage devices into a single management system. It
can also manage data from CIFS and NFS file systems over the Internet.

[Diagram: StorageGRID virtualizing storage into storage clouds]

Virtual Storage Containers

Virtual storage containers offer high-performance cloud storage systems. Logical Unit
Numbers (LUNs) of devices, files, and other objects are created in virtual storage
containers.

[Diagram: a virtual storage container defining a cloud storage domain]

Challenges

Storing data in the cloud is not a simple task. Despite its flexibility and
convenience, cloud storage presents several challenges for customers. Customers must
be able to:

• Provision additional storage on demand.
• Know and restrict the physical location of the stored data.
• Verify how data was erased.
• Have access to a documented process for disposing of data storage hardware.
• Have administrator access control over data.

storage models

The storage models are the block and file storage devices, the unmanaged and managed
cloud storage classes, and the virtual storage containers described under "Evolution of
storage technology" above.

file systems and database, distributed file systems

A distributed operating system is a type of operating system designed to manage the
resources of a network of computers and devices, rather than a single computer. In such
a system, the file model plays a crucial role in managing files and providing access to
them across the network. The file model defines how files are created, stored, accessed,
and managed in a distributed environment. It involves concepts such as file systems,
distributed file systems, data consistency, fault tolerance, and security. In this topic, we
will explore the basic concepts of the file model in distributed operating systems, the
challenges associated with it, and the design of distributed file systems. We will also
examine examples of distributed file systems such as Google File System and Hadoop
Distributed File System.

Basic Concepts of File Model in Distributed Operating System


A. File
• Definition − A file is a named collection of related data or information that is
stored on a computer storage device such as a hard drive, flash drive, or network
storage device.
• Characteristics of a file include its size, type, location, and content. Files can be
read, written, deleted, or modified by applications or users.

B. File system
• Definition of a file system − A file system is a software component that
manages files and directories on a storage device. It provides a way for
applications and users to access and organize files. A file system also manages
space allocation, file naming, and file permissions.
• Types of file system − File systems include local file systems, which are
used on a single computer (such as NTFS, FAT32, and HFS+), and network file
systems, which allow files to be accessed over a network (such as NFS, CIFS, and
AFS).

Distributed file system


• Definition of a distributed file system − A distributed file system is a file
system that allows files to be stored and accessed from multiple computers over a
network. It provides a way to share data and resources among multiple users or
applications in a distributed environment. Examples of distributed file systems
include Google File System (GFS), Hadoop Distributed File System (HDFS), and
Microsoft Distributed File System (DFS).
• Advantages of a distributed file system − These include improved data
availability, scalability, and fault tolerance. Distributed file systems can also
provide faster data access and better resource utilization by distributing data
across multiple servers or storage devices.

Challenges in Distributed File Model


Data Consistency
• Data consistency refers to the ability of a system to ensure that data remains
accurate and consistent across multiple copies of the same data. In a distributed
file system, data consistency can be challenging due to the possibility of conflicts
arising from multiple users accessing and modifying the same data.
• Challenges in achieving data consistency in distributed file systems include issues
with data replication, synchronization, and access control. Techniques such as
locking, versioning, and caching can be used to manage data consistency in
distributed file systems.
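As an illustration of the versioning technique mentioned above, the following sketch rejects a write that was based on a stale copy of the file. It is a minimal single-process model under stated assumptions: `VersionedFile` and `VersionConflict` are hypothetical names, and a real distributed file system would perform the version check on the server that owns the file.

```python
class VersionConflict(Exception):
    """Raised when a write is based on an outdated version of the file."""


class VersionedFile:
    def __init__(self, content=""):
        self.content = content
        self.version = 0

    def read(self):
        # Readers receive the content together with the version they saw.
        return self.content, self.version

    def write(self, new_content, expected_version):
        # Accept the write only if nobody else has written in the meantime.
        if expected_version != self.version:
            raise VersionConflict(
                f"expected version {expected_version}, file is at {self.version}"
            )
        self.content = new_content
        self.version += 1
        return self.version


shared = VersionedFile("original text")
_, seen_by_alice = shared.read()
_, seen_by_bob = shared.read()

shared.write("alice's edit", seen_by_alice)    # succeeds, version becomes 1
try:
    shared.write("bob's edit", seen_by_bob)    # based on version 0, so rejected
    bob_conflicted = False
except VersionConflict:
    bob_conflicted = True
```

The second writer has to re-read the file and reapply the change, so a conflicting update is detected instead of silently overwriting the first one.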

Fault Tolerance
• Fault tolerance is the ability of a system to continue operating in the presence of
hardware or software failures. In a distributed file system, fault tolerance is crucial
to ensure that data remains available and accessible in the event of failures.
• Challenges in achieving fault tolerance in distributed file systems include issues
with data replication, network partitioning, and failure detection. Techniques such
as replication, redundancy, and fault detection can be used to manage fault
tolerance in distributed file systems.

Security
• Security in distributed file systems refers to the ability of a system to protect data
from unauthorized access or modification. This includes encrypting data and
authenticating and authorizing access based on user roles and permissions.
• Challenges in achieving security in distributed file systems include issues with
data privacy, integrity, and authentication. Techniques such as encryption, access
control, and firewalls can be used to manage security in distributed file systems.

Design of Distributed File Model


Architecture of a Distributed File System
• A distributed file system typically consists of several components including client
machines, server machines, and storage devices. These components work
together to provide file access and storage services to users in a distributed
environment.
• A distributed file system is often divided into multiple layers including the
application layer, file system layer, network layer, and storage layer.

Data Access Mechanisms


• Data access protocols define how clients access data in a distributed file system.
Examples of data access protocols include the Network File System (NFS),
Common Internet File System (CIFS), and Server Message Block (SMB).
• Data replication strategies define how data is stored and replicated across
multiple servers or storage devices in a distributed file system. Examples of data
replication strategies include active-passive replication, active-active replication,
and quorum-based replication.
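Quorum-based replication can be illustrated with a small model. The sketch below assumes N replicas, a write quorum W, and a read quorum R chosen so that W + R > N, which guarantees that every read quorum overlaps the most recent write quorum. The class `QuorumStore` is hypothetical, and for simplicity a single coordinator hands out version numbers, which a real system would have to agree on in a distributed way.

```python
class QuorumStore:
    """Toy quorum replication with n replicas, write quorum w, read quorum r."""

    def __init__(self, n, w, r):
        if w + r <= n:
            raise ValueError("need w + r > n so read and write quorums overlap")
        self.n, self.w, self.r = n, w, r
        self.replicas = [(0, None)] * n  # each replica holds (version, value)
        self.next_version = 1            # assumption: one coordinator numbers writes

    def write(self, value, reachable):
        """Write to the reachable replicas; fail if fewer than w respond."""
        reachable = list(reachable)
        if len(reachable) < self.w:
            return False
        version = self.next_version
        self.next_version += 1
        for index in reachable:
            self.replicas[index] = (version, value)
        return True

    def read(self, reachable):
        """Read from the reachable replicas and return the newest value seen."""
        reachable = list(reachable)
        if len(reachable) < self.r:
            raise RuntimeError("read quorum not reached")
        version, value = max(
            (self.replicas[index] for index in reachable), key=lambda pair: pair[0]
        )
        return value


quorum = QuorumStore(n=3, w=2, r=2)
quorum.write("A", reachable=[0, 1])
quorum.write("B", reachable=[1, 2])           # replica 0 still holds the stale "A"
latest = quorum.read(reachable=[0, 2])        # overlap with the last write gives "B"
rejected = quorum.write("C", reachable=[0])   # only one replica: below the write quorum
```

Even though replica 0 is stale, any two replicas include at least one that saw the latest write, so the read returns the newest value.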

Synchronization Techniques
• Synchronization methods are used to ensure that data remains consistent and
up-to-date across multiple copies in a distributed file system. Examples of
synchronization methods include locking, versioning, and time-stamping.
• Consensus algorithms are used to achieve agreement among multiple nodes in a
distributed file system. Examples of consensus algorithms include the Paxos
algorithm and the Raft algorithm. These algorithms are used to ensure that data
remains consistent and available even in the presence of network failures or node
crashes.

Examples of Distributed File Systems


Google File System (GFS)
• Google File System (GFS) is a distributed file system developed by Google for
storing and managing large amounts of data across multiple servers.
• The architecture of GFS consists of three main components: a master node, chunk
servers, and client machines. The master node is responsible for managing
metadata and coordinating file access requests, while the chunk servers are
responsible for storing and serving data.
• Advantages of GFS include its ability to handle large files and high write
throughput. However, it has some disadvantages such as limited support for small
files and limited concurrency support.
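The division of labour between the master and the chunk servers can be sketched as follows. This is a simplified illustration, not Google's implementation: the classes `Master` and `ChunkServer`, the helper functions, and the tiny 8-byte chunk size are all assumptions chosen to keep the example readable. The client obtains chunk locations from the master and then fetches the bytes from a chunk server.

```python
CHUNK_SIZE = 8  # toy chunk size in bytes, chosen only to keep the demo small


class ChunkServer:
    def __init__(self):
        self.chunks = {}  # chunk handle -> bytes


class Master:
    """Holds metadata only: which chunk of which file lives on which servers."""

    def __init__(self):
        self.chunk_table = {}  # (path, chunk index) -> (handle, [chunk servers])

    def locate(self, path, chunk_index):
        return self.chunk_table[(path, chunk_index)]


def write_file(master, servers, path, data):
    """Split data into chunks and place every chunk on every server (toy placement)."""
    for index, start in enumerate(range(0, len(data), CHUNK_SIZE)):
        handle = f"{path}#{index}"
        for server in servers:
            server.chunks[handle] = data[start:start + CHUNK_SIZE]
        master.chunk_table[(path, index)] = (handle, list(servers))


def read_byte_range(master, path, offset, length):
    """Client-side read: metadata from the master, bytes from a chunk server."""
    out = bytearray()
    end = offset + length
    while offset < end:
        index, within = divmod(offset, CHUNK_SIZE)
        handle, locations = master.locate(path, index)
        chunk = locations[0].chunks[handle]       # read from the first replica
        take = min(end - offset, CHUNK_SIZE - within)
        out += chunk[within:within + take]
        offset += take
    return bytes(out)


gfs_master = Master()
chunk_servers = [ChunkServer(), ChunkServer(), ChunkServer()]
write_file(gfs_master, chunk_servers, "/logs/a", b"abcdefghijklmnopqrstuvwxyz")
fragment = read_byte_range(gfs_master, "/logs/a", offset=6, length=5)
```

The requested range spans two chunks, so the client makes two metadata lookups, while the file data itself never passes through the master.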

Hadoop Distributed File System (HDFS)


• Hadoop Distributed File System (HDFS) is a distributed file system used by the
Apache Hadoop software framework for storing and processing large data sets.
• The architecture of HDFS consists of two main components: a NameNode and
multiple DataNodes. The NameNode is responsible for managing metadata and
coordinating file access requests, while the DataNodes are responsible for storing
and serving data.
• Advantages of HDFS include its scalability, fault tolerance, and support for
data-intensive applications. However, it has some disadvantages such as limited
support for real-time data processing and for small files.
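The small-file limitation can be made concrete with some arithmetic. The sketch below assumes a block size of 128 MiB (a common HDFS default, though it is configurable) and a deliberately simplified model in which the NameNode keeps one metadata entry per file plus one per block. The function names are hypothetical.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # assumed block size in bytes (configurable in practice)


def blocks_for(file_size):
    """Blocks needed for one file: ceiling division, at least one block."""
    return max(1, (file_size + BLOCK_SIZE - 1) // BLOCK_SIZE)


def namenode_entries(file_sizes):
    """Simplified metadata count: one entry per file plus one per block."""
    return sum(1 + blocks_for(size) for size in file_sizes)


GIB = 1024 ** 3
MIB = 1024 ** 2

one_large_file = [10 * GIB]          # a single 10 GiB file
many_small_files = [MIB] * 10_240    # the same 10 GiB as 10,240 files of 1 MiB

print(namenode_entries(one_large_file))    # 1 file + 80 blocks = 81
print(namenode_entries(many_small_files))  # 10,240 files + 10,240 blocks = 20,480
```

Under these assumptions the same amount of data costs roughly 250 times more metadata when stored as small files, which is why the NameNode becomes the bottleneck for such workloads.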

Conclusion

In conclusion, a distributed file system provides a way for users to access and store files
across multiple machines in a networked environment. It offers advantages such as
scalability, fault tolerance, and support for data-intensive applications. However, it also
presents challenges such as data consistency, fault tolerance, and security. To design a
distributed file system, one must consider the architecture, data access mechanisms,
and synchronization techniques. Examples of distributed file systems include Google File
System (GFS) and Hadoop Distributed File System (HDFS), each with its own advantages
and disadvantages. Overall, distributed file systems play a crucial role in managing and
processing large amounts of data in today's interconnected world.

general parallel file systems


• Parallel File System: A parallel file system stores data across multiple network
servers. It provides high-performance network access through coordinated, parallel
input/output operations, and it allows more than one user to access data concurrently.
• Flock: A group of processes (corresponding to a group of threads) sharing the same
memory image.
• Flock semantics: The properties that describe how an entity can be accessed by other
processes within the flock when it is not active. Under flock semantics, only one
process at a time may have exclusive access to an entity, and all other processes must
share the same view of the entity, even if it is active or protected.

How PFS Relates to Cloud Computing:
Cloud computing gives users a lot of freedom to access the data and resources
that they need on demand. However, when several machines access the same data at
the same time, it is important that no data is lost. Without locking to coordinate
file system access between machines, there is a high risk of losing or corrupting
important data across multiple computers at once. This makes managing files
difficult, because some users may read a file while others are editing it.
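The effect of locking can be demonstrated on a single machine with Python's `fcntl.flock`, which takes an advisory lock on an open file (Unix only). This is a local sketch under stated assumptions: the helper names are hypothetical, threads stand in for competing clients, and whether such locks are honoured between machines depends on the file system in use.

```python
import fcntl
import os
import tempfile
import threading


def locked_increment(path):
    """Read-modify-write a counter file while holding an exclusive advisory lock."""
    with open(path, "r+") as handle:
        fcntl.flock(handle, fcntl.LOCK_EX)  # blocks while another opener holds the lock
        try:
            value = int(handle.read() or 0)
            handle.seek(0)
            handle.write(str(value + 1))
            handle.truncate()
            handle.flush()                  # make the update visible before unlocking
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)


def run_demo(workers=4, increments=25):
    """Let several threads update the same file and return the final counter."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    with open(path, "w") as handle:
        handle.write("0")

    def worker():
        for _ in range(increments):
            locked_increment(path)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()

    with open(path) as handle:
        total = int(handle.read())
    os.remove(path)
    return total


final_count = run_demo()
```

Because each update happens entirely inside the lock, no increment is lost and the counter ends at `workers * increments`. Without the lock, two writers could read the same value and one update would overwrite the other.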
Example: Google File System is a cloud file system that uses a parallel file system.
Google File System (GFS) is a scalable distributed file system that provides
consistently high performance across tens of thousands of commodity servers. It
manages huge data sets across dynamic clusters of computers using only
application-level replication and auto-recovery techniques. This architecture pr

Google file system


Google Inc. developed the Google File System (GFS), a scalable distributed file
system (DFS), to meet the company’s growing data processing needs. GFS offers
fault tolerance, dependability, scalability, availability, and performance to large
networks and connected nodes. GFS is made up of a number of storage systems
constructed from inexpensive commodity hardware parts. The search engine,
which creates enormous volumes of data that must be kept, is only one example of
how it is customized to meet Google’s various data use and storage requirements.
The Google File System compensates for hardware weaknesses while taking
advantage of commercially available servers.
GoogleFS is another name for GFS. It manages two types of data: file metadata
and file data.
The GFS node cluster consists of a single master and several chunk servers that
various client systems regularly access. Chunk servers keep data on local disks in
the form of Linux files. Stored data is split into large (64 MB) chunks, and each
chunk is replicated at least three times across the network. The large chunk size
reduces network overhead.
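The chunk arithmetic can be worked through in a short example. It interprets 64 MB as 64 MiB, which is an assumption, and the function names are illustrative only.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB chunk size, interpreted here as 64 MiB
MIN_REPLICAS = 3               # each chunk is stored at least three times

GIB = 1024 ** 3
MIB = 1024 ** 2


def chunk_index(byte_offset, chunk_size=CHUNK_SIZE):
    """Which chunk of a file contains the given byte offset."""
    return byte_offset // chunk_size


def chunk_count(file_size, chunk_size=CHUNK_SIZE):
    """Number of chunks needed for a file (ceiling division)."""
    return (file_size + chunk_size - 1) // chunk_size


def raw_bytes_stored(file_size, replicas=MIN_REPLICAS):
    """Minimum raw storage consumed once every chunk is replicated."""
    return file_size * replicas


print(chunk_count(1 * GIB))           # 16 chunks of 64 MiB
print(chunk_count(1 * GIB, 1 * MIB))  # 1024 chunks if chunks were only 1 MiB
print(chunk_index(200 * MIB))         # byte offset 200 MiB falls in chunk 3
print(raw_bytes_stored(1 * GIB) // GIB)  # at least 3 GiB of raw storage
```

A 1 GiB file needs 16 chunk lookups with 64 MiB chunks, compared with 1024 lookups if chunks were 1 MiB, which illustrates why a larger chunk size reduces metadata traffic.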
GFS is designed to meet Google’s large cluster requirements without burdening
applications. Files are stored in hierarchical directories identified by path names.
The master is in charge of managing metadata, including namespace, access control,
and mapping data. The master communicates with each chunk server through periodic
heartbeat messages and keeps track of its status updates.
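Heartbeat tracking of this kind can be sketched as follows. This is a minimal model, not the GFS protocol itself: the class `HeartbeatMonitor`, the 10-second timeout, and the server names are assumptions, and timestamps are passed in explicitly so that the example is deterministic.

```python
class HeartbeatMonitor:
    """Tracks the last heartbeat from each chunk server and flags silent ones."""

    def __init__(self, timeout_seconds):
        self.timeout = timeout_seconds
        self.last_seen = {}  # server name -> time of the most recent heartbeat

    def heartbeat(self, server, now):
        # Called whenever a heartbeat message arrives from a server.
        self.last_seen[server] = now

    def live_servers(self, now):
        return sorted(
            name for name, seen in self.last_seen.items()
            if now - seen <= self.timeout
        )

    def suspected_failed(self, now):
        return sorted(
            name for name, seen in self.last_seen.items()
            if now - seen > self.timeout
        )


monitor = HeartbeatMonitor(timeout_seconds=10)
monitor.heartbeat("chunkserver-1", now=0)
monitor.heartbeat("chunkserver-2", now=0)
monitor.heartbeat("chunkserver-1", now=8)  # chunkserver-2 has gone quiet

print(monitor.live_servers(now=15))        # ['chunkserver-1']
print(monitor.suspected_failed(now=15))    # ['chunkserver-2']
```

A server that misses heartbeats for longer than the timeout is only suspected of having failed. A master would typically respond by re-replicating the chunks that server held.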
