W7 - CLO2 - File System and Storage
Objectives
• Large data sets are generated every day and sent for processing in high-performance
computing environments.
• Consumer requirements and the technical aspects of storage in a high-performance
computing environment differ from those of a traditional storage system in many ways.
• Traditional enterprise storage systems are no longer sufficient to meet these requirements.
• Scalability and high-performance distributed data processing are complementary to each
other.
• Distributing large data sets across as many nodes as processing requires promotes the
scalability of the application.
• Suitable file systems are required to support this distribution and scaling efficiently.
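The distribution of a data set across nodes can be illustrated with a minimal Python sketch. It assumes fixed-size chunks and a simple round-robin placement; the function names and node names are hypothetical and are not taken from any particular file system.

```python
from collections import defaultdict


def split_into_chunks(data: bytes, chunk_size: int) -> list[bytes]:
    """Cut a byte string into consecutive chunks of at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def assign_round_robin(num_chunks: int, nodes: list[str]) -> dict[str, list[int]]:
    """Map chunk indices to nodes in round-robin order (illustrative policy)."""
    if not nodes:
        raise ValueError("at least one node is required")
    placement = defaultdict(list)
    for index in range(num_chunks):
        placement[nodes[index % len(nodes)]].append(index)
    return dict(placement)


# A 1000-byte "data set" split into 256-byte chunks over three hypothetical nodes.
chunks = split_into_chunks(b"x" * 1000, chunk_size=256)
placement = assign_round_robin(len(chunks), ["node-a", "node-b", "node-c"])
```

Adding a node to the list spreads the same chunks over more machines, which is the scaling property the objectives describe. Real systems also replicate chunks and rebalance them, which this sketch leaves out.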
Requirements of Data-Intensive Computing
Challenges and Design Goals of a Cloud Native File System
• A Cloud Native File System is one that enables the efficient implementation of cloud
computing capabilities such as application scaling and metering of storage usage.
• A Cloud Native File System must support a high-performance, data-intensive
computing environment.
• The prime design goals are:
• Multi-tenancy: A cloud system allows its underlying resources to be shared among multiple
tenants. Especially in a public cloud, consumers share resources with others who are
unknown to them. Hence, the file system in the cloud must keep tenants (their processes and
data) isolated from one another to provide a higher degree of security.
• Scalability: A cloud file system must scale well so that users can rely on the system as their
storage needs grow. Downward scaling is equally important to minimize resource wastage.
• Unlimited storage support: Support for practically unlimited data storage is another need of
cloud computing for its business success. Moreover, the file system has to be extremely fault
tolerant, and all of this has to be achieved by building the storage network out of inexpensive
commodity hardware.
• Efficiency: Another key performance parameter is the system’s throughput. Achieving
performance comparable to that of a local file system while handling thousands of concurrent
operations issued by many clients is a critical challenge.
• Compatibility: Compatibility is always an issue when new technologies are introduced. For
file systems, backward compatibility with existing file system interfaces is important so that
migration to the cloud is seamless.
• Metered Use: Metered use of resources is one of the basic requirements of cloud computing,
and it applies to storage too. The file systems used in the cloud have to enable this capability.
• Apart from these, error detection and automatic recovery mechanisms have to be
integral parts of the file system. Another important ability is constant monitoring of
the health of the storage system.
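Three of the design goals above (tenant isolation, metered use and error detection) can be sketched together in a toy in-memory store. This is only an illustration under simplifying assumptions: the class `TenantStore` and its methods are hypothetical, isolation is reduced to separate per-tenant namespaces, and error detection is reduced to a SHA-256 checksum verified on every read.

```python
import hashlib


class TenantStore:
    """Toy in-memory store: per-tenant namespaces, usage metering, checksums."""

    def __init__(self) -> None:
        # tenant -> path -> (data, SHA-256 hex digest recorded at write time)
        self._files: dict[str, dict[str, tuple[bytes, str]]] = {}

    def write(self, tenant: str, path: str, data: bytes) -> None:
        digest = hashlib.sha256(data).hexdigest()
        self._files.setdefault(tenant, {})[path] = (data, digest)

    def read(self, tenant: str, path: str) -> bytes:
        # A tenant can only look up paths inside its own namespace.
        try:
            data, digest = self._files[tenant][path]
        except KeyError:
            raise FileNotFoundError(f"{path!r} not found for tenant {tenant!r}") from None
        # Error detection: recompute the checksum and compare with the stored one.
        if hashlib.sha256(data).hexdigest() != digest:
            raise IOError(f"checksum mismatch for {path!r}")
        return data

    def usage_bytes(self, tenant: str) -> int:
        """Metered use: total bytes currently stored by one tenant."""
        return sum(len(data) for data, _ in self._files.get(tenant, {}).values())


store = TenantStore()
store.write("alice", "/report.txt", b"hello")
store.write("bob", "/report.txt", b"other data")
```

Both tenants use the same path, yet each reads only its own data, and `usage_bytes` gives the figure a billing component would need. Automatic recovery (for example, re-reading from a replica after a checksum mismatch) is not shown.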
Cloud Native File Systems
Storage Deployment Models
• Cloud native storage systems are built over several cloud native file systems.
• Depending on the deployment location, cloud storage can be categorized into three
models: public, private and hybrid. These mirror the standard cloud computing
deployment models.
• Public cloud storage:
• Can be accessed by anyone and is offered by established service providers. In this model, the
consumer enterprise and the storage service provider are two different organizations.
• Private cloud storage:
• Is arranged by the consumer enterprise itself with the help of a service provider. This type of
storage can be set up either on-premises or off-premises. In this model, the consumer enterprise
and the storage service provider work together at the enterprise’s or the provider’s data center.
• Hybrid storage:
• Is the combination of public and private storage, as its name implies. In this model, the
enterprise can store critical and active data in private cloud storage while
public storage is used for archiving data.
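The hybrid placement rule described above (critical and active data in private storage, archival data in public storage) can be written as a small policy function. The sketch below is an assumption-laden illustration: the `DataItem` fields and the 30-day "active" window are hypothetical choices, not values from the slides.

```python
from dataclasses import dataclass


@dataclass
class DataItem:
    name: str
    critical: bool
    days_since_last_access: int


def choose_tier(item: DataItem, active_window_days: int = 30) -> str:
    """Return 'private' for critical or recently used data, otherwise 'public'."""
    is_active = item.days_since_last_access <= active_window_days
    return "private" if item.critical or is_active else "public"


items = [
    DataItem("payroll.db", critical=True, days_since_last_access=400),
    DataItem("dashboard.csv", critical=False, days_since_last_access=2),
    DataItem("logs-2019.tar", critical=False, days_since_last_access=900),
]
tiers = {item.name: choose_tier(item) for item in items}
```

Only the old, non-critical archive ends up in public storage; an enterprise would tune the window and the definition of "critical" to its own policies.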
Storage Delivery Types
• There are two types of cloud storage users with different needs:
• General Users: people who simply store files in the storage.
• Need to store their files and folders in ready-to-use storage space (like a formatted pen drive or external hard disk).
• Their requirements can be fulfilled through personal file hosting services.
• System Developers: people who deploy or develop applications.
• Need to have full control over the storage system.
• Use the storage service to deploy applications or perform application development tasks.
Managed and Unmanaged Cloud Storage
• General-purpose and specialized cloud storage systems are also characterized as
unmanaged and managed storage, respectively.
• Managed Storage System:
• Provides a raw-disk-like facility to users. Users can partition or format the storage space as
they require and can also install software on it.
• Mainly meant for computing system developers.
• Can function as part of a virtual machine owned by the consumer.
• Delivered as Infrastructure-as-a-Service.
Popular Cloud Storages for Developers (Managed Storage)
• Amazon Elastic Block Store (EBS)
• Provides block-level storage volumes for use with Amazon EC2 (Elastic Compute Cloud) instances
(servers).
• Block-level storage can be used to create raw storage volumes that are attached to the
servers. A variety of file systems, such as NTFS (for Windows) or ext4 (for Linux), can be run on
block-level storage.
• Appears like a massive SAN (Storage Area Network) within the AWS infrastructure.
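What "raw block-level storage" means can be shown with a toy model: a volume is just a fixed number of fixed-size blocks addressed by index, with no notion of files until a file system is placed on top. The class `BlockVolume` and the tiny 16-byte block size are hypothetical and chosen only for readability; this is not how any provider's API looks.

```python
class BlockVolume:
    """Toy raw volume: a fixed number of fixed-size blocks, addressed by index."""

    def __init__(self, num_blocks: int, block_size: int = 4096) -> None:
        self.block_size = block_size
        self.num_blocks = num_blocks
        self._data = bytearray(num_blocks * block_size)

    def write_block(self, index: int, payload: bytes) -> None:
        if not 0 <= index < self.num_blocks:
            raise IndexError("block index out of range")
        if len(payload) > self.block_size:
            raise ValueError("payload larger than one block")
        start = index * self.block_size
        # Pad short payloads with zero bytes so the whole block is overwritten.
        self._data[start:start + self.block_size] = payload.ljust(self.block_size, b"\x00")

    def read_block(self, index: int) -> bytes:
        if not 0 <= index < self.num_blocks:
            raise IndexError("block index out of range")
        start = index * self.block_size
        return bytes(self._data[start:start + self.block_size])


volume = BlockVolume(num_blocks=8, block_size=16)
volume.write_block(2, b"superblock")
```

A file system such as ext4 or NTFS is, in essence, a set of rules for which blocks hold metadata and which hold file contents; the volume itself only sees block reads and writes.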
Popular Cloud Storages for Developers (Managed Storage)
• Amazon Simple Storage Service (S3)
• Files are stored as objects, and those objects are stored in containers called ‘buckets’.
• An object can be up to a few terabytes in size, and Amazon reports that trillions of objects are
stored per month.
• S3 can be used together with Amazon’s virtual server, Elastic Compute Cloud (EC2).
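The bucket-and-object model can be contrasted with block storage through a minimal in-memory sketch. The class `ObjectStore` and its method names are hypothetical and deliberately do not mirror the real S3 API; the point is only that each object is stored and fetched whole, under a key, inside a named bucket.

```python
class ObjectStore:
    """Toy object store: named buckets holding whole objects under string keys."""

    def __init__(self) -> None:
        self._buckets: dict[str, dict[str, bytes]] = {}

    def create_bucket(self, name: str) -> None:
        if name in self._buckets:
            raise ValueError(f"bucket {name!r} already exists")
        self._buckets[name] = {}

    def put_object(self, bucket: str, key: str, data: bytes) -> None:
        # Writing to an existing key replaces the whole object.
        self._buckets[bucket][key] = data

    def get_object(self, bucket: str, key: str) -> bytes:
        return self._buckets[bucket][key]

    def list_keys(self, bucket: str, prefix: str = "") -> list[str]:
        # Keys are flat strings; "folders" are only a shared key prefix.
        return sorted(k for k in self._buckets[bucket] if k.startswith(prefix))


objects = ObjectStore()
objects.create_bucket("course-material")
objects.put_object("course-material", "week7/slides.pdf", b"%PDF-")
objects.put_object("course-material", "week7/notes.txt", b"file systems")
objects.put_object("course-material", "week8/slides.pdf", b"%PDF-")
week7_keys = objects.list_keys("course-material", prefix="week7/")
```

Unlike a block volume, there is no partitioning or formatting step: the application deals only with buckets, keys and whole objects.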
Popular Cloud Storages for Developers (Managed Storage)
• Google Cloud Storage
• Is the persistent storage used with Google’s cloud server, Google Compute Engine (GCE).
• Data are stored as objects, and objects are likewise stored in containers called ‘buckets’.
• Individual objects can be terabytes in size, and there is no limit on the total amount of data stored.
• Billing in Google Cloud Storage is calculated monthly from the storage used as well as the
bandwidth used.
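The billing rule above (storage used plus bandwidth used, per month) amounts to a simple sum. The sketch below uses made-up placeholder rates purely to show the arithmetic; they are not actual prices of any provider, and real pricing has further components such as operation charges and storage classes.

```python
def monthly_bill(stored_gb: float, egress_gb: float,
                 storage_rate: float, egress_rate: float) -> float:
    """Return the monthly charge: storage cost plus bandwidth (egress) cost.

    Rates are per GB per month and are supplied by the caller.
    """
    if min(stored_gb, egress_gb, storage_rate, egress_rate) < 0:
        raise ValueError("quantities and rates must be non-negative")
    return round(stored_gb * storage_rate + egress_gb * egress_rate, 2)


# Placeholder rates (hypothetical): 0.02 per GB stored, 0.10 per GB transferred out.
bill = monthly_bill(stored_gb=500, egress_gb=120, storage_rate=0.02, egress_rate=0.10)
```

With these placeholder figures, 500 GB stored contributes 10.00 and 120 GB of outgoing traffic contributes 12.00 to the bill.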
Popular General Purpose Cloud Storage (Unmanaged Storage)
• Google Drive
• Lets users share files with other users.
• Currently, it provides 15 GB of free storage space for users.
• Has the benefit of a built-in office suite in which one can edit documents, spreadsheets and
presentations.
• Google Drive client software is available for desktop platforms such as Windows and Mac as well
as mobile platforms such as iOS and Android.
• Google Drive is meant for general users, whereas Google Cloud Storage is meant for developers.
• OneDrive (Microsoft)
• Apart from file storage, OneDrive offers facilities for document creation and collaboration.
• OneDrive’s biggest strength is that it works closely with Microsoft Office apps such as Word, Excel and
PowerPoint.
Summary
• Cloud computing is closely tied to high-performance computing. Hence, the file system and file
processing characteristics of high-performance computing environments also apply to cloud
computing.
• Efficient processing of large data-sets is critical for the success of high-performance computing systems.
• High-performance processing of large data-sets requires parallel execution of partitioned data across
distributed computing nodes. This facility should be enabled with suitable data processing programming
models and supporting file systems.
• Among the various file systems that support high-performance processing of data, Google File System (GFS)
is considered the pioneer.
• Storage in the cloud is delivered in two categories: for general users and for developers. Storage for general
users is delivered as SaaS, and storage for developers is delivered as IaaS.
• For general users, the cloud provides ready-to-use storage that is managed by the provider. Hence,
users can use the storage directly without partitioning or formatting it. Such storage is known as
‘unmanaged’ storage, because the user does not manage it.
• Managed storage is raw storage that is built to be managed by the users themselves. Computing
developers use this kind of storage.