
The Campus of Tomorrow

CIS 4403: Cloud Computing


Week 7: CLO2 – File System and Storage

Thursday, November 17, 2022


Delivery Outline
• W1: CLO1 - Introduction to Cloud Computing
• W2: CLO1 - Cloud Computing Models and Services
• W3-4: CLO2 - Resource Virtualization and Pooling
• W5: CLO2 - Scaling and Capacity Planning
• W6: CLO2 - Load Balancing
• W7: CLO2 - File System and Storage
• W8: CLO3 - Database Technology
• W9-10: CLO3 - Cloud Computing Security
• W11: CLO3 - Privacy and Compliance
• W12: CLO4 - Content Delivery Network
• W13: CLO4 - Portability and Interoperability
• W14: CLO4 - Cloud Management
• W15: CLO all - Hot Research Topics

Objectives

• Upon completing this chapter, the learner should be able to:

• Understand the need for High-Performance Processing and Big Data (Chapter 13 of the textbook)
• Explore Storage Deployment Models (Chapter 13 of the textbook)
• Differentiate Storage Types (Chapter 13 of the textbook)
Preload

• Large data-sets are generated every day and sent for processing in high-performance computing environments.
• Consumers’ requirements and the technical aspects of storage in a high-performance computing environment differ from those of traditional storage systems in many ways.
• Traditional enterprise storage systems are no longer sufficient to meet these demands.
• Scalability and high-performance distributed data processing complement each other.
• Distributing large data-sets across as many nodes as the processing requires promotes scalability of the application.
• Suitable file systems are required to support this distribution and scaling efficiently.

Requirements of Data-Intensive Computing

• Data-intensive computing challenges computing systems to deliver high performance.
• Large, complex data-sets cannot be processed centrally on a single node; they require partitioning and distribution over multiple processing nodes.
• Data-intensive computing is I/O-bound and requires rapid movement of large volumes of data, which in turn requires appropriate management of data in transit.
• Complex data-sets present challenges to the computing system.
• Fast and efficient processing of such data is essential, along with sophisticated techniques to reduce data access time.
• Data-intensive computing often involves process-intensive computing too.
• Data modelling, partitioning, node assignment and accumulation are some of the critical parts of this kind of computing.
• The data processing model must work in step with the distributed file system architecture, and sophisticated storage techniques are needed to reduce access time. A minimal partitioning sketch follows below.
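
The following minimal Python sketch (not from the textbook; the file name, chunk size and node names are illustrative assumptions) shows the basic idea of partitioning a large data-set into fixed-size chunks and assigning the chunks across processing nodes.

```python
# Minimal sketch: split a large file into fixed-size chunks and assign each
# chunk to a processing node round-robin. Names and sizes are illustrative.

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the chunk size GFS uses
NODES = ["node-0", "node-1", "node-2", "node-3"]

def partition(path, chunk_size=CHUNK_SIZE, nodes=NODES):
    """Yield (node, chunk_index, data) assignments for one large file."""
    with open(path, "rb") as f:
        index = 0
        while True:
            data = f.read(chunk_size)
            if not data:
                break
            yield nodes[index % len(nodes)], index, data
            index += 1

if __name__ == "__main__":
    for node, idx, data in partition("large_dataset.bin"):  # placeholder file name
        # A real system would transfer (and replicate) the chunk to the node;
        # here we only print the assignment.
        print(f"chunk {idx} ({len(data)} bytes) -> {node}")
```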

Challenges and Design Goals of a Cloud Native File System

• A cloud native file system is one that enables the efficient implementation of cloud computing capabilities such as application scaling and metering of storage usage.
• A cloud native file system must also satisfy the requirements of a high-performance, data-intensive computing environment.
• The prime design goals are:
• Multi-tenancy: A cloud system offers multi-tenancy and allows its underlying resources to be shared among multiple tenants. Especially in a public cloud, consumers share resources with others who are unknown to them. Hence, the file system in the cloud must ensure that tenants (their processes and data) remain isolated from one another to provide a higher degree of security.
• Scalability: A cloud file system must scale well so that users can rely upon the system as their storage needs grow. At the same time, downward scaling is also important to minimize resource wastage.
• Unlimited storage support: File system support for unlimited data storage is another need of cloud computing for its business success. Moreover, the file system has to be extremely fault tolerant. All of this needs to be achieved by building the storage network out of inexpensive commodity hardware.

Challenges and Design Goals of a Cloud Native File System

• Efficiency: Another performance parameter to be considered is the system’s throughput. While dealing with thousands of concurrent operations issued by many clients, achieving performance comparable to a local file system is a critical issue.
• Compatibility: Compatibility is always an issue when new technologies are introduced. In the domain of computing file systems, backward compatibility with existing file system interfaces is important so that migration to the cloud is seamless.
• Metered use: Metered use of resources is one of the basic requirements of cloud computing, and this applies to storage too. The file systems used in the cloud have to enable and promote this capability.
• Apart from these, error detection mechanisms and automatic recovery have to be integral parts of the file system. Another important requirement is constant monitoring of the health of the storage system.

Cloud Native File Systems

• Any storage system is built on top of a file system.
• To support high-performance computing for process-intensive as well as data-intensive tasks, development of an appropriate file system is the first step.
• The real revolution in high-performance distributed file system development came with the emergence of the Google File System (GFS).
• It was developed during the late 1990s as a result of an earlier Google effort called Big Files.
• GFS connects a very large distributed cluster of inexpensive commodity components using high-speed network connections. GFS files are collections of fixed-size segments called chunks.
• GFS is used together with Google’s MapReduce programming model; each large file is split into chunks of 64 MB.
• Each chunk is further divided into 64 KB blocks. The larger chunk size increases the probability that fewer chunks need to be accessed to perform an operation, thus reducing processing time. A toy illustration of the MapReduce idea follows below.
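
The sketch below is a toy, single-process illustration of the MapReduce idea referred to above (it is not Google's distributed implementation): a map phase emits (key, value) pairs for each chunk of input, and a reduce phase aggregates the values per key. Short strings stand in for 64 MB chunks.

```python
# Toy illustration of MapReduce: map emits (word, 1) pairs per chunk,
# reduce sums the counts per word. In a real deployment each map task
# would run on the node holding the corresponding GFS chunk.
from collections import defaultdict

def map_phase(chunk: str):
    """Emit (word, 1) for every word in one chunk of text."""
    for word in chunk.split():
        yield word.lower(), 1

def reduce_phase(pairs):
    """Sum the counts emitted for each word across all chunks."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

if __name__ == "__main__":
    chunks = ["the cloud stores data", "the data is in the cloud"]  # stand-ins for 64 MB chunks
    pairs = [pair for chunk in chunks for pair in map_phase(chunk)]
    print(reduce_phase(pairs))  # e.g. {'the': 3, 'cloud': 2, 'data': 2, ...}
```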

Cloud Native File Systems

• A GFS cluster consists of two different types of nodes: one special node called the master server, and a large number of other nodes known as chunk servers.
• The chunk servers store all of the GFS chunks, while the master server maintains the metadata about the associated chunks.
• Chunks are distributed and replicated across chunk servers at multiple sites and are stored on Linux file systems.
• GFS supports the execution of distributed applications and promotes scalability of the system. A conceptual sketch of the read path is shown below.
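
The following conceptual sketch (class and method names are hypothetical, not the actual GFS interfaces) illustrates the read path implied by this architecture: the master returns only the locations of a chunk, and the data itself is read directly from one of the chunk server replicas.

```python
# Conceptual sketch of a GFS-style read: the master holds only metadata
# (which chunk servers hold which chunk); data is read from a replica.

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB chunks, as in GFS

class MasterServer:
    def __init__(self, chunk_table):
        # chunk_table: (file_name, chunk_index) -> list of replica server names
        self.chunk_table = chunk_table

    def locate(self, file_name, chunk_index):
        return self.chunk_table[(file_name, chunk_index)]

class ChunkServer:
    def __init__(self, name, chunks):
        self.name, self.chunks = name, chunks  # chunks: (file, index) -> bytes

    def read(self, chunk_handle, offset, length):
        return self.chunks[chunk_handle][offset:offset + length]

def read_file(master, chunk_servers, file_name, byte_offset, length):
    """Translate a byte offset into a chunk index, then read from a replica."""
    chunk_index = byte_offset // CHUNK_SIZE
    replicas = master.locate(file_name, chunk_index)   # metadata from the master
    server = chunk_servers[replicas[0]]                # read from any replica
    # Reads spanning a chunk boundary are ignored in this simplified sketch.
    return server.read((file_name, chunk_index), byte_offset % CHUNK_SIZE, length)
```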

Storage Deployment Models

• Cloud native storage systems are built on top of cloud native file systems.
• Depending on the deployment location, cloud storage can be categorized into three models: public, private and hybrid. This mirrors the standard cloud computing deployment models.
• Public cloud storage:
• Can be accessed by anyone and is provided by reputable service providers. In this model, the consumer enterprise and the storage service provider are two different organizations.
• Private cloud storage:
• Arranged by the consumer enterprise itself, with the help of a service provider. This type of storage can be set up both on-premises and off-premises. In this model, the consumer enterprise and the storage service provider integrate with each other at the enterprise’s data center or the provider’s data center.
• Hybrid cloud storage:
• Is the combination of public and private storage, as its name implies. In this model, enterprises get the opportunity to store critical and active data in private cloud storage, while public storage can be used for archiving data.

Storage Delivery Types

• There are two types of cloud storage users with different needs:
• General users: people who simply store files in the storage.
• Need to store their files and folders in ready-to-use storage space (like a formatted pen drive or external hard disk).
• Their requirements can be fulfilled through personal file hosting services where users can store their files.
• System developers: people who deploy or develop applications.
• Need full control over the storage system.
• Use the storage service to deploy applications or perform application development tasks.

Managed and Unmanaged Cloud Storage

• General-purpose and specialized cloud storage systems are also characterized as unmanaged and managed storage, respectively.
• Managed storage system:
• Provides a raw-disk-like facility to users. Users can partition or format the storage space as per their requirements and can also install software on it.
• Mainly meant for computing system developers.
• Can function as part of a virtual machine owned by the consumer.
• Delivered as Infrastructure-as-a-Service (IaaS).

Managed and Unmanaged Cloud Storage

• Unmanaged cloud storage:
• Users directly get storage capacity that is ready for use. All of the primary disk management tasks, such as partitioning and formatting, are handled by the vendor or service provider.
• Users have very little control. The storage provider decides the nature of the storage and the applications through which the space can be accessed. Users can use the storage, but cannot partition or format it as they like, and cannot install any applications on it.
• All other attributes or capabilities of the storage space (such as encryption or compression) are also pre-configured and set by the provider.
• Used independently of any virtual machine.
• Delivered as Software-as-a-Service (SaaS).

Popular Cloud Storages for Developers (Managed Storage)

• Amazon’s Elastic Block Store (EBS):

• Provides block-level storage volumes for use with Amazon EC2 (Elastic Compute Cloud) instances (servers).

• Block-level storage can be used to create raw storage volumes which can be attached to the servers. A variety of file systems, such as NTFS (for Windows) or ext4 (for Linux), can be run on top of the block-level storage.

• Appears like a massive SAN (Storage Area Network) within the AWS infrastructure.

• Delivered storage volumes go up to terabytes in size. A usage sketch follows below.
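
As a rough illustration of how a developer might provision managed block storage programmatically, the sketch below uses the AWS boto3 SDK; the region, volume size and instance ID are placeholder assumptions.

```python
# Illustrative sketch with boto3: create an EBS volume and attach it to an
# existing EC2 instance, after which it can be formatted with any file system.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # assumed region

# Create a 100 GiB volume in the same Availability Zone as the instance.
volume = ec2.create_volume(AvailabilityZone="us-east-1a", Size=100, VolumeType="gp3")
ec2.get_waiter("volume_available").wait(VolumeIds=[volume["VolumeId"]])

# Attach the volume to an (assumed) running instance as a raw block device.
ec2.attach_volume(VolumeId=volume["VolumeId"],
                  InstanceId="i-0123456789abcdef0",  # placeholder instance ID
                  Device="/dev/sdf")
```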

Popular Cloud Storages for Developers (Managed Storage)

• Amazon’s Simple Storage Service (S3):

• Introduced by Amazon as a cost-effective web service solution for developers.

• Files are stored as objects, and those objects are stored in containers called ‘buckets’.

• Object size can go up to a few terabytes, and Amazon has reported storing trillions of objects.

• S3 can be used together with Amazon’s virtual server service, Elastic Compute Cloud (EC2). A usage sketch follows below.
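
A minimal sketch of storing and retrieving an object with the boto3 SDK is shown below; the bucket and key names are placeholder assumptions, and the bucket is assumed to already exist.

```python
# Illustrative sketch with boto3: write a file as an S3 object and read it back.
import boto3

s3 = boto3.client("s3")

# Objects live inside a bucket; the key is the object's name within it.
s3.put_object(Bucket="example-course-bucket", Key="notes/week7.txt",
              Body=b"File systems and storage in the cloud")

response = s3.get_object(Bucket="example-course-bucket", Key="notes/week7.txt")
print(response["Body"].read().decode())
```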

Popular Cloud Storages for Developers (Managed Storage)

• Google’s Cloud Storage:

• Is the persistent storage associated with Google’s cloud server offering, Google Compute Engine (GCE).

• Data are stored as objects, and objects are likewise stored in containers called ‘buckets’.

• Objects can be terabytes in size, and there is no limit on the total storage held in a bucket.

• Billing in Google Cloud Storage is calculated per storage usage as well as bandwidth usage on a monthly basis. A usage sketch follows below.
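
A comparable sketch using the google-cloud-storage Python client library follows; the bucket and object names are placeholder assumptions, the bucket is assumed to exist, and default project credentials are assumed to be configured.

```python
# Illustrative sketch with the google-cloud-storage client: write an object
# into a bucket and read it back.
from google.cloud import storage

client = storage.Client()                      # uses default project credentials
bucket = client.bucket("example-course-bucket")  # placeholder, assumed to exist

blob = bucket.blob("notes/week7.txt")          # an object inside the bucket
blob.upload_from_string("File systems and storage in the cloud")

print(blob.download_as_bytes().decode())
```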

Popular General Purpose Cloud Storage (Unmanaged Storage)

• Used for file hosting purposes in the cloud.
• Cloud storage services provide applications to access the storage from local computing devices. Once the corresponding app is installed on a local device such as a PC, tablet or mobile phone, it provides a folder that is always synced to the cloud storage. Any folder or file which is moved into that folder gets automatically synced to the cloud.
• Dropbox
• Dropbox is a file hosting service offered by Dropbox Inc., United States.
• Provides client applications through which the storage can be accessed from different personal devices.
• Accessible from Windows, Mac and Linux using desktop applications, and also from Android, iOS and BlackBerry platforms using mobile apps.
• Dropbox uses Amazon’s S3 storage system to store the files.

Popular General Purpose Cloud Storage (Unmanaged Storage)

• Google Drive
• Provides a file sharing facility with other users.
• Currently provides 15 GB of free storage space for users.
• Has the benefit of a built-in office suite where one can edit documents, spreadsheets and presentations.
• Google Drive client software is available for desktop platforms such as Windows and Mac as well as mobile platforms such as iOS and Android.
• Google Drive storage is for general users, whereas Google Cloud Storage is meant for developers.

Popular General Purpose Cloud Storage (Unmanaged Storage)

• OneDrive (Microsoft)
• Apart from file storage, OneDrive offers facilities for document creation and collaboration.
• OneDrive’s biggest strength is that it works closely with Microsoft Office apps such as Word, Excel and PowerPoint.

• Amazon Cloud Drive
• Amazon Cloud Drive is a file hosting service offered by Amazon.
• The storage can be accessed from different mobile devices and computers. It currently provides 5 GB of free storage space to consumers.

Summary
• Cloud computing is synonymous with high-performance computing. Hence, the file system and file processing characteristics of high-performance computing environments also apply in cloud computing.
• Efficient processing of large data-sets is critical for the success of high-performance computing systems.
• High-performance processing of large data-sets requires parallel execution over partitioned data across distributed computing nodes. This must be enabled by suitable data processing programming models and supporting file systems.
• Among the various file systems that support high-performance processing of data, the Google File System (GFS) is considered the pioneer.
• Storage in the cloud is delivered in two categories: for general users and for developers. Storage for general users is delivered as SaaS, while storage for developers is delivered as IaaS.
• For general users, the cloud provides ready-to-use storage which is managed by the provider. Hence, users can use the storage directly without worrying about any kind of storage administration. Such storage is known as the ‘unmanaged’ storage type.
• Managed storage is raw storage which is built to be managed by the users themselves. Computing developers use this kind of storage.

Thank You

800 MyHCT (800 69428) www.hct.ac.ae
