Unit 3

Cloud Computing

"Storage in Cloud"

What is Cloud Storage


• Cloud storage is a digital storage solution that utilizes multiple
servers to store data in logical pools.
• Organizations buy storage capacity from providers to store
user, organization, or application data.
Benefits:
• Security: Backups are distributed across multiple servers and are
better protected from data loss or hacking.
• Accessibility: The stored data is accessible online regardless of
location.
Examples:
• Amazon S3: It enables file storage across multiple servers and offers file
encryption; data can also be shared publicly.
• Google Cloud: It offers unlimited storage space and the ability
to resume a file transfer after a failure.


Types of Cloud Storage


There are four types of cloud storage, as detailed below:
• Personal Cloud Storage
It is a subset of public cloud storage that stores an individual's data in
the cloud and provides the individual with access to the data
from anywhere. It also enables data sharing across multiple devices.
An example of personal cloud storage is Apple iCloud.
• Public Cloud Storage
The cloud-storage provider fully manages the enterprise's public cloud
storage.
• Private Cloud Storage
The cloud-storage provider's infrastructure is integrated into the
enterprise's data center.
Private cloud storage helps resolve potential security and
performance concerns while still offering the advantages of cloud
storage.
• Hybrid Cloud Storage
A combination of public and private cloud storage, where critical data
resides in the private cloud while less sensitive data is stored in the
public cloud.

Cloud Storage Providers

Free Cloud Storage
Google Drive
Google is one of the giants in cloud-storage. It offers:
• Free Data Storage up to 15GB – Google Drive is one of the most
generous cloud offerings. Google storage space is also shared with
other Google services including Gmail and Google Photos. Mobile apps
are also available for easy access for iOS and Android users.
• G Suite Tools – Includes online office tools for word processing,
spreadsheets and presentations, which make sharing files with others
effortless.

OneDrive
• OneDrive is particularly suited to Microsoft Windows users. It
allows 5GB of free data storage and integrates tightly with
Microsoft products.
• Files can be edited without downloading them. File sharing in OneDrive
is possible even with users who aren't OneDrive users.

Dropbox
• It has great support for third-party apps, with a web interface
that remains streamlined and easy to use.
• Dropbox offers 2GB of storage space for new users. However, there are
ways to boost this space without paying, such as inviting
friends (500MB per referral), completing the getting-started guide
(250MB), etc.
• There are desktop apps for Windows, Linux and Mac, and mobile apps
for Android, iOS and even Kindle.
• The web version lets you edit files without needing to download
them.
Business Cloud Storage
SpiderOak
• Founded in 2007, SpiderOak is a collaboration tool, file-
hosting and online backup service. It allows users to access,
synchronize and share data using a cloud-based server.
• The main focus of SpiderOak is privacy and security.
• The tool has a very basic design, which keeps the admin console
simple and easy to use.

Tresorit
• Founded in 2011, Tresorit is a cloud storage provider based in
Hungary and Switzerland. It emphasizes enhanced security and
data encryption for businesses and personal users.
• It allows you to keep control of your files through 'zero-knowledge
encryption', which means only you and the chosen few you decide to
share with can see your data.

Egnyte
• Founded in 2007, Egnyte provides software for enterprise file
synchronization and sharing. It allows businesses to store their data
locally and online.


The advantages of Cloud Storage include:

• File Accessibility – Files can be accessed at any time from any
place, so long as you have Internet access.

• Offsite Backup – Cloud storage provides organizations with offsite
(remote) backups of data, which in turn reduces costs.

• Effective Use of Bandwidth – Cloud storage uses bandwidth
effectively; i.e., instead of sending files to recipients, a web link can be
sent through email.

• Security of Data – Helps protect data against ransomware
or malware, as it is secured and requires proper authentication to access
the stored data.


Disadvantages of Cloud Storage

• Dependency on Internet Speed – If the Internet connection is slow
or unstable, we might have problems accessing or sharing the files.

• Dependency on a Third Party – A third-party service provider
(company) is responsible for the stored data, so it is an
important prerequisite to examine the provider's security standards
before selecting a vendor and investing.

• High Cost for Huge Data – Organizations that require a large
amount of storage may find costs increase significantly after
the first few gigabytes of data stored.

• No/Minimal Control over Data Storage Framework – Since the
cloud-storage framework is entirely managed and monitored by the
service provider, the customer has minimal control over it.

"Big Data in Cloud"
Characteristics of big data
 Volume
The key characteristic of big data is its scale—the volume of data that is
available for collection by your enterprise from a variety of devices and sources.
 Variety
Variety refers to the formats that data comes in, such as email messages, audio
files, videos, sensor data, and more. Classifications of big data variety include
structured, semi-structured, and unstructured data.
 Velocity
Big data velocity refers to the speed at which large datasets are acquired,
processed, and accessed.
 Variability
Big data variability means the meaning of the data constantly changes.
Therefore, before big data can be analyzed, the context and meaning of the
datasets must be properly understood.

Cloud Computing and Big Data
• In cloud computing, all data is gathered in data centers and
then distributed to end-users. Automatic backups
and recovery of data are also ensured for business continuity; all
such resources are available in the cloud.

• We do not know the exact physical location of the resources
provided to us. You just need dumb terminals like desktops,
laptops, phones, etc. and a network connection.

There are multiple ways to access the cloud:

1. Applications or Software as a Service (SaaS), e.g.
Salesforce.com, Dropbox, Google Drive, etc.
2. Platform as a Service (PaaS)
3. Infrastructure as a Service (IaaS)

Cloud for Big Data
Below are some examples of how cloud applications are used for Big Data:
 IaaS in a public cloud: Using a cloud provider's infrastructure for Big Data services
gives access to almost limitless storage and compute power. IaaS can be utilized by
enterprise customers to create cost-effective and easily scalable IT solutions, with the cloud
provider bearing the complexities and expenses of managing the underlying hardware.

 PaaS in a private cloud: PaaS vendors are beginning to incorporate Big Data
technologies such as Hadoop and MapReduce into their PaaS offerings, which eliminates the
need to deal with the complexities of managing individual software and hardware elements.
For example, web developers can use individual PaaS environments at every stage of
development, testing and ultimately hosting their websites.
However, businesses that are developing their own internal software can also utilize
Platform as a Service, particularly to create distinct ring-fenced development and testing
environments.

 SaaS in a hybrid cloud: Many organizations feel the need to analyze the customer's
voice, especially on social media. SaaS vendors provide the platform for the analysis as
well as the social media data.
Office software is the best example of businesses utilizing SaaS. Tasks related to
accounting, sales, invoicing, and planning can all be performed through SaaS. Businesses
may wish to use one piece of software that performs all of these tasks, or several that each
perform a different task.

• Providers in the Big Data Cloud Market
 Infrastructure as a Service cloud computing companies:
Amazon's offerings include S3 (data storage/file system), SimpleDB (non-
relational database) and EC2 (computing servers).
Rackspace's offerings include Cloud Drive (data storage/file system), Cloud Sites (website
hosting on the cloud) and Cloud Servers (computing servers).
IBM's offerings include Smart Business Storage Cloud and Computing on
Demand (CoD).
AT&T provides Synaptic Storage and Synaptic Compute as a Service.

 Platform as a Service cloud computing companies
Google's App Engine is a development platform built upon Python and
Java.
Force.com provides a development platform that is based upon Apex.
Microsoft Azure provides a development platform based upon .NET.

 Software as a Service companies
In SaaS, Google provides a space that includes Google Docs, Gmail, Google Calendar
and Picasa.
IBM provides LotusLive iNotes, a web-based email service offering messaging and
calendaring capabilities to business users.
"Virtual Data Center"

What is a Virtual Data Center?


• A virtual data center offers the capabilities of a traditional data
center, but using cloud-based resources instead of physical
resources. It provides an organization with the ability to deploy
additional infrastructure resources on demand without acquiring,
deploying, configuring, and maintaining physical appliances.
• A Virtual Data Centre (VDC) is a fully managed, self-service
Infrastructure as a Service (IaaS) private cloud solution.
• Providing multiple levels of security that adhere to Cloud Security
Principles by design, it is a flexible, automated and scalable cloud
computing platform.

"Cloud file systems"

Cloud file systems: GFS and HDFS


• Google File System (GFS) is a scalable distributed file system (DFS)
created by Google Inc. and developed to accommodate Google’s
expanding data processing requirements.
• GFS provides fault tolerance, reliability, scalability, availability and
performance to large networks and connected nodes.
• GFS is made up of several storage systems built from low-cost
commodity hardware components.
• It is optimized to accommodate Google's different data use and storage
needs, such as its search engine, which generates huge amounts of
data that must be stored.
• The Google File System capitalized on the strength of off-the-shelf
servers while minimizing hardware weaknesses.
• GFS is also known as GoogleFS.

• Stored data is divided into large chunks (64 MB), which are replicated
across the network a minimum of three times. The large chunk size
reduces network overhead.
• GFS is designed to accommodate Google's large cluster requirements
without burdening applications. Files are stored in hierarchical
directories identified by path names. Metadata - such as namespace,
access control data, and mapping information - is controlled by the
master, which interacts with and monitors the status updates of each
chunk server through timed heartbeat messages.
GFS features include:
 Fault tolerance
 Critical data replication
 Automatic and efficient data recovery
 High aggregate throughput
 Reduced client and master interaction because of the large chunk size
 Namespace management and locking
 High availability
• The largest GFS clusters have more than 1,000 nodes with 300 TB of
disk storage capacity.
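As a rough illustration of the arithmetic above (64 MB chunks, threefold replication), the sketch below computes how many chunks a file occupies and the raw storage it consumes. The constants come from the slide; real deployments may tune both.

```python
# Illustration of GFS-style chunking: 64 MB chunks, replicated 3 times.
# Constants taken from the text above; real clusters may differ.
CHUNK_SIZE = 64 * 1024 * 1024   # 64 MB per chunk
REPLICAS = 3                    # minimum replication factor

def chunk_count(file_size: int) -> int:
    """Number of fixed-size chunks needed to hold file_size bytes."""
    return -(-file_size // CHUNK_SIZE)  # ceiling division

def raw_storage(file_size: int) -> int:
    """Total bytes consumed across the cluster once every chunk is replicated."""
    return chunk_count(file_size) * CHUNK_SIZE * REPLICAS

# A 1 GB file occupies 16 chunks, i.e. 3 GB of raw storage with replication.
one_gb = 1024 * 1024 * 1024
print(chunk_count(one_gb))            # 16
print(raw_storage(one_gb) // one_gb)  # 3
```

Note that even a 1-byte file still occupies one full chunk slot, which is why GFS targets large files.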

What is HDFS
Hadoop comes with a distributed file system called HDFS.
• In HDFS, data is distributed over several machines and
replicated to ensure durability against failure and high availability to
parallel applications.
• It is cost-effective, as it uses commodity hardware. It involves the
concepts of blocks, data nodes and the name node.
Where to use HDFS
• Very Large Files: Files should be of hundreds of megabytes,
gigabytes or more.
• Streaming Data Access: The time to read the whole data set is more
important than the latency in reading the first record. HDFS is built on a
write-once, read-many-times pattern.
• Commodity Hardware: It works on low-cost hardware.
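The replication factor and block size discussed in this section are ordinary Hadoop settings. A minimal, illustrative `hdfs-site.xml` fragment might look like this (the property names `dfs.replication` and `dfs.blocksize` are real Hadoop keys; the values shown simply restate the defaults):

```xml
<!-- hdfs-site.xml: example values only; these match the HDFS defaults -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>          <!-- each block stored on three data nodes -->
  </property>
  <property>
    <name>dfs.blocksize</name>
    <value>134217728</value>  <!-- 128 MB block size -->
  </property>
</configuration>
```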


• HDFS Concepts
Blocks: A block is the minimum amount of data that HDFS can read or
write. HDFS blocks are 128 MB by default, and this is configurable.
• Files in HDFS are broken into block-sized chunks, which are stored as
independent units.
Name Node: HDFS works in a master-worker pattern, where the name
node acts as the master.
• The name node is the controller and manager of HDFS, as it knows the
status and the metadata of all the files in HDFS;
• the metadata information being file permissions, names and the location
of each block.
• File system operations like opening, closing, renaming, etc. are
executed by it.
Data Node: Data nodes store and retrieve blocks when they are told to, by
the client or the name node.
• They report back to the name node periodically with the list of blocks
that they are storing.
• The data nodes, being commodity hardware, also do the work of block
creation, deletion, and replication as instructed by the name node.
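The name node's bookkeeping described above can be pictured with a small sketch: one map from file names to block IDs, and one from block IDs to the data nodes that reported a replica. This is purely illustrative; real HDFS metadata also tracks permissions, timestamps, leases, and more.

```python
# Toy sketch of name-node metadata (illustrative only, not the HDFS API).
from collections import defaultdict

class ToyNameNode:
    def __init__(self):
        self.file_blocks = {}                     # filename -> [block_id, ...]
        self.block_locations = defaultdict(list)  # block_id -> [datanode, ...]

    def add_file(self, name, block_ids):
        """Record which blocks make up a file."""
        self.file_blocks[name] = list(block_ids)

    def report_block(self, datanode, block_id):
        """Data nodes periodically report the blocks they store."""
        self.block_locations[block_id].append(datanode)

    def locate(self, name):
        """For each block of a file, list the data nodes holding a replica."""
        return [self.block_locations[b] for b in self.file_blocks[name]]

nn = ToyNameNode()
nn.add_file("/logs/app.log", ["blk_1", "blk_2"])
nn.report_block("dn-a", "blk_1")
nn.report_block("dn-b", "blk_1")
nn.report_block("dn-b", "blk_2")
print(nn.locate("/logs/app.log"))  # [['dn-a', 'dn-b'], ['dn-b']]
```

The block reports here mirror the periodic reports data nodes send to the name node in the slide above.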

• As we can see, HDFS focuses on NameNodes and DataNodes. The
NameNode is hardware that contains the GNU/Linux operating
system and the NameNode software. It acts as the
master server and can manage the files, control a client's access to
files, and oversee file operations such as renaming, opening,
and closing files.
• A DataNode is hardware having the GNU/Linux operating system and
the DataNode software. For every node in an HDFS cluster, you will find a
DataNode. These nodes help control the data storage of the
system, as they can perform operations on the file system if the client
requests, and also create, replicate, and delete blocks when the
NameNode instructs.
• The meaning and purpose of HDFS is to achieve the following goals:
• Manage large datasets - Organizing and storing datasets can be a
hard task to handle. HDFS is used to manage applications that
have to deal with huge datasets. To do this, HDFS can have
hundreds of nodes per cluster.
• Detecting faults - HDFS should have technology in place to scan and
detect faults quickly and effectively, as it includes a large number of
commodity hardware components on which failures are expected.

• How to use HDFS

• So, how do you use HDFS? HDFS works with a main NameNode
and multiple DataNodes, all on a commodity hardware cluster.
These nodes are organized in the same place within the data center.
Next, data is broken down into blocks which are distributed among the
multiple DataNodes for storage. To reduce the chances of data loss,
blocks are often replicated across nodes, as a backup should
data be lost.
• Let's look at NameNodes. The NameNode is the node within the
cluster that knows what the data contains, what block it belongs to,
the block size, and where it should go. NameNodes are also used to
control access to files, including when someone can write, read, create,
remove, and replicate data across the various data nodes.
• The cluster can also be adapted where necessary in real time,
depending on the server capacity, which can be useful when there is a
surge in data. Nodes can be added or taken away when necessary.
• Now, onto DataNodes. DataNodes are in constant communication with
the NameNode to identify whether they need to commence and
complete a task. This stream of consistent collaboration means that
the NameNode is always aware of each DataNode's status.

• When a DataNode is found not to be operating the way it should,
the NameNode is able to automatically re-assign that task to another
functioning node holding the same data block. Similarly, DataNodes are
able to communicate with each other, which means they can
collaborate during standard file operations. Because the NameNode is
aware of the DataNodes and their performance, they're crucial to
maintaining the system.
• Data blocks are replicated across multiple DataNodes and tracked by
the NameNode.
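The replication just described can be sketched as a simple placement policy: each block is assigned to a fixed number of distinct data nodes. The round-robin scheme below is an illustration only; real HDFS uses a rack-aware placement policy.

```python
# Illustrative round-robin block placement: each block is assigned to
# `replicas` distinct data nodes. HDFS's real policy is rack-aware;
# this sketch only shows the replication idea.
def place_blocks(block_ids, datanodes, replicas=3):
    placement = {}
    n = len(datanodes)
    for i, block in enumerate(block_ids):
        # Pick `replicas` consecutive nodes, wrapping around the cluster.
        placement[block] = [datanodes[(i + j) % n] for j in range(replicas)]
    return placement

nodes = ["dn-1", "dn-2", "dn-3", "dn-4"]
plan = place_blocks(["blk_1", "blk_2"], nodes)
print(plan["blk_1"])  # ['dn-1', 'dn-2', 'dn-3']
print(plan["blk_2"])  # ['dn-2', 'dn-3', 'dn-4']
```

If any single node fails, every block it held still has two replicas elsewhere, which is exactly why the NameNode can re-assign work as described above.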
• To use HDFS, you need to install and set up a Hadoop cluster. This can
be a single-node setup, which is more appropriate for first-time users,
or a cluster setup for large, distributed clusters. You then need to
familiarize yourself with HDFS commands, such as those below, to
operate and manage your system.

Command - Description
rm - Removes a file or directory
ls - Lists files with permissions and other details
mkdir - Creates a directory named path in HDFS
cat - Shows the contents of a file
rmdir - Deletes a directory
put - Uploads a file or folder from the local disk to HDFS
rmr - Deletes the file identified by path, or a folder and its subfolders
get - Moves a file or folder from HDFS to the local file system
count - Counts the number of files, number of directories, and file size
df - Shows free space
getmerge - Merges multiple files in HDFS
chmod - Changes file permissions
copyToLocal - Copies files to the local system
stat - Prints statistics about the file or directory
head - Displays the first kilobyte of a file
usage - Returns the help for an individual command
chown - Allocates a new owner and group to a file