Cloud Storage
SEMINAR PAPER
Subject: English Language
Student: Čehić Amila
Mentor:
Abstract
Computer data storage, often called memory or storage, refers to computer hardware components and
recording media that retain digital data used for computing for some interval of time. Computer data
storage provides one of the core functions of the modern computer.
For some computer owners, finding enough storage space to hold all the data they've acquired is a real
challenge. Some people buy larger hard drives. Others prefer external storage devices such as
thumb drives, compact discs or even NAS drives. Desperate computer owners might delete entire folders
of old files to make space for new data. On the other hand, some people are
choosing to rely on a new, fast-growing trend: cloud storage.
With the introduction of cloud storage and cloud servers, it has become easier than ever to back up all our
important computer files online. We now have the flexibility of accessing our files from anywhere
in the world in just a few mouse clicks, with the benefit of knowing that our important pictures,
videos, music and other files are stored securely and available to us 24 hours a day, 7 days a week.
This seminar paper gives a brief overview of cloud storage: its capabilities, its
structure, the companies that provide these services, its accessibility and so on.
Keywords
Cloud Storage
API
SCSI
COST
PERFORMANCE
GOOGLE
CLIENT
SERVER
INTERNET
1. What is Cloud Storage?
Cloud Storage is a technology that allows you to save files in storage and then access those files via the
Cloud.
Storage is a computer’s ability to save files and other resources for later use. When you restart a computer,
all saved files are still available once the computer is back on. Such storage commonly consists of a
physical hard drive installed in your computer, a USB
flash drive, or another type of drive.1
1
https://electricalfundablog.com/cloud-storage-architecture-types/?fbclid=IwAR29XJpTgboY-
s294ssCKZJe3udiNjxyd--oq1VZAGbNR1e_nesLT13SoEw
Because local data drives can be damaged or stolen, the idea arose of using data drives over a
network as storage. This allows the drives to be secured in a data centre and backed up automatically.
Network storage requires a fast local area network (LAN), but today we also have a ubiquitous network called the
Internet.
The second part of Cloud Storage is the Cloud, which represents the Internet. Any service, including storage,
that is available over the Internet is called a Cloud service. For example, if you use Gmail, it is email in the
Cloud; if you use Spotify, all the music is stored in the Cloud and streamed to your device. 2
2
https://developer.ibm.com/depmodels/cloud/articles/cl-cloudstorage/?
fbclid=IwAR1KV2xD0IksjYbfJxQED6TFVFX-azw1pvwxpcvRUtDnlgewkiuV8VR0m-4
2. How Cloud Storage Works
Most computer users store data, and some acquire so much of it that their computer
becomes a “mini-library” of sorts. For those users, finding enough storage space to hold all the data
they’ve accumulated can seem like mission impossible.
What Cloud Storage really is boils down to this: it is saving data to an off-site storage system maintained
by a third party. Rather than storing information on your computer’s hard drive or another local storage
device, you save it to a remote database. The Internet provides the connection between your computer and
the database.3
3
https://aws.amazon.com/what-is-cloud-storage/
2.1. Cloud Storage architecture
Cloud Storage architectures are primarily about delivering storage on demand in a highly scalable and
multi-tenant way. Generically, Cloud Storage architectures consist of a front end that exports an API to access the
storage. In traditional storage systems, this API is the SCSI protocol; but in the cloud, these protocols are
evolving. There, you can find Web service front ends, file-based front ends, and even more traditional
front ends (such as Internet SCSI, or iSCSI). Behind the front end is a layer of middleware that I call the
storage logic. This layer implements a variety of features, such as replication and data reduction, over the
traditional data-placement algorithms (with consideration for geographic placement). Finally, the back
end implements the physical storage for data. This may be an internal protocol that implements specific
features or a traditional back end to the physical disks. 4
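The three layers described above (front end, storage logic, back end) can be pictured with a small sketch. This is only an illustration under simplifying assumptions: the class names FrontEnd, StorageLogic and BackEnd are hypothetical, a Python dictionary stands in for the physical disks, and zlib compression stands in for data reduction. It is not how any particular provider implements its stack.

```python
import zlib


class BackEnd:
    """Back end: physical storage. A dict stands in for the disks (assumption)."""

    def __init__(self):
        self.blocks = {}

    def write(self, key, data):
        self.blocks[key] = data

    def read(self, key):
        return self.blocks[key]


class StorageLogic:
    """Middleware: data reduction plus replication across several back ends."""

    def __init__(self, back_ends):
        self.back_ends = back_ends

    def put(self, key, data):
        reduced = zlib.compress(data)      # data reduction
        for back_end in self.back_ends:    # replication to every back end
            back_end.write(key, reduced)

    def get(self, key):
        # Any replica that still holds the key can serve the read.
        for back_end in self.back_ends:
            if key in back_end.blocks:
                return zlib.decompress(back_end.read(key))
        raise KeyError(key)


class FrontEnd:
    """Front end: the API that clients call."""

    def __init__(self, logic):
        self.logic = logic

    def upload(self, name, data):
        self.logic.put(name, data)

    def download(self, name):
        return self.logic.get(name)


replicas = [BackEnd(), BackEnd()]
cloud = FrontEnd(StorageLogic(replicas))
cloud.upload("report.txt", b"cloud storage " * 100)
del replicas[0].blocks["report.txt"]       # simulate losing one replica
restored = cloud.download("report.txt")
```

The client only ever talks to the front end; replication and data reduction stay hidden in the middle layer, which is why a lost replica does not affect the download.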
Picture 5 shows some of the characteristics of current Cloud Storage architectures. Note that
no characteristic is exclusive to a particular layer; they serve as a guide to the specific topics that this
paper addresses. These characteristics are defined in Table 1.
4
https://developer.ibm.com/depmodels/cloud/articles/cl-cloudstorage/?
fbclid=IwAR1KV2xD0IksjYbfJxQED6TFVFX-azw1pvwxpcvRUtDnlgewkiuV8VR0m-4
Table 1.
Characteristic       Description
Manageability        The ability to manage a system with minimal resources
Access method        Protocol through which cloud storage is exposed
Performance          Performance as measured by bandwidth and latency
Multi-tenancy        Support for multiple users
Scalability          Ability to scale to meet higher demands or load in a graceful manner
Data availability    Measure of a system’s uptime
Control              Ability to control a system, in particular to configure for cost, performance, or other characteristics
Storage efficiency   Measure of how efficiently the raw storage is used
Cost                 Measure of the cost of the storage (commonly in dollars per gigabyte)
2.1.1. Manageability
One key focus of Cloud Storage is cost. If a client can buy and manage storage locally for less than the cost of
leasing it in the cloud, the cloud storage market disappears. But cost can be divided into two high-level
categories: the cost of the physical storage ecosystem itself and the cost of managing it.
The management cost is hidden but represents a long-term component of the overall cost.
For this reason, Cloud Storage must be self-managing to a large extent. The ability to introduce new
storage where the system automatically self-configures to accommodate it and the ability to find and self-
heal in the presence of errors are critical. Concepts such as autonomic computing will have a key role in
Cloud Storage architectures in the future. 5
5
https://electricalfundablog.com/cloud-storage-architecture-types/?fbclid=IwAR29XJpTgboY-
s294ssCKZJe3udiNjxyd--oq1VZAGbNR1e_nesLT13SoEw
2.1.2. Access method
One of the most striking differences between Cloud Storage and traditional storage is the means by which
it’s accessed. Most providers implement multiple access methods, but Web service APIs are common.
Many of the APIs are implemented based on REST principles, which imply an object-based scheme
developed on top of HTTP. REST APIs are stateless and therefore simple and efficient to provide. Many
Cloud Storage providers implement REST APIs, including Amazon Simple Storage Service (Amazon
S3), Windows Azure, and Mezeo Cloud Storage Platform.
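To make the idea of an object-based scheme on top of HTTP more concrete, here is a minimal sketch of how HTTP methods can map to object operations. It is an assumption-laden illustration: the ObjectStore class and its handle method are hypothetical, no real network is involved, and this is not the API of Amazon S3 or any other provider. The status codes are the standard HTTP ones (200, 201, 204, 404, 405).

```python
class ObjectStore:
    """In-memory stand-in for a REST-style object store (hypothetical)."""

    def __init__(self):
        self.objects = {}  # path -> object bytes

    def handle(self, method, path, body=None):
        """Map an HTTP-style request to an object operation.

        Returns a (status_code, body) tuple. Every request carries all the
        information needed to serve it, so no per-client session is kept;
        this is what "stateless" means here.
        """
        if method == "PUT":
            created = path not in self.objects
            self.objects[path] = body
            return (201 if created else 200), b""
        if method == "GET":
            if path in self.objects:
                return 200, self.objects[path]
            return 404, b""
        if method == "DELETE":
            if path in self.objects:
                del self.objects[path]
                return 204, b""
            return 404, b""
        return 405, b""  # method not allowed


object_store = ObjectStore()
first_put = object_store.handle("PUT", "/photos/cat.jpg", b"meow")
second_put = object_store.handle("PUT", "/photos/cat.jpg", b"meow")
fetched = object_store.handle("GET", "/photos/cat.jpg")
deleted = object_store.handle("DELETE", "/photos/cat.jpg")
missing = object_store.handle("GET", "/photos/cat.jpg")
```

Each object is addressed by its path alone, and the verb says what to do with it, which is the sense in which the scheme is object-based.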
One problem with Web service APIs is that they require integration with an application to take
advantage of the cloud storage. Therefore, common access methods are also used with Cloud Storage to
provide immediate integration. For example, file-based protocols such as NFS/Common Internet File
System (CIFS) or FTP are used, as are block-based protocols such as iSCSI. Cloud Storage providers such as
Six Degrees, Zetta, and Cleversafe provide these access methods.
6
https://developer.ibm.com/depmodels/cloud/articles/cl-cloudstorage/?
fbclid=IwAR1KV2xD0IksjYbfJxQED6TFVFX-azw1pvwxpcvRUtDnlgewkiuV8VR0m-4
Although the protocols mentioned above are the most common, other protocols are suitable for Cloud
Storage. One of the most interesting is Web-based Distributed Authoring and Versioning (WebDAV).
WebDAV is also based on HTTP and enables the Web to act as a readable and writable resource. Providers of
WebDAV include Zetta and Cleversafe, among others.
You can also find solutions that support multi-protocol access. For example, IBM Smart Business Storage
Cloud enables both file-based (NFS and CIFS) and SAN-based protocols from the same
storage-virtualization infrastructure.
2.1.3. Performance
There are many aspects to performance, but the ability to move data between a user and a remote Cloud
Storage provider represents the largest challenge to Cloud Storage. The problem, which is also the
workhorse of the Internet, is TCP. TCP controls the flow of data based on packet acknowledgements from
the peer endpoint. Packet loss, or late arrival, triggers congestion control, which further limits
performance to avoid more global networking issues. TCP is ideal for moving small amounts of data
through the global Internet but is less suitable for larger data movement, particularly as round-trip time
(RTT) increases.7
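The effect of RTT on TCP can be illustrated with simple arithmetic. Because the sender must wait for acknowledgements, a single connection can have at most one window of data in flight per round trip, so its throughput is bounded by window size divided by RTT. The sketch below assumes a 64 KiB window (an illustrative value, not a figure from the sources) and ignores packet loss, which would lower the numbers further.

```python
def max_tcp_throughput_mbps(window_bytes, rtt_seconds):
    """Upper bound on one connection's throughput: one window per round trip."""
    return window_bytes * 8 / rtt_seconds / 1_000_000


WINDOW_BYTES = 64 * 1024  # assumed window size, for illustration only

for rtt_ms in (1, 10, 100, 300):
    mbps = max_tcp_throughput_mbps(WINDOW_BYTES, rtt_ms / 1000)
    print(f"RTT {rtt_ms:>3} ms -> at most {mbps:8.2f} Mbit/s")
```

With the window held fixed, a tenfold increase in RTT cuts the bound tenfold, regardless of how much bandwidth the link itself offers.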
7
https://developer.ibm.com/depmodels/cloud/articles/cl-cloudstorage/?
fbclid=IwAR1KV2xD0IksjYbfJxQED6TFVFX-azw1pvwxpcvRUtDnlgewkiuV8VR0m-4
Picture 8. Relationship between Client and Server using TCP
For example, instead of using TCP, Amazon uses FASP (Fast and Secure Protocol). FASP was
developed to accelerate bulk data movement in the face of large RTT and severe packet loss. The key is
the use of UDP, which is the partner transport protocol to TCP. UDP permits the host to manage
congestion, pushing this aspect into the application-layer protocol of FASP.
Using standard (non-accelerated) NICs, FASP efficiently uses the bandwidth available to the application
and removes the fundamental bottlenecks of conventional bulk data-transfer schemes. 8
8
https://public.csusm.edu/fangfang/Teaching/HTMmaterial/StudentProjectFall2011/Team3.pdf
2.1.4. Multi-tenancy
One key characteristic of Cloud Storage architectures is called multi-tenancy. This simply means that the
storage is used by many users. Multi-tenancy applies to many layers of the Cloud Storage stack, from the
application layer, where the storage namespace is segregated among users, to the storage layer, where
physical storage can be segregated for particular users or classes of users. Multi-tenancy even applies to
the networking infrastructure that connects users to the storage, to permit quality of service and the carving of
bandwidth for a particular user.
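Namespace segregation at the application layer can be sketched as follows. The MultiTenantStore class and the tenant names are hypothetical; the only point is that every object is keyed by tenant as well as by name, so two users can hold objects with the same name without seeing each other's data.

```python
class MultiTenantStore:
    """Toy store whose namespace is segregated per tenant (hypothetical)."""

    def __init__(self):
        self._data = {}  # (tenant_id, object_name) -> bytes

    def put(self, tenant_id, name, data):
        self._data[(tenant_id, name)] = data

    def get(self, tenant_id, name):
        # A tenant can only reach keys stored under its own identifier.
        return self._data[(tenant_id, name)]

    def list_objects(self, tenant_id):
        return sorted(name for (tenant, name) in self._data if tenant == tenant_id)


shared = MultiTenantStore()
shared.put("alice", "notes.txt", b"alice's notes")
shared.put("bob", "notes.txt", b"bob's notes")
shared.put("bob", "todo.txt", b"buy milk")
```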
2.1.5. Scalability
You can look at scalability in a number of ways, but it is the on-demand view of Cloud Storage that makes
it most appealing. The ability to scale storage needs (up and down) means improved cost for the user and
increased complexity for the Cloud Storage provider.
Scalability must be provided not only for the storage itself but also for the bandwidth to the storage. Another
key feature of Cloud Storage is geographic distribution of data (geographic scalability), allowing data to
be nearest the users over a set of Cloud Storage data centres. For read-only data, replication and
distribution are also possible. This is shown in the picture below.
Internally, a Cloud Storage infrastructure must be able to scale. Servers and storage must be capable of
resizing without impact to users. As discussed in the Manageability section, autonomic computing is a
requirement for Cloud Storage architectures.
2.1.6. Availability
Once a Cloud Storage provider has a user’s data, it must be able to provide that data back to the user upon
request. Given network outages, user errors, and other circumstances, this can be difficult to provide in a
reliable and deterministic way.
There are some interesting and novel schemes to address availability, such as information dispersal.
Cleversafe, a company that provides private cloud storage, uses the Information Dispersal Algorithm
(IDA) to enable greater availability of data in the face of physical failures and network outages. IDA,
which was first created for telecommunication systems by Michael Rabin, is an algorithm that allows
data to be sliced with Reed-Solomon codes for purposes of data reconstruction in the face of missing data.
Further, IDA allows you to configure the number of data slices, such that a given data object could be
carved into four slices with one tolerated failure or 20 slices with eight tolerated failures. Similar to
RAID, IDA permits the reconstruction of data from a subset of the original data, with some amount of
overhead for error codes (depending on the number of tolerated failures).
With the ability to slice data along with Cauchy Reed-Solomon correction codes, the slices can then be
distributed to geographically disparate sites for storage. For a number of slices (p) and a number of
tolerated failures (m), the stored data is p/(p–m) times the size of the original. So, in the case of the picture above,
with p = 4 and m = 1, the stored data is 4/3 of the original size, an overhead to the storage system of 33%.9
9
http://mrkve.etfos.hr/pred/orasje/ar/seminari/Matej%20An%C4%91eli%C4%87%20-%20Ra
%C4%8Dunalstvo%20u%20oblaku.pdf
The downside of IDA is that it is processing intensive without hardware acceleration. Replication is
another useful technique and is implemented by a variety of cloud storage providers. Although
replication introduces a large amount of overhead (100%), it’s simple and efficient to provide.
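The overhead figures quoted for information dispersal and for replication can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function names are made up for this example, it reads the ratio p/(p–m) from the text as stored size relative to original size, and it treats replication as keeping one extra full copy.

```python
from fractions import Fraction


def ida_expansion(p, m):
    """Stored size relative to the original for p slices tolerating m failures."""
    if not 0 <= m < p:
        raise ValueError("need 0 <= m < p")
    return Fraction(p, p - m)


def overhead_percent(expansion):
    """Extra storage beyond the original data, as a percentage."""
    return float((expansion - 1) * 100)


four_slices = overhead_percent(ida_expansion(4, 1))     # about 33%
twenty_slices = overhead_percent(ida_expansion(20, 8))  # about 67%
replication = overhead_percent(Fraction(2, 1))          # one extra copy: 100%

print(f"IDA p=4,  m=1: {four_slices:.1f}% overhead")
print(f"IDA p=20, m=8: {twenty_slices:.1f}% overhead")
print(f"Replication:   {replication:.1f}% overhead")
```

Under these assumptions, the 20-slice configuration survives eight lost slices for roughly two thirds extra storage, whereas one replica costs a full extra copy and covers the loss of a single copy.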
2.1.7. Control
A customer’s ability to control and manage how his or her data is stored and the costs associated with it is
important. Numerous cloud storage providers implement controls that give users greater control over their
costs.
Amazon implements Reduced Redundancy Storage (RRS) to provide users with a means of minimizing
overall storage costs. Data is replicated within the Amazon S3 infrastructure, but with RRS, the data is
replicated fewer times with the possibility for data loss. This is ideal for data that can be recreated or that
has copies that exist elsewhere.
2.1.8. Efficiency
Storage efficiency is an important characteristic of cloud storage infrastructures, particularly with their
focus on overall cost. The next section speaks to cost specifically, but this characteristic speaks more to
the efficient use of the available resources over their cost.
To make a storage system more efficient, more data must be stored. A common solution is data
reduction, whereby the source data is reduced to require less physical space. Two means to achieve this
include:
Compression – the reduction of data through encoding the data using a different representation.
De-duplication – the removal of any identical copies of data that may exist.
Although both methods are useful, compression involves processing (re-encoding the data into and out
of the infrastructure), whereas de-duplication involves calculating signatures of data to search for
duplicates.
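Both data-reduction techniques can be shown together in a short sketch. It makes several assumptions for the sake of illustration: the ReducingStore class is hypothetical, de-duplication is done on whole objects rather than on blocks, SHA-256 is used as the signature, and zlib is used for compression. Real systems may choose differently on every one of these points.

```python
import hashlib
import zlib


class ReducingStore:
    """Toy store that de-duplicates whole objects and compresses what it keeps."""

    def __init__(self):
        self.chunks = {}  # signature -> compressed bytes
        self.index = {}   # object name -> signature

    def put(self, name, data):
        # De-duplication: compute a signature and store the data only once.
        signature = hashlib.sha256(data).hexdigest()
        if signature not in self.chunks:
            # Compression: re-encode the data on the way in.
            self.chunks[signature] = zlib.compress(data)
        self.index[name] = signature

    def get(self, name):
        # Decompression: re-encode the data on the way out.
        return zlib.decompress(self.chunks[self.index[name]])

    def physical_bytes(self):
        return sum(len(chunk) for chunk in self.chunks.values())


document = b"quarterly report " * 200
reducer = ReducingStore()
reducer.put("report.txt", document)
reducer.put("copy-of-report.txt", document)  # identical content, stored once
```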
2.1.9. Cost
One of the most notable characteristics of cloud storage is the ability to reduce cost through its use.
This includes the cost of purchasing storage, the cost of powering it, the cost of repairing it (when
drives fail), as well as the cost of managing the storage. When viewing cloud storage from this
perspective (including SLAs and increasing storage efficiency), cloud storage can be beneficial in
certain use models.
An interesting peek inside a cloud storage solution is provided by a company called Backblaze.
Backblaze set out to build inexpensive storage for a cloud storage offering. A Backblaze POD (shelf of
storage) packs 67TB in a 4U enclosure for under US$8,000. This package consists of a 4U enclosure, a
motherboard, 4GB of DRAM, four SATA controllers, 45 1.5TB SATA hard disks, and two power
supplies. On the motherboard, Backblaze runs Linux® (with JFS as the file system) and GbE NICs as
the frontend using HTTPS and Apache Tomcat. Backblaze’s software includes de-duplication,
encryption, and RAID6 for data protection. Backblaze’s description of their POD (which shows you in
detail how to build your own) shows you the extent to which companies can cut the cost of storage,
making cloud storage a viable and cost-efficient option.
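Since Table 1 measures cost in dollars per gigabyte, the Backblaze figures can be turned into that unit. The calculation below is a rough sketch: it uses the US$8,000 upper bound and the 45 disks of 1.5TB quoted above, assumes decimal units (1TB = 1,000GB), and looks at raw capacity only, before any space is given up to RAID6 or the file system.

```python
DISKS = 45
TB_PER_DISK = 1.5
POD_PRICE_USD = 8000  # "under US$8,000", so this is an upper bound
GB_PER_TB = 1000      # decimal units assumed

raw_capacity_tb = DISKS * TB_PER_DISK
cost_per_gb = POD_PRICE_USD / (raw_capacity_tb * GB_PER_TB)

print(f"Raw capacity: {raw_capacity_tb} TB")
print(f"Hardware cost: under ${cost_per_gb:.3f} per GB")
```

The raw capacity comes out at 67.5TB, in line with the 67TB quoted for the POD, and the hardware cost is roughly 12 US cents per gigabyte under these assumptions.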
Thus far, we’ve talked primarily about cloud storage providers, but there are models for cloud storage
that allow users to maintain control over their data. Cloud storage has evolved into three categories, one
of which permits the merging of two categories for a cost-efficient and secure option.
Much of this paper has discussed public cloud storage providers, which present storage infrastructure
as a leasable commodity (both in terms of long-term or short-term storage and the networking
bandwidth used within the infrastructure). Private clouds use the concepts of public cloud storage but in
a form that can be securely embedded within a user’s firewall. Finally, hybrid cloud storage permits the
two models to merge, allowing policies to define which data must be maintained privately and which
can be secured within public clouds. 10
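The policy idea behind hybrid cloud storage can be sketched in a few lines. Everything here is hypothetical: the HybridStore class, the choose_tier function and the "confidential" flag are invented for illustration, and two dictionaries stand in for the private and public clouds. A real policy would usually consider more than a single flag.

```python
def choose_tier(metadata):
    """Policy: confidential data stays private, everything else may go public."""
    if metadata.get("confidential", False):
        return "private"
    return "public"


class HybridStore:
    """Toy hybrid store that routes each object according to the policy."""

    def __init__(self):
        self.tiers = {"private": {}, "public": {}}

    def put(self, name, data, **metadata):
        tier = choose_tier(metadata)
        self.tiers[tier][name] = data
        return tier


hybrid = HybridStore()
payroll_tier = hybrid.put("payroll.xlsx", b"salaries", confidential=True)
brochure_tier = hybrid.put("brochure.pdf", b"marketing")
```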
The cloud models are shown graphically in the picture above. Examples of public cloud storage providers
include Amazon (which offers storage as a service). Examples of private cloud storage providers
include IBM, Parascale, and Cleversafe (which build software and/or hardware for internal clouds).
Finally, hybrid cloud providers include Egnyte, among others. 11
10
https://repozitorij.unipu.hr/islandora/object/unipu%3A3948/datastream/PDF/view
11
https://public.csusm.edu/fangfang/Teaching/HTMmaterial/StudentProjectFall2011/Team3.pdf
3. Types of Cloud Storage
Picture 14. Types of Cloud Storage
Below are listed some of the top-rated Cloud Storage providers:
With up to 15GB of free data storage, Google Drive is one of the most generous cloud offerings.
Google storage space is shared with other Google services, including Gmail and Google
Photos. Mobile apps are also available for easy access for iOS and Android users.
Picture 15. Google drive logo
OneDrive is aimed particularly at Microsoft Windows users. It offers 5GB of free data storage and integrates well
with Microsoft products. Files can be edited without downloading them. Files in OneDrive can be shared
with other users even if they aren’t OneDrive users. 12
4.3. Dropbox
It has great support for third-party apps, with a web interface that remains streamlined and easy to
use. Dropbox offers 2GB of storage space for new users. However, there are ways of boosting this
space without paying, such as inviting friends (500MB per referral), completing the Getting Started guide
(250MB), etc.
12
https://developer.ibm.com/depmodels/cloud/articles/cl-cloudstorage/?
fbclid=IwAR1KV2xD0IksjYbfJxQED6TFVFX-azw1pvwxpcvRUtDnlgewkiuV8VR0m-4
There are desktop apps for Windows, Linux and Mac, and mobile apps including Android, iOS and even
Kindle. The web version lets you edit files without the need of downloading them.
13
https://developer.ibm.com/depmodels/cloud/articles/cl-cloudstorage/?
fbclid=IwAR1KV2xD0IksjYbfJxQED6TFVFX-azw1pvwxpcvRUtDnlgewkiuV8VR0m-4
Picture 18. SpiderOak logo
4.4.2. Tresorit
Founded in 2011, Tresorit is a cloud storage provider based in Hungary and Switzerland. It emphasizes
enhanced security and data encryption for businesses and personal users.
It allows you to keep control of your files through ‘zero-knowledge encryption’, which means that only you
and the chosen few you decide to share with can see your data.
4.4.3. Egnyte
Founded in 2007, Egnyte provides software for enterprise file synchronization and sharing. It allows
businesses to store their data locally and online.
14
https://developer.ibm.com/depmodels/cloud/articles/cl-cloudstorage/?
fbclid=IwAR1KV2xD0IksjYbfJxQED6TFVFX-azw1pvwxpcvRUtDnlgewkiuV8VR0m-4
It integrates with applications such as Office 365. This allows both remote and internal employees to
access the files with ease.
File Accessibility – The files can be accessed at any time from any place so long as you have
Internet access.
Offsite Backup – Cloud Storage provides organizations with offsite (remote) backups of data
which in turn reduces costs.
Effective Use of Bandwidth – Cloud storage uses the bandwidth effectively i.e. instead of
sending files to recipients, a web link can be sent through email.
Security of Data – Helps in protecting the data against ransomware or malware as it is secured
and needs proper authentication to access the stored data.
15
https://electricalfundablog.com/cloud-storage-architecture-types/?fbclid=IwAR29XJpTgboY-
s294ssCKZJe3udiNjxyd--oq1VZAGbNR1e_nesLT13SoEw
6. Conclusion
Cloud storage is an interesting evolution in storage models that redefines the ways that we construct,
access, and manage storage within an enterprise. Although cloud storage is predominantly a consumer
technology today, it is quickly evolving toward enterprise quality. Hybrid models of clouds will enable
enterprises to maintain their confidential data within a local data centre, while relegating less
confidential data to the cloud for cost savings and geographic protection.
7. Literature
Internet resources:
1. https://electricalfundablog.com/cloud-storage-architecture-types/?fbclid=IwAR29XJpTgboY-
s294ssCKZJe3udiNjxyd--oq1VZAGbNR1e_nesLT13SoEw
2. https://developer.ibm.com/depmodels/cloud/articles/cl-cloudstorage/?
fbclid=IwAR1KV2xD0IksjYbfJxQED6TFVFX-azw1pvwxpcvRUtDnlgewkiuV8VR0m-4
3. https://aws.amazon.com/what-is-cloud-storage/
4. https://public.csusm.edu/fangfang/Teaching/HTMmaterial/StudentProjectFall2011/Team3.pdf
5. https://repozitorij.unipu.hr/islandora/object/unipu%3A3948/datastream/PDF/view
6. http://mrkve.etfos.hr/pred/orasje/ar/seminari/Matej%20An%C4%91eli%C4%87%20-%20Ra
%C4%8Dunalstvo%20u%20oblaku.pdf