Unit II CC
Storage Area Network, Network Attached Storage, Data Storage Management, File
System, Cloud Data Stores, Using Grids for Data Storage. Cloud Storage: Data
Management, Provisioning Cloud storage, Data Intensive Technologies for Cloud
Computing. Cloud Storage from LANs to WANs: Cloud Characteristics, Distributed
Data Storage.
Since its inception, the concept of the enterprise cloud has evolved considerably over the
past two decades. The term “cloud computing” was brought into the mainstream by Amazon Web
Services in 2006–2007 as a way of renting virtualized computing resources in a highly
scalable environment.
Let’s recap. Cloud computing refers to a technology that enables users to access and
utilize IT resources, such as data storage, processing power, and software applications,
over the internet. Thanks to cloud services, IT infrastructure is no longer physically
located on a user’s device or local server but instead is hosted and managed in a remote
data center operated by a cloud provider. Moving to the cloud environment enables users to
scale and adjust their computing needs dynamically without investing in expensive hardware
or infrastructure.
Enterprise cloud computing is not as complex or confusing as it sounds. At its core, the
enterprise cloud is simply implementing cloud infrastructure in a business setting. It is an
industry-level approach to leveraging cloud computing technologies for the purpose of
achieving business objectives. It can be as simple as using a cloud-based data storage
system, or it can involve more complex tasks like automating processes and streamlining
operations. Whatever the use case may be, enterprise cloud solutions are designed to
bring an organization into the digital era with ease.
On the cloud journey of any enterprise or company, there will be challenges, especially
in the very first stages of implementation, but it is worth considering that the advantages
far outweigh the risks. Therefore, finding the right partners for cloud computing services
is of paramount importance. Currently, some of the renowned enterprise cloud service
providers are Amazon Web Services (AWS), Microsoft Azure, Google Cloud Platform
(GCP), IBM Cloud, Oracle Cloud Infrastructure (OCI), and more. Given the wide range of
options, business owners must select the best cloud vendor based on their needs as well as
the security, scalability, and functionality offered.
Enterprise data storage needs have expanded as data plays an increasingly valuable role
in business. A good data storage platform is essential to security and success: huge
amounts of data need to be stored safely, kept readily accessible, and made available for
business intelligence (BI) and analysis. From local storage solutions to cloud platforms,
there have never been more options for enterprise-grade data storage. Here’s a high-level
look at the different types available to businesses today.
Storage Types
There are many options available for storing data, and the main task is deciding which is
most appropriate for you. Each option suits different needs and usage criteria, e.g. CD,
punch card, Zip disk, DVD, Blu-ray disc, USB flash (jump) drive, hard drive, floppy
diskette, tape drive, SSD (solid-state drive), DAS (direct-attached storage), NAS
(network-attached storage), SAN (storage area network) and cloud storage. For example, we
are using USB flash drives for our students’ data management here at Network Walks Academy.
Advantages of DAS
Disadvantages of DAS
Advantages of NAS
Disadvantages of NAS
Advantages of SAN
Disadvantages of SAN
Cloud Storage
Cloud storage makes stored data accessible from any location with an internet
connection. It’s a reliable means of backup and can keep files available even in the event
of an on-premises hardware failure or physical disaster. Cloud storage providers often
store multiple copies of the data in multiple locations for reliability—if one data center has
a power outage, for example, the data remains safe elsewhere. Cloud storage is also
more scalable than on-premises server storage.
Businesses have three choices for cloud storage environments: public cloud, private
cloud, and hybrid cloud. A public cloud environment is completely managed by a cloud
service provider. A private cloud is managed by the business, often on premises and on
company servers. Hybrid cloud combines cloud environments in whatever way a business
needs.
Hybrid cloud is a competitive offering from cloud providers and offers businesses more
flexibility for storing their data. Hybrid cloud is a good choice for businesses that have
many different types of data and many workloads. Some data might need to stay on
premises, but some data might be best in an object pool in a public cloud, for example. A
hybrid environment provides more options.
Grid computing is defined as a distributed architecture of multiple computers
connected by networks that work together to accomplish a joint task. The participating
computers form a grid and interact to coordinate the jobs at hand.
Grid software (often called middleware) allows the computers to communicate and share
information about the portions of the subtasks each one is carrying out. As a result, the
computers can consolidate their partial results and deliver a combined output for the
assigned main task.
In grid computing, each computing task is broken into small fragments and distributed
across computing nodes for efficient execution. Each fragment is processed in parallel,
and, as a result, a complex task is accomplished in less time. Let’s consider this equation:
X = (4 x 7) + (3 x 9) + (2 x 5)
Typically, on a desktop computer, the steps needed here to calculate the value of X may
look like this:
● Step 1: X = 28 + (3 x 9) + (2 x 5)
● Step 2: X = 28 + 27 + (2 x 5)
● Step 3: X = 28 + 27 + 10
● Step 4: X = 65
However, in a grid computing setup, the steps are different as three processors or
computers calculate different pieces of the equation separately and combine them later.
The steps look like this:
● Step 1: X = 28 + 27 + 10
● Step 2: X = 65
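As seen above, grid computing collapses the sequential steps because the three
multiplications run at the same time on separate resources. This means fewer sequential
steps and a shorter overall time, although in practice some time is also spent distributing
the pieces and combining the results.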
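A minimal Python sketch of the two approaches to the equation above. Worker threads inside one process stand in for the three separate grid computers (an assumption made purely for illustration); the point is the split, compute-in-parallel, combine structure rather than real timing, since CPython threads do not speed up pure arithmetic.

```python
from concurrent.futures import ThreadPoolExecutor
from operator import mul

# The three independent sub-expressions of X = (4 x 7) + (3 x 9) + (2 x 5).
SUBTASKS = [(4, 7), (3, 9), (2, 5)]


def evaluate_sequentially(pairs):
    """Desktop-style evaluation: one multiplication after another."""
    total = 0
    for a, b in pairs:
        total += a * b
    return total


def evaluate_on_grid(pairs, nodes=3):
    """Grid-style evaluation: each 'node' (a worker thread here) computes one
    product at the same time, then the partial results are combined."""
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        partials = list(pool.map(lambda p: mul(*p), pairs))  # [28, 27, 10]
    return sum(partials)  # combine step: 28 + 27 + 10


if __name__ == "__main__":
    print(evaluate_sequentially(SUBTASKS))  # 65
    print(evaluate_on_grid(SUBTASKS))       # 65
```

Both functions return 65; only the order in which the work happens differs.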
Grid computing is enabled via an open set of standards and protocols such as open grid
services architecture (OGSA) that allow communication across heterogeneous systems
and environments that are geographically dispersed. With grid computing, organizations
can pool resources and computing power for compute-intensive tasks or share them across
networks to allow collaboration. Enterprises can thus make better use of their computing
resources regardless of where those resources are located.
Grid Computing Components
1. User interface
Today, users are familiar with web portals. They provide a single interface that allows
users to view a wide variety of information. Similarly, a grid portal offers an interface that
enables users to launch applications with resources provided by the grid.
The interface has a portal style to help users query and execute various functions on the
grid effectively. A grid user views a single, large virtual computer offering computing
resources, similar to an internet user who views a unified instance of content on the web.
2. Security
Security is one of the major concerns for grid computing environments. Security
mechanisms can include authentication, authorization, data encryption, and others. Grid
security infrastructure (GSI) is an important ingredient here. It outlines specifications that
establish confidential and tamper-proof communication between software entities operating in
a grid network.
It includes an OpenSSL implementation and provides a single sign-on mechanism for users
to perform actions within the grid. It offers robust security by providing authentication and
authorization mechanisms for system protection.
3. Scheduler
Once the resources have been identified, the next step is to schedule tasks to run on them.
A scheduler may not be needed if the tasks to be executed are standalone and have no
interdependencies. However, if you want to run specific tasks concurrently that require
inter-process communication, a job scheduler is needed to coordinate the execution of the
different subtasks.
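The core decision a job scheduler makes for interdependent subtasks can be sketched with Python's standard-library `graphlib.TopologicalSorter` (Python 3.9+). The job names and dependency graph below are hypothetical; the sketch only works out which subtasks may run concurrently ("waves") and leaves the actual dispatch to grid nodes out.

```python
from graphlib import TopologicalSorter

# Hypothetical job graph: each subtask maps to the set of subtasks it depends on.
JOBS = {
    "fetch_input": set(),
    "part_a": {"fetch_input"},
    "part_b": {"fetch_input"},
    "merge": {"part_a", "part_b"},
}


def schedule(jobs):
    """Group subtasks into waves. Every subtask in a wave has all of its
    dependencies satisfied by earlier waves, so the subtasks within one wave
    could be sent to different grid nodes at the same time."""
    sorter = TopologicalSorter(jobs)
    sorter.prepare()                          # raises CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())    # sorted only to make the output stable
        waves.append(ready)
        sorter.done(*ready)                   # here we simply assume the wave succeeded
    return waves


if __name__ == "__main__":
    print(schedule(JOBS))  # [['fetch_input'], ['part_a', 'part_b'], ['merge']]
```

Standalone tasks with no interdependencies all land in the first wave, which is why a scheduler adds little for them.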
4. Data management
Data management is crucial for grid environments. A secure and reliable mechanism is needed
to move data or application modules to the various nodes within the grid, or to make them
accessible from those nodes. Consider the Globus Toolkit, an open-source toolkit for grid
computing. It offers a data management component called Global Access to Secondary Storage
(GASS). It also includes GridFTP, which extends the standard FTP protocol and uses GSI for
user authentication and authorization. After authenticating once, the user can move files
with GridFTP without going through the login process at every node.
5. Workload and resource management
The workload and resource management component launches a job on a particular resource,
checks its status, and retrieves the results when the job is complete. Say a user wants to
execute an application on the grid. In that case, the application needs to know which
resources on the grid are available to take on the workload.
So, it interacts with the workload manager to determine resource availability, and the
manager updates the job status accordingly. This enables efficient workload and resource
management across the nodes of the grid.
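A minimal sketch of the workload and resource management behaviour described above: launch a job on a free resource, check its status, and retrieve the result. The `WorkloadManager` class, its method names and the node names are hypothetical, and local worker threads stand in for remote grid nodes.

```python
from concurrent.futures import ThreadPoolExecutor


class WorkloadManager:
    """Hypothetical workload/resource manager. It tracks which grid resources
    are free, launches a job on one of them, reports the job's status and
    returns its result. Worker threads stand in for remote grid nodes."""

    def __init__(self, resources):
        self.free = list(resources)            # resources not currently busy
        self.jobs = {}                         # job_id -> (resource, future)
        self._pool = ThreadPoolExecutor(max_workers=len(self.free))

    def available(self):
        """Resource availability, as queried by an application."""
        return list(self.free)

    def launch(self, job_id, fn, *args):
        """Start fn(*args) on a free resource and return that resource's name."""
        if not self.free:
            raise RuntimeError("no free resource; the job must wait")
        resource = self.free.pop(0)
        future = self._pool.submit(fn, *args)
        # Hand the resource back to the free list as soon as the job finishes.
        future.add_done_callback(lambda _f: self.free.append(resource))
        self.jobs[job_id] = (resource, future)
        return resource

    def status(self, job_id):
        _, future = self.jobs[job_id]
        return "done" if future.done() else "running"

    def result(self, job_id, timeout=None):
        """Block until the job completes, then return its output."""
        _, future = self.jobs[job_id]
        return future.result(timeout=timeout)

    def shutdown(self):
        self._pool.shutdown(wait=True)


if __name__ == "__main__":
    manager = WorkloadManager(["node-1", "node-2"])
    print(manager.launch("job-42", sum, [1, 2, 3]))  # node-1
    print(manager.result("job-42"))                  # 6
    print(manager.status("job-42"))                  # done
    manager.shutdown()
```

A real grid manager would also handle failed nodes, queues and remote execution; this sketch only shows the launch, status and result cycle.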
Grid computing is divided into several types based on its uses and the task at hand. Let’s
understand the types of grid computing with some examples.
Grid Computing Types
Computational grids account for the largest share of grid computing usage across
industries today, and this is expected to remain the case for years to come. A
computational grid comes into the picture when a task takes longer to execute on one
machine than is acceptable. In this case, the main task is split into multiple subtasks,
and each subtask is executed in parallel on a separate node. Upon completion, the results
of the subtasks are combined to get the main task’s result. By splitting the task into n
independent subtasks, the end result can be achieved up to about n times faster than when a
single machine executes the whole task; in practice the gain is smaller because time is
also spent distributing the work and combining the results.
Data grids refer to grids that split data onto multiple computers. Like computational grids
where computations are split, data grids enable placing data onto a network of computers
or storage. However, the grid presents them as a single virtual store despite the splitting. Data grid
computing allows several users to simultaneously access, change, or transfer distributed
data.
For instance, a data grid can be used as a large data store where each website stores its
own data on the grid. Here, the grid enables coordinated data sharing across all grid
users. Such a grid allows collaboration along with increased knowledge transfer between
grid users.
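A toy data grid can make the "split physically, one store logically" idea concrete. In this sketch (the `DataGrid` class, the node names and the hash-modulo placement rule are all illustrative assumptions), each key is hashed to decide which node stores it, while callers only ever use `put` and `get`.

```python
import hashlib


class DataGrid:
    """Toy data grid: records are physically spread over several storage
    nodes, yet callers see one logical key-value store. The node names and
    the hash-modulo placement rule are illustrative assumptions."""

    def __init__(self, node_names):
        self.nodes = {name: {} for name in node_names}   # node -> its local data
        self._order = sorted(self.nodes)

    def locate(self, key):
        """Decide which node holds a key. A stable hash (not Python's
        per-process randomised hash()) lets every participant agree."""
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return self._order[int.from_bytes(digest[:8], "big") % len(self._order)]

    def put(self, key, value):
        self.nodes[self.locate(key)][key] = value

    def get(self, key):
        return self.nodes[self.locate(key)][key]


if __name__ == "__main__":
    grid = DataGrid(["site-a", "site-b", "site-c"])
    for i in range(9):
        grid.put(f"record-{i}", i * i)
    print(grid.get("record-4"))                                 # 16
    print({name: len(store) for name, store in grid.nodes.items()})
```

Hash-modulo placement moves most keys when a node is added or removed; consistent hashing is a common alternative when the set of nodes changes often.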
A collaborative grid overcomes geographical barriers and enhances the work experience by
allowing remote individuals to work together. For example, with a collaborative grid, all
users can access and simultaneously work on text-based documents, graphics, design
files, and other work-related products.
Manuscript grid computing comes in handy when managing large volumes of image and
text blocks. This grid type allows the continuous accumulation of image and text blocks
while it processes and performs operations on previous block batches. It is a simple grid
computing framework where vast volumes of text or manuscripts and images are
processed in parallel.
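The "keep accumulating new blocks while earlier batches are processed" behaviour of a manuscript grid resembles a producer/consumer pipeline. This is a minimal single-machine sketch under assumptions: `process_batch` is a hypothetical stand-in for real image or text processing (it just upper-cases strings), and the batch size is arbitrary.

```python
import queue
import threading

BATCH_SIZE = 3          # hypothetical batch size
_STOP = object()        # sentinel marking the end of the input stream


def process_batch(batch):
    """Stand-in for OCR or image processing on one batch of blocks."""
    return [block.upper() for block in batch]


def run_pipeline(blocks):
    """Keep accepting new blocks (producer) while earlier batches are being
    processed (consumer), the overlap a manuscript grid relies on."""
    q = queue.Queue()
    results = []

    def consumer():
        batch = []
        while True:
            item = q.get()
            if item is _STOP:
                break
            batch.append(item)
            if len(batch) == BATCH_SIZE:
                results.extend(process_batch(batch))
                batch = []
        if batch:                       # flush a final, partial batch
            results.extend(process_batch(batch))

    worker = threading.Thread(target=consumer)
    worker.start()
    for block in blocks:                # accumulation continues meanwhile
        q.put(block)
    q.put(_STOP)
    worker.join()
    return results


if __name__ == "__main__":
    print(run_pipeline(["page-1", "page-2", "page-3", "page-4"]))
```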
Fundamentally, in a modular grid, a set of resources is combined with software for distinct
applications. For example, CPU and GPU modules may reside in a server rack chassis.
They can be interconnected with an auxiliary high-speed and low-latency fabric to create
a server configuration that is optimized for a particular application.
When applications are created, a set of computing resources and services is defined to
support them. Subsequently, when the applications expire, computing support is
withdrawn, and resources are set free, making them available for other apps. Practically,
original equipment manufacturers (OEMs) play a key role in modular grid computing as
their cooperation is critical in creating modular grids that are application-specific.
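The allocate-on-creation, release-on-expiry life cycle of a modular grid can be sketched as a simple resource pool. The `ModularGrid` class, the module kinds and the counts are hypothetical; real modular systems also configure the interconnect fabric, which this sketch ignores.

```python
class ModularGrid:
    """Hypothetical pool of hardware modules (e.g. CPU and GPU units in a
    chassis). Creating an application reserves a module set for it; when the
    application expires, its modules go back to the pool for other apps."""

    def __init__(self, inventory):
        self.free = dict(inventory)        # e.g. {"cpu": 8, "gpu": 4}
        self.allocations = {}              # app name -> modules reserved for it

    def create_app(self, name, needs):
        if any(self.free.get(kind, 0) < count for kind, count in needs.items()):
            raise RuntimeError(f"not enough free modules for {name!r}")
        for kind, count in needs.items():
            self.free[kind] = self.free.get(kind, 0) - count
        self.allocations[name] = dict(needs)

    def expire_app(self, name):
        """Withdraw computing support and set the modules free."""
        for kind, count in self.allocations.pop(name).items():
            self.free[kind] += count


if __name__ == "__main__":
    grid = ModularGrid({"cpu": 8, "gpu": 4})
    grid.create_app("render", {"cpu": 2, "gpu": 4})
    print(grid.free)          # {'cpu': 6, 'gpu': 0}
    grid.expire_app("render")
    print(grid.free)          # {'cpu': 8, 'gpu': 4}
```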
Cloud storage is a storage method where files are stored in the cloud, rather than on-
premises. It is managed by a cloud service provider, who will oversee security, updates
and maintenance. Cloud storage offers a more scalable, secure and affordable solution
compared to on-premises storage, enabling data to be accessed anywhere with an
internet connection.
Why is cloud storage important?
Modern businesses are faced with handling increased amounts of data, and this can
create challenges around cost, compliance and security. Cloud storage solves these
problems by offering a scalable and secure storage space on a pay-as-you-go basis,
without the need to purchase physical hardware. It also helps businesses to comply with
data regulations, as they can host their cloud storage at a datacentre location that meets
their needs. For example, to comply with GDPR, a company might choose a cloud storage
solution hosted in a European datacentre.
Cloud storage also increases agility and simplicity for IT teams, as they no longer need
to set up and maintain on-premises machines or secure the physical hardware. Instead, data
can be managed online at the click of a button.
How does cloud storage work?
Cloud storage is typically delivered by a cloud service provider. Rather than on-premises
machines, it relies on a network of remote servers hosted in multiple datacentres. Users
need to create an account with the cloud provider and use authentication methods, such as a
password or two-factor authentication, to access and upload their data via a web interface
or an API. Because cloud storage is accessed over the internet, users can normally reach
their data from any device, at any time.
To ensure redundancy, cloud providers create multiple copies of the data and store them
on different servers or datacentres. This ensures that the data is backed-up and available
during a datacentre or server failure, thus preventing data loss. Cloud storage also uses
data segmentation to optimise performance, segmenting large files into smaller chunks
and distributing them across multiple servers. If a user requests the segmented file, the
pieces are then retrieved and assembled by the cloud storage solution. Cloud storage is
also scalable by design, enabling users to scale their storage capacity up or down,
depending on their needs. Finally, the cloud storage provider might also offer services to
help users collect, manage and analyse their data.
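Segmentation and replication can be illustrated with a small in-memory sketch: a file is cut into fixed-size chunks, each chunk is copied to more than one server, and the file is reassembled from whichever replicas survive. Chunk size, replica count, server names and the round-robin placement are all hypothetical choices, not how any particular provider works.

```python
import hashlib

CHUNK_SIZE = 4                            # bytes; tiny so the example is easy to follow
REPLICAS = 2                              # copies kept of each chunk (assumption)
SERVERS = ["dc1-s1", "dc1-s2", "dc2-s1"]  # hypothetical servers in two datacentres


def store(data, servers=SERVERS):
    """Segment data into chunks and place REPLICAS copies of each chunk on
    different servers. Returns per-server contents and an ordered manifest."""
    disks = {s: {} for s in servers}
    manifest = []                         # [(chunk_id, [servers holding it]), ...]
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        chunk_id = hashlib.sha256(chunk).hexdigest()[:12]
        first = (offset // CHUNK_SIZE) % len(servers)      # round-robin placement
        holders = [servers[(first + r) % len(servers)] for r in range(REPLICAS)]
        for server in holders:
            disks[server][chunk_id] = chunk
        manifest.append((chunk_id, holders))
    return disks, manifest


def retrieve(disks, manifest, failed=()):
    """Reassemble the original bytes, reading each chunk from any replica
    whose server has not failed."""
    parts = []
    for chunk_id, holders in manifest:
        source = next((s for s in holders if s not in failed), None)
        if source is None:
            raise IOError(f"all replicas of chunk {chunk_id} are unavailable")
        parts.append(disks[source][chunk_id])
    return b"".join(parts)


if __name__ == "__main__":
    disks, manifest = store(b"hello cloud storage!")
    print(retrieve(disks, manifest, failed={"dc1-s1"}))  # b'hello cloud storage!'
```

With two replicas on three servers, losing any single server still leaves a copy of every chunk.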
Cloud storage is the abstraction, pooling, and sharing of storage resources through the
internet. Cloud storage is facilitated by IT environments known as clouds, which enable
cloud computing: the act of running workloads within a cloud environment. Accessing cloud
storage doesn't require a local network (intranet) connection to the storage device, as
network-attached storage does, or a direct connection to storage hardware, as
direct-attached storage does.
There are 3 types of cloud storage: public cloud storage, private cloud storage, and hybrid
cloud storage. There are also 3 ways to format this storage: as blocks, files, or objects.
Each format has its pros and cons (blocks are fast, files are easy to understand, and
objects scale easily to very large amounts of unstructured data), but some software-defined
cloud storage products can combine all 3 formats into a unified, easy-to-deploy solution.
Many organizations are discovering that traditional storage methods can be the
bottleneck that slows their agility and scalability. This has contributed to the adoption
of containers, which allow applications to scale rapidly, run more reliably, and perform
better than more conventional deployment methods.
Block Storage
Block storage splits a single storage volume (such as a cloud storage node) into individual
instances known as blocks. It's a fast, low-latency storage system ideal for
high-performance workloads.
Object Storage
Object storage pairs each piece of data with descriptive metadata and a unique identifier.
Because objects sit in a flat namespace and are addressed directly by their identifiers
rather than through a folder hierarchy, they can be accessed at huge scale, making them
ideal for cloud-native applications.
File Storage
File storage is the dominant technology used on NAS systems and is responsible for
organizing data and representing it to users. Its hierarchical structure makes it easy to
navigate data from the top down, but traversing the hierarchy adds processing time.
There are several types of cloud storage, each built for different use cases. Here are the
most common types of cloud storage:
Object Storage
Object storage is a method of cloud storage that combines data into an “object” alongside
metadata and a unique identifier. Consolidating these elements into a single unit offers
increased flexibility and cost control, even when storing large amounts of data. Object
storage is also highly scalable, with effectively unlimited capacity, making it ideal for
storing large volumes of unstructured data, such as backups and multimedia.
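The object model described above (data, metadata and a unique identifier in one unit, with no folders) can be sketched in a few lines. `ObjectStore` and its methods are hypothetical and purely illustrative; they are not the API of any real cloud service.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    object_id: str                               # unique identifier
    data: bytes                                  # the payload, stored as-is
    metadata: dict = field(default_factory=dict)


class ObjectStore:
    """Minimal in-memory object store: a flat namespace of objects addressed
    by ID, with no directories. Illustrative only, not a real service's API."""

    def __init__(self):
        self._objects = {}

    def put(self, data, **metadata):
        obj = StoredObject(str(uuid.uuid4()), data, metadata)
        self._objects[obj.object_id] = obj
        return obj.object_id

    def get(self, object_id):
        return self._objects[object_id]

    def find(self, **criteria):
        """Look objects up by metadata instead of by a folder path."""
        return [o.object_id for o in self._objects.values()
                if all(o.metadata.get(k) == v for k, v in criteria.items())]


if __name__ == "__main__":
    store = ObjectStore()
    oid = store.put(b"<video bytes>", content_type="video/mp4", project="launch")
    print(store.get(oid).metadata)       # {'content_type': 'video/mp4', 'project': 'launch'}
    print(store.find(project="launch") == [oid])   # True
```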
File Storage
Traditional file storage organises data into a hierarchy of folders and directories. File
storage in the cloud imitates this approach, enabling users to access and manage their
data via a file system interface. It is suitable for a wide range of use cases, including file
sharing, collaborative document editing, multimedia, backups, and archiving.
Block Storage
With block storage, data is divided into uniform-sized blocks and stored as separate units.
It doesn’t have the hierarchy of file storage – instead, each block contains raw data and
features a unique identifier. Block storage is ideal for high-performance use cases, such
as virtual machines, databases, and apps that require direct access to data. Block storage
in the cloud is also highly scalable, with dynamic scaling enabling users to add new blocks
quickly and easily.
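Block storage can be pictured as a list of equal-sized byte blocks addressed only by number. This toy `BlockVolume` (a hypothetical class, with an artificially small block size) shows reads and writes that span block boundaries and "dynamic scaling" by appending blocks; the file system or database that would normally sit on top is left out.

```python
BLOCK_SIZE = 8   # bytes; real volumes commonly use 512-byte or 4 KiB blocks


class BlockVolume:
    """Toy block volume: raw bytes in uniform blocks, each addressed only by
    its block number. There are no files or folders at this layer; a file
    system or database on top decides what the blocks mean."""

    def __init__(self, num_blocks):
        self.blocks = [bytes(BLOCK_SIZE) for _ in range(num_blocks)]

    def write(self, offset, data):
        for i, byte in enumerate(data):
            block_no, pos = divmod(offset + i, BLOCK_SIZE)
            block = bytearray(self.blocks[block_no])
            block[pos] = byte
            self.blocks[block_no] = bytes(block)

    def read(self, offset, length):
        out = bytearray()
        for i in range(length):
            block_no, pos = divmod(offset + i, BLOCK_SIZE)
            out.append(self.blocks[block_no][pos])
        return bytes(out)

    def grow(self, extra_blocks):
        """Dynamic scaling: append new, zero-filled blocks."""
        self.blocks.extend(bytes(BLOCK_SIZE) for _ in range(extra_blocks))


if __name__ == "__main__":
    vol = BlockVolume(4)
    vol.write(6, b"database")      # spans the end of block 0 and block 1
    print(vol.read(6, 8))          # b'database'
```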
Cold Storage
Cold storage is designed for storing infrequently accessed data. Such data usually needs
to be stored for compliance purposes, such as within the healthcare or financial industries.
Cold storage is more cost-effective than standard storage solutions, as it is optimised for
keeping data for long periods of time and therefore uses fewer resources. When hosted in the
cloud, cold storage is also highly scalable. However, as it is designed for long-term
storage, cold storage is unsuitable for storing data that needs to be retrieved quickly.
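Moving rarely used data into cold storage is often driven by an age-based rule. This is a minimal sketch assuming a single hypothetical threshold (`COLD_AFTER_DAYS`); real providers let you define your own lifecycle rules, and their tier names and retrieval behaviour differ.

```python
from datetime import date

COLD_AFTER_DAYS = 90   # hypothetical lifecycle threshold; providers let you set your own


def choose_tier(last_accessed, today):
    """Return the storage tier for an item based on how long it has gone unread."""
    idle_days = (today - last_accessed).days
    if idle_days >= COLD_AFTER_DAYS:
        return "cold"       # cheaper to keep, but slower to retrieve
    return "standard"       # data still in regular use stays on standard storage


if __name__ == "__main__":
    today = date(2024, 6, 1)
    for item, last in [("report.pdf", date(2024, 5, 30)), ("audit-2022.zip", date(2023, 1, 1))]:
        print(item, choose_tier(last, today))
```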
What are the use cases of cloud storage?
Cloud storage offers a cost-effective, scalable, and compliant method of storing data. This
makes it ideal for a wide range of personal and business use cases. Here are some of
the most common use cases for cloud storage:
Big Data and Analytics
Due to its high scalability, cloud storage is ideal for big data and analytics scenarios,
which utilise large datasets. Cloud storage enables users to store large volumes of data
in the cloud, without the need for on-premises hardware. It can also be seamlessly
integrated with analytics software, enabling users to harness valuable insights from their
data.
Backup and Disaster Recovery
Because cloud storage is hosted by a provider, it enables businesses and individuals to store data
securely offsite. This makes it well-suited for backup and disaster recovery use cases,
which require an environment that keeps data safe from loss, deletion or corruption, as
well as enabling easy recovery in the case of hardware failure.
Compliance
Businesses use cloud storage to ensure compliance with data retention regulations. This
is because cloud providers offer a range of features that protect data and meet
compliance requirements, including encryption, access controls, audit trails, policies,
disaster recovery and security updates.
Collaboration
Cloud storage is commonly used in modern workplaces, where teams are often
distributed across different geographic locations and collaboration happens remotely.
Cloud storage enables such teams to create, edit and share documents anywhere, no
matter where individuals are located. A good example of cloud storage being used for
collaboration is Microsoft OneDrive.
Multimedia Storage
Another common use case for cloud storage is multimedia storage e.g. photos, audio,
videos and other content. Both individuals and businesses can store and share their files
using cloud storage and benefit from its high scalability and capacity.
AI, ML and IoT
The data processed during artificial intelligence (AI), machine learning (ML) and Internet
of Things (IoT) scenarios is huge. Cloud storage offers the capacity, performance and
scalability to handle such large volumes of data, as well as enabling businesses to
analyse and extract insights from their data.
Virtual Machine Storage
Cloud storage is ideal for storing the data associated with virtual machines. Block storage
in the cloud is especially well-suited to such scenarios, enabling businesses to benefit
from high performance and to scale resources as needed.
Software Development and Testing
Software and app developers can leverage the scalability of cloud storage to store
codebases, repositories, datasets and binaries. Cloud storage also enables developers
to collaborate on projects and establish formal version control, plus it can be integrated
seamlessly with continuous integration/continuous deployment (CI/CD) pipelines to store
artifacts and facilitate a smooth development process.
Web Hosting
Its high scalability and availability make cloud storage ideal for hosting static website
content, such as images, multimedia, scripts and stylesheets.
Hybrid and Multi-Cloud Deployments
Cloud storage delivers the flexibility and scalability required for hybrid and multi-cloud
architectures. It supports the distributed nature of such architectures, enabling
businesses to integrate their cloud storage with their on-premises infrastructure and
synchronise data across environments.
What are the pros and cons of cloud storage?
Cloud storage offers multiple benefits for businesses and individuals; however, it isn’t
suitable for every situation. Here’s a summary of the pros and cons of cloud storage:
Pros of cloud storage:
Scalability
One of the biggest benefits of cloud storage is scalability. Being based in the cloud, it
offers a huge storage space, which can be scaled up or down to suit user requirements.
This makes it ideal for modern businesses, which face ever-increasing volumes
of data.
Cost-Effectiveness
Cloud storage is offered on a pay-as-you-go basis, which makes billing much more
transparent and easier to control. Also, unlike on-premises storage, there’s no need for
an upfront investment in physical machines or maintenance expenses.
Accessibility
Another benefit of cloud storage is accessibility. Users can access their data easily
anywhere, anytime using an internet connection. This makes it ideal for distributed teams
and remote collaboration – two common features of modern business life.
Security
To protect user data, cloud storage offered by trusted cloud storage providers has multiple
security features, such as encryption, authentication, access controls, and compliance
policies. Cloud storage providers also implement security at their datacentres to prevent
unauthorised access to servers.
Backup and Disaster Recovery
A huge benefit of cloud storage is its backup and disaster recovery capabilities. Unlike data
held only on on-premises storage, data in cloud storage can be recovered quickly in the event of
hardware failure, fire, cyberattack, or other disaster. Data can also be replicated and
stored across multiple datacentre locations as a backup mechanism.
Automatic Updates and Maintenance
Cloud storage providers handle all the administration associated with data storage e.g.
updates and security patches. This eliminates the burden of managing and maintaining
on-premises infrastructure, leaving users free to focus on more important tasks.
Cons of cloud storage:
Dependent on an Internet Connection
As cloud storage is accessed via the internet, an unreliable internet connection can
prevent users from accessing, syncing, updating or backing-up their data.
Limited Control
With cloud storage, data is stored in the cloud, at multiple datacentres owned by the cloud
storage provider. This reduces the amount of control a business has over its data, which
can be a problem for organisations with specific compliance requirements.
Limited Customisation
Cloud storage is often an “out-of-the-box” solution, with limited features and customisation.
As such, businesses with specific requirements might find some cloud storage solutions
unsuitable for their needs.
Downtime
Although cloud storage providers work hard to deliver high availability, there’s still a risk
of outages. This impacts access to data and other services, such as backup and updates.
Security Concerns
Cloud storage is usually highly secure, with cloud storage providers going to great lengths
to protect user data. However, cyberattacks and data breaches are still a risk, especially
if data is distributed across multiple locations. This might dissuade some businesses from
choosing cloud storage.
Although cloud storage has some drawbacks, how much they matter depends very much on the
specific requirements and preferences of the individual or business. By comparing different
cloud storage solutions and understanding how they can support your needs, you can make a
more informed decision.
Cloud entails an ever-expanding list of tools and techniques, but the key characteristics
of cloud computing remain the same.
Verus provides a full range of local-area network (LAN) and wide-area network (WAN)
services, including design, installation, support, monitoring with our unique VerusGuard
service, and hosting in our data center. We know the technology from end to end—from
the Cisco routers and HP and Dell servers that form the backbone of your network, to the
network connections that deliver the quality of service you need, to working with a carrier
you can trust.
And since Verus partners with leading vendors, we can work directly with their
engineering teams and offer faster and more responsive support than many other firms.
For example, Verus was the first Cisco Select Partner in the Midwest specializing in small
to medium business (SMB) core network infrastructure. Select Certification reflects a
partner’s technology and business expertise specific to the SMB market. The designation
reflects our ability to deliver value-added Cisco solutions and our technical expertise in
switching, routing, security, and wireless solutions for SMB customers.
Cloud Characteristics
AWS was the first to popularize cloud computing as an alternative to on-premises
infrastructure when it began selling computing resources and storage instances in 2006.
Google and Microsoft followed soon after. Today, cloud computing extends from
infrastructure to software-as-a-service (SaaS) models and everything in between,
including AI, containers, serverless computing, databases, IoT, dedicated networking,
analytics, business apps and much more.
Each subset has its own benefits and challenges, but several core cloud computing
features underpin all of them. Explore these eight key characteristics of cloud computing
that explain why it's the go-to destination for building and deploying modern applications.
1. On-demand self-service
AWS, Microsoft Azure, Google Cloud and other public cloud platforms make resources
available to users at the click of a button or API call. With data centers all over the world,
these vendors have vast amounts of compute and storage assets at the ready. This
represents a radical departure for IT teams accustomed to an on-premises procurement
process that can take months to complete.
Cloud computing's characteristic of self-service provisioning goes hand in hand with on-
demand computing capabilities. Instead of waiting for new servers to be delivered to a
private data center, developers can select the resources and tools they need -- typically
through a cloud provider's self-service portal -- and build right away. An admin sets
policies to limit what IT and development teams can run, but within those guardrails,
employees have the freedom to build, test and deploy apps as they see fit.
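2. Resource pooling
Cloud providers pool their compute, storage and networking resources and serve many
customers from them using a multi-tenant model, dynamically assigning and reassigning
physical and virtual resources as demand changes.
3. Scalability and rapid elasticity
Resource pooling enables scalability for cloud providers and users, letting them add or
remove compute, storage, networking and other assets as needed. This helps enterprise
IT teams optimize their cloud-hosted workloads and avoid end-user bottlenecks. Clouds
can scale vertically or horizontally, and service providers offer automation software to
handle dynamic scaling for users.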
While scalability usually describes longer-term cloud infrastructure plans, rapid elasticity
is a short-term characteristic. When demand unexpectedly surges, properly configured
cloud applications and services quickly and automatically add resources to handle the
load. When the demand abates, they return to their original resource levels.
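Rapid elasticity is usually driven by a scaling rule evaluated against live metrics. This sketch assumes a single CPU metric with a hypothetical target and fleet bounds. It mirrors the ratio rule documented for the Kubernetes Horizontal Pod Autoscaler, desired = ceil(current × current metric / target metric), but it is not any particular provider's algorithm.

```python
TARGET_CPU_PCT = 60                    # hypothetical target average CPU utilisation (%)
MIN_INSTANCES, MAX_INSTANCES = 2, 20   # hypothetical fleet bounds


def desired_instances(current, avg_cpu_pct):
    """Target-tracking style rule: resize the fleet so that average CPU moves
    back toward the target, clamped to fixed lower and upper bounds."""
    # Integer ceiling of current * avg / target, avoiding float rounding issues.
    wanted = -(-current * avg_cpu_pct // TARGET_CPU_PCT)
    return max(MIN_INSTANCES, min(MAX_INSTANCES, wanted))


if __name__ == "__main__":
    print(desired_instances(4, 90))    # surge: 4 instances at 90% -> 6
    print(desired_instances(10, 30))   # demand abates: 10 at 30% -> 5
```

Production autoscalers add cooldown periods and smoothing so the fleet does not flap up and down on short spikes.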
4. Pay-per-use pricing
This cloud computing characteristic shifts IT spending from capital expenditure (CapEx) to
operating expenditure (OpEx), and many providers bill compute by the second. Because
providers operate at very large scale, they can spread infrastructure costs across many
customers and pass some of those economies of scale on as lower prices. Though this can
generally be seen as a positive, IT teams must be careful since their resource needs likely
aren't static. VMs should be right-sized, turned off while not in use, or scaled down as
conditions dictate. Otherwise, organizations waste money and can end up with sticker shock
when the monthly bill arrives.
This pricing model was once the only way to pay for cloud, and it is cost-effective because
customers pay only for what they use. Vendors have since added various pricing plans that
offer lower prices in exchange for longer-term commitments.
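The warning about idle VMs is easy to quantify. This sketch assumes a hypothetical on-demand price of $0.10 per hour billed per second (real prices vary by provider, region and instance size, and some providers apply a minimum charge), and compares a VM left running all month with one switched off outside working hours.

```python
from decimal import Decimal

PRICE_PER_HOUR = Decimal("0.10")   # hypothetical on-demand price in USD
SECONDS_PER_HOUR = 3600


def usage_cost(seconds_running):
    """Per-second billing: pay only for the seconds the VM actually ran."""
    cost = PRICE_PER_HOUR * seconds_running / SECONDS_PER_HOUR
    return cost.quantize(Decimal("0.01"))


always_on = usage_cost(30 * 24 * SECONDS_PER_HOUR)    # running 24 h a day for 30 days
work_hours = usage_cost(22 * 10 * SECONDS_PER_HOUR)   # 10 h a day on 22 working days
print(always_on, work_hours)  # 72.00 22.00
```

Decimal is used instead of float so that currency amounts round predictably.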
[Figure: Cloud features. Key cloud features include the ability to manage automation, costs,
performance, compliance, and security.]
5. Measured service
Measuring cloud service usage is useful for both a cloud provider and its customers. The
provider and the customer monitor and report on the use of resources and services, such
as VMs, storage, processing and bandwidth. That data is used to calculate the customer's
consumption of cloud resources and feeds into the pay-per-use model. The cloud
provider, meanwhile, can better understand how customers utilize its resources and
potentially improve the infrastructure and cloud computing services offered.
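Measured service boils down to turning raw usage records into consumption totals that a pay-per-use bill can be computed from. The record format, customer names and units below are hypothetical; real providers expose their own usage and billing reports.

```python
from collections import defaultdict

# Hypothetical metering records: (customer, resource, quantity, unit).
USAGE_EVENTS = [
    ("acme", "vm", 3600, "vm-seconds"),
    ("acme", "storage", 50, "GB-month"),
    ("acme", "vm", 1800, "vm-seconds"),
    ("globex", "bandwidth", 12, "GB"),
]


def meter(events):
    """Aggregate raw usage records into per-customer consumption totals,
    the figures a pay-per-use bill is calculated from."""
    totals = defaultdict(lambda: defaultdict(int))
    for customer, resource, quantity, unit in events:
        totals[customer][(resource, unit)] += quantity
    return {customer: dict(usage) for customer, usage in totals.items()}


if __name__ == "__main__":
    for customer, usage in meter(USAGE_EVENTS).items():
        print(customer, usage)
```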
6. Resiliency and availability
Cloud providers use several techniques to guard against downtime, such as minimizing
regional dependencies to avoid single points of failure. Users can also extend their
workloads across availability zones, which have redundant networks connecting multiple
data centers in relatively close proximity. Some higher-level services automatically
distribute workloads across availability zones.
Of course, these systems aren't foolproof. Outages occur and enterprises must have
contingency plans in place. For some, that means extending workloads across isolated
regions or even different platforms -- though that can come with a hefty price tag and
increased complexity.
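Extending a workload across availability zones usually means clients can fall back to another zone when one is down. This is a minimal client-side sketch; `ZoneUnavailable`, `read_with_failover` and the zone names are hypothetical, and many managed services perform this failover internally instead.

```python
class ZoneUnavailable(Exception):
    """Raised when a zone cannot serve the request (e.g. during an outage)."""


def read_with_failover(key, zones):
    """Try each availability zone in order and return (zone, value) from the
    first one that answers, so one zone's outage doesn't stop the workload.
    `zones` maps a zone name to a callable that reads `key` from that zone."""
    failed = []
    for zone, read in zones.items():
        try:
            return zone, read(key)
        except ZoneUnavailable:
            failed.append(zone)          # note the failure and try the next zone
    raise ZoneUnavailable(f"every zone failed: {failed}")


if __name__ == "__main__":
    def zone_a(key):
        raise ZoneUnavailable("power outage")

    def zone_b(key):
        return f"value-of-{key}"

    print(read_with_failover("orders/17", {"zone-a": zone_a, "zone-b": zone_b}))
```

If every zone fails, the caller still gets an error, which is where the contingency plans mentioned above come in.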
7. Security
While many enterprises balked at migrating workloads because of security fears, those
concerns have largely subsided, partly due to the benefits of the above characteristics of
cloud computing. Cloud vendors employ some of the best security experts in the world
and are generally better equipped to handle threats than most in-house IT teams. In fact,
some of the biggest financial firms in the world say the cloud is a security asset.
However, this doesn't absolve users of their duties. Public cloud providers follow the
shared-responsibility model. They tend to the security of the platform, and users handle
their own apps that sit on top. Failure to fully grasp those delineations has led to high-
profile exposures of sensitive corporate data in the past.
8. Broad network access
A big part of the cloud's utility is its ubiquity. Data can be uploaded and accessed from
anywhere with an internet connection. Users can work from any location. The cloud is an
attractive option for most enterprises that have a mix of operating systems, platforms and
devices.
To preserve that broad network access, cloud providers monitor various metrics that reflect
how customers access cloud resources and data, such as latency, access time and data
throughput, and work to keep them within target levels. These factor into quality-of-service
requirements and service-level agreements.
Distributed Storage: What’s Inside Amazon S3?
The exponential growth of data volumes in all industries demands new storage
technology. Distributed storage can spread files, block storage or object storage across
multiple physical servers, for high availability, data backup and disaster recovery
purposes. Learn about the distributed storage technology that powers massively scalable
storage services like Amazon S3, and huge data pools in on-premises data centers.
A distributed storage system is infrastructure that can split data across multiple physical
servers, and often across more than one data center. It typically takes the form of a cluster
of storage units, with a mechanism for data synchronization and coordination between
cluster nodes.
Distributed storage is the basis for massively scalable cloud storage systems like Amazon
S3 and Microsoft Azure Blob Storage, as well as on-premises distributed storage systems
like Cloudian HyperStore.
Distributed storage systems can store several types of data:
● Files: a distributed file system allows devices to mount a virtual drive, with the
actual files distributed across several machines.
● Block storage: a block storage system stores data in fixed-size blocks that make up
volumes. This alternative to a file-based structure typically provides higher performance.
A common distributed block storage system is a Storage Area Network (SAN).
● Objects: a distributed object storage system wraps data into objects, identified
by a unique ID or hash.
Most distributed storage systems have some or all of the following features:
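An inherent limitation of distributed storage systems is described by the CAP theorem. The
theorem states that a distributed system cannot simultaneously guarantee Consistency (every
read sees the most recent write), Availability (every request receives a response) and
Partition tolerance (the system keeps operating even when network failures split the nodes
into groups that cannot communicate). Because network partitions cannot be ruled out in
practice, a system must choose between consistency and availability while a partition
lasts. Many distributed storage systems relax consistency (for example, offering eventual
consistency) in order to remain available and partition tolerant.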
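The availability-over-consistency trade-off can be simulated with two replicas. In this sketch (the `Replica` and `APStore` classes and the single global version counter are simplifying assumptions; real systems use timestamps, vector clocks or similar), writes keep succeeding during a partition, the cut-off replica serves stale data, and the replicas converge once the partition heals.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}                       # key -> (version, value)

    def write(self, key, value, version):
        if version > self.data.get(key, (0, None))[0]:   # keep only the newest version
            self.data[key] = (version, value)

    def read(self, key):
        return self.data.get(key, (0, None))[1]


class APStore:
    """Two replicas that stay available during a network partition by
    accepting writes locally, at the cost of temporarily stale reads."""

    def __init__(self):
        self.a, self.b = Replica("a"), Replica("b")
        self.partitioned = False
        self.version = 0
        self.pending = []                    # writes that could not reach replica b

    def write_via_a(self, key, value):
        self.version += 1
        self.a.write(key, value, self.version)
        if self.partitioned:
            self.pending.append((key, value, self.version))
        else:
            self.b.write(key, value, self.version)

    def heal(self):
        """Partition ends: replay missed writes so the replicas converge."""
        self.partitioned = False
        for key, value, version in self.pending:
            self.b.write(key, value, version)
        self.pending.clear()


if __name__ == "__main__":
    store = APStore()
    store.write_via_a("x", 1)
    store.partitioned = True
    store.write_via_a("x", 2)
    print(store.a.read("x"), store.b.read("x"))   # 2 1  (b is stale but still answers)
    store.heal()
    print(store.b.read("x"))                      # 2
```

A consistency-first (CP) design would instead reject the write, or refuse reads on replica b, until the partition heals.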
Amazon S3 is a distributed object storage system. In S3, objects consist of data and
metadata. The metadata is a set of name-value pairs that provides information about the
object, such as date last modified. S3 supports standard metadata fields and custom
metadata defined by the user.