
CLOUD COMPUTING

UNIT 5
EVOLUTION OF STORAGE TECHNOLOGY

Data storage technology has advanced at a breakneck pace over the past decades, and the amount of information saved annually has grown steadily [233]:
- 1986: 2.6 exabytes (EB), equivalent to less than one 730 MB CD-ROM per person.
- 1993: 15.8 EB, comparable to 4 CD-ROMs per person.
- 2000: 54.5 EB, comparable to 12 CD-ROMs per person.
- 2007: 295 EB, equivalent to roughly 61 CD-ROMs per person.
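
As a quick sanity check of these per-person figures (taking a world population of roughly $5\times10^{9}$ in 1986, an assumption not stated above):

$$\frac{2.6\ \text{EB}}{5\times10^{9}\ \text{people}} \approx 520\ \text{MB per person} \approx 0.7\ \text{CD-ROMs of 730 MB each}$$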
According to a study published in 2003 [354], the storage density of hard disk drives (HDDs) rose by four orders of magnitude between 1980 and 2003, from approximately 0.01 Gb/in² to approximately 100 Gb/in². Over the same span, costs dropped by five orders of magnitude, to roughly one cent per megabyte. HDD density was expected to grow to 1,800 Gb/in² by 2016, up from 744 Gb/in² in 2011.

Between 1990 and 2003, the density of dynamic random access memory (DRAM) increased from approximately 1 Gb/in² to 100 Gb/in². Over the same span, the price of DRAM dropped from approximately $80 per megabyte to less than $1 per megabyte. In 2010, Samsung was the first company to introduce a 4 Gb LPDDR2 DRAM, built on a 30 nm process.

Recent developments in storage technology have had a significant impact on the storage systems utilised for cloud computing. The capacity of NAND flash-based devices has grown substantially faster than DRAM capacity, and the price per gigabyte has dropped significantly. To stay competitive, storage device manufacturers are investing in alternative solid-state technologies such as phase-change memory.

Information can be stored not only using the charge of an electron, which is the basis of solid-state memories, but also using the spin of an electron, another fundamental property of the electron.

Although storage density has increased and cost has decreased, the speed at which these devices access data has not improved at the same rate. I/O performance lags well behind processor performance, which causes problems for data-intensive applications such as multimedia and engineering software.

Data has been growing rapidly compared to the 1990s and 2000s, largely because of the widespread use of mobile devices. Such huge volumes of data are hard to manage, which is what motivated the development of data mining algorithms. Storage devices are also costly, but consuming them through cloud computing makes their use efficient.

| Phase | Time Period | Key Innovations | Storage Examples |
|-------|-------------|-----------------|------------------|
| Magnetic Storage | 1950s - 1980s | HDDs, Floppy Disks, Magnetic Tapes | IBM HDD, Floppy Disk |
| Optical & Flash Storage | 1990s | CDs, DVDs, Early SSDs | CD-ROM, DVD, Early SSDs |
| Network Storage | 1990s - 2000s | NAS, SAN, RAID | NAS Devices, RAID Arrays |
| Early Cloud Storage | 2006 - 2010 | Object Storage, Scalability | Amazon S3 |
| Cloud Storage Expansion | 2010 - 2015 | Block/File Storage, Tiered Storage | AWS EBS, Azure Files |
| Hybrid & Multi-Cloud | 2015 - 2019 | Hybrid Models, Multi-Cloud | AWS Storage Gateway, Google Anthos |
| Intelligent & Decentralized | 2019 - Present | AI Optimization, Edge, Decentralized Storage | AWS S3 Intelligent-Tiering, IPFS |

DISTRIBUTED FILE SYSTEM:


When we need to store and process a large data file (at least approximately 1 TB in size), the local file system of the operating system is not appropriate. In such cases, we use a Distributed File System (DFS). It can be created on any Linux operating system with Hadoop. A DFS stores a data file by dividing it into several blocks. The file system works on a master-slave architecture, where the master is the NameNode and the DataNodes are the slaves. The blocks of a data file are stored on different DataNodes, and their locations are known only to the NameNode. Every data block is replicated on different DataNodes to avoid data loss when any DataNode fails. In a DFS, data files are not directly accessible to any user, because only the NameNode knows where the data blocks of a file are stored.
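
As a rough sketch of the idea (not the actual HDFS placement algorithm, which is rack-aware), the following Python snippet splits a file into fixed-size blocks and records, NameNode-style, which DataNodes hold each replica; the 128 MB block size, replication factor of 3, and round-robin policy are illustrative assumptions:

```python
import itertools

BLOCK_SIZE = 128 * 1024 * 1024   # assumed 128 MB blocks (a common HDFS default)
REPLICATION = 3                  # assumed 3 replicas per block

def place_blocks(file_size, datanodes):
    """Toy NameNode view: map each block id to REPLICATION DataNodes."""
    num_blocks = -(-file_size // BLOCK_SIZE)      # ceiling division
    ring = itertools.cycle(datanodes)
    return {b: [next(ring) for _ in range(REPLICATION)]
            for b in range(num_blocks)}

layout = place_blocks(10**12, ["dn1", "dn2", "dn3", "dn4", "dn5"])
print(len(layout), "blocks; block 0 ->", layout[0])
# -> 7451 blocks; block 0 -> ['dn1', 'dn2', 'dn3']
```

Only this block-to-DataNode mapping lives on the NameNode; clients consult it for locations and then read the blocks from the DataNodes directly.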

STORAGE MODELS:

FILE SYSTEM:

PARALLEL FILE SYSTEM:


A parallel file system is a type of storage system that splits data into smaller parts and spreads these parts across multiple storage servers. This setup allows different parts of the data to be accessed at the same time, making it much faster than conventional single-server storage. Imagine a large video file split into small segments and stored across different servers: when someone wants to watch the video, the system pulls different segments from multiple servers at once instead of retrieving it all from a single server, speeding up the process.
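
A minimal sketch of that idea in Python (the fetch_segment function and server names are hypothetical stand-ins for network reads, not any product's API):

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_segment(server, segment_id):
    # Hypothetical stand-in for a network read from one storage server.
    return f"<segment {segment_id} from {server}>"

def parallel_read(servers, num_segments):
    """Request every striped segment concurrently, one request per segment."""
    with ThreadPoolExecutor(max_workers=len(servers)) as pool:
        futures = [pool.submit(fetch_segment, servers[i % len(servers)], i)
                   for i in range(num_segments)]
        return [f.result() for f in futures]   # results stay in segment order

print(parallel_read(["s1", "s2", "s3"], num_segments=6))
```

Because each segment request runs concurrently, the total transfer time can approach that of the slowest single segment rather than the sum of all segments.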
TYPES:
Lustre
IBM Spectrum Scale
Amazon FSx for Lustre
BeeGFS
CephFS

GENERAL PARALLEL FILE SYSTEM

### **IBM GPFS (General Parallel File System) - Now Known as IBM Spectrum Scale**

IBM’s **General Parallel File System (GPFS)**, rebranded as **IBM Spectrum Scale**,
is a high-performance, scalable, and robust parallel file system designed for
managing large volumes of data. It was initially developed by IBM to support
supercomputing and large-scale enterprise applications, and it has evolved to meet
the needs of modern cloud and hybrid environments.

### **Key Features of GPFS**

1. **Parallel Data Access**:
- GPFS allows **multiple clients** (servers, applications) to access data
simultaneously, distributing the workload across multiple nodes. This parallel
access boosts read and write performance, making GPFS ideal for data-intensive
tasks like big data analytics, machine learning, and high-performance computing
(HPC).

2. **Scalability**:
- GPFS can scale horizontally by adding more nodes and storage devices,
supporting **petabytes of data** and billions of files. It is designed to handle
both small and large files efficiently, making it suitable for diverse workloads,
from small data processing jobs to massive research simulations.

3. **Distributed Metadata Management**:
- One of GPFS’s strengths is its **distributed metadata architecture**. Unlike
traditional file systems with a single metadata server, GPFS distributes metadata
across multiple nodes. This reduces bottlenecks, improves performance, and ensures
better load balancing.

4. **Advanced Data Management**:
- GPFS includes features like **data tiering**, which automatically moves data
between high-speed and low-speed storage based on usage patterns. This optimizes
storage costs and performance.
- **Policy-based data management** allows users to define rules for file
placement, migration, and backup, helping in automating complex data workflows.

5. **Data Replication and Fault Tolerance**:
- GPFS provides **data replication**, where copies of data are stored on
multiple nodes. In case of a hardware failure, GPFS can retrieve data from
replicated nodes, ensuring high availability and data protection.
- It also uses **erasure coding**, a technique that breaks data into fragments and spreads them across different nodes with added redundancy, further enhancing fault tolerance (a simplified sketch follows this feature list).

6. **Snapshot and Backup**:
- GPFS supports **snapshots**, which are point-in-time copies of the file
system. This feature allows for quick data recovery in case of accidental deletions
or data corruption.
- It integrates with backup solutions for regular data backups and supports
**incremental backups**, where only changes since the last backup are stored,
saving time and storage space.

7. **Multi-Protocol Access**:
- GPFS supports multiple access protocols, including **POSIX**, **NFS**,
**SMB**, and **Object Storage API**, making it compatible with a wide range of
applications and environments. Users can access the file system using standard file
system interfaces, ensuring flexibility.

8. **Cloud and Hybrid Integration**:
- IBM Spectrum Scale is designed to work seamlessly in **cloud, on-premises, and
hybrid environments**. It integrates with major cloud providers like **AWS, Azure,
and Google Cloud**, allowing users to extend their on-premises storage to the cloud
for additional capacity or disaster recovery.
- It also supports **containerized applications**, integrating with platforms
like **Kubernetes** and **Red Hat OpenShift**, making it suitable for cloud-native
workloads.
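
To make feature 5 concrete, here is a deliberately simplified single-parity erasure code in Python (plain XOR, in the spirit of RAID-5; the codes GPFS actually uses are more general): losing any one fragment is survivable.

```python
def xor_parity(fragments):
    """Byte-wise XOR of equal-length fragments; encodes parity and rebuilds losses."""
    parity = bytearray(len(fragments[0]))
    for frag in fragments:
        for i, byte in enumerate(frag):
            parity[i] ^= byte
    return bytes(parity)

data = [b"frag", b"ment", b"s!!!"]      # three equal-size data fragments
parity = xor_parity(data)               # parity fragment, stored on a fourth node

# If one data fragment is lost, XOR of the survivors with the parity rebuilds it.
rebuilt = xor_parity([data[0], data[2], parity])
assert rebuilt == data[1]
```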

### **Architecture of GPFS**

The architecture of GPFS is designed to provide high availability and performance through its distributed components:
- **Storage Nodes**: These are the servers that store the actual data. Data is
spread across multiple nodes to ensure parallel access and redundancy.
- **Metadata Nodes**: Manage file metadata (information about file names, sizes,
permissions). GPFS uses distributed metadata, so multiple nodes handle this task,
reducing bottlenecks.
- **Client Nodes**: These are the systems that access the file system. They can
read and write data in parallel, accessing storage nodes directly.
- **Management Server**: A central server that handles configuration, monitoring,
and administrative tasks.

### **How GPFS Works**


1. **Data Striping**:
- GPFS divides files into small blocks and spreads these blocks across multiple storage nodes. This technique, called **data striping**, ensures that different parts of a file can be read or written at the same time, boosting performance (see the sketch after this list).

2. **Locking Mechanism**:
- GPFS uses a distributed locking mechanism to manage access to files. This
ensures that multiple clients can access the same file concurrently without data
corruption, while also preventing conflicts.

3. **Failover and Recovery**:
- If a node fails, GPFS automatically reroutes requests to other nodes. Its
**self-healing** capabilities allow it to detect and fix issues, such as corrupted
data blocks, without manual intervention.
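
A toy illustration of the striping step above: given a byte range, compute which blocks hold it and which nodes to contact (the block size and round-robin layout are illustrative assumptions, not GPFS's actual allocation policy):

```python
def locate(offset, length, block_size, nodes):
    """Map a byte range of a striped file to the (node, block) pairs holding it."""
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return [(nodes[b % len(nodes)], b) for b in range(first, last + 1)]

# Which nodes serve bytes 100..899 of a file striped in 256-byte blocks?
print(locate(offset=100, length=800, block_size=256, nodes=["n1", "n2", "n3"]))
# -> [('n1', 0), ('n2', 1), ('n3', 2), ('n1', 3)]  (four blocks, fetched in parallel)
```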

### **Use Cases for GPFS**


- **High-Performance Computing (HPC)**: GPFS is widely used in scientific computing
and research projects that require fast processing of large datasets, such as
simulations and weather forecasting.
- **Big Data Analytics**: With its ability to handle large volumes of structured
and unstructured data, GPFS is ideal for analytics platforms like Apache Hadoop and
Spark.
- **Media and Entertainment**: GPFS is used for video editing, rendering, and
broadcasting because of its ability to handle large media files with high-speed
access.
- **Financial Services**: In banking and trading, GPFS supports fast data access
and processing, crucial for real-time analytics and transaction processing.

### **Advantages of GPFS**


- **High Throughput**: Parallel access and data striping ensure faster read and
write speeds.
- **Reliability**: Advanced fault tolerance features like data replication and
erasure coding protect against data loss.
- **Flexibility**: Supports multiple protocols and integrates well with cloud and
hybrid environments.
- **Efficiency**: Features like data tiering and policy-based management help
optimize storage usage and costs.

### **Disadvantages of GPFS**


- **Complex Setup**: The installation and configuration of GPFS can be complex,
requiring specialized knowledge.
- **Cost**: Licensing and hardware costs can be high, making it more suitable for
enterprises with large data needs.
- **Maintenance Overhead**: Managing and monitoring a distributed file system like
GPFS can require significant administrative effort.

### **Conclusion**
IBM GPFS (Spectrum Scale) is a powerful, versatile file system designed to meet the
needs of high-performance, large-scale data environments. Its parallel
architecture, fault tolerance, and scalability make it a preferred choice for
industries requiring fast and reliable data access. As data workloads continue to
grow, the role of parallel file systems like GPFS becomes even more critical in
ensuring efficient data management and processing across on-premises, cloud, and
hybrid environments.

GOOGLE FILE SYSTEM:


The **Google File System (GFS)** is a distributed file system created by Google to store and manage very large amounts of data. It works by breaking files into large chunks (64 MB each) and spreading these chunks across many machines. Each chunk is replicated multiple times so that data isn't lost if a machine fails. GFS has one **master node** that keeps track of where the chunks are stored and manages metadata such as file names and permissions. The data itself is stored on **chunkservers**, which hold the actual chunk contents. GFS is designed to work best with large files and is optimized for **append-only** workloads, where new data is added to the end of a file rather than modifying existing data. It uses a **weak consistency model**, so changes may take a little time to become visible everywhere, but this trade-off makes the system faster and more scalable. GFS grows by adding more machines to the system, and it balances the load automatically to keep things running smoothly. Despite some limits, such as not supporting in-place file modification well, it is very efficient and helps Google manage its huge data needs.
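
A minimal sketch of the client-side arithmetic such a design relies on: a byte offset maps to a 64 MB chunk index, and only that chunk's replica locations are requested from the master (here a plain dict stands in for the master's metadata, and the server names are hypothetical):

```python
CHUNK_SIZE = 64 * 1024 * 1024    # GFS's 64 MB chunk size

# Toy stand-in for the master's metadata: chunk index -> replica chunkservers.
master_metadata = {0: ["cs1", "cs2", "cs3"], 1: ["cs2", "cs4", "cs5"]}

def locate_chunk(offset):
    """Turn a byte offset into a chunk index, then look up its replicas."""
    index = offset // CHUNK_SIZE
    return index, master_metadata[index]

print(locate_chunk(70 * 1024 * 1024))    # -> (1, ['cs2', 'cs4', 'cs5'])
```

Keeping only this small mapping on the master is what lets clients exchange bulk data directly with the chunkservers.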


FILE SYSTEMS AND DATABASES



---

### *File Systems in Cloud Storage*

A *file system* is a way of organizing and storing files on a storage device, whether it's a local computer or a cloud-based storage system. Cloud storage services like *Google Drive*, *OneDrive*, and *Dropbox* use file systems to store your data on remote servers. In a file system, data is stored as individual *files* placed inside *folders* (also called directories). You access the files by navigating through the folders, just like on your computer. A file system provides a way to store, retrieve, and manage data in the form of files and directories. Cloud file systems are scalable, highly available, and accessible from anywhere, and are often designed to provide redundancy and fault tolerance.

- *Structure*: The file system is organized in a **hierarchical structure**. This means that data is organized in a tree-like structure where folders can contain files or other subfolders. For example, you might have a folder called **Documents**, which contains files like *Resume.docx* or *Report.pdf*.

- *How it works*: When you upload a file to cloud storage, that file is stored in a specific **folder** on the server. When you want to access that file, you simply navigate to the folder, find the file, and open it. The *path* of the file (for example, *Documents/Resume.docx*) shows you where it is stored in the folder hierarchy.
#### *Example*:

Cloud Storage Root
├── Documents
│   ├── Resume.docx
│   ├── Report.pdf
│   └── Budget.xlsx
└── Photos
    ├── Holiday.jpg
    └── Family.png

- *Advantages*:
- Simple to use, just like the way you organize files on your computer.
- Good for storing files like documents, images, videos, and presentations.
- Easy to share files with others by sharing folder links.

- *Limitations*:
- File systems are not ideal for large amounts of complex, structured data (like
customer records, inventory details, etc.).
- Searching through a large collection of files can be slower compared to using
databases.
- Difficult to establish relationships between different pieces of data.

---
Network File System (NFS):

NFS is a protocol used for accessing files over a network, enabling systems to
share files between different machines. NFS allows a client system to access files
stored on a server as though they were local files.
Limitations:
NFS can become a bottleneck for performance due to scalability issues.
It can be unreliable since a failure in the NFS server can disrupt access to shared
files.
Storage Area Networks (SANs):

A SAN is a specialized high-speed network that connects servers and storage devices. SANs provide block-level access to data, allowing servers to treat remote storage as if it were local.
Advantages:
High flexibility and scalability for large-scale storage systems.
Enables seamless addition and removal of storage resources.
Disadvantages:
More expensive due to the need for specialized hardware like Fibre Channel
adapters.
Complexity in managing the network and storage resources.
Parallel File Systems (PFSs):

Parallel file systems are designed to handle large-scale storage by distributing data across multiple nodes. They allow multiple clients to access the same file simultaneously and provide high throughput.
Key Features:
Parallel data access, data striping across nodes, distributed metadata, and high aggregate throughput (see the GPFS discussion above).

### *Databases in Cloud Storage*

A **database** is a more advanced system used to store, manage, and organize large amounts of **structured data**. In a database, data is stored in **tables**, which are made up of **rows** and **columns**. Each **row** represents a single record (like a customer or an order), and each **column** represents an attribute of that record (like the customer's name, age, or address).

For example, instead of storing customer information as individual files, a database stores it in a table with different columns for each piece of information. This makes it much easier to **search**, **update**, and **retrieve** data quickly. Databases are especially useful when dealing with large datasets or data that needs to be related, such as customer information, transactions, or products in a store.

#### *How it works*:
In a database, data is stored in **tables**. Each table has **columns** that define the type of data (like Name, Age, Email), and each **row** represents an individual record. You can perform **queries** (using SQL, for example) to search, update, or delete specific records from the database.

- **Relational Databases** (e.g., **MySQL**, **PostgreSQL**, **Oracle**): These store data in tables and support relationships between tables, such as linking a customer to their orders.
- **NoSQL Databases** (e.g., **MongoDB**, **Cassandra**): These are used for unstructured or semi-structured data and don't use tables in the traditional sense.

#### *Example*:
A simple *customer database table* might look like this:

| *Customer ID* | *Name* | *Age* | *Email* |
|---------------|--------|-------|---------|
| 1 | Alice | 30 | [email protected] |
| 2 | Bob | 25 | [email protected] |
| 3 | Charlie | 35 | [email protected] |
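
As a concrete sketch of querying such a table (using Python's built-in sqlite3 module purely for illustration; the table and values mirror the toy example above, with the redacted emails omitted):

```python
import sqlite3

conn = sqlite3.connect(":memory:")       # throwaway in-memory database
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, age INTEGER)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                 [(1, "Alice", 30), (2, "Bob", 25), (3, "Charlie", 35)])

# Structured rows and columns make questions like this a one-line query:
for name, age in conn.execute("SELECT name, age FROM customers WHERE age > 30"):
    print(name, age)                     # -> Charlie 35
```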

- *Advantages*:
- Great for managing large amounts of data, especially if you need to organize it
in a structured way (e.g., customer data, transactions, inventory).
- Supports *relationships* between data (for example, connecting customers to
their orders).
- Allows for *efficient searches* and *complex queries* to find specific
information (e.g., “Find all customers older than 30”).
- Ensures data integrity and consistency with constraints and rules.

- *Limitations*:
- More complex to set up and use than a file system.
- Not ideal for unstructured data (like photos or videos).
- Requires more resources and management (e.g., database software, maintenance).

---

### *Comparison: File Systems vs. Databases*

| *Aspect* | *File Systems* | *Databases* |
|----------|----------------|-------------|
| *Data Organization* | Files stored in folders | Data stored in tables (rows and columns) |
| *Best For* | Simple data (documents, media files) | Structured data (customer info, transactions) |
| *Data Retrieval* | Accessed by file path | Accessed via queries (SQL or API) |
| *Efficiency* | Simple and easy to use for small data sets | Efficient for large, complex datasets |
| *Examples* | Google Drive, OneDrive, Dropbox | MySQL, MongoDB, PostgreSQL, Oracle |

