Unit V
What is RAID?
Let us understand How RAID works with an example- Imagine you have
a bunch of friends, and you want to keep your favorite book safe. Instead
of giving the book to just one friend, you make copies and give a piece to
each friend. Now, if one friend loses their piece, you can still put the book
together from the other pieces. That’s similar to how RAID works with
hard drives. It splits your data across multiple drives, so if one drive fails,
your data is still safe on the others. RAID helps keep your information
secure, just like spreading your favorite book among friends keeps it safe.
RAID-0 (Striping)
RAID-1 (Mirroring)
Raid Controller
1. RAID-0 (Striping)
Instead of placing just one block into a disk at a time, we can work
with two (or more) blocks placed into a disk before moving on to
the next one.
Raid-0
Evaluation
Reliability: 0
There is no duplication of data. Hence, a block once lost cannot be
recovered.
Capacity: N*B
The entire space is being used to store data. Since there is no
duplication, N disks each having B blocks are fully utilized.
Advantages
It is easy to implement.
Disadvantages
A single drive loss can result in the complete failure of the system.
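The block placement described above can be sketched as a simple mapping (illustrative only; real controllers stripe in larger chunks, and the function name is made up for this sketch):

```python
# Sketch of RAID-0 round-robin block placement (not a real driver).
# With N disks, logical block i lands on disk (i % N) at offset (i // N).

def raid0_place(block, n_disks):
    """Return (disk, offset) for a logical block under simple striping."""
    return block % n_disks, block // n_disks

# With 4 disks, consecutive blocks spread across all drives:
layout = [raid0_place(b, 4) for b in range(8)]
# block 0 -> disk 0, block 1 -> disk 1, ..., block 4 -> disk 0 again
```

Because consecutive blocks land on different disks, a large read or write can proceed on all N disks in parallel, which is where RAID-0's speed comes from.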
2. RAID-1 (Mirroring)
More than one copy of each block is stored in a separate disk. Thus,
every block has two (or more) copies, lying on different disks.
Raid-1
RAID-0 was unable to tolerate any disk failure. RAID-1, by contrast, provides reliability through mirroring.
Evaluation
Reliability: 1 to N/2
1 disk failure can be handled for certain because blocks of that disk
would have duplicates on some other disk. If we are lucky enough
and disks 0 and 2 fail, then again this can be handled as the blocks
of these disks have duplicates on disks 1 and 3. So, in the best case,
N/2 disk failures can be handled.
Capacity: N*B/2
Only half the space is being used to store data. The other half is just
a mirror of the already stored data.
Disadvantages
It is highly expensive.
Raid-2
In Raid-2, the error of the data is checked at every bit level. Here, we use the Hamming Code Parity Method to find the error in the data.
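RAID-2's bit-level checking relies on Hamming codes. A minimal Hamming(7,4) sketch is shown below (illustrative only; real RAID-2 hardware spreads these bits across separate disks, and the function names are made up for this sketch):

```python
# Minimal Hamming(7,4) sketch: the bit-level parity scheme RAID-2 builds on.
# Codeword positions (1-indexed): p1 p2 d1 p3 d2 d3 d4.

def hamming74_encode(d1, d2, d3, d4):
    p1 = d1 ^ d2 ^ d4          # covers positions 3, 5, 7
    p2 = d1 ^ d3 ^ d4          # covers positions 3, 6, 7
    p3 = d2 ^ d3 ^ d4          # covers positions 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_correct(code):
    """Locate and flip a single-bit error; return the corrected codeword."""
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s3  # 0 means no error; else the bad position
    if pos:
        c[pos - 1] ^= 1
    return c
```

The three syndrome bits, read as a binary number, point directly at the corrupted position, so a single flipped bit can be both detected and repaired.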
Raid-3
Here Disk 3 contains the Parity bits for Disk 0, Disk 1, and Disk 2. If
data loss occurs, we can construct it with Disk 3.
Raid-4
Parity is calculated using a simple XOR function. If the data bits are
0,0,0,1 the parity bit is XOR(0,0,0,1) = 1. If the data bits are 0,1,1,0
the parity bit is XOR(0,1,1,0) = 0. A simple approach is that an even
number of ones results in parity 0, and an odd number of ones
results in parity 1.
Raid-4
Assume that in the above figure, C3 is lost due to some disk failure.
Then, we can recompute the data bit stored in C3 by looking at the
values of all the other columns and the parity bit. This allows us to
recover lost data.
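The parity computation and recovery described above can be sketched as follows (illustrative; a real array works on whole disk blocks, not small integers):

```python
# XOR parity as used by RAID-4: the parity disk stores the XOR of the
# data disks, and any single lost disk can be rebuilt from the rest.
from functools import reduce

def parity(blocks):
    return reduce(lambda a, b: a ^ b, blocks)

data = [0b1010, 0b0110, 0b1111]      # blocks on disks 0, 1, 2
p = parity(data)                      # stored on the parity disk

# Disk 1 fails: XOR the surviving data blocks with the parity block.
recovered = data[0] ^ data[2] ^ p
assert recovered == data[1]
```

Recovery works because XOR is its own inverse: XORing the parity with every surviving block cancels them out, leaving exactly the missing block.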
Evaluation
Reliability: 1
RAID-4 allows recovery of at most 1 disk failure (because of the way
parity works). If more than one disk fails, there is no way to recover
the data.
Capacity: (N-1)*B
One disk in the system is reserved for storing the parity. Hence, (N-
1) disks are made available for data storage, each disk having B
blocks.
Advantages
It helps in reconstructing the data if at most one disk is lost.
Raid-5
RAID-5 is similar to RAID-4, except that the parity blocks are distributed (rotated) across all the disks instead of being kept on a single dedicated parity disk.
Evaluation
Reliability: 1
RAID-5 allows recovery of at most 1 disk failure (because of the way
parity works). If more than one disk fails, there is no way to recover
the data. This is identical to RAID-4.
Capacity: (N-1)*B
Overall, space equivalent to one disk is utilized in storing the parity.
Hence, (N-1) disks are made available for data storage, each disk
having B blocks.
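RAID-5 stores the same XOR parity as RAID-4 but rotates it across the disks so that no single disk becomes a write bottleneck. One common placement convention can be sketched as follows (real controllers may use other layouts; the function name is made up for this sketch):

```python
# Sketch of RAID-5's rotating parity placement (one common convention).
# For stripe s over n disks, parity goes on disk (n - 1 - s) mod n,
# and the data blocks of that stripe fill the remaining disks.

def raid5_parity_disk(stripe, n_disks):
    return (n_disks - 1 - stripe) % n_disks

# With 4 disks, parity rotates: stripe 0 -> disk 3, 1 -> disk 2, ...
rotation = [raid5_parity_disk(s, 4) for s in range(5)]
# [3, 2, 1, 0, 3]
```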
RAID-6 helps when there is more than one disk failure. A pair of independent parities is generated and stored on multiple disks at this level. Ideally, you need four disk drives for this level.
There are also hybrid RAIDs, which make use of more than one
RAID level nested one after the other, to fulfill specific
requirements.
Raid-6
Storage Device
1. Primary Memory
2. Secondary Memory
3. Tertiary Memory
Hard Disk: A hard disk drive (HDD) is a storage device that stores and retrieves data using magnetic storage. It is a non-volatile storage device whose contents can be modified or deleted any number of times without any problem. Most computers and laptops have HDDs as their secondary storage device. An HDD is actually a set of stacked disks, just like phonograph records. On every hard disk, the data is recorded electromagnetically in concentric circles, called tracks, and a head (much like a phonograph arm, but fixed in position) reads the information on each track. The read-write speed of HDDs is decent but not very fast. Capacities range from a few GB to several TB.
Pen Drive: It is also known as a USB flash drive that includes flash
memory with an integrated USB interface. We can directly connect
these devices to our computers and laptops and read/write data
into them in a much faster and more efficient way. These devices
are very portable. Capacities generally range from 1 GB to 256 GB.
SSD: It stands for Solid State Drive, a mass storage device like an HDD. It is more durable because, unlike a hard disk, it has no spinning platters inside. It needs less power than a hard disk, is lightweight, and offers read and write speeds up to 10x faster. However, SSDs are costly as well. While SSDs serve an equivalent function to hard drives, their internal components are much different. Unlike hard drives, SSDs don't have any moving parts, and thus they're called solid-state drives. Instead of storing data on magnetic platters, SSDs store data in non-volatile flash memory. Since SSDs have no moving parts, they do not need to "spin up". Capacities range from about 150 GB to several TB.
Shift registers
work one bit at a time in a serial fashion, while parallel registers work
simultaneously with all bits of the word. At high levels of complexity, parallel
processing derives from having a plurality of functional units that perform
separate or similar operations simultaneously. By distributing data among several functional units, parallel processing is achieved. As an example, arithmetic, shift, and logic operations can be divided among three units, and the operations are performed in each unit under the supervision of a control unit. One possible method of dividing the execution unit into eight
functional units operating in parallel is shown in figure. Depending on the
operation specified by the instruction, operands in the registers are
transferred to one of the units, associated with the operands. In each
functional unit, the operation performed is denoted in each block of the
diagram. The arithmetic operations with integer numbers are performed by
the adder and integer multiplier.
Floating-point operations
can be divided into three circuits operating in parallel. Logic, shift, and
increment operations are performed concurrently on different data. All the units are independent of each other, so one number can be shifted while another number is being incremented. Generally, a multi-functional organization is associated with a complex control unit that coordinates all the activities among the several components.
The main advantage of parallel processing is that it provides better utilization of system resources by increasing resource multiplicity, which improves overall system throughput.
Hardware multithreading
Types of Hardware Multithreading
Coarse-Grained Multithreading: the processor switches to a different thread only on a long-latency stall, such as a cache miss.
Fine-Grained Multithreading: the processor switches between threads on every cycle, interleaving their instructions.
Simultaneous Multithreading (SMT): the processor issues instructions from several threads in the same cycle.
Clustered Multithreading: the core is partitioned into clusters, each running a subset of the threads.
Simultaneous Multithreading (SMT)
In SMT, the processor maintains the context for multiple threads and
interleaves their execution at the pipeline level. This means that
instructions from different threads can be fetched, decoded, and
executed simultaneously, leading to improved core utilization and
better overall system performance.
Clustered Multithreading
Clustered multithreading divides the processor core into multiple
clusters, with each cluster responsible for executing a specific subset of
threads. Each cluster has its own set of resources, including instruction
buffers and register sets, which are shared among the threads within
the cluster.
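Fine-grained multithreading interleaves threads cycle by cycle. A toy round-robin sketch (illustrative only; a real core interleaves hardware instruction streams, not Python lists):

```python
# Toy model of fine-grained multithreading: each "cycle", the core picks
# the next instruction from a different thread, round-robin.

def interleave(threads):
    """Yield (thread_id, instruction) pairs, one per cycle, round-robin."""
    streams = [list(t) for t in threads]
    i = 0
    while any(streams):
        if streams[i % len(streams)]:
            yield i % len(streams), streams[i % len(streams)].pop(0)
        i += 1

sched = list(interleave([["a1", "a2"], ["b1", "b2"]]))
# [(0, 'a1'), (1, 'b1'), (0, 'a2'), (1, 'b2')]
```

The payoff is latency hiding: while one thread waits on memory, the other threads' instructions keep the pipeline busy.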
Computer designs like SIMD and MIMD are used to enhance the
efficiency of specific computing activities. The number of data and instruction streams serves as the basis of the classification. The computer
architecture known as SIMD, or single instruction multiple data,
allows for the execution of a single instruction across numerous data
streams. In contrast, MIMD (multiple instruction multiple data)
computer architectures can carry out a number of instructions on a
number of data streams.
With multiple processors carrying out the same task in parallel, SIMD
is frequently employed for issues needing numerous computations.
With each component allocated to a distinct processor for
simultaneous solutions, MIMD is widely employed for issues that
divide algorithms into independent and separate portions.
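The contrast can be illustrated in miniature (conceptual only; Python stands in for the hardware here, so this models the idea rather than performing real SIMD execution):

```python
# SIMD: one instruction (add 1) applied to many data elements in lockstep.
data = [10, 20, 30, 40]
simd_result = [x + 1 for x in data]       # same operation, many data streams

# MIMD: different instruction streams on different data streams, e.g.
# two independent workers each running its own routine on its own input.
def worker_a(x): return x * 2             # instruction stream A
def worker_b(x): return x - 5             # instruction stream B
mimd_result = [worker_a(7), worker_b(100)]
```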
Cache coherence
In a multiprocessor system, data inconsistency may
occur among adjacent levels or within the same level of the memory
hierarchy. In a shared memory multiprocessor with a separate cache
memory for each processor, it is possible to have many copies of any one
instruction operand: one copy in the main memory and one in each
cache memory. When one copy of an operand is changed, the other
copies of the operand must be changed also. Example : Cache and the
main memory may have inconsistent copies of the same object.
These four states come from the MOSI cache-coherence protocol, which tracks the state of every cache block:
Modified – The value in the cache is dirty; that is, the value in the current cache differs from the value in main memory.
Shared – The cache holds the most recent copy of the data, and that copy is shared among all the caches and main memory as well.
Owned – The current cache holds the block and is now the owner of that block; that is, it has all rights on that particular block.
Invalid – The current cache block itself is invalid and must be fetched from another cache or from main memory.
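The invalidation behavior behind these states can be sketched with a toy model (a deliberate simplification I am assuming for illustration; real protocols also handle reads, ownership transfer, and write-back):

```python
# Toy write-invalidate sketch: a write makes the writer's copy Modified
# and invalidates every other cached copy of the same block.

def write(caches, writer, block):
    for cpu in caches[block]:
        caches[block][cpu] = "Modified" if cpu == writer else "Invalid"

caches = {"X": {"cpu0": "Shared", "cpu1": "Shared"}}
write(caches, "cpu0", "X")
# caches["X"] -> {"cpu0": "Modified", "cpu1": "Invalid"}
```

This is the mechanism that prevents the "inconsistent copies" problem described above: after the write, no stale Shared copy can be read.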
Message-passing Multicomputers
Multicomputers are message-passing machines that apply the packet-switching method to exchange data. Here, each processor has a private
memory, but no global address space as a processor can access only its
own local memory.
Multicomputers
Message-Routing Schemes
In Store and forward routing, packets are the basic unit of information
transmission. In this case, each node uses a packet buffer. A packet is
transmitted from a source node to a destination node through a
sequence of intermediate nodes. Latency is directly proportional to the
distance between the source and the destination.
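The latency claim above can be made concrete with a toy model (an assumed formula with illustrative units, not a measurement of any real network):

```python
# Store-and-forward latency grows with hop count: each of the D hops
# must receive the whole packet before forwarding it.
# Assumed model: T = D * (L / B), with packet length L and bandwidth B.

def store_and_forward_latency(hops, packet_bits, bandwidth_bps):
    return hops * (packet_bits / bandwidth_bps)

# A 1 kb packet over 4 hops at 1 Mb/s takes four full packet times:
t = store_and_forward_latency(4, 1000, 1_000_000)
```

Doubling the distance doubles the latency, which is exactly the proportionality the text describes.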
When all the channels are occupied by messages and none of the channels in the cycle is freed, a deadlock situation will occur. To avoid this, a deadlock avoidance scheme has to be followed.
Table of Contents
Memory Hierarchy
Parallel Processing
Functioning of GPUs
1. Gaming
GPUs play a key role in gaming by providing realistic graphics and smooth
animations. Their parallel architecture ensures a better gaming
experience with lifelike visuals and immersive gameplay.
2. Artificial Intelligence and Deep Learning
GPUs play a pivotal role in the realm of artificial intelligence (AI) and deep learning. Their parallel architecture accelerates the matrix calculations essential for training and running deep neural networks. This acceleration significantly boosts the efficiency of machine learning models, making GPUs integral to AI applications.
Deep Learning Acceleration: GPU-accelerated matrix operations significantly speed up deep neural network training.
Integration with AI Hardware: Specialized hardware enhancements, such as Tensor Cores, improve performance for AI workloads.
3. Scientific Computing
4. Medical Imaging
The processing speed of medical imaging tasks, such as MRI and CT scans, is enhanced by GPUs. GPU acceleration enables real-time rendering and analysis of volumetric data, contributing to quicker diagnoses and improved medical imaging outcomes.
5. Cryptocurrency Mining
Memory Hierarchy
Each level is listed as: Level: characteristics; proximity to GPU cores; examples.
Global Memory: GDDR (Graphics DDR); high capacity, moderate speed; far from the GPU cores. Examples: GDDR5, GDDR6, HBM (High Bandwidth Memory).
GPU (Device) Memory: shared among all GPU cores; on-chip. Examples: shared L2 cache, L1 cache.
Shared Memory: shared within a GPU block (thread block); on-chip. Example: shared memory within a CUDA thread block.
Texture Memory: optimized for texture mapping and filtering; on-chip. Example: memory specialized for texture operations.
Constant Memory: read-only data shared among all threads; on-chip. Example: read-only data for all threads.
L1 Cache (Level 1 Cache): fast, private cache for each GPU core; on-chip. Example: L1 cache for individual GPU cores.
L2 Cache (Level 2 Cache): larger cache shared by all GPU cores; on-chip. Example: L2 cache shared among all GPU cores.
Registers (Register File): fastest, private storage for individual threads; on-chip. Example: registers allocated to each thread.
Shared memory is a faster but smaller memory space that allows threads
within the same block to share data. Registers are the smallest and
fastest memory units residing on the GPU cores for rapid access during
computation.
Parallel Processing
Functioning of GPUs
1. Task Offloading
2. Data Parallelism
GPUs have evolved beyond their initial focus on graphics rendering and
are increasingly harnessed for general-purpose computing through
GPGPU. General-purpose computing on GPUs extends the utility of GPUs
to a broader spectrum of applications.
Developers can use GPUs for various tasks like scientific simulations and
machine learning, thanks to their ability to handle multiple tasks
simultaneously. This extends the use of GPUs beyond just graphics,
making them essential for a wide range of computational challenges.
1. Energy Efficiency
2. Ray Tracing
If servers are not energy-efficient, they increase the cost of electricity, the cost of the infrastructure that provides electricity, and the cost of the infrastructure that cools the servers. Dependability via redundancy: The hardware and software in a
WSC must collectively provide at least 99.99% availability, while
individual servers are much less reliable. Redundancy is the key to
dependability for both WSCs and servers. WSC architects rely on multiple
cost-effective servers connected by a low cost network and redundancy
managed by software. Multiple WSCs may be needed to handle faults in
whole WSCs. Multiple WSCs also reduce latency for services that are