Lecture 3 - Availability Mechanism
REDUNDANCY
Learning Outcomes
Network backups load the network significantly. It may not be possible, for example,
to back up 100 file servers of 4 GB each every night over the network, so planning is important.
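The planning step above can be made concrete with a rough transfer-time estimate. The sketch below is only illustrative: the link speeds (100 Mbit/s and 1000 Mbit/s) and the assumption that the whole link is available to the backup are not from the lecture, and the function name transfer_hours is hypothetical.

```python
def transfer_hours(total_gigabytes, link_megabits_per_s, efficiency=1.0):
    """Hours needed to move total_gigabytes over a link of the given speed.

    efficiency is the fraction of the nominal link speed actually usable
    by the backup (an assumption; real networks rarely reach 1.0).
    """
    total_bits = total_gigabytes * 1e9 * 8
    usable_bits_per_s = link_megabits_per_s * 1e6 * efficiency
    return total_bits / usable_bits_per_s / 3600


# 100 file servers of 4 GB each, as in the example above.
total_gb = 100 * 4

# Assumed link speeds, for illustration only.
hours_fast_ethernet = transfer_hours(total_gb, 100)
hours_gigabit = transfer_hours(total_gb, 1000)

print(f"100 Mbit/s:  {hours_fast_ethernet:.1f} hours")
print(f"1000 Mbit/s: {hours_gigabit:.1f} hours")
```

Under these assumptions the 400 GB nightly backup needs close to nine hours on a fully dedicated 100 Mbit/s link, which is why the backup window has to be planned rather than assumed.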
Environment
The computing environment can be protected with air conditioning, locked server
rooms and a UPS (uninterruptible power supply).
Redundancy
Redundancy is one of the availability mechanisms.
B. Hybrid solutions: Cost-sensitive solutions similar to pure software RAID, but with bootability
requirements.
Targeted Applications:
1. Entry-level servers without large storage requirements
2. Compute engines connected to networked storage
C. Hardware RAID solutions: The most feature-rich and highest-performance solutions. They may be
implemented as RAID on the Motherboard (ROMB) or with plug-in cards.
Targeted Applications:
1. High-performance workstations with large data storage requirements
2. Entry-level to enterprise servers requiring performance and scalability from the storage subsystem.
Application/service redundancy
This is often the cheapest and easiest form of redundancy to implement, where available.
The principal problem is that few applications support this type of redundancy.
With this type, clients connecting to these servers automatically look for a backup or duplicate server if the
primary is not available.
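The client behaviour described above can be sketched as a simple failover loop. This is a minimal illustration, not how any particular naming service implements it: the function query_with_failover, the stand-in fake_lookup, and the server names are all hypothetical.

```python
from typing import Callable, Sequence


def query_with_failover(servers: Sequence[str], query: Callable[[str], str]) -> str:
    """Try each server in order and return the first successful answer."""
    last_error = None
    for server in servers:
        try:
            return query(server)
        except ConnectionError as exc:
            # This server is unavailable; fall through to the next one.
            last_error = exc
    raise ConnectionError(f"no server available: {list(servers)}") from last_error


def fake_lookup(server: str) -> str:
    """Stand-in for a real network request (hypothetical, for the demo)."""
    if server == "primary.example":
        raise ConnectionError("primary is down")
    return f"answer from {server}"


result = query_with_failover(["primary.example", "backup.example"], fake_lookup)
print(result)
```

The ordering of the server list is the only configuration the client needs, which is part of why this form of redundancy is cheap when the application supports it.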
Examples
Naming servers (NIS+, DNS, NIS, WINS, LAN Manager...) often have this capability built in, and its use is
highly recommended.
RAID / mirroring is not necessary for these servers, unless RAID turns out to be the cheaper option.
Filesystem servers can increase availability (cheaply) by replicating files (e.g. user home directories) to
another system or to another local disk regularly.
If a major crash of the primary file server occurs, users can mount their files from the second system, but
changes made since the last replication/synchronisation will be lost.
=> Typical products: rdist (UNIX), File replicator (NT).
=> To reduce downtime, the above method can be used to keep a synchronised copy of the system disk
available on important servers, without adding RAID.
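The periodic replication described above can be illustrated with a small one-way copy routine. This is a sketch of the general idea only, not a description of how rdist or any other product works; the function name replicate and the demo directory names are hypothetical, and a temporary directory stands in for the two systems.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List


def replicate(source: Path, replica: Path) -> List[str]:
    """Copy new or changed files from source to replica; return what was copied."""
    copied = []
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source)
        dst = replica / rel
        # Skip files whose replica is at least as new and has the same size.
        if (dst.exists()
                and dst.stat().st_mtime >= src.stat().st_mtime
                and dst.stat().st_size == src.stat().st_size):
            continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also preserves the modification time
        copied.append(rel.as_posix())
    return copied


# Demo: a temporary directory stands in for the primary and second system.
with tempfile.TemporaryDirectory() as tmp:
    home = Path(tmp) / "home"
    mirror = Path(tmp) / "mirror"
    (home / "alice").mkdir(parents=True)
    (home / "alice" / "notes.txt").write_text("draft 1")

    first_pass = replicate(home, mirror)
    replica_text = (mirror / "alice" / "notes.txt").read_text()
    print(first_pass, replica_text)
```

Anything written to the source after a pass is absent from the replica until the next pass, which is exactly the window of lost changes mentioned above.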
RAID / Mirroring
The classical method of increasing system availability is by duplicating one of the
weakest parts in a computer: the disk.
RAID (Redundant Array of Inexpensive Disks) is a de facto standard for defining how
standard disks can be used to increase redundancy. High-end RAID systems duplicate
disks, disk controllers, power supplies and communication channels. The simplest
RAID systems are software-only disk drivers which group together disparate disks
into a redundant set.
While older storage devices used only one disk drive to store data,
RAID storage uses multiple disks in order to provide fault tolerance,
to improve overall performance, and to increase storage capacity in
a system.
How RAID Works
With RAID technology, data can be mirrored on one or more
other disks in the same array, so that if one disk fails, the data is
preserved.
Thanks to a technique known as "striping," RAID also offers the
option of reading or writing to more than one disk at the same
time in order to improve performance.
In this arrangement, sequential data is broken into segments
which are sent to the various disks in the array, speeding up
throughput.
Also, because a RAID array uses multiple disks that appear to be
a single device, it can often provide more storage capacity than
a single disk.
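The mirroring idea from this section, where data survives the failure of one disk because a copy exists on another, can be sketched with an in-memory model. The class MirroredArray and its methods are hypothetical and only illustrate the principle; a real RAID driver works on block devices, not dictionaries.

```python
class MirroredArray:
    """Toy model of mirroring: every block is written to every healthy disk."""

    def __init__(self, disk_count=2):
        self.disks = [dict() for _ in range(disk_count)]
        self.failed = set()

    def write(self, block_no, data):
        # Write the same block to each disk that is still working.
        for index, disk in enumerate(self.disks):
            if index not in self.failed:
                disk[block_no] = data

    def fail_disk(self, index):
        # Simulate a disk failure: its contents are gone.
        self.failed.add(index)
        self.disks[index].clear()

    def read(self, block_no):
        # Serve the read from the first healthy disk.
        for index, disk in enumerate(self.disks):
            if index not in self.failed:
                return disk[block_no]
        raise IOError("all mirrors have failed")


array = MirroredArray(disk_count=2)
array.write(0, b"payroll")
array.fail_disk(0)
recovered = array.read(0)
print(recovered)
```

The cost of this protection is capacity: two mirrored disks hold only as much data as one.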
RAID 0
A RAID 0 (also known as a stripe set or striped
volume) splits data evenly across two or more
disks (striping), without parity information, to
gain speed. RAID 0 was not one of the original RAID
levels and provides no data redundancy: if any one
disk fails, the whole volume is lost. RAID 0
is normally used to increase performance,
although it can also be used as a way to create a
large logical disk out of two or more physical
ones.
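A stripe set as described above can be sketched by dealing fixed-size segments out to the disks in round-robin order. This is a simplified illustration: the function names stripe and unstripe, the two-disk layout, and the 2-byte segment size are assumptions chosen for the example.

```python
def stripe(data, disk_count, segment_size):
    """Split data into segments and deal them out to the disks round-robin."""
    disks = [[] for _ in range(disk_count)]
    for n, offset in enumerate(range(0, len(data), segment_size)):
        disks[n % disk_count].append(data[offset:offset + segment_size])
    return disks


def unstripe(disks):
    """Reassemble the original data by reading one segment per disk in turn."""
    longest = max(len(disk) for disk in disks)
    segments = []
    for row in range(longest):
        for disk in disks:
            if row < len(disk):
                segments.append(disk[row])
    return b"".join(segments)


layout = stripe(b"ABCDEFGHIJ", disk_count=2, segment_size=2)
print(layout)            # segments held by disk 0 and disk 1
restored = unstripe(layout)
print(restored)
```

Because each disk holds only every second segment, losing either disk leaves the data unrecoverable, which matches the lack of redundancy noted for RAID 0.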
With odd parity, the 1 bits in the data are counted before the bits are sent. If the
number of 1 bits is even, the parity bit is set to 1 so that the total number of 1 bits
transmitted is odd. If the number of 1 bits is already odd, the parity bit is
set to 0.
At the receiving end, each group of incoming bits is checked to see whether it
contains an odd number of 1 bits. If the count is even, a transmission error has occurred
and either the transmission is retried or the system halts and an error message
is sent to the user.
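The odd-parity scheme described above can be sketched in a few lines. The function names and the 7-bit example word are illustrative assumptions, not part of the lecture.

```python
def odd_parity_bit(data_bits):
    """Return the parity bit that makes the total count of 1 bits odd."""
    ones = sum(data_bits)
    return 1 if ones % 2 == 0 else 0


def add_parity(data_bits):
    """Sender side: append the odd-parity bit to the data bits."""
    return list(data_bits) + [odd_parity_bit(data_bits)]


def parity_ok(group):
    """Receiver side: the group is valid only if it holds an odd number of 1 bits."""
    return sum(group) % 2 == 1


data = [1, 0, 1, 1, 0, 0, 1]      # four 1 bits, so the parity bit must be 1
frame = add_parity(data)
print(frame, parity_ok(frame))

corrupted = list(frame)
corrupted[2] ^= 1                  # flip a single bit in transit
print(corrupted, parity_ok(corrupted))
```

A single flipped bit always changes the count from odd to even and is detected, but two flipped bits in the same group cancel out and pass the check, which is the main limitation of a single parity bit.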
Parity (cont…)