CS 1 2
presentation
BITS Pilani Sourish Banerjee
WILP
Pilani Campus
CS 01 & 02
Books
Solution ??
✓ Think big, large … very large “Distributed File Systems”
that involves both STORAGE & NETWORK.
Differentiate between
• Data Storage and
• Data Access
Why ?
➢ OSs are not about data, applications are.
SAN is the abbreviation for ‘Storage Area Network’. Very often ‘storage
area networks’ or ‘SANs’ are equated with Fibre Channel
technology. The advantages of storage area networks can, however,
also be achieved with alternative technologies such as iSCSI.
Therefore, always state the transmission technology with which a
storage area network is realized, for example Fibre Channel SAN or
iSCSI SAN.
BITS Pilani, Pilani Campus
DAS, NAS & SAN cont.
Think DAS
Desktops/Laptops with attached hard disk(s).
– Pros:
• Direct access, no intermediary.
• Least number of points of failure.
• Reasonable performance.
• Can do data consolidation !! But is it really a Pro?
– Cons:
• Zero redundancy.
• Single failure can kill the whole system.
• Almost total absence of failover mechanisms.
• Cannot be everywhere
• Think Network
• NAS (DNAS)
• SAN
Answer: Think of an enterprise
• Lack/Absence of uniform utilization
• Solution:
• More software/hardware
• Increase in surface of attack.
• Lack of uniform physical safeguards.
• Solves limitations imposed by DAS.
• Storage networks open up new possibilities for data management
• SCSI cable is replaced by a network
• Think Mode : iSCSI
Storage Networks:
• In storage networks storage devices exist completely
independently of any computer.
• Several servers can access the same storage device
directly over the storage network without another server
having to be involved.
• Storage devices are also consolidated, which involves
replacing the many small hard disks attached to the
computers with a large disk subsystem.
• Recall enterprise.
Controller
• In most disk subsystems there is a controller between the
connection ports and the hard disks.
• The controller can significantly increase data availability and
data access performance with the aid of a so-called RAID procedure.
• It can realize the copying services instant copy and remote
mirroring, as well as further additional services.
• It can act as a cache in an attempt to accelerate read and write
accesses for the server.
Figure 2.2 Servers are connected to the disk subsystems via the ports.
Internally, the disk subsystem consists of hard disks, a controller, a
cache and internal I/O channels.
In the active/active (load sharing) approach, all hard disks are addressed via both I/O
channels in normal operation. The controller divides the load dynamically between the two I/O
channels so that the available hardware is optimally utilized. If one I/O channel fails,
all communication goes through the remaining channel.
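The load sharing and failover behaviour described above can be sketched as a toy model. This is only an illustration: the class name, the channel names and the "fewest dispatched requests" policy are assumptions, not how any particular controller is implemented.

```python
class DualChannelController:
    """Toy model of a controller with two internal I/O channels (hypothetical)."""

    def __init__(self):
        # Number of requests dispatched per channel; a failed channel is removed.
        self.load = {"channel_0": 0, "channel_1": 0}

    def fail(self, channel):
        # Simulate a channel failure: the channel receives no further requests.
        self.load.pop(channel)

    def dispatch(self, request):
        if not self.load:
            raise RuntimeError("no I/O channel available")
        # Load sharing: pick the surviving channel with the fewest requests so far.
        channel = min(self.load, key=self.load.get)
        self.load[channel] += 1
        return channel


ctrl = DualChannelController()
normal = [ctrl.dispatch(f"req{i}") for i in range(4)]       # both channels in use
ctrl.fail("channel_0")
degraded = [ctrl.dispatch(f"req{i}") for i in range(4, 6)]  # only channel_1 is left
print(normal)
print(degraded)
```

In normal operation the requests alternate between the two channels; after the simulated failure every request goes through the remaining channel.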
Disk subsystem types
• A RAID controller can distribute the data that a server writes to the
virtual hard disk amongst the individual physical hard disks in
various manners (RAID Levels).
• One factor common to almost all RAID levels is that they store
redundant information.
• If a physical hard disk fails, its data can be reconstructed from the
hard disks that remain intact.
• The defective hard disk can even be replaced by a new one during
operation if a disk subsystem has the appropriate hardware.
• Then the RAID controller reconstructs the data of the exchanged
hard disk.
• This process remains hidden to the server apart from a possible
reduction in performance: the server can continue to work
uninterrupted on the virtual hard disk
RAID 0 (striping): As in all RAID levels, the server sees only the virtual hard disk. The RAID
controller distributes the write operations of the server amongst several physical hard disks.
Parallel writing means that the performance of the virtual hard disk is higher than that of the
individual physical hard disks.
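A minimal sketch of block-by-block striping, assuming a simple round-robin layout with one block per disk per stripe. The function name, the block names A to F and the three-disk setup are illustrative assumptions.

```python
def stripe_location(logical_block, num_disks):
    """Map a logical block of the virtual disk to (disk index, block index on that disk)."""
    return logical_block % num_disks, logical_block // num_disks


# Six consecutive blocks of the virtual hard disk, spread over three physical disks.
layout = {name: stripe_location(i, 3) for i, name in enumerate("ABCDEF")}
for name, (disk, offset) in layout.items():
    print(f"block {name} -> disk {disk}, offset {offset}")
```

Consecutive blocks land on different physical disks, which is why they can be written in parallel.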
RAID 0: block-by-block striping cont.
• Whilst the first hard disk is still writing the first block, block A,
the RAID controller is already sending the second block, block B, to
the second hard disk and block C to the third hard disk.
• RAID 0 increases the performance of the virtual hard disk, but not its
fault-tolerance. If a physical hard disk is lost, all the data on the
virtual hard disk is lost.
• In RAID 1 fault-tolerance is of primary importance.
• In both RAID 0+1 and RAID 10 the server sees only a single hard
disk, which is larger, faster and more fault-tolerant than a physical
hard disk. The question is: which of the two RAID levels, RAID 0+1 or
RAID 10, is preferable?
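One way to approach the question is to count which double disk failures each layout survives. The sketch below assumes six physical disks and a simple RAID 0+1 in which a striped set counts as lost as soon as one of its disks fails. The disk numbering and function names are illustrative.

```python
from itertools import combinations

DISKS = range(6)  # assumed example: six physical hard disks


def raid01_survives(failed):
    # RAID 0+1: two striped sets (disks 0-2 and 3-5) mirror each other.
    # A striped set is lost as soon as any one of its disks fails.
    set_a_ok = not any(d in failed for d in (0, 1, 2))
    set_b_ok = not any(d in failed for d in (3, 4, 5))
    return set_a_ok or set_b_ok


def raid10_survives(failed):
    # RAID 10: three mirrored pairs (0,1), (2,3), (4,5) are striped together.
    # Data is lost only if both disks of the same pair fail.
    mirror_pairs = ((0, 1), (2, 3), (4, 5))
    return all(not (a in failed and b in failed) for a, b in mirror_pairs)


pairs = [set(p) for p in combinations(DISKS, 2)]
raid01_ok = sum(raid01_survives(p) for p in pairs)
raid10_ok = sum(raid10_survives(p) for p in pairs)
print(f"RAID 0+1 survives {raid01_ok} of {len(pairs)} double failures")
print(f"RAID 10 survives {raid10_ok} of {len(pairs)} double failures")
```

Under these assumptions RAID 10 tolerates more double failures than RAID 0+1, which argues for RAID 10.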
• If large data quantities are written sequentially, the RAID controller
can calculate the parity blocks from the data flow without reading the old
parity block from the disk.
• For example, say the blocks E, F, G and H are written in one go.
• The controller calculates the parity block PEFGH from them and overwrites
the old parity block without having previously read in its old value.
• Furthermore, a RAID controller with a suitably large cache can hold
frequently changed parity blocks in the cache after writing to the disk, so
that the next time one of the data blocks in question is changed there is
no need to read in the parity block.
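The two write paths can be sketched with byte-wise XOR parity, as used in RAID 4 and RAID 5. The block contents and the helper name `xor_blocks` are made up for illustration.

```python
from functools import reduce


def xor_blocks(*blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


# Full-stripe write: the parity is computed from the new data alone,
# so nothing has to be read from disk first.
E, F, G, H = b"\x01\x02", b"\x10\x20", b"\x0f\x0f", b"\xa0\x0a"
P_EFGH = xor_blocks(E, F, G, H)

# Small write of a single block: the old data block and the old parity block
# are needed (read from disk, or taken from the controller cache).
F_new = b"\x33\x44"
P_new = xor_blocks(P_EFGH, F, F_new)
assert P_new == xor_blocks(E, F_new, G, H)

# Reconstruction: a lost block is the XOR of the parity and the surviving blocks.
assert xor_blocks(P_new, E, G, H) == F_new
print(P_EFGH.hex(), P_new.hex())
```

The small-write path shows why a cached parity block helps: without it, the old parity has to be read from disk before the new parity can be written.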
• Notepad++
• WinSCP
• MobaXterm
• Linux VM (Hyper-V, VirtualBox, VMware Workstation Player)
• Good references
• https://www.howtoforge.com/tutorial/linux-filesystem-defrag/
• https://www.thomas-krenn.com/en/wiki/Analyzing_a_Faulty_Hard_Disk_using_Smartctl