Cit314 J Black Educational Consult
COMPUTER ORGANIZATION
Computer organization is concerned with the way the hardware components of a computer system operate and the way they are connected together. It acts as the interface between hardware and software.
Computer organization refers to the operational units and their interconnection that realize the architecture
specification.
Examples of organization attributes include those hardware details transparent to the programmer, such as
control signals, interfaces between the computer and peripherals, and the memory technology used.
COMPUTER ARCHITECTURE
Computer architecture refers to those attributes of a system visible to a programmer, or put another way,
those attributes that have a direct impact on the logical execution of a program.
Examples of architecture attributes include the instruction set, the number of bits used to represent various data types (e.g., numbers and characters), I/O mechanisms, and techniques for addressing memory.
The memory unit is an essential component in any digital computer since it is needed for storing programs
and data.
The memory unit that communicates directly with the CPU is called the main memory.
The main memory is the central storage unit in a computer system. It is a relatively large and fast memory
used to store programs and data during the computer operation.
Q. Discuss in detail the principal technology used for main memory.
The principal technology used for the main memory is based on semiconductor integrated circuits.
Integrated circuit RAM chips are available in two possible operating modes, static and dynamic.
The static RAM consists essentially of internal flip-flops that store the binary information. The stored
information remains valid as long as power is applied to the unit.
The static RAM is easier to use and has shorter read and write cycles. Most of the main memory in a
general-purpose computer is made up of RAM integrated circuit chips, but a portion of the memory may
be constructed with ROM chips.
The dynamic RAM stores the binary information in the form of electric charges that are applied to
capacitors.
The dynamic RAM offers reduced power consumption and larger storage capacity in a single memory chip. The capacitors are provided inside the chip by MOS transistors. The stored charge on the capacitors tends to discharge with time, so the capacitors must be periodically recharged by refreshing the dynamic memory. Refreshing is done by cycling through the words every few milliseconds to restore the decaying charge.
ROM is used for storing programs that are permanently resident in the computer and for tables of constants
that do not change in value once the production of the computer is completed. Among other things, the
ROM portion of main memory is needed for storing an initial program called a bootstrap loader.
The bootstrap loader is a program whose function is to start the computer software operating when power
is turned on.
AUXILIARY MEMORIES
NOTE: Auxiliary Memory was not defined in the material, only the types were discussed
1. Magnetic Tapes
Magnetic tape is a medium for magnetic recording, made of a thin, magnetizable coating on a long, narrow
strip of plastic film. It was developed in Germany, based on magnetic wire recording. Devices that record
and play back audio and video using magnetic tape are tape recorders and video tape recorders.
Magnetic tape is an information storage medium consisting of a magnetic coating on a flexible backing in
tape form. Data is recorded by magnetic encoding of tracks on the coating according to a particular tape
format.
Magnetic tape is wound on reels (or spools). These may be used on their own, as open-reel tape, or they
may be contained in some sort of magnetic tape cartridge for protection and ease of handling. Early
computers used open-reel tape, and this is still sometimes used on large computer systems although it has
been widely superseded by cartridge tape. On smaller systems, if tape is used at all it is normally cartridge
tape.
ii. Economical: The cost of storing characters on tape is much lower than that of other storage devices.
iv. Long-term Storage and Reusability: Magnetic tapes can be used for long-term storage, and a tape can be reused repeatedly without loss of data.
2. Magnetic Disks
Magnetic disks are most popular for direct-access storage. Each disk consists of a number of invisible concentric circles called tracks. Information is recorded on the tracks of a disk surface in the form of tiny magnetic spots. The presence of a magnetic spot represents a one bit (1) and its absence represents a zero bit (0). The information stored on a disk can be read many times without affecting the stored data, so the reading operation is non-destructive. But if you want to write new data, the existing data is erased from the disk and the new data is recorded.
Data is stored on either or both surfaces of discs in concentric rings called "tracks". Each track is divided
into a whole number of "sectors". Where multiple (rigid) discs are mounted on the same axle, the set of tracks at the same radius on all their surfaces is known as a "cylinder". Data is read and written by a disk drive which rotates the discs and positions the read/write heads over the desired track(s). The latter radial movement is known as "seeking". There is usually one head for each surface that stores data. The head
writes binary data by magnetizing small areas or "zones" of the disk in one of two opposing orientations. It
reads data by detecting current pulses induced in a coil as zones with different magnetic alignment pass
underneath it.
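As a concrete illustration of this geometry, the classic cylinder/head/sector (CHS) scheme can be converted into a linear block number. The following minimal C sketch shows the standard conversion; the geometry values used in main() are illustrative only, not taken from the text above.

#include <stdio.h>

/* Convert a cylinder/head/sector address to a logical block address (LBA).
   Sectors are conventionally numbered starting at 1, hence the (s - 1). */
unsigned long chs_to_lba(unsigned long c, unsigned long h, unsigned long s,
                         unsigned long heads_per_cyl, unsigned long sectors_per_track)
{
    return (c * heads_per_cyl + h) * sectors_per_track + (s - 1);
}

int main(void)
{
    /* Illustrative geometry: 2 heads, 18 sectors per track (e.g. a 1.44 MB floppy). */
    unsigned long lba = chs_to_lba(3, 1, 5, 2, 18);
    printf("LBA = %lu\n", lba); /* (3*2 + 1)*18 + 4 = 130 */
    return 0;
}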
3. Floppy Disks
These are small removable disks that are plastic coated with magnetic recording material. Floppy disks are
typically 3.5″ in size (diameter) and can hold 1.44 MB of data. This portable storage device is a rewritable
media and can be reused a number of times. Floppy disks are commonly used to move files between
different computers. The main disadvantage of floppy disks is that they can be damaged easily and,
therefore, are not very reliable. The following figure shows an example of a floppy disk, which is similar in principle to a magnetic disk.
• Read/Write head: A floppy disk drive normally has two read/write heads, making modern floppy disk drives double-sided drives. A head exists for each side of the disk, and both heads are used for reading and writing on the respective disk side.
• Head 0 and Head 1: Many people do not realize that the first head (head 0) is the bottom one and the top head is head 1. The top head is located either four or eight tracks inward from the bottom head, depending upon the drive type.
• Head Movement: A motor called head actuator moves the head mechanism. The heads can move in and
out over the surface of the disk in a straight line to position themselves over various tracks. The heads move
in and out tangentially to the tracks that they record on the disk.
• Head: The heads are made of soft ferrous (iron) compound with electromagnetic coils. Each head is a
composite design with a R/W head centered within two tunnel erasure heads in the same physical assembly.
PC-compatible floppy disk drives spin at 300 or 360 rpm. The two heads are spring-loaded and physically grip the disk with a small amount of pressure; this pressure does not cause excessive friction.
Recording Method
• Tunnel Erasure: As the track is laid down by the R/W heads, the trailing tunnel erasure heads force the data to be present only within a specified narrow tunnel on each track. This process prevents the signal from reaching adjacent tracks and causing crosstalk.
• Straddle Erasure: In this method, the R/W and erasure heads record and erase at the same time. The erasure head is not used to erase data stored on the diskette; it trims the top and bottom fringes of the recorded flux reversals. The erasure heads reduce the effect of crosstalk between tracks and minimize the errors induced by minor runout problems on the diskette or diskette drive.
• Head alignment: Alignment is the process of placing the heads with respect to the tracks that they must read and write. Head alignment can be checked only against some sort of reference: a standard disk recorded by a perfectly aligned machine. These types of disks are available, and one can be used to check the drive alignment.
4. Hard Disk Drives
HDDs are a type of non-volatile storage, retaining stored data even when powered off. A hard drive can be
used to store any data, including pictures, music, videos, text documents, and any files created or
downloaded.
On the back of a hard drive is a circuit board, called the disk controller or interface board, which allows the hard drive to communicate with the computer.
$40,000. 1983 marked the introduction of the first 3.5-inch size hard drive, developed by Rodime. It had a
storage capacity of 10 MB. Seagate was the first company to introduce a 7200 RPM hard drive in 1992.
Seagate also introduced the first 10,000 RPM hard drive in 1996 and the first 15,000 RPM hard drive in
2000. The first solid-state drive (SSD) as we know them today was developed by SanDisk Corporation in 1991, with a storage capacity of 20 MB. However, this was not a flash-based SSD; flash-based SSDs were introduced later, in 1995, by M-Systems. These drives did not require a battery to keep data stored on the memory chips,
making them a non-volatile storage medium.
Q. Describe a CD-ROM
CD-ROM
A CD-ROM (compact disc read-only memory) is an optical disc used to distribute data that can be read many times but not written or erased.
Advantages
• Easier access to a range of CD-ROMs.
• Ideally, access from the user’s own workstation in the office or at home.
• Simultaneous access by several users to the same data.
• Better security avoids damage to discs and equipment.
• Less personnel time needed to provide disks to users.
• Automated, detailed registration of usage statistics to support management.
Disadvantages
• Costs of the network software and computer hardware.
• Increased charges imposed by the information suppliers.
• Need for expensive, technical expertise to select, set up, manage, and maintain the network system.
• Technical problems when the CD-ROM product is not designed for use in the network.
• The network software component for the workstation side must be installed on each microcomputer before it can be used to access the CD-ROMs.
Optical Devices
An optical disk is made up of a rotating disk which is coated with a thin reflective metal. To record data on
the optical disk, a laser beam is focused on the surface of the spinning disk. The laser beam is turned on and off at varying rates. Due to this, tiny holes (pits) are burnt into the metal coating along the tracks. When
data stored on the optical disk is to be read, a less powerful laser beam is focused on the disk surface.
The storage capacity of these devices is tremendous; the Optical disk access time is relatively fast. The
biggest drawback of the optical disk is that it is a permanent storage device. Data once written cannot be
erased. Therefore, it is a read only storage medium. A typical example of the optical disk is the CD-ROM.
1. Read-only memory (ROM) disks, like the audio CD, are used for the distribution of standard program
and data files. These are mass-produced by mechanical pressing from a master die. The information is
actually stored as physical indentations on the surface of the CD. Recently low-cost equipment has been
introduced in the market to make one-off CD-ROMs, putting them into the next category.
2. Write-once read-many (WORM) disks: Some optical disks can be recorded once. The information
stored on the disk cannot be changed or erased. Generally, the disk has a thin reflective film deposited on
the surface. A strong laser beam is focused on selected spots on the surface and pulsed. The energy melts
the film at that point, producing a nonreflective void. In the read mode, a low power laser is directed at the
disk and the bit information is recovered by sensing the presence or absence of a reflected beam from the
disk.
3. Re-writeable, write-many read-many (WMRM) disks: Just like magnetic storage disks, these allow information to be recorded and erased many times. Usually, there is a separate erase cycle, although this may be transparent to the user. Some modern devices accomplish this with one over-write cycle. These devices are also called direct read-after-write (DRAW) disks.
4. WORM (write once, read many) is a data storage technology that allows information to be written to
a disc a single time and prevents the drive from erasing the data. The discs are intentionally not rewritable,
because they are especially intended to store data that the user does not want to erase accidentally. Because
of this feature, WORM devices have long been used for the archival purposes of organizations such as
government agencies or large enterprises.
5. Erasable Optical Disk: An erasable optical disk is one which can be erased and then loaded with new data all over again. These generally come with an RW label. They are based on a technology popularly known as magneto-optical (MO) recording, which involves applying heat to a precise point on the disk surface and magnetizing it using a laser.
6. Touchscreen Optical Device: A touchscreen is an input and output device normally layered on the top
of an electronic visual display of an information processing system. A user can give input or control the
information processing system through simple or multi-touch gestures by touching the screen with a special
stylus or one or more fingers. Some touchscreens use ordinary or specially coated gloves to work while
others may only work using a special stylus or pen. The user can use the touchscreen to react to what is
displayed and, if the software allows, to control how it is displayed; for example, zooming to increase the
text size.
• Capacitive Touch Technology – Capacitive touch screens take advantage of the conductivity of the
object to detect location of touch. While they are durable and last for a long time, they can malfunction if
they get wet. Their performance is also compromised if a nonconductor like a gloved finger presses on the
screen. Most smart phones and tablets have capacitive touch screens.
• Resistive Touch Technology – Resistive touch screens have moving parts. There is an air gap between
two layers of transparent material. When the user applies pressure to the outer layer, it touches the inner
layer at specific locations. An electric circuit is completed and the location can be determined. Though they
are cheaper to build compared to capacitive touch screens, they are also less sensitive and can wear out
quickly.
MEMORY MAPPING
Memory-mapping is a mechanism that maps a portion of a file, or an entire file, on disk to a range of
addresses within an application's address space. The application can then access files on disk in the same
way it accesses dynamic memory. This makes file reads and writes faster in comparison with using
functions such as fread and fwrite.
Benefits of Memory-Mapping
The principal benefits of memory-mapping are efficiency, faster file access, the ability to share memory
between applications, and more efficient coding.
1. Faster File Access: Memory-mapping a file eliminates the need to allocate, copy into, and then deallocate data buffers owned by the process, and it does not access data from the disk when the map is first constructed.
2. Efficiency
Mapping a file into memory allows access to data in the file as if that data had been read into an array in
the application's address space. Initially, MATLAB only allocates address space for the array; it does not
actually read data from the file until you access the mapped region.
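The text above describes MATLAB's memory-mapping facility; the same idea is available on POSIX systems through the mmap system call. The following minimal C sketch (an analogous POSIX example, not the MATLAB API; the file name data.bin is hypothetical) maps a file and then reads it as an ordinary in-memory array:

#include <stdio.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    int fd = open("data.bin", O_RDONLY);      /* hypothetical input file */
    if (fd < 0) return 1;

    struct stat st;
    fstat(fd, &st);

    /* Map the whole file read-only into the process address space. */
    unsigned char *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) return 1;

    /* The file can now be accessed like an array; pages are faulted in
       from disk only when they are actually touched. */
    long sum = 0;
    for (off_t i = 0; i < st.st_size; i++)
        sum += p[i];
    printf("sum of bytes = %ld\n", sum);

    munmap(p, st.st_size);
    close(fd);
    return 0;
}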
VIRTUAL MEMORIES
In order to manage memory more efficiently and with fewer errors, modern systems provide an abstraction
of main memory known as virtual memory (VM).
Virtual memory is an elegant interaction of hardware exceptions, hardware address translation, main
memory, disk files, and kernel software that provides each process with a large, uniform, and private
address space.
With one clean mechanism, virtual memory provides three important capabilities.
• It uses main memory efficiently by treating it as a cache for an address space stored on disk, keeping only the active areas in main memory, and transferring data back and forth between disk and memory as needed.
• It simplifies memory management by providing each process with a uniform address space.
• It protects the address space of each process from corruption by other processes.
Q. Since virtual memory works so well behind the scenes, why would a programmer need to understand it?
• Virtual memory is central. Virtual memory pervades all levels of computer systems, playing key roles
in the design of hardware exceptions, assemblers, linkers, loaders, shared objects, files, and processes.
Understanding virtual memory will help you better understand how systems work in general.
• Virtual memory is powerful. Virtual memory gives applications powerful capabilities to create and
destroy chunks of memory, map chunks of memory to portions of disk files, and share memory with other
processes.
As with any other cache in the memory hierarchy, the data on disk (the lower level) is partitioned into
blocks that serve as the transfer units between the disk and the main memory (the upper level). VM systems
handle this by partitioning the virtual memory into fixed-sized blocks called virtual pages (VPs). Each
virtual page is P = 2^p bytes in size. Similarly, physical memory is partitioned into physical pages (PPs),
also P bytes in size. (Physical pages are also referred to as page frames.)
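Because a virtual page is P = 2^p bytes, a virtual address splits cleanly into a virtual page number and a page offset. A minimal C sketch of this split, assuming a hypothetical page size of 4 KB (p = 12):

#include <stdio.h>
#include <stdint.h>

#define PAGE_SHIFT 12                       /* p = 12, so P = 2^12 = 4096 bytes */
#define PAGE_SIZE  (1u << PAGE_SHIFT)

int main(void)
{
    uint32_t vaddr = 0x00ABCDEF;            /* an arbitrary virtual address */

    uint32_t vpn    = vaddr >> PAGE_SHIFT;          /* virtual page number */
    uint32_t offset = vaddr & (PAGE_SIZE - 1);      /* offset within the page */

    printf("VPN = 0x%X, offset = 0x%X\n", vpn, offset);  /* VPN = 0xABC, offset = 0xDEF */
    return 0;
}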
Memory as a Cache
Replacement Algorithms
When a page fault occurs, the operating system has to choose a page to remove from memory to make room
for the page that has to be brought in. If the page to be removed has been modified while in memory, it
must be rewritten to the disk to bring the disk copy up to date. If, however, the page has not been changed
(e.g., it contains program text), the disk copy is already up to date, so no rewrite is needed.
The page to be read in just overwrites the page being evicted. While it would be possible to pick a random
page to evict at each page fault, system performance is much better if a page that is not heavily used is
chosen. If a heavily used page is removed, it will probably have to be brought back in quickly, resulting in
extra overhead. Much work has been done on the subject of page replacement algorithms, both theoretical
and experimental. Below we will describe some of the most important algorithms.
It is worth noting that the problem of ‘‘page replacement’’ occurs in other areas of computer design as well.
For example, most computers have one or more memory caches consisting of recently used 32-byte or 64-
byte memory blocks. When the cache is full, some block has to be chosen for removal. This problem is
precisely the same as page replacement except on a shorter time scale (it has to be done in a few
nanoseconds, not milliseconds as with page replacement).
The reason for the shorter time scale is that cache block misses are satisfied from main memory, which has
no seek time and no rotational latency. To select the particular algorithm, the algorithm with lowest page
fault rate is considered.
Most computers with virtual memory associate two status bits with each page: R (referenced), which is set whenever the page is read or written, and M (modified), which is set when the page is written to. Once a bit has been set to 1, it stays 1 until the operating system resets it to 0 in software. If the hardware
does not have these bits, they can be simulated as follows. When a process is started up, all of its page table
entries are marked as not in memory. As soon as any page is referenced, a page fault will occur. The
operating system then sets the R bit (in its internal tables), changes the page table entry to point to the
correct page, with mode READ ONLY, and restarts the instruction. If the page is subsequently written on,
another page fault will occur, allowing the operating system to set the M bit and change the page’s mode
to READ/WRITE.
The R and M bits can be used to build a simple paging algorithm as follows. When a process is started up,
both page bits for all its pages are set to 0 by the operating system. Periodically (e.g., on each clock
interrupt), the R bit is cleared, to distinguish pages that have not been referenced recently from those that
have been. When a page fault occurs, the operating system inspects all the pages and divides them into four categories based on the current values of their R and M bits:
Class 0: not referenced, not modified.
Class 1: not referenced, modified.
Class 2: referenced, not modified.
Class 3: referenced, modified.
The NRU (Not Recently Used) algorithm then removes a page at random from the lowest-numbered nonempty class.
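The class number can be computed directly from the two bits, as in this small C sketch (the representation of the bits as plain ints is illustrative):

/* NRU class: 0 = not referenced/not modified ... 3 = referenced/modified.
   The replacement victim is chosen at random from the lowest nonempty class. */
int nru_class(int referenced, int modified)
{
    return (referenced << 1) | modified;   /* R and M are each 0 or 1 */
}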
Another simple algorithm is FIFO (First-In, First-Out) page replacement, which can be illustrated by a supermarket that drops its oldest product line whenever a new one is introduced. In effect, the supermarket maintains a linked list of all the products it currently sells in the order they were
introduced. The new one goes on the back of the list; the one at the front of the list is dropped. As a page
replacement algorithm, the same idea is applicable. The operating system maintains a list of all pages
currently in memory, with the page at the head of the list the oldest one and the page at the tail the most
recent arrival. On a page fault, the page at the head is removed and the new page added to the tail of the
list. When applied to stores, FIFO might remove mustache wax, but it might also remove flour, salt, or
butter. When applied to computers the same problem arises. For this reason, FIFO in its pure form is rarely
used.
A better approach, the clock algorithm, keeps all the page frames on a circular list, with a "hand" pointing to the oldest page. When a page fault occurs, the page being pointed to by the hand is inspected. If its R bit is 0, the page is
evicted, the new page is inserted into the clock in its place, and the hand is advanced one position. If R is
1, it is cleared and the hand is advanced to the next page. This process is repeated until a page is found with
R = 0. Not surprisingly, this algorithm is called clock. It differs from second chance only in the
implementation.
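A minimal C sketch of the clock algorithm just described, using a circular array of frames and their R bits (the data structures are illustrative, not taken from any particular operating system):

#define NFRAMES 8

struct frame {
    int page;        /* page currently held in this frame */
    int r_bit;       /* referenced bit, set by the hardware (simulated here) */
};

static struct frame frames[NFRAMES];
static int hand = 0;                 /* the clock hand */

/* Return the index of the frame to evict and advance the hand past it. */
int clock_select_victim(void)
{
    for (;;) {
        if (frames[hand].r_bit == 0) {
            int victim = hand;
            hand = (hand + 1) % NFRAMES;
            return victim;           /* R = 0: evict this page */
        }
        frames[hand].r_bit = 0;      /* R = 1: clear it and give a second chance */
        hand = (hand + 1) % NFRAMES;
    }
}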
Advantages
• The I/O devices can directly access the main memory without intervention by the processor in I/O-processor-based systems.
• It is used to address the problems that arise in the direct memory access (DMA) method.
Modes of Transfer
• Programmed I/O.
• Interrupt- initiated I/O.
• Direct memory access (DMA).
1. Programmed I/O: Programmed I/O is the result of I/O instructions written in the computer program. Each data item transfer is initiated by an instruction in the program. Usually, the transfer is between a CPU register and memory. This method requires constant monitoring of the peripheral devices by the CPU.
Example of Programmed I/O: In this case, the I/O device does not have direct access to the memory unit.
A transfer from I/O device to memory requires the execution of several instructions by the CPU, including
an input instruction to transfer the data from the device to the CPU and a store instruction to transfer the data
from CPU to memory. In programmed I/O, the CPU stays in the program loop until the I/O unit indicates
that it is ready for data transfer. This is a time-consuming process since it needlessly keeps the CPU busy.
This situation can be avoided by using an interrupt facility. This is discussed below.
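A minimal C sketch of the busy-wait loop used in programmed I/O follows; the device register addresses and the ready bit are hypothetical, since real hardware defines its own layout:

#include <stdint.h>

/* Hypothetical memory-mapped device registers. */
#define DEV_STATUS (*(volatile uint8_t *)0x40001000)
#define DEV_DATA   (*(volatile uint8_t *)0x40001004)
#define READY_BIT  0x01

/* Read one byte from the device using programmed I/O:
   the CPU polls the status register until the device is ready. */
uint8_t programmed_io_read(void)
{
    while ((DEV_STATUS & READY_BIT) == 0)
        ;                      /* busy-wait: the CPU is tied up in this loop */
    return DEV_DATA;           /* transfer the data item to a CPU register */
}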
2. Interrupt-initiated I/O: In the above case, the CPU is kept busy unnecessarily. This situation can be avoided by using an interrupt-driven method for data transfer, in which the interrupt facility and special commands inform the interface to issue an interrupt request signal whenever data is available from any device. In the meantime, the CPU can proceed with any other program execution. The
interface meanwhile keeps monitoring the device. Whenever it determines that the device is ready for data transfer, it initiates an interrupt request signal to the computer. Upon detection of an external interrupt signal, the CPU momentarily stops the task it is performing, branches to the service program to process the I/O transfer, and then returns to the task it was originally performing.
3. Direct Memory Access: The data transfer between fast storage media such as a magnetic disk and the memory unit is limited by the speed of the CPU. Thus, we can allow the peripherals to communicate directly with each other using the memory buses, removing the intervention of the CPU. This type of data transfer
technique is known as DMA or direct memory access. During DMA the CPU is idle and it has no control
over the memory buses. The DMA controller takes over the buses to manage the transfer directly between
the I/O devices and the memory unit.
• Bus Request: It is used by the DMA controller to request the CPU to relinquish the control of the buses.
• Bus Grant: It is activated by the CPU to inform the external DMA controller that the buses are in the high-impedance state and the requesting DMA controller can take control of the buses. Once the DMA controller has taken control of the buses, it transfers the data. This transfer can take place in many ways.
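From the software side, a DMA transfer typically amounts to loading the controller's address and word-count registers and starting it; the controller then uses Bus Request/Bus Grant to move the data without the CPU. The register names and addresses in this C sketch are hypothetical, for illustration only:

#include <stdint.h>

/* Hypothetical DMA controller registers. */
#define DMA_ADDR   (*(volatile uint32_t *)0x40002000)  /* memory start address */
#define DMA_COUNT  (*(volatile uint32_t *)0x40002004)  /* number of words to move */
#define DMA_CTRL   (*(volatile uint32_t *)0x40002008)  /* control/status register */
#define DMA_START  0x1
#define DMA_DONE   0x2

void dma_transfer(uint32_t mem_addr, uint32_t nwords)
{
    DMA_ADDR  = mem_addr;            /* where in memory the block goes */
    DMA_COUNT = nwords;              /* how many words to transfer */
    DMA_CTRL  = DMA_START;           /* the controller now asserts Bus Request */

    /* The CPU is free to do other work here; for simplicity we just poll. */
    while ((DMA_CTRL & DMA_DONE) == 0)
        ;
}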
The growth of microprocessor speed/performance by a factor of 2 every 18 months (or about 60% per year)
is known as Moore’s law.
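The two figures quoted above are consistent: doubling every 18 months corresponds to a per-year growth factor of
2^(12/18) = 2^(2/3) ≈ 1.59,
that is, roughly 60% growth per year.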
Q. The growth of microprocessor speed is the result of a combination of two (2) factors. State them.
• Increase in complexity (related both to higher device density and to larger size) of VLSI chips, projected
to rise to around 10 M transistors per chip for microprocessors, and 1B for dynamic random-access
memories (DRAMs), by the year 2000.
• Introduction of, and improvements in, architectural features such as on-chip cache memories, large
instruction buffers, multiple instruction issue per cycle, multithreading, deep pipelines, out-of-order
instruction execution, and branch prediction.
1. Higher speed, or solving problems faster. This is important when applications have “hard” or “soft”
deadlines. For example, we have at most a few hours of computation time to do 24-hour weather forecasting
or to produce timely tornado warnings.
2. Higher throughput, or solving more instances of given problems. This is important when many similar
tasks must be performed. For example, banks and airlines, among others, use transaction processing
systems that handle large volumes of data.
3. Higher computational power, or solving larger problems. This would allow us to use very detailed, and
thus more accurate, models or to carry out simulation runs for longer periods of time (e.g., 5-day, as opposed
to 24-hour, weather forecasting).
Types of Parallelism
Control-flow parallel computers are essentially based on the same principles as the sequential or von
Neumann computer, except that multiple instructions can be executed at any given time.
Data-flow parallel computers, sometimes referred to as “non-von Neumann,” are completely different in
that they have no pointer to active instruction(s) or a locus of control. The control is totally distributed, with
the availability of operands triggering the activation of instructions.
Q. What is Pipelining
PIPELINING
Pipelining owes its origin to car assembly lines. The idea is to have more than one instruction being
processed by the processor at the same time. Similar to the assembly line, the success of a pipeline depends
upon dividing the execution of an instruction among a number of subunits (stages), each performing part
of the required operations.
Pipelining refers to the technique in which a given task is divided into a number of subtasks that need to
be performed in sequence. Each subtask is performed by a given functional unit. The units are connected
in a serial fashion and all of them operate simultaneously.
The use of pipelining improves the performance compared to the traditional sequential execution of tasks.
Figure 3.20 shows an illustration of the basic difference between executing four subtasks of a given
instruction (in this case fetching F, decoding D, execution E, and writing the results W) using pipelining
and sequential processing.
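The benefit can be quantified with a simple worked example. If each of the k stages takes one time unit, executing n instructions sequentially takes n * k time units, while the pipeline completes them in k + (n - 1) units (k units for the first instruction, then one unit per additional instruction). The resulting speedup is
Speedup = (n * k) / (k + n - 1)
which approaches k for large n. For example, with k = 4 stages and n = 100 instructions the speedup is (100 * 4) / (4 + 99) ≈ 3.9.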
Types of Pipelines:
1. Arithmetic Pipeline- Arithmetic pipelines are found in most computers. They are used for floating-point operations, multiplication of fixed-point numbers, etc.
2. Instruction Pipeline- In this, a stream of instructions is executed by overlapping the fetch, decode and execute phases of the instruction cycle. This technique is used to increase the throughput of the computer system. An instruction pipeline reads instructions from memory while previous instructions are being executed in other segments of the pipeline. Thus, we can execute multiple instructions simultaneously. The pipeline will be more efficient if the instruction cycle is divided into segments of equal duration.
Pipeline Conflicts
These are factors that cause the pipeline to deviate from its normal performance.
1. Timing Variations: All stages cannot take the same amount of time. This problem generally occurs in
instruction processing where different instructions have different operand requirements and thus different
processing time.
2. Data Hazards: When several instructions are in partial execution, a problem arises if they reference the same data. We must ensure that the next instruction does not attempt to access the data before the current instruction has finished with it, because this would lead to incorrect results.
3. Branching: In order to fetch and execute the next instruction, we must know what that instruction is. If the present instruction is a conditional branch whose result determines the next instruction, then the next instruction may not be known until the current one is processed.
4. Interrupts: Interrupts inject unwanted instructions into the instruction stream and thereby affect the execution of instructions.
5. Data Dependency: This arises when an instruction depends upon the result of a previous instruction, but that result is not yet available.
Advantages of Pipelining
• The cycle time of the processor is reduced.
• It increases the throughput of the system
• It makes the system reliable.
Disadvantages of Pipelining
• The design of a pipelined processor is complex and costly to manufacture.
MEMORY ADDRESSING
A memory address is a unique identifier used by a device or CPU for data tracking.
In computing, a memory address is a reference to a specific memory location used at various levels by
software and hardware. Memory addresses are fixed-length sequences of digits conventionally displayed
and manipulated as unsigned integers.
Q. Using a tabular format, show the various addressing modes and their meanings.
Register Indirect Mode: In this mode, the instruction specifies a register whose content gives the address of the operand in memory. In other words, the selected register contains the address of the operand rather than the operand itself.
a. To provide the user with programming flexibility by offering such facilities as memory pointers, loop control counters, data indexing, and program displacement.
Memory hierarchy
The organization of smaller memories that hold recently accessed files or programs closer to the CPU is termed the memory hierarchy.
The memory hierarchy system encompasses all the storage devices used in a computer system. It ranges from the cache memory, which is small in size but fast in speed, to the relatively large but slower auxiliary memory. The smaller the memory, the costlier it is per bit.
Memory_Stall_Cycles = IC * Mem_Refs * Miss_Rate * Miss_Penalty
where:
IC = Instruction Count
Mem_Refs = Memory References per Instruction
Miss_Rate = Fraction of accesses that are not in the cache
Miss_Penalty = Additional time to service the miss
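For example, with illustrative values of IC = 1,000,000 instructions, Mem_Refs = 1.5 references per instruction, Miss_Rate = 0.02 and Miss_Penalty = 100 cycles:
Memory_Stall_Cycles = 1,000,000 * 1.5 * 0.02 * 100 = 3,000,000 cycles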
The cache memory is the fastest and smallest memory. It is easily accessible by the CPU because it is closest to the CPU. Cache memory is very costly compared to the main memory and the auxiliary memory.
The main memory, also known as primary memory, communicates directly with the CPU. It also communicates with the auxiliary memory through the I/O processor. During program execution, the files that
are not currently needed by the CPU are often moved to the auxiliary storage devices in order to create
space in the main memory for the currently needed files to be stored. The main memory is made up of
Random Access Memory (RAM) and Read Only Memory (ROM).
The auxiliary memory is very large in size and relatively slow in speed. It includes the magnetic tapes and the magnetic disks, which are used for the storage and backup of removable files. The auxiliary memories store programs that are not currently needed by the CPU. They are very cheap compared to both the cache and main memories.
1. Access Time: This refers to the time taken by the action that physically takes place during a read or write operation. When data or a program is moved from the top of the memory hierarchy to the bottom, the access time automatically increases. Hence, the interval of time between a request to read or write data and the completion of that operation is called the access time.
2. Capacity: The capacity of a memory level often increases as data moves from the top of the memory hierarchy to the bottom. The capacity of a memory level is the total amount of data it can store, and it is usually measured in bytes.
3. Cycle time: is defined as the time elapsed from the start of a read operation to the start of a
subsequent read.
4. Latency: is defined as the time interval between the request for information and the access to the
first bit of that information.
5. Bandwidth: this measures the number of bits that can be accessed per second.
6. Cost: The cost of a memory level is usually specified as dollars per megabyte. Moving from the bottom of the memory hierarchy to the top, the cost per bit increases automatically. This means that internal memory is expensive compared to external memory.
A typical memory hierarchy consists of:
• registers,
• cache,
• main memory,
• magnetic discs, and
• magnetic tapes.
The first three levels are the primary memories (volatile memories), which means that when there is no power they automatically lose their stored data. The last two levels are the secondary memories (non-volatile), which means they store data permanently.
From the top of the hierarchy downwards, the levels are: CPU registers (Level 0), cache memory (SRAM, Level 1), main memory (DRAM, Level 2), and then optical disk and magnetic tape at the lower levels.
1. Registers: Registers are the fastest memory elements, located inside the CPU, and hold the data being worked on during an operation. Normally, a complex instruction set computer uses many registers to accept data from main memory.
2. Cache Memory: Cache memory is usually found inside the processor, although rarely it may be a separate integrated circuit (IC); it is organized into levels. The cache holds the chunks of data that are frequently used from main memory. When the processor has a single core, it will rarely have more than two cache levels. Present multi-core processors have three: two levels for each core, and one level that is shared.
3. Main Memory: This is the memory unit that communicates directly with the CPU. It is the primary storage unit in a computer system. The main memory stores the data and programs currently used by the CPU during operation. It is very fast in terms of access time and is made up of RAM and ROM.
4. Magnetic Disks: A magnetic disk is a circular plate fabricated of plastic or metal and coated with magnetizable material. Frequently, both faces of the disk are used, and several disks may be stacked on one spindle, with read/write heads available for each surface. All the disks rotate together at high speed. Bits are stored on the magnetized surface in spots along concentric circles called tracks. The tracks are commonly divided into sections called sectors.
5. Magnetic Tape: Magnetic tape is a magnetic recording medium consisting of a thin magnetizable coating on a long, narrow strip of plastic film. It is mainly used to back up huge amounts of data. Whenever the computer needs to access a tape, the tape must first be mounted; once the data has been accessed, it is unmounted. The access time of magnetic tape is slow, and it may take several minutes to access a tape.
A memory management system should provide the following facilities:
• A facility for dynamic storage relocation that maps logical memory references into physical
memory addresses.
• A provision for sharing common programs stored in memory by different users.
• Protection of information against unauthorized access between users and preventing users from
changing operating system functions. The dynamic storage relocation hardware is a mapping
process similar to the paging system
Q. What is Paging
Paging
In memory management, paging can be described as a storage mechanism that allows the operating system (OS) to retrieve processes from secondary storage into main memory in the form of pages. It is a memory management function in which a computer stores and retrieves data between a device's secondary storage and its primary storage.
Paging Protection
The paging process is protected by inserting an additional bit, called the valid/invalid bit, into each page table entry. Memory protection in paging is achieved by associating protection bits with each page. These bits are stored in each page table entry and specify the protection on the corresponding page.
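A minimal C sketch of a page table entry carrying a valid/invalid bit and protection bits, with the check that would be applied on each access (the exact layout is hypothetical; real MMUs define their own formats):

#include <stdbool.h>
#include <stdint.h>

struct page_table_entry {
    uint32_t frame_number;   /* physical page frame holding this page */
    unsigned valid    : 1;   /* 1 = page is in the process's address space */
    unsigned readable : 1;   /* protection bits for the page */
    unsigned writable : 1;
};

/* Returns true if the access is legal; a real MMU would raise a trap otherwise. */
bool access_allowed(const struct page_table_entry *pte, bool is_write)
{
    if (!pte->valid)
        return false;                      /* invalid bit set: page fault / trap */
    return is_write ? pte->writable : pte->readable;
}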
Advantages
The following are the advantages of using Paging method:
• No need for external Fragmentation
• Swapping is easy between equal-sized pages and page frames.
• Easy to use memory management algorithm
Disadvantages
The following are the disadvantages of using Paging method
• May cause Internal fragmentation
• Page tables consume additional memory.
• Multi-level paging may lead to memory reference overhead.
Multi-programming
Multiprogramming is the basic form of parallel processing in which several programs are run at the same
time on a single processor.
Disadvantages of Multiprogramming:
• Long time jobs have to wait long
• Tracking all processes sometimes difficult
• CPU scheduling is required
• Requires efficient memory management
• User interaction not possible during program execution
A much better place to apply protection is in the logical address space rather than the physical address
space. This can be done by including protection information within the segment table or segment register
of the memory management hardware. The content of each entry in the segment table or a segment register
is called a descriptor. A typical descriptor would contain, in addition to a base address field, one or two
additional fields for protection purposes.
The protection field in a segment descriptor specifies the access rights available to the particular segment.
In a segmented-page organization, each entry in the page table may have its own protection field to describe
the access rights of each page. The protection information is set into the descriptor by the master control
program of the operating system.
Some of the access rights of interest that are used for protecting the programs residing in memory are:
• Full read and write privileges
• Read only (write protection)
• Execute only (program protection)
• System only (operating system protection)
Full read and write privileges are given to a program when it is executing its own instructions.
Write protection is useful for sharing system programs such as utility programs and other library routines.
The operating system protection condition is placed in the descriptors of all operating system programs to
prevent the occasional user from accessing operating system segments.
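A minimal C sketch of such a descriptor, holding a base address field and a protection field that encodes the four access rights listed above (the layout is illustrative only, not a real processor format):

#include <stdint.h>

enum access_rights {
    ACC_READ_WRITE,   /* full read and write privileges */
    ACC_READ_ONLY,    /* write protection */
    ACC_EXECUTE_ONLY, /* program protection */
    ACC_SYSTEM_ONLY   /* operating system protection */
};

struct segment_descriptor {
    uint32_t base;                 /* base address of the segment */
    uint32_t limit;                /* segment length, used for bounds checking */
    enum access_rights protection; /* set by the master control program */
};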
• External Memory or Secondary Memory: This is a permanent storage (non-volatile) and does not lose
any data when power is switched off. It is made up of Magnetic Disk, Optical Disk, Magnetic Tape i.e.
peripheral storage devices which are accessible by the processor via I/O Module.
• Internal Memory or Primary Memory: This memory is volatile in nature; it loses its data when power is switched off. It is made up of Main Memory, Cache Memory and CPU registers. It is directly accessible by the processor.
CONTROL UNIT
Control Unit is the part of the computer’s central processing unit (CPU), which directs the operation of the
processor.
The control unit tells the computer's memory, arithmetic/logic unit and input and output devices how to respond to the instructions that have been sent to the processor. It fetches the instructions of the programs from the main memory into the processor's instruction register, and based on this register's contents, the control unit generates control signals that supervise the execution of these instructions. A control unit works by receiving input information, which it converts into control signals that are then sent to the central processor.
2. The delay that can occur in the generation of control signals depends on the number of gates.
2. Modifications to the control signals are very difficult because they require rearranging the wires in the hardware circuit.
5. It is Expensive.
A control unit whose binary control values are saved as words in memory is called a micro-programmed
control unit. The control unit of a microprogram-controlled computer is a computer inside a computer.
• With a single-level control store: In this, the instruction opcode from the instruction register is sent to
the control store address register. Based on this address, the first microinstruction of a microprogram that
interprets execution of this instruction is read to the microinstruction register. This microinstruction
contains in its operation part encoded control signals, normally as a few bit fields. The fields are decoded in a set of microinstruction field decoders. The microinstruction also contains the address of the next
microinstruction of the given instruction microprogram and a control field used to control activities of the
microinstruction address generator.
• With a two-level control store: In this, in a control unit with a two-level control store, besides the control
memory for microinstructions, a Nano-instruction memory is included. In such a control unit,
microinstructions do not contain encoded control signals. The operation part of microinstructions contains
the address of the word in the Nano-instruction memory, which contains encoded control signals. The
Nano-instruction memory contains all combinations of control signals that appear in microprograms that
interpret the complete instruction set of a given computer, written once in the form of Nano-instructions.
• The control data register holds the present microinstruction while the next address is computed and
read from memory.
• The data register is sometimes called a pipeline register.
• It allows the execution of the microoperations specified by the control word simultaneously with
the generation of the next microinstruction.
• This configuration requires a two-phase clock, with one clock applied to the address register and
the other to the data register.
• The main advantage of the microprogrammed control is the fact that once the hardware configuration is established, there should be no need for further hardware or wiring changes.
• If we want to establish a different control sequence for the system, all we need to do is specify a
different set of microinstructions for control memory.
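As a rough illustration of the single-level organization described above, a microinstruction can be pictured as a record holding the encoded control fields, a next-address field, and a branch-control field. The field names and widths in this C sketch are invented for illustration only:

#include <stdint.h>

/* Hypothetical microinstruction word for a single-level control store. */
struct microinstruction {
    uint16_t control_fields;   /* encoded control signals, decoded by field decoders */
    uint16_t next_address;     /* address of the next microinstruction */
    uint8_t  branch_control;   /* tells the address generator how to pick the next address */
};

/* The control data (pipeline) register holds the current microinstruction
   while the next address is being computed and read from control memory. */
static struct microinstruction pipeline_register;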
Asynchronous (clockless) control is a method of control in which the time allotted for performing an
operation depends on the time actually required for the operation, rather than on a predetermined fraction
of a fixed machine cycle. By contrast, virtually all digital design today is based on the synchronous approach.
Basic Concepts
There are a few key concepts fundamental to the understanding of asynchronous circuits:
• The timing models
• The mode of operation and
• The signaling conventions.
1. Timing model
Asynchronous circuits are classified according to their behaviour with respect to circuit delays. If a circuit
functions correctly irrespective of the delays in the logic gates and the delays in the wiring it is known as
delay-insensitive. A restricted form of this circuit known as speed independent allows arbitrary delays in
logic elements but assumes zero delays in the interconnect (i.e. all interconnect wires are equi-potential).
Finally, if the circuit only functions when the delays are below some predefined limit the circuit is known
as bounded delay.
2. Mode
Asynchronous circuits can operate in one of two modes. The first is called fundamental mode and assumes
no further input changes can be applied until all outputs have settled in response to a previous input. The
second, input/output mode, allows further input changes as soon as the circuit has responded to the previous input with the appropriate output changes.
i. Two-phase
In a two-phase communication the information is transmitted by a single transition or change in voltage
level on a wire. The Figure below shows an example of two-phase communication.
ii. Four-phase
With four-phase communication, two phases are active communication while the other two permit recovery to a predefined state. The Figure below shows an example of four-phase communication; in this example all wires are initialized to a logical low level.
• No clock skew - Clock skew is the difference in arrival times of the clock signal at different parts of the
circuit. Since asynchronous circuits by definition have no globally distributed clock, there is no need to
worry about clock skew. In contrast, synchronous systems often slow down their circuits to accommodate
the skew. As feature sizes decrease, clock skew becomes a much greater concern.
• Lower power - Standard synchronous circuits have to toggle clock lines, and possibly pre-charge and
discharge signals, in portions of a circuit unused in the current computation. For example, even though a
floating-point unit on a processor might not be used in a given instruction stream, the unit still must be
operated by the clock. Although asynchronous circuits often require more transitions on the computation
path than synchronous circuits, they generally have transitions only in areas involved in the current
computation. Note that there are some techniques in synchronous design that address this issue as well.
• Average-case instead of worst-case performance - Synchronous circuits must wait until all possible computations have completed before latching the results,
yielding worst-case performance. Many asynchronous systems sense when a computation has completed,
allowing them to exhibit average case performance. For circuits such as ripple-carry adders where the
worst-case delay is significantly worse than the average-case delay, this can result in a substantial savings.
• Easing of global timing issues: In systems such as a synchronous microprocessor, the system clock, and
thus system performance, is dictated by the slowest (critical) path. Thus, most portions of a circuit must be
carefully optimized to achieve the highest clock rate, including rarely used portions of the system. Since
many asynchronous systems operate at the speed of the circuit path currently in operation, rarely used
portions of the circuit can be left un-optimized without adversely affecting system performance.
• Better technology migration potential - Integrated circuits will often be implemented in several different
technologies during their lifetime. Early systems may be implemented with gate arrays, while later
production runs may migrate to semi-custom or custom ICs. Greater performance for synchronous systems
can often only be achieved by migrating all system components to a new technology, since again the overall
system performance is based on the longest path. In many asynchronous systems, migration of only the
more critical system components can improve system performance on average, since performance is
dependent on only the currently active path. Also, since many asynchronous systems sense computation
completion, components with different delays may often be substituted into a system without altering other
elements or structures.
• Automatic adaptation to physical properties - The delay through a circuit can change with variations
in fabrication, temperature, and power-supply voltage. Synchronous circuits must assume that the worst
possible combination of factors is present and clock the system accordingly. Many asynchronous circuits
sense computation completion, and will run as quickly as the current physical properties allow.
• Asynchronous circuits are more difficult to design in an ad hoc fashion than synchronous circuits. In a
synchronous system, a designer can simply define the combinational logic necessary to compute the given
function, and surround it with latches.
• By setting the clock rate to a long enough period, all worries about hazards (undesired signal transitions)
and the dynamic state of the circuit are removed. In contrast, designers of asynchronous systems must pay
a great deal of attention to the dynamic state of the circuit. Hazards must also be removed from the circuit,
or not introduced in the first place, to avoid incorrect results.
• The ordering of operations, which was fixed by the placement of latches in a synchronous system, must
be carefully ensured by the asynchronous control logic. For complex systems, these issues become too
difficult to handle by hand. Unfortunately, asynchronous circuits in general cannot leverage off of existing
CAD tools and implementation alternatives for synchronous systems.
• For example, some asynchronous methodologies allow only algebraic manipulations (associative,
commutative, and DeMorgan's Law) for logic decomposition, and many do not even allow these.
• Placement, routing, partitioning, logic synthesis, and most other CAD tools either need modifications for
asynchronous circuits, or are not applicable at all.
• Finally, even though most of the advantages of asynchronous circuits are towards higher performance, it
isn't clear that asynchronous circuits are actually any faster in practice.
• Asynchronous circuits generally require extra time due to their signaling policies, thus increasing average-
case delay.
A communication is synchronous when sending and receiving information between a sender and a receiver
are simultaneous events. Microcontrollers have the ability to communicate asynchronously and synchronously. With synchronous communication, there is a wire between the two communicating agents carrying the clock pulse, so both microcontrollers can communicate using the same pulse.
With asynchronous communication, there is no clock wire between the two microcontrollers, so each microcontroller is essentially blind to the other's pulse rate. Each microcontroller is told, using a baud rate, what speed to use for the communication. That means the two devices do not share a dedicated clock signal (a unique clock exists on each device). Each device must set up ahead of time a matching bit rate and how
many bits to expect in a given transaction.
2. Synchronous transmission needs a clock signal between the source and target to let the target know of the new byte. In comparison, with asynchronous transmission, a clock signal is not needed because start and stop bits are attached to the data being transmitted, which serve as indicators of the beginning and end of each new byte.
3. The data transfer rate of synchronous transmission is faster since it transmits in chunks of data, compared
to asynchronous transmission which transmits one byte at a time.
Asynchronous Transmission
In asynchronous transmission, data moves 1 byte or 1 character at a time, sent as a continuous stream of bytes. The size of a character transmitted is 8 bits, with a start bit added at the beginning and a stop bit at the end, making a total of 10 bits. It does not need a clock for synchronization; rather, it utilizes the start and stop bits to tell the receiver how to interpret the data. It is straightforward, quick, cost-effective, and does not need two-way communication to function.
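A small C sketch of how one such 10-bit frame can be assembled, assuming one start bit (0), eight data bits sent least-significant bit first, and one stop bit (1); this is an illustration of the framing idea, not a driver for any particular UART:

#include <stdint.h>
#include <stdio.h>

/* Build a 10-bit asynchronous frame: start bit, 8 data bits (LSB first), stop bit.
   Bit 0 of the result is transmitted first. */
uint16_t make_frame(uint8_t data)
{
    uint16_t frame = 0;            /* bit 0: start bit = 0 */
    frame |= (uint16_t)data << 1;  /* bits 1..8: the data byte */
    frame |= 1u << 9;              /* bit 9: stop bit = 1 */
    return frame;
}

int main(void)
{
    uint16_t f = make_frame('A');    /* 'A' = 0x41 */
    for (int i = 0; i < 10; i++)
        printf("%d", (f >> i) & 1);  /* prints the line levels in transmission order */
    printf("\n");                    /* start=0, data(LSB first)=10000010, stop=1 -> 0100000101 */
    return 0;
}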
1. Large-scale heterogeneous system integration. In multi- and many-core processors and systems-on-chip (SoCs), some level of asynchrony is inevitable in the integration of heterogeneous components. Typically, there are several distinct timing domains, which are glued together using an asynchronous communication fabric. There has been much recent work on asynchronous and mixed synchronous-asynchronous systems.
2. Ultra-low-energy systems and energy harvesting. Asynchronous design is also playing a crucial role in the design of systems that operate in regimes where energy availability is extremely limited. In one application, such fine-grain adaptation, in which the datapath latency can vary subtly for each input sample, is not possible in a fixed-rate synchronous design. In a recent in-depth case study by Chang et al., focusing on ultra-low-energy 8051 microcontroller cores with voltage scaling, it was shown that under extreme process, voltage, and temperature (PVT) variations, a synchronous core requires its delay margins to be increased by a factor of 12x, while a comparable asynchronous core can operate at actual speed.
3. Continuous-time digital signal processors (CT DSPs). Another intriguing direction is the development of continuous-time digital signal processors, where input samples are generated at irregular rates by a level-crossing analog-to-digital converter, depending on the actual rate of change of the input's waveform. An early specialized approach, using finely discretized sampling, demonstrated a 10x power reduction.
4. Alternative computing paradigms. Finally, there is increasing interest in asynchronous circuits as the organizing backbone of systems based on emerging computing technologies, such as cellular nano-arrays and nanomagnetics, where highly robust asynchronous approaches are crucial to mitigating timing irregularities.
These two methods can achieve this asynchronous way of data transfer:
• Strobe Control: This is one way of transfer i.e. by means of strobe pulse supplied by one of the units to
indicate to the other unit when the transfer has to occur.
• Handshaking: This method is used to accompany each data item being transferred with a control signal
that indicates the presence of data in the bus. The unit receiving the data item responds with another control
signal to acknowledge receipt of the data.
The strobe control method of data transfer uses a single control signal for each transfer. This control line, known as the strobe, may be activated by either the source unit or the destination unit, depending on which initiates the transfer.
SOURCE INITIATED STROBE: The data bus carries the binary information from source unit to the
destination unit as shown below.
The strobe is a single line that informs the destination unit when a valid data word is available in the bus.
• The source removes the data from the bus a brief period of time after it disables its strobe pulse.
HANDSHAKING
The strobe method has the disadvantage that the source unit that initiates the transfer has no way of knowing
whether the destination has received the data that was placed in the bus. Similarly, a destination unit that
initiates the transfer has no way of knowing whether the source unit has placed data on the bus.
• In case of source-initiated data transfer under strobe control method, the source unit has no way of
knowing whether destination unit has received the data or not.
• Similarly, destination-initiated transfer has no method of knowing whether the source unit has placed the
data on the data bus.
• The handshaking mechanism solves this problem by introducing a second control signal that provides a reply to the unit that initiates the transfer.
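A minimal C sketch of the two-signal handshake for a source-initiated transfer, modelling the data-valid and data-accepted control lines as shared flags. This is a simulation of the idea only; the two functions represent two units operating concurrently (e.g. in separate threads), not real bus hardware:

#include <stdint.h>

/* Shared "control lines" between the source and destination units. */
volatile int data_valid    = 0;   /* raised by the source when data is on the bus */
volatile int data_accepted = 0;   /* raised by the destination after it takes the data */
volatile uint8_t data_bus;

void source_send(uint8_t value)
{
    data_bus = value;             /* place the data item on the bus */
    data_valid = 1;               /* tell the destination the data is valid */
    while (!data_accepted)        /* wait for the reply signal */
        ;
    data_valid = 0;               /* remove the data; handshake completes */
    while (data_accepted)
        ;
}

uint8_t destination_receive(void)
{
    while (!data_valid)           /* wait until the source signals valid data */
        ;
    uint8_t value = data_bus;     /* accept the data from the bus */
    data_accepted = 1;            /* acknowledge receipt */
    while (data_valid)
        ;
    data_accepted = 0;
    return value;
}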
Asynchronous Data Transfer in computer organization has the following advantages, such as:
• It is more flexible, and devices can exchange information at their own pace. In addition, individual
data characters can complete themselves so that even if one packet is corrupted, its predecessors
and successors will not be affected.
• It does not require complex processes by the receiving device. Furthermore, it means that
inconsistency in data transfer does not result in a big crisis since the device can keep up with the
data stream.
• It also makes asynchronous transfers suitable for applications where character data is generated
irregularly.
Fault Tolerance is the art and science of building computing systems that continue to operate satisfactorily
in the presence of faults.
Faults are the cause of errors, and errors are the cause of failures. Often the term failure is used interchangeably with the term malfunction.
1. A fault is a physical defect, imperfection, or flaw that occurs within some hardware or software
component. Essentially, the definition of a fault, as used in the fault tolerance community, agrees with the
definition found in the dictionary. A fault is a blemish, weakness, or shortcoming of a particular hardware
or software component.
2. An error is the manifestation of a fault. Specifically, an error is a deviation from accuracy or correctness.
3. Finally, if the error results in the system performing one of its functions incorrectly then a system failure
has occurred. Essentially, a failure is the nonperformance of some action that is due or expected. A failure
is also the performance of some function in a subnormal quantity or quality.
Q. Using Three (3) Universal models present the concept of faults, errors, and failures
• The first universe is the physical universe in which faults occur. The physical universe contains the
semiconductor devices, mechanical elements, displays, printers, power supplies, and other physical entities
that make up a system. A fault is a physical defect or alteration of some component within the physical
universe.
• The second universe is the informational universe. The informational universe is where the error occurs.
Errors affect units of information such as data words within a computer or digital voice or image
information. An error has occurred when some unit of information becomes incorrect.
• The final universe is the external or user’s universe. The external universe is where the user of a system
ultimately sees the effect of faults and errors. The external universe is where failures occur. The failure is
any deviation that occurs from the desired or expected behavior of a system. In summary, faults are physical
events that occur in the physical universe. Faults can result in errors in the informational universe, and
errors can ultimately lead to failures that are witnessed in the external universe of the system.
• Fault latency is the length of time between the occurrence of a fault and the appearance of an error due
to that fault.
• Error latency is the length of time between the occurrence of an error and the appearance of the resulting
failure. Based on the three-universe model, the total time between the occurrence of a physical fault and
the appearance of a failure will be the sum of the fault latency and the error latency.
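As a small worked illustration (the timing values are hypothetical, not from the material), the total time
between a physical fault and the resulting failure is simply the sum of the two latencies:

t_{\text{failure}} - t_{\text{fault}} =
\underbrace{(t_{\text{error}} - t_{\text{fault}})}_{\text{fault latency}}
+ \underbrace{(t_{\text{failure}} - t_{\text{error}})}_{\text{error latency}}

For example, a fault latency of 5 ms followed by an error latency of 20 ms gives 25 ms between the
occurrence of the fault and the appearance of the failure.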
Characteristics of Faults
• Causes/Source of Faults
• Nature of Faults
• Fault Duration
• Extent of Faults
• Value of faults
1. Sources of faults: Faults can be the result of a variety of things that occur within electronic components,
external to the components, or during the component or system design process. Problems at any of several
points within the design process can result in faults within the system.
• Specification mistakes, which include incorrect algorithms, architectures, or hardware and software
design specifications.
• Implementation mistakes. Implementation, as defined here, is the process of transforming hardware and
software specifications into the physical hardware and the actual software. The implementation can
introduce faults because of poor design, poor component selection, poor construction, or software coding
mistakes.
• Component defects. Manufacturing imperfections, random device defects, and component wear-out are
typical examples of component defects. Electronic components simply become defective sometimes. The
defect can be the result of bonds breaking within the circuit or corrosion of the metal. Component defects
are the most commonly considered cause of faults.
• External disturbances, for example radiation, electromagnetic interference, battle damage, operator
mistakes, and environmental extremes.
2. Nature of a fault: This specifies the type of fault; for example, whether it is a hardware fault, a software
fault, a fault in the analog circuitry, or a fault in the digital circuitry.
3. Fault Duration. The duration specifies the length of time that a fault is active.
• A permanent fault remains in existence indefinitely if no corrective action is taken.
• A transient fault can appear and disappear within a very short period of time.
• An intermittent fault appears, disappears, and then reappears repeatedly.
4. Fault Extent. The extent of a fault specifies whether the fault is localized to a given hardware or software
module or globally affects the hardware, the software, or both.
5. Fault Value. The value of a fault can be either determinate or indeterminate. A determinate fault is one
whose status remains unchanged throughout time unless externally acted upon. An indeterminate fault is
one whose status at some time, T, may be different from its status at some increment of time greater than
or less than T.
There are three primary techniques for maintaining a system’s normal performance in an environment
where faults are of concern: fault avoidance, fault masking, and fault tolerance.
• Fault avoidance is a technique that is used in an attempt to prevent the occurrence of faults. Fault
avoidance can include such things as design reviews, component screening, testing, and other quality
control methods.
• Fault masking is any process that prevents faults in a system from introducing errors into the
informational structure of that system.
• Fault tolerance is the ability of a system to continue to perform its tasks after the occurrence of faults.
The ultimate goal of fault tolerance is to prevent system failures from occurring. Since failures are directly
caused by errors, the terms fault tolerance and error tolerance are often used interchangeably.
• Fault recovery is the process of remaining operational or regaining operational status via
reconfiguration even in the presence of faults.
Fault tolerance is an attribute that is designed into a system to achieve design goals such as:
• Dependability
• Reliability
• Availability
• Safety
• Performability
• Maintainability
• Testability
1. Dependability. The term dependability is used to encapsulate the concepts of reliability, availability,
safety, maintainability, performability, and testability. Dependability is simply the quality of service
provided by a particular system. Reliability, availability, safety, maintainability, performability, and
testability, are examples of measures used to quantify the dependability of a system.
2. Reliability. The reliability of a system is a function of time, R(t), defined as the conditional probability
that the system performs correctly throughout the interval of time, [t0,t], given that the system was
performing correctly at time t0. In other words, the reliability is the probability that the system operates
correctly throughout a complete interval of time.
3. Availability. Availability is a function of time, A(t), defined as the probability that a system is operating
correctly and is available to perform its functions at the instant of time, t. Availability differs from reliability
in that reliability involves an interval of time, while availability is taken at an instant of time.
4. Safety. Safety is the probability, S(t), that a system will either perform its functions correctly or will
discontinue its functions in a manner that does not disrupt the operation of other systems or compromise
the safety of any people associated with the system. Safety is a measure of the failsafe capability of a
system; if the system does not operate correctly, it is desired to have the system fail in a safe manner.
5. Performability. The performability of a system is a function of time, P(L,t), defined as the probability
that the system performance will be at, or above, some level, L, at the instant of time, t. Performability
differs from reliability in that reliability is a measure of the likelihood that all of the functions are performed
correctly, while performability is a measure of the likelihood that some subset of the functions is performed
correctly. A small numerical sketch of these dependability measures is given after this list.
6. Maintainability. Maintainability is a measure of the ease with which a system can be repaired, once it
has failed. In more quantitative terms, maintainability is the probability, M(t), that a failed system will be
restored to an operational state within a period of time t.
7. Testability. Testability is simply the ability to test for certain attributes within a system. Measures of
testability allow one to assess the ease with which certain tests can be performed. Certain tests can be
automated and provided as an integral part of the system to improve the testability.
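The quantitative measures above can be illustrated numerically. The sketch below is an assumption-laden
example: it uses the common constant-failure-rate (exponential) model and hypothetical rate values, neither
of which comes from the material.

# Illustrative dependability calculations under an assumed exponential model.
import math

failure_rate = 1e-4   # lambda: failures per hour (hypothetical value)
repair_rate = 0.5     # mu: repairs per hour (hypothetical value)

def reliability(t):
    """R(t): probability the system operates correctly throughout [0, t]."""
    return math.exp(-failure_rate * t)

def maintainability(t):
    """M(t): probability a failed system is restored within time t."""
    return 1.0 - math.exp(-repair_rate * t)

def steady_state_availability():
    """A = MTTF / (MTTF + MTTR) under the exponential model."""
    mttf = 1.0 / failure_rate
    mttr = 1.0 / repair_rate
    return mttf / (mttf + mttr)

print(round(reliability(1000), 4))            # about 0.9048
print(round(maintainability(4), 4))           # about 0.8647
print(round(steady_state_availability(), 4))  # about 0.9998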
1. Diversity: If a system’s main electricity supply fails, potentially due to a storm that causes a power
outage or affects a power station, the system must be able to draw on alternative sources. Diversity provides
this form of fault tolerance through different kinds of resources, for example backup generators that take
over when the mains supply fails.
2. Replication: Replication is a more complex approach to achieving fault tolerance. It involves using
multiple identical versions of systems and subsystems and ensuring their functions always provide identical
results. If the results are not identical, then a democratic procedure is used to identify the faulty system.
• No Single Point of Failure: This means that if a capacitor, a block of software code, a motor, or any other
single item fails, the system does not fail. As an example, many hospitals have backup power systems in
case the grid power fails, thus keeping critical systems within the hospital operational. Critical systems may
have multiple redundant schemes to maintain a high level of fault tolerance and resilience.
• No Single Point Repair Takes the System Down: Extending the single-point-of-failure idea, repairing a
failed component must not require powering down the system; the system remains online and operational
during the repair. This may pose challenges for both the design and the maintenance of a system. Hot-
swappable power supplies are an example of a repair action that keeps the system operating while a faulty
power supply is replaced.
• Fault isolation or identification: The system is able to identify when a fault occurs within the system
and does not permit the faulty element to adversely influence the functional capability (e.g., losing data or
making logic errors in a banking system). The faulty elements are identified and isolated. Portions of the
system may exist solely to detect faults; built-in self-test (BIST) is an example.
• A fault may cause transient or permanent changes affecting how the working elements of the system
respond and function. Variation occurs, and when a failure occurs there is often an increase in variability.
Q. What is Redundancy
Redundancy
All of fault tolerance is an exercise in exploiting and managing redundancy. Redundancy is the property of
having more of a resource than is minimally necessary to do the job at hand. As failures happen, redundancy
is exploited to mask or otherwise work around these failures, thus maintaining the desired level of
functionality.
Hardware redundancy is provided by incorporating extra hardware into the design to either detect or
override the effects of a failed component.
Software redundancy is used mainly against software failures. It is a reasonable guess that every large
piece of software that has ever been produced has contained faults (bugs).
Information redundancy is the addition of redundant information to data to allow fault detection, fault
masking, or possibly fault tolerance; a parity-bit sketch is given below.
Time redundancy can be used to detect transient faults in situations in which such faults may otherwise
go undetected.
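As an illustration of information redundancy, the sketch below adds a single even-parity bit to a data word
so that any single-bit error can be detected; the function names are illustrative assumptions.

# Even-parity sketch: one redundant bit allows single-bit error detection.
def add_parity(bits):
    """Append an even-parity bit to a list of 0/1 data bits."""
    return bits + [sum(bits) % 2]

def parity_ok(word):
    """Return True if the received word (data + parity) still has even parity."""
    return sum(word) % 2 == 0

sent = add_parity([1, 0, 1, 1])   # -> [1, 0, 1, 1, 1]
received = sent.copy()
received[2] ^= 1                  # inject a single-bit fault during "transfer"
print(parity_ok(sent))            # True: no error detected
print(parity_ok(received))        # False: the error is detected (not corrected)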
• Define top-level faults. Whether for a system or single block define the starting point for the
analysis by detailing the failure of interest for the analysis.
• Identify causes for the top-level fault. What events could cause the top-level fault to occur? Use the
logic gate symbols to indicate whether a cause can produce the failure alone (OR) or whether multiple
events must occur together before the failure occurs (AND).
• Identify next level of events. Each event leading to the top level failure may also have precipitating
events.
• Identify root causes. For each event above continue to identify precipitating events or causes to
identify the root or basic cause of the sequence of events leading to failure.
• Add probabilities to events. When possible, add the actual or relative probability of occurrence of
each event.
• Analyse the fault tree. Look for the most likely events that lead to failure, for single events that
initiate multiple paths to failure, or for patterns related to stresses, use, or operating conditions. Identify
means to resolve or mitigate paths to failure. A small numerical sketch of combining event probabilities
follows this list.
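The sketch below shows how basic-event probabilities can be combined through OR and AND gates once
they have been added to the tree. It assumes independent basic events, and the gate structure and probability
values are purely illustrative, not from the material.

# Combining basic-event probabilities in a small, hypothetical fault tree.
def or_gate(*probs):
    """Event occurs if ANY input occurs: P = 1 - product(1 - p_i)."""
    survive = 1.0
    for p in probs:
        survive *= (1.0 - p)
    return 1.0 - survive

def and_gate(*probs):
    """Event occurs only if ALL inputs occur: P = product(p_i)."""
    combined = 1.0
    for p in probs:
        combined *= p
    return combined

# Hypothetical basic events and their probabilities.
p_pump_fails = 0.01
p_valve_stuck = 0.02
p_sensor_fails = 0.05
p_alarm_fails = 0.05

# "Loss of cooling" needs pump OR valve failure; the top event "undetected
# loss of cooling" additionally needs the sensor AND the alarm to fail.
p_loss_of_cooling = or_gate(p_pump_fails, p_valve_stuck)
p_top_event = and_gate(p_loss_of_cooling, p_sensor_fails, p_alarm_fails)
print(p_loss_of_cooling, p_top_event)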
2. RAZOR: Razor is a well-known solution for achieving timing-error resilience using a technique called
timing speculation. The principal idea behind this architecture is to employ temporally separated double-
sampling of input data using Razor flip-flops (FFs) placed on critical paths. The main FF samples the input
on the rising edge of the clock, while a time-shifted clock (clk-del) is used by the shadow latch to take a
second sample.
3. STEM: The STEM cell architecture takes Razor a step further by incorporating the capability to deal with
transient faults as well. It also incorporates power-saving and performance-enhancement mechanisms such
as Dynamic Frequency Scaling (DFS) to operate circuits beyond their worst-case limits.
4. CPipe: The CPipe, or Conjoined Pipeline, architecture uses spatial and temporal redundancy to detect
and recover from timing and transient errors. It duplicates the CL blocks and the FFs to form two pipelines
interlinked together. The primary or leading pipeline is overclocked to speed up execution, while the
replicated or shadow pipeline has sufficient speed margin to be free from timing errors.
5. TMR: TMR (Triple Modular Redundancy) is one of the most popular fault-tolerant architectures. In a
basic TMR scheme called Partial-TMR, there are three implementations of the same logic function, and
their outputs are voted on by a voter circuit. This architecture can tolerate all single faults occurring in the
CL block, but faults in the voter or pipeline registers cause the system to fail. A minimal voter sketch is
given after this list.
6. DARA-TMR: DARA-TMR triplicates the entire pipeline but uses only two pipeline copies to run identical
process threads in Dual Modular Redundancy (DMR) mode. The third pipeline copy is disabled using
power gating and is only engaged for diagnostic purposes when very frequent errors are reported by the
detection circuitry. Once the defective pipeline is identified, the system returns to DMR mode by powering
off the defective pipeline.
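A minimal sketch of the majority voter used in TMR (item 5 above). The replicated module functions and
the injected fault are illustrative assumptions, not taken from the material.

# TMR majority voting: a single faulty replica is masked by the other two.
def majority_vote(a, b, c):
    """Return the value agreed on by at least two of the three outputs."""
    if a == b or a == c:
        return a
    return b  # here b == c, or all three disagree (uncorrectable by TMR)

def good_module(x):
    return x          # a correct replica of the logic function

def faulty_module(x):
    return x ^ 1      # a replica with a simulated single-bit fault

value = 1
outputs = (good_module(value), faulty_module(value), good_module(value))
print(majority_vote(*outputs))   # prints 1: the single fault is masked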
HyFT Architecture
Fault Models
A fault model attempts to identify and categorize the faults that may occur in a system, in order to provide
clues as to how to fine-tune the software development environment and how to improve error detection and
recovery.
2. Error Detection
Error Detection is a fault tolerance technique in which the program locates every incidence of error in the
system. This technique is practically implemented using two attributes, namely self-protection and
self-checking. The self-protection attribute of error detection is used for spotting errors in the external
modules, whereas the self-checking attribute is used for spotting errors in the internal module.
3. Exception Handling
Exception Handling is a technique used for redirecting the execution flow towards the route to recovery
whenever an error occurs in the normal functional flow. As a part of fault tolerance, this activity is
performed using three different software components: the Interface Exception, the Local Exception,
and the Failure Exception.
5. Process Pairs
Process Pair technique is a method of using the same software in two different hardware units and validating
the functional differences in order to capture the faulty areas. This technique functions on top of the
checkpoint and restart technique, as similar checkpoints and restart instructions are placed in both systems.
6. Data Diversity
Data Diversity technique is typically a process where the programmer passes a set of input data, and places
checkpoints for detecting the slippage. The commonly used Data Diversity models are ‘Input Data Re-
Expression’ model, ‘Input Data Re-Expression with Post-Execution Adjustment’ model, and ‘Re-
Expression via Decomposition and Recombination’ model.
7. Recovery Blocks
The Recovery Block technique for multiple-version software fault tolerance involves the checkpoint and
restart method, where checkpoints are placed before the fault occurrence and the system is instructed to
move on to the next version to continue the flow. It is carried out in three areas, that is, the main module,
the acceptance tests, and the swap module. A minimal sketch of this pattern is given after this list.
8. N – Version Programming
The N – Version Programming technique for multi-version fault tolerance is commonly used when there is
provision for testing multiple code editions. Recovery is made by executing all the versions and comparing
the outputs from each of the versions. This technique also involves the acceptance test flow.
9. N Self–Checking Programming
N Self – Checking Programming is a combination technique of both the Recovery block and the N – version
Programming techniques, which also calls for the acceptance test execution. It is performed by the
sequential and the parallel execution of various versions of the software.
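A minimal sketch of the recovery block pattern from item 7 above: run the primary version, apply an
acceptance test, and fall back to an alternate version from the saved checkpoint if the test fails. The two
versions and the acceptance test are hypothetical.

# Recovery block sketch: primary version, acceptance test, alternate version.
def acceptance_test(result):
    """Accept the result only if it is a non-negative number."""
    return isinstance(result, (int, float)) and result >= 0

def primary_version(x):
    return x - 10          # "buggy" for small x: can return a negative value

def alternate_version(x):
    return abs(x - 10)     # simpler fallback that always passes the test here

def recovery_block(x):
    checkpoint = x                        # save the state before the attempt
    result = primary_version(checkpoint)
    if acceptance_test(result):
        return result
    # roll back to the checkpoint and switch to the next version
    return alternate_version(checkpoint)

print(recovery_block(25))   # primary passes the acceptance test: 15
print(recovery_block(3))    # primary fails, alternate version returns 7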
1. Interference with fault detection in the same component. In the passenger vehicle example, with either
of the fault-tolerant systems it may not be obvious to the driver when a tire has been punctured. This is
usually handled with a separate "automated fault-detection system". In the case of the tire, an air pressure
monitor detects the loss of pressure and notifies the driver. The alternative is a "manual fault-detection
system", such as manually inspecting all tires at each stop.
2. Interference with fault detection in another component. Another variation of this problem is when
fault tolerance in one component prevents fault detection in a different component. For example, if
component B performs some operation based on the output from component A, then fault tolerance in B
can hide a problem with A. If component B is later changed (to a less fault-tolerant design) the system may
fail suddenly, making it appear that the new component B is the problem. Only after the system has been
carefully scrutinized will it become clear that the root problem is actually with component A.
3. Reduction of priority of fault correction. Even if the operator is aware of the fault, having a fault-
tolerant system is likely to reduce the importance of repairing the fault. If the faults are not corrected, this
will eventually lead to system failure, when the fault-tolerant component fails completely or when all
redundant components have also failed.
4. Test difficulty. For certain critical fault-tolerant systems, such as a nuclear reactor, there is no easy way
to verify that the backup components are functional. The most infamous example of this is Chernobyl,
where operators tested the emergency backup cooling by disabling primary and secondary cooling. The
backup failed, resulting in a core meltdown and massive release of radiation.
5. Cost. Both fault-tolerant components and redundant components tend to increase cost. This can be a
purely economic cost or can include other measures, such as weight. Manned spaceships, for example, have
so many redundant and fault-tolerant components that their weight is increased dramatically over unmanned
systems, which don't require the same level of safety.
6. Inferior components. A fault-tolerant design may allow for the use of inferior components, which would
have otherwise made the system inoperable. While this practice has the potential to mitigate the cost
increase, use of multiple inferior components may lower the reliability of the system to a level equal to, or
even worse than, a comparable non-fault-tolerant system.
• Error Confinement: The error confinement stage prevents an error from affecting the web service. It can
be achieved with the help of error detection within a service through multiple checks.
• Error Detection: The error detection stage helps in identifying unexpected errors in a web service.
• Error Diagnosis: The error diagnosis stage helps to diagnose the fault that was traced in the error
detection stage. Error diagnosis comes into the picture when error detection does not provide
enough information about the fault's location.
• Reconfiguration: Reconfiguration comes into the picture when an error has been detected and located in
the error detection and error diagnosis stages.
• Recovery: Recovery is used to recover from the fault in the web service using retry and rollback approaches.
• Restart: Restart comes into the picture after the recovery of the web service. Restart can be done using
either a hot start or a cold start.
• Repair: In the repair stage, the failed component has to be replaced so that it works properly.
• Reintegration: In the reintegration stage, the repaired component has to be reintegrated into the working
system.
• Servers – The physical machines that act as host machines for one or more virtual machines.
• Virtualization – Technology that abstracts physical components such as servers, storage, and
networking and provides these as logical resources.
• Storage – In the form of Storage Area Networks (SAN), network-attached storage (NAS), disk
drives, etc., along with facilities such as archiving and backup.
• Network – To provide interconnections between physical servers and storage.
• Management – Various software for the configuration, management, and monitoring of cloud
infrastructure, including servers, networks, and storage devices.
• Security – Components that provide integrity, availability, and confidentiality of data and security
of information, in general.
• Backup and recovery services.
• Network fault: Since cloud computing resources are accessed over a network (the Internet), a predominant
cause of failures in cloud computing is network faults. These faults may occur due to partitions in the
network, packet loss or corruption, congestion, failure of the destination node or link, etc.
• Physical faults: These are faults that mainly occur in hardware resources, such as faults in CPUs, in
memory, in storage, failure of power etc.
• Process faults: Faults may occur in processes because of resource shortages, software bugs, inadequate
processing capabilities, etc.
• Service expiry fault: If a resource’s service time expires while an application that leased it is using it, it
leads to service failures.
The Fault Tolerance methods can be applied to cloud computing in three levels:
• At hardware level: If an attack on a hardware resource causes a system failure, its effect can be
compensated for by using additional hardware resources.
• At software (s/w) level: Fault tolerance techniques such as checkpoint/restart and recovery methods can
be used to keep system execution progressing in the event of failures due to security attacks; a minimal
checkpoint/restart sketch is given after this list.
• At system level: At this level, fault tolerance measures can compensate failure in system amenities and
guarantee the availability of network and other resources.
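As an illustration of the software-level techniques mentioned above, the sketch below shows a simple
checkpoint/restart loop that saves its progress after each step, so a restarted job resumes from the last
checkpoint instead of from the beginning. The file name and the workload are hypothetical.

# Checkpoint/restart sketch: progress is saved so a crashed job can resume.
import json
import os

CHECKPOINT_FILE = "checkpoint.json"   # hypothetical checkpoint location

def load_checkpoint():
    if os.path.exists(CHECKPOINT_FILE):
        with open(CHECKPOINT_FILE) as f:
            return json.load(f)
    return {"next_item": 0, "partial_sum": 0}   # fresh start

def save_checkpoint(state):
    with open(CHECKPOINT_FILE, "w") as f:
        json.dump(state, f)

def run_job(items):
    state = load_checkpoint()                   # resume after a failure, if any
    for i in range(state["next_item"], len(items)):
        state["partial_sum"] += items[i]        # the "useful work" being done
        state["next_item"] = i + 1
        save_checkpoint(state)                  # checkpoint after every step
    return state["partial_sum"]

print(run_job(list(range(100))))                # 4950, even if restarted midway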
• There is a need to implement autonomic fault tolerance techniques for multiple instances of an
application running on several virtual machines
• Different technologies from competing vendors of cloud infrastructure need to be integrated for
establishing a reliable system
• New approaches need to be developed that integrate these fault tolerance techniques with
existing workflow scheduling algorithms
• A benchmark-based method can be developed in the cloud environment for evaluating the
performance of fault tolerance components in comparison with similar ones
• To ensure high reliability and availability, multiple cloud computing providers with
independent software stacks should be used
• Autonomic fault tolerance must react to synchronization among various clouds