Computer Organization and Architecture Notes 2 - TutorialsDuniya
Computer Organization & Architecture Notes

describe the working principle of register
describe the working principle of counter
1.2 INTRODUCTION

You have already been acquainted with different combinational and
sequential circuits in Unit-4 of BCA(F1)01. Before going ahead, you
may read that unit and start learning this unit with ease.
8 Computer Organization
1.3.1 HALF-ADDER

Half-Adder is a logic circuit to add two binary bits. Its outputs are
SUM (S) and CARRY (C). (Here X' denotes the complement of X and
⊕ denotes exclusive-OR.)

X Y CARRY (C) SUM (S)
0 0 0         0
0 1 0         1 (X'Y)
1 0 0         1 (XY')
1 1 1 (XY)    0

The minterms for SUM and CARRY are shown in the brackets.
The Sum-Of-Product (SOP) equation for SUM is :
S = X'Y + XY' = X ⊕ Y ................................. (1)
and the equation for CARRY is :
C = XY ................................. (2)
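Equations (1) and (2) can be checked with a small Python sketch (the function name and the (carry, sum) return order are our own choices, not from the text):

```python
def half_adder(x, y):
    """Add two bits; return (CARRY, SUM) as in the truth table."""
    s = x ^ y      # equation (1): S = X'Y + XY' = X XOR Y
    c = x & y      # equation (2): C = XY
    return c, s
```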
1.3.2 FULL-ADDER

Full-Adder is a logic circuit to add three binary bits. Its outputs
are SUM and CARRY. In the following truth table X, Y, Z are
inputs and C and S are CARRY and SUM.
X Y Z C          S
0 0 0 0          0
0 0 1 0          1 (X'Y'Z)
0 1 0 0          1 (X'YZ')
0 1 1 1 (X'YZ)   0
1 0 0 0          1 (XY'Z')
1 0 1 1 (XY'Z)   0
1 1 0 1 (XYZ')   0
1 1 1 1 (XYZ)    1 (XYZ)

Truth Table for a Full-Adder
The minterms are written in the brackets for each 1 output in
the truth table. From these the SOP equation for SUM can be
written as :
S = X'Y'Z + X'YZ' + XY'Z' + XYZ
  = X'(Y'Z + YZ') + X(Y'Z' + YZ)
  = X'S1 + XS1' ................................. (3)
(The exclusive-OR and equivalence functions are complements of each
other.) Here S1 = Y ⊕ Z is the SUM of a half-adder, so
S = X ⊕ S1 = X ⊕ Y ⊕ Z.
Similarly, the SOP equation for CARRY is :
C = X'YZ + XY'Z + XYZ' + XYZ
  = YZ + X(Y'Z + YZ')
  = YZ + XS1
  = C1 + XS1 ................................. (4)
where C1 = YZ is the CARRY and S1 = Y ⊕ Z the SUM
of a half-adder.
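The two-half-adder construction in equations (3) and (4) can be sketched in Python (names such as full_adder and s1 are our own):

```python
def full_adder(x, y, z):
    """Add three bits; return (CARRY, SUM), per equations (3) and (4)."""
    s1 = y ^ z           # half-adder SUM of Y and Z
    c1 = y & z           # half-adder CARRY of Y and Z
    s = x ^ s1           # equation (3): SUM = X XOR Y XOR Z
    c = c1 | (x & s1)    # equation (4): CARRY = YZ + X(Y XOR Z)
    return c, s
```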
1.3.3 HALF-SUBTRACTOR

X Y BORROW (B) DIFFERENCE (D)
0 0 0          0
0 1 1 (X'Y)    1 (X'Y)
1 0 0          1 (XY')
1 1 0          0

Truth Table for Half-Subtractor

D = X'Y + XY'
  = X ⊕ Y ................................. (5)
B = X'Y ................................. (6)
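Equations (5) and (6) as a Python sketch (our own naming; the function returns (BORROW, DIFFERENCE)):

```python
def half_subtractor(x, y):
    """Subtract bit y from bit x; return (BORROW, DIFFERENCE)."""
    d = x ^ y            # equation (5): D = X'Y + XY' = X XOR Y
    b = (1 - x) & y      # equation (6): B = X'Y (borrow only for 0 - 1)
    return b, d
```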
1.3.4 FULL-SUBTRACTOR
X Y Z BORROW (B)  DIFFERENCE (D)
0 0 0 0           0
0 0 1 1 (X'Y'Z)   1 (X'Y'Z)
0 1 0 1 (X'YZ')   1 (X'YZ')
0 1 1 1 (X'YZ)    0
1 0 0 0           1 (XY'Z')
1 0 1 0           0
1 1 0 0           0
1 1 1 1 (XYZ)     1 (XYZ)

Truth Table for Full-Subtractor

The SOP equation for the DIFFERENCE is :
D = X'Y'Z + X'YZ' + XY'Z' + XYZ
  = X ⊕ Y ⊕ Z ................................. (7)
And the SOP equation for BORROW is :
B = X'Y'Z + X'YZ' + X'YZ + XYZ
  = X'Y + X'Z + YZ ................................. (8)
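The simplified equations (7) and (8) can be verified in Python against the subtraction X - Y - Z itself (function name is ours; Z plays the role of the borrow-in):

```python
def full_subtractor(x, y, z):
    """Compute X - Y - Z for bits; return (BORROW, DIFFERENCE)."""
    d = x ^ y ^ z                                    # equation (7)
    b = ((1 - x) & y) | ((1 - x) & z) | (y & z)      # equation (8)
    return b, d
```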
CHECK YOUR PROGRESS

Choose the correct answer :
1. A half-adder can add
(a) two binary numbers of 4 bits each
(b) two binary bits
(c) half of a binary number
(d) none of these
2. A full-adder is a logic circuit that has two outputs namely :
(a) product & sum
(b) sum & borrow
(c) sum & carry
(d) none of these
3. A half-subtractor can subtract
(a) two binary bits
(b) two binary numbers
(c) complement of half binary bits
(d) none of these
4. A full-subtractor has the ability to do
1.3.5 MULTIPLEXER

Fig. 1.5 : Block Diagram of Multiplexer

It has 2^n inputs, n selection lines and only
one output. A multiplexer is also called a many-to-one
data selector.
8-to-1 MULTIPLEXER :
Fig 1.6 shows an 8-to-1 multiplexer, where there are 8
inputs, 3 selection lines and 1 output. The eight inputs are
I0 through I7. When the selection lines hold 000,
the upper AND gate is enabled and all other AND
gates are disabled. As a result, the input I0 alone is
transmitted to the output.
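The AND-OR selection described above can be sketched in Python (function name and bit ordering, with s2 as the most significant select bit, are our own):

```python
def mux8(inputs, s2, s1, s0):
    """8-to-1 multiplexer: 3 selection bits steer one of 8 inputs out."""
    sel = (s2 << 2) | (s1 << 1) | s0     # selection lines as a 3-bit number
    out = 0
    for i, bit in enumerate(inputs):
        enabled = 1 if i == sel else 0   # only one AND gate is enabled
        out |= bit & enabled             # the OR gate combines all AND outputs
    return out
```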
16-to-1 MULTIPLEXER :
Fig 1.7 shows a 16-to-1 multiplexer with four selection lines
A, B, C, D. For ABCD = 0000 the input I0 is selected, for
ABCD = 0010 the input I2 is selected, and so on up to
ABCD = 1111, which selects I15.
Fig. 1.7 : 16-to-1 Multiplexer
1.3.6 DE-MULTIPLEXER

It is opposite to the multiplexer. De-multiplexer has 1 input
and many outputs. With the application of appropriate
control signals, the common input data can be steered to
any one of the output lines.
Fig 1.9 : 1-to-16 De-Multiplexer

Here, the data input line is denoted by D. This input line
is common to all the AND gates. When the select inputs
are 0000, the data D appears at Y0 as output.
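A minimal Python sketch of the 1-to-16 de-multiplexer (our own naming; the select value is taken as a number 0-15):

```python
def demux16(d, select):
    """Steer input bit d to output Y<select>; all other outputs stay 0."""
    return [d if i == select else 0 for i in range(16)]
```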
6. An 8-to-1 multiplexer has
(a) 1 selection line (b) 2 selection lines
(c) 3 selection lines (d) 4 selection lines
7. De-multiplexer means
(a) deduct multiple bits (b) one-to-many
(c) multiple-to-multiple (d) one-to-one
1.3.7 ENCODER

An encoder converts a digital signal into a coded signal.
A generalized view of an encoder is shown in Fig 1.10.
Fig. 1.11 : Decimal-to-BCD Encoder

A decimal to BCD encoder is shown in Fig 1.11. This
circuit generates a BCD output when any one of the push
button switches is pressed. For example, if button 6 is
pressed, the output is ABCD = 0110; if button 8 is pressed,
the output is ABCD = 1000.
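The encoder's behaviour can be sketched in Python (our own naming; A is the most significant BCD bit, as in the examples above):

```python
def decimal_to_bcd(button):
    """Return the BCD bits [A, B, C, D] (A = MSB) for a pressed key 0-9."""
    assert 0 <= button <= 9
    return [(button >> i) & 1 for i in (3, 2, 1, 0)]
```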
Inputs    Outputs
A B C     Y0 Y1 Y2 Y3 Y4 Y5 Y6 Y7
0 0 0     1  0  0  0  0  0  0  0
0 0 1     0  1  0  0  0  0  0  0
0 1 0     0  0  1  0  0  0  0  0
0 1 1     0  0  0  1  0  0  0  0
1 0 0     0  0  0  0  1  0  0  0
1 0 1     0  0  0  0  0  1  0  0
1 1 0     0  0  0  0  0  0  1  0
1 1 1     0  0  0  0  0  0  0  1

Truth table for a 3-to-8 decoder
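The truth table above can be generated by a short Python sketch (function name is ours; A is treated as the most significant input bit):

```python
def decoder_3to8(a, b, c):
    """Return [Y0..Y7]; exactly one output is 1, at index ABC (A = MSB)."""
    n = (a << 2) | (b << 1) | c
    return [1 if i == n else 0 for i in range(8)]
```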
A magnitude comparator compares two binary numbers; the
block diagram of one is shown in Fig 1.13. It
compares two 4-bit binary numbers A3A2A1A0 and B3B2B1B0.
Three output terminals are available for A < B, A = B and
A > B.
For numbers longer than 4 bits, the A < B, A = B and A > B
outputs of one stage are connected to
the corresponding cascading inputs of the next stage that handles the
more significant bits.
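The compare-from-the-most-significant-bit idea can be sketched in Python (our own naming; numbers are given as bit lists, MSB first):

```python
def compare4(a, b):
    """Compare two 4-bit numbers given as bit lists, MSB first.
    Returns 'A<B', 'A=B' or 'A>B', like the three output lines."""
    for x, y in zip(a, b):       # the most significant unequal bit decides
        if x != y:
            return 'A>B' if x > y else 'A<B'
    return 'A=B'
```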
(d) check error in a binary number.
1.4 SEQUENTIAL CIRCUITS

If the output of a circuit depends on its present inputs and
immediate past output, then the circuit is called a sequential circuit.
To build a sequential circuit, we need memory circuits and
combinational circuits. The flip-flop is used as the memory circuit,
the application of which we shall see in counters, registers etc.
1.5 FLIP-FLOPS

A digital circuit that can produce two states of output, either high or
low, is called a multivibrator. Three types of multivibrators are
monostable, bistable and astable.

A flip-flop is a bistable multivibrator. When it is driven by suitable
inputs, its new output is either high (or 1) or low (or 0). Once the
output is fixed, the inputs can be removed and the already
fixed output is retained.
RS Flip-flop
D Flip-Flop
JK Flip-Flop
MS Flip-Flop
1.5.1 RS FLIP-FLOP

Fig 1.14 shows two RS flip-flop circuits, using NOR and NAND
gates. To study the working principle, we can use any one of
them. In the following section, let us consider the RS flip-flop
using NOR gates.
Fig. 1.14 : Basic circuit of RS flip-flop using NOR and
NAND gate.

RS flip-flop has two inputs: Set (S) and Reset (R). It has two
outputs, Q and Q'. It should be noted that Q' is always the
complement of Q.

R S Q          Action
0 0 Last state No change
0 1 1          Set
1 0 0          Reset
1 1 ?          Forbidden
The second input condition R=0, S=1 forces the output of NOR
gate2 low. This low output reaches NOR gate1, and when both
inputs of NOR gate1 are low, its output Q will be high. Thus, a
1 at the S input will SET the flip-flop and Q will be equal to 1.
The third input condition R=1, S=0 will force the output of NOR
gate1 to low. This low will reach NOR gate2 and force its
output high. Hence, when R=1, S=0, then Q=0, Q'=1. Thus,
the flip-flop is RESET.
The last input condition in the table, R=1, S=1, is forbidden since
it forces both the NOR gates to the low state, meaning both Q=0
and Q'=0 at the same time, which violates the basic definition of a
flip-flop that requires Q' to be the complement of Q. Hence, this
input condition is forbidden and its output is unpredictable.
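The cross-coupled NOR behaviour can be simulated in Python by iterating the two gates until the feedback loop settles (function name and the settling-loop style are our own):

```python
def rs_latch(r, s, q=0):
    """Settle the cross-coupled NOR latch from state q; return (Q, Q')."""
    if r == 1 and s == 1:
        raise ValueError("forbidden input: R = S = 1")
    q_bar = 1 - q
    for _ in range(4):                      # let the feedback loop settle
        q = 0 if (r or q_bar) else 1        # NOR gate 1: Q = NOR(R, Q')
        q_bar = 0 if (s or q) else 1        # NOR gate 2: Q' = NOR(S, Q)
    return q, q_bar
```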
CLOCK INPUT:
1.5.2 D FLIP-FLOP

D flip-flop is a modification of the RS flip-flop. In the RS flip-flop, when
both the inputs are high, i.e., R=1, S=1, the output becomes
unpredictable and this input combination is termed forbidden.
To avoid this situation, the RS flip-flop is modified so that both
the inputs cannot be the same at a time. The modified flip-flop is
called the D flip-flop. Fig 1.17 shows a clocked D flip-flop. Its truth
table is :
CLK D Q
0   X Last state
1   0 0
1   1 1
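The truth table can be expressed as a one-line Python sketch (function name is ours):

```python
def d_flip_flop(clk, d, q_last):
    """Clocked D flip-flop: Q takes the value of D when CLK is high,
    otherwise it holds the last state."""
    return d if clk == 1 else q_last
```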
1.5.3 JK FLIP-FLOP

In the RS flip-flop, the input condition R=S=1 produces
an unpredictable output. In the JK flip-flop this condition is put to use by
changing the RS flip-flop in some way. In the JK flip-flop both inputs
can be high simultaneously, and the corresponding toggle output
makes the JK flip-flop a good choice to build a counter - a circuit
that counts the number of +ve or -ve clock edges. Fig 1.18
shows one way to build a JK flip-flop.
(a) (b)
Fig. 1.18 : (a) JK Flip-Flop (b) Symbol of JK Flip-Flop
CLK J K Q
X   0 0 Last state
↑   0 1 0
↑   1 0 1
↑   1 1 Toggle
When J=1 and K=0, the next positive clock edge sets the flip-flop,
making Q=1.
When J and K are both high, the flip-flop is set or reset
depending on the previous value of Q. If Q is high previously,
the lower AND gate sends a RESET trigger to the flip-flop on
the next clock pulse. Then Q becomes equal to 0. On the other
hand, if Q is low previously, the upper AND gate sends a SET
trigger to the flip-flop, making Q=1.
So, when J = K = 1, Q changes its value from 0 to 1 or 1 to
0 on the positive clock pulse. This changing of Q to Q' is
called toggle. Toggle means to change to the opposite state.
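The four rows of the JK truth table can be captured in a small Python sketch (function name is ours; it returns the new Q after a positive clock edge):

```python
def jk_next(j, k, q):
    """New Q after a positive clock edge for a JK flip-flop."""
    if (j, k) == (0, 0):
        return q          # last state
    if (j, k) == (0, 1):
        return 0          # reset
    if (j, k) == (1, 0):
        return 1          # set
    return 1 - q          # J = K = 1 : toggle
```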
Fig. 1.20 : RC Differentiator Circuit

The upper tip of the differentiated pulse is called the positive edge
and the lower tip is called the negative edge. When a flip-flop is
triggered by this type of narrow spike, it is called an edge triggered
flip-flop. If the flip-flop is driven by the +ve edge, it is called a +ve edge
triggered flip-flop. If it is driven by the negative edge, it is called
a -ve edge triggered flip-flop.
Racing

Every flip-flop has a propagation delay,
which means the output changes its state after a certain time
period from applying the input and the clock pulse. So, when a
flip-flop is edge triggered, then due to propagation delay the
output cannot affect the input again, because by that time the
narrow spike has already vanished.
When J=0, K=1, the master resets at the +ve clock edge and the slave
resets at the -ve clock edge.
When J = K = 1, the master toggles at the +ve clock edge and the slave
toggles at the -ve clock edge.
Hence, whatever the master does, the slave copies it.
Since the master and the slave change states at opposite clock
edges, the MS flip-flop offers good
resistance to racing. Hence, to build counters it is extensively
used.

Fig. 1.21 : Master-Slave Flip-Flop
12. The special feature of a JK flip-flop is its
(a) fast response time
(b) toggle property
(c) spike shaped clock input
(d) preset input
13. In MS flip-flop the master changes state
(a) after the slave
(b) with the slave at the same time
(c) before the slave
(d) never
1.6 COUNTER

A counter is one of the most useful sequential circuits in a digital
system. Counters are of two types - synchronous counter and
asynchronous counter.
In an asynchronous counter the flip-flops are connected serially, so the output of one flip-flop is
applied as the clock input to the next flip-flop. Therefore, this type of counter has
a cumulative settling time.
A counter can also be set to start from a desired count.
This is called preset. To do these, two extra inputs are there in every flip-
flop, called CLR and PR.
1.6.1 Asynchronous Counter

When the output of a flip-flop is used as the clock input for the
next flip-flop, it is called an asynchronous counter.
Asynchronous counters are also called ripple counters because
flip-flop transitions ripple through from one flip-flop to the next in
a sequence until all flip-flops reach a new state.
In Fig 1.22(a), the clock is applied to the A flip-flop, which toggles
its state at the negative edges of the clock pulses. Its output
is applied to the B flip-flop as its clock input.
Thus B toggles at the negative edges of the
output of A flip-flop. Similarly, the output of B flip-flop is used as the
clock input of the C flip-flop and therefore C toggles at the negative
edges of the output of B flip-flop. We can see that triggering
pulses move through the flip-flops like a ripple in water.
The wave form of the ripple counter is shown in fig 1.22(b). It
shows the action of the counter as the clock runs. To understand
the wave form, let us assume that the counter is cleared before
the clock starts.
Since B acts as the clock input for C, each time the output of
B goes low, the C flip-flop toggles. Thus C goes high at point d
on the time line, and it goes back to low again at point h.
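The rippling action can be simulated in Python (our own sketch of a 3-bit version; each pulse is a falling clock edge, and a flip-flop's own falling output clocks the next stage):

```python
def ripple_counter(pulses):
    """3-bit ripple counter. A toggles on each falling clock edge; B is
    clocked by A's falling edge, C by B's. Returns (C, B, A) after each pulse."""
    a = b = c = 0
    states = []
    for _ in range(pulses):
        a = 1 - a
        if a == 0:            # A just fell: clock B
            b = 1 - b
            if b == 0:        # B just fell: clock C
                c = 1 - c
        states.append((c, b, a))
    return states
```

Note how C first goes high on the fourth pulse and returns low on the eighth, matching points d and h on the time line.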
1.6.2 Synchronous Counter

The cumulative flip-flop delay limits the asynchronous counter's
operating frequency. Each flip-flop has a delay time which is
additive in an asynchronous counter.
In a synchronous counter, the delay of the asynchronous counter is
overcome by the simultaneous application of the clock pulse
to all the flip-flops. Hence, in a synchronous counter, the common
clock pulse triggers all the flip-flops simultaneously and therefore
the individual delays of the flip-flops do not add together. This feature
increases the speed of the synchronous counter. The clock pulses
applied can be counted from the output of the counter.
In Fig 1.23 we can
see that the output of A is ANDed with CLK to drive the 2nd flip-
flop, and the outputs of A and B are ANDed with CLK to drive the
3rd flip-flop.
In the figure, the clock pulse is directly applied to the first flip-
flop. Its J and K are both high, so the first flip-flop toggles state
at the negative transition of the input clock pulses. This can be
seen at points a, b, c, d, e, f, g, h, i on the time line.
Fig. 1.23 : Parallel Binary Counter
The AND gate Y is enabled only when both A and B are high and
it transmits the clock pulses to the clock input of the 3rd flip-
flop. The 3rd flip-flop toggles state with every fourth negative
clock transition, at d and h on the time line.
The wave form and the truth table show that the synchronous
counter counts the applied clock pulses in binary.
1.7 REGISTER

A register is a group of flip-flops that can store binary data. Shift
registers can shift the stored data, and so can
convert serial data to parallel and parallel data to serial. Based on
the way data is entered into and taken out of the
shift registers, the different types of registers are :
Serial In - Serial Out (SISO)
Serial In - Parallel Out (SIPO)
Parallel In - Serial Out (PISO)
Parallel In - Parallel Out (PIPO)
1.7.1 Serial In - Serial Out (SISO)

Fig 1.24 shows a typical 4 bit SISO register using flip-flops. Here
the content of the register is named QRST. Let us consider
that all flip-flops are initially reset. Hence, at the beginning QRST =
0000. Let us consider a binary number 1011 which we want to
store in the SISO register.
At time A: the first 1 is applied to the data input of the first flip-
flop and, at the negative CLK edge, it is shifted into Q. So the
outputs of the flip-flops
are QRST = 1000.
At time B: Another 1 is applied to the data input of the first flip-
flop. So at the negative CLK edge, this 1 is shifted into Q, the
1 of Q is shifted into R, the 0 of R is shifted into S, and the 0 of S
is shifted into T. So, at the end of time B the outputs of all the
flip-flops are QRST = 1100.
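The shifting sequence can be traced with a Python sketch (our own naming; bits are fed in the order the text feeds them, so 1011 enters as 1, 1, 0, 1):

```python
def shift_in(bits):
    """Shift bits one at a time into a 4-bit register QRST (initially 0000);
    return the register state after each clock pulse."""
    q = [0, 0, 0, 0]
    states = []
    for bit in bits:
        q = [bit] + q[:3]      # every stored bit moves one flip-flop right
        states.append(tuple(q))
    return states
```

After the first two pulses the states are 1000 and 1100, matching times A and B above; after the fourth pulse the register holds 1011.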
Fig. 1.25 : Bit Serial In - Parallel Out Shift Register
When a 1 is applied at X1 (with X2 grounded), the output
of the NOR gate will be X = 0, thereby a 1 will be clocked into
the flip-flop. This NOR gate allows entering data from two sources,
either from X1 or X2. To shift X1 into the flip-flop, X2 is kept at
ground level, and to shift X2 into the flip-flop, X1 is kept at ground
level.
Now in Fig 1.26(c) two AND gates and two NOT gates are
added. These allow the selection of data X1 or data X2. If the
control line is high, the upper AND gate is enabled and the lower
AND gate is disabled. Thus, the data X1 will enter at the upper
leg of the NOR gate, and at the same time the lower leg of the
NOR gate is held low.
1.7.3 Parallel In - Serial Out Register (PISO)
In Fig 1.27 the X2 input of Fig 1.26(c) is taken out from each
flip-flop to form 8 inputs named ABCDEFGH, to enter 8 bit
data parallelly into the register. The control line is named here
SHIFT/LOAD, which is kept low to load 8 bit data into the flip-
flops with a single clock pulse parallelly. If the SHIFT/LOAD line is
kept high, it will enable the upper AND gate for each flip-flop, and
then each clock pulse will
shift a data bit from one flip-flop to the next flip-flop. That means
data will be shifted serially.
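The SHIFT/LOAD behaviour can be sketched in Python (our own naming; one call models one clock pulse of an 8-bit register):

```python
def shift_load_register(reg, data, shift_load_bar, serial_in=0):
    """One clock pulse of an 8-bit SHIFT/LOAD register: a low control
    line loads data in parallel; a high control line shifts serially."""
    if shift_load_bar == 0:
        return list(data)                  # parallel load in a single pulse
    return [serial_in] + reg[:-1]          # serial shift, one place per pulse
```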
1.7.4 Parallel In - Parallel Out Register (PIPO)

The register of Fig 1.27 can be converted to a PIPO register
simply by adding an output line from each flip-flop.
The 54/74198 is an 8-bit such PIPO register and the 54/7495A is a 4-bit
PIPO register. Here the basic circuit is the same as Fig 1.26(c).
The parallel data outputs are simply taken out from the Q sides
of the flip-flops.
When the MODE CONTROL line is high, the data bits ABCD
will be loaded into the register parallelly at the negative clock
edge.
Fig. 1.28 : Parallel In - Parallel Out Shift Register
1.8 LET US SUM UP

Demultiplexer is opposite to a multiplexer.
An encoder generates a binary code for input variables.
A decoder decodes information received from n input
lines and transmits the decoded information to a maximum
of 2^n output lines.
A sequential circuit's output depends on past output and
present inputs.
A flip-flop is basically a single cell of memory which can
store either 1 or 0.
Sequential circuits use flip-flops as their building block.
There are many types of flip-flops, viz RS, D, JK, MS flip-
flop.
A counter is a sequential circuit that can count square
waves given as clock input. There are two types of counters -
synchronous and asynchronous.
1.9 ANSWERS TO CHECK YOUR PROGRESS

15. (a)   16. (c)
1.10 FURTHER READINGS
1. Mano, M. M., Digital Logic and Computer Design, PHI.
2. Mano, M. M., Computer System Architecture, PHI.
3. Malvino, Albert Paul & Leach, Donald P., Digital Principles and
Applications, McGraw-Hill International.
4. Lee, Samuel C., Digital Circuits and Logic Design, PHI.

1.11 MODEL QUESTIONS
2. With truth table and logic diagram explain the working of a full-
adder circuit.
Serial In- Parallel Out shift register for an input of 1101.
Explain its operation.
8. Why is a square wave clock pulse converted to a narrow
spike to be used for flip-flops ? Draw an RC differentiator
circuit to convert a square wave into a narrow spike.
9. What is called racing ? To get rid of racing what
techniques are used ?
10. What do you mean by magnitude comparator ? Draw a block
diagram and the function table of the magnitude comparator
SN 7485.
***********
2.5.3 Direct Memory Access (DMA)
2.5.4 I/O Processors
2.6 Let Us Sum Up
2.7 Further Readings
2.8 Answers To Check Your Progress
2.9 Model Questions
2.1 LEARNING OBJECTIVES
know about the peripherals
know about the interrupts
describe input-output processors
2.2 INTRODUCTION
In the previous unit, we have learnt about the various digital components.
In this unit, we will discuss how the data is fed to the computer and
the way it is processed, that is, how the data is transferred
to and from the external devices to the CPU and the memory.
2.3 PERIPHERAL DEVICES

The commonly used peripherals are :
Keyboard,
Monitor,
Printers,
The auxiliary storage devices such as magnetic disks and tapes.
Keyboard and Monitor :

Keyboard allows the user to enter alphanumeric information. On pressing
a key, a binary coded character, typically 7 or 8 bits in length, is sent
to the computer. The most commonly used code is a 7-bit code referred
to as ASCII (American Standard Code For Information Interchange).
Each character in this code is represented by a unique 7-bit binary
code; thus a total of 128 different characters can be represented, as
shown in table 2.1.
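Python's built-in ord() and chr() use the same mapping for the ASCII range, so the 7-bit pattern a keyboard would send can be inspected directly (the helper name is our own):

```python
def ascii_bits(ch):
    """Return the 7-bit binary pattern for an ASCII character ch."""
    code = ord(ch)          # numeric ASCII code of the character
    assert code < 128       # 7 bits allow the 128 possible characters
    return format(code, '07b')
```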
[Table 2.1 : The 7-bit ASCII character set - not reproduced here]
Some of the ASCII control characters are :
NUL - Null
ENQ - Enquiry
ACK - Acknowledge
BEL - Bell
BS - Backspace
HT - Horizontal Tab
LF - Line Feed
VT - Vertical Tab
FF - Form Feed
CR - Carriage Return
SO - Shift Out
SI - Shift In
CAN - Cancel
EM - End of Medium
SUB - Substitute
FS - File Separator
GS - Group Separator
RS - Record Separator
US - Unit Separator
DEL - Delete
Video monitors may be of different types, but the most commonly used
one is the CRT (Cathode Ray Tube) monitor. Another one that has become
popular now-a-days is the LCD (Liquid Crystal Display) monitor.
The monitor displays a cursor on the screen which marks the position
where the next character is to be inserted.
Printers :

Printers produce a hard copy of the output data. Based on the technology
used, printers may be either impact or non-impact. Impact printers
operate by using a hammer to strike a character against an inked
ribbon; the impact then causes an image of the character to be printed.
Types of impact printers are dot-matrix, daisy wheel and line printers. Non-
impact printers do not touch the paper while printing. Non-impact printers
like laser printers use a rotating photographic drum to imprint the
character images and then transfer the pattern onto the paper. Inkjet
and deskjet printers are also non-impact printers.
The magnetic tape drives and disk drives are used to read and write
data to and from the magnetic tapes and the disks. The surfaces of
these storage devices are coated with magnetic material so as to store
the data.
1. Fill in the blanks :
(a) The Keyboard is a ___________ device.
(b) Daisy Wheel is a _______ printer and the Deskjet is
a _________ printer.
(c) Phosphor on being struck by ______ emits light on the
monitor screen.
(d) Magnetic tape stores data on its ________ sides.
2.4 I/O INTERFACE
The I/O Interface is responsible for the exchange of data between the
peripherals and the internal storage of the computer. Instead of being
connected directly to the system bus, the peripherals are connected
through interface units.
The interface units are hardware components between the CPU and
the peripherals that supervise and synchronize all the I/O transfers.
Input-Output devices are attached to the processor with the help of the
I/O Interface, as shown in the figure below :
[Figure : Processor connected to I/O interfaces through the system bus -
address lines, data lines and control lines]
The I/O interface circuit also has an address decoder, control circuits,
and data and status registers. Whenever the CPU has to communicate
with a particular device, it places its address on the address lines; the
address decoder of that device's interface responds, and the device
is selected.
The input-output interface can detect errors and then after detection
reports them to the CPU.
CHECK YOUR PROGRESS
2. Write True or False :
(a) The I/O Interface is a hardware component in the computer.
(b) The Control lines of the System Bus carry the addresses
of the peripherals.
(d) The Status register in the I/O Interface holds the data that
is being transferred.
There are different I/O techniques in the computer. Some may involve
the CPU for transferring data between the computer and the input-
output devices and others may directly transfer data from and to the
memory units. The three possible techniques for I/O operations are :
Programmed I/O,
Interrupt-driven I/O, and
Direct Memory Access (DMA).
There are four types of commands that an interface may receive :
o Control command
o Status command
o Read command
o Write command

A Control command is used to activate the peripheral and tell it
what to do. This command depends on the type of the particular
peripheral.
A Status command is used to test the various status conditions of the
peripheral and the I/O interface.

A Read command causes the interface to get the data from the
peripheral and place it in its data registers. The interface then
places the data on the data lines of the bus for the processor
to read.

A Write command causes the interface to receive data from the
bus and transfer it to the peripheral.
In interrupt-driven I/O, when a device is ready for a transfer, its
interface sends an interrupt signal to the CPU. The CPU suspends
the program it is running, services the I/O transfer, and then returns
to what it was doing before it received the interrupt signal. One
of the bus control lines, called an Interrupt Request Line, is
dedicated for this purpose.
In this method, the advantage is that the CPU need not spend
time waiting for an I/O operation to be completed, thereby
increasing its efficiency.
The routine that is executed when there is an interrupt request
is called the Interrupt Service Routine (ISR). When the CPU gets an
interrupt signal while it is executing an instruction of a program,
say instruction I, it first finishes execution of that instruction. It
then determines which device requested the interrupt. The methods
for doing so are :
1. Software Poll
2. Vectored Interrupt
3. Bus Arbitration
2. Vectored Interrupt

It provides a hardware poll. The interrupts are signaled through
a common interrupt request line. The interrupt acknowledge
line is daisy-chained through the I/O interfaces. On receiving an
interrupt, the CPU sends an acknowledge signal through it. The
interface which sent the interrupt responds by placing its vector
on the data lines.
iy
3. Bus Arbitration
speeds such as the disks are given higher priorities, while the
slower devices such as keyboards are given low priority. Also
Tu
2.5.3 Direct Memory Access (DMA)

In DMA, the DMA controller carries out the functions
of the processor when accessing the main memory during the
data transfers. In DMA transfers, the controller takes control of
the memory buses from the CPU and does the transfer between
the peripheral and the memory.
[Figure : CPU, main memory and DMA controllers connected to the
system bus]
Cycle Stealing : The controller transfers one word at a time
and returns the bus control to the CPU. The CPU then temporarily
suspends its operation for one memory cycle to allow the DMA
transfer to steal its memory cycle.
The DMA controller uses a register for storing the starting address
of the word to be transferred, a register for storing the word
count of the transfer, and another register containing the status
and the control flags. The status and control register is
shown below :

IRQ IE R/W Done

IRQ : Interrupt Request
IE : Interrupt Enabled
R/W : Read / Write
Done : Transfer Complete
3. Write True or False :
(a) In interrupt-driven I/O, the interface generates an interrupt request to
the computer.
(b) A Read command causes the interface
to get the data from I/O devices and place it in its data
registers.
(c) Interrupt Driven I/O wastes the CPU time needlessly.
(e) In Direct Memory Access, the data transfer takes place
directly between the peripheral and the memory.
(f) The DMA Controller carries out the functions of the
processor when accessing the main memory during the
data transfers.
2.5.4 I/O Processors

An I/O Processor (IOP) can directly handle the transfer of data
between the memory and the I/O
devices. The IOP need not interfere with the tasks of the CPU,
as it can itself fetch and execute instructions. Other than the
I/O transfer instructions, an IOP can also perform arithmetic, logic
and branching operations.
Data from the devices are first collected in the IOP and then
transferred to the memory directly by stealing one memory cycle
from the CPU. Similarly, data is transferred from the memory to the
IOP and then to the devices.
1) Programmed I/O
2) Interrupt-Driven I/O

In Programmed I/O, the processor must constantly check the
status of the interface until the I/O task is completed, which is a
wastage of the CPU time.

e) True  f) True
2.9 MODEL QUESTIONS
1. Briefly describe the work of the I/O Interface in data transferring.
2. Describe the functions of the I/O Interface units in a computer.
3. What is the difference between Programmed I/O and Interrupt
Driven I/O ? What are their advantages and disadvantages ?
5. What are the methods for determining which I/O device has
requested an interrupt ?
*****
UNIT STRUCTURE
3.1 Learning Objectives
3.2 Introduction
3.3 Memory Hierarchy
3.4 Main Memory
3.5 Semiconductor RAM
3.5.1 Static and Dynamic RAM
3.5.2 Internal Organization of Memory Chips
3.6 ROM
3.6.1 Types of ROM
3.7 Locality of Reference
3.8 Cache Memory
3.8.1 Cache Operation - an overview
3.9 Mapping Functions
3.9.1 Direct Mapping
3.9.2 Associative Mapping
3.9.3 Set-Associative Mapping
3.10 Replacement Algorithm
3.11 Virtual Memory
3.11.1 Paging
3.12 Magnetic Disk
3.12.1 Data Organization in Magnetic Disk
3.12.2 Disk Access Time
3.13 RAID
• learn about the secondary memories: magnetic disks and
magnetic tapes
• learn about RAID technology
• describe the technology of optical memory
3.2 INTRODUCTION

In the previous unit, we have learnt about the peripheral devices
associated with a computer system and various techniques with the
help of which data transfer between the main memory and the
input-output devices takes place.
In this unit we shall discuss various types of memory
associated with a computer system, including main memory, cache
and virtual memory, and the various technologies associated with these
memory units. Finally, we conclude the unit discussing the concept
of secondary memory along with its types.
The computer stores the programs and the data in its memory unit.
The CPU fetches the instructions out of the memory unit to execute
and process them.
Memory can be primary (or main) memory and secondary (or auxiliary)
memory. Main memory stores programs and data currently executed
by the CPU.
Sequential access : Here, the records or the data are
accessed in a linear fashion, from the current location in the
memory to the desired location, moving through each and every
record in the memory unit. For example, in case of the magnetic
tapes this method is used.
Direct access : Here, each record has a unique address based
on the physical location of the memory, and the shared Read/
Write head moves directly to the desired record. This method
is used in magnetic disks.
A memory Read operation causes the memory to fetch the contents of a
location and transmit that data to the requesting device via the bus. On
the other hand, a memory Write operation causes the memory to
accept data from the bus and to write that particular information into a
memory location.
Regarding the speed of the memory, there are two useful measures :
the memory access time and the memory cycle time.
3.3 MEMORY HIERARCHY

[Figure : The memory hierarchy, from fastest to slowest - registers,
static RAM (cache), dynamic RAM (main memory), magnetic disks,
magnetic tapes]
There are three key characteristics of the memory. They are cost,
capacity and access time. On moving down the memory hierarchy,
it is found that the cost of the storage devices decreases but their
storage capacity as well as the memory access time increases. In
other words, the smaller memories are more expensive and much
faster. These are supplemented by the larger, cheaper and slower
storage devices.
Thus, from the above figure it can be seen that the registers are at
the top of the hierarchy and so provide the fastest, the smallest and
the most expensive type of memory device for the CPU to access
data. Registers are actually a small amount of storage available on the
CPU, and their contents can be accessed more quickly than any
other available storage. They may be 8-bit registers or 32-bit registers.
Magnetic
tapes are more suited for the off-line storage of large amounts
of computer data. The data are kept as records which are
separated by gaps.
3.4 MAIN MEMORY

The main memory is the central storage unit of the computer system.
Main memory refers to the physical memory which is internal to the
computer. The word "Memory" when used usually refers to the Main
Memory of the computer system. The computer can process only
those data that are inside the main memory.
RAM : In RAM, it is possible to both read and write data from and
to the memory in a fixed amount of time, independent of the memory
location or address. RAM is also a volatile memory, which means it
stores the data in it only as long as the power is switched on. Once the
power goes off, all the data stored in it is lost. Therefore, a RAM
cell must be provided a constant power supply.
ROM : ROM is a non-volatile semiconductor memory; that is, it doesn't
lose its contents even when the power is switched off. ROM is not
re-writable once it has been written or manufactured. ROM is used
for programs such as the bootstrap program that starts a computer and
loads its operating system.
3.5 SEMICONDUCTOR RAM

Semiconductor memory cells have
two stable states : 0 and 1. The binary information is stored in the
form of arrays having rows and columns in the memories. With the
advent and advances of VLSI (Very Large Scale Integration) circuits,
thousands of memory cells can be placed on a single chip.
In a static RAM (SRAM) cell, two inverters are cross-connected
to form a latch. The latch is
then connected to two bit lines by the transistors T1 and T2.
T1 and T2 are controlled by a word line. They are in the off-
state when the word line is at the ground level.

[Figure : An SRAM cell - a latch connected to bit lines b and b'
through transistors T1 and T2, controlled by the word line]
An SRAM cell needs more transistors than a DRAM cell
does. Thus, there is less memory per chip, which makes it a lot
more expensive; the size of an SRAM is comparatively larger
than that of a DRAM.
A DRAM cell must be refreshed periodically, whether it is being accessed
at that time or not. So the DRAM is slower than the SRAM just
because of the refresh circuitry overhead.
DRAMs are used for the computer's main memory system as
they are cheaper and take up much less space than the SRAM.
Even though there is the overhead of the refresh circuitry, it is
still possible to use a large amount of inexpensive main memory.
The figure given below is a DRAM cell consisting of a capacitor
C and a transistor T.
[Figure : A DRAM cell - transistor T connects capacitor C to the bit
line, gated by the word line]
To store information in the cell, the transistor T is
first turned on and a voltage is applied to the bit line. This causes
the capacitor to get some amount of charge. After the transistor
is turned off, the capacitor begins to discharge. Therefore, the
cell contents can be read correctly only if they are read before the
capacitor's charge falls below some threshold value.
[Figure : Internal organization of a synchronous DRAM chip - refresh
counter, row/column address latch and counter, row and column
decoders, cell array, read/write circuits, data input and output registers,
mode register and timing control, with RAS, CAS, R/W, CS and clock
inputs]
If the charge on the capacitor is below the threshold, the bit line
is pulled down to the ground level so that the capacitor now has
no charge at all, that is, it will represent logical 0. Thus, reading
the cell contents automatically refreshes its contents.
Synchronous DRAMs (SDRAMs) have their operation synchronized
with a clock signal and can transfer data
at increasing speeds. Thus SDRAM will soon be replacing the
conventional DRAM, as it is designed to work with the higher
operating bus speeds in the computer systems.
3.5.2 Internal Organization of Memory Chips

Internally, the memory in the computer system is organized in
the form of an array of rows and columns. Each cell in the array
can hold one bit of information. Each row in the array forms
a memory word.

[Figure : Organization of a 16 x 8 memory chip - address lines A0-A3,
an address decoder driving word lines W0-W15, and memory cells on
bit lines b7...b0]
Fig.3.5 : Organization of cells in a memory chip
CHECK YOUR PROGRESS
1. Compare the characteristics of SRAM and DRAM.
2. Fill in the blanks :
(a) Data and _____ are transferred from the secondary
memory to the _______ whenever it is needed by the CPU.
(b) The records or the data are accessed in a linear fashion,
from its current location in the memory in _________ access.
(c) Memory _______time is the time between the initiation of a
memory operation and the completion of that operation.
ROM (Read Only Memory) is another type of main memory that can
only be read. Each memory cell in ROM is hardware preprogrammed
during the IC (Integrated Circuit) fabrication process. That is, the code
or the data in ROM is programmed in it at the time of its manufacture.
The data stored in ROM is not lost even if the power is switched off.
For this reason, it is called a non-volatile storage. It is used to store
programs that are permanently stored and are not subject to change.
The system BIOS program is stored in ROM so that the computer
can use it to boot the system when the computer is switched on.
The figure below shows a possible configuration for a ROM memory
cell.
[Fig. 3.6 : A ROM memory cell — a transistor T, gated by the word line, connects the bit line to ground through a connection at point P.]

A 0 is stored in the cell if the transistor is connected to ground at point P; otherwise, a 1 is stored in the cell. To read the cell contents, the word line is activated. A sense
circuit connected at the end of the bit line generates the output value.
2. PROM (Programmable Read Only Memory)
As the name indicates, this type of ROM chip allows the user to code data into it. PROMs were created because making ROM chips from scratch is time-consuming and also expensive when only a small number of them is needed. PROM chips are like ROM chips, but
the difference is that PROM chips are created by inserting a
fuse at the point P (in the above diagram). Before programming,
all the memory cells in a PROM contain 0. The user can then
insert 1’s wherever needed by burning out the fuses at those
locations by sending high current pulses. However, PROMs
can be programmed only once. They are more fragile than
ROMs.
3. EPROM (Erasable Programmable Read Only Memory)
An EPROM chip can be erased (typically by exposure to ultraviolet light) and reprogrammed, so the user can write new data into it.

4. EEPROM (Electrically Erasable Programmable Read Only Memory)
An EEPROM can be erased and reprogrammed electrically, without removing the chip from the circuit.

5. Flash Memory
Flash memory is similar to EEPROM, except that it is not possible to read or write a single cell, but it is possible to write an entire block of cells. Also, before writing, the previous contents of the block are erased.
The advantage is that they work faster and the power
consumption is low.
CHECK YOUR PROGRESS
3. Write True or False :
(a) SRAM and DRAM are the two types of semiconductor
memories.
(b) SRAM stores the data in flip-flops and DRAM stores the data
in capacitors.
(c) Main memory is actually the Static RAM.
Locality of reference is observed in two forms: temporal locality and spatial locality.
Temporal locality of reference means that a recently ex-
ecuted instruction is likely to be executed again very soon. This
is so because when a program loop is executed the same set
of instructions is referenced and fetched repeatedly. Examples include loop indices and single data elements.
Spatial locality of reference means that data and instructions
which are close to each other are likely to be executed soon.
This is so because, in a program, instructions are stored in contiguous memory locations and are executed one after another; data elements, such as the items of an array, are likewise stored close together.
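Both forms of locality can be seen in a toy loop (an illustrative sketch, not from the text):

```python
# The loop index i is re-used every iteration (temporal locality),
# while data[i] touches consecutive locations (spatial locality).
data = list(range(8))
total = 0
accesses = []                    # record the "addresses" (indices) touched
for i in range(len(data)):
    total += data[i]
    accesses.append(i)

# Consecutive accesses differ by 1 -> spatial locality in the data.
assert all(b - a == 1 for a, b in zip(accesses, accesses[1:]))
assert total == 28               # 0 + 1 + ... + 7
```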
Using an intelligent algorithm, a cache holds the data that is accessed most often, sitting between a slower peripheral device and the faster processor.
SRAM chips are much more expensive than DRAM chips. If the whole main memory were made of SRAM chips, the need for a cache memory would be eliminated.
Some memory caches are built into the architecture of
microprocessors. These are called internal caches. For example, the Intel 80486 microprocessor contains an 8K memory cache and
the Pentium has a 16K cache. Such internal caches are often called
Level 1(L1) caches. Cache outside the microprocessor i.e., on the
motherboard is called external cache or Level 2(L2) cache. External
Disk Caching
‘miss’ has occurred and the cache controller enables the controller of the main memory to send the specified code or
data from the main memory. The performance of cache memory is measured in terms of the hit ratio: the number of hits divided by the total number of requests made. The total number of requests is the sum of the number of hits and the number of misses.
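The formula above can be sketched as follows (the function name is illustrative):

```python
def hit_ratio(hits, misses):
    """Hit ratio = hits / total requests, where total = hits + misses."""
    total = hits + misses
    return hits / total

# e.g. 950 hits and 50 misses give 950/1000 = 0.95
assert hit_ratio(950, 50) == 0.95
```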
It is not necessary for the processor to know about the existence of the cache; the processor simply issues READ and WRITE requests using addresses that refer to locations in the memory.
Both the main memory and the cache are divided into equal-
size units called blocks. The term block is usually used to
refer to a set of contiguous address locations of some size.
containing this marked word is to be removed from the
cache to make room for a new block.
CHECK YOUR PROGRESS
5. Choose the appropriate option:
(i) The cache is made up of high-speed ______ in case of
memory caching.
(a) EPROM (b) SRAM
(c) ROM (d) none of these
(ii) Cache memory is used in a computer system to
• Direct Mapping
• Associative Mapping
• Set-Associative Mapping
block occupies which line of the cache. As there are fewer lines (or blocks) in the cache than there are main memory blocks, an algorithm is needed to decide this. Let us take an example: a system with a cache of 2048 (2K) words and a main memory of 64K (65536) words. Each block of the cache memory is of size 16 words; thus, there are 128 such blocks (16 × 128 = 2048). Let the main memory be addressable by a 16-bit address (2^16 = 65536 = 64 × 1024).
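The direct-mapping address split for this example can be sketched as follows (the field and function names are illustrative assumptions): 16-bit address = 5-bit tag + 7-bit line + 4-bit word.

```python
# 16-word blocks -> 4 word bits; 128 cache lines -> 7 line bits; 5 tag bits.
WORD_BITS, LINE_BITS, TAG_BITS = 4, 7, 5
assert WORD_BITS + LINE_BITS + TAG_BITS == 16

def split_address(addr):
    word = addr & 0xF              # low 4 bits: word within the block
    line = (addr >> 4) & 0x7F      # next 7 bits: cache line
    tag  = addr >> 11              # top 5 bits: tag
    return tag, line, word

# Main memory blocks 0 and 128 map to the same cache line (0) but carry
# different tags -- exactly the contention direct mapping suffers from.
assert split_address(0)[1] == split_address(128 * 16)[1] == 0
assert split_address(0)[0] != split_address(128 * 16)[0]
```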
[Figure: direct mapping — (b) the cache of 128 blocks (Block 0 … Block 127), each with a tag; (c) the main memory of 4096 blocks (Block 0 … Block 4095), where blocks 0, 128, 256, … map to cache block 0, blocks 1, 129, 257, … to cache block 1, and so on.]
Data is transferred between the main memory and the cache in fixed-size cache lines, generally ranging from 16 to 256 bytes.

The advantage of direct mapping is that it is simple and inexpensive. The main disadvantage of this mapping is that there is a fixed cache location for any given block in main memory. If a program accesses two blocks that map to the same cache line repeatedly, then cache misses are very high.
3.9.2 Associative Mapping

This type of mapping overcomes the disadvantage of direct mapping by permitting each main memory block to be loaded into any line of the cache. The 16-bit memory address is interpreted as a 12-bit tag and a 4-bit word field.
[Figure: associative mapping — any main memory block (Block 0 … Block i … Block 4095) may be loaded into any cache block, each of which stores a 12-bit tag.]
algorithms are required to maximize its potential.

3.9.3 Set-Associative Mapping

Set-associative mapping combines the best of the direct and associative cache mapping techniques. In this mapping,
cache memory blocks are grouped into sets. It allows a
block of the main memory to reside in any block of a
specific set. Thus, the contention problem which usually arise
in direct mapping can be avoided by having a few choices for
placement of block. In the figure 3.9, set-associative mapping
technique is shown. Here, the cache is divided into sets where
each set contains 2 cache blocks. Thus, there will be 64 sets
in the cache of 2048 (2K) words. Each memory address is then divided into a tag field, a set field and a word field, where the number of set bits depends on the number of blocks per set. If there are 128 blocks per set, then no set bits are required and it becomes a fully associative technique with 12 tag bits. The other extreme condition of one block per set is the direct-mapping method. A cache that has N blocks per set is referred to as an N-way set-associative cache.
[Fig. 3.9: set-associative mapping — (b) the cache divided into 64 sets of two blocks each; (c) the main memory blocks (Block 0 … Block 4095), each of which may be placed in either block of its assigned set.]
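For the 2-way set-associative example above, the 16-bit address splits into a 4-bit word field, a 6-bit set field (64 sets) and a 6-bit tag. A sketch, with illustrative names:

```python
WORD_BITS, SET_BITS, TAG_BITS = 4, 6, 6
assert WORD_BITS + SET_BITS + TAG_BITS == 16

def split_address(addr):
    word = addr & 0xF              # low 4 bits: word within the block
    s    = (addr >> 4) & 0x3F      # next 6 bits: set number (0..63)
    tag  = addr >> 10              # top 6 bits: tag
    return tag, s, word

# Blocks 0 and 64 fall into the same set but may occupy either of its
# two lines, so they no longer evict each other unconditionally.
assert split_address(0)[1] == split_address(64 * 16)[1] == 0
```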
For direct mapping where there is only one possible line for a block
of memory, no replacement algorithm is required. For associative
and set associative mapping, however, an algorithm is needed. At
this point, we will describe the one most common replacement
algorithm, called LRU algorithm. This algorithm is easy to understand
and provides a good background to understand the more advanced
replacement algorithms. Several other replacement algorithms are
also used in practice, such as the first-in first-out (FIFO) replacement algorithm and the random replacement algorithm.
Least Recently Used (LRU): Due to locality of reference, programs usually stay in localized areas for a reasonable period of time, so there is a high probability that the blocks that have been referenced recently will be referenced again soon. The algorithm therefore overwrites the block in cache memory that has been there the longest with no reference to it. That block is known as the least recently used block, and the technique is known as the least recently used (LRU) replacement algorithm. In this method, a counter is associated with each page in the main memory and is incremented by 1 at fixed intervals of time. When a page is referenced, its counter is set to zero, so the page with the largest counter value is the least recently used one.
Before coming to the point virtual memory, let us review some basics:
physical memory(RAM) versus secondary memory(disk space).
3.11.1 Paging
Paging is a method for achieving virtual memory. It is the most
common memory management technique. Here, the virtual
address space and memory space are broken up into several
equal sized groups. To facilitate copying the virtual memory
into the main memory, the operating system divides virtual
address space into fixed size pages. Physical address
space(memory space) is also broken up into fixed size page
frames. Page in virtual address space fits into frame in physical
memory. Each page is stored on a secondary storage (hard
disk) until it is needed. When the page is needed, the operating system copies it into a free page frame of the main memory. This process is known as paging.
[Fig. 3.10: (a) a 16K virtual memory divided into 16 pages, each of size 1K (Page 0 at address 0, Page 1 at 1024, Page 2 at 2048, …, Page 15 ending at 16383); (b) a 4K main memory divided into 4 page frames, each of size 1K (Page frame 0 at 0, Page frame 1 at 1024, Page frame 2 at 2048, Page frame 3 ending at 4095).]
Out of these 16 pages, only 4 can be accommodated in the main memory at a time. The physical address must be specified with 12 bits (as 2^12 = 4096 = 4K). At any instant of time, 4096 words of memory are directly accessible, but they need not correspond to addresses 0 to 4095. For the execution of the program, each 14-bit virtual address must therefore be mapped into a 12-bit physical address.
[Fig. 3.11: the 14-bit virtual address 0100 0000010110 — page number 0100, offset 0000010110.]
The memory page table contains the list of all 16 pages of the virtual memory and the page frame number where each page is stored in the main memory.
A presence bit is also there to indicate whether the page is present
in the main memory or not. If the page is present in the main memory
then the presence bit will be 1 otherwise it will be 0. For example,
if pages 2, 4, 5 and 8 are present in the main memory, then the content of the memory page table will be as shown in figure 3.12.
Let us assume that the virtual page 4 is in main memory. The
first 4 bits of a virtual address will specify the page number where the word is stored. Similarly, the first 2 bits of a physical address will specify the page frame number of the memory where the word is
stored. The 4-bit page number is taken from the virtual address and compared with the memory page table entry. If the presence bit against this page number is 1, then the page frame number (2 bits) is read from the table and combined with the 10-bit offset to form the 12-bit physical address.
[Fig. 3.12: the memory page table — one entry per virtual page (0000–1111), each with a presence bit and, for resident pages, a page frame number: page 0100 → frame 01, page 0101 → frame 10, page 1000 → frame 11. The virtual address 0100 0000010110 is translated to the 12-bit physical address 01 0000010110 in the 4K main memory (page frames 00–11).]
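The translation in this example can be sketched as follows (the mapping of page 2 to frame 00 is an assumption where the garbled figure is unreadable; pages 4, 5 and 8 map to frames 01, 10 and 11 as in the figure):

```python
# 14-bit virtual address = 4-bit page number + 10-bit offset;
# 12-bit physical address = 2-bit frame number + the same offset.
PAGE_TABLE = {2: 0b00, 4: 0b01, 5: 0b10, 8: 0b11}   # page -> frame

def translate(vaddr):
    page   = vaddr >> 10           # top 4 bits
    offset = vaddr & 0x3FF         # low 10 bits
    if page not in PAGE_TABLE:
        raise LookupError("page fault")   # presence bit is 0
    return (PAGE_TABLE[page] << 10) | offset

# Virtual address 0100 0000010110 (page 4) maps to 01 0000010110.
assert translate(0b0100_0000010110) == 0b01_0000010110
```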
mapped cache.
(ii) Paging is a method of achieving ___________.
(iii) Virtual address is also known as ____________ address.
on which electronic data are stored. Data can be stored on both
sides of the disk. Several disks can be stacked on top of one another on a spindle. These disks are actually rotating platters with a
mechanical arm that moves a read/write head between the inner and
outer edges of the disks surface. A read/write head is available on
each of the disk surface. A magnetic disk works on the principle of
magnetic charge.
The disks rotate at very high speeds. During a read/write operation, only the disk rotates; the head is always stationary.
3.12.1 Data Organization in Magnetic Disk
[Figure: concentric tracks on a disk surface, with the read/write head positioned over a track.]
A write head is used to record the data bits as magnetic spots
on the tracks and these recorded bits are detected by a change
in the magnetic field produced by a recorded magnetic spot,
on the disk surface, as it passes through a read head. The
data on the disk surfaces are accessed by specifying the
surface number, track number and sector number.
Some magnetic disks use a single read/write head for each disk surface, while others use a separate read/write head for each track on the disk surface. Accordingly, the read/write head
may be movable or fixed. If the magnetic disk uses a single
head for each disk surface then the read/write head must be
able to be positioned above any track on the surface; such a head is movable. In a fixed-head system, by contrast, there is one head per track on the surface. All the heads are mounted on a rigid arm that extends across all tracks.
[Figure: (a) a movable single head per surface; (b) fixed heads, one per track.]
The read and write operations start at the sector boundaries.
The bits of data are recorded serially on each track. To read
or write data, the head must be first positioned to that particular
track.
The heads should always be at a very small distance from the moving disk surfaces, so that high bit densities and, because of this, more reliable read/write operations can be achieved.
can be more densely stored along the tracks and the tracks
can also be closer to one another. These Winchester disk
units have a larger capacity for storing data.
operation. It also provides an interface between the disk drive
and the bus that connects it to the computer system.
The disks which are attached to the computer unit and cannot be removed by occasional users are called hard disks.
Those which can be inserted and removed from the system
easily by the users are called floppy disks.
3.12.2 Disk Access Time
To perform a read or a write operation, the read/write head is
first positioned on the desired track and sector. In fixed-head
systems, the head over the desired track is selected electronically. In movable-head systems, the arm must first be moved to position the head over the desired track; the time required for this is called the seek time. The disk must then rotate until the desired sector passes under the head; this delay is called the rotational delay or latency time. The sum of these two delays, that is, the seek time and the latency time, is called the disk access time.
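As a small sketch of this sum (the 7200 RPM speed and 9 ms seek time are illustrative figures, not from the text; the average rotational latency is half a revolution):

```python
def access_time_ms(seek_ms, rpm):
    """Disk access time = seek time + average rotational latency."""
    latency_ms = 0.5 * (60_000 / rpm)   # half a revolution, in ms
    return seek_ms + latency_ms

# A 7200 RPM disk revolves in 60000/7200 ~ 8.33 ms, so the average
# latency is ~4.17 ms and the access time here is ~13.17 ms.
assert abs(access_time_ms(9, 7200) - (9 + 30_000 / 7200)) < 1e-9
```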
[Figure: a disk pack — surfaces 0 through 5 on stacked platters, with a read/write head for each surface carried on an access arm assembly that moves in and out across the tracks.]
e. Data is stored as __________ in the magnetic disks.
f. In case of Winchester disks, the disks and the heads
are ________.
g. The three vital parts of a disk system are ___, ____
and ________.
h. ______ is the time required to position the head to the
desired track.
i. The seek time and the rotational delay together is called
the _____ time.
3.13 RAID
The rate of increase in the speeds of the processor and the main memory has far outpaced that of disk storage; one way to close this gap is to have several disks operate in parallel. This led to the use of arrays of disks that operate
independently and in parallel.
In the RAID 0 scheme, data is striped across the disks: a single file is stored in several separate disks by breaking the file into a number of smaller pieces. Whenever the file is read, all the disks deliver their stored
portion of the file in parallel. So the total transfer time of the file is
equal to the transfer time that would be required in case of a single
disk system divided by the number of disks used in the array.
The disk is divided into strips, which may be physical blocks or
some sectors.
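The transfer-time claim above can be sketched as (illustrative figures; controller and queuing overhead are ignored):

```python
def striped_transfer_time(single_disk_time, n_disks):
    """With a file striped over n disks read in parallel, the transfer
    time is the single-disk transfer time divided by n."""
    return single_disk_time / n_disks

# A file that takes 8 ms to read from one disk takes 2 ms from a
# 4-disk stripe.
assert striped_transfer_time(8.0, 4) == 2.0
```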
In the RAID 1 scheme identical copies of the same data are stored
on two disks. Data striping is also used here, but each strip is mapped to both the physical disks. The two disks are mirrors of each other.
If a disk failure occurs, then all the operations on its data can be carried out on its mirror disk. Higher RAID levels use parity information to detect and correct errors. The data are not fully duplicated, so RAID 2 requires fewer disks than RAID 1, but it is still costly. RAID 5 distributes the parity strips across all disks.
single logical unit using special hardware or software.
b. The full form of RAID is Redundant Array of Inexpensive
Disks.
c. The levels of the RAID are hierarchy of one another.
d. All the disks in RAID behave as one logical disk drive.
e. In case of a disk failure, the data cannot be recovered in
RAID.
f. RAID1 level uses parity calculations for error-recovery.
12. What are the three key concepts in RAID?
3.14 OPTICAL MEMORY
very reasonable cost. The audio CD (Compact Disk) was the first application of such technology.
In the mid-1980s, Sony and Philips developed the first generation of CDs. The CDs are non-volatile and cannot be erased.
In the past few years, a variety of optical disk systems have been introduced:

CD (Compact Disk)
One CD can hold 650 MB (megabytes) of data, or about 300,000 pages of text. Most CDs are read-only, which means you cannot
save data to the disk. This device is usually not used as a
primary storage device for data.
An optical disk is mounted on an optical disk drive for reading/
writing of information on it. An optical disk drive contains all the
mechanical, electrical and electronic components for holding
an optical disk and for reading/writing of information on it. That
is, it contains the tray on which the disk is kept, read/write
laser beam assembly, and the motor to rotate the disk. Access times for optical disks are in the range of 100 to 300 milliseconds.
Disadvantages are -
DVD (Digital Versatile Disk)

The main uses of DVDs are video and data storage. DVDs are of the same dimensions as compact disks (CDs), but store more than six times as much data.
Variations of the term DVD often indicate the way data is stored
on the disks : DVD-ROM (read only memory) has data that
can only be read and not written; DVD-R
and DVD+R (recordable) can record data only once, and then
function as a DVD-ROM; DVD-RW (re-writable), DVD+RW,
and DVD-RAM (random access memory) can all record and
erase data multiple times. The wavelength used by standard DVD lasers is 650 nm; thus, the light has a red color.
Type     | Capacity | Layers | Sides
DVD-5    | 4.7 GB   | 1      | 1
DVD-9    | 8.54 GB  | 2      | 1
DVD-R    | 4.7 GB   | 1      | 1
DVD-R    | 9.4 GB   | 1      | 2
DVD-RW   | 9.4 GB   | 1      | 2
3.15 MAGNETIC TAPE
The magnetic tape is mostly used for off-line storage of large amounts of data. It is the cheapest and the slowest method of data storage.
Earlier the tapes had nine tracks each but the newer tape systems
use 18 or 36 tracks, corresponding to a word or a double word. A
[Fig. 3.17: Data organization on a magnetic tape — records separated by gaps, a file mark at the start of each file, and 7 or 9 bits recorded across the tape width.]
can be stopped. Each record has an identification pattern both at the
beginning and at the end. The starting bit pattern gives the record
number, and when the tape head reaches the bit pattern at the record's end, it knows that there is a gap after it.
The beginning of every file is always marked by a file mark, as shown in the figure above. The
gap after the file mark can be used as a header or identifier for the
file. There are gaps after each record to distinguish between them.
In addition to the read and write commands, there are a number of
other control commands executed by the tape drive, which includes
the following operations :
a. Rewind tape
b. Erase tape
c. Forward space one record
The end of the tape is marked by EOT (End of Tape). The records
in a tape may be of fixed or variable length.
(d) Magnetic tape is a ______ access storage device.
(e) Tape motion stops when it reaches a ______.
(f) Data on a tape are organized in the form of ______.
(g) A tape is written from the ______ to the end.
14. Find True or False
i) Data storage in optical memory is very costly.
ii) CDs are non-erasable.
iii) WORM is an optical disk product.
iv) Recorded data can be read with the help of a laser beam
by the processor.
Main memory is of two types: Random Access Memory (RAM) and Read Only Memory (ROM). The contents of a ROM cannot be modified easily. There are 5 types of ROM, namely
ROM, PROM, EPROM, EEPROM and Flash EEPROM.
The cache is a small amount of high-speed memory. Compared
to the size of main memory, cache is relatively small. It operates
at or near the speed of the processor. Cache memory contains
copies of sections of the main memory and is very expensive compared to the main memory.
Caches exploit the property of locality which states that
applications tend to re-use data which they have recently used.
This is seen in two forms: temporal locality and spatial locality.
In direct mapping, one block from main memory maps into only one possible line of cache memory, since there are more main memory blocks than cache lines. In set-associative mapping, the cache lines are grouped into sets,
where a block can occupy any line within that set. Replacement
algorithms may be used within the set.
Virtual Memory is simply the operating system using some
amount of disk space as if it were real memory.
Paging is a method of achieving virtual memory. Virtual space
is broken up into equal size pages and memory space is broken
up into equal size page frames.
The magnetic disk is divided into tracks, sectors and cylinders
where the data are recorded.
3.18 ANSWERS TO CHECK YOUR PROGRESS
1. The characteristics of SRAM and DRAM are as follows:
• Simplicity: SRAMs are simpler, as no refresh circuitry is needed to keep their data intact.
• Speed: SRAM is faster than DRAM.
• Cost: SRAM is costlier than DRAM.
• Size: an SRAM cell is larger than a DRAM cell.
c) Access d) Cycle
e) less f) processor
6. (i) Miss (ii) Spatial locality (iii) external (iv) hit ratio
9. Tracks are the concentric set of rings along which the data is stored in magnetic disks. The set of tracks at the same position on all the recording surfaces forms a logical cylinder.
c) read/write head, d) fixed, movable, e) magnetic spots,
f) sealed, g) disk, disk drive, disk controller, h) seek time,
i) disk access time
12. The three key concepts in RAID are -
(i) mirroring – writing identical data to more than one disk
(ii) striping – splitting of data across more than one disk.
(iii) error-correction – uses parity for error detection and recovery.
13. a) 0.5, 0.25, b) read/write head, c) inter-record, d) sequential,
11. What do you mean by locality of reference? What are its types?
12. What are Winchester disks? Briefly discuss their advantages.
13. Differentiate between fixed-head systems and movable-head systems.
14. Explain optical memory with example.
15. Describe in brief the magnetic tape systems.
16. Differentiate between CD-ROM and DVD.
17. List the types of optical storage.
18. List the major types of magnetic storage.
19. Write short notes on the following:
a) Hard Disk
b) RAID
c) Optical Memory
d) CD-ROM
e) DVD
f) Magnetic Tapes
*****
UNIT STRUCTURE
4.3.1 Arithmetic and Logic Unit
4.3.2 Control Unit
4.3.3 CPU Registers
4.4 BUS and its Characteristics
4.4.1 System BUS
4.4.2 BUS Design Elements
4.5 Instruction Format
4.6 Addressing Modes
4.7 Interrupts: Concept and Types
4.8 Instruction Execution Cycle with Interrupt
4.9 Hardwired and Micro Programmed Control
4.9.1 Hardwired Control
4.2 INTRODUCTION
In this unit, the building blocks of the CPU are discussed in detail.
Buses also play an important role as the communication medium between the different modules of the computer, so this unit discusses the Bus and its design elements as well.
A computer system works on the basis of the instructions stored in the memory, so there is a need to know about the layout of instructions and their addressing modes; these are discussed in this unit.
The intermediate steps needed to execute a computer instruction are discussed thoroughly, along with interrupts and what happens to the instruction cycle when an interrupt occurs.
Finally, special focus is given to the Control Unit, a subdivision of the CPU, by looking at its hardware and software design issues. RISC and CISC architectures are also covered in this unit.
4.3 THE CENTRAL PROCESSING UNIT (CPU)
CPU stands for Central Processing Unit, which is the principal part of a computer. Once tasks are submitted to the computer through the input devices, the CPU is responsible for performing the operations on those tasks and for giving the results to the outside world through the output devices. So, the part of the computer that executes program instructions is known as the Central Processing Unit (CPU), or simply the Processor. At this point a question arises: there should be something within the CPU for executing the program instructions, something to carry the instructions to and from the rest of the system, and something for controlling the sequence of operations. In the next paragraph we will get to know the components that make up a CPU.
[Figure 4.1: the CPU (ALU, registers and control unit) connected through the system bus to the memory and to the input and output units.]
Addition
Subtraction
Logical AND
Logical OR
Logical Exclusive OR
Complement
Increment
Decrement
Left Shift, Left Rotate, etc
To operate on a number stored in memory, the number first needs to be brought to the CPU and stored in a high-speed tiny storage element, called a register; the actual operation is then done by the ALU. The modified number may then be stored back into the memory or retained in the register for immediate use. That means data are presented to the ALU in registers, and the results of an operation are stored in registers. While executing an operation, the ALU may also set flags as a result of that operation. For example, if the above-mentioned decrement operation performed by the ALU produces a zero result, the ALU will set the ZERO flag.
Figure 4.2 shows how the ALU is interconnected with the rest of the CPU.
[Figure 4.2: the ALU exchanges data with the registers and sets the flags, under the direction of the control unit.]
The control unit, as its name reflects, is a control centre: it sends control signals to the other units to carry out their jobs and receives status information from them. The operations performed by the ALU must be coordinated in some way, and this is one of the functions of the control unit.
The basic needs of the control unit are as follows:
• Sequencing: causing the CPU to step through the micro-operations in an ordered way, based on the program being executed.
• Execution: causing each micro-operation to be performed by triggering the appropriate control signals.
It is still not known to us how the control unit works by issuing the control signals. Let us have a look at figure 4.3.
[Figure 4.3: block diagram of the control unit.]

Its outputs are control signals within the CPU and control signals to the control bus.
Control signals from the control bus: signals coming from other units and peripherals, such as the interrupt signal, the acknowledgment signal, etc.
Control signals within the processor: these signals cause data to move from one register to another and trigger the appropriate ALU operations.
Control signals to control bus: Signals that sent to
the Memory or to the other I/O module.
Now, let us consider how to fetch an instruction from memory. First, the Memory Address Register (which holds the address of the memory location to be accessed currently) must be filled with the content of the Program Counter (which holds the address of the next instruction to be executed); this is done by activating a control signal from the Control Unit that opens the gates between the bits of the PC and the MAR. Next, a control signal is issued to place the content of the MAR onto the address bus, and then a memory-read control signal is issued on the control bus to the memory by the control unit. The control unit then issues a control signal to open the gates that move the content of the data bus into the MBR (Memory Buffer Register). Finally, one more control signal is issued so that the content of the MBR can be moved to the IR, and hence fetching is over. From this fetching example, it is clear why the control unit is needed.
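The fetch sequence just described can be sketched as follows (the register names PC, MAR, MBR and IR follow the text; the memory contents and addresses are illustrative):

```python
# A toy memory mapping addresses to instruction words.
memory = {0x100: "ADD B", 0x101: "STORE A"}
PC, MAR, MBR, IR = 0x100, None, None, None

MAR = PC              # gate the PC onto the MAR
MBR = memory[MAR]     # memory read: MAR -> address bus, data bus -> MBR
IR = MBR              # gate the MBR into the IR: fetch complete
PC += 1               # point at the next instruction

assert IR == "ADD B" and PC == 0x101
```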
4.3.3 REGISTERS
Data Register: data registers are somewhat restricted in that they can hold only data and cannot be employed in the calculation of an operand address.
Address Register: address registers hold memory addresses and may be devoted to a particular addressing mode. Examples include the Segment pointer (which holds the address of the base of the segment, when a machine supports segmented addressing), the Index register (used for indexed addressing) and the Stack pointer (dedicated to holding the address of the top element of the stack, in user-visible stack addressing).
- Sign Flag: It is set to 1, if the result of the last
arithmetic operation is negative; else it is 0.
- Carry Flag: It is set to 1, if the execution of the last
arithmetic operation produces a carry, otherwise it is
0. The carry flag is set or reset in the case of addition as well as subtraction: in addition, if the operation results in a carry, and in subtraction, if a borrow occurs, then the carry flag will be set.
- Zero Flag: It is set to 1 if the result of the last arithmetic or logical operation performed is zero; otherwise it is set to 0.
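The flag-setting rules above can be sketched for an 8-bit ALU addition (the function name and the 8-bit width are illustrative assumptions, not from the text):

```python
def flags_after_add(a, b):
    """Add two 8-bit values and report the Carry, Zero and Sign flags."""
    result = a + b
    carry = 1 if result > 0xFF else 0     # carry out of bit 7
    result &= 0xFF                        # keep the low 8 bits
    zero = 1 if result == 0 else 0
    sign = (result >> 7) & 1              # two's-complement sign bit
    return result, {"C": carry, "Z": zero, "S": sign}

# 0x80 + 0x80 = 0x100: the result wraps to 0, setting both Carry and Zero.
assert flags_after_add(0x80, 0x80) == (0, {"C": 1, "Z": 1, "S": 0})
```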
(i) CPU stands for__________________________________.
(ii) Major building blocks of CPU are _________________,
_______________and ______________ also treated as
major building block.
(iii) ALU stands for __________________________________.
(iv) The basic needs of control unit are __________________
and _____________________.
(v) The Control unit performs one micro-operation to be executed in each clock pulse, which is known as ___________________________________.
Now, one question may arise: what is a Bus? A Bus is a communication pathway connecting two or more modules. Information transmission on a Bus is broadcast in nature, and a Bus consists of multiple communication lines.
4.4.1 SYSTEM BUS
The Bus which connects the three major components of a computer (CPU, Memory and I/O) is called the System Bus. The System Bus generally consists of 50 to 100 separate lines, which are divided into three categories based on their functionalities, as follows:
Data Lines: these are the lines that carry data from one module to another module of the system. Data lines are collectively called the data bus, and their number determines how many bits can be transferred at a time.

Address Lines: these carry the address of the source or destination of the data on the data bus.

Control Lines: these carry the command and timing signals that control the access to, and the use of, the data and address lines.
Some of the basic design elements that can differentiate buses are
as follows-
Method of Arbitration: in Centralized Arbitration, one single hardware device is responsible for selecting the Bus Master at a given time. In Distributed Arbitration, there is no central device responsible for arbitration; instead, each module contains access control logic and the modules act together to share the Bus.
Timing: timing refers to the way in which events are coordinated on the Bus. Two types of timing exist: Synchronous Timing and Asynchronous Timing.
Zero-Address Instructions

OPCODE-field | (no address)

For example, PUSH A, i.e., insert the content of A into the stack.

One-Address Instructions

OPCODE-field | Address 1

For example, ADD B, i.e., Accumulator ← Accumulator + B

Two-Address Instructions

For example, ADD A, B, i.e., A ← A + B

Three-Address Instructions

For example, ADD A, B, C, i.e., A ← B + C
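The four formats can be contrasted by computing A = B + C in each style on a toy machine (an illustrative sketch in Python, not any real instruction set):

```python
mem = {"A": 0, "B": 2, "C": 3}   # toy "memory locations"
acc, stack = 0, []

# Three-address: ADD A, B, C
mem["A"] = mem["B"] + mem["C"]
# Two-address: MOV A, B ; ADD A, C  (one operand doubles as the result)
mem["A"] = mem["B"]; mem["A"] += mem["C"]
# One-address: LOAD B ; ADD C ; STORE A  (implicit accumulator)
acc = mem["B"]; acc += mem["C"]; mem["A"] = acc
# Zero-address: PUSH B ; PUSH C ; ADD ; POP A  (implicit stack)
stack.append(mem["B"]); stack.append(mem["C"])
stack.append(stack.pop() + stack.pop())
mem["A"] = stack.pop()

assert mem["A"] == 5   # every style yields A = B + C = 5
```

Note how fewer explicit addresses per instruction mean more instructions per computation: the trade-off discussed next.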
functions. But the problem is with space, because more space is required for more opcodes, operands and addressing modes; hence we say there is a trade-off. One more important point is whether the instruction length should be equal to the memory transfer length or a multiple of the transfer length.
Allocation of Bits: how to allocate the bits of the instruction between the opcode and the address field. If more bits are allocated to the opcode part, more opcodes are possible, but fewer bits remain available for addressing. Hence, there is also a trade-off in the allocation of bits.
Some of the factors for determining the number of bits in the addressing part of an instruction are:
address space of 2^N can be addressed. The disadvantage is that instruction execution requires three or more memory references to fetch the operand.
Register addressing mode: the difference is that the address field of the instruction refers to a register rather than a memory location, so only 3 or 4 bits are used as the address field to reference 8 to 16 general-purpose registers. The advantages of register addressing are that only a small address field is needed in the instruction and that no time-consuming memory references are needed. The disadvantage of register addressing is that the address space is very limited.
The advantage of this mode is flexibility in addressing; its disadvantage is complexity.
Stack addressing mode: a stack is a linear array of locations operated in last-in first-out order. The stack is a reserved block of locations; items are appended or deleted only at the top of the stack. The stack pointer is a register which stores the address of the top-of-stack location. This mode of addressing is also known as implicit addressing. Here the effective address is the top of the stack, so no memory reference for an address is needed, which is the advantage. Its disadvantage is limited applicability.
(ii) The bus connecting CPU, Memory and I/O is known as _____________.
(iii) System bus is composed of ___________, _________ and ______________.
(iv) Two types of method of arbitration (bus design elements) are _______________ and _____________ arbitration.
(v) An Instruction is composed of ______________ and ____________.
(vi) The addressing mode in which data is directly available as an operand is known as _______________.
3. State whether the following statements are True or False
Multiplexed Bus.
(ii) In asynchronous Timing, the occurrence of an event on the Bus is determined by a clock.
(iii) More data lines lead to more system capacity, and more address lines lead to better system performance.
Interrupt Request Signal, and the signal issued in response to the interrupt request, to inform the source of the interrupt request that the request has been granted, is known as the Interrupt Acknowledge Signal. The routine executed in response to an interrupt request is known as the Interrupt Service Routine (ISR). An ISR is similar to a subroutine of a program, but it may not have any relationship with the program.
Now, let us take an example scenario of transferring the control of CPU execution from one program segment to another program segment or routine through an interrupt, and observe how this is made possible.
Figure 4.4 shows that the interrupt request arrives during execution of the i-th instruction of program 1 (i.e., the COMPUTE routine), and then the CPU suspends the
Program: Generated by some condition that occurs as a result of an instruction execution, such as arithmetic overflow, division by zero, attempt to execute an illegal machine instruction, or reference outside a user's allowed memory space.
Timer: Generated by a timer within the processor. This allows the operating system to perform certain functions on a regular basis.

I/O: Generated by an I/O controller, to signal normal completion of an operation or to signal a variety of error conditions.

Hardware Failure: Generated by a failure such as a power failure or a memory parity error.
Handling Multiple Interrupts:
At this point, we are very much familiar with what an interrupt is,
itself interrupted. When the interrupt service routine for the higher-priority interrupt is completed, the interrupt service routine of the lower-priority interrupt is resumed first, and finally control returns to the user program. So, the transfer of control of the CPU is nested in nature.
4.8 INSTRUCTION EXECUTION CYCLE AND INTERRUPT

We know that the computer executes the instructions of the program stored in memory, and the processing required for executing a single instruction is known as the Instruction Cycle. One instruction cycle consists of two sub-cycles or steps - the Fetch Cycle and the Execute Cycle.
4. If the operation involves a reference to an operand in memory or in I/O, then calculate the address of the operand and read the operand from memory or I/O.
5. Perform the operation indicated in the instruction.
6. Store the result back into memory or out through I/O.
So, one more sub-cycle may be introduced, called the indirect cycle. After an instruction is fetched, it is examined to determine whether any indirect addressing is involved; if so, the required operand is fetched using indirect addressing. Thus, if the execution of an instruction involves one or more operands in memory, each requiring an extra memory access to resolve the operand address, this extra access is known as the indirect cycle.
(i) The routine executed in response to an interrupt request is known as ___________________.
(ii) Methods for handling multiple interrupts are _____________ and _________________.
(iii) The module that can handle the interrupt requests from various devices and allow them one by one to the processor is known as ______________________.
(iv) The processing required for executing a single instruction is known as ___________________.
(v) Reading the instruction from the memory to the Instruction Register is called __________, and executing the instruction from the instruction register is called _____________.
execute cycle and interrupt cycle, with only the fetch and execute cycles always occurring. Each of the smaller cycles (fetch cycle, indirect cycle, execute cycle and interrupt cycle) involves a series of steps, each of which involves the processor registers; these steps are called Micro-Operations. A micro-operation is thus an elementary CPU operation, performed during one clock pulse, and an instruction consists of a sequence of micro-operations.
µOp3 - Increment the Program Counter by the length of the instruction,
i.e., PC ← PC + I, where I is the instruction size.

µOp4 - Move the content of the Memory Buffer Register to the Instruction Register,
i.e., IR ← (MBR)

Therefore, four Micro-Operations are needed for the fetch sub-cycle. Here, µOp1 will be done in time T1, µOp2 and µOp3 will be done in time T2, and µOp4 will be done in T3. Alternatively, µOp1 will be done in T1, µOp2 in T2, and µOp3 and µOp4 in T3.
T1: MAR ← (PC)                  T1: MAR ← (PC)
T2: MBR ← Memory          Or,   T2: MBR ← Memory
    PC ← (PC) + I               T3: PC ← (PC) + I
T3: IR ← (MBR)                      IR ← (MBR)
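The fetch-cycle micro-operations above can be sketched as ordinary register transfers. The Python fragment below is a hypothetical simulation: the register names follow the text, while the memory contents and the `fetch_cycle` helper are invented for illustration:

```python
# Sketch of the fetch sub-cycle as register transfers ('memory' maps
# addresses to instruction words; register values are hypothetical).
def fetch_cycle(regs, memory, instr_size=1):
    regs["MAR"] = regs["PC"]              # T1: MAR <- (PC)
    regs["MBR"] = memory[regs["MAR"]]     # T2: MBR <- Memory
    regs["PC"] = regs["PC"] + instr_size  #     PC  <- (PC) + I
    regs["IR"] = regs["MBR"]              # T3: IR  <- (MBR)
    return regs

regs = {"PC": 100, "MAR": 0, "MBR": 0, "IR": 0}
memory = {100: "ADD R1, X", 101: "STORE R1, Y"}
fetch_cycle(regs, memory)  # IR now holds the instruction at address 100
```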
follows-

PC ← Routine_Address
adding the content of location X to register R1, as follows-

ADD R1, X

For the execution of the above instruction, the following Micro-Operations may occur:

T1: MAR ← (IR(Address))
Hardwired Control
Figure- 4.7

The control unit takes the instruction register, the clock, flags and control bus signals as input. The control unit uses the opcode and performs different actions for different instructions. As shown in figure 4.7, the opcode of the instruction available in the Instruction Register is the input to the decoder, which decodes it. The decoded output is then the input to the control unit, so there is a unique logic input for each opcode.
1. To execute this microinstruction, turn on all the control lines indicated by a 1 bit and leave off all control lines indicated by a 0 bit. The resulting control signals cause one or more micro-operations to be performed.
2. If the condition indicated by the condition bits is false, execute the next microinstruction in sequence; otherwise, the next microinstruction to be executed is indicated in the address field.

Now, let us examine the functioning of the Microprogrammed Control Unit in the following figure 4.8.
Figure- 4.8
2. The microinstruction, or control word, whose address is specified by the control address register is then transferred to the control buffer register.
3. The content of the control buffer register generates control signals and next-address information for the sequencing logic unit.
4. The sequencing logic unit loads a new address into the control address register based on the next control-word address field from the control buffer register and the ALU flags. The new address may be the next microinstruction address, a jump to a new routine based on a jump microinstruction, or a jump to a machine instruction routine.
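A minimal sketch of this sequencing logic might look as follows in Python. The 4-bit control-word format, the `run_microprogram` helper and the sample control store are all invented for illustration; each microinstruction here is a triple of control bits, an optional condition, and a jump address:

```python
# Toy microprogrammed sequencer: each word = (control_bits, condition, jump).
def run_microprogram(control_store, flags, start=0, max_steps=10):
    fired = []          # control signals asserted, in order
    car = start         # control address register
    for _ in range(max_steps):
        control_bits, condition, jump = control_store[car]  # -> control buffer register
        fired.append(control_bits)                          # assert the 1 bits
        if condition is not None and flags.get(condition):
            car = jump          # condition true: take address from the word
        else:
            car = car + 1       # otherwise: next microinstruction in sequence
        if car >= len(control_store):
            break
    return fired

store = [
    ("1000", None, None),   # e.g. assert signal for MAR <- PC
    ("0100", "zero", 3),    # jump ahead if the ALU zero flag is set
    ("0010", None, None),
    ("0001", None, None),
]
run_microprogram(store, flags={"zero": True})
```

With the zero flag set, the word at address 2 is skipped; with it clear, all four words fire in sequence.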
The design constraints that led to the development of CISC give CISC instruction sets some common characteristics:

A 2-operand format, where instructions have a source and a destination, i.e., register-to-register, register-to-memory, and memory-to-register commands. CISC also provides multiple addressing modes for memory, including specialized modes for indexing through arrays.
Variable-length instructions, where the length often varies according to the addressing mode.
Instructions which need multiple clock cycles to execute.
Complex instruction-decoding logic, driven by the need for a single instruction to support multiple addressing modes.
A small number of general-purpose registers and several special-purpose registers.
2. PROD - finds the product of the two operands located within the registers.
3. STORE - moves data from a register to the memory bank.

For the task above, the programmer needs to code four lines of assembly code as follows-

LOAD A, 2:3
LOAD B, 5:2
PROD A, B
STORE 2:3, A
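A hypothetical interpreter for these four lines can make their semantics concrete. The `run` function and the "row:col" string addressing below are invented for illustration, not part of any real assembler:

```python
# Toy interpreter for the LOAD / PROD / STORE example above.
def run(program, memory):
    regs = {}
    for line in program:
        op, args = line.split(None, 1)
        a, b = [s.strip() for s in args.split(",")]
        if op == "LOAD":         # LOAD reg, mem  : memory -> register
            regs[a] = memory[b]
        elif op == "PROD":       # PROD reg, reg  : product kept in first reg
            regs[a] = regs[a] * regs[b]
        elif op == "STORE":      # STORE mem, reg : register -> memory
            memory[a] = regs[b]
    return memory

memory = {"2:3": 6, "5:2": 7}
run(["LOAD A, 2:3", "LOAD B, 5:2", "PROD A, B", "STORE 2:3, A"], memory)
# memory["2:3"] now holds the product 42
```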
Architecture:
CHECK YOUR PROGRESS - 4

5. Fill in the blanks.
(i) A ______________ is an elementary CPU operation, performed during one clock pulse.
(ii) In the hardwired scheme, the control operation is implemented through _____________.
for ________________.
Registers are divided into two categories - User-Visible Registers and Control and Status Registers.
The Bus which connects the three major components of a computer (CPU, Memory and I/O) is called the System Bus.
An instruction format is the layout of the bits of an instruction, in terms of its constituent fields.
The mechanism by which the execution control of the CPU temporarily jumps from one program routine to another is known as an interrupt.

ANSWERS TO CHECK YOUR PROGRESS
2. (i) Bus
(ii) System Bus
(iii) address lines, data lines, control lines
(iv) centralized, distributed
(v) opcode, operand
(vi) Immediate mode
Set Computer
(iv) number of instructions per program, number of cycles per instruction
6. (i) false (ii) true (iii) false (iv) false (v) true
4.13 FURTHER READINGS

William Stallings: Computer Organization & Architecture, Designing for Performance. Pearson, Prentice Hall.
8. Discuss the Hardwired implementation of the Control Unit.
9. Discuss the Microprogrammed implementation of the Control Unit.
10. Write the differences between the Hardwired and Microprogrammed implementations of the Control Unit.
11. Write the differences between the RISC architecture and the CISC architecture.
12. What is the Windows 7 Start menu? Explain its parts.

*****
UNIT STRUCTURE

5.2 Introduction
5.3 Classification of Parallel Computation
     5.3.1 The Single-Instruction-Single-Data (SISD)
     5.3.2 The Single-Instruction-Multiple-Data (SIMD)
     5.3.3 Multiple-Instruction-Single-Data (MISD)
     5.3.4 Multiple-Instruction-Multiple-Data (MIMD)
5.4 Single-Instruction-Multiple-Data (SIMD)
5.5 Multiple-Instruction-Multiple-Data (MIMD)
     5.5.1 Shared Memory Organization
     5.5.2 Message Passing Organization
5.6 Analysis and Performance of Multiprocessor Architecture
5.2 INTRODUCTION

Today computers are not used only for scientific and military applications but in all other areas, such as banking, share markets, universities, and reservation systems such as airline and train reservations. Although the kind of data to be processed in each application is different, there is one common factor among the data processed in all these applications: the data to be processed is huge, and in most of the applications there is symmetry among the data. For example, an airline reservation system has to process data that holds information about passengers, flights and flight schedules. Thus, instead of using a high-end server like a mainframe to process these related large volumes of data, we can use multiple programs running on different interconnected computers to obtain the same result. This and many more detailed observations listed below give insight into the benefit of executing programs in parallel.
• The single-instruction-single-data (SISD)
• The single-instruction-multiple-data (SIMD)
• Multiple-instruction-single-data (MISD)
• Multiple-instruction-multiple-data (MIMD)

SIMD, MISD and MIMD are candidate organizations for multiprocessor architecture. A typical multiprocessor architecture performing parallel processing is shown in the following Fig-5.1.
5.3.1 The Single-Instruction-Single-Data (SISD)

SISD architecture is equivalent to a machine executing a program entirely in a sequential manner, for example a program calculating the cube root of a given number: a single program computes the cube root of the given number. A uniprocessor architecture is best suited for this kind of program. This is the classic "Von Neumann" architecture dating from 1945, where a control unit coordinates the traditional "Fetch - Execute" cycle: obtaining a single instruction from memory, feeding it to the processor for execution on the data specified within the instruction, then storing the result back into memory. Although modern computers, especially desktops, do not adhere completely to this arrangement, they are still classed as sequential rather than
for greater performance in their use when one wants to employ their full parallelism. SIMD performs excellently on digital signal and image processing and on certain types of Monte Carlo simulations. Though limited to a small number of applications, such as image processing and the solution of 2D and 3D field problems, the speed-up factor is significant in that it is almost directly proportional to the number of processing elements. Fig-5.3 above gives the block diagram for the SIMD architecture.
actual applications currently using this architecture. It has been argued that Systolic Array processing may fit this criterion. Other applications, such as stream-based processing, may also fit into this class. Fig-5.4 gives the block diagram for the MISD architecture.
5.3.4 Multiple-Instruction-Multiple-Data (MIMD)

MIMD programs are by far the most common type of parallel programs. Consider extending the SIMD cube-root program to compute cube roots for different sets of numbers, where each set contains a different data type: one set may contain integers and another set may contain floating-point numbers. In this architecture each processor has its own set of instructions, which are executed under the control
Fig. 5.4 Block diagram of MISD architecture
5.4 SINGLE-INSTRUCTION-MULTIPLE-DATA (SIMD)

operations are the most required operations in applications such as image processing. There are two main SIMD configurations in parallel processing. These two schemes are shown in Fig-5.6.
* The ILLIAC IV was one of the most infamous supercomputers ever built.

means each has its own local memory. When processors want to communicate with each other, they do so through the interconnection
are used. First, processors 1 and 3 are connected to memory module 2. Next, processors 3 and 4 are connected to memory module 3. Finally, processors 4 and 5 are connected to memory module 4. Now, if we trace the movement of the data: the data was first placed into memory module 2 by processor 1; processor 3 transfers it from memory module 2 to memory module 3; then processor 4 transfers it from memory module 3 to memory module 4; finally, memory module 4 is read by processor 5. The BSP** (Burroughs' Scientific Processor) used the second SIMD scheme.

In order to illustrate the effectiveness of SIMD in handling array operations, consider the operation of adding the corresponding elements of two one-dimensional arrays A and B and storing the results in a third one-dimensional array C. Assume also that each of the three arrays has N elements.
Assume SIMD scheme 1 is used. The N additions required can be done in one step if the elements of the three arrays are distributed such that M0 contains the elements A(0), B(0), and C(0), M1 contains the elements A(1), B(1), and C(1), . . . , and MN-1 contains the elements A(N-1), B(N-1), and C(N-1). After the additions, the elements of the resultant array C will be stored across the memory modules such that M0 will store C(0), M1 will store C(1), . . . , and MN-1 will store C(N-1).
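The array addition above can be sketched as follows. This is only an imitation: the single list comprehension models the lockstep "one instruction, N data items" behaviour of real SIMD hardware, which would perform all N additions simultaneously, one per processing element:

```python
# Element-wise addition of arrays A and B into C; on a real SIMD machine one
# ADD instruction would be obeyed by all N processing elements at once.
def simd_add(A, B):
    return [a + b for a, b in zip(A, B)]

A = [1, 2, 3, 4]
B = [10, 20, 30, 40]
C = simd_add(A, B)   # conceptually one parallel step, not N sequential ones
```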
architectures have multiple processors and multiple memory modules connected together through an interconnection network. We can divide MIMD architectures into two broad categories: shared memory based and message passing. Fig-7 illustrates the general architecture of these two categories. In shared memory there is a central shared memory, and processors exchange information through this central shared memory. Processors communicate through a bus and cache memory controller. As multiple processors access the shared memory, the shared memory needs to be an expensive multiported memory supporting simultaneous reads and writes. Some example architectures that use shared memory for information exchange are Sun Microsystems multiprocessor servers and Silicon Graphics Inc. multiprocessor servers.
nCUBE, iPSC/2, and various Transputer-based systems. The concept of the Internet is similar to multiprocessor systems using message passing: an Internet node is either a server or a client, and information is exchanged using messages.
Distributing memory is one way of efficiently increasing the number of processors managed by a parallel and distributed system. If we use centralized memory instead, then increasing the number of processors results in greater conflicts. Thus, to scale to larger and larger systems (as measured by the number of processors), systems had to use distributed memory techniques. These two forces created a conflict: programming in the shared memory model is easier, while designing systems in the message passing model provides scalability.
Access Control.
Access control determines which process can access which of the possible resources. This list is often maintained in a separate table, which is checked for every access request issued by the processors to the shared memory during execution. The access control mechanism, along with synchronization rules, guides the sharing of the shared memory among multiple processors.
Synchronization.
Synchronization controls and coordinates access to shared resources by multiple processors. Appropriate synchronization ensures that information flows properly and ensures system functionality. Many synchronization primitives, such as semaphores, are used along with shared memory access.
Protection
Protection is a system feature that prevents processes from making arbitrary access to resources belonging to other processes. One must not confuse access control with protection. Access control determines what you are allowed to do; it has to do with policy making regarding who can use which resources.

Sharing
Sharing and protection are incompatible; sharing allows
processors. Here, requests arrive at the memory module through its two ports. An arbitration unit within the memory module passes requests to a memory controller. If the memory module is not busy and a single request arrives, the arbitration unit passes that request to the memory controller and the request is granted. During this time the module is placed in the busy state. If a new request arrives while the memory is busy, the requesting processor may put the request in a queue until the memory becomes free, or it may repeat its request sometime later.

Depending on the interconnection network, shared memory systems are classified as: Uniform Memory Access (UMA), Non-Uniform Memory Access (NUMA), and Cache-Only Memory Architecture (COMA). In the UMA system, a central shared memory
processor and its local memory. Nodes are typically able to store messages in buffers (temporary memory locations where messages wait until they can be sent or received), and to perform send/receive operations at the same time as processing. Simultaneous message transfer and processing are handled by the underlying operating system. The nodes in this system are interconnected in many ways, ranging from architecture-specific interconnection structures to geographically dispersed networks. As stated earlier, the message passing approach is scalable; by scalable, it is meant that the number of processors can be increased without significant decrease in the efficiency of operation. Two important design factors must be considered in designing interconnection networks for message passing systems: the link bandwidth and the network latency. The link bandwidth is defined as the number of bits that can be transmitted per unit time (bits/s). The network latency is defined as the time to complete a message transfer.
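Given these two definitions, a rough estimate of a message transfer time can be computed from the link bandwidth plus a fixed startup cost. The function and all the numbers below are invented for illustration:

```python
# Back-of-the-envelope model: transfer time = startup latency + bits / bandwidth.
def transfer_time(message_bits, bandwidth_bps, startup_s=0.0):
    return startup_s + message_bits / bandwidth_bps

# A 1 Mbit message over a 100 Mbit/s link with a 2 ms startup cost:
t = transfer_time(1_000_000, 100_000_000, startup_s=0.002)   # 0.012 s
```

The fixed startup term is why many short messages cost far more than one long message of the same total size.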
fashion, with the header flit moving first followed by the remaining flits. When the header flit is blocked due to network congestion, the remaining flits are blocked as well. This not only reduces unnecessary traffic but also decreases the time for transmission of messages.
Interconnection Networks
There are many ways of classifying interconnection networks.
Classification is often based on mode of operation, control system,
switching techniques used and topology used.
According to the control strategy, interconnection networks can be classified as centralized versus decentralized. In centralized control systems, a single central control unit is used to oversee and control the operation of the components of the system. In decentralized control, the control function is distributed among the different components of the system.
Interconnection networks can be classified according to the switching mechanism as circuit versus packet switching networks. In the circuit switching mechanism, a complete path has to be established prior to the start of communication between a source and a destination, and the established path remains in existence during the whole communication period. In a packet switching mechanism, communication between a source and destination takes place via messages that are divided into smaller entities, called packets. On their way to the destination, packets can be sent from one node to another in a store-and-forward manner until they reach their destination. While packet switching tends to use the network
Single bus

Crossbar networks
Each crossbar switch in a crossbar network can be set open or closed, providing a point-to-point path between a processor and a memory module. On each row of a crossbar mesh, multiple switches can be connected simultaneously, but only one switch in a column is active at any point of time. This allows a processor to interact with multiple memory modules simultaneously, while one memory module can communicate with only one processor at any point of time.
Multistage networks

Multiple such stages are used for interconnections. This provides reliability and efficiency by deriving paths through permutations among the different stages. Banyan, Clos, and Batcher sorter networks are examples of such interconnections. The following figure shows one such multistage network.
Hypercube Networks
using multiprocessors as compared to a single processor? This question can be formulated into the speed-up factor defined below:

S(n) = speed-up factor
     = (Execution time using a single processor) / (Execution time using n processors)

A related question is how efficiently each of the n processors is utilized. This question can be formulated into the efficiency defined below:

E(n) = Efficiency = (S(n) / n) × 100%
In the ideal case, S(n) = n and E(n) = 100%. However, the assumption that a given task can be divided into n equal subtasks, each executed by a processor, is unrealistic, so 100% efficiency cannot be achieved in practice. The time for executing a job on a multiprocessor involves not only the execution time itself but also the time for dividing the task into subtasks, assigning them to different machines, the time taken for interprocessor communication, and the time for combining the results.
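The two formulas can be tried out with invented timings; the overhead term below stands in for the task-splitting, communication and result-combining costs just mentioned:

```python
# Invented example: a 100 s task split across n = 4 processors, with 5 s of
# overhead for dividing the task, communication, and combining the results.
def speedup(t1, tn):
    return t1 / tn

def efficiency(t1, tn, n):
    return speedup(t1, tn) / n * 100.0   # as a percentage

t1 = 100.0                 # single-processor time
n = 4
overhead = 5.0
tn = t1 / n + overhead     # 30 s instead of the ideal 25 s
S = speedup(t1, tn)        # below the ideal value of 4
E = efficiency(t1, tn, n)  # below 100%
```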
and engineering, and problems too large to solve on one computer may use hundreds or thousands of computers to solve them. Many companies in the 80s/90s "bet" on parallel computing and failed to make a profit because computers got faster too quickly. What was initially thought of as a slow computing device gained momentum according to Moore's law and became a very fast computing device, so fast that it overshadowed the concept of parallel processing. But we still claim the importance of parallel processing for the following two reasons:

There are more applications that can take benefit of parallelism.
Moore's law does not hold any more.

The following graph gives the number of transistors used in a processor as the years progressed.
b) Concepts of Multiprocessors and pipelining are the same.
c) Multiprocessors and Multicore architecture are the same.
d) SISD is a single-processor architecture.
e) Many concepts of Multiprocessor are applicable in Multicore architecture.

2) Flynn's classification, while classifying systems and programs into four groups, takes into consideration
a) Computers operating on a single instruction or a set of instructions
b) Computers operating on a single data item or a set of data.
State which of the following is true:
i) Only a
ii) Only b
iii) Both a & b
• Two variants of MIMD are based on the use of shared memory or message-passing techniques for interprocessor communication.
5.9 ANSWERS TO CHECK YOUR PROGRESS

1)
a) True
b) False
c) False
d) True
e) True

2) iii
multiprocessor architecture with a neat diagram.
5) State and explain the message-passing variant of MIMD multiprocessor architecture with a neat diagram.
6) State and explain the problems associated with shared memory in the shared-memory variant of MIMD multiprocessor architecture.
7) Give the five-tuple representation of the SIMD scheme.
8) Explain wormhole routing for the message-passing technique.
9) Give reasons why the concepts of multiprocessors are still relevant today.

*****
6.8 Merits and Demerits of Pipelining
6.9 Let Us Sum Up
6.10 Answers to Check Your Progress
6.11 Further Readings
6.12 Possible Questions

6.1 LEARNING OBJECTIVES

After going through this unit, you will be able to
define and elaborate what pipelining is

6.2 INTRODUCTION

So far, you have come across many interesting topics and techniques. In this unit, we will discuss one of the most important techniques, called pipelining, which is used in modern computers to achieve high performance.
is done, it is transferred to the dryer and another load is placed in the washing machine. When the first load is dry, we pull it out for folding or ironing, move the second load to the dryer, and start a third load in the washing machine. We proceed with folding or ironing of the first load while the second and third loads are being dried and washed, respectively. We may have never thought of it this way, but we do laundry by pipeline processing.
A Pipeline is a series of stages, where some work is done at each stage. The work is not finished until it has passed through all stages. Now let us see how the idea of pipelining can be used in computers. We know that the processor executes a program by fetching and executing instructions, one after the other. The following figure shows two hardware units of a processor, one for fetching instructions and the other for executing them. The intermediate storage buffer B1 stores the instruction fetched by the fetch unit. This buffer is needed to enable the execution unit to execute the instruction while the fetch unit is fetching the next instruction. Thus, with pipelining, the computer architecture allows the next instruction to be fetched while the processing of the current instruction is being performed.

[Figure: Instruction fetch unit → buffer B1 → Execution unit]
given time period.

overlapped movement of instructions to the processor to perform an instruction.
But what will happen without pipelining? Without a pipeline, a computer processor gets the first instruction from memory, performs the operation it calls for, and then goes to get the next instruction from memory, and so forth. While fetching (getting) an instruction, the executing part of the processor remains idle; it must wait until it gets the next instruction. Thus, execution is slower, because fewer instructions are executed during a given time slot. The IBM 7030 (the Stretch Computer) attained its over-all performance gain of 100 times by using the pipelining technique rather than circuit improvements alone. John Hayes provides a definition of a pipeline as it applies to a computer processor.

NOTE: The IBM 7030, also known as Stretch, was IBM's first transistorized supercomputer, in 1961. Though much slower than expected, it was the fastest computer in the world until the first CDC 6600 became operational in 1964.
operations are handled along the stages of a pipeline.

NOTE: The CPU runs at the speed of an internal clock whose clock rate is determined by the frequency of an oscillator crystal (quartz crystal) that, when subjected to an electrical current, sends pulses, called "peaks". The clock speed (also called the cycle) corresponds to the number of pulses per second, written in Hertz (Hz). Thus, a 200 MHz computer has a clock that sends 200,000,000 pulses per second. The clock frequency is generally a multiple of the system frequency (FSB, Front-Side Bus), i.e. of the motherboard frequency.

6.5 INSTRUCTION PIPELINING

An instruction pipeline is a technique used in the design of computers to increase their instruction throughput (the number of instructions that can be executed in a unit of time). The fundamental idea is to split the processing of a computer instruction into a series of independent steps, with storage at the end of each step. Most modern CPUs are designed to operate on a synchronization signal known as a clock signal. In each clock pulse, a fetch step and an execution step should be completed. The situation discussed in the previous section (Fig. 6.1) can be illustrated by the introduction of clock pulses, as shown in the figure below.

In the first clock cycle, the fetch unit fetches an instruction I1 (step F1) and stores it in buffer B1 at the end of the clock cycle. In the second clock cycle, the fetch unit proceeds with the fetch of instruction I2 (step F2), while the execution unit executes instruction I1, which is available to it in buffer B1 (step E1).
Clock cycle    1    2    3    4
I1             F1   E1
I2                  F2   E2
I3                       F3   E3
is completed and instruction I2 is available. Instruction I2 is stored in B1, replacing I1, which is no longer needed. Step E2 is performed by the execution unit during the third clock cycle, while instruction I3 is being fetched by the fetch unit. Thus, the fetch and execute units constitute a two-stage pipeline. Both units are kept busy all the time, and new information is loaded into the buffer after each clock pulse.
Actually, the processing of an instruction is not limited to two steps; instead, four steps, as shown in Fig. 6.4, can be used to process an instruction. These steps are:

F - Fetch the instruction.
D - Decode the instruction and fetch the source operands.
E - Execute the operation.
W - Write the result into the destination.

So, four different hardware units, as shown in Fig. 6.3, are needed for instruction processing.

B1 B2 B3
Fig. 6.3 Hardware units needed for Instruction Processing
Time
Clock cycle    1    2    3    4    5    6    7
I1             F1   D1   E1   W1
I2                  F2   D2   E2   W2
I3                       F3   D3   E3   W3
I4                            F4   D4   E4   W4
Information is passed from one unit to the next through a storage buffer. The information in the buffers at clock pulse 4 is given in the following table:

Buffers | Contents
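The timing table above follows a simple rule: with one cycle per stage and no stalls, instruction i (counting from 1) reaches stage s (F = 0, D = 1, E = 2, W = 3) in cycle i + s. A small sketch, with invented helper names:

```python
# The 4-stage schedule above: instruction i (1-based) reaches stage s in
# cycle i + s, assuming one cycle per stage and no stalls.
STAGES = {"F": 0, "D": 1, "E": 2, "W": 3}

def cycle_of(instruction, stage):
    return instruction + STAGES[stage]

# I1 writes back in cycle 4 and I4 in cycle 7, matching the table.
schedule = {(i, s): cycle_of(i, s) for i in range(1, 5) for s in STAGES}
```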
The potential increase in performance resulting from pipelining should be proportional to the number of pipeline stages. In practice, however, this is not absolutely correct, because other factors make the performance vary.
A pipeline stage may not be able to complete its processing task during the allotted time slot. The execution stage is responsible for arithmetic and logic operations, and one clock cycle may not be sufficient for it. The following figure shows that for instruction I2 the Execution stage takes three cycles (4, 5, 6) to complete. Thus, in cycles 5 and 6, the Write stage must be told to do nothing, because it has no data to work with.
Time →
Clock cycle :    1     2     3     4     5     6     7     8     9
Instruction
I1               F1    D1    E1    W1
I2                     F2    D2    E2    E2    E2    W2
I3                           F3    D3                E3    W3
I4                                 F4                D4    E4    W4
I5                                       F5                D5    E5

Fig. 6.5 : Execution stage taking more than one clock cycle
Thus, normal pipeline operation is interrupted for two clock cycles; the pipeline is said to be stalled. In a stalled situation, normal pipeline operation halts for one or more clock cycles. In the figure, we see that pipeline functioning resumes normally from clock cycle 7.
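Assuming, as in Fig. 6.5, that every extra cycle spent in the Execute stage delays all later instructions by one cycle, the stall penalty simply adds to the ideal completion time. A rough sketch (not from the text):

```python
# Each extra Execute cycle stalls the pipeline by one cycle, so the
# completion time is the ideal k + (n - 1) cycles plus the total
# number of extra cycles.

def cycles_with_stalls(n_instructions, n_stages, extra_cycles):
    return n_stages + (n_instructions - 1) + extra_cycles

# In Fig. 6.5, E2 takes 3 cycles instead of 1 (2 extra), so the four
# instructions I1-I4 complete in 4 + 3 + 2 = 9 cycles instead of 7.
print(cycles_with_stalls(4, 4, 2))  # 9
```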
6.7 PIPELINE HAZARDS
Pipeline hazards are situations that prevent the next instruction in the instruction stream from executing during its assigned clock cycle; the instruction is then said to be stalled. When an instruction is stalled, all instructions later in the pipeline than the stalled instruction are also stalled, while instructions earlier than the stalled one can continue. No new instructions are fetched during the stall. This condition is clearly illustrated in Fig. 6.5. Thus, hazards reduce the performance gained by pipelining. Hazards are of three types:
a) Structural hazards
b) Data hazards
c) Control hazards
a) Structural hazards :
A structural hazard arises when two instructions require the same hardware resource at the same time, for example when an instruction fetch and an operand access both need the same memory module in the same clock cycle. One of the two instructions must then be delayed.
b) Data hazards :
A data hazard arises when the output of one instruction is fed to the input of the next instruction. More generally, a data hazard is any condition in which either the source or the destination operands of an instruction are not available at the time expected in the pipeline. As a result, some operation has to be delayed, and the pipeline stalls. We have already discussed a situation (in Fig. 6.5) where normal pipeline operation was halted for two clock cycles; that happened due to a data hazard.
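A minimal sketch of how such a dependence can be detected (a hypothetical register-level example, not from the text): the second instruction reads a register that the first one writes, a read-after-write dependence.

```python
# Detect a read-after-write (RAW) data hazard between two adjacent
# instructions: instr_b reads a register that instr_a writes.

def has_raw_hazard(instr_a, instr_b):
    return instr_a["dest"] in instr_b["src"]

add = {"op": "ADD", "dest": "R1", "src": ["R2", "R3"]}  # R1 <- R2 + R3
sub = {"op": "SUB", "dest": "R4", "src": ["R1", "R5"]}  # R4 <- R1 - R5
print(has_raw_hazard(add, sub))  # True: SUB must wait for ADD to write R1
```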
c) Control hazards :
The pipeline may also stall due to a delay in the availability of an instruction. For example, a cache miss causes the instruction to be fetched from the main memory. Such hazards are often called control hazards or instruction hazards.
Suppose, for example, that a cache miss occurs for instruction I2, so that it will be fetched from the main memory. The instruction fetch unit must now suspend any further fetch requests and wait for I2 to arrive. At the end of clock cycle 5, instruction I2 is received and loaded into buffer B1. The pipeline resumes its normal operation from clock cycle 6.
Clock cycle :    1     2     3     4     5     6     7     8     9
Instruction
I1               F1    D1    E1    W1
I2                     F2    F2    F2    F2    D2    E2    W2
I3                                             F3    D3    E3    W3
6.8 MERITS AND DEMERITS OF PIPELINING

Pipelining does not help in all cases, and there are several possible disadvantages. An instruction pipeline is said to be fully pipelined if it can accept a new instruction at every clock cycle. A pipeline that is not fully pipelined has wait cycles that delay the progress of the pipeline.
Advantages of Pipelining:
1. The cycle time of the processor is reduced, thus increasing the instruction issue rate in most cases.
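This advantage can be quantified (a standard estimate, not from the text): relative to a non-pipelined processor that spends k cycles per instruction, a k-stage pipeline executing n instructions gives a speedup of n·k / (k + n - 1), which approaches k for large n.

```python
# Speedup of a k-stage pipeline over non-pipelined execution:
# n*k cycles sequentially versus k + (n - 1) cycles pipelined.

def pipeline_speedup(n_instructions, n_stages):
    return (n_instructions * n_stages) / (n_stages + n_instructions - 1)

print(round(pipeline_speedup(100, 4), 2))  # 3.88, approaching the ideal 4
```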
Disadvantages of Pipelining:
LET US KNOW

We have come to know that pipeline operation will be effective if every stage completes its task in each clock cycle. So, each clock cycle should be sufficiently long to complete the task of each stage. Thus, the performance of a pipeline will increase if the tasks performed in the stages require the same amount of time.

Now, consider a fetch operation where the instructions are fetched from the main memory; for that fetch operation one clock cycle may not be sufficient, because the access time of the main memory may be as much as ten times greater than the time needed to perform basic pipeline stage operations inside the processor. So, if each instruction fetch required access to the main memory, pipelining would be of little value. When instructions are instead fetched from a cache, access time to the cache is usually the same as the time needed to perform other basic operations inside the processor. Thus, the use of the cache memory eliminates the above difficulty and accelerates the pipeline operations.
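The effect described in the box can be put in numbers (illustrative figures, not from the text): if a cache access takes one pipeline cycle and main memory is ten times slower, the average fetch time depends on the cache hit ratio.

```python
# Average instruction-fetch time, assuming a cache access takes 1
# cycle and a main-memory access takes 10 cycles (the ratio quoted
# in the box above).

def avg_fetch_cycles(hit_ratio, cache_cycles=1, memory_cycles=10):
    return hit_ratio * cache_cycles + (1 - hit_ratio) * memory_cycles

print(avg_fetch_cycles(0.95))  # 1.45 cycles: close to the 1-cycle ideal
print(avg_fetch_cycles(0.50))  # 5.5 cycles: the pipeline stalls often
```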
d) Data hazard arises when two instructions refer to the same memory location.
e) A branch instruction causes a control hazard.
f) A cache miss causes a structural hazard.
g) Pipelining increases the CPU instruction throughput.
6.9 LET US SUM UP

1. In computers, a pipeline is the continuous and somewhat overlapped movement of instructions to the processor, or of the arithmetic steps taken by the processor to perform an instruction.
FURTHER READINGS

1. M. Morris Mano, Computer System Architecture, Pearson Prentice Hall
3. William Stallings, Computer Organization and Architecture, PHI
6.12 POSSIBLE QUESTIONS
*****