CSA

csa notes for computer science students

Uploaded by sachin_harne

Q. What is computer? List computer elements.

Ans. A general purpose computer acts as a bridge that spans the gap between the
desired behavior (application) and the basic building blocks (electronic devices).

A great majority of the computers of our daily use are known as general purpose
machines. These are machines that are built with no specific application in mind, but
rather are capable of performing computation needed by a diversity of applications. These
machines are to be distinguished
from those built to serve (tailored to) specific applications. The latter are known as special
purpose machines.

Architecture & Organization :

• Architecture is those attributes visible to the programmer

— Instruction set, number of bits used for data representation, I/O mechanisms, addressing techniques.

— e.g. Is there a multiply instruction?

• Organization is how features are implemented

— Control signals, interfaces, memory technology.

— e.g. Is there a hardware multiply unit or is it done by repeated addition?

• All Intel x86 family share the same basic architecture

• The IBM System/370 family share the same basic architecture

• This gives code compatibility

— At least backwards

• Organization differs between different versions

Structure & Function:

• Structure is the way in which components relate to each other

• Function is the operation of individual components as part of the structure

Function:

• All computer functions are:

— Data processing

— Data storage

— Data movement

— Control

Functional View:

Data movement:

Storage:

Processing from /to storage:

Processing from storage to I/O:

Q. Explain CPU organisation. What are the basic components of computer?

Ans. A typical CPU has three major components: (1) register set, (2) arithmetic logic unit
(ALU), and (3) control unit (CU). The register set differs from one computer architecture to
another. It is usually a combination of general-purpose and special purpose registers.
General-purpose registers are used for any purpose, hence the name general purpose.
Special-purpose registers have specific functions within the CPU. For example, the
program counter (PC) is a special-purpose register that is used to hold the address of the
instruction to be executed next. Another example of special-purpose registers is the
instruction register (IR), which is used to hold the instruction that is currently being executed.
The ALU provides the circuitry needed to perform the arithmetic, logical and shift
operations demanded of the instruction set. In Chapter 4, we have covered a number of
arithmetic operations and circuits used to support computation in an ALU. The control
unit is the entity responsible for fetching the instruction to be executed from the main
memory and decoding and then executing it. Figure 5.1 shows the main components of
the CPU and its interactions with the memory system and the input/ output devices.

Que. Draw and explain Von Neumann Machine.

Ans.

Q.3. Explain Stack Organisation ?

Ans.

 Stack: A storage device that stores information in such a manner that the item
stored last is the first item retrieved.

Also called last-in first-out (LIFO) list. Useful for compound arithmetic operations and
nested subroutine calls.

 Stack pointer (SP): A register that holds the address of the top item in the stack.
SP always points at the top item in the stack

 • Push: Operation to insert an item into the stack.


• Pop: Operation to retrieve an item from the stack.

Register Stack :

A stack can be organized as a collection of a finite number of registers.

• In a 64-word stack, the stack pointer contains 6 bits.


• The one-bit register FULL is set to 1 when the stack is full;
EMPTY register is 1 when the stack is empty.
• The data register DR holds the data to be written into or read from the stack.

Initialization:
SP ← 0, EMPTY ← 1, FULL ← 0

Push:
SP ← SP + 1
M[SP] ← DR
If (SP = 0) then (FULL ← 1)   (note that SP wraps around to 0 after 63)
EMPTY ← 0

Pop:
DR ← M[SP]
SP ← SP - 1
If (SP = 0) then (EMPTY ← 1)
FULL ← 0
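The push and pop register-transfer sequences above can be simulated with a short sketch (a hypothetical 64-word stack; the class and method names are illustrative, not from the notes):

```python
class RegisterStack:
    """Sketch of a 64-word register stack with SP, FULL and EMPTY flags."""

    def __init__(self, size=64):
        self.size = size
        self.mem = [0] * size
        self.sp = 0          # SP always points at the top item
        self.empty = True
        self.full = False

    def push(self, value):
        if self.full:
            raise OverflowError("stack full")
        self.sp = (self.sp + 1) % self.size   # SP wraps to 0 after 63
        self.mem[self.sp] = value
        if self.sp == 0:
            self.full = True
        self.empty = False

    def pop(self):
        if self.empty:
            raise IndexError("stack empty")
        value = self.mem[self.sp]
        self.sp = (self.sp - 1) % self.size
        if self.sp == 0:
            self.empty = True
        self.full = False
        return value
```

Pushing onto a full stack or popping an empty one raises an error here, mirroring the FULL and EMPTY flags.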

Q.3. Explain Instruction sets?

Ans. Instruction Set :

• The complete collection of instructions that are understood by a CPU

• Machine Code

• Binary

• Usually represented by assembly codes

Elements of Instruction sets:

• Operation code (Op code)

— Do this

• Source Operand reference

— To this

• Result Operand reference

— Put the answer here

• Next Instruction Reference

— When you have done that, do this...

Instruction cycle diagram:

Instruction representation:

• In machine code each instruction has a unique bit pattern

• For human consumption (well, programmers anyway) a symbolic representation is used

— e.g. ADD, SUB, LOAD

• Operands can also be represented in this way

— ADD A,B

Simple instruction format:

Instruction types:

 Data processing
 Data storage (main memory)
 Data movement (I/O)
 Program flow control

Number of addresses:
• 3 addresses
— Operand 1, Operand 2, Result
— a = b + c;
— May be a fourth - next instruction (usually implicit)
— Not common
— Needs very long words to hold everything
• 2 addresses
— One address doubles as operand and result
— a = a + b
— Reduces length of instruction
— Requires some extra work
• Temporary storage to hold some results
• 1 address
— Implicit second address
— Usually a register (accumulator)
— Common on early machines
• 0 (zero) addresses
— All addresses implicit
— Uses a stack
— e.g. push a
— push b
— add
— pop c
— c = a + b

• More addresses
— More complex (powerful?) instructions
— More registers
– Inter-register operations are quicker
— Fewer instructions per program
• Fewer addresses
— Less complex (powerful?) instructions
— More instructions per program
— Faster fetch/execution of instructions
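The zero-address example above (push a, push b, add, pop c) can be run on a toy stack machine. This sketch is entirely illustrative; the instruction names simply mirror the example:

```python
def run(program, memory):
    """Evaluate a zero-address (stack-machine) program against a variable store."""
    stack = []
    for op, *arg in program:
        if op == "push":
            stack.append(memory[arg[0]])     # operand fetched from memory
        elif op == "add":
            b, a = stack.pop(), stack.pop()  # both operands are implicit
            stack.append(a + b)
        elif op == "pop":
            memory[arg[0]] = stack.pop()     # result written back
    return memory

# c = a + b expressed with zero-address instructions
mem = run([("push", "a"), ("push", "b"), ("add",), ("pop", "c")],
          {"a": 2, "b": 3, "c": 0})
```

Note that `add` names no operands at all: both come from the top of the stack, which is what makes the instruction format so short.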

Q.1. Explain floating point representation? Explain algorithm for division?

Ans. Real Numbers

• Numbers with fractions

• Could be done in pure binary

— 1001.1010 = 2^3 + 2^0 + 2^-1 + 2^-3 = 9.625

• Where is the binary point?

• Fixed?

— Very limited

• Moving?

— How do you show where it is?

Floating point number:

• ± .significand × 2^exponent

• Misnomer

• Point is actually fixed between sign bit and body of mantissa

• Exponent indicates place value (point position)

Floating point example:

Signs for Floating Point:

• Mantissa is stored in 2s complement

• Exponent is in excess or biased notation

— e.g. Excess (bias) 128 means

— 8 bit exponent field

— Pure value range 0-255

— Subtract 128 to get correct value

— Range -128 to +127
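The excess-128 (bias) encoding above amounts to adding 128 on the way in and subtracting 128 on the way out; a minimal sketch with assumed helper names:

```python
def to_excess128(exponent):
    """Encode a true exponent (-128..127) as an 8-bit biased field (0..255)."""
    assert -128 <= exponent <= 127
    return exponent + 128

def from_excess128(field):
    """Decode an 8-bit stored field (0..255) back to the true exponent."""
    assert 0 <= field <= 255
    return field - 128
```

The bias exists so that exponents can be compared as plain unsigned integers, which simplifies the hardware.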

Normalization:

• FP numbers are usually normalized


• i.e. exponent is adjusted so that leading bit (MSB) of mantissa is 1

• Since it is always 1 there is no need to store it

• (c.f. Scientific notation where numbers are normalized to give a single digit before
the decimal point e.g. 3.123 × 10^3)


Expressible Numbers:

Density of Floating Point Numbers:

Division:

• More complex than multiplication

• Negative numbers are really bad!

• Based on long division

Division of Unsigned Binary Integers:

Flowchart for Unsigned Binary Division:
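A common flowchart for unsigned binary division is the restoring (shift-and-subtract) algorithm; the sketch below assumes that is the procedure the figure depicts, with illustrative names:

```python
def divide_unsigned(dividend, divisor, n_bits=8):
    """Restoring division: shift, try a subtraction, restore if it went negative."""
    if divisor == 0:
        raise ZeroDivisionError
    remainder, quotient = 0, dividend
    for _ in range(n_bits):
        # shift the combined (remainder, quotient) register pair left by one bit
        remainder = (remainder << 1) | ((quotient >> (n_bits - 1)) & 1)
        quotient = (quotient << 1) & ((1 << n_bits) - 1)
        remainder -= divisor          # trial subtraction
        if remainder < 0:
            remainder += divisor      # restore; quotient bit stays 0
        else:
            quotient |= 1             # subtraction succeeded; quotient bit is 1
    return quotient, remainder
```

Each iteration produces one quotient bit, so an n-bit dividend needs n passes, just as in paper long division.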

Floating Point Division :

Q.1. Derive an algorithm in flowchart form for multiplication of two fixed point binary
number.

Ans. Multiplication

• Complex

• Work out partial product for each digit

• Take care with place value (column)

• Add partial products

Multiplication Example

      1011    Multiplicand (11 dec)
    x 1101    Multiplier (13 dec)
      1011    Partial products: if the multiplier bit is 1,
     0000     copy the multiplicand (shifted to its place value),
    1011      otherwise write zero
   1011
  10001111    Product (143 dec)

• Note: need double length result
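The paper-and-pencil method above is what shift-and-add multiplier hardware implements; a sketch (illustrative helper name, fixed 4-bit operands assumed):

```python
def multiply_unsigned(multiplicand, multiplier, n_bits=4):
    """Shift-and-add multiplication; the product needs a double-length register."""
    product = 0
    for i in range(n_bits):
        if (multiplier >> i) & 1:            # multiplier bit is 1:
            product += multiplicand << i     # add multiplicand at its place value
        # multiplier bit 0 contributes a zero partial product (nothing to add)
    return product & ((1 << (2 * n_bits)) - 1)   # keep 2*n_bits of result
```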

Unsigned Binary Multiplication

Execution of Example

Flowchart for fixed point binary Multiplication

Q.2. Explain the fixed point arithmetic operation along with example.

Ans. Addition and Subtraction

• Normal binary addition

• Monitor sign bit for overflow

• Take twos complement of subtrahend and add to minuend

— i.e. a - b = a + (-b)

— So we only need addition and complement circuits

Hardware for Addition and Subtraction

Q.3. What are the different data representation format? Explain with example. Convert
the following data into: 2314,101,1123

Ans. Integer Representation

• Only have 0 & 1 to represent everything

• Positive numbers stored in binary

— e.g. 41=00101001

• No minus sign

• No period

• Sign-Magnitude

• Two’s complement

Sign-Magnitude

• Left most bit is sign bit

• 0 means positive

• 1 means negative

• +18 = 00010010

• -18 = 10010010

• Problems

— Need to consider both sign and magnitude in arithmetic

— Two representations of zero (+0 and -0)
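An 8-bit sign-magnitude encoder can be sketched as follows (hypothetical helper name; note that setting the sign bit on a zero magnitude would give the second zero, 10000000):

```python
def sign_magnitude(value, n_bits=8):
    """Encode an integer as an n-bit sign-magnitude bit string."""
    magnitude = abs(value)
    assert magnitude < (1 << (n_bits - 1)), "magnitude too large for n_bits"
    sign = "1" if value < 0 else "0"          # leftmost bit is the sign bit
    return sign + format(magnitude, f"0{n_bits - 1}b")
```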

Two’s Complement

• +3 = 00000011

• +2 = 00000010

• +1 = 00000001

• +0 = 00000000

• -1 = 11111111

• -2 = 11111110

• -3 = 11111101

Benefits

• One representation of zero

• Arithmetic works easily (see later)

• Negating is fairly easy

— 3 = 00000011

— Boolean complement gives 11111100

— Add 1 to LSB 11111101
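The negation recipe above (Boolean complement, then add 1 to the LSB) in sketch form, with an 8-bit mask standing in for the fixed register width:

```python
def negate_twos_complement(value, n_bits=8):
    """Two's-complement negation: bitwise NOT, then add 1, within n_bits."""
    mask = (1 << n_bits) - 1
    return (~value + 1) & mask   # any carry out of the MSB is ignored
```

This also reproduces both special cases discussed below: negating 0 gives 0 (the carry out is masked off), and negating 10000000 (-128) gives 10000000 again.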

Geometric Depiction of Twos Complement Integers

Negation Special Case 1

• 0= 00000000

• Bitwise not 11111111

• Add 1 to LSB +1

• Result 1 00000000

• Overflow is ignored

Negation Special Case 2

• -128 = 10000000

• bitwise not 01111111

• Add 1 to LSB +1

• Result 10000000

• So:

• -(-128) = -128 X

• Monitor MSB (sign bit)

• It should change during negation

Range of Numbers

• 8 bit 2s complement

— +127 = 01111111 = 2^7 - 1

— -128 = 10000000 = -2^7

• 16 bit 2s complement

— +32767 = 01111111 11111111 = 2^15 - 1

— -32768 = 10000000 00000000 = -2^15

Conversion Between Lengths

• Positive number pack with leading zeros

• +18 = 00010010

• +18 = 00000000 00010010

• Negative numbers pack with leading ones

• -18 = 10010010

• -18 = 11111111 10010010

• i.e. pack with MSB (sign bit)
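Sign extension as described above, sketched for widening an 8-bit two's-complement value to 16 bits (illustrative names):

```python
def sign_extend(value_bits, from_bits=8, to_bits=16):
    """Widen a two's-complement value by replicating the sign bit (MSB)."""
    sign = (value_bits >> (from_bits - 1)) & 1
    if sign:
        # negative number: pack the new high bits with leading ones
        value_bits |= ((1 << (to_bits - from_bits)) - 1) << from_bits
    # positive number: the high bits are already leading zeros
    return value_bits
```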

• Binary – 100100001010 (2314)

- 1100101 (101)

- 10001100011 (1123)

• Hexadecimal - 90A (2314)

- 65 (101)

- 463 (1123)

Figure 5.1 Central processing unit main components and interactions with the memory
and I/O:

A simple execution cycle can be summarized as follows:


1. The next instruction to be executed, whose address is obtained from the PC, is
fetched from the memory and stored in the IR.
2. The instruction is decoded.
3. Operands are fetched from the memory and stored in CPU registers, if needed.
4. The instruction is executed.
5. Results are transferred from CPU registers to the memory, if needed.
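The five steps can be sketched as a toy fetch/decode/execute loop (a hypothetical accumulator machine, purely illustrative; real instruction sets differ):

```python
def run_cpu(memory, program_counter=0):
    """Toy execution cycle: fetch via PC into IR, decode, execute, write back."""
    registers = {"ACC": 0}
    while True:
        ir = memory[program_counter]          # 1. fetch instruction at PC into IR
        program_counter += 1
        opcode, operand = ir                  # 2. decode
        if opcode == "LOAD":                  # 3./4. fetch operand and execute
            registers["ACC"] = memory[operand]
        elif opcode == "ADD":
            registers["ACC"] += memory[operand]
        elif opcode == "STORE":               # 5. transfer result back to memory
            memory[operand] = registers["ACC"]
        elif opcode == "HALT":
            return memory

# program: ACC <- M[10]; ACC <- ACC + M[11]; M[12] <- ACC; halt
mem = {0: ("LOAD", 10), 1: ("ADD", 11), 2: ("STORE", 12), 3: ("HALT", None),
       10: 2, 11: 3, 12: 0}
```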

REGISTER SET : Registers are essentially extremely fast memory locations within the CPU
that are used to create and store the results of CPU operations and other calculations.
Different computers have different register sets. They differ in the number of registers,
register types, and the length of each register. They also differ in the usage of each
register. General-purpose registers can be used for multiple purposes and assigned to a
variety of functions by the programmer. Special-purpose registers are restricted to only
specific functions. In some cases, some registers are used only to hold data and cannot be
used in the calculations of operand addresses. The length of a data register must be long
enough to hold values of most data types. Some machines allow two contiguous registers
to hold double-length values. Address registers may be dedicated to a particular
addressing mode or may be used as general-purpose address registers. Address registers must be
long enough to hold the largest address. The number of registers in a particular
architecture affects the instruction set design. A very small number of registers may result
in an increase in memory references. Another type of registers is used to hold processor
status bits, or flags. These bits are set by the CPU as the result of the execution of an
operation. The status bits can be tested at a later time as part of another operation.

Types of Registers:

1) Memory Access Registers


2) Instruction Fetching Registers
3) Condition Registers
4) Special-Purpose Address Registers
5) MIPS Registers

DATAPATH
The CPU can be divided into a data section and a control section. The data section, which
is also called the datapath, contains the registers and the ALU. The datapath is capable of
performing certain operations on data items. The control section is basically the control
unit, which issues control signals to the datapath. Internal to the CPU, data move from
one register to another and between ALU and registers. Internal data movements are
performed via local buses, which may carry data, instructions, and addresses. Externally,
data move from registers to memory and I/O devices, often by means of a system bus.
Internal data movement among registers and between the ALU and registers may be
carried out using different organizations including one-bus, two-bus, or three-bus
organizations. Dedicated datapaths may also be used between components that transfer
data between themselves more frequently. For example, the contents of the PC are
transferred to the MAR to fetch a new instruction at the beginning of each instruction
cycle. Hence, a dedicated datapath from the PC to the MAR could be useful in speeding up
this part of instruction execution.

CONTROL UNIT

The control unit is the main component that directs the system operations by sending
control signals to the datapath. These signals control the flow of data within the CPU and
between the CPU and external units such as memory and I/O. Control buses generally
carry signals between the control unit and other computer components in a clock-driven
manner. The system clock produces a continuous sequence of pulses in a specified
duration and frequency.

Q.2. What is the difference between combinational ALU and sequential ALU?
Ans. The main difference between a combinational ALU and a sequential ALU is that
the combinational ALU is built from combinational circuits (outputs depend only on the
current inputs), while the sequential ALU is built from sequential circuits (outputs also
depend on stored state).

Combinational ALU Diagrams:

1-bit ALU:

32-bit ALU:

Sequential ALU Diagrams:

S-R (Set-Reset) Latch:

R S | Q              | Q'
0 0 | previous value | previous value
0 1 | 1              | 0
1 0 | 0              | 1
1 1 | unstable       | unstable

D latch
● Stores the value of the input signal when the clock signal is “high”
● Output is stored on the falling clock edge

D-latch Timing diagram

D-latches are in Master-slave configuration.


Timing diagram Q changes as clock signal moves from high to low
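The latch behaviour can be simulated with a small sketch (an illustrative level-sensitive D latch plus the master-slave pairing described above):

```python
class DLatch:
    """Level-sensitive D latch: transparent while enable is high, holds otherwise."""
    def __init__(self):
        self.q = 0
    def tick(self, d, enable):
        if enable:
            self.q = d
        return self.q

class MasterSlaveDFlipFlop:
    """Two D latches with opposite enables: Q changes on the falling clock edge."""
    def __init__(self):
        self.master, self.slave = DLatch(), DLatch()
    def tick(self, d, clock):
        self.master.tick(d, clock)                         # master follows D while clock is high
        return self.slave.tick(self.master.q, not clock)   # slave updates while clock is low
```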

Sequential circuits vs. combinational

Comparison to combinational

Combinational circuits implement Boolean functions

Input ---> output, no memory


Only data inputs and control inputs determine the output
Example: Coke (um, Pepsi) machine
Assume: price 75 cents, machine only accepts quarters
Action:
deposit quarter ---> no output
deposit quarter ---> no output
deposit quarter ---> drink delivered
Notice that the output was not always the same for the same input
This is not a Boolean function (combinational circuit)
Memory was used to determine output along with input
Example: linked list object
call list.size() method
What is input? empty
Does it always return same value?
No, its output is the current length of the linked list

Sequential circuit
Inputs: values x, labelled with subscripts
Outputs: values z, labelled with subscripts
Uses clock, unlike combinational circuit
State
made up of devices called flip-flops
k flip-flops store a k-bit number representing the current state
values denoted by q with subscripts
Output z computed from
inputs x
state q
Needed:
store current state
update to new state
Circuit elements
Combinational logic
Clock
State storage

Q.3. Explain Multiplier control unit?
Ans.

This is an ARITHMETIC/logic unit (Fig. 4.21)


 ALU is a combinational circuit
 outputs depend only on inputs
 operations performed
 AND
 OR
 ADD
 SUB
 SLT
 Zero (a == b)

What about multiplication?


Result depends only on inputs
  1 1 1 0 0   Carry
      1 1 1   Multiplicand
    x 1 0 1   Multiplier
      1 1 1   Partial products
    0 0 0
  1 1 1
1 0 0 0 1 1   Product (7 x 5 = 35 dec)

Multiplier:

Multiplier Circuit:

Q.4. Explain CPU control unit?

Ans. Function of Control Unit

• For each operation a unique code is provided

— e.g. ADD, MOVE

• A hardware segment accepts the code and issues the control signals

• The Control Unit and the Arithmetic and Logic Unit constitute the Central
Processing Unit

• Data and instructions need to get into the system and results out

— Input/output

• Temporary storage of code and results is needed

— Main memory

The control unit is the main component that directs the system operations by sending
control signals to the data path. These signals control the flow of data within the CPU and
between the CPU and external units such as memory and I/O. Control buses generally
carry signals between the control unit and other computer components in a clock-driven
manner. The system clock produces a continuous sequence of pulses in a specified
duration and frequency.

Figure 5.7

A sequence of time steps t0, t1, t2, . . . is used to execute a certain


instruction. The op-code field of a fetched instruction is decoded to provide the control
signal generator with information about the instruction to be executed. Step information
generated by a logic circuit module is used with other inputs to generate control signals.
The signal generator can be specified simply by a set of Boolean equations for its output in
terms of its inputs. Figure 5.7 shows a block diagram that describes how timing is used in
generating control signals.

Que. Explain the difference between hardwired control and micro programmed control?
Ans. There are mainly two different types of control units: microprogrammed and hardwired.
In microprogrammed control, the control signals associated with operations are stored as
control words in special memory units inaccessible to the programmer. A control word is a
microinstruction that specifies one or more microoperations. A sequence of microinstructions
is called a microprogram, which is stored in a ROM or RAM called a control memory (CM).
In hardwired control, fixed logic circuits that correspond directly to the Boolean expressions
are used to generate the control signals. Clearly, hardwired control is faster than
microprogrammed control. However, hardwired control can be very expensive and
complicated for complex systems; it is more economical for small control units. It should
also be noted that microprogrammed control adapts easily to changes in the system design:
new instructions can be added without changing the hardware, whereas hardwired control
requires a redesign of the entire system for any change.
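A microprogrammed control unit can be sketched as a control memory stepped through by a microprogram counter. The control-word field names below are invented for illustration; a real control word would be a bit vector of machine-specific signals:

```python
# Control memory: each control word asserts a set of control signals.
CONTROL_MEMORY = [
    {"pc_to_mar", "read_mem"},        # fetch: MAR <- PC, start a memory read
    {"mdr_to_ir", "increment_pc"},    # IR <- MDR, PC <- PC + 1
    {"decode"},                       # decode op-code, branch to its microroutine
]

def step(micro_pc):
    """Return the control signals for one clock step and the next microaddress."""
    signals = CONTROL_MEMORY[micro_pc % len(CONTROL_MEMORY)]
    return signals, micro_pc + 1
```

Adding an instruction then means appending microinstructions to the control memory, which is exactly why microprogrammed control adapts to design changes more easily than rewiring fixed logic.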

Que. Explain primary memory and cache memory.

Ans. A computer has two kinds of memory:

1) Primary memory
2) Secondary memory

Primary memory is essential; without it the computer cannot work. CPU registers, cache
memory, and main memory come under primary memory.

CACHE MEMORY

The idea behind using a cache as the first level of the memory hierarchy is to keep the
information expected to be used more frequently by the CPU in the cache (a small high-speed
memory that is near the CPU). The end result is that at any given time some active portion of
the main memory is duplicated in the cache. Therefore, when the processor makes a request
for a memory reference, the request is first sought in the cache. If the request corresponds to
an element that is currently residing in the cache, we call that a cache hit. On the other hand,
if the request corresponds to an element that is not currently in the cache, we call that a
cache miss. A cache hit ratio, hc, is defined as the probability of finding the requested element
in the cache. A cache miss ratio (1 - hc) is defined as the probability of not finding the
requested element in the cache.
In the case that the requested element is not found in the cache, then it has to be brought
from a subsequent memory level in the memory hierarchy. Assuming that the element exists
in the next memory level, that is, the main memory, then it has to be brought and placed in
the cache. In expectation that the next requested element will be residing in the neighboring
locality of the current requested element (spatial locality), then upon a cache miss what is
actually brought to the main memory is a block of elements that contains the requested
element. The advantage of transferring a block from the main memory to the cache will be
most visible if it could be possible to transfer such a block using one main memory access
time. Such a possibility could be achieved by increasing the rate at which information can be
transferred between the main memory and the cache.
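The hit ratio hc feeds the usual average-access-time estimate. The sketch below uses one common formulation, in which a miss pays the cache probe plus the main-memory access (the timing numbers in the test are made up):

```python
def average_access_time(hit_ratio, cache_time_ns, memory_time_ns):
    """t_avg = hc * tc + (1 - hc) * (tc + tm)."""
    hit_cost = cache_time_ns
    miss_cost = cache_time_ns + memory_time_ns   # probe the cache, then go to memory
    return hit_ratio * hit_cost + (1 - hit_ratio) * miss_cost
```

Even a modest miss ratio dominates the average when memory is much slower than the cache, which is why high hit ratios matter so much.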

Q.2.Explain the memory hierarchy. Compare memory devices in terms of:

1) Bandwidth

2) Size

3) Processing speed

Ans. A typical memory hierarchy starts with a small, expensive, and


relatively fast unit, called the cache, followed by a larger, less expensive, and relatively
slow main memory unit. Cache and main memory are built using solid-state
semiconductor material (typically CMOS transistors). It is customary to call the fast
memory level the primary memory. The solid-state memory is followed by larger, less
expensive, and far slower magnetic memories that consist typically of the (hard) disk and
the tape. It is customary to call the disk the secondary memory, while the tape is
conventionally called the tertiary memory. The objective behind designing a memory
hierarchy is to have a memory system that performs as if it consists entirely of the fastest
unit and whose cost is dominated by the cost of the slowest unit. The memory hierarchy
can be characterized by a number of parameters. Among these parameters are the access
type, capacity, cycle time, latency, bandwidth, and cost. The term access refers to the
action that physically takes place during a read or write operation. The capacity of a
memory level is usually measured in bytes. The cycle time is defined as the time elapsed
from the start of a read operation to the start of a subsequent
read. The latency is defined as the time interval between the request for information and
the access to the first bit of that information. The bandwidth provides a measure of the
number of bits per second that can be accessed. The cost of a memory level is usually
specified as dollars per megabytes. Figure 6.1 depicts a typical memory hierarchy. The
term random access refers to the fact that any access to any memory location takes the
same fixed amount of time regardless of the actual memory location and/or the sequence
of accesses that takes place. For example, if a write operation to
memory location 100 takes 15 ns and if this operation is followed by a read operation to
memory location 3000, then the latter operation will also take 15 ns. This is to be
compared to sequential access in which if access to location 100 takes 500 ns, and if a
consecutive access to location 101 takes 505 ns, then it is expected that an access to
location 300 may take 1500 ns. This is because the memory has to cycle through locations
100 to 300, with each location requiring 5 ns. The effectiveness of a memory hierarchy
depends on the principle of moving information into the fast memory infrequently and
accessing it many times before replacing it with new information. This principle is possible
due to a phenomenon called locality of reference; that is, within a given period of time,
programs tend to reference a relatively confined area of memory repeatedly. There exist
two forms of locality: spatial and temporal locality. Spatial locality refers to the

Comparison of Memory Hierarchy Parameters

Level          Access type  Capacity    Latency        Bandwidth          Cost/MB
CPU registers  Random       64-1024 B   1-10 ns        System clock rate  High
Cache memory   Random       8-512 KB    15-20 ns       10-20 MB/s         $500
Main memory    Random       16-512 MB   30-50 ns       1-2 MB/s           $20-50
Disk memory    Direct       1-20 GB     10-30 ms       1-2 MB/s           $0.25
Tape memory    Sequential   1-20 TB     30-10,000 ms   1-2 MB/s           $0.025

Q.3. An address space is represented by 64 bits and memory space is 32 bits:

(i) What is the total size of physical address?

(ii) How many memory blocks will be represented by 32 bits?

(iii) What is the total size of memory represented by 64 address bits? Each memory block
size is of 32 bits.

Ans. (i) Total size of physical address is

64 × 32 = 2^6 × 2^5 = 2^11 = 2048 KB

(ii) No. of memory blocks = 2^6 / 2^5 = 2

(iii) Total memory represented by 64 address bits is 2^6 Bytes.

Q.4. Explain Pipeline control?

Ans. Pipeline

• Fetch instruction

• Decode instruction

• Calculate operands (i.e. EAs)

• Fetch operands

• Execute instructions

• Write result

• Overlap these operations
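The benefit of overlapping the stages can be sketched numerically: in an idealized k-stage pipeline (no stalls or branches), n instructions finish in about k + (n - 1) cycles instead of n × k. A sketch of that standard model:

```python
def pipeline_cycles(n_instructions, n_stages):
    """Idealized k-stage pipeline: first instruction takes k cycles, then one per cycle."""
    return n_stages + (n_instructions - 1)

def sequential_cycles(n_instructions, n_stages):
    """Without overlap, every instruction runs all k stages back to back."""
    return n_instructions * n_stages
```

For 9 instructions on the six-stage pipeline below, that is 14 cycles instead of 54; branches and hazards erode this ideal speedup, which is what the rest of this section addresses.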

Two Stage Instruction Pipeline:

Timing of Pipeline:

Branch in pipeline:

Six stage Instruction Pipeline:

Dealing with branches:

• Multiple Streams

• Pre fetch Branch Target

• Loop buffer

• Branch prediction

• Delayed branching

Multiple streams:

• Have two pipelines

• Pre fetch each branch into a separate pipeline

• Use appropriate pipeline

• Leads to bus & register contention

• Multiple branches lead to further pipelines being needed

Pre fetch Branch Target:

• Target of branch is pre fetched in addition to instructions following branch

• Keep target until branch is executed

• Used by IBM 360/91

Loop buffer :

• Very fast memory

• Maintained by fetch stage of pipeline

• Check buffer before fetching from memory

• Very good for small loops or jumps

• c.f. cache

• Used by CRAY-1

Q.2. Explain interrupts and interrupts processing?

Ans. Interrupts:
• Mechanism by which other modules (e.g. I/O) may interrupt normal sequence of
processing
• Program
— e.g. overflow, division by zero
• Timer
— Generated by internal processor timer
— Used in pre-emptive multi-tasking
• I/O
— from I/O controller
• Hardware failure
— e.g. memory parity error

Program flow control:

Interrupt cycles:
• Added to instruction cycle
• Processor checks for interrupt
— Indicated by an interrupt signal
• If no interrupt, fetch next instruction
• If interrupt pending:
— Suspend execution of current program
— Save context
— Set PC to start address of interrupt handler routine
— Process interrupt
— Restore context and continue interrupted program
Transfer of Control via Interrupts:

Instruction Cycle with Interrupts:

Interrupt Handling :

After the execution of an instruction, a test is performed to check for pending interrupts.

If there is an interrupt request waiting, the following steps take place:


1. The contents of PC are loaded into MDR (to be saved).
2. The MAR is loaded with the address at which the PC contents are to be saved.
3. The PC is loaded with the address of the first instruction of the interrupt handling
routine.
4. The contents of MDR (old value of the PC) are stored in memory.
The following table shows the sequence of events at successive time steps t1, t2, t3.
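The four interrupt-entry steps can be sketched as register transfers over a hypothetical CPU/memory model (names illustrative):

```python
def enter_interrupt(cpu, memory, save_address, handler_address):
    """Save the interrupted PC to memory and vector to the handler."""
    cpu["MDR"] = cpu["PC"]            # 1. PC -> MDR (value to be saved)
    cpu["MAR"] = save_address         # 2. MAR <- address where the PC is saved
    cpu["PC"] = handler_address       # 3. PC <- first instruction of the handler
    memory[cpu["MAR"]] = cpu["MDR"]   # 4. store the old PC in memory
    return cpu, memory
```

Returning from the interrupt then just reverses the save: reload the PC from `save_address` and resume.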

Step Micro-operation :

Q.1. Explain processor level parallelism?

Ans. A multiple processor system consists of two or more processors that are connected in
a manner that allows them to share the simultaneous (parallel) execution of a given
computational task. Parallel processing has been advocated as a promising approach for
building high-performance computer systems. Two basic requirements are inevitable for
the efficient use of the employed processors. These requirements are (1) low
communication overhead among processors while executing a given task and (2) a degree
of inherent parallelism in the task.
A number of communication styles exist for multiple processor networks. These can be
broadly classified according to (1) the communication model (CM) or (2) the physical
connection (PC). According to the CM, networks can be further classified as (1) multiple
processors (single address space or shared memory computation) or (2) multiple
computers (multiple address space or message passing computation). According to PC,
networks can be further classified as (1) bus-based or (2) network-based multiple
processors. The organization and performance of a multiple processor system are greatly
influenced by the interconnection network used to connect them. On the one hand, a
single shared bus can be used as the interconnection network for multiple processors. On
the other hand, a crossbar switch can be used as the interconnection network. While the
first technique represents a simple easy-to-expand topology, it is, however, limited in
performance since it does not allow more than one processor/memory transfer at any
given time. The crossbar provides full processor/memory distinct connections but it is

expensive. Multistage interconnection networks (MINs) strike a balance between the
limitation of the single, shared bus system and the expense of a
crossbar-based system. In a MIN more than one processor/memory connection can be
established at the same time. The cost of a MIN can be considerably less than that of a
crossbar, particularly for a large number of processors and/or memories. The use of
multiple buses to connect multiple processors to multiple memory modules has also been
suggested as a compromise between the limited single bus and the expensive crossbar.

Flynn’s Classification :

• Single instruction, single data stream – SISD

Single processor

Single instruction stream

Data stored in single memory

Uni-processor

• Single instruction, multiple data stream – SIMD

Single machine instruction

Controls simultaneous execution

Number of processing elements

Lockstep basis

Each processing element has associated data memory

Each instruction executed on different set of data by different processors

Vector and array processors


• Multiple instruction, single data stream – MISD

Sequence of data

Transmitted to set of processors

Each processor executes different instruction sequence

Never been implemented

• Multiple instruction, multiple data stream- MIMD

Set of processors

Simultaneously execute different instruction sequences

Different sets of data

SMPs, clusters and NUMA systems

Taxonomy of Parallel Processor Architectures

Q.2. Explain DMA(Direct Memory Access)?

Ans. The main idea of direct memory access (DMA) is to enable peripheral devices to cut
out the “middle man” role of the CPU in data transfer. It allows peripheral devices to
transfer data directly from and to memory without the intervention of the CPU. Having
peripheral devices access memory directly would allow the CPU to do other work, which
would lead to improved performance, especially in the cases of large transfers. The DMA
controller is a piece of hardware that controls one or more peripheral devices. It allows
devices to transfer data to or from the system’s memory without the help of the
processor. In a typical DMA transfer, some event notifies the DMA controller that data
needs to be transferred to or from memory. Both the DMA and CPU use memory bus and
only one or the other can use the memory at the same time. The DMA controller then
sends a request to the CPU asking its permission to use the bus. The CPU returns an
acknowledgment to the DMA controller granting it bus access. The DMA can now take
control of the bus to independently conduct memory transfer. When the transfer is
complete the DMA relinquishes its control of the bus to the CPU. Processors that support
DMA provide one or more input signals that the bus requester can assert to gain control
of the bus and one or more output signals that the CPU asserts to indicate it has
relinquished the bus. Figure shows how the DMA controller shares the CPU’s memory
bus.

Direct memory access controllers require initialization by the CPU. Typical setup
parameters include the address of the source area, the address of the destination area,
the length of the block, and whether the DMA controller should generate a processor
interrupt once the block transfer is complete. A DMA controller has an address register, a
word count register, and a control register. The address register contains an address that
specifies the memory location of the data to be transferred. It is typically possible to have
the DMA controller automatically increment the address register after each word transfer,
so that the next transfer will be from the next memory location. The word count register
holds the number of words to be transferred. The word count is decremented by one after
each word transfer. The control register
specifies the transfer mode. Direct memory access data transfer can be performed in burst
mode or single cycle mode. In burst mode, the DMA controller keeps control of the bus
until all the data has been transferred to (from) memory from (to) the peripheral device.
This mode of transfer is needed for fast devices where data transfer cannot be stopped
until the entire transfer is done. In single-cycle mode (cycle stealing), the DMA controller
relinquishes the bus after each transfer of one data word. This minimizes the amount of
time that the DMA controller keeps the CPU from controlling the bus, but it requires that
the bus request/acknowledge sequence be performed for every single transfer. This
overhead can result in a degradation of the performance. The single-cycle mode is
preferred if the system cannot tolerate more than a few cycles of added interrupt latency
or if the peripheral devices can buffer very large amounts of data, causing the DMA
controller to tie up the bus for an excessive amount of time.

The following steps summarize the DMA operations:


1. DMA controller initiates data transfer.
2. Data is moved (increasing the address in memory, and reducing the count of words to
be moved).
3. When word count reaches zero, the DMA informs the CPU of the termination by means
of an interrupt.
4. The CPU regains access to the memory bus.
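The four steps above can be sketched as a toy simulation of the controller's address and word-count registers. The class and method names are illustrative, not a real controller interface:

```python
# Minimal sketch of a DMA block transfer: an address register that
# auto-increments after each word, and a word-count register that
# decrements to zero, at which point the controller raises an interrupt.

class DMAController:
    def __init__(self, memory):
        self.memory = memory           # shared "memory bus"
        self.address = 0               # address register
        self.count = 0                 # word count register
        self.interrupt_pending = False

    def setup(self, start_address, word_count):
        """CPU initializes the controller before the transfer."""
        self.address = start_address
        self.count = word_count
        self.interrupt_pending = False

    def transfer(self, data_words):
        """Move words into memory without CPU involvement."""
        for word in data_words:
            if self.count == 0:
                break
            self.memory[self.address] = word  # one word per bus cycle
            self.address += 1                 # auto-increment address
            self.count -= 1                   # decrement word count
        if self.count == 0:
            self.interrupt_pending = True     # signal completion to CPU


memory = [0] * 16
dma = DMAController(memory)
dma.setup(start_address=4, word_count=3)
dma.transfer([10, 20, 30])
print(memory[4:7], dma.interrupt_pending)   # [10, 20, 30] True
```

A multi-channel controller would simply replicate the address and count registers once per channel.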

A DMA controller may have multiple channels. Each channel has associated with it an
address register and a count register. To initiate a data transfer the device driver sets up
the DMA channel’s address and count registers together with the direction of the data
transfer, read or write. While the transfer is taking place, the CPU is free to do other
things. When the transfer is complete, the CPU is interrupted. Direct memory access
channels cannot be shared between device drivers. A device driver must be able to
determine which DMA channel to use. Some devices have a fixed DMA channel, while
others are more flexible, where the device driver can simply pick a free DMA channel to
use.

Q.3. Explain I/O processor?

Ans. Consider the way the processor and the I/O devices exchange data. As indicated in
the introduction, there exists a big difference between the rate at which a processor can
process information and the rates of input and output devices. One simple way to
accommodate this speed difference is to have the input device, for example a keyboard,
deposit the character struck by the user in a register (input register), which indicates the
availability of that character to the processor. When the input character has been taken
by the processor, this will be indicated to the input device in order to proceed and input
the next character, and so on. Similarly, when the processor has a character to output
(display), it deposits it in a specific register dedicated for communication with the graphic
display (output register). When the character has been taken by the graphic display, this
will be indicated to the processor such that it can proceed and output the next character,
and so on.

This simple way of communication between the processor and I/O devices, called I/O
protocol, requires the availability of the input and output registers. In a typical computer
system, there are a number of input registers, each belonging to a specific input device.
There are also a number of output registers, each belonging to a specific output device. In
addition, a mechanism according to which the processor can address those input and
output registers must be adopted. More than one arrangement exists to satisfy the
abovementioned requirements. Among these, two particular methods are explained
below. In the first arrangement, I/O devices are assigned particular addresses, isolated
from the address space assigned to the memory. The execution of an input instruction at
an input device address will cause the character stored in the input register of that device
to be transferred to a specific register in the CPU. Similarly, the execution of an output
instruction at an output device address will cause the character stored in a specific
register in the CPU to be transferred to the output register
of that output device. This arrangement, called shared I/O, is shown schematically in
Figure 8.2. In this case, the address and data lines from the CPU can be shared between
the memory and the I/O devices. A separate control line will have to be used. This is
because of the need for executing input and output instructions. In a typical computer
system, there exists more than one input and more than one output device. Therefore,
there is a need to have address decoder circuitry for device identification. There is also a
need for status registers for each input and output device. The status of an input device,
whether it is ready to send data to the processor, should be stored in the status register of
that device. Similarly, the status of an output device, whether it is ready to receive data
from the processor, should be stored in the status register of that device. Input (output)
registers, status registers, and address decoder circuitry represent the main components
of
an I/O interface (module).

The main advantage of the shared I/O arrangement is the separation between the
memory address space and that of the I/O devices. Its main disadvantage is the need to
have special input and output instructions in the processor instruction set. The shared I/O
arrangement is mostly adopted by Intel. The second possible I/O arrangement is to deal
with input and output registers as if they are regular memory locations. In this case, a
read operation from the address corresponding to the input register of an input device,
for example, Read Device 6, is equivalent to performing an input operation from the input
register in Device #6. Similarly, a write operation to the address corresponding to the
output register of an output device, for example, Write Device 9, is equivalent to
performing an output operation into the output register in Device #9. This arrangement is
called memory-mapped I/O. The main advantage of the memory-mapped I/O is the use of
the read and write instructions of the processor to perform the input and output
operations, respectively. It eliminates the need for introducing special I/O instructions.
The main disadvantage of the memory-mapped I/O is the need to reserve a certain part of
the memory address space for addressing I/O devices, that is, a reduction in the available
memory address space. The memory-mapped I/O has been mostly adopted by Motorola.

Q.4. Explain Programmed I/O?

Ans.

The main hardware components required for communication between the processor and
I/O devices were introduced above, together with the protocol according to which such
communication takes place. This protocol has to be programmed in the form of routines
that run under the control of the CPU. Consider, for example, an input operation from
Device 6 (which could be the keyboard) in the case of the shared I/O arrangement. Let us
also assume that there are eight different I/O devices connected to the processor.

The following protocol steps (program) have to be followed:


1. The processor executes an input instruction from device 6, for example,
INPUT 6. The effect of executing this instruction is to send the device
number to the address decoder circuitry in each input device in order to identify the
specific input device to be involved. In this case, the output of the decoder in Device #6
will be enabled, while the outputs of all other decoders will be disabled.
2. The buffers (in the figure we assumed that there are eight such buffers) holding the
data in the specified input device (Device #6) will be enabled by the output of the address
decoder circuitry.
3. The data output of the enabled buffers will be available on the data bus.
4. The instruction decoding will gate the data available on the data bus into the input of a
particular register in the CPU, normally the accumulator.

Output operations can be performed in a way similar to the input operation


explained above. The only difference will be the direction of data transfer, which will be
from a specific CPU register to the output register in the specified output device. I/O
operations performed in this manner are called programmed I/O. They are performed
under the CPU control. A complete instruction fetch, decode, and execute cycle will have
to be executed for every input and every output operation. Programmed I/O is useful in
cases whereby one character at a time is to be transferred, for example, keyboard and
character mode printers. Although simple, programmed I/O is slow.

Que. Explain Addressing modes ?

Ans.

• Immediate

• Direct

• Indirect

• Register

• Register Indirect

• Displacement (Indexed)

• Stack

Immediate :

• Operand is part of instruction

• Operand = address field

• e.g. ADD 5

– Add 5 to contents of accumulator

– 5 is operand

• No memory reference to fetch data


• Fast

• Limited range

Direct :

• Address field contains address of operand

• Effective address (EA) = address field (A)

• e.g. ADD A

– Add contents of cell A to accumulator

– Look in memory at address A for operand

• Single memory reference to access data

• No additional calculations to work out effective address

• Limited address space

Direct Addressing Diagram

(Diagram: the instruction holds an opcode and address field A; A indexes directly into memory to reach the operand.)
Indirect :

• Memory cell pointed to by the address field contains the address of (pointer to) the operand

• EA = (A)

– Look in A, find address (A) and look there for the operand

• e.g. ADD (A)

– Add contents of cell pointed to by contents of A to accumulator

• Large address space: 2^n, where n = word length

• May be nested, multilevel, cascaded

– e.g. EA = (((A)))

• Multiple memory accesses to find the operand, hence slower

Indirect Addressing Diagram

(Diagram: the instruction's address field A points to a memory cell that holds a pointer to the operand.)
Register :

• Operand is held in the register named in the address field

• EA = R

• Limited number of registers

• Very small address field needed

– Shorter instructions

– Faster instruction fetch

• No memory access, so very fast execution

• Very limited address space

• Multiple registers help performance

– Requires good assembly programming or compiler writing

– N.B. C programming: register int a;

• c.f. direct addressing

Displacement :

Displacement Addressing

• EA = A + (R)

• Address field holds two values

– A = base value

– R = register that holds the displacement

– or vice versa

(Diagram: the named register's contents are added to the address field A to locate the operand in memory.)
Relative :

Relative Addressing

• A version of displacement addressing

• R = program counter, PC

• EA = A + (PC)

• i.e. get the operand A cells away from the current location pointed to by the PC

• c.f. locality of reference & cache usage

Index :

Indexed Addressing

• A = base

• R = displacement

• EA = A + R

• Good for accessing arrays: compute EA = A + R, then increment R (R++)

Stack :

• Operand is (implicitly) on top of the stack

• e.g. ADD: pop the top two items from the stack, add them, and push the result
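The effective-address (EA) rules for the modes above can be collected into one small sketch over a toy memory and register file; all addresses and values are illustrative:

```python
# Effective-address computation for the main addressing modes,
# over a toy memory and register file.

memory = {100: 55, 200: 100, 300: 7}
registers = {'R1': 300, 'PC': 250}

def ea_direct(a):            return a                    # EA = A
def ea_indirect(a):          return memory[a]            # EA = (A)
def ea_register_indirect(r): return registers[r]         # EA = (R)
def ea_displacement(a, r):   return a + registers[r]     # EA = A + (R)
def ea_relative(a):          return a + registers['PC']  # EA = A + (PC)

print(ea_direct(100))              # 100
print(ea_indirect(200))            # 100 (memory[200] holds 100)
print(ea_register_indirect('R1'))  # 300
print(ea_displacement(50, 'PC'))   # 300
print(ea_relative(50))             # 300
```

Immediate mode needs no EA at all (the operand is in the instruction), and stack mode's EA is implicitly the stack pointer.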

Que. Explain Booth’s Algorithm ?

Booth’s Algorithm

(Figures: the flowchart of Booth’s algorithm and a worked multiplication example.)
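Since the flowchart and worked example are not reproduced here, the following sketch implements the algorithm in Python: on each of n steps the bit pair (Q0, Q-1) selects an add of M, a subtract of M, or no operation, followed by an arithmetic right shift of the combined A, Q, Q-1 registers. The register widths and function name are illustrative:

```python
def booth_multiply(m, q, bits=8):
    """Booth's algorithm for signed multiplication. The bit pair
    (Q0, Q-1) selects: 10 -> A = A - M, 01 -> A = A + M, else no-op;
    then A, Q, Q-1 are arithmetic-right-shifted together."""
    mask = (1 << bits) - 1
    A, Q, Q_1 = 0, q & mask, 0
    M = m & mask
    for _ in range(bits):
        pair = (Q & 1, Q_1)
        if pair == (1, 0):
            A = (A - M) & mask            # subtract multiplicand from A
        elif pair == (0, 1):
            A = (A + M) & mask            # add multiplicand to A
        # arithmetic right shift of the combined A, Q, Q-1
        Q_1 = Q & 1
        Q = ((Q >> 1) | ((A & 1) << (bits - 1))) & mask
        A = (A >> 1) | (A & (1 << (bits - 1)))   # replicate A's sign bit
    product = (A << bits) | Q             # 2n-bit result in A,Q
    if product & (1 << (2 * bits - 1)):   # re-interpret as signed
        product -= 1 << (2 * bits)
    return product

print(booth_multiply(7, 3))    # 21
print(booth_multiply(-7, 3))   # -21
print(booth_multiply(7, -3))   # -21
```

The recoding treats runs of 1s in the multiplier as one subtraction and one addition, which is why negative multipliers need no special handling.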
Que. Differentiate between horizontal and vertical micro-programming ?

Vertical micro-programming :

• Each micro-instruction specifies a single (or a few) micro-operation to be performed

• Width is narrow

• n control signals are encoded into log2 n bits

• Limited ability to express parallelism

• Considerable encoding of control information requires an external memory word decoder to identify the exact control line being manipulated

(Diagram: vertical micro-instruction format with function codes, jump condition, and micro-instruction address fields.)

Horizontal micro-programming :

• Each micro-instruction specifies many different micro-operations to be performed in parallel

• Wide memory word

• High degree of parallel operation possible

• Little encoding of control information

(Diagram: horizontal micro-instruction format with internal CPU control signals, system bus control signals, jump condition, and micro-instruction address fields.)
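The width trade-off, one wire per control signal versus log2 n encoded bits plus an external decoder, can be shown with a tiny sketch; the control-signal names are made up:

```python
import math

# Horizontal: one bit per control signal, so many can be active at once.
# Vertical: signals are encoded, so a decoder activates exactly one.

signals = ['MAR_load', 'MBR_load', 'PC_inc', 'ACC_load',
           'ALU_add', 'ALU_sub', 'MEM_read', 'MEM_write']

horizontal_width = len(signals)                      # one bit per signal
vertical_width = math.ceil(math.log2(len(signals)))  # encoded field

def decode_vertical(code):
    """External decoder turns the encoded field into one control line."""
    return signals[code]

print(horizontal_width, vertical_width)   # 8 3
print(decode_vertical(0b110))             # MEM_read
```

With real control units n runs into the hundreds, so the 8-versus-3 gap here becomes a substantial difference in control-store width.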

Microinstruction Sequencing :

(Figure: micro-instruction sequencing.)

Microinstruction Execution :

(Figure: micro-instruction execution.)
Que. Explain Write policies ?

Ans:

• Must not overwrite a cache block unless main memory is up to date

• Multiple CPUs may have individual caches

• I/O may address main memory directly

Write through :

• All writes go to main memory as well as the cache

• Multiple CPUs can monitor main memory traffic to keep their local caches up to date

• Generates lots of memory traffic

• Slows down writes

Write back :

• Updates are initially made in the cache only

• An update (dirty) bit for the cache slot is set when an update occurs

• If a block is to be replaced, it is written to main memory only if its update bit is set

• Other caches can get out of sync

• I/O must access main memory through the cache

• N.B. roughly 15% of memory references are writes
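The update-bit mechanism can be sketched with a toy one-slot write-back cache; the class and counters are illustrative:

```python
# Toy write-back cache with one slot: writes set a dirty (update) bit,
# and main memory is written only when a dirty block is replaced.

class WriteBackCache:
    def __init__(self, memory):
        self.memory = memory
        self.block = None      # which memory address this slot holds
        self.value = None
        self.dirty = False     # the "update bit"
        self.writebacks = 0    # count of actual memory writes

    def write(self, address, value):
        self.load(address)
        self.value = value
        self.dirty = True      # update made in the cache only

    def load(self, address):
        if self.block != address:      # replace the current block
            if self.dirty:
                self.memory[self.block] = self.value
                self.writebacks += 1   # write back only if dirty
            self.block = address
            self.value = self.memory.get(address)
            self.dirty = False

memory = {0: 1, 1: 2}
cache = WriteBackCache(memory)
cache.write(0, 99)     # three writes to the same block:
cache.write(0, 100)    # no memory traffic yet
cache.write(0, 101)
print(memory[0], cache.writebacks)   # 1 0
cache.load(1)          # replacement forces the single write-back
print(memory[0], cache.writebacks)   # 101 1
```

A write-through version would instead increment `writebacks` on every `write` call, which is exactly the traffic difference described above.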

Que. Explain Mapping Function ?

a. Direct Mapping
b. Associative Mapping
c. Set Associative Mapping

Direct Mapping

• Each block of main memory maps to only one cache line

– i.e. if a block is in cache, it must be in one specific place

• Address is in two parts

• Least Significant w bits identify unique word

• Most Significant s bits specify one memory block

• The MSBs are split into a cache line field r and a tag of s-r (most significant)

• Simple

• Inexpensive

• Fixed location for given block

– If a program accesses 2 blocks that map to the same line repeatedly, cache
misses are very high

Direct Mapping Address Structure

Tag (s-r): 8 bits | Line or slot (r): 14 bits | Word (w): 2 bits

• 24-bit address

• 2-bit word identifier (4-byte block)

• 22-bit block identifier

– 8-bit tag (= 22 - 14)

– 14-bit slot or line

• No two blocks mapping to the same line have the same tag field

• Check the contents of the cache by finding the line and checking the tag
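Slicing a 24-bit address into the 8/14/2-bit fields above is plain bit arithmetic:

```python
# Split a 24-bit address into the 8-bit tag, 14-bit line, and
# 2-bit word fields of the direct-mapped example above.

TAG_BITS, LINE_BITS, WORD_BITS = 8, 14, 2

def split_address(addr):
    word = addr & ((1 << WORD_BITS) - 1)
    line = (addr >> WORD_BITS) & ((1 << LINE_BITS) - 1)
    tag = addr >> (WORD_BITS + LINE_BITS)
    return tag, line, word

# Two addresses that map to the same line but carry different tags,
# illustrating the repeated-miss case described above:
print(split_address(0x000004))   # (0, 1, 0)
print(split_address(0x010004))   # (1, 1, 0)
```

Both addresses compete for line 1, so alternating between them would miss on every access.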

Direct Mapping Cache Line Table

• Cache line 0 holds main memory blocks 0, m, 2m, 3m, …, 2^s - m

• Cache line 1 holds blocks 1, m+1, 2m+1, …, 2^s - m + 1

• …

• Cache line m-1 holds blocks m-1, 2m-1, 3m-1, …, 2^s - 1

(Figures: direct mapping cache organization and a direct mapping example.)

Associative Mapping

• A main memory block can load into any line of cache

• Memory address is interpreted as tag and word

• Tag uniquely identifies block of memory

• Every line’s tag is examined for a match

• Cache searching gets expensive

Associative Mapping Address Structure

Tag: 22 bits | Word: 2 bits

• 22-bit tag stored with each 32-bit block of data

• Compare the tag field with the tag entry in the cache to check for a hit

• Least significant 2 bits of the address identify which 16-bit word is required from the 32-bit data block

• e.g. address FFFFFC: tag FFFFFC, data 24682468, cache line 3FFF
Set Associative Mapping Address Structure

Tag: 9 bits | Set: 13 bits | Word: 2 bits

• Use the set field to determine which cache set to look in

• Compare the tag field to see if we have a hit

• e.g.

– address 1FF 7FFC: tag 1FF, data 12345678, set number 1FFF

– address 001 7FFC: tag 001, data 11223344, set number 1FFF
Que. What effect does caching memory have on DMA?

Ans.

• Interrupt driven and programmed I/O require active CPU intervention

– Transfer rate is limited

– CPU is tied up

• Additional Module (hardware) on bus

• DMA controller takes over from CPU for I/O

DMA Operation :

• CPU tells DMA controller:-

– Read/Write

– Device address

– Starting address of memory block for data

– Amount of data to be transferred

• CPU carries on with other work

• DMA controller deals with transfer

• DMA controller sends interrupt when finished

DMA Transfer Cycle Stealing :

• DMA controller takes over bus for a cycle

• Transfer of one word of data

• Not an interrupt

– CPU does not switch context

• CPU suspended just before it accesses bus

– i.e. before an operand or data fetch or a data write

• Slows down CPU but not as much as CPU doing transfer

DMA Configurations (1)

(Diagram: CPU, DMA controller, I/O devices, and main memory share a single bus.)

• Single bus, detached DMA controller

• Each transfer uses the bus twice

– I/O to DMA, then DMA to memory

• CPU is suspended twice
DMA Configurations (2)

(Diagram: each DMA controller is integrated with one or more I/O devices; a single bus connects CPU, DMA controllers, and main memory.)

• Single bus, integrated DMA controller

• A controller may support more than one device

• Each transfer uses the bus once

– DMA to memory

• CPU is suspended once

DMA Configurations (3)

(Diagram: a separate I/O bus connects the DMA controller to all I/O devices; the system bus connects CPU, DMA controller, and main memory.)

• Separate I/O bus

• The I/O bus supports all DMA-enabled devices

• Each transfer uses the system bus once

– DMA to memory

• CPU is suspended once

Que. Explain Interrupt Mechanism ? Or

How do you identify the module issuing the interrupt? Or

How do you deal with multiple interrupts?

Ans.

• Different line for each module

– PC

– Limits number of devices

• Software poll

– CPU asks each module in turn

– Slow

• Daisy Chain or Hardware poll

– Interrupt Acknowledge sent down a chain

– Module responsible places vector on bus

– CPU uses vector to identify handler routine

• Bus Master

– Module must claim the bus before it can raise interrupt

– e.g. PCI & SCSI

Multiple Interrupts :

• Each interrupt line has a priority

• Higher priority lines can interrupt lower priority lines

• If bus mastering is used, only the current bus master can interrupt
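Fixed-priority resolution among pending interrupt lines can be sketched as follows, assuming (as on the PC) that a lower line number means higher priority; the function names are illustrative:

```python
# Sketch of fixed-priority interrupt resolution: among all pending
# request lines, service the lowest-numbered (highest-priority) one.
# A pending request preempts only a lower-priority handler.

pending = set()

def raise_irq(line):
    """A module asserts its interrupt request line."""
    pending.add(line)

def next_interrupt(current=None):
    """Pick the pending line to service next, or None if nothing
    pending may preempt the handler currently running."""
    if not pending:
        return None
    best = min(pending)              # lowest number = highest priority
    if current is not None and best >= current:
        return None                  # cannot preempt equal/higher priority
    pending.discard(best)
    return best

raise_irq(5); raise_irq(1); raise_irq(3)
print(next_interrupt())            # 1 (highest priority serviced first)
print(next_interrupt(current=5))   # 3 (IRQ3 preempts the IRQ5 handler)
print(next_interrupt(current=2))   # None (IRQ5 cannot preempt IRQ2)
```

This is the policy a priority controller such as the 8259A applies in hardware across its eight IRQ lines.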

Example – PC Bus

• 80x86 has one interrupt line

• 8086 based systems use one 8259A interrupt controller

• 8259A has 8 interrupt lines

Sequence of Events
• 8259A accepts interrupts
• 8259A determines priority
• 8259A signals 8086 (raises INTR line)
• CPU Acknowledges
• 8259A puts correct vector on data bus
• CPU processes interrupt

PC Interrupt Layout

(Diagram: interrupt request lines IRQ0–IRQ7 feed the 8259A controller, whose output drives the INTR line of the 8086.)
