
UNIT 01

Basic Structure of Computers and Instruction Set
Names of Sub-Units

Basic Computer Structure, Computer Types, Functional Units, Basic Operational Concepts, Bus Structures, Processor Clock, Basic Performance Equation, Clock Rate, Performance Measurement, Machine Instructions, Numbers, Arithmetic Operations and Characters, Memory Location and Addresses, Memory Operations, Instructions and Instruction Sequencing.
Overview

The unit begins by explaining the basic structure of a computer. Further, it discusses computer types, functional units and basic operational concepts. The unit then explains bus structures, the processor clock, the basic performance equation and clock rate. Next, it discusses machine instructions, number representation, arithmetic operations and characters, and memory locations and addresses. Towards the end, the unit explains memory operations, instructions and instruction sequencing.

Learning Objectives

In this unit, you will learn to:

- Explain the concept of basic computer structure
- Describe the types and functional units of a computer
- Define the concept of machine instructions
- Explain arithmetic operations and memory operations
- Describe instructions and instruction sequencing
JGI JAIN
DEEMED-TO-BE UNIVERSITY
Computer Organization and Architecture

Learning Outcomes

At the end of this unit, you would be able to:

- Examine the concept of basic computer structure
- Evaluate the different types and functional units of a computer
- Assess the role of bus structures, the processor clock, the basic performance equation and clock rate
- Evaluate the concepts of one's complement and two's complement
- Analyse instructions and instruction sequencing

Pre-Unit Preparatory Material

- http://www.ipsgwalior.org/download/computer_system_architecture.pdf
1.1 Introduction
A computer is an electronic device that performs a variety of operations on the basis of a set of instructions called a program. A computer takes input from the user in the form of data or instructions. On receiving the instructions, the computer processes the data, generates some output and displays it to the user. When the computer processes data, the data becomes information. A computer performs a task in the same manner as we do our day-to-day activities.

1.2 Basic Computer Structure


In general, computer architecture encompasses three areas of computer design: computer hardware, instruction set architecture, and computer organization. Electronic circuits, displays, magnetic and optical storage media, and communication capabilities make up computer hardware.

The instruction set, registers, memory management, and exception handling are all examples of machine interfaces that are accessible to the programmer. CISC (Complex Instruction Set Computer) and RISC (Reduced Instruction Set Computer) are the two basic approaches to instruction set design.

Computer organization refers to the high-level aspects of a design, such as the memory system, bus structure, and internal CPU design.
1.2.1 Computer Types


A computer is a high-speed electronic calculating machine that accepts digital input, processes it according to internally stored instructions (programs), and outputs the result on a display device. The computer's internal working is represented in Figure 1:

[Figure: a cycle of three steps: fetch the instruction, decode the instruction, execute the instruction]

Figure 1: Fetch, Decode and Execute Steps in a Computer System
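The cycle in Figure 1 can be sketched as a small interpreter loop. The three-opcode machine below (LOAD, ADD, HALT, with a single accumulator) is an invented illustration, not a real instruction set:

```python
# A minimal sketch of the fetch-decode-execute cycle, assuming a toy
# machine with invented opcodes (LOAD, ADD, HALT) and one accumulator.
def run(program):
    acc = 0      # accumulator register
    pc = 0       # program counter
    while True:
        instruction = program[pc]          # fetch the instruction
        opcode, operand = instruction      # decode it
        pc += 1
        if opcode == "LOAD":               # execute it
            acc = operand
        elif opcode == "ADD":
            acc += operand
        elif opcode == "HALT":
            return acc

result = run([("LOAD", 2), ("ADD", 3), ("HALT", None)])
print(result)  # 5
```

Real processors perform the same three-phase loop in hardware, one machine instruction at a time.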


The different types of computers are:

- Micro computer: A personal computer, designed to satisfy the needs of a single user. It gives access to a wide range of applications, including word processing, photo editing, e-mail and the internet.
- Laptop computer: All of the components are combined into a single compact unit. It costs more than a comparable desktop computer. It is also known as a notebook.
- Workstation: A powerful desktop computer with specialized functions, typically used for tasks that require high processing speed. A conventional personal computer connected to a LAN (local area network) can also serve as one.
- Super computer: A computer regarded as among the world's fastest, used for tasks that would take other computers a very long time to complete, for example modeling weather systems or sequencing genomes.
- Main frame: A large, expensive computer capable of processing data for hundreds or thousands of users at the same time. It stores, manages and processes large amounts of data in a reliable, secure and centralized manner.
- Hand held: Also known as a PDA (Personal Digital Assistant). A pocket computer that operates on batteries and can be used while held in the hand. Common uses include appointment books, address books, calculators and notepads.
- Multi-core: A parallel computing platform with many cores, or computing elements, in a single chip. Common examples include the Sony PlayStation and Intel Core 2 Duo, i3 and i7 processors.
- Note-book computer: A compact and portable version of the PC.

1.2.2 Functional Units


In its most basic form, a computer consists of five functional units: an input unit, an output unit, a memory unit, an arithmetic and logic unit, and a control unit. The functional units of a computer system are depicted in Figure 2:
[Figure: input unit, output unit, memory, ALU and control unit, interconnected through an I/O processor]

Figure 2: Basic Functional Units of a Computer


The description of these functional units is as follows:

- Input Unit: The computer receives instructions, information and data with the help of input devices. The most common input device is the keyboard. When you press a key, the corresponding letter or number is converted into binary code and transmitted over a cable to the memory or the CPU.


- Memory Unit: The memory unit holds the program instructions (code), data and computing results, among other things. The several types of memory units are:
  - Primary/Main Memory
  - Secondary/Auxiliary Memory
  Primary memory is a type of semiconductor memory that allows for fast access. The main memory stores run-time program instructions and operands. ROM and RAM are the two types of main memory. ROM holds system programs and firmware routines, such as the BIOS, POST and I/O drivers, that are required to manage a computer's hardware. RAM is the read/write or user memory, which stores program instructions and data during execution. Primary storage is necessary, but it is inherently volatile and costly, so additional memory can be provided as auxiliary memory at a lower cost.
  Secondary memory is used when huge volumes of data and programs, especially information that is accessed infrequently, must be kept.
- Arithmetic and Logic Unit (ALU): Adders, comparators and other logic circuits are used in the ALU to accomplish operations such as addition, multiplication and number comparison.
- Output Unit: These are the input unit's polar opposites. Its primary function is to communicate the processed results to the outside world.
- Control Unit: It is the nerve centre that transmits signals to and senses the status of the other units. The control unit generates the timing signals that govern data transfer between the input unit, processor, memory and output unit.
1.2.3 Basic Operational Concepts
A program containing a list of instructions is stored in the memory to accomplish a certain task. Individual instructions are transferred from memory to the processor, which then performs the actions. Data that has to be saved is also stored in the memory.

For example: Add LOCA, R0

The operand at memory location LOCA is added to the operand in register R0, and the sum is stored back in register R0. This instruction necessitates the completion of several steps:

- The instruction is first fetched from memory into the processor.
- The operand at LOCA is fetched and added to the contents of R0.
- Finally, the sum is stored in register R0.

The add instruction above combines an ALU operation with a memory access operation. For performance reasons, these two sorts of operations are done by separate instructions in various other types of computers:

Load LOCA, R1
Add R1, R0

Transfers between the memory and the processor are initiated by sending the address of the memory location to be accessed to the memory unit and providing the relevant control signals. After that, the data is moved to or from memory.


Figure 3 depicts how the memory and the CPU can be linked. In addition to the ALU and control circuitry, the processor contains a number of registers, which are used for a variety of purposes.

[Figure: the main memory connected to a processor containing the MAR, MDR, control unit, PC, IR, ALU and n general purpose registers R0 to Rn-1]

Figure 3: Connections Between the Processor and the Memory
The instruction register (IR) holds the instruction that is currently being executed. Its output is provided to the control circuits, which generate the timing signals that manage the various processing elements involved in executing the instruction.

The program counter (PC) is a specialized register that keeps track of a program's execution. It holds the memory address of the next instruction to be fetched and executed.


The operating steps are as follows:

- Programs are stored in memory, often after being entered through the input unit.
- Execution begins when the PC is set to point at the program's first instruction.
- The contents of the PC are copied to the MAR, and a Read control signal is sent to the memory.
- Once the time required to access the memory has elapsed, the addressed word is read from memory and loaded into the MDR.
- The contents of the MDR are then transferred to the IR, so the instruction can be decoded and executed.
- If the instruction calls for an ALU operation, the required operands must be obtained.
- A memory operand is fetched by providing its address to the MAR and starting a read cycle.
- Once the operand has been read from memory into the MDR, it is moved from the MDR to the ALU.
- After one or more such cycles, the ALU can perform the desired operation.
- If the result of this operation is to be stored in memory, it is delivered to the MDR.
- The address of the result's storage location is passed to the MAR, and a write cycle is started.
- The contents of the PC are incremented so that the PC points to the next instruction to be executed.
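The first few register transfers above (PC to MAR, memory read into MDR, MDR to IR, PC incremented) can be traced in a short sketch. The memory contents and the 4-byte instruction size are assumptions made purely for illustration:

```python
# A sketch of the instruction fetch sequence: PC -> MAR, memory read
# -> MDR, MDR -> IR, then PC incremented. The memory contents below
# are invented for illustration; words are assumed to be 4 bytes.
memory = {0: "Add LOCA, R0", 4: "next instruction"}

def fetch(pc, memory):
    mar = pc                 # contents of PC are copied to MAR
    mdr = memory[mar]        # the memory read loads the addressed word into MDR
    ir = mdr                 # MDR contents are transferred to IR
    pc = pc + 4              # PC now points to the next instruction word
    return ir, pc

ir, pc = fetch(0, memory)
print(ir, pc)  # Add LOCA, R0 4
```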


1.2.4 Bus Structures


A bus is a collection of lines that acts as a connecting path for multiple devices (one bit per line). Individual parts exchange data, address and control information across this communication path, as shown in Figure 4:

[Figure: input, output, memory and processor units attached to a single shared bus]

Figure 4: Single Bus Structure

Consider a transfer from the processor to a printer as an example. A typical method is to use buffer registers to hold the content during transmission. Only two units can use the bus at any given moment because it can carry only one transfer at a time. Multiple demands for the use of the single bus are arbitrated via the bus control lines.

A single bus structure is:

- Low cost
- Very flexible for attaching peripheral devices

The performance of a multiple-bus arrangement is unquestionably better, but the cost is greatly increased.
1.3 Performance

The most essential indicator of a computer's performance is how rapidly it can run programs. The architecture of a computer's hardware influences the speed with which it runs programs. For optimal performance, it is therefore vital to design the compiler, the machine instruction set and the hardware in a coordinated manner.

The total time it takes to run a program is called the elapsed time, and it is a measure of the computer system's overall performance. The speeds of the CPU, disk and printer all have an impact on it. The processor time is the portion of this time spent executing the program's instructions.

Just as the time it takes to execute a program is affected by all of the components of a computer system, the processor time is determined by the hardware involved in the execution of individual machine instructions. This hardware consists of the CPU and the memory, which are normally connected by a bus.

The processor and a small cache memory can be fabricated on a single IC chip. Internally, the basic steps of instruction processing are performed on the chip at a speed significantly higher than the speed at which instructions and data can be fetched from the main memory. A program will run faster when the movement of instructions and data between the main memory and the processor is minimized, which is what the cache achieves.

Consider the following scenario: in a program loop, a set of instructions is executed repeatedly over a short period of time. If these instructions are stored in the cache, they can be fetched quickly during this period of frequent use. The same applies to data that is used frequently.


1.3.1 Processor Clock


The clock is a timing signal that controls the processor circuits. The clock defines regular time intervals called clock cycles. To execute a machine instruction, the processor divides the action into a series of simple steps, each of which can be completed in one clock cycle. The length P of one clock cycle is a critical parameter that affects processor performance; the corresponding clock rate is R = 1/P cycles per second.

1.3.2 Basic Performance Equation


The processor time component of the total elapsed time is now the focus of our attention. Let T be the amount of time it takes the processor to run a given program. The compiler translates the high-level-language source program into a machine language object program. Suppose that complete execution of this program requires N machine language instructions to be executed. The number N is the actual number of instruction executions, which is not necessarily the same as the number of machine instructions in the object program: some instructions, such as those inside a program loop, may be executed many times, while others may not be executed at all, depending on the input data used.

Assume that the average number of basic steps required to execute one machine instruction is S, and that each basic step takes one clock cycle to complete. If the clock rate is R cycles per second, the program execution time is given by:

T = (N × S) / R

This is called the basic performance equation.

It is important to note that N, S and R are not independent parameters, and changing one can have an impact on the others. Adding a new feature to a processor's architecture results in increased performance only if the overall effect is to reduce the value of T.
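As a rough illustration, the basic performance equation can be evaluated directly; the values of N, S and R below are invented:

```python
# The basic performance equation T = (N * S) / R, evaluated with
# invented example numbers: N instruction executions, S basic steps
# (clock cycles) per instruction, R clock cycles per second.
def execution_time(N, S, R):
    return (N * S) / R

# 100 million instruction executions, 4 cycles each, on a 2 GHz clock:
T = execution_time(100e6, 4, 2e9)
print(T)  # 0.2 (seconds)
```

Halving S (say, through pipelining) or doubling R would each halve T, provided N stays the same, which is exactly the trade-off discussed above.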

Pipelining and Superscalar Operation

So far, we have presumed that instructions are executed one after another; hence, the value of S is the total number of basic steps (or clock cycles) required to complete one instruction. Pipelining, a technique that overlaps the execution of successive instructions, can result in significant speed gains.

Consider the instruction: Add R1, R2, R3

This adds the contents of R1 and R2 and places the result in R3. The contents of R1 and R2 are first transferred to the ALU's inputs. After the addition is completed, the sum is transferred to R3. While the addition is being performed, the processor can read the next instruction from memory. If that instruction also uses the ALU, its operands can be transferred to the ALU inputs at the same time that the result of the Add instruction is being transferred to R3.

In the ideal scenario, if all instructions are overlapped to the greatest extent possible, execution proceeds at the rate of one instruction per clock cycle. Individual instructions still take several clock cycles to complete, but for the purpose of computing T, the effective value of S is 1.


If multiple instruction pipelines are built into the CPU, a higher level of concurrency can be attained: multiple functional units create parallel paths through which different instructions can be executed in parallel. With this arrangement, several instructions can be started in one clock cycle; this mode of operation is called superscalar execution.

If it can be sustained for long periods during program execution, the effective value of S can be reduced to less than one. However, parallel execution must preserve the logical correctness of programs, i.e., the results produced must be identical to those produced by sequential execution of the program instructions. Many processors are now designed in this manner.

1.3.3 Clock Rate

There are two ways of increasing the clock rate R:

1. Improving IC technology makes logic circuits faster, reducing the time needed to complete a basic step. This allows the clock period P to be reduced and the clock rate R to be increased.
2. Reducing the amount of processing done in one basic step also allows the clock period P to be reduced. However, if the actions that an instruction must perform remain the same, the number of basic steps required may increase.

Increases in the value of R that are due solely to improvements in IC technology affect all aspects of the processor's operation equally, with the exception of the time it takes to access the main memory. When a cache is present, the fraction of accesses that go to main memory is small, so most of the performance gain expected from the use of faster technology can be realised.
Instruction Set – CISC and RISC

Simple instructions require only a few basic steps to execute, while complex instructions involve many steps. On a processor that provides only simple instructions, a large number of instructions may be needed to complete a programming task, which leads to a large value of N and a small value of S. On the other hand, if individual instructions perform more sophisticated operations, fewer instructions are needed, resulting in a smaller value of N and a larger value of S. It is difficult to say whether one approach is better than the other. However, combining complex instructions with pipelining (an effective value of S close to 1) would give the best performance, and efficient pipelining is much easier to achieve in processors with simple instruction sets.

1.3.4 Performance Measurement

A benchmark is a standard task that is used to determine how well a processor performs. A non-profit organisation known as SPEC (System Performance Evaluation Corporation) uses agreed-upon real-world application programs as benchmarks to measure the performance of computers. It expresses a computer's performance in terms of the time it takes to run a specified benchmark program. The SPEC rating is calculated as follows:

SPEC rating = (Running time on the reference computer) / (Running time on the computer under test)
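The ratio can be computed directly. The running times below are invented; the geometric mean shown is how SPEC combines the individual ratings over a suite of benchmark programs:

```python
# SPEC rating = running time on the reference machine divided by
# running time on the machine under test. The times are invented.
def spec_rating(reference_time, test_time):
    return reference_time / test_time

# The overall rating over a suite of benchmarks is the geometric mean
# of the individual ratings, as SPEC does for its benchmark suites.
def overall_rating(ratings):
    product = 1.0
    for r in ratings:
        product *= r
    return product ** (1.0 / len(ratings))

print(spec_rating(500.0, 50.0))      # 10.0: the test machine is 10x faster
print(overall_rating([10.0, 40.0]))  # 20.0
```

A rating greater than 1 means the computer under test is faster than the reference machine on that benchmark.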


1.4 Machine Instructions


Machine instructions are commands encoded in machine code that can be recognised and executed by a machine (computer). A machine instruction is a set of bytes in memory that instructs the processor to carry out a certain task. The CPU steps through the machine instructions in main memory one by one, performing one machine operation for each. A machine language program is a collection of machine instructions stored in main memory.

A collection of instructions executed directly by a computer's central processing unit (CPU) is known as machine code or machine language. Each instruction performs a very specific task on a unit of data in a CPU register or memory, such as a load, a jump, or an ALU operation. A set of such instructions makes up any program that is directly executed by a CPU.

The general format of a machine instruction is:

[Label:]  Mnemonic  [Operand, Operand]  [; Comment]

The brackets denote that a field is optional. The label is assigned the address of the first byte of the instruction in which it appears, and it must be followed by a colon (:). The placement of spaces is arbitrary, except that at least one space must separate the fields; otherwise, there would be ambiguity. A semicolon (;) begins the comment field.

For example:

Here:  MOV  R5, #25H  ; load 25H into R5


1.4.1 Representation of Numbers

Representing (or encoding) a number means expressing the number in binary form. The representation of numbers is necessary for storing and manipulating them efficiently. There are several ways to represent integers in binary form. Three binary representations of integers are listed below:

- Sign and magnitude representation
- 1's (or one's) complement
- 2's (or two's) complement

Let's discuss each binary representation of integers in the following sections.

Sign and Magnitude Representation

An integer consists of one or more digits, which indicate its magnitude, and may have a positive or negative sign. Examples of integers are +13 or 39 and –12 or –18. In a computer, numbers are represented in binary form as bits, without any other symbol. Various methods are used for the binary representation of numbers; one such method is known as sign and magnitude representation.


In this method, the sign of a number is represented using the leftmost bit of its binary equivalent, which is called the Most Significant Bit (MSB). If the MSB is 0, the number is positive, and if the MSB is 1, the number is negative.

For example, if a computer has a word size of 8 bits, the positive number 17 would be represented as 0 0 0 1 0 0 0 1, where the MSB is 0, representing the positive sign (Figure 5).

Figure 5: Representing a Positive Number (17) in Binary Notation

Similarly, the negative number –12 would be represented as 1 0 0 0 1 1 0 0, where the MSB is 1, representing the negative sign (Figure 6).

Figure 6: Representing a Negative Number (–12) in Binary Notation

In sign and magnitude notation, if a word consists of X bits, the total number of distinct values that can be represented is 2^X – 1, because zero has two representations (+0 and –0). In the preceding example, a word is represented using 8 bits, where the MSB is reserved for the sign. Therefore, the maximum magnitude that can be represented using 8 bits is 2^7 – 1 = 127, and the total number of values that can be represented is 2^8 – 1 = 255. These numbers range from –127 to +127.
One's Complement

In 1's complement representation, a positive number is represented by its binary equivalent, and a negative number is represented by taking the 1's complement of its binary equivalent. The 1's complement of a binary number is derived by replacing each 1 with 0 and each 0 with 1 in the number's binary equivalent. For example, the 1's complement of the binary number 1100 is 0011. It is called the 1's complement because the same result is obtained by subtracting the binary number 1100 from 1111.

Let's find the binary representations of +5 and –5. For +5, the representation is simply its binary equivalent, 0101. The representation of –5 is obtained by complementing each bit of the binary equivalent of 5; therefore, –5 is represented as 1010.

The maximum number of values that can be represented using 1's complement is 2^X – 1, where X is the number of bits in a word; an 8-bit word can therefore represent a maximum of 2^8 – 1 = 255 values. This is because zero has two representations: +0 (all 0s) and –0 (all 1s).

For example, the 1's complement representation of –12 is found in the following way:

Binary equivalent of +12 = 0000 1100

By complementing each bit of the binary equivalent of +12, we get the representation of –12 as 1111 0011.

Equivalently, the 1's complement can be obtained by subtracting 0000 1100 from 1111 1111:

  1111 1111
– 0000 1100
  1111 0011
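A minimal sketch of the rule above (complement every bit, or equivalently subtract the magnitude from the all-ones pattern); the helper name is invented for this illustration:

```python
# A sketch of 8-bit 1's complement: complementing every bit of the
# magnitude is the same as subtracting it from the all-ones pattern,
# i.e. from 2**bits - 1.
def ones_complement(value, bits=8):
    if value >= 0:
        return format(value, f"0{bits}b")
    return format((2 ** bits - 1) - abs(value), f"0{bits}b")

print(ones_complement(12))   # 00001100
print(ones_complement(-12))  # 11110011
```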

Two's Complement

In 2's complement representation, a positive number is represented by its binary equivalent, and a negative number is represented by the 2's complement of the binary equivalent of its magnitude. To get the 2's complement of a number, we add 1 to the 1's complement of that number.

For example, the 2's complement form of –5 can be found in the following way:

The binary equivalent of 5 is 0101.
1's complement of 0101 = 1010
2's complement of 0101 = 1010 + 1 = 1011

So the 2's complement form of –5 is 1011. Therefore, +5 and –5 are represented as 0101 (the binary equivalent) and 1011 (the 2's complement of 0101), respectively.

Alternative Method to Find the 2's Complement

Another method to find the 2's complement of a number is as follows:

1. Traverse the binary number from the LSB (right to left) until a 1 is encountered. Note down all the bits up to and including that 1.
2. Take the 1's complement of the remaining bits.
3. Write the complemented remaining bits, followed by the bits noted down in Step 1.
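Both definitions above (1's complement plus one, or equivalently 2^n minus the magnitude) give the same bit pattern. A short sketch, with a helper name of our own choosing:

```python
# A sketch of n-bit 2's complement: for a negative value, adding 1 to
# the 1's complement is the same as computing 2**bits - magnitude.
def twos_complement(value, bits=4):
    if value >= 0:
        return format(value, f"0{bits}b")
    return format(2 ** bits - abs(value), f"0{bits}b")

print(twos_complement(5))    # 0101
print(twos_complement(-5))   # 1011
```

The outputs match the worked example: +5 is 0101 and –5 is 1011 in 4-bit 2's complement.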
1.4.2 Binary Arithmetic

Binary arithmetic is essential for performing operations in digital computers and other digital systems. Binary arithmetic includes binary addition, subtraction, multiplication and division. Let's now learn the rules for performing each kind of operation in detail.

Binary Addition

As the name suggests, binary addition is used for performing the addition of binary numbers. The rules for performing binary addition are shown in Table 1:

Table 1: Rules for Performing Binary Addition

Case | A | B | Sum (A+B) | Carry
I    | 0 | 0 | 0         | 0
II   | 0 | 1 | 1         | 0
III  | 1 | 0 | 1         | 0
IV   | 1 | 1 | 0         | 1

In case IV, 1 + 1 gives 10. The 0 is retained in the same column while the 1 is carried to the next column. Note that the binary addition 1 + 1 + 1 gives 11, in which 1 is retained in the same column and the other 1 is carried to the next column.

For example, let's add the two numbers 10111 (23 in decimal) and 101101 (45 in decimal):

  Carry:   1 1 1 1 1 1
           0 0 1 0 1 1 1   (23)
         + 0 1 0 1 1 0 1   (45)
           1 0 0 0 1 0 0   (68)
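The rules in Table 1 can be applied bit by bit in a short sketch; the helper name is invented for this illustration:

```python
# Bit-by-bit binary addition on bit strings, following Table 1 and
# propagating the carry from column to column.
def binary_add(a, b):
    a, b = a.zfill(len(b)), b.zfill(len(a))   # pad to equal width
    result, carry = "", 0
    for x, y in zip(reversed(a), reversed(b)):
        total = int(x) + int(y) + carry
        result = str(total % 2) + result      # bit kept in this column
        carry = total // 2                    # bit carried to the next
    return ("1" + result) if carry else result

print(binary_add("10111", "101101"))  # 1000100, i.e. 23 + 45 = 68
```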
Binary Subtraction

In binary subtraction, two binary numbers are subtracted. While performing subtraction, the smaller number must be subtracted from the larger number. The rules for performing binary subtraction are shown in Table 2:

Table 2: Rules for Performing Binary Subtraction

Case | A | B | Difference (A-B) | Borrow
I    | 0 | 0 | 0                | 0
II   | 1 | 0 | 1                | 0
III  | 1 | 1 | 0                | 0
IV   | 0 | 1 | 1                | 1

For example, let's subtract 0010111 (23 in decimal) from 0101101 (45 in decimal):

    0 1 0 1 1 0 1   (45)
  – 0 0 1 0 1 1 1   (23)
    0 0 1 0 1 1 0   (22)

Here, whenever a borrow is taken, the 0 becomes 10 in binary, which is equal to 2 in decimal; therefore, 10₂ – 1₂ = 1₂ (that is, 2 – 1 = 1).
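The borrow rule can be sketched the same way; the helper name is invented for this illustration:

```python
# Bit-by-bit binary subtraction on bit strings, following Table 2 and
# propagating the borrow. Assumes a >= b, as the text requires.
def binary_sub(a, b):
    b = b.zfill(len(a))                       # pad to equal width
    result, borrow = "", 0
    for x, y in zip(reversed(a), reversed(b)):
        diff = int(x) - int(y) - borrow
        if diff < 0:
            diff += 2                         # borrow 10 (i.e. 2) from the next column
            borrow = 1
        else:
            borrow = 0
        result = str(diff) + result
    return result

print(binary_sub("101101", "10111"))  # 010110, i.e. 45 - 23 = 22
```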

Binary Multiplication

In binary multiplication, the multiplication of binary digits is done similarly to decimal numbers. The rules for performing binary multiplication are shown in Table 3:

Table 3: Rules for Performing Binary Multiplication

Case | A | B | Product (A×B)
I    | 0 | 0 | 0
II   | 0 | 1 | 0
III  | 1 | 0 | 0
IV   | 1 | 1 | 1

For example, let's multiply the two numbers 0101101 (45 in decimal) and 0001100 (12 in decimal):

          0 1 0 1 1 0 1       (45)
        × 0 0 0 1 1 0 0       (12)
          0 0 0 0 0 0 0
        0 0 0 0 0 0 0
    0 1 0 1 1 0 1
  0 1 0 1 1 0 1
  1 0 0 0 0 1 1 1 0 0        (540)
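The shift-and-add procedure shown in the example can be sketched as follows; the helper name is invented for this illustration:

```python
# Shift-and-add binary multiplication: for every 1 bit of the
# multiplier, add a correspondingly left-shifted copy of the
# multiplicand, just like the partial-product rows above.
def binary_mul(a, b):
    product = 0
    for shift, bit in enumerate(reversed(b)):
        if bit == "1":
            product += int(a, 2) << shift     # shifted partial product
    return format(product, "b")

print(binary_mul("0101101", "0001100"))  # 1000011100, i.e. 45 * 12 = 540
```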

Binary Division

In binary division, the larger number is divided by the smaller one. The rules of subtraction and multiplication are used while performing the division, and the procedure is the same as decimal long division. Let's divide 101101 (45 in decimal) by 101 (5 in decimal):

        1 0 0 1
  101 ) 1 0 1 1 0 1
       -1 0 1
        0 0 0 1 0 1
           -1 0 1
            0 0 0

The quotient obtained after the division is 1001, which is equivalent to 9 in decimal.
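The long-division steps above, bringing down one bit of the dividend at a time, can be sketched as follows; the helper name is invented for this illustration:

```python
# Binary long division on bit strings: bring down one dividend bit at
# a time; whenever the running remainder is at least the divisor,
# subtract it and emit a 1 quotient bit, otherwise emit a 0.
def binary_div(dividend, divisor):
    quotient, remainder = "", 0
    d = int(divisor, 2)
    for bit in dividend:
        remainder = (remainder << 1) | int(bit)   # bring down the next bit
        if remainder >= d:
            quotient += "1"
            remainder -= d
        else:
            quotient += "0"
    return quotient.lstrip("0") or "0", format(remainder, "b")

q, r = binary_div("101101", "101")
print(q, r)  # 1001 0, i.e. 45 / 5 = 9 remainder 0
```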


1.4.3 Memory Location and Addresses

A computer's memory stores number and character operands, as well as instructions. The memory is made up of millions of storage cells, each of which can store a single bit of data with a value of 0 or 1. Bits are rarely handled individually because each represents such a small quantity of information; the standard approach is to deal with them in groups of a fixed size. For this purpose, the memory is organized so that a group of n bits can be stored or retrieved in a single basic operation. Each group of n bits is called a word of information, and n is the word length.


Figure 7 shows the memory of a computer:

[Figure: memory organized as a sequence of n-bit words: first word, second word, ..., i-th word, ..., last word]

Figure 7: Memory Structure
In most cases, a number occupies one word, and it can be retrieved from memory by providing its word address. Individual characters can likewise be retrieved using their byte addresses.

Many applications need to handle variable-length character strings. The start of a string is identified by the address of the byte containing its first character, and successive characters are stored in successive byte positions. The length of the string can be indicated in two ways: a special control character meaning "end of string" can be used as the last character, or a number giving the string's length in bytes can be stored in a separate memory location or CPU register. Figure 8 shows two uses of a 32-bit word:
[Figure: (a) a signed integer occupying 32 bits b31...b0, where the sign bit b31 = 0 for positive numbers and b31 = 1 for negative numbers; (b) four 8-bit ASCII characters packed into one 32-bit word]

Figure 8: 32-bit Word Length


To access information in memory, whether one word or one byte (8 bits), an address is required for each location. A k-bit address can refer to 2^k memory locations, namely 0 to 2^k − 1, which constitute the memory space. Assigning unique addresses to individual bit positions in the memory is impractical.
The most practical assignment is for consecutive addresses to refer to successive byte locations in memory (byte-addressable memory). Byte addresses are 0, 1, 2, … If the word length is 32 bits (4 bytes), successive words are located at addresses 0, 4, 8, …

Big-Endian and Little-Endian Assignments

There are two methods for assigning byte addresses across words: big-endian and little-endian. In the big-endian assignment, lower byte addresses are used for the more significant bytes (the leftmost bytes) of the word. The little-endian assignment is the opposite: lower byte addresses are used for the less significant bytes of the word. Figure 9 shows the big-endian and little-endian assignments:

Word
Address     Byte Address                      Byte Address

0           0      1      2      3            3      2      1      0
4           4      5      6      7            7      6      5      4
⋮                  ⋮                                 ⋮
2^k − 4     2^k−4  2^k−3  2^k−2  2^k−1        2^k−1  2^k−2  2^k−3  2^k−4

(a) Big-endian assignment            (b) Little-endian assignment

Figure 9: Big-Endian and Little-Endian Assignments


When words begin at a byte address that is a multiple of the number of bytes in a word, they are said to be aligned in memory.

• 16-bit word: word addresses 0, 2, 4, …
• 32-bit word: word addresses 0, 4, 8, …
• 64-bit word: word addresses 0, 8, 16, …
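The byte-ordering and alignment rules above can be illustrated with a short Python sketch (the function names here are our own, chosen for illustration):

```python
# Lay out a 32-bit word in a byte-addressable memory under the two
# endian conventions described above.
def word_to_bytes(value, endian):
    """Split a 32-bit word into its 4 bytes, in memory order."""
    big = [(value >> shift) & 0xFF for shift in (24, 16, 8, 0)]
    return big if endian == "big" else list(reversed(big))

# The word 0x12345678 stored at byte addresses 0..3:
assert word_to_bytes(0x12345678, "big") == [0x12, 0x34, 0x56, 0x78]
assert word_to_bytes(0x12345678, "little") == [0x78, 0x56, 0x34, 0x12]

# Aligned word addresses are multiples of the word length in bytes:
def aligned_addresses(word_bytes, limit):
    return [a for a in range(limit) if a % word_bytes == 0]

assert aligned_addresses(4, 16) == [0, 4, 8, 12]   # 32-bit words
```

Note how the same word produces two different byte sequences; only the assignment of addresses to bytes differs, not the word's value.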



1.4.4 Memory Operation



The memory stores both programme instructions and data operands. To execute an instruction, the processor control circuits must cause the word (or words) holding the instruction to be transferred from the memory to the processor.
Operands and results must likewise be transferred between the memory and the CPU. Hence, two basic memory operations need to be executed: Load (also called Read or Fetch) and Store (also called Write).
The load operation copies the contents of a particular memory region and sends them to the processor.
The contents of the memory are unaffected.


The processor initiates a Load operation by sending the address of the target place to memory and
requesting that its contents be read. The data stored at that address is read from the memory and sent
to the CPU.
The store operation copies data from the processor to a specified memory location, overwriting the previous contents of that location. The processor sends the address of the desired location, together with the data to be written into that location, to the memory.
In a single operation, an information item of one word or one byte can be transmitted from the CPU to
the memory. This is a transfer between the CPU register and the main memory.

Load (or Read or Fetch)
• Makes a copy of the contents; the memory's contents remain unchanged.
• Address – Load
• The use of registers is possible.

Store (or Write)
• Overwrites the contents in memory.
• Address and Data – Store
• Registers can be used.
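The Load and Store operations can be modelled with a minimal Python sketch (an illustrative model of the behaviour described above; the dictionaries standing in for memory and registers are our own simplification):

```python
# Toy byte/word memory and register file.
memory = {0x100: 42, 0x104: 7}
registers = {"R1": 0, "R2": 0}

def load(address, reg):
    registers[reg] = memory[address]   # memory is unaffected by a Load

def store(reg, address):
    memory[address] = registers[reg]   # previous contents are overwritten

load(0x100, "R1")
assert registers["R1"] == 42 and memory[0x100] == 42   # Load leaves memory intact
registers["R2"] = 99
store("R2", 0x104)
assert memory[0x104] == 99                             # Store overwrites the location
```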

1.4.5 Instructions and Instruction Sequencing


A computer programme is made up of a series of small steps, such as adding two numbers, checking for a specific condition, reading an instruction from the input unit, or sending output to the output unit. A computer must be able to execute four different sorts of operations, which are as follows:
• Data transfers between the memory and the processor registers
• Arithmetic and logic operations on data
• Program sequencing and control
• I/O transfers
Two types of notation are used in instruction sequencing. These notations are as follows:
• Register transfer notation: Information is transferred from one computer location to another. Possible locations involved in such transfers include memory locations, CPU registers, and registers in the I/O subsystem. We usually refer to a location by a symbolic name that represents its hardware binary address.
• Assembly language notation: Another way to express machine instructions and programmes is to use an assembly language format. For example, the statement below specifies an instruction that causes the transfer described above, from memory location LOC to processor register R1:

Move LOC, R1

The old contents of register R1 are overwritten when this instruction is executed, while the contents of LOC remain unaltered.
The assembly language statement below specifies the second example, adding two integers held in processor registers R1 and R2 and storing their sum in R3:

Add R1, R2, R3
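The effect of the two statements above can be illustrated with a toy interpreter in Python (our own sketch; real processors execute binary-encoded instructions, not text):

```python
# A tiny interpreter for the two assembly statements shown above:
# Move LOC,R1 overwrites R1 with the contents of LOC;
# Add R1,R2,R3 places [R1] + [R2] into R3.
memory = {"LOC": 10}
regs = {"R1": 0, "R2": 5, "R3": 0}

def execute(instr):
    op, operands = instr.split(None, 1)
    ops = [o.strip() for o in operands.split(",")]
    if op == "Move":
        regs[ops[1]] = memory[ops[0]]          # destination is rewritten
    elif op == "Add":
        regs[ops[2]] = regs[ops[0]] + regs[ops[1]]

execute("Move LOC, R1")
assert regs["R1"] == 10 and memory["LOC"] == 10   # LOC is unaltered
execute("Add R1, R2, R3")
assert regs["R3"] == 15
```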


1.5 Conclusion

• A computer is a high-speed electronic calculating machine that accepts digital input, processes it according to internally stored instructions (programs), and outputs the result on a display device.
• In general, computer architecture encompasses three areas of computer design: computer hardware, instruction set architecture, and computer organization.
• A computer consists of five functional units: an input unit, an output unit, a memory unit, an arithmetic and logic unit, and a control unit.
• The memory unit holds the programme instructions (code), data, and computing results, among other things.
• A programme containing a list of instructions is stored in the memory to accomplish a certain task.
• A bus is a collection of lines that act as a connecting path for multiple devices (one bit per line).
• The clock is a timing signal that controls processor circuits.
• Machine instructions are commands or programmes encoded in machine code that can be recognised and executed by a machine (computer).
• Representing (or encoding) a number signifies expressing the number in binary form.
• Binary arithmetic includes binary addition, subtraction, multiplication and division.
1.6 Glossary

• Computer: An electronic calculating machine that accepts digital input, processes it according to internally stored instructions (programs), and outputs the result on a display device.
• Programme: A list of instructions stored in the memory to accomplish a certain task.
• Bus: A collection of lines that act as a connecting path for multiple devices (one bit per line).
• Processor clock: A timing signal that controls processor circuits.
• Machine instruction: A set of bytes in memory that instructs the processor to carry out a certain task.
• Workstation: A powerful desktop computer with specialized functions.
• Memory operations: The basic operations, Load and Store, by which programme instructions and data operands are moved between the memory and the processor.
• Number representation: Expressing a number in binary form; both positive and negative numbers must be representable.

1.7 Self-Assessment Questions



A. Multiple Choice Questions


1. The bus used to connect the monitor to the CPU is ______
a. PCI bus
b. SCSI bus
c. Memory bus
d. Rambus


2. Which of the following is a small, portable computer that can be powered by a power supply or a
battery?
a. Laptop computer b. Desktop computer
c. Workstation d. Super computer
3. Which of the following computer is also known as Personal Digital Assistant?
a. Laptop computer b. Mainframe computer
c. Handheld computer d. Super computer

4. A computer's memory stores:
a. Number b. Character operands
c. Instructions d. All of these
5. A __________ is a collection of lines that act as a connecting path for multiple devices (one bit per line).
a. Circuit b. Bus
c. Memory d. Processor
6. Clock cycles are regular time periods created by the ______________.
a. Programmer b. Circuit programmer
c. Clock designer d. CPU
7. Which of the following techniques, involving overlapping the execution of successive instructions, can result in significant speed gains?
a. Clock rate b. Pipelining
c. Throughput d. All of these
8. Which of the following is a collection of instructions performed directly by a computer's central processing unit?
a. Machine code b. Processor language
c. Both a and b d. None of these
9. Which of the following is a binary representation of integers?
a. Signed and magnitude representation b. 1's (or one's) complement
c. 2's (or two's) complement d. All of these
10. Which of the following is used to communicate the processed results to the rest of the world?
a. Input unit b. Output unit
c. Control unit d. None of these

B. Essay Type Questions


1. Different types of computers are used for different purposes. Discuss the different types of computers.
2. Discuss the functional units of a computer.
3. Machine instructions are commands or programmes encoded in machine code that can be
recognised and executed by a machine. Discuss.


4. Encoding a number signifies expressing the number in binary form. What are the three binary
representations of integers?
5. The memory stores both programme instructions and data operands. Discuss the concept of memory
operations.

1.8 Answers and Hints for Self-Assessment Questions

A. Answers to Multiple Choice Questions

Q. No.   Answer
1.       b. SCSI bus
2.       a. Laptop computer
3.       c. Handheld computer
4.       d. All of these
5.       b. Bus
6.       c. Clock designer
7.       b. Pipelining
8.       a. Machine code
9.       d. All of these
10.      b. Output unit

B. Hints for Essay Type Questions


1. The different types of computers are:
• Micro computer: A personal computer designed to satisfy the needs of a single user, giving access to a wide range of computer applications, including word processing, photo editing, e-mail, and the internet. Refer to Section Basic Computer Structure.
2. In its most basic form, a computer consists of five functional units: an input unit, an output unit, a memory unit, an arithmetic and logic unit, and a control unit. Refer to Section Basic Computer Structure.
3. A machine instruction is a set of bytes in memory that instructs the processor to carry out a certain task. The CPU goes through the machine instructions in main memory one by one, performing one machine action for each. A machine language programme is a collection of machine instructions stored in main memory. Refer to Section Machine Instructions.
4. The representation of numbers is necessary for storing and manipulating the numbers efficiently. There are several ways to represent integers in binary form. Three binary representations of integers are listed below:
• Signed and magnitude representation
• 1's (or one's) complement
• 2's (or two's) complement
Refer to Section Machine Instructions.


5. The processor control circuits must cause the word (or words) holding an instruction to be moved from memory to the processor in order for it to be executed. Operands and results must likewise be transmitted between the memory and the CPU. Hence, two basic memory operations need to be executed: load (or read or fetch) and store (or write). Refer to Section Memory Operation.

1.9 Post-Unit Reading Material

• https://www.studocu.com/in/document/psg-college-of-technology/computer-architecture/m-morris-mano-solution-manual-computer-system-architecture/10775236
• http://sites.alaqsa.edu.ps/ekhsharaf/wp-content/uploads/sites/333/2016/03/machine-instruction-and-programs.pdf

1.10 Topics for Discussion Forums

• Do online research on the basics of computers and make a presentation giving in-depth knowledge of the internal working, structure, and implementation of a computer system. Also, discuss the concepts covered in the presentation with your classmates.

UNIT 02

Machine Instructions and Programs
Names of Sub-Units
Introduction to Machine Instructions and Programs, Addressing Modes, Assembly Language, Basic Input and Output Operations, Subroutines, Additional Instructions, Encoding of Machine Instructions.

Overview
This unit begins by discussing machine instructions and programs. Next, it discusses addressing modes and assembly language. Further, it explains basic input and output operations, and also discusses subroutines. Towards the end, the unit discusses the encoding of machine instructions.

Learning Objectives
In this unit, you will learn to:
• Explain the different addressing modes
• Describe how machine instructions, data, and programmes are represented in assembly language
• Discuss the input/output activities that are controlled by a programme
• Explain the branching and subroutine call and return actions
• Describe machine instructions and programme execution

Learning Outcomes

At the end of this unit, you would:


• Assess the knowledge about the different addressing modes
• Examine the role of machine instructions, data, and programmes in assembly language
• Evaluate the input/output activities
• Analyse the subroutine call and return actions
• Assess the machine instructions and programme execution

Pre-Unit Preparatory Material

• http://www.mhhe.com/engcs/electrical/hamacher/5e/graphics/ch02_025-102.pdf
2.1 Introduction

Machine instructions and operand addressing information are represented by symbolic names in assembly language. The complete instruction set of a CPU is referred to as its instruction set architecture (ISA). This unit presents machine instructions and operand addressing techniques that are common in commercial processors. We need a sufficient set of instructions and addressing techniques to present complete, realistic programmes for simple tasks, and assembly language is used to specify these generic programmes.

2.2 Addressing Modes


We have now looked at a few simple assembly language applications. A programme, in general, works with data stored in the computer's memory. These data can be arranged in a number of ways. We can keep track of students' names by writing them down in a list, and we may organise this information in the form of a table if we wish to associate information with each name, such as phone numbers or grades in particular courses. Programmers use data structures to represent the data used in computations; lists, linked lists, arrays, and queues are examples of such data structures.
Constants, local and global variables, pointers, and arrays are often used in high-level programming languages. When converting a high-level language programme into assembly language, the compiler must be able to implement these structures using the capabilities given in the instruction set of the machine on which the programme will run. Addressing modes refer to the different ways in which the location of an operand is specified in an instruction.

The generic addressing modes are shown in Table 1:



Table 1: Generic Addressing Modes


Name                         Assembler Syntax    Addressing Function
Immediate                    #Value              Operand = Value
Register                     Ri                  EA = Ri
Absolute (Direct)            LOC                 EA = LOC
Indirect                     (Ri)                EA = [Ri]
                             (LOC)               EA = [LOC]
Index                        X(Ri)               EA = [Ri] + X
Base with index              (Ri,Rj)             EA = [Ri] + [Rj]
Base with index and offset   X(Ri,Rj)            EA = [Ri] + [Rj] + X
Relative                     X(PC)               EA = [PC] + X
Autoincrement                (Ri)+               EA = [Ri]; Increment Ri
Autodecrement                -(Ri)               Decrement Ri; EA = [Ri]

The description of these addressing modes is as follows:

V
• Register mode: The operand is the contents of a processor register; the register's name (address) is specified in the instruction.
• Absolute mode: The operand is stored in memory, and the location's address is specified directly in the instruction. (This mode is known as Direct in various assembly languages.)

Move LOC,R2

• Immediate mode: The operand is given explicitly in the instruction. For example, the instruction

Move 200immediate, R0

places the value 200 in register R0. Obviously, the Immediate mode is only used to specify the value of a source operand. Writing a subscript like this is not practical in assembly languages; a frequent convention is instead to use the sharp sign (#) in front of a value to signal that it is to be used as an immediate operand. As a result, we write the instruction in the following format:

Move #200,R0

Constant values are commonly employed in high-level language applications. For example, the statement

A = B + 6

contains the constant 6. Assuming A and B have been declared as variables and can be accessed using the Absolute mode, this statement can be compiled as follows:

Move B,R1
Add #6,R1
Move R1,A

In assembly language, constants are also used to increment a counter, test for a bit pattern, and so on.


• Indirect mode: The effective address of the operand is the contents of a register or memory location whose address is specified in the instruction.
The indirect method is divided into two types based on where the effective address is held:
1. Register indirect: The effective address is kept in a register, and the register's name is kept in the instruction's address field. To access the data, one register reference and one memory reference are required.
2. Memory indirect: The effective address is stored in memory, and that memory address is stored in the instruction's address field. To access the data, two memory references are necessary.
D
references are necessary.
zz Index mode: By adding a constant value to the contents of a register, the operand’s effective address

E
is produced.

V
Example: MOV AX, [SI +05]
Relative mode: The Index mode determines the effective address by utilising the programme counter

R
zz
instead of the general-purpose register Ri.

E
zz Auto increment mode: The contents of a register provided in the instruction are the operand’s
effective address. The contents of this register are automatically increased to refer to the next item

S
in a list after accessing the operand.

E
Add R1, (R2)+ // OR
R1 = R1 +M[R2]
R
R2 = R2 + d
Auto decrement mode: The contents of a register provided in the instruction are decremented
T

zz
automatically before being utilised as the operand’s effective address.
H

Add R1,-(R2) //OR


IG

R2 = R2-d
R1 = R1 + M[R2]
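The addressing functions listed in Table 1 can be illustrated with a small Python sketch (the mode names and the function itself are our own, for illustration only):

```python
# Compute the effective address (EA) for a few of the addressing modes,
# given register contents, an offset X, and the programme counter PC.
def effective_address(mode, regs, X=0, Ri="R1", Rj="R2", PC=0x1000):
    if mode == "indirect":
        return regs[Ri]                     # EA = [Ri]
    if mode == "index":
        return regs[Ri] + X                 # EA = [Ri] + X
    if mode == "base_index_offset":
        return regs[Ri] + regs[Rj] + X      # EA = [Ri] + [Rj] + X
    if mode == "relative":
        return PC + X                       # EA = [PC] + X

regs = {"R1": 0x200, "R2": 0x10}
assert effective_address("indirect", regs) == 0x200
assert effective_address("index", regs, X=8) == 0x208
assert effective_address("base_index_offset", regs, X=4) == 0x214
assert effective_address("relative", regs, X=-16) == 0xFF0
```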
R

2.3 Assembly Language


Patterns of 0s and 1s are used to represent machine instructions. Such patterns are awkward to deal with when discussing or preparing programmes. As a result, we refer to the patterns by symbolic names. To describe the matching binary code patterns, we have so far used common words like Move, Add, Increment, and Branch for the instruction operations. When creating programmes for a particular machine, such words are usually replaced with abbreviations called mnemonics, such as MOV, ADD, INC, and BR. In the same way, we use the notation R3 to refer to register 3 and LOC to refer to a memory location. A complete set of such symbolic names and the rules for their use constitute a programming language, generally known as an assembly language. The syntax of the language is the set of rules for employing mnemonics in the specification of full instructions and programmes.
A programme called an assembler can automatically transform programmes written in an assembly
language into a series of machine instructions. The assembler programme is one of the utility
applications included in the system software. The assembler, like any other programme, is stored in
the computer’s memory as a series of machine instructions. A user programme is typically typed into


a computer through the keyboard and saved in memory or on a magnetic drive. At this stage, the
user programme is nothing more than a series of alphanumeric characters on a line. When you run
the assembler programme, it reads the user programme, analyses it, and then creates the machine
language programme you want. The latter comprises patterns of 0s and 1s that indicate the computer’s
instructions to be performed. A source programme is the user programme in its original alphanumeric
text format, while an object programme is the constructed machine language programme.
A computer’s assembly language may or may not be case sensitive, that is, it may or may not differentiate
between capital and lower case letters. To increase the readability of the text, we will use capital letters
to designate all names and labels in our examples. We’ll write a Move instruction as follows:

MOVE R0,SUM

The mnemonic MOVE represents the binary pattern, or OP code, for the operation performed by the instruction. The assembler translates this mnemonic into the binary OP code that the computer understands. The OP-code mnemonic is followed by at least one blank space character. Then comes the information that specifies the operands. The source operand in our example is in register R0. This information is followed by the specification of the destination operand, separated from the source operand by a comma. The destination operand is in the memory location SUM, which is represented by its binary address.

S
Because there are numerous different ways to define operand locations, the assembly language must
declare which one is being utilised. To represent the Absolute mode, a numerical number or a word
E
used alone, such as SUM in the previous command, might be used. An immediate operand is generally
R
denoted by a sharp sign. As a result, the directive is:
ADD #5,R3
T

adds the number 5 to the contents of register R3 and returns the result to R3. The sharp symbol isn’t
the only technique to indicate that the mode is Immediate. The desired addressing mode is specified in
H

the OP code mnemonic in various assembly languages. In this example, distinct OP-code mnemonics
are used for different addressing modes for the same instruction. The preceding Add instruction, for
IG

example, might be expressed as:


ADDI 5,R3
R

The suffix I indicate that the source operand is supplied in the Immediate addressing mode in the
mnemonic ADDI. Indirect addressing is often defined by placing parentheses around the name or
Y

symbol indicating the operand’s pointer. If the number 5 is to be stored in a memory location whose
address is maintained in register R2, for example, the desired action can be expressed as:
P

MOVE #5,(R2)
O

MOVEI 5,(R2)
C

2.3.1 Assembler Directives


The assembly language allows the programmer to provide extra information needed to transform
the source programme into the object programme in addition to providing a system for expressing
instructions in a programme. We’ve already established that any names used in a programme must be
given numerical values. Assume that the number 200 is represented by the term SUM. This information
may be sent to the assembly programme with a statement like:
SUM EQU 200
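The effect of an EQU directive can be sketched as a symbol-table update (a minimal illustration of what an assembler might do; the function name is ours):

```python
# An assembler keeps a symbol table mapping names to numeric values.
# An EQU directive simply records a value for a name; every later use of
# the name in the source programme is replaced by that value.
symbol_table = {}

def directive_equ(name, value):
    symbol_table[name] = value

directive_equ("SUM", 200)   # corresponds to: SUM EQU 200
assert symbol_table["SUM"] == 200
```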


2.3.2 Assembly and Execution of Programs


Before a source programme written in assembly language can be run, it must first be assembled into
a machine language object programme. The assembler programme does this by replacing all symbols
indicating operations and addressing modes with binary codes used in machine instructions, as well as
all names and labels with their real values.
Starting at the location specified in the ORIGIN assembler directives, the assembler assigns addresses to
instructions and data blocks. It also inserts constants specified in DATAWORD instructions and reserves
memory space when RESERVE directives are sent.

D
2.4 Basic Input and Output Operations

E
An efficient mode of communication between the central system and the outside environment is provided

V
by a computer’s I/O subsystem. It is in charge of the computer system’s entire input-output operations.

R
2.4.1 Peripheral Devices

E
Peripheral devices are input or output devices that are connected to a computer. These devices, which
are regarded to be part of a computer system, are designed to read information into or out of the

S
memory unit in response to commands from the CPU. Peripherals are another name for these gadgets.
For example: Keyboards, display units and printers are common peripheral devices.
E
There are three types of peripherals:
R
1. Input peripherals: These devices allow users to enter data from the outside world into the computer.
For instance, a keyboard, a mouse, and so on.
T

2. Output peripherals: These devices allow data to be sent from the computer to the outside world. For
instance, a printer, a monitor, and so on.
H

3. Input-output peripherals: Allow both input (from the outside world to the computer) and output (from the computer to the outside world). For instance, a touch screen.

a touch screen.
R

2.4.2 Programmed I/O


I/O instructions written in a computer programme result in programmed I/O instructions. The
Y

program’s command triggers the transmission of each data item. Typically, the software is in charge
of data transmission between the CPU and peripherals. Transferring data through programmed I/O
P

necessitates the CPU’s continual supervision of the peripherals.


O

2.4.3 Interrupt Initiated I/O


C

The CPU remains in the programme loop in the programmed I/O technique until the I/O unit indicates
that it is ready for data transfer. This is a time-consuming procedure since it wastes the processor’s
time.
Interrupt started I/O can be used to solve this problem. The interface creates an interrupt when it
determines that the peripheral is ready for data transmission. When the CPU receives an interrupt
signal, it stops what it’s doing and handles the I/O transfer before returning to its prior processing
activity.


2.4.4 Direct Memory Access


The speed of transmission would be improved by removing the CPU from the pipeline and allowing the
peripheral device to control the memory buses directly. DMA is the name for this method.
The interface transfers data to and from memory through the memory bus in this case. A DMA controller
is responsible for data transmission between peripherals and the memory unit.
DMA is used by many hardware systems, including disc drive controllers, graphics cards, network cards,
and sound cards, among others. In multicore CPUs, it’s also utilised for intra-chip data transmission.
In DMA, the CPU would start the transfer, perform other tasks while it was running, and then get an

D
interrupt from the DMA controller when the transfer was finished.

E
2.5 Subroutines

V
In a given programme, it’s common to have to repeat a subtask several times with various data values.
A subroutine is a term used to describe such a subtask.

R
It is feasible to include the block of instructions that make up a subroutine at any point in the programme

E
where it is required. However, to save memory, just one copy of the instructions that make up the
subroutine is stored in memory, and any programme that needs to utilise it simply branches to the

S
beginning of the subroutine. We say that a programme is invoking a subroutine when it branches to it.
Call is the name of the instruction that conducts this branch action.
E
The calling programme must continue execution once a subroutine has been run, commencing
R
immediately after the instruction that called the function. By performing a Return instruction, the
subroutine is said to return to the programme that called it. Because the subroutine may be called from
several locations in a calling programme, it must be possible to return to the correct position. While
T

the Call instruction is being executed, the updated PC points to the place where the calling programme
resumes execution. As a result, the Call instruction must preserve the contents of the PC in order to
H

return to the caller application correctly.


IG

The subroutine linkage technique is the mechanism through which a computer allows you to call and
return from subroutines. The most basic way of linking subroutines is to save the return address in a
specified place, which may be a register devoted to this function. The link register is a type of register. The
R

Return instruction returns to the caller programme by branching indirectly through the link register
after the subroutine has completed its work.
Y

The Call instruction is a special branch instruction that performs the following tasks:
• Save the contents of the PC in the link register.
• Branch to the target address specified in the instruction.

Return is a special branch instruction that performs the following operation:
• Branch to the address contained in the link register.
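The link-register mechanism can be modelled with a short Python sketch (our own simplification; the address values are arbitrary):

```python
# Model subroutine linkage with a link register: Call saves the return
# address (the updated PC) before branching; Return branches back through it.
state = {"PC": 100, "LINK": None}

def call(target):
    state["LINK"] = state["PC"] + 1   # address of the instruction after the Call
    state["PC"] = target

def ret():
    state["PC"] = state["LINK"]       # resume the calling programme

call(500)                             # branch to a subroutine at address 500
assert state["PC"] == 500 and state["LINK"] == 101
ret()                                 # execution resumes just after the Call
assert state["PC"] == 101
```

Note that with a single link register, a subroutine that itself calls another subroutine must first save the link register elsewhere, which is why real processors also use a stack for nested calls.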

2.6 Additional Instructions


So far, we’ve covered Move, Load, Store, Clear, Add, Subtract, Increment, Decrement, Branch, Testbit,
Compare, Call, and Return commands. We were able to build routines to demonstrate machine
instruction sequencing, including branching and the subroutine structure, using these 13 instructions
and the addressing modes. The fundamental memory-mapped I/O operations were also shown. Move
can be used in place of Store, while Add and Subtract can be used in place of Increment and Decrement,


respectively. A Move instruction with a zero immediate operand can also be used to replace Clear. As a
result, simply 8 instructions would have sufficed for our needs. However, considerable redundancy in
real machine instruction sets is not uncommon. Certain basic procedures may typically be carried out
in a variety of ways. Some options could be more effective than others. We’ll go over a couple more key
instructions that are included in most instruction sets in this section.
zz Logic Instructions: The basic building blocks of digital circuits are logic operations such as AND, OR,
and NOT applied to individual bits. It’s also useful to be able to conduct logic operations in software,
which is done using instructions that apply these operations separately and in parallel to all bits of
a word or byte. For instance, consider the instruction.

D
Not dst

E
zz Shift and Rotate Instructions: Many applications demand that the bits of an operand be shifted
right or left by a certain amount of bit positions. Whether the operand is a signed integer or some

V
more generic binary-coded information determines the specifics of how the shifts are executed. We
employ a logical shift for generic operands. We utilise an arithmetic shift for a number, which keeps

R
the integer’s sign.
zz Logical Shifts: Two logical shift instructions are required, one for left (LShiftL) and one for
right (LShiftR). These instructions shift an operand across a number of bit positions given in the
instruction’s count operand. A logical left shift instruction might take the form:

LShiftL count,dst
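Because Python integers are unbounded, a fixed word size must be imposed by masking. The sketch below (a 32-bit word is assumed) contrasts the two logical shifts with an arithmetic right shift, which replicates the sign bit instead of inserting zeros.

```python
WORD = 32
MASK = (1 << WORD) - 1

def lshiftl(x, count):
    # Logical shift left: zeros enter on the right, bits shifted out are lost
    return (x << count) & MASK

def lshiftr(x, count):
    # Logical shift right: zeros enter on the left
    return (x & MASK) >> count

def ashiftr(x, count):
    # Arithmetic shift right: the sign bit is replicated, preserving the sign
    if x & (1 << (WORD - 1)):   # operand is negative in 2's complement
        x |= ~MASK              # sign-extend into Python's unbounded integer
    return (x >> count) & MASK
```

For a positive operand the two right shifts agree; they differ only when the sign bit is 1.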

2.7 Encoding of Machine Instructions


A number of helpful instructions and addressing modes have been implemented. These instructions
define the steps that the processor circuitry must do in order to complete the tasks. We’ve referred to
them as “machine instructions” on several occasions. Actually, the format in which the instructions were
provided is similar to that used in assembly languages, with the exception that we attempted to avoid
using acronyms for the various processes, which are difficult to remember and are likely to be peculiar
to a certain commercial processor. An instruction must be encoded in a compact binary pattern in order
to be executed in a processor. Machine instructions are the correct term for such encoded instructions.
Assembly instructions are those that employ symbolic names and acronyms.
The assembler software is used to transform the language instructions into machine instructions. We’ve
seen instructions for adding, subtracting, moving, shifting, rotating, and branching. These instructions
can employ a variety of operands, including 32-bit and 8-bit integers, as well as 8-bit ASCII-encoded
characters. An encoded binary pattern known as the OP code for the given instruction can be used to
specify the type of operation to be performed and the type of operands to be used. Assume that 8 bits are
set aside for this purpose, allowing you 256 distinct ways to define various instructions. The remaining
24 bits are used to specify the remaining information.
Let’s look at a few examples. The instruction is:
Add R1, R2
In addition to the OP code, the registers R1 and R2 must be specified. If the CPU has 16 registers, each one
requires four bits to identify. For each operand, additional bits are required to signal that the Register
addressing mode is being utilised.
The instruction is:
Move 24(R0), R5

UNIT 02: Machine Instructions and Programs JGI JAIN
DEEMED-TO-BE UNIVERSITY

16 bits are needed to represent the OP code and the two registers, as well as a few bits to indicate that
the source operand utilises Index addressing mode and that the index value is 24.
The shift instruction is:
LShiftR #2, R0
The move instruction is:
Move #$3A, R1

In addition to the 18 bits needed to describe the OP code, the addressing modes, and the register, the
immediate values 2 and #$3A must be specified. The size of the immediate operand is thus limited to

what can be expressed in 14 bits. Consider next the branch instruction:
Branch >0 LOOP

The OP code is 8 bits again, leaving 24 bits to define the branch offset. Because the offset is a
2’s-complement integer, the branch target address must be within 2^23 bytes of the branch instruction’s
position. An alternative addressing mode, such as Absolute or Register Indirect, must be used to branch
to an instruction beyond this range. Jump instructions are branch instructions that make use of these
modes.

In all these examples, the instructions can be encoded in a 32-bit word, as shown in Figure 1:
 8         7         7         10
OP code   Source    Dest    Other info
(a) One-word instruction

OP code   Source    Dest    Other info
Memory address/Immediate operand
(b) Two-word instruction

OP code    Ri    Rj    Other info
(c) Three-operand instruction

Figure 1: Encoding Instructions in a 32-bit Word



Figure 1(a) depicts a possible format. There is an 8-bit OP-code field and two 7-bit fields for specifying
the source and destination operands. Each 7-bit field identifies the addressing mode and the register
involved (if any). The “Other info” field allows us to specify additional information that may be needed,
such as an index value or an immediate operand.
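The packing of the one-word format can be sketched directly. The field widths below come from Figure 1(a); the OP-code value and register numbers are hypothetical, not from any real processor.

```python
def encode(opcode, src, dst, other=0):
    # 8-bit OP code | 7-bit source | 7-bit destination | 10 bits of other info
    assert 0 <= opcode < 2**8 and 0 <= src < 2**7
    assert 0 <= dst < 2**7 and 0 <= other < 2**10
    return (opcode << 24) | (src << 17) | (dst << 10) | other

def decode(word):
    # Recover the four fields from a 32-bit instruction word
    return ((word >> 24) & 0xFF, (word >> 17) & 0x7F,
            (word >> 10) & 0x7F, word & 0x3FF)

ADD = 0x01                          # hypothetical OP code for Add
word = encode(ADD, src=1, dst=2)    # Add R1, R2 in Register mode
assert decode(word) == (ADD, 1, 2, 0)
```

With 4 bits naming one of 16 registers and 3 bits naming its addressing mode, each 7-bit field is exactly filled, which is why the format works out to 8 + 7 + 7 + 10 = 32 bits.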
But what if we wish to use the Absolute addressing mode to specify a memory operand?
The instruction
Move R2, LOC


The OP code, the addressing modes, and the register all require 18 bits. This leaves 14 bits to express the
LOC-related address, which is plainly insufficient. The two-word format of Figure 1(b) solves this
problem: the second word can hold a full 32-bit address. The same format also accommodates a 32-bit
immediate operand, as in the instruction:
And #$FF000000, R2
in which case the second word gives the full 32-bit immediate operand. If we wish to use the Absolute
addressing mode to provide two operands for an instruction, we can do so, for example:
Move LOC1, LOC2
The operands’ 32-bit addresses must then be represented using two extra words. This method yields

instructions of varying lengths, depending on the number of operands and addressing modes employed.
We may create fairly complicated instructions using several words, which are quite similar to operations

in high-level programming languages. The name “complex instruction set computer” (CISC) has been
applied to processors that employ this sort of instruction set. The requirement that an instruction take

up just one word has resulted in a type of computer called as a limited instruction set computer (RISC).
Other constraints imposed by the RISC method include the need that all data modification is done on

operands existing in processor registers. Because of this limitation, the above addition would need a
two-instruction sequence:

Move (R3), R1

Add R1, R2
Only a part of a 32-bit word is required if the Add instruction only needs to identify the two registers.
As a result, we may be able to give a more powerful command with three operands:
Add R1, R2, R3

which performs the operation:


R3 = [R1] + [R2]
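The contrast between the RISC-style two-instruction sequence and the three-operand Add can be illustrated with a toy register file; the register contents and memory value below are invented for the example.

```python
R = {"R1": 0, "R2": 10, "R3": 0x100}   # toy register file
mem = {0x100: 32}                      # memory word addressed by [R3]

# Move (R3), R1 : load through Register Indirect addressing
R["R1"] = mem[R["R3"]]
# Add R1, R2    : two-operand add, R2 <- [R1] + [R2]
R["R2"] = R["R1"] + R["R2"]

def add3(regs, r1, r2, r3):
    # Three-operand form: Add R1, R2, R3 performs R3 <- [R1] + [R2]
    regs[r3] = regs[r1] + regs[r2]
```

The two-step sequence reflects the RISC restriction that data modification happens only on operands already in processor registers; the three-operand form trades encoding space for fewer instructions.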

2.8 Conclusion

zz Machine instructions and operand addressing information are represented by symbolic names in
assembly language.
zz The assembly language is used to specify these generic programmes.
zz Data structures are used by programmers to represent the data utilised in computations.

zz Constants, local and global variables, pointers, and arrays are often used in high-level programming
languages.

zz Addressing modes relate to the many methods in which an operand’s location is stated in an
instruction.

zz A programme called an assembler can automatically transform programmes written in an assembly


language into a series of machine instructions.
zz The assembler programme is one of the utility applications included in the system software.
zz An efficient mode of communication between the central system and the outside environment is
provided by a computer’s I/O subsystem. It is in charge of the computer system’s entire input-output
operations.


zz Peripheral devices are input or output devices that are connected to a computer.
zz I/O instructions written in a computer programme result in programmed I/O instructions.
zz The CPU remains in the programme loop in the programmed I/O technique until the I/O unit
indicates that it is ready for data transfer.
zz The speed of transmission would be improved by removing the CPU from the transfer path and allowing
the peripheral device to control the memory buses directly. DMA is the name for this method.
zz An instruction must be encoded in a compact binary pattern in order to be executed in a processor.
Machine instructions are the correct term for such encoded instructions.

zz The assembler software is used to transform the language instructions into machine instructions.

zz An encoded binary pattern known as the OP code for the given instruction can be used to specify the
type of operation to be performed and the type of operands to be used.

2.9 Glossary

zz Indirect mode: The contents of a register or memory location whose address is specified in the
instruction are the operand’s effective address.
zz Index mode: By adding a constant value to the contents of a register, the operand’s effective address
is produced.
zz Relative mode: The Index mode determines the effective address by utilising the programme counter
instead of the general-purpose register Ri.
zz Auto increment mode: The contents of a register provided in the instruction are the operand’s
effective address. The contents of this register are automatically increased to refer to the next item
in a list after accessing the operand.
zz Auto decrement mode: The contents of a register provided in the instruction are decremented
automatically before being utilised as the operand’s effective address.
zz Register mode: The operand is the contents of a processor register; the register’s name (address) is
specified in the instruction.
zz Absolute mode: The operand is stored in memory, and the location’s address is specified directly in
the instruction.
zz Immediate mode: The operand is given explicitly in the instruction.

2.10 Self-Assessment Questions



A. Multiple Choice Questions



1. The field for the operation code can be found in __________.


a. programming language instruction b. assembly language instruction
c. machine language instruction d. none of the mentioned
2. Which of the following are the elements of a machine language instruction format?
a. Operand field b. Operation code field
c. Operation code field &amp; operand field d. none of the mentioned


3. The one-byte instruction has a length of __________.


a. 2 bytes b. 1 byte
c. 3 bytes d. 4 bytes
4. The ‘register to register’ instruction format has a length of __________.
a. 2 bytes b. 1 byte
c. 3 bytes d. 4 bytes
5. In a computer instruction format, the R/M field provides __________.

a. another register b. another memory location
c. other operands d. all of the mentioned

6. In a machine instruction format, S-bit is the __________.

a. status bit b. sign bit

c. sign extension bit d. none of the mentioned
7. The bit that the ‘REP’ instruction uses is __________.

a. W-bit b. S-bit

c. V-bit d. Z-bit

8. The operand is of __________ if a W-bit value is ‘1’.


a. 8 bits b. 4 bits
c. 16 bits d. 2 bits
9. In which of the following mode the operand is stored in memory, and the location’s address is
specified directly in the instruction?



a. Autodecrement mode
b. Register mode
c. Absolute mode
d. Immediate mode

10. Instructions that transfer control to a preset address or an address specified in the instruction are
referred to as __________.

a. sequential control flow instructions


b. control transfer instructions

c. sequential control flow & control transfer instructions



d. none of the mentioned


11. JUMP is a command that belongs to __________.

a. sequential control flow instructions b. control transfer instructions


c. branch instructions d. control transfer & branch instructions
12. The addressing mode of the instruction MOV AX, 0005H is __________.
a. register b. direct
c. immediate d. register relative


13. The instruction, MOV AX, 1234H is an example of __________.


a. register addressing mode b. direct addressing mode
c. immediate addressing mode d. based indexed addressing mode
14. The instruction, MOV AX, [2500H] is an example of __________.
a. immediate addressing mode b. direct addressing mode
c. indirect addressing mode d. register addressing mode
15. The instruction, MOV AX,[BX] is an example of __________.

a. direct addressing mode b. register addressing mode
c. register relative addressing mode d. register indirect addressing mode

B. Essay Type Questions
1. Addressing modes relate to the many methods in which an operand’s location is stated in an

instruction. Discuss the different types of addressing modes.
2. What do you understand by assembly language?

3. Peripheral devices are input or output devices that are connected to a computer. Discuss the different

types of peripheral devices.
4. Write a brief note on direct memory access.
5. Discuss the shift and rotate instructions.
2.11 Answers AND HINTS FOR Self-Assessment Questions

A. Answers to Multiple Choice Questions



Q. No. Answer
1. c. machine language instruction
2. c. Operation code field & operand field

3. b. 1 byte

4. a. 2 bytes
5. d. all of the mentioned

6. c. sign extension bit



7. d. Z-bit
8. c. 16 bits

9. c. Absolute mode
10. b. control transfer instructions
11. d. control transfer & branch instructions
12. c. immediate
13. c. immediate addressing mode
14. b. direct addressing mode


Q. No. Answer
15. d. register indirect addressing mode

B. Hints for Essay Type Questions


1. Different types of addressing modes are as follows:
 Register mode: The operand is the contents of a processor register; the register’s name (address)
is specified in the instruction.
Absolute mode: The operand is stored in memory, and the location’s address is specified directly


in the instruction. (This mode is known as Direct in various assembly languages.)

Refer to Section Addressing Modes
2. A programming language, often known as an assembly language, is made up of a comprehensive
collection of such symbolic names and rules for their use. Refer to Section Assembly Language

3. There are three types of peripherals:
Input peripherals: These devices allow users to enter data from the outside world into the


computer. For instance, a keyboard, a mouse, and so on.
Refer to Section Basic Input and Output Operations E
4. The speed of transmission would be improved by removing the CPU from the transfer path and allowing
the peripheral device to control the memory buses directly. DMA is the name for this method. The
interface transfers data to and from the memory through the memory bus in this case. A DMA
controller is responsible for data transmission between peripherals and the memory unit. Refer to
Section Basic Input and Output Operations



5. Many applications demand that the bits of an operand be shifted right or left by a certain amount
of bit positions. Whether the operand is a signed integer or some more generic binary-coded

information determines the specifics of how the shifts are executed. We employ a logical shift for
generic operands. We utilise an arithmetic shift for a number, which keeps the integer’s sign. Refer
to Section Additional Instructions

2.12 Post-Unit Reading Material

zz https://www.studocu.com/in/document/psg-college-of-technology/computer-architecture/m-morris-mano-solution-manual-computer-system-architecture/10775236
zz https://www.educba.com/what-is-assembly-language/

2.13 Topics for Discussion Forums

zz Discuss with your classmates the manner in which sequences of instructions are transferred
from memory into the processor.

UNIT

03

Input/Output (I/O) Organisation

Names of Sub-Units
Introduction to Input/Output (I/O) Organisation, Accessing I/O Devices, Interrupts: Interrupt
Hardware, Direct Memory Access, Buses, Standard I/O Interfaces: PCI Bus, SCSI Bus, Universal Serial
Bus (USB).

Overview

This unit begins by learning about accessing the input/output devices. Then, the unit discusses the
interrupt hardware and direct memory access. Next, the unit explains the buses. Towards the end, the
unit discusses the standard I/O interfaces.

Learning Objectives

In this unit, you will learn to:



aa Explain the significance of accessing the I/O devices


aa Describe the concept of hardware interrupt

aa Define the use of direct memory access



aa Explain the concept of buses


aa Describe the standard input/output interfaces

Learning Outcomes

At the end of this unit, you would:


aa Analyse the use of accessing the I/O devices
aa Assess the knowledge about the hardware interrupt
aa Explore the use of direct memory access
aa Evaluate the concept of buses

aa Assess the knowledge about the standard input/output interfaces

Pre-Unit Preparatory Material

aa http://www.cse.iitm.ac.in/~vplab/courses/comp_org/Input_Output_Organization_11.pdf

3.1 Introduction

An efficient means of communication between the central system and the outside world is provided by
a computer’s I/O subsystem. It is in charge of the computer system’s input/output processes.
Peripheral devices are input or output devices that are connected to a computer. These devices, which
are regarded to be part of a computer system, are designed to read information into or out of the
memory unit in response to commands from the CPU.

3.2 Accessing I/O Devices



A single bus configuration is a straightforward way to connect I/O devices to a computer. The bus allows
all of the devices connected to it to communicate with one another. It usually has three sets of lines for

carrying address, data, and control signals. A unique set of addresses is assigned to each I/O device.
The device that recognises a certain address responds to the commands sent on the control lines when
the processor places that address on the address line. When I/O devices and memory share the same
address space, the CPU requests either a read or a write operation, and the required data is sent via the

data lines, the arrangement is known as memory-mapped.



Any machine instruction that can access memory can be used to transport data to or from an I/O device
when memory-mapped I/O is employed. For example, if DATAIN is the address of the keyboard’s input
buffer, the following instruction can be executed:



Move DATAIN, R0
The data from DATAIN is read and stored in processor register R0. Likewise, the instruction:

Move R0, DATAOUT


Sends the contents of register R0 to location DATAOUT, which might be a display unit’s or printer’s
output data buffer.
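The idea can be mimicked with a single table standing in for the shared address space, so that an ordinary read or write reaches a device register; the DATAIN and DATAOUT addresses below are assumed values, not real ones.

```python
DATAIN, DATAOUT = 0x4000, 0x4004   # hypothetical device-register addresses

address_space = {}                 # one space shared by memory and I/O
address_space[DATAIN] = ord('A')   # keyboard interface filled its input buffer

# Move DATAIN, R0 : an ordinary read reaches the device's data register
r0 = address_space[DATAIN]

# Move R0, DATAOUT : an ordinary write drives the output data buffer
address_space[DATAOUT] = r0
```

Nothing in the two transfers distinguishes an I/O register from a memory word — that uniformity is exactly what memory-mapped I/O buys.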
Memory-mapped I/O is used in the majority of computer systems. To execute I/O transfers, certain CPUs
feature specific In and Out instructions. When designing a computer system based on these CPUs, the
designer might link I/O devices to utilise the dedicated I/O address space or simply include them in the


memory address space. The low-order bits of the address bus are examined by the I/O devices to decide
if they should reply. To connect an I/O device to the bus, you’ll need the following hardware. When the
device’s address appears on the address lines, the address decoder allows it to recognise it. The data
register stores the information being sent to or received by the processor. The status register includes
information about the I/O device’s functioning. The data and status registers are both connected to the
data bus and have their own addresses. The device’s interface circuit consists of the address decoder,
data and status registers, and the control circuitry necessary to coordinate I/O transfers.
I/O devices run at rates that are significantly slower than the processor. When a human operator
types characters on a keyboard, the CPU may process millions of instructions in the time between each

D
character entry. Only when a character is available in the keyboard interface’s input buffer should an
instruction to read a character from the keyboard be performed. We must also ensure that each input

E
character is only read once. Figure 1 shows accessing the I/O devices:

[Figure: three ways of arranging I/O addresses — (a) separate I/O and memory address spaces, (b)
memory-mapped I/O, with one address space (0 to 0xFFFF...) holding both memory and I/O ports, and
(c) a hybrid of the two]

Figure 1: Accessing I/O Devices
Figure 1: Accessing I/O Device


H

3.3 Interrupts
IG

When a process or event requires immediate attention, hardware or software emits an interrupt
signal. It notifies the processor of a high-priority task that requires the present working process to be
interrupted. One of the bus control lines is devoted to this function and is known as the interrupt-request
line; the routine executed in response to an interrupt is called the Interrupt Service Routine (ISR).



3.3.1 Interrupt Hardware



We said that an I/O device requests an interrupt by activating the interrupt-request bus line. Almost all
computers have several I/O devices that can request an interrupt. As shown, a single interrupt-request

line can be utilised to service n devices. Switches to ground are used to connect all devices to the line. A
device shuts its connected switch to request an interrupt. The voltage on the interrupt-request line will

be equal to Vdd if all interrupt-request signals INTR1 to INTRn are inactive, that is, if all switches are
open. This is the line’s dormant state. Because the line voltage will drop to zero if one or more switches
are closed, the value of INTR is the logical OR of the requests from individual devices, that is,
INTR = INTR1 + … + INTRn
The complemented form, INTR, is commonly used to refer to the interrupt-request signal on the common
line, which is active while the voltage is low.
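The open-drain arrangement behaves as a wired OR, which can be checked with a few lines of Python; here True stands for a closed switch, that is, an active request.

```python
def intr_line(requests):
    # Any closed switch pulls the common line low, so the active-low line
    # realises INTR = INTR1 + ... + INTRn without any gate.
    line_is_high = not any(requests)   # voltage stays at Vdd only when idle
    intr = any(requests)               # logical OR of the individual requests
    return intr, line_is_high

assert intr_line([False, False, False]) == (False, True)   # dormant state
assert intr_line([False, True, False]) == (True, False)    # one request
```

The OR is produced by the electrical wiring itself — no logic gate is needed, which is why a single line can serve any number of devices.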


3.4 Direct Memory Access


A brief overview of how the CPU may transmit data to or from a number of external (non-memory)
devices was given during the discussion on computer architecture and the role of the Central Processing
Unit. The operation used address lines, data lines, and RD and WR control lines to read and write to the I/O system
in the same way it did memory.
This necessitates CPU involvement and is time-consuming. Direct Memory Access (DMA) refers to an
I/O subsystem’s ability to transmit data to and from a memory subsystem without the need for a CPU.
A DMA controller is a device that controls data transfers between an I/O subsystem and a memory

subsystem in the same way that a CPU does. The following are the elements of DMA:
zz DMA controller: The DMA controller can send orders to the memory that are identical to the CPU’s
directives. In a way, the DMA controller is a second CPU in the system, although it is dedicated to
I/O. The DMA controller links one or more I/O ports directly to memory, with the I/O data stream
passing via the DMA controller faster and more effectively than through the CPU since the DMA
channel is dedicated to data transport.

Figure 2 shows the DMA controller:

[Figure: DMA controller block diagram — data bus buffers and address bus buffers link the controller’s
internal bus to the system buses; the internal registers are an address register, a word count register,
and a control register; the control logic handles DS (DMA select), RS (register select), RD, WR, BR (bus
request), BG (bus grant), an interrupt line, and the DMA request/acknowledge lines to the I/O device]
Figure 2: DMA Controller



zz The DMA interface: Because a DMA controller has independent memory access, it adds another
layer of complexity to the I/O interface. A single transaction can be carried on a single pair of wires

(bus). If a DMA and a microprocessor share a signal wire to memory, a method must be in place to
determine who gets access to memory when both try at the same time.

zz DMA interface operation: Consider a typical direct memory-access controller interface. In this DMA-
controlled example, the I/O ports are solely connected to the DMA controller. Signal lines are the
identical ones that connect the ports to the processor in most cases. Memory lines are similar to
regular lines, except that they are used by both the CPU and the DMA controller. The HALT and
HALT ACKNOWLEDGE lines are the two new lines. The DMA controller and the CPU are synchronised
using these lines. When the DMA controller wants to access memory, it asserts the HALT signal,
which causes the CPU to halt.


At a later time, the CPU reacts with HALT ACKNOWLEDGE, and the DMA controller takes control of
memory. The DMA controller removes its HALT request after it completes its work, the processor
resumes its suspension, and the HALT ACKNOWLEDGE is removed. A third form of DMA request
is the dotted IMMEDIATE HALT. Due to the time it takes for the CPU to reach a condition where it
may halt operations, the processor may take several clock cycles to recognise the HALT request.
Data kept in dynamic registers that are updated during regular processing must be transferred to
status registers, otherwise the data is no longer required. The IMMEDIATE HALT line eliminates the
wait, but it comes with a slew of limitations. The IMMEDIATE HALT can only be used once or twice,
otherwise the processor may not be able to appropriately recover its state and resume the halted

operation. Figure 3 shows the DMA controller interface:

E
Interrupt

V
Random-access
BG memory (RAM)
CPU

R
BR
RD WR Addr Data RD WR Addr Data

E
Read Control
Write Control

S
Data Bus
E Address Bus
R
Address
select

RD WR Addr Data
T

DMA Aok.
DS
H

RS I/O
DMA
Controller Peripheral
BR
device
IG

BG DMA Request

Interrupt
R

Figure 3: DMA Controller Interface
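The HALT / HALT ACKNOWLEDGE exchange described above can be sketched as an ordered sequence of events; the event strings and the transfer itself are illustrative only, not the signalling of any particular controller.

```python
def dma_transfer(words):
    # Sketch of the handshake: the DMA controller halts the CPU, owns the
    # memory buses for the duration of the burst, then releases them.
    log = ["DMA asserts HALT"]
    log.append("CPU suspends and asserts HALT ACKNOWLEDGE")
    for w in words:                       # controller now drives the buses
        log.append(f"DMA transfers word {w:#x}")
    log.append("DMA removes HALT; CPU resumes")
    return log
```

The essential point the ordering captures is that the controller never touches the buses until the CPU has acknowledged the halt, and the CPU never resumes until the controller has released them.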



3.5 Buses

The Central Processing Unit (CPU), memory chips, and input/output (I/O) devices are all components
of a conventional computer system. A bus is a cable or a common channel that connects these diverse

subsystems. As a result, the bus allows the various components to interact with one another.
In computer terms, a bus is a conduit that allows data to move between units or devices. It usually

has access points, which are locations where a device may tap to join the channel. The majority of
buses are bidirectional, meaning that devices may transmit and receive data. Like public transportation,
a bus carries data between devices at various locations.
It permits the addition of new peripheral devices and the mobility of peripheral devices across various
computer systems. When too many devices are linked to the same bus, however, the bus’ bandwidth
might constitute a bottleneck. A bus usually contains more than two devices or subsystems, and channels
linking only two components are sometimes referred to as ports rather than buses.


Figure 4 shows the bus:


Proc Mem

Device 1 Device 2

Figure 4: Bus

Some important concepts related to buses are as follows:
zz Bus protocols: Because a bus is a communication channel used by numerous devices, rules must

be created to ensure that communication occurs appropriately. The regulations are referred to as
bus protocols. The width of the data bus, data transfer size, bus protocols, timing, and other factors

V
all play a role in bus architectural design. Buses are classed as synchronous or asynchronous
depending on whether or not the transactions are regulated by a clock. Parallel and serial buses are

R
distinguished by whether data bits are transmitted on parallel cables or multiplexed onto a single
wire.

zz Synchronous and asynchronous buses: Bus activities are synchronised with reference to a clock

S
signal in a synchronous bus. Although the bus clock is usually taken from the computer system
clock, it is frequently slower than the master clock. 66MHz buses, for example, are utilised in systems
with CPU speeds exceeding 500MHz. Because memory access periods are generally longer than
processor clock cycles, buses have traditionally been slower than processors. Although many people
R
refer to the cycles as a bus cycle, a bus transaction might take many clock cycles.
Figure 5 shows the for a synchronous read operation:
T
H

T1 T2 T3
IG

Clock

Status
Status signals
R

lines

Address
Stable address
Y

lines

Address
P

enable

Data
O

Valid data in
lines
Resd
cycle
C

Resd

Data
Valid data out
Write lines
Cycle

Write

Figure 5: Synchronous Read Operation


There is no system clock on an asynchronous bus. Handshaking is used to ensure that data is sent correctly
between the transmitter and the receiver. The bus master places the address and control signals on the
bus before asserting a synchronisation signal in an asynchronous read operation. The slave is prompted
to become synchronised by the master’s synchronisation signal, and after it has accessed the data, it
asserts its own synchronisation signal. The synchronisation signal from the slave informs the processor
that there is valid data on the bus, which it reads. After then, the master deasserts its synchronisation
signal, signalling to the slave that the master has read the data. The slave’s synchronisation signal is
then deasserted. A complete handshake is the name for this type of synchronisation. It’s worth noting
that there’s no clock and that the beginning and conclusion of the data transfer are signalled by specific

D
synchronisation signals. A pair of Finite State Machines (FSMs) that function in such a way that one FSM
does not advance until the other FSM has achieved a specific state might be called an asynchronous

E
communication protocol. Figure 6 shows the for an asynchronous read operation:

V
t1 t2 t3

R
addr Address Valid
data Data Valid

E
RD
MA

S
MSYN
SSYN
Time
E
R
Figure 6: Asynchronous Read Operation
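The full handshake can be written out as the fixed ordering of events that the two state machines enforce on each other; the event names below are descriptive only.

```python
def full_handshake_read(slave_data):
    # Each side advances only after seeing the other's synchronisation signal.
    events = ["master: places address, asserts MSYN"]
    events.append("slave: places data on bus, asserts SSYN")
    data = slave_data                       # master samples the valid data
    events.append("master: data read, deasserts MSYN")
    events.append("slave: deasserts SSYN")
    return data, events
```

Note that no clock appears anywhere: the four events alone delimit the transfer, which is precisely what distinguishes the asynchronous bus from the synchronous one.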
T

3.6 Standard I/O Interfaces


H

The processor bus is the bus that is used by the processor chip’s signals. This bus can be used to link
devices that require very high-speed connectivity to the processor, such as the main memory. Only a
IG

few gadgets can be linked in this way due to electrical issues. A second bus is generally provided by the
motherboard, which can handle extra devices.

The two buses are linked by a circuit, which we’ll refer to as a bridge, that converts one bus’s signals
and protocols into those of the other. The CPU sees devices attached to the extension bus as if they were

connected directly to the processor’s own bus. Only the bridge circuit adds a tiny delay to data flows
between the CPU and those devices.

It is impossible to create a universal processor bus standard. The architecture of the CPU is intimately
linked to the topology of this bus. It is also influenced by the processor chip’s electrical properties, such

as its clock speed. Because the extension bus is not constrained by these constraints, it may employ
a standardised signalling method. A variety of guidelines have been created. Some have emerged by

accident, as a result of a commercially successful design. For example, IBM created the ISA (Industry
Standard Architecture) bus for its PC AT (Advanced Technology) personal computer.
Some standards have been created by collaborative efforts among industries, even among competitors,
motivated by a common desire to have interoperable goods. Organisations like the IEEE (Institute of
Electrical and Electronics Engineers), ANSI (American National Standards Institute), and international
organisations like the ISO (International Standards Organisation) have granted these standards formal
status in some circumstances.


3.6.1 Peripheral Component Interconnect (PCI) Bus


The PCI bus is an excellent example of a system bus that arose from a desire for standardisation. It
provides the functions found on a processor bus, but in a standardised format that is not specific to any
processor. The CPU sees devices linked to the PCI bus as if they were connected directly to the processor
bus. They are given addresses in the processor’s memory address area.
The PCI is based on a series of bus specifications that were largely utilised in IBM PCs. The 8-bit XT bus was
used in early PCs, and its signals were quite similar to those of Intel’s processors with the 80x86 instruction set.
The 16-bit bus used in PC AT computers was later named the ISA bus. The EISA bus is the expanded

D
32-bit version of the bus. The Microchannel, which is used in IBM PCs, and the NuBus, which is used in
Macintosh computers, are two more buses developed in the 1980s with comparable characteristics.

E
The PCI bus was created as a low-cost, processor-independent bus. Its design anticipated a rapidly rising

V
requirement for bus capacity to accommodate high-speed discs, graphic and video devices, as well as
multiprocessor systems’ specific demands. As a result, over a decade after its introduction in 1992, the

R
PCI is still widely used as an industry standard. A plug-and-play capability for connecting I/O devices is
an essential feature that the PCI pioneered. The user simply attaches the device interface board to the

E
bus to connect a new device. The remainder is handled by the programme.

S
Data Transfer
Most memory transfers in today's computers move a burst of data rather than a single word, because
contemporary CPUs have cache memory built in. Data is transmitted between the cache and the main
memory in bursts of multiple words, and the words involved in such a transfer are stored at successive
memory locations. When the processor (in this case, the cache controller) requests a read operation, the
memory responds by delivering a series of data words beginning at that address. Similarly, during a
write operation, the processor provides a memory address followed by a series of data words, which are
written in order starting at the address. The PCI was created with this manner of operation in mind; a
single-word read or write operation is simply considered as a burst of length one.
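The burst behaviour just described can be sketched in a few lines (an illustrative model only, not PCI signalling; `Memory` and `burst_read` are invented names):

```python
# Sketch of a burst read: the initiator supplies one starting address and a
# length, and data comes from successive memory locations. A single-word
# read is simply a burst of length one.

class Memory:
    def __init__(self, words):
        self.words = list(words)          # word-addressable storage

    def burst_read(self, address, length):
        """Return `length` consecutive words starting at `address`."""
        return self.words[address:address + length]

mem = Memory(range(100, 164))             # 64 words: 100, 101, ..., 163
print(mem.burst_read(8, 4))               # burst of four words
print(mem.burst_read(8, 1))               # single-word read = burst of one
```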
Device Configuration
When a computer is linked to an I/O device, many steps must be taken to set up both the hardware and
the software that interfaces with it.

The PCI streamlines this procedure by including a tiny configuration ROM in each I/O device interface
that stores information about that device. All devices' configuration ROMs are accessible in the
configuration address space. When the system is powered up or reset, the PCI initialisation software
reads these ROMs. It determines whether the device in question is a printer, a keyboard, an Ethernet
interface, or a disc controller, and it can also learn about various device features and options.

During the startup procedure, addresses are allocated to devices. This implies that devices that have
not yet been allocated an address cannot be accessed during the bus setup procedure. As a result,
the configuration address space employs a unique technique: each device possesses an input signal
called IDSEL#, or Initialisation Device Select. The PCI bus has acquired a lot of traction in the PC world,
and many other systems, such as SUN workstations, employ it to take advantage of the large range of
I/O devices that a PCI interface may support. In some processors, such as the Compaq Alpha, the
PCI-processor bridge circuit is built on the processor chip itself, simplifying system design and
packaging even more.
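The configuration process can be sketched as follows; the ROM fields, base address, and stride below are invented for illustration and are not the real PCI configuration-space layout:

```python
# Illustrative sketch of plug-and-play configuration: each device carries a
# small configuration ROM describing what it is; at start-up the
# initialisation software reads every ROM and assigns an address range.

devices = [
    {"rom": {"class": "disc controller"}},
    {"rom": {"class": "Ethernet interface"}},
    {"rom": {"class": "printer"}},
]

def configure(devices, base=0x1000, stride=0x100):
    """Assign each device a block of addresses, as the start-up code does."""
    for i, dev in enumerate(devices):
        dev["base_address"] = base + i * stride
    return devices

for dev in configure(devices):
    print(hex(dev["base_address"]), dev["rom"]["class"])
```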


3.6.2 SCSI Bus


Small Computer System Interface is abbreviated as SCSI. It refers to the ANSI-defined X3.131 standard
bus. Under the standard's initial requirements, devices such as discs are linked to a computer via a
50-wire cable that may be up to 25 metres long and can transfer data at up to 5 megabytes per second.
The SCSI bus standard has gone through several revisions, and its data transmission capability has
almost doubled every two years. SCSI-2 and SCSI-3 have been defined, each with several options. When
a SCSI bus has eight data lines, it is referred to as a narrow bus, and data is sent one byte at a time. A
wide SCSI bus, on the other hand, contains 16 data lines and transmits data 16 bits at a time. There are
also a variety of electrical signalling schemes to choose from. Unlike devices attached to the processor
bus, devices linked to the SCSI bus are not part of the CPU's address space. The SCSI bus is linked to the
processor bus through a SCSI controller, which uses DMA to transfer data packets between main
memory and the device. A packet can contain a data block, commands from the CPU to the device, or
device status information.

To demonstrate how the SCSI bus works, consider how it may be used with a disc drive. Communication
with a disc drive is very different from communication with main memory. A controller connected to a
SCSI bus is of one of two types: an initiator or a target. An initiator can select a particular target and
send commands specifying the actions to be carried out. Clearly, the controller on the processor's side,
such as the SCSI controller, must be able to act as an initiator. The disc controller acts as a target; it
carries out the commands it receives from the initiator. The initiator establishes a logical connection
with the intended target. Once established, this connection can be suspended and resumed as needed
to transfer commands and bursts of data. While a particular connection is suspended, other devices can
use the bus to transfer data. This ability to overlap data transfer requests is one of the key features of
the SCSI bus that contributes to its high performance.
The target controller is always in charge of data transfers on the SCSI bus. To send a command to a
target, an initiator first requests control of the bus and, after winning arbitration, selects the controller
it wishes to communicate with and hands control of the bus over to it. The target controller then starts
a data transfer operation to receive the initiator's command. When the CPU issues an instruction to the
SCSI controller, the following series of events is set in motion:
1. The SCSI controller, acting as an initiator, contends for control of the bus.
2. If the initiator wins the arbitration, it selects the target controller and hands control of the bus over
   to it.
3. The target starts an output operation (from initiator to target); in response, the initiator sends a
   command specifying the required read operation.
4. The target, realising that it must first perform a disc seek operation, sends a message to the
   initiator indicating that the connection between them will be temporarily suspended. It then
   releases the bus.
5. The target controller instructs the disc drive to move the read head to the first sector involved in
   the requested read operation. The data stored in that sector is then read and stored in a data buffer.
   When it is ready to begin transferring data to the initiator, the target requests control of the bus.
6. The target sends the contents of the data buffer to the initiator and then suspends the connection.
   Depending on the bus width, data is transmitted 8 or 16 bits in parallel.
7. The target controller directs the disc drive to perform another seek operation. The contents of the
   second disc sector are then sent to the initiator as before, and the logical link between the two
   controllers is severed at the end of these transfers.


8. After receiving the data, the initiator controller uses the DMA method to put it in the main memory.
9. The CPU receives an interrupt from the SCSI controller indicating that the desired operation has
been performed.

As this example shows, the messages exchanged over the SCSI bus are at a higher level than those
transferred over the CPU bus. In this context, a "higher level" message refers to actions that, depending
on the device, may require several steps to complete. Neither the CPU nor the SCSI controller needs to be
aware of the specifics of the device involved in a data transfer. In the preceding example, the processor
is not required to participate in the disc seek operation.
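The nine steps above can be compressed into a toy replay; this is only a narrative sketch of the protocol, with all function and message names invented:

```python
# A toy replay of the SCSI read sequence listed above. It only records the
# narrative of each phase; arbitration, selection, suspend/resume and DMA
# are described, not implemented.

def scsi_read():
    events = []
    events.append("initiator wins arbitration and selects the target")
    events.append("initiator sends the read command to the target")
    events.append("target suspends the connection and releases the bus")
    events.append("target seeks, reads the sector into its data buffer")
    events.append("target re-arbitrates and sends the buffer contents")
    events.append("second seek and transfer; logical connection closed")
    events.append("initiator places the data in main memory using DMA")
    events.append("SCSI controller interrupts the CPU: operation complete")
    return events

for number, event in enumerate(scsi_read(), start=1):
    print(number, event)
```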

3.6.3 Universal Serial Bus (USB)

Computer-communications synergy is at the heart of today's information technology revolution.
Keyboards, microphones, cameras, speakers, and display devices are likely to be used in a modern
computer system, and a wired or wireless Internet connection is available on most PCs. A crucial
necessity in such an environment is a simple, low-cost way to connect these devices to the computer,
and the advent of the Universal Serial Bus (USB) represents a significant recent advance in this respect.

The USB supports low-speed (1.5 megabits/s) and full-speed (12 megabits/s) operation. The most recent
edition of the bus standard (USB 2.0) added a third operating speed, known as high-speed (480
megabits/s). The USB is rapidly gaining commercial popularity, and with the inclusion of high-speed
capability, it may easily become the preferred connection mechanism for most computer equipment.
The USB has been designed to meet several key objectives:
zz Provide a simple, low-cost, easy-to-use connectivity solution that overcomes the challenges posed
   by a computer's restricted number of I/O ports.
zz Support a variety of data transmission characteristics for I/O devices, such as phone and Internet
   connections.
zz Increase user convenience by enabling "plug-and-play" operation.

Port Limitation
The parallel and serial ports mentioned in the preceding section provide a general-purpose point of
connection for connecting a computer to a variety of low- to medium-speed devices. For practical
reasons, a normal computer has only a few such ports.
Characteristics of the Device
The devices that can be connected to a computer perform a wide range of tasks. Data transfers to and
from such devices are subject to a wide range of speed, volume, and timing constraints.
A wide range of simple devices that may be connected to a computer produce data that is similar in
nature: slow and asynchronous. Computer mice, as well as video game controllers and manipulators,
are good examples.

Plug-and-Play
As computers become more integrated into daily life, their presence should become less apparent. For
example, when running a home theatre system that includes at least one computer, the user should not
be required to switch the computer off or restart the system in order to connect or disconnect a device.


A new device, such as an additional speaker, may be attached at any moment while the system is
running, thanks to the plug-and-play function. The system should immediately detect the presence of
this new device, identify the relevant device-driver software and any other facilities required to serve
it, and create the necessary addresses and logical connections to allow them to interact. The necessity
for plug-and-play has ramifications at all levels of the system, from hardware to operating system
and application software. One of the major goals of the USB design was to allow for plug-and-play
functionality.

USB Architecture

The necessity for an interconnection system that combines low cost, flexibility, and high data-transfer
capacity has been highlighted in the preceding discussion. I/O devices may also be physically separated
from the computer to which they are attached. When high bandwidth is required, a wide bus carrying
8, 16, or more bits in parallel is typically used. A large number of wires, however, adds expense and
complexity while also being troublesome for the user.

It is also difficult to build a wide bus that can carry data over a long distance because of the data skew
problem. Skew increases with distance, limiting the data rate that can be used.

For the USB, a serial transmission protocol was chosen because it meets the low-cost and flexibility
criteria. The clock and data information are combined and transmitted as a single signal, so data skew
imposes no restrictions on clock frequency or distance; hence, a high clock frequency can be used to
provide high data-transfer bandwidth. As previously stated, the USB offers three bit rates, ranging from
1.5 to 480 megabits/s, to suit the needs of different I/O devices.
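The three bit rates translate into very different transfer times; a quick calculation for a one-megabyte transfer at the ideal, overhead-free rates (real throughput is lower because of protocol overhead):

```python
# The three USB bit rates quoted above, and the time a 1-megabyte
# transfer would take at each ideal rate.

RATES_MBPS = {"low-speed": 1.5, "full-speed": 12, "high-speed": 480}

def transfer_seconds(size_bytes, rate_mbps):
    bits = size_bytes * 8
    return bits / (rate_mbps * 1_000_000)

for name, rate in RATES_MBPS.items():
    print(f"{name}: {transfer_seconds(1_000_000, rate):.3f} s")
```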
To support a large number of devices that may be added or removed at any moment, the USB has the
tree structure shown in Figure 7:


Host Computer
  Root Hub
    Hub
      Hub
        I/O devices
        I/O devices
      I/O devices
    Hub
      I/O devices
      I/O devices

Figure 7: USB Tree Structure


At each node of the tree, a device called a hub functions as an intermediate control point between the
host and the I/O devices. At the root of the tree, a root hub connects the entire tree to the host computer.
The leaves of the tree are the I/O devices being served (for example, a keyboard, an Internet connection,
a speaker, or a digital television), which are referred to as functions in USB terminology. We'll
refer to these devices as I/O devices to keep things consistent with the rest of the book.
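The tree of Figure 7 can be represented as nested dictionaries, with hubs containing either hubs or I/O devices; the exact shape and device names below are illustrative:

```python
# A hub is a dictionary with a "hub" key listing its children; a leaf
# I/O device (a "function" in USB language) has a "device" key.

root_hub = {
    "hub": [
        {"hub": [{"device": "keyboard"}, {"device": "speaker"}]},
        {"hub": [{"device": "camera"}]},
        {"device": "mouse"},
    ]
}

def count_devices(node):
    """Recursively count the leaf I/O devices in the tree."""
    if "device" in node:
        return 1
    return sum(count_devices(child) for child in node["hub"])

print(count_devices(root_hub))   # 4
```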


3.7 Conclusion

zz Input/output architecture is the means by which a computer regulates its interaction with the
   outside world.
zz Peripheral devices are input or output devices that are connected to a computer.
zz A single bus configuration is a straightforward way to connect I/O devices to a computer.
zz A unique set of addresses is assigned to each I/O device. When the processor places an address on
   the address line, the device that recognises that address responds to the commands sent on the
   control lines.

zz Memory-mapped I/O is used in the majority of computer systems. To execute I/O transfers, certain
   CPUs feature special In and Out instructions.
zz When a process or event requires immediate attention, hardware or software emits an interrupt
   signal. It alerts the processor to a high-priority task that requires the current working process to
   be interrupted.

zz Direct Memory Access (DMA) refers to an I/O subsystem's ability to transfer data to and from a
   memory subsystem without the involvement of the CPU.
zz A DMA controller is a device that controls data transfers between an I/O subsystem and a memory
   subsystem in the same way that a CPU does.
zz The Central Processing Unit (CPU), memory chips, and input/output (I/O) devices are all components
   of a conventional computer system.
zz A bus is a cable or a common channel that connects these diverse subsystems.
zz Because a bus is a communication channel shared by numerous devices, rules must be established
   to ensure that communication occurs properly.
zz Buses are classed as synchronous or asynchronous depending on whether or not the transactions
   are regulated by a clock.
zz The processor bus is the bus defined by the signals of the processor chip.
zz Another bus is generally provided by the motherboard, which can handle extra devices.
zz Organisations like the IEEE (Institute of Electrical and Electronics Engineers), ANSI (American
   National Standards Institute), and international bodies like the ISO (International Standards
   Organisation) have granted these standards formal status in some circumstances.
zz The PCI bus is an excellent example of a system bus that arose from a desire for standardisation. It
   provides the functionality of a processor bus in a defined format that is not specific to any
   processor.
zz The SCSI bus refers to the ANSI-defined X3.131 standard bus.
zz The SCSI controller uses DMA to transfer data packets between main memory and the device and
   vice versa.

3.8 Glossary

zz Peripheral devices: Input or output devices that are connected to a computer.
zz Direct Memory Access (DMA): An I/O subsystem's ability to transfer data to and from a memory
   subsystem without the involvement of the CPU.

zz DMA controller: A device that controls data transfers between an I/O subsystem and a memory
subsystem in the same way that a CPU does.
zz Bus: A cable or a common channel that connects the diverse subsystems of a computer.
zz Universal Serial Bus (USB): A standard serial bus for connecting peripheral devices to a computer,
   supporting plug-and-play operation and bit rates from 1.5 to 480 megabits/s.
zz SCSI bus: It refers to the ANSI-defined X3.131 standard bus. Devices like discs are linked to a computer
via a 50-wire connection that may be up to 25 metres long and transport data at up to 5 megabytes
per second, according to the standard’s initial requirements.

3.9 Self-Assessment Questions

A. Multiple Choice Questions

1. Input or output devices that are connected to a computer are called __________.
a. Input/Output subsystem
b. Peripheral devices
c. Interfaces
d. Interrupt
2. How many modes of I/O data transfer are there?
a. 2 b. 3
c. 4 d. 5
3. Keyboard and mouse come under __________.
a. Input peripherals b. Output peripherals
c. Input-Output peripherals d. None of these
4. The method which offers higher speeds of I/O transfers is __________.
a. Interrupts b. Memory mapping
c. Program-controlled I/O d. DMA


5. In memory-mapped I/O __________.
a. The I/O devices have a separate address space
b. The I/O devices and the memory share the same address space
c. A part of the memory is specifically set aside for the I/O operation
d. The memory and I/O devices have an associated address space

6. The __________ circuit is basically used to extend the processor BUS to connect devices.
a. Router b. Router
c. Bridge d. None of these
7. The ISA is an architectural standard developed by __________.
a. IBM b. AT&T Labs
c. Microsoft d. Oracle


8. The SCSI BUS is used to connect the video devices to a processor by providing a __________.
a. Single Bus b. USB
c. SCSI d. Parallel Bus
9. Which of the following is used by the processor chip’s signals?
a. Processor bus b. SCSI bus
c. USB bus d. None of these
10. The registers of the controller are __________.
a. 16 bit b. 32 bit
c. 64 bit d. 128 bit
11. The main job of the interrupt system is to identify the __________ of the interrupt.
a. signal b. device
c. source d. peripherals
12. The hardware interrupts which can be delayed when a higher-priority interrupt occurs at the same
time are known as __________.
a. Non-maskable interrupt b. Maskable interrupt
c. Normal interrupt d. None of these
13. The interrupts that are caused by software instructions are called __________.
a. Exception interrupts b. Normal interrupt
c. Hardware interrupt d. None of these
14. In Daisy Chaining Priority, the device with the highest priority is placed at the __________.
a. First position b. Last position
c. Can be placed anywhere d. Depend on device
15. Which interrupt is unmaskable?
a. RST 5.5 b. RST 6.5
c. RST 7.5 d. Trap
16. Which microprocessor is designed to complete the execution of the current instruction and then to
service the interrupts?
a. 8081 b. 8082
c. 8084 d. 8085
17. Open-collector type circuits are generally used for __________.
a. Open-drain b. Batch processing
c. Interrupt service lines d. None of these


18. The Interrupt-request line is a __________ along which the device is allowed to send the interrupt
signal.
a. Data line b. control line
c. Address line d. None of these


19. Which table handle stores the addresses of the interrupt handling sub-routines?
a. Interrupt-vector table b. Vector table
c. Symbol link table d. All of these
20. Interrupts initiated by an instruction is called as __________.
a. Internal b. External
c. hardware d. Software

B. Essay Type Questions

1. The bus allows all of the devices connected to it to communicate with one another. Discuss.
2. When a process or event requires immediate attention, hardware or software emits an interrupt
   signal. What do you mean by a hardware interrupt?
3. What do you understand by Direct Memory Access (DMA)?
4. A bus allows the various components to interact with one another. Discuss.
5. Discuss the concept of the PCI bus.
3.10 Answers and Hints for Self-Assessment Questions

A. Answers to Multiple Choice Questions


Q. No. Answer
1. b. Peripheral devices
2. b. 3
3. a. Input peripherals
4. d. DMA
5. b. The I/O devices and the memory share the same address space
6. c. Bridge
7. a. IBM
8. d. Parallel bus
9. a. Processor bus
10. b. 32 bit
11. c. source
12. b. Maskable interrupt
13. b. Normal interrupt
14. a. First position
15. d. Trap
16. d. 8085
17. c. Interrupt service lines

18. b. control line
19. a. Interrupt-vector table
20. d. Software

B. Hints for Essay Type Questions


1. It usually has three sets of lines for carrying address, data, and control signals. A unique set of
   addresses is assigned to each I/O device. The device that recognises a certain address responds to
   the commands sent on the control lines when the processor places that address on the address line.
   Refer to Section Accessing I/O Devices.
2. An I/O device requests an interrupt by activating the interrupt-request bus line. Almost all
   computers have several I/O devices that can request an interrupt. A single interrupt-request line
   can be utilised to service n devices, with all devices connected to the line through switches to
   ground. Refer to Section Interrupts.
3. Direct Memory Access (DMA) refers to an I/O subsystem's ability to transfer data to and from a
   memory subsystem without the need for a CPU. A DMA controller is a device that controls data
   transfers between an I/O subsystem and a memory subsystem in the same way that a CPU does.
   Refer to Section Direct Memory Access.
4. In computer terms, a bus is a conduit that allows data to move between units or devices. It usually
   has access points, which are locations where a device may tap in to join the channel. The majority
   of buses are bidirectional, meaning that devices may transmit and receive data. Refer to Section
   Buses.
5. The PCI bus is an excellent example of a system bus that arose from a desire for standardisation.
   It provides the functionality of a processor bus in a defined format that is not specific to any
   processor. The CPU sees devices linked to the PCI bus as if they were connected directly to the
   processor bus. Refer to Section Standard I/O Interfaces.


3.11 Post-Unit Reading Material

zz https://www.studocu.com/in/document/psg-college-of-technology/computer-architecture/m-morris-mano-solution-manual-computer-system-architecture/10775236
zz https://www.pvpsiddhartha.ac.in/dep_it/lecturenotes/CSA/unit-5.pdf

3.12 Topics for Discussion Forums

zz Discuss Input/Output (I/O) Organisation with your friends.
UNIT

04

Introduction to Memory System
Names of Sub-Units
Basic Concepts of Memory System, Semiconductor RAM Memories, Read Only Memories (ROM), Speed,
Size, Cost.

Overview

The unit begins by discussing the concept of the memory system. Next, the unit discusses semiconductor
RAM memories. Then, the unit discusses Read Only Memories (ROM). Towards the end, the unit
discusses the speed, size and cost of memories.

Learning Objectives
In this unit, you will learn to:

aa Explain the concept of computer memories
aa Discuss the concept of semiconductors
aa Explain Random Access Memory (RAM)
aa Describe the concept of Read Only Memory (ROM)
aa Discuss the speed, size, and cost of memories



Learning Outcomes

At the end of this unit, you would:


aa Evaluate the importance of computer memories
aa Analyse the concept of semiconductor
aa Assess the importance of Random Access Memory (RAM)
aa Evaluate the use of Read Only Memory (ROM)

aa Analyse the speed, size, and cost of memories

Pre-Unit Preparatory Material

aa https://www.uobabylon.edu.iq/eprints/publication_12_21274_1610.pdf

4.1 Introduction

One of the most crucial components of a computer system is its memory. It saves the data and instructions
needed for data processing and output outcomes. Storage may be necessary for a short amount of time,
E
immediately, or for a long time. The electrical storing area for instructions and data that the processor
can read rapidly is referred to as computer memory. Computer memory is divided into two categories
R
– primary and secondary. Primary memory is directly accessed by a processor to execute instructions.
An example of primary memory is Random Access Memory (RAM). Secondary memory, such as hard
disks drives and Solid State Drive (SSD), are used to store and retrieve data from a computer. Both
T

memories are an integral part of a computer. The failure of any of the two memories stops a computer
from functioning. The storage capacity of computer memory is measured in bits, bytes, Kilobyte (KB),
H

Megabyte (MB), Gigabyte (GB), Terabyte (TB), and Petabyte (PB).


IG

4.2 Basic Concepts of Memory System


The addressing method determines the maximum size of memory that may be used in every machine.

A 16-bit computer, for example, that generates 16-bit addresses, may address up to 2^16 = 64K memory
locations. Machines that produce 32-bit addresses, on the other hand, may use a memory with up to

2^32 = 4G memory locations. Byte-addressable computers make up the majority of today's computers.
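These limits follow directly from the address width: k address bits reach 2^k locations. A quick check (a throwaway illustrative sketch):

```python
# Number of addressable locations for a given address width, checking the
# figures quoted above (2**16 = 64K and 2**32 = 4G, with K = 1,024 and
# G = 1,024**3).

def addressable_locations(address_bits):
    return 2 ** address_bits

K = 1024
G = 1024 ** 3
print(addressable_locations(16) // K)   # 64, i.e. 64K locations
print(addressable_locations(32) // G)   # 4, i.e. 4G locations
```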
Figure 1 shows the block diagram of memory system:

[Figure: the main memory system linked to the Central Processing Unit (CPU) by address, data, and
instruction paths; within the CPU are the registers, arithmetic and logic unit, instruction sets, control
unit, program counter, and cache memory, with the input/output system connected below]

Figure 1: Block Diagram of Memory System


The time it takes for an operation to start and finish is a valuable indicator of memory unit speed.
Memory access time is what it’s called. Another significant metric is the memory cycle time, which
is the shortest time between two memory operations. If any place can be reached for a Read or Write
operation in some defined period of time, regardless of the location’s address, the memory unit is known
as Random Access Memory (RAM). The system’s bottleneck is its memory cycle time.
Using a cache memory is one approach to minimise memory access time. Cache memory is a tiny, quick
memory that sits between the CPU and the bigger, slower main memory. Virtual memory is used to
make the actual memory appear larger. Data is addressed in a virtual address space that is as big as the
processor’s addressing capacity. However, only the active fraction of this space is mapped onto physical

D
memory locations at any one time. The remaining virtual addresses are mapped to the bulk storage
devices (such as magnetic drives) that are utilised.

E
V
4.2.1 CPU-Main Memory Connection – A Block Schematic
The Main Memory (MM) unit can be thought of as a "black box" in terms of the system. The MAR (Memory

R
Address Register) and MDR (Memory Data Register) are two CPU registers that transmit data between
the CPU and the MM (Memory Data Register). If MAR is K bits long and MDR is ‘n’ bits long, the MM unit

E
can hold up to 2k addressable locations, each of which is ‘n’ bits wide and has a word length of ‘n’ bits. n
bits of data may be exchanged between the MM and the CPU during a “memory cycle.”

S
The CPU loads the address into MAR, sets READ to 1, and sets additional control signals as needed for a
E
read operation. MDR receives the data from the MM and MFC is set to 1. For a write operation, the CPU
loads MAR and MDR appropriately, sets write to 1, and sets the other control signals appropriately. The
R
data is loaded into suitable places by the MM control circuitry, which also sets MFC to 1. The following
block diagram depicts this arrangement.
T

Connection of the Memory to the Processor
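The read and write handshakes described above can be modelled as a tiny simulation; `MainMemory` and its methods are invented names, and `mfc` stands in for the MFC (Memory Function Completed) signal. This is purely illustrative, not real hardware behaviour:

```python
# The CPU loads MAR (and MDR for a write), raises READ or WRITE, and the
# memory raises MFC when the operation has completed.

class MainMemory:
    def __init__(self, size):
        self.cells = [0] * size
        self.mfc = False                  # Memory Function Completed

    def read(self, mar):
        self.mfc = False
        mdr = self.cells[mar]             # memory places the data in MDR
        self.mfc = True
        return mdr

    def write(self, mar, mdr):
        self.mfc = False
        self.cells[mar] = mdr             # data stored at the address in MAR
        self.mfc = True

mm = MainMemory(16)
mm.write(mar=5, mdr=42)
print(mm.read(mar=5), mm.mfc)             # 42 True
```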



Some basic concepts related to memory system are as follows:


zz Memory access time: It is a valuable indicator of the memory unit’s speed. It is the amount of time

that passes between the start of an operation and its completion (for example, the time that passes
between READ and MFC).
zz Memory cycle time: It’s a crucial metric for the memory system. It is the shortest time interval

between two consecutive memory operations (for example, the time between two consecutive READ
operations). In most cases, the cycle time is slightly longer than the access time.

zz Transfer rate: The pace at which data may be moved in and out of a memory unit.
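The three metrics above are related: given a cycle time and the number of bytes moved per cycle, the peak transfer rate follows directly. The numbers below (100 ns cycle, 4-byte word) are invented for illustration:

```python
# Peak transfer rate in MB/s from the word size and the memory cycle time.

def transfer_rate_mb_per_s(word_bytes, cycle_time_ns):
    bytes_per_second = word_bytes / (cycle_time_ns * 1e-9)
    return bytes_per_second / 1e6

print(transfer_rate_mb_per_s(4, 100))     # 40.0 MB/s
```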

4.2.2 Units of Memory



A computer stores data internally in the form of binary numbers, 0 (OFF/low voltage) and 1 (ON/high
voltage). All the digits (0–9), alphabet (a to z or A to Z), special characters ($, %, @, * etc.) are stored in the

computer in the binary form. ASCII (American Standard Code for Information Interchange) assigned a
unique number or code to all the alphabet and special characters. Later on, this number/code could be
converted into the binary form to store the corresponding letter/character in the computer.
In a computer, characters are represented by a group of bits that depends upon the encoding scheme
being used. The encoding scheme is the manner of specifying the binary code for each character. There
are two types of encoding scheme: ASCII and Unicode. ASCII represents a character as a group of 8 bits
or 1 byte. Unicode represents a character as a group of 16 bits or 2 bytes. For example, if you want to store
the word “bottle” in the computer memory, then as per the ASCII scheme, it is 6 bytes (6 characters of 1


byte each); and as per the Unicode scheme, the same word is stored as 12 bytes (6 characters of 2 bytes
each). Apart from the byte, there are various other units of memory. The following is the description of
the units of memory in a computer:
zz Byte (B) = 8 bits
zz Kilobyte (KB) = 1,024 bytes
zz Megabyte (MB) = 1,048,576 bytes = 1024 KB
zz Gigabyte (GB) = 1,024 megabytes
zz Terabyte (TB) = 1,024 gigabytes
zz Petabyte (PB) = 1,024 terabytes

zz Exabyte (EB) = 1,024 petabytes

zz Zettabyte (ZB) = 1,024 exabytes
zz Yottabyte (YB) = 1,024 zettabytes

zz Brontobyte = 1,024 yottabytes

zz Geopbyte = 1,024 brontobytes
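Each unit above is 1,024 times the previous one. A small helper (hypothetical, not a standard library function) that expresses a raw byte count in the largest convenient unit:

```python
# Convert a byte count to a human-readable string, dividing by 1,024 until
# the value drops below one unit step.

UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]

def human(n_bytes):
    value, unit = float(n_bytes), UNITS[0]
    for u in UNITS[1:]:
        if value < 1024:
            break
        value /= 1024
        unit = u
    return f"{value:g} {unit}"

print(human(8))              # 8 B
print(human(1024))           # 1 KB
print(human(3 * 1024 ** 3))  # 3 GB
```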

4.3 Semiconductor RAM Memories
E
Any electronics assembly that employs computer processing technologies uses semiconductor memory.
Semiconductor memory is a critical electronic component for any PCB assembly based on a computer.
R
Memory cards have also become widespread for temporarily storing data, ranging from portable
flash memory cards used for file transfers to semiconductor memory cards used in cameras, mobile
T

phones, and other devices. As the need for greater and larger quantities of storage has grown, the use
of semiconductor memory has expanded, as has the size of these memory cards.
H

Many different types and methods are employed to fulfil the rising need for semiconductor memory.
New memory technologies are being launched as demand rises, and old types and technologies are
IG

being improved.
There is a multitude of memory technologies available, each with its own set of advantages and
R

disadvantages. There are many different forms of memory, including ROM, RAM, EPROM, EEPROM,
Flash memory, DRAM, SRAM, SDRAM, F-RAM, and MRAM, and new varieties are being created to
Y

increase performance. DDR3, DDR4, DDR5, and a variety of other terms are used to describe different
types of SDRAM semiconductor memory.
P

Furthermore, semiconductor devices are accessible in a variety of formats, including integrated circuits
for printed circuit boards, USB memory cards, Compact Flash cards, SD memory cards, and even solid
O

state hard drives. Semiconductor memory is even used as on-board memory in several microprocessor
processors.
C

4.3.1 Types of Semiconductor Memory

There are two primary categories of semiconductor memory technology, distinguished by how the memory works:
- Random Access Memory (RAM): RAM is a semiconductor-based memory where the CPU or other hardware devices can read and write data. The data is stored only temporarily, since it is a volatile memory: once the system turns off, it loses the data. As a result, RAM is used as a temporary data storage area, and this form of memory can store and read data multiple times.
- Read Only Memory (ROM): A ROM is a type of semiconductor memory in which data is written once and then is not altered. As a result, it is employed in situations where data must be kept permanently, even when the power is turned off (many memory technologies lose data when the power is turned off).

4.3.2 Types of RAM

The following are the different types of RAM:
- Static Random Access Memory (SRAM): A type of semiconductor memory that stores data in a fixed cell location for as long as power is supplied. Unlike DRAM, the data in this type of memory does not need to be refreshed dynamically.
- Dynamic Random Access Memory (DRAM): A kind of random access memory in which each bit of data is stored on a capacitor, and the charge level on each capacitor determines whether the bit is a logical 1 or 0. Because these capacitors cannot keep their charge permanently, the data must be refreshed on a regular basis; the name dynamic RAM comes from this dynamic updating. DRAM is often found in equipment such as personal computers and workstations, where it serves as the computer's primary RAM. The chips are usually supplied as integrated circuits in the form of surface-mount devices or, less commonly, as leaded components for use in PCB assembly.
- Synchronous DRAM (SDRAM): A kind of DRAM that is synchronised with the processor's clock, so it can operate at higher rates than standard DRAM and can keep two sets of memory addresses open at the same time.
- Ferroelectric Random Access Memory (F-RAM): A random access memory technology quite comparable to regular DRAM. The main distinction is that it has a ferroelectric layer rather than the more common dielectric layer, which gives it its non-volatile properties. Because of this non-volatile capability, F-RAM is a direct rival to Flash.
- Double Data Rate SDRAM (DDR SDRAM): Memory that transfers data at high speed. The clock speed of DDR SDRAM ranges from 133 MHz to 2,133 MHz.
- Rambus DRAM (RDRAM): The fastest among all the random access memory types, with a data transfer speed of 1 GHz. Generally, RDRAM is used as video memory on graphics accelerator cards. Dynamic RDRAM is an improvement to the existing RDRAM. The RDRAM chip provides high bandwidth and is therefore used by workstations and servers. The chips are placed on a RIMM (Rambus Inline Memory Module) module, and the number of chips on the module depends on the bus width of the RAM. RDRAM with 160 or 184 pins operates at 300-400 MHz.
- Magnetic RAM (MRAM): A kind of magneto-resistive RAM. It is a non-volatile memory that stores data using magnetic charges rather than electric charges.
- Phase Change Random Access Memory (P-RAM) or Phase Change Memory (PCM): Based on a phenomenon in which a chalcogenide glass changes state, or phase, from amorphous (high resistance) to polycrystalline (low resistance). It is feasible to detect the state of a single cell and so use this information to store data.

4.4 Read Only Memory (ROM)


A read-only memory, or ROM, is a form of semiconductor memory in which data is written once and then not changed. As a result, it is used in circumstances where data has to be saved indefinitely, even after the power is switched off (many memory technologies lose data when the power is turned off).

This form of semiconductor memory technology is therefore extensively employed to store programmes and data that must survive the power-down of a computer or processor. The BIOS of a computer, for example, is stored in ROM. As the name indicates, data cannot be easily written to ROM; writing data into the ROM may require special hardware, depending on the ROM's technology. Although data may often be changed, doing so requires special technology to erase the existing data so that new data can be written in.
4.4.1 Types of Read-Only Memory (ROM)

Following are the different types of ROM:
- Programmable Read Only Memory (PROM): A type of semiconductor memory that can only have data written to it once, after which the data becomes permanent. These memories are purchased in a blank state and are programmed with a PROM programmer.
- Erasable Programmable Read Only Memory (EPROM): A type of memory that can be erased and reprogrammed. These semiconductor devices can be programmed and then erased at a later date. Normally, this is accomplished by exposing the semiconductor device to UV radiation. To make this possible, the EPROM package has a circular window that allows light to pass through to the device's silicon. This window is usually covered with a label when the EPROM is in use, especially if the data has to be maintained for a long time.
- Electrically Erasable Programmable Read-Only Memory (EEPROM): Data may be written to and erased from these semiconductor devices using an electrical voltage, usually applied to a chip's erase pin. Like other forms of PROM, EEPROM keeps the memory's contents even after the power is switched off; like other kinds of ROM, it is slower than RAM.

Flash memory may be thought of as an advancement of EEPROM technology. Data may be written to it and deleted from it in blocks, but data can only be read from individual cells.
4.5 Memory Interleaving


This method splits the memory system into a number of memory modules and organizes the addressing so that consecutive words in the address space are assigned to distinct modules. Memory access requests involving successive addresses are therefore sent to separate modules, and since these modules can be accessed in parallel, the average rate of fetching words from the main memory is improved.
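As a small sketch of this addressing scheme (Python; the module count of 4 is an illustrative assumption, not a value from the text), the module number can be taken from the low-order part of the word address:

```python
# With m memory modules, word address a is placed in module (a mod m) at
# offset (a // m) within that module, so consecutive addresses fall in
# different modules and can be fetched in parallel.
def interleave(address, modules=4):
    return address % modules, address // modules

for a in range(8):
    module, offset = interleave(a)
    print(f"address {a} -> module {module}, offset {offset}")
```

Running the loop shows addresses 0, 1, 2, 3 landing in modules 0, 1, 2, 3 respectively, which is exactly why successive requests can proceed in parallel.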


Memory cells are commonly arranged in an array, with each cell capable of storing one bit of information.
Each row of cells represents a memory word, and each row’s cells are linked together by a common line
known as the word line.
Two bit lines link the cells in each column to the Sense/Write circuit, and the Sense/Write circuits are linked to the chip's data input or output lines. During a write operation, the Sense/Write circuit receives input information and stores it in the cells of the chosen word.


4.6 Speed, Size, and Cost of Memory


Figure 2 shows the speed, size and cost of memory:

[Memory hierarchy, top to bottom: Registers, Primary Cache, Secondary Cache, Main Memory, Secondary Memory (magnetic disk). Size increases toward the bottom of the hierarchy; speed and cost per bit increase toward the top.]

Figure 2: Speed, Size, and Cost of Memory
4.6.1 Speed
A perfect memory would be quick, large, and cheap. SRAM chips can be used to create a very fast memory. However, these chips are costly because their basic cells have six transistors, which prevents a large number of cells from being packed onto a single chip. As a result, using SRAM chips to build a large memory is impractical due to cost. Dynamic RAM chips, which have much simpler basic cells and are thus much less expensive, are an alternative. However, such memories are much slower.

4.6.2 Size
A very small amount of memory can be implemented directly on the processor chip at the next level of the hierarchy. This memory, known as a processor cache, stores copies of instructions and data that are held in a much larger external memory. Caches are frequently divided into two tiers. On the CPU chip, there is always a primary cache; this cache is small because it competes for space on the processor chip, which must also implement many other functions. The primary cache is known as the level 1 (L1) cache. Between the primary cache and the rest of the memory lies a bigger secondary cache.

The main memory is the next level in the hierarchy. Dynamic memory components, such as SIMMs, DIMMs, and RIMMs, are used to build this fairly big memory. The main memory is substantially larger than the cache memory, but it is a lot slower: in a typical computer, the access time to the main memory is about ten times longer than the access time to the L1 cache.

Disk drives can store a lot of data at a low price, although they are extremely slow compared to the semiconductor devices used to implement the main memory. A hard disc drive (HDD; sometimes known as a hard drive, hard disc, magnetic disc, or disc drive) is a storage and retrieval device for digital data, particularly computer data. It is made up of one or more rigid (thus "hard") fast-spinning discs (commonly referred to as platters) that are covered with magnetic material and have magnetic heads positioned to write and read data from the surfaces. The speed with which a program may access memory is critical during execution.

4.6.3 Cost
Although dynamic memory units in the hundreds-of-megabytes range can be implemented at a reasonable price, their size is still small compared to the demands of large programmes with large amounts of data. To implement large memory spaces, a solution is provided by secondary storage, primarily magnetic discs. Large discs can be purchased at a reasonable price and are widely used in computer systems; they are, however, significantly slower than semiconductor memory units. Magnetic discs can thus provide a huge amount of cost-effective storage, while dynamic RAM technology can be used to create a large yet affordable main memory.
4.7 Conclusion

- One of the most crucial components of a computer system is its memory.
- The electrical storing area for instructions and data that the processor can read rapidly is referred to as computer memory.
- Computer memory is divided into two categories, primary and secondary. Primary memory is directly accessed by a processor to execute instructions.
- The storage capacity of computer memory is measured in bits, bytes, Kilobyte (KB), Megabyte (MB), Gigabyte (GB), Terabyte (TB), and Petabyte (PB).
- The addressing method determines the maximum size of memory that may be used in every machine.
- Cache memory is a tiny, quick memory that sits between the CPU and the bigger, slower main memory.
- The Main Memory (MM) unit can be thought of as a "black box" in terms of the system.
- The MAR (Memory Address Register) and MDR (Memory Data Register) are two CPU registers that transmit data between the CPU and the MM.
- Memory access time is a valuable indicator of the memory unit's speed.
- Memory cycle time is a crucial metric for the memory system.
- The pace at which data may be moved in and out of a memory unit is called the transfer rate.
- ASCII (American Standard Code for Information Interchange) assigns a unique number, or code, to all the alphabetic and special characters.
- In a computer, characters are represented by a group of bits that depends upon the encoding scheme being used.
- There are two types of encoding scheme: ASCII and Unicode.
- ASCII represents a character as a group of 8 bits or 1 byte. Unicode represents a character as a group of 16 bits or 2 bytes.
- Any electronics assembly that employs computer processing technologies uses semiconductor memory.
- Semiconductor memory is a critical electronic component for any PCB assembly based on a computer.
- There are many different forms of memory, including ROM, RAM, EPROM, EEPROM, Flash memory, DRAM, SRAM, SDRAM, F-RAM, and MRAM, and new varieties are being created to increase performance.
- RAM is a semiconductor-based memory where the CPU or other hardware devices can read and write data. The data is stored temporarily, since it is a volatile memory: once the system turns off, it loses the data.
- A ROM is a type of semiconductor memory in which data is written once and then is not altered.
- Memory interleaving splits the memory system into a number of memory modules and organizes the addressing so that consecutive words in the address space are assigned to distinct modules.
- Memory cells are commonly arranged in an array, with each cell capable of storing one bit of information.

4.8 Glossary

- ROM: A form of semiconductor memory in which data is written once and then not changed.
- EEPROM: A kind of electrically erasable programmable read-only memory.
- RAM: A semiconductor-based memory where the CPU or other hardware devices can read and write data. The data is stored temporarily, since it is a volatile memory: once the system turns off, it loses the data.
- Computer memory: The electrical storing area for instructions and data that the processor can read rapidly.
- Cache memory: A tiny, quick memory that sits between the CPU and the bigger, slower main memory.
- Semiconductor memory: A critical electronic component for any PCB assembly based on a computer.
4.9 Self-Assessment Questions

A. Multiple Choice Questions
1. Which of the following memories is directly accessed by a processor to execute instructions?
a. Primary memory
b. Secondary memory
c. Both a and b
d. None of these
2. In which of the following units is computer memory measured?
a. Kilobyte
b. Bit
c. Petabyte
d. All of these
3. Which of the following is a tiny, quick memory that sits between the CPU and the main memory?
a. RAM
b. ROM
c. Virtual memory
d. Cache memory
4. Which of the following are registers of the CPU?
a. MAR
b. MDR
c. MNR
d. Both a and b
5. The pace at which data may be moved in and out of a memory unit is called ___________.
a. Memory access time
b. Memory cycle time
c. Transfer rate
d. None of these
6. The Unicode encoding scheme represents a character as a group of _____________.
a. 8 bits or 1 byte
b. 16 bits or 2 bytes
c. 24 bits or 3 bytes
d. 32 bits or 4 bytes
7. Which of the following is a type of memory that stores data in a fixed location?
a. Dynamic RAM
b. Video RAM
c. Static RAM
d. Synchronous DRAM
8. Which of the following memories is based on a phenomenon in which a chalcogenide glass changes state, or phase, from amorphous (high resistance) to polycrystalline (low resistance)?
a. Phase Change RAM
b. Magnetic RAM
c. Dynamic RAM
d. Synchronous DRAM
9. In which type of memory may data be written and erased using an electrical voltage?
a. PROM
b. EEPROM
c. EPROM
d. None of these
10. ___________ cells are commonly arranged in an array, with each cell capable of storing one bit of information.
a. Memory
b. Data
c. Load
d. Information

B. Essay Type Questions
1. One of the most crucial components of a computer system is its memory. Discuss.
2. The storage capacity of computer memory is measured in bits, bytes, Kilobyte (KB), etc. What do you understand by units of memory?
3. Any electronics assembly that employs computer processing technologies uses semiconductor memory. Discuss.
4. RAM is categorised into different types. Discuss RDRAM.
5. ROM is categorised into different types. Discuss one of the categories of ROM, namely EPROM.
4.10 Answers and Hints for Self-Assessment Questions

A. Answers to Multiple Choice Questions

Q. No.  Answer
1.      a. Primary memory
2.      d. All of these
3.      d. Cache memory
4.      d. Both a and b
5.      c. Transfer rate
6.      b. 16 bits or 2 bytes
7.      c. Static RAM
8.      a. Phase Change RAM
9.      b. EEPROM
10.     a. Memory

B. Hints for Essay Type Questions
1. Memory saves the data and instructions needed for data processing and output outcomes. Storage may be necessary for a short amount of time, immediately, or for a long time. The electrical storing area for instructions and data that the processor can read rapidly is referred to as computer memory.
Refer to Section Introduction
2. A computer stores data internally in the form of binary numbers, 0 (OFF/low voltage) and 1 (ON/high voltage). All the digits (0-9), alphabet (a to z or A to Z), and special characters ($, %, @, *, etc.) are stored in the computer in binary form. ASCII (American Standard Code for Information Interchange) assigns a unique number or code to all the alphabetic and special characters.
Refer to Section Basic Concepts of Memory System
3. Semiconductor memory is a critical electronic component for any PCB assembly based on a computer. Memory cards have also become widespread for temporarily storing data, ranging from portable flash memory cards used for file transfers to semiconductor memory cards used in cameras, mobile phones, and other devices.
Refer to Section Semiconductor RAM Memories
4. RDRAM is the fastest among all the random access memory types, with a data transfer speed of 1 GHz. Generally, RDRAM is used as video memory on graphics accelerator cards. Dynamic RDRAM is an improvement to the existing RDRAM.
Refer to Section Semiconductor RAM Memories
5. EPROM is a type of memory that can be erased and reprogrammed. These semiconductor devices can be programmed and then erased at a later date. Normally, this is accomplished by exposing the semiconductor device to UV radiation.

4.11 Post-Unit Reading Material

- https://www.studocu.com/in/document/psg-college-of-technology/computer-architecture/m-morris-mano-solution-manual-computer-system-architecture/10775236
- https://www.jbiet.edu.in/coursefiles/cse/HO/cse2/DLD1.pdf

4.12 Topics for Discussion Forums

- Research online and discuss real applications of memory systems with your classmates.
UNIT
05
Introduction to Cache & Virtual Memory

Names of Sub-Units

Overview of Cache Memory, Mapping Functions, Replacement Algorithms, Performance Considerations, Overview of Virtual Memory, Virtual Memory Organisation, Address Translation.

Overview

The unit begins by discussing the concept of cache memory, followed by mapping functions and replacement algorithms. It then discusses performance considerations and the concept of virtual memory, including virtual memory organisation. Towards the end, the unit discusses address translation in virtual memory.

Learning Objectives

In this unit, you will learn to:
- Explain the concept of cache memory
- Discuss the mapping function and replacement algorithms
- Explain the concept of virtual memory
- Describe the concept of virtual memory organisation
- Discuss the address translation in virtual memory

Learning Outcomes

At the end of this unit, you would:
- Evaluate the importance of cache memory
- Analyse the concept of mapping function and replacement algorithms
- Assess the importance of virtual memory
- Analyse the significance of virtual memory organisation
- Evaluate the use of address translation in virtual memory

Pre-Unit Preparatory Material

- https://www.cs.umd.edu/~meesh/cmsc411/website/proj01/cache/cache.pdf
5.1 Introduction

Cache memory is a type of memory that operates at extremely fast speeds. It’s used to boost performance

S
and synchronise with high-speed processors. Although cache memory is more expensive than main
memory or disc memory, it is less expensive than CPU registers. Cache memory is a form of memory
E
that works as a buffer between the RAM and the CPU and is highly fast. It stores frequently requested
data and instructions so that they may be accessed quickly by the CPU.
R
5.2 Cache Memory
T

A computer’s CPU can generally execute instructions and data quicker than it can acquire them from
a low-cost main memory unit. As a result, the memory cycle time becomes the system’s bottleneck.
H

Using cache memory is one technique to minimise memory access time. This is a tiny, quick memory
that sits between the CPU and the bigger, slower main memory. This is where a programme’s presently
IG

active segments and data are stored. Because address references are local, the CPU can usually find the
relevant information in the cache memory itself (cache hit) and only needs access to the main memory
infrequently (cache miss). With a large enough cache memory, cache hit rates of over 90% are possible,
R

resulting in a cost-effective increase in system performance.


Y

The usage of cache memory reduces the average time it takes to access data from the main memory.
The cache is a more compact and quicker memory that stores copies of data from frequently accessed
P

main memory locations. In a CPU, there are several distinct, independent caches that store instructions
and data. Figure 1 shows the structure of cache memory:
O
C

Cache Memory

CPU Primary Memory Secondary Memory

Figure 1: Structure of Cache Memory

2
UNIT 05: Introduction to Cache & Virtual Memory JGI JAIN
DEEMED-TO-BE UNIVERSITY

This method splits the memory system into a number of memory modules and organises the addressing
so that consecutive words in the address space are assigned to distinct modules. Memory access requests
involving successive addresses will be sent to separate modules. The average pace of obtaining words
from the Main Memory can be improved since parallel access to these modules is available.

5.2.1 Mapping Functions


The various types of mapping functions used by cache memory are as follows:
- Direct Mapping
- Associative Mapping
- Set-associative Mapping

Direct Mapping

Direct mapping is the most basic method: it maps each block of main memory into only one possible cache line, assigning each memory block to a specific line in the cache. If a line was previously filled by a memory block and a new block needs to be loaded, the old block is overwritten. The address is divided into two elements: an index field and a tag field. The tag field is stored in the cache, while the index selects the cache line. The performance of direct mapping is directly related to the hit ratio. The following formula is used to express this mapping:

i = j mod m

where,
i = cache line number
j = main memory block number
m = number of lines in the cache

For the purposes of cache access, each main memory address may be thought of as having three fields. The least significant w bits identify a unique word or byte within a block of main memory (in most modern computers the address is at the byte level). The remaining s bits designate one of the main memory's 2^s blocks. The cache logic interprets these s bits as a tag of s-r bits (the most significant portion) and an r-bit line field, which indicates one of the cache's m = 2^r lines. Figure 2 shows the direct mapping of cache memory:
R

[The memory address from the processor is split into a tag and an index. The index selects a cache line, whose stored tag is compared with the address tag: if they are the same, the cache location is accessed; if they differ, the main memory is accessed.]

Figure 2: Direct Mapping
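The tag/line/word split described above can be sketched as follows (Python; the field widths r = 3 and w = 2 are illustrative choices, not values from the text):

```python
# Direct mapping: a cache with 2**r lines and 2**w words per block.
# Main-memory block j goes to cache line i = j mod 2**r, and the
# remaining high-order s-r bits of the block number form the tag.
def split_address(addr, r=3, w=2):
    word = addr & ((1 << w) - 1)   # least significant w bits: word in block
    block = addr >> w              # main-memory block number j
    line = block & ((1 << r) - 1)  # i = j mod 2**r
    tag = block >> r               # most significant s - r bits
    return tag, line, word

# Block number 13 maps to line 13 mod 8 = 5 with tag 1:
print(split_address(13 << 2))  # (1, 5, 0)
```

Note how two different blocks whose numbers differ by a multiple of 2^r produce the same line but different tags, which is exactly what the tag comparison in Figure 2 resolves.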


Associative Mapping
In this form of mapping, associative memory is used to store both the content and the address of the memory word. Any block can be placed in any cache line: the word-id bits are used to determine which word in the block is required, while the tag is formed from the remainder of the address bits. This allows any block to be placed anywhere in the cache memory, making it the quickest and most flexible mapping method.
Figure 3 shows the associative mapping of cache memory:

D
[The memory address from the processor is compared simultaneously with all the addresses stored in the cache. If the address is found, the corresponding location is accessed; if it is not found, the main memory is accessed.]

Figure 3: Associative Mapping


H

Set-associative Mapping
IG

This type of mapping is an improved version of direct mapping that eliminates the disadvantages of
direct mapping. The concern of potential thrashing in the direct mapping approach is addressed by
set associative. It does this by stating that rather than having exactly one line in the cache to which a
R

block can map, we will combine a few lines together to form a set. Then a memory block can correspond
to any one of a set’s lines. Each word in the cache can have two or more words in the main memory at
Y

the same index address, thanks to set-associative mapping. The benefits of both direct and associative
cache mapping techniques are combined in set associative cache mapping. The following formula is
P

used to express this mapping:


m=v*k
O

i = j mod v
C

where,
i = cache set number
j = main memory block number
v = number of sets
m = number of lines in the cache number of sets
k = number of lines in each set

4
UNIT 05: Introduction to Cache & Virtual Memory JGI JAINDEEMED-TO-BE UNIVERSITY

Figure 4 shows the set-associative mapping of cache memory:

[The memory address from the processor is split into a tag and a set index. The index selects a set, and the tags of all the lines in that set are compared with the address tag in parallel. On a match the word is accessed; if no tag matches, the main memory is accessed.]

Figure 4: Set-associative Mapping
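The relations m = v * k and i = j mod v above can be sketched as follows (Python; the choice of 4 sets with 2 lines each is an illustrative assumption):

```python
# Set-associative mapping: v sets of k lines each (m = v * k lines in
# total). Main-memory block j may occupy any of the k lines of set
# i = j mod v.
def set_for_block(j, v=4, k=2):
    i = j % v
    candidate_lines = [i * k + way for way in range(k)]  # lines of set i
    return i, candidate_lines

print(set_for_block(13))  # block 13 -> set 1, candidate lines [2, 3]
```

With k = 1 this degenerates to direct mapping (one line per set), and with v = 1 it becomes fully associative (any block in any line), which is why set-associative mapping is described as combining the two techniques.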
T

5.2.2 Replacement Algorithms
Page replacement is integral to demand paging, and there are many page replacement algorithms. An algorithm is assessed by executing it on a particular string of memory references for a process and calculating the number of page faults. The sequence of memory references is called the reference string.
Some of the most widely used page replacement mechanisms are discussed as follows:
- Random replacement: Refers to the policy in which the replaced page is chosen at random, i.e., the memory manager randomly chooses any loaded page. Because the policy selects the page to be replaced from any frame with equal probability, it uses no knowledge of the reference stream (or of locality) when it selects the page frame to replace. In general, random replacement does not perform well: on most reference streams it causes more page faults than the other algorithms discussed in this section. After early exploration with random replacement, it was recognised that several other policies would produce fewer page faults.
- First-in-First-out (FIFO): Refers to the replacement algorithm that replaces the page that has been in the memory the longest. FIFO emphasises the interval of time a page has been present in the memory rather than how much the page is being used. The advantage of FIFO is that it is simple to implement. A FIFO replacement algorithm records the time at which each page was brought into the memory; when a page replacement is needed, the oldest page is chosen. A FIFO queue can be created that holds all pages brought into the memory: the page at the head of the queue is replaced, and when a page is brought into the memory, it is inserted at the tail of the queue.
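A minimal sketch of this queue-based scheme (Python; the reference string and frame count below are illustrative, not taken from the text):

```python
from collections import deque

# FIFO replacement: on a fault with all frames full, evict the page at
# the head of the queue (the page resident in memory the longest) and
# append the newly loaded page at the tail.
def fifo_faults(reference_string, frames):
    queue, resident, faults = deque(), set(), 0
    for page in reference_string:
        if page in resident:
            continue               # hit: FIFO order is not changed
        faults += 1
        if len(resident) == frames:
            resident.discard(queue.popleft())  # evict the oldest page
        queue.append(page)
        resident.add(page)
    return faults

print(fifo_faults([7, 0, 1, 2, 0, 3, 0, 4], frames=3))  # 7 faults
```

Note that a hit does not reorder the queue, which is exactly how FIFO differs from LRU: residency time, not recency of use, decides the victim.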
- Belady's Optimal Algorithm: Refers to the replacement policy which has "perfect knowledge" of the page reference string; thus, it always chooses an optimal page to be removed from the memory. Let the forward distance of a page p at time t, FWDt(p), be the distance from the current point in the reference stream to the next place in the stream where the same page is referenced again. In the optimal algorithm, the replaced page Yt is one that has maximal forward distance:

Yt = max FWDt(x), taken over all loaded pages x

Since more than one page is loaded at time t, there may be more than one page that never appears again in the reference stream, that is, there may be more than one loaded page with maximal forward distance. In this case, Belady's optimal algorithm chooses an arbitrary loaded page with maximal forward distance.

V
The optimal algorithm can only be implemented if the full page reference stream is known in advance. Since it is rare for the system to have such knowledge, the algorithm is not practically realisable. Instead, its theoretical behavior is used to compare the performance of realisable algorithms with the optimal performance.

Although it is usually not possible to exactly predict the page reference stream, one can sometimes predict the next page with a high probability that the prediction will be correct. For example, the conditional branch instruction at the end of a loop almost always branches back to the beginning of the loop rather than exiting it. Such predictions are based on static analysis.
Figure 5 shows Belady’s optimal algorithm behavior:
Reference:  2   1   3   4   2   1   3   4   2   1   3   4   5   6   7   8
Frame 0:    2*  2   2   2   2   2   2   2   2   1*  1   1   5*  5   5   8*
Frame 1:        1*  1   1   1   1   3*  3   3   3   3   3   3   6*  6   6
Frame 2:            3*  4*  4   4   4   4   4   4   4   4   4   4   7*  7

Figure 5: Belady's Optimal Algorithm Behavior (an asterisk marks a page fault)


On the basis of source code or on the dynamic behavior of the programme, this analysis can produce enough information to incorporate replacement "hints" in the source code. The compiler and paging systems can then be designed to use these hints to predict the future behavior of the page reference stream.
An example of Belady's optimal algorithm with m = 3 page frames uses the following reference string:

2, 1, 3, 4, 2, 1, 3, 4, 2, 1, 3, 4, 5, 6, 7, 8

Figure 5 has a row for each of the three page frames and a column for each reference in the page stream. A table entry at row i, column j shows the page loaded in page frame i after rj has been referenced. The optimal algorithm behaves as shown in Figure 5, incurring 10 page faults. An optimal page replacement algorithm has the minimum page fault rate among all algorithms.
An optimal algorithm never suffers from Belady's anomaly. The optimal page replacement algorithm is also called OPT or MIN; it states: replace the page which will not be accessed for the longest period of time. Using this page replacement algorithm yields the lowest possible page fault rate for a fixed number of frames. The optimal page replacement algorithm is hard to implement because it needs advance knowledge of the reference string, so it is mainly used for comparison studies.
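When the full reference string is known in advance, Belady's rule can be simulated directly; the sketch below (Python) reproduces the 10 faults of the example above:

```python
# Belady's optimal (OPT/MIN) replacement: on a fault with all frames
# full, evict the resident page whose next use lies furthest in the
# future (maximal forward distance; ties broken arbitrarily).
def opt_faults(refs, frames):
    resident, faults = set(), 0
    for t, page in enumerate(refs):
        if page in resident:
            continue
        faults += 1
        if len(resident) == frames:
            future = refs[t + 1:]
            def forward_distance(p):
                # pages never used again get the maximal distance
                return future.index(p) if p in future else len(future)
            resident.discard(max(resident, key=forward_distance))
        resident.add(page)
    return faults

refs = [2, 1, 3, 4, 2, 1, 3, 4, 2, 1, 3, 4, 5, 6, 7, 8]
print(opt_faults(refs, frames=3))  # 10, matching Figure 5
```

Because OPT scans the rest of the reference string on every fault, the sketch underlines why the algorithm is only usable for offline comparison studies.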

• Least Recently Used [LRU]: Refers to the algorithm that is designed to take advantage of "normal" programme behavior. Programmes contain loops that cause the main line of the code to execute repeatedly. In the code portion of the address space, the control unit will repeatedly access the set of pages containing these loops. This set of pages is known as the locality of the process. If the loop or loops that are executed are stored in a small number of pages, then the programme has a small code locality.
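A common way to express LRU in software is an ordered structure in which the oldest entry is the eviction victim. The following minimal sketch, using Python's OrderedDict, is our own illustration and not from the text:

```python
from collections import OrderedDict

class LRUCache:
    """Resident page set with least-recently-used eviction (illustrative)."""
    def __init__(self, num_frames):
        self.num_frames = num_frames
        self.pages = OrderedDict()          # oldest entry = LRU page

    def access(self, page):
        """Return True on a page hit, False on a page fault."""
        if page in self.pages:
            self.pages.move_to_end(page)    # mark as most recently used
            return True
        if len(self.pages) >= self.num_frames:
            self.pages.popitem(last=False)  # evict the LRU page
        self.pages[page] = True
        return False

lru = LRUCache(3)
faults = sum(not lru.access(p) for p in [1, 2, 3, 1, 4, 5])
print(faults)  # → 5
```

In the trace above, the second access to page 1 is the only hit; every other access faults, and pages 2 and 3 are evicted as the least recently used.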

• Least Frequently Used [LFU]: Refers to the algorithm that selects a page for replacement if the page was not often used in the past. There may be more than one page that satisfies the criterion, in which case any of the qualifying pages can be selected for replacement. Thus, an actively used page should have a larger reference count. If a programme changes the set of pages it is currently using, the frequency counts will tend to cause the pages in the new locality to be replaced even though they are currently being used. Another problem with LFU is that it uses frequency counts from the beginning of the page reference stream. To mitigate this, the frequency counter is reset each time a page is loaded rather than being allowed to increase monotonically throughout the execution of the programme.
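Victim selection under LFU reduces to picking the resident page with the smallest reference count. A minimal illustrative sketch (the function name and counts are our own example):

```python
from collections import Counter

def lfu_victim(frames, counts):
    """Pick the LFU victim: the resident page with the smallest count."""
    return min(frames, key=lambda page: counts[page])

counts = Counter({1: 5, 2: 1, 3: 3})   # reference counts so far
print(lfu_victim([1, 2, 3], counts))   # → 2  (least frequently used)
```

Resetting `counts[page]` whenever a page is loaded would implement the counter-reset variant described above.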
• Not Recently Used [NRU]: Refers to the algorithm in which a resident page that has not been accessed in the recent past is replaced. This algorithm keeps all resident pages in a circular list. A referenced bit is set for each page whenever it is accessed. When the referenced bit is 1, the page has been recently accessed; when the referenced bit is 0, the page has not been referenced in the recent past.
• Second chance replacement: Resembles the FIFO replacement algorithm. When a page has been selected, its reference bit is checked. If it is 0, the page is replaced; if the reference bit is 1, the page is given a second chance and the system moves on to select the next FIFO page. When a page is given a second chance, its reference bit is cleared and its arrival time is updated to the current time. A page that has been given a second chance will not be replaced until all other pages have been replaced (or given a second chance). If a page is used often enough to keep its reference bit set, it will never be replaced. This algorithm looks for an old page that has not been used in the previous clock intervals. If the reference bit is set for all pages, the second chance algorithm degenerates into pure FIFO.
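The second chance scan can be sketched with a FIFO queue of (page, referenced bit) pairs; this is our own illustration, with made-up page numbers:

```python
from collections import deque

def second_chance_victim(queue):
    """queue holds (page, referenced_bit) pairs in FIFO order.

    A page whose bit is set gets a second chance: its bit is cleared
    and it moves to the back of the queue. The first page found with
    bit 0 is the victim.
    """
    while True:
        page, ref = queue.popleft()
        if ref:
            queue.append((page, 0))   # clear the bit, requeue
        else:
            return page               # oldest page with bit 0 is evicted

q = deque([(1, 1), (2, 0), (3, 1)])
print(second_chance_victim(q))  # → 2
```

Page 1 is spared (its bit was set) and requeued with the bit cleared; page 2, the oldest page with bit 0, is evicted.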
• Most Frequently Used [MFU]: Replaces the page with the largest reference count, on the assumption that the page with the smallest count was probably just brought into the memory, has yet to be used, and may be referenced soon. However, neither MFU nor LFU is very common, as the implementation of these algorithms is fairly expensive.
• Page classes: Refer to another page replacement policy that can be implemented by using classes of pages. For example, if the reference bit and the dirty bit of a page are considered together, the pages can be classified into four classes:
1. (0, 0) neither referenced nor dirty
2. (0, 1) not referenced (recently) but dirty

3. (1, 0) referenced but clean
4. (1, 1) referenced and dirty
Thus, each page belongs to one of these four classes. The page in the lowest nonempty class is replaced. If there is more than one page in the lowest class, the page for replacement can be chosen on a FIFO basis or at random among them.
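Because Python compares tuples lexicographically, the (referenced, dirty) bit pairs order themselves exactly as the four classes above, which makes victim selection a one-liner. An illustrative sketch with made-up page names:

```python
# (referenced, dirty) bits; tuple ordering matches the four classes:
# (0, 0) < (0, 1) < (1, 0) < (1, 1)
pages = {"A": (1, 1), "B": (0, 1), "C": (1, 0)}

# Replace a page from the lowest nonempty class present.
victim = min(pages, key=lambda p: pages[p])
print(victim)  # → B  (class (0, 1) is the lowest class present)
```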
• Page locking: Refers to the mechanism that sets a lock bit in a page's page table entry to prevent the page from being swapped out of memory. When demand paging is used, it is sometimes necessary to allow some pages to be locked in memory. Most importantly, the code that selects the next page to be swapped in should never be swapped out, since it could not execute to swap itself back in. Device I/O operations may read or write data directly at the memory locations of the process; the pages containing those memory locations must then not be swapped out, because although the process may not reference them at present, the I/O device is referencing them. Thus, setting a lock bit in the page table entry prevents a page from being swapped out.

5.2.3 Performance Considerations

When the processor wants to read or write data from main memory, it first looks in the cache for a matching item. A cache hit occurs when the CPU discovers that the memory location is in the cache, and data is read from the cache. A cache miss occurs when the CPU cannot locate the memory location in the cache. When a cache miss occurs, the cache creates a new entry and transfers data from main memory, after which the request is completed from the cache's contents.

Cache memory performance is usually assessed in terms of a metric known as the hit ratio:

Hit ratio = hits / (hits + misses) = number of hits / total accesses

Greater cache block size, higher associativity, lower miss rate, lower miss penalty, and lower time to hit in the cache can all help to enhance cache performance.
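To make the metric concrete, here is an illustrative Python model of a small direct-mapped cache; the line count, block size, and address trace are made-up examples, not values from the text:

```python
def simulate_direct_mapped(addresses, num_lines, block_size):
    """Count hits/misses for a direct-mapped cache and return the hit ratio."""
    lines = [None] * num_lines          # each line holds one block tag
    hits = misses = 0
    for addr in addresses:
        block = addr // block_size      # block number in main memory
        index = block % num_lines       # cache line the block maps to
        tag = block // num_lines
        if lines[index] == tag:
            hits += 1                   # cache hit
        else:
            misses += 1                 # miss: load the block into the line
            lines[index] = tag
    return hits / (hits + misses)

# Repeated accesses to nearby addresses fall in one block: 8 hits, 1 miss.
trace = [0, 4, 8, 0, 4, 8, 0, 4, 8]
print(simulate_direct_mapped(trace, num_lines=4, block_size=16))
```

The trace exhibits good locality, so the hit ratio is 8/9, illustrating why locality of reference drives cache performance.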
5.3 Virtual Memory

A virtual or logical address is the address created by the CPU in a virtual memory system. The needed mapping to a possibly different physical address is done by a specific memory control unit, known as the memory management unit. According to system needs, the mapping function can be modified during programme execution.

Because of the distinction created between the logical (virtual) and physical address spaces, the former can be as big as the CPU's addressing capability allows, while the latter can be considerably smaller. Only the active portion of the virtual address space is mapped to physical memory, while the rest is kept on the bulk storage device. If the requested information is in the main memory (MM), it is accessed and execution continues; otherwise it must first be brought into the main memory.


5.3.1 Virtual Memory Organisation

Early computers' programming flexibility was severely limited due to the small amount of main memory compared to programme sizes. Small main memory volumes made the execution of big programmes difficult and prevented flexible memory space management when several processes co-existed. It was inconvenient, since programmers had to spend a lot of time figuring out how to distribute data and code between the main memory and the auxiliary storage.


Virtual memory gives a computer programmer an addressing space that is several times bigger than
the main memory’s physically available addressing area. Virtual addresses, which might be considered
artificial in certain ways, are used to place data and instructions in this area.
In reality, data and instructions are stored in both the main memory and the auxiliary memory (usually disc memory). This is done under the supervision of the virtual memory control system, which oversees the real-time placement of data based on virtual addresses. This system automatically (i.e., without the intervention of a programmer) fetches data and instructions requested by currently running programmes into main memory. The virtual memory's overall organisation is depicted in Figure 6:

[Figure 6 shows the general scheme of virtual memory: the processor emits virtual addresses, an address translator converts each virtual address into a physical address for the main memory, and the virtual memory control manages the exchange of data between the main memory and the auxiliary store (disk), all under the operating system.]

Figure 6: General Scheme of the Virtual Memory


The address space of virtual memory is split into pieces with predetermined sizes and identifiers that are the sequential numbers of these fragments in the virtual memory's collection of fragments. Depending on the kind of virtual memory used, the sequences of virtual memory addresses that correspond to these pieces are referred to as pages or segments. A virtual memory address is made up of the number of the relevant fragment of the virtual address space and the word or byte number within that fragment.
For modern virtual memory systems, we differentiate the following options:
• Paged virtual memory
• Segmented virtual memory
• Segmented virtual memory with paging


When accessing data stored at a virtual address, address translation is used to translate the virtual address to a physical memory address. Before beginning to translate, the virtual memory system verifies whether the segment or page carrying the requested word or byte is available in primary memory, using tests of the page or segment descriptors in corresponding tables in the main memory. If the test result is negative, the requested page or segment is allocated a physical address sub-space in main memory, and it is then loaded from the auxiliary store into this address sub-space. The virtual memory system then updates the page or segment descriptors in the descriptor tables and gives the processor instruction that emitted the virtual address access to the requested word or byte.
Today, the virtual memory control system is realised partly in hardware and partly in software. Computer hardware is responsible for accessing descriptor tables and translating virtual to physical addresses. The operating system is responsible for retrieving missing pages or segments and updating their descriptors, although it is aided by specific memory management hardware. This hardware typically consists of a specific functional unit for virtual memory management and special functional blocks for virtual address translation computations.


Paged Virtual Memory

The virtual address of a paged memory is made up of two parts: the page number and the word (byte) displacement (offset). The number of words (bytes) on a page is determined by the page's fixed size, which is typically a power of two. For each virtual memory address space, a page table is kept in main memory. Each page in this table is characterised by a page descriptor. First and foremost, a page descriptor includes the page's current physical address, which may be a main memory address or a location in the auxiliary storage. The main memory is split into frames, which are regions of the same size as a page. The physical address of a page is the first address of the frame that the page occupies. The auxiliary store address is determined in a manner that matches the kind of memory used as the auxiliary store (usually a disk). Figure 7 shows the virtual address translation scheme for paging:

E
virtual address

V
page # offset Page table Main memory

R
s p

page table address

E
frame 0

S
ster. r
s frame1
page table base
address
+ E
R
control bits
frame n
T

frame address for the page p


H

page descriptor
physical address
IG

r p
virtual address translation scheme for paging
Figure 7: Virtual Address Translation Scheme for Paging
In addition, a page descriptor has a number of control bits, which determine the page's status and kind. Examples of control bits are a page existence bit for main memory (page status), the acceptable access code, a modification registration bit, a swapping lock bit, and an operating system inclusion bit.
Address translation converts the virtual address of a word (byte) into a physical address in the main memory. The page descriptor is used to complete the translation. The descriptor is found inside the page table by adding the page number supplied in the virtual address to the page table base address. The page status is read from the descriptor. If the page is in the main memory, the frame address is read from the descriptor and combined with the word (byte) offset from the virtual address; the resulting physical address is used to access data in the main memory. If the requested page does not exist in main memory, the programme's execution is halted and a "missing page" exception is thrown. The operating system handles this exception: the missing page is transferred from the auxiliary store to the main memory, and the address of the allocated frame is saved in the page descriptor with a control bit adjustment. The stopped programme is then restarted, and the required word or byte is accessed.


When a computer system has a large number of users or huge applications (tasks), each user or task might have its own page table. In this case, the virtual memory control uses a two-level page table. At the first level, a page table directory is kept; it includes the base addresses of all of the system's page tables. Three fields are included in the virtual address: the page table number in the page table directory, the requested page number, and the word (byte) offset.
Figure 8 shows the virtual address translation scheme with two-level page tables:

[Figure 8 shows the translation: the virtual address holds a page table number, a page number and an offset; the page table directory base plus the page table number select a page table base address from the page table directory; the page number then selects a page descriptor in that page table, whose frame address is combined with the offset to form the physical address.]

Figure 8: Virtual Address Translation Scheme with Two-level Page Tables
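The two-level lookup can be sketched as follows. The 10/10/12-bit split of a 32-bit address is an assumed example (the split used by classic 32-bit x86 paging), not a value mandated by the text:

```python
PAGE_SIZE = 4096

def translate_2level(vaddr, directory):
    """Two-level translation: directory -> page table -> frame."""
    table_no = vaddr >> 22             # top 10 bits: page table number
    page_no = (vaddr >> 12) & 0x3FF    # next 10 bits: page number
    offset = vaddr & 0xFFF             # low 12 bits: word/byte offset

    page_table = directory[table_no]   # first-level lookup
    frame = page_table[page_no]        # second-level lookup
    return frame * PAGE_SIZE + offset

directory = {1: {3: 7}}                # table 1, page 3 -> frame 7
vaddr = (1 << 22) | (3 << 12) | 0x10
print(hex(translate_2level(vaddr, directory)))  # → 0x7010
```

The benefit over a single flat table is that page tables for unused regions of the address space need not exist at all.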


Segmented Virtual Memory

Segmented memory is another form of virtual memory. Programmes are constructed using this type of virtual memory based on segments defined by a programmer or a compiler. Segments have their own distinct identifiers, lengths, and address spaces.

Data or instructions are written in a segment at successive addresses in the memory. Segments have identified owners and access privileges for other users. Segments can be "private" for a single user or "shared," meaning they can be accessed by other users. Segment parameters may be modified while the application is running, allowing the segment length and the rules for mutual access by many users to be changed on the fly.

The names and lengths of segments are used to arrange them in a shared virtual address space. They can reside in either the main memory or the auxiliary store, which is often disc memory. The operating system's segmented memory control mechanism automatically fetches segments requested by currently running applications and stores them in main memory.

In a system with multiple users, segmentation is a technique to extend the address space of user programmes as well as a tool for deliberate, structured programme organisation with specified access rights and segment protection.


Figure 9 shows different segments in a programme:

[Figure 9 shows a programme divided into five segments of different lengths over addresses 0 to 16K: Segment 1 holds the symbol table, Segment 2 the source code, Segment 3 constants, Segment 4 the object code, and Segment 5 the stack.]

Figure 9: Different Segments in a Programme

In segmentation, the virtual address consists of two fields: a segment number and a word displacement inside the segment. The segment table stores a descriptor for each segment. The parameters of a segment, such as control bits, segment address, segment length, and protection bits, are held in the descriptor. Usually, the control bits contain the main memory presence bit, a segment type code, an authorised access type code, and size extension control. The protection bits store the segment's privilege code in the data protection scheme.

When an attempt is made to access the contents of a segment, the privilege level of the accessing programme is compared to the privilege level of the segment, and the access rules are verified. Access to the segment is denied if the access rules are not followed, and the exception "segment access rules violation" is thrown. Figure 10 shows the virtual address translation scheme with segmentation:
[Figure 10 shows the translation: the segment number from the virtual address indexes the segment table to fetch the segment descriptor (control bits, segment address, length, protection); the offset is compared against the segment length and, if within the limit, added to the segment address to form the physical address.]

Figure 10: Virtual Address Translation Scheme with Segmentation
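The base-plus-offset translation with a limit check can be sketched as follows; the table contents are made-up values for illustration:

```python
def translate_segment(seg_no, offset, segment_table):
    """Segment translation with a limit check (illustrative).

    segment_table maps segment number -> (base address, length).
    """
    base, length = segment_table[seg_no]
    if offset >= length:
        # Corresponds to the "segment access rules violation" exception.
        raise MemoryError("segment limit exceeded")
    return base + offset

segs = {0: (0x1000, 0x400), 1: (0x8000, 0x100)}
print(hex(translate_segment(1, 0x20, segs)))  # → 0x8020
```

A full implementation would also compare privilege levels and access-type codes before permitting the access.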


Segmented Paged Virtual Memory


Segmented virtual memory with paging is the third form of virtual memory. Segments are paged in this
memory, which means they contain the number of pages specified by a programmer or compiler. In
segmented paged virtual memory, a virtual address consists of three parts, namely, a segment number,
a page number, and a word or byte offset on a segmented page.
The segment table, which includes segment descriptors, and the segment page table, which contains
descriptors of pages belonging to the segment, are used to translate virtual addresses into physical
addresses, as illustrated in Figure 11:

[Figure 11 shows the translation: the virtual address holds a segment number, a page number and an offset; the segment table base address plus the segment number select a segment descriptor, which supplies the base address of the segment page table; the page number then selects a page descriptor, whose frame address is combined with the word (byte) offset to form the physical address in main memory.]

Figure 11: Virtual Address Translation for Segmentation with Paging


A segment number is used to select a segment descriptor by indexing from the base address of the segment table. This base address is usually kept in a special register (the segment table address pointer register) that is loaded before the application starts. A segment descriptor contains the address of the page table for the given segment, the segment size in pages, control bits, and protection bits.

The control bits contain the main memory presence bit, a segment type code, a code of authorised access type, and extension control bits. The protection bits store the privilege level of the segment in the general protection system. Figure 12 shows the segment descriptor structure in segmented virtual memory with paging:

Control bits | Protection bits | Segment length (in pages) | Segment page table address

Figure 12: Segment Descriptor Structure in Segmented Virtual Memory with Paging
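Combining the segment table and the per-segment page table gives the full lookup; the following sketch uses assumed field names and a 4 KiB page size for illustration:

```python
PAGE_SIZE = 4096

def translate_seg_paged(seg_no, page_no, offset, segment_table):
    """Segmented translation with paging: segment -> page table -> frame."""
    descriptor = segment_table[seg_no]
    if page_no >= descriptor["length_pages"]:
        raise MemoryError("segment length exceeded")   # protection check
    frame = descriptor["page_table"][page_no]          # per-segment page table
    return frame * PAGE_SIZE + offset

# Segment 2 is 4 pages long; its page 1 resides in frame 9.
segment_table = {2: {"length_pages": 4, "page_table": {0: 3, 1: 9}}}
print(hex(translate_seg_paged(2, 1, 0x44, segment_table)))  # → 0x9044
```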


5.3.2 Address Translation

Processes always use virtual addresses. The Memory Management Unit (MMU), a part of the CPU, performs address translation. The MMU caches recently used translations in a Translation Lookaside Buffer (TLB), which acts as a page table cache. The page tables themselves are stored in the OS's virtual address space and are, at best, present in main memory, which costs one main memory reference per address translation: to translate a virtual memory address, the MMU has to read the relevant page table entry out of memory. Figure 13 shows the address translation table:

[Figure 13 shows a virtual address split into a virtual page number (VPN) and a page offset (PO); the page table maps each VPN either to a physical page number (PPN) in main memory or to a disk address on the hard disk, and the physical address is formed from the PPN and the page offset.]

Figure 13: Address Translation Table
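The role of the TLB as a small cache of recent translations can be illustrated with the following sketch; the capacity, the fully-associative organisation, and the LRU eviction policy are our own assumptions, since real TLBs vary:

```python
from collections import OrderedDict

class TLB:
    """Tiny fully-associative TLB with LRU eviction (illustrative)."""
    def __init__(self, capacity=4):
        self.capacity = capacity
        self.entries = OrderedDict()   # VPN -> PPN, oldest first

    def lookup(self, vpn, page_table):
        if vpn in self.entries:            # TLB hit: no memory reference
            self.entries.move_to_end(vpn)
            return self.entries[vpn]
        ppn = page_table[vpn]              # TLB miss: walk the page table
        self.entries[vpn] = ppn
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # evict the LRU entry
        return ppn

tlb = TLB()
page_table = {0: 8, 1: 3}
print(tlb.lookup(1, page_table))  # → 3  (miss: read from the page table)
print(tlb.lookup(1, page_table))  # → 3  (hit: served from the TLB)
```

A hit avoids the extra main memory reference for the page table entry, which is precisely why the TLB matters for performance.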
5.4 Conclusion

• Cache memory is a type of memory that operates at extremely fast speeds.
• Cache memory is a form of memory that works as a buffer between the RAM and the CPU and is highly fast.
• Various types of mapping functions that are used by cache memory are as follows:
  – Direct mapping
  – Associative mapping
  – Set-associative mapping
• Page replacement is obvious in demand paging, and there are many page replacement algorithms. Executing an algorithm on a particular memory reference string for a process and calculating the number of page faults assesses an algorithm. The sequence of memory references is called the reference string.
• A virtual or logical address is the address created by the CPU in a virtual memory system.
• Virtual memory gives a computer programmer an addressing space that is several times bigger than the main memory's physically available addressing area.
• Virtual addresses, which might be considered artificial in certain ways, are used to place data and instructions in this area.
• Depending on the kind of virtual memory used, the sequences of virtual memory addresses that correspond to these pieces are referred to as pages or segments.
• A virtual memory address is made up of the number of the relevant fragment of the virtual memory address space and the word or byte number in the supplied fragment.
• For modern virtual memory systems, we differentiate the following options:
  – Paged virtual memory
  – Segmented virtual memory
  – Segmented virtual memory with paging
• The Memory Management Unit (MMU), a part of the CPU, performs address translation.

5.5 Glossary

• Cache memory: A form of memory that works as a buffer between the RAM and the CPU and is highly fast.
• Direct mapping: A method that maps each block of main memory into only one potential cache line.
• Associative memory: Memory that stores both the content and the addresses of memory words; it is utilised in associative mapping.
• Set-associative mapping: An improved version of direct mapping that eliminates the disadvantages of direct mapping.
• Virtual memory: A memory management technique that helps in executing programmes that are larger in size than the physical main memory.
• Demand paging: A set of techniques provided by the virtual memory to execute a programme that is not present entirely in memory.

5.6 Self-Assessment Questions

A. Multiple Choice Questions

1. Which of the following is a type of memory that operates at extremely fast speeds?
a. Virtual memory
b. Cache memory
c. SSD
d. None of these

2. Which of the following is/are type(s) of mapping functions that is/are used by cache memory?
a. Direct mapping
b. Associative mapping
c. Set-associative mapping
d. All of these

3. Which of the following mapping methods maps each block of main memory into only one potential cache line?
a. Direct mapping
b. Associative mapping


c. Set-associative mapping
d. All of these

4. Which of the following page replacement algorithms replaces the page that has been in the memory the longest?
a. Random replacement
b. First-in-First-out
c. Belady's optimal algorithm
d. Least Recently Used

5. Which of the following is a page replacement algorithm?
a. Least Frequently Used
b. Second chance replacement
c. Page classes
d. All of these

6. Which of the following page replacement algorithms selects a page for replacement if the page was not often used in the past?
a. Not Recently Used
b. Least Recently Used
c. Belady's optimal algorithm
d. Least Frequently Used

7. Which of the following is the address created by the CPU in a virtual memory system?
a. Virtual address
b. Cache address
c. Physical address
d. None of these

8. The virtual address of a paged memory is made up of two parts: the page number and the _________________.
a. Word displacement
b. Virtual address
c. Both a and b
d. None of these

9. In which of the following are segments paged, meaning they contain the number of pages specified by a programmer or compiler?
a. Paged virtual memory
b. Segmented virtual memory
c. Segmented virtual memory with paging
d. None of these


10. Which of the following gives a computer programmer an addressing space that is several times bigger than the main memory's physically available addressing area?
a. Virtual memory
b. Cache memory
c. Physical memory
d. None of these

B. Essay Type Questions

1. Cache memory is a type of memory that operates at extremely fast speeds. Discuss.
2. Various types of mapping functions that are used by cache memory are direct mapping, associative mapping, and set-associative mapping. What do you understand by direct mapping?
3. Page replacement is obvious in demand paging, and there are many page replacement algorithms. Discuss the random replacement algorithm.
4. A virtual or logical address is the address created by the CPU in a virtual memory system. Discuss.
5. What do you understand by segmented virtual memory?

5.7 Answers and Hints for Self-Assessment Questions
A. Answers to Multiple Choice Questions

Q. No.   Answer
1.       b. Cache memory
2.       d. All of these
3.       a. Direct mapping
4.       b. First-in-First-out
5.       c. Page classes
6.       d. Least Frequently Used
7.       a. Virtual address
8.       a. Word displacement
9.       c. Segmented virtual memory with paging
10.      a. Virtual memory
B. Hints for Essay Type Questions

1. A computer's CPU can generally execute instructions and process data quicker than it can acquire them from a low-cost main memory unit. As a result, the memory cycle time becomes the system's bottleneck. Using cache memory is one technique to minimise memory access time. This is a tiny, quick memory that sits between the CPU and the bigger, slower main memory. Refer to Section Cache Memory

2. Direct mapping is the most basic method, which maps each block of main memory into only one potential cache line. Direct mapping assigns each memory block to a specified line in the cache. If a memory block previously occupied a line and a new block needs to be loaded, the old block is deleted. Refer to Section Cache Memory

3. In the random-replacement algorithm the replaced page is chosen at random: the memory manager randomly chooses any loaded page. Because the policy calls for selecting the page to be replaced by choosing the page from any frame with equal probability, it uses no knowledge of the reference stream (or the locality) when it selects the page frame to replace. Refer to Section Cache Memory

4. A virtual or logical address is the address created by the CPU in a virtual memory system. The needed mapping is done by a specific memory control unit, sometimes known as the memory management unit, which might produce a distinct physical address. According to system needs, the mapping function can be modified during programme execution. Refer to Section Virtual Memory

5. Segmented memory is another form of virtual memory. Programmes are constructed using this type of virtual memory based on segments defined by a programmer or a compiler. Segments have their own distinct IDs, lengths, and address spaces. Data or instructions are written in segments at successive addresses in the memory. Segments have identified owners and access privileges for other users. Refer to Section Virtual Memory

E
S
@ 5.8 Post-Unit Reading Material

zz

https://www.msuniv.ac.in/Download/Pdf/19055a11803e457
E
https://www.cmpe.boun.edu.tr/~uskudarli/courses/cmpe235/Virtual%20Memory.pdf
R
zz

5.9 Topics for Discussion Forums


T
H

zz Do the online research on cache memory and virtual memory and discuss with your classmates
about uses, advantages and disadvantages of cache memory & virtual memory.
UNIT 06

Arithmetic Operations

Names of Sub-Units

Introduction, Logic Gates, Flip Flops, Addition and Subtraction of Signed Numbers, Design of Fast Adders, Multiplication of Positive Numbers, Signed Operand Multiplication, Fast Multiplication, Integer Division.

Overview

The unit begins by discussing the concept of arithmetic operations, logic gates and flip flops. Next, the unit discusses the addition and subtraction of signed numbers. Further, the unit explains the design of fast adders and the multiplication of positive numbers. Towards the end, the unit discusses signed operand multiplication, fast multiplication and integer division.

Learning Objectives

In this unit, you will learn to:
• Discuss the concept of arithmetic operations, logic gates and flip flops
• Explain the concept of addition and subtraction of signed numbers
• Describe the design of fast adders and multiplication of positive numbers
• Explain the significance of signed operand multiplication
• Discuss the concept of fast multiplication and integer division
JGI JAIN
DEEMED-TO-BE UNIVERSITY
Computer Organization and Architecture

Learning Outcomes

At the end of this unit, you would:
• Evaluate the concept of arithmetic operations, logic gates and flip flops
• Assess the concept of addition and subtraction of signed numbers
• Evaluate the importance of the design of fast adders and multiplication of positive numbers
• Determine the significance of signed operand multiplication
• Explore the concept of fast multiplication and integer division

Pre-Unit Preparatory Material

• http://flint.cs.yale.edu/cs422/doc/art-of-asm/pdf/CH09.PDF

6.1 Introduction

The arithmetic instructions in digital computers are used to manipulate data. Data is manipulated to produce the results required to solve computing problems. The four basic arithmetic operations are addition, subtraction, multiplication, and division. We can use these four operations to derive further operations if we wish.
E
R
6.2 Logic Gates

In the central processing unit, there is a distinct component called the arithmetic processing unit that performs arithmetic operations. Arithmetic instructions are usually applied to binary or decimal data. Integers and fractions are represented using fixed-point numbers, which can be signed or unsigned. The simplest arithmetic operation is fixed-point addition. When we wish to solve a problem, we follow a set of well-defined steps; taken together, these steps are referred to as an algorithm. We provide algorithms to tackle diverse problems.

A logic gate is a component that serves as a building block in digital circuits. Logic gates carry out the basic logical operations that are essential in digital circuits, and they are included in almost every electronic device we use today, for example smartphones, tablets, and memory devices.

Logic gates in a circuit make decisions based on a combination of digital signals at their inputs. Most logic gates have two inputs and one output. Logic gates are based on Boolean algebra. At any moment, every terminal is in one of two binary states: false (0) or true (1). The binary output varies depending on the type of logic gate and the combination of inputs. A logic gate may be compared to a light switch, with the output being off in one position and on in the other. Logic gates are commonly used in integrated circuits (ICs).

6.2.1 Basic Logic Gates

There are seven basic logic gates: AND, OR, XOR, NOT, NAND, NOR, and XNOR.
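Before looking at each gate in turn, note that a two-input gate can be modelled as a small Boolean function of its inputs. The following Python sketch is our own illustration; running the loop at the end reproduces the kind of truth table tabulated for each gate below:

```python
def AND(a, b):  return a & b          # 1 only when both inputs are 1
def OR(a, b):   return a | b          # 1 when at least one input is 1
def NOT(a):     return 1 - a          # inverts the logic state
def NAND(a, b): return NOT(AND(a, b)) # AND followed by an inverter
def NOR(a, b):  return NOT(OR(a, b))  # OR followed by an inverter
def XOR(a, b):  return a ^ b          # 1 when the inputs differ
def XNOR(a, b): return NOT(XOR(a, b)) # 1 when the inputs are the same

# Print the truth table of one gate (here, NAND).
for a in (0, 1):
    for b in (0, 1):
        print(a, b, NAND(a, b))
```

Each row printed corresponds to one row of the gate's truth table.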
• AND gate: The AND gate gets its name from the fact that it behaves similarly to the logical "and" operator when 0 is called "false" and 1 is called "true." The circuit symbol and logic possibilities for an AND gate are shown in the diagram and table below. (The input terminals are on the left in the symbol, and the output terminal is on the right.) The output is "true" when both inputs are "true" and "false" otherwise. In other words, the output is 1 only when both inputs one AND two are 1. The truth table of the AND gate is shown in Table 1:

Table 1: Truth Table of AND Gate

Input 1   Input 2   Output
0         0         0
0         1         0
1         0         0
1         1         1

zz OR gate: The OR gate takes its name from the fact that it works in the same way as the logical
inclusive "or". If either or both of the inputs are "true", the result is "true"; the output is "false" only
if both inputs are "false". To put it another way, for the output to be 1, at least one OR both of the
inputs must be 1. The truth table of OR gate is shown in Table 2:

Table 2: Truth Table of OR Gate

Input 1    Input 2    Output
0          0          0
0          1          1
1          0          1
1          1          1

zz NOT gate: A NOT gate, sometimes called a logical inverter to distinguish it from other forms of
electronic inverter devices, has only one input. It reverses the logic state: if the input is 1, then the
output is 0, and if the input is 0, then the output is 1. The truth table of NOT gate is shown in Table 3:

Table 3: Truth Table of NOT Gate

Input    Output
0        1
1        0

zz NAND gate: The NAND gate works as an AND gate followed by a NOT gate: it performs the logical
operation "AND" followed by negation. The output is "false" if both inputs are "true"; otherwise, the
output is "true". The truth table of NAND gate is shown in Table 4:

Table 4: Truth Table of NAND Gate


Input 1 Input 2 Output


0 0 1
0 1 1
1 0 1
1 1 0

Computer Organization and Architecture (JGI JAIN Deemed-to-be University)

zz NOR gate: The NOR gate is a combination of an OR gate followed by an inverter. Its output is "true"
only if both inputs are "false"; otherwise, the output is "false". The truth table of NOR gate is shown in Table 5:

Table 5: Truth Table of NOR Gate


Input 1    Input 2    Output
0          0          1
0          1          0
1          0          0
1          1          0

zz XOR gate: The XOR (exclusive-OR) gate acts in a similar way to the OR gate, but the output is "true"
only if one input or the other is "true", and the output is "false" if both inputs are "false" or if both
inputs are "true". Another way to look at this circuit is to note that the output is 1 if the inputs are
different and 0 if the inputs are the same. The XOR operation is shown in Table 6:

Table 6: Truth Table of XOR Gate

Input 1    Input 2    Output
0          0          0
0          1          1
1          0          1
1          1          0
zz XNOR gate: The XNOR (exclusive-NOR) gate is a combination of an XOR gate followed by an inverter.
The output of the XNOR gate is "true" if the inputs are the same, and "false" if the inputs are
different. The operation of XNOR gate is shown in Table 7:

Table 7: Truth Table of XNOR Gate

Input 1    Input 2    Output
0          0          1
0          1          0
1          0          0
1          1          1
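Taken together, Tables 1 to 7 can be checked mechanically. The sketch below (in Python, added for illustration; the function names are ours, not the book's) models each gate as a Boolean function on 0/1 values:

```python
# Minimal sketch: the seven basic logic gates as Boolean functions on 0/1 inputs.
def AND(a, b):  return a & b
def OR(a, b):   return a | b
def NOT(a):     return 1 - a
def NAND(a, b): return NOT(AND(a, b))
def NOR(a, b):  return NOT(OR(a, b))
def XOR(a, b):  return a ^ b
def XNOR(a, b): return NOT(XOR(a, b))

# Print a combined truth table for the two-input gates (Tables 1, 2, 4-7).
print("a b | AND OR NAND NOR XOR XNOR")
for a in (0, 1):
    for b in (0, 1):
        print(a, b, "|", AND(a, b), OR(a, b), NAND(a, b),
              NOR(a, b), XOR(a, b), XNOR(a, b))
```

Running the loop reproduces each row of the tables above, e.g. the 1 1 row gives AND = 1, NAND = 0, XOR = 0, XNOR = 1.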

6.3 Flip Flops



In a sequential circuit, the current output is not determined by the current input alone but also
depends on the previous output. Flip flops are the simplest kind of sequential circuits. A flip-flop can
retain a binary state indefinitely, which means it can act as a 1-bit memory cell. It is a circuit that
keeps its state until it is commanded to alter it by its inputs. Four NAND or four NOR gates can be used
to make a simple flip-flop. The types of flip flops are as follows:


zz RS Flip Flop
zz JK Flip Flop
zz D Flip Flop
zz T Flip Flop

6.3.1 RS Flip Flop


The RS Flip Flop is regarded as one of the simplest sequential logic circuits. It is a one-bit bi-stable
memory device. It requires two inputs: one is called "SET", which sets the device (output = 1) and is
denoted S; the other is known as "RESET", which resets the device (output = 0) and is denoted R. RS
stands for SET/RESET. The truth table of RS flip flop is shown in Table 8:

Table 8: Truth Table of SR Flip Flop

S    R    QN    QN+1
0    0    0     0
0    0    1     1
0    1    0     0
0    1    1     0
1    0    0     1
1    0    1     1
1    1    0     -
1    1    1     -
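The "keeps its state until commanded to alter it" behaviour can be sketched with two cross-coupled NOR gates, one common way of building the basic latch (an illustrative model, not the book's own circuit; the function name is ours):

```python
# Minimal sketch: an SR latch built from two cross-coupled NOR gates.
# Iterating the gate equations a few times models the feedback settling.
def sr_latch(s, r, q=0):
    """Return (Q, Q') after applying inputs S and R to a latch holding q."""
    qn = 1 - q
    for _ in range(4):          # a few passes let the feedback loop settle
        q  = 1 - (r | qn)       # Q  = NOR(R, Q')
        qn = 1 - (s | q)        # Q' = NOR(S, Q)
    return q, qn

q, _ = sr_latch(s=1, r=0)       # SET   -> Q becomes 1
q, _ = sr_latch(s=0, r=0, q=q)  # hold  -> Q stays 1 (the 1-bit memory)
q, _ = sr_latch(s=0, r=1, q=q)  # RESET -> Q becomes 0
```

With S = R = 0 the latch simply reproduces its stored bit, which is exactly the memory behaviour the first two rows of Table 8 describe.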

6.3.2 JK Flip Flop

The JK flip flop is an SR flip flop with the addition of clock input circuitry. The invalid or prohibited
output condition that arises in the SR flip flop when both inputs are set to 1 is eliminated by this
addition. So, the JK flip flop has four possible input combinations, i.e., set (1), reset (0), "no change"
and "toggle". The truth table of JK flip flop is shown in Table 9:
Table 9: Truth Table of JK Flip Flop

J    K    QN    QN+1
0    0    0     0
0    0    1     1
0    1    0     0
0    1    1     0
1    0    0     1
1    0    1     1
1    1    0     1
1    1    1     0

6.3.3 D Flip Flop


The D flip flop is a clocked flip flop with a single digital input 'D'. Each time the D flip flop is clocked,
its output follows the state of 'D'. It requires only two inputs, D and CP (clock pulse). The truth table of
D flip flop is shown in Table 10:
Table 10: Truth Table of D Flip Flop

Q    D    Q(t+1)
0    0    0
0    1    1
1    0    0
1    1    1

6.3.4 T Flip Flop


The T flip flop is also known as the toggle flip flop. It is a variation of the JK flip flop, obtained by
connecting both inputs of a JK flip flop together. The truth table of T flip flop is shown in Table 11:

Table 11: Truth Table of T Flip Flop

T    Qn    Qn+1
0    0     0
0    1     1
1    0     1
1    1     0
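The JK, D and T behaviours in Tables 9 to 11 differ only in their next-state rule, which makes them easy to model side by side. An illustrative sketch (function names are ours, not the book's):

```python
# Minimal sketch: next-state functions for the JK, D and T flip flops,
# following Tables 9, 10 and 11. Each call models one simulated clock pulse.
def jk_next(j, k, q):
    if (j, k) == (0, 0): return q        # no change
    if (j, k) == (0, 1): return 0        # reset
    if (j, k) == (1, 0): return 1        # set
    return 1 - q                         # toggle

def d_next(d, q):
    return d                             # output follows D

def t_next(t, q):
    return q if t == 0 else 1 - q        # toggle when T = 1

q = 0
q = jk_next(1, 0, q)   # set    -> q = 1
q = jk_next(1, 1, q)   # toggle -> q = 0
q = t_next(1, q)       # toggle -> q = 1
q = d_next(0, q)       # follow -> q = 0
```

Note that `t_next` is just `jk_next(t, t, q)`, mirroring how the T flip flop is built by tying the J and K inputs together.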

6.4 Addition and Subtraction of Signed Numbers
The magnitudes of the two integers are denoted by the letters A and B. There are eight distinct cases
to consider when signed numbers are added or subtracted, depending on the signs of the numbers
and the operation performed. The first column of the table lists these cases; the other columns give
the actual procedure to be carried out on the magnitudes of the numbers. The last column is needed
to avoid a negative zero; in other words, the outcome of subtracting two equal integers should be +0,
not -0. The signed magnitude addition and subtraction is shown in Figure 1:



Addition: A + B ; A: Augend; B: Addend
Subtraction: A - B ; A: Minuend; B: Subtrahend

Operation      Add Magnitude   Subtract Magnitude
                               When A>B    When A<B    When A=B
(+A) + (+B)    +(A + B)
(+A) + (-B)                    +(A - B)    -(B - A)    +(A - B)
(-A) + (+B)                    -(A - B)    +(B - A)    +(A - B)
(-A) + (-B)    -(A + B)
(+A) - (+B)                    +(A - B)    -(B - A)    +(A - B)
(+A) - (-B)    +(A + B)
(-A) - (+B)    -(A + B)
(-A) - (-B)                    -(A - B)    +(B - A)    +(A - B)

[Hardware: B register with sign Bs, a complementer with mode control M, a parallel adder with input
and output carry, and A register with sign As, load/sum controls and the add-overflow flag AVF.]

Figure 1: Signed Magnitude of Addition and Subtraction


The signed 2’s complement addition and subtraction method is shown in Figure 2:

[Hardware: B register, a complementer and parallel adder, the accumulator AC and an overflow flag V.
Algorithm: for subtraction, with the minuend in AC and the subtrahend in B, AC <- AC + B' + 1; for
addition, with the augend in AC and the addend in B, AC <- AC + B. In both cases V records the
overflow, and the operation ends.]

Figure 2: Signed 2's Complement Addition and Subtraction
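The algorithm of Figure 2 can be sketched for an n-bit word as follows (an illustration under an assumed 8-bit width; the function names are ours, not the book's):

```python
# Minimal sketch of Figure 2: n-bit 2's complement addition and subtraction.
# AC <- AC + B (add) or AC <- AC + B' + 1 (subtract); V flags signed overflow.
N = 8
MASK = (1 << N) - 1
SIGN = 1 << (N - 1)

def add_2c(ac, b):
    result = (ac + b) & MASK
    # Overflow: the operand signs are equal but differ from the result's sign.
    v = int((ac ^ b) & SIGN == 0 and (ac ^ result) & SIGN != 0)
    return result, v

def sub_2c(ac, b):
    return add_2c(ac, (~b + 1) & MASK)   # add the 2's complement of B

# 5 - 7 = -2, which is 0xFE in 8-bit 2's complement, with no overflow.
result, v = sub_2c(5, 7)
# 100 + 100 = 200 exceeds the signed 8-bit range (-128..127), so V = 1.
_, v2 = add_2c(100, 100)
```

The overflow test matches the usual hardware rule: V is set exactly when two operands of the same sign produce a result of the opposite sign.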

6.4.1 Flowchart of Addition and Subtraction of Signed Numbers


Figure 3 shows the flowchart of addition and subtraction of signed numbers:

[Flowchart summary: for subtraction (minuend in A, subtrahend in B) the sign Bs is complemented and
the operation then proceeds as addition (augend in A, addend in B). If the signs As and Bs are equal,
the magnitudes are added: EA <- A + B, with AVF recording any overflow. If the signs differ, the
magnitudes are subtracted: EA <- A + B' + 1; when the end carry E shows A < B, the result is corrected
by A <- A' + 1 and the sign As is complemented. The result is left in A and As.]

Figure 3: Flowchart of Addition and Subtraction

6.5 Design of Fast Adders


In a ripple carry adder, the two bits to be added are available instantly to every adder block. Each
adder block, on the other hand, waits for the carry from the preceding block. As a result, unless the
input carry is known, it is impossible to produce the sum and carry of any block: the ith block waits
for the (i-1)th block to supply its carry before proceeding. Consequently, there is a significant time
delay due to carry propagation. The design of fast adders is shown in Figure 4:

[Four 1-bit full adders connected in cascade: stage i adds Ai, Bi and carry Ci to produce sum Si and
carry Ci+1; the final carry out is C4.]

Figure 4: Design of Fast Adders

As soon as the input signals are applied to the corresponding full adder, the sum S3 is produced.
However, the carry C4 does not reach its final steady-state value until carry C3 is available at its
steady-state value. Similarly, C3 depends on C2 and C2 depends on C1. As a result, the carry must
propagate through all stages for output S3 and carry C4 to reach their final steady-state values.

The propagation time is calculated by multiplying each adder block's propagation delay by the
number of adder blocks in the circuit. For example, if each full adder stage has a 20-nanosecond
propagation delay, S3 will reach its final correct value after 60 (20*3) nanoseconds. If we increase the
number of stages to add more bits, the problem becomes worse.
R
6.5.1 Carry Look-ahead Adder
T

By incorporating more complicated circuitry, a carry look-ahead adder lowers propagation latency.
The ripple carry design is modified in this design, reducing the carry logic across fixed groups of
bits of the adder to two-level logic. Let's take a closer look at the design. An example of a carry
look-ahead adder is shown in Figure 5:
[A full-adder stage with inputs A, B and Cin, producing sum S and carry-out Cout.]

Figure 5: Carry Look-ahead Adder


The truth table of Carry Look-ahead Adder is shown in Table 12:

Table 12: Truth Table of Carry Look-ahead Adder

A    B    C    C+1    Conditions
0    0    0    0      No carry generate
0    0    1    0      No carry generate
0    1    0    0      Carry propagate
0    1    1    1      Carry propagate
1    0    0    0      Carry propagate
1    0    1    1      Carry propagate
1    1    0    1      Carry generate
1    1    1    1      Carry generate
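Table 12's generate (G = A AND B) and propagate (P = A XOR B) conditions let every carry be computed in two gate levels from Ci+1 = Gi OR (Pi AND Ci). An illustrative 4-bit sketch (the function name is ours):

```python
# Minimal sketch: 4-bit carry look-ahead addition. Gi = Ai & Bi (generate),
# Pi = Ai ^ Bi (propagate); each carry Ci+1 = Gi | (Pi & Ci) is two-level logic.
def cla_add4(a, b, c0=0):
    A = [(a >> i) & 1 for i in range(4)]
    B = [(b >> i) & 1 for i in range(4)]
    G = [x & y for x, y in zip(A, B)]     # carry generate per stage
    P = [x ^ y for x, y in zip(A, B)]     # carry propagate per stage
    c = [c0, 0, 0, 0, 0]
    for i in range(4):
        c[i + 1] = G[i] | (P[i] & c[i])   # no stage waits on a rippled sum
    s = [P[i] ^ c[i] for i in range(4)]   # Si = Pi XOR Ci
    total = sum(bit << i for i, bit in enumerate(s))
    return total, c[4]                    # 4-bit sum and carry out C4
```

In real hardware each Ci is expanded fully in terms of c0 (e.g. C2 = G1 + P1·G0 + P1·P0·C0), so all carries appear after the same two gate delays; the loop here computes the same values.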

6.6 Multiplication

In the operation of multiplication, we examine the successive bits of the multiplier, starting with the
least significant bit. The multiplicand is copied down if the multiplier bit is 1; otherwise, 0s are copied
down. The numbers copied down in successive lines are shifted one position to the left of the
preceding number. After that, the numbers are added together to form the product. The signs of the
multiplicand and multiplier determine the sign of the product: if they are the same, the sign is
positive; if they differ, it is negative.

6.6.1 Signed Operand Multiplication

In the beginning, the multiplicand is in B and the multiplier in Q, with their corresponding signs in Bs
and Qs respectively. The signs are compared, and both A and Q are set to correspond to the sign of the
product, since a double-length product will be stored in registers A and Q. Registers A and E are
cleared and the sequence counter SC is set to the number of bits of the multiplier. Since an operand
must be stored with its sign, one bit of the word is occupied by the sign and the magnitude consists of
n-1 bits. The low-order bit of the multiplier in Qn is now tested. If it is 1, the multiplicand (B) is added
to the present partial product (A); if it is 0, nothing is added. Register EAQ is then shifted once to the
right to form the new partial product. SC is decremented by 1 and its new value checked. If it is not
zero, the procedure is repeated and a new partial product is formed. When SC = 0, the process is
terminated. The hardware implementation of signed multiply operations is shown in Figure 6:

[Registers: B with sign Bs and the sequence counter SC; a complementer and parallel adder; A with
sign As and the end-carry bit E (initially 0); Q with sign Qs, where Qn is the low-order multiplier bit.]

Figure 6: Hardware Implementation of Signed Multiply Operations


The signed operand multiplication is represented by the flowchart as shown in Figure 7:

[Flowchart summary: with the multiplicand in B and the multiplier in Q, the product sign Qs XOR Bs is
placed in As and Qs; A <- 0, E <- 0, SC <- n-1. Repeat: if Qn = 1 then EA <- A + B; shr EAQ;
SC <- SC - 1; until SC = 0. The product is in AQ.]

Figure 7: Flowchart of Signed Multiply Operations

6.6.2 Unsigned Operand Multiplication



Unsigned numbers do not have a sign; they hold only the magnitude of the number. So, unsigned
binary numbers represent positive values only. For example, the representation of positive decimal
numbers is positive by default: we always assume that there is a positive sign in front of every
number. The block diagram of unsigned operand multiplication is shown in Figure 8:

[Block diagram: the multiplicand register M (Mn-1 ... M0) feeds an n-bit adder; shift-and-add control
logic adds M into A when required and shifts C, A, Q right; the multiplier occupies Q (Qn-1 ... Q0) and
C holds the carry out of the adder.]

Figure 8: Unsigned Operand Multiplication


The unsigned operand multiplication is represented by the flowchart as shown in Figure 9:

[Flowchart summary: C,A <- 0; M <- multiplicand; Q <- multiplier; Count <- n. Repeat: if Q0 = 1 then
C,A <- A + M; shift C, A, Q right; Count <- Count - 1; until Count = 0. The product is in A,Q.]

Figure 9: Flowchart of Unsigned Operand Multiplication
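The register-level loop of Figure 9 can be sketched as follows (illustrative; the register names C, A, Q, M follow the figure, the function name is ours):

```python
# Minimal sketch of Figure 9: unsigned shift-and-add multiplication with
# registers C (carry), A (accumulator) and Q (multiplier), each n bits wide.
def multiply_unsigned(multiplicand, multiplier, n=8):
    c, a, q, m = 0, 0, multiplier, multiplicand
    for _ in range(n):                     # Count <- n ... Count = 0
        if q & 1:                          # Q0 = 1?
            total = a + m                  # C,A <- A + M
            c, a = total >> n, total & ((1 << n) - 1)
        # shift C, A, Q right as one combined register
        q = (q >> 1) | ((a & 1) << (n - 1))
        a = (a >> 1) | (c << (n - 1))
        c = 0
    return (a << n) | q                    # double-length product in A,Q
```

The double-length result is why A and Q together hold the product: n-bit operands can produce up to a 2n-bit value (e.g. 255 x 255 = 65025 needs 16 bits).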

6.6.3 Fast Multiplication

Fast multiplication can be accomplished in three common ways:
zz Sequential multipliers successively create the partial products and add them to the earlier stored
partial products.
zz High-speed parallel multipliers create the partial products in parallel and add them using a fast
multi-operand adder.
zz An array of identical blocks creates and adds the partial products concurrently.

Booth’s Algorithm

In Booth's algorithm we can multiply two signed binary integers in 2's complement representation. It
is also used to speed up the multiplication process, and it is very effective. The Booth's algorithm
multiplier is shown in Figure 10:



[Registers: BR holds the multiplicand and QR the multiplier; AC accumulates the partial product. A
sequence counter, a complementer and parallel adder, and the bit pair Qn, Qn+1 drive the algorithm.]

Figure 10: Booth's Algorithm Multiplication


The flowchart of Booth algorithm is shown in Figure 11:

[Flowchart summary: A <- 0, Q-1 <- 0, M <- multiplicand, Q <- multiplier, Count <- n. Repeat: examine
the bit pair Q0, Q-1; if it is 10, A <- A - M; if it is 01, A <- A + M; if 00 or 11, do nothing. Then
arithmetic shift right A, Q, Q-1 and decrement Count, until Count = 0.]

Figure 11: Flowchart of Booth's Algorithm
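The flowchart above can be sketched in code as follows (an illustrative model, not the book's own; it assumes n-bit 2's complement operands and the function name is ours):

```python
# Minimal sketch of Booth's algorithm: A <- 0, Q-1 <- 0; inspect the pair
# (Q0, Q-1); add or subtract M into A; arithmetic-shift A, Q, Q-1 right, n times.
def booth_multiply(m, q, n=8):
    mask = (1 << n) - 1
    sign = 1 << (n - 1)
    a, q, q_1 = 0, q & mask, 0
    m &= mask
    for _ in range(n):
        pair = (q & 1, q_1)
        if pair == (1, 0):
            a = (a - m) & mask            # A <- A - M
        elif pair == (0, 1):
            a = (a + m) & mask            # A <- A + M
        q_1 = q & 1                       # arithmetic shift right A, Q, Q-1
        q = (q >> 1) | ((a & 1) << (n - 1))
        a = (a >> 1) | (a & sign)         # the sign bit of A is replicated
    product = (a << n) | q                # double-length result in A,Q
    if product & (1 << (2 * n - 1)):      # reinterpret as a signed value
        product -= 1 << (2 * n)
    return product
```

Because the shift is arithmetic and additions are done modulo 2^n, negative operands in 2's complement need no special handling, which is the main appeal of Booth's method.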


6.7 Integer Division


A sequence of successive compare, shift, and subtract operations is used to divide two fixed-point binary
integers in signed-magnitude representation with paper and pencil. Because the quotient digits are
either 0 or 1, binary division is significantly easier than decimal division: there is no need to estimate
how many times the dividend or partial remainder fits inside the divisor.
0 or 1, binary division is significantly easier than decimal division because there is no need to predict
how many times the dividend or partial residual fits inside the divisor.
Y

Figure below depicts the division process. An example of integer division is as follows:
P

                 000010101    Quotient
Divisor 1101    100010010    Dividend
                -1101
                 10000
                -1101
                  1110
                -1101
                     1    Remainder


The divisor is compared with the five most significant bits of the dividend. Since the 5-bit number is
smaller than B, the comparison is repeated with one more bit. As the 6-bit number is greater than B,
we place a 1 for the quotient bit in the sixth position above the dividend. The divisor is then shifted
once to the right and subtracted from the dividend. The difference is called a partial remainder,
because the division could have stopped here, giving a quotient of 1 and a remainder equal to the
partial remainder. The process continues by comparing the partial remainder with the divisor.

The quotient bit is equal to 1 if the partial remainder is larger than or equal to the divisor. In that
case, the divisor is shifted right and subtracted from the partial remainder.

The quotient bit is 0 if the partial remainder is smaller than the divisor, and no subtraction is
required. In either case, the divisor is shifted once to the right. The result yields a quotient as well as
a remainder.

V
6.7.1 Hardware Implementation for Signed-Magnitude Data

It is convenient to change the method slightly for hardware execution of signed-magnitude division
in a digital computer. Rather than shifting the divisor to the right, the dividend, or partial remainder,
is shifted to the left, which puts the two integers in the proper relative position. Subtraction is
accomplished by adding the 2's complement of B to A.

The end carry indicates the relative magnitudes. The hardware requirements are the same as for
multiplication. Register EAQ is shifted to the left with 0 inserted into Qn, and the previous value of E
is lost. Figure 12 depicts the hardware for the division procedure. The divisor is kept in the B register,
whereas the double-length dividend is kept in the A and Q registers. The dividend is shifted to the left,
and the divisor is subtracted by adding its 2's complement value.
accumulation its 2’s complement value and the dividend are stimulated to the left.
T

The hardware implementation of the divide operation is shown in Figure 12:


[Registers: B with sign Bs and the sequence counter (SC); a complementer and parallel adder; A with
sign As and end carry E; Q with sign Qs, where Qn is the rightmost bit.]

Figure 12: Hardware Implementation of Divide Operations


The flowchart of divide operation is shown in Figure 13:

[Flowchart summary: with the dividend in AQ and the divisor in B, the magnitudes are divided.
Qs <- As XOR Bs and SC <- n - 1. A divide-overflow check (EA <- A + B' + 1) sets DCF = 1 if the high
half of the dividend is not smaller than B; otherwise DCF = 0 and the loop runs: shl EAQ;
EA <- A + B' + 1; if E = 1 (A >= B) set Qn <- 1, else restore EA <- A + B; decrement SC and repeat until
SC = 0. The quotient is left in Q and the remainder in A.]

Figure 13: Flowchart of Divide Operation
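The compare/shift/subtract loop can be sketched in its restoring form as follows (illustrative; the register names A, Q, B follow the figure and the function name is ours):

```python
# Minimal sketch of the divide loop (restoring form): shift the partial
# remainder A and dividend Q left; subtract B; restore on a negative result,
# otherwise set the quotient bit Qn.
def divide_restoring(dividend, divisor, n=8):
    if divisor == 0:
        raise ZeroDivisionError
    a, q, b = 0, dividend, divisor
    for _ in range(n):
        a = (a << 1) | ((q >> (n - 1)) & 1)   # shl AQ
        q = (q << 1) & ((1 << n) - 1)
        a -= b                                # A <- A + B' + 1
        if a < 0:
            a += b                            # restore: A <- A + B
        else:
            q |= 1                            # quotient bit Qn <- 1
    return q, a                               # quotient in Q, remainder in A
```

Run on the worked example above (dividend 100010010 = 274, divisor 1101 = 13, with n = 9), it reproduces the quotient 10101 (21) and remainder 1.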


6.8 Conclusion

zz A logic gate is a component in digital circuits that serves as a building block.
zz The AND gate behaves similarly to the logical "and" operator.
zz The XOR (exclusive-OR) gate works in a similar way to the OR gate.
zz The NOT gate is sometimes called a logical inverter.
zz The NAND gate works as an AND gate followed by a NOT gate.
zz The NOR gate is a combination of an OR gate followed by an inverter.
zz The XNOR (exclusive-NOR) gate is a combination of an XOR gate followed by an inverter.
zz Flip flops are the simplest kind of sequential circuits.
zz The RS flip flop is a one-bit bi-stable memory device.
zz The JK flip flop has four possible input combinations, i.e., set (1), reset (0), "no change" and "toggle".
zz In the design of fast adders, the two bits to be added are instantly available to every adder block
rather than waiting for the carry to ripple through the preceding blocks.
zz A sequence of successive compare, shift, and subtract operations is used to divide two fixed-point
binary integers.
E
R
6.9 Glossary

zz Flip-Flop: A circuit that keeps its state until it is commanded to alter it by its inputs. Four NAND or
four NOR gates can be used to make a simple flip-flop.
zz Logic Gate: A building-block component of digital circuits that carries out a basic logical operation.
zz Multiplication: The technique of successive shift and add operations used to multiply two fixed-
point binary numbers in signed-magnitude format.
zz RS Flip Flop: Regarded as one of the simplest sequential logic circuits; a one-bit bi-stable memory
device.
zz D Flip Flop: A clocked flip flop with only the digital input 'D'.
zz JK Flip Flop: An SR flip flop with the addition of clock input circuitry.
zz T Flip Flop: Also known as the toggle flip flop.
zz Design of Fast Adders: In fast adders, the two bits to be added are instantly available to every
adder block instead of rippling the carry through each stage.
zz Integer Division: A sequence of successive compare, shift, and subtract operations used to divide
two fixed-point binary integers.
zz AND Gate: A gate named for the fact that it behaves similarly to the logical "and" operator.


6.10 Self-Assessment Questions

A. Multiple Choice Questions


1. The circuit that is used to store one bit of data is called ____________.
a. Flip Flop
b. Register

c. Encoder
d. All of these

2. ____________ is considered to be an important characteristic of computers.

a. Precision
b. Storage

c. Speed

d. All of these

3. Which among the following is the full form of the S-R flip flop?
a. Single-Reset
b. Set Reset
c. Simple-Reset
d. None of these

4. Which intermediate terms are produced in binary multiplication?


a. Operand

b. Mid-terms

c. Partial Products
d. Multipliers
5. Which among the following is frequently called the double precision format?

a. 64-bit

b. 16-bit

c. 32-bit
d. 128-bit

6. Which of these digital logic gates can be built using a single transistor?

a. AND gates
b. XOR gates
c. NOT gates
d. OR gates
7. When set is disabled and reset is enabled in an S-R flip flop, the output will be ____________.
a. Set Reset
b. Reset


c. No change
d. None of these
8. In logic gates, the JK flip flop has ____________ memory.
a. Random
b. Permanent
c. True
d. Temporary
9. Which of the following flip flops is obtained during the conversion of an SR flip flop to a JK flip flop?
a. D flip flop
b. SR flip flop

D
c. T flip flop d. All of these
10. Which among the following is an application of flip flops?

a. Decoders
b. Storage devices

c. Encoders
d. None of these

B. Essay Type Questions

1. The arithmetic instructions in digital computers are used to modify data. What is an arithmetic
operation?

2. In the central processing unit, there is a distinct component called the arithmetic processing unit
that performs arithmetic operations. Determine the concept of logic gates.
3. In a sequential circuit, the current output is not simply determined by the current input but also
depends on the previous output. Explain the types of flip flops.
4. There are eight distinct criteria to consider when signed numbers are added or subtracted. Describe
the method of addition and subtraction of signed numbers.

5. A sequence of successive compare, shift, and subtract operations is used to divide the two fixed-point
H

binary integers. Determine the method of integer division.



6.11 Answers and Hints for Self-Assessment Questions



A. Answers to Multiple Choice Questions



Q. No.    Answer
1.        a. Flip Flop
2.        d. All of these
3.        b. Set Reset
4.        c. Partial Products
5.        a. 64-bit
6.        c. NOT gates
7.        b. Reset
8.        c. True
9.        b. SR flip flop
10.       b. Storage devices


B. Hints for Essay Type Questions


1. The arithmetic instructions in digital computers are used to modify data. Data is modified to provide
the results required to solve computing issues. The four basic arithmetic operations are addition,
subtraction, multiplication, and division. We can use these four operations to generate further
operations if we wish.
Refer to Section Introduction
2. A logic gate is a component in digital circuits that serves as a building block. They carry out
basic logical operations that are essential in digital circuits. Logic gates are included in almost

every electronic gadget we use today. Logic gates, for example, may be found in gadgets such as
smartphones, tablets, and memory devices.

Refer to Section Logic Gates

3. In a sequential circuit, the current output is not simply determined by the current input but also
depends on the previous output. Flip flops are the simplest kind of sequential circuits. A flip-flop

can retain a binary state indefinitely, which means it can act as a 1-bit memory cell. It is a circuit
that keeps its state until it is commanded to alter it by input. Four NAND or four NOR gates can be

used to make a simple flip-flop.
Refer to Section Flip Flops

4. The magnitude of the two integers is denoted by the letters A and B. There are eight distinct criteria
to consider when signed numbers are added or subtracted, depending on the sign of the numbers
and the operation done.
R
Refer to Section Addition and Subtraction of Signed Numbers
5. A sequence of successive compare, shift, and subtract operations is used to divide the two fixed-point

binary integers in signed-magnitude representation with paper and pencil. Because the quotient
digits are either 0 or 1, binary division is significantly easier than decimal division because there is

no need to estimate how many times the dividend or partial remainder fits inside the divisor.
Refer to Section Integer Division
IG

6.12 Post-Unit Reading Material



zz https://vignan.ac.in/subjectspg/MC119.pdf

zz http://flint.cs.yale.edu/cs422/doc/art-of-asm/pdf/CH09.PDF

6.13 Topics for Discussion Forums



zz Discuss with your friends and classmates the concept of arithmetic operations and its uses, with
real-world examples. Also discuss the different logic gates and the addition and subtraction of
signed numbers.

UNIT 07

Basic Processing Unit

Names of Sub-Units

Fundamental Concepts of Processing Unit, Execution of a Complete Instruction, Multiple Bus
Organization, Hardwired Control, Micro Programmed Control Unit

Overview

This unit begins by discussing the concept of the processing unit. Furthermore, the unit explains the
complete execution process of an instruction. Next, the unit discusses the multiple bus organisation.
In addition, the unit describes the concept of the hardwired control unit. Towards the end, the unit
discusses the microprogrammed control unit.
Y

Learning Objectives

In this unit, you will learn to:
aa Explain the concept of the processing unit
aa Describe the execution process of instruction
aa Discuss the concept of multiple bus organisation
aa Explain the role of the hardwired control unit
aa Describe the concept of microprogrammed control unit

Learning Outcomes

At the end of this unit, you would:


aa Assess the concept of processing unit
aa Evaluate the execution process of instruction
aa Assess the role of multiple bus organisation
aa Appraise the fundamentals of hardwired control unit

aa Analyse the fundamentals of microprogrammed control unit

Pre-Unit Preparatory Material

aa https://miet.ac.in/assets/uploads/cs/Punit%20Mittal%20Monogram%20Control%20Unit.pdf
7.1 INTRODUCTION
The processing unit of a computer is responsible for executing machine-language instructions and

S
coordinating the actions of other units. The central processing unit is another name for the processing
unit. The CPU performs all sorts of data processing activities as well as all of the computer’s key tasks.
E
It enables input and output devices to communicate with one another and carry out their functions. It
also keeps track of input data, interim outcomes from processing and commands. A typical computer
R
task consists of several steps described by a programme, which is made up of a sequence of machine
instructions. Instruction is carried out by performing a series of simpler actions.
T

7.2 FUNDAMENTAL CONCEPTS


H

A normal computational task includes a set of actions described by a programme, which is made up
of machine-language instructions. The processor fetches each instruction one by one and executes the
IG

defined operation. Until a branch or a jump instruction is encountered, instructions are fetched from
sequential memory blocks. The CPU keeps a list of the address of the next instruction to be fetched and
processed using the programme counter (PC). After retrieval of an instruction, the PC’s contents are
R

modified to point to the next instruction in the series.


The processor must conduct the following steps in order to execute an instruction:
Y

1. Fetch the contents of the memory location that the PC has pointed to. The instructions to be executed
P

that are stored at this memory location, which is then loaded into the IR. The needed action in
register transfer notation is:
O

IR ← [[PC]]
2. Increase the PC to point out the next instruction. For example, if the PC is incremented by 8, then the
C

register transfer notation is:


PC ← [PC] + 8
3. Execute the operation provided in the IR instruction.

7.3 EXECUTION OF A COMPLETE INSTRUCTION


A programme is a collection of instructions stored in the memory unit of a computer. For each instruction,
the CPU goes through a cycle of execution.

2
UNIT 07: Basic Processing Unit JGI JAIN
DEEMED-TO-BE UNIVERSITY

Each instruction cycle in a simple computer consists of the following phases:


zz Fetch the instruction from memory
zz Decode the instruction (what action should be performed)
zz Read the effective memory address
zz Carry out the instruction
Various registers are used for executing an instruction. The description of some important registers are
as follows:

D
zz Memory address registers (MAR): The System Bus address lines are linked to it. MAR register
defines the location in memory where a read or write operation will take place

E
zz Memory buffer register (MBR): The data lines of the system bus are linked to it. MBR register
contains either the memory value to be saved or the most recent value read from the memory

V
zz Program counter (PC): It stores the address of the next instruction to be fetched

R
zz Instruction register (IR): It stores the last instruction to be fetched

E
Fetch Cycle
At the start of the fetch cycle, the PC contains the location of the next instruction to execute. The steps

S
of the fetch cycle are as follows:
zz
address in the PC is transferred to it.
E
Step 1: Because the MAR is the sole register connected to the address lines of the system bus, the
R
zz Step 2: The address in MAR is placed on the address bus; the control unit then sends a Read order on
the control bus, and the result is shown on the data bus before being transferred into the memory
buffer register. To prepare for the following instruction, the programme counter PC is increased by
T

one. To save time, these two actions can be performed simultaneously.


H

zz Step 3: The MBR’s contents are copied to the IR.


IG

Decode Cycle
Once an instruction has been acquired, the following step is to fetch source operands. Source Operand
obtains indirect addressing (which may be achieved via any addressing mode, but is done here via
R

indirect addressing). Register-based operands do not need to be fetched. If the opcode is used, a similar
procedure will be required to store the result in the main memory.
Y

The steps of the decode cycle are as follows:


P

zz Step 1: The instruction address field is provided to the MAR. This is used to fetch the address of the
operand.
O

zz Step 2: The MBR is used to update the address field of the IR.
Step 3: The IR has now been assigned to the state. IR is now ready for the execution phase.
C

zz

Execute Cycle
The steps of execute cycle are as follows:
zz Step 1: The address component of IR is loaded into the MAR.
zz Step 2: The MBR is used to update the address field of the IR, allowing the reference memory location
to be read.
zz Step 3: The ALU now combines the contents of IR and MBR.

3
JGI JAIN
DEEMED-TO-BE UNIVERSITY
Computer Organization and Architecture

Figure 1 shows the instruction cycle state diagram:

Instruction Operand Operand


Fetch Fetch Store

Multiple Multiple

D
Operands Results

E
Instruction Instruction Operand Operand

V
Data
Address Operation Address Address
Operation
Calculation Decoding Calculation Calculation

R
E
Instructon Complete, Return for String
Fetch Next Instruction or Vector Data

S
E
Figure 1: Instruction Cycle State Diagram
R
7.3.1 Fetching a Word from Memory
The address of the requested information word is transferred from the CPU to the MAR. The requested
T

word’s address is moved to the main memory. Meanwhile, the CPU utilises the memory bus’s control
lines to indicate that a read operation is required.
H

Following this request, the CPU waits for a response from memory indicating that the requested operation has completed. Memory Function Completed (MFC) is a control signal on the memory bus used for this purpose. The memory sets this signal to 1 to indicate that the contents of the specified memory location have been read and are available on the data lines of the memory bus.

We assume that when the MFC signal is set to 1, the information on the data lines is loaded into the MDR and is thus ready for use within the CPU. This completes the memory read operation. Figure 2 shows the connections and control signals for the MDR:



[Diagram: the MDR sits between the memory-bus data lines and the internal processor bus; control signals MDRinE and MDRoutE gate the memory-bus side, while MDRin and MDRout gate the processor-bus side.]
Figure 2: Connections and Control Signals for the MDR


For example, the sequence of actions needed for the instruction Move (R1), R2 is as follows:
zz MAR ← [R1]
zz Start Read operation on the memory bus
zz Wait for the response of the MFC from the memory
zz Load MDR from the memory bus
zz R2 ← [MDR]
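As a hedged illustration, the read sequence above can be sketched in Python; the dictionary-based register file and memory are assumptions for illustration only, not a real processor interface:

```python
# Hedged sketch of the Move (R1), R2 read sequence. The dict-based
# register file and memory are illustrative assumptions, not a real API.
def move_mem_to_reg(regs, memory):
    mar = regs["R1"]       # MAR <- [R1]; start Read on the memory bus
    mdr = memory[mar]      # wait for MFC, then load MDR from the bus
    regs["R2"] = mdr       # R2 <- [MDR]
    return regs

regs = {"R1": 0x10, "R2": 0}
memory = {0x10: 42}
print(move_mem_to_reg(regs, memory)["R2"])  # -> 42
```

The wait for MFC is collapsed into a single dictionary lookup here; a cycle-accurate model would stall until the memory responds.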

7.3.2 Storing a Word in Memory
A similar technique is used for writing a word into a memory location. The MAR is loaded with the required address. The data to be written is then loaded into the MDR, and a Write command is issued.

Example: The sequence of actions needed for the instruction Move R2, (R1) is as follows:
zz R1 out, MAR in

zz R2 out, MDR in, Write

zz MDR outE, WMFC
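The corresponding write sequence can be sketched the same way; again, the data structures are illustrative, not a real processor API:

```python
# Hedged sketch of the Move R2, (R1) write sequence; the names mirror
# the control steps above but the data structures are illustrative.
def move_reg_to_mem(regs, memory):
    mar = regs["R1"]       # R1out, MARin
    mdr = regs["R2"]       # R2out, MDRin, Write
    memory[mar] = mdr      # MDRoutE, WMFC: wait for the write to finish
    return memory

memory = {}
move_reg_to_mem({"R1": 0x20, "R2": 7}, memory)
print(memory)  # -> {32: 7}
```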

7.3.3 Execution of a Complete Instruction
The steps for executing the instruction Add (R3), R1 are as follows:
zz Fetch the instruction
zz Fetch the first operand (the contents of the memory location pointed to by R3)
zz Perform the addition
zz Load the result into R1



The control sequence for execution of the instruction Add (R3), R1 is as follows:
1. PC out, MAR in, Read, Select 4, Add, Z in
2. Z out, PC in, Y in, WMFC
3. MDR out, IR in
4. R3 out, MAR in, Read
5. R1 out, Y in, WMFC
6. MDR out, Select Y, Add, Z in
7. Z out, R1 in, End
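The seven control steps can be mimicked as a register-transfer sketch; the dictionary CPU model, the sample memory contents and the grouping of memory waits are illustrative assumptions:

```python
# Hypothetical register-transfer simulation of the Add (R3), R1 control
# sequence. Names (PC, MAR, MDR, IR, Y, Z) follow the text; the memory
# model and step encoding are illustrative assumptions.
def execute_add_r3_r1(cpu, memory):
    cpu["MAR"] = cpu["PC"]                  # 1. PCout, MARin, Read
    cpu["Z"] = cpu["PC"] + 4                #    Select 4, Add, Zin
    cpu["PC"] = cpu["Y"] = cpu["Z"]         # 2. Zout, PCin, Yin, WMFC
    cpu["MDR"] = memory[cpu["MAR"]]         #    (memory responds)
    cpu["IR"] = cpu["MDR"]                  # 3. MDRout, IRin
    cpu["MAR"] = cpu["R3"]                  # 4. R3out, MARin, Read
    cpu["Y"] = cpu["R1"]                    # 5. R1out, Yin, WMFC
    cpu["MDR"] = memory[cpu["MAR"]]         #    (memory responds)
    cpu["Z"] = cpu["MDR"] + cpu["Y"]        # 6. MDRout, SelectY, Add, Zin
    cpu["R1"] = cpu["Z"]                    # 7. Zout, R1in, End
    return cpu

cpu = {"PC": 100, "R1": 5, "R3": 200, "Y": 0, "Z": 0,
       "MAR": 0, "MDR": 0, "IR": 0}
memory = {100: "ADD (R3), R1", 200: 10}
execute_add_r3_r1(cpu, memory)
print(cpu["R1"], cpu["PC"])  # -> 15 104
```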



7.3.4 Execution of Branch Instructions



The contents of the PC are replaced with the branch target address, which is typically obtained by adding an offset X to the updated contents of the PC. The offset X is usually the difference between the branch target address and the address immediately following the branch instruction.
The control sequence for an unconditional branch instruction is as follows:
1. PC out, MAR in, Read, Select4, Add, Z in
2. Z out, PC in, Y in, WMFC
3. MDR out, IR in


4. Offset-field-of-IR out, Add, Z in


5. Z out, PC in, End
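The PC update performed by this sequence can be written out as a small sketch, assuming a 4-byte instruction and an offset encoded relative to the updated PC:

```python
# Illustrative PC update for the unconditional branch sequence above,
# assuming a 4-byte instruction and an offset relative to the updated PC.
def branch_target(pc, offset):
    updated_pc = pc + 4            # steps 1-2: PC incremented during fetch
    return updated_pc + offset     # steps 4-5: Offset-field-of-IR, Add, PCin

print(branch_target(1000, 20))  # -> 1024
```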

7.4 MULTIPLE BUS ORGANIZATION


During a clock cycle in a single-bus organisation, just one data item can be transmitted across the bus.
Commercial processors have numerous internal pathways that allow multiple transfers to take place in
parallel to minimise the number of steps needed.
Figure 3 shows the three-bus organisation of the data path:

D
Bus A Bus B Bus C

E
Incrementer

V
PC

R
Register
File

E
Constant 4

S
A
ALU R
B
E
R
Instruction
Decoder
T

IR
H

MDR
IG

MAR

Memory-bus Address
R

data Lines Lines


Y

Figure 3: Three Bus Organisation of Data Path


In Figure 3, the processor's registers and ALU are connected through three buses. The register file contains all the general-purpose registers and has three ports. Buses A and B carry its two outputs, allowing the contents of two separate registers to be accessed concurrently, while the third port allows data from bus C to be loaded into a third register during the same clock cycle.

Buses A and B carry the source operands to the ALU's A and B inputs, where an arithmetic or logic operation is performed. The result is transferred to the destination over bus C. If necessary, the ALU can simply pass one of its two input operands to bus C unmodified; the ALU control signals for such an operation are R=A or R=B. A second feature in Figure 3 is the Incrementer unit, which is used to increment the PC by 4. Using the Incrementer eliminates the need to add 4 to the PC using the main ALU, as is done in the single-bus organisation. The Constant 4 source at the ALU input multiplexer can still be used.


The control sequence for execution of the instruction Add R4, R5, R6 is as follows:
1. PCout, R=B, MARin, Read, IncPC
2. WMFC
3. MDRout, R=B, IRin
4. R4outA, R5outB, SelectA, Add, R6in, End

The description of the steps of instruction execution is as follows:
zz Step 1: The contents of the PC are passed through the ALU using the R=B control signal and loaded into the MAR to initiate a memory read operation. At the same time, the PC is incremented by 4. Note that the value loaded into the MAR is the original contents of the PC; the incremented value is loaded into the PC at the end of the clock cycle and does not affect the contents of the MAR.
zz Step 2: The processor waits for MFC to arrive.
zz Step 3: The processor stores the desired data into the MDR and then sends it to the IR.

zz Step 4: In a single step, the instruction is decoded and the add operation is performed.
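The four-step three-bus sequence can also be sketched in Python; the dictionary CPU and memory are illustrative assumptions:

```python
# Hypothetical sketch of the three-bus Add R4, R5, R6 sequence, in which
# both source registers are read and the destination written in one step.
def add_three_bus(cpu, memory):
    cpu["MAR"] = cpu["PC"]             # 1. PCout, R=B, MARin, Read, IncPC
    cpu["PC"] += 4
    cpu["MDR"] = memory[cpu["MAR"]]    # 2. WMFC
    cpu["IR"] = cpu["MDR"]             # 3. MDRout, R=B, IRin
    cpu["R6"] = cpu["R4"] + cpu["R5"]  # 4. R4outA, R5outB, SelectA, Add, R6in
    return cpu

cpu = {"PC": 0, "R4": 3, "R5": 9, "R6": 0, "MAR": 0, "MDR": 0, "IR": 0}
memory = {0: "ADD R4, R5, R6"}
add_three_bus(cpu, memory)
print(cpu["R6"], cpu["PC"])  # -> 12 4
```

Note how step 4 collapses decode and execute into one cycle, which is the point of the extra buses.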

7.5 HARDWIRED CONTROL UNIT
A hardwired control unit generates control signals with the help of finite state machines (FSM). It is implemented as a sequential logic circuit, physically built from connected elements such as gates, flip-flops and decoders. As a result, it is known as a hardwired controller.

The hardwired control unit utilises fixed logic circuits to interpret instructions and produce control signals. These control signals are determined from the contents of the control step counter, the instruction register and the condition code flags, as well as external input signals such as MFC and interrupt requests. Figure 4 shows the hardwired control unit:

[Diagram: the clock (CLK) drives a control step counter; the IR, the counter state, condition codes and external inputs feed a decoder/encoder block that generates the control signals.]

Figure 4: Hardwired Control Unit


In Figure 4, the decoder/encoder block is a combinational circuit that generates the required control outputs based on the state of all of its inputs.
A more detailed block diagram is obtained by separating the decoding and encoding functions, as shown in Figure 5:

[Diagram: the clock (CLK) drives the control step counter, whose output feeds a step decoder producing timing signals T1, T2, T3, ...; the IR feeds an instruction decoder producing lines INS1 to INSm; the encoder combines these with external inputs and condition codes to produce the control signals, including End.]

Figure 5: Detailed Block Description of Hardwired Control Unit



The instruction decoder reads the instruction from the IR and decodes it. If the IR is an 8-bit register, the instruction decoder produces 2^8 = 256 output lines, one for each instruction. Each machine instruction is represented by a distinct output line, INS1 through INSm. Depending on the code in the IR, exactly one of the output lines INS1 to INSm is set to 1 and all other output lines are set to 0.
Each step in the control sequence has its own signal line, provided by the step decoder. The instruction decoder, step decoder, external inputs and condition codes all feed into the encoder, which uses these inputs to produce the individual control signals such as Zin, Yin, PCout, Add and End. The logic equation for the Zin control signal is as follows:
Zin = T1 + T6 . ADD + T4 . BR
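This equation can be checked with a small Boolean sketch; the time-slot and instruction encodings here are illustrative assumptions:

```python
# Illustrative encoding of the Zin control-signal equation from the text:
# Zin = T1 + T6.ADD + T4.BR  (asserted in T1 for every instruction, in T6
# for an Add, and in T4 for an unconditional branch).
def zin(t, instruction):
    return (t == 1 or
            (t == 6 and instruction == "ADD") or
            (t == 4 and instruction == "BR"))

print(zin(1, "SUB"), zin(6, "ADD"), zin(6, "SUB"))  # -> True True False
```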


Figure 6 shows a logic circuit that depicts how the encoder creates the Zin control signal for the processor
organisation:
[Diagram: AND gates combine T4 with Branch and T6 with Add; their outputs, together with T1, feed an OR gate whose output is Zin.]
Figure 6: Creating the Zin Control Signal for the Processor Organisation

This circuit implements the logic function above: Zin is asserted during time slot T1 for all instructions, during T6 for an Add instruction, during T4 for an unconditional branch instruction, and so on.

The End signal is generated when an instruction has been completed. The logic equation for the End control signal is as follows:
End = T7 . Add + T5 . BR + (T5 . N + T4 . N̄) . BRN + ...
Figure 7 shows the logic circuit of the End function:
[Diagram: AND gates combine T7 with Add, T5 with Branch, and T5·N and T4·N̄ with Branch<0; an OR gate combines these terms to produce End.]
Figure 7: Logic Circuit of End Function
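As a hedged sketch, the End function can be encoded per instruction; this assumes the five-step unconditional branch sequence shown earlier (End in T5) and that BRN branches when the condition flag N is set:

```python
# Hedged encoding of the End signal, assuming
# End = T7.Add + T5.BR + (T5.N + T4.not-N).BRN + ...
def end_signal(t, instruction, n_flag=False):
    if instruction == "ADD":
        return t == 7
    if instruction == "BR":                  # unconditional branch
        return t == 5
    if instruction == "BRN":                 # branch-if-negative
        return (t == 5 and n_flag) or (t == 4 and not n_flag)
    return False

print(end_signal(7, "ADD"), end_signal(4, "BRN", n_flag=False))  # -> True True
```

A not-taken BRN ends one step earlier (T4) because the branch target address never needs to be computed.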



The advantages of the hardwired control unit are as follows:


zz The hardwired control unit is fast because combinational circuits are used to generate the signals.
zz The delay in generating the control signals depends on the number of gates.
zz It can be optimised to achieve the fastest mode of operation.
zz It is faster than a microprogrammed control unit.


The disadvantages of the hardwired control unit are as follows:


zz As more control signals need to be generated, the design becomes more complicated (more encoders and decoders are needed).
zz Changing the control signals is difficult, since it requires rewiring the hardware circuit.
zz Adding a new feature is difficult and time-consuming.
zz It is difficult to evaluate and fix flaws in the initial design.
zz It is expensive.

7.6 MICROPROGRAMMED CONTROL UNIT
In a microprogrammed control unit, the control signals are generated using a software technique: the required control signals are specified by software stored in a special memory inside the processor, which is smaller and faster than regular memory. This software is called a microprogram, and the memory is called a microprogram memory or control store.

Figure 8 shows the microprogrammed control unit:

[Diagram: the instruction register and inputs from the status and flag registers feed a microinstruction address-generation block, which selects addresses in the microprogram memory (control store); fetched microinstructions pass through a buffer and a decoder to produce the control signals for the specific microinstruction.]

Figure 8: Microprogrammed Control Unit


The following steps describe the operation of the microprogrammed control unit:
1. The instruction to be processed is fetched and stored in the instruction register.
2. The microinstruction address-generating component reads the instruction from the instruction register and decodes it.


3. After decoding the instruction, the microinstruction address generator produces the starting address of the associated microroutine in the control store.
4. The microinstruction address generator also loads this starting address into the microprogram counter, which then keeps track of the addresses of the routine's subsequent microinstructions.
5. The microprogram counter is incremented in order to read the next microinstruction of the routine from the control store.
6. The end of the microroutine is indicated by a bit, called the end bit, in the routine's last microinstruction. When the end bit is set to 1, the microroutine has completed, and the next machine instruction is fetched.
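A minimal sketch of this fetch-issue loop, assuming a hypothetical control-store layout in which each microinstruction is a (control word, end bit) pair:

```python
# Minimal sketch of a microprogrammed control loop. The control store,
# microinstruction format (control word, end bit) and routine contents
# are illustrative assumptions, not a real processor's layout.
CONTROL_STORE = [
    ("PCout, MARin, Read", 0),   # address 0: start of fetch routine
    ("Zout, PCin",         0),
    ("MDRout, IRin",       1),   # end bit set: routine complete
]

def run_microroutine(start_address):
    upc = start_address              # microprogram counter
    signals = []
    while True:
        control_word, end_bit = CONTROL_STORE[upc]
        signals.append(control_word)  # issue this step's control signals
        if end_bit:                   # end bit terminates the routine
            return signals
        upc += 1                      # step to the next microinstruction

print(run_microroutine(0))  # prints the three control words in order
```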

V
7.6.1 Types of Microprogrammed Control Unit
Depending on the type of control word stored in the control memory (CM), microprogrammed control units are divided into two categories, which are as follows:

zz Horizontal microprogrammed control unit: The control signals are represented in decoded form, using 1 bit per control signal. For example, if the CPU has 53 control signals, then 53 bits are necessary. More than one control signal can be active at any given moment, so a high degree of parallelism is supported: if the degree is n, then n control signals are activated at the same time. Longer control words are required, but no additional hardware (decoders) is needed, which makes it faster and more flexible than the vertical form. It is employed in applications that require parallel processing.
zz Vertical microprogrammed control unit: The control signals are represented in an encoded binary format; log2(N) bits are required for N control signals, so shorter control words are supported. It supports a low degree of parallelism, i.e., the degree of parallelism is 0 or 1. It is slower than the horizontal form because additional hardware (decoders) is required to generate the control signals. It is less flexible than a horizontal control unit, but more flexible than a hardwired control unit.
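The difference in control-word width can be checked with a quick calculation, using the 53-signal CPU from the example above:

```python
# Quick comparison (illustrative) of control-word widths for the two
# encodings described above, for a CPU with 53 control signals.
import math

def horizontal_bits(num_signals):
    return num_signals                        # 1 bit per control signal

def vertical_bits(num_signals):
    return math.ceil(math.log2(num_signals))  # encoded: log2(N) bits

print(horizontal_bits(53), vertical_bits(53))  # -> 53 6
```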

7.6.2 Advantages and Disadvantages of Microprogrammed Control Unit



The following are some of the advantages of a microprogrammed control unit:



zz It allows for a more methodical control unit design.


zz It is easier to troubleshoot and modify.
zz It can retain the underlying structure of the control function.
zz It simplifies the design of the control unit; as a result, it is less costly and less prone to errors.
zz It permits an orderly, systematic design process.
zz It implements control operations in software rather than in hardwired logic.
zz It is more adaptable.
zz It performs complicated functions with ease.


The following are some of the disadvantages of a microprogrammed control unit:
zz Its flexibility comes at a higher cost.
zz It is slower than a hardwired control unit.

7.7 CONCLUSION

zz The processing unit of a computer is responsible for executing machine-language instructions and
coordinating the actions of other units.

zz The CPU performs all sorts of data processing activities as well as all of the computer’s key tasks.

zz A normal computational task includes a set of actions described by a programme, which is made up
of machine-language instructions.

zz The processor fetches each instruction one by one and executes the defined operation.

zz A programme is a collection of instructions stored in the memory unit of a computer.

zz The address of the requested information word is transferred from the CPU to the MAR.
zz A similar technique is used when writing a word into a memory location. The MAR is loaded with the required address. The data to be written is then loaded into the MDR, followed by a Write command.
zz During a clock cycle in a single-bus organisation, just one data item can be transmitted across the bus.
zz Commercial processors have numerous internal pathways that allow multiple transfers to take
place in parallel to minimise the number of steps needed.

zz A hardwired control is a method of generating control signals with the help of finite state machines
(FSM).

zz A hardwired control unit utilises fixed logic circuits to interpret instructions and produce control signals.
zz In a microprogrammed control unit, the control signals are generated by using a software technique.

7.8 GLOSSARY

zz Processing unit: It is responsible for executing machine-language instructions and coordinating the actions of other units.


zz Programme: It is a collection of instructions stored in the memory unit of a computer.

zz Memory address register (MAR): It defines the location in memory where a read or write operation will take place.


zz Memory buffer register (MBR): It contains either the memory value to be saved or the most recent
value read from the memory.
zz Program counter (PC): It stores the address of the next instruction to be fetched.
zz Instruction register (IR): It stores the last instruction to be fetched.
zz Hardwired control unit: It is a method of generating control signals with the help of finite state
machines (FSM).


7.9 SELF-ASSESSMENT QUESTIONS

A. Multiple Choice Questions


1. Which of the following is responsible for executing machine-language instructions and coordinating
the actions of other units?
a. Processing unit
b. Data unit

c. Data storage

d. Memory unit

2. Which of the following contains either the memory value to be saved or the most recent value read
from the memory?

a. MAR
b. MBR

c. PC

d. IR
3. Which of the following performs all sorts of data processing activities as well as all of the computer’s
key tasks?
a. Control unit
b. CPU

c. Functional unit

d. Memory unit
4. A normal computational task includes a set of actions described by a programme, which is made up

of __________ instructions.
a. High-level language
b. Machine language

c. Processor

d. Raw
5. Which of the following is a collection of instructions stored in the memory unit of a computer?

a. Logic gates

b. Data

c. Information
d. Programme
6. The system bus address lines are linked to __________.
a. MAR
b. MBR
c. PC
d. IR


7. Which of the following utilises fixed logic circuits to understand the instructions and produce control
signals?
a. Data unit
b. Microprogrammed control unit
c. Hardwired control unit
d. Multiple bus organisation
8. Which of the following is not a disadvantage of hardwired control unit?

a. It uses combinational circuits to create signals.
b. Changes to control signals are challenging since they need rearranging wires in the hardware

circuit.

c. It is difficult to evaluate and fix flaws in the initial design.
d. It is too expensive.

9. In which of the following are the control signals generated by using a software technique?

a. Data unit
b. Microprogrammed control unit

S
c. Hardwired control unit
d. Multiple bus organisation
10. The CPU keeps track of the address of the next instruction to be fetched and processed using the __________.
a. MAR b. MBR

c. PC d. IR

B. Essay Type Questions



1. The processing unit of a computer is responsible for executing machine-language instructions and
coordinating the actions of other units. Discuss.
2. Discuss the execution process of an instruction.

3. What do you understand by multiple bus organization?



4. A hardwired control is a method of generating control signals with the help of finite state machines
(FSM). Discuss.

5. In a microprogrammed control unit, the control signals are generated by using a software technique. Discuss.

7.10 ANSWERS AND HINTS FOR SELF-ASSESSMENT QUESTIONS

A. Answers to Multiple Choice Questions

Q. No. Answer
1. a. Processing unit

2. b. MBR
3. b. CPU
4. b. Machine language
5. d. Programme
6. a. MAR
7. c. Hardwired control unit
8. a. It uses combinational circuits to create signals.
9. b. Microprogrammed control unit
10. c. PC

B. Hints for Essay Type Questions
1. The central processing unit is another name for the processing unit. The CPU performs all sorts of data processing activities as well as all of the computer’s key tasks. It enables input and output devices to communicate with one another and carry out their functions.
Refer to Section Introduction
2. A programme is a collection of instructions stored in the memory unit of a computer. For each instruction, the CPU goes through a cycle of execution. Each instruction cycle in a simple computer consists of the following phases:
 Fetch the instruction from memory
 Decode the instruction (what action should be performed)
 Read the effective memory address
 Carry out the instruction
Refer to Section Execution of a Complete Instruction
3. During a clock cycle in a single-bus organisation, just one data item can be transmitted across the bus. Commercial processors have numerous internal pathways that allow multiple transfers to take place in parallel to minimise the number of steps needed.
Refer to Section Multiple Bus Organization
4. A hardwired control unit utilises fixed logic circuits to interpret instructions and produce control signals. These control signals are determined from the contents of the control step counter, instruction register and condition code flags, as well as external input signals such as MFC and interrupt requests.
Refer to Section Hardwired Control Unit
5. In a microprogrammed control unit, the control signals are specified by software saved in the processor’s special memory, which is smaller and quicker than regular memory. The software is called a microprogram, and the memory is called a microprogram memory or control store.
Refer to Section Microprogrammed Control Unit


7.11 POST-UNIT READING MATERIAL

zz http://www.cs.binghamton.edu/~reckert/hardwire3new.html
zz https://www.google.co.in/books/edition/COMPUTER_ORGANIZATION_AND_DESIGN/5LNwVRpfkR
gC?hl=en&gbpv=1&dq=Hardwired+control+unit&pg=PA332&printsec=frontcover

7.12 TOPICS FOR DISCUSSION FORUMS

zz Discuss the concept of the processing unit with your classmates. Also, prepare a presentation on the hardwired control unit and microprogrammed control unit.


UNIT
08
Pipelining

Names of Sub-Units

Introduction to Pipelining, Basic Concepts of Pipelining, Data Hazards, Advantages and Disadvantages
of Pipelining, Instruction Hazards, Influence on Instruction Sets

Overview

This unit begins by discussing the concept of pipelining. Next, the unit explains the basic concepts of pipelining. Further, the unit describes the data hazards and advantages and disadvantages of pipelining. Towards the end, the unit discusses the instruction hazards and influence on instruction sets.

Learning Objectives

In this unit, you will learn to:



aa Discuss the concept of pipelining



aa Explain the basic concepts of pipelining


aa Describe the data hazards and advantages and disadvantages of pipelining

aa Summarise the significance of instruction hazards


aa Discuss the importance of influence on instruction sets

Learning Outcomes

At the end of this unit, you would:


aa Evaluate the concept of pipelining

aa Assess the basic concepts of pipelining
aa Evaluate the importance of data hazards and advantages and disadvantages of pipelining

aa Determine the significance of instruction hazards

aa Explore the importance of influence on instruction sets

Pre-Unit Preparatory Material

E
aa https://www.nitsri.ac.in/Department/Electronics%20&%20Communication%20Engineering/
COA-_Unit_3.pdf

8.1 Introduction

Pipelining is the process of feeding instructions to the processor through a pipeline. It provides for the systematic storage and execution of instructions. Pipeline processing is another name for it.
Pipelining is a method in which numerous instructions are overlapped during execution. The pipeline is divided into stages, each of which is connected to the next to form a pipe-like structure. Instructions enter at one end and leave from the other.

8.2 Basic Concepts of Pipelining



The basic idea of pipelining is that it increases the performance of a system through simple changes to the hardware design, enabling parallelism at the hardware level. Pipelining does not decrease the execution time of an individual instruction, but it reduces the overall execution time of a program. In a pipeline, a number of instructions are executed at the same time, in an overlapped manner.

8.2.1 Requirement for Pipelining



Pipelining is required to decrease execution time and enhance CPU performance by rearranging the elements of the hardware; no further elements are added. In each segment, a combinational circuit manipulates the data stored in the segment's input register, and its output is applied to the input register of the following segment, as shown in Figure 1:
[Diagram: a common clock drives segments S1, S2 and S3, each followed by a register R1, R2 and R3; the input enters S1 and the output leaves R3.]

Figure 1: Arrangement in Pipelining


A pipeline system is similar to a factory’s current assembly line configuration. For example, in the
automobile sector, massive assembly lines are built up with robotic arms performing specific tasks at
each stage, and the car then moves on to the next arm.
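The overlap can be quantified with a simple timing sketch: assuming one clock cycle per stage and no stalls, a k-stage pipeline finishes n instructions in k + (n - 1) cycles, versus k·n without pipelining. This is an illustrative model, not a measurement of any real processor:

```python
# Illustrative timing model for a k-stage pipeline executing n
# instructions, assuming one cycle per stage and no stalls.
def pipelined_cycles(n_instructions, n_stages):
    return n_stages + (n_instructions - 1)   # first takes k, rest 1 each

def non_pipelined_cycles(n_instructions, n_stages):
    return n_stages * n_instructions         # every instruction takes k

n, k = 100, 4
print(non_pipelined_cycles(n, k), pipelined_cycles(n, k))  # -> 400 103
```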

8.2.2 Types of Pipelining

A pipelined processor was categorised by Handler and Ramamurthy in 1977 into two types according to functionality, as follows:

zz Arithmetic pipelining
zz Instruction pipelining

Arithmetic Pipelining

An arithmetic pipeline can be found in almost every computer. It divides an arithmetic problem into several sub-problems to accomplish high-speed floating-point operations such as addition, multiplication and division. The flowchart of an arithmetic pipeline for floating-point addition is shown in Figure 2:
additions is shown in Figure 2:

S
Exponents Mantissa
a    b a    b

R
E R
R
Compare exponent Difference
Segment 1
by substraction
T

R
H

Segment 2 Choose exponent Align Mantissas


IG

R
R

Add or Substract
Mantissas
Y

Segment 3 R R
P
O

Segment 4 Adjust exponent Normalise Result


C

R R

Figure 2: Flowchart of Arithmetic Pipeline
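A rough sketch of the four segments, assuming decimal (mantissa, exponent) pairs rather than real binary floating-point hardware:

```python
# Illustrative sketch of the four arithmetic-pipeline segments for
# floating-point addition, operating on decimal (mantissa, exponent)
# pairs. A real unit works in binary with guard bits; this only shows
# the segment-by-segment data flow.
def fp_add(a, b):
    (ma, ea), (mb, eb) = a, b
    diff = ea - eb                       # Segment 1: compare exponents
    if diff >= 0:                        # Segment 2: choose exponent,
        e, mb = ea, mb / (10 ** diff)    #            align mantissas
    else:
        e, ma = eb, ma / (10 ** -diff)
    m = ma + mb                          # Segment 3: add mantissas
    while abs(m) >= 1:                   # Segment 4: normalise result,
        m, e = m / 10, e + 1             #            adjust exponent
    return m, e

# 0.9 x 10^3 + 0.8 x 10^2 = 0.98 x 10^3
m, e = fp_add((0.9, 3), (0.8, 2))
print(round(m, 4), e)  # -> 0.98 3
```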


Instruction Pipelining
An instruction pipeline is used in a digital computer to overlap operations on multiple instructions, such as fetching, decoding and executing instructions.
Generally, an instruction is processed in the following sequential steps:

zz Fetching the instruction
zz Decoding the instruction
zz Computing the effective address
zz Fetching the operand
zz Executing the instruction
zz Storing the result

The flowchart of the instruction pipeline is shown in Figure 3:

[Flowchart: Segment 1: fetch instruction from memory. Segment 2: decode instruction and calculate effective address; if the instruction is a branch, handle it separately. Segment 3: fetch operand from memory. Segment 4: execute instruction; if an interrupt occurs, empty the pipe and perform interrupt handling; otherwise update the PC and continue.]
Figure 3: Flowchart of Instruction Pipelining
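The stage-by-stage overlap can be tabulated with a small sketch; the stage abbreviations and the assumption of one stall-free cycle per stage are illustrative:

```python
# Illustrative schedule for pipelined instruction processing, assuming
# one cycle per stage and no stalls: instruction i enters stage s in
# cycle i + s + 1 (cycles counted from 1).
STAGES = ["FI", "DI", "CO", "FO", "EI", "WO"]  # fetch ... store result

def schedule(n_instructions):
    return {i: {stage: i + s + 1 for s, stage in enumerate(STAGES)}
            for i in range(n_instructions)}

table = schedule(3)
print(table[0]["FI"], table[2]["WO"])  # -> 1 8
```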

4
UNIT 08: Pipelining JGI JAIN
DEEMED-TO-BE UNIVERSITY

8.3 Data Hazards
A data hazard is a situation in which either the source or the destination operands of an instruction are not available at the time expected in the pipeline. As a result, some operation has to be delayed.

Pipeline hazards occur when the pipeline is forced to stall for some reason. The four pipelining hazards are as follows:

zz Data dependency
zz Memory delay

zz Branch delay
zz Resource limitation

8.3.1 Data Dependency

Consider the following two instructions and their pipeline execution as shown in Figure 4:

[Timing diagram over clock cycles t0–t7: the Add instruction passes through the stages IF, ID, OF, IE, OS; the Sub instruction follows one cycle behind, so it reaches its operand-fetch stage before the Add has stored its result.]



In the figure above, the result of the Add instruction is placed in register R2, and the final result is stored at the end of instruction execution, at clock cycle t4. However, the Sub instruction requires the value of register R2 at cycle t3. As a result, the Sub instruction must stall for two clock cycles; if it does not, it will produce an erroneous result. Data dependency occurs when one instruction depends on another for its data.
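Whether a following instruction must stall can be sketched as a simple register-set check; the instruction encoding is an illustrative assumption:

```python
# Illustrative check for a data dependency (RAW) between two pipelined
# instructions: the second must stall if it reads a register that an
# earlier, still-unfinished instruction writes.
def needs_stall(writes, reads):
    return bool(set(writes) & set(reads))

add_writes = ["R2"]          # the Add instruction writes R2
sub_reads = ["R2"]           # the following Sub reads R2
print(needs_stall(add_writes, sub_reads))  # -> True
```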
R

8.3.2 Memory Delay



When an instruction or piece of data is needed, it is first looked for in the cache memory; if it is not found, a cache miss occurs. The data is then fetched from main memory, which can take around 10 cycles. The pipeline must stall for that many cycles, creating a memory-delay hazard, and all subsequent instructions are delayed as a result of the cache miss.

8.3.3 Branch Delay



Assume four instructions are pipelined in the order I1, I2, I3 and I4, where I1 is a branch instruction whose target is instruction Ik. Processing of I1 begins with it being fetched and decoded, and its target address is calculated by cycle t3.


However, before the branch target address is determined, the instructions I2, I3 and I4 are fetched in cycles t1, t2 and t3. Once I1 is found to be a branch instruction, these instructions must be discarded, because Ik must be executed next. The resulting three-cycle delay is the branch delay, as shown in Figure 5:

[Timing diagram over clock cycles t0–t8: Instruction 1, the branch, runs IF, ID, OF, IE; Instructions 2–4 begin fetching in successive cycles but must be discarded; Instruction K (the branch target) starts only after the branch resolves, so three cycles — the branch penalty — are lost.]
Figure 5: Branch Delay
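The cost of this penalty can be folded into the timing model as a sketch; the three-cycle penalty follows the figure, and the stall-free baseline is an assumption:

```python
# Illustrative cost model: each taken branch adds the branch penalty
# (three cycles in the figure) to the ideal pipelined execution time.
def total_cycles(n_instructions, n_stages, n_taken_branches, penalty=3):
    ideal = n_stages + (n_instructions - 1)
    return ideal + n_taken_branches * penalty

print(total_cycles(10, 5, 0), total_cycles(10, 5, 2))  # -> 14 20
```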

8.3.4 Resource Limitation

If two instructions request access to the same resource in the same clock cycle, one of them must stall and allow the other to use the resource. This stalling is caused by a lack of resources. It can, however, be avoided by providing extra hardware.


The advantages of pipelining include the following:
zz Pipelining increases the system’s throughput.
zz A new instruction completes its execution every clock cycle.
zz Several instructions can be executed at the same time.
8.4 Advantages and Disadvantages of Pipelining



Pipelining organises the hardware components of the CPU so that more than one instruction is executed at a time, increasing overall performance. It has both advantages and disadvantages.



8.4.1 Advantages of Pipelining
The advantages of pipelining are as follows:

zz The processor’s cycle time is reduced, which increases instruction throughput.
zz The CPU’s arithmetic logic unit can be clocked faster if pipelining is employed, although the design is more complicated.


zz In principle, pipelining improves performance by a factor of two over a core that isn’t piped.
zz The clock frequency of pipelined CPUs is higher than that of non-pipelined CPUs.

8.4.2 Disadvantages of Pipelining

Pipelining has a number of downsides; however, CPU and compiler designers have devised a number of ways to mitigate most of them. The most frequent difficulties are the various pipeline hazards discussed in this unit.

8.5 Instruction Hazards

Scoreboards are designed to regulate the flow of data between registers and the various arithmetic units, in the presence of conflicts induced by hardware resource constraints (structural hazards) and dependencies between instructions (data hazards). The four types of hazards are flow dependencies (Read-After-Write), output dependencies (Write-After-Write), antidependencies (Write-After-Read) and structural hazards.

8.5.1 Read-After-Write (RAW) Hazards

A Read-After-Write hazard arises when one instruction requires the result of a previously issued but unfinished instruction. An example of a RAW hazard is shown in Figure 6:

[Diagram: Instruction 1 writes its result in a late stage of the pipeline, while Instruction 2, issued one cycle later, attempts to read the same value in an earlier stage — before the write has occurred.]
Figure 6: Read-After-Write Hazard



8.5.2 Write-After-Write (WAW) Hazards


A Write-After-Write hazard arises when an instruction attempts to write its output to a register before an earlier instruction that writes the same register has completed. Both instructions in the WAW example below send their results to R6. Although such a sequence is unlikely to occur in everyday programming, it must still produce the proper result: without correct interlocks, the add operation would finish first, and its result in R6 would then be overwritten by the slower multiplication. An example of a WAW hazard is as follows:
Y

R6 = R1 * R2
R6 = R4 + R5

8.5.3 Write-After-Read (WAR) Hazards


A Write-After-Read hazard arises when an instruction attempts to write to a register that has not yet been read by a previously issued but unfinished instruction. Because of the way instructions are sent to the arithmetic units, this hazard cannot arise in most systems, but it could in the CDC 6600. The following WAR example is based on the CDC 6600, which used X registers to store floating-point data:
X3 = X1 / X2


X5 = X4 * X3
X4 = X + X6
In the third instruction, the WAR hazard is on register X4. It occurs because instructions that are stalled due to a RAW hazard are nevertheless sent to their arithmetic unit, where they await their operands. As a result, the second instruction can be issued immediately after the first, but it is held up in the multiply unit waiting for its operand X3, which cannot be read until the division unit finishes its operation. The third instruction can also be issued directly after the second, and it begins to execute; however, because the floating-point add operation takes considerably less time than the division, the add unit is soon ready to store its output, yet it must not write X4 until the stalled multiply has read it.
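All three data hazards can be detected mechanically by comparing the register sets that two instructions read and write. The following Python sketch is an illustrative classifier (the function and its names are our own, not a real scoreboard implementation):

```python
# Given the register sets read and written by an earlier and a later
# instruction, report which data hazards exist between them.
def classify_hazards(first_reads, first_writes, second_reads, second_writes):
    hazards = []
    if first_writes & second_reads:
        hazards.append("RAW")   # later instruction reads what the earlier writes
    if first_writes & second_writes:
        hazards.append("WAW")   # both instructions write the same register
    if first_reads & second_writes:
        hazards.append("WAR")   # later instruction writes what the earlier reads
    return hazards

# X5 = X4 * X3 followed by an add that writes X4 -> WAR on X4:
print(classify_hazards({"X4", "X3"}, {"X5"}, {"X6"}, {"X4"}))        # ['WAR']
# R6 = R1 * R2 followed by R6 = R4 + R5 -> WAW on R6:
print(classify_hazards({"R1", "R2"}, {"R6"}, {"R4", "R5"}, {"R6"}))  # ['WAW']
```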

8.5.4 Structural Hazards

A structural hazard arises when two or more instructions in the pipeline require the same resource at the same time. In this case, the instructions must be executed in sequence rather than in parallel; for this reason it is also called a resource hazard. An example of a structural hazard is shown in Figure 7:

Clock Cycle Number

Instr      1    2    3    4    5    6    7
Load       IF   ID   EX   MEM  WB
Instr 1         IF   ID   EX   MEM  WB
Instr 2              IF   ID   EX   MEM  WB

Figure 7: Structural Hazard
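A structural hazard can be modelled as two instructions needing one shared resource in the same cycle. The Python sketch below assumes, purely for illustration, a five-stage pipeline with a single memory port used by both the IF and MEM stages:

```python
# Toy model: an instruction issued in cycle s occupies IF in cycle s and
# MEM in cycle s + 3, and both stages need the single memory port.
def memory_port_conflicts(issue_cycles):
    use = {}
    for s in issue_cycles:
        for cycle in (s, s + 3):   # cycles in which this instruction needs the port
            use[cycle] = use.get(cycle, 0) + 1
    return sorted(c for c, n in use.items() if n > 1)

# Four back-to-back instructions: in cycle 4 the fourth instruction's IF
# collides with the first instruction's MEM stage.
print(memory_port_conflicts([1, 2, 3, 4]))   # [4]
```

Real machines resolve such conflicts either by stalling one of the instructions for a cycle or by duplicating the resource (e.g., separate instruction and data caches).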



8.6 Influences on Instruction Sets


In a computer, the instruction set is the part that relates to programming; it is also considered the machine language. The processor works when the instruction set provides the commands that tell it what it needs to do. It covers the addressing modes, instructions, registers, memory architecture, external input/output, and so on.


In pipelining, the instruction set supports a method for achieving instruction-level parallelism with a single processor. The aim is to keep each part of the processor busy with some instruction by dividing incoming instructions into a sequence of steps, which are executed by distinct processor units working on distinct parts of instructions.

8.7 Conclusion
• Pipelining is the process of gathering instructions from the processor through a pipeline.
• The basic concept of pipelining is that it increases the performance of the system.
• An arithmetic pipeline is used to accomplish high-speed floating-point operations.
• A data hazard is one in which either the source or the target operands of an instruction are not available at the predicted time in the pipeline.
• A Write-After-Write (WAW) hazard arises when an instruction attempts to write its output to a register that an earlier, unfinished instruction also writes.


• A WAR hazard arises if an instruction attempts to write to a register that has not yet been read by a previously issued but unfinished instruction.
• A structural hazard arises when two or more instructions in the pipeline require the same resource.
• The instruction set is the part of a computer that relates to programming; it is also considered the machine language.

8.8 Glossary

• Pipelining: The process of gathering instructions from the processor through a pipeline
• Arithmetic pipeline: A pipeline used to accomplish high-speed floating-point operations
• Data hazard: A hazard in which either the source or the target operands of an instruction are not available at the predicted time in the pipeline
• Write-After-Write (WAW) hazard: Arises when an instruction attempts to write its output to a register that an earlier, unfinished instruction also writes
• Write-After-Read (WAR) hazard: Arises when an instruction attempts to write to a register that has not yet been read by a previously issued but unfinished instruction
• Structural hazard: Arises when two or more instructions in the pipeline require the same resource
• Instruction set: The part of a computer that relates to programming; also considered the machine language
8.9 Self-Assessment Questions

A. Multiple Choice Questions


1. __________ is a method in which numerous instructions are stacked on top of each other during execution.
a. Pipelining

b. Instruction set
c. Data hazards
d. Structural hazards

2. A pipelining processor is categorised by Handler and Ramamurthy in __________.


a. 1978

b. 1992

c. 1977
d. 1972

3. An __________ is used in a digital computer to carry out operations on multiple instructions, such as decoding or fetching.

a. Data hazard
b. Instruction pipeline
c. Arithmetic pipeline
d. Pipelining


4. A __________ hazard arises when an instruction attempts to write its output to the same register before the earlier instruction's execution has completed.
a. WAR
b. RAW

c. WAW
d. Structural

5. Which of the following hazards arises when two or more instructions exist in the pipeline and require the same resource?

a. Data hazards
b. Structural hazards

c. Instruction hazards
d. RAW hazards

6. In pipelining, __________ is a method for executing instruction parallelism with a single processor.
a. Data hazard

b. Instruction hazard
c. Instruction set
d. Resource limitation
7. The technique that divides a problem into several sub-problems to accomplish high-speed floating-point operations is called __________.
a. Arithmetic pipeline

b. Instruction pipeline
c. Pipelining

d. Instruction set
8. How many types of pipeline risks occur when a pipeline is forced to halt?

a. 1
b. 2
c. 3

d. 4

9. If two instructions request access to the same resource within the same clock cycle, then it is a
condition of __________.

a. Pipelining
b. Data hazards

c. Instruction set
d. Resource limitation

10. When an instruction attempts to write to a register that has not yet been read by a previously issued but unfinished instruction, it is called a __________.
a. RAW hazard
b. Structural hazard


c. WAW hazard
d. WAR hazard

B. Essay Type Questions


1. Pipelining is a method in which numerous instructions are stacked on top of each other during execution. What do you understand by the term pipelining?
2. A pipeline system is similar to a factory's assembly line configuration. Explain the basic concept of pipelining and also discuss its types.
3. Pipeline risks occur when a pipeline is forced to halt. Describe the concept of data hazards and also discuss the different types of risks that arise during pipelining.
4. In the pipelined processor, the implementation of more than one instruction takes place. Determine the advantages and disadvantages of pipelining.
5. In pipelining, the instruction set supports a method for executing instruction parallelism with a single processor. Define the concept of the instruction set in pipelining.

8.10 Answers and Hints for Self-Assessment Questions

A. Answers to Multiple Choice Questions


Q. No.  Answer
1.      a. Pipelining
2.      c. 1977
3.      b. Instruction pipeline
4.      c. WAW
5.      b. Structural hazards
6.      c. Instruction set
7.      a. Arithmetic pipeline
8.      d. 4
9.      d. Resource limitation
10.     d. WAR hazard


B. Hints for Essay Type Questions


1. Pipelining is the process of gathering instructions from the processor through a pipeline. It provides for the systematic storage and execution of instructions. Refer to Section Introduction
2. The basic concept of pipelining is that it increases the performance of the system through some simple changes in the hardware design. Refer to Section Basic Concept of Pipelining
3. A data hazard is a circumstance in which either the source or the target operands of an instruction are not available at the predicted time in the pipeline. Refer to Section Data Hazards
4. Pipelining organizes the components of the CPU, such as hardware elements, to increase overall performance. Refer to Section Advantages and Disadvantages of Pipelining
5. In a computer, the instruction set is the part that relates to programming; it is also considered the machine language. Refer to Section Influences on Instruction Sets

8.11 Post-Unit Reading Material

• https://www.motivationbank.in/2021/09/computer-organization-and-architecture.html
• https://www.javatpoint.com/computer-organization-and-architecture-tutorial

8.12 Topics for Discussion Forums

• Discuss the concept of pipelining, its types and data hazards with your friends and classmates. Also, discuss the pipelining instruction hazards with real-world examples.

S
E
R
T
H
IG
R
Y
P
O
C

12
UNIT

09

Parallel Computer Models

Names of Sub-Units

Introduction to Parallel Computer Models, The State of Computing, Multiprocessors, Multicomputer, PRAM Model, VLSI Model

Overview

This unit begins by discussing the concept of parallel computer models. Next, the unit discusses the state of computing. Further, the unit explains multiprocessors and multicomputers. Towards the end, the unit discusses the PRAM and VLSI models.


Learning Objectives

In this unit, you will learn to:
• Discuss the concept of parallel computer models
• Explain the concept of the state of computing
• Describe the multiprocessors and multicomputer
• Explain the significance of the PRAM model
• Summarise the importance of the VLSI model

Learning Outcomes

At the end of this unit, you would:
• Evaluate the concept of parallel computer models
• Assess the concept of the state of computing
• Evaluate the importance of multiprocessors and multicomputer
• Understand the significance of the PRAM model
• Explore the importance of the VLSI model

Pre-Unit Preparatory Material

• https://www.philadelphia.edu.jo/academics/kaubaidy/uploads/ACA-Lect18.pdf

9.1 Introduction

Parallel processing has emerged as a useful technique in modern computers to fulfil the demand for better speed, cheaper costs and more accurate outcomes in real-world applications. Owing to multiprogramming, multiprocessing and multicomputing, concurrent events are frequent in today's computers, and software packages on modern computers are strong and comprehensive. To study the evolution of computer performance, we must first comprehend the fundamental evolution of hardware and software. Among the milestones in computer development, the mechanical and electromechanical stages are the two primary early stages; modern computers emerged after the development of electronic components. The operating components of mechanical computers were replaced with high-mobility electrons in electronic computers, and electric signals, which move at almost the speed of light, have replaced mechanical gears and levers in data transmission.
replaced with high mobility electrons in electronic computers. Electric signals, which move at almost
the speed of light, have replaced mechanical gears and levers in data transmission.
H

9.1.1 Parallel Architecture


IG

Data locality and data communication are also linked to parallel computing. The approach of arranging
all resources to optimise speed and programmability within the constraints set by technology and cost
at any particular moment is known as parallel computer architecture. By utilising an increasing number
R

of processors, parallel computer architecture offers a new dimension to the development of computer
systems. In theory, performance gained via the use of a large number of processors is superior to the
Y

performance of a single processor at any given moment. Components of today’s computer hardware,
instruction sets, application programs, system software and user interface make up a contemporary
P

computer system as shown in Figure 1:


O
C

Figure 1: Parallel Architecture

2
UNIT 09: Parallel Computer Models JGI JAIN
DEEMED-TO-BE UNIVERSITY

Numerical computing, logical reasoning and transaction processing are the three types of computing
issues. Some complicated issues may necessitate the use of all three processing techniques. Some of the
issues in parallel architecture are as follows:
zz Evolution of computer architecture: Computer architecture has undergone revolutionary
transformations over the previous four decades computer architecture has undergone revolutionary
changes over the last four decades. From von Neumann architecture to multicomputer and
multiprocessors, we have come a long way.
zz Performance of a computer system: The efficiency of a computer system performance is determined
by both machine capabilities and software behaviour. Better hardware technology, innovative

D
architectural features and effective resource management may all help machines perform better.
Because it is reliant on the application and run-time variables, programe behaviour is unpredictable.

E
V
9.2 The State of Computing
Modern computers have powerful hardware that is controlled by a large number of software programs.

R
To assess the state of computing, we must first examine historical milestones in computer development.

E
9.2.1 Computer Generations

S
Electronic computers have gone through several phases of advancement during the last five decades.
E
period. With the usage of CPUs and memory devices with more than one million transistors on a single
silicon chip, we have just reached the fifth generation. Each iteration introduces additional hardware
R
and software features, as shows in the Table 1. The majority of traits introduced in previous generations
have been handed down to subsequent generations is shown in Table 1:
T

Table 1: Five Generations of Electronic Computers


H

Generation Technology & Architecture Software & Application Representative System


First (1945-54) Vacuum tubes & relay Machine/assembly ENIAC, Princeton, IAS, IBM
IG

memories, CPU motivated by languages, single user, 701


PC & accumulator, fixed point no subroutine linkage,
arithmetic programmed I/O using CPU
R

Second (1955-64) Discrete transistors and core Shall use with compiler, IBM 7090, CDC 1604,
memories, floating point subroutine libraries, batch Univac LARC
Y

arithmetic, I/O processors, processing monitor


multiplexed memory access
P

Third (1965-74) Integrated circuits (SSI- Multiprogramming & time IBM 360/370, CDC 6600,
MSI), microprogramming, sharing OS, multi user TI- ASC, PDP-8 VAX 9000,
O

pipelining, cache & lookahead applications Gay XMP, IBM 3090 BBN
processors TC 2000
C

Fourth (1975-90) LSI/VLSI & semiconductor Multiprocessor OS, Fujitsu VPP 500, Gay/MPP,
memory, multiprocessors, languages, compilers & TMC/CM-5, Intel Paragon
vector supercomputers, multi environment for parallel
computers processing
Fifth (1991 present) ULSI/VHSIC processors, Massively parallel
memory & switches, high processing, grand challenge
density packaging, scalable applications, heterogeneous
architectures process


In other words, the latest generation of computers has inherited most of the features found in previous generations.

9.3 Multiprocessors
A multiprocessor is a part of a computer system that has two or more central processing units (CPUs) sharing a common main memory, bus and peripherals. It enables the simultaneous processing of programs. The key objective of using a multiprocessor is to enhance the system's execution speed, with fault tolerance and application matching as further objectives. It is considered a means to enhance computing speed and performance, as well as to give enhanced availability and reliability. A multiprocessor system is shown in Figure 2:
D
availability and reliability. A multiprocessor system is shown in Figure 2:

E
V
R
E
S
E
Figure 2: Multiprocessor
A multiprocessor are of two types are as follows:
R
zz Shared memory multiprocessor
zz Distributed memory multiprocessor
T
H

9.3.1 Shared Memory Multiprocessor


It is a multiprocessor in which the CPUs share the common memory is known as shared memory
IG

multiprocessor. Three most common shared memory multiprocessors models are as follows:

Uniform Memory Access (UMA)


R

The physical memory is shared evenly by all processors in this approach. All processors have the same
Y

amount of time to access all memory words. A private cache memory may be available to each CPU. The
same rule applies to peripheral devices. A symmetric multiprocessor is one in which all of the processors
P

have equal access to all of the peripheral devices. An asymmetric multiprocessor is one in which only
one or a few processors have access to external devices, as shown in Figure 3:
O
C

Figure 3: Uniform Memory Access (UMA)


Non-Uniform Memory Access (NUMA)


The access time in a NUMA multiprocessor architecture varies depending on the memory word’s
placement. Local memories are used to physically distribute shared memory among all CPUs. All local
memories are combined to produce a global address space that can be accessed by all CPUs, as shown
in Figure 4:

Figure 4: Non-Uniform Memory Access (NUMA)
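The practical difference between the UMA and NUMA models can be sketched with a toy latency model (all numbers below are made up for illustration; real latencies vary widely by machine):

```python
# Toy latency model: in UMA every access costs the same; in NUMA the average
# cost depends on how many accesses hit the processor's own local memory.
def numa_avg(local_cost, remote_cost, local_fraction):
    """Average access time given the fraction of accesses that are local."""
    return local_fraction * local_cost + (1 - local_fraction) * remote_cost

uma_cost = 100                      # constant cost per access in the UMA model
print(numa_avg(40, 200, 0.75))      # 80.0: beats UMA when locality is good
print(numa_avg(40, 200, 0.25))      # 160.0: loses to UMA when locality is poor
```

This is why NUMA software tries to place data in the memory of the processor that uses it most.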

Cache Only Memory Architecture (COMA)
A particular instance of the NUMA model is the COMA model. All distributed main memories are
transformed to cache memories in this model, as shown in Figure 5:

Figure 5: Cache Only Memory Architecture (COMA)



9.3.2 Distributed Memory Multiprocessor



A distributed memory multiprocessor is one in which each CPU of the computer system has its own private memory. In the present state of distributed memory multiprocessor programming technology, it is commonly expected that the well-organised way of exploiting large-scale parallelism is the single program multiple data (SPMD) programming model. The programming approach involves defining the most suitable distribution of the data space.

9.4 Multicomputer
A multicomputer system is considered a message passing system in which computers are connected to one another to solve a problem. Each processor in a multicomputer has its own private memory, which is available only to that specific processor; the processors communicate with each other through an interconnection network. Because it is a message passing system, it is possible to distribute a task among the processors to accomplish it. A multicomputer is shown in Figure 6:

Figure 6: Multicomputer

9.4.1 Distributed Memory Multicomputer
A distributed memory multicomputer system is made up of many computers, called nodes, that are linked together via a message passing network. Each node functions as a self-contained computer with a CPU, local memory and, in certain cases, I/O devices. All local memories are private in this scenario, and only the local processors have access to them. This is why such machines are referred to as NORMA (no-remote-memory-access) computers, as shown in Figure 7:

Figure 7: Distributed Memory Multicomputer

9.4.2 Vector Supercomputers


A vector processor is an optional feature of a vector computer, and it is coupled to the scalar processor. The host computer first loads the program and data into main memory. The scalar control unit then decodes all of the instructions: if the decoded instructions are scalar or program operations, the scalar processor performs them using its functional pipelines; if, on the other hand, the decoded instructions are vector operations, they are transmitted to the vector control unit, as shown in Figure 8:

Figure 8: Vector Supercomputers

9.4.3 SIMD Supercomputers



A SIMD computer has an 'N' number of processors coupled to a control unit, and each processor has its own memory unit. An interconnection network connects all of the processors, as shown in Figure 9:

Figure 9: SIMD Supercomputers


9.5 Parallel Random Access Machine (PRAM) Model


A Parallel Random Access Machine, also termed a PRAM, is a model used for parallel computation with parallel algorithms; it is the extension of the RAM model used for sequential algorithms. In this model, several processors are connected to a single block of memory, termed the global memory. As shown in Figure 10, P refers to the number of processors, which perform operations independently on p items of data at a given time. The result can be concurrent access to the same memory location by distinct processors.
Traditional uniprocessor computers were represented as random-access machines (RAM) by Shepherdson and Sturgis (1963). Fortune and Wyllie (1978) created the Parallel Random Access Machine (PRAM) model to simulate an idealised parallel computer with zero memory access cost and synchronisation overhead.
and Sturgis (1963) in RAM. Fortune and Wyllie (1978) created a Parallel Random Access Machine (PRAM)
model to simulate an idealized parallel computer with no memory access cost and synchronisation.

E
A PRAM model is shown in Figure 10:

V
R
E
S
E
R
T
H
IG
R
Y

Figure 10: PRAM Model



A shared memory unit is present in an N-processor PRAM. This shared memory may be distributed across the processors or centralised. The processors use a synchronised read-memory, write-memory and compute cycle. As a result, the PRAM variants define how read and write activities are handled concurrently.
The possible memory update operations in the PRAM model are as follows:
• Exclusive Read (ER): Allows only a single processor to read from any one memory location at a time.
• Exclusive Write (EW): Allows only a single processor at a time to write into a memory location.
• Concurrent Read (CR): Allows several processors to read information from the same memory location.
• Concurrent Write (CW): Allows concurrent write operations on the same memory location.
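These four rules combine into PRAM variants such as EREW, CREW and CRCW. The following Python sketch (our own illustrative model, not a standard library) checks whether a set of same-cycle accesses is legal under a given variant:

```python
# Is a set of same-cycle memory accesses legal under a given PRAM variant?
# A variant is named by its rules, e.g. CREW = Concurrent Read, Exclusive Write.
def legal(accesses, concurrent_read, concurrent_write):
    """accesses: list of (processor_id, address, 'R' or 'W') for one cycle."""
    reads, writes = {}, {}
    for _, addr, kind in accesses:
        bucket = reads if kind == "R" else writes
        bucket[addr] = bucket.get(addr, 0) + 1
    if not concurrent_read and any(n > 1 for n in reads.values()):
        return False
    if not concurrent_write and any(n > 1 for n in writes.values()):
        return False
    return True

# Under CREW, two processors may read address 7 in the same cycle...
print(legal([(0, 7, "R"), (1, 7, "R")], concurrent_read=True, concurrent_write=False))  # True
# ...but two processors may not write it in the same cycle.
print(legal([(0, 7, "W"), (1, 7, "W")], concurrent_read=True, concurrent_write=False))  # False
```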

8
UNIT 09: Parallel Computer Models JGI JAIN
DEEMED-TO-BE UNIVERSITY

9.6 Very Large Scale Integration (VLSI) Model


The performance and functionality of computer systems have vastly improved during the previous 50 years, and the use of Very Large Scale Integration (VLSI) technology made this possible. VLSI technology enables the integration of a large number of components on a single chip, as well as increased clock speeds; as a result, more operations can be done in parallel at the same time.
VLSI chips are used to build processor arrays, memory arrays and large-scale switching networks in parallel computers. This technology is currently two-dimensional in nature, and the amount of storage (memory) space available on a VLSI chip is proportional to its size.

The chip area (A) of a VLSI implementation of an algorithm can be used to measure the space complexity of that algorithm. If T is the time (delay) required to run the algorithm, then A·T puts an upper bound on the total number of bits processed through the chip (its I/O). For a particular computation, there is a lower bound f(s) such that:

A·T² ≥ O(f(s))
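As a numeric illustration of how the bound is used (the numbers below are invented), a proposed chip design whose area A and delay T give A·T² below the known lower bound f(s) cannot be correct:

```python
# Toy feasibility check against the area-time lower bound A * T^2 >= f(s).
def satisfies_bound(area, time, f_s):
    return area * time ** 2 >= f_s

# Hypothetical computation with lower bound f(s) = 1_000_000:
print(satisfies_bound(100, 50, 1_000_000))    # 100 * 2500 = 250000 -> False
print(satisfies_bound(100, 200, 1_000_000))   # 100 * 40000 = 4000000 -> True
```

The bound expresses a trade-off: shrinking the delay T forces the area A to grow at least quadratically, and vice versa.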

R
The design flow of VLSI model is shown in Figure 11:

E
S
E
R
T
H
IG
R
Y
P
O
C

Figure 11: Design Flow of VLSI Model

9
JGI JAIN DEEMED-TO-BE UNIVERSITY
Computer Organization and Architecture

9.7 Conclusion

• Parallel processing has emerged as a useful technique in modern computers to fulfil the demand for better speed.
• Parallel computer architecture offers a new dimension to the development of computer systems.
• Electronic computers have gone through several phases of advancement during the last five decades.
• A multiprocessor is a part of a computer system that has two or more central processing units (CPUs).
• A multicomputer system is considered a message passing system that is connected to other computer systems.
• A distributed memory multicomputer system is made up of many computers called nodes.
• A vector processor is an optional feature of a vector computer and is coupled to the scalar processor.
• A SIMD computer has an 'N' number of processors coupled to a control unit.
• VLSI technology enables the integration of a high number of components on a single chip.
• A Parallel Random Access Machine (PRAM) is a model used for parallel computation with parallel algorithms.

S
9.8 Glossary E
R
• Parallel processing: A technique that has emerged in modern computers to fulfil the demand for better speed
• Parallel computer architecture: Offers a new dimension to the development of computer systems
• Electronic computers: Have gone through several phases of advancement during the last five decades
• Multiprocessor: A part of a computer system that has two or more central processing units (CPUs)
• Multicomputer: Considered a message passing system that is connected to other computer systems
• Distributed memory multicomputer system: Made up of many computers called nodes
• Vector processor: An optional feature of a vector computer, coupled to the scalar processor
• SIMD computers: Have an 'N' number of processors coupled to a control unit
• VLSI model: A technology that enables the integration of a high number of components on a single chip
• Uniform Memory Access (UMA): An approach in which the physical memory is shared evenly by all processors
• Parallel Random Access Machine (PRAM): A model used for parallel computation with parallel algorithms


9.9 Self-Assessment Questions

A. Multiple Choice Questions


1. The approach of arranging all resources to optimise speed and programmability within the
constraints set by technology and cost at any particular moment is called __________.
a. Parallel architecture
b. Parallel processing

c. Parallel computer

d. Multiprocessor

2. A __________ is a part of a computer system that has two or more central processing units (CPUs) sharing a common main memory.

a. PRAM

b. Parallel processing
c. Multiprocessor

d. Multicomputer
3. The approach in which the physical memory is shared evenly by all processors is __________.
a. COMA
b. UMA
c. NUMA

d. Multiprocessor

4. Which among the following leads to concurrency?


a. Synchronisation

b. Multiprocessing
c. Distribution

d. Parallelism

5. Which among the following is considered as a message passing system that is connected to other
computer system to solve the problem?

a. Multicomputer

b. Multiprocessor
c. Shared memory processor

d. Distributed memory processor


6. __________ allows only a single processor to read from any memory location.
a. EW
b. CW
c. CR
d. ER


7. Which of these technologies enables the integration of a high number of components on a single chip, as well as increased clock speeds?
a. Parallel processing
b. VLSI
c. PRAM
d. Parallel architecture
8. The traditional uniprocessor computers were represented as Random-Access Machines by
Shepherdson and Sturgis in __________.

a. 1961

b. 1962

c. 1963
d. 1964

9. In the PRAM model, the single block of memory to which the several processors are connected is termed the __________.

a. Super memory

b. Supercomputer
c. Vector computer
d. Global memory
R
10. A distributed memory multicomputer system is made up of many computers called __________.
a. Nodes
T

b. CPU
H

c. Multiprocessor
d. Multicomputer
IG

B. Essay Type Questions


R

1. By utilising an increasing number of processors, parallel computer architecture offers a new dimension to the development of computer systems. What are the concepts of parallel processing and parallel architecture?
2. Electronic computers have gone through several phases of advancement during the last five decades. Describe the concept of the electronic computer and its generations in brief.
3. The key objective of using a multiprocessor is to enhance the system's implementation speed. Explain the importance of the multiprocessor, with its types, in parallel computers in brief.
4. The PRAM is considered a model used for parallel computation with parallel algorithms. Describe the PRAM model in detail.
5. VLSI chips are used to build processor arrays, memory arrays and large-scale switching networks in parallel computers. Determine the concept of the VLSI model in parallel computers.

12
UNIT 09: Parallel Computer Models JGI JAIN
DEEMED-TO-BE UNIVERSITY

9.10 Answers and Hints for Self-Assessment Questions

A. Answers to Multiple Choice Questions

Q. No.  Answer
1.      a. Parallel architecture
2.      c. Multiprocessor
3.      b. UMA
4.      d. Parallelism
5.      a. Multicomputer
6.      d. ER
7.      b. VLSI
8.      c. 1963
9.      d. Global memory
10.     a. Nodes

B. Hints for Essay Type Questions


1. Parallel processing has emerged as a useful technique in modern computers to fulfil the demand for better speed, cheaper costs and more accurate outcomes in real-world applications. Refer to Section Introduction
2. Modern computers have powerful hardware that is controlled by a large number of software programs. To assess the state of computing, we must first examine historical milestones. Refer to Section The State of Computing
3. A multiprocessor is a part of a computer system that has two or more central processing units (CPUs) sharing a common main memory, bus and peripherals. It helps in the simultaneous processing of programs. Refer to Section Multiprocessors
4. A Parallel Random Access Machine, also termed a PRAM, is a model used for parallel computation with parallel algorithms. It is the extension of the RAM model used for sequential algorithms. Refer to Section Parallel Random Access Machine (PRAM) Model
5. The performance and functionality of computer systems have vastly improved during the previous 50 years. The use of Very Large Scale Integration (VLSI) technology made this possible. Refer to Section Very Large Scale Integration (VLSI) Model

9.11 Post-Unit Reading Material

• https://www.javatpoint.com/what-is-parallel-computing
• https://rc.fas.harvard.edu/wp-content/uploads/2016/03/Intro_Parallel_Computing.pdf


9.12 Topics for Discussion Forums

• Discuss the concept of parallel computers and parallel computer architecture with your friends and classmates. Also, discuss parallel computer models such as the PRAM and VLSI models, and multiprocessors and multicomputers, with real-world examples.

UNIT

10

D
E
Program and Network Properties

V
R
E
S
Names of Sub-Units
E
Introduction, Conditions of Parallelism, Program Partitioning and Scheduling, Program Flow
R
Mechanisms, System Interconnect Architectures
T

Overview

This unit begins by discussing the concept of program and network properties. Next, the unit discusses
the conditions of parallelism and program partitioning and scheduling. Further, the unit explains the
program flow mechanisms. Towards the end, the unit discusses the system interconnect architectures.
Learning Objectives

In this unit, you will learn to:

• Discuss the concept of program and network properties
• Explain the concept of conditions of parallelism
• Describe the program partitioning and scheduling
• Explain the significance of program flow mechanisms
• Discuss the importance of system interconnect architectures

Learning Outcomes

At the end of this unit, you would:

• Evaluate the concept of program and network properties
• Assess the concept of conditions of parallelism
• Evaluate the importance of program partitioning and scheduling
• Determine the significance of program flow mechanisms
• Explore the importance of system interconnect architectures

Pre-Unit Preparatory Material

• https://www.inf.ed.ac.uk/teaching/courses/pa/Notes/lecture02-types.pdf
10.1 Introduction

Parallelism is a key idea in today's computers, and the use of several functional units within the CPU
is an example of parallelism. Because early computers had only one arithmetic and logic unit, only one
operation could be performed at a time. As a result, the ALU function may be spread among many
functional units that run in parallel.

H.T. Kung has identified the need to advance in three areas: computation models for parallel computing,
interprocess communication in parallel architectures, and system integration for incorporating parallel
systems into a general computing environment.

10.2 Conditions of Parallelism

The various conditions of parallelism are as follows:

• Data and resource dependencies
• Bernstein's condition
• Software parallelism
• Hardware parallelism
10.2.1 Data and Resource Dependencies

A program is made up of several parts, so the ability to execute several program segments in parallel
requires that each segment be independent of the other segments. Dependencies between segments of
a program may take various forms, such as resource dependency, control dependency and data
dependency. A dependence graph is used to describe the relations: program statements are represented
by nodes, and directed edges with different labels show the ordered relations among the statements.
After analysing the dependence structure, it may be seen where opportunities for parallelization and
vectorization exist. The relations between statements are shown by data dependences.


There are five types of data dependence, as follows:

• Anti-dependence: A statement ST2 is anti-dependent on statement ST1 if ST2 follows ST1 in program
order and if the output of ST2 overlaps the input of ST1.
• Input dependence: Read and write are input statements, and input dependence arises not because
the same variables are involved, but because both input statements refer to the same file.
• Unknown dependence: The dependence relation between two statements cannot be determined in
the following situations:
 The subscript of a variable is itself subscripted.
 The subscript does not contain the loop index variable.
 The subscript is non-linear.
• Output dependence: Two statements are output dependent if they produce the same output variable.
• Flow dependence: A statement ST2 is flow-dependent on ST1 if an execution path exists from ST1 to
ST2 and at least one of ST1's outputs feeds in as an input to ST2.
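As a minimal sketch (the three-statement program and its variable names are assumptions for
illustration), the flow, anti- and output dependences above can be detected from the read and write
sets of straight-line statements:

```python
# Straight-line program; each "statement" is (name, read set, write set).
# S1: a = b + c   -> writes a
# S2: d = a * 2   -> reads a: flow dependence S1 -> S2 (read after write)
# S3: a = e - 1   -> writes a: anti-dependence S2 -> S3 (write after read)
#                    and output dependence S1 -> S3 (write after write)
def dependences(stmts):
    """Return labelled dependence edges between later and earlier statements."""
    edges = []
    for i, (n1, r1, w1) in enumerate(stmts):
        for n2, r2, w2 in stmts[i + 1:]:
            if w1 & r2:
                edges.append((n1, n2, "flow"))
            if r1 & w2:
                edges.append((n1, n2, "anti"))
            if w1 & w2:
                edges.append((n1, n2, "output"))
    return edges

stmts = [("S1", {"b", "c"}, {"a"}),
         ("S2", {"a"}, {"d"}),
         ("S3", {"e"}, {"a"})]
print(dependences(stmts))
```

Statements connected by no edge at all are the candidates for parallel execution.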

10.2.2 Bernstein's Condition

Bernstein identified a set of criteria that determine whether two processes may run concurrently. A
program in execution is referred to as a process. A process is a software entity; in fact, it is an
abstraction of a program fragment defined at various processing levels. Ii is the input set of process
Pi, the collection of all input variables required to complete the process; similarly, the output set Oi of
process Pi is the collection of all output variables created once process Pi has completed. The operands
retrieved from memory or registers are the input variables, and the results placed in working registers
or memory locations are the output variables:

• Let there be two processes P1 and P2
• Their input sets are I1 and I2
• Their output sets are O1 and O2

The two processes P1 and P2 can execute in parallel, denoted P1 || P2, if and only if they are independent
and do not create confusing results, that is, if I1 ∩ O2 = ∅, I2 ∩ O1 = ∅ and O1 ∩ O2 = ∅.
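A minimal sketch of this check, assuming the input and output sets are represented as Python sets
(the two example processes are assumed for illustration):

```python
def bernstein_parallel(I1, O1, I2, O2):
    """Return True if processes P1 and P2 satisfy Bernstein's conditions
    (I1 ∩ O2, I2 ∩ O1 and O1 ∩ O2 are all empty) and may run in parallel."""
    return not (I1 & O2) and not (I2 & O1) and not (O1 & O2)

# P1: c = a + b   (inputs {a, b}, output {c})
# P2: d = a * 2   (inputs {a},    output {d})   -> independent of P1
print(bernstein_parallel({"a", "b"}, {"c"}, {"a"}, {"d"}))   # True
# P3: e = c - 1 reads c, which P1 writes -> flow dependent, not parallel
print(bernstein_parallel({"a", "b"}, {"c"}, {"c"}, {"e"}))   # False
```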
10.2.3 Software Parallelism

Software parallelism is defined by the control and data dependences of programs. The program profile
or program flow graph indicates the degree of parallelism. Algorithm, programming style and compiler
optimization all influence software parallelism. The pattern of concurrently executable operations is
shown in program flow graphs. During the execution of a program, the amount of parallelism changes.

10.2.4 Hardware Parallelism

Hardware parallelism is defined by machine hardware and resource multiplicity. It is a trade-off
between cost and performance. It displays the resource usage patterns of operations that can be
executed at the same time, and it indicates the peak performance of the processor resources.


10.3 Program Partitioning and Scheduling

The number of instructions issued every machine cycle is one measure of the parallelism detected in
hardware. A program is partitioned into two (or more) portions, each representing a subgroup of the
original. The partitioning must be done in such a way that bugs in a partition are very likely to represent
bugs in the original program, since the goal is to examine program partitions to uncover true defects
that were not identified through whole-program analysis. Partitioning is based on the idea of a
program's control flow graph (CFG). A CFG is a static representation of the program that depicts all
control flow options. Each node in the graph represents a basic block, which is a chunk of code that runs
in a straight line with no jumps or jump targets. There are numerous options for choosing the partitions.
line with no jumps or targets. There are numerous options.

10.4 Program Flow Mechanisms

Traditional computers rely on a control flow mechanism in which the order in which programs are
executed is specified explicitly in the user program. At the fine-grain instruction level, data flow
computers offer a high degree of parallelism. Reduction computers are built on a demand-driven
mechanism, which commences an operation based on the demand for its result by other computations.

10.4.1 Control Flow Computers


H

Shared memory is used by Control Flow computers to store programme instructions and data items.
IG

Many instructions change variables in shared memory. Because memory is shared, the execution of one
instruction may have unintended consequences for other instructions. The negative effects of parallel
processing preclude parallel processing in many instances. In reality, due to the employment of control-
R

driven mechanisms, a uniprocessor computer is essentially sequential.


10.4.2 Data Flow Computers

In a data flow computer, rather than being directed by a program counter, the execution of an
instruction is governed by data availability. In theory, any instruction should be ready to execute
whenever its operands become available. In a data-driven program, the instructions are not ordered in
any way. Data is kept directly inside instructions rather than being saved in shared memory. The results
of computations are transmitted directly between instructions. The data created by an instruction is
copied and transmitted to any instructions that require it.

There is no shared memory, no program counter and no control sequencer in this data-driven
architecture. It does, however, necessitate a special mechanism for detecting data availability, matching
data tokens with required instructions, and enabling the chain reaction of asynchronous instruction
execution.


There are a slew of new data flow computer projects on the horizon. Arvind and his MIT colleagues
created a tagged token architecture for building data flow computers.

An n x n routing network connects the n processing elements (PEs) in the global architecture. Across
all n PEs, the whole system supports pipelined data flow operations. The pipelined routing network is
used to communicate between PEs.

Within each PE, the machine has a low-level token matching mechanism that dispatches only those
instructions for which the input data is already available. Each datum is tagged with the address of the
instruction to which it belongs as well as the context in which it is being executed. Instructions are
stored in program memory. A local path is used to bring tagged tokens into the PE. The tokens can also
be transferred over the routing network to other PEs. All internal circulation operations are pipelined
so that there are no bottlenecks.

In a dataflow computer, the instruction address replaces the program counter of a control flow
computer, and the context identifier replaces the frame base register. It is the responsibility of the
machine to match data with the same tag to the required instructions. New data will be created as a
result, along with a new tag identifying the successor instructions. Thus, each instruction represents a
synchronization operation. New tokens are created and distributed along the PE pipeline for reuse, or
through the global routing network, which is also pipelined, to other PEs.
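The data-driven firing rule described above can be sketched as a toy simulation (the three-instruction
program and all helper names are assumptions, not part of any real machine): an instruction fires as
soon as all of its input tokens have arrived, with no program counter involved.

```python
# Toy dataflow interpreter: each instruction fires when all its operand
# tokens are present; results are forwarded directly to successor instructions.
def run_dataflow(instrs, initial_tokens):
    tokens = {k: dict(v) for k, v in initial_tokens.items()}  # instr -> {port: value}
    fired, results = [], {}
    ready = [i for i, ins in instrs.items()
             if len(tokens.get(i, {})) == len(ins["in"])]
    while ready:
        i = ready.pop(0)
        ins = instrs[i]
        val = ins["op"](*[tokens[i][p] for p in ins["in"]])
        fired.append(i)
        results[i] = val
        for succ, port in ins["out"]:            # forward result tokens
            tokens.setdefault(succ, {})[port] = val
            if len(tokens[succ]) == len(instrs[succ]["in"]):
                ready.append(succ)
    return fired, results

# (a + b) * (a - b) with a = 5, b = 3; no shared memory, no program counter.
instrs = {
    "add": {"in": ["x", "y"], "op": lambda x, y: x + y, "out": [("mul", "l")]},
    "sub": {"in": ["x", "y"], "op": lambda x, y: x - y, "out": [("mul", "r")]},
    "mul": {"in": ["l", "r"], "op": lambda l, r: l * r, "out": []},
}
tokens = {"add": {"x": 5, "y": 3}, "sub": {"x": 5, "y": 3}}
fired, results = run_dataflow(instrs, tokens)
print(fired, results["mul"])   # mul fires only after both of its tokens arrive
```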

10.5 System Interconnect Architectures

Modern information systems, whether operated by federal agencies or other types of organisations,
rarely function in isolation; they usually integrate two or more systems to exchange data or share
information resources. Any direct link between information systems is referred to as a system
interconnection; information system owners must describe all system interconnections in their security
plans and select suitable security safeguards for each interconnection.

Some aspects of system interconnect architectures are as follows:

• Network properties
• Node degree and network diameter
• Data routing functions

10.5.1 Network Properties

A network's topology may be static or dynamic. Static networks are built from point-to-point direct
connections that do not change during program execution. In dynamic networks, programs that require
communication are able to communicate with each other via switched channels. When subsystems of
a central computer system or numerous computing nodes are connected using static networks, the
links are permanent. In shared memory multiprocessors, dynamic networks such as buses and
multistage networks are utilized frequently. SIMD computers have also used both types of networks to
route data between PEs. Most networks consist of nodes connected by directed or undirected edges and
are represented by a graph. The network size is the number of nodes in the graph.

10.5.2 Node Degree and Network Diameter

The node degree d is the number of edges incident on a node. For unidirectional channels, the number
of channels entering a node is called the in-degree, while the number of channels leaving a node is
called the out-degree. In this case, the node degree is the sum of the two degrees. The node degree
reflects the number of I/O ports required per node, and thus the cost per node. To decrease cost, the
node degree should be kept small. For scalable systems, a constant node degree is desirable.

The diameter D of a network is the maximum shortest path between any two nodes, measured by the
number of links traversed. It gives the maximum number of hops between any two nodes, offering a
measure of the network's communication distance. A small network diameter is desirable from a
communication standpoint.

The bisection width b is the minimum number of edges cut along the channel bisection when a network
is split into two identical halves. In communication networks, each edge corresponds to a channel of w
bit wires, so the wire bisection width is B = bw. The parameter B represents the wiring density of a
network. When B is fixed, the channel width is w = B/b. A network's bisection width can be used to
determine the maximum communication bandwidth along the bisection, and all other cross sections
should be bounded by the bisection width.
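As an illustrative sketch (the binary 3-cube example is an assumption, not taken from the text), the
node degree and the diameter of a network graph can be computed directly; a 3-cube has a constant
node degree of 3 and a diameter of 3:

```python
from collections import deque

def diameter(adj):
    """Maximum over all node pairs of the shortest-path hop count."""
    best = 0
    for src in adj:                       # BFS from every node
        dist = {src: 0}
        q = deque([src])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        best = max(best, max(dist.values()))
    return best

# Binary 3-cube: 8 nodes, neighbours differ in exactly one address bit.
n = 3
adj = {u: [u ^ (1 << b) for b in range(n)] for u in range(2 ** n)}

degrees = {u: len(vs) for u, vs in adj.items()}
print(set(degrees.values()))   # constant node degree
print(diameter(adj))           # at most n hops between any two nodes
```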

R
10.5.3 Data Routing Functions

Data interchange between PEs is carried out via data routing networks. A data routing network may be
static or dynamic. Messages are used to route data in multicomputer networks. Routing networks
improve system performance by reducing the time necessary for data interchange. Commonly used
data routing functions include shifting, rotation, permutation and broadcasting.
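A minimal sketch of these routing functions over the data held by N PEs (the list index stands for the
PE number; the helper names are assumptions for illustration):

```python
def shift(data, k, fill=None):
    """Shift PE i's datum to PE i+k; vacated positions receive `fill`."""
    n = len(data)
    out = [fill] * n
    for i, v in enumerate(data):
        if 0 <= i + k < n:
            out[i + k] = v
    return out

def rotate(data, k):
    """Circular shift: PE i sends its datum to PE (i + k) mod N."""
    n = len(data)
    return [data[(i - k) % n] for i in range(n)]

def broadcast(data, src, n):
    """PE `src` sends its datum to all N PEs."""
    return [data[src]] * n

data = [10, 20, 30, 40]
print(rotate(data, 1))        # [40, 10, 20, 30]
print(shift(data, 1))         # [None, 10, 20, 30]
print(broadcast(data, 2, 4))  # [30, 30, 30, 30]
```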
10.6 Conclusion

• Parallelism is a key idea in today's computers, and the use of several functional units within the
CPU is an example of parallelism.
• Machine hardware and resource multiplicity define hardware parallelism.
• Control and data dependences of programs define software parallelism.
• Shared memory is used by control flow computers to store program instructions and data items.
• In a data flow computer, rather than being directed by a program counter, the execution of an
instruction is governed by data availability.
• Bernstein identified a set of criteria that determine whether two processes may run concurrently.
• A network's topology may be static or dynamic.
• Data interchange between PEs is carried out via data routing networks.
• A program is partitioned into two (or more) portions, each representing a subgroup of the original.

10.7 Glossary

• Parallelism: A key idea in today's computers; the use of several functional units within the CPU is
an example of parallelism
• Hardware parallelism: Parallelism defined by machine hardware and resource multiplicity
• Control and data dependences: The program properties that define software parallelism
• Shared memory: Memory used by control flow computers to store program instructions and data
items
• Data flow computer: A computer in which the execution of an instruction is governed by data
availability rather than by a program counter
• Bernstein's condition: A set of criteria that determine whether two processes may run concurrently
• Network topology: The structure of an interconnection network, which may be static or dynamic

10.8 Self-Assessment Questions

A. Multiple Choice Questions

1. Which among the following types of data dependence produces the same output variable?
a. Output dependence
b. Unknown dependence
c. Flow dependence
d. Anti-dependence
2. __________ identified a set of criteria that determine whether two processes may run concurrently.
a. Data and resource dependencies
b. Bernstein's condition
c. Software parallelism
d. Hardware parallelism

3. Which of these is used by control flow computers to store program instructions and data items?
a. Compiler
b. Algorithms
c. Shared memory
d. Network
4. An n x n routing network connects the n __________ in the global architecture.
a. CFG
b. Shared memory
c. Data flow computer
d. Processing elements (PEs)

5. Which among the following uses several functional units within the CPU?
a. Parallelism
b. Scheduling
c. Program partitioning
d. System interconnection
6. If the network grows, the diameter between any two nodes __________.
a. Decreases
b. Slightly decreases
c. Increases
d. Does not change

7. The number of channels out of a node is called the __________.
a. In degree
b. Out degree
c. Channels
d. Edges

8. Which among the following commonly includes shifting, rotation, permutation and broadcasting?
a. Network properties
b. Node degree and network diameter
c. Parallelism
d. Data routing functions

9. Which of these displays the resource usage patterns of operations that are being executed at the
same time?
a. Scheduling
b. Program partitioning
c. Software parallelism
d. Hardware parallelism

10. The __________ are retrieved from memory or registers and are called input variables.
a. Operands
b. CPU
c. ALU
d. Memory
B. Essay Type Questions

1. Dependencies among the various segments of a program may take various forms, such as resource
dependency. What is the concept of data and resource dependencies?
2. The partitioning must be done in such a way that bugs in a partition are very likely to represent
bugs in the original program. Describe the significance of partitioning.
3. In a dataflow computer, the instruction address replaces the program counter. Explain the
importance of data flow computers.
4. Programs that require communication are able to communicate with each other via switched
channels in dynamic networks. Describe network properties briefly.
5. What do you understand by parallelism?

10.9 Answers and Hints for Self-Assessment Questions

A. Answers to Multiple Choice Questions

Q. No.    Answer
1.        a. Output dependence
2.        b. Bernstein's condition
3.        c. Shared memory
4.        d. Processing elements (PEs)
5.        a. Parallelism
6.        c. Increases
7.        b. Out degree
8.        d. Data routing functions
9.        d. Hardware parallelism
10.       a. Operands
B. Hints for Essay Type Questions


1. Programme is made up of several part, so the ability of executing several programme segment in
R

parallel requires that each segment should be independent other segment.


Refer to Section Condition of Parallelism
Y

2. A programme is partitioned into two (or more) portions, each signifying a subgroup of the original.
P

Refer to Section Program Partitioning and Scheduling


3. In a data flow computer, rather than being directed by a programme counter, the execution of an
O

instruction is governed by data availability.


C

Refer to Section Program Flow Mechanisms


4. A network’s topology might be static or dynamic. Refer to Section System Interconnect Architectures
5. Parallelism is a key idea in today’s computers, and the usage of several functional units within the
CPU is an example of parallelism.
Refer to Section Introduction


@ 10.10 Post-Unit Reading Material

• https://www.nvidia.com/content/cudazone/cudau/courses/ucdavis/lectures/ilp1.pdf
• https://examradar.com/memory-management/

10.11 Topics for Discussion Forums

• Discuss with your friends and classmates the concept of program and network properties,
conditions of parallelism and program partitioning. Also, discuss scheduling and system
interconnect architectures and where they are used.

UNIT 11

Principles of Scalable Performance

Names of Sub-Units

Performance Metrics and Measures, Parallel Processing Applications, Speedup Performance Laws

Overview

This unit begins by discussing the concept of principles of scalable performance. Next, the unit
describes the performance metrics and measures. Further, the unit explains the parallel processing
applications. Towards the end, the unit discusses the speedup performance laws.

Learning Objectives

In this unit, you will learn to:

• Discuss the concept of principles of scalable performance
• Explain the concept of performance metrics and measures
• Describe the parallel processing applications
• Outline the significance of speedup performance laws
• Interpret the concept of Amdahl's law

Learning Outcomes

At the end of this unit, you would:

• Evaluate the concept of principles of scalable performance
• Assess the concept of performance metrics and measures
• Understand the importance of parallel processing applications
• Determine the significance of speedup performance laws
• Describe the concept of Amdahl's law

Pre-Unit Preparatory Material

• http://www.nitjsr.ac.in/course_assignment/CS16CS601INTRODUCTION%20TO%20PARALLEL%20
COMPUTING.pdf

11.1 Introduction

Scalability, at its most basic level, refers to the ability to accomplish more of something. This might
include managing more data, responding to more user requests, or completing more tasks. While
developing software capable of performing a lot of work has its own set of challenges, designing
software that can scale to do ever more work brings challenges of its own.

11.1.1 Decrease Processing Time

To increase the quantity of work an application accomplishes, reduce the time it takes for individual
work units to finish. Reduce the time it takes to process a user request, for example, and you will be
able to handle more requests in the same amount of time. Here are several scenarios in which this
principle is applicable, as well as some potential implementation techniques.

By collocating the data and the code, any overheads associated with obtaining data necessary for a
piece of work may be reduced. If the data and the code cannot be collocated, cache the data to decrease
the overhead of retrieving it repeatedly. Pooling: by pooling expensive resources, you may decrease the
overhead associated with their use. Decompose the task and parallelise the various stages to shorten
the time it takes to accomplish a unit of work. Dividing: by partitioning the code and collocating related
partitions, related processes may be brought as near together as feasible.

You can also reduce the amount of time spent contacting remote services, for example by making the
interfaces coarser-grained. It is also important to remember that remote vs. local is an intentional
design decision rather than a switch, and to remember the first law of distributed computing: do not
distribute your objects.

11.2 Performance Metrics and Measures

11.2.1 Parallelism Profile in Programs

The degree of parallelism (DOP) is the number of processors employed to run a program at any given
time, and it can change over time.

The maximum DOP assumes an unlimited number of processors; this is not achievable in actual
computers, so certain parallel program parts must be run sequentially in smaller parallel segments.
Other resources may also impose restrictions. A plot of DOP vs. time is called a parallelism profile, as
shown in Figure 1:

[Figure: a plot of DOP versus time between t1 and t2, with the average parallelism marked]

Figure 1: Parallelism Profile

Average Parallelism – 1

Assume the following for average parallelism:

• n homogeneous processors
• The maximum parallelism in a profile is m
• Ideally, n >> m
• ∆ is the computing capacity of a single processor, expressed in a rate such as MIPS or Mflops,
without regard to memory latency, etc.
• i is the number of processors busy in an observation period (i.e., DOP = i)
• W is the total work (instructions or computations) performed by a program
• A is the average parallelism in the program



Average Parallelism – 2

The area under the profile curve is related to the total amount of work done:

W = ∆ ∫t1..t2 DOP(t) dt

W = ∆ Σ(i=1..m) i · ti

where ti = total time that DOP = i, and Σ(i=1..m) ti = t2 − t1

Average Parallelism – 3

A = [1 / (t2 − t1)] ∫t1..t2 DOP(t) dt

A = [Σ(i=1..m) i · ti] / [Σ(i=1..m) ti]
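A small sketch of computing A from a discrete parallelism profile (the sample profile values are
assumptions for illustration):

```python
def average_parallelism(profile):
    """profile: list of (dop, duration) pairs. A = sum(i * ti) / sum(ti)."""
    work = sum(i * t for i, t in profile)     # proportional to W (with delta = 1)
    total = sum(t for _, t in profile)        # t2 - t1
    return work / total

# DOP = 2 for 3 time units, DOP = 4 for 1 unit, DOP = 1 for 2 units
profile = [(2, 3), (4, 1), (1, 2)]
print(average_parallelism(profile))   # (6 + 4 + 2) / 6 = 2.0
```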


11.2.2 Available Parallelism
Several investigations have demonstrated that parallelism in scientific and engineering computations
may be quite high (e.g. hundreds or thousands of instructions per clock cycle). But in real machines, the
actual parallelism is much smaller (e.g. 10 or 20).

11.2.3 Basic Blocks

A basic block is a sequence of instructions that has only one entry and one exit.

In compilers, basic blocks are commonly employed as the focus of optimisers (since it is easier to
manage the use of registers within the block).

Limiting optimisation to basic blocks reduces the amount of parallelism that may be achieved at the
instruction level (to about 2 to 5 in typical code).

Asymptotic Speedup – 1

Wi = i ∆ ti                  (work done when DOP = i)

W = Σ(i=1..m) Wi             (relates the sum of the Wi terms to W)

ti(k) = Wi / (k∆)            (execution time of Wi with k processors)

ti(∞) = Wi / (i∆)            (for 1 ≤ i ≤ m)

Asymptotic Speedup – 2

T(1) = Σ(i=1..m) ti(1) = Σ(i=1..m) Wi / ∆          (response time with 1 processor)

T(∞) = Σ(i=1..m) ti(∞) = Σ(i=1..m) Wi / (i∆)       (response time with ∞ processors)

S∞ = T(1) / T(∞) = [Σ(i=1..m) Wi] / [Σ(i=1..m) Wi / i] = A    (in the ideal case)
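The asymptotic speedup formula above can be sketched directly (the work profile values are assumed
for illustration, with ∆ taken as 1):

```python
def asymptotic_speedup(w):
    """w[i-1] = work Wi done at DOP = i.  S_inf = sum(Wi) / sum(Wi / i)."""
    t1 = sum(w)                                       # T(1), with delta = 1
    t_inf = sum(wi / (i + 1) for i, wi in enumerate(w))   # T(infinity)
    return t1 / t_inf

# Work profile: W1 = 2, W2 = 8, W3 = 0, W4 = 12
w = [2.0, 8.0, 0.0, 12.0]
print(asymptotic_speedup(w))   # (2 + 8 + 12) / (2 + 4 + 3) = 22/9 ≈ 2.44
```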

11.2.4 Mean Performance

We are looking for a metric that describes the mean, or average, performance of a group of benchmark
programs with a variety of execution modes (for example, scalar, vector, sequential, parallel). We may
also want to assign weights to these programs to emphasise the various modes and provide a more
meaningful performance metric.

11.2.5 Arithmetic Mean

The arithmetic mean is a well-known concept (the sum of the terms divided by the number of terms).
Execution rates given in MIPS or Mflops will be used in our calculations. The arithmetic mean of a
collection of execution rates is proportional to the sum of the inverses of the execution times; it is not
proportional to the sum of the execution times. As a result, the arithmetic mean fails to represent the
true time consumed by the benchmarks when they are run.

11.2.6 Harmonic Mean

We utilise the harmonic mean execution rate instead of the arithmetic or geometric means; it is simply
the inverse of the arithmetic mean of the execution times (thus guaranteeing the inverse relation not
exhibited by the other means).

11.2.7 Weighted Harmonic Mean

We may compute the weighted harmonic mean rate by associating weights fi with the benchmarks:

Rh* = 1 / [Σ(i=1..m) (fi / Ri)]

11.2.8 Weighted Harmonic Mean Speedup

T1 = 1/R1 = 1 is the sequential execution time on a single processor with rate R1 = 1.

Ti = 1/Ri = 1/i is the execution time using i processors with a combined execution rate of Ri = i.

Assume there are n execution modes with corresponding weights f1, ..., fn for a program. The weighted
harmonic mean speedup can then be calculated as follows:

S = T1 / T* = 1 / [Σ(i=1..n) (fi / Ri)]

where T* = 1/Rh* is the weighted arithmetic mean execution time.
mean excution time)


H

11.2.9 Amdahl's Law

Assume Ri = i, and that the weights f are (α, 0, ..., 0, 1 − α).

Basically, this means the system is used sequentially (with probability α) or all n processors are used
(with probability 1 − α).

This yields the speedup equation known as Amdahl's law:

Sn = n / [1 + (n − 1)α]

The implication is that the best speedup possible is 1/α, regardless of n, the number of processors.
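A quick sketch of this equation (the sequential fraction α = 0.1 is an assumed value), showing the
speedup saturating near 1/α however many processors are added:

```python
def amdahl_speedup(n, alpha):
    """Amdahl's law in the form Sn = n / (1 + (n - 1) * alpha),
    where alpha is the sequential fraction of the work."""
    return n / (1 + (n - 1) * alpha)

for n in (1, 10, 100, 10000):
    print(n, round(amdahl_speedup(n, 0.1), 2))
# Speedup approaches but never exceeds 1/alpha = 10.
```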
C

11.2.10 Efficiency, Utilisation and Quality

System Efficiency – 1

Assume the following definitions:

• O(n) is the total number of "unit operations" carried out by an n-processor machine to complete a
program P.
• T(n) is the amount of time it takes to run the program P on an n-processor machine.
• O(n) is roughly equivalent to the total number of instructions executed by the n processors, scaled
by a constant factor.
• If we define O(1) = T(1), we may safely assume that T(n) < O(n) for n > 1 if the program P can take
advantage of the extra processor(s) in any way.

System Efficiency – 2

The speedup factor (how much quicker the program runs with n processors) can now be expressed as:

S(n) = T(1) / T(n)

Recall that we expect T(n) < T(1), so S(n) ≥ 1.

System efficiency is defined as:

E(n) = S(n) / n = T(1) / (n × T(n))

It indicates the actual degree of speedup achieved in a system as compared with the maximum possible
speedup. Thus 1/n ≤ E(n) ≤ 1. The value is 1/n when only one processor is used (regardless of n), and
the value is 1 when all processors are fully utilised.

S
11.2.11 Redundancy

In a parallel computation, redundancy is defined as:

R(n) = O(n) / O(1)

What values can R(n) take?

R(n) = 1 when O(n) = O(1), that is, when the number of operations performed is independent of the
number of processors n. This is the ideal case.

R(n) = n when all processors perform the same number of operations as when only a single processor
is used; this implies that n completely redundant computations are performed!

11.2.12 System Utilisation

System utilisation is defined as:

U(n) = R(n) × E(n) = O(n) / (n × T(n))

It indicates the degree to which the system resources were kept busy during the execution of the
program. Since 1 ≤ R(n) ≤ n and 1/n ≤ E(n) ≤ 1, the best possible value for U(n) is 1 and the worst is 1/n:

1/n ≤ E(n) ≤ U(n) ≤ 1
1 ≤ R(n) ≤ 1/E(n) ≤ n

11.2.13 Quality of Parallelism

The quality of a parallel computation is defined as:

Q(n) = S(n) × E(n) / R(n) = T³(1) / (n × T²(n) × O(n))

This measure is directly related to speedup (S) and efficiency (E), and inversely related to redundancy
(R). The quality measure is bounded by the speedup (that is, Q(n) ≤ S(n)).
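The five measures above can be sketched together (T(1), T(4), O(1) and O(4) are assumed numbers for
a hypothetical 4-processor run):

```python
def performance_metrics(t1, tn, o1, on, n):
    """Speedup, efficiency, redundancy, utilisation and quality for an
    n-processor run, from the definitions S = T(1)/T(n), E = S/n,
    R = O(n)/O(1), U = R*E and Q = S*E/R."""
    s = t1 / tn
    e = s / n
    r = on / o1
    u = r * e
    q = s * e / r
    return s, e, r, u, q

# Assumed: T(1) = 100, T(4) = 30, O(1) = 100, O(4) = 120, n = 4
s, e, r, u, q = performance_metrics(100, 30, 100, 120, 4)
print(s, e, r, u, q)   # bounds 1/n <= E <= U <= 1 and Q <= S hold
```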

11.2.14 Standard Industry Performance Measures

MIPS and Mflops, while easily understood, are poor measures of system performance, since their
interpretation depends on machine clock cycles and instruction sets. For example, which of these
machines is faster?

a. A 10 MIPS CISC computer
b. A 20 MIPS RISC computer

It is impossible to tell without knowing more details about the instruction sets of the machines. Even
the question "which machine is faster?" is suspect, since we need to say "faster at doing what?"

Transactions per second (TPS) is a measure that is appropriate for online systems like those used to
support ATMs, reservation systems and point-of-sale terminals. The measure may include
communication overhead, database search and update, as well as logging operations. The benchmark
is also useful for rating relational database performance.

KLIPS is a measure of the number of thousands of logical inferences per second that can be performed
by a system, presumably to indicate how well that system will perform at certain AI applications. Since
one inference requires about 100 instructions (in the benchmark), a rating of 400 KLIPS is roughly
equivalent to 40 MIPS.
R
MIPS.

11.3 Parallel Processing Applications


T

The phrase “parallel processing” indicates a broad class of simultaneous data-processing techniques used to improve the computing performance of a computer system. A parallel processing system processes data in parallel, resulting in quicker execution times.

For example, while one instruction is being executed in the ALU, the following instruction might be fetched from memory. The system may have two or more ALUs and be capable of simultaneously executing two or more instructions. Using two or more processors speeds up a computer’s processing capacity, but it also raises the system’s cost. However, technical advancements have brought down hardware costs to the point where parallel processing methods are now cost-effective.

Parallel processing occurs at multiple levels of complexity. At the lowest level, the type of registers utilised distinguishes parallel from serial operation: shift registers function serially, one bit at a time, whereas parallel registers work on all bits of a word at the same time. At higher degrees of complexity, parallel processing is achieved by a set of functional units that conduct independent or comparable tasks at the same time, and it is set up by spreading data across many functional units. For example, arithmetic, shift and logic operations may be broken down into three units, with the operands routed to each unit under the supervision of a control unit.
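As a software analogy of this arrangement (a sketch, not a description of the hardware), the code below models a control unit that routes each operation to the functional unit responsible for it; the unit names and operation set are illustrative:

```python
# Software analogy: a "control unit" dispatches each operation to the
# functional unit that implements it. Units and operations are illustrative.
ARITHMETIC = {"add": lambda a, b: a + b, "sub": lambda a, b: a - b}
LOGIC = {"and": lambda a, b: a & b, "or": lambda a, b: a | b}
SHIFT = {"shl": lambda a, b: a << b, "shr": lambda a, b: a >> b}

UNITS = [ARITHMETIC, LOGIC, SHIFT]

def dispatch(op: str, a: int, b: int) -> int:
    """Route the operation to whichever functional unit implements it."""
    for unit in UNITS:
        if op in unit:
            return unit[op](a, b)
    raise ValueError(f"no functional unit implements {op!r}")

print(dispatch("add", 6, 2), dispatch("and", 6, 2), dispatch("shl", 6, 2))
```

In real hardware the three units would operate concurrently on independent operands; here the dispatch step only illustrates the routing role of the control unit.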
Figure 2 depicts one technique of splitting the execution unit into eight functional units that operate in
parallel. Operands in the registers are moved to one of the units associated with the operands, depending
on the operation indicated by the instruction. The operation done in each functional unit is indicated


in each block of the diagram. The adder and integer multiplier execute arithmetic operations using
integer numbers.
Three parallel circuits can be used to do floating-point calculations. Logic, shift and increment operations are all executed at the same time on distinct data. Because all units are independent of one another, one number can be shifted while another is incremented. A complex control unit is usually associated with a multi-functional organisation to coordinate all of the operations amongst the many components.
The most essential indicator of a computer’s performance is how rapidly it can run programmes. The architecture of a computer’s hardware influences this speed, so for optimal performance it is vital to design the compiler, the machine instruction set and the hardware in a coordinated manner.

The whole time it takes to run a programme is called elapsed time, and it is a measure of the computer system’s overall performance. The speeds of the CPU, disk and printer all have an impact on it. The processor time is the amount of time the processor spends executing instructions.

The processor time depends on the hardware involved in the execution of individual machine instructions, just as the elapsed time for the execution of a programme depends on all units in a computer system.

The processor and a small cache memory can be created on a single IC chip. Internally, the speed at which the basic steps of instruction processing are performed on the chip is extremely rapid, and it is significantly quicker than the speed at which instructions and data are acquired from the main memory. A programme will run faster when the movement of instructions and data between the main memory and the processor is limited, which is achieved by utilising the cache.
Consider the following scenario: in a programme loop, a set of instructions is performed repeatedly over a short period. If these instructions are stored in the cache, they can be swiftly retrieved during periods of frequent use. The same can be said for data that is used frequently. Figure 2 depicts a processor with multiple functional units:


Figure 2: Processor with Multiple Functional Units (eight parallel units: adder-subtractor, integer multiply, logic unit, shift unit, incrementer, floating-point add-subtract, floating-point multiply and floating-point divide, connected through the processor registers to memory)


The major benefit of parallel processing is that it improves system resource usage by increasing resource
multiplicity, which increases total system throughput.

11.4 Speedup Performance Laws


Speedup can be defined as the ratio of execution time for the whole task without using the enhancement to execution time for the whole task using the enhancement; equivalently, it is the ratio of performance for the whole task with the enhancement to performance for the whole task without it.
In parallel computing, Amdahl’s law is frequently used to forecast the theoretical speedup when employing additional processors. For example, if a programme takes 20 hours to complete using a single thread, but only 19 hours (p = 0.95) of execution time can be parallelised, the minimum execution time cannot be less than one hour, regardless of how many threads are allocated to a parallelized execution of this programme. As a result, the potential speedup is capped at 20 times the single-thread performance:

1 / (1 − p) = 1 / (1 − 0.95) = 20
Figure 3 depicts Amdahl’s law, plotting speedup against the number of processors (from 1 to 65,536) for parallel portions of 50%, 75%, 90% and 95%:

Figure 3: Amdahl’s Law



The key goal is to have the results as soon as possible. In other words, the major objective is to have a
quick turnaround time.
Amdahl’s law can be formulated in the following way:

S_latency(s) = 1 / ((1 − p) + p / s)


Where,
S_latency is the theoretical speedup of the execution of the whole task;
s is the speedup of the part of the task that benefits from improved system resources;
p is the proportion of execution time that the part benefiting from improved resources originally occupied.

Furthermore,

S_latency(s) ≤ 1 / (1 − p)

lim (s → ∞) S_latency(s) = 1 / (1 − p)

These relations illustrate that the theoretical speedup of the entire job rises as the system’s resources improve, and that the theoretical speedup is always restricted by the part of the work that cannot benefit from the improvement, regardless of the degree of the improvement.
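The formula and its limit can be checked numerically; the sketch below uses the 20-hour example from above, where p = 0.95:

```python
def amdahl_speedup(p: float, s: float) -> float:
    """Theoretical overall speedup when a fraction p of the work is sped up by factor s."""
    return 1.0 / ((1.0 - p) + p / s)

p = 0.95                  # parallelisable fraction from the example
cap = 1.0 / (1.0 - p)     # upper bound on the achievable speedup (20x)

# Speedup grows with s but can never reach the cap of 1/(1 - p).
for s in (2, 8, 1024):
    assert amdahl_speedup(p, s) < cap
print(round(amdahl_speedup(p, 1e9), 4), round(cap, 4))
```

Even with an effectively unlimited number of processors (s = 10⁹), the overall speedup only approaches 20, never exceeding it.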

Amdahl’s law applies only in instances where the problem size is fixed. In practice, when more computing resources become available, they tend to be utilised on larger problems (larger datasets), and the time spent on the parallelizable component of the job frequently rises considerably faster than the time spent on the fundamentally serial part.
The three performance laws are defined as follows:
zz Amdahl’s law (1967) assumes a fixed workload or problem size.
zz Gustafson’s law (1987) applies to scalable problems, in which the problem size grows as the machine size grows.
zz Sun and Ni’s (1993) speedup model is for scaling problems with memory constraints.

Amdahl’s law assumes a fixed workload: many practical applications have a set computing burden and fixed problem size. The fixed workload is dispersed as the number of processors grows. Fixed-load speedup is the type of speedup acquired for time-critical applications.
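For contrast with the fixed-load case, Gustafson's scaled speedup can be sketched as well. The formula S(n) = n − α(n − 1), with α the serial fraction, is the standard statement of Gustafson's law; the α value below is illustrative:

```python
def gustafson_speedup(n: int, alpha: float) -> float:
    """Scaled speedup on n processors when a fraction alpha of the work is serial."""
    return n - alpha * (n - 1)

# With a 5% serial fraction the scaled speedup keeps growing with n,
# unlike the fixed-workload (Amdahl) case, which saturates at 1/alpha = 20.
alpha = 0.05
print(gustafson_speedup(64, alpha), gustafson_speedup(1024, alpha))
```

Because the problem size is allowed to grow with the machine, the serial fraction shrinks relative to the total work, and the speedup scales nearly linearly with n.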
11.5 Conclusion

zz Scalability refers to the ability to accomplish more of something.


zz The degree of parallelism (DOP) is the number of processors employed to run a programme at any given time.
zz A simple block is a set of instructions that has only one entry and exit.

zz Transactions per second (TPS) is a measure that is appropriate for online systems like those used to
support ATMs.
zz KLIPS is the measure of the number of logical inferences per second that can be performed by a
system.
zz A parallel processing system may process data in parallel, resulting in quicker execution times.
zz Fixed-load speedup is a type of speedup acquired for time-critical applications.


11.6 Glossary

zz Scalability: It refers to the ability to accomplish more of something.


zz Degree of Parallelism (DOP): It is the number of processors employed to run a programme at any
given time.
zz Simple block: It is a set of instructions that has only one entry and exit.
zz Transactions Per Second (TPS): It is a measure that is appropriate for online systems like those used to support ATMs.
zz KLIPS: It is the measure of the number of logical inferences per second that can be performed by a system.
zz Parallel processing: It is a mode of processing in which data is processed in parallel, resulting in quicker execution times.

R
zz Fixed-load speedup: It is a type of speedup acquired for time-critical applications.

11.7 Self-Assessment Questions
A. Multiple Choice Questions
1. Which among the following refers to the ability to accomplish more of something?
a. Scalability
b. Parallelism
c. Measures
d. Mean
2. A simple __________ is a set of instructions that has only one entry and exit.
a. Segment
b. Block
c. Profile
d. Measures
3. In a parallel computation, redundancy is defined as:
a. R (n) = 1 (n) / O (1)
b. R (n) = O (n) / O (0)
c. R (n) = O (n) / O (1)
d. R (n) = 1 (n) / O (1)
4. Which of these is a measure that is appropriate for online systems like those used to support ATMs, reservation systems and point of sale terminals?
a. KLIPS
b. MIPS
c. ALU
d. TPS


5. Which among the following is used to indicate a broad class of simultaneous data-processing processes with the aim of improving the computing performance of a computer system?
a. Parallel processing
b. Processor
c. Redundancy
d. Harmonic mean
6. In which year was Gustafson’s law, which is used to solve scalable issues in which the problem size grows as the machine size grows, proposed?
a. 1967
b. 1993
c. 1987
d. 1990
7. Which of the following can be defined as the ratio of execution time for the whole task without using the enhancement to execution time for the whole task with it?
a. Speedup
b. Scalability
c. Parallelism
d. System utilisation
8. Choose the option that implies an unlimited number of processors.
a. Parallel processing
b. Metrics
c. Mean performance
d. Degree of Parallelism (DOP)
9. The sum of the inverses of the execution times is proportional to the __________ of a collection of execution rates.
a. Harmonic mean
b. Arithmetic mean
c. Weighted harmonic mean
d. Basic block
10. Which among the following terms denotes the share of execution time that the part benefiting from improved resources originally occupied?
a. Segments
b. Blocks
c. Proportion
d. Parallel

B. Essay Type Questions


1. Explain the concept of scalability.
2. DOP implies an unlimited number of processors. Define a DOP.


3. Multiple layers of complexity result in parallel processing. Outline the significance of parallel
processing.
4. Describe the importance of speedup laws.
5. Transactions Per Second (TPS) is a measure that is appropriate for online systems like those used to
support ATMs. Examine the significance of performance measures.

11.8 Answers and Hints for Self-Assessment Questions

A. Answers to Multiple Choice Questions

Q. No. Answer
1. a. Scalability
2. b. Block
3. c. R (n) = O (n) / O (1)
4. d. TPS
5. a. Parallel processing
6. c. 1987
7. a. Speedup
8. d. Degree of Parallelism (DOP)
9. b. Arithmetic mean
10. c. Proportion
B. Hints for Essay Type Questions


1. Scalability, at its most basic level, refers to the ability to accomplish more of something. This might include managing more data, responding to more user requests, or completing more tasks. Refer to Section Introduction.
2. The degree of parallelism (DOP) is the number of processors employed to run a programme at any given time, and it can change over time. Refer to Section Performance Metrics and Measures.
3. The phrase “parallel processing” is used to indicate a broad class of simultaneous data-processing processes with the aim of improving the computing performance of a computer system. Refer to Section Parallel Processing Applications.


4. Speedup can be defined as the ratio of execution time for the whole task without by means of the
enhancement to execution time for the whole task, or as the ratio of performance time for the whole
task using the enhancement to execution time for the entire task without using the enhancement.
Refer to Section Speedup Performance Laws
5. MIPS and Mflops, while easily understood, are poor measures of system performance, since their
interpretation depends on machine clock cycles and instruction sets.


For example, which of these machines is faster?


a. 10 MIPS CISC computer
b. 20 MIPS RISC computer
Refer to Section Performance Metrics and Measures

11.9 Post-Unit Reading Material

zz https://www.cs.umd.edu/~meesh/411/CA-online/chapter/performance-metrics/index.html
zz https://www.sciencedirect.com/topics/computer-science/parallel-processing
11.10 Topics for Discussion Forums

zz Discuss with your friends and classmates the concept of scalable performance. Also discuss the performance metrics and measures, and parallel processing applications.
UNIT

12

Advanced Processor Technology
Names of Sub-Units
Introduction, Design Space of Processors, CISC Architecture, RISC Architecture, Superscalar Processors, VLIW Architecture, Overview of Vector and Symbolic Processors
Overview

This unit begins by discussing the concept of advanced processor technology. Next, the unit discusses the design space for processors and CISC architecture. The unit also covers the RISC architecture, superscalar processors and VLIW architecture. Towards the end, the unit discusses the overview of vector and symbolic processors.

Learning Objectives

In this unit, you will learn to:


aa Discuss the concept of advanced processor technology

aa Explain the concept of design space of processors and CISC architecture


aa Describe the RISC architecture and superscalar processors
aa Explain the significance of VLIW architecture
aa Analyse vector and symbolic processors

Learning Outcomes

At the end of this unit, you would:


aa Evaluate the concept of advanced processor technology
aa Assess the concept of design space of processors and CISC architecture
aa Evaluate the importance of RISC architecture and superscalar processors

aa Understand the significance of VLIW architecture
aa Analyse the overview of vector and symbolic processors

Pre-Unit Preparatory Material

aa https://link.springer.com/content/pdf/bfm%3A978-3-642-58589-0%2F1.pdf

12.1 Introduction
The architectural families of modern CPUs are listed here. CISC, RISC, superscalar, VLIW, super-pipelined, vector and symbolic processors are among the key processor families that will be discussed. Numerical computations are performed using scalar and vector computers. For AI applications, symbolic processors have been created.

12.2 Design Space of Processors


Processor design is a branch of computer engineering and electronics engineering (fabrication) concerned with the design of a processor, which is a critical component of computer hardware.

The design process begins with the selection of an instruction set and an execution paradigm (e.g., VLIW or RISC), and ends with the creation of a microarchitecture that can be specified in VHDL or Verilog. This description is then fabricated using one of the many semiconductor device fabrication processes, resulting in a die that is bonded to a chip carrier. The chip carrier is then attached to a printed circuit board (PCB) or put into a socket on one.

Any processor’s method of operation is the execution of lists of instructions. Instructions typically compute or manipulate data values using registers, update or retrieve values in read/write memory, perform relational checks between data values, and regulate programme execution.

Before sending a CPU design to a foundry for semiconductor production, it is frequently tested and validated on one or more FPGAs.

UNIT 12: Advanced Processor Technology JGI JAIN
DEEMED-TO-BE UNIVERSITY

The space of clock rate versus cycles per instruction is shown in Figure 1:

Figure 1: Clock Rate vs. Cycles per Instruction (CPI plotted against clock speed in GHz: CISC designs sit at higher CPI, RISC at lower CPI, and vector processors (VP) at the low-CPI, high-clock end; multi-core, embedded, low-cost and low-power products occupy the lower clock rates, while high-performance designs occupy the higher ones)
Processor families can be mapped onto a coordinated space of clock rate versus cycles per instruction (CPI). The clock rates of many processors have evolved from low to higher speeds toward the right of the design space as implementation technology advances rapidly, and processor makers have been utilising novel hardware techniques to reduce the CPI rate (the cycles taken to execute an instruction).
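The trade-off in this design space follows from the basic performance equation, execution time = (instruction count × CPI) / clock rate. The sketch below compares two hypothetical designs for the same task; all figures are invented for illustration:

```python
def execution_time(instructions: float, cpi: float, clock_hz: float) -> float:
    """Basic performance equation: T = N x CPI / f."""
    return instructions * cpi / clock_hz

# Hypothetical designs for the same task (invented figures):
# a CISC-style design executes fewer, more complex instructions at a
# higher CPI; a RISC-style design executes more, simpler instructions
# at CPI 1 and a higher clock rate.
cisc = execution_time(1.0e9, cpi=3.0, clock_hz=1.0e9)
risc = execution_time(1.5e9, cpi=1.0, clock_hz=2.0e9)

print(cisc, risc)
```

Despite executing 50% more instructions, the lower-CPI, higher-clock design finishes the task sooner in this example, which is why designers push toward the lower-right of the design space.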
Two main categories of processors are:
zz CISC (e.g., x86 architecture)
zz RISC (e.g., Power series, SPARC, MIPS, etc.)

12.3 Complex Instruction Set Computer (CISC) Architecture


The term CISC stands for “Complex Instruction Set Computer”. It is a CPU design approach based on instructions that are capable of performing multi-step operations.

Products intended for multi-core processors, embedded applications, or low cost and/or low power consumption tend to have lower clock rates in both the CISC and RISC categories. Processors intended for high performance must be designed to run at high clock rates. Vector processors are labelled VP; vector processing capabilities may be found in either CISC or RISC main processors.
CISC aims to keep programs small. A CISC processor has an enormous number of instructions, which take a long time to perform, and a single instruction may carry out numerous low-level steps; an individual instruction set can contain more than 300 discrete instructions. Many instructions are completed in two to ten machine cycles, and pipelining of instructions in CISC is not easily implemented. Figure 2 depicts the basic architecture of CISC:

Figure 2: Basic Architecture of CISC (a microprogrammed control unit with control memory drives the instruction and data path; a cache connects the processor to main memory)

CISC machines can perform well because a range of advanced operations is accessible within a single instruction set, which simplifies the work of program compilers; multiple operations are packed into single instructions. They accomplish low-level processes directly in hardware and provide a large number of addressing modes and some extra machine data types. However, CISC is regarded as less efficient than RISC because of its inability to eliminate wasted cycles. A CISC microprocessor chip is also complicated to understand because of the hardware complexity.


The following points describe the characteristics of CISC:
zz Decoding of instructions is complex
zz The size of an instruction can be larger than one word
zz An instruction may take more than a single clock cycle to execute
zz Fewer general-purpose registers, as operations can be performed in memory itself
zz Complex addressing modes
zz It may have several data types

Some of the advantages of CISC are as follows:
zz Smaller program size (fewer instructions)
zz Simpler control unit design

Some of the downsides of CISC are as follows:
zz The amount of clock time taken by distinct instructions will be different, which reduces the performance of the machine.
zz In a typical program, only about 20% of the available instructions are actually used.
zz Every instruction takes extra time because condition codes are set by the CISC instructions after execution.


12.4 Reduced Instruction Set Computer (RISC) Architecture


The term RISC stands for “Reduced Instruction Set Computer”. It is a CPU design approach based on simple instructions that execute quickly.

RISC uses a condensed or simplified set of instructions, where every instruction is expected to complete an extremely modest task. The instruction sets of such machines are modest and straightforward, which aids in the compilation of more sophisticated commands. Each instruction is around the same length, and instructions are strung together to accomplish complex tasks. The majority of instructions are executed in a single machine cycle. Pipelining is an important technique used to speed up RISC machines. Figure 3 depicts the basic architecture of RISC:

E
Figure 3: Basic Architecture of RISC (a hardwired control unit drives the data path; separate instruction and data caches connect the processor to main memory)
A Reduced Instruction Set Computer (RISC) is a microprocessor that executes a small number of simple instructions. These devices use fewer transistors because they are based on simple instructions, making transistor design and production less expensive. The following are some of the characteristics of RISC:
zz There is less need for decoding
zz Hardware has a limited number of data types
zz Uniform general-purpose registers
zz A set of uniform instructions
zz Simple addressing modes

The following points describe features of RISC that simplify compiler design:
zz Small set of instructions with a fixed (32-bit) length
zz 3-5 addressing modes
zz Large number of general-purpose registers (32-195)
zz Use of separate instruction and data caches

The following points describe the characteristics of RISC architecture:
zz Decoding is simple because instructions are simple
zz The size of an instruction fits within one word
zz A single cycle is required to execute an instruction
zz It has general-purpose registers
zz It uses simple addressing modes
zz It has fewer data types
zz Pipelining can be accomplished

Some of the advantages of RISC architecture are as follows:
zz It has a high-level language compiler to produce more effective code.
zz Due to its simplicity, it allows freedom in using the space on microprocessors.
zz It uses registers for passing arguments.
zz It uses only a few parameters, and the RISC processor does not use complex call instructions.
zz In RISC, the operation speed is higher and the execution time is lower.

Some of the drawbacks of RISC architecture are as follows:
zz The performance of RISC processors is mostly determined by the programmer or compiler, as compiler knowledge is critical when converting CISC code to RISC code.
zz Code expansion, or changing CISC code to RISC code, will increase the size of the code. The quality of this code expansion is determined by the compiler as well as by the instruction set of the machine.
zz The RISC processor’s first-level cache is also a disadvantage, because these processors have enormous memory caches on the chip itself and need very rapid memory systems to feed the instructions.

Some of the problems of RISC which arise during execution are as follows:
zz More complicated register decoding system
zz Hardwired control is less flexible than microcode
zz Hardwired control is less flexible than microcode


H

12.5 Superscalar Processors


IG

A superscalar or vector architecture can improve a CISC or RISC scalar CPU. One instruction per cycle
is executed by scalar processors. Only one instruction is issued each cycle, and the pipeline is intended
R

to complete only one instruction per cycle. In a superscalar processor, multiple instructions are issued
per cycle and multiple results are generated per cycle. A vector processor runs vector instructions on
Y

data arrays; each vector instruction consists of a series of repeated operations, making them suitable
for pipelining with one result every cycle follow the following points:
P

zz Designed to take use of parallelism at the instruction level in user pgms.


O

zz Parallel execution of independent instrs is possible.


zz A degree m superscalar processor can send out m instructions every cycle.
C

zz Superscalar processors were created as a replacement for vector processors in order to take use of
higher levels of instruction level parallelism.

12.5.1 Pipelining in Superscalar Processors


A superscalar technology and pipelining is used by a microprocessor to have a design. Pipelining allows
the processor to read a new instruction from memory previous to the end of the existing process.

6
UNIT 12: Advanced Processor Technology JGI JAIN
DEEMED-TO-BE UNIVERSITY

Figure 4 depicts a three-issue (m=3) superscalar pipeline, m instructions execute in parallel:

Figure 4: A Superscalar Processor of Degree m=3 (each instruction passes through Ifetch, Decode, Execute and Writeback stages; three instructions occupy each stage simultaneously over a timeline of 0-9 base cycles)

E
The following points describe the degree of a superscalar processor according to the above figure:
zz A superscalar processor of degree m can only be fully utilised if m instructions can be executed in parallel.
zz This may not be the case in all clock cycles. In that situation, certain pipelines may be stalled in a holding pattern.
zz A simple operation should incur only one cycle of delay in a superscalar processor, as it does in a basic scalar processor.
zz The superscalar processor is more reliant on an optimising compiler to exploit parallelism, due to the demand for a higher degree of instruction-level parallelism in applications.
zz From the early 1990s, several notable instances of superscalar processors are included in the table below.

12.6 Very-Long Instruction Word (VLIW) Architecture


R

Very-Long Instruction Word (VLIW) architecture is a good option for taking advantage of instruction-level parallelism (ILP) in programmes, especially for running multiple basic (primitive) instructions at once.

These processors combine several functional units, fetch a Very-Long Instruction Word containing numerous primitive instructions from the instruction cache, and dispatch the entire VLIW for concurrent execution.

Compilers take advantage of these capabilities by producing code in which many primitive instructions can be executed in parallel. Because they do not perform dynamic scheduling or reordering of operations, the processors have comparatively basic control logic.

VLIW’s major goal is to eliminate the convoluted instruction scheduling and parallel dispatch found in today’s microprocessors. A VLIW processor should therefore be faster and more efficient.


Figure 5 depicts the basic architecture of VLIW:

Figure 5: Basic Architecture of VLIW (multiple functional units share a multiported register file through a read/write crossbar, under a program control unit; instructions are fetched from the instruction cache)

S
The numerous functional units share a common multi-ported register file for fetching operands and storing results, as illustrated in the diagram. The read/write crossbar gives the functional units parallel random access to the register file. The load/store operation of data between RAM and the register file occurs concurrently with the execution of operations in the functional units.
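A VLIW word can be modelled as a fixed tuple of slots, one per functional unit, all nominally executed in the same cycle; the slot layout and operation encodings below are purely illustrative:

```python
# One "very long instruction word" = one operation slot per functional unit.
# A None slot is a no-op. Slot order (illustrative): load/store, integer ALU, FP add.
def execute_vliw(word, regs, memory):
    ls, alu, fp = word
    if ls:    # ("load", dest, addr): memory -> register file
        _, dest, addr = ls
        regs[dest] = memory[addr]
    if alu:   # ("add", dest, a, b): integer add
        _, dest, a, b = alu
        regs[dest] = regs[a] + regs[b]
    if fp:    # ("fadd", dest, a, b): floating-point add
        _, dest, a, b = fp
        regs[dest] = regs[a] + regs[b]
    return regs

regs = {"r1": 2, "r2": 3, "f1": 0.5, "f2": 1.25}
memory = {0: 7}
# All three slots are dispatched for concurrent execution as one word.
word = (("load", "r3", 0), ("add", "r4", "r1", "r2"), ("fadd", "f3", "f1", "f2"))
execute_vliw(word, regs, memory)
print(regs["r3"], regs["r4"], regs["f3"])
```

The compiler, not the hardware, is responsible for placing only independent operations into the same word, which is what keeps the control logic simple.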
Figure 6 depicts a typical VLIW processor with degree m=3:
T
H

Figure 6: A Typical VLIW Processor with Degree m=3 (a register file connected to main memory feeds a load/store unit, an FP add unit, an integer ALU and a branch unit; the instruction word has slots for load/store, FP add, FP multiply, branch and integer ALU operations)


O

The following points describe the advantages of VLIW architecture:
zz It has the ability to boost performance
zz It enhances scalability
zz More execution units can be added, allowing the VLIW instruction word to be filled with more instructions

The following points describe the disadvantages of VLIW architecture:
zz It requires a brand-new compiler to be utilised
zz The programme (via the compiler) must keep track of the scheduling of instructions


zz It has the potential to increase memory usage
zz It can involve a lot of power consumption

12.7 Overview of Vector and Symbolic Processors


Vector processors are built specifically to do vector computations, and vector instructions involve large arrays of operands (an array of data is processed in the same way). Vector processors can have a register-to-register architecture (which uses shorter instructions and vector register files) or a memory-to-memory architecture (which uses longer instructions that include memory addresses).

E
12.7.1 Vector Instructions

V
A vector instruction is a class of instruction that allows parallel processing of data sets. It is executed by a vector unit, similar to a conventional Single Instruction Multiple Data (SIMD) instruction. In a vector instruction, an array of floating-point or integer numbers is processed within a single operation. It also supports recursive implementation of operations on vectors which are not kept in a memory space.
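The effect of one vector instruction, a single operation applied across whole operand arrays, can be sketched as:

```python
def vadd(va, vb):
    """One logical 'vector add': a single instruction's worth of work,
    applied element-wise across the operand arrays."""
    return [a + b for a, b in zip(va, vb)]

# A single vector instruction processes the whole array of operands,
# where a scalar processor would loop one element at a time.
print(vadd([1.0, 2.0, 3.0], [10.0, 20.0, 30.0]))
```

On real vector hardware the element operations stream through a pipelined unit, producing one result per cycle after the pipeline fills.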

E
S
12.7.2 Symbolic Processors
E
Symbolic processing is used in fields such as theorem proving, pattern recognition, expert systems and machine intelligence, because the data and knowledge representations, operations, memory, I/O and communication aspects in these applications differ from those in numerical computing. Symbolic processors are also known as symbolic manipulators, Lisp processors or Prolog processors.
T

The following points describe the characteristic attributes of symbolic processing:
zz Knowledge representation: lists, relational databases, semantic nets, frames, production systems
zz Common operations: search, sort, pattern matching, unification
zz Memory requirement: large memory with an intensive access pattern
zz Communication pattern: message traffic varies in size, destination and format
zz Properties of algorithms: parallel and distributed, irregular in pattern
zz Properties of algorithm Parallel and distributed, irregular in pattern


A Lisp programme, for example, may be thought of as a collection of functions with data passed from
Y

one to the next. Parallelism is based on the concurrent execution of these functions. Lisp’s applicative
and recursive nature need an environment that enables stack calculations and function calls quickly.
P

The use of linked lists as the primary data structure allows for the implementation of an automated
garbage collection system.
O
C

Conclusion 12.8 Conclusion

zz Processor design is a branch of computer engineering and electronics engineering concerned with
the design of a processor.
zz The term CISC stands for ‘’Complex Instruction Set Computer’’. It is a CPU design plan based on
particular commands, which are able in performing multi-step operations.
zz The term RISC stands for ‘’Reduced Instruction Set Computer’’. It is a CPU design plan based on
simple orders and performs the instructions faster.

9
JGI JAIN
DEEMED-TO-BE UNIVERSITY
Computer Organization and Architecture

● A superscalar or vector architecture can improve a CISC or RISC scalar CPU.
● Pipelining allows the processor to fetch a new instruction from memory before the current instruction has finished executing.
● Very-Long Instruction Word (VLIW) is a good option for exploiting instruction-level parallelism (ILP) in programmes.
● Vector processors are built specifically to perform vector computations, and vector instructions have a large number of operands.

12.9 Glossary

● Processor design: It is a branch of computer engineering and electronics engineering concerned with the design of a processor.
● Complex Instruction Set Computer (CISC): It is a CPU design approach based on complex instructions, each of which can perform multi-step operations.
● Reduced Instruction Set Computer (RISC): It is a CPU design approach based on simple instructions that execute faster.
● Superscalar: It can improve a CISC or RISC scalar CPU.
● Pipelining: It allows the processor to fetch a new instruction from memory before the current instruction has finished executing.
● Very-Long Instruction Word (VLIW): It is a good option for exploiting instruction-level parallelism (ILP) in programmes.
● Vector processors: They are built specifically to perform vector computations, and vector instructions have a large number of operands.



12.10 Self Assessment Questions



A. Multiple Choice Questions



1. The design of which among the following is a branch of computer engineering and electronics engineering (fabrication)?
   a. Processor            b. Operation
   c. Architecture         d. CPU
2. The clock rates of many processors have evolved from __________ speeds toward the right of the design space.
   a. High to low          b. Low to high
   c. Low to middle        d. Middle to high
3. The term CISC stands for:
   a. Complicated Instruction Set Computer
   b. Complex Instruction Simple Computer
   c. Complex Instruction Set Computer
   d. Complete Instruction Set Computer


4. Which of these components in a CISC microprocessor is complicated to understand because of the hardware complexity?
   a. Memory               b. Register
   c. Cache                d. Chip
5. The term RISC stands for:
   a. Reduced Instruction Set Computer      b. Reduced Impression Set Computer
   c. Reduced Instruction Set Command       d. Remote Instruction Set Computer
6. Which among the following is a good option for taking advantage of instruction-level parallelism (ILP) in programmes?
   a. RISC                 b. CISC
   c. VLIW                 d. CPI
7. Which of these is correct in the context of CISC?
   a. It has general-purpose registers
   b. It has fewer data types
   c. Decoding of instructions is simple because of a simple instruction format
   d. Instruction may take more than a single clock cycle to get executed.
8. CISC is considered less efficient than _________.
   a. VLIW                 b. ILP
   c. RISC                 d. CPI
9. Which among the following can have a register-to-register architecture or a memory-to-memory architecture?
   a. Vector processors    b. Vector instructions
   c. Symbolic processors  d. Superscalar processors
10. ___________ allows the processor to fetch a new instruction from memory before the end of the existing process.
   a. Operation            b. Processor
   c. CPU                  d. Pipelining

B. Essay Type Questions

1. Processor families can be mapped onto a coordinated space of clock rate versus cycles per instruction (CPI). What is the concept of processor design?
2. CISC contains a large number of complex instructions, each of which takes a long time to execute. Briefly explain the relevance of CISC architecture.
3. A RISC is a microprocessor that supports a small set of simple instructions. Describe the concept of RISC architecture and also discuss its advantages and disadvantages.
4. In a superscalar processor, multiple instructions are issued per cycle and multiple results are generated per cycle. Discuss.
5. A VLIW processor should be faster and more efficient. Describe the concept of VLIW in brief.


12.11 ANSWERS AND HINTS FOR SELF ASSESSMENT QUESTIONS

A. Answers to Multiple Choice Questions

Q. No.    Answer
1.        a. Processor
2.        b. Low to high
3.        c. Complex Instruction Set Computer
4.        d. Chip
5.        a. Reduced Instruction Set Computer
6.        c. VLIW
7.        d. Instruction may take more than a single clock cycle to get executed.
8.        c. RISC
9.        a. Vector processors
10.       d. Pipelining

B. Hints for Essay Type Questions
1. Processor design is a branch of computer engineering and electronics engineering (fabrication) concerned with the design of a processor, which is a critical component of computer hardware. Refer to Section Design Space of Processors
2. The term CISC stands for "Complex Instruction Set Computer". It is a CPU design approach based on complex instructions, each of which can perform multi-step operations. Refer to Section Complex Instruction Set Computer (CISC) Architecture
3. The term RISC stands for "Reduced Instruction Set Computer". It is a CPU design approach based on simple instructions that execute faster. Refer to Section Reduced Instruction Set Computer (RISC) Architecture
4. A superscalar or vector architecture can improve a CISC or RISC scalar CPU. One instruction per cycle is executed by scalar processors. Refer to Section Superscalar Processors
5. Very-Long Instruction Word (VLIW) architectures are a good option for exploiting instruction-level parallelism (ILP) in programmes, especially when running multiple basic (primitive) instructions at once. Refer to Section Very-Long Instruction Word (VLIW) Architecture

12.12 Post-Unit Reading Material

● https://www.geeksforgeeks.org/computer-organization-and-architecture-pipelining-set-1-execution-stages-and-throughput/
● https://alldifferences.net/difference-between-risc-and-cisc/

12.13 Topics for Discussion Forums

● Discuss the notion of advanced processor technology and pipelining with your friends and classmates. Also, discuss the RISC, CISC, and VLIW architectures and where they are used.

UNIT 13

Memory Hierarchy Technology
Names of Sub-Units

Hierarchical Memory Technology; Virtual Memory Technology: Virtual Memory, TLB, Paging and Segmentation; Cache Memory Organization: Cache Addressing Modes, Direct Mapping and Associative Caches, Set-Associative, Cache Performance Issues.
Overview

The unit begins by discussing the concept of hierarchical memory technology. Next, the unit discusses the concept of virtual memory technology and virtual memory. This unit also discusses the concepts of cache addressing modes, direct mapping and associative caches. Towards the end, the unit discusses the concept of cache performance issues.
Learning Objectives

In this unit, you will learn to:
● Explain the concept of hierarchical memory technology
● Describe the role of virtual memory technology
● Explain the significance of paging and segmentation
● Define the concept of cache addressing modes
● Describe the concept of cache performance issues

Learning Outcomes

At the end of this unit, you would:
● Examine the concept of hierarchical memory technology
● Evaluate the role of virtual memory technology
● Assess the knowledge about paging and segmentation
● Analyse the concept of cache addressing modes
● Evaluate the concept of cache performance issues

Pre-Unit Preparatory Material

● http://eceweb.ucsd.edu/~gert/ece30/CN5.pdf

13.1 Introduction

A computer system's architecture employs a CPU along with a huge number of memory devices. However, the primary issue is that these components are costly. As a result, a memory hierarchy may be used to organise the system's memory. It has various memory tiers with varying performance rates, but all of them serve one precise objective: reducing access time. The memory hierarchy was created in response to the behaviour of programs.

13.2 Hierarchical Memory Technology



Memory hierarchy is a feature in computer system design that helps to arrange memory such that access time is reduced. The memory hierarchy was created using a programming technique called locality of references. Figure 1 depicts the multiple layers of the memory hierarchy:
[Figure 1: Multiple Layers of Memory Hierarchy. Level 0: CPU registers; Level 1: cache memory (SRAMs); Level 2: main memory (DRAMs); Level 3: magnetic disk (disk storage); Level 4: optical disk and magnetic tape. Moving from top to bottom, capacity and access time increase while cost per bit decreases.]


This memory hierarchy design is divided into two main categories, which are as follows:
● External Memory or Secondary Memory: Refers to the storage devices which are accessible by the processor via an I/O module. Examples of secondary memory are magnetic disk, optical disk, magnetic tape, etc.
● Internal Memory or Primary Memory: Refers to the memory that the CPU can access directly. Examples of internal memory are main memory, cache memory, CPU registers, etc.

The characteristics of the memory hierarchy are as follows:
● Capacity: It refers to the total amount of data that the memory can hold. The capacity grows as we progress along the hierarchy from top to bottom.
● Access time: It is the time between a read/write request and the data becoming available. The access time grows as we progress along the hierarchy from top to bottom.
● Performance: If the memory hierarchy is not considered when designing a computer system, the speed gap between the CPU registers and main memory widens and the system's performance suffers. The memory hierarchy design closes this gap and improves the overall system performance.
● Cost per bit: The cost per bit grows as we advance up the hierarchy; hence, internal memory is more expensive than external memory.

13.3 Virtual Memory Technology

Virtual memory is an important idea in computer architecture that allows you to execute huge, complex applications on a machine with a limited amount of RAM. Virtual memory allows a computer to manage the competing needs of several applications within a limited quantity of physical memory. A PC with insufficient memory may execute the same applications as one with plenty of RAM, but at a slower pace.
13.3.1 Physical vs Virtual Addresses

A computer's RAM is accessed via an address system, which is simply a set of integers that identify each byte. Because the amount of memory on a computer varies, knowing which applications will run on it can be difficult. Virtual memory overcomes this problem by treating each computer as if it had a large amount of RAM and treating each application as if it were the only one running on the PC. For each programme, the operating system, such as Microsoft Windows or Apple's OS X, generates a set of virtual addresses. The operating system transforms virtual addresses to physical addresses and dynamically allocates RAM to programs as it becomes free.


13.3.2 Paging

Virtual memory divides programmes into pages of a predetermined size. The operating system loads all of a program's pages into RAM if the machine has enough physical memory. If not, the OS loads as many as it can into the available space and executes the instructions on those pages. When the computer finishes with those pages, it loads the rest of the programme into RAM, perhaps overwriting earlier pages. Because the operating system handles these aspects automatically, the software developer may focus on programme functionality rather than memory concerns.
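The virtual-to-physical translation behind paging can be sketched as a small simulation. The page size, the page-table contents and the function name below are illustrative assumptions for this sketch, not part of any real operating system's API:

```python
PAGE_SIZE = 4096  # assumed page size in bytes

# Hypothetical page table: virtual page number -> physical frame number.
# Pages missing from the table are not resident in RAM.
page_table = {0: 5, 1: 9, 2: 3}

def translate(virtual_address):
    """Split a virtual address into (page, offset) and map it to a physical address."""
    page, offset = divmod(virtual_address, PAGE_SIZE)
    if page not in page_table:
        raise LookupError("page fault: page %d is not in RAM" % page)
    return page_table[page] * PAGE_SIZE + offset

print(translate(4100))  # page 1, offset 4 -> frame 9 -> 9*4096 + 4 = 36868
```

A reference to a page that is not in the table corresponds to a page fault, at which point a real OS would bring the page in from the paging file.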

13.3.3 Multiprogramming

Virtual memory combined with paging allows a computer to execute many programmes at the same time, practically independent of the amount of RAM available. This functionality, known as multiprogramming, is a significant component of current PC operating systems since it allows you to run many utility programmes at the same time as your applications, such as Web browsers, word processors, email, and media players.

13.3.4 Paging File

With virtual memory, the computer saves programme pages that haven't been utilised in a while to a paging file on the hard drive. The file preserves the data contained in the pages, and the operating system reloads it into RAM if the application needs it again. When many applications compete for RAM, shifting pages to the file can reduce a computer's processing performance because it spends more time managing memory and less time doing productive work. A computer should have enough RAM to accommodate the demands of several programmes, reducing the amount of time it spends maintaining its pages.

13.3.5 Memory Protection

A computer without virtual memory may nevertheless execute many programmes at the same time; however, one programme may alter the data in another programme, either unintentionally or purposefully, if its addresses refer to the wrong programme. Because a programme never "sees" its physical addresses, virtual memory precludes this scenario. The virtual memory manager safeguards data in one application from being tampered with by another.
13.3.6 Translation Lookaside Buffer (TLB) Mechanism

The system includes a special cache memory called a Translation Lookaside Buffer (TLB), used with the address translation hardware. The full page table is kept in primary memory. When a page number is translated to a page frame, the mapping is read from primary memory into the TLB. The TLB entry then contains the page number, the physical address of the page frame, and various protection bits. On subsequent references to the page, the map entry is read from the TLB rather than from primary memory.
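A minimal sketch of this lookup path, assuming one dictionary stands in for the hardware TLB and another for the full page table in primary memory (all names and values are illustrative):

```python
page_table = {0: 7, 1: 2, 2: 11}  # full page table kept in primary memory (slow)
tlb = {}                          # small cache of recent translations (fast)

def lookup_frame(page):
    """Return the page frame for a virtual page, filling the TLB on a miss."""
    if page in tlb:               # TLB hit: translation found without a memory access
        return tlb[page]
    frame = page_table[page]      # TLB miss: read the map entry from primary memory
    tlb[page] = frame             # keep it for subsequent references to the page
    return frame

lookup_frame(1)        # first reference: TLB miss, entry loaded from the page table
print(lookup_frame(1)) # subsequent reference: served from the TLB -> 2
```

A real TLB is a small fixed-size associative memory and must also evict old entries; that detail is omitted here.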

13.3.7 Segmentation

Segmentation allows the programmer to interpret memory as consisting of multiple address spaces or segments. Segments of the memory may be of unequal, dynamic size, and each has a number and a length. Segmentation simplifies the handling of growing data structures: a data structure can be allocated its own segment, and the OS will grow or shrink the segment as and when required.

The segmentation system uses a logical limit register for each segment, so references intended to be within a segment cannot accidentally reference information stored in a different segment. Segmented virtual memory systems also employ protection mechanisms in the address translation to prevent unauthorised access to segments. Segment-based virtual memory systems provide a means for processes to share some segments while keeping access to others private.

13.3.8 Locality of Reference

Locality of reference refers to the phenomenon that the same value or related storage locations are accessed very frequently. Processes and programmes do not execute randomly: they show a degree of locality in terms of time and space, and this information can be exploited in different ways to improve the performance of the system. There are mainly two types of reference locality, namely, temporal locality and spatial locality.

Temporal locality signifies the reuse of particular data or resources within a very short span of time. On the other hand, spatial locality signifies the usage of data elements in relatively close storage locations. Spatial locality can be referred to as sequential locality when the data elements are sequenced and retrieved linearly, similar to traversing the elements of a one-dimensional array.
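Both kinds of locality show up in even the simplest loop over a one-dimensional array. The loop below is only an illustration of the idea, not a measurement of any hardware behaviour:

```python
data = [3, 1, 4, 1, 5, 9, 2, 6]

total = 0
for i in range(len(data)):  # 'total' and 'i' are touched on every iteration: temporal locality
    total += data[i]        # data[0], data[1], ... sit in adjacent locations: spatial locality

print(total)  # 31
```

Caches exploit exactly this behaviour: after data[0] is fetched, its neighbours are usually already in the cache line, and the loop variables stay cached for the whole loop.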

13.4 Cache Memory Organization

Cache memory is a type of memory that operates at a very fast speed. It is used to boost performance and synchronise with a high-speed CPU. Cache memory is more expensive than main memory and disc memory, but it is less expensive than CPU registers. Cache memory works as a very fast buffer between RAM and the CPU. It stores frequently requested data and instructions so that the CPU may access them quickly when needed.

The average time taken to retrieve data from the main memory is reduced when cache memory is used. The cache memory is a smaller, faster memory that holds replicas of information from recently requested addresses of the main memory.

Figure 2 shows the cache memory organisation:

[Figure 2: Cache Memory Organisation. The cache memory sits between the CPU and the primary memory, which in turn connects to the secondary memory.]

13.4.1 Levels of Cache Memory


The different levels of cache memory are as follows:
● Level 1 or register: It is a type of memory which holds and receives information and data that is then loaded into the CPU instantly. The L1 cache is usually the smallest and is incorporated into the CPU chip. An individual L1 cache is available for every core in multi-core CPUs.
● Level 2 or cache memory: It is usually built into the CPU, but it may also be a standalone chip placed between the CPU and the RAM. L2 cache memory is a sort of secondary cache memory and has a larger capacity than the L1 cache. The CPU first looks for instructions in the L1 cache and moves on to the L2 cache only if the data or instructions are not found in the L1 cache. The cache is connected to the processor through a high-speed system bus.
● Level 3 or main memory: In comparison to the L1 and L2 caches, the L3 cache is slower, but it is bigger. Each core in a multi-core processor may have its own L1 and L2 caches, but all cores share a common L3 cache, which is about twice as fast as the RAM. Main memory is the memory on which the computer currently operates; it is limited in size, and its data is lost when the power supply is turned off.
● Level 4 or secondary memory: L4 cache is a type of external memory that is not fast in comparison to main memory and is now uncommon. Data is retained permanently in secondary memory.


13.4.2 Cache Addressing Modes


For mapping data and information received from the main memory to the cache memory, there are
three essential techniques, which are as follows:
● Direct mapping
● Associative mapping
● Set-associative mapping

Direct Mapping

Direct mapping is the most basic method: it maps each block of main memory into only one possible cache line, assigning each memory block to a specific line in the cache. If a memory block has previously occupied a line and a new block has to be loaded, the old block is overwritten. An address space is split into two parts: an index field and a tag field. The tag field is saved in the cache, while the rest is stored in main memory. The performance of direct mapping is directly related to the hit ratio.

i = j modulo m
where,
i = cache line number
j = main memory block number
m = number of lines in the cache

Each main memory address may be thought of as having three fields for the purposes of cache access. Within a block of main memory, the least significant w bits indicate a unique word or byte; in most modern computers, addressing is at the byte level. The remaining s bits designate one of the main memory's 2^s blocks. These s bits are interpreted by the cache logic as a tag of s-r bits (the most significant portion) and an r-bit line field. This last field indicates one of the cache's m = 2^r lines. Figure 3 shows direct mapping:

[Figure 3: Direct Mapping. A main memory address is split into a tag and a word offset; the cache interprets it as a tag, a line offset and a word offset. The cache control logic maps each main memory block to one fixed line of the cache before passing the data to the CPU.]
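The i = j modulo m placement rule can be sketched as a toy direct-mapped cache. The cache size and the access sequence below are illustrative assumptions chosen to show the mapping, including the conflict misses it can cause:

```python
M = 4  # number of cache lines (assumed small for illustration)

# Each cache line records the main-memory block number currently stored there.
cache = [None] * M

def access_block(j):
    """Access main-memory block j; return 'hit' or 'miss' under direct mapping."""
    i = j % M                  # i = j modulo m: the only line block j may occupy
    if cache[i] == j:
        return "hit"
    cache[i] = j               # on a miss, the old block in line i is overwritten
    return "miss"

# Blocks 0 and 4 both map to line 0, so they keep evicting each other.
print([access_block(j) for j in (0, 4, 0)])  # ['miss', 'miss', 'miss']
```

The last access misses even though block 0 was referenced recently, which is exactly the thrashing problem that set-associative mapping addresses below.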


Associative Mapping

In this type of mapping, the information and locations of the memory block are stored with the help of associative memory. Any block can be placed in any cache line. The word id bits are used to determine which word in the block is required, and the tag becomes all of the remaining bits. This allows any block to be placed anywhere in the cache memory. It is said to be the quickest and most flexible mapping method. Figure 4 shows associative mapping:

[Figure 4: Associative Mapping. Any main memory block frame (Blk(0), Blk(1), ..., Blk(N-1)) may be loaded into any cache line; the cache control logic compares tags associatively before passing the data to the CPU.]



Set-Associative

This type of mapping is an improved version of direct mapping that eliminates the drawbacks of direct mapping. The concern of potential thrashing in the direct mapping approach is addressed by set-associative mapping: rather than having exactly one line in the cache to which a block can map, a few lines are combined together to form a set, and a memory block can then correspond to any one of the lines of a set. Set-associative mapping thus allows each index address in the cache to hold two or more words of main memory at the same time. The benefits of both the direct and associative cache mapping techniques are combined in set-associative cache mapping.

In this case, the cache consists of a number of sets, each of which consists of a number of lines. The relationships are:
m = v * k
i = j mod v
where,
i = cache set number
j = main memory block number
v = number of sets
m = number of lines in the cache
k = number of lines in each set

Figure 5 shows set-associative mapping:

[Figure 5: Set-associative Mapping. A main memory address is split into a tag and a word offset; the cache interprets it as a tag, a line offset and a word offset. Main memory block frames Blk(0) ... Blk(N-1) map into sets Set(0) ... Set(N-1), and the cache control logic searches the lines of one set associatively before passing the data to the CPU.]
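The relationships m = v * k and i = j mod v can be illustrated with a small 2-way set-associative lookup. The sizes and the FIFO replacement policy below are illustrative assumptions:

```python
V = 4        # number of sets
K = 2        # lines per set (2-way set-associative)
M = V * K    # m = v * k: total cache lines

# Each set holds up to K block numbers (tags).
sets = [[] for _ in range(V)]

def access_block(j):
    """Access main-memory block j; return 'hit' or 'miss'."""
    i = j % V                      # i = j mod v: the set that block j belongs to
    s = sets[i]
    if j in s:
        return "hit"
    if len(s) == K:                # set full: evict the oldest block (FIFO policy)
        s.pop(0)
    s.append(j)
    return "miss"

# Blocks 0 and 4 both map to set 0, but with K = 2 lines they can coexist.
print([access_block(j) for j in (0, 4, 0, 4)])  # ['miss', 'miss', 'hit', 'hit']
```

Compare this with the direct-mapped example earlier: the same alternating access pattern that thrashed a direct-mapped cache now hits once each block has been loaded.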



13.4.3 Types of Cache

There are two types of cache, which are as follows:
● Primary cache: The CPU chip always has a primary cache. This cache is small in size, and its access latency is comparable to that of CPU registers.
● Secondary cache: This cache is located between the primary cache and the rest of the system's memory. L2 cache is another name for this cache. This cache may be located on the CPU chip as well.
13.4.4 Cache Performance Issues

When the CPU wants to read or write data in main memory, it looks in the cache for an equivalent entry. A cache hit occurs when the CPU discovers that the memory address is in the cache, and the data is read from the cache. A cache miss occurs when the CPU cannot locate the memory address in the cache. When a cache miss occurs, the cache creates a new entry and copies the data from main memory, after which the operation is executed using the cache's contents.


Cache memory performance is measured by a quantity known as the hit ratio. The formula to calculate the hit ratio is as follows:

Hit ratio = Hits / (Hits + Misses) = No. of hits / Total accesses

When there are a lot of cache misses, performance issues arise. The ratio of cache misses to cache hits is the best indicator for measuring the performance of a cache.
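The hit ratio, together with the average memory access time that is commonly derived from it, can be computed directly. The counts and timing numbers below are illustrative assumptions, not measurements of any particular machine:

```python
hits, misses = 750, 250                # assumed counts from some access trace

hit_ratio = hits / (hits + misses)     # Hit ratio = Hits / (Hits + Misses)
print(hit_ratio)                       # 0.75

# Average access time = hit time + miss rate * miss penalty (times in ns, assumed)
hit_time, miss_penalty = 1.0, 100.0
average_access_time = hit_time + (1 - hit_ratio) * miss_penalty
print(average_access_time)             # 26.0: far closer to cache speed than memory speed
```

Raising the hit ratio drives the average access time toward the cache's hit time, which is why the miss-to-hit ratio is such a sensitive performance indicator.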

13.5 Conclusion

● Memory hierarchy is a feature in computer system design that helps to arrange memory such that access time is reduced.
● Virtual memory is an important idea in computer architecture that allows you to execute huge, complex applications on a machine with a limited amount of RAM.
● A computer's RAM is accessed via an address system, which is simply a set of integers that identify each byte.
● Virtual memory divides programmes into pages of a predetermined size.
● Virtual memory combined with paging allows a computer to execute many programmes at the same time, practically independent of the amount of RAM available.
● Cache memory is a type of memory that operates at a very fast speed. It is used to boost performance and synchronise with a high-speed CPU. Cache memory is more expensive than main memory and disc memory, but it is less expensive than CPU registers.
● Level 1 cache is a type of memory which holds and receives information and data that is then loaded into the CPU instantly.
● L4 cache is a type of external memory that is not fast in comparison to main memory and is now uncommon.
● Direct mapping is the most basic method, which maps each block of main memory into only one possible cache line.
● In associative mapping, the information and locations of the memory block are stored with the help of associative memory.

13.6 Glossary

● Memory hierarchy: A feature in computer system design that helps to arrange memory such that access time is reduced
● Locality of references: A programming technique that is used for creating the memory hierarchy
● Access time: The time between a read/write request and the data becoming available
● Virtual memory: It allows a computer to manage the competing needs of several applications within a limited quantity of physical memory
● Cache memory: A type of memory that operates at a very fast speed; it is used to boost performance and synchronise with a high-speed CPU


13.7 Self-Assessment Questions

A. Multiple Choice Questions


1. __________ is used to organise the system's memory.
   a. Memory hierarchy       b. Cache memory
   c. Virtual memory         d. Primary memory
2. Which of the following programming techniques is used for creating a memory hierarchy?
   a. Layers                 b. Locality of references
   c. Virtual addresses      d. Structural
3. Which of the following refers to the total amount of data that the memory can hold?
   a. Access time            b. Performance
   c. Throughput             d. Capacity
4. ____________ allows a computer to manage the competing needs of several applications within a limited quantity of physical memory.
   a. Virtual memory         b. Virtual address
   c. Physical address       d. Cache memory
5. ___________ is a type of memory that operates at a very fast speed. It is used to boost performance and synchronise with a high-speed CPU.
   a. Virtual memory         b. Virtual address
   c. Physical address       d. Cache memory
6. Which of the following is a type of memory which holds and receives information and data that is then loaded into the CPU instantly?
   a. Level 2 cache          b. Level 4 cache
   c. Level 1 cache          d. Level 3 cache
7. How many cache addressing modes are there?
   a. Two                    b. Three
   c. Four                   d. Five
8. Which of the following cache addressing modes maps each block of main memory into only one possible cache line?
   a. Associative mapping    b. Set-associative mapping
   c. Direct mapping         d. Set-direct mapping
9. ____________ is a type of external memory that is not fast in comparison to main memory and is now uncommon.
   a. Level 2 cache          b. Level 4 cache
   c. Level 1 cache          d. Level 3 cache


10. ____________ is a feature in computer system design that helps to arrange memory such that access time is reduced.
   a. Memory hierarchy       b. Cache memory
   c. Virtual memory         d. Primary memory

B. Essay Type Questions

1. What do you understand by the term hierarchical memory?
2. List down the characteristics of the memory hierarchy.
3. Explain the concept of virtual memory. Also, discuss physical and virtual addresses.
4. What do you understand by cache memory? Also, explain the different levels of cache memory.
5. Define the direct mapping cache addressing mode.
13.8 Answers and Hints for Self-Assessment Questions

A. Answers to Multiple Choice Questions

Q. No.    Answer
1.        a. Memory hierarchy
2.        b. Locality of references
3.        d. Capacity
4.        a. Virtual memory
5.        d. Cache memory
6.        c. Level 1 cache
7.        b. Three
8.        c. Direct mapping
9.        b. Level 4 cache
10.       a. Memory hierarchy


B. Hints for Essay Type Questions

1. Memory hierarchy is a feature in computer system design that helps to arrange memory such that access time is reduced. The memory hierarchy was created using a programming technique called locality of references. Refer to Section Memory Hierarchy Technology
2. The characteristics of the memory hierarchy are as follows:
   ● Capacity: It refers to the total amount of data that the memory can hold. The capacity grows as we progress along the hierarchy from top to bottom.
   Refer to Section Memory Hierarchy Technology
3. Virtual memory is an important idea in computer architecture that allows you to execute huge, complex applications on a machine with a limited amount of RAM. Virtual memory allows a computer to manage the competing needs of several applications within a limited quantity of physical memory. Refer to Section Virtual Memory Technology


4. Cache Memory is a type of memory that operates at a very fast speed. It’s used to boost performance
and synchronise with a high-speed CPU. Cache memory is more expensive than main memory and
disc memory, but it is less expensive than CPU registers. Refer to Section Cache Memory Organisation
5. Direct mapping is the most basic method, which maps each block of main memory into only one
potential cache line. Assign each memory block to a specified line in the cache via direct mapping.
Refer to Section Cache Memory Organisation

13.9 Post-Unit Reading Material

● https://www.msuniv.ac.in/Download/Pdf/19055a11803e457
● https://www.gatevidyalay.com/cache-mapping-cache-mapping-techniques/

13.10 Topics for Discussion Forums

● Discuss with your classmates the concept of cache memory. Also, discuss the different cache addressing modes.


UNIT 14

SIMD Architecture
Names of Sub-Units

Parallel Processing, Classification of Parallel Processing, Fine-Grained SIMD Architecture, Coarse-Grained SIMD Architecture.
Overview

This unit begins by discussing the concept of SIMD architecture and parallel processing. Next, the unit discusses the classification of parallel processing. Further, the unit explains the fine-grained SIMD architecture. Towards the end, the unit discusses the coarse-grained SIMD architecture.
Learning Objectives

In this unit, you will learn to:
● Discuss the concept of SIMD architecture
● Explain the concept of parallel processing
● Describe the classification of parallel processing
● Explain the significance of fine-grained SIMD architecture
● Discuss the coarse-grained SIMD architecture

Learning Outcomes

At the end of this unit, you would:
● Evaluate the concept of SIMD architecture
● Assess the concept of parallel processing
● Evaluate the importance of the classification of parallel processing
● Determine the significance of fine-grained SIMD architecture
● Explore the coarse-grained SIMD architecture

Pre-Unit Preparatory Material

● http://cs.ucf.edu/~ahmadian/pubs/SIMD.pdf

14.1 Introduction

SIMD stands for single-instruction multiple-data streams. As indicated in Figure 1, the SIMD parallel computing paradigm consists of two parts: a von Neumann-style front-end computer and a processor array.

The processor array is a group of synchronised processing units that may conduct the same operation on many data streams at the same time. While being processed in parallel, the distributed data is stored in a small piece of local memory on each processor in the array.

The processor array is coupled to the front end's memory bus, allowing the front end to access the local processor memories at random as if they were another memory. Figure 1 depicts the SIMD architecture model:
Figure 1: SIMD Architecture Model (a von Neumann front-end computer coupled to an array of virtual processors)


On the front end, a typical serial programming language can be used to design and run a programme. The front end runs the application programme serially, but issues commands to the processor array so that SIMD tasks can be run in parallel.

The similarity between serial and data-parallel programming is one of the strong points of data parallelism. The processors' lock-step operation eliminates the need for explicit synchronisation: at any moment, the processors either do nothing or all perform the same operation.
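The lock-step behaviour described above can be modelled in plain Python (a conceptual sketch only: real SIMD hardware applies one instruction to all elements simultaneously, whereas this loop merely imitates the idea; the function names are illustrative, not from any real API):

```python
# Model of SIMD execution: a single "instruction" (add 10) is applied to every
# element of the data stream, as if each virtual processor held one element
# in its own local memory and all of them stepped together.
def simd_apply(instruction, data_stream):
    # In lock step, every processor performs the same operation in the same cycle.
    return [instruction(x) for x in data_stream]

add_ten = lambda x: x + 10
result = simd_apply(add_ten, [1, 2, 3, 4])
print(result)  # [11, 12, 13, 14]
```

The same call with a different instruction, for example doubling, would transform every element in one conceptual step.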


SIMD architecture exploits parallelism through simultaneous operations across huge amounts of data. This paradigm works well when dealing with problems that require processing a significant amount of data at once.

In SIMD machines, there are two major configurations. In the first scheme, each CPU has its own local memory, and the interconnection network allows processors to communicate with one another. If the interconnection network does not allow for a direct link between two groups of processors, the information can be exchanged through an intermediary processor.

In the second SIMD configuration, processors and memory modules communicate with each other via the interconnection network. Two processors can communicate with one another via intermediate memory modules or, in some cases, intermediary processors. The BSP (Burroughs' Scientific Processor) used the second SIMD scheme.

14.2 Parallel Processing

Parallel processing is a set of techniques that allows a computer system to perform multiple data-processing tasks at the same time in order to boost the system's computational speed.

A parallel processing system may process several pieces of data at the same time, resulting in a quicker execution time. For example, the next instruction can be read from memory while an instruction is being processed in the ALU component of the CPU.
The primary goal of parallel processing is to improve the computer's processing capability and throughput, that is, the amount of processing that can be done in a given amount of time.

A parallel processing system can be achieved by having a multiplicity of functional units that perform identical or different operations simultaneously. The data can be distributed among the various functional units. One method divides the execution unit into eight parallel functional units; the operation done in each functional unit is shown in Figure 2:
Figure 2: Processor with Multiple Functional Units (adder-subtractor, integer multiply, logic unit, shift unit, incrementer, floating-point add-subtract, floating-point multiply and floating-point divide, all fed from the processor registers and connected to memory)


The integer multiplier and adder are used to execute arithmetic operations on integer numbers. The floating-point operations are divided into three circuits that work in tandem. The logic, shift and increment operations can all be run at the same time on distinct data. Because all units are independent of one another, one number can be shifted while another is incremented.
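The simultaneous operation of independent functional units can be sketched with Python threads (an illustration of the concept rather than of real hardware; the unit names follow Figure 2 and the operand values are made up):

```python
from concurrent.futures import ThreadPoolExecutor

# Each "functional unit" works on its own data, so the operations are
# independent of one another and can run at the same time.
units = {
    "adder-subtractor": lambda: 7 - 3,
    "integer multiply": lambda: 6 * 4,
    "shift unit":       lambda: 1 << 3,
    "incrementer":      lambda: 41 + 1,
}

with ThreadPoolExecutor(max_workers=len(units)) as pool:
    futures = {name: pool.submit(op) for name, op in units.items()}
    results = {name: f.result() for name, f in futures.items()}

print(results)
```

Because no unit depends on another's result, the order in which the threads finish does not affect the final results.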

14.3 Classification of Parallel Processing


Multiprocessing can be defined using Flynn's classification, which is based on the multiplicity of instruction streams and data streams in a computer system. An instruction stream is a sequence of instructions executed by the computer. A data stream is a sequence of data, which includes input data or temporary results. When designing a programme or concurrent system, several system and memory architecture styles must be considered. This is critical because a system and memory style may be ideal for one task but error-prone for another.

In 1972, Michael Flynn proposed a classification system for distinct types of computer system architecture. The following are the four different styles defined by this taxonomy:

zz SISD (Single Instruction Single Data)
zz SIMD (Single Instruction Multiple Data)
zz MISD (Multiple Instruction Single Data)
zz MIMD (Multiple Instruction Multiple Data)

14.3.1 SISD Architecture

SISD is the abbreviation for "Single Instruction and Single Data Stream." It depicts the structure of a single computer, which includes a control unit, a processor unit and a memory unit. The system may or may not have internal parallel processing capability, and instructions are performed sequentially. Like classic von Neumann computers, most conventional computers have SISD architecture. Multiple functional units or pipeline processing can be used to achieve parallel processing in this instance. The SISD architecture model is shown in Figure 3:
Figure 3: SISD Architecture Model (a control unit sends an instruction stream to a processing unit, which exchanges a data stream with a memory unit; for comparison, a SIMD organisation has one control unit driving processing units 1 to n, each with its own memory unit)


The advantages of SISD architecture are as follows:

zz It requires less power.
zz A sophisticated communication protocol between several cores is not a concern.

The disadvantages of SISD architecture are as follows:

zz SISD architecture, like single-core CPUs, has a speed limit.
zz It is unsuitable for larger projects.

14.3.2 SIMD Architecture

The acronym SIMD stands for "Single Instruction, Multiple Data Stream." It symbolises an organization with a large number of processing units overseen by a central control unit. The control unit sends the same instruction to all processors, but they work on separate data. The SIMD architecture model is shown in Figure 4:

Figure 4: SIMD Architecture Model (a control unit broadcasts instructions over a control bus to processing elements PE 1 to PE n, each connected to its own memory over a data bus)


The advantages of SIMD architecture are as follows:

zz A single instruction can perform the same operation on numerous components.
zz By increasing the number of processing cores, the system's throughput can be boosted.
zz The processing speed is faster than that of the SISD design.

The disadvantages of SIMD architecture are as follows:

zz There is more sophisticated communication between processor cores.
zz The cost is higher than with the SISD design.

14.3.3 MISD Architecture


MISD is an acronym that stands for “Multiple Instruction and Single Data Stream.” Because no real
system has been built using the MISD structure, it is primarily of theoretical importance. Multiple
processing units work on a single data stream in MISD. Each processing unit works independently on
the data via a distinct instruction stream.


Figure 5 depicts the MISD architecture:

Figure 5: MISD Architecture Model (control units C.U.1 to C.U.n issue distinct instruction streams IS 1 to IS n to processing units P.S.1 to P.S.n, which all operate on a single data stream drawn from memory units MU1 to MUn)

14.3.4 MIMD Architecture

MIMD (Multiple Instruction, Multiple Data) refers to a parallel architecture, which is the most fundamental and well-known type of parallel processor. The main goal of MIMD is to achieve parallelism. The MIMD architecture consists of a group of N tightly connected processors. Each processor has memory that is shared by all processors but cannot be accessed directly by the other processors.

The processors of the MIMD architecture work independently and asynchronously. Various processors may be performing various instructions on various pieces of data at any given time. MIMD is further classified into two broad categories:

zz SPMD (Single Program, Multiple Data Streams)
zz MPMD (Multiple Program, Multiple Data Streams)

The MIMD architecture is shown in Figure 6:


Figure 6: MIMD Architecture Model (control units C.U.1 to C.U.n issue independent instruction streams to processing units P.S.1 to P.S.n, each exchanging its own data stream with memory units MU1 to MUn)


Some of the advantages of MIMD are as follows:


zz Less contention
zz High scalability
zz MIMD offers flexibility

Some of the disadvantages of MIMD are as follows:


zz Load balancing
zz Deadlock situation prone

zz Waste of bandwidth

14.4 Fine-Grained SIMD Architecture

In fine-grained parallelism, a programme is broken down into a large number of tiny tasks, and these tasks are allocated to many processors independently. The amount of work associated with each parallel task is small, and it is evenly divided between the processors; as a result, fine-grained parallelism makes load balancing easier. Because each task processes only a small amount of data, a large number of processors is needed, which increases the communication and synchronisation overhead. Fine-grained parallelism is therefore best utilised in architectures that enable rapid communication, and it is best achieved with a shared memory architecture with low communication overhead.

It is difficult for programmers to detect parallelism in a program; therefore, it is usually the compiler's responsibility to detect fine-grained parallelism. An example of a fine-grained system (from outside the parallel computing domain) is the system of neurons in our brain.

14.5 Coarse-Grained SIMD Architecture

In coarse-grained parallelism, a programme is partitioned into large jobs, so each processor conducts a substantial amount of calculation. This may result in a load imbalance, with some jobs processing the majority of the data while others remain idle. Furthermore, coarse-grained parallelism fails to fully harness the parallelism in the programme, because the majority of the computation is executed sequentially on a machine. The benefit of this type of parallelism is its low communication and synchronisation cost, which matters because data communication takes a long time in a message-passing architecture.

Medium-grained parallelism is a compromise between fine-grained and coarse-grained parallelism, with task sizes and communication times that are larger than in fine-grained parallelism but smaller than in coarse-grained parallelism. This is where most general-purpose parallel computers belong.

As an example of fine-grained parallelism, assume there are 100 processors tasked with analysing a 10*10 image. Ignoring the communication overhead, the 100 processors can process the 10*10 image in one clock cycle: each processor works on a single pixel of the image and then sends the results to the others.

Now consider a medium-grained scenario in which the same 10*10 image is processed by 25 processors. The image will now be processed in four clock cycles; this is a medium-grained parallelism example. Furthermore, if we lower the number of processors to two, the processing will take 50 clock cycles. Each processor must process 50 pixels, increasing calculation time; however, as the number of processors sharing data drops, the communication cost lowers. Coarse-grained parallelism is demonstrated in this example.
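The clock-cycle arithmetic used in the image example above can be written out directly (assuming, as in the text, that communication overhead is ignored and each processor handles one pixel per clock cycle):

```python
import math

def cycles_needed(pixels: int, processors: int) -> int:
    # Each processor handles ceil(pixels / processors) pixels, one per clock cycle.
    return math.ceil(pixels / processors)

pixels = 10 * 10                   # the 10*10 image from the text
print(cycles_needed(pixels, 100))  # 1  -> fine-grained
print(cycles_needed(pixels, 25))   # 4  -> medium-grained
print(cycles_needed(pixels, 2))    # 50 -> coarse-grained
```

The computation time per processor grows as the processor count shrinks, which is exactly the fine-grained to coarse-grained trade-off described above.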


14.6 Conclusion

zz Parallel processing is a set of techniques that allows a computer system to perform multiple data-
processing tasks at the same time.
zz SISD architecture is the structure of a single computer, which includes a control unit, a processor
unit, and a memory unit.
zz SIMD symbolises an organization with a large number of processing units overseen by a central
control unit.

zz Multiple processing units work on a single data stream in MISD.
zz MIMD (Multiple Instruction, Multiple Data) refers to a parallel architecture, which is the most fundamental and well-known type of parallel processor.

14.7 Glossary

zz Parallel processing: It is a set of techniques that allows a computer system to perform multiple data-processing tasks at the same time.
zz SISD architecture: It depicts the structure of a single computer, which includes a control unit, a processor unit, and a memory unit.
zz SIMD architecture: It symbolises an organization with a large number of processing units overseen by a central control unit.
zz MISD architecture: Multiple processing units work on a single data stream in MISD.
zz MIMD architecture: It refers to a parallel architecture, which is the most fundamental and well-known type of parallel processor.

14.8 Self-Assessment Questions

A. Multiple Choice Questions


1. What does SIMD stand for?
a. Single Instruction, Multiple Data b. Singular Instruction, Multiple Data
c. Single Input, Multiple Data d. Single Instruction, Maximum Data
2. Which among the following is a set of techniques that allows a computer system to perform multiple data-processing tasks at the same time?
a. SIMD architecture b. Parallel processing
c. Local memory d. Memory
3. The primary goal of parallel processing is to improve which of the following aspects of the computer's processing?
a. Responsibility b. Functionality
c. Capability d. Flexibility
4. Which of these functional units is used to execute arithmetic operations on integer numbers?
a. Shift unit b. Logic unit
c. Adder d. Integer multiplier


5. What is the main goal of MIMD architecture?
a. To achieve parallelism b. To improve functionality
c. To improve speed d. To achieve scalability
6. What does SISD stand for?
a. Single Instruction, Simple Data b. Single Instruction, Single Database
c. Single Instruction, Single Data d. Singular Instruction, Single Data
7. Which among the following in SIMD architecture sends the same instruction to all processors, although they work on separate data?
a. Processing unit b. Control unit
c. Functional unit d. Memory unit
8. Which of these is a disadvantage of MIMD architecture?
a. Less contention b. High scalability
c. Less power d. Load balancing
9. Which type of processing can be used to achieve parallel processing in SISD architecture?
a. Memory b. Pipeline
c. Interconnection d. Synchronisation
10. Which among the following has the benefit of low communication and synchronisation costs?
a. Parallel processing b. Fine-grained SIMD
c. Coarse-grained SIMD d. MIMD
B. Essay Type Questions

1. Explain the concept of SIMD architecture.
2. The primary goal of parallel processing is to improve the computer's processing capability. Describe the significance of parallel processing.
3. Multiple processing units work on a single data stream in MISD. Discuss the concept of MISD architecture.
4. Describe the concept of fine-grained SIMD architecture.
5. The coarse-grained type of parallelism has the benefit of low communication and synchronisation costs. Discuss.

14.9 Answers and Hints for Self-Assessment Questions

A. Answers to Multiple Choice Questions

Q. No. Answer
1. a. Single Instruction, Multiple Data
2. b. Parallel processing
3. c. Capability

4. d. Integer multiplier
5. a. To achieve parallelism
6. c. Single Instruction, Single Data
7. b. Control unit
8. d. Load balancing

9. b. Pipeline
10. c. Coarse-grained SIMD

B. Hints for Essay Type Questions

1. SIMD stands for single-instruction, multiple-data streams. The SIMD parallel computing paradigm consists of two parts: a von Neumann-style front-end computer and a processor array. Refer to Section SIMD Architecture
2. Parallel processing is a set of techniques that allows a computer system to perform multiple data-processing tasks at the same time in order to boost the system's computational speed. Refer to Section Parallel Processing
3. MISD is an acronym that stands for "Multiple Instruction and Single Data Stream." Refer to Section Classification of Parallel Processing
4. In fine-grained parallelism, a programme is broken down into a large number of tiny tasks, which are allocated to many processors independently. Refer to Section Fine-Grained SIMD Architecture
5. In coarse-grained parallelism, a programme is partitioned into large jobs. Processors conduct a substantial amount of calculation as a result. Refer to Section Coarse-Grained SIMD Architecture
14.10 Post-Unit Reading Material

zz https://www.geeksforgeeks.org/computer-organization-and-architecture-pipelining-set-1-execution-stages-and-throughput/
zz https://www.geeksforgeeks.org/difference-between-simd-and-mimd/

14.11 Topics for Discussion Forums

zz Discuss with your friends and classmates the concept of SIMD architecture. Also, try to find some real-world examples of SIMD architecture.

UNIT 15

Storage Systems

Names of Sub-Units

Introduction, Types of Storage Devices, Connecting I/O Devices to CPU/Memory, RAID, I/O Performance Measures.

Overview

This unit begins by discussing the concept of storage systems. Next, the unit describes the types of storage devices and the process of connecting I/O devices to the CPU/memory. Further, the unit explains RAID. Towards the end, the unit covers the I/O performance measures.
Learning Objectives

In this unit, you will learn to:

aa Discuss the concept of storage systems
aa Explain the types of storage devices
aa Describe connecting I/O devices to the CPU/memory
aa Outline the significance of RAID
aa Explain the I/O performance measures

Learning Outcomes

At the end of this unit, you would:

aa Discuss the I/O performance measures
aa Evaluate the concept of storage systems
aa Assess the concept of types of storage devices
aa Evaluate the importance of connecting I/O devices to CPU/Memory
aa Explore the I/O performance measures

Pre-Unit Preparatory Material

aa https://www.maths.tcd.ie/~nora/DT315-1/Storage%20Devices.pdf

15.1 Introduction
Many computer components are utilised to store data in computer storage. Primary storage, secondary storage, and tertiary storage are the three types of storage usually used. In computers, a storage device is used to store data.

Input and output data are the two types of digital information. The input data is provided by the users. Data is output by computers. A computer's CPU, on the other hand, cannot compute or produce data without the user's input.

Users can enter the input data directly into a computer. However, people discovered early in the computer era that manually entering data is time- and energy-consuming. Computer memory, commonly known as Random Access Memory (RAM), is one short-term option; however, its retention and storage capacity are limited. The data in Read-Only Memory (ROM) can only be read and not modified, as the name implies; ROM chips are in charge of a computer's essential functions.

as the name implies. They are incharge of a computer’s essential functions.


Y

15.2 Types of Storage Devices


P

A storage unit is a computer system component that stores the data and instructions to be processed.
A storage device is a part of computer hardware that stores data and information so that the results of
O

computation can be processed. Without a storage device, a computer would not be able to run or even
load. A storage device, in other words, is a piece of hardware that stores, transfers, or extracts data files.
C

It can also store data and information both momentarily and permanently.


Computer storage can be divided into four categories as shown in Figure 1:

Figure 1: Types of Storage Devices


O

15.2.1 Primary Memory

Primary memory is the main memory in a computer system where data is stored for quick access by the CPU. This type of memory stores the data temporarily. The CPU is associated with the following two types of primary memories:
zz Read-Only Memory (ROM): ROM is a built-in computer memory containing data that normally can only be read but not changed. ROM contains the start-up instructions for the computer. The data stored in ROM is not lost when the computer power is turned off. ROM is sustained by a small long-life battery in your computer. The ROM chip is shown in Figure 2:

Figure 2: ROM Chip


zz Random-Access Memory (RAM): RAM is the main memory used in a computer system. It is an integrated circuit that enables you to access stored data in random order. RAM stores instructions from the operating system, application programs and data to be processed so that they can be quickly accessed by the computer's processor. RAM is much faster to read from and write to than other kinds of storage, such as hard disk, floppy disk and CD-ROM in your computer. However, data stays in RAM only as long as your computer is turned on. When you turn the computer off, RAM loses its data. Figure 3 shows a RAM chip:

Figure 3: RAM Chip

15.2.2 Secondary Memory

As discussed earlier, primary memory stores the data temporarily. To store the data permanently, you need to use secondary memory. The secondary memory is also known as the secondary storage. The storage capacity of secondary memory devices is measured in terms of Kilobytes (KBs), Megabytes (MBs), Gigabytes (GBs) and Terabytes (TBs). The different types of secondary storage devices are as follows:
zz Floppy disk: A floppy disk is the oldest type of secondary storage device that is used to transfer data between computers as well as store data and information. A floppy disk, made up of a flexible substance called Mylar, consists of a magnetic surface that allows data storage. Its structure is divided into tracks and sectors. A track of a floppy disk consists of concentric circles, which are further divided into smaller sections called sectors. The data is stored in these sectors. The maximum storage capacity of a floppy disk is 1.44 MB. Figure 4 shows floppy disks and a floppy track:

Figure 4: Floppy Disks and a Floppy Track


zz Hard disk: The hard disk in your system is known as the data centre of the PC. It is used to store all
your programs and data. The hard disk is the most important storage type among various types of
secondary storage devices used in a PC, such as CD, DVD and Pen Drive. The hard disk differs from
other storage devices on three counts– size, speed and performance. A hard disk stores information
on one or more circular platters, which are continually spinning disks. These platters are coated with
a magnetic material and are stacked on top of each other with some space between each platter.
While the platter is spinning, information is recorded on the surface with the help of magnetic heads
as magnetic spots. The information is recorded in bands; each band of information is called a track.
The tracks are, in turn, divided into pie-shaped sections known as sectors. A hard disk is shown in

D
Figure 5:

E
V
Spindle
Motor

R
Platter
Read/

E
Write Heads

S
Actuator

E
R
Figure 5: Hard Disk
The common elements of a hard disk are described as follows:
 Platters: Platters are the actual disks inside the drive that store data. Most drives have at least two platters. The larger the storage capacity of the drive, the more platters it contains. Each platter can store data on each side. So, a drive with 2 platters has 4 sides to store data.
 Spindle and spindle motor: Platters in a drive are separated by disk spacers and are clamped to a rotating spindle that turns all the platters together. The spindle motor is built right into the spindle or mounted directly below it and spins the platters at a constant set rate ranging from 3,600 to 7,200 rpm (rotations per minute).
 Read/Write heads: Read/write heads read and write data to the platters. There is typically one head per platter side and each head is attached to a single actuator shaft so that all the heads move in unison. When one head is over a track, all the other heads are at the same location over their respective surfaces. Typically, only one of the heads is active at a time, i.e., reading or writing data.
 Head actuator: All the heads are attached to a single head actuator or actuator arm that moves the heads around the platters.

zz Compact disc: A compact disc, also known as CD, is an optical media that is used to store digital
data. The compact discs are cheaper than other storage devices, such as hard disks or RAM. The
compact disc was developed to store and play-back sound recordings. However, later on, it came to
be used as a data storage mechanism. CDs are categorised into the following types:
 CD-ROM (Compact Disc-Read Only Memory): A CD-ROM is an optical disc that is primarily used
to store data in the form of text, images, audios and videos. The data available on such discs can
only be read by using a drive, known as a CD-ROM drive. The maximum storage capacity of a
CD-ROM disc is 700 MB of data.

5
JGI JAIN DEEMED-TO-BE UNIVERSITY
Computer Organization and Architecture

 CD-R (Compact Disc-Recordable): A CD-R has the ability to create CDs, but it can write data on
the discs only once. The data once stored in these discs cannot be erased. The CD-R technology is
sometimes called the Write Once-Read Many (WORM) technology.
 CD-RW (Compact Disc-Rewritable): CD-RW (sometimes called Compact Disc-Erasable) is used
to write data multiple times on a disc. CD-RW discs are good for data backup, data archiving, or
data distribution on CDs. Figure 6 depicts different types of CDs:

Figure 6: Different Types of CDs (CD-ROM, CD-R and CD-RW)
zz Pen/Thumb Drive–Flash Memory: A pen drive is a data storage device. It is also known as a Universal Serial Bus (USB) flash drive, which is typically a small, lightweight, removable and rewritable device. It is more compact and faster than other external storage mediums and stores more data. The flash memory neither has parts that can be shifted, as in the case of the magnetic storage device, nor does it use lasers like optical drives. On the other hand, the functioning of the flash memory is similar to that of RAM, with the only difference that in the case of power failure, the data stored in a flash memory is not destroyed. USB flash drives mainly come in two variants, that is, USB 2.0 and USB 3.0. A USB 3.0 flash drive works faster than USB 2.0. Figure 7 shows USB flash drives:

Figure 7: USB Flash Drives


zz Memory stick: A memory stick is another memory storage device. It is a removable flash memory
card that is used in electronic products, such as mobile phones and digital cameras. It is used to
store data, such as text, images, pictures and audio/video. It was launched by Sony in October 1998.
The original memory stick design had a 128 MB limitation. Memory sticks can also be called Memory
Cards. Generally, memory cards are small, lightweight, removable, and rewritable. Memory cards
are faster and highly efficient.


A memory card is shown in Figure 8:

Figure 8: A Memory Stick
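The platter, track and sector geometry described in the hard-disk discussion above determines a drive's raw capacity. A back-of-the-envelope calculation (the geometry numbers below are hypothetical, chosen only to illustrate the formula):

```python
def disk_capacity_bytes(platters, tracks_per_side, sectors_per_track,
                        bytes_per_sector=512, sides_per_platter=2):
    # Raw capacity = platters x sides x tracks x sectors x bytes per sector.
    return (platters * sides_per_platter * tracks_per_side *
            sectors_per_track * bytes_per_sector)

# Hypothetical drive: 2 platters, 16,383 tracks per side, 63 sectors per track.
capacity = disk_capacity_bytes(2, 16_383, 63)
print(capacity)                          # 2113800192 bytes
print(round(capacity / 10**9, 2), "GB")  # about 2.11 GB
```

Doubling the number of platters, or the sector count per track, doubles the raw capacity, which is why denser recording and extra platters were the main routes to larger drives.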

15.2.3 Tertiary Storage

Tertiary storage, also known as tertiary memory, is a subset of secondary storage. The main aim of tertiary storage is to offer a large amount of data at a low cost. Numerous types of tertiary storage devices are available at the Hierarchical Storage Systems (HSS) level. Usually, tertiary storage includes a robotic mechanism that will mount (insert) and dismount removable mass storage media into a storage device according to the system's demands; such data is frequently copied to secondary storage before use.

E
In a computer system, it is a part of an external storage type, which has the slowest speed. However, it is
enough capable to store a huge amount of data and also it is considered offline storage. For the backup

S
of data, this storage device is mostly used. Some of the tertiary storage devices are as follows:
Optical storage: It can be stored data into megabytes or gigabytes. A Compact Disk (CD) with a
zz
E
playtime of about 80 mins can store around 700 megabytes of data. On the other hand, a data of 8.5
R
gigabytes is stored by a Digital Video Disk (DVD) on each side of the disk.
zz Tape storage: Tape storage is one of the cheapest storage medium than disks. Tapes are generally
used for archiving or for the backup of data. It accesses the data consecutively at the beginning as
T

it delivers slow access to the data. Hence, tape storage is also referred to as consecutive-access or
sequential-access storage. It is also considered direct-access storage because we can directly access
H

data from any location on a disk.


IG

15.2.4 Offline Storage


Any non-volatile storage medium whose data cannot be accessed by the computer once removed is
R

referred to as a non-volatile storage medium. A USB thumb drive is a good example of offline storage. In
the event of unforeseeable occurrences, such as hardware failure due to a power outage or files infected
Y

by computer viruses, offline storage is utilised for transfer and backup protection. Types of offline
storage are depicted in Figure 9:
P

Types of Computer Storage


O

CPU RAM
C

Cache Primary storage

CD-RW
USB thumb drive
Tape drive
Hard drive
Secondary storage Off-line storage

Figure 9: Types of Offline Storage


Three different methods of storage are depicted in Figure 9. Offline storage is a subset of secondary storage because they both serve the same purpose and do not interact with the CPU directly. The main distinctions are that offline storage is utilised to physically carry information, it has a lower capacity, and it cannot be accessed without human interaction.

15.3 Connecting I/O Devices to CPU/Memory


The I/O device is directly connected to certain primary memory regions, allowing it to send and receive data blocks without having to go via the CPU. When using memory-mapped I/O, the OS creates a buffer in memory and instructs the I/O device to use that buffer to send data to the CPU. Figure 10 depicts the connections between I/O devices and the CPU and memory:
Figure 10: Connecting I/O Devices to CPU/Memory (the CPU issues I/O commands to the I/O device, while the data itself moves between the I/O device and memory)
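The buffer-based transfer described above can be loosely modelled with Python's standard mmap module (a sketch only: an anonymous memory map stands in for the buffer the OS would set up, and the "device" and "CPU" are simply two accesses to the same region):

```python
import mmap

# An anonymous memory map plays the role of the in-memory I/O buffer.
buffer = mmap.mmap(-1, 4096)

# The "I/O device" deposits a data block into the buffer...
buffer.write(b"block of device data")

# ...and the "CPU" later reads the block from the same region, so the
# transfer itself did not have to pass byte-by-byte through the CPU.
buffer.seek(0)
data = buffer.read(20)
print(data)  # b'block of device data'
buffer.close()
```

In a real system the operating system would hand the buffer's physical address to the device, and the device's DMA engine would fill it while the CPU does other work.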
T

Buses were previously classified as CPU-memory buses or I/O buses. I/O buses can be long, can have a variety of devices attached, and must support a wide range of data bandwidths for the devices connected to them; they normally follow a bus standard. On the other hand, short, high-speed CPU-memory buses are matched to the memory system to maximise memory-CPU bandwidth. A CPU-memory bus designer knows all of the devices that must join together during the design phase, but an I/O bus designer must accept devices with varying latency and bandwidth capabilities. To save money, some computers feature a single bus for both memory and I/O devices. Some buses can also help improve I/O performance.
Machine instructions are commands or programmes, encoded in machine code, that can be recognised and executed by a machine (computer). A machine instruction is a set of bytes in memory that instructs the processor to carry out a certain task. The CPU goes through the machine instructions in the main memory one by one, performing one machine operation for each.

A machine language programme is a collection of machine instructions stored in the main memory. A collection of instructions executed directly by a computer's central processing unit (CPU) is known as machine code or machine language. Each instruction performs a highly particular duty, such as a load, a jump or an ALU operation, on a unit of data in a CPU register or memory. A set of such instructions makes up any programme that is directly executed by a CPU.

Reliability

Reliability, Availability and Serviceability (RAS) is a collection of related attributes that must be measured when designing, manufacturing, purchasing or using a computer product or component.

The term was first used by IBM to define specifications for their mainframes and originally applied only to hardware.

Availability

The ratio of time a system or component is functional to the entire time it is required or anticipated to work is known as availability. This can be expressed as a straight proportion (for example, 9/10 or 0.9) or as a percentage (for example, 90 percent). It can also be expressed as total downtime, or as average downtime, per week, month or year. When a large component or combination of components fails, availability is sometimes stated in qualitative terms, indicating the extent to which a system can continue to function.
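As a quick numeric sketch of this ratio (a hypothetical illustration, not taken from the text), the two common expressions of availability convert into each other directly:

```python
def availability(uptime_hours: float, total_hours: float) -> float:
    """Availability = time the system is functional / total time it is required."""
    return uptime_hours / total_hours

def annual_downtime_hours(avail: float) -> float:
    """Turn an availability ratio into the expected downtime per year."""
    return (1.0 - avail) * 365 * 24

# A system that is up 9 hours out of every 10 has availability 0.9 (90 percent):
print(availability(9, 10))                     # 0.9
# "Three nines" (99.9%) still means almost nine hours of downtime per year:
print(round(annual_downtime_hours(0.999), 2))  # 8.76
```

The second function makes the qualitative point above concrete: even a seemingly high availability figure can translate into substantial yearly downtime.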

Dependability

In computer architecture, dependability is a measure of a system's availability, reliability, maintainability and maintenance-support performance, as well as, in certain circumstances, durability, safety and security.

15.4 Redundant Array of Independent Disks (RAID)

RAID can be defined as a redundant array of inexpensive disks (the acronym is now more commonly expanded as 'independent disks'). It refers to multiple, independent disk drives; the RAID level decides how data is distributed among these drives. RAID is a collection of connected hard drives set up to protect data or to speed up the performance of a computer's disk storage. It is commonly used in servers and in high-performance computers.

The main advantage of RAID is that the array of disks can be accessed as a single disk by the OS. Moreover, RAID is fault-tolerant, because most RAID levels keep redundant copies of data on multiple disks. If one or two disks fail, the data remains safe, and the OS need not even be aware of the failure. Loss of data can be prevented, because the data can be recovered from the disks that have not failed.
15.4.1 Terminologies and Techniques in RAID

Some of the most commonly used terminologies and techniques in a RAID are discussed as follows:
● Striping in RAID: Writing data to a single disk is a slow process; writing it in small chunks across multiple disks is faster. It is easy and fast to spread data in small amounts over different disks, and likewise to fetch it back in small amounts from the various disks. When data is retrieved using multiple disks, the CPU does not need to wait, because the combined throughput of all the disks is used. Each disk drive is divided into small strips; stripe sizes in RAID typically range from 4 KB to 512 KB.
● Mirroring in RAID: Mirroring refers to a technique in which a copy of the same data is kept on another disk drive; the disks are paired into groups of two. The main advantage of mirroring is that it provides 100 percent redundancy: with two drives in mirroring mode, both drives hold an exact copy of the data, so if one disk fails, the data is still protected on the other.
● Parity in RAID: Parity refers to the method used for rebuilding the data when one of the disks fails. It uses the well-known binary operation XOR, a mathematical operation performed to get one output from two inputs.
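The XOR-based rebuild described above can be sketched in a few lines. This is a toy model, with byte strings standing in for equal-sized chunks on separate disks; the function names are illustrative, not from any real RAID controller:

```python
from functools import reduce

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length chunks."""
    return bytes(x ^ y for x, y in zip(a, b))

def parity_chunk(data_chunks: list) -> bytes:
    """The parity chunk is the XOR of all data chunks."""
    return reduce(xor_bytes, data_chunks)

def rebuild_lost_chunk(surviving_chunks: list, parity: bytes) -> bytes:
    """XOR-ing the survivors with the parity recovers the one lost chunk."""
    return reduce(xor_bytes, surviving_chunks, parity)

disks = [b"\x10\x20", b"\x0f\x0f", b"\xa0\x05"]    # three data disks
parity = parity_chunk(disks)                        # stored on a parity disk
recovered = rebuild_lost_chunk([disks[0], disks[2]], parity)  # disk 1 fails
print(recovered == disks[1])                        # True
```

Because XOR is its own inverse, any single lost chunk equals the XOR of all remaining chunks with the parity, which is exactly why one parity disk can cover any single-disk failure.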


15.4.2 Different Levels of RAID


RAID can be classified into different levels on the basis of its operation and the level of redundancy provided. Three levels of RAID are discussed as follows:
● RAID 0: RAID 0 is also called 'no RAID' or 'striping'. If the main priority is performance, RAID 0 can be selected. RAID 0 does not provide any kind of redundancy; therefore, it should be kept in mind that if one drive fails, the data is at risk, and RAID 0 does not support data recovery. In RAID 0, striping is done across the disk array: the data is broken into smaller chunks and spread across all the disks. Thus, RAID 0 utilises spanning and software striping.
RAID 0 has no mirroring and no parity checking of data; thus, there is no redundancy. A RAID 0 implementation requires a minimum of two disks. The diagram of RAID level 0 is shown in Figure 11:

[Figure: RAID 0 striping, with blocks A1, A3, A5, A7 on Disk 0 and blocks A2, A4, A6, A8 on Disk 1.]

Figure 11: Data Disks in RAID Level 0
● RAID 1: RAID 1 is also called 'mirroring'. The data is stored twice, by writing it to both the data drive and the mirror drive; the data drive and mirror drive together form a set. If a drive fails, the controller uses either the data drive or the mirror drive for the recovery of data and continues operation.
In RAID 1, read performance can be improved by reading a file from both drives in parallel. However, write performance suffers, because the data is written twice. RAID 1 provides complete redundancy of data. A RAID 1 implementation requires a minimum of two disks. The diagram of RAID level 1 is shown in Figure 12:
[Figure: RAID 1 mirroring, with blocks A1, A2, A3, A4 duplicated on both Disk 0 and Disk 1.]

Figure 12: Data Disks in RAID Level 1


● RAID 2: Like RAID 0, RAID 2 stripes data across disks. The difference is that the data is bit-interleaved instead of block-interleaved. This level uses a Hamming Error-Correcting Code (ECC) to monitor the accuracy of the information stored on disk; the ECC information is stored across multiple check disks so that the faulty disk can be determined.
A parity disk is utilised to rebuild corrupted or lost data. RAID 2 needs fewer disks than level 1 to provide redundancy, but it still requires several extra disks; for example, for 10 data disks it needs 4 check disks plus a parity disk. The diagram of RAID level 2 is shown in Figure 13:

[Figure: RAID 2 bit-interleaved striping, with data bits A1-A4, B1-B4, C1-C4, D1-D4 on Disks 0-3 and ECC bits Ap1-Ap3, Bp1-Bp3, Cp1-Cp3, Dp1-Dp3 on Disks 4-6.]

Figure 13: Data Disks in RAID Level 2
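The block layouts in Figures 11 and 12 follow a simple address calculation. As a hypothetical sketch (not tied to any real controller): under RAID 0, logical block i lands on disk i mod N at stripe i // N, while under RAID 1 every block is written to every disk of the mirror set.

```python
def raid0_location(block: int, num_disks: int) -> tuple:
    """RAID 0 striping: map a logical block to (disk index, stripe index)."""
    return (block % num_disks, block // num_disks)

def raid1_locations(block: int, num_disks: int) -> list:
    """RAID 1 mirroring: every block is duplicated on every disk."""
    return [(disk, block) for disk in range(num_disks)]

# Reproducing the start of Figure 11 (blocks A1..A4 striped over two disks):
for i in range(4):
    disk, stripe = raid0_location(i, 2)
    print(f"A{i + 1} -> disk {disk}, stripe {stripe}")
# A1 -> disk 0, stripe 0
# A2 -> disk 1, stripe 0
# A3 -> disk 0, stripe 1
# A4 -> disk 1, stripe 1
```

The same mapping explains the performance trade-off in the text: RAID 0 spreads consecutive blocks over all disks (parallel throughput, no redundancy), while RAID 1 writes each block N times (full redundancy, slower writes).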

15.5 I/O Performance Measures

I/O performance is measured in ways that have no counterparts in processor design. In addition to these unique metrics, standard performance metrics, such as response time and throughput, also apply to I/O. (I/O bandwidth is also referred to as I/O throughput, and response time is sometimes referred to as latency.)

In many recently developed data-intensive applications, the performance killer has been identified as the I/O system rather than the CPU and RAM, so evaluating and comprehending the performance of I/O systems has become a hot topic in the high-performance computing industry. Traditional I/O performance parameters, such as Input/Output Operations Per Second (IOPS), bandwidth and response time, are helpful in traditional I/O contexts. However, as I/O systems get more complex, existing I/O metrics are less and less capable of reflecting the features of I/O system performance. Researchers have illustrated the limitations of the existing measurements and introduced a new I/O statistic, Blocks Per Second (BPS), to evaluate I/O system performance.
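As a back-of-envelope illustration of how the classic metrics relate (an idealised model, not a benchmark): bandwidth is IOPS multiplied by the request size, and with a single outstanding request, the response time bounds the achievable IOPS.

```python
def bandwidth_mb_per_s(iops: float, request_size_kb: float) -> float:
    """I/O bandwidth (throughput) = operations per second x data per operation."""
    return iops * request_size_kb / 1024.0

def max_iops_one_request(response_time_ms: float) -> float:
    """With one request in flight at a time, IOPS is bounded by 1 / response time."""
    return 1000.0 / response_time_ms

# 20,000 IOPS of 4 KB requests sustain 78.125 MB/s:
print(bandwidth_mb_per_s(20_000, 4))
# A 5 ms response time caps a single stream at 200 IOPS:
print(max_iops_one_request(5.0))
```

The model also hints at why no single metric suffices: a device can post high IOPS on small requests yet low bandwidth, or high bandwidth on large sequential transfers yet poor response time.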
15.6 Conclusion

● A storage unit is a computer system component that stores the data and instructions to be processed.
● Primary memory is the main memory in a computer system where data is stored for quick access by the CPU.
● ROM is a built-in computer memory containing data that normally can only be read but not changed.
● RAM is an integrated circuit that enables you to access stored data in random order.
● The secondary memory is also known as secondary storage.
● A floppy disk is the oldest type of secondary storage device that is used to transfer data between computers as well as store data and information.
● Hard disk is used to store all your programs and data.
● Tertiary storage, also known as tertiary memory, is a subset of secondary storage.
● A collection of instructions performed directly by a computer's central processing unit is known as machine code or machine language.
● Any non-volatile storage medium whose data cannot be accessed by the computer once removed is referred to as offline storage.

● The ratio of time a system or component is functional to the entire time it is required or anticipated to work is known as availability.
● RAID can be defined as a redundant array of inexpensive disks: disk drives that are independent and multiple in number.

15.7 Glossary

● Storage unit: A computer system component that stores the data and instructions to be processed.
● Primary memory: The main memory in a computer system where data is stored for quick access by the CPU.
● ROM: A built-in computer memory containing data that normally can only be read but not changed.
● RAM: An integrated circuit that enables you to access stored data in a random order.
● Floppy disk: The oldest type of secondary storage device, used to transfer data between computers as well as to store data and information.
● Hard disk: A device used to store all your programs and data.
● Tertiary storage: Also known as tertiary memory; a subset of secondary storage.
● Machine language: A collection of instructions performed directly by a computer's central processing unit; also known as machine code.
● Availability: The ratio of time a system or component is functional to the entire time it is required or anticipated to work.
● Redundant Array of Independent Disks (RAID): Disk drives that are independent and multiple in number.

15.8 Self-Assessment Questions



A. Multiple Choice Questions


1. Which among the following is a computer system component that stores the data and instructions to be processed?
a. Storage devices b. Input/Output
c. CPU d. Memory
2. Which of these is the main memory in a computer system where data is stored for quick access by the CPU?
a. Secondary storage b. Primary storage
c. Tertiary storage d. Offline storage
3. Which among the following is known as the data centre of the PC?
a. ROM b. RAM
c. Hard disk d. Floppy disk
4. _________ are the actual disks inside the drive that store data.
a. Head actuator b. Spindle and Spindle Motor
c. Read/Write Heads d. Platter

5. A memory stick is a removable flash memory card, used in electronic products, that was launched by Sony in October __________.
a. 1998 b. 1996
c. 1999 d. 1997
6. Which among the following is known as Universal Serial Bus (USB)?
a. Optical disk b. Tape disk
c. Pen drive d. Memory stick

7. The ratio of time a system or component is functional to the entire time it is required or anticipated to work is known as __________.
a. Reliability b. Availability
c. Dependability d. Maintainability

8. Which among the following points to the disk drives that are independent and multiple in numbers?
a. RAID b. Storage devices
c. Memory d. I/O performance
9. Which level of a RAID is also called 'Mirroring'?
a. RAID 0 b. RAID 1
c. RAID 2 d. RAID
10. What does RAID stand for?
a. Redundant Area of Independent Disks
b. Replicate Array of Independent Disks
c. Redundant Array of Independent Disks
d. Redundant Array of Internal Disks

B. Essay Type Questions

1. Without a storage device, a computer would not be able to run or even load. Define the term storage device.
2. The secondary memory is also known as the secondary storage. Describe the term secondary storage with examples.
3. The OS creates a buffer in memory and instructs the I/O device to use that buffer to send data to the CPU. Discuss.
4. Numerous types of tertiary storage devices are accessible to be used at the Hierarchical Storage Systems (HSS) level. Outline the significance of tertiary storage.
5. The main advantage of RAID is that the array of disks can be accessed as a single disk in the OS. Explain the concept of RAID in brief.


15.9 Answers and Hints for Self-Assessment Questions

A. Answers to Multiple Choice Questions

Q. No. Answer
1. a. Storage devices
2. b. Primary storage
3. c. Hard disk
4. d. Platter
5. a. 1998
6. c. Pen drive
7. b. Availability
8. a. RAID
9. b. RAID 1
10. c. Redundant Array of Independent Disks

B. Hints for Essay Type Questions


1. A storage unit is a computer system component that stores the data and instructions to be processed. Refer to Section Types of Storage Devices.
2. As discussed earlier, primary memory stores the data temporarily. To store the data permanently, you need to use secondary memory. Refer to Section Types of Storage Devices.
3. The I/O device is directly connected to certain primary memory regions, allowing it to send and receive data blocks without having to go via the CPU. Refer to Section Connecting I/O Devices to CPU/Memory.
4. Tertiary storage is also known as tertiary memory, a subset of secondary storage. The main aim of tertiary storage is to offer a large amount of data at a low cost. Refer to Section Types of Storage Devices.
5. RAID can be defined as a redundant array of inexpensive disks: disk drives that are independent and multiple in number. Refer to Section Redundant Array of Independent Disks (RAID).
(RAID)
15.10 Post-Unit Reading Material

● https://www.trustradius.com/buyer-blog/primary-vs-secondary-storage
● https://www.google.co.in/books/edition/Computer_Architecture/XX69oNsazH4C?hl=en&gbpv=1&dq=storage+devices+in+COA&pg=PA679&printsec=frontcover

15.11 Topics for Discussion Forums

● Discuss with your friends and classmates the concept of storage devices and their types. Also, try to find some real-world examples of storage devices.

14

You might also like