MCA Computer Organization Architecture 01-Merged
UNIT 01
Basic Structure of Computers and Instruction Set
Names of Sub-Units
Basic Computer Structure, Computer Types, Functional Units, Basic Operational Concepts, Bus Structures, Processor Clock, Basic Performance Equation, Clock Rate, Performance Measurement, Machine Instructions, Numbers, Arithmetic Operations and Characters, Memory Location and Addresses, Memory Operations, Instructions and Instruction Sequencing.
Overview
The unit begins by explaining the basic structure of a computer. Further, it discusses the computer types, functional units and basic operational concepts. This unit explains the bus structures, processor clock, basic performance equation, and clock rate. Next, the unit discusses the machine instructions. The unit also discusses number representation, arithmetic operations and characters, and memory locations and addresses. Towards the end, the unit explains the memory operations, instructions and instruction sequencing.
Learning Objectives

Learning Outcomes
aa Analyse the instructions and instruction sequencing
Pre-Unit Preparatory Material
http://www.ipsgwalior.org/download/computer_system_architecture.pdf
1.1 Introduction
A computer is an electronic device which is used to perform a variety of operations on the basis of a set of instructions called a program. A computer takes input from the user in the form of data or instructions. On receiving the instructions from the user, the computer processes the data and generates some output and displays it to the user. When the computer processes data, it becomes information. A computer performs a task in the same manner as we do our day-to-day activities.
In general, computer architecture encompasses three areas of computer design: computer hardware, instruction set architecture, and computer organization. Electronic circuits, displays, magnetic and optical storage media, and communication capabilities make up computer hardware.
The instruction set, registers, memory management, and exception handling are all examples of machine interfaces that are accessible to the programmer. CISC (Complex Instruction Set Computer) and RISC (Reduced Instruction Set Computer) are the two basic techniques.
Computer organization refers to the high-level aspects of a design, such as the memory system, bus structure, and internal CPU design.
A computer is a high-speed electronic calculating machine that accepts digital input, processes it according to internally stored instructions (programs), and outputs the result on a display device. The main types of computers are described below.
UNIT 01: Basic Structure of Computers and Instruction Set JGI JAIN
DEEMED-TO-BE UNIVERSITY
A LAN (Local Area Network) can also be used.
zz Super computer: A computer that is regarded as the world's fastest. Used to complete activities that would take other computers a long time to complete, for example, modelling weather systems or genomic sequencing.
zz Main frame: A large, expensive computer capable of processing data for hundreds or thousands of users at the same time. It is used where large amounts of data must be stored, managed, and processed in a reliable, secure, and centralized manner.
zz Hand held: It is also known as a PDA (Personal Digital Assistant). A pocket computer that operates on batteries and may be used while holding the device in your hand. Appointment books, address books, calculators, and notepads are all common uses.
zz Multi-core: Parallel computing platforms with many cores, i.e., many computing elements on a single chip. Sony PlayStation, Core 2 Duo, i3, i7, and others are common examples.
zz Note-book computers: These are compact and portable versions of the PC.
In its most basic form, a computer consists of five functional units: an input unit, an output unit, a memory unit, arithmetic and logic unit, and a control unit. The functional units of a computer system are depicted in Figure 2:
Figure 2: Functional Units of a Computer (input unit, ALU, memory, output unit, control unit and I/O processor)
zz Memory Unit: The memory unit holds the programme instructions (code), data, and computing
results, among other things. The following are the several types of memory units:
Primary/Main Memory
Secondary/Auxiliary Memory
Primary memory is a type of semiconductor memory that allows for fast access. The main memory stores run-time programme instructions and operands. ROM and RAM are the two types of main memory. ROM stores system applications and firmware procedures, such as the BIOS, POST, and I/O drivers, that are required to manage a computer's hardware. RAM (Random Access Memory) serves as read/write or user memory, and it stores programme instructions and data during execution. Primary storage is necessary, but it is inherently volatile and costly. Additional memory can be provided as auxiliary memory at a lower cost.
Secondary memory is utilized when huge volumes of data and programmes, especially information that is accessed infrequently, must be kept.
zz Arithmetic and Logic Unit (ALU): Adder, comparator, and other logic circuits are used in the ALU to accomplish operations such as addition, multiplication, and number comparison.
zz Output Unit: The output unit is the input unit's polar opposite. Its primary function is to communicate the processed results to the rest of the world.
zz Control Unit: It is the nerve centre that transmits messages to and senses the status of other units. The control unit generates the timing signals that govern data transfer between the input unit, processor, memory, and output unit.
1.2.3 Basic Operational Concepts
A programme containing a list of instructions is stored in the memory to accomplish a certain task. Individual instructions are transferred from memory to the processor, which then performs the actions:
zz The instruction is first retrieved from memory and entered into the processor.
zz The operation specified by the instruction is then performed. For example:
Load LOCA, R1
Add R1, R0
An ALU operation is combined with a memory access operation in the Add instruction above. For performance considerations, these two sorts of operations are done by different instructions in various other types of computers.
Sending the address of the memory location to be accessed to the memory unit and providing the relevant control signals initiates transfers between the memory and the processor. After that, the data is moved to or from memory.
Figure 3 depicts how memory and the CPU can be linked. The processor has a number of registers in
addition to the ALU and control circuitry. These registers are utilized for a variety of reasons.
[Main memory connected through the MAR and MDR to the processor, which contains the control unit, PC, IR, ALU and n general-purpose registers R0 to Rn-1]
Figure 3: Connections Between the Processor and the Memory
The instruction register (IR) holds the instruction that is presently being executed. Its output is provided to control circuits that generate timing signals to manage various processing elements throughout a particular instruction's execution.
The program counter (PC) is a specialized register for keeping track of a program's execution. It holds the memory address of the next instruction to be fetched and executed.
zz When the PC is set to point at the program's first instruction, the programme will begin to run.
zz The contents of the PC are copied to the MAR, and the memory receives a Read control signal.
zz The addressed word is read from memory and loaded into the MDR once the time required to access the memory has elapsed.
zz The MDR's contents are then transferred to the IR, so that the instruction can be decoded and executed.
zz If the instruction calls for an ALU operation, the required operands must be obtained.
zz A memory operand is retrieved by providing its address to the MAR and starting a read cycle.
zz The operand is moved from the MDR to the ALU once it has been read from memory into the MDR.
zz The ALU can perform the desired action after one or two such repeated cycles.
zz The result of the operation is delivered to the MDR if it is to be saved in memory.
zz The address of the result's storage place is passed to the MAR, and a write cycle is started.
zz The contents of the PC are incremented so that the PC points to the next instruction to be executed.
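The step sequence above can be sketched as a small simulation. This is an illustrative model only: the register names (PC, MAR, MDR, IR) follow the text, but the two-field instruction format, the opcodes and the memory contents are invented for the example.

```python
# Minimal sketch of the fetch-execute steps described above.
# Memory holds (opcode, operand) tuples; a real memory holds encoded words.
memory = {
    0: ("LOAD", 100),   # R0 <- contents of location 100
    1: ("ADD", 101),    # R0 <- R0 + contents of location 101
    2: ("HALT", None),
    100: 23,
    101: 45,
}

pc = 0          # program counter: address of the next instruction
r0 = 0          # a general-purpose register

while True:
    mar = pc                 # contents of the PC are copied to the MAR
    mdr = memory[mar]        # a read cycle brings the addressed word into the MDR
    ir = mdr                 # the MDR's contents are moved to the IR for decoding
    pc = pc + 1              # the PC now points to the next instruction
    opcode, operand = ir
    if opcode == "LOAD":
        mar = operand        # the operand address goes to the MAR
        r0 = memory[mar]     # a read cycle fetches the operand
    elif opcode == "ADD":
        mar = operand
        r0 = r0 + memory[mar]
    elif opcode == "HALT":
        break

print(r0)  # 68
```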
Figure 4: Single Bus Structure
Consider the transfer of data from the processor to a printer as an example. A typical method is to use buffer registers to hold the content during the transmission. Only two units can use the bus at any given moment because it can only be used for one transfer at a time. Multiple demands for the usage of a single bus are arbitrated via bus control lines.
The single bus structure is:
zz Low cost
zz Very flexible for attaching peripheral devices
The performance of a multiple bus arrangement is unquestionably improved, but the cost is greatly increased.
1.3 Performance
The most essential indicator of a computer's performance is how rapidly it can run programmes. The architecture of a computer's hardware influences the speed with which it runs programmes. For the best performance, it is vital to design the compiler, the machine instruction set and the hardware in a coordinated manner.
The whole time it takes to run the programme is called elapsed time, and it is a measure of the computer system's overall performance. The speed of the CPU, disc, and printer all have an impact on it. The processor time is the amount of time the processor spends executing the programme's instructions.
The time taken by the processor is determined by the hardware used in the execution of each machine instruction, much as the time it takes to execute a programme is affected by all of the components of a computer system. The hardware consists of the CPU and memory, which are normally coupled by a bus.
On a single IC chip, the processor and a tiny cache memory can be created. Internally, the speed at
which the basic steps of instruction processing are performed on the chip is extremely rapid, and it
is significantly quicker than the speed at which instructions and data are fetched from the main memory. A programme runs faster when the movement of instructions and data between the main memory and the processor is reduced, which is achieved by utilising the cache.
Consider the following scenario: In a programme loop, a set of instructions are performed repeatedly
over a short period of time. If these instructions are stored in the cache, they can be swiftly retrieved
during periods of frequent use. The same can be said for data that is used frequently.
The basic performance equation concerns the amount of time it takes a processor to run a programme that has been prepared in a high-level programming language. A machine language object programme that corresponds to the source programme is generated by the compiler. Assume that complete execution of the programme requires the execution of N machine language instructions. The number N is the actual number of instruction executions; it is not necessarily equal to the number of machine instructions in the object programme.
Depending on the input data utilised, some instructions may be executed multiple times (for example, instructions inside a programme loop), while others may not be executed at all.
Assume that the average number of basic steps required to accomplish one machine instruction is S, and that each basic step takes one clock cycle to complete. If the clock rate is R cycles per second, the programme execution time is given by:
T = (N × S) / R
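As a rough numerical illustration of the basic performance equation, the following sketch plugs in made-up values for N, S and R (none of them come from the text):

```python
# T = (N x S) / R, the basic performance equation.
N = 100_000_000    # machine instructions executed (assumed figure)
S = 4              # average basic steps (clock cycles) per instruction (assumed)
R = 2_000_000_000  # clock rate: 2 GHz, i.e. 2 x 10^9 cycles per second (assumed)

T = (N * S) / R    # programme execution time in seconds
print(T)           # 0.2 seconds
```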
It is important to note that N, S, and R are not independent parameters, and changing one can have an impact on the others. Adding a new feature improves performance only if the overall consequence is to reduce the value of T.
We presume that instructions are carried out in a sequential order. As a result, the value of S represents the total number of fundamental steps (or clock cycles) required to complete one instruction. Pipelining, a technique that involves overlapping the execution of successive instructions, can result in substantial performance gains.
Consider Add R1, R2, R3
This sums R1 and R2's contents and places the result in R3. The contents of R1 and R2 are first copied to the ALU's inputs. The sum is moved to R3 after the addition procedure is completed. While the addition operation is being done, the CPU can read the following instruction from memory.
At the same time that the sum of the Add instruction is sent to R3, the operands of the next instruction, which also utilises the ALU, can be supplied to the ALU inputs.
If all instructions are overlapped to the greatest extent possible, execution proceeds at the pace of one
instruction per clock cycle in the ideal scenario. Individual instructions still take a number of clock
cycles to finish. However, the effective value of S for the purpose of determining T is 1.
If numerous instruction pipelines are built in the CPU, a higher level of concurrency can be attained.
This means that numerous functional units are employed to create parallel channels through which
different instructions can be executed in parallel.
As a result of this arrangement, several instructions can be started in one clock cycle. Superscalar
execution is the name for this kind of action.
The effective value of S can be decreased to less than one if it can be maintained for a long time during
programme execution. However, parallel execution must maintain logical correctness of programmes,
i.e., the results produced must be identical to those produced by sequential execution.
Many processors are now constructed in this manner.
1.3.3 Clock Rate
There are two options for increasing the clock rate R, which are as follows:
1. Improving IC technology speeds up logic circuits, reducing the time it takes to complete basic steps. This allows for a reduction in the clock period P and an increase in the clock rate R.
2. Reducing the amount of processing done in a single basic step also allows the clock period P to be reduced. However, if the actions that an instruction must execute remain constant, the number of basic steps required may increase.
With the exception of the time it takes to access the main memory, increases in the value of R that are solely due to improvements in IC technology affect all elements of the processor's operation equally. The fraction of accesses to main memory is small when a cache is present. As a result, most of the performance benefit that would otherwise be gained by using faster technology can be realised.
An instruction set with simple instructions could result in a large value of N and a small value of S. Individual instructions that perform more sophisticated operations, on the other hand, will result in fewer instructions, giving a lower value of N and a higher value of S. It is difficult to say whether one option is better than the other. However, combining complicated instructions with pipelining (an effective value of S close to 1) would result in the best performance. In CPUs with simple instruction sets, however, efficient pipelining is much easier to achieve.
1.4 Machine Instructions
A collection of instructions performed directly by a computer's central processing unit (CPU) is known as machine code or machine language.
Each instruction performs a highly particular duty on a unit of data in a CPU register or memory, such as a load, a jump, or an ALU operation. A set of such instructions makes up any programme that is directly executed by a CPU.
The general format of a machine instruction is:
[Label:] Mnemonic [Operand, Operand] [; Comments]
The use of brackets denotes that a field is optional. The label is allocated the address of the first byte of the instruction in which it appears. It must be followed by a colon (:). The placement of spaces is arbitrary, with the exception that at least one space must be included; otherwise, there would be ambiguity. A semicolon (;) begins the comment field.
For example, a line in this format could be:
LOOP: Add R1, R0 ; an illustrative instruction with a comment
1.4.1 Representation of Number
Representing (or encoding) a number signifies expressing the number in binary form. The representation of numbers is necessary for storing and manipulating the numbers efficiently. There are several ways to represent integers in the binary form. Three binary representations of integers are listed below:
zz Signed and magnitude representation
zz 1's (or one's) complement
zz 2's (or two's) complement
An integer may consist of one or more digits which indicate its magnitude and may have a positive or negative sign. Examples of integers are +13, 39, –12 and –18.
In a computer, numbers are represented in the binary form as bits without using any other symbol. Various methods are used for binary representation of numbers. One such method is known as sign and magnitude representation.
In this method, the sign of a number is represented by using the leftmost bit of its binary equivalent,
which is called the Most Significant Bit (MSB). If the MSB is 0, then the sign of the number is + and if the
MSB is 1, then number sign is –.
For example, if a computer has a word size of 8 bits, then the positive number 17 in binary notation would be represented as shown in Figure 5:
0 0 0 1 0 0 0 1 (the MSB is 0, which represents the positive sign)
Figure 5: Representing a Positive Number (17) in Binary Notation
Similarly, the negative number –12 in binary notation would be represented as shown in Figure 6:
1 0 0 0 1 1 0 0 (the MSB is 1, which represents the negative sign)
Figure 6: Representing a Negative Number (–12) in Binary Notation
In the sign and magnitude notation, if a word is represented by X bits, then the total count of numbers which can be represented by X bits is 2^X – 1, since zero has two representations. In the preceding example, a word is represented using 8 bits, where the MSB is reserved for representing the sign. Therefore, the maximum magnitude that can be represented by using 8 bits is 2^7 – 1 = 127, and the total count of numbers that can be represented by using 8 bits is 2^8 – 1 = 255. These numbers range from –127 to +127.
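The sign-and-magnitude encoding described above can be sketched as a small helper function. The function name and the 8-bit default are assumptions for the example:

```python
def sign_magnitude(n, bits=8):
    """Encode integer n in sign-and-magnitude form (illustrative helper)."""
    if abs(n) > 2 ** (bits - 1) - 1:
        raise ValueError("magnitude does not fit in %d bits" % bits)
    sign = "1" if n < 0 else "0"                      # the MSB holds the sign
    magnitude = format(abs(n), "0%db" % (bits - 1))   # remaining bits hold |n|
    return sign + magnitude

print(sign_magnitude(17))    # 00010001, as in Figure 5
print(sign_magnitude(-12))   # 10001100, as in Figure 6
```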
One's Complement
In 1's complement representation, a positive number is represented by its binary equivalent and a negative number is represented by taking the 1's complement of its binary equivalent. The 1's complement of a binary number is derived by interchanging the 1s and 0s in that number's binary equivalent. For example, the 1's complement of the binary number 1100 is 0011. This is called the 1's complement because we can obtain the same number by subtracting the binary number 1100 from 1111.
Let's find the binary representations of +5 and –5 in 4 bits. The number +5 is represented by its binary equivalent, 0101. The representation of –5 is obtained by interchanging the 1s and 0s in the binary equivalent of 5. Therefore, –5 is represented as 1010.
The total count of numbers that can be represented using 1's complement is 2^X – 1, where X is the number of bits in a word. Therefore, an 8-bit word can represent a maximum of 2^8 – 1 = 255 numbers. 1's complement has two representations of zero: +0 (0000) and –0 (1111) in 4 bits.
For example, the 1's complement representation of –12 is found in the following way:
Binary equivalent of +12 = 0000 1100
By interchanging the 1s and 0s in the binary equivalent of +12, we get the representation of –12 as 1111 0011.
  1111 1111
– 0000 1100
  1111 0011
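The 1's-complement rule above (interchange the 1s and 0s, or equivalently subtract from all 1s) can be sketched as follows; the function name and the 8-bit default are assumptions for the example:

```python
def ones_complement(n, bits=8):
    """Return the 1's-complement representation of integer n (illustrative)."""
    if n >= 0:
        return format(n, "0%db" % bits)    # positive: plain binary equivalent
    # Negative: flip every bit of the binary equivalent of |n|,
    # which is the same as subtracting it from a word of all 1s.
    flipped = (2 ** bits - 1) ^ abs(n)
    return format(flipped, "0%db" % bits)

print(ones_complement(12))    # 00001100
print(ones_complement(-12))   # 11110011, matching the worked example above
```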
Two's Complement
In 2's complement representation, a positive number is represented by its binary equivalent and a negative number is represented by taking the 2's complement of the binary equivalent of its magnitude. To get the 2's complement of a number, we add 1 to the 1's complement of that number.
For example, the 2's complement representation of –5 can be found in the following way:
The binary equivalent of 5 is 0101.
1's complement of 0101 = 1010
2's complement of 0101 = 1010 + 1 = 1011
Therefore, +5 and –5 are represented as 0101 (the binary equivalent) and 1011 (the 2's complement of 5), respectively.
Alternative Method to Represent the 2's Complement
1. Starting from the least significant bit, copy all the bits up to and including the first 1.
2. Complement the remaining bits of the number in Step 1.
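The rule above (1's complement plus 1, or equivalently 2 raised to the number of bits minus the magnitude) can be sketched as follows; the function name and bit widths are assumptions for the example:

```python
def twos_complement(n, bits=8):
    """Return the 2's-complement representation of integer n (illustrative)."""
    if n >= 0:
        return format(n, "0%db" % bits)
    # Negative: 1's complement of |n| plus 1, which equals 2**bits - |n|.
    return format(2 ** bits - abs(n), "0%db" % bits)

print(twos_complement(5, 4))    # 0101
print(twos_complement(-5, 4))   # 1011, matching the worked example above
```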
For performing various operations in digital computers and in other digital systems, binary arithmetic is mandatory. Binary arithmetic includes binary addition, subtraction, multiplication and division. Let's now learn the rules for performing each kind of operation in detail.
Binary Addition
As the name suggests, binary addition is used for performing addition of binary numbers.
The rules of binary addition are as follows:
Case  A  B  Sum  Carry
I     0  0  0    0
II    0  1  1    0
III   1  0  1    0
IV    1  1  0    1
In case IV, 1 + 1 gives 10. The 0 is retained in the same column while the 1 is carried to the next column. Note that the binary addition of 1 + 1 + 1 is 11, in which one 1 is retained in the same column and the other 1 is carried to the next column.
For example, let's perform binary addition on the two numbers 10111 (23 in decimal) and 101101 (45 in decimal):
Carry:  1 1 1 1 1 1
        0 0 1 0 1 1 1   (23)
      + 0 1 0 1 1 0 1   (45)
        1 0 0 0 1 0 0   (68)
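The worked addition can be cross-checked with Python's built-in base-2 conversion (an illustrative check, not part of the original text):

```python
# Checking the worked addition above: 0010111 (23) + 0101101 (45).
a = int("0010111", 2)         # 23
b = int("0101101", 2)         # 45
print(format(a + b, "07b"))   # 1000100, which is 68 in decimal
```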
Binary Subtraction
In binary subtraction, two binary numbers are subtracted. While performing subtraction, the smaller number must be subtracted from the larger number. The rules of binary subtraction are as follows:
Case  A  B  Difference  Borrow
I     0  0  0           0
II    1  0  1           0
III   1  1  0           0
IV    0  1  1           1
For example, let's perform binary subtraction on the two numbers 101101 (45 in decimal) and 10111 (23 in decimal):
  0 1 0 1 1 0 1   (45)
– 0 0 1 0 1 1 1   (23)
  0 0 1 0 1 1 0   (22)
Here, after a borrow, a 0 becomes 10 in binary, which is equal to 2 in decimal; therefore, 2 – 1 = 1, i.e., 10 – 1 = 1 in binary.
Binary Multiplication
In binary multiplication, the multiplication of binary digits is done in the same way as for decimal numbers.
For example, let's perform binary multiplication on the two numbers 0101101 (45 in decimal) and 0001100 (12 in decimal):
        0 1 0 1 1 0 1   (45)
      × 0 0 0 1 1 0 0   (12)
        0 0 0 0 0 0 0
      0 0 0 0 0 0 0
    0 1 0 1 1 0 1
  0 1 0 1 1 0 1
1 0 0 0 0 1 1 1 0 0     (540)
Binary Division
In binary division of two numbers, the larger number is divided by the smaller. The rules of subtraction and multiplication are obeyed while performing the division operation. The rules of binary division are the same as the rules of decimal division. Let's perform binary division on the two numbers 101101 (45 in decimal) and 101 (5 in decimal):
        1 0 0 1         (quotient, 9)
101 ) 1 0 1 1 0 1
     –1 0 1
      0 0 0 1 0 1
         – 1 0 1
           0 0 0
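The worked subtraction, multiplication and division examples can be cross-checked in the same way (an illustrative check using Python's built-in base-2 conversion):

```python
# Checking the worked subtraction, multiplication and division examples above.
diff = int("0101101", 2) - int("0010111", 2)   # 45 - 23
prod = int("0101101", 2) * int("0001100", 2)   # 45 x 12
quot = int("101101", 2) // int("101", 2)       # 45 / 5

print(format(diff, "07b"))   # 0010110 (22 in decimal)
print(format(prod, "b"))     # 1000011100 (540 in decimal)
print(format(quot, "b"))     # 1001 (9 in decimal)
```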
A computer's memory stores numbers and character operands, as well as instructions. The memory is made up of millions of storage cells, each of which can store a single bit of data with a value of 0 or 1.
Bits are rarely handled separately because they represent such a small quantity of information. The
standard method is to deal with them in groups of a predetermined size. The memory is designed in
such a way that a group of n bits can be stored or retrieved in a single, basic operation for this purpose.
A word of information is made up of n bits, with n being the length of the word.
[n bits per word: first word, second word, ..., i-th word, ..., last word]
Figure 7: Memory Structure
In most cases, a number takes only one word. It may be retrieved from memory by providing its word address. Individual characters can also be retrieved using their byte addresses.
Many applications need the handling of variable-length character strings. The address of the byte containing the string's first character is used to identify the start of the string. The string's characters are stored in successive byte positions. The length of the string can be indicated in two ways: a special control character with the meaning "end of string" is used as the last character in the string, or a number representing the string's length in bytes can be put in a separate memory location or CPU register.
Memory is made up of several millions of storage cells, each of which can hold one bit of information. In most cases, data is accessed in n-bit chunks, where n stands for the word length. Figure 8 shows a 32-bit word length:
[32 bits: b31 b30 ... b1 b0]
Addresses are required to access information from memory, whether one word or one byte (8 bits) at a time. A k-bit address can refer to 2^k memory locations, namely 0 to (2^k – 1), which constitute the memory space. Assigning unique addresses to individual bit positions in the memory is impracticable.
The most feasible assignment is for consecutive addresses to refer to successive byte locations in memory (byte-addressable memory). The addresses of bytes are 0, 1, 2, ... If a word is 32 bits long, successive words are located at addresses 0, 4, 8, ...
There are two methods for assigning byte addresses across words, namely, big-endian and little-endian. In big-endian assignment, lower byte addresses are used for the more significant bytes of the word. Little-endian is the antithesis of big-endian: lower byte addresses are used for the less significant bytes of the word. Figure 9 shows the big-endian and little-endian assignments:
Word address    Big-endian byte addresses             Little-endian byte addresses
0               0 1 2 3                               3 2 1 0
4               4 5 6 7                               7 6 5 4
...             ...                                   ...
2^k – 4         2^k – 4, 2^k – 3, 2^k – 2, 2^k – 1    2^k – 1, 2^k – 2, 2^k – 3, 2^k – 4
When words begin at a byte address that is a multiple of the number of bytes in a word, they are said to
be aligned in memory.
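The two byte orderings can be illustrated with Python's standard struct module; the 32-bit value 0x01020304 is an arbitrary example:

```python
import struct

# The 32-bit value 0x01020304 stored under each byte-ordering convention.
word = 0x01020304
big = struct.pack(">I", word)      # big-endian: most significant byte first
little = struct.pack("<I", word)   # little-endian: least significant byte first

print(big.hex())     # 01020304
print(little.hex())  # 04030201
```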
The memory stores both programme instructions and data operands. The processor control circuits must cause the word (or words) holding the instruction to be moved from memory to the processor in order for it to be executed.
The processor initiates a Load operation by sending the address of the target location to the memory and requesting that its contents be read. The data stored at that address is read from the memory and sent to the CPU.
The Store operation copies data from the processor to a specified memory location, overwriting the previous contents of that location. The processor transfers the address of the desired location, as well as the data to be put into that location, to the memory.
In a single operation, an item of information of one word or one byte can be transferred between the CPU and the memory. This is a transfer between a CPU register and the main memory.
Load (or Read or Fetch):
zz Makes a copy of the content of a memory location; the memory's content remains unchanged.
zz The processor sends the address of the location to be loaded.
zz The usage of registers is possible.
Store (or Write):
zz Overwrites the content of a memory location.
zz The processor sends the address and the data.
zz Registers can be used.
zz I/O transfers
There are two types of notation that are used in instruction sequencing. These notations are as follows:
zz Register transfer notation: Information is transferred from one computer location to another. Memory locations, CPU registers, or registers in the I/O subsystem are all possible places that may be implicated in such transfers. We usually refer to a location by a symbolic name that represents its hardware binary address.
zz Assembly language notation: Another way to express machine instructions and programmes is to use assembly language notation. The following statement, for example, specifies an instruction that causes a transfer from memory address LOC to processor register R1:
Move LOC, R1
The old value of register R1 is overwritten when this instruction is executed, while the contents of LOC remain unaltered.
The assembly language line below may be used to specify the second example of adding two integers in processor registers R1 and R2 and storing their sum in R3:
Add R1, R2, R3
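The effect of the two assembly statements Move LOC, R1 and Add R1, R2, R3 can be mimicked with a small simulation; the dictionaries standing in for memory and registers, and their initial contents, are invented for the example:

```python
# Register-transfer semantics of "Move LOC, R1" and "Add R1, R2, R3".
memory = {"LOC": 7}                      # assumed initial memory contents
regs = {"R1": 0, "R2": 10, "R3": 0}      # assumed initial register contents

# Move LOC, R1  =>  R1 <- [LOC]; LOC is unchanged, R1's old value is overwritten
regs["R1"] = memory["LOC"]

# Add R1, R2, R3  =>  R3 <- [R1] + [R2]
regs["R3"] = regs["R1"] + regs["R2"]

print(regs["R3"])  # 17
```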
1.5 Summary
zz A computer is a high-speed electronic calculating machine that accepts digital input, processes it
according to internally stored instructions (Programs), and outputs the result on a display device.
zz In general, computer architecture encompasses three areas of computer design: computer hardware,
instruction set architecture, and computer organization.
zz A computer consists of five functional units: an input unit, an output unit, a memory unit, arithmetic
and logic unit, and a control unit.
zz The memory unit holds the programme instructions (code), data, and computing results, among other things.
zz A programme containing a list of instructions is stored in the memory to accomplish a certain task.
zz A bus is a collection of lines that act as a connecting path for multiple devices (one bit per line).
zz Clock is a timing signal that controls processor circuits.
zz Machine instructions are commands or programmes encoded in machine code that can be recognised and executed by a machine (computer).
zz Representing (or encoding) a number signifies expressing the number in binary form.
zz Binary arithmetic includes binary addition, subtraction, multiplication and division.
1.6 Glossary
zz Computer: An electronic calculating machine that accepts digital input, processes it according to internally stored instructions (programs), and outputs the result on a display device.
zz Programme: A list of instructions stored in the memory to accomplish a certain task.
zz Bus: A collection of lines that act as a connecting path for multiple devices (one bit per line).
zz Memory Operation: An operation, such as Load or Store, on the programme instructions and data operands that the memory stores.
1.7 Self-Assessment Questions
2. Which of the following is a small, portable computer that can be powered by a power supply or a
battery?
a. Laptop computer b. Desktop computer
c. Workstation d. Super computer
3. Which of the following computer is also known as Personal Digital Assistant?
a. Laptop computer b. Mainframe computer
c. Handheld computer d. Super computer
4. A computer's memory stores:
a. Number b. Character operands
c. Instructions d. All of these
5. A __________ is a collection of lines that act as a connecting path for multiple devices (one bit per line).
a. Circuit b. Bus
c. Memory d. Processor
6. Clock cycles are regular time periods created by the ______________.
a. Programmer b. Circuit programmer
c. Clock designer d. CPU
7. Which of the following techniques, involving overlapping the execution of successive instructions, can result in significant speed gains?
a. Clock rate b. Pipelining
10. Which of the following is used to communicate the processed results to the rest of the world?
4. Encoding a number signifies expressing the number in binary form. What are the three binary
representations of integers?
5. The memory stores both programme instructions and data operands. Discuss the concept of memory
operations.
1.8 Answers to Self-Assessment Questions
Q. No. Answer
1. b. SCSI bus
2. a. Laptop computer
3. c. Handheld computer
4. d. All of these
5. b. Bus
6. c. Clock designer
7. b. Pipelining
8. a. Machine code
9. d. All of these
10. b. Output unit
11. a. Single bus
1. Micro Computer: A personal computer is one that is designed to satisfy the needs of a single person. It gives access to a wide range of computer applications, including word processing, photo editing, e-mail, and the internet. Refer to Section Basic Computer Structure
2. In its most basic form, a computer consists of five functional units: an input unit, an output unit, a
memory unit, arithmetic and logic unit, and a control unit. Refer to Section Basic Computer Structure
Y
3. A machine instruction is a set of bytes in memory that instructs the processor to carry out a certain task. The CPU goes through the machine instructions in main memory one by one, performing one machine action for each. A machine language programme is a collection of machine instructions stored in memory. Refer to Section Machine Instructions
4. There are several ways to represent integers in binary form. The three binary representations of integers are listed below:
Signed and magnitude representation
1’s (or one’s) complement
2’s (or two’s) complement
Refer to Section Machine Instructions
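The three representations listed above can be illustrated with a short Python sketch (the code and function names below are our own illustration, not part of the original text), shown here for 4-bit words:

```python
# Illustrative sketch: the three binary representations of integers,
# shown for 4-bit words. Function names are invented for this example.
def sign_magnitude(n, bits=4):
    # Highest bit is the sign; remaining bits hold the magnitude.
    sign = '1' if n < 0 else '0'
    return sign + format(abs(n), f'0{bits - 1}b')

def ones_complement(n, bits=4):
    # A negative value is formed by inverting every bit of +|n|.
    if n >= 0:
        return format(n, f'0{bits}b')
    return format((1 << bits) - 1 + n, f'0{bits}b')

def twos_complement(n, bits=4):
    # A negative value is 2^bits + n, i.e. the 1's complement plus one.
    return format(n & ((1 << bits) - 1), f'0{bits}b')

print(sign_magnitude(-3))   # 1011
print(ones_complement(-3))  # 1100
print(twos_complement(-3))  # 1101
```

Note how the 2's-complement pattern for −3 is the 1's-complement pattern plus one, which is why 2's complement has a single representation of zero.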
JGI JAIN DEEMED-TO-BE UNIVERSITY
Computer Organization and Architecture
5. The processor control circuits must cause the word (or words) holding the instruction to be moved from memory to the processor in order for it to be executed. Operands and results must also be transferred between the memory and the CPU. Hence, two basic memory operations need to be executed: load (or read or fetch) and store (or write). Refer to Section Memory Operations
zz https://www.studocu.com/in/document/psg-college-of-technology/computer-architecture/m-morris-mano-solution-manual-computer-system-architecture/10775236
zz http://sites.alaqsa.edu.ps/ekhsharaf/wp-content/uploads/sites/333/2016/03/machine-instruction-and-programs.pdf
1.10 Topics for Discussion Forums
zz Do online research on the basics of computers and make a presentation giving in-depth knowledge of the internal working, structuring, and implementation of a computer system. Also, discuss the concepts covered in the presentation with your classmates.
UNIT
02
Machine Instructions and Programs
Names of Sub-Units
Introduction to Machine Instructions and Programs, Addressing Modes, Assembly Language, Basic Input and Output Operations, Subroutines, Additional Instructions, Encoding of Machine Instructions.
Overview
This unit begins by discussing machine instructions and programs. Next, the unit discusses the addressing modes and assembly language. Further, the unit explains the basic input and output operations. This unit also discusses subroutines. Towards the end, the unit discusses the encoding of machine instructions.
Learning Objectives
aa Describe how machine instructions, data, and programmes are represented in assembly language
aa Discuss the Input/Output activities that are controlled by a programme
Learning Outcomes
aa Assess the machine instructions and programme execution
Pre-Unit Preparatory Material
aa http://www.mhhe.com/engcs/electrical/hamacher/5e/graphics/ch02_025-102.pdf
2.1 Introduction
Machine instructions and operand addressing information are represented by symbolic names in assembly language. The instruction set architecture (ISA) of a CPU refers to the whole instruction set. This unit presents machine instructions and operand addressing techniques that are common in commercial processors. We provide a sufficient number of instructions and addressing techniques to present complete, realistic programmes for simple tasks. The assembly language is used to specify these generic programmes.
We’ve now looked at a few simple assembly language applications. A programme, in general, works with data stored in the computer’s memory. These records can be arranged in a number of ways. We can keep track of pupils’ names by writing them down on a list. We may organise this information in the form of a table if we wish to connect information with each name, such as phone numbers or grades in particular courses. Programmers use data structures to represent the data utilised in computations; lists, linked lists, arrays, and queues are examples.
Constants, local and global variables, pointers, and arrays are often used in high-level programming languages. When converting a high-level language programme into assembly language, the compiler must be able to implement these structures utilising the capabilities given in the instruction set of the machine on which the programme will be run. Addressing modes relate to the many methods in which an operand’s location is stated in an instruction.
UNIT 02: Machine Instructions and Programs JGI JAIN
DEEMED-TO-BE UNIVERSITY
Auto increment (Ri)+ : EA = [Ri]; Increment Ri
Auto decrement -(Ri) : Decrement Ri; EA = [Ri]
The description of these addressing modes is as follows:
zz Register mode: The operand is the contents of a processor register; the register’s name (address) is
specified in the instruction.
zz Absolute mode: The operand is stored in memory, and the location’s address is specified directly in the instruction. (This mode is known as Direct in various assembly languages.) For example:
Move LOC,R2
zz Immediate mode: The operand is given explicitly in the instruction. Obviously, the Immediate mode is only used to specify the value of a source operand. In assembly languages, using a subscript to signify the Immediate mode is not practical; a frequent convention is to place a sharp sign (#) in front of a value to signal that it is to be used as an immediate operand. For example, the following instruction places the value 200 in register R0:
Move #200,R0
In high-level language applications, constant values are commonly employed. Consider the following example:
A = B + 6
This statement contains the constant 6. Assuming A and B have been declared as memory locations, it can be compiled as follows:
Move B,R1
Add #6,R1
Move R1,A
In assembly language, constants are also used to increment a counter, test for a bit pattern, and so on.
zz Indirect mode: The contents of a register or memory location whose address is specified in the
instruction are the operand’s effective address.
The indirect method is divided into two types based on where the effective address is held:
1. Register indirect: In this method, the effective address is kept in the register, and the matching
register name is kept in the instruction’s address field. To access the data, one register reference
and one memory reference are required.
2. Memory indirect: In this method, the effective address is stored in memory, and the matching memory address is stored in the instruction’s address field. To access the data, two memory references are necessary.
zz Index mode: By adding a constant value to the contents of a register, the operand’s effective address is produced.
Example: MOV AX, [SI+05]
zz Relative mode: The effective address is determined by the Index mode using the programme counter instead of the general-purpose register Ri.
zz Auto increment mode: The contents of a register provided in the instruction are the operand’s effective address. After accessing the operand, the contents of this register are automatically incremented to refer to the next item in a list. For example:
Add R1, (R2)+ // R1 = R1 + M[R2]; R2 = R2 + d
zz Auto decrement mode: The contents of a register provided in the instruction are automatically decremented before being utilised as the operand’s effective address. For example:
Add R1, -(R2) // R2 = R2 - d; R1 = R1 + M[R2]
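As a minimal sketch, the way each of the modes above arrives at an operand can be simulated in Python. The register names, memory contents, and increment step `WORD` below are invented for illustration and do not come from any real ISA:

```python
# Illustrative simulation of operand fetch under the addressing modes
# described above. All values and names here are made up for the example.
memory = {100: 55, 104: 66, 200: 77}
regs = {'R1': 100, 'R2': 200}
WORD = 4  # assumed increment/decrement step d

def operand(mode, value):
    if mode == 'immediate':          # operand given in the instruction
        return value
    if mode == 'register':           # operand is the register's contents
        return regs[value]
    if mode == 'absolute':           # EA given directly in the instruction
        return memory[value]
    if mode == 'register_indirect':  # EA = [Ri]
        return memory[regs[value]]
    if mode == 'index':              # EA = [Ri] + constant
        reg, offset = value
        return memory[regs[reg] + offset]
    if mode == 'autoincrement':      # EA = [Ri]; then Ri is incremented
        ea = regs[value]
        regs[value] += WORD
        return memory[ea]
    if mode == 'autodecrement':      # Ri is decremented; then EA = [Ri]
        regs[value] -= WORD
        return memory[regs[value]]
    raise ValueError(mode)

print(operand('index', ('R1', 4)))     # memory[100 + 4] -> 66
print(operand('autoincrement', 'R1'))  # memory[100] -> 55, R1 becomes 104
```

The autoincrement and autodecrement branches make the ordering explicit: the increment happens after the access, while the decrement happens before it.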
2.3 Assembly Language
Patterns of 0s and 1s are used to represent machine instructions. Such patterns are difficult to deal with when discussing or preparing programmes. As a result, we refer to the patterns by symbolic names. So far, we have utilised common terms such as Move, Add, Increment, and Branch to describe the matching binary code patterns for the instruction operations. When creating programmes for a certain machine, such terms are usually replaced with abbreviations called mnemonics, such as MOV, ADD, INC, and BR. In the same way, we use the notation R3 to refer to register 3 and LOC to refer to a memory location. A programming language, often known as an assembly language, is made up of a comprehensive collection of such symbolic names and rules for their use. The syntax of the language is a set of rules for employing mnemonics in the definition of full instructions and programmes.
A programme called an assembler can automatically transform programmes written in an assembly
language into a series of machine instructions. The assembler programme is one of the utility
applications included in the system software. The assembler, like any other programme, is stored in
the computer’s memory as a series of machine instructions. A user programme is typically typed into
a computer through the keyboard and saved in memory or on a magnetic drive. At this stage, the
user programme is nothing more than a series of alphanumeric characters on a line. When you run
the assembler programme, it reads the user programme, analyses it, and then creates the machine
language programme you want. The latter comprises patterns of 0s and 1s that indicate the computer’s
instructions to be performed. A source programme is the user programme in its original alphanumeric
text format, while an object programme is the constructed machine language programme.
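The source-to-object translation described above can be sketched in a few lines of Python. This is a toy illustration of what an assembler does, not a real assembler: the mnemonics are taken from the text, but the numeric OP-code values are invented:

```python
# A toy sketch of an assembler's core job: translating mnemonics into
# numeric OP codes. The opcode values below are invented for illustration.
OPCODES = {'MOVE': 0x01, 'ADD': 0x02, 'INC': 0x03, 'BR': 0x04}

def assemble_line(source_line):
    # Split a line such as "ADD #5,R3" into its mnemonic and operand field.
    mnemonic, _, operands = source_line.strip().partition(' ')
    return (OPCODES[mnemonic.upper()], operands.strip())

program = ["MOVE R0,SUM", "ADD #5,R3"]     # the source programme
object_code = [assemble_line(line) for line in program]
print(object_code)  # [(1, 'R0,SUM'), (2, '#5,R3')]
```

A real assembler also resolves symbolic names such as SUM and LOC into binary addresses and encodes the operand fields; here only the mnemonic lookup is shown.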
A computer’s assembly language may or may not be case sensitive, that is, it may or may not differentiate
between capital and lower case letters. To increase the readability of the text, we will use capital letters
to designate all names and labels in our examples. We’ll write a Move instruction as follows:
MOVE R0,SUM
The binary pattern, or OP code, representing the operation executed by the instruction is represented by the mnemonic MOVE. This mnemonic is translated into the binary OP code that the computer understands by the assembler. At least one blank space character follows the OP-code mnemonic. Then comes the information that defines the operands. The source operand in our case is in register R0. This information is followed, with no intervening blanks, by the specification of the destination operand, separated from the source operand by a comma. The destination operand is in the memory location SUM, which is represented by its binary address.
Because there are numerous different ways to define operand locations, the assembly language must declare which one is being utilised. To represent the Absolute mode, a numerical value or a name used alone, such as SUM in the previous instruction, might be used. An immediate operand is generally denoted by a sharp sign. As a result, the instruction:
ADD #5,R3
adds the number 5 to the contents of register R3 and places the result back in R3. The sharp sign isn’t the only technique to indicate that the mode is Immediate. In various assembly languages, the desired addressing mode is specified in the OP-code mnemonic itself; in this case, distinct OP-code mnemonics are used for different addressing modes of the same instruction. The preceding Add instruction, for example, might be written as:
ADDI 5,R3
The suffix I in the mnemonic ADDI indicates that the source operand is supplied in the Immediate addressing mode. Indirect addressing is often defined by placing parentheses around the name or
symbol indicating the operand’s pointer. If the number 5 is to be stored in a memory location whose
address is maintained in register R2, for example, the desired action can be expressed as:
MOVE #5,(R2)
or
MOVEI 5,(R2)
2.4 Basic Input and Output Operations
An efficient mode of communication between the central system and the outside environment is provided by a computer’s I/O subsystem. It is in charge of the computer system’s entire input-output operations.
2.4.1 Peripheral Devices
Peripheral devices are input or output devices that are connected to a computer. These devices, which are regarded as part of the computer system, are designed to read information into or out of the memory unit in response to commands from the CPU; they are also simply called peripherals. For example, keyboards, display units and printers are common peripheral devices.
There are three types of peripherals:
1. Input peripherals: These devices allow users to enter data from the outside world into the computer. For instance, a keyboard, a mouse, and so on.
2. Output peripherals: These devices allow data to be sent from the computer to the outside world. For instance, a printer, a monitor, and so on.
3. Input-output peripherals: These devices allow both input (from the outside world to the computer) and output (from the computer to the outside world). For instance, a touch screen.
In programmed I/O, data transfer is initiated by instructions written in a computer programme; a programme command triggers the transmission of each data item. Typically, the software is in charge of data transmission between the CPU and the peripherals. In the programmed I/O technique, the CPU remains in a programme loop until the I/O unit indicates that it is ready for data transfer. This is a time-consuming procedure since it wastes the processor’s time.
Interrupt-initiated I/O can be used to solve this problem. The interface creates an interrupt when it
determines that the peripheral is ready for data transmission. When the CPU receives an interrupt
signal, it stops what it’s doing and handles the I/O transfer before returning to its prior processing
activity.
The speed of transmission can be improved further by removing the CPU from the transfer path and allowing the peripheral device to control the memory buses directly; this method is called direct memory access (DMA). The CPU receives an interrupt from the DMA controller when the transfer is finished.
2.5 Subroutines
In a given programme, it’s common to have to repeat a subtask several times with various data values. A subroutine is the term used to describe such a subtask.
It is feasible to include the block of instructions that make up a subroutine at any point in the programme where it is required. However, to save memory, just one copy of the instructions that make up the subroutine is stored in memory, and any programme that needs to utilise it simply branches to the beginning of the subroutine. We say that a programme is invoking a subroutine when it branches to it. Call is the name of the instruction that performs this branch action.
The calling programme must continue execution once a subroutine has been run, commencing immediately after the instruction that called the subroutine. By performing a Return instruction, the subroutine is said to return to the programme that called it. Because the subroutine may be called from several locations in a calling programme, it must be possible to return to the correct position. While the Call instruction is being executed, the updated PC points to the place where the calling programme resumes execution. As a result, the Call instruction must preserve the contents of the PC in order to enable this return.
The subroutine linkage technique is the mechanism through which a computer allows you to call and return from subroutines. The most basic way of linking subroutines is to save the return address in a specified place, which may be a register devoted to this function; such a register is called the link register. The Return instruction returns to the caller programme by branching indirectly through the link register after the subroutine has completed its work.
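The link-register linkage just described can be sketched as a small Python simulation. The class and method names are hypothetical; the point is only the order of operations in Call and Return:

```python
# Illustrative sketch of subroutine linkage through a link register:
# Call saves the updated PC in the link register, and Return branches
# back indirectly through it. All names here are invented.
class Processor:
    def __init__(self):
        self.pc = 0       # programme counter
        self.link = None  # link register

    def call(self, subroutine_addr):
        # The updated PC already points at the instruction after Call.
        self.link = self.pc + 1
        self.pc = subroutine_addr

    def ret(self):
        # Branch indirectly through the link register.
        self.pc = self.link

cpu = Processor()
cpu.pc = 10          # the Call instruction sits at address 10
cpu.call(200)        # branch to the subroutine at address 200
assert cpu.pc == 200
cpu.ret()            # resume at the instruction after the Call
assert cpu.pc == 11
```

Note that a single link register supports only one level of call: if the subroutine itself calls another subroutine, the first return address must be saved elsewhere (typically on a stack) before the nested Call overwrites the link register.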
The Call instruction is a special branch instruction that performs the following tasks:
zz Store the contents of the PC in the link register
zz Branch to the target address specified in the instruction
2.6 Additional Instructions
A Move instruction with a zero immediate operand can also be used to replace Clear. As a result, just 8 instructions would have sufficed for our needs. However, considerable redundancy in real machine instruction sets is not uncommon. Certain basic operations may typically be carried out in a variety of ways, and some options could be more effective than others. We’ll go over a couple more key instructions that are included in most instruction sets in this section.
zz Logic instructions: The basic building blocks of digital circuits are logic operations such as AND, OR, and NOT applied to individual bits. It’s also useful to be able to conduct logic operations in software, which is done using instructions that apply these operations separately and in parallel to all bits of a word or byte. For instance, consider the instruction:
Not dst
zz Shift and rotate instructions: Many applications demand that the bits of an operand be shifted right or left by a certain number of bit positions. Whether the operand is a signed integer or some more generic binary-coded information determines the specifics of how the shifts are executed. We employ a logical shift for generic operands. For a number, we utilise an arithmetic shift, which preserves the integer’s sign.
zz Logical shifts: Two logical shift instructions are required, one for shifting left (LShiftL) and the other for shifting right (LShiftR). These instructions shift an operand over the number of bit positions given in the instruction’s count operand. A logical left shift instruction might take the form:
LShiftL count,dst
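The difference between the logical and arithmetic right shifts described above can be demonstrated in Python. The 8-bit word size and the function names are assumptions made for this sketch:

```python
# Illustrative contrast between logical and arithmetic right shifts,
# on 8-bit operands. The bit width of 8 is an assumption for the example.
def lshiftr(value, count, bits=8):
    # Logical right shift: vacated positions are filled with 0s.
    return (value & ((1 << bits) - 1)) >> count

def ashiftr(value, count, bits=8):
    # Arithmetic right shift: the sign bit is replicated, so a
    # negative 2's-complement number keeps its sign.
    mask = (1 << bits) - 1
    v = value & mask
    if v & (1 << (bits - 1)):   # sign bit is set
        v |= ~mask              # extend the sign into Python's wider int
    return (v >> count) & mask

print(format(lshiftr(0b10110000, 2), '08b'))  # 00101100
print(format(ashiftr(0b10110000, 2), '08b'))  # 11101100
```

On the same bit pattern, the logical shift fills the high-order positions with 0s, while the arithmetic shift replicates the sign bit, which is exactly why the arithmetic form is used for signed integers.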
2.7 Encoding of Machine Instructions
We have introduced a variety of instructions and referred to them as “machine instructions” on several occasions. Actually, the format in which the instructions were presented is similar to that used in assembly languages, except that we attempted to avoid using acronyms for the various operations, which are difficult to remember and are likely to be peculiar to a certain commercial processor. An instruction must be encoded in a compact binary pattern in order to be executed in a processor. Machine instructions are the correct term for such encoded instructions. Assembly instructions are those that employ symbolic names and acronyms.
The assembler software is used to transform the assembly instructions into machine instructions. We’ve seen instructions for adding, subtracting, moving, shifting, rotating, and branching. These instructions can employ a variety of operands, including 32-bit and 8-bit integers, as well as 8-bit ASCII-encoded characters. An encoded binary pattern known as the OP code for the given instruction can be used to specify the type of operation to be performed and the type of operands to be used. Assume that 8 bits are set aside for this purpose, allowing 256 distinct instructions to be defined. The remaining 24 bits are used to specify the rest of the information. Consider, for example, the instruction:
Add R1, R2
In addition to the OP code, the registers R1 and R2 must be specified. If the CPU has 16 registers, each one
requires four bits to identify. For each operand, additional bits are required to signal that the Register
addressing mode is being utilised.
Next, consider the instruction:
Move 24(R0), R5
16 bits are needed to represent the OP code and the two registers, as well as a few bits to indicate that
the source operand utilises Index addressing mode and that the index value is 24.
The shift instruction is:
LShiftR #2, R0
The move instruction is:
Move #$3A, R1
In addition to the 18 bits needed to describe the OP code, the addressing modes, and the registers, the immediate values 2 and #$3A must be specified. The size of an immediate operand is thus limited to what can be expressed in 14 bits. Consider next the branch instruction:
Branch >0 LOOP
The OP code is 8 bits again, leaving 24 bits to define the branch offset. Because the offset is a 2’s-complement integer, the branch target address must be within 2^23 bytes of the branch instruction’s position. To branch to an instruction beyond this range, an alternative addressing mode, such as Absolute or Register Indirect, must be used. Branch instructions that make use of these modes are usually called Jump instructions.
In all these examples, the instructions can be encoded in a 32-bit word, as shown in Figure 1:
8 bits: OP code | 7 bits: Source | 7 bits: Dest | 10 bits: Other info
Figure 1 depicts a possible format. There is an 8-bit OP-code field and two 7-bit fields for specifying the source and destination operands. Each 7-bit field identifies the addressing mode and the register involved (if any). The “Other info” field allows us to specify additional information that may be needed, such as an index value or an immediate operand.
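The one-word format of Figure 1 can be sketched as bit-packing in Python. The field widths (8/7/7/10) come from the text; the field values used in the example are invented:

```python
# Packing and unpacking a 32-bit instruction word in the Figure 1 layout:
# an 8-bit OP code, two 7-bit operand fields, and 10 bits of other info.
def encode(opcode, src, dst, other=0):
    assert opcode < 2**8 and src < 2**7 and dst < 2**7 and other < 2**10
    return (opcode << 24) | (src << 17) | (dst << 10) | other

def decode(word):
    # Recover the four fields by shifting and masking.
    return (word >> 24, (word >> 17) & 0x7F, (word >> 10) & 0x7F, word & 0x3FF)

# Hypothetical field values, purely for illustration:
word = encode(opcode=0x12, src=0x05, dst=0x0A, other=24)
assert decode(word) == (0x12, 0x05, 0x0A, 24)
```

The assertion in `encode` captures the point made in the surrounding text: each field can only express values that fit its width, which is why a full 32-bit address or large immediate cannot be squeezed into this one-word format.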
But what if we wish to use the Absolute addressing mode to specify a memory operand?
The instruction
Move R2, LOC
The OP code, the addressing modes, and the register all require 18 bits. This leaves 14 bits to express the address corresponding to LOC, which is plainly insufficient. One solution is to use a second word of the instruction, as in:
And #$FF000000, R2
in which case the second word gives a full 32-bit immediate operand. If we wish to use the Absolute addressing mode to provide two operands for an instruction, we can do so. For example:
Move LOC1, LOC2
The operands’ 32-bit addresses must then be represented using two extra words. This method yields instructions of varying lengths, depending on the number of operands and addressing modes employed. Using several words, we may create fairly complicated instructions, quite similar to operations in high-level programming languages. The name complex instruction set computer (CISC) has been applied to processors that employ this sort of instruction set. The requirement that an instruction fit in a single word has resulted in a type of computer known as a reduced instruction set computer (RISC). Other constraints imposed by the RISC approach include the requirement that all data manipulation be done on operands held in processor registers. Because of this limitation, an addition involving a memory operand would need a two-instruction sequence:
Move (R3), R1
Add R1, R2
If the Add instruction only needs to identify the two registers, only a part of a 32-bit word is required. As a result, we may be able to provide a more powerful instruction with three operands:
Add R1, R2, R3
2.8 Conclusion
zz Machine instructions and operand addressing information are represented by symbolic names in assembly language.
zz The assembly language is used to specify these generic programmes.
zz Data structures are used by programmers to represent the data utilised in computations.
zz Constants, local and global variables, pointers, and arrays are often used in high-level programming languages.
zz Addressing modes relate to the many methods in which an operand’s location is stated in an instruction.
zz Peripheral devices are input or output devices that are connected to a computer.
zz Data transfers initiated by I/O instructions written in a computer programme are called programmed I/O.
zz The CPU remains in the programme loop in the programmed I/O technique until the I/O unit indicates that it is ready for data transfer.
zz The speed of transmission can be improved by removing the CPU from the transfer path and allowing the peripheral device to control the memory buses directly. DMA is the name for this method.
zz An instruction must be encoded in a compact binary pattern in order to be executed in a processor. Machine instructions are the correct term for such encoded instructions.
zz The assembler software is used to transform the assembly instructions into machine instructions.
zz An encoded binary pattern known as the OP code for the given instruction can be used to specify the type of operation to be performed and the type of operands to be used.
2.9 Glossary
zz Indirect mode: The contents of a register or memory location whose address is specified in the instruction are the operand’s effective address.
zz Index mode: By adding a constant value to the contents of a register, the operand’s effective address is produced.
zz Relative mode: The effective address is determined by the Index mode using the programme counter instead of the general-purpose register Ri.
zz Auto increment mode: The contents of a register provided in the instruction are the operand’s effective address. The contents of this register are automatically increased to refer to the next item in a list after accessing the operand.
zz Auto decrement mode: The contents of a register provided in the instruction are decremented automatically before being utilised as the operand’s effective address.
zz Register mode: The operand is the contents of a processor register; the register’s name (address) is specified in the instruction.
zz Absolute mode: The operand is stored in memory, and the location’s address is specified directly in the instruction.
zz Immediate mode: The operand is given explicitly in the instruction.
a. another register b. another memory location
c. other operands d. all of the mentioned
6. In a machine instruction format, S-bit is the __________.
a. status bit b. sign bit
c. sign extension bit d. none of the mentioned
7. The bit that the ‘REP’ instruction uses is __________.
a. W-bit b. S-bit
c. V-bit d. Z-bit
a. Autodecrement mode b. Register mode
c. Absolute mode d. Immediate mode
10. Instructions that transfer control to a preset address or an address specified in the instruction are referred to as:
a. direct addressing mode b. register addressing mode
c. register relative addressing mode d. register indirect addressing mode
B. Essay Type Questions
1. Addressing modes relate to the many methods in which an operand’s location is stated in an instruction. Discuss the different types of addressing modes.
2. What do you understand by assembly language?
3. Peripheral devices are input or output devices that are connected to a computer. Discuss the different types of peripheral devices.
4. Write a brief note on direct memory access.
5. Discuss the shift and rotate instructions.
2.11 Answers and Hints for Self-Assessment Questions
Q. No. Answer
1. c. branch instructions
2. c. Operation code field & operand field
3. b. 1 byte
4. a. 2 bytes
5. d. all of the mentioned
7. d. Z-bit
8. c. 16 bits
9. c. Absolute mode
10. b. control transfer instructions
11. d. control transfer & branch instructions
12. c. immediate
13. c. immediate addressing mode
14. b. direct addressing mode
15. d. register indirect addressing mode
1. Absolute mode: The operand is stored in memory, and the location’s address is specified directly in the instruction. (This mode is known as Direct in various assembly languages.) Refer to Section Addressing Modes
2. A programming language, often known as an assembly language, is made up of a comprehensive collection of symbolic names, such as mnemonics, and rules for their use. Refer to Section Assembly Language
3. There are three types of peripherals:
Input peripherals: These devices allow users to enter data from the outside world into the computer. For instance, a keyboard, a mouse, and so on.
Refer to Section Basic Input and Output Operations
4. The speed of transmission can be improved by removing the CPU from the transfer path and allowing the peripheral device to control the memory buses directly. DMA is the name for this method. The interface transfers data to and from the memory through the memory bus in this case. A DMA controller is responsible for data transmission between peripherals and the memory unit. Refer to Section Basic Input and Output Operations
5. Many applications demand that the bits of an operand be shifted right or left by a certain amount
of bit positions. Whether the operand is a signed integer or some more generic binary-coded information determines the specifics of how the shifts are executed. We employ a logical shift for generic operands. We utilise an arithmetic shift for a number, which keeps the integer’s sign. Refer to Section Additional Instructions
zz https://www.studocu.com/in/document/psg-college-of-technology/computer-architecture/m-morris-mano-solution-manual-computer-system-architecture/10775236
zz https://www.educba.com/what-is-assembly-language/
zz Discuss with your classmates the manner in which sequences of instructions are transferred from memory into the processor.
UNIT
03
Input/Output (I/O) Organisation
Names of Sub-Units
Introduction to Input/Output (I/O) Organisation, Accessing I/O Devices, Interrupts: Interrupt Hardware, Direct Memory Access, Buses, Standard I/O Interfaces: PCI Bus, SCSI Bus, Universal Serial Bus (USB).
Overview
This unit begins by explaining how input/output devices are accessed. Then, the unit discusses the interrupt hardware and direct memory access. Next, the unit explains buses. Towards the end, the unit discusses the standard I/O interfaces.
Learning Objectives
Learning Outcomes
aa Assess the knowledge about the standard input/output interfaces
Pre-Unit Preparatory Material
aa http://www.cse.iitm.ac.in/~vplab/courses/comp_org/Input_Output_Organization_11.pdf
3.1 Introduction
An efficient means of communication between the central system and the outside world is provided by a computer’s I/O subsystem. It is in charge of the computer system’s input/output processes.
Peripheral devices are input or output devices that are connected to a computer. These devices, which are regarded as part of the computer system, are designed to read information into or out of the memory unit in response to commands from the CPU.
A single bus configuration is a straightforward way to connect I/O devices to a computer. The bus allows all of the devices connected to it to communicate with one another. It usually has three sets of lines for carrying address, data, and control signals. A unique set of addresses is assigned to each I/O device. When the processor places a certain address on the address lines, the device that recognises that address responds to the commands sent on the control lines. When I/O devices and memory share the same address space, the CPU requests either a read or a write operation, and the required data is transferred over the data lines.
When memory-mapped I/O is employed, any machine instruction that can access memory can be used to transfer data to or from an I/O device. If DATAIN is the address of the keyboard’s input buffer, the instruction:
Move DATAIN, R0
reads the data from DATAIN and stores it in processor register R0. Likewise, a corresponding Move instruction can send the contents of a register to an output device’s data buffer.
memory address space. The low-order bits of the address bus are examined by the I/O devices to decide
if they should reply. To connect an I/O device to the bus, you’ll need the following hardware. When the
device’s address appears on the address lines, the address decoder allows it to recognise it. The data
register stores the information being sent to or received by the processor. The status register includes
information about the I/O device’s functioning. The data and status registers are both connected to the
data bus and have their own addresses. The device’s interface circuit consists of the address decoder,
data and status registers, and the control circuitry necessary to coordinate I/O transfers.
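The status-register mechanism just described can be sketched as program-controlled polling in Python. The register layout, bit mask, and class names below are invented for illustration, not taken from any real interface:

```python
# Hypothetical sketch of polling the status register of a keyboard
# interface like the one described above. All names are invented.
STATUS_READY = 0x01   # assumed "data available" bit in the status register

class KeyboardInterface:
    def __init__(self, pending):
        self.pending = list(pending)   # characters waiting in the buffer

    def status(self):
        # The status register reports whether a character is available.
        return STATUS_READY if self.pending else 0

    def data(self):
        # Reading the data register consumes the character, clearing "ready".
        return self.pending.pop(0)

def read_char(dev):
    # Busy-wait until the status register shows a character is available,
    # then read the data register exactly once.
    while not (dev.status() & STATUS_READY):
        pass
    return dev.data()

kbd = KeyboardInterface(pending=['A'])
assert read_char(kbd) == 'A'
assert kbd.status() == 0   # the character was read exactly once
```

Because `data()` clears the ready condition, the loop naturally enforces the "read each character only once" requirement discussed in the next paragraph; the busy-wait itself is the processor-time cost that interrupts avoid.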
I/O devices run at rates that are significantly slower than the processor. When a human operator types characters on a keyboard, the CPU may process millions of instructions in the time between successive character entries. An instruction to read a character from the keyboard should be performed only when a character is available in the keyboard interface’s input buffer. We must also ensure that each input character is read only once. Figure 1 shows accessing the I/O devices:
Figure 1: Memory-Mapped I/O: (a) separate I/O and memory address spaces, (b) memory-mapped I/O with one address space, (c) hybrid with two address spaces
3.3 Interrupts
When a process or event requires immediate attention, hardware or software emits an interrupt signal. It notifies the processor of a high-priority task that requires the present working process to be interrupted. In I/O devices, one of the bus control lines is devoted to this function and is known as the interrupt-request line.
We said that an I/O device requests an interrupt by activating the interrupt-request bus line. Almost all computers have several I/O devices that can request an interrupt. As shown, a single interrupt-request line can be utilised to service n devices. All devices are connected to the line through switches to ground, and a device closes its switch to request an interrupt. If all interrupt-request signals INTR1 to INTRn are inactive, that is, if all switches are open, the voltage on the interrupt-request line equals Vdd; this is the line's dormant state. Because the line voltage drops to zero if one or more switches are closed, the value of INTR is the logical OR of the requests from individual devices, that is:
INTR = INTR1 + … + INTRn
The complemented form, INTR, is commonly used to refer to the interrupt-request signal on the common line, which is active while the voltage is low.
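The wired-OR behaviour of the interrupt-request line can be modelled directly; the function names here are invented for illustration:

```python
# The open-collector interrupt-request line computes a wired OR: the line
# rests at Vdd (inactive) and any device closing its switch pulls it low
# (active). Logically, INTR = INTR1 + ... + INTRn.

def intr_line_voltage(requests):
    """requests: booleans, True = device requesting (switch closed).
    Returns 'low' (active) if any device requests, else 'Vdd' (dormant)."""
    return "low" if any(requests) else "Vdd"

def INTR(requests):
    # Logical OR of the individual interrupt requests
    return any(requests)

assert intr_line_voltage([False, False, False]) == "Vdd"   # dormant state
assert intr_line_voltage([False, True, False]) == "low"    # one request
assert INTR([False, True, False]) is True
```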
Computer Organization and Architecture
3.4 Direct Memory Access (DMA)
Direct Memory Access (DMA) refers to an I/O subsystem's ability to transmit data to and from a memory subsystem without the need for a CPU. A DMA controller is a device that controls data transfers between an I/O subsystem and a memory subsystem in the same way that a CPU does. The following are the elements of DMA:
zz DMA controller: The DMA controller can send orders to the memory that are identical to the CPU's directives. In a way, the DMA controller is a second CPU in the system, although it is dedicated to I/O. The DMA controller links one or more I/O ports directly to memory, and the I/O data stream passes through the DMA controller faster and more effectively than through the CPU, since the DMA channel is dedicated to data transport.
Figure 2 shows the DMA controller:
Figure 2: DMA Controller, comprising data bus buffers and address bus buffers on the system side, an internal bus linking an address register and a control register, select inputs (DMA select DS and register select RS), and control logic handling the bus request (BR), bus grant (BG), DMA request and interrupt signals.
zz The DMA interface: Because a DMA controller has independent memory access, it adds another layer of complexity to the I/O interface. A single transaction can be carried on a single pair of wires (bus). If a DMA controller and a microprocessor share a signal wire to memory, a method must be in place to determine who gets access to memory when both try at the same time.
zz DMA interface operation: Consider a typical direct memory-access controller interface. In this DMA-controlled example, the I/O ports are connected solely to the DMA controller. In most cases, the signal lines are identical to those that connect the ports to the processor. The memory lines are similar to regular lines, except that they are used by both the CPU and the DMA controller. The HALT and HALT ACKNOWLEDGE lines are the two new lines; the DMA controller and the CPU are synchronised using them. When the DMA controller wants to access memory, it asserts the HALT signal, which causes the CPU to halt.
At a later time, the CPU responds with HALT ACKNOWLEDGE, and the DMA controller takes control of memory. When the DMA controller completes its work, it removes its HALT request, the processor resumes from its suspension, and the HALT ACKNOWLEDGE is removed. A third form of DMA request is the dotted IMMEDIATE HALT. Because of the time it takes for the CPU to reach a condition where it may halt operations, the processor may take several clock cycles to recognise the HALT request: data kept in dynamic registers that are updated during regular processing must be transferred to status registers, unless the data is no longer required. The IMMEDIATE HALT line eliminates the wait, but it comes with a slew of limitations. The IMMEDIATE HALT can only be used once or twice; otherwise, the processor may not be able to appropriately recover its state and resume the halted operation. Figure 3 shows the DMA controller interface:
Figure 3: DMA Controller Interface. The CPU and the random-access memory (RAM) share the address bus and data bus (with read and write control lines) with the DMA controller, which is granted the bus through the bus request (BR) and bus grant (BG) handshake; its DS, RS, DMA request, DMA acknowledge and interrupt lines connect it to the I/O peripheral device.
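The HALT / HALT ACKNOWLEDGE synchronisation just described can be sketched as follows. The classes and the immediate acknowledgement are simplifying assumptions made for illustration:

```python
# Toy model of the HALT / HALT ACKNOWLEDGE handshake: the DMA controller
# asserts HALT, waits for the CPU's acknowledgement, uses the memory, then
# releases HALT so the CPU resumes. All names here are invented.

class CPU:
    def __init__(self):
        self.halted = False

    def check_halt(self, halt_request):
        # A real CPU may take several cycles to reach a stoppable state;
        # here it acknowledges immediately for simplicity.
        self.halted = halt_request
        return self.halted              # HALT ACKNOWLEDGE

class DMAController:
    def __init__(self, cpu, memory):
        self.cpu, self.memory = cpu, memory

    def transfer(self, src, dst):
        ack = self.cpu.check_halt(True)      # assert HALT, wait for ACK
        assert ack, "bus not granted"
        self.memory[dst] = self.memory[src]  # DMA owns the memory bus
        self.cpu.check_halt(False)           # remove HALT; CPU resumes

memory = {0x10: 42, 0x20: 0}
cpu = CPU()
dma = DMAController(cpu, memory)
dma.transfer(0x10, 0x20)
assert memory[0x20] == 42 and not cpu.halted
```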
3.5 Buses
The Central Processing Unit (CPU), memory chips, and input/output (I/O) devices are all components of a conventional computer system. A bus is a cable or a common channel that connects these diverse subsystems, allowing the various components to interact with one another.
In computer terms, a bus is a conduit that allows data to move between units or devices. It usually has access points, which are locations where a device may tap to join the channel. The majority of buses are bidirectional, meaning that devices may both transmit and receive data. A bus works like a form of public transportation, connecting devices at various locations.
It permits the addition of new peripheral devices and the movement of peripheral devices across various computer systems. When too many devices are linked to the same bus, however, the bus's bandwidth may become a bottleneck. A bus usually connects more than two devices or subsystems; channels linking only two components are sometimes referred to as ports rather than buses.
Figure 4: Bus, a shared line connecting Device 1, Device 2, and further devices.
Some important concepts related to buses are as follows:
zz Bus protocols: Because a bus is a communication channel used by numerous devices, rules must be created to ensure that communication occurs appropriately. These rules are referred to as bus protocols. The width of the data bus, the data transfer size, the bus protocols, timing, and other factors all play a role in bus architectural design. Buses are classed as synchronous or asynchronous depending on whether or not the transactions are regulated by a clock. Parallel and serial buses are distinguished by whether data bits are transmitted on parallel wires or multiplexed onto a single wire.
zz Synchronous and asynchronous buses: Bus activities are synchronised with reference to a clock signal in a synchronous bus. Although the bus clock is usually derived from the computer system clock, it is frequently slower than the master clock. For example, 66 MHz buses are utilised in systems with CPU speeds exceeding 500 MHz. Because memory access times are generally longer than processor clock cycles, buses have traditionally been slower than processors. Although many people refer to the cycles as a bus cycle, a bus transaction might take many clock cycles. Figure 5 shows the timing for a synchronous read operation:
Figure 5: Synchronous Read Operation. Over clock cycles T1 to T3, the status lines carry the status signals and the address lines carry a stable address while address enable is asserted; during a read cycle the data lines carry valid data in while the Read strobe is active, and during a write cycle they carry valid data out while the Write strobe is active.
There is no system clock on an asynchronous bus. Handshaking is used to ensure that data is sent correctly between the transmitter and the receiver. In an asynchronous read operation, the bus master places the address and control signals on the bus before asserting a synchronisation signal. The master's synchronisation signal prompts the slave to become synchronised, and after it has accessed the data, the slave asserts its own synchronisation signal. The synchronisation signal from the slave informs the processor that there is valid data on the bus, which it reads. The master then deasserts its synchronisation signal, signalling to the slave that the master has read the data. The slave's synchronisation signal is then deasserted. This type of synchronisation is called a full handshake. It is worth noting that there is no clock and that the beginning and end of the data transfer are signalled by specific synchronisation signals. An asynchronous communication protocol can be viewed as a pair of Finite State Machines (FSMs) that operate in such a way that one FSM does not advance until the other FSM has reached a specific state. Figure 6 shows the timing for an asynchronous read operation:
Figure 6: Asynchronous Read Operation. At times t1 to t3, the master places a valid address on the addr lines and asserts RD and its synchronisation signal MSYN; the slave places valid data on the data lines and asserts its own synchronisation signal SSYN, after which both synchronisation signals are deasserted in turn.
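The full handshake can be traced step by step in a small sketch; the function and trace strings below are illustrative only, following the MSYN/SSYN signal names of the figure:

```python
# Sketch of the full-handshake protocol for an asynchronous read: two
# cooperating state machines, neither advancing until the other has
# reached the expected state.

def async_read(slave_memory, address):
    trace = []
    # Master: place address and control signals, assert MSYN
    msyn = True
    trace.append("MSYN asserted")
    # Slave: sees MSYN, fetches the data, asserts SSYN (data now valid)
    data, ssyn = slave_memory[address], True
    trace.append("SSYN asserted (data valid)")
    # Master: reads the data, deasserts MSYN
    msyn = False
    trace.append("MSYN deasserted (data read)")
    # Slave: sees MSYN low, deasserts SSYN -> handshake complete
    ssyn = False
    trace.append("SSYN deasserted")
    return data, trace

data, trace = async_read({0x100: 7}, 0x100)
assert data == 7
assert trace == ["MSYN asserted", "SSYN asserted (data valid)",
                 "MSYN deasserted (data read)", "SSYN deasserted"]
```

Note that no clock appears anywhere: progress is driven entirely by the two synchronisation signals.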
The processor bus is the bus that is used by the processor chip's signals. This bus can be used to link devices that require very high-speed connectivity to the processor, such as the main memory. Only a few devices can be linked in this way due to electrical constraints. A second bus is generally provided by the motherboard, which can handle extra devices.
The two buses are linked by a circuit, which we'll refer to as a bridge, that converts one bus's signals and protocols into those of the other. The CPU sees devices attached to the extension bus as if they were connected directly to the processor's own bus. The bridge circuit only adds a tiny delay to data flows between the CPU and those devices.
It is impossible to create a universal processor bus standard. The architecture of this bus is intimately linked to the architecture of the CPU. It is also influenced by the processor chip's electrical properties, such as its clock speed. Because the extension bus is not constrained by these factors, it may employ a standardised signalling method. A variety of standards have been created. Some have emerged de facto, as a result of a commercially successful design. For example, IBM created the ISA (Industry Standard Architecture) bus for its PC AT (Advanced Technology) computer.
Some standards have been created by collaborative efforts among industries, even among competitors,
motivated by a common desire to have interoperable goods. Organisations like the IEEE (Institute of
Electrical and Electronics Engineers), ANSI (American National Standards Institute), and international
organisations like the ISO (International Standards Organisation) have granted these standards formal
status in some circumstances.
32-bit version of the bus. The Microchannel, which is used in IBM PCs, and the NuBus, which is used in Macintosh computers, are two more buses developed in the 1980s with comparable characteristics.
The PCI bus was created as a low-cost, processor-independent bus. Its design anticipated a rapidly rising requirement for bus capacity to accommodate high-speed discs, graphic and video devices, as well as the specific demands of multiprocessor systems. As a result, over a decade after its introduction in 1992, the PCI is still widely used as an industry standard. A plug-and-play capability for connecting I/O devices is an essential feature that the PCI pioneered. The user simply attaches the device interface board to the bus to connect a new device; the remainder is handled by software.
Data Transfer
Most memory transfers in today's computers use a burst of data rather than a single word. The reason for this is that contemporary CPUs have cache memory built in, and data is transmitted in bursts of multiple words between the cache and the main memory. The words involved in such a transfer are stored at successive memory locations. When the processor (in this case, the cache controller) requests a read operation from the main memory, the memory replies by delivering a series of data words beginning at that address. Similarly, during a write operation, the processor provides a memory address followed by a series of data words, which are written in order starting at that address. The PCI was created with this manner of operation in mind. A single-word read or write operation is simply treated as a burst of length one.
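Burst transfers can be sketched as follows; the toy memory and the helper names are assumptions made for illustration:

```python
# Sketch of a burst transfer: the requester supplies one starting address
# and a length, and the memory replies with a block of successive words.
# A single-word access is simply a burst of length one.

def burst_read(memory, start, length):
    """Return `length` consecutive words beginning at `start`."""
    return [memory[start + i] for i in range(length)]

def burst_write(memory, start, words):
    for i, w in enumerate(words):        # written in order from `start`
        memory[start + i] = w

memory = list(range(100, 132))           # toy main memory: 32 words
cache_line = burst_read(memory, 8, 4)    # fill a 4-word cache line
assert cache_line == [108, 109, 110, 111]
burst_write(memory, 0, cache_line)       # write the line back elsewhere
assert memory[:4] == [108, 109, 110, 111]
assert burst_read(memory, 5, 1) == [105]  # burst of length one
```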
Device Configuration
When a computer is linked to an I/O device, many steps must be taken to set up both the hardware and the software that interfaces with it.
The PCI streamlines this procedure by including a tiny configuration ROM memory in each I/O device interface that stores information about that device. All devices' configuration ROMs are accessible in the configuration address space. When the system is powered up or reset, the PCI initialisation software reads these ROMs. In each case, it detects whether the device in question is a printer, a keyboard, an Ethernet interface, or a disc controller. It can also learn about different device features and options.
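The enumeration procedure can be sketched as follows. The ROM fields, addresses and function names below are invented for illustration and do not reflect the actual PCI configuration-space layout:

```python
# Sketch of plug-and-play enumeration: at power-up, initialisation software
# reads each device's configuration ROM to learn what kind of device it is,
# then assigns it an address.

config_roms = [                      # one entry per device interface board
    {"class": "Ethernet interface"},
    {"class": "disc controller"},
    {"class": "printer"},
]

def enumerate_bus(roms, base=0x1000, stride=0x100):
    devices = []
    for i, rom in enumerate(roms):   # a select signal like IDSEL# would
        devices.append({             # pick out each slot in turn
            "class": rom["class"],
            "assigned_address": base + i * stride,
        })
    return devices

devs = enumerate_bus(config_roms)
assert devs[0]["assigned_address"] == 0x1000
assert devs[2]["class"] == "printer"
```

The key point is the ordering: devices are identified from their ROMs first, and only then given addresses, which is why the configuration address space needs its own selection mechanism.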
During the startup procedure, addresses are allocated to devices. This implies that devices that have not yet been allocated an address cannot be accessed during the bus setup procedure. As a result, the configuration address space employs a unique technique: each device possesses an input signal called IDSEL#, or Initialisation Device Select. The PCI bus has acquired a lot of traction in the PC world. Many other systems, such as SUNs, employ it to take advantage of the large range of I/O devices that a PCI interface may support. In some processors, such as the Compaq Alpha, the PCI-processor bridge circuit is built on the processor chip itself, simplifying system design and packaging even more.
Unlike devices attached to the processor bus, devices linked to the SCSI bus are not part of the CPU's address space. The SCSI bus is linked to the processor bus through a SCSI controller. This controller uses DMA to transport data packets from main memory to the device and vice versa. A packet can include a data block, commands from the CPU to the device, or device status information.
Consider how the SCSI bus may be utilised with a disc drive to demonstrate how it works. The interaction with a disc drive is very different from the interaction with the main memory. Controllers linked to a SCSI bus are of two types: initiators and targets. An initiator can select a specific target and transmit commands defining the actions to be carried out. Clearly, the controller on the processor side, such as the SCSI controller, must be capable of acting as an initiator. The disc controller acts as a target: it executes the commands given to it by the initiator. The initiator creates a logical link with the intended recipient. Once established, this connection can be suspended and resumed as needed to send instructions and data bursts. Other devices can utilise the bus to transport data while a particular connection is suspended. This ability to overlap data transfer requests is one of the major aspects of the SCSI bus that contributes to its high performance.
The target controller is always in charge of data transfers on the SCSI bus. To transmit a command to a target, an initiator first requests control of the bus; after winning arbitration, it selects the controller it wishes to interact with and hands control of the bus over to it. The target controller then initiates a data transmission operation in order to receive the initiator's command. The CPU issues a command to the SCSI controller, which sets in motion the following series of events:
1. The SCSI controller, acting as an initiator, requests control of the bus.
2. After winning arbitration, the initiator selects the target controller and hands control of the bus over to it.
3. The target starts an output operation (from initiator to target); in response, the initiator sends a command specifying the required read operation.
4. The target, recognising that it must first perform a disc seek operation, sends a message to the initiator indicating that the connection between them will be temporarily suspended. It then releases the bus.
5. The target controller instructs the disc drive to move the read head to the first sector involved in the requested read operation. The data stored in that sector is then read into a data buffer. When it is ready to begin transmitting data to the initiator, the target requests control of the bus.
6. The target sends the contents of the data buffer to the initiator and then suspends the connection. Depending on the bus width, data is transmitted 8 or 16 bits in parallel.
7. The target controller sends a command to the disc drive to perform another seek operation. Then the contents of the second disc sector are transferred to the initiator as before. At the end of these transfers, the logical connection between the two controllers is terminated.
8. After receiving the data, the initiator controller uses the DMA method to put it in the main memory.
9. The CPU receives an interrupt from the SCSI controller indicating that the desired operation has been completed.
As this example shows, the messages exchanged over the SCSI bus are at a higher level than those transferred via the CPU bus. A "higher level" message in this sense refers to actions that, depending on the device, may need many steps to accomplish. Neither the CPU nor the SCSI controller needs to be aware of the specifics of the device involved in a data transfer. In the preceding example, the processor is not required to participate in the disc seek operation.
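The sequence of events above can be walked through in a toy simulation; every name and log message is illustrative rather than part of any real SCSI implementation:

```python
# Toy walk-through of the SCSI read sequence: the initiator sends a
# command, the target suspends the connection while it seeks, then
# reconnects to transfer each sector. Events are recorded in a log.

def scsi_read(sectors):
    log, received = [], []
    log.append("initiator wins arbitration, selects target")
    log.append("initiator sends read command")
    for n, sector in enumerate(sectors, 1):
        log.append(f"target suspends connection, seeks sector {n}")
        log.append("target reconnects, sends data buffer")
        received.extend(sector)            # DMA'd into main memory
    log.append("target severs logical link; SCSI controller interrupts CPU")
    return received, log

data, log = scsi_read([[1, 2, 3], [4, 5, 6]])
assert data == [1, 2, 3, 4, 5, 6]
assert log[0].startswith("initiator wins")
assert "seeks sector 2" in log[4]
```

While a connection is suspended (between the seek and the reconnection), the real bus is free for other devices, which is where the overlap in transfer requests comes from.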
3.6.3 Universal Serial Bus (USB)
Computer-communications synergy is at the heart of today's information technology revolution. Keyboards, microphones, cameras, speakers, and display devices are likely to be used in a modern computer system. A wired or wireless Internet connection is available on most PCs. A crucial necessity in such an environment is the availability of a simple, low-cost method to link these devices to the computer, and the advent of the Universal Serial Bus (USB) represents a significant recent advance in this respect. Low-speed (1.5 megabits/s) and full-speed (12 megabits/s) operation are supported by the USB. The most recent edition of the bus standard (USB 2.0) added a third operating speed, known as high-speed (480 megabits/s). The USB is rapidly gaining commercial popularity, and with the inclusion of high-speed capability, it may easily become the preferred connection mechanism for most computer equipment.
The USB has been designed to meet several key objectives:
zz Provide a simple, low-cost, and easy-to-use connectivity solution that overcomes the challenge posed by a computer's restricted number of I/O ports
zz Support a variety of data transmission characteristics for I/O devices, such as phone and Internet connections
Port Limitation
The parallel and serial ports mentioned in the preceding section provide a general-purpose point of connection for attaching a computer to a variety of low- to medium-speed devices. For practical reasons, a normal computer only has a couple of these ports.
The devices that can be linked to a computer perform a wide range of tasks, and data transmissions to and from such devices are subject to a wide range of speed, volume, and timing constraints.
A wide range of simple devices that may be connected to a computer produce data that is similar in nature: slow and asynchronous. Computer mice, as well as video game controllers and manipulators, are ideal examples.
Plug-and-Play
As computers grow more integrated into daily life, their existence should become less apparent. When running a home theatre system that includes at least one computer, for example, the user should not be required to switch the computer off or restart the system in order to connect or disconnect a device.
Thanks to the plug-and-play feature, a new device, such as an additional speaker, may be attached at any time while the system is running. The system should immediately detect the presence of this new device, identify the relevant device-driver software and any other facilities required to serve it, and establish the necessary addresses and logical connections to allow them to interact. The necessity for plug-and-play has ramifications at all levels of the system, from hardware to operating system and application software. One of the major goals of the USB design was to allow for plug-and-play functionality.
USB Architecture
The preceding discussion highlighted the need for an interconnection system that combines low cost, flexibility, and high data-transfer capacity. I/O devices may also be physically separated from the computer to which they are attached. When large bandwidth is required, a wide bus carrying 8, 16, or more bits in parallel is typically used. A high number of wires, on the other hand, adds expense and complexity while also being troublesome for the user.
Because of the data skew problem, it is also difficult to build a wide bus that can transport data over a long distance. The amount of skew increases with distance, limiting the rate at which data can be transferred.
For the USB, a serial transmission format was chosen because it meets the low-cost and flexibility criteria. The clock and data information are combined and sent as a single signal. As a result, data skew imposes no restrictions on clock frequency or distance, and a high clock frequency may be used to provide a large data transmission bandwidth. As previously stated, the USB provides three bit rates ranging from 1.5 to 480 megabits/s to accommodate the demands of various I/O devices.
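One simple way to combine clock and data in a single signal is Manchester encoding, sketched below to illustrate the principle. Note that this is an analogy only: USB itself uses a different self-clocking line code (NRZI with bit stuffing):

```python
# Illustration of combining clock and data in one signal. Manchester
# encoding is shown because it is simple: every bit produces a transition
# in the middle of its cell, so the receiver can recover the clock from
# the data stream itself.

def manchester_encode(bits):
    # 1 -> high-then-low, 0 -> low-then-high
    return [half for b in bits for half in ((1, 0) if b else (0, 1))]

def manchester_decode(signal):
    pairs = zip(signal[::2], signal[1::2])
    return [1 if pair == (1, 0) else 0 for pair in pairs]

bits = [1, 0, 1, 1, 0]
assert manchester_decode(manchester_encode(bits)) == bits
```

Because the clock rides along with the data, skew between separate clock and data wires simply cannot arise, which is the property the text attributes to the USB's single-signal scheme.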
The USB has a tree structure to support a large number of devices that may be added or removed at any time while the system is running: the host computer connects to a root hub, from which further hubs branch out to the attached I/O devices.
3.7 Conclusion
zz Input/output architecture is the method by which a computer system regulates its interaction with the outside world.
zz Peripheral devices are input or output devices that are connected to a computer.
zz A single bus configuration is a straightforward way to connect I/O devices to a computer.
zz A unique set of addresses is assigned to each I/O device. The device that recognises a certain address responds to the commands sent on the control lines when the processor places that address on the address line.
zz Memory-mapped I/O is used in the majority of computer systems. To execute I/O transfers, certain CPUs feature special In and Out instructions.
zz When a process or event requires immediate attention, hardware or software emits an interrupt signal. It notifies the processor of a high-priority task that requires the present working process to be interrupted.
zz Direct Memory Access (DMA) refers to an I/O subsystem's ability to transmit data to and from a memory subsystem without the need for a CPU.
zz A DMA controller is a device that controls data transfers between an I/O subsystem and a memory subsystem in the same way that a CPU does.
zz The Central Processing Unit (CPU), memory chips, and input/output (I/O) devices are all components of a conventional computer system.
zz A bus is a cable or a common channel that connects these diverse subsystems.
zz Because a bus is a communication channel used by numerous devices, rules must be created to ensure that communication occurs appropriately.
zz Buses are classed as synchronous or asynchronous depending on whether or not the transactions are regulated by a clock.
zz The processor bus is the bus that is used by the processor chip's signals.
zz Another bus is generally provided by the motherboard, which can handle extra devices.
zz Organisations like the IEEE (Institute of Electrical and Electronics Engineers), ANSI (American National Standards Institute), and international organisations like the ISO (International Standards Organisation) have granted bus standards formal status.
zz The PCI bus is an excellent example of a system bus that arose from a desire for standardisation. It provides the functionality of a processor bus in a defined format that is not specific to any processor.
zz SCSI bus refers to the ANSI-defined X3.131 standard bus.
zz The SCSI controller uses DMA to transport data packets from main memory to the device and vice versa.
3.8 Glossary
zz Peripheral devices: Input or output devices that are connected to a computer.
zz Direct Memory Access (DMA): An I/O subsystem's ability to transmit data to and from a memory subsystem without the need for a CPU.
zz DMA controller: A device that controls data transfers between an I/O subsystem and a memory subsystem in the same way that a CPU does.
zz Bus: A cable or a common channel that connects the diverse subsystems of a computer.
zz Universal Serial Bus (USB): A serial bus standard that provides a simple, low-cost, plug-and-play way to connect I/O devices to a computer.
zz SCSI bus: The ANSI-defined X3.131 standard bus. According to the standard's initial requirements, devices such as discs are linked to a computer via a 50-wire connection that may be up to 25 metres long and can transport data at up to 5 megabytes per second.
3.9 Self-Assessment Questions
A. Multiple Choice Questions
1. Input or output devices that are connected to the computer are called __________.
a. Input/Output subsystem
b. Peripheral devices
c. Interfaces
d. Interrupt
2. How many modes of I/O data transfer are there?
a. 2 b. 3
c. 4 d. 5
b. The I/O devices and the memory share the same address space
c. A part of the memory is specifically set aside for the I/O operation
d. The memory and I/O devices have an associated address space
6. The __________ circuit is basically used to extend the processor BUS to connect devices.
a. Router b. Router
c. Bridge d. None of these
7. The ISA is an architectural standard developed by __________.
a. IBM b. AT&T Labs
c. Microsoft d. Oracle
8. The SCSI BUS is used to connect the video devices to a processor by providing a __________.
a. Single Bus b. USB
c. SCSI d. Parallel Bus
9. Which of the following is used by the processor chip’s signals?
a. Processor bus b. SCSI bus
c. USB bus d. None of these
10. The registers of the controller are __________.
a. 16 bit b. 32 bit
c. 64 bit d. 128 bit
11. The main job of the interrupt system is to identify the __________ of the interrupt.
a. signal b. device
c. source d. peripherals
12. The hardware interrupts which can be delayed when a much higher priority interrupt has occurred at the same time are known as __________.
a. Non-maskable interrupt b. Maskable interrupt
c. Normal interrupt d. None of these
13. The interrupts that are caused by software instructions are called __________.
a. Exception interrupts b. Normal interrupts
c. Hardware interrupts d. None of these
14. In Daisy Chaining Priority, the device with the highest priority is placed at the __________.
a. First position b. Last position
16. Which microprocessor is designed to complete the execution of the current instruction and then to
c. 8084 d. 8085
19. Which table stores the addresses of the interrupt handling sub-routines?
a. Interrupt-vector table b. Vector table
c. Symbol link table d. All of these
20. Interrupts initiated by an instruction are called __________.
a. Internal b. External
c. Hardware d. Software
1. The bus allows all of the devices connected to it to communicate with one another. Discuss.
2. When a process or event requires immediate attention, hardware or software emits an interrupt signal. What do you mean by hardware interrupt?
3. What do you understand by Direct Memory Access (DMA)?
4. Bus allows the various components to interact with one another. Discuss.
5. Discuss the concept of PCI bus.
3.10 Answers and Hints for Self-Assessment Questions
1. b. Peripheral devices
2. b. 3
3. a. Input peripherals
4. d. DMA
5. b. The I/O devices and the memory share the same address space
6. c. Bridge
7. a. IBM
8. d. Parallel bus
9. a. Processor bus
10. b. 32 bit
11. c. source
12. b. Maskable interrupt
18. b. control line
19. a. Interrupt-vector table
20. d. Software
1. A unique set of addresses is assigned to each I/O device. The device that recognises a certain address responds to the commands sent on the control lines when the processor places that address on the address line. Refer to Section Accessing I/O Devices
2. We said that an I/O device requests an interrupt by activating the interrupt-request bus line. Almost all computers have several I/O devices that can request an interrupt. As shown, a single interrupt-request line can be utilised to service n devices. Switches to ground are used to connect all devices to the line. Refer to Section Interrupts
3. Direct Memory Access (DMA) refers to an I/O subsystem's ability to transmit data to and from a memory subsystem without the need for a CPU. A DMA controller is a device that controls data transfers between an I/O subsystem and a memory subsystem in the same way that a CPU does. Refer to Section Direct Memory Access
4. In computer terms, a bus is a conduit that allows data to move between units or devices. It usually has access points, which are locations where a device may tap to join the channel. The majority of buses are bidirectional, meaning that devices may transmit and receive data. Refer to Section Buses
5. The PCI bus is an excellent example of a system bus that arose from a desire for standardisation. It provides the functionality of a processor bus in a defined format that is not specific to any processor. The CPU sees devices linked to the PCI bus as if they were connected directly to the processor's own bus. Refer to Section Buses
zz https://www.studocu.com/in/document/psg-college-of-technology/computer-architecture/m-morris-mano-solution-manual-computer-system-architecture/10775236
zz https://www.pvpsiddhartha.ac.in/dep_it/lecturenotes/CSA/unit-5.pdf
UNIT
04
Introduction to Memory System
Names of Sub-Units
Basic Concepts of Memory System, Semiconductor RAM Memories, Read Only Memories (ROM), Speed, Size, Cost.
Overview
The unit begins by discussing the concept of a memory system. Next, the unit discusses semiconductor RAM memories. Then, the unit discusses Read Only Memories (ROM). Towards the end, the unit discusses the speed, size and cost of memories.
Learning Objectives
Learning Outcomes
aa Analyse the speed, size, and cost of memories
Pre-Unit Preparatory Material
aa https://www.uobabylon.edu.iq/eprints/publication_12_21274_1610.pdf
4.1 Introduction
One of the most crucial components of a computer system is its memory. It stores the data and instructions needed for processing data and outputting results. Storage may be necessary for a short amount of time, immediately, or for a long time. Computer memory refers to the electronic storage area for instructions and data that the processor can read rapidly. Computer memory is divided into two categories: primary and secondary. Primary memory is directly accessed by the processor to execute instructions; an example of primary memory is Random Access Memory (RAM). Secondary memory, such as hard disk drives and Solid State Drives (SSD), is used to store and retrieve data from a computer. Both memories are an integral part of a computer, and the failure of either stops a computer from functioning. The storage capacity of computer memory is measured in bits, bytes, Kilobytes (KB), Megabytes (MB), Gigabytes (GB), and so on.
A 16-bit computer, for example, that generates 16-bit addresses, may address up to 2^16 = 64K memory locations. Machines that produce 32-bit addresses, on the other hand, may use a memory with up to 2^32 = 4G memory locations. Byte-addressable computers make up the majority of today's computers. Figure 1 shows the block diagram of the memory system:
Figure 1: Block Diagram of the Memory System, showing the processor's operational registers, program counter, instruction sets and cache memory interacting with the arithmetic and logic unit, the control unit, and the input/output system.
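The address-space arithmetic above follows directly from the address width; a one-line check:

```python
# A k-bit address can select 2**k distinct locations, matching the
# 16-bit (64K) and 32-bit (4G) examples in the text.

def addressable_locations(address_bits):
    return 2 ** address_bits

assert addressable_locations(16) == 64 * 1024       # 2^16 = 64K locations
assert addressable_locations(32) == 4 * 1024 ** 3   # 2^32 = 4G locations
```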
The time it takes for an operation to start and finish is a valuable indicator of memory unit speed;
this is called the memory access time. Another significant metric is the memory cycle time, which
is the shortest time between two memory operations. If any location can be reached for a Read or Write
operation in some defined period of time, regardless of the location's address, the memory unit is known
as Random Access Memory (RAM). The system's bottleneck is its memory cycle time.
Using a cache memory is one approach to minimise memory access time. Cache memory is a tiny, quick
memory that sits between the CPU and the bigger, slower main memory. Virtual memory is used to
make the actual memory appear larger. Data is addressed in a virtual address space that is as big as the
processor's addressing capacity. However, only the active fraction of this space is mapped onto physical
memory locations at any one time. The remaining virtual addresses are mapped to the bulk storage
devices (such as magnetic drives) that are utilised.
4.2.1 CPU-Main Memory Connection – A Block Schematic
The Main Memory (MM) unit can be thought of as a "black box" in terms of the system. The MAR (Memory
Address Register) and MDR (Memory Data Register) are two CPU registers that transmit data between
the CPU and the MM. If MAR is K bits long and MDR is n bits long, the MM unit can hold up to 2^K
addressable locations, each of which is n bits wide; the word length is thus n bits. n bits of data may be
exchanged between the MM and the CPU during a "memory cycle."
The CPU loads the address into MAR, sets READ to 1, and sets additional control signals as needed for a
read operation. MDR receives the data from the MM, and MFC (Memory Function Completed) is set to 1.
For a write operation, the CPU loads MAR and MDR appropriately, sets WRITE to 1, and sets the other
control signals appropriately. The data is loaded into the appropriate location by the MM control
circuitry, which also sets MFC to 1. The following block diagram depicts this arrangement.
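The read and write handshakes described above can be sketched as a toy model in Python. This is an illustrative sketch only; the class and method names are hypothetical, and just the registers MAR and MDR and the signals READ, WRITE and MFC come from the text:

```python
class MainMemory:
    """Toy model of the CPU-to-MM interface (hypothetical sketch)."""

    def __init__(self, k_bits, n_bits):
        self.words = [0] * (2 ** k_bits)    # 2^K addressable locations
        self.mask = (1 << n_bits) - 1       # each location holds an n-bit word
        self.MAR = 0                        # Memory Address Register
        self.MDR = 0                        # Memory Data Register
        self.MFC = 0                        # Memory Function Completed signal

    def read(self, address):
        self.MAR = address                  # CPU loads the address into MAR (READ = 1)
        self.MDR = self.words[self.MAR]     # MM places the data in MDR
        self.MFC = 1                        # MM signals completion
        return self.MDR

    def write(self, address, data):
        self.MAR, self.MDR = address, data  # CPU loads MAR and MDR (WRITE = 1)
        self.words[self.MAR] = self.MDR & self.mask
        self.MFC = 1                        # control circuitry signals completion
```

With K = 4 and n = 8, for instance, the unit holds 2^4 = 16 locations of 8 bits each, and a value wider than n bits is truncated to the word length on a write.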
The following are three useful measures of memory-unit performance:
zz Memory access time: It is the time that passes between the start of an operation and its completion (for example, the time that passes between READ and MFC).
zz Memory cycle time: It is a crucial metric for the memory system. It is the shortest time interval between two consecutive memory operations (for example, the time between two consecutive READ operations). In most cases, the cycle time is slightly longer than the access time.
zz Transfer rate: The pace at which data may be moved in and out of a memory unit.
A computer stores data internally in the form of binary numbers, 0 (OFF/low voltage) and 1 (ON/high
voltage). All the digits (0–9), letters of the alphabet (a to z or A to Z) and special characters ($, %, @, *, etc.) are stored in the
computer in binary form. ASCII (American Standard Code for Information Interchange) assigns a
unique number or code to each letter and special character. This number/code can then be
converted into binary form to store the corresponding letter/character in the computer.
In a computer, characters are represented by a group of bits that depends upon the encoding scheme
being used. The encoding scheme is the manner of specifying the binary code for each character. There
are two types of encoding scheme: ASCII and Unicode. ASCII represents a character as a group of 8 bits
or 1 byte. Unicode represents a character as a group of 16 bits or 2 bytes. For example, if you want to store
the word "bottle" in the computer memory, then as per the ASCII scheme, it is 6 bytes (6 characters of 1
byte each); and as per the Unicode scheme, the same word is stored as 12 bytes (6 characters of 2 bytes
each). Apart from the byte, there are various other units of memory. The following is the description of
the units of memory in a computer:
zz Byte (B) = 8 bits
zz Kilobyte (KB) = 1,024 bytes
zz Megabyte (MB) = 1,048,576 bytes = 1024 KB
zz Gigabyte (GB) = 1,024 megabytes
zz Terabyte (TB) = 1,024 gigabytes
zz Petabyte (PB) = 1,024 terabytes
zz Exabyte (EB) = 1,024 petabytes
zz Zettabyte (ZB) = 1,024 exabytes
zz Yottabyte (YB) = 1,024 zettabytes
zz Brontobyte = 1,024 yottabytes
zz Geopbyte = 1,024 brontobytes
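The byte counts quoted above for the word "bottle" can be checked directly in Python; "utf-16-le" is used here as a concrete stand-in for the 2-bytes-per-character Unicode scheme the text describes:

```python
word = "bottle"

ascii_bytes = word.encode("ascii")      # ASCII: 1 byte per character
utf16_bytes = word.encode("utf-16-le")  # 16-bit Unicode: 2 bytes per character

print(len(ascii_bytes))   # 6 bytes, as per the ASCII scheme
print(len(utf16_bytes))   # 12 bytes, as per the Unicode scheme
```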
4.3 Semiconductor RAM Memories
Any electronics assembly that employs computer processing technologies uses semiconductor memory.
Semiconductor memory is a critical electronic component for any PCB assembly based on a computer.
Memory cards have also become widespread for temporarily storing data, ranging from portable
flash memory cards used for file transfers to semiconductor memory cards used in cameras, mobile
phones, and other devices. As the need for greater and larger quantities of storage has grown, the use
of semiconductor memory has expanded, as has the size of these memory cards.
Many different types and methods are employed to fulfil the rising need for semiconductor memory.
New memory technologies are being launched as demand rises, and old types and technologies are
being improved.
There is a multitude of memory technologies available, each with its own set of advantages and
disadvantages. There are many different forms of memory, including ROM, RAM, EPROM, EEPROM,
Flash memory, DRAM, SRAM, SDRAM, F-RAM, and MRAM, and new varieties are being created to
increase performance. DDR3, DDR4, DDR5, and a variety of other terms are used to describe different
types of SDRAM semiconductor memory.
Furthermore, semiconductor devices are accessible in a variety of formats, including integrated circuits
for printed circuit boards, USB memory cards, Compact Flash cards, SD memory cards, and even solid
state hard drives. Semiconductor memory is even used as on-board memory in several microprocessor
processors.
Semiconductor memory is of the following two main types:
zz Random Access Memory (RAM): RAM is a semiconductor-based memory where the CPU or the other hardware devices can read and write data. In this memory, the data is temporarily stored since it is a volatile memory. Once the system turns off, it loses the data. As a result, RAM is used as a temporary data storage area. This form of memory is used to store and read data multiple times.
zz Read Only Memory (ROM): A ROM is a type of semiconductor memory in which data is written once
and then is not altered. As a result, it is employed in situations where data must be kept permanently,
even when the power is turned off (many memory technologies lose data when the power is turned
off).
The following are the different types of RAM:
zz Static Random Access Memory (SRAM): It is a type of memory that stores data in a fixed location. The data in this type of semiconductor memory does not need to be refreshed dynamically, unlike DRAM.
zz Dynamic Random Access Memory (DRAM): It is a kind of random access memory. Each bit of data in DRAM is stored on a capacitor, and the charge level on each capacitor determines whether the bit is a logical 1 or 0. However, because these capacitors cannot keep their charge permanently, the data must be refreshed on a regular basis. It gets its name, dynamic RAM, as a consequence of this dynamic refreshing. DRAM is a kind of semiconductor memory that is often found in equipment such as personal computers and workstations, where it serves as the computer's primary RAM. Semiconductors are often supplied as integrated circuits in the form of surface-mount devices or, less commonly, as leaded components for use in PCB assembly.
zz Synchronous DRAM (SDRAM): It is a kind of DRAM that is synchronised with the processor's clock. This type of semiconductor memory can operate at higher rates than standard DRAM and can keep two sets of memory addresses open at the same time.
zz Ferroelectric Random Access Memory (F-RAM): It is a random access memory technology that is quite comparable to regular DRAM. The main distinction is that it has a ferroelectric layer rather than the more common dielectric layer, which gives it its non-volatile properties.
zz Double Data Rate SDRAM (DDR SDRAM): The clock speed of DDR SDRAM ranges from 133 MHz to 2133 MHz.
zz Rambus DRAM (RDRAM): It is the fastest among all the random access memory types, with a data transfer speed of 1 GHz. Generally, RDRAM is used as video memory on graphics accelerator cards. Dynamic RDRAM is an improvement on the existing RDRAM. The RDRAM chip provides high bandwidth and is therefore used by workstations and servers. This memory chip is placed on the RIMM (Rambus Inline Memory Module) module. In addition, the number of chips placed on the module depends entirely on the bus width of the RAM. RDRAM with 160 or 184 pins operates at 300–400 MHz.
zz Magnetic RAM (MRAM): It is a kind of magneto-resistive RAM. It is a non-volatile RAM that stores data using magnetic states rather than electric charges.
zz Phase Change Random Access Memory (P-RAM) or Phase Change Memory (PCM): It is based on a phenomenon in which a chalcogenide glass changes state or phase from amorphous (high resistance) to polycrystalline (low resistance). It is feasible to detect the state of a single cell and so use this information to store data.
In some erasable types of ROM, the stored data may frequently be changed; this benefit necessitates the use of specific technology to erase the data so that new data can be written in.
4.4.1 Types of Read-Only Memory (ROM)
Following are the different types of ROM:
zz Programmable Read Only Memory (PROM): It is a type of memory that can be programmed once. It is a type of semiconductor memory to which data can be written only once, after which it is permanent. These memories are purchased in a blank state and are programmed with a PROM programmer.
zz Erasable Programmable Read Only Memory (EPROM): It is a type of memory that can be erased and reprogrammed. These semiconductor devices have the ability to be programmed and then erased at a later date. Normally, this is accomplished by exposing the semiconductor device to UV radiation. To make this possible, the EPROM package has a circular window that allows light to pass through to the device's silicon. This window is usually covered with a label when the EPROM is in use, especially if the data has to be maintained for a long time.
Flash memory may be thought of as an advancement of EEPROM technology. Data may be written to it and erased from it in blocks, but data can be read from individual cells.
zz Electrically Erasable Programmable Read-Only Memory (EEPROM): It is a kind of electrically erasable programmable read-only memory. Data may be written to and erased from these semiconductor devices using an electrical voltage. This is usually applied to a chip's erase pin. EEPROM, like other forms of PROM, keeps the memory's contents even after the power is switched off.
This method splits the memory system into a number of memory modules and organises the addressing so that consecutive words in the address space are assigned to distinct modules. Memory access requests involving successive addresses will then be sent to separate modules, which allows them to be served in parallel.
4.6 Speed, Size and Cost of Memories
[Figure 2: Speed, Size, and Cost of Memory, showing the hierarchy from the secondary cache through the main memory to magnetic-disc secondary memory]
4.6.1 Speed
A perfect memory would be quick, large, and cheap. SRAM chips can be used to create a very fast
memory. However, these chips are costly because their basic cells have six transistors, which prevents
a large number of cells from being packed onto a single chip. As a result, using SRAM chips to build a
large memory is impractical due to cost. Dynamic RAM chips, which have much simpler basic cells and
are thus much less expensive, are an alternative. However, such memories are much slower.
4.6.2 Size
A very small amount of memory can be implemented directly on the processor chip at the next level of
the hierarchy. This memory, known as a processor cache, stores copies of instructions and data that are
stored in a much larger external memory. Caches are frequently divided into two tiers. On the CPU chip,
there is always a primary cache. Because it competes for space on the processor chip, which must also
implement many other functions, this cache is small. The primary cache is known as the level 1 (L1) cache.
Between the primary cache and the rest of the memory lies a bigger secondary cache.
The main memory is the next level in the hierarchy. Dynamic memory components, such as SIMMs,
DIMMs, and RIMMs, are used to create this fairly big memory. The main memory is substantially larger
than the cache memory, but it is a lot slower. The access time to the main memory in a typical computer
is about ten times longer than the access time to the L1 cache.
Disk drives can store a lot of data at a low price. When compared to the semiconductor devices used to
implement the main memory, they are extremely sluggish. A hard disc drive (HDD; sometimes known
as a hard drive, hard disc, magnetic disc, or disc drive) is a storage and retrieval device for digital
data, particularly computer data. It is made up of one or more rigid (thus "hard") fast-spinning discs
(commonly referred to as platters) that are covered with magnetic material and have magnetic heads
positioned to write and read data from the surfaces. The speed with which software can access
memory is critical during execution.
4.6.3 Cost
Although dynamic memory units in the hundreds of megabytes range can be implemented for a
reasonable price, the size is still small when compared to the demands of large programmes with large
amounts of data. To implement large memory spaces, a solution is provided by secondary storage,
primarily magnetic discs. Large discs can be purchased for a reasonable price and are widely used
in computer systems. They are, however, significantly slower than semiconductor memory units.
Magnetic discs can thus provide a huge amount of cost-effective storage, while dynamic RAM technology
can be used to create a large, yet affordable main memory.
4.7 Conclusion
zz One of the most crucial components of a computer system is its memory.
zz The electrical storing area for instructions and data that the processor can read rapidly is referred to as computer memory.
zz Computer memory is divided into two categories – primary and secondary. Primary memory is directly accessed by a processor to execute instructions.
zz The storage capacity of computer memory is measured in bits, bytes, Kilobyte (KB), Megabyte (MB), Gigabyte (GB), Terabyte (TB), and Petabyte (PB).
zz The addressing method determines the maximum size of memory that may be used in every machine.
zz Cache memory is a tiny, quick memory that sits between the CPU and the bigger, slower main memory.
zz The Main Memory (MM) unit can be thought of as a "black box" in terms of the system.
zz The MAR (Memory Address Register) and MDR (Memory Data Register) are two CPU registers that transmit data between the CPU and the MM.
zz Memory access time is a valuable indicator of the memory unit's speed.
zz ASCII (American Standard Code for Information Interchange) assigns a unique number or code to each letter and special character.
zz In a computer, characters are represented by a group of bits that depends upon the encoding scheme being used.
zz There are two types of encoding scheme: ASCII and Unicode.
zz There are many different forms of memory, including ROM, RAM, EPROM, EEPROM, Flash memory, DRAM, SRAM, SDRAM, F-RAM, and MRAM, and new varieties are being created to increase performance.
zz RAM is a semiconductor-based memory where the CPU or the other hardware devices can read and write data. In this memory, the data is temporarily stored since it is a volatile memory. Once the system turns off, it loses the data.
zz A ROM is a type of semiconductor memory in which data is written once and then is not altered.
zz Interleaving splits the memory system into a number of memory modules and organizes the addressing so that consecutive words in the address space are assigned to distinct modules.
zz Memory cells are commonly arranged in an array, with each cell capable of storing one bit of information.
4.8 Glossary
zz ROM: It is a form of semiconductor memory in which data is written once and then not changed.
zz EEPROM: It is a kind of electrically erasable programmable read-only memory.
zz RAM: RAM is a semiconductor-based memory where the CPU or the other hardware devices can read and write data. In this memory, the data is temporarily stored since it is a volatile memory. Once the system turns off, it loses the data.
zz Computer memory: The electrical storing area for instructions and data that the processor can read rapidly.
zz Cache memory: It is a tiny, quick memory that sits between the CPU and the bigger, slower main memory.
zz Semiconductor memory: It is a critical electronic component for any PCB assembly based on a computer.
4.9 Self-Assessment Questions
A. Multiple Choice Questions
b. Secondary memory
c. Both a and b
d. None of these
3. Which of the following is a tiny, quick memory that sits between the CPU and the main memory?
a. RAM
b. ROM
c. Virtual memory
d. Cache memory
4. Which of the following are registers of the CPU?
a. MAR
b. MDR
c. MNR
d. Both a and b
5. The pace at which data may be moved in and out of a memory unit is called ___________.
a. Memory access time
b. Memory cycle time
c. Transfer rate
d. None of these
6. Unicode encoding scheme represents a character as a group of _____________.
a. 8 bits or 1 byte
b. 16 bits or 2 bytes
c. 24 bits or 3 bytes
d. 32 bits or 4 bytes
7. Which of the following is a type of memory that stores data in a fixed location?
a. Dynamic RAM
b. Video RAM
c. Static RAM
d. Synchronous DRAM
8. Which of the following memories is based on a phenomenon in which a chalcogenide glass changes state or phase from amorphous (high resistance) to polycrystalline (low resistance)?
c. Dynamic RAM
d. Synchronous DRAM
9. In which type of memory may data be written and erased using an electrical voltage?
a. PROM
b. EEPROM
c. EPROM
d. None of these
10. ___________ cells are commonly arranged in an array, with each cell capable of storing one bit of
information.
a. Memory
b. Data
c. Load
d. Information
B. Descriptive Questions
1. One of the most crucial components of a computer system is its memory. Discuss.
2. The storage capacity of computer memory is measured in bits, bytes, Kilobyte (KB), etc. What do you
understand by units of memory?
3. Any electronics assembly that employs computer processing technologies uses semiconductor
memory. Discuss.
4. RAM is categorised in different types. Discuss the RDRAM.
5. ROM is categorised in different types. Discuss one of the categories of ROM, namely, EPROM.
4.10 Answers and Hints for Self-Assessment Questions
A. Answers to Multiple Choice Questions
Q. No. Answer
1. a. Primary memory
2. d. All of these
3. d. Cache memory
4. d. Both a and b
5. c. Transfer rate
6. b. 16 bits or 2 bytes
7. c. Static RAM
10. a. Memory
B. Hints for Descriptive Questions
2. A computer stores data internally in the form of binary numbers, 0 (OFF/low voltage) and 1 (ON/high
voltage). All the digits (0–9), alphabet (a to z or A to Z), special characters ($, %, @, * etc.) are stored
in the computer in the binary form. ASCII (American Standard Code for Information Interchange)
assigned a unique number or code to all the alphabet and special characters.
3. Any electronics assembly that employs computer processing technologies uses semiconductor memory. Memory cards have also become widespread for temporarily storing data, ranging from portable flash memory cards used for file transfers to semiconductor memory cards used in cameras, mobile phones, and other devices.
4. RDRAM is the fastest among all the random memory types with the data transfer speed of 1 GHz.
Generally, RDRAM is used for the purpose of video memory on graphics accelerator cards. Dynamic
RDRAM is an improvement to the existing RDRAM.
Refer to Section Semiconductor RAM Memories
5. EPROM is a type of memory that can be erased and reprogrammed. These semiconductor devices
have the ability to be programmed and then deleted at a later date. Normally, this is accomplished
by exposing the semiconductor device to UV radiation.
zz https://www.jbiet.edu.in/coursefiles/cse/HO/cse2/DLD1.pdf
zz Do online research and discuss real applications of memory systems with your classmates.
UNIT 05
Introduction to Cache & Virtual Memory
Names of Sub-Units
Overview of Cache Memory, Mapping Functions, Replacement Algorithms, Performance
Considerations, Overview of Virtual Memory, Virtual Memory Organisation, Address Translation.
Overview
The unit begins by discussing the concept of cache memory. Next, the unit discusses the mapping
functions and replacement algorithms. Then, the unit discusses the performance considerations.
Further, the unit discusses the concept of virtual memory. This unit also discusses the virtual memory
organisation. Towards the end, the unit discusses the address translation in virtual memory.
Learning Objectives
Learning Outcomes
aa Evaluate the use of address translation in virtual memory
Pre-Unit Preparatory Material
aa https://www.cs.umd.edu/~meesh/cmsc411/website/proj01/cache/cache.pdf
5.1 Introduction
Cache memory is a type of memory that operates at extremely fast speeds. It's used to boost performance
and synchronise with high-speed processors. Although cache memory is more expensive than main
memory or disc memory, it is less expensive than CPU registers. Cache memory is a form of memory
that works as a buffer between the RAM and the CPU and is highly fast. It stores frequently requested
data and instructions so that they may be accessed quickly by the CPU.
5.2 Cache Memory
A computer’s CPU can generally execute instructions and data quicker than it can acquire them from
a low-cost main memory unit. As a result, the memory cycle time becomes the system’s bottleneck.
H
Using cache memory is one technique to minimise memory access time. This is a tiny, quick memory
that sits between the CPU and the bigger, slower main memory. This is where a programme’s presently
IG
active segments and data are stored. Because address references are local, the CPU can usually find the
relevant information in the cache memory itself (cache hit) and only needs access to the main memory
infrequently (cache miss). With a large enough cache memory, cache hit rates of over 90% are possible,
The usage of cache memory reduces the average time it takes to access data from the main memory.
The cache is a more compact and quicker memory that stores copies of data from frequently accessed
main memory locations. In a CPU, there are several distinct, independent caches that store instructions
and data. Figure 1 shows the structure of cache memory:
[Figure 1: Structure of cache memory]
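This reduction can be quantified with the standard average-access-time estimate. The formula itself is not stated in the text, but it follows directly from the hit/miss behaviour described above:

```python
def avg_access_time(hit_ratio, t_cache, t_main):
    # Hits are served at cache speed; misses pay the main-memory access time.
    return hit_ratio * t_cache + (1 - hit_ratio) * t_main

# With a 90% hit rate and a main memory ten times slower than the cache
# (illustrative figures), the average is approximately 1.9 cache-access-time
# units instead of 10:
print(avg_access_time(0.90, 1.0, 10.0))
```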
This method splits the memory system into a number of memory modules and organises the addressing
so that consecutive words in the address space are assigned to distinct modules. Memory access requests
involving successive addresses will be sent to separate modules. The average pace of obtaining words
from the Main Memory can be improved since parallel access to these modules is available.
Direct Mapping
Direct mapping is the most basic method: it maps each block of main memory into only one possible
cache line. Direct mapping assigns each memory block to a specific line in the cache. If a line was
previously occupied by a memory block and a new block needs to be loaded, the old block is deleted.
An address is divided into two elements: an index field and a tag field. The tag field is stored in the
cache, while the index field selects the cache line. The performance of direct mapping is directly
related to the hit ratio. The following formula is used to express this mapping:
i = j modulo m
where,
i = cache line number
j = main memory block number
m = number of lines in the cache
Each main memory address may be thought of as having three fields for the purposes of cache access.
Within a block of main memory, the least significant w bits indicate a unique word or byte. In most
modern computers, the address is at the byte level. The remaining s bits designate one of the main
memory's 2^s blocks. The cache logic interprets these s bits as a tag of s - r bits (the most significant
portion) and an r-bit line field. This last field indicates one of the cache's m = 2^r lines. Figure 2 shows
the direct mapping of cache memory:
[Figure 2: Direct mapping of cache memory. The processor address is split into tag and index fields; the index selects a cache line, the stored tag is compared with the address tag, the location is accessed if they are the same, and the main memory is accessed if the tags do not match]
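The three-field address split described above can be sketched as follows (a hypothetical helper; the w word bits and r line bits follow the text, and the tag takes the remaining s - r bits):

```python
def split_address(addr, r, w):
    """Split a main-memory address into (tag, line, word) fields
    for a direct-mapped cache with 2**r lines and 2**w words per block."""
    word = addr & ((1 << w) - 1)     # least significant w bits
    block = addr >> w                # s-bit main-memory block number j
    line = block & ((1 << r) - 1)    # i = j modulo m, with m = 2**r
    tag = block >> r                 # most significant s - r bits
    return tag, line, word

# A 10-bit address with w = 2 and r = 4:
print(split_address(0b1110101011, 4, 2))  # (14, 10, 3)
```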
Associative Mapping
The associative memory is utilised to store the content and addresses of the memory word in this form
of mapping. Any block can be placed in any cache line. This implies that the word id bits are utilised
to determine which word in the block is required, but the tag becomes the remainder of the bits. This
allows any word to be placed wherever in the cache memory. It is said to be the quickest and most
adaptable mapping method.
Figure 3 shows the associative mapping of cache memory:
[Figure 3: Associative mapping of cache memory. The memory address from the processor is compared simultaneously with all addresses stored in the cache; the location is accessed when the address is found, and the main memory is accessed if the address is not in the cache]
Set-associative Mapping
This type of mapping is an improved version of direct mapping that eliminates its disadvantages.
The concern of potential thrashing in the direct mapping approach is addressed by set-associative
mapping: rather than having exactly one line in the cache to which a block can map, a few lines are
combined together to form a set, and a memory block can then correspond to any one of a set's lines.
Thanks to set-associative mapping, each word in the cache can have two or more words in the main
memory at the same index address. The benefits of both the direct and associative cache mapping
techniques are combined in set-associative cache mapping. The following formula is used to express
this mapping:
i = j mod v
where,
i = cache set number
j = main memory block number
v = number of sets = m/k
m = number of lines in the cache
k = number of lines in each set
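The set-number calculation can be sketched directly from the formula (a hypothetical helper function):

```python
def cache_set(j, m, k):
    """Set number i = j mod v for main-memory block j, where the
    cache's m lines are grouped into v = m / k sets of k lines each."""
    v = m // k          # number of sets
    return j % v

# A cache of m = 8 lines with k = 2 lines per set has v = 4 sets;
# block 13 maps to set 13 mod 4 = 1 and may occupy either line of that set.
print(cache_set(13, 8, 2))  # 1
```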
[Figure 4: Set-associative mapping. The index selects a set of cache lines; the tags of the lines in the set are compared in parallel, the word is accessed if a tag is the same, and the main memory is accessed if the tags do not match]
5.2.2 Replacement Algorithms
Page replacement is fundamental to demand paging, and there are many page replacement algorithms.
An algorithm is assessed by executing it on a particular sequence of memory references for a process
and calculating the number of page faults. The sequence of memory references is called the reference
string.
Some of the most widely used page replacement mechanisms are discussed as follows:
zz Random replacement: The policy calls for selecting the page to be replaced by choosing a page from any frame with equal probability; it uses no knowledge of the reference stream (or the locality) when it selects the page frame to replace. In general, random replacement does not perform well. On most reference streams, it causes more page faults than the other algorithms discussed in this section. After early exploration with random replacement, it was recognised that several other policies would produce fewer page faults.
zz First-in-First-out (FIFO): Refers to a replacement algorithm that replaces the page that has been in the memory the longest. FIFO emphasises the interval of time a page has been present in the memory rather than how much the page is being used. The advantage of FIFO is that it is simple to implement.
A FIFO replacement algorithm associates with each page the time when it was brought into the memory. When there is a need for page replacement, the oldest page is chosen for replacement. A FIFO queue can be created that holds all pages brought into the memory. The page at the head of the queue is replaced and, when a page is brought into the memory, it is inserted at the tail of the queue of pages.
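The FIFO policy described above can be sketched in a few lines of Python (an illustrative fault counter, not taken from the text):

```python
from collections import deque

def fifo_faults(refs, nframes):
    """Count page faults under FIFO replacement for a reference string."""
    queue = deque()       # head holds the page that has been resident longest
    resident = set()
    faults = 0
    for page in refs:
        if page in resident:
            continue                  # page already in memory: no fault
        faults += 1
        if len(queue) == nframes:     # memory full: replace the oldest page
            resident.discard(queue.popleft())
        queue.append(page)            # newly loaded page joins the tail
        resident.add(page)
    return faults

print(fifo_faults([2, 1, 3, 4, 2, 1, 3, 4, 2, 1, 3, 4, 5, 6, 7, 8], 3))  # 16
```

On the reference string used in Figure 5 below, FIFO incurs 16 faults where the optimal algorithm incurs 10.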
zz Belady’s Optimal Algorithm: Refers to the replacement policy which has “perfect knowledge” of the
page reference string; thus, it always chooses an optimal page to be removed from the memory. Let
the forward distance of a page p at time t be the distance from the current point in the reference
stream to the next place in the stream where the same page is referenced again. In the optimal
algorithm, the replaced page y is one that has maximal forward distance:
Yt = Max FWDt(x)
D
Since more than one page is loaded at time t, there may be more than one page that never appears
again in the reference stream (x), that is, there may be more than one loaded page with maximal
E
forward distance (Yt). In this case, Belady’s optimal algorithm chooses an arbitrary loaded page
with maximal forward distance.
V
The optimal algorithm can only be implemented if the full page reference stream is known in advance.
Since it is rare for the system to have such knowledge, the algorithm is not practically realisable.
R
Instead, its theoretical behavior is used to compare the performance of realisable algorithms with
the optimal performance.
E
Although it is usually not possible to exactly predict the page reference stream, one can sometimes
S
predict the next page with a high probability that the prediction will be correct. For example, the
conditional branch instruction at the end of a loop almost always branches back to the beginning of
E
the loop rather than exiting it. Such predictions are based on static analysis.
R
Figure 5 shows Belady's optimal algorithm behavior (an asterisk marks the page faults):
Reference: 2  1  3  4  2  1  3  4  2  1  3  4  5  6  7  8
Frame 0:   2* 2  2  2  2  2  2  2  2  1* 1  1  5* 5  5  8*
Frame 1:      1* 1  1  1  1  3* 3  3  3  3  3  3  6* 6  6
Frame 2:         3* 4* 4  4  4  4  4  4  4  4  4  4  7* 7
A programmer may sometimes have enough information to incorporate replacement "hints" in the source code. The compiler and paging systems can then be designed to use these hints to predict the future behavior of the page reference stream.
An example of Belady’s optimal algorithm with m=3 page frames is as follows:
C
Reference 3 4 3 2
2 1 2 1 4 1 3 4 5 6 7 8
Strings
Figure 5 has a row for each of the three page frames and a column for each reference in the page
stream. A table entry at row i, column j shows the page loaded at page frame i after rj has been
6
UNIT 05: Introduction to Cache & Virtual Memory JGI JAIN
DEEMED-TO-BE UNIVERSITY
referenced. The optimal algorithm will behave as shown in Figure 5, which incurs 10 page faults. An
optimal page replacement algorithm has the minimum page fault rate among all algorithms.
An optimal algorithm never suffers from Belady's anomaly. An optimal page replacement algorithm
exists and is called OPT or MIN. It states: replace the page that will not be accessed for the longest
period of time. Using this page replacement algorithm yields the lowest possible page fault rate for a
fixed number of frames.
The optimal page replacement algorithm is hard to implement because it needs advance knowledge of
the reference string. The optimal algorithm is mainly used for comparison studies.
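For such comparison studies, Belady's algorithm can be coded whenever the full reference string is available (an illustrative sketch; ties in forward distance are broken arbitrarily, as the text allows):

```python
def opt_faults(refs, nframes):
    """Count page faults under Belady's optimal (OPT/MIN) replacement."""
    frames = []
    faults = 0
    for i, page in enumerate(refs):
        if page in frames:
            continue                  # hit
        faults += 1
        if len(frames) < nframes:
            frames.append(page)       # a free frame is available
            continue

        def forward_distance(p):
            # Position of the next reference to p; infinite if never used again.
            try:
                return refs.index(p, i + 1)
            except ValueError:
                return float("inf")

        victim = max(frames, key=forward_distance)   # maximal forward distance
        frames[frames.index(victim)] = page
    return faults

print(opt_faults([2, 1, 3, 4, 2, 1, 3, 4, 2, 1, 3, 4, 5, 6, 7, 8], 3))  # 10
```

On the reference string of Figure 5 with three frames this reproduces the stated 10 page faults.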
zz Least Recently Used [LRU]: Refers to the algorithm that is designed to take advantage of "normal" programme behaviour: it replaces the page that has not been referenced for the longest time, on the assumption that pages in the current locality will be used again soon. Programmes typically contain loops, which cause the main line of the code to execute repeatedly, so the control unit repeatedly accesses the set of pages containing these loops. This set of pages is known as the locality of the process. If the executed loops are stored in a small number of pages, the programme has a small code locality.
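A minimal way to realise LRU (an illustrative sketch, not from the text) is to keep resident pages in recency order and evict from the least-recently-used end:

```python
from collections import OrderedDict

def lru_faults(refs, frames):
    """Count page faults under Least Recently Used replacement."""
    memory = OrderedDict()                # insertion order = recency order
    faults = 0
    for page in refs:
        if page in memory:
            memory.move_to_end(page)      # hit: mark as most recently used
            continue
        faults += 1
        if len(memory) == frames:
            memory.popitem(last=False)    # evict the least recently used page
        memory[page] = True
    return faults

print(lru_faults([2, 1, 3, 4, 2, 1, 3, 4, 2, 1, 3, 4, 5, 6, 7, 8], 3))  # 16
```

On the example reference string LRU faults on every one of the 16 references — the cyclic 2 1 3 4 pattern with only 3 frames is its pathological case — which is why OPT's 10 faults serve as the benchmark.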
zz Least Frequently Used [LFU]: Refers to the algorithm that selects for replacement a page that was not often used in the past. More than one page may satisfy the criterion, in which case any of the qualifying pages can be selected for replacement. The rationale is that an actively used page should have a larger reference count. However, if a programme changes the set of pages it is currently using, the frequency counts will tend to cause the pages in the new locality to be replaced even though they are currently being used. Another problem with LFU is that it uses frequency counts from the beginning of the page reference stream; one remedy is to reset the frequency counter each time a page is loaded rather than letting it increase monotonically throughout the execution of the programme.
zz Not Recently Used [NRU]: Refers to the algorithm in which a resident page that has not been accessed recently is replaced. The algorithm keeps all resident pages in a circular list. A referenced bit is set for each page whenever it is accessed: if the referenced bit is 1, the page has been accessed recently; if it is 0, the page has not been referenced in the recent past.
zz Second chance replacement: Resembles the FIFO replacement algorithm. When a page has been selected, its reference bit is checked. If it is 0, the page is replaced; if the reference bit is 1, the page is given a second chance and the system moves on to select the next FIFO page. When a page is given a second chance, its reference bit is cleared and its arrival time is reset to the current time, so it will not be replaced until all other pages have been replaced (or given a second chance themselves). If a page is used often enough to keep its reference bit set, it will never be replaced. The algorithm thus looks for an old page that has not been used in the previous clock intervals. If the reference bit is set for all pages, second chance degenerates into pure FIFO.
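Second chance is usually implemented as a "clock": a hand sweeps a circular list, clearing set reference bits and evicting the first page whose bit is already 0. An illustrative Python sketch (function and variable names are my own, not from the text):

```python
def clock_evict(pages, ref_bits, hand):
    """Return (victim_index, new_hand) under second-chance replacement.

    pages    : circular list of resident page numbers
    ref_bits : reference bit per slot (1 = recently used)
    hand     : index where the clock hand currently points
    """
    n = len(pages)
    while True:
        if ref_bits[hand] == 0:           # no second chance left: evict here
            return hand, (hand + 1) % n
        ref_bits[hand] = 0                # clear the bit: one second chance
        hand = (hand + 1) % n

pages = [7, 3, 5, 9]
ref_bits = [1, 0, 1, 1]
victim, hand = clock_evict(pages, ref_bits, 0)
print(pages[victim])   # 3 — the first page found with its bit already 0
```

Note that if every bit is 1, the hand sweeps the full circle clearing bits and then evicts the page it started from — the pure-FIFO degenerate case described above.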
zz Most Frequently Used [MFU]: Replaces the page with the largest reference count, on the assumption that the page with the smallest count was probably just brought into memory, has yet to be used, and may be referenced soon. Neither MFU nor LFU is very common, as the implementation of these algorithms is fairly expensive.
zz Page classes: Refer to another page replacement policy that can be implemented by using classes of pages. For example, if the reference bit and the dirty bit of a page are considered together, the pages can be classified into four classes:
1. (0, 0) neither referenced nor dirty
2. (0, 1) not referenced (recently) but dirty
3. (1, 0) referenced but not dirty
4. (1, 1) referenced and dirty
The policy replaces a page from the lowest-numbered non-empty class.
Computer Organization and Architecture
zz Locked pages: Some pages must not be replaced at all. The page holding the code that chooses the next page to be swapped in should never be swapped out, i.e., it could never execute itself to swap itself back in. Similarly, device I/O operations may read or write data directly at the memory locations of a process; the pages containing those memory locations must not be swapped out because, although the process may not reference them at present, the I/O device is referencing them. Setting a lock bit in the page table entry prevents such a page from being swapped out.
5.2.3 Performance Considerations
When the processor wants to read or write data from main memory, it first looks in the cache for a matching item. A cache hit occurs when the CPU discovers that the memory location is in the cache, and the data is read from the cache. A cache miss occurs when the CPU cannot locate the memory location in the cache; the cache then creates a new entry and transfers the data from main memory, after which the request is completed from the cache's contents.
Cache memory performance is usually assessed in terms of a metric known as the hit ratio:
Hit ratio = hits / (hits + misses) = number of hits / total accesses
A greater cache block size and higher associativity can lower the miss rate; lowering the miss rate, the miss penalty, and the time to hit in the cache all enhance cache performance.
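These quantities are usually combined into the average memory access time, AMAT = hit time + miss rate × miss penalty, a standard figure of merit (the formula is the common textbook definition; the numbers below are purely illustrative):

```python
def hit_ratio(hits, misses):
    """Hit ratio = hits / total accesses."""
    return hits / (hits + misses)

def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time, in the same unit as its inputs."""
    return hit_time + miss_rate * miss_penalty

# Example: 950 hits out of 1000 accesses, 1 ns hit time, 50 ns miss penalty
ratio = hit_ratio(950, 50)
print(ratio)                               # 0.95
print(round(amat(1.0, 1 - ratio, 50.0), 2))  # 3.5  (= 1 + 0.05 * 50 ns)
```

The formula makes the trade-off explicit: halving the miss rate helps exactly as much as halving the miss penalty.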
A virtual or logical address is the address created by the CPU in a virtual memory system. The needed mapping to a distinct physical address is done by a specific memory control unit, sometimes known as the memory management unit. According to system needs, the mapping function can be modified during programme execution.
Because of the distinction created between the logical (virtual) and physical address spaces, the former can be as big as the CPU's addressing capabilities, while the latter can be considerably smaller. Only the active portion of the virtual address space is mapped to physical memory, while the rest is kept on the bulk storage device. If the requested information is not in physical memory, it is brought in from the bulk storage device.
Virtual memory gives a computer programmer an addressing space that is several times bigger than
the main memory’s physically available addressing area. Virtual addresses, which might be considered
artificial in certain ways, are used to place data and instructions in this area.
In reality, data and instructions are stored in both the main memory and the auxiliary memory (usually disc memory), under the control of the virtual memory control system, which oversees the real-time placement of data based on virtual addresses. This system automatically (i.e., without the intervention of a programmer) fetches into main memory the data and instructions requested by currently running programmes. The virtual memory's overall organisation is depicted in Figure 6:
(The figure shows the processor issuing virtual addresses to an address translator, which produces physical addresses for the main memory; a virtual memory control exchanges pages between main memory and the auxiliary store (disk) under the operating system.)
Figure 6: General Scheme of the Virtual Memory
Depending on the kind of virtual memory used, the sequences of virtual memory addresses that correspond to these pieces are referred to as pages or segments. The number of the relevant fragment of the virtual memory address space and the word or byte number in the supplied fragment make up a virtual memory address.
For modern virtual memory systems, we differentiate the following options:
 Paged virtual memory
 Segmented virtual memory
 Segmented virtual memory with paging
When accessing data stored at a virtual address, address translation is used to translate the virtual address to a physical memory address. Before beginning to translate, the virtual memory system verifies whether the segment or page carrying the requested word or byte is available in primary memory, using tests of the page or segment descriptors held in corresponding tables in the main memory. If the test result is negative, the requested page or segment is allocated a physical address sub-space in main memory and is then loaded from the auxiliary store into this address sub-space.
The virtual memory system then updates the page or segment descriptors in the descriptor tables and grants the processor instruction that emitted the virtual address access to the requested word or byte.
Today, the virtual memory control system is realised partly in hardware and partly in software. Computer hardware is responsible for accessing descriptor tables and translating virtual to physical addresses. The operating system is responsible for retrieving missing pages or segments and updating their descriptors, aided by specific memory management hardware. This hardware typically consists of a functional unit for virtual memory management and special functional blocks for virtual address translation computations.
The placement of data on the auxiliary store is determined in a way that matches the kind of memory used as the auxiliary store (usually a disk). Figure 7 shows the virtual address translation scheme for paging:
(The figure shows the virtual address split into a page number and an offset. The page table base address, held in a register, is added to the page number to locate the page descriptor in the page table; the descriptor holds control bits and a frame address, and the frame address combined with the offset forms the physical address of a frame in main memory.)
Figure 7: Virtual Address Translation Scheme for Paging
In addition, a page descriptor has a number of control bits, which record the page's status and type. Examples of control bits are a page existence bit for main memory (page status), the permitted access code, a modification registration bit, a swapping lock bit, and an operating system inclusion bit.
Address translation converts a virtual address of a word (byte) into a physical address in the main memory. The page descriptor is used to complete the translation. The descriptor is found in the page table by adding the page number supplied in the virtual address to the page table base address. The page status is read from the descriptor; if the page is in the main memory, the frame address is read from the descriptor and combined with the word (byte) offset from the virtual address. The resulting physical address is used to access the data in the main memory. If the requested page does not exist in main memory, the programme's execution is halted and a "missing page" exception is thrown. The operating system handles this exception: the missing page is transferred from the auxiliary store to the main memory, the address of the allocated frame is saved in the page descriptor with a control bit adjustment, the stopped programme is then restarted, and the required word or byte is accessed.
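The steps above can be sketched in a few lines. This illustrative Python model (the field width and names are my own assumptions, not from the text) uses a 12-bit offset, i.e., 4 KB pages:

```python
PAGE_SHIFT = 12                     # assumed page size: 4 KB
PAGE_MASK = (1 << PAGE_SHIFT) - 1

class MissingPage(Exception):
    """Raised when the descriptor says the page is not resident."""

def translate(vaddr, page_table):
    """Translate a virtual address using a single-level page table.

    page_table maps page number -> (present_bit, frame_address).
    """
    page = vaddr >> PAGE_SHIFT       # high bits select the descriptor
    offset = vaddr & PAGE_MASK       # low bits are kept unchanged
    present, frame = page_table[page]
    if not present:
        raise MissingPage(page)      # the OS would load the page here
    return (frame << PAGE_SHIFT) | offset

table = {0: (1, 5), 1: (0, 0), 2: (1, 9)}
print(hex(translate(0x2ABC, table)))   # page 2 -> frame 9: 0x9abc
```

The exception plays the role of the "missing page" trap: the caller (standing in for the OS) would load the page, update the descriptor, and retry the access.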
When a computer system has a large number of users or huge applications (tasks), each user or task might have its own page table. In this case, the virtual memory control uses a two-level page table. At the first level, a page table directory is kept; it includes the base addresses of all of the system's page tables. The virtual address then contains three fields: the page table number in the page table directory, the requested page number, and the word (byte) offset.
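A two-level lookup simply splits the virtual address into those three fields and walks two tables. An illustrative sketch (the field widths — 10/10/12 bits — are my own assumption, not from the text):

```python
# Assumed layout: 10-bit directory index, 10-bit page index, 12-bit offset
DIR_SHIFT, PAGE_SHIFT = 22, 12
PAGE_MASK = (1 << PAGE_SHIFT) - 1
INDEX_MASK = (1 << 10) - 1

def translate2(vaddr, directory):
    """Two-level translation: directory -> page table -> frame."""
    dir_idx = vaddr >> DIR_SHIFT
    page_idx = (vaddr >> PAGE_SHIFT) & INDEX_MASK
    offset = vaddr & PAGE_MASK
    page_table = directory[dir_idx]        # first-level lookup
    frame = page_table[page_idx]           # second-level lookup
    return (frame << PAGE_SHIFT) | offset

directory = {1: {3: 0x42}}                 # sparse: only one table resident
vaddr = (1 << 22) | (3 << 12) | 0x123
print(hex(translate2(vaddr, directory)))   # 0x42123
```

The point of the extra level is sparseness: page tables for unused regions of the address space need not exist at all, as the dictionary-of-dictionaries here suggests.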
Figure 8 shows the virtual address translation scheme with two-level page tables:
(The figure shows the virtual address split into a page table number, a page number and an offset. The page table directory base register locates the page table directory; the page table number selects a directory entry holding a page table base address; the page number then selects the page descriptor, whose frame address is combined with the offset to form the physical address.)
Figure 8: Virtual Address Translation Scheme with Two-Level Page Tables
Segmented memory is another form of virtual memory. Programmes are constructed using this type of virtual memory based on segments defined by a programmer or a compiler. Segments have their own distinct IDs, lengths, and address spaces.
Data or instructions are written in segments at successive addresses in the memory. Segments identify their owners and the access privileges of other users. Segments can be "private" to a single user or "shared," meaning they can be accessed by other users. Segment parameters may be modified while the application is running, allowing the segment length and the restrictions for mutual access by many users to be specified on the fly.
The names and lengths of segments are used to arrange them in a shared virtual address space. Segments can reside in either the main memory or the auxiliary store, which is often disc memory. The operating system's segmented memory control mechanism automatically fetches segments requested by currently running applications and stores them in main memory.
In a system with multiple users, segmentation is a technique to extend the address space of user programmes as well as a tool for deliberate, structured programme organisation with specified access rights and segment protection.
(The figure shows a programme divided into segments of different sizes, e.g., 8K, 12K and 16K.)
Figure 9: Different Segments in a Programme
In segmentation, the virtual address consists of two fields: a segment number and a word displacement inside the segment. The segment table stores a descriptor for each segment. The parameters of a segment, such as control bits, segment address, segment length, and protection bits, are recorded in its descriptor. Usually, the control bits contain a main memory presence bit, a segment type code, an authorised access type code, and size extension control. The protection bits store the segment's privilege code in the data protection scheme.
When a programme attempts to access the contents of a segment, the privilege level of the accessing programme is compared with the privilege level of the segment, and the access rules are verified. Access to the segment is prohibited if the access rules are not followed, and the exception "segment access rules violation" is thrown. Figure 10 shows the virtual address translation scheme with segmentation:
(The figure shows the virtual address split into a segment number and an offset. The segment number indexes the segment table to fetch the segment descriptor; the offset is compared against the segment limit, and if it lies within the limit it is added to the segment base to form the physical address.)
Figure 10: Virtual Address Translation Scheme with Segmentation
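The limit comparison is what distinguishes this lookup from paging. A brief illustrative sketch (names are my own, not from the text):

```python
class SegmentViolation(Exception):
    """Raised when the offset exceeds the segment length."""

def translate_seg(segment, offset, segment_table):
    """segment_table maps segment number -> (base_address, limit)."""
    base, limit = segment_table[segment]
    if offset >= limit:                       # comparison against the limit
        raise SegmentViolation(segment)
    return base + offset                      # base + word displacement

table = {0: (0x8000, 0x2000), 1: (0xC000, 0x1000)}
print(hex(translate_seg(0, 0x0123, table)))   # 0x8123
```

An out-of-range offset raises the exception, playing the role of the "segment access rules violation" trap in the text.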
(The figure shows the virtual address split into a segment number, a page number and an offset. The segment table base address plus the segment number locates the segment descriptor; the descriptor supplies the base address of the segment's page table; the page number then selects a frame address, which is combined with the word (byte) offset to form the physical address in main memory.)
Figure 11: Virtual Address Translation Scheme in Segmented Virtual Memory with Paging
The segment table base address is kept in a control register (the segment table base register) that is loaded before the application starts. The address of the page table for the specified segment, the segment size in pages, control bits, and protection bits are all contained in a segment descriptor.
The control bits contain a main memory existence bit, a segment type code, a code of the authorised access type, and extension control bits. The protection bits store the privilege level of the segment in the general protection system. Figure 12 shows the segment descriptor structure in segmented virtual memory with paging:
Control bits | Protection bits | Segment length (in pages) | Segment page table address
Figure 12: Segment Descriptor Structure in Segmented Virtual Memory with Paging
5.3.2 Address Translation
Processes always use virtual addresses. The Memory Management Unit (MMU), a part of the CPU, does the address translation and caches recently used translations in a Translation Lookaside Buffer (TLB), which acts as a page table cache. The page tables themselves are stored in the OS's virtual address space and are (at best) resident in main memory, so each address translation costs at least one main memory reference: to translate a virtual memory address, the MMU has to read the relevant page table entry out of memory. Figure 13 shows the address translation table:
(The figure shows a virtual address, split into a virtual page number (VPN) and page offset (PO), looked up in a table holding physical page numbers in main memory and disk addresses on the hard disk; a valid entry yields the physical page number (PPN), which is combined with the page offset to form the physical address.)
Figure 13: Address Translation Table
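The TLB's role can be shown with a tiny model: look in the TLB first and fall back to a page-table walk (costing a main memory reference) only on a TLB miss. Illustrative Python, with names of my own choosing:

```python
def translate_tlb(vpn, tlb, page_table, stats):
    """Return the PPN for a VPN, using the TLB as a page-table cache."""
    if vpn in tlb:
        stats["tlb_hits"] += 1         # translation found without a walk
        return tlb[vpn]
    stats["walks"] += 1                # one main-memory reference
    ppn = page_table[vpn]
    tlb[vpn] = ppn                     # cache the translation for next time
    return ppn

page_table = {0: 7, 1: 3, 2: 9}
tlb, stats = {}, {"tlb_hits": 0, "walks": 0}
for vpn in [0, 1, 0, 0, 2, 1]:
    translate_tlb(vpn, tlb, page_table, stats)
print(stats)   # {'tlb_hits': 3, 'walks': 3}
```

Locality makes this pay off: repeated references to the same pages hit in the TLB, so most translations avoid the extra memory reference.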
5.4 Summary
zz Cache memory is a form of memory that works as a buffer between the RAM and the CPU and is highly fast.
zz Various types of mapping functions that are used by cache memory are as follows:
 Direct mapping
 Associative mapping
 Set-associative mapping
zz Page replacement is basic to demand paging, and there are many page replacement algorithms. An algorithm is assessed by executing it on a particular string of memory references for a process and calculating the number of page faults. The sequence of memory references is called a reference string.
zz A virtual or logical address is the address created by the CPU in a virtual memory system.
zz Virtual memory gives a computer programmer an addressing space that is several times bigger
than the main memory's physically available addressing area.
zz Virtual addresses, which might be considered artificial in certain ways, are used to place data and
instructions in this area.
zz Depending on the kind of virtual memory used, the sequences of virtual memory addresses that
correspond to these pieces are referred to as pages or segments.
zz The number of the relevant fragment of the virtual memory address space and the word or byte
number in the supplied fragment make up a virtual memory address.
zz For modern virtual memory systems, we differentiate the following options:
Paged virtual memory
Segmented virtual memory
Segmented virtual memory with paging
zz The Memory Management Unit (MMU) (a part of CPU) does address translation.
5.5 Glossary
zz Cache memory: It is a form of memory that works as a buffer between the RAM and the CPU and is highly fast.
zz Direct mapping: This method maps each block of main memory into only one potential cache line.
zz Associative memory: It is utilised to store the content and addresses of the memory word in this form of mapping.
zz Set-associative Mapping: This type of mapping is an improved version of direct mapping that eliminates the disadvantages of direct mapping.
zz Virtual memory: It is a memory management technique that helps in executing programmes that are larger in size than the physical main memory.
zz Demand paging: A set of techniques is provided by the virtual memory to execute a programme that is not present entirely in the memory.
5.6 Self-Assessment Questions
A. Multiple Choice Questions
b. Cache memory
c. SSD
d. None of these
2. Which of the following is/are type(s) of mapping functions that are used by cache memory?
a. Direct mapping
b. Associative mapping
c. Set-associative mapping
d. All of these
3. Which of the following mapping methods maps each block of main memory into only one potential cache line?
a. Direct mapping
b. Associative mapping
c. Set-associative mapping
d. All of these
4. Which of the following page replacement algorithms replaces the page that has been in the memory the longest?
a. Random-replacement
b. First-in-First-out
c. Belady’s Optimal Algorithm
d. Least Recently Used
5. Which of the following is a page replacement algorithm?
a. Least Frequently Used
b. Second chance replacement
c. Page classes
d. All of these
6. Which of the following page replacement algorithms selects a page for replacement if the page was not often used in the past?
a. Not Recently Used
b. Least Recently Used
c. Belady’s Optimal Algorithm
d. Least Frequently Used
7. Which of the following is the address created by the CPU in a virtual memory system?
a. Virtual address
b. Cache address
c. Physical address
d. None of these
8. The virtual address of a paged memory is made up of two parts: the page number and
the _________________.
a. Word displacement
b. Virtual address
c. Both a and b
d. None of these
9. In which of the following segments are paged in virtual memory, which means they contain the
10. Which of the following gives a computer programmer an addressing space that is several times
bigger than the main memory’s physically available addressing area?
a. Virtual memory
b. Cache memory
c. Physical memory
d. None of these
B. Essay Type Questions
1. Cache memory is a type of memory that operates at extremely fast speeds. Discuss.
2. Various types of mapping functions that are used by cache memory are direct mapping, associative mapping, and set-associative mapping. What do you understand by direct mapping?
3. Page replacement is basic to demand paging, and there are many page replacement algorithms. Discuss the random replacement algorithm.
4. A virtual or logical address is the address created by the CPU in a virtual memory system. Discuss.
5. What do you understand by segmented virtual memory?
5.7 Answers and Hints for Self-Assessment Questions
A. Answers to Multiple Choice Questions
Q. No. Answer
1. b. Cache memory
2. d. All of these
3. a. Direct mapping
4. b. First-in-First-out
5. d. All of these
6. d. Least Frequently Used
7. a. Virtual address
8. a. Word displacement
10. a. Virtual memory
B. Hints for Essay Type Questions
1. A computer’s CPU can generally execute instructions and data quicker than it can acquire them from
a low-cost main memory unit. As a result, the memory cycle time becomes the system’s bottleneck.
Using cache memory is one technique to minimise memory access time. This is a tiny, quick memory
that sits between the CPU and the bigger, slower main memory. Refer to Section Cache Memory
2. Direct mapping is the most basic method: it maps each block of main memory into only one potential cache line, assigning each memory block to a specified line in the cache. If a line was previously filled by a memory block and a new block needs to be loaded, the old block is deleted. Refer to Section Cache Memory
3. In the random-replacement algorithm, the replaced page is chosen at random: the memory manager randomly chooses any loaded page. Because the policy selects the page to be replaced from any frame with equal probability, it uses no knowledge of the reference stream (or the locality) when it selects the page frame to replace. Refer to Section Cache Memory
4. A virtual or logical address is the address created by the CPU in a virtual memory system. The needed mapping is done by a specific memory control unit, sometimes known as the memory management unit, which maps it to a distinct physical address. According to system needs, the mapping function can be modified during programme execution. Refer to Section Virtual Memory
5. Segmented memory is another form of virtual memory. Programmes are constructed using this type of virtual memory based on segments defined by a programmer or a compiler. Segments have their own distinct IDs, lengths, and address spaces. Data or instructions are written in segments at successive addresses in the memory. Segments identify their owners and the access privileges of other users. Refer to Section Virtual Memory
5.8 Post-Unit Reading Material
zz https://www.msuniv.ac.in/Download/Pdf/19055a11803e457
zz https://www.cmpe.boun.edu.tr/~uskudarli/courses/cmpe235/Virtual%20Memory.pdf
5.9 Topics for Discussion Forum
zz Do online research on cache memory and virtual memory and discuss with your classmates the uses, advantages and disadvantages of cache memory and virtual memory.
UNIT
06
Arithmetic Operations
Names of Sub-Units
Introduction, Logic Gates, Flip Flops, Addition and Subtraction of Signed Numbers, Design of Fast
Adders, Multiplication of Positive Numbers, Signed Operand Multiplication, Fast Multiplication,
Integer Division.
Overview
This unit begins by discussing the concept of arithmetic operations, logic gates and flip flops. Next, the unit discusses the addition and subtraction of signed numbers. Further, the unit explains the design of fast adders and the multiplication of positive numbers. Towards the end, the unit discusses signed operand multiplication, fast multiplication and integer division.
Learning Objectives
Learning Outcomes
aa Explore the concept of fast multiplication and integer division
Pre-Unit Preparatory Material
aa http://flint.cs.yale.edu/cs422/doc/art-of-asm/pdf/CH09.PDF
6.1 Introduction
The arithmetic instructions in digital computers are used to modify data, producing the results required to solve computing problems. The four basic arithmetic operations are addition, subtraction, multiplication, and division. Further operations can be generated from these four if we wish.
6.2 Logic Gates
In the central processing unit, there is a distinct component, called the arithmetic processing unit, that performs arithmetic operations. Arithmetic instructions are usually applied to binary or decimal data. Integers and fractions are represented using fixed-point numbers, and numbers can be signed or unsigned. The simplest arithmetic operation is fixed-point addition. When we wish to address a problem, we carry out a sequence of well-defined steps; an algorithm refers to all of these stages taken together, and we provide algorithms to tackle diverse problems.
A logic gate is a component in digital circuits that serves as a building block. Logic gates carry out basic logical operations that are essential in digital circuits, and they are included in almost every electrical gadget we use today; for example, logic gates may be found in smartphones, tablets, and memory devices.
Logic gates in a circuit make choices depending on a mix of digital signals from their inputs. Most logic gates have two inputs and one output. Logic gates are built on Boolean algebra: at any one time, every terminal is in one of two binary states, false (equal to 0) or true (equal to 1). The binary output will vary depending on the type of logic gate and the mix of inputs. A logic gate may be compared to a light switch, with the output being off in one position and on in the other. Logic gates are commonly used in integrated circuits (ICs).
zz AND gate: The AND gate acts in the same way as the logical "and": the output is "true" when both inputs are "true," and "false" otherwise. In other words, the output is 1 only when both inputs one AND two are 1. The truth table of AND gate is shown in Table 1:
Table 1: Truth Table of AND Gate
Input 1 Input 2 Output
0 0 0
0 1 0
1 0 0
1 1 1
zz OR gate: The OR gate takes its name from the fact that it works in the same way as the logical inclusive "or": if either or both of the inputs are "true," the output is "true"; the output is "false" only if both inputs are "false." To put it another way, for the output to be 1, at least one of the inputs must be 1. The truth table of OR gate is shown in Table 2:
Table 2: Truth Table of OR Gate
Input 1 Input 2 Output
0 0 0
0 1 1
1 0 1
1 1 1
zz NOT gate: A NOT gate, which has only one input, is sometimes called a logical inverter to distinguish it from other forms of electronic inverter devices. It reverses the logic state: if the input is 1, the output is 0, and if the input is 0, the output is 1. The truth table of NOT gate is shown in Table 3:
Table 3: Truth Table of NOT Gate
Input Output
0 1
1 0
zz NAND gate: The NAND gate works as an AND gate followed by a NOT gate; it performs the logical operation "AND" followed by negation. The output is "false" if both inputs are "true"; otherwise, the output is "true." The truth table of NAND gate is shown in Table 4:
Table 4: Truth Table of NAND Gate
Input 1 Input 2 Output
0 0 1
0 1 1
1 0 1
1 1 0
zz NOR gate: The NOR gate is a combination of an OR gate followed by an inverter. Its output is "true" if both inputs are "false"; otherwise, the output is "false." The truth table of NOR gate is shown in Table 5:
Table 5: Truth Table of NOR Gate
Input 1 Input 2 Output
0 0 1
0 1 0
1 0 0
1 1 0
zz XOR gate: The XOR (exclusive-OR) gate acts in the same general way as the OR gate, except that the output is "true" only if exactly one of the inputs is "true"; the output is "false" if both inputs are "false" or if both inputs are "true." Another way to look at this circuit is to say that the output is 1 if the inputs are different and 0 if the inputs are the same. The XOR operation is shown in Table 6:
Table 6: Truth Table of XOR Gate
Input 1 Input 2 Output
0 0 0
0 1 1
1 0 1
1 1 0
zz XNOR gate: The XNOR (exclusive-NOR) gate is a combination of an XOR gate followed by an inverter. The output of the XNOR gate is "true" if the inputs are the same and "false" if the inputs are different. The operation of XNOR gate is shown in Table 7:
Table 7: Truth Table of XNOR Gate
Input 1 Input 2 Output
0 0 1
0 1 0
1 0 0
1 1 1
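All seven gates reduce to three Boolean operators, so the tables above can be checked mechanically. An illustrative Python sketch (for 0/1 inputs):

```python
def AND(a, b):  return a & b
def OR(a, b):   return a | b
def NOT(a):     return a ^ 1
def NAND(a, b): return NOT(AND(a, b))    # AND followed by NOT
def NOR(a, b):  return NOT(OR(a, b))     # OR followed by NOT
def XOR(a, b):  return a ^ b             # 1 when the inputs differ
def XNOR(a, b): return NOT(XOR(a, b))    # 1 when the inputs are the same

# Print the combined truth table for all two-input gates
print("A B | AND OR NAND NOR XOR XNOR")
for a in (0, 1):
    for b in (0, 1):
        print(a, b, "|", AND(a, b), OR(a, b), NAND(a, b),
              NOR(a, b), XOR(a, b), XNOR(a, b))
```

Building NAND and NOR out of AND, OR and NOT mirrors their hardware definitions as composite gates.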
6.3 Flip Flops
In a sequential circuit, the current output is not determined by the current input alone but also depends on the previous output. Flip flops are the simplest kind of sequential circuits. A flip flop can preserve a binary state indefinitely, which means it can act as a 1-bit memory cell. It is a circuit that keeps its state until it is commanded to alter it by its inputs. Four NAND or four NOR gates can be used to make a flip flop.
6.3.1 RS Flip Flop
The RS flip flop has two inputs: one is known as "SET," which sets the device (output = 1), considered as S, and another is known as "RESET," which resets the device (output = 0), considered as R. RS stands for SET/RESET. The truth table of RS flip flop is shown in Table 8:
S R QN QN+1
0 0 0 0
0 0 1 1
0 1 0 0
0 1 1 0
1 0 0 1
1 0 1 1
1 1 0 -
1 1 1 -
6.3.2 JK Flip Flop
The JK flip flop is an SR flip flop with the addition of clock input circuitry. The invalid or prohibited output condition that arises when both inputs are set to 1 is eliminated by the addition of the clock input circuit. The JK flip flop therefore has four usable input combinations: set (1), reset (0), "no change" and "toggle." The truth table of JK flip flop is shown in Table 9:
Table 9: Truth Table of JK Flip Flop
J K QN QN+1
0 0 0 0
0 0 1 1
0 1 0 0
0 1 1 0
1 0 0 1
1 0 1 1
1 1 0 1
1 1 1 0
6.3.3 D Flip Flop
In a D flip flop, the output follows the state of 'D' when a clock pulse is applied. It requires only two inputs, D and CP. The truth table of D flip flop is shown in Table 10:
Table 10: Truth Table of D Flip Flop
Q D Q(t+1)
0 0 0
0 1 1
1 0 0
1 1 1
6.3.4 T Flip Flop
In a T (toggle) flip flop, the output toggles its state when T = 1 and holds its state when T = 0. The truth table of T flip flop is shown in Table 11:
Table 11: Truth Table of T Flip Flop
T Qn Qn+1
0 0 0
0 1 1
1 0 1
1 1 0
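The truth tables above are next-state functions, so they can be checked mechanically. An illustrative Python sketch of the clocked behaviour (function names are my own):

```python
def jk_next(j, k, q):
    """Next state of a JK flip flop on a clock pulse."""
    if j == 0 and k == 0:
        return q          # no change
    if j == 0 and k == 1:
        return 0          # reset
    if j == 1 and k == 0:
        return 1          # set
    return 1 - q          # J = K = 1: toggle

def d_next(d, q):
    return d              # output follows D

def t_next(t, q):
    return q ^ t          # toggle when T = 1, hold when T = 0

# Reproduce Table 9 row by row
for j in (0, 1):
    for k in (0, 1):
        for q in (0, 1):
            print(j, k, q, "->", jk_next(j, k, q))
```

Note that `t_next` is just an XOR of T with the present state — the T flip flop is a JK flip flop with both inputs tied together.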
6.4 Addition and Subtraction of Signed Numbers
The magnitudes of the two numbers are denoted by the letters A and B. There are eight distinct cases to consider when signed numbers are added or subtracted, depending on the signs of the numbers and the operation performed. The first column of the case table lists these situations, and the other columns give the actual procedure to be carried out together with the magnitudes of the operands. The last column is required to avoid a negative zero: the outcome of subtracting two equal numbers should be +0, not -0.
The signed 2’s complement addition and subtraction method is shown in Figure 2:
(The figure shows the hardware — the B register feeding a complementer and parallel adder into the accumulator AC, with an overflow bit V — and the algorithm: for subtraction, with the minuend in AC and the subtrahend in B, AC ← AC + B' + 1; for addition, with the augend in AC and the addend in B, AC ← AC + B. In both cases V receives the overflow.)
Figure 2: Signed 2's Complement Addition and Subtraction
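The 2's-complement algorithm of Figure 2 is easy to model for an n-bit accumulator: subtraction adds the complemented operand plus one, and overflow occurs when two operands of the same sign produce a result of the opposite sign. An illustrative Python sketch for 8-bit words:

```python
BITS = 8
MASK = (1 << BITS) - 1           # 0xFF for 8-bit words
SIGN = 1 << (BITS - 1)           # 0x80: the sign bit

def add_2c(ac, b, subtract=False):
    """Return (result, overflow_bit) of AC +/- B in 2's complement."""
    if subtract:
        b = (~b + 1) & MASK      # AC <- AC + B' + 1
    result = (ac + b) & MASK
    # V = 1 when both operands share a sign but the result's sign differs
    overflow = int((ac & SIGN) == (b & SIGN) and (ac & SIGN) != (result & SIGN))
    return result, overflow

print(add_2c(0x05, 0x03))                 # (8, 0): 5 + 3
print(add_2c(0x05, 0x03, subtract=True))  # (2, 0): 5 - 3
print(add_2c(0x70, 0x70))                 # (224, 1): 112 + 112 overflows
```

The appeal of the 2's-complement scheme is visible here: one adder handles both operations and both signs, with no separate magnitude-comparison step.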
(A companion flowchart shows signed-magnitude addition and subtraction: with the minuend (augend) in A and the subtrahend (addend) in B, the signs As and Bs are compared. When the effective signs agree, the magnitudes are added, EA ← A + B, and the carry E sets the add-overflow flag AVF. When they differ, the magnitudes are subtracted, EA ← A + B' + 1; if A < B, the result is 2's-complemented, A ← A' + 1, and the sign As is complemented. The result is in A and As.)
6.5 Design of Fast Adders
In a ripple-carry adder, each stage must wait for the preceding stage to supply its carry before proceeding. As a result, there is a significant time delay due to carry propagation. The design is shown in Figure 4:
(The figure shows a 4-bit ripple-carry adder: full adders take input pairs A3 B3, A2 B2, A1 B1 and A0 B0 and produce sum outputs S3 S2 S1 S0, each stage passing its carry to the next.)
Figure 4: Design of Fast Adders
As soon as the input signals are applied to the corresponding full adder, the sum S3 is produced. However, the carry input C4 does not reach its final steady-state value until carry C3 is available at its steady-state value. Similarly, C3 depends on C2, and C2 depends on C1. As a result, the carry must propagate through all stages before output S3 and carry C4 reach their final steady-state values.
The propagation time is computed by multiplying each adder block's propagation delay by the number of adder blocks in the circuit. For example, if each full adder stage has a 20-nanosecond propagation delay, S3 will reach its final correct value after 60 (20 × 3) nanoseconds. The problem becomes worse as the number of stages is increased to add more bits.
6.5.1 Carry Look-ahead Adder
By incorporating more complex circuitry, a carry look-ahead adder lowers the propagation delay. The ripple-carry design is modified so that the carry logic across fixed groups of bits of the adder is reduced to two-level logic. Let's take a closer look at the design, starting from a single full adder stage, shown in Figure 5:
(The figure shows a full adder with inputs A, B and Cin and outputs S and Cout.)
Figure 5: A Full Adder Stage
A B C Cout Condition
0 0 0 0 No carry
0 0 1 0 No carry
0 1 0 0 No carry
0 1 1 1 Carry propagate
1 0 0 0 No carry
1 0 1 1 Carry propagate
1 1 0 1 Carry generate
1 1 1 1 Carry generate
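The table motivates the generate and propagate signals, Gi = Ai·Bi and Pi = Ai ⊕ Bi, from which every carry follows in two logic levels: C(i+1) = Gi + Pi·Ci. An illustrative Python sketch of a 4-bit look-ahead carry chain:

```python
def carry_lookahead(a_bits, b_bits, c0=0):
    """Compute all carries of a_bits + b_bits using G/P signals.

    a_bits, b_bits: lists of bits, least significant bit first.
    Returns (sum_bits, carries) with carries[0] == c0.
    """
    g = [a & b for a, b in zip(a_bits, b_bits)]       # generate
    p = [a ^ b for a, b in zip(a_bits, b_bits)]       # propagate
    carries = [c0]
    for i in range(len(a_bits)):
        carries.append(g[i] | (p[i] & carries[i]))    # C(i+1) = Gi + Pi*Ci
    sums = [p[i] ^ carries[i] for i in range(len(a_bits))]
    return sums, carries

# 0b0110 (6) + 0b0101 (5) = 0b1011 (11), bits listed LSB first
sums, carries = carry_lookahead([0, 1, 1, 0], [1, 0, 1, 0])
print(sums)     # [1, 1, 0, 1] -> 0b1011
```

In hardware the recurrence is expanded so every Ci is computed directly from G, P and C0 in two gate levels; the loop here computes the same values sequentially for clarity.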
6.6 Multiplication
In the multiplication operation, we examine the successive bits of the multiplier, starting with the least significant bit. If the multiplier bit is 1, the multiplicand is copied down; otherwise, 0s are copied down. The numbers copied down in subsequent lines are shifted one position to the left with respect to the preceding number. The numbers are then added together to form the product. The signs of the multiplicand and multiplier determine the sign of the product: if they are the same, the product is positive; if they differ, it is negative.
E
6.6.1 Signed Operand Multiplication
In the beginning, the multiplicand is in B and the multiplier is in Q. Their corresponding signs are in Bs and Qs, respectively. The signs are compared, and both A and Q are set to the corresponding sign of the product, since a double-length product will be stored in registers A and Q. Registers A and E are cleared, and the sequence counter SC is set to the number of bits in the multiplier. Since an operand must be stored with its sign, one bit of the word is occupied by the sign and the magnitude consists of n−1 bits.

The low-order bit of the multiplier in Qn is tested. If it is 1, the multiplicand (B) is added to the current partial product (A); if it is 0, nothing is added. Register EAQ is then shifted to the right once to form the new partial product. SC is decremented by 1 and its value is tested. If it is not equal to zero, the procedure is repeated and a new partial product is formed. When SC = 0, the process terminates. The hardware implementation of signed-operand multiplication is shown in Figure 6:
[Figure 6: hardware for signed-operand multiplication — a complementer and parallel adder feeding registers E, A and Q]
[Figure 7: Flowchart of signed-operand multiplication — multiplicand in B, multiplier in Q; As ← Qs ⊕ Bs, Qs ← Qs ⊕ Bs, A ← 0, E ← 0, SC ← n − 1; test Qn: if 1, EA ← A + B; shr EAQ, SC ← SC − 1; repeat until SC = 0; END — the product is in AQ]
Unsigned numbers do not carry a sign; they can hold only the magnitude of the number. Hence, unsigned binary numbers represent positive values only. For example, positive decimal numbers are positive by default: we always assume that there is a positive sign in front of every number. The block diagram of unsigned operand multiplication is shown in Figure 8:
[Figure 8: registers for unsigned-operand multiplication — multiplicand Mn−1 … M0, carry bit C, accumulator An−1 … A0 and multiplier Qn−1 … Q0, shifted right together]
[Figure 9: Flowchart of unsigned-operand multiplication — START: C, A ← 0, M ← multiplicand, Q ← multiplier, Count ← n; if Q0 = 1, C,A ← A + M; shift C, A, Q right; Count ← Count − 1; repeat until Count = 0; END — the product is in A,Q]
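The flowchart's steps can be expressed as a short program. This is an illustrative sketch only (the function name and integer bit-packing are assumptions, not from the text):

```python
# Illustrative sketch: the unsigned shift-and-add multiplication of
# Figure 9. C is the carry flip-flop, A the accumulator, Q the
# multiplier register and M the multiplicand; Count runs from n to 0.

def unsigned_multiply(multiplicand, multiplier, n):
    M, A, Q, C = multiplicand, 0, multiplier, 0
    for _ in range(n):                     # Count <- n, decremented each pass
        if Q & 1:                          # Q0 = 1?  then C,A <- A + M
            A += M
            C = (A >> n) & 1               # carry out of the n-bit sum
            A &= (1 << n) - 1
        # shift C, A, Q one place to the right
        Q = ((A & 1) << (n - 1)) | (Q >> 1)
        A = (C << (n - 1)) | (A >> 1)
        C = 0
    return (A << n) | Q                    # the 2n-bit product is in A,Q
```

For example, `unsigned_multiply(13, 11, 4)` returns 143 (13 × 11), matching a paper-and-pencil check.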
zz In the second method, high-speed parallel multipliers create the partial products in parallel and add them with a fast multi-operand adder.
zz The third method uses an array of identical blocks that creates and adds the partial products concurrently.
Booth’s Algorithm
R
Booth's algorithm multiplies two signed binary integers in 2's-complement representation. It is also used to speed up the multiplication process, and it is very effective. The Booth's algorithm hardware is shown below:

[Figure: Booth's algorithm hardware — a sequence counter, the BR register (multiplicand) and the AC and QR registers]
[Flowchart of Booth's algorithm — START: A ← 0, Q−1 ← 0, M ← multiplicand, Q ← multiplier, Count ← n; examine Q0, Q−1: if 10, A ← A − M; if 01, A ← A + M; if 00 or 11, no operation; arithmetic shift right A, Q, Q−1; Count ← Count − 1; repeat until Count = 0; END]
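The flowchart above can be sketched in code. This is an illustrative sketch, not the text's implementation; the function name and bit-packing are assumptions, and (as in the n-bit hardware) the multiplicand −2^(n−1) is excluded, because its negation is not representable in n bits.

```python
# Illustrative sketch: Booth's algorithm for n-bit two's-complement
# operands, following the flowchart above.

def booth_multiply(multiplicand, multiplier, n):
    mask = (1 << n) - 1
    M = multiplicand & mask                # multiplicand register (n bits)
    A, Q, Q_1 = 0, multiplier & mask, 0
    for _ in range(n):
        pair = ((Q & 1) << 1) | Q_1        # examine Q0, Q-1
        if pair == 0b10:                   # A <- A - M
            A = (A - M) & mask
        elif pair == 0b01:                 # A <- A + M
            A = (A + M) & mask
        # arithmetic shift right A, Q, Q-1 (sign bit of A is replicated)
        Q_1 = Q & 1
        Q = ((A & 1) << (n - 1)) | (Q >> 1)
        A = (A >> 1) | (A & (1 << (n - 1)))
    product = (A << n) | Q                 # 2n-bit two's-complement result
    if product & (1 << (2 * n - 1)):
        product -= 1 << (2 * n)
    return product
```

For example, `booth_multiply(7, -3, 4)` returns −21.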
6.7 Integer Division

A sequence of successive compare, shift and subtract operations is used to divide two fixed-point binary integers in signed-magnitude representation with paper and pencil. Because the quotient digits are either 0 or 1, binary division is significantly easier than decimal division: there is no need to estimate how many times the dividend or partial remainder fits into the divisor.
The figure below depicts the division process. An example of integer division is as follows:
000010101 Quotient
Divisor 1101 100010010 Dividend
- 1101
10000
- 1101
1110
- 1101
1 Remainder
The divisor is compared with the dividend's five most significant bits. Since the 5-bit number is smaller than B, we try again with the six most significant bits. As the 6-bit number is greater than B, we place a 1 in the sixth position above the dividend for the quotient bit. The divisor is then shifted to the right and subtracted from the dividend. The difference is called a partial remainder, because the division could have stopped here to give a quotient of 1 and a remainder equal to the partial remainder. The process is continued by comparing the partial remainder with the divisor.

The quotient bit is equal to 1 if the partial remainder is larger than or equal to the divisor. The divisor is then shifted right and subtracted from the partial remainder.

The quotient bit is 0 if the partial remainder is smaller than the divisor, and no subtraction is required. In either case, the divisor is shifted once to the right. The result yields a quotient as well as a remainder.
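The compare-shift-subtract procedure just described can be sketched directly (an illustrative sketch; the function name is an assumption, not from the text):

```python
# Illustrative sketch: binary long division by repeated compare-and-subtract,
# as in the worked example above (paper-and-pencil form, no hardware registers).

def binary_divide(dividend, divisor):
    assert divisor > 0
    quotient, remainder = 0, 0
    for i in reversed(range(dividend.bit_length())):
        # bring down the next dividend bit into the partial remainder
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        quotient <<= 1
        if remainder >= divisor:           # quotient bit is 1: subtract
            remainder -= divisor
            quotient |= 1                  # otherwise the quotient bit stays 0
    return quotient, remainder
```

Running it on the worked example, `binary_divide(0b100010010, 0b1101)` gives the quotient 0b10101 and remainder 0b1, as above.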
6.7.1 Hardware Implementation for Signed-Magnitude Data
In a digital computer, it is convenient to change the method slightly for hardware implementation with signed-magnitude data. Rather than shifting the divisor to the right, the dividend, or partial remainder, is shifted to the left, putting the two numbers in the proper relative position. Subtraction is accomplished by adding the 2's complement of B to A.

The relative magnitudes are indicated by the end carry. The hardware requirements are the same as for multiplication. With 0 inserted into Qn, register EAQ is shifted to the left, and the previous value of E is lost. Figure 12 depicts an example of the suggested division procedure. The divisor is kept in the B register, whereas the double-length dividend is kept in the A and Q registers. The divisor is subtracted by adding its 2's-complement value, and the dividend is shifted to the left.
[Figure: division hardware — a complementer and parallel adder with the E, A and Q registers, sign bits Bs, As and Qs, and Qn as the rightmost bit]
[Flowchart of the divide operation — dividend in AQ, divisor in B; divide the magnitudes: Qs ← As ⊕ Bs, SC ← n − 1; shl EAQ; EA ← A + B′ + 1 to compare magnitudes; if A ≥ B, set Qn ← 1; if A < B, restore with EA ← A + B; SC ← SC − 1; repeat until SC = 0; END — on divide overflow the process stops, otherwise the quotient is in Q and the remainder is in A]
6.8 Conclusion
zz The NOR gate is a combination of an OR gate followed by an inverter.
zz The XNOR (exclusive-NOR) gate is a combination of an XOR gate followed by an inverter.
zz Flip flops are the simplest kind of sequential circuits.
zz The JK flip flop has four possible input combinations, i.e., 1, 0, "no change" and "toggle".
zz In the design of fast adders, the two bits to be added are instantly available in the carry look-ahead logic for every adder block.
zz A sequence of successive compare, shift and subtract operations is used to divide two fixed-point binary integers.
6.9 Glossary
zz Flip-Flop: A flip-flop is a circuit that keeps its state until it is commanded to alter it by input. Four NAND or four NOR gates can be used to make a simple flip-flop.
zz Multiplication: The technique of successive shift-and-add operations is used to multiply two fixed-point binary numbers in signed-magnitude format.
zz RS Flip Flop: It is considered one of the simplest sequential logic circuits.
zz D flip flop: It is a clocked flip flop with a single input 'D'.
zz JK flip flop: It is an SR flip flop with the addition of clock input circuitry.
zz Design of Fast Adders: In the design of fast adders, the two bits to be added are instantly available in the carry look-ahead logic for every adder block.
zz Integer division: A sequence of successive compare, shift and subtract operations is used to divide two fixed-point binary integers.
c. Encoder
d. All of these
2. ____________ is considered to be an important characteristic of computers.
a. Precision
b. Storage
c. Speed
d. All of these
3. Which among the following is the normal form of S-R flip flop?
a. Single-Reset
b. Set Reset
c. Simple-Reset
d. None of these
b. Mid-terms
c. Partial Products
d. Multipliers
5. Which among the following is frequently called the double precision format?
a. 64-bit
b. 16-bit
c. 32-bit
d. 128-bit
6. Which of these digital logic gates can be built using a single transistor?
a. AND gates
b. XOR gates
c. NOT gates
d. OR gates
7. When the set is restricted and reset is allowed in S-R flip flop then the output will be ____________.
a. Set Reset
b. Reset
c. No change
d. None of these
8. In the logic gates the JK flip flop has ____________ memory.
a. Random b. Permanent
c. True d. Temporary
9. Which of the following flip flop is available during the conversion of SR flip flop to JK flip flop?
a. D flip flop b. SR flip flop
c. T flip flop d. All of these
10. Which among the following is the application of flip flop?
a. Decoders b. Storage devices
c. Encoders d. None of these
B. Essay Type Questions
1. The arithmetic instructions in digital computers are used to modify data. What is an arithmetic
operation?
2. In the central processing unit, there is a distinct component called the arithmetic processing unit
that performs arithmetic operations. Determine the concept of logic gates.
3. In a sequential circuit, the current output is not simply determined by the current input but also depends on the previous output. Explain the types of flip flops.
4. There are eight distinct criteria to consider when signed numbers are added or subtracted. Describe
the method of addition and subtraction of signed numbers.
5. A sequence of successive compare, shift and subtract operations is used to divide two fixed-point binary integers in signed-magnitude format with paper and pencil. Explain the process of integer division.

Q. No. Answer
1. a. Flip Flop
2. d. All of these
3. b. Set Reset
4. c. Partial Products
5. a. 64-bit
6. c. NOT gates
7. b. Reset
8. c. True
9. b. SR flip flop
10. b. Storage devices
every electrical gadget we use today. Logic gates, for example, may be found in gadgets such as smartphones, tablets, and memory devices.
Refer to Section Logic Gates
3. In a sequential circuit, the current output is not simply determined by the current input but also depends on the previous output. Flip flops are the simplest kind of sequential circuits. A flip-flop can preserve a binary state, which means it can act as a 1-bit memory cell. It is a circuit that keeps its state until it is commanded to alter it by input. Four NAND or four NOR gates can be used to make a simple flip-flop.
Refer to Section Flip Flops
4. The magnitude of the two integers is denoted by the letters A and B. There are eight distinct criteria to consider when signed numbers are added or subtracted, depending on the sign of the numbers and the operation done.
Refer to Section Addition and Subtraction of Signed Numbers
5. A sequence of successive compare, shift and subtract operations is used to divide two fixed-point binary integers in signed-magnitude format with paper and pencil. Because the quotient digits are either 0 or 1, binary division is significantly easier than decimal division: there is no need to predict how many times the dividend or partial remainder fits inside the divisor.
Refer to Section Integer Division
zz https://vignan.ac.in/subjectspg/MC119.pdf
zz http://flint.cs.yale.edu/cs422/doc/art-of-asm/pdf/CH09.PDF
zz Discuss with your friends and classmates about the concept of arithmetic operations and its uses
with real world examples. Also, discuss about the concept of different logic gates, addition and
subtraction of signed numbers.
UNIT
07
Basic Processing Unit
Names of Sub-Units
Fundamental Concepts of Processing Unit, Execution of a Complete Instruction, Multiple Bus Organization, Hardwired Control, Microprogrammed Control Unit
Overview
This unit begins by discussing the concept of the processing unit. Furthermore, the unit explains the complete execution process of an instruction. Next, the unit discusses multiple bus organisation. In addition, the unit describes the concept of the hardwired control unit. Towards the end, the unit discusses the microprogrammed control unit.
Learning Objectives
Learning Outcomes
aa Analyse the fundamentals of microprogrammed control unit
Pre-Unit Preparatory Material
https://miet.ac.in/assets/uploads/cs/Punit%20Mittal%20Monogram%20Control%20Unit.pdf
7.1 INTRODUCTION
The processing unit of a computer is responsible for executing machine-language instructions and coordinating the actions of other units. The central processing unit (CPU) is another name for the processing unit. The CPU performs all sorts of data processing activities as well as all of the computer's key tasks. It enables input and output devices to communicate with one another and carry out their functions. It also keeps track of input data, interim outcomes from processing, and commands. A typical computer task consists of several steps described by a programme, which is made up of a sequence of machine instructions. An instruction is carried out by performing a series of simpler actions.
A normal computational task includes a set of actions described by a programme, which is made up of machine-language instructions. The processor fetches each instruction one by one and executes the defined operation. Until a branch or a jump instruction is encountered, instructions are fetched from sequential memory locations. The CPU keeps track of the address of the next instruction to be fetched and processed using the programme counter (PC). After retrieval of an instruction, the PC's contents are updated to point to the next instruction:
1. Fetch the contents of the memory location that the PC points to. The instruction to be executed is stored at this memory location; it is then loaded into the IR. The needed action in register-transfer notation is:
IR ← [[PC]]
2. Increase the PC to point to the next instruction. For example, if the PC is incremented by 8, then the
zz Memory address register (MAR): The system bus address lines are linked to it. The MAR specifies the location in memory where a read or write operation will take place.
zz Memory buffer register (MBR): The data lines of the system bus are linked to it. The MBR contains either the value to be saved in memory or the most recent value read from memory.
zz Program counter (PC): It stores the address of the next instruction to be fetched.
zz Instruction register (IR): It stores the last instruction fetched.
Fetch Cycle
At the start of the fetch cycle, the PC contains the address of the next instruction to execute. The steps of the fetch cycle are as follows:
zz Step 1: Because the MAR is the sole register connected to the address lines of the system bus, the address in the PC is transferred to it.
zz Step 2: The address in the MAR is placed on the address bus; the control unit then issues a Read command on the control bus, and the result appears on the data bus before being transferred into the memory buffer register. To prepare for the following instruction, the programme counter PC is increased by
Decode Cycle
Once an instruction has been acquired, the next step is to fetch the source operands. A source operand may be obtained through any addressing mode; here it is obtained via indirect addressing. Register-based operands do not need to be fetched. If the opcode requires it, a similar procedure will be needed to store the result in main memory.
zz Step 1: The instruction's address field is passed to the MAR. This is used to fetch the address of the operand.
zz Step 2: The MBR is used to update the address field of the IR.
zz Step 3: The IR has now been updated and is ready for the execution phase.
Execute Cycle
The steps of execute cycle are as follows:
zz Step 1: The address component of IR is loaded into the MAR.
zz Step 2: The MBR is used to update the address field of the IR, allowing the reference memory location
to be read.
zz Step 3: The ALU now combines the contents of IR and MBR.
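The fetch phase described above can be sketched as a toy register-transfer model. This is an illustrative sketch only: the class, the word-addressable memory and the word-at-a-time PC increment are assumptions for the sketch, not the text's design.

```python
# Illustrative sketch: the fetch step of the instruction cycle using the
# registers named above (MAR, MBR, PC, IR) over a toy word-addressable memory.

class CPU:
    def __init__(self, memory):
        self.memory = memory               # toy word-addressable memory
        self.PC = 0                        # address of next instruction
        self.MAR = 0                       # address placed on the address bus
        self.MBR = 0                       # data read from / written to memory
        self.IR = 0                        # instruction being executed

    def fetch(self):
        self.MAR = self.PC                 # step 1: PC -> MAR
        self.MBR = self.memory[self.MAR]   # step 2: read; the word lands in MBR
        self.PC += 1                       # PC now points to the next instruction
        self.IR = self.MBR                 # the instruction is loaded into IR
```

For example, after `CPU([0x1A, 0x2B]).fetch()` the IR holds 0x1A and the PC holds 1.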
[Figure 1: Instruction Cycle State Diagram — instruction address calculation, instruction operation decoding, operand address calculation, operand fetch, data operation and operand store; multiple operands and multiple results loop back; on instruction completion, fetch the next instruction, or return for string or vector data]
7.3.1 Fetching a Word from Memory
The address of the requested information word is transferred from the CPU to the MAR. The requested word's address is passed to the main memory. Meanwhile, the CPU uses the memory bus's control lines to indicate that a read operation is required.
Following this request, the CPU waits for a response from memory indicating that the needed function has been performed completely. Memory function completed (MFC) is a control signal on the memory bus that is used to achieve this. This signal is set to one by the memory to indicate that the contents of a specific memory location have been read and are available on the memory bus data lines.
We assume that when the MFC signal is set to one, the information on the data lines is loaded into the MDR and is thus ready for use within the CPU. This completes the memory-read procedure. Figure 2 shows the
[Figure 2: MDR connections — the register is gated to the memory-bus data lines by MDRinE/MDRoutE and to the internal processor bus by MDRin/MDRout]
For example, the course of actions needed for the Move (R1), R2 instruction is as follows:
zz MAR ← [R1]
zz Start a Read operation on the memory bus
zz Wait for the MFC response from the memory
zz Load MDR from the memory bus
zz R2 ← [MDR]
7.3.2 Storing a Word in Memory
A similar technique is used when writing a word into a memory location. The MAR is loaded with the required address. The data to be written is then loaded into the MDR, followed by a Write command.
Example: The course of actions needed for the Move R2, (R1) instruction is as follows:
zz R1 out, MAR in
zz R2 out, MDR in, Write
zz MDR outE, WMFC
7.3.3 Execution of a Complete Instruction
The steps used to control the sequence for execution of the instruction Add (R3), R1 are as follows:
1. PC out, MAR in, Read, Select 4, Add, Z in

The contents of the PC are replaced with the branch destination address, which is typically obtained by adding an offset X to the branch instruction. The offset X is generally the difference between the branch destination address and the address immediately following the branch instruction.
The steps used to control the sequence for an unconditional branch instruction are as follows:
1. PC out, MAR in, Read, Select 4, Add, Z in
2. Z out, PC in, Y in, WMFC
3. MDR out, IR in
7.4 MULTIPLE BUS ORGANISATION

[Figure: three-bus organisation of the datapath — buses A, B and C connect the PC, the Incrementer, the register file, the Constant 4 source, the ALU (inputs A and B, output R), the instruction decoder, the IR, the MDR and the MAR (memory-bus address)]
The register file contains all general-purpose registers and, in this figure, has three ports. Outputs are provided on both buses A and B, allowing the contents of two separate registers to be accessed concurrently. Data from bus C can be loaded into a third register during the same clock cycle using the third port.
The source operands are sent via buses A and B to the ALU's A and B inputs, where an arithmetic or logic operation can be performed. The result is carried to its destination over bus C. If necessary, the ALU can simply pass one of its two input operands to bus C unmodified; the ALU control signals for such an operation are R=A or R=B. The second change in this organisation is the introduction of the Incrementer unit, which is used to increment the PC by 4. Using the Incrementer instead of the main ALU to add 4 to the PC removes the need to use the ALU for this purpose, as the single-bus organisation must. The Constant 4 source at the ALU input multiplexer can still be used.
The steps used to control the sequence for execution of the instruction Add R4, R5, R6 are as follows:
1. PCout, R=B, MARin, Read, IncPC
2. WMFC
3. MDRout, R=B, IRin
4. R4outA, R5outB, SelectA, Add, R6in, End

The description of the steps of instruction execution is as follows:
zz Step 1: The contents of the PC are passed through the ALU, using the R=B control signal, and loaded into the MAR to initiate a memory read operation. At the same time, the PC is incremented by 4. Note that the value loaded into the MAR is the PC's original contents; the incremented value is loaded into the PC at the end of the clock cycle and has no effect on the MAR's contents.
zz Step 2: The processor waits for MFC to arrive.
zz Step 3: The processor loads the requested data into the MDR and then transfers it to the IR.
zz Step 4: In a single step, the instruction is decoded and the add operation is performed.
7.5 HARDWIRED CONTROL UNIT
A hardwired control is a method of generating control signals with the help of finite state machines (FSM). It is built as a sequential logic circuit. The final circuit is constructed by physically connecting elements such as gates, flip flops and drums. As a result, it is known as a hardwired controller.
A hardwired control unit uses fixed logic circuits to interpret instructions and produce control signals. These control signals are determined from the contents of the control step counter, the instruction register and the condition code flags, as well as from external input signals such as MFC and interrupt requests. Figure 4 shows the hardwired control unit:
[Figure 4: hardwired control unit — the IR, external inputs and condition codes feed a decoder/encoder block that produces the control signals]
In Figure 4, the decoder/encoder block is a combinational circuit that creates the needed control outputs based on the current state of all of its inputs.
The more comprehensive block diagram is obtained by separating the decoding and encoding processes
as shown in Figure 5:
[Figure 5: separation of the decoding and encoding functions — a clock (CLK) drives the control step counter and the step decoder (T1, T2, T3, …); the IR feeds the instruction decoder (INS1, INS2, …); the encoder combines these with the external inputs and condition codes to produce the End signal and the control signals]
The instruction decoder reads the instruction from the IR and decodes it. If the IR is an 8-bit register, the instruction decoder will produce 2^8 = 256 lines, one for each instruction. Each machine instruction is represented by a distinct output line INS1 through INSm. Depending on the code in the IR, one of the output lines INS1 to INSm is set to 1 and all the other output lines are set to 0.
Each step in the control sequence has its own signal line, which is provided by the step decoder. The instruction decoder, the step decoder, the external inputs and the condition codes all feed into the encoder. From these inputs, the encoder produces the individual control signals, such as Zin, Yin, PCout, Add and End. The equation of the Zin control signal is as follows:
Zin = T1 + T6 · ADD + T4 · BR
Figure 6 shows a logic circuit that depicts how the encoder creates the Zin control signal for the processor
organisation:
[Figure 6: logic circuit for Zin — T1 and the AND combinations T4·Branch and T6·Add feed an OR gate whose output is Zin]
Figure 6: Creating the Zin Control Signal for the Processor Organisation
This circuit implements the logic function above: the Zin signal is asserted during time slot T1 for all instructions, during T6 for an Add instruction, during T4 for an unconditional branch instruction, and so on.
The End signal is generated when each instruction has been completed. The equation of the End control signal is as follows:
End = T7 · Add + T6 · BR + (T5 · N + T4 · N′) · BRN + …
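The two encoder equations can be written out as boolean functions. This is an illustrative sketch only; the function names and the convention that exactly one step line Ti is active at a time are assumptions for the sketch.

```python
# Illustrative sketch: the encoder equations above as boolean functions of
# the step-decoder outputs (T1..T7), the decoded instruction lines
# (ADD, BR, BRN) and the condition flag N.

def z_in(T, ADD, BR):
    # Zin = T1 + T6.ADD + T4.BR
    return T[1] or (T[6] and ADD) or (T[4] and BR)

def end(T, ADD, BR, BRN, N):
    # End = T7.Add + T6.BR + (T5.N + T4.N').BRN + ...
    return (T[7] and ADD) or (T[6] and BR) or \
           (((T[5] and N) or (T[4] and not N)) and BRN)

def step(t):
    """Step-decoder output: exactly one Ti is 1 in any time slot (T[0] unused)."""
    return [i == t for i in range(8)]
```

For example, `z_in(step(1), ADD=False, BR=False)` is true for every instruction, because Zin is asserted in time slot T1 regardless of the decoded instruction.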
Figure 7 shows the logic circuit of the End function:

[Figure 7: logic circuit of the End function — AND terms formed from T7, T6, T5·N and T4·N′ with the decoded instruction lines feed an OR gate whose output is End]
7.6 MICROPROGRAMMED CONTROL UNIT
In a microprogrammed control unit, the control signals are generated by using a software technique. The creation of the control signals is specified by this software, which is saved in a special memory in the processor that is smaller and quicker than regular memory. The software is called a microprogram, and the memory is called a microprogram memory or control store.
Figure 8 shows the microprogrammed control unit:
[Figure 8: microprogrammed control unit — the instruction register, the clock, and inputs from the status and flag registers drive microinstruction address generation; the address selected on the instruction reads the microprogram memory (control store) into a buffer, and the specific microinstruction produces the control signals]
3. After decoding the instruction, the microinstruction address generator produces the starting address of the associated microroutine in the control store.
4. The microinstruction address generator also loads the starting address of the microprogram routine into the microprogram counter. This allows it to keep track of the addresses of the routine's subsequent microinstructions.
5. The microinstruction address generator increments the microprogram counter in order to read the next instruction of the microroutine from the control store.
6. The end of the microroutine is indicated by a bit in its last microinstruction, referred to as the end bit. When the end bit is set to 1, the microroutine has completed successfully, and a new instruction is fetched as a result.
7.6.1 Types of Microprogrammed Control Unit
Depending on the type of control word stored in the control memory (CM), microprogrammed control units are categorised into two types, which are as follows:
zz Horizontal microprogrammed control unit: The control signals are provided in a decoded format of one bit per control signal. For example, if the CPU has 53 control signals, then 53 bits are necessary. More than one control signal can be active at any given moment, so longer control words are needed. It is employed in applications that require parallel processing, as it allows a greater degree of parallelism: when the degree is n, n control signals are active at the same time. There is no need for any additional hardware (decoders), which makes it quicker and more adaptable than a vertical microprogrammed control unit.
zz Vertical microprogrammed control unit: The control signals are represented in an encoded binary format: log2(N) bits are required for N control signals, so shorter control words suffice. It facilitates the addition of new control signals. It supports a low degree of parallelism, i.e., the degree of parallelism can be 0 or 1. It is slower than the horizontal microprogrammed control unit because it requires additional hardware (decoders) to create the control signals. It has less flexibility than a horizontal control unit, but it is more adaptable than a hardwired control unit.
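The width trade-off between the two encodings can be illustrated numerically. This is an illustrative sketch only; the function names are assumptions, and the vertical width is taken as the smallest whole number of bits, i.e. ceil(log2 N).

```python
# Illustrative sketch: control-word widths for the two encodings.
# Horizontal uses one bit per control signal; vertical encodes the index
# of the active signal in ceil(log2 N) bits and needs a decoder.

import math

def horizontal_width(n_signals):
    return n_signals                        # one bit per control signal

def vertical_width(n_signals):
    return math.ceil(math.log2(n_signals))  # log2(N) bits, decoded in hardware

def vertical_decode(word, n_signals):
    """The decoder: expand an encoded word into individual signal lines."""
    return [i == word for i in range(n_signals)]
```

For the 53-signal CPU mentioned above, a horizontal control word needs 53 bits, while a vertical one needs only 6, at the cost of a decoder and one active signal at a time.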
zz It makes the design of the control unit easier. As a result, it is less costly and less prone to errors.
zz It can be designed in a methodical and ordered manner.
zz It is used to control software-based operations rather than hardware-based functions.
zz It is more adaptable.
zz It is used to perform complicated functions with ease.
7.7 SUMMARY
zz The processing unit of a computer is responsible for executing machine-language instructions and coordinating the actions of other units.
zz The CPU performs all sorts of data processing activities as well as all of the computer's key tasks.
zz A normal computational task includes a set of actions described by a programme, which is made up of machine-language instructions.
zz The processor fetches each instruction one by one and executes the defined operation.
zz A programme is a collection of instructions stored in the memory unit of a computer.
zz The address of the requested information word is transferred from the CPU to the MAR.
zz A similar technique is used when writing a word into a memory location. The MAR is loaded with the required address. The data to be written is then loaded into the MDR, followed by a write command.
zz During a clock cycle in a single-bus organisation, just one data item can be transmitted across the bus.
zz Commercial processors have numerous internal pathways that allow multiple transfers to take place in parallel to minimise the number of steps needed.
zz A hardwired control is a method of generating control signals with the help of finite state machines (FSM).
zz A hardwired control unit uses fixed logic circuits to interpret instructions and produce control signals.
zz In a microprogrammed control unit, the control signals are generated by using a software technique.
7.8 GLOSSARY
zz Memory address register (MAR): It defines the location in memory where a read or write operation will take place.
c. Data storage
d. Memory unit
2. Which of the following contains either the memory value to be saved or the most recent value read from the memory?
a. MAR
b. MBR
c. PC
d. IR
3. Which of the following performs all sorts of data processing activities as well as all of the computer's key tasks?
a. Control unit
b. CPU
c. Functional unit
d. Memory unit
4. A normal computational task includes a set of actions described by a programme, which is made up of __________ instructions.
a. High-level language
b. Machine language
c. Processor
d. Raw
5. Which of the following is a collection of instructions stored in the memory unit of a computer?
a. Logic gates
b. Data
c. Information
d. Programme
6. The system bus address lines are linked to __________.
a. MAR
b. MBR
c. PC
d. IR
7. Which of the following utilises fixed logic circuits to understand the instructions and produce control
signals?
a. Data unit
b. Microprogrammed control unit
c. Hardwired control unit
d. Multiple bus organisation
8. Which of the following is not a disadvantage of hardwired control unit?
a. It uses combinational circuits to create signals.
b. Changes to control signals are challenging since they need rearranging wires in the hardware circuit.
c. It is difficult to evaluate and fix flaws in the initial design.
d. It is too expensive.
9. In which of the following are the control signals generated by using a software technique?
a. Data unit
b. Microprogrammed control unit
c. Hardwired control unit
d. Multiple bus organisation
10. The CPU keeps track of the address of the next instruction to be fetched and processed using the __________.
a. MAR b. MBR
c. PC d. IR
1. The processing unit of a computer is responsible for executing machine-language instructions and coordinating the actions of other units. Discuss.
2. Discuss the execution process of an instruction.
4. A hardwired control is a method of generating control signals with the help of finite state machines (FSM). Discuss.
5. In a microprogrammed control unit, the control signals are generated by using a software technique. Discuss.
Q. No. Answer
1. a. Processing unit
2. b. MBR
3. b. CPU
4. b. Machine language
5. d. Programme
6. a. MAR
7. c. Hardwired control unit
8. a. It uses combinational circuits to create signals.
9. b. Microprogrammed control unit
10. c. PC
B. Hints for Essay Type Questions
1. The central processing unit is another name for the processing unit. The CPU performs all sorts of data processing activities as well as all of the computer's key tasks. It enables input and output devices to communicate with one another and carry out their functions.
Refer to Section Introduction
2. A programme is a collection of instructions stored in the memory unit of a computer. For each instruction, the CPU goes through a cycle of execution, consisting of phases such as fetching, decoding and executing the instruction.
3. During a clock cycle in a single-bus organisation, just one data item can be transmitted across the bus. Commercial processors have numerous internal pathways that allow multiple transfers to take place simultaneously.
4. A hardwired control unit utilises fixed logic circuits to interpret the instructions and produce control signals. These control signals are determined by the contents of the control step counter, the instruction register and the condition code flags, as well as by external input signals.
- http://www.cs.binghamton.edu/~reckert/hardwire3new.html
- https://www.google.co.in/books/edition/COMPUTER_ORGANIZATION_AND_DESIGN/5LNwVRpfkRgC?hl=en&gbpv=1&dq=Hardwired+control+unit&pg=PA332&printsec=frontcover
- Discuss the concept of the processing unit with your classmates. Also, prepare a presentation on the hardwired control unit and microprogrammed control unit.
UNIT
08
Pipelining
Names of Sub-Units
Introduction to Pipelining, Basic Concepts of Pipelining, Data Hazards, Advantages and Disadvantages
of Pipelining, Instruction Hazards, Influence on Instruction Sets
Overview
This unit begins by discussing the concept of pipelining. Next, the unit explains the basic concepts of pipelining. Further, the unit describes the data hazards and the advantages and disadvantages of pipelining. Towards the end, the unit discusses the instruction hazards and the influence on instruction sets.
Learning Objectives

Learning Outcomes

- Assess the basic concepts of pipelining
- Evaluate the importance of data hazards and the advantages and disadvantages of pipelining
- Determine the significance of instruction hazards
- Explore the importance of the influence on instruction sets
Pre-Unit Preparatory Material

- https://www.nitsri.ac.in/Department/Electronics%20&%20Communication%20Engineering/COA-_Unit_3.pdf
8.1 Introduction
Pipelining is the process of gathering instructions from the processor through a pipeline. It provides for the systematic storage and execution of instructions, and is also called pipeline processing.
Pipelining is a method in which the execution of numerous instructions is overlapped. The pipeline is divided into stages, each of which is connected to the next to form a pipe-like structure. Instructions enter at one end and leave from the other.
8.2 Basic Concepts of Pipelining
The basic concept of pipelining is that it increases the performance of a system through some simple changes to the hardware design; it enables parallelism at the hardware level. Pipelining does not decrease the execution time of an individual instruction, but it reduces the overall execution time of a program. In a pipeline, a number of instructions are executed at the same time, in an overlapped manner.
Pipelining is used to decrease execution time and to enhance the performance of the CPU simply by rearranging elements in the hardware; no further elements are added. In each segment, a combinational circuit manipulates the data stored in the segment's input register, and the output of the combinational circuit is applied to the input register of the following segment, as shown in Figure 1:
P
Clock
O
Input Output
C
S1 R1 S2 R2 S3 R3
A pipeline system is similar to a modern assembly line in a factory. For example, in the automobile sector, massive assembly lines are set up with robotic arms performing specific tasks at each stage, after which the car moves on to the next arm.
Pipelined processors were categorised by Handler and Ramamurthy in 1977 into two types according to their functionality:
- Arithmetic pipelining
- Instruction pipelining
Arithmetic Pipelining
An arithmetic pipeline can be found in almost every computer. It divides an arithmetic problem into several sub-problems in order to accomplish high-speed floating-point operations such as addition, multiplication and division. The flowchart of an arithmetic pipeline for floating-point addition is shown in Figure 2:
S
Exponents Mantissa
a b a b
R
E R
R
Compare exponent Difference
Segment 1
by substraction
T
R
H
R
R
Add or Substract
Mantissas
Y
Segment 3 R R
P
O
R R
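The segment flow of the arithmetic pipeline can be illustrated with a short sketch. The following Python code is an illustrative decimal model, not the book's exact design: each function stands for one segment (compare exponents, align mantissas, add mantissas, normalise), and the (mantissa, exponent) pair is an assumed representation.

```python
# Illustrative sketch of a floating-point addition pipeline.
# Each function models one segment; numbers are (mantissa, exponent)
# pairs in base 10 with mantissa < 1.0. This is a teaching model only.

def compare_exponents(a, b):
    # Segment 1: compare exponents by subtraction.
    (_, ea), (_, eb) = a, b
    return a, b, ea - eb

def align_mantissas(a, b, diff):
    # Segment 2: shift the mantissa of the smaller-exponent operand right.
    (ma, ea), (mb, eb) = a, b
    if diff >= 0:
        mb, eb = mb / 10 ** diff, ea
    else:
        ma, ea = ma / 10 ** (-diff), eb
    return (ma, ea), (mb, eb)

def add_mantissas(a, b):
    # Segment 3: add the aligned mantissas.
    (ma, e), (mb, _) = a, b
    return (ma + mb, e)

def normalise(r):
    # Segment 4: renormalise so the mantissa is again < 1.0.
    # (Left-normalisation of small results is omitted for brevity.)
    m, e = r
    while abs(m) >= 1.0:
        m, e = m / 10, e + 1
    return (m, e)

def fp_add(a, b):
    a, b, diff = compare_exponents(a, b)
    a, b = align_mantissas(a, b, diff)
    return normalise(add_mantissas(a, b))

# 0.9504 * 10^3 + 0.8200 * 10^2 -> approximately 0.10324 * 10^4
print(fp_add((0.9504, 3), (0.8200, 2)))
```

In hardware, each segment would process a different operand pair in the same clock cycle; here the calls are sequential, but the division of work per segment is the same.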
Instruction Pipelining
An instruction pipeline is used in a digital computer to overlap operations on multiple instructions, such as fetching, decoding and execution.
Generally, an instruction is processed in the following sequential steps:
- Fetching the instruction
- Decoding the instruction
- Computing the effective address
- Fetching the operands
- Executing the instruction
- Storing the result
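The steps above can be overlapped across instructions. The following Python sketch computes which instruction occupies which stage in each clock cycle of an ideal pipeline; the four-stage split and the stage abbreviations (FI, DA, FO, EX) are an illustrative simplification of the list above, not the book's exact segmentation.

```python
# Sketch of ideal pipeline overlap: instruction i enters stage s at
# cycle i + s + 1, so a new instruction completes every cycle once
# the pipeline is full. Stage names are illustrative abbreviations.

STAGES = ["FI", "DA", "FO", "EX"]   # fetch, decode/address, operand fetch, execute

def pipeline_schedule(n_instructions):
    """Return {cycle: [(instruction_number, stage), ...]}."""
    schedule = {}
    for i in range(n_instructions):
        for s, stage in enumerate(STAGES):
            cycle = i + s + 1
            schedule.setdefault(cycle, []).append((i + 1, stage))
    return schedule

sched = pipeline_schedule(4)
for cycle in sorted(sched):
    print(f"cycle {cycle}: {sched[cycle]}")

# An ideal k-stage pipeline finishes n instructions in k + n - 1 cycles.
assert max(sched) == len(STAGES) + 4 - 1   # 7 cycles for 4 instructions
```

Printing the schedule shows the diagonal pattern of Figure 4: at cycle 4 all four stages are busy, each with a different instruction.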
Figure 3: Instruction Pipeline Flowchart (Segment 1: fetch instruction from memory; Segment 2: decode instruction and calculate effective address; followed by the branch check, operand fetch, instruction execution, interrupt check and handling, PC update, and emptying of the pipe)
8.3 Data Hazards
A data hazard is a circumstance in which either the source or the destination operands of an instruction are not available at the time expected in the pipeline. As a result, some operation has to be delayed.
Pipeline hazards occur when a pipeline is forced to stall. The four pipelining hazards are as follows:
- Data dependency
- Memory delay
- Branch delay
- Resource limitation
8.3.1 Data Dependency
Consider the following two instructions and their pipeline execution, as shown in Figure 4:

Figure 4: Pipelined Execution of Two Dependent Instructions (an Add followed by a Sub, across successive clock cycles)
In the figure above, the result of the Add instruction is placed in register R2, and the final result is stored at the conclusion of the instruction's execution, which occurs at clock cycle t4. However, the Sub instruction requires the value of register R2 at cycle t3. The Sub instruction must therefore stall for two clock cycles; if it does not stall, it will produce an erroneous result. Data dependency occurs when one instruction relies on another for data.
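The two-cycle stall described above can be expressed as a small formula. The accounting below is an illustrative model, not a definition from the book: the result written at cycle t4 is assumed usable only from the following cycle, so an instruction that needs it at cycle t3 must wait two cycles.

```python
# Illustrative stall accounting for a RAW data dependency:
# the consumer must wait until the cycle after the producer's result
# is written (hence the +1), unless it already reads late enough.

def raw_stall_cycles(write_ready_cycle, read_needed_cycle):
    return max(0, write_ready_cycle - read_needed_cycle + 1)

print(raw_stall_cycles(write_ready_cycle=4, read_needed_cycle=3))  # 2, as in the text
print(raw_stall_cycles(write_ready_cycle=4, read_needed_cycle=6))  # 0, no stall needed
```

Operand forwarding, not covered here, reduces these stall counts by routing the result to the consumer before it is written back.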
8.3.2 Memory Delay
When an instruction or piece of data is needed, it is first looked for in the cache memory; if it is not found there, a cache miss occurs. The data is then fetched from main memory, which can take up to 10 cycles. As a result, the pipeline must stall for that many cycles, posing a memory delay hazard, and all subsequent instructions are delayed by the cache miss.
8.3.3 Branch Delay
Assume that four instructions are pipelined in the order I1, I2, I3 and I4, and that I1 is a branch instruction whose target is instruction Ik. Instruction I1 is fetched and decoded, and its target address is calculated, by cycle t3.
However, before the branch target address is determined, the instructions I2, I3 and I4 are fetched in the intervening cycles. Because Ik must be executed immediately after I1, the instructions I2, I3 and I4 must be discarded once I1 is revealed to be a branch instruction. The resulting three-cycle delay is the branch delay, as shown in Figure 5:
Instruction 1: IF ID OF IE
Instruction 2: IF ID OF (discarded)
Instruction 3: IF ID (discarded)
Instruction 4: IF (discarded)
Instruction K: IF ID OF IE OS
(branch penalty)
Figure 5: Branch Delay
8.3.4 Resource Limitation
If two instructions request access to the same resource within the same clock cycle, one of the instructions must stall and allow the other to use the resource. This stalling is caused by a lack of resources.
8.4 Advantages and Disadvantages of Pipelining
Pipelining organises the hardware components of the CPU to increase overall performance; in a pipelined processor, more than one instruction is executed at a time.
8.4.1 Advantages of Pipelining
The advantages of pipelining are as follows:
- The processor's cycle time is reduced, and instruction throughput is increased.
- The CPU's arithmetic logic unit can be run faster if pipelining is employed, although the design becomes more complicated.
- In principle, pipelining improves performance by a factor of two over a core that is not pipelined.
- The clock frequency of pipelined CPUs is higher than that of non-pipelined CPUs.
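The "factor of two" above is the ideal case for a two-stage pipeline. More generally, an ideal k-stage pipeline executing n instructions takes k + n − 1 cycles instead of n·k, giving the speedup sketched below. This formula is an idealisation that ignores hazards and stalls.

```python
# Ideal pipeline speedup: n instructions on a k-stage pipeline take
# k + n - 1 cycles, versus n * k cycles unpipelined, so
# S = n*k / (k + n - 1), which approaches k as n grows large.

def pipeline_speedup(k, n):
    return (n * k) / (k + n - 1)

print(pipeline_speedup(k=2, n=1000))  # close to 2 for a two-stage pipeline
print(pipeline_speedup(k=4, n=1000))  # close to 4 for a four-stage pipeline
```

Real programs fall short of this bound because the hazards discussed in the next section force stall cycles into the schedule.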
8.4.2 Disadvantages of Pipelining
Pipelining has a number of downsides; however, CPU and compiler designers have devised ways to mitigate most of them. The most frequent difficulties are the hazards discussed in the following sections.
8.5 Instruction Hazards
In the context of conflicts induced by hardware resource constraints (structural hazards) and dependencies between instructions (data hazards), scoreboards are designed to regulate the flow of data between registers and the various arithmetic units. The four types of hazards are flow dependencies (Read-After-Write), output dependencies (Write-After-Write), antidependencies (Write-After-Read) and structural hazards.
8.5.1 Read-After-Write (RAW) Hazards
A Read-After-Write hazard arises when one instruction requires the result of a previously issued but unfinished instruction. An example of a RAW hazard is shown in Figure 6:
Figure 6: A RAW Hazard (the second instruction reads an operand before the first instruction has written it)
8.5.2 Write-After-Write (WAW) Hazards
A Write-After-Write hazard arises when an instruction attempts to write its output to a register before a previously issued but unfinished instruction has written its own result there. In the example below, a multiplication and a subsequent addition both write their results to R6. Although such code is unlikely to occur in everyday programming, it must still produce the proper result: without correct interlocks, the add operation would finish first, and its result in R6 would then be overwritten by the multiplication. An example of a WAW hazard is as follows:
R6 = R1 * R2
R6 = R4 + R5
8.5.3 Write-After-Read (WAR) Hazards
A Write-After-Read hazard arises when an instruction attempts to write to a register that has not yet been read by a previously issued but unfinished instruction. Because of the way instructions are dispatched to the arithmetic units, this hazard cannot arise in most systems, but it could in the CDC 6600. The following WAR example is based on the CDC 6600, which used X registers to store floating-point data:
X3 = X1 / X2
X5 = X4 * X3
X4 = X + X6
In the third instruction, the WAR hazard is on register X4. It occurs because instructions that are stalled due to a RAW hazard are nevertheless sent to their arithmetic unit, where they await their operands. As a result, the second instruction can be issued immediately after the first, but it is held up in its arithmetic unit waiting for X3, which cannot be read until the division unit finishes its operation. The third instruction can likewise be issued directly after the second, and it will begin to execute. However, because the floating-point add operation takes considerably less time than division, the add unit is ready to store its output to X4 before the second instruction has read the old value.
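The RAW, WAW and WAR definitions above can be checked mechanically from the register read/write sets of two instructions. In the sketch below, the dict-based instruction encoding is an illustrative assumption, and X0 is a hypothetical stand-in for the unspecified source register in the third CDC 6600 instruction.

```python
# Classify the dependence between two instructions (first issued
# earlier) from their register read/write sets, following the
# RAW/WAW/WAR definitions in the text above.

def classify_hazards(first, second):
    """Each instruction is a dict: {'reads': set_of_regs, 'writes': set_of_regs}."""
    hazards = []
    if second['reads'] & first['writes']:
        hazards.append('RAW')   # second reads what first writes
    if second['writes'] & first['writes']:
        hazards.append('WAW')   # both write the same register
    if second['writes'] & first['reads']:
        hazards.append('WAR')   # second overwrites what first still reads
    return hazards

div = {'reads': {'X1', 'X2'}, 'writes': {'X3'}}   # X3 = X1 / X2
mul = {'reads': {'X4', 'X3'}, 'writes': {'X5'}}   # X5 = X4 * X3
add = {'reads': {'X0', 'X6'}, 'writes': {'X4'}}   # X4 = X0 + X6 (X0 assumed)

print(classify_hazards(div, mul))  # ['RAW'] on X3
print(classify_hazards(mul, add))  # ['WAR'] on X4
```

A scoreboard performs essentially this bookkeeping in hardware, stalling or holding results to keep each detected hazard from corrupting register contents.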
8.5.4 Structural Hazards
A structural hazard arises when two or more instructions in the pipeline require the same resource. In that case, the instructions must be executed in sequence rather than in parallel; this is also referred to as a resource hazard. An example of a structural hazard is shown in Figure 7:
S
Clock Cycle Number
Insir 1 2 3 4 5 6 7 8 9
Load
Insir 1
IF ID
IF
EX
ID
MEM
EX E WB
MEM WB
R
Insir 2 IF ID EX MEM WB
8.6 Influence on Instruction Sets
In a computer, the instruction set is the part that relates to programming; it is also considered the machine language. The processor works when the instruction set provides commands telling it what to do. An instruction set covers the addressing modes, instructions, registers and memory architecture, and pipelining divides input instructions into a sequence of parts that are executed by distinct processor units.
8.7 Conclusion
- Pipelining is the process of gathering instructions from the processor through a pipeline.
- A data hazard is a circumstance in which either the source or the destination operands of an instruction are not available at the expected time in the pipeline.
- A Write-After-Write (WAW) hazard arises when an instruction attempts to write its output to a register before a previous instruction has written its own result there.
- A WAR hazard arises if an instruction attempts to write to a register that has not yet been read by a previously issued but unfinished instruction.
- A structural hazard arises when two or more instructions in the pipeline require the same resource.
- The instruction set is the part of a computer that relates to programming; it is also considered the machine language.
8.8 Glossary
- Pipelining: The process of gathering instructions from the processor through a pipeline
- Arithmetic pipeline: Used to accomplish high-speed floating-point operations
- Data hazard: A circumstance in which either the source or the destination operands of an instruction are not available at the expected time in the pipeline
- Write-After-Write (WAW) hazard: Arises when an instruction attempts to write its output to a register before a previous instruction has written its own result there
- Write-After-Read (WAR) hazard: Arises if an instruction attempts to write to a register that has not yet been read by a previously issued but unfinished instruction
- Structural hazard: Arises when two or more instructions in the pipeline require the same resource
- Instruction set: The part of a computer that relates to programming; it is also considered the machine language
8.9 Self-Assessment Questions
1. __________ is a method in which numerous instructions are overlapped during execution.
a. Pipelining
b. Instruction set
c. Data hazards
d. Structural hazards
2. Pipelined processors were categorised into two types by Handler and Ramamurthy in __________.
b. 1992
c. 1977
d. 1972
3. A(n) __________ is used in a digital computer to carry out operations such as decoding or fetching on multiple instructions.
a. Data hazard
b. Instruction pipeline
c. Arithmetic pipeline
d. Pipelining
4. A __________ hazard arises when an instruction attempts to write its output to the same register before the previous instruction's execution has completed.
a. WAR
b. RAW
c. WAW
d. Structural
5. Which of the following hazards arises when two or more instructions exist in the pipeline and require the same resource?
a. Data hazards
b. Structural hazards
c. Instruction hazards
d. RAW hazards
6. In pipelining, __________ is a method for executing instruction parallelism with a single processor.
a. Data hazard
b. Instruction hazard
c. Instruction set
d. Resource limitation
7. The pipeline that divides a problem into several sub-problems to accomplish high-speed floating-point operations is called a(n) __________.
a. Arithmetic pipeline
b. Instruction pipeline
c. Pipelining
d. Instruction set
8. How many types of pipeline hazards can arise when a pipeline is forced to stall?
a. 1
b. 2
c. 3
d. 4
9. If two instructions request access to the same resource within the same clock cycle, then it is a condition of __________.
a. Pipelining
b. Data hazards
c. Instruction set
d. Resource limitation
10. When an instruction attempts to write to a register that has not yet been read by a previously issued but unfinished instruction, it is called a __________.
a. RAW hazard
b. Structural hazard
c. WAW hazard
d. WAR hazard
1. Pipelining is a method in which numerous instructions are overlapped during execution. What do you understand by the term pipelining?
2. A pipeline system is similar to a modern assembly line in a factory. Explain the basic concept of pipelining and also discuss its types.
3. Pipeline hazards occur when a pipeline is forced to stall. Describe the concept of data hazards and also discuss the different types of hazards that arise during pipelining.
4. In a pipelined processor, more than one instruction is executed at a time. Determine the advantages and disadvantages of pipelining.
5. In pipelining, an instruction set is a method for executing instruction parallelism with a single processor. Define the concept of the instruction set in pipelining.
8.10 Answers and Hints for Self-Assessment Questions
Q. No. Answer
1. a. Pipelining
2. c. 1977
3. b. Instruction pipeline
4. c. WAW
5. b. Structural hazards
6. c. Instruction set
7. a. Arithmetic pipeline
8. d. 4
9. d. Resource limitation
10. d. WAR hazard
Hints for Essay Type Questions
1. Pipelining is the process of gathering instructions from the processor through a pipeline. It provides for the systematic storage and execution of instructions. Refer to Section Introduction
2. The basic concept of pipelining is that it increases the performance of a system through some simple changes to the hardware design.
4. Pipelining organises the hardware components of the CPU to increase overall performance. Refer to Section Advantages and Disadvantages of Pipelining
5. In a computer, the instruction set is the part that relates to programming; it is also considered the machine language. Refer to Section Influence on Instruction Sets
8.11 Post-Unit Reading Material
- https://www.motivationbank.in/2021/09/computer-organization-and-architecture.html
- https://www.javatpoint.com/computer-organization-and-architecture-tutorial
8.12 Topics for Discussion Forums
- Discuss the concept of pipelining, its types and data hazards with your friends and classmates. Also, discuss pipelining instruction hazards with real-world examples.
UNIT
09
Parallel Computer Models
Names of Sub-Units
Introduction to Parallel Computer Models, The State of Computing, Multiprocessors, Multicomputer, PRAM Model, VLSI Model
Overview
This unit begins by discussing the concept of parallel computer models. Next, the unit discusses the state of computing. Further, the unit explains multiprocessors and multicomputers. Towards the end, the unit discusses the PRAM model and the VLSI model.
Learning Objectives

Learning Outcomes

- Explore the importance of the VLSI model
Pre-Unit Preparatory Material
- https://www.philadelphia.edu.jo/academics/kaubaidy/uploads/ACA-Lect18.pdf
9.1 Introduction
Parallel processing has emerged as a useful technique in modern computers to fulfil the demand for higher speed, lower cost and more accurate outcomes in real-world applications. Due to multiprogramming, multiprocessing and multicomputing, concurrent events are frequent in today's computers, and software packages on modern computers are powerful and comprehensive. To study the evolution of computer performance, we must first comprehend the fundamental evolution of hardware and software. Computer development passed through two primary stages: mechanical and electromechanical parts came first, and modern computers emerged after the development of electronic components. The moving parts of mechanical computers were replaced with high-mobility electrons in electronic computers, and electric signals, which travel at almost the speed of light, replaced mechanical gears and levers for data transmission.
Data locality and data communication are also central to parallel computing. Parallel computer architecture is the approach of organising all resources to maximise speed and programmability within the constraints set by technology and cost at any given moment. By utilising an increasing number of processors, parallel computer architecture adds a new dimension to the development of computer systems. In principle, the performance achieved by using a large number of processors is superior to the performance of a single processor at any given moment. Hardware components, instruction sets, application programs, system software and the user interface make up a contemporary computer system.
Numerical computing, logical reasoning and transaction processing are the three types of computing problems, and some complicated problems may necessitate the use of all three processing techniques. Some of the issues in parallel architecture are as follows:
- Evolution of computer architecture: Computer architecture has undergone revolutionary changes over the last four decades. From the von Neumann architecture to multicomputers and multiprocessors, we have come a long way.
- Performance of a computer system: The performance of a computer system is determined by both machine capabilities and program behaviour. Better hardware technology, innovative architectural features and effective resource management may all help machines perform better. Program behaviour, however, is unpredictable because it depends on the application and run-time conditions.
9.2 The State of Computing
Modern computers have powerful hardware that is controlled by a large number of software programs. To assess the state of computing, we must first examine historical milestones in computer development.
9.2.1 Computer Generations
Electronic computers have gone through several phases of advancement during the last five decades. The first three generations each lasted around ten years, and the fourth generation spanned a 15-year period. With the usage of CPUs and memory devices containing more than one million transistors on a single silicon chip, we have now reached the fifth generation. Each generation introduces additional hardware and software features, and the majority of the features introduced in earlier generations have been handed down to subsequent generations, as shown in Table 1:
T
Second (1955-64) Discrete transistors and core Shall use with compiler, IBM 7090, CDC 1604,
memories, floating point subroutine libraries, batch Univac LARC
Y
Third (1965-74) Integrated circuits (SSI- Multiprogramming & time IBM 360/370, CDC 6600,
MSI), microprogramming, sharing OS, multi user TI- ASC, PDP-8 VAX 9000,
O
pipelining, cache & lookahead applications Gay XMP, IBM 3090 BBN
processors TC 2000
C
Fourth (1975-90) LSI/VLSI & semiconductor Multiprocessor OS, Fujitsu VPP 500, Gay/MPP,
memory, multiprocessors, languages, compilers & TMC/CM-5, Intel Paragon
vector supercomputers, multi environment for parallel
computers processing
Fifth (1991 present) ULSI/VHSIC processors, Massively parallel
memory & switches, high processing, grand challenge
density packaging, scalable applications, heterogeneous
architectures process
3
JGI JAIN
DEEMED-TO-BE UNIVERSITY
Computer Organization and Architecture
In other words, the latest generation of computers has inherited the features found in previous generations.
9.3 Multiprocessors
A multiprocessor is a part of a computer system in which two or more central processing units (CPUs) share a common main memory, bus and peripherals. It helps in the simultaneous processing of programs. The key objective of using a multiprocessor is to enhance the system's execution speed, with further objectives of fault tolerance and application matching. It is a means to enhance computing speed and performance, as well as to provide enhanced availability and reliability. A multiprocessor system is shown in Figure 2:
Figure 2: Multiprocessor
Multiprocessors are of two types:
- Shared memory multiprocessor
- Distributed memory multiprocessor
9.3.1 Shared Memory Multiprocessor
The three most common shared memory multiprocessor models are as follows:

Uniform Memory Access (UMA)
In this model, the physical memory is shared evenly by all processors, and all processors have the same access time to all memory words. Each processor may have a private cache memory, and the same rule applies to the peripheral devices. A symmetric multiprocessor is one in which all of the processors have equal access to all of the peripheral devices, while an asymmetric multiprocessor is one in which only one or a few processors have access to the peripheral devices, as shown in Figure 3:
Figure 3: Uniform Memory Access (UMA)

Non-Uniform Memory Access (NUMA)
In the NUMA model, the memory access time varies with the location of the memory word relative to the processor, as shown in Figure 4:
Figure 4: Non-Uniform Memory Access (NUMA)
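A toy model shows why the non-uniform access time matters in practice. The latency numbers below are made-up illustrative values, not figures from the book; the effective access time is simply a weighted average of local and remote latencies, so keeping data local dominates performance.

```python
# Illustrative NUMA cost model: the effective memory access time is a
# weighted average of local and remote latencies, weighted by the
# fraction of references that hit local memory. Latencies are made up.

def numa_effective_access(local_ns, remote_ns, local_fraction):
    return local_fraction * local_ns + (1 - local_fraction) * remote_ns

print(numa_effective_access(local_ns=100, remote_ns=300, local_fraction=0.9))  # ~120 ns
print(numa_effective_access(local_ns=100, remote_ns=300, local_fraction=0.5))  # 200 ns
```

The same weighted-average reasoning explains why NUMA-aware data placement (maximising the local fraction) is central to programming such machines.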
Cache Only Memory Architecture (COMA)
The COMA model is a special case of the NUMA model in which all the distributed main memories are converted to cache memories, as shown in Figure 5:
Figure 5: Cache Only Memory Architecture (COMA)

9.3.2 Distributed Memory Multiprocessor
A distributed memory multiprocessor is one in which every CPU of the computer system has its own private memory. In the present state of programming technology for such systems, it is commonly accepted that the well-organised way of exploiting large-scale parallelism is the single program multiple data (SPMD) programming model. The programming approach involves defining the most suitable distribution of the data space.
9.4 Multicomputer
A multicomputer system is a message-passing system in which multiple computers are connected to solve a problem together. Each processor in a multicomputer has its own private memory, accessible only by that specific processor; the processors communicate with one another through an interconnection network. Because it is a message-passing system, it is possible to distribute the task among the processors. A multicomputer is shown in Figure 6:
Figure 6: Multicomputer
9.4.1 Distributed Memory Multicomputer
A distributed memory multicomputer system is made up of many computers, called nodes, that are linked together via a message-passing network. Each node functions as a self-contained computer with a CPU, local memory and, in some cases, I/O devices. All local memories are private, and only the local processor has access to them. This is why such machines are referred to as NORMA (no-remote-memory-access) computers, as shown in Figure 7:
Figure 7: Distributed Memory Multicomputer
Vector Supercomputers
If the decoded instructions are vector operations, they are transmitted to the vector control unit, as shown in Figure 8:
Figure 8: Vector Supercomputers
SIMD computers have an 'N' number of processors coupled to a control unit, and each processor has its own memory unit. An interconnection network connects all of the processors, as shown in Figure 9:
Figure 9: SIMD Computer
9.5 PRAM Model
Traditional uniprocessor computers were modelled as Random Access Machines (RAM) by Shepherdson and Sturgis (1963). Fortune and Wyllie (1978) created the Parallel Random Access Machine (PRAM) model to simulate an idealised parallel computer with zero memory access cost and synchronisation overhead. A PRAM model is shown in Figure 10:
Figure 10: PRAM Model
A shared memory unit is present in an N-processor PRAM. This shared memory can be centralised or distributed among the processors, which operate on it concurrently. The possible memory update operations in the PRAM model are as follows:
- Exclusive Read (ER): Only a single processor is allowed to read from any memory location at a time.
- Exclusive Write (EW): Only a single processor at a time is allowed to write into a memory location.
- Concurrent Read (CR): Several processors may read the same information from the same memory location.
- Concurrent Write (CW): Concurrent write operations are allowed on the same memory location.
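The four operations above distinguish PRAM variants such as EREW and CRCW. The following sketch is an illustrative simulation, not part of the formal model: under an exclusive-write policy, two processors writing the same cell in one step is flagged as a violation, while under concurrent write a resolution rule decides the stored value (lowest processor id wins here, which is one common convention among several).

```python
# Illustrative simulation of one PRAM write step under an EW or CW
# policy. With CW, conflicts are resolved by priority: the writer
# with the lowest processor id wins (one common convention).

def pram_write_step(memory, writes, policy="EW"):
    """writes: list of (processor_id, address, value) issued in one step."""
    by_addr = {}
    for pid, addr, value in writes:
        by_addr.setdefault(addr, []).append((pid, value))
    for addr, attempts in by_addr.items():
        if policy == "EW" and len(attempts) > 1:
            raise RuntimeError(f"EW violation: {len(attempts)} writers at {addr!r}")
        memory[addr] = min(attempts)[1]   # lowest pid wins under CW
    return memory

mem = pram_write_step({}, [(2, 'a', 20), (1, 'a', 10), (3, 'b', 30)], policy="CW")
print(mem)  # {'a': 10, 'b': 30}
```

Running the same write set with `policy="EW"` raises an error instead, which is exactly the distinction between the EREW and CRCW machine variants.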
9.6 VLSI Model
Very Large Scale Integration (VLSI) technology enables the integration of a high number of components on a single chip; in parallel computers, VLSI chips are used to build processor arrays, memory arrays and large-scale switching networks.
The chip area (A) of a VLSI implementation of an algorithm is a measure of that algorithm's space complexity. If T is the time (latency) required to run the algorithm, then A·T puts an upper bound on the total number of bits processed through the chip (its I/O). For a particular computation, there is a lower bound f(s) such that:
A·T² ≥ O(f(s))
The design flow of the VLSI model is shown in Figure 11:
Figure 11: VLSI Design Flow
9.7 Conclusion
- Parallel processing has emerged as a useful technique in modern computers to fulfil the demand for higher speed.
- Parallel computer architecture adds a new dimension to the development of computer systems.
- Electronic computers have gone through several phases of advancement during the last five decades.
- A multiprocessor is a part of a computer system that has two or more central processing units (CPUs).
- A multicomputer system is a message-passing system in which multiple computers are connected.
- A distributed memory multicomputer system is made up of many computers called nodes.
- A vector processor is an optional feature of a vector computer and is coupled to the scalar processor.
- SIMD computers have an 'N' number of processors coupled to a control unit.
- VLSI technology enables the integration of a high number of components on a single chip.
- The Parallel Random Access Machine (PRAM) is a model used for parallel computation, known as a parallel algorithm model.
9.8 Glossary
- Parallel processing: A technique that has emerged in modern computers to fulfil the demand for better speed
- Parallel computer architecture: It offers a new dimension to the development of computer systems
- Electronic computers: They have gone through several phases of advancement during the last five decades
- Multiprocessor: A part of a computer system that has two or more central processing units (CPUs)
- Vector processor: An optional feature of a vector computer, coupled to the scalar processor
- SIMD computers: They have an 'N' number of processors coupled to a control unit
- VLSI model: A technology that enables the integration of a high number of components on a single chip
- Uniform Memory Access (UMA): A model in which the physical memory is shared evenly by all processors
- Parallel Random Access Machine (PRAM): A model used for parallel computation, known as a parallel algorithm model
9.9 Self-Assessment Questions
1. The approach of organising all resources to maximise speed and programmability within the constraints of technology and cost is known as __________.
c. Parallel computer
d. Multiprocessor
2. A __________ is a part of a computer system that has two or more central processing units (CPUs) sharing a common main memory.
a. PRAM
b. Parallel processing
c. Multiprocessor
d. Multicomputer
3. The model in which the physical memory is shared evenly by all processors is __________.
a. COMA
b. UMA
c. NUMA
d. Multiprocessor
b. Multiprocessing
c. Distribution
d. Parallelism
5. Which of the following is considered a message-passing system that is connected to other computer systems to solve a problem?
a. Multicomputer
b. Multiprocessor
c. Shared memory processor
7. Which of these technologies enables the integration of a high number of components on a single chip, as well as increased clock speeds?
a. Parallel processing
b. VLSI
c. PRAM
d. Parallel architecture
8. The traditional uniprocessor computers were represented as Random-Access Machines by Shepherdson and Sturgis in __________.
a. 1961
b. 1962
c. 1963
d. 1964
9. In the PRAM model, when several processors are connected to a single block of memory, that memory is termed the __________.
a. Super memory
b. Supercomputer
c. Vector computer
d. Global memory
10. A distributed memory multicomputer system is made up of many computers called __________.
a. Nodes
b. CPU
c. Multiprocessor
d. Multicomputer
IG
parallel architecture?
2. Electronic computers have gone through several phases of advancement during the last five decades. Describe the concept of the electronic computer and its generations in brief.
3. The key objective of using a multiprocessor is to enhance the system’s implementation speed. Explain the importance of the multiprocessor and its types in parallel computers in brief.
4. PRAM is considered a model used for parallel computation, known as a parallel algorithm. Describe the PRAM model in detail.
5. VLSI chips are used to build processor arrays, memory arrays and large-scale switching networks in parallel computers. Explain the concept of the VLSI model in parallel computers.
Q. No. Answer
1. a. Parallel architecture
2. c. Multiprocessor
3. b. UMA
4. d. Parallelism
5. a. Multicomputer
6. d. ER
7. b. VLSI
8. c. 1963
9. d. Global memory
10. a. Nodes
2. Modern computers have powerful hardware that is controlled by a large number of software programs. To assess the state of computing, we must first examine historical milestones.
Refer to Section The State of Computing
3. A multiprocessor is a part of a computer system that has two or more central processing units (CPUs) that share a common main memory, bus, as well as peripherals. It helps in the simultaneous processing of programs.
4. Parallel Random Access Machine, also termed PRAM, is a model which is used for parallel computation, known as a parallel algorithm. It is the extension of the RAM model for sequential algorithms.
5. The performance and functionality of computer systems have vastly improved during the previous 50 years. The use of Very Large Scale Integration (VLSI) technology made this possible.
zz https://www.javatpoint.com/what-is-parallel-computing
zz https://rc.fas.harvard.edu/wp-content/uploads/2016/03/Intro_Parallel_Computing.pdf
zz Discuss the concept of parallel computers and parallel computer architecture with your friends and classmates. Also, discuss the parallel computer models, such as the PRAM and VLSI models, and multiprocessors and multicomputers, with real-world examples.
UNIT 10
Program and Network Properties
Names of Sub-Units
Introduction, Conditions of Parallelism, Program Partitioning and Scheduling, Program Flow Mechanisms, System Interconnect Architectures
Overview
This unit begins by discussing the concept of program and network properties. Next, the unit discusses the conditions of parallelism and program partitioning and scheduling. Further, the unit explains the program flow mechanisms. Towards the end, the unit discusses the system interconnect architectures.
Learning Objectives
Learning Outcomes
aa Explore the importance of system interconnect architectures
Pre-Unit Preparatory Material
aa https://www.inf.ed.ac.uk/teaching/courses/pa/Notes/lecture02-types.pdf
10.1 Introduction
Parallelism is a key idea in today’s computers, and the usage of several functional units within the CPU is an example of parallelism. Because early computers only had one arithmetic and functional unit, only one operation could be performed at a time. In later designs, the ALU function was spread among many functional units that run in parallel.
H.T. Kung has identified the need to advance in three areas: parallel computing calculation models, parallel architecture inter-process communication, and system integration for integrating parallel systems into a general computing environment.
10.2 Conditions of Parallelism
zz Bernstein’s condition
zz Software parallelism
zz Hardware parallelism
requires that each segment should be independent of the other segments. Dependencies among various segments of a programme may take various forms, such as resource dependency, control dependency and data dependency. A dependence graph is used to describe the relations. Program statements are represented by nodes, and directed edges with different labels show the ordered relations among the statements. After analysing the dependency structure, it can be seen where parallelization and vectorization opportunities exist. The relations between statements are shown by data dependences.
An unknown dependence exists in the following situations:
zz The subscript of a variable is itself subscripted.
zz The subscript does not contain the loop index variable.
zz The subscript is non-linear.
zz Output dependence: Two statements are output dependent if they produce the same output variable.
zz Flow dependence: If an execution path exists from ST1 to ST2 and at least one of ST1’s outputs feeds in as an input to ST2, the statement ST2 is flow dependent on ST1.
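The dependence types above can be detected mechanically from each statement’s input and output variable sets. The following is a small sketch, using hypothetical statements S1–S3 (not from the text), that reports flow, anti and output dependences between statement pairs in program order:

```python
# Hypothetical fragment:
# S1: a = b + c   (writes a)
# S2: d = a * 2   (reads a, writes d)
# S3: a = d - 1   (reads d, writes a)
def find_dependences(stmts):
    """stmts: list of (name, input_set, output_set) in program order."""
    deps = []
    for i, (n1, in1, out1) in enumerate(stmts):
        for n2, in2, out2 in stmts[i + 1:]:
            if out1 & in2:
                deps.append((n1, n2, "flow"))    # write followed by read
            if in1 & out2:
                deps.append((n1, n2, "anti"))    # read followed by write
            if out1 & out2:
                deps.append((n1, n2, "output"))  # write followed by write
    return deps

stmts = [("S1", {"b", "c"}, {"a"}),
         ("S2", {"a"}, {"d"}),
         ("S3", {"d"}, {"a"})]
print(find_dependences(stmts))
```

The reported pairs are exactly the edges of the dependence graph described above, with the dependence type as the edge label.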
10.2.2 Bernstein’s Condition
Bernstein identified a set of criteria that determine whether two processes may run concurrently. A programme in progress is referred to as a process. A process is a dynamic entity; it is, in fact, a building block of a programme fragment specified at multiple stages of processing. Ii is the input set of process Pi, a collection of all input variables required to complete the process; similarly, the output set Oi of process Pi is a collection of all output variables created once process Pi has been completed. The operands retrieved from memory or registers are the input variables, and the results placed in working registers or memory locations are the output variables. For two processes P1 and P2 with input sets I1, I2 and output sets O1, O2, Bernstein’s conditions are:

I1 ∩ O2 = ∅
I2 ∩ O1 = ∅
O1 ∩ O2 = ∅

The two processes P1 and P2 can execute in parallel (denoted P1 || P2) if and only if these conditions hold, that is, if they are independent and do not create confusing results.
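Bernstein’s three conditions translate directly into set intersections. The following minimal sketch (with assumed example processes, not from the text) checks whether two processes may execute in parallel:

```python
def bernstein_parallel(i1, o1, i2, o2):
    """True iff I1∩O2 = ∅, I2∩O1 = ∅ and O1∩O2 = ∅."""
    return not (i1 & o2) and not (i2 & o1) and not (o1 & o2)

# P1: c = a + b   (inputs {a, b}, output {c})
# P2: d = a * e   (inputs {a, e}, output {d})  -> no sets intersect
print(bernstein_parallel({"a", "b"}, {"c"}, {"a", "e"}, {"d"}))  # True
# P3: e = c - 1   reads c, which P1 writes     -> I3 ∩ O1 ≠ ∅
print(bernstein_parallel({"a", "b"}, {"c"}, {"c"}, {"e"}))       # False
```

Note that the check is symmetric in the two processes, which matches the fact that P1 || P2 is a commutative relation.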
Control and data dependencies of programmes define software parallelism. The programme profile or programme flow graph indicates the degree of parallelism. Algorithm, programming style, and compiler optimization all influence software parallelism. The pattern of concurrently executed operations is shown in programme flow graphs. During the execution of a programme, the amount of parallelism changes.
10.3 Program Partitioning and Scheduling
A programme is partitioned into two (or more) portions, each signifying a subgroup of the original. The partitioning must be done in such a way that bugs in a partition are very likely to represent bugs in the original programme, as our goal is to examine programme partitions to uncover true defects that were not identified through whole-program analysis. Partitioning is based on the program’s control flow graph (CFG), a static representation of the programme that depicts all control flow options. Each node in the graph represents a basic block, which is a chunk of code that runs in a straight line with no jumps or targets. There are numerous options for partitioning.
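Basic blocks can be found with the classic leader rule: a leader is the first instruction, any jump target, or any instruction following a jump. The following is a small sketch (with an assumed instruction encoding, not from the text) that splits an instruction list into basic blocks:

```python
def basic_blocks(instrs, targets):
    """instrs: list of (index, is_jump); targets: set of jump-target indices."""
    leaders = {0} | targets
    # the instruction after a jump also starts a new block
    leaders |= {i + 1 for i, is_jump in instrs if is_jump and i + 1 < len(instrs)}
    cuts = sorted(leaders)
    return [list(range(cuts[k], cuts[k + 1] if k + 1 < len(cuts) else len(instrs)))
            for k in range(len(cuts))]

# 6 instructions; instruction 2 jumps to instruction 5
instrs = [(0, False), (1, False), (2, True), (3, False), (4, False), (5, False)]
print(basic_blocks(instrs, {5}))  # [[0, 1, 2], [3, 4], [5]]
```

Each returned list of indices is one CFG node: a straight-line run with a single entry and a single exit.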
10.4 Program Flow Mechanisms
Traditional computers rely on a control flow system in which the sequence in which programmes are executed is specified directly in the user programme. At the fine-grain instruction level, data flow computers have a high degree of parallelism. Reduction computers are built on a demand-driven technique, in which a computation commences based on the demand for its result by other computations.
Shared memory is used by control flow computers to store programme instructions and data items. Many instructions change variables in shared memory. Because memory is shared, the execution of one instruction may have unintended side effects on other instructions. These side effects preclude parallel processing in many instances. In reality, due to the employment of control-
In a data flow computer, rather than being directed by a programme counter, the execution of an instruction is governed by data availability. Any instruction should theoretically be ready to execute whenever its operands become available. In a data-driven programme, the instructions are not arranged in any particular order. Data is kept directly inside instructions rather than being saved in shared memory. The results of computations are transmitted directly between instructions. The data created by an instruction will be copied and transmitted to any instructions that require it.
There is no shared memory, no programme counter, and no control sequencer in this data-driven architecture. It does, however, necessitate a unique mechanism for detecting data availability, matching data tokens with required instructions, and enabling the chain reaction of asynchronous instruction execution.
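The data-driven firing rule can be illustrated with a toy interpreter. The following sketch (an assumed example, not an implementation from the text) fires any instruction whose input tokens are all available, with no program counter ordering the instructions:

```python
def dataflow_run(instrs, tokens):
    """instrs: {name: (inputs, op, output)}; tokens: {variable: value}."""
    pending = dict(instrs)
    while pending:
        ready = [n for n, (ins, _, _) in pending.items()
                 if all(v in tokens for v in ins)]
        if not ready:
            break  # no instruction has all of its operands: nothing can fire
        for n in ready:
            ins, op, out = pending.pop(n)
            tokens[out] = op(*[tokens[v] for v in ins])  # fire: produce a token
    return tokens

# (a + b) * (a - b), deliberately listed in no particular order
instrs = {
    "mul": (("s", "d"), lambda x, y: x * y, "r"),
    "add": (("a", "b"), lambda x, y: x + y, "s"),
    "sub": (("a", "b"), lambda x, y: x - y, "d"),
}
print(dataflow_run(instrs, {"a": 5, "b": 3})["r"])  # 16
```

Here "add" and "sub" fire in the first round because their operands are available, and "mul" fires only once their result tokens exist — exactly the availability-driven execution described above.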
There are a slew of new data flow computer projects on the horizon. Arvind and his MIT colleagues created a tagged token architecture for building data flow computers.
An n x n routing network connects the n processing elements (PEs) in the global architecture. Across all n PEs, the whole system supports pipelined data flow operations. The pipelined routing network is used to communicate between PEs.
Within each PE, the machine has a low-level token matching mechanism that only sends out instructions for which the input data is already available. Each datum is labelled with the address of the instruction it belongs to as well as the context in which it is being performed. Instructions are saved in programme memory. A local path is used to bring tagged tokens into the PE. The tokens can also be transferred over the routing network to the other PEs. All internal circulation processes are pipelined so that there are no bottlenecks.
In a dataflow computer, the instruction address replaces the programme counter of a control flow computer, and the context identifier replaces the frame base register. It is the responsibility of the machine to match data with the same tag to the required instructions. New data will be created as a result, along with a new tag identifying the successor instructions. As a result, each instruction represents a synchronization operation. New tokens are created and distributed along the PE pipeline for reuse, or through the global path, which is also pipelined, to other PEs.
10.5 System Interconnect Architectures
Modern information systems rarely function in isolation; federal agencies, like other types of organisations, usually integrate two or more systems to exchange data or share information resources. Any direct link between information systems is referred to as a system interconnection; information system owners must describe all system interconnections in their security plans and select
A network’s topology may be static or dynamic. Static networks consist of point-to-point direct connections that do not change during program execution. Programs that require communication can communicate with each other via switched channels in dynamic networks. When subsystems of a central computer system or numerous computing nodes are connected using static networks, the links are permanent. In shared-memory multiprocessors, dynamic networks such as buses and multistage networks are utilized frequently. SIMD computers have also used both types of networks to route data between PEs. Most networks consist of nodes connected by directed or undirected edges, and are represented by a graph. The network size is the number of nodes in the graph.
degree for unidirectional channels. In this case, the node degree is calculated by adding the two degrees together. The node degree reflects the number of I/O ports required per node, and each port carries a cost. In order to decrease cost, the node degree should be kept small; for scalable systems, a constant node degree is desirable.
The diameter D of a network is the maximum of the shortest paths between any two nodes, measured by the number of links traversed. It shows the maximum number of hops between any two nodes, offering a measure of the network’s communication capacity. A small network diameter is desirable from a communication standpoint.
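Node degree and diameter can be computed directly from a graph representation of the topology. The following is a small sketch (using an assumed 4-node ring as the example, not a topology from the text) that measures both for an undirected network given as an adjacency list:

```python
from collections import deque

def degree(adj, v):
    """Number of links incident on node v."""
    return len(adj[v])

def diameter(adj):
    """Maximum over all node pairs of the shortest-path hop count (BFS)."""
    best = 0
    for src in adj:
        dist = {src: 0}
        q = deque([src])
        while q:
            u = q.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    q.append(w)
        best = max(best, max(dist.values()))
    return best

# 4-node ring: every node has degree 2 and the diameter is 2 hops
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print(degree(ring, 0), diameter(ring))  # 2 2
```

The constant degree of the ring is what makes it attractive for scalable systems, while its diameter grows linearly with the number of nodes — the trade-off the text describes.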
The bisection width b is the minimal number of edges cut along a channel bisection when a network is split into two identical halves. In communication networks, each edge corresponds to one w-bit channel. The wire bisection width is thus B = bw, and the parameter B represents the wire density of a network. When B is constant, w = B/b. A network’s bisection width can be used to determine the maximum communication bandwidth along the bisection; all other cross sections of the network are bounded by the bisection width.
10.5.3 Data Routing Functions
Data interchange between PEs is carried out via data routing networks. A data routing network may be static or dynamic. Messages are used to route data in multicomputer networks. In addition, routing networks improve system performance by reducing the amount of time necessary for data interchange. Commonly used data routing functions include shifting, rotation, permutations and broadcasting.
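Two of the routing functions just named can be sketched in a few lines. The following assumed example (not from the text) models N PEs as a list of data values and applies a rotation (circular shift) and a broadcast:

```python
def rotate(data, k):
    """Circularly route each PE's datum to PE (i + k) mod N."""
    n = len(data)
    out = [None] * n
    for i, v in enumerate(data):
        out[(i + k) % n] = v
    return out

def broadcast(data, src):
    """Every PE receives the datum held by PE `src`."""
    return [data[src]] * len(data)

pes = [10, 20, 30, 40]
print(rotate(pes, 1))     # [40, 10, 20, 30]
print(broadcast(pes, 2))  # [30, 30, 30, 30]
```

A plain shift is the same as `rotate` except that the datum falling off the end is discarded instead of wrapping around; a general permutation maps PE i to PE p(i) for an arbitrary bijection p.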
10.6 Conclusion
zz Parallelism is a key idea in today’s computers, and the usage of several functional units within the CPU is an example of parallelism.
zz In a data flow computer, rather than being directed by a programme counter, the execution of an instruction is governed by data availability.
zz Bernstein identified a set of criteria that determine whether two processes may run concurrently.
zz A programme is partitioned into two (or more) portions, each signifying a subgroup of the original.
10.7 Glossary
zz Parallelism: It is a key idea in today’s computers, and the usage of several functional units within the CPU is an example of parallelism
zz Hardware parallelism: It is parallelism defined by the machine hardware and its multiplicity
zz Network: The network topology might be static or dynamic
10.8 Self-Assessment Questions
A. Multiple Choice Questions
1. In which of the following types of data dependence do two statements produce the same output variable?
a. Output dependence
b. Unknown dependence
c. Flow dependence
d. Anti-dependence
2. __________ identified a set of criteria that determine whether two processes may run concurrently.
a. Data and resource dependencies
b. Bernstein’s condition
c. Software parallelism
d. Hardware parallelism
3. Which of these is used by control flow computers to store programme instructions and data items?
a. Compiler
b. Algorithms
c. Shared memory
d. Network
a. CFG
b. Shared memory
c. Program partitioning
d. System interconnection
6. If the network grows, the diameter between any two nodes __________.
a. Decreases
b. Slightly decrease
c. Increases
d. No change
7. The number of channels out of a node is called __________.
a. In degree
b. Out degree
c. Channels
d. Edges
8. Which among the following commonly includes shifting, rotation, permutations and broadcasting?
a. Network properties
b. Node degree and network diameter
c. Parallelism
d. Data routing functions
9. Which of these displays the resource usage patterns of operations that are being executed at the same time?
a. Scheduling
b. Program partitioning
c. Software parallelism
d. Hardware parallelism
10. The __________ are retrieved from memory or registers and are called input variables.
a. Operands
b. CPU
c. ALU
d. Memory
3. In a dataflow computer, the instruction address replaces the programme counter. Explain the importance of the data flow computer.
4. Programs that require communication are able to communicate with each other via switched channels in dynamic networks. Describe network properties briefly.
5. What do you understand by parallelism?
A. Answers to Multiple Choice Questions
Q. No. Answer
1. a. Output dependence
2. b. Bernstein’s condition
3. c. Shared memory
4. d. Processing elements (PEs)
5. a. Parallelism
6. c. Increases
7. b. Out degree
8. d. Data routing functions
9. d. Hardware parallelism
10. a. Operands
2. A programme is partitioned into two (or more) portions, each signifying a subgroup of the original.
zz https://www.nvidia.com/content/cudazone/cudau/courses/ucdavis/lectures/ilp1.pdf
zz https://examradar.com/memory-management/
zz Discuss with your friends and classmates the concept of program and network properties, parallelism conditions and program partitioning. Also, discuss scheduling and system interconnect architectures and where they are used.
UNIT 11
Principles of Scalable Performance
Names of Sub-Units
Performance Metrics and Measures, Parallel Processing Applications, Speedup Performance Laws
Overview
This unit begins by discussing the concept of principles of scalable performance. Next, the unit describes the performance metrics and measures. Further, the unit explains the parallel processing applications. Towards the end, the unit discusses the speedup performance laws.
Learning Objectives
Learning Outcomes
aa Describe the concept of Amdahl’s law
Pre-Unit Preparatory Material
aa http://www.nitjsr.ac.in/course_assignment/CS16CS601INTRODUCTION%20TO%20PARALLEL%20COMPUTING.pdf
11.1 Introduction
Scalability, at its most basic level, refers to the ability to accomplish more of something. This might include managing more data, responding to more user requests, or completing more tasks. While developing software capable of performing a lot of work has its own set of challenges, designing software that can scale to do ever more work presents challenges of its own.
To increase the quantity of work an application accomplishes, reduce the time it takes for individual work units to finish. Reduce the time it takes to process a user request, for example, and you will be able to handle more requests in the same amount of time. Here are several scenarios in which this theory applies. Caching: by caching a result, you avoid the overhead of retrieving it again. Pooling: by pooling expensive resources, you may decrease the overhead associated with their use. Parallelising: decompose the task and parallelise the various stages to shorten the time it takes to accomplish a unit of work. Dividing: by partitioning the code and collocating relevant partitions, related processes may be concentrated as near together as feasible.
You can also reduce the amount of time spent contacting distant services, for example by making the interfaces coarser-grained. It is also important to remember that remote vs. local is an intentional design decision rather than a switch, and to remember the first law of distributed computing: do not distribute your objects.
DOP implies an unlimited number of processors; this is not possible in actual computers, thus certain parallel programme parts must be run in smaller parallel segments sequentially. Other resources might create restrictions. A plot of DOP vs. time is called a parallelism profile, as shown in Figure 1.
Figure 1: Parallelism Profile (DOP plotted against time over the interval [t1, t2], with the average parallelism marked)
Average Parallelism – 1
Assume the following points for average parallelism – 1:
zz n homogeneous processors
Average Parallelism – 2
The area under the profile curve is related to the total amount of work done:

W = Δ ∫[t1, t2] DOP(t) dt

W = Δ Σ(i = 1 to m) i · t_i

where t_i = total time that DOP = i, and Σ(i = 1 to m) t_i = t2 − t1
Average Parallelism – 3

A = (1 / (t2 − t1)) ∫[t1, t2] DOP(t) dt

A = Σ(i = 1 to m) i · t_i / Σ(i = 1 to m) t_i
11.2.2 Available Parallelism
Several investigations have demonstrated that the parallelism available in scientific and engineering computations may be quite high (e.g., hundreds or thousands of instructions per clock cycle). But in real machines, the actual parallelism is much smaller (e.g., 10 or 20).
In compilers, basic blocks are commonly employed as the focus of optimisers (since it is easier to manage the use of registers utilised in the block). Limiting optimisation to basic blocks reduces the amount of parallelism that may be achieved at the instruction level (to about 2 to 5 in typical code).
Asymptotic Speedup – 1

W_i = i Δ t_i (work done when DOP = i)

W = Σ(i = 1 to m) W_i (relates the sum of the W_i terms to W)

t_i(k) = W_i / (k Δ) (execution time with k processors)

t_i(∞) = W_i / (i Δ) (for 1 ≤ i ≤ m)
Asymptotic Speedup – 2

T(1) = Σ(i = 1 to m) t_i(1) = Σ(i = 1 to m) W_i / Δ (response time with 1 processor)

T(∞) = Σ(i = 1 to m) t_i(∞) = Σ(i = 1 to m) W_i / (i Δ) (response time with ∞ processors)

S_∞ = T(1) / T(∞) = Σ(i = 1 to m) W_i / Σ(i = 1 to m) (W_i / i) = A (in the ideal case)
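The formulas above imply that, in the ideal case, the asymptotic speedup equals the average parallelism. The following sketch (with an assumed parallelism profile and Δ = 1, not data from the text) checks this numerically:

```python
def profile_metrics(ti):
    """ti[i] = total time spent at DOP = i + 1 (Delta taken as 1)."""
    W = [(i + 1) * t for i, t in enumerate(ti)]       # W_i = i * t_i
    T1 = sum(W)                                       # time on one processor
    Tinf = sum(w / (i + 1) for i, w in enumerate(W))  # time with unlimited processors
    A = sum(W) / sum(ti)                              # average parallelism
    return A, T1 / Tinf                               # (A, asymptotic speedup)

# DOP = 1 for 2 time units, DOP = 2 for 3 units, DOP = 4 for 1 unit
A, S = profile_metrics([2, 3, 0, 1])
print(A, S)  # both equal (2*1 + 3*2 + 1*4) / 6 = 2.0
```

Note that T(∞) collapses back to the profile duration t2 − t1 (here 6 time units), which is exactly why S_∞ reduces to A.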
We are looking for a metric that describes the mean, or average, performance of a group of benchmark programmes with a variety of execution options (for example, scalar, vector, sequential, parallel). We may also want to assign weights to these programmes to highlight the various modes and provide a
11.2.5 Arithmetic Mean
The arithmetic mean is a well-known concept (the sum of the terms divided by the number of terms). Execution rates given in MIPS or Mflops will be used in our calculations.
The arithmetic mean of a collection of execution rates is proportional to the sum of the inverses of the execution times; it is not inversely proportional to the sum of the execution times.
As a result, when the benchmarks are run, the arithmetic mean fails to represent the true time used by
the benchmarks.
11.2.7 Weighted Harmonic Mean
We may compute the weighted harmonic mean by associating weights f_i with the benchmarks:

R_h* = 1 / Σ(i = 1 to m) (f_i / R_i)
11.2.8 Weighted Harmonic Mean Speedup
T_1 = 1/R_1 = 1 is the sequential execution time on a single processor with rate R_1 = 1.
T_i = 1/R_i = 1/i is the execution time using i processors with a combined execution rate of R_i = i.
Assume there are n execution modes and corresponding weights f_1 ... f_n for a programme. The weighted harmonic mean speedup is computed as follows:

T* = 1 / R_h*

S = T_1 / T* = 1 / Σ(i = 1 to m) (f_i / R_i)
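The weighted harmonic mean speedup can be computed directly from the mode weights. The following is a short sketch (with assumed weights, not from the text), where mode i runs at rate R_i = i:

```python
def whm_speedup(weights):
    """weights[i] = fraction f of work executed in mode i + 1 (rate R = i + 1).
    Returns S = 1 / sum(f_i / R_i)."""
    return 1.0 / sum(f / (i + 1) for i, f in enumerate(weights))

# 40% of the work sequential (rate 1), 60% in 4-processor mode (rate 4):
# S = 1 / (0.4/1 + 0.6/4) = 1 / 0.55
print(whm_speedup([0.4, 0, 0, 0.6]))  # ≈ 1.818
```

With all of the work in sequential mode the speedup is 1, as expected, and concentrating weight in the higher-rate modes pushes S toward the number of processors.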
T
11.2.9 Amdahl’s Law
Basically, this means the system is used sequentially (with probability α) or all n processors are used (with probability 1 − α):

S_n = n / (1 + (n − 1)α)

The implication is that the best speedup possible is 1/α, regardless of n, the number of processors.
zz O(n) is roughly equivalent to the total number of instructions executed by the n processors, scaled by a constant factor.
zz If we define O(1) = T(1), we may safely assume that T(n) ≤ O(n) for n > 1 if the programme P can take advantage of the extra processor(s) in any way.
System Efficiency – 2
The speedup factor (how much quicker the programme runs with n CPUs) can now clearly be represented as:

S(n) = T(1) / T(n)

Recall that we expect T(n) < T(1), so S(n) ≥ 1.
System efficiency is defined as:

E(n) = S(n) / n = T(1) / (n × T(n))

It indicates the actual degree of speedup achieved in a system as compared with the maximum possible speedup. Thus 1/n ≤ E(n) ≤ 1. The value is 1/n when only one processor is used (regardless of n), and the value is 1 when all processors are fully utilised.
11.2.11 Redundancy
In a parallel computation, redundancy is defined as:

R(n) = O(n) / O(1)

What values can R(n) obtain? R(n) = 1 when O(n) = O(1), or when the number of operations performed is independent of the number of processors.
System utilisation, U(n) = R(n) E(n), indicates the degree to which the system resources were kept busy during the execution of the program. Since 1 ≤ R(n) ≤ n and 1/n ≤ E(n) ≤ 1:

1/n ≤ E(n) ≤ U(n) ≤ 1
1 ≤ R(n) ≤ 1/E(n) ≤ n
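The four metrics fit together in a few lines of arithmetic. The following compact sketch (with hypothetical run times and operation counts, not measurements from the text) computes speedup, efficiency, redundancy and utilisation:

```python
def metrics(T1, Tn, O1, On, n):
    """T1/Tn: run times on 1 and n processors; O1/On: operation counts."""
    S = T1 / Tn      # speedup
    E = S / n        # efficiency
    R = On / O1      # redundancy
    U = R * E        # utilisation
    return S, E, R, U

# hypothetical run: 100 s on 1 CPU, 30 s on 4 CPUs,
# 1e6 operations serially, 1.2e6 operations in parallel
S, E, R, U = metrics(100, 30, 1e6, 1.2e6, 4)
print(round(S, 3), round(E, 3), round(R, 3), round(U, 3))  # 3.333 0.833 1.2 1.0
```

The printed values respect the bounds above: E lies between 1/n and 1, R is at least 1, and U never exceeds 1.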
This quality measure is directly related to speedup (S) and efficiency (E), and inversely related to redundancy (R). The quality measure is bounded by the speedup (that is, Q(n) ≤ S(n)).
Consider the following two machines:
a. 10 MIPS CISC computer
b. 20 MIPS RISC computer
It is impossible to tell without knowing more details about the instruction sets on the machines. Even the question, “which machine is faster,” is suspect, since we need to say “faster at doing what?”
Transactions per second (TPS) is a measure that is appropriate for online systems like those used to support ATMs, reservation systems and point-of-sale terminals. The measure may include communication overhead, database search and update, as well as logging operations. The benchmark is also useful for rating relational database performance.
KLIPS is the measure of the number of (thousand) logical inferences per second that can be performed by a system, presumably to relate how well that system will perform at certain AI applications. Since one inference requires about 100 instructions (in the benchmark), a rating of 400 KLIPS is roughly equivalent to 40 MIPS.
The phrase “parallel processing” is used to indicate a broad class of simultaneous data-processing processes with the aim of improving the computing performance of a computer system.
For example, while one instruction is being performed in the ALU, the following instruction might be fetched from memory. The system may have two or more ALUs and be capable of simultaneously executing two or more instructions. Furthermore, parallel processing is utilised to speed up computer processing capacity, although the added hardware also raises the system’s cost. However, technical advancements have brought down hardware costs to the point where parallel processing methods are now cost-effective.
Multiple layers of complexity arise in parallel processing. At the lowest level, the type of registers utilised distinguishes parallel from serial operations: shift registers function serially with one bit at a time, whereas parallel registers work with all bits of the word at the same time. At higher degrees of complexity, parallel processing is defined as a set of functional units that conduct independent or comparable tasks at the same time. Parallel processing is set up by spreading data across many functional units. For example, arithmetic, shift and logic operations may be broken down into three units, and the operands diverted to each unit under the supervision of a control unit.
Figure 2 depicts one technique of splitting the execution unit into eight functional units that operate in parallel. Operands in the registers are moved to one of the units associated with the operands, depending on the operation indicated by the instruction. The operation done in each functional unit is indicated
in each block of the diagram. The adder and integer multiplier execute arithmetic operations using integer numbers.
Three parallel circuits can be used to do floating-point calculations. Logic, shift and increment operations are all executed at the same time on distinct data. Because all units are independent of one another, one number can be shifted while another is incremented. A complex control unit is usually associated with a multi-functional organisation to coordinate all of the operations amongst the many components.
The most essential indicator of a computer’s performance is how rapidly it can run programmes. The architecture of a computer’s hardware influences the speed with which it runs programmes. It is vital to design the compiler, the machine instruction set and the hardware in a coordinated manner for optimal performance.
The whole time it takes to run the programme is called elapsed time, and it is a measure of the computer system’s overall performance. The speeds of the CPU, disc and printer all have an impact. The processor time is the amount of time the processor takes to execute the instructions.
The processor time is dependent on the hardware involved in the execution of individual machine instructions, just as the elapsed time for the execution of a programme is dependent on all units in a computer system.
On a single IC chip, the processor and a small cache memory can be created. Internally, the speed at which the basic steps of instruction processing are performed on the chip is extremely rapid, and it is significantly quicker than the speed at which the instructions and data are acquired from the main memory. When the movement of instructions and data between the main memory and the processor is limited, which is achieved by utilising the cache, a programme will run faster.
Consider the following scenario: in a programme loop, a set of instructions is performed repeatedly over a short period. If these instructions are stored in the cache, they can be swiftly retrieved during periods of frequent use. The same can be said for data that is used frequently. Figure 2 depicts such a processor:
Figure 2: Processor with Multiple Functional Units (adder-subtractor, integer multiply, logic unit, shift unit, incrementer, floating-point add-subtract, floating-point multiply and floating-point divide, connected between the processor registers and memory)
The major benefit of parallel processing is that it improves system resource usage by increasing resource
multiplicity, which increases total system throughput.
Amdahl’s law is often used to predict the theoretical speedup when employing additional processors. For example, if a programme takes 20 hours to complete using a single thread, but only 19 hours (p = 0.95) of execution time can be parallelised, the minimum execution time cannot be less than one hour, regardless of how many threads are allocated to a parallelized execution of this programme. As a result, the potential speedup is capped at 20 times the single-thread performance:

1 / (1 − p) = 1 / (1 − 0.95) = 20
Figure 3 depicts Amdahl’s law:
Figure 3: Amdahl’s Law (speedup versus number of processors, for parallel portions of 50%, 75%, 90% and 95%)
The key goal is to have the results as soon as possible. In other words, the major objective is to have a quick turnaround time.
Amdahl’s law can be formulated in the following way:

S_latency(s) = 1 / ((1 − p) + p/s)
where:
S_latency is the theoretical speedup of the execution of the whole task;
s is the speedup of the part of the task that benefits from improved system resources;
p is the proportion of execution time that the part benefiting from improved resources originally occupied.
Furthermore,

S_latency(s) ≤ 1 / (1 − p)

lim (s → ∞) S_latency(s) = 1 / (1 − p)
This illustrates that the theoretical speedup of the entire job rises as the system’s resources improve, and that the theoretical speedup is always restricted by the part of the work that cannot benefit from the improvement, regardless of the degree of the improvement.
Amdahl’s law applies only in instances where the problem size is fixed. In practise, when more computing resources become available, they tend to be utilised on larger problems (larger datasets), and the time spent on the parallelizable component of the job frequently rises considerably faster than the time spent on the fundamentally serial task.
The following points defined three performance laws are as follows:
zz Amdahl’s law (1967) assumes a fixed workload or issue size.
T
zz Gustafson’s law (1987) is used to solve scalable issues, in which the problem size grows as the machine
size grows.
H
zz Sun and Ni’s (1993) speed-up model is for scaling issues with memory constraints.
IG
Amdahl’s Law for fixed workload, in many practical applications, have a set computing burden and
fixed issue size. The fixed workload is dispersed as the number of processors grows. Fixed-load speedup
is a type of speedup acquired for time-critical applications.
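The contrast between fixed-workload (Amdahl) and scaled-workload (Gustafson) speedup can be sketched as follows. The text names Gustafson's law but does not spell out its formula; the sketch assumes the standard scaled-speedup form S(n) = (1 − p) + p·n:

```python
def amdahl_speedup(p, s):
    """Fixed workload: the problem size stays the same."""
    return 1.0 / ((1.0 - p) + p / s)

def gustafson_speedup(p, n):
    """Scaled workload: the problem grows with the machine size."""
    return (1.0 - p) + p * n

# With p = 0.95 on 1,024 processors, the fixed workload saturates
# near 20x while the scaled workload keeps growing with n.
print(round(amdahl_speedup(0.95, 1024), 1))
print(round(gustafson_speedup(0.95, 1024), 2))
```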
11.5 Conclusion
zz The degree of parallelism (DOP) is the number of processors employed to run a programme at any given time.
zz A simple block is a set of instructions that has only one entry and exit.
zz Transactions per second (TPS) is a measure that is appropriate for online systems like those used to support ATMs.
zz KLIPS is the measure of the number of logical inferences per second that can be performed by a system.
zz A parallel processing system may process data in parallel, resulting in quicker execution times.
zz Fixed-load speedup is a type of speedup acquired for time-critical applications.
UNIT 11: Principles of Scalable Performance
11.6 Glossary
zz Transactions per second (TPS): It is a measure that is appropriate for online systems like those used to support ATMs.
zz KLIPS: It is the measure of the number of logical inferences per second that can be performed by a system.
zz Parallel processing: It is a system that may process data in parallel, resulting in quicker execution times.
zz Fixed-load speedup: It is a type of speedup acquired for time-critical applications.
11.7 Self-Assessment Questions
A. Multiple Choice Questions
1. Which among the following refers to the ability to accomplish more of something?
a. Scalability
b. Parallelism
c. Measures
d. Mean
2. A simple __________ is a set of instructions that has only one entry and exit.
a. Segment
b. Block
c. Profile
d. Measures
Y
5. Which among the following is used to indicate a broad class of simultaneous data-processing processes with the aim of improving the computing performance of a computer system?
a. Parallel processing
b. Processor
c. Redundancy
d. Harmonic mean
6. Gustafson's law, which is used to solve scalable problems in which the problem size grows as the machine size grows, was proposed in:
a. 1967
b. 1993
c. 1987
d. 1990
7. Which of the following can be defined as the ratio of execution time for the whole task without using the enhancement to the execution time for the whole task using the enhancement?
a. Speedup
b. Scalability
c. Parallelism
d. System utilisation
8. Choose the option that implies an unlimited number of processors.
a. Parallel processing
b. Metrics
c. Mean performance
d. Degree of Parallelism (DOP)
9. The sum of the inverses of the execution times is proportional to the __________ of a collection of execution rates.
a. Harmonic mean
b. Arithmetic mean
10. Which among the following is the execution time that the part benefiting from improved resources originally occupied?
a. Segments
b. Blocks
c. Proportion
d. Parallel
3. Multiple layers of complexity result in parallel processing. Outline the significance of parallel processing.
4. Describe the importance of speedup laws.
5. Transactions Per Second (TPS) is a measure that is appropriate for online systems like those used to support ATMs. Examine the significance of performance measures.
A. Answers to Multiple Choice Questions
Q. No. Answer
1. a. Scalability
2. b. Block
3. c. R (n) = O (n) / O (1)
4. d. TPS
5. a. Parallel processing
6. c. 1987
7. a. Speedup
9. b. Arithmetic mean
10. c. Proportion
include managing more data, responding to more user requests, or completing more tasks. Refer to Section Introduction
2. The degree of parallelism (DOP) is the number of processors employed to run a programme at any given time, and it can change over time.
3. The phrase “parallel processing” is used to indicate a broad class of simultaneous data-processing processes with the aim of improving the computing performance of a computer system. Refer to
zz https://www.cs.umd.edu/~meesh/411/CA-online/chapter/performance-metrics/index.html
zz https://www.sciencedirect.com/topics/computer-science/parallel-processing
11.10 Topics for Discussion Forums
zz Discuss with your friends and classmates the concept of scalable performance. Also, discuss the performance metrics and measures and parallel processing applications.
UNIT
12
Advanced Processor Technology
Names of Sub-Units
Introduction, Design Space of Processors, CISC Architecture, RISC Architecture, Superscalar Processors, VLIW Architecture, Overview of Vector and Symbolic Processors
Overview
This unit begins by discussing the concept of advanced processor technology. Next, the unit discusses the design space for processors and CISC architecture. The unit also covers the RISC architecture, superscalar processors and VLIW architecture. Towards the end, the unit provides an overview of vector and symbolic processors.
Learning Objectives

Learning Outcomes
aa Understand the significance of VLIW architecture
aa Analyse the overview of vector and symbolic processors
Pre-Unit Preparatory Material
aa https://link.springer.com/content/pdf/bfm%3A978-3-642-58589-0%2F1.pdf
12.1 Introduction
The architectural families of modern CPUs are listed here. CISC, RISC, superscalar, VLIW, super pipelined, vector, and symbolic processors are among the key processor families that will be discussed. Numerical computations are performed using scalar and vector computers. For AI applications, symbolic processors have been created.
Processor design is a branch of computer engineering and electronics engineering (fabrication) concerned with the design of a processor, which is a critical component of computer hardware.
The design process begins with the selection of an instruction set and an execution paradigm (e.g., VLIW or RISC), and ends with the creation of a microarchitecture that can be specified in VHDL or Verilog. This description is then used for microprocessor design via one of the many semiconductor device fabrication procedures, resulting in a die that is bonded to a chip carrier. This chip carrier is then attached to a printed circuit board (PCB) or put into a socket on one.
Any processor's method of operation is the execution of lists of instructions. Instructions include computing or manipulating data values using registers, updating or retrieving values in read/write memory, performing relational checks between data values, and regulating programme execution.
Before sending a CPU design to a foundry for semiconductor production, it is frequently tested and validated on one or more FPGAs.
The space of clock rate versus cycles per instruction is shown in Figure 1:
[Chart omitted: CISC, RISC and VP (vector processing) processor families plotted in the space of CPI versus clock speed (GHz).]
Figure 1: Clock Rate vs. Cycles per Instruction
Processor families can be mapped onto a coordinated space of clock rate versus cycles per instruction (CPI).
The clock rates of many processors have evolved from low to higher speeds toward the right of the design space as implementation technology advances rapidly (i.e., the clock rate increases), and processor makers have been utilising novel hardware techniques to reduce the CPI (the cycles taken to execute an instruction).
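The trade-off between clock rate and CPI can be illustrated with the basic performance equation, T = N × CPI / f, where N is the instruction count and f the clock rate (the numbers below are purely illustrative, not from the text):

```python
def cpu_time(instr_count, cpi, clock_hz):
    """Basic performance equation: execution time T = N * CPI / f."""
    return instr_count * cpi / clock_hz

# The same 1M-instruction program on a 3-CPI core at 1 GHz versus
# a 1-CPI core at 2 GHz: lowering CPI and raising the clock both
# reduce execution time.
n = 1_000_000
print(cpu_time(n, 3, 1e9))   # 0.003 s
print(cpu_time(n, 1, 2e9))   # 0.0005 s
```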
Two main categories of processors are:
zz Complex Instruction Set Computers (CISC)
zz Reduced Instruction Set Computers (RISC)
Products intended for multi-core processors, embedded applications, or cheap cost and/or low power consumption tend to have lower clock rates in both CISC and RISC categories. Processors with great performance must be designed to run at high clock rates. Vector processors have been labelled as VP; vector processing capabilities may be found in either CISC or RISC main processors.
CISC has an enormous number of instructions, which take a long time to perform. In CISC, numerous steps are performed to execute a particular set of instructions; a single instruction set may contain more than 300 discrete instructions. Many instructions are completed in two to ten machine cycles. The pipelining of instructions in CISC is not executed easily. Figure 2 depicts the basic architecture of CISC:
[Diagram omitted: microprogrammed control unit with control memory, an instruction and data path, cache, and main memory.]
Figure 2: Basic Architecture of CISC
S
The CISC machineries have good performances, based on the overview of program compilers; as the
range of advanced instructions are merely accessible just in a single instruction set. They design
E
multiple instructions in a single and simple set of instructions.
They accomplish processes are of low-level, also has a large no of addressing nodes and some extra
R
data types of a machine in the hardware simply. However, CICS is referred as less efficient than RISC,
for its ineffectiveness to abolish the codes of the cycles which tends to wasting. In CISC, a chip of a
T
zz
machines. Figure 3 depicts the basic architecture of RISC:
[Diagram omitted: hardwired control unit and data path, with a separate instruction cache and data cache backed by main memory.]
Figure 3: Basic Architecture of RISC
A Reduced Instruction Set Computer (RISC) is a microprocessor that executes a small number of instructions at once. These devices use fewer transistors because they are based on small commands, making transistor design and production less expensive. The following are some of the characteristics of RISC:
zz There is less need for decoding.
The following points describe the characteristics of RISC architecture:
zz Decoding is simple because the instructions are simple
zz The size of an instruction fits within one word
zz Instructions require only a single cycle to execute
zz It uses only a few parameters, and RISC processors cannot use call instructions.
zz In RISC, the operation speed is higher and the execution time is lower.
zz The performance of RISC processors is mostly determined by the programmer or compiler, as compiler knowledge is critical when converting CISC code to RISC code.
zz A code expansion, or changing the CISC code to a RISC code, will increase the size. The quality of this code expansion is determined by the compiler as well as the instruction set of the machine.
zz The RISC processor's first-level cache is also a disadvantage, because these processors have enormous memory caches on the chip itself. They need very rapid memory systems to feed the instructions.
Some of the problems of RISC which arise during execution are as follows:
zz More complicated register decoding system
A superscalar or vector architecture can improve a CISC or RISC scalar CPU. One instruction per cycle is executed by scalar processors: only one instruction is issued each cycle, and the pipeline is intended to complete only one instruction per cycle. In a superscalar processor, multiple instructions are issued per cycle and multiple results are generated per cycle. A vector processor runs vector instructions on data arrays; each vector instruction consists of a series of repeated operations, making them suitable for pipelining with one result every cycle.
zz Superscalar processors were created as a replacement for vector processors in order to take advantage of higher levels of instruction-level parallelism.
[Diagram omitted: three instruction pipelines issuing in parallel over base cycles 0 to 9.]
Figure 4: A Superscalar Processor of Degree m=3
The following points describe the degree of a superscalar processor according to the above figure:
zz A superscalar processor of degree m can only be fully used if m instructions can be executed in parallel.
zz This may not be the case for all clock cycles. In that situation, certain pipelines may be stuck in a holding pattern.
zz Simple operation delay should only take one cycle in a superscalar processor, as it does in a basic scalar processor.
zz The superscalar processor is more reliant on an optimising compiler to leverage parallelism.
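The ideal throughput advantage of a degree-m superscalar pipeline over a scalar one can be sketched as follows (an illustrative model only; a real machine falls short of it whenever fewer than m independent instructions are available in a cycle):

```python
def instructions_completed(cycles, degree=1, pipeline_fill=0):
    """Ideal throughput model: once full, a scalar pipeline (degree 1)
    retires one instruction per cycle; a superscalar pipeline of
    degree m retires up to m per cycle."""
    useful = max(0, cycles - pipeline_fill)
    return degree * useful

# Over 10 base cycles, a degree-3 superscalar can retire up to
# three times as many instructions as a scalar pipeline.
print(instructions_completed(10, degree=1))  # 10
print(instructions_completed(10, degree=3))  # 30
```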
Very-Long Instruction Word (VLIW) architectures take their name from the very long instruction words they execute. VLIW is a good option for taking advantage of instruction-level parallelism (ILP) in programmes, especially when running multiple basic (primitive) instructions at once.
These processors combine several functional units, retrieve a Very-Long Instruction Word (VLIW) containing several primitive instructions from the instruction cache, and dispatch the entire VLIW for parallel execution. Compilers take advantage of these capabilities by producing code with many primitive instructions that can be executed in parallel. Because they do not perform dynamic scheduling or reordering of operations, the processors have relatively basic control logic.
VLIW's major goal is to eliminate the convoluted instruction scheduling and parallel dispatch found in today's microprocessors. A VLIW processor should be faster and more efficient.
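The compiler-scheduled instruction bundles described above can be sketched as follows (a toy model with hypothetical operation names; a real VLIW encoding is a fixed-width binary word, not Python tuples):

```python
# Each "very long instruction word" is a fixed bundle of slots, one
# per functional unit; empty slots are no-ops. The schedule is fixed
# by the compiler, so the hardware simply issues every slot of the
# current word in the same cycle.
NOP = ("nop",)

program = [
    (("add", "r1", "r2", "r3"), ("load", "r4", "0x100"), ("branch_if", "r1")),
    (("mul", "r5", "r1", "r4"), NOP, NOP),  # only one slot could be filled
]

def issue(word):
    """Issue all non-NOP operations of one VLIW word 'in parallel'."""
    return [op[0] for op in word if op is not NOP]

for cycle, word in enumerate(program):
    print(cycle, issue(word))
```

Note how the second word wastes two slots: when the compiler cannot find enough independent operations, the fixed bundle width goes unused, which is one of the costs of the approach.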
VLIW Architecture
[Diagram omitted: several functional units connected to a shared instruction cache.]
Figure 5: Basic Architecture of VLIW
The numerous functional units share a common multi-ported register file for fetching the operands and storing the results, as illustrated in the diagram. The read/write crossbar facilitates parallel random access to the register file by the functional units. The load/store operation of data between a RAM and a register file occurs concurrently with the execution of activities in the functional units.
Figure 6 depicts a typical VLIW processor with degree m=3:
[Diagram omitted: main memory feeding a register file shared by load/store, floating-point add, integer ALU and branch units.]
The following points describe the advantages of VLIW architecture:
zz It has the ability to boost performance
The following points describe the disadvantages of VLIW architecture:
zz It can be difficult to utilise for a brand-new coder
zz The programme should keep track of the scheduling of instructions
instructions including memory address).
12.7.1 Vector Instructions
A vector instruction is a set or class of instructions that allows parallel processing of data sets. It is executed by a vector unit, which is similar to a conventional Single Instruction Multiple Data (SIMD) instruction unit. In a vector instruction, an array of floating-point or integer numbers is processed within a single operation. It also supports recursive implementation of vector operations that are not kept in a memory space.
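The contrast between a scalar loop and a single vector operation can be sketched as follows (plain Python standing in for hardware instructions; the point is one instruction per element versus one instruction per array):

```python
def scalar_add(a, b):
    """Scalar style: one 'add instruction' per element."""
    out = []
    for i in range(len(a)):
        out.append(a[i] + b[i])
    return out

def vector_add(a, b):
    """Vector style: one 'vector instruction' sweeps the whole array."""
    return [x + y for x, y in zip(a, b)]

print(vector_add([1.0, 2.0, 3.0], [10.0, 20.0, 30.0]))  # [11.0, 22.0, 33.0]
```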
12.7.2 Symbolic Processors
Symbolic processing is used in fields such as theorem proving, pattern recognition, expert systems and machine intelligence, because the data and knowledge representations, operations, memory, I/O and communication aspects in these applications differ from those in numerical computing.
Symbolic manipulators are also known as Lisp processors or Prolog processors.
The following points describe the characteristics of symbolic processing:
zz Knowledge representations: lists, relational databases, semantic nets, frames, production systems
one to the next. Parallelism is based on the concurrent execution of these functions. Lisp's applicative and recursive nature needs an environment that enables stack calculations and function calls quickly. The use of linked lists as the primary data structure allows for the implementation of an automated garbage collection system.
12.8 Conclusion
zz Processor design is a branch of computer engineering and electronics engineering concerned with the design of a processor.
zz The term CISC stands for “Complex Instruction Set Computer”. It is a CPU design plan based on particular commands, which are capable of performing multi-step operations.
zz The term RISC stands for “Reduced Instruction Set Computer”. It is a CPU design plan based on simple commands, and it performs instructions faster.
12.9 Glossary
zz Processor design: It is a branch of computer engineering and electronics engineering concerned with the design of a processor.
zz Complex Instruction Set Computer (CISC): It is a CPU design plan based on particular commands, which are capable of performing multi-step operations.
zz Reduced Instruction Set Computer (RISC): It is a CPU design plan based on simple commands that performs instructions faster.
zz Superscalar: It can improve a CISC or RISC scalar CPU.
zz Pipelining: It allows the processor to read a new instruction from memory prior to the end of the existing process.
zz Very-Long Instruction Word (VLIW): It is a good option for taking advantage of instruction-level parallelism (ILP) in programmes.
zz Vector processors: These are built specifically to do vector computations, and vector instructions
1. Which among the following is a branch of computer engineering and electronics engineering (fabrication) concerned with the design?
a. Processor b. Operation
c. Architecture d. CPU
2. The clock rates of many processors have evolved from __________ speeds toward the right of the design space.
a. High to low b. Low to high
6. Which among the following is a good option for taking advantage of instruction-level parallelism (ILP) in programmes?
a. RISC b. CISC
c. VLIW d. CPI
7. Which of these is correct in the context of CISC?
a. It has general-purpose registers
b. It has less data types
c. Decoding of simple instruction because of simple instruction
d. Instruction may take more than a single clock cycle to get executed.
8. CISC is referred to as less efficient than _________.
a. VLIW b. ILP
c. RISC d. CPI
10. ___________ allows the processor to read a new instruction from memory prior to the end of the existing process.
a. Operation b. Processor
c. CPU d. Pipelining
Q. No. Answer
1. a. Processor
2. b. Low to high
3. c. Complex Instruction Set Computer
4. d. Chip
5. a. Reduced Instruction Set Computer
6. c. VLIW
7. d. Instruction may take more than a single clock cycle to get executed.
8. c. RISC
9. a. Vector processors
10. d. Pipelining
B. Hints for Essay Type Questions
1. Processor design is a branch of computer engineering and electronics engineering (fabrication) concerned with the design of a processor, which is a critical component of computer hardware. Refer to Section Design Space of Processors
2. The term CISC stands for “Complex Instruction Set Computer”. It is a CPU design plan based on particular commands, which are capable of performing multi-step operations. Refer to Section Complex Instruction Set Computer (CISC) Architecture
3. The term RISC stands for “Reduced Instruction Set Computer”. It is a CPU design plan based on simple commands that performs instructions faster. Refer to Section Reduced Instruction Set Computer (RISC) Architecture
4. A superscalar or vector architecture can improve a CISC or RISC scalar CPU. One instruction per cycle is executed by scalar processors. Refer to Section Superscalar Processors
5. Very-Long Instruction Word (VLIW) architectures take their name from the very long instruction words they execute. VLIW is a good option for taking advantage of instruction-level parallelism (ILP) in programmes, especially when running multiple basic (primitive) instructions at once. Refer to Section Very-Long Instruction Word (VLIW) Architecture
zz https://www.geeksforgeeks.org/computer-organization-and-architecture-pipelining-set-1-execution-stages-and-throughput/
zz https://alldifferences.net/difference-between-risc-and-cisc/
zz Discuss the notion of improved processing technology and pipelining with your friends and classmates. Also, discuss the RISC, CISC, and VLIW architectures and where they are used.
UNIT
13
Memory Hierarchy Technology
Names of Sub-Units
Hierarchical Memory Technology, Virtual Memory Technology: Virtual Memory, TLB, Paging and Segmentation, Cache Memory Organization: Cache Addressing Modes, Direct Mapping and Associative Caches, Set-Associative, Cache Performance Issues.
Overview
The unit begins by discussing the concept of hierarchical memory technology. Next, the unit discusses the concept of virtual memory technology and virtual memory. This unit also discusses the concept of cache addressing modes, direct mapping and associative caches. Towards the end, the unit discusses the concept of cache performance issues.
Learning Objectives

Learning Outcomes
aa Evaluate the concept of cache performance issues

Pre-Unit Preparatory Material
aa http://eceweb.ucsd.edu/~gert/ece30/CN5.pdf
13.1 Introduction
A computer system's architecture employs a CPU as well as a large number of memory devices. However, the primary issue is that these components are costly. As a result, a memory hierarchy may be used to organise the system's memory. It has various memory tiers with varying performance rates, but all of them serve one precise objective: reducing access time. The memory hierarchy was created in response to the program's activity.
Memory hierarchy is a feature in computer system design that helps to arrange memory such that access time is reduced. The memory hierarchy was created using a programming technique called locality of references. Figure 1 depicts the multiple layers of memory hierarchy:
[Diagram omitted: memory hierarchy pyramid showing CPU registers (Level 0) and cache memory/SRAMs (Level 1); capacity and access time increase moving down the hierarchy, while cost per bit increases moving up.]
This memory hierarchy design is divided into two main categories, which are as follows:
zz External Memory or Secondary Memory: Refers to the storage devices which are accessible by the processor via an I/O module. Examples of secondary memory are magnetic disk, optical disk, magnetic tape, etc.
zz Internal Memory or Primary Memory: Refers to the memory that the CPU can access directly. Examples of internal memory are main memory, cache memory, CPU registers, etc.
zz Capacity: The capacity grows as we progress along the hierarchy from top to bottom.
zz Access time: It is the time between a read/write request and the data becoming available. The access time grows as we progress along the hierarchy from top to bottom.
zz Performance: At the time of designing the computer system, if we do not consider the memory hierarchy design, then the speed gap between the CPU registers and the main memory increases. Hence, the system's performance suffers, necessitating improvement. This improvement was produced in the form of memory hierarchy design, which improves the overall system performance.
zz Cost per bit: The cost per bit grows as we advance up the hierarchy; hence, internal memory is more expensive than external memory.
S
13.3 Virtual Memory Technology
Virtual memory is an important idea in computer architecture that allows you to execute huge, complex applications on a machine with a limited amount of RAM. Virtual memory allows a computer to manage the competing needs of several applications within a limited quantity of physical memory. A PC with insufficient memory may execute the same applications as one with plenty of RAM, but at a slower pace.
byte. Because the amount of memory on a computer varies, knowing which applications will run on it
can be difficult. Virtual memory overcomes this problem by treating each computer as if it had a lot
of RAM and treating each application as if it only runs on the PC. For each programme, the operating
R
system, such as Microsoft Windows or Apple’s OS X, generates a set of virtual addresses. The operating
system transforms virtual addresses to physical addresses and dynamically allocates RAM to programs
Y
13.3.2 Paging
O
Virtual memory divides programmes into pages of a predetermined size. The operating system loads
all of a program’s pages into RAM if the machine has enough physical memory. If not, the OS crams as
C
much as it can into the available space and executes the instructions on those pages. When the computer
finishes those pages, it puts the rest of the programme into RAM, perhaps overwriting previous pages.
Because the operating system handles these aspects automatically, the software developer may focus
on programme functionality rather than memory concerns.
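The page-based translation that makes this automatic handling possible can be sketched as follows (a minimal model; the 4 KB page size and the toy page-table contents are assumptions for illustration, not from the text):

```python
PAGE_SIZE = 4096  # bytes; fixed by the OS/hardware (assumed 4 KB here)

# Page table: virtual page number -> physical frame number.
page_table = {0: 5, 1: 2, 2: 7}

def translate(vaddr):
    """Split a virtual address into (page, offset) and map the page
    to its frame; a missing entry models a page fault."""
    page, offset = divmod(vaddr, PAGE_SIZE)
    if page not in page_table:
        raise LookupError("page fault: page %d not resident" % page)
    return page_table[page] * PAGE_SIZE + offset

print(translate(4100))  # page 1, offset 4 -> frame 2 -> 8196
```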
13.3.3 Multiprogramming
Virtual memory combined with paging allows a computer to execute many programmes at the
same time, practically independent of the amount of RAM available. This functionality, known as
it spends more time managing memory and less time doing productive work. A computer should have enough RAM to accommodate the demands of several programmes, reducing the amount of time it spends maintaining its pages.
V
13.3.5 Memory Protection
A computer without virtual memory may nevertheless execute many programmes at the same time; however, one programme may alter the data in another programme, either unintentionally or purposefully, if its addresses refer to the incorrect programme. Because a programme never “sees” its physical addresses, virtual memory precludes this scenario. The virtual memory manager safeguards data in one application from being tampered with by another.
E
13.3.6 Translation Lookaside Buffer (TLB) Mechanism
The system includes a special cache memory called a Translation Lookaside Buffer (TLB) used with the address translation hardware. The full page table is kept in the primary memory. When a page number is translated to a page frame, the map is read from the primary memory into the TLB. The TLB entry then contains the page number, the physical address of the page frame, and various protection bits. On subsequent references to the page, the map entry is read from the TLB rather than from primary memory.
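The TLB fast path can be sketched as a small cache sitting in front of the full page table (the toy sizes and the simple FIFO eviction policy are assumptions for illustration, not from the text):

```python
# Toy full page table, kept in "primary memory": page -> frame.
page_table = {n: n + 100 for n in range(1024)}

tlb = {}          # small cache of recent page -> frame translations
TLB_CAPACITY = 4

def lookup(page):
    """Return (frame, outcome): consult the TLB first, and only on a
    miss read the full page table and copy the entry into the TLB."""
    if page in tlb:
        return tlb[page], "TLB hit"
    frame = page_table[page]        # slow path: primary memory access
    if len(tlb) >= TLB_CAPACITY:
        tlb.pop(next(iter(tlb)))    # evict the oldest entry (FIFO)
    tlb[page] = frame
    return frame, "TLB miss"

print(lookup(7))   # first reference: miss, entry loaded into the TLB
print(lookup(7))   # repeat reference: served from the TLB
```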
R
13.3.7 Segmentation
Segmentation allows the programmer to interpret memory as consisting of multiple address spaces or segments. Segments of the memory may be of unequal dynamic size, and each has a number and a length. Segmentation simplifies the handling of growing data structures: with segmentation, a data structure can be allocated its own segment, and the OS will increase or decrease the segment as and when required.
The segmentation system uses a logical limit register for each segment, so references intended to be within a segment cannot accidentally reference information stored in a different segment. Segmented virtual memory systems also employ protection mechanisms in the address translation to prevent unauthorized access to segments. Segment-based virtual memory systems provide a means for processes to share some segments while keeping access to others private.
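The base-and-limit check described above can be sketched as follows (the segment names and base/limit values are hypothetical, chosen only to illustrate the mechanism):

```python
# Each segment has a base address and a limit; a reference is valid
# only if its offset falls within the limit, which is how one segment
# cannot accidentally touch another segment's memory.
segments = {
    "code": {"base": 0x0000, "limit": 0x4000},
    "data": {"base": 0x8000, "limit": 0x2000},
}

def seg_translate(name, offset):
    """Translate (segment, offset) to a physical address, enforcing
    the segment's limit register."""
    seg = segments[name]
    if not 0 <= offset < seg["limit"]:
        raise MemoryError("segmentation fault: offset out of bounds")
    return seg["base"] + offset

print(hex(seg_translate("data", 0x10)))  # 0x8010
```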
amount of locality in terms of time and space, and this information can be used in different manners
for improving the performance of the system. There are mainly two types of reference locality, namely,
temporal locality and spatial locality.
Temporal locality signifies the reuse of particular data or resources within a very short span of time. On
the other hand, spatial locality signifies the usage of data elements in relatively close storage locations.
The spatial locality can be referred to as sequential locality when the data elements are sequenced and
retrieved linearly, similar to traversing elements in case of a one-dimensional array.
Cache memory is a type of memory that operates at a very fast speed. It is used to boost performance and synchronise with a high-speed CPU. Cache memory is more expensive than main memory and disc memory, but it is less expensive than CPU registers. Cache memory is a form of memory that works as a buffer between RAM and the CPU and is highly fast. It stores frequently requested data and instructions so that the CPU may access them quickly when needed.
The average time taken to retrieve data from the main memory is reduced when cache memory is used. The cache memory is a smaller, faster memory that holds replicas of information from recently requested addresses of the main memory.
Figure 2 shows the cache memory organisation:
[Diagram omitted: CPU connected to cache memory, which sits in front of primary memory and secondary memory.]
Figure 2: Cache Memory Organisation
zz Level 1 cache: Data is stored and loaded in the CPU instantly. The L1 cache is usually the smallest and is incorporated into the CPU chip. An individual L1 cache is accessible for every core in multi-core CPUs.
zz Level 2 or cache memory: It is usually built into the CPU, but it may also be a standalone chip placed between the CPU and the RAM. L2 cache memory is a sort of secondary cache memory. The L2 cache has a larger capacity than the L1 cache. It is placed on a processor in a computer. The CPU first looks for instructions in the L1 cache, then moves on to the L2 cache if the necessary data or instructions are not found in the L1 cache. The cache is connected to the processor through a high-speed system bus.
zz Level 3 or main memory: In comparison to the L1 and L2 caches, the L3 cache is slower, but it is bigger. Each core in a multi-core processor may have its own L1 and L2 caches, but all cores share a common L3 cache. The L3 cache is twice as fast as the RAM. Main memory is the memory on which the computer is currently operating. It is small in size, and data is lost when the power supply is turned off.
zz Level 4 or secondary memory: L4 cache is a type of external memory that is not fast in comparison to main memory and is now uncommon. Data is retained permanently in secondary memory.
Direct Mapping
Direct mapping is the most basic method, which maps each block of main memory into only one possible cache line. Direct mapping assigns each memory block to a specified line in the cache. If a line was previously occupied by a memory block and a new block has to be loaded, the old block is destroyed. The address space has two elements: an index field and a tag field. The tag field is saved in the cache, while the remainder is kept in main memory. The performance of direct mapping is directly related to the hit ratio.

i = j modulo m

where,
i = cache line number
j = main memory block number
m = number of lines in the cache

For the purposes of cache access, each main memory address may be thought of as having three fields. Within a block of main memory, the least significant w bits indicate a unique word or byte; in most modern computers, the address is at the byte level. The remaining s bits designate one of the main memory's 2^s blocks. The cache logic interprets these s bits as a tag of s − r bits (the most significant portion) and an r-bit line field. This last field indicates one of the cache's m = 2^r lines. Figure 3 shows the direct mapping:
IG
Figure 3: Direct Mapping
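The tag/line/word address split described above can be sketched in a few lines of Python. This is an illustrative example only; the field widths w = 2, r = 3 and s = 10 are hypothetical values, not taken from the text:

```python
def split_address(addr: int, w: int, r: int, s: int):
    """Split a main-memory address into (tag, line, word) fields for a
    direct-mapped cache: w word bits, an r-bit line field and a tag of
    s - r bits, where s + w is the full address width."""
    word = addr & ((1 << w) - 1)                      # least significant w bits
    line = (addr >> w) & ((1 << r) - 1)               # next r bits: line i = j mod 2^r
    tag = (addr >> (w + r)) & ((1 << (s - r)) - 1)    # most significant s - r bits
    return tag, line, word

# Example: 4-byte blocks (w=2), an 8-line cache (r=3), s=10 block bits.
tag, line, word = split_address(0b1101_1011_0110, w=2, r=3, s=10)
```

Because the line field is taken directly from the block number, two blocks whose numbers differ by a multiple of 2^r always compete for the same cache line, which is exactly the thrashing concern addressed later by set-associative mapping.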
Associative Mapping
In this type of mapping, the information and locations of the memory block are stored with the help of associative memory. Any block can be placed in any cache line. The word id bits are used to determine which word in the block is required, while the tag consists of all the remaining address bits. This allows any block to be placed anywhere in the cache memory. It is said to be the quickest and most flexible mapping method.
Figure 4 shows the associative mapping:
Figure 4: Associative Mapping
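The fully associative lookup described above can be sketched as a search over every cache line's tag. This is a minimal software illustration (in real hardware all tag comparisons happen in parallel); the cache contents below are hypothetical:

```python
def associative_lookup(cache, tag):
    """Fully associative cache: a block may sit in any line, so the
    requested tag must be compared against every stored tag."""
    for line, (stored_tag, data) in enumerate(cache):
        if stored_tag == tag:
            return line, data   # hit: return the matching line and its block
    return None                 # miss: the tag is not resident anywhere

# Three resident blocks with illustrative tags:
cache = [(0x1A, "blk-A"), (0x3F, "blk-B"), (0x07, "blk-C")]
```

For example, `associative_lookup(cache, 0x3F)` finds the block in line 1, while a tag that is not resident returns `None`.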
Set-Associative
This type of mapping is an improved version of direct mapping that eliminates its disadvantages. The concern of potential thrashing in the direct mapping approach is addressed by set-associative mapping: rather than having exactly one line in the cache to which a block can map, a few lines are combined together to form a set, and a memory block can then correspond to any one of the lines of a set. Thanks to set-associative mapping, each index address in the cache can hold two or more words from main memory at the same time. Set-associative cache mapping thus combines the benefits of both the direct and associative mapping techniques.
In this case, the cache consists of a number of sets, each of which consists of a number of lines. The relationships are:
m = v * k
i = j mod v
where,
i = cache set number
j = main memory block number
v = number of sets
m = number of lines in the cache
k = number of lines in each set
Figure 5: Set-Associative Mapping
zz Primary cache: The CPU chip always has a primary cache. This cache is small in size, and its access latency is comparable to that of the CPU registers.
zz Secondary cache: This cache is located between the primary cache and the rest of the system's memory. L2 cache is another name for this cache. This cache is located on the CPU chip as well.
Cache memory performance is measured by a quantity known as the hit ratio. The formula to calculate the hit ratio is as follows:
Hit ratio = Hits / (Hits + Misses) = No. of hits / Total accesses
When there are a lot of cache misses, performance issues arise. The ratio of cache misses to cache hits is the best indicator for measuring the performance of the cache.
13.5 Conclusion
zz Memory hierarchy is a feature in computer system design that helps to arrange memory such that access time is reduced.
zz Virtual memory is an important idea in computer architecture that allows you to execute huge, complex applications on a machine with a limited amount of RAM.
zz A computer's RAM is accessed via an address system, which is simply a set of integers that identify each byte.
zz Virtual memory divides programmes into pages of a predetermined size.
zz Virtual memory combined with paging allows a computer to execute many programmes at the same time, practically independent of the amount of RAM available.
zz Cache memory is a type of memory that operates at a very fast speed. It's used to boost performance and synchronise with a high-speed CPU. Cache memory is more expensive than main memory and disc memory, but it is less expensive than CPU registers.
zz Level 1 cache is a type of memory which holds and receives information and data that's then loaded in the CPU instantly.
zz L4 cache is a type of external memory that is not fast in comparison to main memory and is now uncommon.
zz Direct mapping is the most basic method, which maps each block of main memory into only one potential cache line.
13.6 Glossary
zz Memory hierarchy: A feature in computer system design that helps to arrange memory such that access time is reduced
zz Locality of references: A programming technique that is used for creating a memory hierarchy
zz Access time: The time between a read/write request and the data becoming available
zz Virtual memory: It allows a computer to manage the competing needs of several applications within a limited quantity of physical memory
zz Cache memory: A type of memory that operates at a very fast speed; it is used to boost performance and synchronise with a high-speed CPU
13.7 Self-Assessment Questions
2. Which of the following programming techniques is used for creating a memory hierarchy?
a. Layers b. Locality of references
c. Virtual addresses d. Structural
3. Which of the following refers to the total amount of data that the memory can hold?
a. Access time b. Performance
c. Throughput d. Capacity
4. ____________ allows a computer to manage the competing needs of several applications within a limited quantity of physical memory.
a. Virtual memory b. Virtual address
c. Physical address d. Cache memory
5. ___________ is a type of memory that operates at a very fast speed. It's used to boost performance and synchronise with a high-speed CPU.
a. Virtual memory b. Virtual address
c. Physical address d. Cache memory
6. Which of the following is a type of memory which holds and receives information and data that's then loaded in the CPU instantly?
c. Four d. Five
8. Which of the following cache addressing modes maps each block of main memory into only one potential cache line?
10. ____________ is a feature in computer system design that helps to arrange memory such that access time is reduced.
a. Memory hierarchy b. Cache memory
c. Virtual memory d. Primary memory
3. Explain the concept of virtual memory. Also, discuss the physical and virtual address.
4. What do you understand by cache memory? Also, explain the different levels of cache memory.
5. Define the direct mapping cache addressing mode.
13.8 Answers and Hints for Self-Assessment Questions
A. Answers to Multiple Choice Questions
Q. No. Answer
1. a. Memory hierarchy
2. b. Locality of references
3. d. Capacity
4. a. Virtual memory
5. d. Cache memory
6. c. Level 1 cache
7. b. Three
8. c. Direct mapping
9. b. Level 4 cache
10. a. Memory hierarchy
B. Hints for Essay Type Questions
1. Memory hierarchy is a feature in computer system design that helps to arrange memory such that access time is reduced. The memory hierarchy was created using a programming technique called locality of references. Refer to Section Memory Hierarchy Technology
2. Capacity: It refers to the total amount of data that the memory can hold. The capacity grows as we progress along the hierarchy from top to bottom. Refer to Section Memory Hierarchy Technology
3. Virtual memory is an important idea in computer architecture that allows you to execute huge,
complex applications on a machine with a limited amount of RAM. Virtual memory allows a
computer to manage the competing needs of several applications within a limited quantity of
physical memory. Refer to Section Virtual Memory Technology
4. Cache Memory is a type of memory that operates at a very fast speed. It’s used to boost performance
and synchronise with a high-speed CPU. Cache memory is more expensive than main memory and
disc memory, but it is less expensive than CPU registers. Refer to Section Cache Memory Organisation
5. Direct mapping is the most basic method, which maps each block of main memory into only one
potential cache line. Assign each memory block to a specified line in the cache via direct mapping.
Refer to Section Cache Memory Organisation
zz https://www.msuniv.ac.in/Download/Pdf/19055a11803e457
zz https://www.gatevidyalay.com/cache-mapping-cache-mapping-techniques/
13.10 Topics for Discussion Forums
zz Discuss with your classmates the concept of cache memory. Also, discuss the different cache addressing modes.
UNIT 14
SIMD Architecture
Names of Sub-Units
Parallel Processing, Classification of Parallel Processing, Fine-Grained SIMD Architecture, Coarse-Grained SIMD Architecture.
Overview
This unit begins by discussing the concept of SIMD architecture and parallel processing. Next, the unit discusses the classification of parallel processing. Further, the unit explains the fine-grained SIMD architecture. Towards the end, the unit discusses the coarse-grained SIMD architecture.
Learning Objectives
Learning Outcomes
aa Explore the coarse-grained SIMD architecture
Pre-Unit Preparatory Material
aa http://cs.ucf.edu/~ahmadian/pubs/SIMD.pdf
14.1 Introduction
SIMD is an abbreviation that stands for single-instruction, multiple-data streams. The SIMD parallel computing paradigm consists of two parts: a von Neumann-style front-end computer and a processor array.
The processor array is a group of synchronised processing units that may conduct the same operation on many data streams at the same time. While being processed in parallel, the dispersed data is stored in a small piece of local memory on each processor in the array.
The processor array is coupled to the front end's memory bus, allowing the front end to address the local processor memories at random as if they were another memory. Figure 1 depicts the SIMD architecture model:
Figure 1: SIMD Architecture Model (a von Neumann front-end computer connected to an array of virtual processors)
SIMD architecture employs simultaneous operations across large amounts of data to take advantage of parallelism. This paradigm works well when dealing with problems that require a significant amount of data to be processed at once. It is also highly dynamic.
In SIMD machines, there are two major configurations. In the first scheme, each CPU has its own local memory. The interconnection network allows processors to communicate with one another. If the interconnection network does not allow for a direct link between two groups of processors, the information can be exchanged through an intermediary processor.
In the second SIMD architecture, processors and memory modules communicate with each other via the interconnection network. Two processors can communicate with one another via intermediate memory modules or, in some cases, intermediary processor(s). The BSP (Burroughs' Scientific Processor) used the second SIMD scheme.
14.2 Parallel Processing
Parallel processing is a set of techniques that allows a computer system to perform multiple data-processing tasks at the same time in order to boost the system's computational speed.
A parallel processing system may process several pieces of data at the same time, resulting in a quicker execution time. For example, the next instruction can be read from memory while an instruction is being processed in the ALU component of the CPU.
The primary goal of parallel processing is to improve the computer's processing capability and throughput, or the amount of processing that can be done in a given amount of time.
R
A parallel processing system can be achieved by having a multiplicity of functional units that perform
identical or different operations simultaneously. The data can be distributed among various multiple
T
functional units. One method of dividing the execution unit into eight parallel functional units, the
operation done in each functional unit is specified in each block, as shown in Figure 2:
Figure 2: Processor with Multiple Parallel Functional Units (adder-subtractor, integer multiplier, logic unit, shift unit, incrementer, floating-point multiply and floating-point divide, all connected to the processor and memory)
The integer multiplier and adder are used to execute arithmetic operations on integer numbers. The
floating-point operations are divided into three circuits that work in tandem. On distinct data, the logic,
shift, and increment operations can all be run at the same time. Because all units are independent of
one another, one number can be moved while another is increased.
14.3 Classification of Parallel Processing
When designing a programme or concurrent system, several system and memory architecture styles must be considered. This is critical because one system and memory style may be ideal for one task but error-prone for another.
In 1972, Michael Flynn proposed a classification system for distinct types of computer system architecture. The following are the four different styles defined by this taxonomy:
zz SISD (Single Instruction, Single Data)
zz SIMD (Single Instruction, Multiple Data)
zz MISD (Multiple Instruction, Single Data)
zz MIMD (Multiple Instruction, Multiple Data)
14.3.1 SISD Architecture
SISD architecture is the structure of a single computer, which includes a control unit, a processor unit and a memory unit. Like classic von Neumann computers, most conventional computers have SISD architecture. Multiple functional units or pipeline processing can be used to achieve parallel processing in this instance. The SISD architecture model is shown in Figure 3:
Figure 3: SISD Architecture Model
14.3.2 SIMD Architecture
The acronym SIMD stands for 'Single Instruction, Multiple Data Stream'. It symbolises an organization with a large number of processing units overseen by a central control unit. The control unit sends the same instruction to all processors, but they work on separate data. The SIMD architecture model is shown in Figure 4:
Figure 4: SIMD Architecture Model (a control unit driving processing elements PE 1 to PE n, each with its own data bus and memory module)
zz By increasing the number of processing cores, the system's throughput can be boosted.
14.3.3 MISD Architecture
MISD stands for 'Multiple Instruction, Single Data Stream'. In this architecture, multiple processing units work on a single data stream. The MISD architecture model is shown in Figure 5:
Figure 5: MISD Architecture Model
14.3.4 MIMD Architecture
MIMD (Multiple Instruction, Multiple Data) refers to a parallel architecture, which is the most fundamental and well-known type of parallel processor. The main goal of MIMD is to achieve parallelism.
The MIMD architecture consists of a group of N tightly connected processors. Each processor has memory that may be common to all processors but cannot be accessed directly by the other processors.
The processors of the MIMD architecture work independently and asynchronously. Various processors may be performing various instructions on various pieces of data at any given time. MIMD is further classified into shared-memory MIMD and distributed-memory MIMD architectures. The MIMD architecture model is shown in Figure 6:
Figure 6: MIMD Architecture Model
Disadvantages of the MIMD architecture include:
zz Waste of bandwidth
14.4 Fine-Grained SIMD Architecture
In fine-grained parallelism, a programme is broken down into a large number of tiny tasks. These tasks are allocated to many processors independently. The amount of work involved in each parallel task is small, and the work is evenly divided among the processors. As a result, fine-grained parallelism makes load balancing easier. As the amount of data processed by each task decreases, the number of processors required to complete the processing increases. The degree of parallelism is high, but this in turn increases the overhead of communication and synchronization. Fine-grained parallelism is therefore best utilised in architectures that enable rapid communication; a shared memory architecture with low communication overhead suits it best.
It is difficult for programmers to detect parallelism in a program; therefore, it is usually the compilers' responsibility to detect fine-grained parallelism. An example of a fine-grained system (from outside the parallel computing domain) is the system of neurons in our brain.
14.5 Coarse-Grained SIMD Architecture
In coarse-grained parallelism, a programme is broken into a small number of large tasks, so a large amount of computation takes place in each processor. This can cause load imbalance, with some processors processing the majority of the data while others remain idle. Furthermore, coarse-grained parallelism fails to harness the parallelism in the programme because the majority of the computation is executed sequentially on a machine. This type of parallelism has the benefit of low communication and synchronisation costs. Medium-grained parallelism lies in between: its task size and communication time are greater than in fine-grained parallelism but smaller than in coarse-grained parallelism. This is where most general-purpose parallel computers belong.
As an illustration, assume there are 100 processors tasked with analysing a 10*10 image. In fine-grained parallelism, the 100 processors can process the 10*10 image in one clock cycle, ignoring the communication overhead: each processor works on a single pixel of the image and then communicates its result to the others.
14.6 Conclusion
zz Parallel processing is a set of techniques that allows a computer system to perform multiple data-
processing tasks at the same time.
zz SISD architecture is the structure of a single computer, which includes a control unit, a processor
unit, and a memory unit.
zz SIMD symbolises an organization with a large number of processing units overseen by a central
control unit.
zz Multiple processing units work on a single data stream in MISD.
zz MIMD (Multiple Instruction, Multiple Data) refers to a parallel architecture, which is the most fundamental and well-known type of parallel processor.
14.7 Glossary
zz Parallel processing: It is a set of techniques that allows a computer system to perform multiple data-processing tasks at the same time.
zz SISD architecture: It depicts the structure of a single computer, which includes a control unit, a processor unit, and a memory unit.
zz SIMD architecture: It symbolises an organization with a large number of processing units overseen by a central control unit.
zz MISD architecture: The multiple processing units work on a single data stream in MISD.
zz MIMD architecture: It refers to a parallel architecture, which is the most fundamental and well-known type of parallel processor.
14.8 Self-Assessment Questions
3. The primary goal of parallel processing is to improve which of the following aspects of the computer's processing?
a. Responsibility b. Functionality
c. Capability d. Flexibility
4. Which of these functional units are used to execute arithmetic operations on integer numbers?
a. Shift unit b. Logic unit
c. Adder d. Integer multiplier
7. Which of the following units sends the same instruction to all the processors, but they work on separate data?
a. Processing unit b. Control unit
c. Functional unit d. Memory unit
8. Which of these is a disadvantage of MIMD architecture?
a. Less contention b. High scalability
c. Less power d. Load balancing
9. Which type of processing can be used to achieve parallel processing in SISD architecture?
a. Memory b. Pipeline
c. Interconnection d. Synchronisation
10. Which among the following has the benefit of low communication and synchronisation costs?
a. Parallel processing b. Fine-grained SIMD
c. Coarse-grained SIMD d. MIMD
3. Explain the concept of the MISD architecture.
4. Describe the concept of fine-grained SIMD architecture.
5. Coarse-grained type of parallelism has the benefit of low communication and synchronisation costs. Discuss.
14.9 Answers and Hints for Self-Assessment Questions
A. Answers to Multiple Choice Questions
Q. No. Answer
1. a. Single Instruction, Multiple Data
2. b. Parallel processing
3. c. Capability
4. d. Integer multiplier
5. a. To achieve parallelism
6. c. Single Instruction, Single Data
7. b. Control unit
8. d. Load balancing
9. b. Pipeline
10. c. Coarse-grained SIMD
B. Hints for Essay Type Questions
1. SIMD is an abbreviation that stands for single-instruction, multiple-data streams. The SIMD parallel computing paradigm consists of two parts: a von Neumann-style front-end computer and a processor array. Refer to Section SIMD Architecture
2. Parallel processing is a set of techniques that allows a computer system to perform multiple data-processing tasks at the same time in order to boost the system's computational speed. Refer to Section Parallel Processing
3. MISD is an acronym that stands for 'Multiple Instruction and Single Data Stream'. Refer to Section Classification of Parallel Processing
4. In fine-grained parallelism, a programme is broken down into a large number of tiny tasks, which are allocated to many processors independently. Refer to Section Fine-Grained SIMD Architecture
zz https://www.geeksforgeeks.org/computer-organization-and-architecture-pipelining-set-1-execution-stages-and-throughput/
zz https://www.geeksforgeeks.org/difference-between-simd-and-mimd/
zz Discuss with your friends and classmates the concept of SIMD architecture. Also, try to find some real-world examples of SIMD architecture.
UNIT 15
Storage Systems
Names of Sub-Units
Introduction, Types of Storage Devices, Connecting I/O Devices to CPU/Memory, RAID, I/O Performance Measures.
Overview
This unit begins by discussing the concept of storage systems. Next, the unit describes the types of storage devices and the process of connecting I/O devices to CPU/memory. Further, the unit explains RAID. Towards the end, the unit covers the I/O performance measures.
Learning Objectives
Learning Outcomes
aa Evaluate the importance of connecting I/O devices to CPU/Memory
aa Explore the I/O performance measures
Pre-Unit Preparatory Material
aa https://www.maths.tcd.ie/~nora/DT315-1/Storage%20Devices.pdf
15.1 Introduction
Many computer components are utilised to store data in computer storage. Primary storage, secondary storage and tertiary storage are the three types of storage usually used. In computers, a storage device is used to store data.
Input and output data are the two types of digital information. The input data is provided by the users, while output data is produced by the computer. A computer's CPU, on the other hand, cannot compute or produce output data without the user's input.
Users can enter the input data into a computer directly. However, it was discovered early in the computer era that manually entering data is time- and energy-consuming. Computer memory, commonly known as Random Access Memory (RAM), is one short-term option; however, its memory retention and storage capacity are restricted. The data in Read Only Memory (ROM) can only be read and not modified.
A storage unit is a computer system component that stores the data and instructions to be processed. A storage device is a part of computer hardware that stores data and information so that the results of computation can be processed. Without a storage device, a computer would not be able to run or even load. A storage device, in other words, is a piece of hardware that stores, transfers, or extracts data files. It can also store data and information both temporarily and permanently.
15.2 Types of Storage Devices
Figure 1: Types of Storage (primary storage and memory, secondary storage and off-line storage such as hard disk and CD-RW, and tertiary storage)
15.2.1 Primary Storage
Primary memory is the main memory in a computer system where data is stored for quick access by the CPU. This type of memory stores the data temporarily. The CPU is associated with the following two types of primary memories:
zz Read-Only Memory (ROM): ROM is a built-in computer memory containing data that normally can only be read, not changed. ROM contains the start-up instructions for the computer. The data stored in ROM is not lost when the computer power is turned off. ROM is sustained by a small long-life battery in your computer. The ROM chip is shown in Figure 2:
Figure 2: ROM Chip
zz Random Access Memory (RAM): RAM is an integrated circuit that enables you to access stored data in random order. RAM stores instructions from the operating system, application programs and data to be processed so that they can be quickly accessed by the computer's processor. RAM is much faster to read from and write to than other kinds of storage, such as hard disk, floppy disk and CD-ROM in your computer. However, data stays in RAM only as long as your computer is turned on. When you turn the computer off, RAM loses its data. Figure 3 shows a RAM chip:
Figure 3: RAM Chip
15.2.2 Secondary Storage
As discussed earlier, primary memory stores the data temporarily. To store the data permanently, you need to use secondary memory. The secondary memory is also known as secondary storage. The storage capacity of secondary memory devices is measured in terms of kilobytes (KB), megabytes (MB), gigabytes (GB) and terabytes (TB). The different types of secondary storage devices are as follows:
zz Floppy disk: A floppy disk is the oldest type of secondary storage device that is used to transfer data between computers as well as store data and information. A floppy disk, made up of a flexible substance called Mylar, consists of a magnetic surface that allows data storage. Its structure is divided into tracks and sectors. A track of a floppy disk consists of concentric circles, which are further divided into smaller sections called sectors. The data is stored in these sectors. The maximum storage capacity of a floppy disk is 1.44 MB. Figure 4 shows floppy disks and floppy tracks:
zz Hard disk: The hard disk in your system is known as the data centre of the PC. It is used to store all
your programs and data. The hard disk is the most important storage type among various types of
secondary storage devices used in a PC, such as CD, DVD and Pen Drive. The hard disk differs from
other storage devices on three counts: size, speed and performance. A hard disk stores information
on one or more circular platters, which are continually spinning disks. These platters are coated with
a magnetic material and are stacked on top of each other with some space between each platter.
While the platter is spinning, information is recorded on the surface with the help of magnetic heads
as magnetic spots. The information is recorded in bands; each band of information is called a track.
The tracks are, in turn, divided into pie-shaped sections known as sectors. A hard disk is shown in
Figure 5:
Figure 5: Hard Disk
The common elements of a hard disk are described as follows:
Platters: Platters are the actual disks inside the drive that store data. Most drives have at least two platters; the larger the storage capacity of the drive, the more platters it contains. Each platter can store data on each side, so a drive with 2 platters has 4 sides to store data.
Spindle and spindle motor: Platters in a drive are separated by disk spacers and are clamped to a rotating spindle that turns all the platters together. The spindle motor is built right into the spindle or mounted directly below it and spins the platters at a constant set rate, ranging from 3,600 to 7,200 rpm (rotations per minute).
Read/Write heads: Read/write heads read and write data to the platters. There is typically one head per platter side, and each head is attached to a single actuator shaft so that all the heads move in unison. When one head is over a track, all the other heads are at the same location over their respective surfaces. Typically, only one of the heads is active at a time, i.e., reading or writing data.
Head actuator: All the heads are attached to a single head actuator, or actuator arm, that moves the heads around the platters.
zz Compact disc: A compact disc, also known as CD, is an optical medium that is used to store digital data. Compact discs are cheaper than other storage devices, such as hard disks or RAM. The
compact disc was developed to store and play-back sound recordings. However, later on, it came to
be used as a data storage mechanism. CDs are categorised into the following types:
CD-ROM (Compact Disc-Read Only Memory): A CD-ROM is an optical disc that is primarily used
to store data in the form of text, images, audios and videos. The data available on such discs can
only be read by using a drive, known as a CD-ROM drive. The maximum storage capacity of a
CD-ROM disc is 700 MB of data.
CD-R (Compact Disc-Recordable): A CD-R has the ability to create CDs, but it can write data on
the discs only once. The data once stored in these discs cannot be erased. The CD-R technology is
sometimes called the Write Once-Read Many (WORM) technology.
CD-RW (Compact Disc-Rewritable): CD-RW (sometimes called Compact Disc-Erasable) is used
to write data multiple times on a disc. CD-RW discs are good for data backup, data archiving, or
data distribution on CDs. Figure 6 depicts different types of CDs:
Figure 6: Different Types of CDs
zz Pen/Thumb Drive–Flash Memory: A pen drive is a data storage device. It is also known as a Universal Serial Bus (USB) flash drive, which is typically a small, lightweight, removable and rewritable device. It is more compact and faster than other external storage mediums and stores more data. Flash memory neither has parts that can be shifted, as in the case of magnetic storage devices, nor does it use lasers like optical drives. On the other hand, the functioning of flash memory is similar to that of RAM, with the only difference that, in the case of power failure, the data stored in a flash memory is not destroyed. USB flash drives mainly come in two variants, that is, USB 2.0 and USB 3.0. A USB 3.0 flash drive works faster than USB 2.0. Figure 7 shows USB flash drives:
Figure 7: USB Flash Drives
15.2.3 Tertiary Storage
Tertiary storage is also known as tertiary memory, which is a subset of secondary storage. The main aim of tertiary storage is to offer a large amount of data at a low cost. Numerous types of tertiary storage devices are available to be used at the Hierarchical Storage Systems (HSS) level. Usually, it includes a robotic mechanism that will mount (insert) and dismount removable mass storage media into a storage device according to the system's demands; such data is frequently copied to secondary storage before use.
In a computer system, tertiary storage is a part of external storage, which has the slowest speed. However, it is capable of storing a huge amount of data, and it is also considered offline storage. This storage is mostly used for the backup of data. Some of the tertiary storage devices are as follows:
zz Optical storage: It can store data in megabytes or gigabytes. A Compact Disk (CD) with a playtime of about 80 mins can store around 700 megabytes of data. On the other hand, 8.5 gigabytes of data is stored by a Digital Video Disk (DVD) on each side of the disk.
zz Tape storage: Tape storage is one of the cheapest storage media, cheaper than disks. Tapes are generally used for archiving or for the backup of data. A tape accesses data consecutively from the beginning, so it delivers slow access to the data. Hence, tape storage is also referred to as consecutive-access or sequential-access storage. It is also considered direct-access storage because we can directly access
sequential-access storage. It is also considered direct-access storage because we can directly access
H
15.2.4 Offline Storage
Offline storage is also referred to as a non-volatile storage medium. A USB thumb drive is a good example of offline storage. In the event of unforeseeable occurrences, such as hardware failure due to a power outage or files infected by computer viruses, offline storage is utilised for transfer and backup protection. Types of offline storage are depicted in Figure 9:
Figure 9: Types of Offline Storage (CD-RW, USB thumb drive and tape drive, shown alongside the CPU, RAM, hard drive and secondary storage)
UNIT 15: Storage Systems | JGI JAIN DEEMED-TO-BE UNIVERSITY | Computer Organization and Architecture
Three different methods of storage are depicted in Figure 9. Offline storage is shown as a subset of secondary storage because both serve the same purpose and do not interact with the CPU directly. The main distinctions are that offline storage is used to physically carry information, has a lower capacity, and cannot be accessed without human intervention.
Connecting I/O Devices to CPU/Memory
Figure 10 shows the connections between I/O devices and the CPU and memory:
[Figure 10 shows the CPU issuing I/O commands to an I/O device, with data flowing between the I/O device, the CPU and memory.]
Figure 10: Connecting I/O Devices to CPU/Memory
Buses were previously classified as CPU-memory buses or I/O buses. I/O buses can be long, with a variety of devices attached and a wide range of data bandwidths among the devices connected to them, and they normally follow a bus standard. On the other hand, short, high-speed CPU-memory buses are matched to the memory system to maximise memory-CPU bandwidth. A CPU-memory bus designer knows all of the devices that must be joined together during the design phase, whereas an I/O bus designer must accept devices with varying latency and bandwidth capabilities. To save money, some computers use a single bus for both memory and I/O devices. The choice of bus organisation can therefore significantly affect I/O performance.
Machine Instructions
Machine instructions are commands or programmes encoded in machine code that can be recognised and executed by a machine (computer). A machine instruction is a set of bytes in memory that instructs the processor to carry out a certain task. The CPU steps through the machine instructions in the main memory one by one, performing one machine action for each.
A machine language programme is a collection of machine instructions stored in the main memory. A collection of instructions executed directly by a computer's Central Processing Unit (CPU) is known as machine code or machine language. Each instruction performs a highly specific task on a unit of data in a CPU register or memory, such as a load, a jump or an ALU operation. A set of such instructions makes up any programme that is directly executed by a CPU.
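The fetch-and-execute loop described above can be sketched with a toy, hypothetical instruction set (not any real CPU's encoding — the opcodes and two-byte format below are assumptions for illustration): each instruction is an opcode byte followed by an operand byte, and the processor performs one machine action per instruction.

```python
# Toy, hypothetical ISA: one opcode byte + one operand byte per instruction.
LOAD, ADD, HALT = 0x01, 0x02, 0xFF

def run(program: bytes) -> int:
    acc, pc = 0, 0                    # accumulator register and program counter
    while True:
        op, arg = program[pc], program[pc + 1]   # fetch the next instruction
        if op == LOAD:
            acc = arg                 # load an immediate value into the register
        elif op == ADD:
            acc += arg                # a simple ALU operation
        elif op == HALT:
            return acc
        pc += 2                       # advance to the next two-byte instruction

result = run(bytes([LOAD, 5, ADD, 7, HALT, 0]))
print(result)  # 12
```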
Reliability, Availability and Serviceability (RAS) is a collection of related attributes that must be considered when designing, manufacturing, purchasing or using a computer product or component.
The term was first used by IBM to define specifications for their mainframes and originally applied only to hardware.
Availability
The ratio of time a system or component is functional to the total time it is required or expected to work is known as availability. This can be represented as a direct proportion (for example, 9/10 or 0.9) or as a percentage (for example, 90 percent). It can also be expressed as total downtime, or as average downtime, per week, month or year. When a major component or combination of components fails, availability is sometimes stated in qualitative terms, indicating the extent to which a system can continue to function.
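The ratio above is simple to compute. A minimal sketch (the 720-hour month and 72 hours of downtime are made-up example figures):

```python
def availability(uptime_hours: float, required_hours: float) -> float:
    """Availability = functional time / total time the system is required."""
    return uptime_hours / required_hours

# A server required for a 720-hour month that was down for 72 hours:
a = availability(720 - 72, 720)
print(f"{a:.2f} or {a:.0%}")  # 0.90 or 90%
```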
Dependability
In computer architecture, dependability is a measure of a system's availability, reliability, maintainability and maintenance support performance, as well as, in certain circumstances, durability, safety and security.
15.4 Redundant Array of Independent Disks (RAID)
RAID can be defined as a Redundant Array of Inexpensive (or Independent) Disks. It refers to multiple, independent disk drives, and the RAID level decides how the data is distributed among these drives. RAID is a collection of connected hard drives set up in a way that helps protect data or speed up the disk storage performance of a computer. It is commonly used in servers and in high-performance computers.
The main advantage of RAID is that the array of disks can be accessed as a single disk by the OS. Moreover, RAID is fault-tolerant because, in most RAID levels, the data is stored redundantly on multiple disks. If one or two disks fail, the data remains safe, and the OS need not even notice the failure. The loss of data can be prevented because the data can be recovered from the disks that have not failed.
Some of the most commonly used terminologies and techniques in RAID are discussed as follows:
• Striping in RAID: Writing data to a single disk is a slow process. However, writing data in small chunks across multiple disks is faster. It is easy and fast to spread data in small amounts over different disks and, likewise, to fetch it back in small amounts from the various disks. When data is retrieved using multiple disks, the CPU does not need to wait, because the combined throughput of all the disks is used. Here, every disk drive is divided into small chunks; the striping granularity ranges from bytes to blocks.
• Mirroring in RAID: Mirroring means copying the data from one disk drive to another disk drive, creating duplicate copies of the data on two disks. The main advantage of mirroring is that it provides 100 percent redundant data: if there are two drives in mirroring mode, both drives hold an exact copy of the data. In case one disk fails, the data is protected on the other disk.
• Parity in RAID: Parity refers to the method used for rebuilding the data when one of the disks fails. It uses the well-known binary operation XOR, a mathematical operation performed to get one output from two inputs.
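A minimal sketch of how XOR parity enables rebuilding (illustrative only; the two-byte chunks are made-up example data): the parity chunk is the byte-wise XOR of the data chunks, so any single lost chunk can be rebuilt by XOR-ing the parity with the surviving chunks.

```python
from functools import reduce

def xor_chunks(chunks):
    """Byte-wise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))

disk0 = b"\x0f\xaa"
disk1 = b"\xf0\x55"
parity = xor_chunks([disk0, disk1])   # stored on the parity disk

# Disk 1 fails: rebuild its contents from disk 0 and the parity chunk.
rebuilt = xor_chunks([disk0, parity])
print(rebuilt == disk1)  # True
```

The same property holds for any number of data disks: XOR-ing the parity with all surviving chunks always recovers the single missing chunk.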
• RAID 0: RAID 0 stripes data across the disks; it has no mirroring and no parity checking of data, so there is no redundancy. A RAID 0 implementation requires a minimum of two disks. The diagram of RAID level 0 is shown in Figure 11:
[Figure 11 shows blocks A1–A8 striped alternately across Disk 0 (A1, A3, A5, A7) and Disk 1 (A2, A4, A6, A8).]
Figure 11: Data Disks in RAID Level 0
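The block layout of Figure 11 can be sketched as round-robin, block-level striping (the chunk size and round-robin placement policy are assumptions of this sketch):

```python
def stripe(blocks, n_disks):
    """Assign logical blocks to disks in round-robin order (RAID 0 style)."""
    disks = [[] for _ in range(n_disks)]
    for i, block in enumerate(blocks):
        disks[i % n_disks].append(block)   # alternate blocks between disks
    return disks

disk0, disk1 = stripe(["A1", "A2", "A3", "A4", "A5", "A6", "A7", "A8"], 2)
print(disk0)  # ['A1', 'A3', 'A5', 'A7']
print(disk1)  # ['A2', 'A4', 'A6', 'A8']
```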
• RAID 1: RAID 1 is also called 'Mirroring'. The data is stored twice by writing it to both the data drive and the mirror drive, which together form a set of drives. In case of failure of a drive, the controller uses either the data drive or the mirror drive for the recovery of data. Write performance suffers because the data is written twice. RAID 1 provides complete redundancy of data, and its implementation requires a minimum of two disks. The diagram of RAID level 1 is shown in Figure 12:
[Figure 12: Data Disks in RAID Level 1 — blocks A1–A4 are duplicated on both Disk 0 and Disk 1.]
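RAID 1 behaviour can be sketched as a toy model (an assumption-laden illustration, not a real controller): every write goes to both drives, which is why writes are slower, and a read can still be served after one drive fails.

```python
class Raid1:
    """Toy RAID 1 model: two dictionaries stand in for the two drives."""

    def __init__(self):
        self.data_drive, self.mirror_drive = {}, {}

    def write(self, block, value):
        self.data_drive[block] = value     # the data is written twice,
        self.mirror_drive[block] = value   # which is why writes are slower

    def read(self, block):
        # The controller can use either copy; fall back if one copy is lost.
        if block in self.data_drive:
            return self.data_drive[block]
        return self.mirror_drive.get(block)

r = Raid1()
r.write("A1", b"payload")
del r.data_drive["A1"]   # simulate failure of the data drive
print(r.read("A1"))      # recovered from the mirror drive
```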
• RAID 2: In RAID 2, a parity disk is utilised to rebuild corrupted or lost data. It needs fewer disks than level 1 for providing redundancy, though it still requires a few extra disks; for example, if there are 10 data disks, it needs 4 check disks plus a parity disk. The diagram of RAID level 2 is shown in Figure 13:
[Figure 13 shows data disks D1–D4 on Disks 0–3 and check/parity disks Dp1–Dp3 on Disks 4–6.]
Figure 13: Data Disks in RAID Level 2
15.5 I/O Performance Measures
I/O performance has measures of its own that have no counterpart in processor design. In addition to these unique metrics, the standard performance metrics, such as response time and throughput, also apply to I/O. (I/O bandwidth is also referred to as I/O throughput, and response time is sometimes referred to as latency.)
In many recently developed data-intensive applications, the performance killer has been identified as the I/O system rather than the CPU and RAM. Evaluating and understanding the performance of I/O systems has therefore become a hot topic in the high-performance computing industry. Traditional I/O performance parameters, such as Input/Output Operations Per Second (IOPS), bandwidth and response time, are helpful in traditional I/O contexts. As I/O systems get more complex, however, the existing metrics become less and less capable of reflecting the features of I/O system performance; to address these limitations, a newer I/O statistic, Blocks Per Second (BPS), has been introduced to evaluate I/O system performance.
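A hedged sketch of how two of these traditional metrics relate (the request rates and sizes below are made-up examples): for a fixed request size, bandwidth in bytes per second equals IOPS times the request size. Real workloads mix request sizes, which is one reason a single-number metric can mislead.

```python
def bandwidth_mb_per_s(iops: float, request_bytes: int) -> float:
    """Bandwidth (MB/s) for a workload of fixed-size requests."""
    return iops * request_bytes / 1_000_000

# 20,000 small random requests/s vs. 200 large sequential requests/s:
print(bandwidth_mb_per_s(20_000, 4096))    # 81.92  (4 KiB random I/O)
print(bandwidth_mb_per_s(200, 1_000_000))  # 200.0  (1 MB sequential I/O)
```

Note how the workload with 100× higher IOPS delivers far lower bandwidth: neither metric alone characterises the I/O system.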
15.6 Conclusion
• A storage unit is a computer system component that stores the data and instructions to be processed.
• Primary memory is the main memory in a computer system, where data is stored for quick access by the CPU.
• ROM is a built-in computer memory containing data that normally can only be read but not changed.
• RAM is an integrated circuit that enables you to access stored data in random order.
• The ratio of time a system or component is functional to the entire time it is required or anticipated to work is known as availability.
• RAID can be defined as a redundant array of inexpensive disks. It refers to disk drives that are independent and multiple in number.
15.7 Glossary
• Storage unit: A computer system component that stores the data and instructions to be processed.
• Primary memory: The main memory in a computer system, where data is stored for quick access by the CPU.
• ROM: A built-in computer memory containing data that normally can only be read but not changed.
• RAM: An integrated circuit that enables you to access stored data in a random order.
• Floppy disk: The oldest type of secondary storage device, used to transfer data between computers as well as to store data and information.
• Hard disk: A device used to store all your programs and data.
• Tertiary storage: Also known as tertiary memory; a subset of secondary storage.
• Machine language: A collection of instructions executed directly by a computer's central processing unit; also known as machine code.
• Availability: The ratio of time a system or component is functional to the entire time it is required or anticipated to work.
• Redundant Array of Independent Disks (RAID): Disk drives that are independent and multiple in number.
15.8 Self-Assessment Questions
1. Which of these is the computer system component that stores the data and instructions to be processed?
a. Storage devices
c. CPU d. Memory
2. Which of these is the main memory in a computer system where data is stored for quick access by
the CPU?
3. Which among the following is known as the data centre of the PC?
a. ROM b. RAM
c. Hard disk d. Floppy disk
4. The _________ is the actual disk inside the drive that stores the data.
a. Head actuator b. Spindle and Spindle Motor
c. Read/Write Heads d. Platter
5. The memory stick, a removable flash memory card used in electronic products, was launched by Sony in October __________.
a. 1998 b. 1996
c. 1999 d. 1997
6. Which among the following is known as Universal Serial Bus (USB)?
a. Optical disk b. Tape disk
c. Pen drive d. Memory stick
7. The ratio of time a system or component is functional to the entire time it is required or anticipated to work is known as __________.
a. Reliability b. Availability
c. Dependability d. Maintainability
8. Which of the following refers to disk drives that are independent and multiple in number?
a. RAID
b. Storage devices
c. Memory
d. I/O performance
9. Which level of RAID is also called 'Mirroring'?
a. RAID 0
b. RAID 1
c. RAID 2
d. RAID
10. What does RAID stand for?
1. Without a storage device, a computer would not be able to run or even load. Define the term storage
device.
2. The secondary memory is also known as secondary storage. Describe the term secondary storage with examples.
3. The OS creates a buffer in memory and instructs the I/O device to use that buffer to send data to the CPU. Discuss.
4. Numerous types of tertiary storage devices are accessible at the Hierarchical Storage Systems (HSS) level. Outline the significance of tertiary storage.
5. The main advantage of RAID is that the array of disks can be accessed as a single disk in the OS.
Explain the concept of RAID in brief.
Q. No. Answer
1. a. Storage devices
2. b. Primary storage
3. c. Hard disk
4. d. Platter
5. a. 1998
6. c. Pen drive
7. b. Availability
8. a. RAID
9. b. RAID 1
10. c. Redundant Array of Independent Disks
2. …you need to use secondary memory. Refer to Section Types of Storage Devices.
3. The I/O device is directly connected to certain primary memory regions, allowing it to send and
receive data blocks without having to go via the CPU. Refer to Section Connecting I/O Devices to
CPU/Memory
4. Tertiary storage is also known as tertiary memory, which is a subset of secondary storage. The main aim of tertiary storage is to offer a large amount of data at a low cost. Refer to Section Types of Storage Devices.
5. RAID can be defined as a redundant array of inexpensive disks. It refers to disk drives that are independent and multiple in number. Refer to Section Redundant Array of Independent Disks (RAID).
• https://www.trustradius.com/buyer-blog/primary-vs-secondary-storage
• https://www.google.co.in/books/edition/Computer_Architecture/XX69oNsazH4C?hl=en&gbpv=1&dq=storage+devices+in+COA&pg=PA679&printsec=frontcover
• Discuss with your friends and classmates the concept of storage devices and their types. Also, try to find some real-world examples of storage devices.