
Subject: Digital Systems & Computer Organization
Subject code: IS34
Credits: 2:0:1
Faculty: Mrs. Anitha P

Text Books:
M. Morris Mano & Michael D. Ciletti, Digital Design with an Introduction to the Verilog HDL, 5th Edition, Pearson Education.

Carl Hamacher, Zvonko Vranesic, Safwat Zaky, Computer Organization, 5th Edition,
Tata McGraw Hill.
 
Digital Systems & Computer
Organization (IS34)
Introduction to Digital System: Introduction, The Map Method, Four-Variable Map, Don’t-Care
Conditions, NAND and NOR Implementation, Exclusive-OR Function.
Basic Structure of Computers: Functional Units, Basic Operational Concepts, Bus structure,
Performance – Processor Clock, Basic Performance Equation, Clock Rate, Performance
Measurement.
Machine Instructions and Programs: Memory Location and Addresses, Memory Operations,
Instructions and instruction Sequencing, Addressing Modes.
Input/Output Organization: Accessing I/O Devices, Interrupts – Interrupt Hardware, Enabling
and Disabling Interrupts, Handling Multiple Devices, Direct Memory Access: Bus Arbitration,
The Memory System: ROM, Speed, size and Cost, Cache Memories – Mapping Functions.
Basic Processing Unit: Some Fundamental Concepts: Register Transfers, Performing ALU
operations, fetching a word from Memory, Storing a word in memory. Execution of a Complete
Instruction. Pipelining: Basic concepts: Role of Cache memory, Pipeline Performance.
Unit II
Basic Structure of Computers:
• Functional Units
• Basic Operational Concepts
• Bus structure
• Performance – Processor Clock
• Basic Performance Equation
• Clock Rate
• Performance Measurement.
Computer Organization describes the function & design of the various units of
digital computers that store & process information. It also deals with the units of the
computer that receive information from external sources & send results to external
destinations.
Functional Units:
A computer consists of 5 functionally independent main parts:
1) Input Unit
2) Memory Unit
3) ALU
4) Output Unit
5) Control unit.

Input unit: accepts information from
• Human operators
• Electromechanical devices (keyboard)
• Other computers
Arithmetic and logic unit (ALU): performs the desired operations on the input
information as determined by instructions in memory.
Output unit: sends results of processing to
• A monitor
• A printer
Memory unit: stores information
• In the form of instructions
• In the form of data
Control unit: coordinates various actions
• Input
• Output
• Processing
A computer is a sophisticated electronic calculating machine that:
• Accepts input information,
• Processes the information according to a list of internally stored instructions and
• Produces the resulting output information.
Functions performed by a computer are:
• Accepting information to be processed as input.
• Storing a list of instructions to process the information.
• Processing the information according to the list of instructions.
• Providing the results of the processing as output.
Instructions specify commands to:
• Transfer information within a computer (e.g., from memory to ALU)
• Transfer of information between the computer and I/O devices (e.g., from keyboard to computer, or
computer to printer)
• Perform arithmetic and logic operations (e.g., Add two numbers, Perform a logical AND).

Program: A sequence of instructions to perform a task is called a program, which is stored
in the memory.
Processor fetches instructions that make up a program from the memory and performs
the operations stated in those instructions.
What do the instructions operate upon?
Instructions operate upon data-
• Data could be:
• Numbers,
• Encoded characters.
• Data, in a broad sense means any digital information.
• Computers use data that is encoded as a string of binary digits called bits.
Input Unit:
Binary information must be presented to a computer in a specific format. This task
is performed by the input unit:

• Interfaces with input devices.


• Accepts binary information from the input devices.
• Presents this binary information in a format expected by the computer.
• Transfers this information to memory or the processor.
Memory unit:
stores instructions and data.
Since data is represented as a series of bits, the memory unit stores bits.
Processor reads instructions and reads/writes data from/to the memory during the
execution of a program.
• In theory, instructions and data could be fetched one bit at a time.
• In practice, a group of bits is fetched at a time.
• Group of bits stored or retrieved at a time is termed as “word”
• Number of bits in a word is termed as the “word length” of a computer.
In order to read/write to and from memory, a processor should know where to look:
• “Address” is associated with each word location
Processor reads/writes to/from memory based on the memory address:
• Access any word location in a short and fixed amount of time based on the address.
• Random Access Memory (RAM) provides fixed access time independent of the
location of the word.
• Access time is known as “Memory Access Time”.
Memory and processor have to “communicate” with each other in order to
read/write information.
• In order to reduce “communication time”, a small amount of RAM (known as
Cache) is tightly coupled with the processor.
Modern computers have three to four levels of RAM units with different speeds
and sizes:
• Fastest, smallest known as Cache
• Slowest, largest known as Main memory.
Primary storage of the computer consists of RAM units.
• Fastest, smallest unit is Cache.
• Slowest, largest unit is Main Memory.
Primary storage is insufficient to store large amounts of data and programs.
• Primary storage can be added, but it is expensive.
Store large amounts of data on secondary storage devices:
• Magnetic disks and tapes,
• Optical disks (CD-ROMS).
• Access to data stored in secondary storage is slower, but secondary storage takes
advantage of the fact that some information is accessed infrequently.
The cost of a memory unit depends on its access time; a shorter access time implies a
higher cost.
Arithmetic and Logic unit:
Operations are executed in the Arithmetic and Logic Unit (ALU).
• Arithmetic operations such as addition, subtraction.
• Logic operations such as comparison of numbers.

In order to execute an instruction, operands need to be brought into the ALU from
the memory.
• Operands are stored in general purpose registers available in the ALU.
• Access times of general purpose registers are shorter than that of the cache.
Results of the operations are stored back in the memory or retained in the
processor for immediate use.
Output Unit:
Computers represent information in a specific binary form. Output units:
• Interface with output devices.
• Accept processed results provided by the computer in specific binary form.
• Convert the information in binary form to a form understood by an output device.
Control Unit: Operations of Input unit, Memory, ALU and Output unit are coordinated by
Control unit.
Instructions control “what” operations take place (e.g. data transfer, processing).
Control unit generates timing signals which determines “when” a particular operation
takes place.
Operation of a computer can be summarized as:
• Accepts information from the input units (Input unit).
• Stores the information (Memory).
• Processes the information (ALU).
• Provides processed results through the output units (Output unit).
Basic Operational Concepts:
Activity in computer is governed by instructions.
Add LOCA, R0
• Add the operand at memory location LOCA to the operand in a register R0 in the
processor.
• Place the sum into register R0.
• The original contents of LOCA are preserved.
• The original contents of R0 are overwritten.
• Instruction is fetched from the memory into the processor
• the operand at LOCA is fetched and added to the contents of R0 – the resulting sum
is stored in register R0.
The effect of the above instruction can be realized by the following 2-instruction sequence:
Load LOCA,R1
Add R1,R0
The following are the steps to execute the instruction:
• Fetch the instruction from main-memory into the processor.
• Fetch the operand at location LOCA from main-memory into the register R1.
• Add the content of Register R1 and the contents of register R0.
• Store the result (sum) in R0

Connections between processor and memory


Functions of PROCESSOR:
• The processor contains ALU, control-circuitry and many registers.
• The processor contains n general-purpose registers R0 through Rn-1.
• The IR holds the instruction that is currently being executed.
• The control-unit generates the timing-signals that determine when a given action
is to take place.
• The PC contains the memory-address of the next-instruction to be fetched &
executed.
• During the execution of an instruction, the contents of PC are updated to point to
next instruction.
• The MAR holds the address of the memory location to be accessed.
• The MDR contains the data to be written into or read out of the addressed
location.
• MAR and MDR facilitate communication with the memory.
STEPS TO EXECUTE AN INSTRUCTION:
1) The address of first instruction (to be executed) gets loaded into PC.
2) The contents of PC (i.e. address) are transferred to the MAR & control-unit
issues Read signal to memory.
3) After a certain amount of time has elapsed, the first instruction is read out of memory
and placed into MDR.
4) Next, the contents of MDR are transferred to IR. At this point, the instruction can
be decoded & executed.
5) To fetch an operand, its address is placed into MAR & control-unit issues Read
signal. As a result, the operand is transferred from memory into MDR, and then it is
transferred from MDR to ALU.
6) Likewise, the required number of operands are fetched into the processor.
7) Finally, ALU performs the desired operation.
8) If the result of this operation is to be stored in the memory, then the result is
sent to the MDR.
9) The address of the location where the result is to be stored is sent to the MAR
and a Write cycle is initiated.
10) At some point during execution, contents of PC are incremented to point to
next instruction in the program.

Bus Structure:
A bus is a group of lines (wires) that serves as a connecting path for several devices.
• The lines carry data, address, and control signals.
There are 2 types of Bus structures:
1) Single Bus Structure
2) Multiple Bus Structure.
Single Bus Structure
• Because the bus can be used for only one transfer at a time, only 2 units can
actively use the bus at any given time.
• Bus control lines are used to arbitrate multiple requests for use of the bus.
Advantages:
1) Low cost &
2) Flexibility for attaching peripheral devices.
Multiple Bus Structure
• Systems that contain multiple buses achieve more concurrency in operations.
• Two or more transfers can be carried out at the same time.
• Advantage: Better performance.
Disadvantage: Increased cost.

Buffer Registers:
The devices connected to a bus vary widely in their speed of operation, like input
output devices are slow in execution compared to magnetic or optical devices. To
synchronize multiple devices operational-speed, buffer-registers can be used.
Buffer registers are included with the devices to hold the information during
transfers.
Ex: printing an encoded character.
PERFORMANCE
The most important measure of performance of a computer is how quickly it can
execute programs.
The speed of a computer is affected by the design of
1) Instruction-set.
2) Hardware & the technology in which the hardware is implemented.
3) Software including the operating system
• Because programs are usually written in a high-level language (HLL), performance is also
affected by the compiler that translates programs into machine language.
• For best performance, it is necessary to design the compiler, machine instruction
set and hardware in a co-ordinated way.
Let us examine the flow of program instructions and data between the memory
& the processor.
• At the start of execution, all program instructions are stored in the main-memory.
• As execution proceeds, instructions are fetched into the processor, and a copy is
placed in the cache.
• Later, if the same instruction is needed a second time, it is read directly from the
cache.
• A program will be executed faster if movement of instruction/data between the
main-memory and the processor is minimized which is achieved by using the
cache.
PROCESSOR CLOCK
• Processor circuits are controlled by a timing signal called a Clock.
• The clock defines regular time intervals called Clock Cycles.
• To execute a machine instruction, the processor divides the action to be
performed into a sequence of basic steps such that each step can be completed in
one clock cycle.
• Let P = Length of one clock cycle
R = Clock rate.
• Relation between P and R is given by
R=1/P
• R is measured in cycles per second.
• In electrical engineering terminology, cycles per second is called hertz (Hz); a million
cycles per second is a megahertz (MHz), and a billion cycles per second is a gigahertz (GHz).
Ex: 500 million cycles per second is 500 MHz.
BASIC PERFORMANCE EQUATION
Let T = Processor time required to execute a program.
N = Actual number of machine-instruction executions needed to complete the program.
S = Average number of basic steps needed to execute one machine instruction.
R = Clock rate in cycles per second.
The program execution time is given by
T= (N*S) / R
Above equation is referred to as the basic performance equation.
• To achieve high performance, the computer designer must reduce the value of T,
which means reducing N and S, and increasing R.
• The value of N is reduced if source program is compiled into fewer machine instructions.
• The value of S is reduced if instructions have a smaller number of basic steps to perform.
• The value of R can be increased by using a higher frequency clock.
Note: Care has to be taken while modifying these values, since a change in one parameter may
affect the others.
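As a quick numeric sketch of the basic performance equation, the values of N, S and R below are assumed purely for illustration.

N = 2_000_000        # assumed: machine instructions executed
S = 4                # assumed: average basic steps per instruction
R = 500_000_000      # assumed: clock rate of 500 MHz (cycles per second)

T = (N * S) / R      # basic performance equation
print(f"T = {T:.3f} seconds")   # prints: T = 0.016 seconds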
CLOCK RATE
• There are 2 possibilities for increasing the clock rate R:
1) Improving the IC technology makes logic-circuits faster.
This reduces the time needed to compute a basic step. (IC integrated circuits).
This allows the clock period P to be reduced and the clock rate R to be increased.
2) Reducing the amount of processing done in one basic step also reduces the clock
period P.
In the presence of a cache, the percentage of accesses to the main-memory is
small. Hence, much of the performance gain expected from the use of faster
technology can be realized.
The value of T will be reduced by the same factor by which R is increased, since S & N
are not affected.
PERFORMANCE MEASUREMENT
To assess the performance of a computer:
• Benchmark refers to standard task used to measure how well a processor
operates.
• The Performance Measure is the time taken by a computer to execute a given
benchmark.
• SPEC selects & publishes the standard programs along with their test results for
different application domains. (SPEC -> System Performance Evaluation
Corporation).
• The SPEC rating is given by
SPEC rating = (running time on the reference computer) / (running time on the computer under test)
• A SPEC rating of 50 means the computer under test is 50 times as fast as the reference
computer.
• The test is repeated for all the programs in the SPEC suite. Then, the geometric
mean of the results is computed.
• Let SPECi = Rating for program i in the suite.
The overall SPEC rating for the computer is given by the geometric mean
SPEC rating = (SPEC1 × SPEC2 × . . . × SPECn)^(1/n)
where n = number of programs in the suite.
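A small Python sketch of this calculation; the individual SPECi ratings below are assumed values, not published results.

spec_ratings = [50.0, 40.0, 60.0, 45.0]      # assumed SPECi values for n = 4 programs
n = len(spec_ratings)

product = 1.0
for rating in spec_ratings:
    product *= rating
overall = product ** (1.0 / n)               # geometric mean of the n ratings
print(f"Overall SPEC rating = {overall:.1f}")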


UNIT III

Memory Location and Addresses


Memory Operations
Instructions and instruction Sequencing
Addressing Modes.
Combinational Logic:
Decoders, Encoders, Multiplexers. Demultiplexer.
Verilog codes for Combinational logic Circuits.
Memory Location and Addresses:
• Memory consists of many millions of storage cells (flip-flops).
• Each cell can store a bit of information i.e. 0 or 1
• Each group of n bits is referred to as a word of information, and n is called the
word length.
• In modern computers, word length can vary from 16 to 64 bits.
• A unit of 8 bits is called a byte.
• Accessing the memory to store or retrieve a single item of information
(word/byte) requires a distinct address for each item location. (It is customary to
use the numbers from 0 through 2^k - 1 as the addresses of successive locations in the
memory).
• If 2^k is the number of addressable locations, then these 2^k addresses constitute the
address-space of the computer.
• For example, a 24-bit address generates an address-space of 2^24 locations (16 MB).
The following diagram shows how memory is organized for a computer whose word length is 32 bits.
BYTE-ADDRESSABILITY:
• In byte-addressable memory, successive addresses refer to successive byte
locations in the memory.
• Byte locations have addresses 0, 1, 2. . . . .
• If the word-length is 32 bits, successive words are located at addresses 0, 4, 8. .
with each word having 4 bytes.
BIG-ENDIAN & LITTLE-ENDIAN ASSIGNMENTS:
These terms describe the order in which a sequence of bytes is stored in computer
memory. There are two ways in which byte addresses are arranged.
1. Big-Endian: Lower byte-addresses are used for the more significant bytes of the
word.
2. Little-Endian: Lower byte-addresses are used for the less significant bytes of the
word.
In both cases, byte-addresses 0, 4, 8. . . . . are taken as the addresses of successive
words in the memory.
Big and little Endian assignments:
Consider a 32-bit integer (in hex): 0x12345678 which consists of 4 bytes: 12, 34,
56, and 78.
• Hence this integer will occupy 4 bytes in memory.
• Assume, we store it at memory address starting 1000.
• On little-endian, memory will look like:
Address Value
1000 78
1001 56
1002 34
1003 12
On big-endian, memory will look like
Address Value
1000 12
1001 34
1002 56
1003 78
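A minimal sketch using Python's struct module to confirm the byte layouts shown above; the starting address 1000 is carried over from the example.

import struct

value = 0x12345678
little = struct.pack("<I", value)    # little-endian byte order: 78 56 34 12
big    = struct.pack(">I", value)    # big-endian byte order:    12 34 56 78

for offset in range(4):
    print(f"address {1000 + offset}: little-endian {little[offset]:02X}, big-endian {big[offset]:02X}")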
Uses of big-endian and little-endian:
• Both big-endian and little-endian are widely used in digital electronics. The CPU
typically determines the endianness in use.
• Many mainframe computers are big-endian, while most modern computers are little-
endian.
• IBM's 370 mainframes, most reduced instruction set computers (RISC)-based
computers and Motorola microprocessors use the big-endian approach. 
• Transmission Control Protocol/Internet Protocol (TCP/IP) also uses the big-
endian approach. For this reason, big-endian is sometimes called network order.
•  Intel processors, DEC Alphas and at least some programs that run on them are
little-endian.
WORD ALIGNMENT
• Words are said to be Aligned in memory if they begin at a byte-address that is a multiple of
the number of bytes in a word.
• For example,
• If the word length is 16 bits (2 bytes), aligned words begin at byte-addresses 0, 2, 4 . . . . .
• If the word length is 64 bits (2^3 = 8 bytes), aligned words begin at byte-addresses 0, 8, 16 . . . . .
• Words are said to have Unaligned Addresses, if they begin at an arbitrary byte-address.
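A small sketch of this alignment rule; the word sizes below are just examples.

def is_aligned(byte_address, word_bytes):
    # a word is aligned if its byte address is a multiple of the word size in bytes
    return byte_address % word_bytes == 0

print(is_aligned(12, 4))    # True : 32-bit word at address 12
print(is_aligned(14, 4))    # False: unaligned 32-bit word
print(is_aligned(24, 8))    # True : 64-bit word at address 24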
ACCESSING NUMBERS, CHARACTERS & CHARACTER STRINGS:
A number usually occupies one word. It can be accessed in the memory by specifying its word
address. Similarly, individual characters can be accessed by their byte-address.
There are two ways to indicate the length of the string:
1) A special control character with the meaning "end of string" can be used as the last
character
in the string.
2) A separate memory word location or register can contain a number indicating the length of
the string in bytes.
MEMORY OPERATIONS:
Two memory operations are:
1) Load (Read/Fetch) &
2) Store (Write).
The Load operation transfers a copy of the contents of a specific memory-
location to the processor.
The memory contents remain unchanged. Steps for Load operation:
1) Processor sends the address of the desired location to the memory.
2) Processor issues a ‘read’ signal to memory to fetch the data.
3) Memory reads the data stored at that address.
4) Memory sends the read data to the processor.
The Store operation transfers the information from the register to the specified
memory-location.
This will destroy the original contents of that memory-location. Steps for Store
operation are:
1) Processor sends the address of the memory-location where it wants to store
data.
2) Processor issues a ‘write’ signal to memory to store the data.
3) Content of register(MDR) is written into the specified memory-location.
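The Load and Store steps above can be sketched with a simple word-addressable memory model; the addresses and contents used here are assumed for illustration.

memory = {1000: 25, 1004: 7}          # assumed memory contents (address: word)

def load(address):
    # processor sends the address and a 'read' signal; memory returns the data unchanged
    return memory[address]

def store(address, value):
    # processor sends the address and a 'write' signal; old contents are overwritten
    memory[address] = value

R1 = load(1000)                        # R1 <- [1000]
store(1004, R1)                        # [1004] <- [R1]
print(R1, memory)                      # 25 {1000: 25, 1004: 25}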
INSTRUCTIONS & INSTRUCTION SEQUENCING:
A computer must have instructions capable of performing 4 types of operations:
• Data transfers between the memory and the registers (MOV, PUSH, POP, XCHG).
• Arithmetic and logic operations on data (ADD, SUB, MUL, DIV, AND, OR, NOT).
• Program sequencing and control (CALL,RET, LOOP, INT).
• I/0 transfers (IN, OUT).
REGISTER TRANSFER NOTATION (RTN)
The possible locations in which transfer of information occurs are:
1) Memory-location
2) Processor register
3) Registers in I/O device.
Location: Memory (e.g. LOC, PLACE, NUM)
Example: R1 <- [LOC] — Contents of memory-location LOC are transferred into
register R1.

Location: Processor registers (e.g. R0, R1, R2)
Example: R3 <- [R1] + [R2] — The contents of registers R1 & R2 are added
and their sum is placed into R3.

Location: I/O registers (e.g. DATAIN, DATAOUT)
Example: R1 <- DATAIN — Contents of I/O register DATAIN are
transferred into register R1.
ASSEMBLY LANGUAGE NOTATION:
To represent machine instructions and programs, an assembly language format is
used.
Assembly Language Format and Description:
Move LOC, R1 — Transfers data from memory-location LOC to register R1. The contents of LOC are unchanged
by the execution of this instruction, but the old contents of register R1 are overwritten.

Add R1, R2, R3 — Adds the contents of registers R1 and R2, and places their sum into register R3.
BASIC INSTRUCTION TYPES:
The examples below show how the operation C <- [A] + [B] is expressed with each instruction type.

1) Three-Address Instructions
Syntax: Opcode Source1, Source2, Destination
Example: Add A, B, C — Add the contents of memory-locations A & B; then, place the result into location C.
Instructions for C <- [A] + [B]: Add A, B, C

2) Two-Address Instructions
Syntax: Opcode Source, Destination
Example: Add A, B — Add the contents of memory-locations A & B; then, place the result into location B,
replacing the original contents of this location. Operand B is both a source and a destination.
Instructions for C <- [A] + [B]: Move B, C followed by Add A, C

3) One-Address Instructions
Syntax: Opcode Source/Destination
Example: Load A — Copy the contents of memory-location A into the accumulator.
Add B — Add the contents of memory-location B to the contents of the accumulator & place the sum back into the accumulator.
Store C — Copy the contents of the accumulator into location C.
Instructions for C <- [A] + [B]: Load A, Add B, Store C

4) Zero-Address Instructions
Syntax: Opcode (no explicit Source/Destination)
Example: Push — The locations of all operands are defined implicitly; the operands are stored in a pushdown stack.
Instructions for C <- [A] + [B]: performed using the stack operations Push and Pop.
INSTRUCTION EXECUTION & STRAIGHT LINE SEQUENCING:
The program is executed as follows:
1) Initially, the address of the first instruction is loaded into PC (Figure 2.8).
2) Then, the processor control circuits use the information in the PC to fetch and
execute instructions, one at a time, in the order of increasing addresses. This is
called Straight-Line sequencing.
3) During the execution of each instruction, PC is incremented by 4 to point to next
instruction.
There are 2 phases for Instruction Execution:
1) Fetch Phase: The instruction is fetched from the memory-location and placed in
the IR.
2) Execute Phase: The contents of the IR are examined to determine which operation is
to be performed. The specified operation is then performed by the processor.
Program Explanation
• Consider the program for adding a list of n numbers (Figure 2.9).
• The addresses of the memory-locations containing the n numbers are symbolically
given as NUM1, NUM2, . . ., NUMn.
• A separate Add instruction is used to add each number to the contents of register
R0.
• After all the numbers have been added, the result is placed in memory-location
SUM.
BRANCHING:
• Consider the task of adding a list of n numbers (Figure 2.10).
• The number of entries in the list, n, is stored in memory-location N.
• Register R1 is used as a counter to determine the number of times the loop is
executed.
• The contents of memory-location N are loaded into register R1 at the beginning of the program.
• The Loop is a straight line sequence of instructions executed as many times as
needed.
The loop starts at location LOOP and ends at the instruction Branch>0.
• During each pass,
→ address of the next list entry is determined and
→ that entry is fetched and added to R0.
• The instruction Decrement R1 reduces the contents of R1 by 1 each time through the
loop.
• Then Branch Instruction loads a new value into the program counter. As a result, the
processor fetches and executes the instruction at this new address called the Branch
Target.
• A Conditional Branch Instruction causes a branch only if a specified condition is
satisfied. If the condition is not satisfied, the PC is incremented in the normal way,
and the next instruction in sequential address order is fetched and executed.
CONDITION CODES
• The processor keeps track of information about the results of various operations.
This is accomplished by recording the required information in individual bits,
called Condition Code Flags.
• These flags are grouped together in a special processor-register called the
condition code register (or status register).
Four commonly used flags are:
1) N (negative) set to 1 if the result is negative, otherwise cleared to 0.
2) Z (zero) set to 1 if the result is 0; otherwise, cleared to 0.
3) V (overflow) set to 1 if arithmetic overflow occurs; otherwise, cleared to 0.
4) C (carry) set to 1 if a carry-out results from the operation; otherwise cleared to
0.
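A minimal sketch of how these four flags could be derived after an 8-bit addition; the word size and the operand values are assumed for illustration only.

def add_and_set_flags(a, b, bits=8):
    mask = (1 << bits) - 1
    raw = (a & mask) + (b & mask)
    result = raw & mask
    C = 1 if raw > mask else 0                      # carry-out from the most significant bit
    Z = 1 if result == 0 else 0                     # result is zero
    N = (result >> (bits - 1)) & 1                  # sign (MSB) of the result
    sa = (a >> (bits - 1)) & 1
    sb = (b >> (bits - 1)) & 1
    V = 1 if (sa == sb and N != sa) else 0          # signed arithmetic overflow
    return result, {"N": N, "Z": Z, "V": V, "C": C}

print(add_and_set_flags(0x7F, 0x01))   # (128, {'N': 1, 'Z': 0, 'V': 1, 'C': 0})
print(add_and_set_flags(0xFF, 0x01))   # (0,   {'N': 0, 'Z': 1, 'V': 0, 'C': 1})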
ADDRESSING MODES: The term addressing modes refers to the way in which the
operand of an instruction is specified. The addressing mode specifies a rule for
interpreting or modifying the address field of the instruction before the operand is
actually referenced. Table 2.1 lists the most important addressing modes found in
modern processors:
IMPLEMENTATION OF VARIABLES AND CONSTANTS:
• Variable is represented by allocating a memory-location to hold its value.
• Thus, the value can be changed as needed using appropriate instructions.
• There are 2 accessing modes to access the variables:
1) Register Mode
2) Absolute Mode
Register Mode:
• The operand is the contents of a register. The name (or address) of the register
is given in the instruction.
• Registers are used as temporary storage locations where the data in a register
are accessed.
For example, the instruction
Move R1, R2 ;Copy content of register R1 into register R2.
Absolute (Direct) Mode:
• The operand is in a memory-location.
• The address of memory-location is given explicitly in the instruction.
• The absolute mode can represent global variables in the program.
For example, the instruction
Move LOC, R2 ;Copy content of memory-location LOC into register R2.
Immediate Mode:
• The operand is given explicitly in the instruction.
• For example, the instruction
Move #200, R0 ;Place the value 200 in register R0.
• Clearly, the immediate mode is only used to specify the value of a source-
operand.
INDIRECTION AND POINTERS:
• Instruction does not give the operand or its address explicitly.
• Instead, the instruction provides information from which the new address of the
operand can be determined.
• This address is called Effective Address (EA) of the operand.
Indirect Mode:
• The EA of the operand is the contents of a register(or memory-location).
• The register (or memory-location) that contains the address of an operand is called a
Pointer.
• We denote indirection by placing the name of the register or the memory-address
given in the instruction in parentheses ( ).
E.g: Add (R1),R0 ;The operand is in memory. Register R1 gives the effective-address (B)
of the operand. The data is read from location B and added to contents of register R0.
Indirect Addressing:
• To execute the Add instruction in fig 2.11 (a), the processor uses the value which
is in register R1, as the EA of the operand.
• It requests a read operation from the memory to read the contents of location B.
The value read is the desired operand, which the processor adds to the contents
of register R0.
• Indirect addressing through a memory-location is also possible as shown in fig
2.11(b). In this case, the processor first reads the contents of memory-location
A, then requests a second read operation using the value B as an address to
obtain the operand.
Explanation:
• Register R2 is used as a pointer to the numbers in the list, and the operands are
accessed indirectly through R2.
• The initialization-section of the program loads the counter-value n from memory-
location N into R1 and uses the immediate addressing-mode to place the address
value NUM1, which is the address of the first number in the list, into R2. Then it
clears R0 to 0.
• The first two instructions in the loop implement the unspecified instruction block
starting at LOOP.
• The first time through the loop, the instruction Add (R2), R0 fetches the operand
at location NUM1 and adds it to R0.
• The second Add instruction adds 4 to the contents of the pointer R2, so that it
will contain the address value NUM2 when the above instruction is executed in
the second pass through the loop.
INDEXING AND ARRAYS
A different kind of flexibility for accessing operands is useful in dealing with lists
and arrays.
Index mode: The operand is specified as X(Ri)
where X = a constant value which defines an offset (also called a
displacement), and
Ri = the name of the index register, which contains a memory address.
• The effective-address of the operand is given by EA=X+[Ri]
• The contents of the index-register are not changed in the process of generating
the effective address.
• The constant X may be given either
→ as an explicit number or
→ as a symbolic-name representing a numerical value.
Ex:
• Figures (a) and (b) illustrate two ways of using the Index mode. In fig (a), the index register,
R1, contains the address of a memory-location, and the value X defines an
offset (also called a displacement) from this address to the location where the
operand is found.
• To find the EA of the operand:
Eg: Add 20(R1), R2 ; with [R1] = 1000, EA = 1000 + 20 = 1020
• An alternative use is illustrated in fig(b). Here, the constant X corresponds to a
memory address, and the contents of the index register define the offset to the
operand. In either case, the effective-address is the sum of two values; one is
given explicitly in the instruction, and the other is stored in a register.
Base with Index Mode:
• Another version of the Index mode uses 2 registers which can be denoted as (Ri, Rj)
• Here, a second register may be used to contain the offset X.
• The second register is usually called the base register.
• The effective-address of the operand is given by EA=[Ri]+[Rj]
• This form of indexed addressing provides more flexibility in accessing operands
because both components of the effective-address can be changed.
Base with Index & Offset Mode
• Another version of the Index mode uses 2 registers plus a constant, which can be
denoted as X(Ri, Rj)
• The effective-address of the operand is given by EA=X+[Ri]+[Rj]
• This added flexibility is useful in accessing multiple components inside each item in
a record, where the beginning of an item is specified by the (Ri, Rj) part of the
addressing - mode. In other words, this mode implements a 3-dimensional array.
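A small sketch computing the effective address for the three index-mode variants above; the register contents (Ri = 1000, Rj = 40) and the offset X = 20 are assumed values.

Ri, Rj, X = 1000, 40, 20             # assumed register contents and offset

ea_index          = X + Ri           # Index mode X(Ri)                    -> 1020
ea_base_index     = Ri + Rj          # Base with index (Ri, Rj)            -> 1040
ea_base_index_off = X + Ri + Rj      # Base with index & offset X(Ri, Rj)  -> 1060

print(ea_index, ea_base_index, ea_base_index_off)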
RELATIVE MODE:
• This is similar to index-mode with one difference:
• The effective-address is determined using the PC in place of the general
purpose register Ri.
• The operation is indicated as X(PC).
• X(PC) denotes an effective-address of the operand which is X locations above or
below the current contents of PC.
• Since the addressed-location is identified "relative" to the PC, the name
Relative mode is associated with this type of addressing.
• This mode is used commonly in conditional branch instructions.
• An instruction such as Branch>0 LOOP causes program execution to go to
the branch target location identified by the name LOOP if the branch condition is
satisfied.
ADDITIONAL ADDRESSING MODES:
1) Auto Increment Mode
• Effective-address of operand is contents of a register specified in the instruction (Fig: 2.16).
• After accessing the operand, the contents of this register are automatically incremented to
point to the next item in a list.
• Implicitly, the increment amount is 1.
• This mode is denoted as (Ri)+ ;
• Increment is 1 for byte sized operands, 2 for 16 bit operands and 4 for 32 bit operands.
2) Auto Decrement Mode
• The contents of a register specified in the instruction are first automatically decremented
and are then used as the effective-address of the operand.
• This mode is denoted as
-(Ri)
These 2 modes can be used together to implement an important data structure called a
stack.
UNIT IV
Input/Output Organization:
Accessing I/O Devices, Interrupts – Interrupt Hardware, Enabling and Disabling
Interrupts, Handling Multiple Devices
Direct Memory Access: Bus Arbitration, The Memory System: ROM, Speed, size
and Cost, Cache Memories – Mapping Functions.
Sequential Logic: Introduction, Flip-Flops. Verilog codes for Sequential logic
Circuits.
ACCESSING I/O-DEVICES
• A single bus-structure can be used for connecting I/O-devices to a computer
(Figure 7.1).
• Each I/O device is assigned a unique set of addresses.
• Bus consists of 3 sets of lines to carry address, data & control signals.
• When processor places an address on address-lines, the intended-device
responds to the command.
• The processor requests either a read or write-operation.
• The requested-data are transferred over the data-lines.
There are 2 ways to deal with I/O-devices: 1) Memory-mapped I/O & 2) I/O-mapped I/O.
1) Memory-Mapped I/O
• Memory and I/O-devices share a common address-space.
• Any data-transfer instruction (like Move, Load) can be used to exchange information.
For example,
Move DATAIN, R0; This instruction sends the contents of location DATAIN to register R0. Here,
DATAIN -> address of the input-buffer of the keyboard.
2) I/O-Mapped I/O
• Memory and I/0 address-spaces are different.
• Special instructions named IN and OUT are used for data-transfer.
• Advantage of separate I/O space: I/O-devices deal with fewer address-lines.
I/O Interface for an Input Device
1) Address Decoder: enables the device to recognize its address when this address appears on the
address-lines (Figure 7.2).
2) Status Register: contains information relevant to operation of I/O-device.
3) Data Register: holds data being transferred to or from processor. There are 2 types:
i) DATAIN -> Input-buffer associated with keyboard.
ii) DATAOUT -> Output data buffer of a display/printer.
Simple example of I/O operations involving a Keyboard and a display in a computer system
For an input device, SIN status flag is used.
SIN = 1 -> when a character is entered at the keyboard.
SIN = 0 -> when the character is read by processor.
Four registers are used:
DATAIN
DATAOUT
STATUS
CONTROL
A program that reads one line from the keyboard, stores it in memory buffer, and
echoes it back to the display
Move #LINE,R0 Initialize memory pointer
WAITK TestBit #0,STATUS Test SIN
Branch=0 WAITK Wait for character to be entered
Move DATAIN,R1 Read character
WAITD TestBit #1,STATUS Test SOUT
Branch=0 WAITD Wait for display to become ready
Move R1,DATAOUT Send character to display
Move R1,(R0)+ Store character and advance pointer
Compare #$0D,R1 Check if Carriage Return
Branch=0 WAITK If not, get another character
Move #$0A,DATAOUT Otherwise, send Line Feed
Call PROCESS Call a subroutine to process the input line
MECHANISMS USED FOR INTERFACING (implementing I/O operations)
1) Program Controlled I/O
• Processor repeatedly checks status-flag to achieve required synchronization b/w processor
& I/O device. (We say that the processor polls the device).
Main drawback:
• The processor wastes time in checking status of device before actual data-transfer takes
place.
2) Interrupt I/O
• I/O-device initiates the action instead of the processor.
• I/O-device sends an INTR signal over bus whenever it is ready for a data-transfer operation.
• Like this, required synchronization is done between processor & I/O device.
3) Direct Memory Access (DMA)
• The device interface transfers data directly to/from the memory without continuous
involvement by the processor.
• DMA is a technique used for high-speed I/O devices.
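The program-controlled I/O polling described in mechanism 1 (and used in the keyboard/display program above) can be sketched as a busy-wait loop; the device model below is hypothetical.

import random

class Device:
    def __init__(self):
        self.SIN = False                # status flag: set when a character is ready
        self.DATAIN = ""
    def tick(self):
        # hypothetical slow device: it eventually produces a character
        if random.random() < 0.3:
            self.SIN, self.DATAIN = True, "A"

keyboard = Device()
polls = 0
while not keyboard.SIN:                 # processor wastes time repeatedly checking SIN
    keyboard.tick()
    polls += 1
char = keyboard.DATAIN                  # read the character once SIN is set
keyboard.SIN = False                    # reading clears the status flag
print(f"read {char!r} after {polls} polls")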
INTERRUPTS:
• There are many situations where other tasks can be performed while waiting
for an I/O device to become ready.
• A hardware signal called an Interrupt will alert the processor when an I/O
device becomes ready.
• Interrupt-signal is sent on the interrupt-request line.
• The processor can be performing its own task without the need to continuously
check the I/O-device.
• The routine executed in response to an interrupt-request is called ISR.
• The processor must inform the device that its request has been recognized by
sending INTA signal. (INTR -> Interrupt Request, INTA -> Interrupt Acknowledge,
ISR -> Interrupt Service Routine)
• For example, consider COMPUTE and PRINT routines (Figure 3.6).
• The processor first completes the execution of instruction i.
• Then, processor loads the PC with the address of the first instruction of the ISR.
• After the execution of ISR, the processor has to come back to instruction i+1.
• Therefore, when an interrupt occurs, the current content of PC is put in temporary storage
location.
• A return at the end of ISR reloads the PC from that temporary storage location.
• This causes the execution to resume at instruction i+1.
• When processor is handling interrupts, it must inform device that its request has
been recognized.
• This may be accomplished by INTA signal.
• The task of saving and restoring the information can be done automatically by the
processor.
• The processor saves only the contents of PC & Status register.
• Saving registers also increases the Interrupt Latency.
• Interrupt Latency is a delay between
→ time an interrupt-request is received and
→ start of the execution of the ISR.
• Generally, a long interrupt latency is unacceptable.
INTERRUPT HARDWARE:
• Most computers are likely to have several I/O devices that can request an interrupt.
• A single interrupt request line may be used to serve n devices as depicted in figure.
• All devices are connected to line via switches to ground. To request an interrupt, a device closes its
associated switch.
• Thus if all Interrupt request signals INTR1 to INTR n are inactive, that is if all switches are open, the
voltage on the interrupt request line will be equal to Vdd .
• When a device requests an interrupt by closing its switch, the voltage on the line drops to 0, causing
interrupt request signal INTR, received by the processor to go to 1.
• Closing one or more switches will cause the line voltage to drop to 0. Thus, the value of INTR is the logical
OR of the requests from individual devices, that is
INTR = INTR1 + INTR2 + . . . + INTRn
• Special gates known as open-collector (or open-drain) gates are used to drive the INTR line.
• The output of an open-collector gate is equivalent to a switch to ground that is
→ open when the gate's input is in the 0 state and
→ closed when the gate's input is in the 1 state.
• Resistor R is called a Pull-up Resistor because it pulls the line voltage up to the high-voltage state when the
switches are open
Difference between Subroutine & ISR
Subroutine:
• A subroutine performs a function required by the program from which it is called.
• A subroutine is just a linkage of 2 or more functions related to each other.
ISR:
• An ISR may not have anything in common with the program being executed at the time the INTR is received.
• An interrupt is a mechanism for coordinating I/O transfers.

ENABLING & DISABLING INTERRUPTS:


• All computers fundamentally should be able to enable and disable interrupts as desired.
• An infinite-loop problem can occur if the INTR signal remains active while the ISR executes,
causing the processor to be interrupted again and again.
There are 3 mechanisms to solve the problem of an infinite loop:
1) Processor should ignore the interrupts until execution of the first instruction of the ISR.
2) Processor should automatically disable interrupts before starting the execution of the
ISR.
3) The processor has a special INTR line for which the interrupt-handling circuit
responds only to the leading edge of the signal. Such a line is said to be edge-triggered.
Sequence of events involved in handling an interrupt-request:
1) The device raises an interrupt-request.
2) The processor interrupts the program currently being executed.
3) Interrupts are disabled by changing the control bits in the processor status register
(PS).
4) The device is informed that its request has been recognized. In response, the
device deactivates the interrupt-request signal.
5) The action requested by the interrupt is performed by the interrupt-service
routine.
6) Interrupts are enabled and execution of the interrupted program is resumed.
HANDLING MULTIPLE DEVICES
While handling multiple devices, the issues concerned are:
1) How can the processor recognize the device requesting an interrupt?
2) How can the processor obtain the starting address of the appropriate ISR?
3) Should a device be allowed to interrupt the processor while another interrupt is
being serviced?
4) How should 2 or more simultaneous interrupt-requests be handled?
POLLING: The simplest way to identify the interrupting device is to have the ISR poll all devices
connected to the bus.
The first device encountered with its IRQ bit set is serviced. After servicing first
device, next requests may be serviced.
• Information needed to determine whether device is requesting interrupt is
available in status-register
Following condition-codes are used:
• DIRQ -> Interrupt-request for display.
• KIRQ -> Interrupt-request for keyboard.
• KEN -> keyboard enable.
• DEN -> Display Enable.
• SIN, SOUT -> status flags.
SIN = 1 -> when a character is entered at the keyboard.
SIN = 0 -> when the character is read by processor.
IRQ = 1 -> when a device raises an interrupt-request.
Advantage of Polling : Simple & easy to implement.
Disadvantage of polling: More time spent polling IRQ bits of all devices
VECTORED INTERRUPTS
• A device requesting an interrupt identifies itself by sending a special-code to processor over
bus.
• Then, the processor starts executing the ISR.
• The special-code indicates starting-address of ISR.
• The special-code length ranges from 4 to 8 bits.
• The location pointed to by the interrupting-device is used to store the starting address of the ISR.
• The starting address of the ISR is called the Interrupt Vector.
Processor
→ loads interrupt-vector into PC &
→ executes appropriate ISR.
• When processor is ready to receive interrupt-vector code, it activates INTA line.
• Then, I/O-device responds by sending its interrupt-vector code & turning off the INTR signal.
• The interrupt vector also includes a new value for the Processor Status Register.
INTERRUPT NESTING
• A multiple-priority scheme is implemented by using separate INTR & INTA lines for
each device
• Each INTR line is assigned a different priority-level (Figure 4.7).
• Priority-level of processor is the priority of program that is currently being executed.
• Processor accepts interrupts only from devices that have higher-priority than its
own.
• At the time of execution of ISR for some device, priority of processor is raised to
that of the device.
• Thus, interrupts from devices at the same level of priority or lower are disabled.
Privileged Instruction
• Processor's priority is encoded in a few bits of PS word. (PS -> Processor-Status).
• Encoded-bits can be changed by Privileged Instructions that write into PS.
• Privileged-instructions can be executed only while processor is running in
Supervisor Mode.
• Processor is in supervisor-mode only when executing operating-system routines.
Privileged Exception:
• A user program cannot
→ accidentally or intentionally change the priority of the processor &
→ disrupt the system-operation.
• An attempt to execute a privileged-instruction while in user-mode leads to a
Privileged Exception.
A multiple-priority scheme is shown in fig 4.7. Each of the interrupt-request lines is
assigned a different priority level. Interrupt requests received over these lines are
sent to a priority arbitration circuit in the processor. A request is accepted only if it
has a higher priority level than the level currently assigned to the processor.
SIMULTANEOUS REQUESTS
• The processor must have some mechanisms to decide which request to service when
simultaneous requests arrive.
• INTR line is common to all devices (Figure 4.8a).
• INTA line is connected in a daisy-chain fashion.
• INTA signal propagates serially through devices.
• When several devices raise an interrupt-request, INTR line is activated.
• Processor responds by setting INTA line to 1. This signal is received by device 1.
• Device-1 passes signal on to device 2 only if it does not require any service.
• If device-1 has a pending-request for interrupt, the device-1
→ blocks INTA signal &
→ proceeds to put its identifying-code on data-lines.
• Device that is electrically closest to processor has highest priority.
Advantage: It requires fewer wires than the individual connections.
Arrangement of Priority Groups
• Here, the devices are organized in groups & each group is connected at a different priority
level.
• Within a group, devices are connected in a daisy chain. (Figure 4.8b).
DIRECT MEMORY ACCESS (DMA): The transfer of a block of data directly b/w an
external device & main-memory without continuous involvement by processor is
called DMA.
DMA controller
→ is a control circuit that performs DMA transfers (Figure 8.13).
→ is a part of the I/O device interface.
→ performs the functions that would normally be carried out by processor.
While a DMA transfer is taking place, the processor can be used to execute another
program.
DMA interface has three registers (Figure 8.12):
1) First register is used for storing starting-address.
2) Second register is used for storing word-count.
3) Third register contains status- & control-flags.
The R/W bit determines direction of transfer.
• If R/W=1, controller performs a read-operation (i.e. it transfers data from memory to I/O),
Otherwise, controller performs a write-operation (i.e. it transfers data from I/O to memory).
If Done=1, the controller
→ has completed transferring a block of data and
→ is ready to receive another command. (IE -> Interrupt Enable).
• If IE=1, controller raises an interrupt after it has completed transferring a block of
data.
• If IRQ=1, controller requests an interrupt.
• Requests by DMA devices for using the bus are always given higher priority than
processor requests.
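A small sketch of reading these flags out of the status/control register with bit masks; the bit positions chosen here are assumptions for illustration, not those of a particular DMA controller.

RW_BIT, DONE_BIT, IE_BIT, IRQ_BIT = 0, 1, 2, 3     # assumed bit positions

def decode_status(reg):
    return {
        "R/W":  (reg >> RW_BIT)  & 1,    # 1 = read (memory to I/O), 0 = write (I/O to memory)
        "Done": (reg >> DONE_BIT) & 1,   # 1 = block transfer complete
        "IE":   (reg >> IE_BIT)  & 1,    # 1 = raise an interrupt when the transfer completes
        "IRQ":  (reg >> IRQ_BIT) & 1,    # 1 = interrupt request pending
    }

print(decode_status(0b1101))   # {'R/W': 1, 'Done': 0, 'IE': 1, 'IRQ': 1}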
There are 2 ways in which the DMA operation can be carried out:
1) Processor originates most memory-access cycles.
DMA controller is said to "steal" memory cycles from processor. Hence, this
technique is usually called Cycle Stealing.
2) DMA controller is given exclusive access to main-memory to transfer a block of
data without any interruption. This is known as Block Mode (or burst mode).
BUS ARBITRATION
The device that is allowed to initiate data-transfers on bus at any given time is called bus-
master.
• There can be only one bus-master at any given time.
• Bus Arbitration is the process by which
→ next device to become the bus-master is selected &
→ bus-mastership is transferred to that device.
The two approaches are:
1) Centralized Arbitration: A single bus-arbiter performs the required arbitration.
2) Distributed Arbitration: All devices participate in selection of next bus-master.
• A conflict may arise if both the processor and a DMA controller or two DMA controllers
try to use the bus at the same time to access the main-memory.
• To resolve this, an arbitration procedure is implemented on the bus to coordinate the
activities of all devices requesting memory transfers.
• The bus arbiter may be the processor or a separate unit connected to the bus.
CENTRALIZED ARBITRATION:
• A single bus-arbiter performs the required arbitration (Figure: 4.20).
• Normally, processor is the bus-master.
• Processor may grant bus-mastership to one of the DMA controllers.
• A DMA controller indicates that it needs to become bus-master by activating BR line.
• The signal on the BR line is the logical OR of bus-requests from all devices connected to it.
• Then, processor activates BG1 signal indicating to DMA controllers to use bus when it becomes
free.
• BG1 signal is connected to all DMA controllers using a daisy-chain arrangement.
• If DMA controller-1 is requesting the bus,
Then, DMA controller-1 blocks propagation of grant-signal to other devices. Otherwise, DMA
controller-1 passes the grant downstream by asserting BG2.
• Current bus-master indicates to all devices that it is using bus by activating BBSY line.
• The bus-arbiter is used to coordinate the activities of all devices requesting memory transfers.
• Arbiter ensures that only 1 request is granted at any given time according to a priority scheme.
(BR -> Bus-Request, BG -> Bus-Grant, BBSY -> Bus Busy).
The timing diagram shows the sequence of events for the devices connected to the processor.
• DMA controller-2
→ requests and acquires bus-mastership and
→ later releases the bus. (Figure: 4.21).
• After DMA controller-2 releases the bus, the processor resumes bus-mastership.
DISTRIBUTED ARBITRATION
• All devices participate in the selection of the next bus-master (Figure 4.22).
• Each device on bus is assigned a 4-bit identification number (ID).
• When 1 or more devices request bus, they
→ assert Start-Arbitration signal &
→ place their 4-bit ID numbers on four open-collector lines ARB0 through ARB3.
• A winner is selected as a result of interaction among signals transmitted over these lines.
• Net-outcome is that the code on 4 lines represents request that has the highest ID number.
• Advantage:
This approach offers higher reliability since operation of bus is not dependent on any single
device.
For example:
• Assume 2 devices A & B have ID 5 (0101) and ID 6 (0110) respectively; the resulting pattern on the arbitration lines (the OR of the two codes) is 0111.
• Each device compares the pattern on the arbitration lines to its own ID, starting from the MSB.
• If a device detects a difference at any bit position, it disables its drivers at that bit position
and at all lower-order bit positions.
• A driver is disabled by placing a 0 at its input.
• In this example, device A detects a difference on line ARB1; hence it disables its drivers on
lines ARB1 & ARB0.
• This causes the pattern on the arbitration lines to change to 0110, which means that B has
won the contention.
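A minimal sketch simulating this arbitration; the 4-bit IDs are taken from the example above, and the function models the wired-OR lines and the drivers that drop out bit by bit.

def arbitrate(ids):
    pattern = 0
    active = set(ids)
    for bit in (3, 2, 1, 0):                          # compare from the MSB (ARB3) down
        line = any((i >> bit) & 1 for i in active)    # wired-OR of the still-active drivers
        pattern |= int(line) << bit
        # devices that drove 0 while the line is 1 disable their lower-order drivers
        active = {i for i in active if ((i >> bit) & 1) == int(line)}
    return pattern

print(f"{arbitrate([0b0101, 0b0110]):04b}")   # 0110 -> device B (ID 6) wins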
READ ONLY MEMORY (ROM)
• Both SRAM and DRAM chips are volatile, i.e., they lose the stored information if
power is turned off.
• Many applications require non-volatile memory which retains the stored
information even if power is turned off.
For ex: the program that loads the OS software from disk into memory has to be stored in
non-volatile memory.
• Non-volatile memory is used in embedded system.
• Since the normal operation involves only reading of stored data, a memory of this
type is called ROM.
At Logic value ‘0’ -> Transistor(T) is connected to the ground point (P).
Transistor switch is closed & voltage on bit-line nearly drops to zero (Figure 8.11).
At Logic value ‘1’ -> Transistor switch is open. The bit-line remains at high voltage.
• To read the state of the cell, the word-line is activated.
• A Sense circuit at the end of the bit-line generates the proper output value.
TYPES OF ROM
Different types of non-volatile memory are
1) PROM
2) EPROM
3) EEPROM &
4) Flash Memory (Flash Cards & Flash Drives)
PROM (PROGRAMMABLE ROM)
• PROM allows the data to be loaded by the user.
• Programmability is achieved by inserting a 'fuse' at point P in a ROM cell.
• Before the PROM is programmed, the memory contains all 0s.
• The user can insert 1s at the required locations by burning out the fuses using high current-pulses.
• This process is irreversible.
• Advantages:
1) It provides flexibility.
2) It is faster.
3) It is less expensive because it can be programmed directly by the user.
EPROM (ERASABLE REPROGRAMMABLE ROM)
• EPROM allows
→ stored data to be erased and
→ new data to be loaded.
• In cell, a connection to ground is always made at P and a special transistor is used.
• The transistor has the ability to function as
→ a normal transistor or
→ a disabled transistor that is always turned off.
• Transistor can be programmed to behave as a permanently open switch, by injecting charge into it.
• Erasure requires dissipating the charges trapped in the transistor of memory-cells. This can be done by
exposing the chip to ultra-violet light.
• Advantages:
1) It provides flexibility during the development-phase of digital-system.
2) It is capable of retaining the stored information for a long time.
• Disadvantages:
1) The chip must be physically removed from the circuit for reprogramming.
2) The entire contents need to be erased by UV light.
EEPROM (ELECTRICALLY ERASABLE ROM)
• Advantages:
1) It can be both programmed and erased electrically.
2) It allows the contents of cells to be erased selectively.
• Disadvantage: It requires different voltage for erasing, writing and reading the stored data.
FLASH MEMORY
• In EEPROM, it is possible to read & write the contents of a single cell.
• In a Flash device, it is possible to read the contents of a single cell, but it is only possible to write an
entire block of cells.
• Prior to writing, the previous contents of the block are erased.
Eg. In MP3 player, the flash memory stores the data that represents sound.
• Single flash chips cannot provide sufficient storage capacity for embedded-system.
Advantages:
1) Flash drives have greater density which leads to higher capacity & low cost per bit.
2) It requires single power supply voltage & consumes less power.
There are 2 methods for implementing larger memory: 1) Flash Cards & 2) Flash Drives
1) Flash Cards
• One way of constructing larger module is to mount flash-chips on a small card.
• Such flash-card have standard interface.
• The card is simply plugged into a conveniently accessible slot.
• Memory-size of the card can be 8, 32 or 64MB.
Eg: A minute of music can be stored in 1MB of memory. Hence 64MB flash cards can store an hour
of music.
2) Flash Drives
• Larger flash memory modules, called flash drives, can be used in place of hard disk-drives.
• The flash drives are designed to fully emulate the hard disk.
• The flash drives are solid state electronic devices that have no movable parts.
Advantages:
1) They have shorter seek & access time which results in faster response.
2) They have low power consumption; therefore, they are attractive for battery-driven applications.
3) They are insensitive to vibration.
Disadvantages:
1) The capacity of flash drive (<1GB) is less than hard disk (>1GB).
2) It leads to higher cost per bit.
3) Flash memory will weaken after it has been written a number of times (typically
at least 1 million times).
SPEED, SIZE and COST:
• The main-memory can be built with DRAM (Figure 8.14)
• Thus, SRAMs are used in smaller units where speed is of the essence.
• The Cache-memory is of 2 types:
1) Primary/Processor Cache (Level1 or L1 cache)
• It is always located on the processor-chip.
2) Secondary Cache (Level2 or L2 cache)
It is placed between the primary-cache and the rest of the memory.
The memory is implemented using the dynamic components (SIMM, RIMM,
DIMM).
The access time for main-memory is about 10 times longer than the access time for
L1 cache.
• CACHE MEMORIES
• The effectiveness of cache mechanism is based on the property of “Locality of Reference”.
• Locality of Reference
• Many instructions in the localized areas of program are executed repeatedly during some time
period
• Remainder of the program is accessed relatively infrequently (Figure 8.15).
There are 2 types:
1) Temporal
The recently executed instructions are likely to be executed again very soon.
2) Spatial
• Instructions in close proximity to recently executed instruction are also likely to be executed
soon.
• If active segment of program is placed in cache-memory, then total execution time can be
reduced.
• Block refers to the set of contiguous address locations of some size.
• The cache-line is used to refer to the cache-block.
• The cache can hold a reasonable number of blocks at a given time, but this number is small
compared to the total number of blocks available in main-memory.
• The correspondence between main-memory blocks & cache blocks is specified by a mapping-
function.
• Cache control hardware decides which block should be removed to create space for a new block.
• The collection of rules for making this decision is called the Replacement Algorithm.
• The cache control-circuit determines whether the requested word currently exists in the cache.
• The write-operation is done in 2 ways: 1) Write-through protocol & 2) Write-back protocol.
Write-Through Protocol
Here the cache-location and the main-memory-locations are updated simultaneously.
Write-Back Protocol
This technique is to
→ update only the cache-location &
→ mark the cache-location with associated flag bit called Dirty/Modified Bit.
The word in memory will be updated later, when the marked-block is removed from cache.
During Read-operation
• If the requested-word does not currently exist in the cache, a read-miss occurs.
• To reduce the read-miss penalty, the Load-through/Early-restart protocol is used.
Load-Through (Early Restart) Protocol
The block of words that contains the requested-word is copied from the memory into the cache.
Instead of waiting until the entire block is loaded, the requested-word is forwarded to the processor
as soon as it is read, which reduces the processor's waiting time.
During Write-operation
• If the requested-word does not exist in the cache, a write-miss occurs.
1) If Write Through Protocol is used, the information is written directly into main-memory.
2) If Write Back Protocol is used,
→ then block containing the addressed word is first brought into the cache &
→ then the desired word in the cache is over-written with the new information.
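The two write policies and the write-miss handling described above can be summarised in a small Python sketch. The dictionaries `cache` and `main_memory` and the function names are illustrative stand-ins for the cache control hardware, not an actual implementation.

# Minimal sketch of the write-through and write-back policies (names illustrative).
main_memory = {}   # block number -> data
cache = {}         # block number -> {"data": ..., "dirty": Dirty/Modified bit}

def write_through(block, data):
    # Cache-location and main-memory-location are updated simultaneously.
    if block in cache:
        cache[block] = {"data": data, "dirty": False}
    main_memory[block] = data          # on a write-miss, written directly to memory

def write_back(block, data):
    # On a write-miss the block is first brought into the cache, then over-written.
    if block not in cache:
        cache[block] = {"data": main_memory.get(block), "dirty": False}
    cache[block]["data"] = data
    cache[block]["dirty"] = True       # memory is updated only later, on eviction

def evict(block):
    entry = cache.pop(block)
    if entry["dirty"]:                 # marked-block: copy back before removal
        main_memory[block] = entry["data"]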
MAPPING-FUNCTION
Three types of mapping-function are:
1) Direct Mapping
2) Associative Mapping
3) Set-Associative Mapping
DIRECT MAPPING
• The block-j of the main-memory maps onto block-j modulo-128 of the cache (Figure 8.16).
• Memory-blocks 0, 128 and 256 are all stored in cache-block 0. Similarly, memory-blocks 1, 129
and 257 are stored in cache-block 1.
• Contention may arise
1) when the cache is full, or
2) when more than one memory-block is mapped onto a given cache-block position.
• The contention is resolved by allowing the new blocks to overwrite the currently resident-
block.
• Memory-address determines placement of block in the cache.
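As an illustration of direct mapping, assume the usual textbook sizes: a 64K-word main-memory (16-bit addresses), 16-word blocks and a 128-block cache, so an address splits into a 5-bit tag, a 7-bit block field and a 4-bit word field. A rough sketch of that address split:

# Sketch of direct mapping; assumes 16-word blocks and a 128-block cache.
def direct_map(address):
    word  = address & 0xF            # low 4 bits: word within the block
    block = (address >> 4) & 0x7F    # next 7 bits: cache-block position (j modulo 128)
    tag   = address >> 11            # remaining high bits: stored as the block's tag
    return tag, block, word

# Memory-blocks 0, 128 and 256 all yield cache-block position 0, so a newly
# loaded block simply overwrites the currently resident one.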
ASSOCIATIVE MAPPING
• The memory-block can be placed into any cache-block position. (Figure 8.17).
• 12 tag-bits identify a memory-block when it is resident in the cache.
• Tag-bits of an address received from processor are compared to the tag-bits of
each block of cache.
• This comparison is done to see if the desired block is present.
It gives complete freedom in choosing the cache-location.
• A new block that has to be brought into the cache has to replace an existing
block if the cache is full.
• To determine whether a given block is in the cache, the tag-bits of all the cache-blocks must be searched.
Advantage: It is more flexible than direct mapping technique.
Disadvantage: Its cost is high.
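A minimal sketch of the associative lookup, assuming the cache is held as a list of entries with `valid` and `tag` fields (illustrative names); in hardware all the tag comparisons are performed in parallel:

# Sketch of associative mapping: the address tag is compared with every block's tag.
def associative_lookup(cache_blocks, address, words_per_block=16):
    tag = address // words_per_block        # high-order bits identify the memory-block
    for entry in cache_blocks:              # done in parallel by the hardware
        if entry["valid"] and entry["tag"] == tag:
            return entry                    # hit: the block may be in any position
    return None                             # miss: replacement algorithm picks a victim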
SET-ASSOCIATIVE MAPPING
• The cache-blocks are grouped into sets, and a memory-block can be placed in any block of one specific set.
• A cache that contains 1 block per set is equivalent to direct mapping.
• A cache that has 'k' blocks per set is called a "k-way set-associative cache".
• Each block contains a control-bit called a valid-bit.
• The Valid-bit indicates whether the block contains valid data.
• The Dirty-bit indicates whether the block has been modified during its cache residency.
Valid-bit=0 - When power is initially applied to system.
Valid-bit=1 - When the block is loaded from main-memory at first time.
• If the main-memory-block is updated by a source (e.g. a DMA transfer) and the block already
exists in the cache, then the valid-bit of that cache-block is cleared to 0.
• The problem of keeping the copy of data in the cache consistent with the copy in main-memory
(for example, when both the processor and DMA access the same data) is called the Cache Coherence Problem.
Advantages:
1) Contention problem of direct mapping is solved by having few choices for block placement.
2) The hardware cost is decreased by reducing the size of associative search.
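A k-way set-associative lookup combines the two previous schemes: the set is selected by indexing, as in direct mapping, and only the k blocks of that set are searched associatively. A rough sketch, where the number of sets and the entry layout are assumptions for illustration:

# Sketch of a k-way set-associative lookup (set count and field names are illustrative).
def set_assoc_lookup(sets, address, words_per_block=16, num_sets=64):
    block_number = address // words_per_block
    set_index = block_number % num_sets      # which set the memory-block maps onto
    tag = block_number // num_sets           # remaining bits stored as the tag
    for way in sets[set_index]:              # only the k blocks of one set are searched
        if way["valid"] and way["tag"] == tag:
            return way
    return None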
UNIT 5: BASIC PROCESSING UNIT
SOME FUNDAMENTAL CONCEPTS
To execute an instruction, processor has to perform following 3 steps:
1) Fetch contents of memory-location pointed to by PC. The content of this location is an instruction to be
executed. The instruction is loaded into the IR. Symbolically, this operation is written as:
IR <- [[PC]]
2) Increment PC by 4.
PC <- [PC] +4
3) Carry out the actions specified by instruction (in the IR).
The first 2 steps are referred to as Fetch Phase.
Step 3 is referred to as Execution Phase.
The operation specified by an instruction can be carried out by performing one or more of the following
actions:
1) Read the contents of a given memory-location and load them into a register.
2) Read data from one or more registers.
3) Perform an arithmetic or logic operation and place the result into a register.
4) Store data from a register into a given memory-location.
The hardware-components needed to perform these actions are shown in Figure 5.1.
SINGLE BUS ORGANIZATION
• ALU and all the registers are interconnected via a Single Common Bus (Figure 7.1).
• Data & address lines of the external memory-bus is connected to the internal processor-bus via
MDR & MAR respectively. (MDR- Memory Data Register, MAR - Memory Address Register).
• MDR has 2 inputs and 2 outputs. Data may be loaded
→ into MDR either from memory-bus (external) or
→ from processor-bus (internal).
• MAR's input is connected to the internal-bus; MAR's output is
connected to the external-bus.
• Instruction Decoder & Control Unit is responsible for
→ issuing the control-signals to all the units inside the processor.
→ implementing the actions specified by the instruction (loaded in the IR).
• Registers R0 through R(n-1) are the Processor Registers.
The programmer can access these registers for general-purpose use.
• Only processor can access 3 registers Y, Z & Temp for temporary storage during
program-execution. The programmer cannot access these 3 registers.
In ALU, 1) ‘A’ input gets the operand from the output of the multiplexer (MUX).
2) ‘B’ input gets the operand directly from the processor-bus.
There are 2 options provided for ‘A’ input of the ALU.
• MUX is used to select one of the 2 inputs.
• MUX selects either
→ output of Y or
→ constant-value 4( which is used to increment PC content).
An instruction is executed by performing one or more of the following operations:
1) Transfer a word of data from one
register to another or to the ALU.
2) Perform arithmetic or a logic operation
and store the result in a register.
3) Fetch the contents of a given
memory-location and load them into a register.
4) Store a word of data from a register
into a given memory-location.
REGISTER TRANSFERS
• Instruction execution involves a sequence of steps in which data are transferred from one
register to another.
• For each register, two control-signals are used: Riin & Riout. These are called Gating Signals.
• Riin=1 -> data on bus is loaded into Ri. Riout=1 -> content of Ri is placed on bus.
Riout=0 -> bus can be used for transferring data from other registers.
• For example, Move R1, R2; This transfers the contents of register R1 to register R2. This can
be accomplished as follows:
1) Enable the output of register R1 by setting R1out to 1 (Figure 7.2). This places the contents
of R1 on processor-bus.
2) Enable the input of register R2 by setting R2in to 1.
This loads data from processor-bus into register R2.
• All operations and data transfers within the processor take place within time-periods defined
by the processor-clock.
• The control-signals that govern a particular transfer are asserted at the start of the clock
cycle.
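The effect of the gating signals can be mimicked with a tiny Python model of the single internal bus. The variable and function names (bus, Rout, Rin) are illustrative; they are not the actual control circuitry.

# Toy model of register gating on the single internal bus.
registers = {"R1": 25, "R2": 0}
bus = None

def Rout(name):              # Ri_out = 1: contents of Ri are placed on the bus
    global bus
    bus = registers[name]

def Rin(name):               # Ri_in = 1: data on the bus are loaded into Ri
    registers[name] = bus

# Move R1, R2 : assert R1out and R2in in the same clock cycle
Rout("R1"); Rin("R2")
print(registers)             # {'R1': 25, 'R2': 25}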
Input & Output Gating for one Register Bit
• A 2-input multiplexer is used to select the data applied to the input of an edge-
triggered D flip-flop.
• Riin=1 -> mux selects data on bus. This data will be loaded into flip-flop at rising-
edge of clock. Riin=0 -> mux feeds back the value currently stored in flip-flop (Figure
7.3).
• Q output of flip-flop is connected to bus via a tri-state gate. Riout=0 -> gate's
output is in the high-impedance state.
Riout=1 -> the gate drives the bus to 0 or 1, depending on the value of Q.
PERFORMING AN ARITHMETIC OR LOGIC OPERATION
• The ALU performs arithmetic operations on the 2 operands applied to its A and B
inputs.
• One of the operands is output of MUX; And, the other operand is obtained
directly from processor-bus.
The result (produced by the ALU) is stored temporarily in register Z.
• The sequence of operations for [R3]<-[R1]+[R2] is as follows:
1) R1out, Yin
2) R2out, Select Y, Add, Zin
3) Zout, R3in
• Instruction execution proceeds as follows:
Step 1 --> Contents from register R1 are loaded into register Y.
Step2 --> Contents from Y and from register R2 are applied to the A and B inputs of
ALU; Addition is performed & Result is stored in the Z register.
Step 3 --> The contents of Z register is stored in the R3 register.
• The signals are activated for the duration of the clock cycle corresponding to that
step. All other signals are inactive.
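Continuing the same kind of toy model, the three control steps for [R3] <- [R1] + [R2] can be traced as plain assignments; Y and Z here are ordinary variables standing in for the temporary registers:

# Toy trace of the three control steps for [R3] <- [R1] + [R2].
R = {"R1": 10, "R2": 32, "R3": 0}
Y = Z = 0
# Step 1: R1out, Yin                -- R1 placed on the bus and loaded into Y
Y = R["R1"]
# Step 2: R2out, SelectY, Add, Zin  -- ALU adds Y (A input) and R2 (B input)
Z = Y + R["R2"]
# Step 3: Zout, R3in                -- result moved from Z onto the bus and into R3
R["R3"] = Z
print(R["R3"])                      # 42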
CONTROL-SIGNALS OF MDR
The MDR register has 4 control-signals (Figure 7.4):
1) MDRin & MDRout control the connection to the internal processor data bus &
2) MDRinE & MDRoutE control the connection to the memory Data bus.
MAR register has 2 control-signals.
1) MARin controls the connection to the internal processor address bus &
2) MARout controls the connection to the memory address bus.
FETCHING A WORD FROM MEMORY:
To fetch instruction/data from memory, processor transfers required address to MAR.
At the same time, processor issues Read signal on control-lines of memory-bus.
• When requested-data are received from memory, they are stored in MDR. From
MDR, they are transferred to other registers.
• The response time of each memory access varies (based on cache miss, memory-
mapped I/O). To accommodate this, MFC is used. (MFC-> Memory Function
Completed).
• MFC is a signal sent from addressed-device to the processor. MFC informs the
processor that the requested operation has been completed by addressed-device.
Consider the instruction Move (R1),R2. The sequence of steps is (Figure 7.5):
1) MAR <- [R1]        ;R1out, MARin (desired address is loaded into MAR)
2) Read               ;Read command is issued on the memory-bus
3) Wait for MFC response from memory
4) MDR <- memory data ;MDRinE, WMFC (load MDR from the memory-bus)
5) R2 <- [MDR]        ;MDRout, R2in (load R2 from MDR)
where WMFC = the control-signal that causes the processor's control circuitry to wait
for the arrival of the MFC signal.
Above operation requires 3 steps:
1) R1out, MARin, Read
2) MDRinE, WMFC
3) MDRout, R2in
Storing a Word in Memory
• Consider the instruction Move R2,(R1). This requires the following sequence:
1) R1out, MARin ;desired address is loaded into MAR.
2) R2out, MDRin, Write ;data to be written are loaded into MDR & Write
command is issued.
3) MDRoutE, WMFC ;load data into memory-location pointed by R1 from MDR.
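Both memory transfers can be sketched together in the same style. The `memory` dictionary stands in for the external memory, and the completion of each access is assumed to coincide with the arrival of the MFC signal:

# Sketch of Move (R1),R2 (read) and Move R2,(R1) (write); names are illustrative.
memory = {0x1000: 99}
R1, R2 = 0x1000, 7
MAR = MDR = None

# Move (R1),R2 : 1) R1out, MARin, Read  2) MDRinE, WMFC  3) MDRout, R2in
MAR = R1
MDR = memory[MAR]        # completes when the MFC signal arrives
R2 = MDR

# Move R2,(R1) : 1) R1out, MARin  2) R2out, MDRin, Write  3) MDRoutE, WMFC
MAR = R1
MDR = R2
memory[MAR] = MDR        # write completes when MFC arrives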
EXECUTION OF A COMPLETE INSTRUCTION:
Consider the instruction Add (R3),R1 which adds the contents of a memory-
location pointed by R3 to register R1. Executing this instruction requires the
following actions:
1) Fetch the instruction.
2) Fetch the first operand.
3) Perform the addition &
4) Load the result into R1.
Instruction execution proceeds as follows:
Step1--> The instruction-fetch operation is initiated by
→ loading contents of PC into MAR &
→ sending a Read request to memory.
The Select signal is set to Select4, which causes the Mux to select constant 4. This
value is added to operand at input B (PC’s content), and the result is stored in Z.
Step2--> Updated value in Z is moved to PC. This completes the PC increment
operation and PC will now point to next instruction.
Step3--> Fetched instruction is moved into MDR and then to IR. Steps 1 through
3 constitute the Fetch Phase.
At the beginning of step 4, the instruction decoder interprets the contents of the
IR. This enables the control circuitry to activate the control-signals for steps 4
through 7.
Steps 4 through 7 constitute the Execution Phase.
Step4--> Contents of R3 are loaded into MAR & a memory read signal is issued.
Step5--> Contents of R1 are transferred to Y to prepare for addition.
Step6--> When Read operation is completed, memory-operand is available in MDR,
and the addition is performed.
Step7--> Sum is stored in Z, then transferred to R1. The End signal causes a new
instruction fetch cycle to begin by returning to step 1.
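Steps 1 to 7 above correspond to a control-step sequence along the lines of the list below. It follows the standard single-bus sequence for Add (R3),R1, so treat it as a reference sketch rather than a reproduction of the exact figure:

# Control-step sketch for Add (R3),R1 on the single-bus organization.
control_sequence = [
    "PCout, MARin, Read, Select4, Add, Zin",   # step 1: start fetch, compute PC + 4
    "Zout, PCin, Yin, WMFC",                   # step 2: update PC, wait for memory
    "MDRout, IRin",                            # step 3: fetched instruction into IR
    "R3out, MARin, Read",                      # step 4: request the memory operand
    "R1out, Yin, WMFC",                        # step 5: first operand into Y
    "MDRout, SelectY, Add, Zin",               # step 6: add memory operand and R1
    "Zout, R1in, End",                         # step 7: result into R1, start next fetch
]
for i, signals in enumerate(control_sequence, 1):
    print(f"Step {i}: {signals}")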
BRANCHING INSTRUCTIONS: A branch instruction replaces the contents of PC with
branch target address. Control sequence for an unconditional branch instruction is
as follows:
Instruction execution proceeds as follows:
Step 1-3--> The processing starts & the fetch phase ends in step3.
Step 4--> The offset-value is extracted from IR by instruction-decoding circuit.
Since the updated value of PC is already available in register Y, the offset X is gated
onto the bus, and an addition operation is performed.
Step 5--> the result, which is the branch-address, is loaded into the PC.
• The branch instruction loads the branch target address into the PC, so that the processor fetches
the next instruction from the branch target address.
• The branch target address is usually obtained by adding an offset to the contents
of the PC.
• The offset X is usually the difference between the branch target-address and the
address immediately following the branch instruction.
In case of conditional branch,
• we have to check the status of the condition-codes before loading a new value
into the PC. e.g.: Offset-field-of-IRout, Add, Zin, If N=0 then End
• If N=0, processor returns to step 1 immediately after step 4.
• If N=1, step 5 is performed to load a new value into PC.
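The PC arithmetic behind a branch can be summarised in a few lines; the offset value and the state of the condition flag N below are assumptions for illustration only:

# Sketch of branch-target computation (offset and N are illustrative values).
PC = 2000
offset_X = 40            # extracted from the offset field of the IR
N = 1                    # condition-code flag tested by the conditional branch

PC_updated = PC + 4      # updated PC value, already available in register Y
if N == 1:               # condition met (always true for an unconditional branch)
    PC = PC_updated + offset_X    # branch-target address is loaded into the PC
else:
    PC = PC_updated               # condition not met: continue sequentially
print(PC)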
Pipelining:
Pipelining is a means of executing machine instructions concurrently.
Basic Concepts:
Pipelining is a particularly effective way of organizing concurrent activity in a
computer system. The basic idea is very simple, and it is frequently encountered in
manufacturing plants, where pipelining is commonly known as an assembly-line
operation.
Example: car manufacturing
Pipelining example of a car manufacturing
• The first station in an assembly line may prepare the chassis of a car, the next
station adds the body, the next one installs the engine, and so on.
• While one group of workers is installing the engine on one car, another group is
fitting a car body on the chassis of another car, and yet another group is
preparing a new chassis for a third car.
• It may take days to complete work on a given car, but it is possible to have a new
car rolling off the end of the assembly line every few minutes.
Consider how the idea of pipelining can be used in a computer.
The processor executes a program by fetching and executing instructions, one after
the other.
Let Fi and Ei refer to the fetch and execute steps for instruction Ii .
Execution of a program consists of a sequence of fetch and execute steps, as shown
in Figure 8.1a.
Basic idea of pipelining
• The computer is controlled by a clock whose period is such that the fetch and
execute steps of any instruction can each be completed in one clock cycle.
Operation of the computer proceeds as in Figure 8.1c.
• In the first clock cycle, the fetch unit fetches an instruction I1 (step F1) and stores
it in buffer B1 at the end of the clock cycle.
• In the second clock cycle, the instruction fetch unit proceeds with the fetch
operation for instruction I2 (step F2).
• Meanwhile, the execution unit performs the operation specified by instruction I1,
which is available to it in buffer B1 (step E1). By the end of the second clock cycle,
the execution of instruction I1 is completed and instruction I2 is available
• Instruction I2 is stored in B1, replacing I1, which is no longer needed. Step E2 is
performed by the execution unit during the third clock cycle, while instruction I3
is being fetched by the fetch unit. In this manner, both the fetch and execute units
are kept busy all the time.
The processing of an instruction need not be divided into only two steps. For
example, a pipelined processor may process each instruction in four steps, as
follows:
F Fetch: read the instruction from the memory.
D Decode: decode the instruction and fetch the source operand(s).
E Execute: perform the operation specified by the instruction.
W Write: store the result in the destination location.
The sequence of events for this case is shown in Figure 8.2a
Four instructions are in progress at any given
time. This means that four distinct hardware
units are needed, as shown in Figure 8.2b.
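The timing benefit can be checked with a short calculation: n instructions on a k-stage pipeline, with one clock cycle per stage and no stalls, finish in k + (n - 1) cycles instead of n * k. The instruction count below is an assumed value; the sketch also prints which stage each instruction occupies in each cycle:

# Illustrative timing of the 4-stage (F, D, E, W) pipeline with no stalls.
stages = ["F", "D", "E", "W"]
n = 6                                     # number of instructions (assumed)
total_cycles = len(stages) + (n - 1)      # 4 + 5 = 9 cycles instead of 6 * 4 = 24
for i in range(n):                        # instruction I(i+1) enters stage s in cycle i+s+1
    row = ["  "] * total_cycles
    for s, name in enumerate(stages):
        row[i + s] = name + str(i + 1)
    print(" ".join(row))
print("Total cycles:", total_cycles)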
ROLE OF CACHE MEMORY and PIPELINE PERFORMANCE: Refer pdf sent
Note: Please refer textbook along with these ppts.
Solve the problems from the exercises.