COMPUTER ORGANIZATION (UNIT - 2) - Note

UNIT 2

BASIC PROCESSING UNIT:


REGISTER TRANSFERS
• Instruction execution involves a sequence of steps in which data are transferred from one
register to another.
• Input & output of register Ri is connected to bus via switches controlled by 2 control-
signals: Riin & Riout. These are called gating signals.
• When Riin=1, data on bus is loaded into Ri. Similarly, when Riout=1, content of Ri is placed
on bus.
• When Riout=0, bus can be used for transferring data from other registers.
• All operations and data transfers within the processor take place within time-periods
defined by the processor clock.
• When edge-triggered flip-flops are not used, 2 or more clock-signals may be needed to
guarantee proper transfer of data. This is known as multiphase clocking.

Input & Output Gating for one Register Bit


• A 2-input multiplexer is used to select the data applied to the input of an edge-triggered D
flip-flop.
• When Riin=1, the mux selects the data on the bus. This data will be loaded into the flip-flop
at the rising edge of the clock.
When Riin=0, the mux feeds back the value currently stored in the flip-flop.
• Q output of flip-flop is connected to bus via a tri-state gate.
When Riout=0, gate's output is in the high-impedance state. (This corresponds to the open-
circuit state of a switch).
When Riout=1, the gate drives the bus to 0 or 1, depending on the value of Q.
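The gating behaviour just described can be sketched in software. The following is a minimal illustrative model, not a circuit description; the class name RegisterBit and the arguments riin, riout and bus_value are invented for this example.

    # Minimal software model of one register bit with input/output gating.
    # riin/riout play the role of the gating signals; None models a floating bus.
    class RegisterBit:
        def __init__(self):
            self.q = 0  # value stored in the edge-triggered D flip-flop

        def clock_edge(self, riin, bus_value):
            # When Riin = 1 the mux selects the bus; otherwise the old value is fed back.
            if riin:
                self.q = bus_value

        def drive_bus(self, riout):
            # When Riout = 1 the tri-state gate drives the bus with Q;
            # when Riout = 0 the gate is in the high-impedance state (returns None).
            return self.q if riout else None

    bit = RegisterBit()
    bit.clock_edge(riin=1, bus_value=1)   # load 1 from the bus
    print(bit.drive_bus(riout=1))         # 1    -> the bit drives the bus
    print(bit.drive_bus(riout=0))         # None -> high impedance, bus free for other registers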
PERFORMING AN ARITHMETIC OR LOGIC OPERATION
• The ALU performs arithmetic operations on the 2 operands applied to its A and B inputs.
• One of the operands is output of MUX & the other operand is obtained directly from bus.
• The result (produced by the ALU) is stored temporarily in register Z.

• The sequence of operations for [R3] ← [R1]+[R2] is as follows


1) R1out, Yin //transfer the contents of R1 to Y register
2) R2out, SelectY, Add, Zin //R2 contents are transferred directly to B input of ALU.
// The numbers are added. Sum stored in register Z
3) Zout, R3in //sum is transferred to register R3
• The signals are activated for the duration of the clock cycle corresponding to that step. All
other signals are inactive.
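To make these three control steps concrete, here is a small illustrative walk-through in Python. The register contents and the dictionary representation are assumptions made only for this sketch; it simply mirrors the gating signals listed above.

    # Illustrative walk-through of the control sequence for [R3] <- [R1] + [R2].
    regs = {"R1": 5, "R2": 7, "R3": 0, "Y": 0, "Z": 0}

    # Step 1: R1out, Yin -- contents of R1 go over the bus into Y
    bus = regs["R1"]
    regs["Y"] = bus

    # Step 2: R2out, SelectY, Add, Zin -- R2 goes to ALU input B, Y is selected for input A
    bus = regs["R2"]
    alu_result = regs["Y"] + bus          # Add
    regs["Z"] = alu_result                # Zin

    # Step 3: Zout, R3in -- the sum is transferred from Z to R3
    bus = regs["Z"]
    regs["R3"] = bus

    print(regs["R3"])   # 12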
Write the complete control sequence for the instruction : Move (Rs),Rd
• This instruction copies the contents of memory-location pointed to by Rs into Rd. This is a
memory read operation. This requires the following actions
→ fetch the instruction
→ fetch the operand (i.e. the contents of the memory-location pointed by Rs).
→ transfer the data to Rd.
• The control-sequence is written as follows
1) PCout, MARin, Read, Select4, Add, Zin
2) Zout, PCin, Yin, WMFC
3) MDRout, IRin
4) Rsout, MARin, Read
5) MDRinE, WMFC
6) MDRout, Rdin, End

FETCHING A WORD FROM MEMORY


• To fetch instruction/data from memory, processor transfers required address to MAR
(whose output is connected to address-lines of memory-bus).
At the same time, processor issues Read signal on control-lines of memory-bus.
• When requested-data are received from memory, they are stored in MDR. From MDR,
they are transferred to other registers
• MFC (Memory Function Completed): Addressed-device sets MFC to 1 to indicate that the
contents of the specified location
→ have been read &
→ are available on data-lines of memory-bus
• Consider the instruction Move (R1),R2. The sequence of steps is:
1) R1out, MARin, Read ;desired address is loaded into MAR & Read command is issued
2) MDRinE, WMFC ;load MDR from memory bus & Wait for MFC response from
memory
3) MDRout, R2in ;load R2 from MDR
(WMFC is the control signal that causes the processor's control circuitry to wait for the
arrival of the MFC signal.)

Storing a Word in Memory


• Consider the instruction Move R2,(R1). This requires the following sequence
1) R1out, MARin ;desired address is loaded into MAR
2) R2out, MDRin, Write ;data to be written are loaded into MDR & Write
command is issued
3) MDRoutE, WMFC ;data to be written are transferred from MDR to the memory-location pointed to by R1

EXECUTION OF A COMPLETE INSTRUCTION


• Consider the instruction Add (R3),R1 which adds the contents of a memory-
location pointed by R3 to register R1. Executing this instruction requires the following
actions:
1) Fetch the instruction.
2) Fetch the first operand.
3) Perform the addition.
4) Load the result into R1.
• Control sequence for execution of this instruction is as follows
1) PCout, MARin, Read, Select4, Add, Zin
2) Zout, PCin, Yin, WMFC
3) MDRout, IRin
4) R3out, MARin, Read
5) R1out, Yin, WMFC
6) MDRout, SelectY, Add, Zin
7) Zout, R1in, End
• Instruction execution proceeds as follows:
Step1--> The instruction-fetch operation is initiated by loading contents of PC into
MAR & sending a Read request to memory. The Select signal is set to Select4, which causes
the Mux to select constant 4. This value is added to operand at input B (PC's content), and
the result is stored in Z
Step2--> Updated value in Z is moved to PC.
Step3--> Fetched instruction is moved into MDR and then to IR.
Step4--> Contents of R3 are loaded into MAR & a memory read signal is issued.
Step5 -->Contents of R1 are transferred to Y to prepare for addition.
Step6--> When the Read operation is completed, the memory-operand is available in MDR;
the addition is performed and the sum is stored in Z.
Step7--> The sum is transferred from Z to R1. The End signal causes a new
instruction fetch cycle to begin by returning to step1.

BRANCHING INSTRUCTIONS
• Control sequence for an unconditional branch instruction is as follows:
1) PCout, MARin, Read, Select4,Add,Zin
2) Zout, PCin, Yin, WMFC
3) MDRout, IRin
4) Offset-field-of-IRout, Add, Zin
5) Zout, PCin, End
• The processing starts as usual with the fetch phase, which ends in step 3.
• In step 4, the offset-value is extracted from IR by instruction-decoding circuit.
• Since the updated value of PC is already available in register Y, the offset X is gated
onto the bus, and an addition operation is performed.
• In step 5, the result, which is the branch-address, is loaded into the PC.
• The offset X used in a branch instruction is usually the difference between the
branch target-address and the address immediately following the branch instruction. (For
example, if the branch instruction is at location 1000 and branch target-address is 1200, then
the value of X must be 196, since the PC will contain the address 1004 after fetching
the instruction at location 1000).
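As a quick check of the numbers in this example, the offset can be computed as the target address minus the updated PC. The variable names below are purely illustrative.

    # Offset X of a branch = target address - address of the instruction that follows the branch.
    # Using the example from the text: branch at 1000, target 1200, 4-byte instructions.
    branch_address = 1000
    target_address = 1200
    updated_pc = branch_address + 4        # PC after fetching the branch instruction
    X = target_address - updated_pc
    print(X)                               # 196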
• In case of conditional branch, we need to check the status of the condition-codes
before loading a new value into the PC.
e.g.: Offset-field-of-IRout, Add, Zin, If N=0 then End
If N=0, processor returns to step 1 immediately after step 4.
If N=1, step 5 is performed to load a new value into PC.

MULTIPLE BUS ORGANIZATION


• All general-purpose registers are combined into a single block called the register
file.
• The register-file has 3 ports:
1) Two output-ports allow the contents of 2 different registers to be simultaneously
placed on buses A & B.
2) A third input-port allows data on bus C to be loaded into a third register during the
same clock-cycle.
• Buses A and B are used to transfer source-operands to A & B inputs of ALU.
• Result is transferred to destination over bus C.
• Incrementer-unit is used to increment PC by 4.
• Control sequence for the instruction Add R4,R5,R6 is as follows
1) PCout, R=B, MARin, Read, IncPC
2) WMFC
3) MDRout, R=B, IRin
4) R4outA, R5outB, SelectA, Add, R6in, End
• Instruction execution proceeds as follows:
Step 1--> Contents of PC are passed through ALU using R=B control-signal
and loaded into MAR to start a memory Read operation. At the same time, PC is
incremented by 4.
Step2--> Processor waits for MFC signal from memory.
Step3--> Processor loads requested-data into MDR, and then transfers them
to IR.
Step4--> The instruction is decoded and the add operation takes place in a single step.
Note:
To execute instructions, the processor must have some means of generating the
control signals needed in the proper sequence.
There are two approaches for this purpose: 1) Hardwired control and 2)
Microprogrammed control

HARDWIRED CONTROL
• Decoder/encoder block is a combinational-circuit that generates required control-
outputs depending on state of all its inputs.
• Step-decoder provides a separate signal line for each step in the control sequence.
Similarly, output of instruction-decoder consists of a separate line for each machine
instruction.
• For any instruction loaded in IR, one of the output-lines INS1 through INSm is set to
1, and all other lines are set to 0.
• The input signals to encoder-block are combined to generate the individual
control-signals Yin, PCout, Add, End and so on.
• For example, Zin = T1 + T6.ADD + T4.BR ;This signal is asserted during time-slot T1 for
all instructions, during T6 for an Add instruction, and during T4 for an unconditional branch
instruction.
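The logic function Zin = T1 + T6.ADD + T4.BR can be written directly as a boolean expression. The sketch below is only an illustration of the encoder logic for this one signal; representing the step-decoder output as a dictionary t is an assumption of the example.

    # Combinational logic for the Zin control signal, Zin = T1 + T6.ADD + T4.BR.
    def zin(t, add, br):
        # t is the one-hot output of the step decoder, e.g. t[6] == 1 during time slot T6;
        # add/br are the instruction-decoder outputs for Add and unconditional Branch.
        return bool(t[1] or (t[6] and add) or (t[4] and br))

    t = {i: 0 for i in range(1, 8)}
    t[6] = 1
    print(zin(t, add=1, br=0))   # True: time slot T6 of an Add instruction asserts Zin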
• When RUN=1, counter is incremented by 1 at the end of every clock cycle. When
RUN=0, counter stops counting.
• Sequence of operations carried out by this machine is determined by wiring of logic
elements, hence the name “hardwired”.
• Advantage: Can operate at high speed. Disadvantage: Limited flexibility.
COMPLETE PROCESSOR
• This has separate processing-units to deal with integer data and floating-point
data.
• A data-cache is inserted between these processing-units & main-memory.
• Instruction-unit fetches instructions
→ from an instruction-cache or
→ from main-memory when desired instructions are not already in cache
• Processor is connected to system-bus & hence to the rest of the computer by
means of a bus interface
• Using separate caches for instructions & data is common practice in many
processors today.
• A processor may include several units of each type to increase the potential for
concurrent operations.

MICROPROGRAMMED CONTROL
• Control-signals are generated by a program similar to machine language programs.
• Control word(CW) is a word whose individual bits represent various control-signals(like
Add, End, Zin). {Each of the control-steps in control sequence of an instruction defines a
unique combination of 1s & 0s in the CW}.
• A sequence of CWs corresponding to the control-sequence of a machine instruction
constitutes the microroutine.
• The individual control-words in this microroutine are referred to as microinstructions.
• The microroutines for all instructions in the instruction-set of a computer are
stored in a special memory called the control store(CS).
• Control-unit generates control-signals for any instruction by sequentially reading
CWs of corresponding microroutine from CS.
• Microprogram counter(µPC) is used to read CWs sequentially from CS.
• Every time a new instruction is loaded into IR, output of "starting address
generator" is loaded into µPC.
• Then, µPC is automatically incremented by the clock, causing successive microinstructions
to be read from CS. Hence, control-signals are delivered to various parts of the processor in
the correct sequence.
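The fetch-and-execute loop of the control unit can be pictured with a small model: the µPC indexes the control store and is incremented every cycle until an End signal appears. The control-store addresses and contents below are invented for illustration only.

    # Minimal model of microprogrammed control: the uPC walks through the control store,
    # emitting one control word (a set of control signals) per clock cycle.
    control_store = {
        100: {"PCout", "MARin", "Read", "Select4", "Add", "Zin"},   # addresses and
        101: {"Zout", "PCin", "Yin", "WMFC"},                       # contents are
        102: {"MDRout", "IRin", "End"},                             # illustrative only
    }

    def run_microroutine(start_address):
        upc = start_address                  # output of the starting-address generator
        while True:
            cw = control_store[upc]          # read the control word from the CS
            print(upc, sorted(cw))           # "deliver" the control signals
            if "End" in cw:
                break
            upc += 1                         # uPC incremented by the clock

    run_microroutine(100)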

MICROINSTRUCTIONS
• Drawbacks of microprogrammed control:
1) Assigning individual bits to each control-signal results in long microinstructions because
the number of required signals is usually large.
2) Available bit-space is poorly used because only a few bits are set to 1 in any given
microinstruction.
• Solution: Signals can be grouped because
1) Most signals are not needed simultaneously.
2) Many signals are mutually exclusive.
• Grouping control-signals into fields requires a little more hardware because decoding-circuits
must be used to decode the bit patterns of each field into individual control signals.
• Advantage: This method results in a smaller control-store (only 20 bits are needed to store the
patterns for the 42 signals).
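As an illustration of such grouping, a single 3-bit field can encode one of up to seven mutually exclusive register-out signals (with 000 meaning none), so 3 bits replace 7 individual bits. The particular field assignment below is an assumption made for the example, not the encoding of any specific machine.

    # Illustration of grouping mutually exclusive signals into a coded field:
    # a 3-bit field selects at most one register-out signal, so 3 bits replace 7.
    OUT_FIELD = {0b000: None, 0b001: "PCout", 0b010: "MDRout", 0b011: "Zout",
                 0b100: "R0out", 0b101: "R1out", 0b110: "R2out", 0b111: "R3out"}

    def decode_out_field(bits):
        # The decoding circuit turns the coded field back into an individual signal.
        signal = OUT_FIELD[bits]
        return {signal} if signal else set()

    print(decode_out_field(0b011))   # {'Zout'}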

MICROPROGRAM SEQUENCING
• The two major disadvantages of microprogrammed control are:

1) Having a separate microroutine for each machine instruction results in a large total
number of microinstructions and a large control-store.

2) Execution time is longer because it takes more time to carry out the required branches.

• Consider the instruction Add src,Rdst ;which adds the source-operand to the contents of
Rdst and places the sum in Rdst.

• The source-operand can be specified in any of the following addressing modes: register,
autoincrement, autodecrement and indexed, as well as the indirect forms of these 4 modes.

• Each box in the chart corresponds to a microinstruction that controls the transfers and
operations indicated within the box.

• The microinstruction is located at the address indicated by the octal number (001,002).
WIDE BRANCH ADDRESSING
• The instruction-decoder(InstDec) generates the starting-address of the microroutine that
implements the instruction that has just been loaded into the IR.

• Here, register IR contains the Add instruction, for which the instruction decoder generates
the microinstruction address 101. (However, this address cannot be loaded as is into the μPC).

• The source-operand can be specified in any of several addressing-modes. The bit-ORing
technique can be used to modify the starting-address generated by the instruction-decoder to reach
the appropriate path.

Use of WMFC
• WMFC signal is issued at location 112 which causes a branch to the microinstruction in
location 171.

• The WMFC signal means that the memory access may take several clock cycles to complete. If
the branch were allowed to happen in the first clock cycle, the microinstruction at location 171 would be
fetched and executed prematurely. To avoid this problem, the WMFC signal must inhibit any change in
the contents of the μPC during the waiting-period.

Detailed Examination
• Consider Add (Rsrc)+,Rdst, which adds the contents of the memory-location pointed to by Rsrc
to the contents of Rdst, stores the sum in Rdst and finally increments Rsrc by 4 (i.e. auto-increment mode).

• In bits 10 and 9, the bit-patterns 11, 10, 01 and 00 denote the indexed, auto-decrement, auto-
increment and register modes respectively. For each of these modes, bit 8 is used to specify the
indirect version.

• The processor has 16 registers that can be used for addressing purposes, each specified
using a 4-bit code.

• There are 2 stages of decoding:

1) The microinstruction field must be decoded to determine that an Rsrc or Rdst register is involved.

2) The decoded output is then used to gate the contents of the Rsrc or Rdst fields in the IR into a
second decoder, which produces the gating-signals for the actual registers R0 to R15.

MICROINSTRUCTIONS WITH NEXT-ADDRESS FIELDS


• The microprogram requires several branch microinstructions which perform no useful operation.
Thus, they detract from the operating speed of the computer.

• Solution: Include an address-field as a part of every microinstruction to indicate the location of the
next microinstruction to be fetched. (This means every microinstruction becomes a branch
microinstruction).

• The flexibility of this approach comes at the expense of additional bits for the address-field.

• Advantage: Separate branch microinstructions are virtually eliminated. There are few limitations in
assigning addresses to microinstructions. There is no need for a counter to keep track of sequential
addresses. Hence, the μPC is replaced with a μAR (Microinstruction Address Register), which is
loaded from the next-address field in each microinstruction.

• The next-address bits are fed through the OR gate to the μAR, so that the address can be modified
on the basis of the data in the IR, external inputs and condition-codes.

• The decoding circuits generate the starting-address of a given microroutine on the basis of the
opcode in the IR.

PREFETCHING MICROINSTRUCTIONS
• Drawback of microprogrammed control: Slower operating speed because of the time it takes to
fetch microinstructions from the control-store.

• Solution: Faster operation is achieved if the next microinstruction is pre-fetched while the current
one is being executed.

Emulation
• The main function of microprogrammed control is to provide a means for simple, flexible and
relatively inexpensive execution of machine instructions.

• Its flexibility in using a machine's resources allows diverse classes of instructions to be
implemented.

• Suppose we add to the instruction repertoire of a given computer M1 an entirely new set of
instructions that is in fact the instruction-set of a different computer M2.

• Programs written in the machine language of M2 can then be run on computer M1, i.e. M1
emulates M2.

• Emulation allows us to replace obsolete equipment with more up-to-date machines.

• If the replacement computer fully emulates the original one, then no software changes have to be
made to run existing programs.

• Emulation is easiest when the machines involved have similar architectures.

CACHE MEMORIES
MAPPING FUNCTION
Prefetching is a computer system process that involves bringing data into a cache before it's
needed, which helps reduce the time it takes to fetch data from memory. Prefetching can be
done automatically or explicitly by programmers to optimize performance.

Here are some advantages of prefetching:

• Reduced latency
Prefetching can reduce latency by preloading resources before they're needed. This means
that when an access occurs, the content is already stored in the cache, which
reduces the time it takes to service the request.

• Faster access
Because cache memories are typically much faster to access than main memory, prefetching
data and then accessing it from caches is usually much faster than accessing it directly from
main memory

What is a prefetch microinstruction?

Prefetching of microinstructions is a technique used by processors to boost execution
performance: the next microinstruction is fetched from the control-store while the current one
is still being executed, so that control-store access time does not slow down execution.

Emulation in computer organization is the process of using a program, application, or device
to mimic the behavior of another program or device. Emulation can be used for many
purposes, including:

• Running software
Emulation can be used to run software programs or operating systems on hardware or in
operating systems that they weren't originally designed for.

• Running peripherals
Emulation can be used to run peripherals designed for a different system.

• Analyzing files and URLs


Emulation environments can be used to analyze potentially malicious files and URLs.

• Debugging and verifying systems


Hardware emulation can be used to debug and verify systems that are still being designed.

• Connecting devices
Emulation can be used to connect devices to each other or to a mainframe computer.

Rosetta 2 is an example of an emulator that uses emulation technology to allow a Mac with
Apple silicon to run applications designed for a Mac with an Intel CPU.

Cache memory is a small, high-speed storage area in a computer. The cache is a smaller and
faster memory that stores copies of the data from frequently used main memory locations.
There are various independent caches in a CPU, which store instructions and data. The most
important use of cache memory is that it is used to reduce the average time to access data
from the main memory.
By storing this information closer to the CPU, cache memory helps speed up the overall
processing time. Cache memory is much faster than the main memory (RAM). When the
CPU needs data, it first checks the cache. If the data is there, the CPU can access it quickly. If
not, it must fetch the data from the slower main memory.
Characteristics of Cache Memory
• Cache memory is an extremely fast memory type that acts as a buffer
between RAM and the CPU.
• Cache Memory holds frequently requested data and instructions so that they are
immediately available to the CPU when needed.
• Cache memory is costlier than main memory or disk memory but more economical
than CPU registers.
• Cache Memory is used to speed up and synchronize with a high-speed CPU.

Levels of Memory
• Level 1 or Registers: registers hold the data the CPU is operating on at that instant. The most
commonly used registers are the Accumulator, Program Counter, Address Register, etc.
• Level 2 or Cache memory: the fastest memory after registers, where data is temporarily
stored for faster access.
• Level 3 or Main Memory: the memory on which the computer currently works. It is limited
in size and, once power is off, data no longer stays in this memory.
• Level 4 or Secondary Memory: external memory that is not as fast as the main
memory, but data stays permanently in this memory.
Cache Performance
When the processor needs to read or write a location in the main memory, it first checks for
a corresponding entry in the cache.
• If the processor finds that the memory location is in the cache, a Cache Hit has
occurred and data is read from the cache.
• If the processor does not find the memory location in the cache, a cache miss has
occurred. For a cache miss, the cache allocates a new entry and copies in data from
the main memory, then the request is fulfilled from the contents of the cache.
The performance of cache memory is frequently measured in terms of a quantity called Hit
ratio.
Hit Ratio(H) = hit / (hit + miss) = no. of hits/total accesses
Miss Ratio = miss / (hit + miss) = no. of miss/total accesses = 1 - hit ratio(H)
Cache performance can be improved by using a larger cache block size, higher associativity,
reducing the miss rate, reducing the miss penalty, and reducing the time to hit in the cache.
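A tiny numeric example of the hit-ratio and miss-ratio formulas above; the hit and miss counts are made-up values for illustration only.

    # Hit ratio and miss ratio from the formulas above.
    hits, misses = 950, 50
    total_accesses = hits + misses
    hit_ratio = hits / total_accesses
    miss_ratio = misses / total_accesses     # equals 1 - hit_ratio
    print(hit_ratio, miss_ratio)             # 0.95 0.05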
Cache Mapping
There are three different types of mapping used for the purpose of cache memory, which are
as follows:
• Direct Mapping
• Associative Mapping
• Set-Associative Mapping
1. Direct Mapping
The simplest technique, known as direct mapping, maps each block of main memory into
only one possible cache line. In direct mapping, each memory block is assigned to a specific
line in the cache. If a line is already occupied by a memory block when a new block needs
to be loaded, the old block is replaced. The address is split into two parts, an index field and
a tag field. The tag is stored in the cache along with the data, and the index selects the cache
line. Direct mapping's performance is directly proportional to the hit ratio.
i = j modulo m
where
i = cache line number
j = main memory block number
m = number of lines in the cache
Direct Mapping
For purposes of cache access, each main memory address can be viewed as consisting of
three fields. The least significant w bits identify a unique word or byte within a block of main
memory. In most contemporary machines, the address is at the byte level. The remaining s
bits specify one of the 2^s blocks of main memory. The cache logic interprets these s bits as a
tag of s−r bits (the most significant portion) and a line field of r bits. This latter field identifies
one of the m = 2^r lines of the cache. The line offset is the index bits in direct mapping.

Direct Mapping – Structure
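The field extraction described above can be sketched as follows. The field widths (w = 2, r = 7) are assumed example values, not values fixed by the text.

    # Splitting a byte address into tag / line (index) / word-offset fields for a
    # direct-mapped cache. Sizes below are assumed example values.
    WORD_BITS = 2          # w: 4 bytes per block
    LINE_BITS = 7          # r: m = 2**r = 128 cache lines

    def direct_map(address):
        offset = address & ((1 << WORD_BITS) - 1)
        line = (address >> WORD_BITS) & ((1 << LINE_BITS) - 1)   # i = j mod m
        tag = address >> (WORD_BITS + LINE_BITS)                 # remaining s - r bits
        return tag, line, offset

    print(direct_map(0x1A2C))   # (tag, cache line, byte within block)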


2. Associative Mapping
In this type of mapping, associative memory is used to store the content and addresses of
the memory word. Any block can go into any line of the cache. This means that the word-id
bits are used to identify which word in the block is needed, and the tag becomes all of the
remaining bits. This enables the placement of any block at any place in the cache memory. It
is considered to be the fastest and most flexible mapping form. In associative mapping, the
index bits are zero.

Associative Mapping – Structure


3. Set-Associative Mapping
This form of mapping is an enhanced form of direct mapping in which the drawbacks of direct
mapping are removed. Set-associative mapping addresses the problem of possible thrashing in the
direct mapping method. Instead of having exactly one line that a block can map to in the cache,
a few lines are grouped together to create a set. A block in memory can then map to any one of
the lines of a specific set. Set-associative mapping therefore allows two or more blocks that share
the same index to be present in the cache at the same time. Set-associative cache mapping combines
the best of the direct and associative cache mapping techniques. In set-associative mapping the index
bits are given by the set-offset bits. In this case, the cache consists of a number of sets, each of
which consists of a number of lines.
Relationships in the Set-Associative Mapping can be defined as:
m = v * k
i = j mod v

where
i = cache set number
j = main memory block number
v = number of sets
m = number of lines in the cache
k = number of lines in each set

Set-Associative Mapping – Structure
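A sketch of the set-index computation i = j mod v, again with assumed example sizes (4-way, 128 lines, 16-byte blocks); none of these numbers come from the text.

    # Set index for a set-associative cache: i = j mod v.
    K = 4                      # k: lines per set (4-way set associative)
    M = 128                    # m: total lines in the cache
    V = M // K                 # v: number of sets, since m = v * k
    BLOCK_SIZE = 16            # bytes per block (assumed)

    def set_index(address):
        j = address // BLOCK_SIZE      # main memory block number
        return j % V                   # i = j mod v

    print(set_index(0x1A2C))           # the set this block maps to (any of its k lines)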


Application of Cache Memory
Here are some of the applications of Cache Memory.
• Primary Cache: A primary cache is always located on the processor chip. This cache is
small and its access time is comparable to that of processor registers.
• Secondary Cache: Secondary cache is placed between the primary cache and the rest
of the memory. It is referred to as the level 2 (L2) cache. Often, the Level 2 cache is
also housed on the processor chip.
• Spatial Locality of Reference: if a memory location is accessed, locations close to it are
likely to be accessed soon. This is why a whole block, not just a single word, is brought
into the cache on a miss.
• Temporal Locality of Reference: a recently accessed location is likely to be accessed again
soon. Replacement policies such as least recently used (LRU) exploit this by keeping
recently used blocks in the cache.
Advantages
• Cache Memory is faster in comparison to main memory and secondary memory.
• Programs stored by Cache Memory can be executed in less time.
• The data access time of Cache Memory is less than that of the main memory.
• Cache Memory stores data and instructions that are regularly used by the CPU,
therefore it increases the performance of the CPU.
Disadvantages
• Cache Memory is costlier than primary memory and secondary memory .

• Data is stored on a temporary basis in Cache Memory.

• Whenever the system is turned off, data and instructions stored in cache memory get
destroyed.
• The high cost of cache memory increases the price of the Computer System.
Conclusion
In conclusion, cache memory plays a crucial role in making computers faster and more
efficient. By storing frequently accessed data close to the CPU , it reduces the time the CPU
spends waiting for information. This means that tasks are completed quicker, and the overall
performance of the computer is improved. Understanding cache memory helps us
appreciate how computers manage to process information so swiftly, making our everyday
digital experiences smoother and more responsive.

Let's start at the beginning and talk about what caching even is.
Caching is the process of storing some data near where it's supposed to be used, rather than
accessing it from an expensive origin every time a request comes in.
Caches are everywhere, from your CPU to your browser, so there's no doubt that caching is
extremely useful. Implementing a high-performance cache system, however, comes with its own
set of challenges. In this section, we'll focus on cache replacement algorithms.

Cache Replacement Algorithms
We talked about what caching is and how we can utilize it, but there's an elephant in the
room: our cache storage is finite, especially in caching environments where high-performance
and expensive storage is used. So in short, we have no choice but to evict some
objects and keep others.
Cache replacement algorithms do just that. They decide which objects can stay and which
objects should be evicted.
After reviewing some of the most important algorithms we go through some of the
challenges that we might encounter.
LRU
The least recently used (LRU) algorithm is one of the most famous cache replacement
algorithms and for good reason!
As the name suggests, LRU keeps the most recently used objects at the top and evicts
objects that haven't been used in a while when the list reaches its maximum capacity.
So it's simply an ordered list where objects are moved to the top every time they're
accessed, pushing other objects down.
LRU is simple and provides a nice cache-hit rate for lots of use-cases.
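A minimal LRU cache in Python, using an ordered dictionary as the "ordered list" described above. The class and method names are just for this sketch.

    # Minimal LRU cache: most recently used keys are moved to the end of the ordered
    # dict; when capacity is exceeded the least recently used key (the front) is evicted.
    from collections import OrderedDict

    class LRUCache:
        def __init__(self, capacity):
            self.capacity = capacity
            self.data = OrderedDict()

        def get(self, key):
            if key not in self.data:
                return None                      # cache miss
            self.data.move_to_end(key)           # mark as most recently used
            return self.data[key]

        def put(self, key, value):
            if key in self.data:
                self.data.move_to_end(key)
            self.data[key] = value
            if len(self.data) > self.capacity:
                self.data.popitem(last=False)    # evict the least recently used entry

    cache = LRUCache(2)
    cache.put("a", 1)
    cache.put("b", 2)
    cache.get("a")                               # touch "a"
    cache.put("c", 3)                            # evicts "b", the least recently used
    print(cache.get("b"))                        # None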
LFU
The least frequently used (LFU) algorithm works similarly to LRU, except it keeps track of how
many times an object was accessed instead of how recently it was accessed.
Each object has a counter that counts how many times it was accessed. When the list
reaches the maximum capacity, objects with the lowest counters are evicted.
LFU has a famous problem. Imagine an object was repeatedly accessed for a short period
only. Its counter increases by a magnitude compared to others so it's very hard to evict this
object even if it's not accessed for a long time.
FIFO
FIFO (first-in-first-out) is also used as a cache replacement algorithm and behaves exactly as
you would expect: objects are added to the queue and are evicted in the same order.
It provides a simple and low-cost way to manage the cache, but even the
most used objects are eventually evicted when they're old enough.
Random Replacement (RR)
This algorithm randomly selects an object to evict when the cache reaches maximum capacity. It has
the benefit of not keeping any reference or history of objects and being very simple to
implement at the same time.
This algorithm has been used in ARM processors and the famous Intel i860.

Memory Interleaving



Abstraction is one of the most important aspects of computing, and memory interleaving is,
more or less, an abstraction technique. It is a technique that divides memory into a number of
modules such that successive words in the address space are placed in different modules.
Consecutive Word in a Module:
Figure-1: Consecutive Word in a Module
Let us assume 16 data words are to be stored across the four modules, where module 00 is
Module 1, module 01 is Module 2, module 10 is Module 3 and module 11 is Module 4, and
10, 20, 30, ..., 160 are the data to be stored.
From the figure above, Module 1 receives the data 10, then 20, 30 and finally 40. That is, the
data are placed consecutively in a module until it reaches its maximum capacity.
The most significant bits (MSB) provide the address of the module & the least significant bits
(LSB) provide the address of the data within the module.
For example, to get 90 (data), the processor provides the address 1000. The bits 10 indicate
that the data is in module 10 (Module 3) & 00 is the address of 90 within Module 3.
So,

Module 1 Contains Data : 10, 20, 30, 40


Module 2 Contains Data : 50, 60, 70, 80
Module 3 Contains Data : 90, 100, 110, 120
Module 4 Contains Data : 130, 140, 150, 160
Consecutive Word in Consecutive Module:

Figure-2: Consecutive Word in Consecutive Module


Now again we assume 16 data words are to be stored across the four modules, but now
consecutive data are placed in consecutive modules. That is, 10 [data] is placed in Module 1,
20 [data] in Module 2, and so on.
The least significant bits (LSB) provide the address of the module & the most significant bits
(MSB) provide the address of the data within the module.
For example, to get 90 (data), the processor provides the address 1000. The bits 00 indicate
that the data is in module 00 (Module 1) & 10 is the address of 90 within Module 1.
That is,

Module 1 Contains Data : 10, 50, 90, 130


Module 2 Contains Data : 20, 60, 100, 140
Module 3 Contains Data : 30, 70, 110, 150
Module 4 Contains Data : 40, 80, 120, 160
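The two address decompositions can be checked with a few lines of code. The module/offset split below assumes the 4-module, 16-word example used above; the function names are invented for this sketch.

    # Address decomposition for the two arrangements described above (4 modules,
    # 4 words per module, 4-bit addresses from the example).
    def consecutive_in_module(address):      # Figure-1 style: MSBs select the module
        module = address >> 2                # two most significant bits
        offset = address & 0b11              # two least significant bits
        return module, offset

    def interleaved(address):                # Figure-2 style: LSBs select the module
        module = address & 0b11              # two least significant bits
        offset = address >> 2                # two most significant bits
        return module, offset

    print(consecutive_in_module(0b1000))     # (2, 0): module 10 (Module 3), word 00 -> data 90
    print(interleaved(0b1000))               # (0, 2): module 00 (Module 1), word 10 -> data 90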
Why do we use Memory Interleaving? [Advantages]:
Whenever the processor requests data from the main memory, a block (chunk) of data is
transferred to the cache and then to the processor. So whenever a cache miss occurs, the data
has to be fetched from the main memory. But main memory is relatively slower than the cache,
so interleaving is used to improve the effective access time of the main memory.
All four modules can be accessed at the same time, thus achieving parallelism. With the
arrangement of Figure 2, consecutive words lie in different modules and can be fetched in
parallel. This method uses memory effectively.

hit rate and miss penalty in cache memory



Hit Rate: The hit rate, also known as the cache hit ratio, is the percentage of memory
accesses that find the required data in the cache. It's a measure of how often the cache
provides the needed data, reducing the need for slower main memory accesses.
Hit Rate = (Number of cache hits) / (Number of cache accesses)
A higher hit rate indicates better cache performance, as it reduces the number of main
memory accesses, which are slower and more energy-consuming.
Miss Penalty: A miss penalty, also known as a cache miss penalty, is the additional time it
takes to access main memory when the cache doesn't have the required data. This penalty is
incurred when the processor needs to fetch data from main memory, which is slower than
accessing the cache.
The miss penalty is typically measured in clock cycles and can be broken down into several
components:
1. Cache miss detection: The time it takes for the processor to determine that the
cache doesn't have the required data.
2. Memory access: The time it takes to access main memory to retrieve the required
data.
3. Data transfer: The time it takes to transfer the data from main memory to the cache.
The miss penalty can be significant, often ranging from tens to hundreds of clock cycles,
depending on the system architecture and memory technology.
To minimize the miss penalty, cache hierarchies are often used, where smaller, faster caches
(e.g., L1, L2) are used to filter out most cache misses, and larger, slower caches (e.g., L3, L4)
or main memory are used to handle the remaining misses.
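Hit rate and miss penalty are usually combined into an average memory access time (a standard relation, though not stated explicitly above); the figures below are assumed example values, not measurements.

    # Average memory access time = hit time + miss rate * miss penalty.
    hit_time = 1            # clock cycles to access the cache
    miss_rate = 0.05        # 1 - hit ratio
    miss_penalty = 100      # extra clock cycles to fetch the block from main memory

    average_access_time = hit_time + miss_rate * miss_penalty
    print(average_access_time)   # 6.0 cycles on average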

caches on processor chip

Caches on a processor chip are small, fast memory storage locations that store frequently-
used data or instructions. They act as a buffer between the main memory and the processor,
providing quick access to the data and instructions that the processor needs to execute.
Here's a breakdown of the different levels of caches typically found on a processor chip:
Level 1 Cache (L1 Cache)
The L1 cache is the smallest and fastest cache level, built into the processor core. It's usually
around 8-64 KB in size and has a very low latency (around 1-2 clock cycles).
Level 2 Cache (L2 Cache)
The L2 cache is larger than the L1 cache, typically ranging from 256 KB to 512 KB in size. It's
also built into the processor chip and has a slightly higher latency than the L1 cache (around
5-10 clock cycles).
Level 3 Cache (L3 Cache)
The L3 cache is shared among multiple processor cores in a multi-core processor. It's usually
larger than the L2 cache, ranging from 1 MB to 64 MB in size. The latency is higher than the
L2 cache, but still lower than the main memory (around 20-30 clock cycles).
Benefits of Caches
Caches provide several benefits, including:
• Improved performance: By storing frequently-used data and instructions in a fast,
local memory, caches reduce the time it takes for the processor to access the main
memory.
• Reduced power consumption: Caches reduce the number of times the processor
needs to access the main memory, which consumes more power.
• Increased bandwidth: Caches can handle multiple requests simultaneously,
increasing the overall bandwidth of the processor.

Virtual memory address translation is a crucial concept in computer organization.


What is Virtual Memory?
Virtual memory is a memory management technique that allows a computer to use more
memory than is physically available in the system's RAM (Random Access Memory). It uses a
combination of RAM and secondary storage (hard disk or solid-state drive) to provide a large
address space for programs to run.
Address Translation
In a virtual memory system, the CPU generates virtual addresses, which are not the actual
physical addresses in memory. The virtual addresses need to be translated into physical
addresses before the CPU can access the memory. This process is called address translation.
Address Translation Steps
Here are the steps involved in virtual memory address translation:
1. Virtual Address Generation: The CPU generates a virtual address, which is divided
into two parts: a page number and an offset.
2. Page Table Lookup: The CPU uses the page number to index into a page table, which
is a data structure that contains the mapping of virtual pages to physical pages.
3. Page Table Entry (PTE) Retrieval: The CPU retrieves the Page Table Entry (PTE)
corresponding to the virtual page number.
4. Physical Page Number Retrieval: The PTE contains the physical page number, which
is used to locate the physical page in memory.
5. Offset Addition: The CPU adds the offset to the physical page number to generate
the physical address.
6. Memory Access: The CPU accesses the memory location using the physical address.
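A minimal sketch of these steps in Python; the page size, page-table contents and addresses are all assumed example values, and the page table is modelled as a simple dictionary.

    # Virtual-to-physical translation: split the virtual address into page number and
    # offset, look up the frame in the page table, then recombine frame and offset.
    PAGE_SIZE = 4096                     # 4 KB pages (assumed)
    page_table = {0: 5, 1: 9, 2: 3}      # virtual page number -> physical frame number

    def translate(virtual_address):
        page_number = virtual_address // PAGE_SIZE       # step 1: split the virtual address
        offset = virtual_address % PAGE_SIZE
        frame_number = page_table[page_number]           # steps 2-4: page-table lookup (PTE)
        return frame_number * PAGE_SIZE + offset         # step 5: add the offset

    print(hex(translate(0x1ABC)))        # virtual page 1, offset 0xABC -> frame 9 -> 0x9abc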
Types of Address Translation
There are two types of address translation:
1. Hardware-based Translation: The CPU's Memory Management Unit (MMU)
performs address translation using a Translation Lookaside Buffer (TLB).
2. Software-based Translation: The operating system performs address translation
using a page table.
Advantages of Virtual Memory Address Translation
1. Memory Virtualization: Virtual memory allows multiple programs to share the same
physical memory space.
2. Memory Protection: Virtual memory provides memory protection by preventing a
program from accessing memory regions allocated to other programs.
3. Efficient Memory Use: Virtual memory allows efficient use of physical memory by
swapping out inactive pages to secondary storage.
