COMPUTER ORGANIZATION (UNIT - 2) - Note
BRANCHING INSTRUCTIONS
• Control sequence for an unconditional branch instruction is as follows:
1) PCout, MARin, Read, Select4, Add, Zin
2) Zout, PCin, Yin, WMFC
3) MDRout, IRin
4) Offset-field-of-IRout, Add, Zin
5) Zout, PCin, End
• Instruction processing starts as usual; the fetch phase ends in step 3.
• In step 4, the offset value is extracted from the IR by the instruction-decoding circuit.
• Since the updated value of PC is already available in register Y, the offset X is gated
onto the bus, and an addition operation is performed.
• In step 5, the result, which is the branch-address, is loaded into the PC.
• The offset X used in a branch instruction is usually the difference between the
branch target-address and the address immediately following the branch instruction. (For
example, if the branch instruction is at location 1000 and branch target-address is 1200, then
the value of X must be 196, since the PC will contain the address 1004 after fetching
the instruction at location 1000).
• In case of conditional branch, we need to check the status of the condition-codes
before loading a new value into the PC.
e.g.: Offset-field-of-IRout, Add, Zin, If N=0 then End
If N=0, processor returns to step 1 immediately after step 4.
If N=1, step 5 is performed to load a new value into PC.
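A minimal Python sketch of the branch behaviour described above (the addresses, the 4-byte instruction size and the offset 196 come from the example; the function names are only illustrative):

```python
# Sketch only: in hardware this work is done by the ALU and registers Y/Z.

def branch_target(pc_after_fetch, offset_x):
    """Branch target = updated PC (already in Y) + offset X from the IR."""
    return pc_after_fetch + offset_x

def conditional_branch(pc_after_fetch, offset_x, n_flag):
    """Matches the 'If N=0 then End' step: branch only when N = 1."""
    if n_flag == 0:
        return pc_after_fetch                 # fall through, keep updated PC
    return branch_target(pc_after_fetch, offset_x)

pc_after_fetch = 1000 + 4                     # PC holds 1004 after the fetch
print(branch_target(pc_after_fetch, 196))     # -> 1200, the branch target
print(conditional_branch(pc_after_fetch, 196, n_flag=0))   # -> 1004 (no branch)
```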
HARDWIRED CONTROL
• Decoder/encoder block is a combinational-circuit that generates required control-
outputs depending on state of all its inputs.
• Step-decoder provides a separate signal line for each step in the control sequence.
Similarly, output of instruction-decoder consists of a separate line for each machine
instruction.
• For any instruction loaded in IR, one of the output-lines INS1 through INSm is set to
1, and all other lines are set to 0.
• The input signals to encoder-block are combined to generate the individual
control-signals Yin, PCout, Add, End and so on.
• For example, Zin = T1 + T6.ADD + T4.BR. This signal is asserted during time-slot T1 for
all instructions, during T6 for an Add instruction, and during T4 for an unconditional Branch
instruction (a small sketch of this logic appears at the end of this section).
• When RUN=1, counter is incremented by 1 at the end of every clock cycle. When
RUN=0, counter stops counting.
• Sequence of operations carried out by this machine is determined by wiring of logic
elements, hence the name “hardwired”.
• Advantage: Can operate at high speed. Disadvantage: Limited flexibility.
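A small Python sketch of the encoder equation Zin = T1 + T6.ADD + T4.BR given above (the step and instruction names are taken from the example; the function is only illustrative of the combinational logic):

```python
def zin(step, instruction):
    """Hardwired encoder output for Zin: T1 + T6.ADD + T4.BR."""
    t1 = (step == 1)                      # asserted in T1 for every instruction
    t6_add = (step == 6) and (instruction == "ADD")
    t4_br = (step == 4) and (instruction == "BR")
    return t1 or t6_add or t4_br

print(zin(1, "BR"))    # True  (T1, any instruction)
print(zin(6, "ADD"))   # True  (T6.ADD)
print(zin(4, "BR"))    # True  (T4.BR)
print(zin(4, "ADD"))   # False
```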
COMPLETE PROCESSOR
• This has separate processing-units to deal with integer data and floating-point
data.
• A data-cache is inserted between these processing-units & main-memory.
• Instruction-unit fetches instructions
→ from an instruction-cache or
→ from main-memory when desired instructions are not already in cache
• Processor is connected to system-bus & hence to the rest of the computer by
means of a bus interface
• Using separate caches for instructions & data is common practice in many
processors today.
• A processor may include several units of each type to increase the potential for
concurrent operations.
MICROPROGRAMMED CONTROL
• Control-signals are generated by a program similar to machine language programs.
• Control word(CW) is a word whose individual bits represent various control-signals(like
Add, End, Zin). {Each of the control-steps in control sequence of an instruction defines a
unique combination of 1s & 0s in the CW}.
• Individual control-words in microroutine are referred to as microinstructions.
• A sequence of CWs corresponding to control-sequence of a machine instruction
constitutes the microroutine.
• The microroutines for all instructions in the instruction-set of a computer are
stored in a special memory called the control store(CS).
• Control-unit generates control-signals for any instruction by sequentially reading
CWs of corresponding microroutine from CS.
• Microprogram counter(µPC) is used to read CWs sequentially from CS.
• Every time a new instruction is loaded into IR, output of "starting address
generator" is loaded into µPC.
• Then, the µPC is automatically incremented by the clock, so successive CWs of the
microroutine are read from the CS.
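A minimal Python sketch (with made-up control-store contents) of how the µPC steps sequentially through the CWs of a microroutine:

```python
CONTROL_STORE = {                       # address -> control word (set of signals)
    0: {"PCout", "MARin", "Read", "Select4", "Add", "Zin"},
    1: {"Zout", "PCin", "Yin", "WMFC"},
    2: {"MDRout", "IRin"},
    # ... microroutines for the other machine instructions follow
}

def run_microroutine(start_address, length):
    """Load the starting address into the uPC, then read CWs sequentially."""
    upc = start_address                 # from the starting-address generator
    for _ in range(length):
        control_word = CONTROL_STORE[upc]
        print(f"uPC={upc}: assert {sorted(control_word)}")
        upc += 1                        # uPC incremented by the clock

run_microroutine(0, 3)                  # the fetch phase shown earlier
```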
MICROINSTRUCTIONS
• Drawbacks of microprogrammed control:
1) Assigning individual bits to each control-signal results in long microinstructions, because
the number of required signals is usually large.
2) Available bit-space is poorly used because only a few bits are set to 1 in any given
microinstruction.
• Grouping control-signals into fields requires a little more hardware because decoding-circuits must
be used to decode bit patterns of each field into individual control signals.
• Advantage: This method results in a smaller control-store (only 20 bits are needed to store the
patterns for the 42 signals).
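A small Python sketch of the field idea (the field layout and codes below are assumed, not the actual encoding): mutually exclusive signals share one field, and a decoder turns the field's bit pattern back into a single control signal.

```python
# One assumed 3-bit field: which register drives the internal bus.
BUS_SOURCE_FIELD = {
    0b000: None,          # no transfer
    0b001: "PCout",
    0b010: "MDRout",
    0b011: "Zout",
    0b100: "R0out",
    # ... remaining codes
}

def decode_bus_source(code):
    """Decoder output: one active signal per field value."""
    return BUS_SOURCE_FIELD[code]

print(decode_bus_source(0b001))   # -> 'PCout'; 3 bits encode up to 8 signals
```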
MICROPROGRAM SEQUENCING
• Two major disadvantages of microprogrammed control are:
1) Having a separate microroutine for each machine instruction results in a large total
number of microinstructions and a large control-store.
2) Execution time is longer because it takes more time to carry out the required branches.
• Consider the instruction Add src,Rdst; which adds the source-operand to the contents of
Rdst and places the sum in Rdst.
• Each box in the chart corresponds to a microinstruction that controls the transfers and
operations indicated within the box.
• The microinstruction is located at the address indicated by the octal number (001,002).
WIDE BRANCH ADDRESSING
• The instruction-decoder(InstDec) generates the starting-address of the microroutine that
implements the instruction that has just been loaded into the IR.
• Here, register IR contains the Add instruction, for which the instruction decoder generates
the microinstruction address 101. (However, this address cannot be loaded as is into the μPC).
Use of WMFC
• WMFC signal is issued at location 112 which causes a branch to the microinstruction in
location 171.
• WMFC signal means that the microinstruction may take several clock cycles to complete. If
the branch is allowed to happen in the first clock cycle, the microinstruction at location 171 would be
fetched and executed prematurely. To avoid this problem, WMFC signal must inhibit any change in
the contents of the μPC during the waiting-period.
Detailed Examination
• Consider Add (Rsrc)+,Rdst; which adds the source operand (fetched from memory using the
address in Rsrc) to the contents of Rdst, stores the sum in Rdst, and finally increments Rsrc by 4
(i.e. auto-increment addressing mode).
• In bits 10 and 9, the bit-patterns 11, 10, 01 and 00 denote the indexed, auto-decrement, auto-
increment and register modes respectively. For each of these modes, bit 8 is used to specify the
indirect version.
• The processor has 16 registers that can be used for addressing purposes, each specified
using a 4-bit code.
1) The microinstruction field must be decoded to determine that an Rsrc or Rdst register is involved.
2) The decoded output is then used to gate the contents of the Rsrc or Rdst fields in the IR into a
second decoder, which produces the gating-signals for the actual registers R0 TO R15.
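A small Python sketch of this two-level decoding (the mode and indirect bit positions follow the notes; the position of the 4-bit register field is assumed):

```python
MODES = {0b11: "indexed", 0b10: "auto-decrement",
         0b01: "auto-increment", 0b00: "register"}

def decode_src(ir):
    mode = MODES[(ir >> 9) & 0b11]       # bits 10 and 9 select the mode
    indirect = bool((ir >> 8) & 0b1)     # bit 8 selects the indirect version
    reg = ir & 0b1111                    # assumed: low 4 bits name Rsrc (R0-R15)
    return mode, indirect, reg

# (Rsrc)+ with Rsrc = R5: auto-increment (01), direct (0), register 5
print(decode_src((0b01 << 9) | (0 << 8) | 5))   # -> ('auto-increment', False, 5)
```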
• Solution: Include an address-field as a part of every microinstruction to indicate the location of the
next microinstruction to be fetched. (This means every microinstruction becomes a branch
microinstruction).
• The flexibility of this approach comes at the expense of additional bits for the address-field.
• Advantage: Separate branch microinstructions are virtually eliminated. There are few limitations in
assigning addresses to microinstructions. There is no need for a counter to keep track of sequential
addresses. Hence, the μPC is replaced with a μAR (Microinstruction Address Register), which is
loaded from the next-address field in each microinstruction.
• The next-address bits are fed through the OR gate to the μAR, so that the address can be modified
on the basis of the data in the IR, external inputs and condition-codes.
• The decoding circuits generate the starting-address of a given microroutine on the basis of the
opcode in the IR.
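A minimal Python sketch (with made-up addresses and control words) of next-address sequencing: every microinstruction carries the address of its successor, and the µAR is loaded from that field.

```python
CONTROL_STORE = {
    0o0: {"signals": {"PCout", "MARin", "Read"}, "next": 0o1},
    0o1: {"signals": {"Zout", "PCin", "WMFC"},   "next": 0o2},
    0o2: {"signals": {"MDRout", "IRin"},         "next": 0o3},
    0o3: {"signals": {"End"},                    "next": 0o0},
}

uar = 0o0                                  # microinstruction address register
for _ in range(4):
    microinstruction = CONTROL_STORE[uar]
    print(f"uAR={uar:o}: assert {sorted(microinstruction['signals'])}")
    uar = microinstruction["next"]         # in hardware, ORed with IR/condition bits
```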
PREFETCHING MICROINSTRUCTIONS
• Drawback of microprogrammed control: Slower operating speed because of the time it takes to
fetch microinstructions from the control-store.
• Solution: Faster operation is achieved if the next microinstruction is pre-fetched while the current
one is being executed.
Emulation
• The main function of microprogrammed control is to provide a means for simple, flexible and
relatively inexpensive execution of machine instructions.
• Suppose we add to the instruction set of a given computer M1 an entirely new set of
instructions that is in fact the instruction-set of a different computer M2.
• Programs written in the machine language of M2 can then be run on computer M1, i.e. M1
emulates M2.
• If the replacement computer fully emulates the original one, then no software changes have to be
made to run existing programs.
CACHE MEMORIES
Prefetching is a computer system process that involves bringing data into a cache before it's
needed, which helps reduce the time it takes to fetch data from memory. Prefetching can be
done automatically or explicitly by programmers to optimize performance.
• Reduced latency
Prefetching can reduce latency by preloading resources before they're needed. This means
that when a user performs an action, the content is already stored in the cache, which
reduces the time it takes to load the page.
• Faster access
Because cache memories are typically much faster to access than main memory, prefetching
data and then accessing it from caches is usually much faster than accessing it directly from
main memory
• Running software
Emulation can be used to run software programs or operating systems on hardware or in
operating systems that they weren't originally designed for.
• Running peripherals
Emulation can be used to run peripherals designed for a different system.
• Connecting devices
Emulation can be used to connect devices to each other or to a mainframe computer.
Rosetta 2 is an example of an emulator that uses emulation technology to allow a Mac with
Apple silicon to run applications designed for a Mac with an Intel CPU.
Cache memory is a small, high-speed storage area in a computer. The cache is a smaller and
faster memory that stores copies of the data from frequently used main memory locations.
There are various independent caches in a CPU, which store instructions and data. The most
important use of cache memory is that it is used to reduce the average time to access data
from the main memory.
By storing this information closer to the CPU, cache memory helps speed up the overall
processing time. Cache memory is much faster than the main memory (RAM). When the
CPU needs data, it first checks the cache. If the data is there, the CPU can access it quickly. If
not, it must fetch the data from the slower main memory.
Characteristics of Cache Memory
• Cache memory is an extremely fast memory type that acts as a buffer
between RAM and the CPU.
• Cache Memory holds frequently requested data and instructions so that they are
immediately available to the CPU when needed.
• Cache memory is costlier than main memory or disk memory but more economical
than CPU registers.
• Cache Memory is used to speed up and synchronize with a high-speed CPU.
Levels of Memory
• Level 1 or Registers: Registers hold the data and instructions the CPU is working on at that
instant. Commonly used registers include the Accumulator, Program Counter, and Address Register.
• Level 2 or Cache Memory: A very fast memory with a short access time, in which data is
temporarily stored for faster access.
• Level 3 or Main Memory: The memory on which the computer currently works. It is smaller
than secondary memory, and once power is off, data no longer stays in this memory.
• Level 4 or Secondary Memory: External memory that is not as fast as main memory, but in
which data stays permanently.
Cache Performance
When the processor needs to read or write a location in the main memory, it first checks for
a corresponding entry in the cache.
• If the processor finds that the memory location is in the cache, a Cache Hit has
occurred and data is read from the cache.
• If the processor does not find the memory location in the cache, a cache miss has
occurred. For a cache miss, the cache allocates a new entry and copies in data from
the main memory, then the request is fulfilled from the contents of the cache.
The performance of cache memory is frequently measured in terms of a quantity called Hit
ratio.
Hit Ratio(H) = hit / (hit + miss) = no. of hits/total accesses
Miss Ratio = miss / (hit + miss) = no. of miss/total accesses = 1 - hit ratio(H)
Cache performance can be improved by using a larger cache block size and higher associativity,
and by reducing the miss rate, the miss penalty, and the time to hit in the cache.
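A small worked example of the two ratios above, with made-up access counts:

```python
hits, misses = 950, 50                  # assumed: 1000 accesses in total
total = hits + misses

hit_ratio = hits / total                # H = hit / (hit + miss)
miss_ratio = misses / total             # = 1 - H

print(hit_ratio)    # 0.95
print(miss_ratio)   # 0.05
```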
Cache Mapping
There are three different types of mapping used for the purpose of cache memory, which are
as follows:
• Direct Mapping
• Associative Mapping
• Set-Associative Mapping
1. Direct Mapping
The simplest technique, known as direct mapping, maps each block of main memory into
only one possible cache line. or In Direct mapping, assign each memory block to a specific
line in the cache. If a line is previously taken up by a memory block when a new block needs
to be loaded, the old block is trashed. An address space is split into two parts index field and
a tag field. The cache is used to store the tag field whereas the rest is stored in the main
memory. Direct mapping`s performance is directly proportional to the Hit ratio.
i = j modulo m
where
i = cache line number
j = main memory block number
m = number of lines in the cache
For purposes of cache access, each main memory address can be viewed as consisting of
three fields. The least significant w bits identify a unique word or byte within a block of main
memory; in most contemporary machines, the address is at the byte level. The remaining s
bits specify one of the 2^s blocks of main memory. The cache logic interprets these s bits as a
tag of s-r bits (the most significant portion) and a line field of r bits. This latter field identifies
one of the m = 2^r lines of the cache. The line field serves as the index bits in direct mapping.
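A minimal Python sketch of this address split (the field widths below are assumed, just to make the arithmetic concrete):

```python
W = 2          # w bits: word/byte within a block   (block size 2^w = 4)
R = 4          # r bits: cache line number          (m = 2^r = 16 lines)

def direct_map(address):
    word = address & ((1 << W) - 1)      # least significant w bits
    block = address >> W                 # j = main memory block number
    line = block % (1 << R)              # i = j modulo m
    tag = block >> R                     # remaining s - r bits
    return tag, line, word

print(direct_map(858))                   # -> (13, 6, 2): tag, line, word
```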
In set-associative mapping, the cache lines are grouped into sets, and a main memory block
may be placed in any line of one particular set:
m = v × k
i = j modulo v
where
i = cache set number
j = main memory block number
v = number of sets
m = number of lines in the cache = v × k
k = number of lines in each set
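A matching sketch for set-associative mapping, again with assumed sizes:

```python
K = 2                                   # k: lines per set (2-way set-associative)
V = 8                                   # v: number of sets, so m = v * k = 16

def set_for_block(j):
    return j % V                        # i = j modulo v

print(set_for_block(27))                # block 27 -> set 3 (either line of that set)
```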
Disadvantages of Cache Memory
• Whenever the system is turned off, data and instructions stored in cache memory get
destroyed.
• The high cost of cache memory increases the price of the computer system.
Conclusion
In conclusion, cache memory plays a crucial role in making computers faster and more
efficient. By storing frequently accessed data close to the CPU, it reduces the time the CPU
spends waiting for information. This means that tasks are completed quicker, and the overall
performance of the computer is improved. Understanding cache memory helps us
appreciate how computers manage to process information so swiftly, making our everyday
digital experiences smoother and more responsive.
Let's start at the beginning and talk about what caching even is.
Caching is the process of storing some data near where it's supposed to be used, rather than
accessing it from an expensive origin every time a request comes in.
Caches are everywhere, from your CPU to your browser, so there's no doubt that caching is
extremely useful. But implementing a high-performance cache system comes with its own set of
challenges. In this post, we'll focus on cache replacement algorithms.
Cache Replacement Algorithms
We talked about what caching is and how we can utilize it, but there's a dinosaur in the
room: our cache storage is finite, especially in caching environments where high-performance
and expensive storage is used. So in short, we have no choice but to evict some
objects and keep others.
Cache replacement algorithms do just that. They decide which objects can stay and which
objects should be evicted.
After reviewing some of the most important algorithms we go through some of the
challenges that we might encounter.
LRU
The least recently used (LRU) algorithm is one of the most famous cache replacement
algorithms and for good reason!
As the name suggests, LRU keeps the most recently used objects at the top and evicts
objects that haven't been used in a while once the list reaches its maximum capacity.
So it's simply an ordered list where objects are moved to the top every time they're
accessed; pushing other objects down.
LRU is simple and provides a nice cache-hit rate for lots of use-cases.
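A minimal LRU sketch in Python, using an ordered dict so that accessed keys move to the "top" (most recent end) and the least recently used key is evicted from the other end:

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None                       # cache miss
        self.data.move_to_end(key)            # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)      # evict the least recently used key

cache = LRUCache(2)
cache.put("a", 1); cache.put("b", 2)
cache.get("a")                                 # "a" becomes most recently used
cache.put("c", 3)                              # evicts "b"
print(cache.get("b"))                          # -> None
```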
LFU
The least frequently used (LFU) algorithm works similarly to LRU, except it keeps track of how
many times an object was accessed instead of how recently it was accessed.
Each object has a counter that counts how many times it was accessed. When the list
reaches the maximum capacity, objects with the lowest counters are evicted.
LFU has a famous problem. Imagine an object was repeatedly accessed for a short period
only. Its counter grows far larger than the others, so it becomes very hard to evict this
object even if it isn't accessed again for a long time.
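A minimal LFU sketch in Python; it also makes the problem above visible, since counters never decay:

```python
class LFUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = {}                         # key -> value
        self.count = {}                        # key -> access counter

    def get(self, key):
        if key not in self.data:
            return None                        # cache miss
        self.count[key] += 1
        return self.data[key]

    def put(self, key, value):
        if key not in self.data and len(self.data) >= self.capacity:
            victim = min(self.count, key=self.count.get)   # lowest counter
            del self.data[victim], self.count[victim]
        self.data[key] = value
        self.count[key] = self.count.get(key, 0) + 1

cache = LFUCache(2)
cache.put("a", 1); cache.put("b", 2)
for _ in range(5):
    cache.get("a")                             # "a" is now very hard to evict
cache.put("c", 3)                              # evicts "b" (lowest counter)
print(cache.get("b"))                          # -> None
```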
FIFO
FIFO (first-in-first-out) is also used as a cache replacement algorithm and behaves exactly as
you would expect: objects are added to the queue and are evicted in the same order.
Even though it provides a simple and low-cost way to manage the cache, even the
most-used objects are eventually evicted when they're old enough.
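A minimal FIFO sketch in Python; note that accessing an object does not protect it from eviction:

```python
from collections import deque

class FIFOCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.order = deque()                   # keys in insertion order
        self.data = {}

    def get(self, key):
        return self.data.get(key)              # access does not change the order

    def put(self, key, value):
        if key not in self.data:
            if len(self.data) >= self.capacity:
                oldest = self.order.popleft()  # evict the first object inserted
                del self.data[oldest]
            self.order.append(key)
        self.data[key] = value

cache = FIFOCache(2)
cache.put("a", 1); cache.put("b", 2)
cache.get("a")                                 # doesn't help "a"
cache.put("c", 3)                              # evicts "a" (oldest)
print(cache.get("a"))                          # -> None
```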
Random Replacement (RR)
This algorithm randomly selects an object to evict when the cache reaches its maximum capacity. It has the
benefit of not keeping any reference or history of objects and being very simple to
implement at the same time.
This algorithm has been used in ARM processors and the famous Intel i860.
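A minimal random-replacement sketch in Python; no usage history is kept at all:

```python
import random

class RRCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = {}

    def get(self, key):
        return self.data.get(key)

    def put(self, key, value):
        if key not in self.data and len(self.data) >= self.capacity:
            victim = random.choice(list(self.data))   # evict a random key
            del self.data[victim]
        self.data[key] = value
```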
Caches on a Processor Chip
Caches on a processor chip are small, fast memory storage locations that store frequently-
used data or instructions. They act as a buffer between the main memory and the processor,
providing quick access to the data and instructions that the processor needs to execute.
Here's a breakdown of the different levels of caches typically found on a processor chip:
Level 1 Cache (L1 Cache)
The L1 cache is the smallest and fastest cache level, built into the processor core. It's usually
around 8-64 KB in size and has a very low latency (around 1-2 clock cycles).
Level 2 Cache (L2 Cache)
The L2 cache is larger than the L1 cache, typically ranging from 256 KB to 512 KB in size. It's
also built into the processor chip and has a slightly higher latency than the L1 cache (around
5-10 clock cycles).
Level 3 Cache (L3 Cache)
The L3 cache is shared among multiple processor cores in a multi-core processor. It's usually
larger than the L2 cache, ranging from 1 MB to 64 MB in size. The latency is higher than the
L2 cache, but still lower than the main memory (around 20-30 clock cycles).
Benefits of Caches
Caches provide several benefits, including:
• Improved performance: By storing frequently-used data and instructions in a fast,
local memory, caches reduce the time it takes for the processor to access the main
memory.
• Reduced power consumption: Caches reduce the number of times the processor
needs to access the main memory, which consumes more power than accessing the cache.
• Increased bandwidth: Caches can handle multiple requests simultaneously,
increasing the overall bandwidth of the processor.