
Memory organization: Memory hierarchy 2.2.

Memory hierarchy
The computer memory hierarchy is usually drawn as a pyramid structure that describes the
differences among memory types. It organizes computer storage into a hierarchy of levels.
Level 0: CPU registers
Level 1: Cache memory
Level 2: Main memory or primary memory
Level 3: Magnetic disks or secondary memory
Level 4: Optical disks or magnetic tapes or tertiary memory

In the memory hierarchy, capacity and cost per bit trade off against speed: the faster a level is,
the smaller and the more expensive per bit it is. The devices are arranged from fast to slow, that
is, from registers down to tertiary memory.
Let us discuss each level in detail:
Level-0 − Registers
The registers are present inside the CPU, so they have the least access time. Registers are the
most expensive level and the smallest in size, generally a few kilobytes in total. They are
implemented using flip-flops.
Level-1 − Cache
Cache memory is used to store the segments of a program that are frequently accessed by the
processor. It is expensive and small in size, generally a few megabytes, and is implemented
using static RAM (SRAM).
Level-2 − Primary or Main Memory
Main memory communicates directly with the CPU and with auxiliary memory devices through
an I/O processor. It is less expensive than cache memory and larger in size, generally a few
gigabytes. This memory is implemented using dynamic RAM (DRAM).
Level-3 − Secondary storage
Secondary storage devices like Magnetic Disk are present at level 3. They are used as
backup storage. They are cheaper than main memory and larger in size generally in a few
TB.
Level-4 − Tertiary storage
Tertiary storage devices like magnetic tape are present at level 4. They are used to store
removable files and are the cheapest and largest in size (1-20 TB).
Let us see the memory levels in terms of size, access time, bandwidth.

Level         Register           Cache              Primary memory     Secondary memory
Bandwidth     4K to 32K MB/sec   800 to 5K MB/sec   400 to 2K MB/sec   4 to 32 MB/sec
Size          Less than 1 KB     Less than 4 MB     Less than 2 GB     Greater than 2 GB
Access time   2 to 5 ns          3 to 10 ns         80 to 400 ns       5 ms
Managed by    Compiler           Hardware           Operating system   OS or user

Why memory Hierarchy is used in systems?


Memory hierarchy is the arrangement of the different kinds of storage present on a computing
device based on speed of access. At the very top, the highest-performing storage is the CPU
registers, which are the fastest to read and write. Next is cache memory, followed by
conventional DRAM main memory, followed by disk storage with different levels of
performance, including SSD, optical and magnetic disk drives.
To bridge the processor-memory performance gap, hardware designers rely increasingly on
memory at the top of the memory hierarchy. This is done through increasingly large cache
hierarchies (which the processor can access much faster), reducing the dependency on the
slower main memory.
This Memory Hierarchy Design is divided into 2 main types:
 External Memory or Secondary Memory –
Comprising magnetic disk, optical disk and magnetic tape, i.e. peripheral storage
devices which are accessible by the processor via an I/O module.
 Internal Memory or Primary Memory –
Comprising main memory, cache memory and CPU registers. This is directly
accessible by the processor.
The following are the characteristics of the memory hierarchy design:
1. Capacity:
It is the global volume of information the memory can store. As we move from
top to bottom in the Hierarchy, the capacity increases.
2. Access Time:
It is the time interval between the read/write request and the availability of the
data. As we move from top to bottom in the Hierarchy, the access time
increases.
3. Performance:
Earlier, when computer systems were designed without a memory hierarchy, the
speed gap between the CPU registers and main memory grew because of the large
difference in access time. This resulted in lower performance of the system, so an
enhancement was required. That enhancement came in the form of the memory
hierarchy design, which increases the performance of the system. One of the most
significant ways to increase system performance is minimizing how far down the
memory hierarchy one has to go to manipulate data.
4. Cost per bit:
As we move from bottom to top in the Hierarchy, the cost per bit increases i.e.
Internal Memory is costlier than External Memory.

References
Reference Books:
 J.P. Hayes, “Computer Architecture and Organization”, Third Edition.
 Mano, M., “Computer System Architecture”, Third Edition, Prentice Hall.
 Stallings, W., “Computer Organization and Architecture”, Eighth Edition, Pearson Education.
Text Books:
 Carpinelli, J.D., “Computer Systems Organization & Architecture”, Fourth Edition, Addison Wesley.
 Patterson and Hennessy, “Computer Architecture”, Fifth Edition, Morgan Kaufmann.
Reference Website
 Memory Hierarchy Design and its Characteristics - GeeksforGeeks
 What is memory hierarchy? (tutorialspoint.com)

Cache Memory and Associative Memory 2.2.2

A faster and smaller segment of memory whose access time is close to that of the registers is
known as cache memory. In the memory hierarchy, cache memory has a lower access time
than primary memory. Cache memory is much smaller than main memory and is therefore
used as a buffer.

Need of cache memory


Data in primary memory can be accessed faster than data in secondary memory, but access
times of primary memory are still of the order of a hundred nanoseconds or more, whereas the
CPU performs operations in a few nanoseconds. Because of this time lag between requesting
data and acting on it, the performance of the system decreases: the CPU is not utilized properly
and may remain idle for some time. To minimize this time gap, a new segment of memory
known as cache memory is introduced.

How does cache work?

In order to understand the working of cache we must understand few points:


 Cache memory is faster, they can be accessed very fast
 Cache memory is smaller, a large amount of data cannot be stored
Whenever the CPU needs any data, it first searches for the corresponding data in the cache (a
fast process). If the data is found, it processes the data according to the instructions. However,
if the data is not found in the cache, the CPU searches for it in primary memory (a slower
process) and loads it into the cache. This ensures that frequently accessed data are usually
found in the cache and hence minimizes the time required to access the data.

Cache performance

 On searching in the cache if data is found, a cache hit has occurred.


 On searching in the cache if data is not found, a cache miss has occurred.

The performance of the cache is measured by the ratio of the number of cache hits to the
number of searches. This parameter is known as the Hit Ratio.
Hit ratio = (Number of cache hits)/(Number of searches)
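As a rough illustration (not part of the original text), the following Python sketch counts hits and misses for a stream of addresses against a toy cache and reports the hit ratio defined above. The number of lines, the block size, and the FIFO eviction used here are arbitrary assumptions made only for the example.

```python
from collections import deque

def hit_ratio(addresses, num_lines=4, block_size=16):
    """Count hits/misses for an address trace against a tiny cache model.

    The cache is modelled as a set of block addresses with FIFO eviction;
    num_lines and block_size are illustrative assumptions, not fixed values.
    """
    cached = set()
    order = deque()                       # FIFO order of cached blocks
    hits = misses = 0
    for addr in addresses:
        block = addr // block_size        # address of the containing block
        if block in cached:
            hits += 1
        else:
            misses += 1
            if len(cached) == num_lines:  # evict the oldest block
                cached.discard(order.popleft())
            cached.add(block)
            order.append(block)
    return hits / (hits + misses)

# Repeated accesses to nearby addresses mostly hit after the first miss: 0.75 here.
print(hit_ratio([0, 4, 8, 12, 64, 0, 4, 68]))
```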

Types of Cache Memory

L1 or Level 1 Cache: It is the first level of cache memory that is present inside the
processor. It is present in a small amount inside every core of the processor separately. The
size of this memory ranges from 2KB to 64 KB.
L2 or Level 2 Cache: It is the second level of cache memory, which may be present inside or
outside the CPU core. If not present inside the core, it can be shared between two cores,
depending on the architecture, and is connected to the processor by a high-speed bus.
The size of this memory ranges from 256 KB to 512 KB.
L3 or Level 3 Cache: It is the third level of cache memory that is present outside the CPU
and is shared by all the cores of the CPU. Some high-end processors may have this cache. This
cache is used to increase the performance of the L2 and L1 cache. The size of this memory
ranges from 1 MB to 8MB.

Cache vs RAM

Although cache and RAM are both used to increase the performance of the system, there are
several differences in how they operate to increase the efficiency of the system.

RAM                                                    Cache
Larger in size; memory generally ranges                Smaller in size; memory ranges from
from 1 MB to 16 GB.                                    2 KB to a few MB.
Stores data that is currently being processed          Holds frequently accessed data.
by the processor.
The OS interacts with secondary memory to get          The OS interacts with primary memory to
data to be stored in primary memory (RAM).             get data to be stored in the cache.
Data is loaded into RAM before the CPU accesses        The CPU searches for data in the cache;
it, so a RAM miss never occurs.                        if it is not found, a cache miss occurs.

Associative Memory
An associative memory can be considered as a memory unit whose stored data can be
identified for access by the content of the data itself rather than by an address or memory
location. Associative memory is also known as Content Addressable Memory (CAM).
The block diagram of associative memory is shown in the figure. It consists of a memory array
and logic for m words with n bits per word. The argument register A and key register K each
have n bits, one for each bit of a word.
The match register M has m bits, one for each memory word. Each word in memory is
compared in parallel with the content of the argument register.
The words that match the bits of the argument register set a corresponding bit in the match
register. After the matching process, those bits in the match register that have been set indicate
that their corresponding words have matched.
Reading is accomplished by sequential access to memory for those words whose corresponding
bits in the match register have been set.

The key register provides a mask for choosing a particular field or key in the argument word.
The entire argument is compared with each memory word if the key register contains all 1's.
Otherwise, only those bits in the argument that have 1's in their corresponding positions of the
key register are compared. Thus, the key provides a mask that identifies a piece of information
and specifies how the reference to memory is made.
The following figure shows the relation between the memory array and the external registers in
an associative memory.
The cells in the array are denoted by the letter C with two subscripts. The first subscript gives
the word number and the second specifies the bit position in the word. Therefore, cell Cij is the
cell for bit j in word i.
A bit Aj in the argument register is compared with all the bits in column j of the array provided
that Kj = 1. This is done for all columns j = 1, 2, . . ., n.
If a match occurs between all the unmasked bits of the argument and the bits in word i, the
corresponding bit Mi in the match register is set to 1. If one or more unmasked bits of the
argument and the word do not match, Mi is cleared to 0.
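To make the roles of the A, K, and M registers concrete, here is a small Python sketch (an added illustration, not taken from the text) that mimics the match operation of a CAM: every stored word is compared with the argument register A under the mask in the key register K, and the match register M records which words matched on the unmasked bit positions.

```python
def cam_match(words, argument, key):
    """Return the match register M for an associative (content-addressable) memory.

    words    : list of m stored words (integers)
    argument : argument register A
    key      : key register K; only bit positions where K has a 1 are compared
    """
    match_register = []
    for word in words:
        # A word matches if it agrees with A on every unmasked (K = 1) bit.
        matches = ((word ^ argument) & key) == 0
        match_register.append(1 if matches else 0)
    return match_register

# Example with n = 8 bits: compare only the high nibble (K = 11110000).
stored = [0b10110001, 0b10111100, 0b01110000]
M = cam_match(stored, argument=0b10110000, key=0b11110000)
print(M)   # [1, 1, 0] -- the first two words match on the unmasked bits
```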
Associative Cache
A type of CACHE designed to solve the problem of cache CONTENTION that plagues the
DIRECT MAPPED CACHE. In a fully associative cache, a data block from any memory
address may be stored into any CACHE LINE, and the whole address is used as the cache
TAG: hence, when looking for a match, all the tags must be compared simultaneously with
any requested address, which demands expensive extra hardware. However, contention is
avoided completely, as no block need ever be flushed unless the whole cache is full, and
then the least recently used may be chosen.
A set-associative cache is a compromise solution in which the cache lines are divided into
sets, and the middle bits of its address determine which set a block will be stored in: within
each set the cache remains fully associative. A cache that has two lines per set is called two-
way set-associative and requires only two tag comparisons per access, which reduces the
extra hardware required. A DIRECT MAPPED CACHE can be thought of as being one-way
set associative, while a fully associative cache is n-way associative where n is the total
number of cache lines. Finding the right balance between associativity and total cache
capacity for a particular processor is a fine art; various current CPUs employ 2-way, 4-way
and 8-way designs.
Associative memory is also known as content addressable memory (CAM) or associative
storage or associative array. It is a special type of memory that is optimized for performing
searches through data, as opposed to providing a simple direct access to the data based on the
address.
It can store the set of patterns as memories when the associative memory is being presented
with a key pattern, it responds by producing one of the stored patterns which closely
resembles or relates to the key pattern.
It can be viewed as data correlation: the input data is correlated with the data stored in the
CAM.
Associative memory is formed from conventional semiconductor memory (usually RAM) with
added comparison circuitry that enables a search operation to complete in a single clock cycle.
It is a hardware search engine, a special type of computer memory used in certain very
high-speed searching applications. Applications of associative memory:
It can be used in memory allocation and management.
It is widely used in database management systems, etc.
Advantages of Associative memory:
1. It is used where search time needs to be less or short.
2. It is suitable for parallel searches.
3. It is often used to speedup databases.
4. It is used in page tables used by the virtual memory and used in neural networks.
Disadvantages of Associative memory:
1. It is more expensive than RAM.
2. Each cell must have storage capability and logic circuitry for matching its
content with an external argument.

Associative Memory

The time required to find an item stored in memory can be significantly reduced if the stored
data can be identified for access by the content of the data itself rather than by an address. A
memory unit accessed by content is known as an associative memory or a content
addressable memory (CAM). This type of memory is accessed simultaneously and in parallel
on the basis of the data content rather than a specific address or location. When a word is
written into associative memory, no address is given; the memory is capable of finding an
empty, unused location in which to store the word. When a word is to be read, the content of
the word, or part of it, is specified; the memory locates all words that match the specified
content and marks them for reading.

Cache Memory

If the active portions of the program and data can be kept in a fast memory, the total execution
time can be reduced significantly. Such a memory is known as cache memory, and it is
inserted between the CPU and the main memory. To make this arrangement effective, the
cache needs to be much faster than main memory. This approach is more economical than
using fast memory devices to implement the entire main memory.

Differences between associative and cache memory:

S.No.  Associative Memory                               Cache Memory
1      A memory unit accessed by content is called      A fast and small memory is called
       associative memory.                              cache memory.
2      It reduces the time required to find an item     It reduces the average memory
       stored in memory.                                access time.
3      Here, data are accessed by content.              Here, data are accessed by address.
4      It is used where search time is very short.      It is used when a particular group
                                                        of data is accessed repeatedly.
5      Its basic characteristic is its logic circuit    Its basic characteristic is its
       for matching its content.                        fast access.
6      It is not as expensive as cache memory.          It is expensive as compared to
                                                        associative memory.
7      It is suitable for parallel data search          It is useful in increasing the
       mechanisms.                                      efficiency of data retrieval.

References
Reference Books:
 J.P. Hayes, “Computer Architecture and Organization”, Third Edition.
 Mano, M., “Computer System Architecture”, Third Edition, Prentice Hall.
 Stallings, W., “Computer Organization and Architecture”, Eighth Edition, Pearson Education.
Text Books:
 Carpinelli, J.D., “Computer Systems Organization & Architecture”, Fourth Edition, Addison Wesley.
 Patterson and Hennessy, “Computer Architecture”, Fifth Edition, Morgan Kaufmann.
Reference Website
 Differences between Associative and Cache Memory - GeeksforGeeks
Cache size vs block size 2.2.3
CACHE PERFORMANCE

When the processor needs to read or write a location in main memory, it first checks for a
corresponding entry in the cache.

 If the processor finds that the memory location is in the cache, a cache hit has
occurred and data is read from cache
 If the processor does not find the memory location in the cache, a cache miss has
occurred. For a cache miss, the cache allocates a new entry and copies in data from
main memory, then the request is fulfilled from the contents of the cache.

The performance of cache memory is frequently measured in terms of a quantity called Hit
ratio.

Hit ratio = hit / (hit + miss) = no. of hits/total accesses

We can improve cache performance by using a larger cache block size and higher associativity,
and by reducing the miss rate, the miss penalty, and the time to hit in the cache.

CACHE LINES

Cache memory is divided into equal-size partitions called cache lines.

 While designing a computer’s cache system, the size of cache lines is an important
parameter.
 The size of cache line affects a lot of parameters in the caching system.

The following results discuss the effect of changing the cache block (or line) size in a caching
system.

Result-01: Effect of Changing Block Size on Spatial Locality-

The larger the block size, better will be the spatial locality.

Explanation-

Keeping the cache size constant, we have-

Case-01: Decreasing the Block Size-


 A smaller block size will contain a smaller number of nearby addresses in it.
 Thus, only smaller number of nearby addresses will be brought into the cache.
 This increases the chances of cache miss which reduces the exploitation of spatial
locality.
 Thus, smaller is the block size, inferior is the spatial locality.

Case-02: Increasing the Block Size-


 A larger block size will contain a larger number of nearby addresses in it.
 Thus, larger number of nearby addresses will be brought into the cache.
 This increases the chances of cache hit which increases the exploitation of spatial
locality.
 Thus, larger is the block size, better is the spatial locality.

Result-02: Effect of Changing Block Size on Cache Tag in Direct Mapped Cache-

In a direct mapped cache, the block size does not affect the cache tag in any way.

Explanation-

Keeping the cache size constant, we have-

Case-01: Decreasing the Block Size-


 Decreasing the block size increases the number of lines in cache.
 With the decrease in block size, the number of bits in block offset decreases.
 However, with the increase in the number of cache lines, number of bits in line
number increases.
 So, the number of bits in the line number plus the number of bits in the block offset remains constant.
 Thus, there is no effect on the cache tag.

Case-02: Increasing the Block Size-
 Increasing the block size decreases the number of lines in cache.
 With the increase in block size, the number of bits in block offset increases.
 However, with the decrease in the number of cache lines, number of bits in line
number decreases.
 Thus, the number of bits in the line number plus the number of bits in the block offset
remains constant.
 Thus, there is no effect on the cache tag.

Result-03: Effect of Changing Block Size on Cache Tag in Fully Associative Cache-

In a fully associative cache, decreasing the block size increases the size of the cache tag, and vice versa.

Explanation-

Keeping the cache size constant, we have-

Case-01: Decreasing the Block Size-

 Decreasing the block size decreases the number of bits in block offset.
 With the decrease in number of bits in block offset, number of bits in tag increases.

Case-02: Increasing the Block Size-

 Increasing the block size increases the number of bits in block offset.
 With the increase in number of bits in block offset, number of bits in tag decreases.

Result-04: Effect of Changing Block Size on Cache Tag in Set Associative Cache-

In a set associative cache, the block size does not affect the cache tag in any way.

Explanation-

Keeping the cache size constant, we have-

Case-01: Decreasing the Block Size-


 Decreasing the block size increases the number of lines in cache.
 With the decrease in block size, number of bits in block offset decreases.
 With the increase in the number of cache lines, number of sets in cache increases.
 With the increase in number of sets in cache, number of bits in set number increases.
 So, the number of bits in the set number plus the number of bits in the block offset remains constant.
 Thus, there is no effect on the cache tag.

Case-02: Increasing the Block Size-

 Increasing the block size decreases the number of lines in cache.


 With the increase in block size, number of bits in block offset increases.
 With the decrease in the number of cache lines, number of sets in cache decreases.
 With the decrease in number of sets in cache, number of bits in set number decreases.
 So, the number of bits in the set number plus the number of bits in the block offset remains constant.
 Thus, there is no effect on the cache tag.

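The results above can be checked numerically. The following sketch (an added illustration; the 32-bit address width, 4 KB cache, and block sizes are assumptions) computes the tag, index, and offset widths for direct-mapped, set-associative, and fully associative organizations. Varying the block size leaves the tag width unchanged for the first two but changes it in the fully associative case.

```python
import math

def field_widths(address_bits, cache_bytes, block_bytes, ways):
    """Return (tag, index, offset) bit widths for one cache organization.

    ways = 1               -> direct mapped
    1 < ways < num_lines   -> set associative
    ways = num_lines       -> fully associative (index field disappears)
    """
    offset = int(math.log2(block_bytes))
    lines = cache_bytes // block_bytes
    sets = lines // ways
    index = int(math.log2(sets))            # 0 for a fully associative cache
    tag = address_bits - index - offset
    return tag, index, offset

# Assumed example: 32-bit addresses, 4 KB cache, block size of 16 vs 64 bytes.
for block in (16, 64):
    direct = field_widths(32, 4096, block, ways=1)
    four_way = field_widths(32, 4096, block, ways=4)
    fully = field_widths(32, 4096, block, ways=4096 // block)
    print(f"block={block:2d}B  direct={direct}  4-way={four_way}  fully-assoc={fully}")

# The direct-mapped tag stays at 20 bits and the 4-way tag at 22 bits for both
# block sizes, while the fully associative tag shrinks from 28 to 26 bits.
```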
Result-05: Effect of Changing Block Size on Cache Miss Penalty-

A smaller cache block incurs a lower cache miss penalty.

Explanation-
 When a cache miss occurs, block containing the required word has to be brought from
the main memory.
 If the block size is small, then time taken to bring the block in the cache will be less.
 Hence, less miss penalty will incur.
 But if the block size is large, then time taken to bring the block in the cache will be
more.
 Hence, more miss penalty will incur.

Result-06: Effect of Cache Tag on Cache Hit Time-

A smaller cache tag ensures a lower cache hit time.

Explanation-
 Cache hit time is the time required to find out whether the required block is in cache
or not.
 It involves comparing the tag of generated address with the tag of cache lines.
 Smaller is the cache tag, lesser will be the time taken to perform the comparisons.
 Hence, smaller cache tag ensures lower cache hit time.
 On the other hand, larger is the cache tag, more will be time taken to perform the
comparisons.
 Thus, larger cache tag results in higher cache hit time.

PRACTICE PROBLEM BASED ON CACHE LINE-


Problem-

In designing a computer’s cache system, the cache block or cache line size is an important
parameter. Which of the following statements is correct in this context?

(A) A smaller block size implies better spatial locality
(B) A smaller block size implies a smaller cache tag and hence lower cache tag overhead
(C) A smaller block size implies a larger cache tag and hence lower cache hit time
(D) A smaller block size incurs a lower cache miss penalty

Solution-

Option (D) is correct. (Result-05)

Reasons-

Option (A) is incorrect because-

 A smaller block size does not imply better spatial locality.


 The larger the block size, the better the spatial locality.

Option (B) is incorrect because-

 In direct mapped cache and set associative cache, there is no effect of changing block
size on cache tag.
 In fully associative mapped cache, on decreasing block size, cache tag becomes
larger.
 Thus, smaller block size does not imply smaller cache tag in any cache organization.

Option (C) is incorrect because-

 “A smaller block size implies a larger cache tag” is true only for fully associative
mapped cache.
 Larger cache tag does not imply lower cache hit time rather cache hit time is
increased.

What is the cache block size?


(4 bytes)
In this example each cache block is of size 4 bytes, so a 256-byte cache has 256/4 = 64 sets or
cache lines. The incoming address to the cache is divided into bits for offset and tag.

How do the address bits determine the block size and cache capacity?


In a nutshell, the block offset bits determine your block size (how many bytes are in a cache
line). The index bits determine how many sets the cache has. The capacity of the cache is
therefore 2^(block offset bits + index bits) * the number of ways per set. In this case that is
2^(4+4) * 4 = 256 * 4 bytes = 1 kilobyte.

What is the cache block or line size (in words)?

Each cache line is 1 word (4 bytes).

What is word size in cache?


For example, a cache memory may have a line size of eight 64-bit words and a capacity of 4K
words; here the word size is 64 bits (8 bytes).

How do I check my cache size?

Right-click on the Start button and click on Task Manager. On the Task Manager screen,
click on the Performance tab, then click on CPU in the left pane. In the right pane, you will see
the L1, L2 and L3 cache sizes listed below the “Virtualization” entry.
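On a Linux system, the same information is exposed through sysfs. The following sketch (an added illustration that assumes a typical Linux layout; it is not part of the original text) prints the level, type, size, and line size of each cache reported for CPU 0.

```python
import glob

def read(path):
    """Read one sysfs attribute, returning '?' if it is not present."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return "?"

# Typical sysfs location of per-CPU cache descriptions on Linux (a platform
# assumption; on Windows, Task Manager shows the totals as described above).
for d in sorted(glob.glob("/sys/devices/system/cpu/cpu0/cache/index*")):
    print("L" + read(d + "/level"), read(d + "/type"),
          "size=" + read(d + "/size"),
          "line=" + read(d + "/coherency_line_size") + " bytes")
```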


How do I know my cache line size?

The size of these chunks is called the cache line size. Common cache line sizes are 32, 64 and
128 bytes. A cache can only hold a limited number of lines, determined by the cache size. For
example, a 64 kilobyte cache with 64-byte lines has 1024 cache lines.

What is a cache block?

cache block – The basic unit for cache storage. May contain multiple bytes/words of data.
Because different regions of memory may be mapped into a block, the tag is used to
differentiate between them. valid bit – A bit of information that indicates whether the data in
a block is valid (1) or not (0).

What is a computer’s word size?


The word size of a computer generally indicates the largest integer it can process in a single
instruction, and the size of a memory address, which is usually, but not necessarily the same
as the integer size. The main indication of the word size is how much memory the processor
can address.

How big is the block size of the cache?

For example, if the cache block size is 32 bytes and byte addressing is used, then with
four-byte words each block holds 8 words. If an access trace gives four hits out of 12 accesses,
the hit rate is 4/12, roughly 33%.

What’s the line size of a cache memory?

A cache memory may, for example, have a line size of eight 64-bit words and a capacity of
4K words; that is, each line is 64 bytes.

How big is a 64-bit word cache?


A 64-bit word means 8 bytes. Line size: 8 words in a line means 8 x 8 bytes = 64 bytes in a
line = 2^6 bytes. Cache size: 4K words, meaning 4096 x 8 bytes = 32K total bytes.

How does the block size affect the cache tag and the miss penalty?


In direct mapped and set associative caches, changing the block size has no effect on the cache
tag. A smaller cache block incurs a lower cache miss penalty: when a cache miss occurs, the
block containing the required word has to be brought from main memory, and if the block size
is small, the time taken to bring the block into the cache is less, so less miss penalty is incurred.

The key elements of cache design are: cache size, block size, mapping function, replacement
algorithm, and write policy. These are explained below.

1. Cache Size:

It turns out that reasonably small caches can have a significant impact on performance.

2. Block Size:

Block size is the unit of data exchanged between the cache and main memory. As
the block size increases from very small to larger sizes, the hit ratio will at first
increase because of the principle of locality: data in the vicinity of a referenced
word are likely to be referenced in the near future. As the block size increases,
more useful data are brought into the cache.
a. The hit ratio will begin to decrease, however, as the block becomes even
larger and the probability of using the newly fetched data becomes less than
the probability of reusing the data that have to be moved out of the cache to
make room for the new block.
3. Mapping Function:

When a new block of data is read into the cache, the mapping function
determines which cache location the block will occupy. Two constraints affect
the design of the mapping function. First, when one block is read in, another
may have to be replaced.
a. We would like to do this in a way that minimizes the probability that we
will replace a block that will be needed in the near future. The more
flexible the mapping function, the more scope we have to design a
replacement algorithm that maximizes the hit ratio. Second, the more
flexible the mapping function, the more complex is the circuitry required
to search the cache to determine whether a given block is in the cache.
4. Replacement Algorithm:

The replacement algorithm chooses, within the constraints of the mapping
function, which block to replace when a new block is to be loaded into the cache
and the cache already has all slots filled with other blocks. We would like to
replace the block that is least likely to be needed again in the near future.
Although it is impossible to identify such a block, a reasonably effective strategy
is to replace the block that has been in the cache longest with no reference to it.
a. This policy is referred to as the least-recently-used (LRU) algorithm.
Hardware mechanisms are required to identify the least-recently-used
block.
5. Write Policy:

If the contents of a block in the cache are altered, then it is necessary to write it
back to main memory before replacing it. The write policy dictates when the
memory write operation takes place. At one extreme, the writing can occur every
time the block is updated.
a. At the other extreme, the writing occurs only when the block is replaced.
The latter policy minimizes memory write operations but leaves main
memory in an obsolete state. This can interfere with multiple-processor
operation and with direct memory access by I/O hardware modules.
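The two extremes of the write policy can be contrasted with a toy model (an added sketch, not from the text): write-through updates main memory on every store, while write-back defers the write until a dirty block is evicted.

```python
class TinyCache:
    """One-block toy cache illustrating write-through vs write-back."""

    def __init__(self, memory, write_back=True):
        self.memory = memory          # dict: block address -> value
        self.write_back = write_back
        self.block = None             # address of the block currently cached
        self.value = None
        self.dirty = False
        self.memory_writes = 0

    def store(self, addr, value):
        if self.block != addr:        # miss: evict the current block first
            self.evict()
            self.block = addr
        self.value = value
        if self.write_back:
            self.dirty = True         # defer the memory update
        else:
            self.memory[addr] = value # write-through: update memory now
            self.memory_writes += 1

    def evict(self):
        if self.write_back and self.dirty and self.block is not None:
            self.memory[self.block] = self.value
            self.memory_writes += 1
        self.dirty = False

# Ten stores to the same block: write-through performs ten memory writes,
# write-back performs a single write when the block is finally evicted.
for wb in (False, True):
    cache = TinyCache(memory={}, write_back=wb)
    for i in range(10):
        cache.store(0, i)
    cache.evict()
    print("write-back" if wb else "write-through",
          "-> memory writes:", cache.memory_writes)
```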

References
Reference Books:
 J.P. Hayes, “Computer Architecture and Organization”, Third Edition.
 Mano, M., “Computer System Architecture”, Third Edition, Prentice Hall.
 Stallings, W., “Computer Organization and Architecture”, Eighth Edition, Pearson Education.
Text Books:
 Carpinelli, J.D., “Computer Systems Organization & Architecture”, Fourth Edition, Addison Wesley.
 Patterson and Hennessy, “Computer Architecture”, Fifth Edition, Morgan Kaufmann.
Other References
 knowledgeburrow.com
 Cache Memory Design - GeeksforGeeks
 https://www.gatevidyalay.com/cache-line-cache-line-size-cache-memory/

 https://www.geeksforgeeks.org/cache-memory-in-computer-organization/

 https://stackoverflow.com/questions/8107965/concept-of-block-size-in-a-cache

Basic optimization techniques in cache memory 2.2.4


Generally, in any device, memories that are large (in terms of capacity), fast and affordable
are preferred. But all three qualities can’t be achieved at the same time. The cost of the
memory depends on its speed and capacity. With the Hierarchical Memory System, all
three can be achieved simultaneously.

Memory Hierarchy
The cache is a part of the hierarchy present next to the CPU. It is used in storing the
frequently used data and instructions. It is generally very costly i.e., the larger the cache
memory, the higher the cost. Hence, it is used in smaller capacities to minimize costs. To
make up for its less capacity, it must be ensured that it is used to its full potential.
Optimization of cache performance ensures that it is utilized in a very efficient manner to
its full potential.

Average Memory Access Time (AMAT):

AMAT helps in analyzing the Cache memory and its performance. The lesser the AMAT,
the better the performance is. AMAT can be calculated as,
AMAT = Hit Ratio * Cache access time + Miss Ratio * Main memory access time
= (h * tc) + (1-h) * (tc + tm)
Note: Main memory is accessed only when a cache miss occurs. Hence, cache time is also
included in the main memory access time.
Example 1: What is the average memory access time for a machine with a cache hit rate of
75% and cache access time of 3 ns and main memory access time of 110 ns.
Solution:
Average Memory Access Time(AMAT) = (h * tc) + (1-h) * (tc + tm)
Given,
Hit Ratio(h) = 75/100 = 3/4 = 0.75
Miss Ratio (1-h) = 1-0.75 = 0.25
Cache access time(tc) = 3ns

Main memory access time(effectively) = tc + tm = 3 + 110 = 113 ns


Average Memory Access Time(AMAT) = (0.75 * 3) + (0.25 * (3+110))
= 2.25 + 28.25
= 30.5 ns

Note: AMAT can also be calculated as Hit Time + (Miss Rate * Miss Penalty)

Example 2: Calculate AMAT when Hit Time is 0.9 ns, Miss Rate is 0.04, and Miss
Penalty is 80 ns.
Solution :
Average Memory Access Time(AMAT) = Hit Time + (Miss Rate * Miss Penalty)
Here, Given,
Hit time = 0.9 ns
Miss Rate = 0.04
Miss Penalty = 80 ns
Average Memory Access Time(AMAT) = 0.9 + (0.04*80)
= 0.9 + 3.2
= 4.1 ns
Hence, if Hit time, Miss Rate, and Miss Penalty are reduced, the AMAT reduces which in
turn ensures optimal performance of the cache.
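The two equivalent ways of writing AMAT used in the examples can be captured in a few lines of Python; this added sketch simply restates the formulas above and plugs in the example numbers.

```python
def amat_from_hit_ratio(h, t_cache, t_main):
    """AMAT = h * tc + (1 - h) * (tc + tm), as in Example 1."""
    return h * t_cache + (1 - h) * (t_cache + t_main)

def amat_from_penalty(hit_time, miss_rate, miss_penalty):
    """AMAT = hit time + miss rate * miss penalty, as in Example 2."""
    return hit_time + miss_rate * miss_penalty

print(amat_from_hit_ratio(0.75, 3, 110))   # 30.5 ns  (Example 1)
print(amat_from_penalty(0.9, 0.04, 80))    # 4.1 ns   (Example 2)
```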

Methods for reducing Hit Time, Miss Rate, and Miss Penalty:

Methods to reduce Hit Time:


1. Small and simple caches: If lesser hardware is required for the implementation of
caches, then it decreases the Hit time because of the shorter critical path through the
Hardware.
2. Avoid Address translation during indexing: Caches that use physical addresses for
indexing are known as a physical cache. Caches that use the virtual addresses for indexing
are known as virtual cache. Address translation can be avoided by using virtual caches.
Hence, they help in reducing the hit time.

Methods to reduce Miss Rate:

1. Larger block size: If the block size is increased, spatial locality can be exploited in an
efficient way which results in a reduction of miss rates. But it may result in an increase in
miss penalties. The block size cannot be increased beyond a certain point, since beyond that
point the miss rate starts to increase again: a larger block size implies fewer blocks in the
cache, which results in increased conflict misses.
2. Larger cache size: Increasing the cache size results in a decrease of capacity misses,
thereby decreasing the miss rate. But, they increase the hit time and power consumption.
3. Higher associativity: Higher associativity results in a decrease in conflict misses.
Thereby, it helps in reducing the miss rate.

Methods to reduce Miss Penalty:

1. Multi-Level Caches: If there is only one level of cache, then we need to decide between
keeping the cache size small in order to reduce the hit time or making it larger so that the
miss rate can be reduced. Both of them can be achieved simultaneously by introducing
cache at the next levels.
Suppose, if a two-level cache is considered:
 The first-level cache is smaller in size and has a clock cycle time comparable to
that of the CPU.
 The second-level cache is larger than the first-level cache but has a faster clock cycle
compared to that of main memory. This large size helps avoid many accesses
going to the main memory, thereby also helping to reduce the miss penalty.
Hierarchical representation of Memory
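The benefit of a second-level cache can be quantified with the standard two-level extension of the AMAT formula, AMAT = HitTime(L1) + MissRate(L1) x (HitTime(L2) + MissRate(L2) x MissPenalty), which is not spelled out in the text above. The sketch below compares a single-level and a two-level configuration using assumed, purely illustrative numbers.

```python
def amat_one_level(hit_l1, miss_rate_l1, main_penalty):
    return hit_l1 + miss_rate_l1 * main_penalty

def amat_two_level(hit_l1, miss_rate_l1, hit_l2, miss_rate_l2, main_penalty):
    # Misses in L1 pay the L2 hit time; misses in L2 additionally pay main memory.
    return hit_l1 + miss_rate_l1 * (hit_l2 + miss_rate_l2 * main_penalty)

# Assumed numbers: 1 ns L1, 5% L1 miss rate, 6 ns L2, 20% local L2 miss rate,
# 100 ns main-memory miss penalty.
print(amat_one_level(1, 0.05, 100))            # 6.0 ns without an L2
print(amat_two_level(1, 0.05, 6, 0.20, 100))   # 2.3 ns with an L2
```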

2. Critical word first and Early Restart: Generally, the processor requires one word of
the block at a time. So, there is no need of waiting until the full block is loaded before
sending the requested word. This is achieved using:
 The critical word first: It is also called a requested word first. In this method,
the exact word required is requested from the memory and as soon as it arrives,
it is sent to the processor. In this way, two things are achieved, the processor
continues execution, and the other words in the block are read at the same time.
 Early Restart: In this method, the words are fetched in the normal order. When
the requested word arrives, it is immediately sent to the processor which
continues execution with the requested word.
These are the basic methods through which the performance of cache can be optimized.

The following are five categories of activity for optimizing cache performance:

1. Reducing the hit time – Small and simple first-level caches and way-prediction. Both
techniques also generally decrease power consumption.
2. Increasing cache bandwidth – Pipelined caches, multi-banked caches, and non-
blocking caches. These techniques have varying impacts on power consumption.
3. Reducing the miss penalty – Critical word first and merging write buffers. These
optimizations have little impact on power.
4. Reducing the miss rate – Compiler optimizations. Obviously, any improvement at
compile time improves power consumption.
5. Reducing the miss penalty or miss rate via parallelism – Hardware prefetching and
compiler prefetching. These optimizations generally increase power consumption,
primarily due to prefetched data that are unused.

References
Reference Books:
 J.P. Hayes, “Computer Architecture and Organization”, Third Edition.
 Mano, M., “Computer System Architecture”, Third Edition, Prentice Hall.
 Stallings, W., “Computer Organization and Architecture”, Eighth Edition, Pearson
Education.
Text Books:
 Carpinelli, J.D., “Computer Systems Organization & Architecture”, Fourth Edition, Addison Wesley.
 Patterson and Hennessy, “Computer Architecture”, Fifth Edition, Morgan Kaufmann.
Reference Website
 Basic Cache Optimization Techniques - GeeksforGeeks
 Optimizing Cache Memory Performance (And the Math Behind It All) (aberdeen.com)

Cache memory with associative memory 2.2.5


Associative Cache
A type of CACHE designed to solve the problem of cache CONTENTION that plagues the
DIRECT MAPPED CACHE. In a fully associative cache, a data block from any memory
address may be stored into any CACHE LINE, and the whole address is used as the cache
TAG: hence, when looking for a match, all the tags must be compared simultaneously with
any requested address, which demands expensive extra hardware. However, contention is
avoided completely, as no block need ever be flushed unless the whole cache is full, and
then the least recently used may be chosen.
A set-associative cache is a compromise solution in which the cache lines are divided into
sets, and the middle bits of its address determine which set a block will be stored in: within
each set the cache remains fully associative. A cache that has two lines per set is called two-
way set-associative and requires only two tag comparisons per access, which reduces the
extra hardware required. A DIRECT MAPPED CACHE can be thought of as being one-way
set associative, while a fully associative cache is n-way associative where n is the total
number of cache lines. Finding the right balance between associativity and total cache
capacity for a particular processor is a fine art; various current CPUs employ 2-way, 4-way
and 8-way designs.
REPLACEMENT ALGORITHMS OF CACHE MEMORY

Replacement algorithms are used when there is no free space in the cache in which to place a
new item. Four of the most common cache replacement algorithms are described below:

Least Recently Used (LRU):

The LRU algorithm selects for replacement the item that has been least recently used by the
CPU.

First-In-First-Out (FIFO):
The FIFO algorithm selects for replacement the item that has been in the cache for the
longest time.

Least Frequently Used (LFU):

The LFU algorithm selects for replacement the item that has been least frequently used by
the CPU.

Random:

The random algorithm selects the item to be replaced at random.

NEED OF REPLACEMENT ALGORITHM

In direct mapping,

 There is no need of any replacement algorithm.


 This is because a main memory block can map only to a particular line of the cache.
 Thus, the new incoming block will always replace the existing block (if any) in that
particular line.

In Set Associative mapping,

 Set associative mapping is a combination of direct mapping and fully associative


mapping.
 It uses fully associative mapping within each set.
 Thus, set associative mapping requires a replacement algorithm.

In fully associative mapping,

 A replacement algorithm is required.


 The replacement algorithm suggests which block is to be replaced if all the cache lines are
occupied.
 Thus, a replacement algorithm such as the FCFS (FIFO) algorithm, the LRU algorithm, etc. is employed.

LRU (LEAST RECENTLY USED)


 The page which has not been used for the longest period of time in the past will be
replaced first.
 We can think of this strategy as the optimal cache-replacement algorithm looking
backward in time, rather than forward.
 LRU is much better than FIFO replacement.
 LRU is also called a stack algorithm and can never exhibit Belady's anomaly.
 The most important problem is how to implement LRU replacement. An LRU
replacement algorithm may require substantial hardware assistance.

Example: Suppose we have the reference sequence 7, 0, 1, 2, 0, 3, 0, 4, 2, 3 and the cache memory has 3 lines.
There are a total of 8 misses (and 2 hits) under the LRU replacement policy.

FIRST IN FIRST OUT POLICY


 The block which entered the cache first will be replaced first.
 This can lead to a problem known as "Belady's anomaly": for some reference strings,
increasing the number of lines in the cache memory increases the number of cache misses.

Belady's anomaly: for some cache replacement algorithms, the page fault or miss rate
increases as the number of allocated frames increases.

Example: Suppose we have the same sequence 7, 0, 1, 2, 0, 3, 0, 4, 2, 3, and the cache memory has 4 lines.

There are a total of 6 misses in the FIFO replacement policy.
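Both examples can be reproduced with a short simulation. The sketch below (an added illustration) replays the reference string 7, 0, 1, 2, 0, 3, 0, 4, 2, 3 under LRU with 3 lines and FIFO with 4 lines and counts the misses.

```python
from collections import OrderedDict, deque

def count_misses(refs, lines, policy):
    """Simulate a small cache and return the number of misses (policy: LRU or FIFO)."""
    misses = 0
    if policy == "LRU":
        cache = OrderedDict()                 # keys ordered from LRU to MRU
        for r in refs:
            if r in cache:
                cache.move_to_end(r)          # refresh recency on a hit
            else:
                misses += 1
                if len(cache) == lines:
                    cache.popitem(last=False) # evict the least recently used
                cache[r] = True
    else:  # FIFO
        cache, order = set(), deque()
        for r in refs:
            if r not in cache:
                misses += 1
                if len(cache) == lines:
                    cache.discard(order.popleft())
                cache.add(r)
                order.append(r)
    return misses

refs = [7, 0, 1, 2, 0, 3, 0, 4, 2, 3]
print("LRU,  3 lines:", count_misses(refs, 3, "LRU"))   # 8 misses
print("FIFO, 4 lines:", count_misses(refs, 4, "FIFO"))  # 6 misses
```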

LEAST FREQUENTLY USED (LFU):

This cache algorithm uses a counter to keep track of how often an entry is accessed. With the
LFU cache algorithm, the entry with the lowest count is removed first. This method isn't used
that often, as it does not account for an item that had an initially high access rate and then was
not accessed for a long time.

References
Reference Books:
 J.P. Hayes, “Computer Architecture and Organization”, Third Edition.
 Mano, M., “Computer System Architecture”, Third Edition, Prentice Hall.
 Stallings, W., “Computer Organization and Architecture”, Eighth Edition, Pearson
Education.
Text Books:
 Carpinelli, J.D., “Computer Systems Organization & Architecture”, Fourth Edition, Addison Wesley.
 Patterson and Hennessy, “Computer Architecture”, Fifth Edition, Morgan Kaufmann.
Other References
 What is Associative Cache? - Computer Notes (ecomputernotes.com)
 http://www.eazynotes.com/notes/computer-system-architecture/slides/cache-
memory.pdf
 https://searchstorage.techtarget.com/definition/cache-algorithm
 https://www.includehelp.com/cso/types-of-cache-replacement-policies.aspx

Virtual Memory: Paging 2.2.6

Paging and segmentation are processes by which data is stored to, then retrieved from, a
computer's storage disk.

Paging is a computer memory management function that presents storage locations to the
computer's CPU as additional memory, called virtual memory. Each piece of data needs a
storage address.

Segmentation is a virtual process that creates variable-sized address spaces in computer


storage for related data, called segments. This process speeds up retrieval.

Managing computer memory is a basic operating system function -- both paging and
segmentation are basic functions of the OS. No system can efficiently rely on limited RAM
alone. So, the computer’s memory management unit (MMU) uses the storage
disk, HDD or SSD, as virtual memory to supplement RAM.

Let's look in-depth at paging, then we'll look in-depth at segmentation.

WHAT IS PAGING?

As mentioned above, the memory management function called paging specifies storage
locations to the CPU as additional memory, called virtual memory. The CPU cannot directly
access storage disk, so the MMU emulates memory by mapping pages to frames that are in
RAM.

Before we launch into a more detailed explanation of pages and frames, let’s define some
technical terms.

 Page: A fixed-length contiguous block of virtual memory residing on disk.


 Frame: A fixed-length contiguous block located in RAM; whose sizing is identical to
pages.
 Physical memory: The computer’s random access memory (RAM), typically
contained in DIMM cards attached to the computer’s motherboard.
 Virtual memory: Virtual memory is a portion of an HDD or SSD that is reserved to
emulate RAM. The MMU serves up virtual memory from disk to the CPU to reduce
the workload on physical memory.
 Virtual address: The CPU generates a virtual address for each active process. The
MMU maps the virtual address to a physical location in RAM and passes the address
to the bus. A virtual address space is the range of virtual addresses under CPU
control.
 Physical address: The physical address is a location in RAM. The physical address
space is the set of all physical addresses corresponding to the CPU’s virtual addresses.
A physical address space is the range of physical addresses under MMU control.

By assigning an address to a piece of data using a "page table" between the CPU and the
computer's physical memory, a computer's MMU enables the system to retrieve that data
whenever needed.

Paging

 External fragmentation is avoided by using the paging technique.


 Paging is a technique in which logical memory is broken into fixed-size blocks called
pages and physical memory into blocks of the same size called frames (the size is a
power of 2, between 512 bytes and 8192 bytes).
 When a process is to be executed, its corresponding pages are loaded into any
available memory frames.
 Logical address space of a process can be non-contiguous and a process is allocated
physical memory whenever the free memory frame is available.
 Operating system keeps track of all free frames. Operating system needs n free frames
to run a program of size n pages.

Address generated by CPU is divided into

 Page number (p) -- page number is used as an index into a page table which contains
base address of each page in physical memory.
 Page offset (d) -- page offset is combined with base address to define the physical
memory address.
Fig 2.8.2 Paging Table Architecture
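The split of a virtual address into a page number p and an offset d, followed by a page-table lookup, can be sketched as follows. The 4 KB page size and the tiny page table are assumptions made only for this example; a real MMU performs the translation in hardware and consults the TLB first.

```python
PAGE_SIZE = 4096                     # assumed 4 KB pages (2**12)
OFFSET_BITS = 12

# Tiny example page table: virtual page number -> physical frame number.
page_table = {0: 5, 1: 2, 2: 7}

def translate(virtual_address):
    """Translate a virtual address to a physical address via the page table."""
    page_number = virtual_address >> OFFSET_BITS      # p
    offset = virtual_address & (PAGE_SIZE - 1)        # d
    frame = page_table.get(page_number)
    if frame is None:
        raise LookupError("page fault: page %d is not in memory" % page_number)
    return (frame << OFFSET_BITS) | offset            # frame base + offset

print(hex(translate(0x1234)))   # page 1, offset 0x234 -> frame 2 -> 0x2234
```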

THE PAGING PROCESS

A page table stores the definition of each page. When an active process requests data, the
MMU retrieves corresponding pages into frames located in physical memory for faster
processing. The process is called paging.

The MMU uses page tables to translate virtual addresses to physical ones. Each table entry
indicates where a page is located: in RAM or on disk as virtual memory. Tables may have a
single or multi-level page table such as different tables for applications and segments.

However, constant table lookups can slow down the MMU. A memory cache called the
Translation Lookaside Buffer (TLB) stores recent translations of virtual to physical addresses
for rapid retrieval. Many systems have multiple TLBs, which may reside at different
locations, including between the CPU and RAM, or between multiple page table levels.
Different frame sizes are available for data sets with larger or smaller pages and matching-
sized frames. 4KB to 2MB are common sizes, and GB-sized frames are available in high-
performance servers.

An issue called hidden fragmentation used to be a problem in older Windows deployments
(95, 98, and Me). The problem was internal (or hidden) fragmentation. Unlike the serious
external fragmentation caused by segmentation, internal fragmentation occurs when the data
in a page does not exactly fill its frame, leaving part of the frame unused. However, this is not
an issue in modern Windows OS.

WHAT IS SEGMENTATION?

The process known as segmentation is a virtual process that creates address spaces of various
sizes in a computer system, called segments. Each segment is a different virtual address space
that directly corresponds to process objects.

When a process executes, segmentation assigns related data into segments for faster
processing. The segmentation function maintains a segment table that includes physical
addresses of the segment, size, and other data.

Fig 2.8.3 Segmentation

Segmentation speeds up a computer's information retrieval by assigning related data into a


"segment table" between the CPU and the physical memory.

 Segmentation is a technique to break memory into logical pieces where each piece
represents a group of related information.
 For example, data segments or code segment for each process, data segment for
operating system and so on.
 Segmentation can be implemented using or without using paging.
 Unlike paging, segments have varying sizes, and thus segmentation eliminates internal
fragmentation.
 External fragmentation still exists, but to a lesser extent.
Fig 2.8.4 Logical Address Space

Address generated by CPU is divided into

 Segment number (s) -- segment number is used as an index into a segment table
which contains base address of each segment in physical memory and a limit of
segment.
 Segment offset (o) -- segment offset is first checked against limit and then is
combined with base address to define the physical memory address.

Fig 2.8.5 Address generated by CPU
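Segment translation can be sketched in the same style: the segment number selects a (base, limit) pair from the segment table, the offset is checked against the limit, and the physical address is base plus offset. The segment table contents here are invented for the example.

```python
# Example segment table: segment number -> (base address, limit in bytes).
segment_table = {
    0: (0x0000, 0x0400),   # code segment
    1: (0x2000, 0x0200),   # data segment
}

def translate_segmented(segment, offset):
    """Check the offset against the segment limit, then add the base address."""
    base, limit = segment_table[segment]
    if offset >= limit:
        raise MemoryError("segmentation fault: offset beyond segment limit")
    return base + offset

print(hex(translate_segmented(1, 0x10)))   # 0x2010
```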

THE SEGMENTATION PROCESS


Each segment stores the processes primary function, data structures, and utilities. The CPU
keeps a segment map table for every process and memory blocks, along with segment
identification and memory locations.

The CPU generates virtual addresses for running processes. Segmentation translates the CPU-
generated virtual addresses into physical addresses that refer to a unique physical memory
location. The translation is not strictly one-to-one: different virtual addresses can map to the
same physical address.

THE CHALLENGE OF FRAGMENTATION

Although segmentation is a high-speed and highly secure memory management function,


external fragmentation proved to be an insurmountable challenge. Segmentation causes
external fragmentation to the point that modern x86-64 servers treat it as a legacy application,
and only support it for backwards compatibility.

External fragmentation occurs when unusable memory is located outside of allocated


memory blocks. The issue is that the system may have enough memory to satisfy process
request, but the available memory is not in a contiguous location. In time, the fragmentation
worsens and significantly slows the segmentation process.

SEGMENTED PAGING

Some modern computers use a function called segmented paging. Main memory is divided
into variably-sized segments, which are then divided into smaller fixed-size pages on disk.
Each segment contains a page table, and there are multiple page tables per process.

Each of the tables contains information on every segment page, while the segment table has
information about every segment. Segment tables are mapped to page tables, and page tables
are mapped to individual pages within a segment.

Advantages include less memory usage, more flexibility on page sizes, simplified memory
allocation, and an additional level of data access security over paging. The process does not
cause external fragmentation.

PAGING AND SEGMENTATION:ADVANTAGES AND DISADVANTAGES

Paging Advantages

 On the programmer level, paging is a transparent function and does not require
intervention.
 No external fragmentation.
 No internal fragmentation on updated OS’s.
 Frames do not have to be contiguous.
Paging Disadvantages

 Paging causes internal fragmentation on older systems.


 Longer memory lookup times than segmentation; remedy with TLB memory caches.

Segmentation Advantages

 No internal fragmentation.
 Segment tables consume less space compared to page tables.
 Average segment sizes are larger than most page sizes, which allows segments to
store more process data.
 Less processing overhead.
 Simpler to relocate segments than to relocate contiguous address spaces on disk.

Segmentation Disadvantages

 Uses legacy technology in x86-64 servers.


 Linux supports segmentation only in a limited way on 80x86 microprocessors, since
paging simplifies memory management by letting all processes use the same set of linear addresses.
 Porting Linux to different architectures is problematic because of limited
segmentation support.
 Requires programmer intervention.
 Subject to serious external fragmentation.

KEY DIFFERENCES: PAGING AND SEGMENTATION

Size:
 Paging: Fixed block size for pages and frames. Computer hardware determines
page/frame sizes.
 Segmentation: Variable size segments are user-specified.
Fragmentation:
 Paging: Older systems were subject to internal fragmentation by not allocating entire
pages to memory. Modern OS’s no longer have this problem.
 Segmentation: Segmentation leads to external fragmentation.
Tables:
 Paging: Page tables direct the MMU to page location and status. This is a slower
process than segmentation tables, but TLB memory cache accelerates it.
 Segmentation: Segmentation tables contain segment ID and information, and are
faster than direct paging table lookups.
Availability:
 Paging: Widely available on CPUs and as MMU chips.
 Segmentation: Windows servers may support backwards compatibility, while Linux
has very limited support.

References
Reference Books:
 J.P. Hayes, “Computer Architecture and Organization”, Third Edition.
 Mano, M., “Computer System Architecture”, Third Edition, Prentice Hall.
 Stallings, W., “Computer Organization and Architecture”, Eighth Edition, Pearson
Education.
Text Books:
 Carpinelli, J.D., “Computer Systems Organization & Architecture”, Fourth Edition, Addison Wesley.
 Patterson and Hennessy, “Computer Architecture”, Fifth Edition, Morgan Kaufmann.

Other References

 https://www.ques10.com/p/10067/what-is-virtual-memory-explain-the-role-of-pagin-1/
 https://www.enterprisestorageforum.com/storage-hardware/paging-and-segmentation.html
 https://www.cmpe.boun.edu.tr/~uskudarli/courses/cmpe235/Virtual%20Memory.pdf

Virtual Memory: Segmentation 2.2.7

Paging and segmentation are processes by which data is stored to, then retrieved from, a
computer's storage disk.

Paging is a computer memory management function that presents storage locations to the
computer's CPU as additional memory, called virtual memory. Each piece of data needs a
storage address.

Segmentation is a virtual process that creates variable-sized address spaces in computer
storage for related data, called segments. This process speeds retrieval.

Managing computer memory is a basic operating system function -- both paging and
segmentation are basic functions of the OS. No system can efficiently rely on limited RAM
alone. So, the computer’s memory management unit (MMU) uses the storage
disk, HDD or SSD, as virtual memory to supplement RAM.

Let's look in-depth at paging, then we'll look in-depth at segmentation.

WHAT IS PAGING?

As mentioned above, the memory management function called paging specifies storage
locations to the CPU as additional memory, called virtual memory. The CPU cannot directly
access the storage disk, so the MMU emulates memory by mapping pages to frames that are in
RAM.

Before we launch into a more detailed explanation of pages and frames, let’s define some
technical terms.
 Page: A fixed-length contiguous block of virtual memory residing on disk.
 Frame: A fixed-length contiguous block located in RAM, whose size is identical to that
of a page.
 Physical memory: The computer’s random access memory (RAM), typically
contained in DIMM cards attached to the computer’s motherboard.
 Virtual memory: Virtual memory is a portion of an HDD or SSD that is reserved to
emulate RAM. The MMU serves up virtual memory from disk to the CPU to reduce
the workload on physical memory.
 Virtual address: The CPU generates a virtual address for each active process. The
MMU maps the virtual address to a physical location in RAM and passes the address
to the bus. A virtual address space is the range of virtual addresses under CPU
control.
 Physical address: The physical address is a location in RAM. The physical address
space is the set of all physical addresses corresponding to the CPU’s virtual addresses,
i.e., the range of physical addresses under MMU control.

By assigning an address to a piece of data using a "page table" between the CPU and the
computer's physical memory, a computer's MMU enables the system to retrieve that data
whenever needed.

Paging

 External fragmentation is avoided by using the paging technique.
 Paging is a technique in which logical memory is broken into fixed-size blocks called
pages and physical memory into blocks of the same size called frames (the size is a
power of 2, between 512 bytes and 8192 bytes).
 When a process is to be executed, its corresponding pages are loaded into any
available memory frames.
 Logical address space of a process can be non-contiguous, and a process is allocated
physical memory wherever free memory frames are available.
 The operating system keeps track of all free frames; it needs n free frames to run a
program of size n pages, as the sketch below illustrates.
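A minimal sketch of this free-frame bookkeeping is given below. The frame count, the
array-based free list, and the function names are illustrative assumptions rather than any
particular operating system's implementation.

#include <stdbool.h>
#include <stdio.h>

#define NUM_FRAMES 16                 /* illustrative size of physical memory, in frames */

/* Hypothetical free-frame list: frame_free[i] is true while frame i is unused. */
static bool frame_free[NUM_FRAMES];

/* Allocate n frames for a process of n pages; the frames need not be contiguous. */
int allocate_frames(int n, int frames_out[])
{
    int found = 0;
    for (int f = 0; f < NUM_FRAMES && found < n; f++) {
        if (frame_free[f]) {
            frame_free[f] = false;            /* mark the frame as used           */
            frames_out[found++] = f;          /* record it for the page table     */
        }
    }
    return (found == n) ? 0 : -1;             /* -1: not enough free frames       */
}

int main(void)
{
    for (int f = 0; f < NUM_FRAMES; f++)      /* initially every frame is free    */
        frame_free[f] = true;

    int frames[3];
    if (allocate_frames(3, frames) == 0)
        printf("pages 0..2 -> frames %d, %d, %d\n", frames[0], frames[1], frames[2]);
    return 0;
}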

The address generated by the CPU is divided into:

 Page number (p) -- the page number is used as an index into the page table, which
contains the base address of each page in physical memory.
 Page offset (d) -- the page offset is combined with that base address to form the
physical memory address, as the translation sketch below illustrates.
Fig 2.8.2 Paging Table Architecture
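As a rough illustration of this page-number/offset split and the page-table lookup, here is a
minimal sketch in C. The 4 KB page size, the table contents, and the translate() function are
illustrative assumptions, not details taken from the text.

#include <stdint.h>
#include <stdio.h>

#define PAGE_SIZE   4096u             /* assumed 4 KB pages (a power of 2)         */
#define OFFSET_BITS 12u               /* log2(PAGE_SIZE)                           */
#define NUM_PAGES   16u               /* tiny illustrative virtual address space   */

/* Hypothetical single-level page table: page number -> frame number. */
static const uint32_t page_table[NUM_PAGES] = {
    [0] = 5, [1] = 9, [2] = 1, [3] = 7        /* remaining entries default to 0    */
};

/* Split the virtual address into p and d, look p up, and reattach d. */
uint32_t translate(uint32_t virtual_addr)
{
    uint32_t p     = virtual_addr >> OFFSET_BITS;     /* page number (table index) */
    uint32_t d     = virtual_addr & (PAGE_SIZE - 1);  /* page offset, unchanged    */
    uint32_t frame = page_table[p];                   /* base frame from the table */
    return (frame << OFFSET_BITS) | d;                /* physical address          */
}

int main(void)
{
    uint32_t va = (2u << OFFSET_BITS) + 0x1A4;        /* page 2, offset 0x1A4      */
    printf("virtual 0x%04X -> physical 0x%04X\n",
           (unsigned)va, (unsigned)translate(va));
    return 0;
}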

THE PAGING PROCESS

A page table stores the definition of each page. When an active process requests data, the
MMU retrieves corresponding pages into frames located in physical memory for faster
processing. The process is called paging.

The MMU uses page tables to translate virtual addresses to physical ones. Each table entry
indicates where a page is located: in RAM or on disk as virtual memory. Page tables may be
single-level or multi-level, for example with separate tables for different applications or
segments.

However, constant table lookups can slow down the MMU. A memory cache called the
Translation Lookaside Buffer (TLB) stores recent translations of virtual to physical addresses
for rapid retrieval. Many systems have multiple TLBs, which may reside at different
locations, including between the CPU and RAM, or between multiple page table levels.
Different frame sizes are available for data sets with larger or smaller pages and matching-
sized frames. 4KB to 2MB are common sizes, and GB-sized frames are available in high-
performance servers.
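The sketch below shows, under simplifying assumptions, how a tiny fully associative TLB
might be consulted before a page-table walk. The entry count, round-robin replacement, and
function names are hypothetical, not a description of any real MMU.

#include <stdbool.h>
#include <stdint.h>

#define TLB_ENTRIES 4u        /* deliberately tiny; real TLBs hold dozens to hundreds */

/* One cached page -> frame translation. */
struct tlb_entry {
    bool     valid;
    uint32_t page;
    uint32_t frame;
};

static struct tlb_entry tlb[TLB_ENTRIES];

/* Consult the TLB first; fall back to the page table only on a miss. */
uint32_t lookup_frame(uint32_t page, const uint32_t *page_table)
{
    for (unsigned i = 0; i < TLB_ENTRIES; i++) {
        if (tlb[i].valid && tlb[i].page == page)
            return tlb[i].frame;                       /* TLB hit: no table walk   */
    }
    uint32_t frame = page_table[page];                 /* TLB miss: walk the table */

    static unsigned next = 0;                          /* simple round-robin fill  */
    tlb[next] = (struct tlb_entry){ true, page, frame };
    next = (next + 1) % TLB_ENTRIES;
    return frame;
}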

An issue called internal (or hidden) fragmentation used to be a problem in older Windows
deployments (95, 98, and Me). Unlike the serious external fragmentation of segmentation,
internal fragmentation occurs when a process does not fill the last frame allocated to it,
leaving part of that frame wasted. However, this is not an issue in modern Windows OSs.
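To make the waste concrete, the short sketch below computes the internal fragmentation for
one process; the 4 KB page size and the 10,100-byte process size are arbitrary example values.

#include <stdio.h>

#define PAGE_SIZE 4096u       /* assumed page/frame size */

int main(void)
{
    unsigned process_size = 10100;                                       /* bytes     */
    unsigned pages_needed = (process_size + PAGE_SIZE - 1) / PAGE_SIZE;  /* round up  */
    unsigned wasted       = pages_needed * PAGE_SIZE - process_size;     /* last page */
    printf("%u pages allocated, %u bytes lost to internal fragmentation\n",
           pages_needed, wasted);   /* prints: 3 pages allocated, 2188 bytes lost ... */
    return 0;
}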

WHAT IS SEGMENTATION?

The process known as segmentation is a virtual process that creates address spaces of various
sizes in a computer system, called segments. Each segment is a different virtual address space
that directly corresponds to process objects.

When a process executes, segmentation assigns related data into segments for faster
processing. The segmentation function maintains a segment table that includes physical
addresses of the segment, size, and other data.

Fig 2.8.3 Segmentation

Segmentation speeds up a computer's information retrieval by assigning related data to
segments, which are tracked in a "segment table" between the CPU and the physical memory.

 Segmentation is a technique to break memory into logical pieces where each piece
represents a group of related information.
 For example, a code segment and a data segment for each process, a data segment for
the operating system, and so on.
 Segmentation can be implemented with or without paging.
 Unlike pages, segments have varying sizes, which eliminates internal fragmentation.
 External fragmentation still exists, but to a lesser extent.
Fig 2.8.4 Logical Address Space

The address generated by the CPU is divided into:

 Segment number (s) -- the segment number is used as an index into the segment table,
which contains the base address of each segment in physical memory and the segment's
limit.
 Segment offset (o) -- the segment offset is first checked against the limit and then
combined with the base address to form the physical memory address, as the sketch
below illustrates.

Fig 2.8.5 Address generated by CPU
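A minimal sketch of this limit check and base addition follows. The segment-table contents
and the function names are illustrative assumptions, not part of the original description.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical segment-table entry: base address and limit (segment length). */
struct segment_entry {
    uint32_t base;
    uint32_t limit;
};

static const struct segment_entry segment_table[] = {
    { .base = 0x1000, .limit = 0x0400 },      /* segment 0: code  */
    { .base = 0x4000, .limit = 0x0800 },      /* segment 1: data  */
    { .base = 0x9000, .limit = 0x0200 },      /* segment 2: stack */
};

/* The offset is checked against the limit before being added to the base. */
uint32_t translate_segment(uint32_t s, uint32_t o)
{
    if (o >= segment_table[s].limit) {
        fprintf(stderr, "segment %u: offset 0x%X exceeds limit\n",
                (unsigned)s, (unsigned)o);
        exit(EXIT_FAILURE);                   /* a real system raises a fault/trap */
    }
    return segment_table[s].base + o;
}

int main(void)
{
    printf("segment 1, offset 0x10 -> physical 0x%04X\n",
           (unsigned)translate_segment(1, 0x10));
    return 0;
}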

THE SEGMENTATION PROCESS


Each segment stores the process's primary function, data structures, and utilities. The CPU
keeps a segment map table for every process and a list of memory blocks, along with segment
identification and memory locations.

The CPU generates virtual addresses for running processes. Segmentation translates the CPU-
generated virtual addresses into physical addresses that refer to a unique physical memory
location. The translation is not strictly one-to-one: different virtual addresses can map to the
same physical address.

THE CHALLENGE OF FRAGMENTATION

Although segmentation is a high-speed and highly secure memory management function,
external fragmentation proved to be an insurmountable challenge. Segmentation causes
external fragmentation to the point that modern x86-64 servers treat it as legacy technology
and only support it for backwards compatibility.

External fragmentation occurs when unusable memory is located outside of allocated
memory blocks. The issue is that the system may have enough memory to satisfy a process's
request, but the available memory is not contiguous. In time, the fragmentation worsens and
significantly slows the segmentation process.
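The toy example below illustrates the problem: the free holes add up to more than the request,
yet no single hole is large enough to hold the new segment. The hole sizes and the 30 KB
request are made-up numbers.

#include <stdio.h>

/* Hypothetical free-hole list (sizes in KB) after several variable-sized segments
 * have been allocated and released. Total free space is 48 KB, but the largest
 * single hole is only 16 KB, so a 30 KB segment cannot be placed.               */
static const unsigned free_holes_kb[] = { 8, 12, 4, 16, 8 };

int main(void)
{
    unsigned request_kb = 30, total = 0, largest = 0;
    for (unsigned i = 0; i < sizeof free_holes_kb / sizeof free_holes_kb[0]; i++) {
        total += free_holes_kb[i];
        if (free_holes_kb[i] > largest)
            largest = free_holes_kb[i];
    }
    printf("free: %u KB in total, largest hole %u KB -> %u KB request %s\n",
           total, largest, request_kb,
           largest >= request_kb ? "fits" : "cannot be placed");
    return 0;
}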

SEGMENTED PAGING

Some modern computers use a function called segmented paging. Main memory is divided
into variably-sized segments, which are then divided into smaller fixed-size pages on disk.
Each segment contains a page table, and there are multiple page tables per process.

Each of the tables contains information on every segment page, while the segment table has
information about every segment. Segment tables are mapped to page tables, and page tables
are mapped to individual pages within a segment.

Advantages include less memory usage, more flexibility on page sizes, simplified memory
allocation, and an additional level of data access security over paging. The process does not
cause external fragmentation.
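A minimal sketch of this two-level lookup (segment table first, then the segment's own page
table) is given below. The data structures, sizes, and frame numbers are hypothetical and only
meant to show the order of the lookups.

#include <stdint.h>

#define OFFSET_BITS   12u     /* assumed 4 KB pages                              */
#define PAGES_PER_SEG 8u      /* illustrative: each segment holds up to 8 pages  */

/* Hypothetical layout: each segment-table entry points at a per-segment page table. */
struct segment {
    const uint32_t *page_table;   /* page number -> frame number for this segment */
    uint32_t        num_pages;    /* acts as the segment limit, in pages          */
};

static const uint32_t seg0_pages[PAGES_PER_SEG] = { 3, 8, 2, 11 };
static const uint32_t seg1_pages[PAGES_PER_SEG] = { 6, 1 };

static const struct segment segment_table[] = {
    { seg0_pages, 4 },
    { seg1_pages, 2 },
};

/* Translate (segment, page, offset): segment table -> page table -> frame. */
uint32_t translate(uint32_t s, uint32_t p, uint32_t d)
{
    const struct segment *seg = &segment_table[s];
    if (p >= seg->num_pages)
        return UINT32_MAX;                    /* out-of-range page: fault          */
    uint32_t frame = seg->page_table[p];      /* second lookup, inside the segment */
    return (frame << OFFSET_BITS) | d;        /* the offset is carried unchanged   */
}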

PAGING AND SEGMENTATION: ADVANTAGES AND DISADVANTAGES

Paging Advantages

 On the programmer level, paging is a transparent function and does not require
intervention.
 No external fragmentation.
 No internal fragmentation on modern operating systems.
 Frames do not have to be contiguous.
Paging Disadvantages

 Paging causes internal fragmentation on older systems.
 Longer memory lookup times than segmentation; remedied by TLB memory caches.

Segmentation Advantages

 No internal fragmentation.
 Segment tables consume less space compared to page tables.
 Average segment sizes are larger than most page sizes, which allows segments to
store more process data.
 Less processing overhead.
 Simpler to relocate segments than to relocate contiguous address spaces on disk.
 Segment tables are smaller than page tables, and take up less memory.

Segmentation Disadvantages

 Treated as legacy technology on x86-64 servers.
 Linux supports segmentation only on 80x86 microprocessors, and even there its design
favors paging, which simplifies memory management by using the same set of linear
addresses.
 Porting segmentation-dependent code to other architectures is problematic because of
their limited segmentation support.
 Requires programmer intervention.
 Subject to serious external fragmentation.

KEY DIFFERENCES: PAGING AND SEGMENTATION

Size:
 Paging: Fixed block size for pages and frames. Computer hardware determines
page/frame sizes.
 Segmentation: Segment sizes are variable and user-specified.
Fragmentation:
 Paging: Older systems were subject to internal fragmentation because a process rarely
fills its last allocated page completely. Modern OS’s no longer have this problem.
 Segmentation: Segmentation leads to external fragmentation.
Tables:
 Paging: Page tables direct the MMU to each page’s location and status. Lookups are
slower than segment table lookups, but the TLB memory cache accelerates them.
 Segmentation: Segment tables contain the segment ID and related information, and
lookups are faster than direct page table lookups.
Availability:
 Paging: Widely available on CPUs and as MMU chips.
 Segmentation: Windows servers may support it for backwards compatibility, while Linux
has very limited support.

References
Reference Books:
 J.P. Hayes, “Computer Architecture and Organization”, Third Edition.
 Mano, M., “Computer System Architecture”, Third Edition, Prentice Hall.
 Stallings, W., “Computer Organization and Architecture”, Eighth Edition, Pearson
Education.
Text Books:
 Carpinelli, J.D., “Computer Systems Organization & Architecture”, Fourth Edition,
Addison Wesley.
 Patterson and Hennessy, “Computer Architecture”, Fifth Edition, Morgan Kaufmann.

Other References

 https://www.ques10.com/p/10067/what-is-virtual-memory-explain-the-role-of-pagin-1/
 https://www.enterprisestorageforum.com/storage-hardware/paging-and-segmentation.html
 https://www.cmpe.boun.edu.tr/~uskudarli/courses/cmpe235/Virtual%20Memory.pdf
