Unit 4
Memory Hierarchy
A memory unit is an essential component in any digital computer since it is needed for storing
programs and data.
1. The memory unit that establishes direct communication with the CPU is called Main
Memory. The main memory is often referred to as RAM (Random Access Memory).
2. The memory units that provide backup storage are called Auxiliary Memory. For instance,
magnetic disks and magnetic tapes are the most commonly used auxiliary memories.
Apart from these basic classifications, the memory hierarchy consists of all the storage devices available in a computer system, ranging from the slow but high-capacity auxiliary memory to the relatively faster main memory.
Auxiliary Memory
Magnetic tape is a storage medium that allows for data archiving, collection, and backup for different
kinds of data.
Main Memory
The main memory in a computer system is often referred to as Random Access Memory (RAM).
This memory unit communicates directly with the CPU and with auxiliary memory devices through
an I/O processor.
The programs that are not currently required in the main memory are transferred into auxiliary
memory to provide space for currently used programs and data.
I/O Processor
The primary function of an I/O Processor is to manage the data transfers between auxiliary memories
and the main memory.
Cache Memory
The data and contents of main memory that are used frequently by the CPU are stored in the cache memory so that the processor can access them in a shorter time. Whenever the CPU needs to access memory, it first checks the cache. If the required data is found in the cache, it is read from that fast memory; otherwise, the CPU accesses the main memory.
Main Memory
The main memory acts as the central storage unit in a computer system. It is a relatively large and
fast memory which is used to store programs and data during the run time operations.
The primary technology used for the main memory is based on semiconductor integrated circuits.
The integrated circuits for the main memory are classified into two major groups: RAM chips and ROM chips. The RAM integrated circuit chips are further classified by operating mode into static and dynamic.
A static RAM consists essentially of flip-flops that store the binary information. The stored information is volatile, i.e. it remains valid only as long as power is applied to the system. Static RAM is easier to use and takes less time to perform read and write operations than dynamic RAM.
A dynamic RAM stores the binary information in the form of electric charges applied to capacitors. The capacitors are provided inside the chip by MOS transistors. Dynamic RAM consumes less power and provides a larger storage capacity in a single memory chip.
RAM chips are available in a variety of sizes and are used as per the system requirement. The
following block diagram demonstrates the chip interconnection in a 128 * 8 RAM chip.
o A 128 * 8 RAM chip has a memory capacity of 128 words of eight bits (one byte) per word.
This requires a 7-bit address and an 8-bit bidirectional data bus.
o The 8-bit bidirectional data bus allows the transfer of data either from memory to CPU during
a read operation or from CPU to memory during a write operation.
o The read and write inputs specify the memory operation, and the two chip select (CS)
control inputs are for enabling the chip only when the microprocessor selects it.
o The bidirectional data bus is constructed using three-state buffers.
o The output generated by three-state buffers can be placed in one of the three possible states
which include a signal equivalent to logic 1, a signal equal to logic 0, or a high-impedance
state.
o The following function table specifies the operations of a 128 * 8 RAM chip.
o From the function table, we can conclude that the unit is in operation only when CS1 = 1
and CS2 = 0. The bar on top of the second select variable indicates that this input is enabled
when it is equal to 0.
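The chip-select behavior above can be sketched in Python. This is a hypothetical software model of the 128 * 8 chip, not a hardware description; the class name and method signatures are invented for illustration:

```python
# Toy model of a 128 x 8 RAM chip: 7-bit address, 8-bit word, and the
# enable logic from the function table (chip selected only when CS1=1, CS2=0).
class RamChip128x8:
    def __init__(self):
        self.words = [0] * 128          # 128 words of 8 bits each

    def _selected(self, cs1, cs2):
        return cs1 == 1 and cs2 == 0    # CS2 is active-low (the barred input)

    def read(self, cs1, cs2, address):
        if not self._selected(cs1, cs2):
            return None                 # data bus in high-impedance state
        return self.words[address & 0x7F]        # 7-bit address selects a byte

    def write(self, cs1, cs2, address, data):
        if self._selected(cs1, cs2):
            self.words[address & 0x7F] = data & 0xFF  # store one 8-bit word

chip = RamChip128x8()
chip.write(1, 0, 100, 0xAB)
print(chip.read(1, 0, 100))   # chip selected: returns 171 (0xAB)
print(chip.read(0, 0, 100))   # chip not selected: bus floats (None)
```

Returning `None` here stands in for the high-impedance state of the three-state buffers.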
The primary component of the main memory is RAM integrated circuit chips, but a portion of
memory may be constructed with ROM chips.
A ROM memory is used for keeping programs and data that are permanently resident in the
computer.
Apart from the permanent storage of data, the ROM portion of main memory is needed for storing an
initial program called a bootstrap loader. The primary function of the bootstrap loader program is
to start the computer software operating when power is turned on.
ROM chips are also available in a variety of sizes and are also used as per the system requirement.
The following block diagram demonstrates the chip interconnection in a 512 * 8 ROM chip.
o A ROM chip has a similar organization as a RAM chip. However, a ROM can only perform
read operation; the data bus can only operate in an output mode.
o The 9-bit address lines in the ROM chip specify any one of the 512 bytes stored in it.
o The value for chip select 1 and chip select 2 must be 1 and 0 for the unit to operate.
Otherwise, the data bus is said to be in a high-impedance state.
Auxiliary Memory
Magnetic Disks
A magnetic disk is a type of memory constructed using a circular plate of metal or plastic coated with
magnetized material. Usually, both sides of a disk are used to carry out read/write operations. Several disks may be stacked on one spindle, with a read/write head available for each surface.
The following image shows the structural representation for a magnetic disk.
o The memory bits are stored in the magnetized surface in spots along the concentric circles
called tracks.
o The concentric circles (tracks) are commonly divided into sections called sectors.
Magnetic Tape
Magnetic tape is a storage medium that allows data archiving, collection, and backup for different
kinds of data. The magnetic tape is constructed using a plastic strip coated with a magnetic recording
medium.
The bits are recorded as magnetic spots on the tape along several tracks. Usually, seven or nine bits
are recorded simultaneously to form a character together with a parity bit.
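Assuming even parity (the convention varies by tape format), the parity bit recorded alongside one character can be sketched as:

```python
# Even-parity bit for an 8-bit character, as recorded together with the data
# bits across the tracks of a 9-track tape (even parity assumed for the sketch).
def even_parity_bit(char_code):
    ones = bin(char_code & 0xFF).count("1")  # number of 1 bits in the byte
    return ones % 2                          # 1 iff the count of 1s is odd

print(even_parity_bit(0b01000001))  # 'A' has two 1 bits -> parity 0
print(even_parity_bit(0b00000111))  # three 1 bits -> parity 1
```

The parity bit makes the total number of 1s (data plus parity) even, which lets the tape unit detect any single-bit error in a character.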
Magnetic tape units can be halted, started to move forward or in reverse, or can be rewound.
However, they cannot be started or stopped fast enough between individual characters. For this
reason, information is recorded in blocks referred to as records.
Associative Memory
An associative memory can be considered as a memory unit whose stored data can be identified for
access by the content of the data itself rather than by an address or memory location.
When a write operation is performed on associative memory, no address or memory location is given
to the word. The memory itself is capable of finding an empty unused location to store the word.
On the other hand, when the word is to be read from an associative memory, the content of the word,
or part of the word, is specified. The words which match the specified content are located by the
memory and are marked for reading.
The functional registers like the argument register A and key register K each have n bits, one for
each bit of a word. The match register M consists of m bits, one for each memory word.
The words which are kept in the memory are compared in parallel with the content of the argument
register.
The key register (K) provides a mask for choosing a particular field or key in the argument word. If
the key register contains a binary value of all 1's, then the entire argument is compared with each
memory word. Otherwise, only those bits in the argument that have 1's in their corresponding
position of the key register are compared. Thus, the key provides a mask for identifying a piece of
information which specifies how the reference to memory is made.
The following diagram can represent the relation between the memory array and the external
registers in an associative memory.
The cells present inside the memory array are marked by the letter C with two subscripts. The first
subscript gives the word number and the second specifies the bit position in the word. For instance,
the cell Cij is the cell for bit j in word i.
A bit Aj in the argument register is compared with all the bits in column j of the array, provided that Kj = 1. This process is done for all columns j = 1, 2, ..., n.
If a match occurs between all the unmasked bits of the argument and the bits in word i, the
corresponding bit Mi in the match register is set to 1. If one or more unmasked bits of the argument
and the word do not match, Mi is cleared to 0.
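The masked, parallel comparison can be sketched in Python. The hardware compares every word simultaneously; here each word's bit-parallel compare is modeled with XOR and AND, and the names are illustrative:

```python
# Associative (content-addressable) search: every stored word is compared with
# argument A, but only at bit positions where key register K holds a 1.
# M[i] = 1 when word i matches A on all unmasked bit positions.
def associative_match(words, A, K):
    # (w XOR A) has a 1 wherever w and A differ; masking with K keeps only
    # the unmasked differences, so a match is an all-zero result.
    return [1 if (w ^ A) & K == 0 else 0 for w in words]

memory = [0b1011, 0b1111, 0b0011, 0b1010]
A = 0b1010                      # argument register
K = 0b1110                      # mask: compare only the three high bits
print(associative_match(memory, A, K))   # [1, 0, 0, 1]
```

With K all 1s the entire argument is compared, exactly as the text describes.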
Cache Memory
The data or contents of the main memory that are used frequently by CPU are stored in the cache
memory so that the processor can easily access that data in a shorter time. Whenever the CPU needs
to access memory, it first checks the cache memory. If the data is not found in the cache, the CPU accesses the main memory.
Cache memory is placed between the CPU and the main memory. The block diagram for a cache
memory can be represented as:
The cache is the fastest component in the memory hierarchy and approaches the speed of CPU
components.
o When the CPU needs to access memory, the cache is examined. If the word is found in the
cache, it is read from the fast memory.
o If the word addressed by the CPU is not found in the cache, the main memory is accessed to
read the word.
o A block of words containing the one just accessed is then transferred from main memory to cache memory. The block size may vary from one word (the one just accessed) to about 16 words adjacent to the one just accessed.
o The performance of the cache memory is frequently measured in terms of a quantity
called hit ratio.
o When the CPU refers to memory and finds the word in cache, it is said to produce a hit.
o If the word is not found in the cache, it is in main memory and it counts as a miss.
o The number of hits divided by the total number of CPU references to memory (hits plus misses) is the hit ratio.
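The hit-ratio definition above is a one-line calculation; a minimal sketch, with invented example counts:

```python
# Hit ratio = hits / (hits + misses) over a run of CPU memory references.
def hit_ratio(hits, misses):
    total = hits + misses
    return hits / total if total else 0.0   # guard the no-references case

# e.g. 970 of 1000 references were found in the cache:
print(hit_ratio(970, 30))   # 0.97
```

A hit ratio close to 1 means the effective access time approaches the speed of the cache rather than that of main memory.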
Virtual Memory
Virtual memory is a technique in which large programs are stored in the form of pages during their execution, and only the required pages or portions of a process are loaded into the main memory. This technique is useful because it provides a large virtual memory for user programs even when only a very small physical memory is available.
In real scenarios, most processes never need all their pages at once, for the following reasons:
• Error handling code is not needed unless that specific error occurs, some of which are quite
rare.
• Arrays are often over-sized for worst-case scenarios, and only a small fraction of the arrays
are actually used in practice.
• Certain features of certain programs are rarely used.
Virtual memory provides several benefits:
1. Large programs can be written, as the virtual address space available is huge compared to physical memory.
2. Less I/O is required, which leads to faster and easier swapping of processes.
3. More physical memory is available for other processes, since only the required pages of a program occupy actual physical memory.
Initially, only those pages are loaded that the process will require immediately. The pages that are not moved into memory are marked as invalid in the page table; for an invalid entry, the rest of the table entry is empty. Pages that are loaded into memory are marked as valid, along with information about where to find the swapped-out page.
When the process references a page that is not loaded into memory, a page fault trap is triggered and the following steps are performed:
1. The memory address which is requested by the process is first checked, to verify the request
made by the process.
2. If it is found to be invalid, the process is terminated.
3. In case the request by the process is valid, a free frame is located, possibly from a free-frame
list, where the required page will be moved.
4. A disk operation is scheduled to move the necessary page from disk to the specified memory location. (This will usually block the process on an I/O wait, allowing some other process to use the CPU in the meantime.)
5. When the I/O operation is complete, the process's page table is updated with the new frame
number, and the invalid bit is changed to valid.
6. The instruction that caused the page fault must now be restarted from the beginning.
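The six steps above can be sketched in Python. This is a toy model: the page table, free-frame list, and "disk" are plain dictionaries and lists, the I/O wait is omitted, and all names are invented for illustration:

```python
# Toy page-fault handler following the steps in the text: validate the
# reference, find a free frame, "read" the page from disk, mark the entry valid.
class InvalidReference(Exception):
    pass

def handle_page_fault(page_table, page, free_frames, disk):
    if page not in page_table:              # steps 1-2: invalid request
        raise InvalidReference(page)        # -> process would be terminated
    frame = free_frames.pop()               # step 3: locate a free frame
    frame_contents = disk[page]             # step 4: disk I/O brings page in
    entry = page_table[page]
    entry["frame"] = frame                  # step 5: update the page table
    entry["valid"] = True                   #         and set the valid bit
    return frame_contents                   # step 6: restart the instruction

page_table = {7: {"valid": False, "frame": None}}
disk = {7: "page-7-data"}
print(handle_page_fault(page_table, 7, [3, 4], disk))  # 'page-7-data'
print(page_table[7])   # entry now valid, with its frame number filled in
```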
There are cases when no pages are loaded into the memory initially, pages are only loaded when
demanded by the process by generating page faults. This is called Pure Demand Paging.
The only major issue with demand paging is that, after each page fault, the instruction that caused it must be restarted from the beginning. This is not a big issue for small programs, but for larger programs the accumulated page-fault overhead can affect performance drastically.
Page Replacement
As studied in Demand Paging, only certain pages of a process are loaded initially into the memory.
This allows us to get more processes into memory at the same time. But what happens when a process requests more pages and no free memory is available to bring them in? The following steps can be taken to deal with this problem:
1. Put the process in the wait queue, until any other process finishes its execution thereby
freeing frames.
2. Or, remove some other process completely from the memory to free frames.
3. Or, find some pages that are not being used right now and move them to the disk to get free frames. This technique is called page replacement and is the most commonly used; there are several good algorithms to carry out page replacement efficiently.
• Find the location of the page requested by the ongoing process on the disk.
• Find a free frame. If there is a free frame, use it. If there is no free frame, use a page-replacement algorithm to select an existing frame to be replaced; such a frame is known as the victim frame.
• Write the victim frame to disk. Change all related page tables to indicate that this page is no
longer in memory.
• Move the required page and store it in the frame. Adjust all related page and frame tables to
indicate the change.
• Restart the process that was waiting for this page.
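One simple victim-selection policy is FIFO replacement, sketched below as a fault counter. The reference string is the classic example used to illustrate this policy (the function name is invented):

```python
# FIFO page replacement: when no free frame remains, the oldest resident
# page is chosen as the victim.
from collections import deque

def fifo_faults(reference_string, num_frames):
    frames = deque()            # resident pages, oldest at the left
    faults = 0
    for page in reference_string:
        if page in frames:
            continue            # hit: page already resident
        faults += 1             # miss: page fault
        if len(frames) == num_frames:
            frames.popleft()    # evict the victim (oldest page)
        frames.append(page)     # bring the requested page in
    return faults

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(fifo_faults(refs, 3))   # 9
print(fifo_faults(refs, 4))   # 10
```

Note that with this reference string FIFO incurs 9 faults with 3 frames but 10 faults with 4 frames; more frames producing more faults is the behavior known as Belady's anomaly.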
Thrashing
A process that is spending more time paging than executing is said to be thrashing. In other words, the process doesn't have enough frames to hold all the pages needed for its execution, so it is swapping pages in and out very frequently to keep executing. Sometimes, even pages that will be required in the near future have to be swapped out.
Initially, when CPU utilization is low, the process scheduler tries to increase the level of multiprogramming by loading multiple processes into memory at the same time, allocating a limited number of frames to each process. As memory fills up, each process starts to spend a lot of time waiting for its required pages to be swapped in, which again leads to low CPU utilization because most of the processes are waiting for pages. The scheduler then loads even more processes to increase CPU utilization; as this continues, at some point the complete system comes to a halt.
To prevent thrashing we must provide processes with as many frames as they really need "right
now".
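Thrashing can be illustrated with a toy LRU simulation (LRU is an assumed policy for this sketch): a process that cyclically touches 4 pages faults on every single reference when given only 3 frames, but almost never once it has the 4 frames it really needs.

```python
# LRU-managed frames: when the process's working set exceeds its frame
# allocation, a cyclic access pattern faults on every reference (thrashing).
from collections import OrderedDict

def lru_faults(reference_string, num_frames):
    frames = OrderedDict()      # page -> None, most recently used last
    faults = 0
    for page in reference_string:
        if page in frames:
            frames.move_to_end(page)      # hit: refresh recency
            continue
        faults += 1
        if len(frames) == num_frames:
            frames.popitem(last=False)    # evict least recently used page
        frames[page] = None
    return faults

refs = [1, 2, 3, 4] * 25        # cyclic working set of 4 pages, 100 references
print(lru_faults(refs, 3))      # 100 -> every reference faults (thrashing)
print(lru_faults(refs, 4))      # 4   -> only the 4 cold-start faults
```

This is exactly the point of the working-set idea: one frame short of the working set turns nearly every memory reference into disk I/O.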
A secondary storage device refers to any non-volatile storage device that is internal or external to the
computer. It can be any storage device beyond the primary storage that enables permanent data storage.
A secondary storage device is also known as an auxiliary storage device, backup storage device, tier 2
storage, or external storage. These devices store virtually all programs and applications on a computer,
including the operating system, device drivers, applications and general user data.
The Secondary storage media can be fixed or removable. Fixed Storage media is an internal storage
medium like a hard disk that is fixed inside the computer. A storage medium that is portable and can
be taken outside the computer is termed removable storage media. The main advantages of using secondary storage devices are:
o In secondary storage devices, the stored data need not be under the direct control of the operating system. For example, many organizations store their archival data or critical documents on secondary storage drives that their main network cannot access, to ensure their preservation in the event of a data breach.
o Since these drives do not interact directly with the main infrastructure and can be situated in a
remote or secure site, it is unlikely that a hacker may access these drives unless they're
physically stolen.
Computers use main memory such as random access memory (RAM) and cache to hold data that is
being processed. However, this type of memory is volatile, and it loses its data when the computer is
switched off. General-purpose computers, such as personal computers and tablets, need to store
programs and data for later use.
That's why secondary storage is needed to keep programs and data long term. Secondary storage is non-volatile and can retain data for long-term storage. It is used for various purposes, such as backups for future restores or disaster recovery, long-term archiving of data that is not frequently accessed, and storage of non-critical data on lower-performing, less expensive drives.
Without secondary storage, all programs and data would be lost when the computer is switched off.
Here are the two types of secondary storage devices, i.e., fixed storage and removable storage.
1. Fixed Storage
Fixed storage is an internal media device used by a computer system to store data. Usually, these are
referred to as the fixed disk drives or Hard Drives.
Despite the name, fixed storage devices can be removed from the system for repair work, maintenance, or upgrades. In general, though, this cannot be done without a proper toolkit to open up the computer system and provide physical access, and it needs to be done by an engineer.
Technically, almost all of the data being processed on a computer system is stored on some built-in fixed storage device. We have the following types of fixed storage:
2. Removable Storage
Removable storage is an external media device that is used by a computer system to store data. Usually,
these are referred to as the Removable Disks drives or the External Drives. Removable storage is any
storage device that can be removed from a computer system while the system is running. Examples of
external devices include CDs, DVDs, Blu-ray disk drives, and diskettes and USB drives. Removable
storage makes it easier for a user to transfer data from one computer system to another.
The main benefit of removable disks as storage is that they can provide the fast data transfer rates associated with storage area networks (SANs). We have the following types of removable storage:
The following image shows the classification of commonly used secondary storage devices.
Sequential Access Storage Device
It is a class of data storage devices that read stored data in a sequence. This is in contrast to random access memory (RAM), where data can be accessed in any order. Magnetic tape is the most common sequential access storage device.
i. Magnetic tape: It is a medium for magnetic recording, made of a thin, magnetizable coating
on a long, narrow strip of plastic film. Devices that record and play audio and video using
magnetic tape are tape recorders and videotape recorders. A device that stores computer data
on magnetic tape is known as a tape drive.
It was a key technology in early computer development, allowing unparalleled amounts of data
to be mechanically created, stored for long periods, and rapidly accessed.
A direct-access storage device (DASD) is another name for secondary storage devices that store data
in discrete locations with a unique address, such as hard disk drives, optical drives and most magnetic
storage devices.
1. Magnetic disks: A magnetic disk is a storage device that uses a magnetization process to write,
rewrite and access data. It is covered with a magnetic coating and stores data in the form of tracks,
spots and sectors. Hard disks, zip disks and floppy disks are common examples of magnetic disks.
i. Floppy Disk: A floppy disk is a flexible disk with a magnetic coating on it, and it is packaged
inside a protective plastic envelope. These are among the oldest portable storage devices that
could store up to 1.44 MB of data, but now they are not used due to very little memory storage.
ii. Hard Disk Drive (HDD): Hard disk drive comprises a series of circular disks
called platters arranged one over the other almost ½ inches apart around a spindle. Disks are
made of non-magnetic material like aluminium alloy and coated with 10-20 nm magnetic
material. Disk diameters have ranged from 14 inches in older systems down to the 3.5-inch and 2.5-inch platters common today, and the disks rotate at speeds varying from 4200 rpm (rotations per minute) for personal computers to 15000 rpm for servers.
Data is stored by magnetizing or demagnetizing the magnetic coating. A magnetic reader arm
is used to read data from and write data to the disks. A typical modern HDD has a capacity in
terabytes (TB).
2. Optical Disk: An optical disk is any computer disk that uses optical storage techniques and
technology to read and write data. It is a computer storage disk that stores data digitally and uses laser
beams to read and write data.
i. CD Drive: CD stands for Compact Disk. CDs are circular disks that use optical rays, usually
lasers, to read and write data. They are very cheap as you can get 700 MB of storage space for
less than a dollar. CDs are inserted in CD drives built into the CPU cabinet. They are portable
as you can eject the drive, remove the CD and carry it with you. There are three types of CDs:
o CD-ROM (Compact Disk - Read Only Memory): The data on these CDs is recorded by the manufacturer. Proprietary software, audio, or video are released on CD-ROMs.
o CD-R (Compact Disk - Recordable): The user can write data once on the CD-R. It
cannot be deleted or modified later.
o CD-RW (Compact Disk - Rewritable): Data can repeatedly be written and deleted on
these optical disks.
ii. DVD Drive: DVD stands for Digital Versatile Disc (originally Digital Video Disc). A DVD is an optical device that can store about 15 times the data held by a CD. DVDs are usually used to store rich multimedia files that need high storage capacity. They also come in three varieties - read-only, recordable and rewritable.
iii. Blu Ray Disk: Blu Ray Disk (BD) is an optical storage media that stores high definition (HD)
video and other multimedia files. BD uses a shorter wavelength laser than CD/DVD, enabling
the writing arm to focus more tightly on the disk and pack in more data. BDs can store up to
128 GB of data.
3. Memory Storage Devices: A memory storage device contains trillions of interconnected memory cells that store data. Each cell is built from transistors that are switched on or off to represent the 1s and 0s of binary code, allowing a computer to read and write information. This category includes USB drives, flash memory devices, and SD and memory cards, which you'll recognize as the storage medium used in digital cameras.
i. Flash Drive: A flash drive is a small, ultra-portable storage device used for easily moving files from one device to another. Flash drives connect to computers and other devices via a built-in USB Type-A or USB-C plug, making the drive a combined USB device and cable.
Flash drives are often referred to as pen drives, thumb drives, or jump drives. The terms USB
drive and solid-state drive (SSD) are also sometimes used, but most of the time, those refer to
larger, not-so-mobile USB-based storage devices like external hard drives.
These days, a USB flash drive can hold up to 2 TB of storage. They're more expensive per
gigabyte than an external hard drive, but they have prevailed as a simple, convenient solution
for storing and transferring smaller files.
Pen drive has the following advantages in computer organization, such as:
o Transfer Files: A pen drive is a device plugged into a USB port of the system that is
used to transfer files, documents, and photos to a PC and vice versa.
o Portability: The lightweight nature and smaller size of a pen drive make it possible to
carry it from place to place, making data transportation an easier task.
o Backup Storage: Most pen drives now come with password encryption, so important information such as family documents, medical records, and photos can be stored on them as a backup.
o Transport Data: Professionals or Students can now easily transport large data files and
video, audio lectures on a pen drive and access them from anywhere. Independent PC
technicians can store work-related utility tools, various programs, and files on a high-
speed 64 GB pen drive and move from one site to another.
ii. Memory Card: A memory card or memory cartridge is an electronic data storage device used for storing digital information, typically using flash memory. Memory cards are commonly used in portable electronic devices such as digital cameras, mobile phones, laptop computers, tablets, PDAs, portable media players, video game consoles, synthesizers, electronic keyboards and digital pianos. They allow memory to be added to such devices without compromising ergonomics, as the card is usually contained within the device rather than protruding like a USB flash drive.
Below are some main differences between primary and secondary memory in computer organization.
o Access: Primary memory is directly accessed by the Central Processing Unit (CPU). Secondary memory is not accessed directly by the CPU; data in secondary memory is first loaded into Random Access Memory (RAM) and then sent to the processing unit.
o Speed: RAM provides much faster access to data than secondary memory; typically, primary memory is about six times faster. Computers can quickly process data by loading software programs and required files into primary memory (RAM).
o Volatility: Primary memory is volatile and is completely erased when the computer is shut down. Secondary memory is non-volatile, which means it can hold on to its data with or without an electrical power supply.
RAID (Redundant arrays of independent disks) concept will be given as separate file.
PU – Processing Unit
Multiprocessors
1. MIMDs offer flexibility. With the correct hardware and software support, MIMDs
can function as single-user multiprocessors focusing on high performance for one
application, as multiprogrammed multiprocessors running many tasks simultaneously, or
as some combination of these functions.
2. MIMDs can build on the cost/performance advantages of off-the-shelf
microprocessors. In fact, nearly all multiprocessors built today use the same
microprocessors found in workstations and single-processor servers.
With an MIMD, each processor is executing its own instruction stream. In many cases, each processor executes a different process. Recall that a process is a segment of code that may be run independently, and that the state of the process contains all the information necessary to execute that program on a processor. In a multiprogrammed environment, where the processors may be running independent tasks, each process is typically independent of the processes on other processors.
It is also useful to be able to have multiple processors executing a single program and sharing the code and most of their address space. When multiple processes share code and data in this way, they are often called threads. Today, the term thread is often used in a casual way to refer to multiple loci of execution that may run on different processors, even when they do not share an address space. To take advantage of an MIMD multiprocessor with n processors, we must usually have at least n threads or processes to execute. The independent threads are typically identified by the programmer or created by the compiler. Since the parallelism in this situation is contained in the threads, it is called thread-level parallelism.
Threads may vary from large-scale, independent processes (for example, independent programs running in a multiprogrammed fashion on different processors) to parallel iterations of a loop, automatically generated by a compiler, each executing for perhaps less than a thousand instructions. Although the size of a thread is important in
considering how to exploit thread-level parallelism efficiently, the important qualitative
distinction is that such parallelism is identified at a high-level by the software system and
that the threads consist of hundreds to millions of instructions that may be executed in
parallel. In contrast, instruction-level parallelism is identified primarily by the hardware, though with software help in some cases, and is found and exploited one instruction at a time.
Existing MIMD multiprocessors fall into two classes, depending on the number of
processors involved, which in turn dictate a memory organization and interconnect
strategy. We refer to the multiprocessors by their memory organization, because what
constitutes a small or large number of processors is likely to change over time.
The first group, which we call centralized shared-memory architectures, had at most a few dozen processors as of 2000.
For multiprocessors with small processor counts, it is possible for the processors to share
a single centralized memory and to interconnect the processors and memory by a bus. With
large caches, the bus and the single memory, possibly with multiple banks, can satisfy the
memory demands of a small number of processors. By replacing a single bus with multiple
buses, or even a switch, a centralized shared memory design can be scaled to a few dozen
processors. Although scaling beyond that is technically conceivable, sharing a centralized
memory, even organized as multiple banks, becomes less attractive as the number of
processors sharing it increases.
Because there is a single main memory that has a symmetric relationship to all processors
and a uniform access time from any processor, these multiprocessors are often called
symmetric (shared-memory) multiprocessors ( SMPs), and this style of architecture is
sometimes called UMA for uniform memory access. This type of centralized shared-
memory architecture is currently by far the most popular organization.
The second group consists of multiprocessors with physically distributed memory. To
support larger processor counts, memory must be distributed among the processors rather
than centralized; otherwise the memory system would not be able to support the bandwidth
demands of a larger number of processors without incurring excessively long access
latency. With the rapid increase in processor performance and the associated increase in a
processor’s memory bandwidth requirements, the scale of multiprocessor for which
distributed memory is preferred over a single, centralized memory continues to
decrease in number (which is another reason not to use small and large scale). Of course,
the larger number of processors raises the need for a high bandwidth interconnect.
Distributed-memory multiprocessor
Distributing the memory among the nodes has two major benefits. First, it is a cost-
effective way to scale the memory bandwidth, if most of the accesses are to the local
memory in the node. Second, it reduces the latency for accesses to the local memory. These
two advantages make distributed memory attractive at smaller processor counts as processors get ever faster and require more memory bandwidth and lower memory latency.
The key disadvantage for a distributed memory architecture is that communicating data
between processors becomes somewhat more complex and has higher latency, at least
when there is no contention, because the processors no longer share a single centralized
memory. As we will see shortly, the use of distributed memory leads to two different
paradigms for interprocessor communication.
Typically, I/O as well as memory is distributed among the nodes of the multiprocessor,
and the nodes may be small SMPs (2–8 processors). Using multiple processors in a node,
together with a memory and a network interface, is quite useful from the
cost-efficiency viewpoint.
For example, Amdahl's law shows that to achieve a speedup of 80 with 100 processors,
the fraction of the computation that can be parallelized must be FractionParallel = 0.9975;
only 0.25% of the original computation may be sequential.
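The figure FractionParallel = 0.9975 follows from Amdahl's law; a minimal sketch (the `speedup` function name is illustrative, not from the text):

```python
# Amdahl's law: speedup achievable with n processors when a
# fraction f of the computation can be parallelized.
def speedup(f, n):
    return 1.0 / ((1.0 - f) + f / n)

# With FractionParallel = 0.9975 on 100 processors:
print(round(speedup(0.9975, 100), 1))  # prints 80.2
```

Even a 0.25% sequential fraction already caps the speedup at about 80 on 100 processors.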
The two models for interprocessor communication are:
– shared memory: the shared address space is accessed implicitly via load and store
operations.
– message passing: data is communicated by explicitly passing messages among the
processors.
• Software can invoke message passing through a Remote Procedure Call (RPC).
Message-Passing Multiprocessor
- The address space can consist of multiple private address spaces that are
logically disjoint and cannot be addressed by a remote processor. Such a machine
is also called a multicomputer (cluster).
Example multiprocessor chips were developed by
• IBM – one-chip multiprocessor
• AMD and Intel – two-processor
• Sun – eight-processor multicore
Symmetric shared-memory machines support caching of both
• Shared data – used by multiple processors, providing communication among them
through reads and writes of the shared data
• Private data – used by a single processor
When a private item is cached, its location is migrated to the cache. Since no other
processor uses the data, the program behavior is identical to that in a uniprocessor.
Cache Coherence
Unfortunately, caching shared data introduces a new problem because the view of memory
held by two different processors is through their individual caches, which, without any
additional precautions, could end up seeing two different values.
That is, if two different processors can have two different values for the same location,
the difficulty is generally referred to as the cache coherence problem.
Cache coherence problem for a single memory location
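The problem can be sketched with a toy model (illustrative Python, not any real protocol): two caches each copy a memory location, one processor writes, and without any coherence mechanism the two processors see different values.

```python
# Toy model: two per-processor caches over one shared memory location.
# Writes go through to memory but the other cache is never invalidated,
# so the two CPUs end up seeing two different values for the same address.
memory = {"X": 0}

class Cache:
    def __init__(self):
        self.data = {}
    def read(self, addr):
        if addr not in self.data:      # miss: fetch from memory
            self.data[addr] = memory[addr]
        return self.data[addr]
    def write(self, addr, value):
        self.data[addr] = value        # update local copy
        memory[addr] = value           # write through to memory

c1, c2 = Cache(), Cache()
c1.read("X")        # CPU A caches X = 0
c2.read("X")        # CPU B caches X = 0
c1.write("X", 1)    # CPU A writes 1; B's copy is NOT invalidated
print(c1.read("X"), c2.read("X"))  # prints: 1 0  <- incoherent views
```

The stale 0 in the second cache is exactly the incoherence the protocols below are designed to prevent.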
• Informally:
– “Any read must return the most recent write”
– Too strict and too difficult to implement
• Better:
– “Any write must eventually be seen by a read”
– All writes are seen in proper order (“serialization”)
• Two rules to ensure this:
– “If P writes x and then P1 reads it, P’s write will be seen by P1 if the read
and write are sufficiently far apart”
– Writes to a single location are serialized: seen in one order
• Latest write will be seen
• Otherwise could see writes in illogical order (could see older
value after a newer value)
Directory based
• Sharing status of a block of physical memory is kept in one location called the
directory.
• Directory-based coherence has slightly higher implementation overhead than
snooping, but it can scale to larger processor counts.
Snooping
• Every cache that has a copy of data also has a copy of the sharing status of the
block.
• No centralized state is kept.
• All caches are accessible via some broadcast medium (a bus or switch)
• Cache controllers monitor, or snoop, on the medium to determine whether
they have a copy of a block that is requested on a bus or switch access.
Snooping protocols are popular with multiprocessors whose caches are attached to a single
shared memory, since they can use the existing physical connection (the bus to memory) to
interrogate the status of the caches. A snoop-based cache coherence scheme is implemented
on a shared bus, or on any communication medium that broadcasts cache misses to all the processors.
Write Invalidate: the writing processor obtains exclusive access by broadcasting an
invalidate, so all other cached copies of the block are discarded before the write.
Write Update (write broadcast): the writing processor broadcasts the new data, so all
cached copies are updated with the new value.
Example Protocol
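As an illustrative sketch (not the full protocol state machine from the text), a minimal write-invalidate snooping scheme with MSI-like states can be modeled as follows; class and method names are assumptions for the sketch:

```python
# Minimal write-invalidate snooping sketch: on a write, the writing cache
# broadcasts an invalidate on the bus so every other copy is discarded.
INVALID, SHARED, MODIFIED = "I", "S", "M"

class Bus:
    def __init__(self):
        self.caches = []
    def broadcast_invalidate(self, addr, sender):
        for c in self.caches:
            if c is not sender and addr in c.lines:
                c.lines[addr] = (INVALID, None)   # snoop hit: invalidate copy

class Cache:
    def __init__(self, bus, memory):
        self.lines = {}                # addr -> (state, value)
        self.bus, self.memory = bus, memory
        bus.caches.append(self)
    def read(self, addr):
        state, _ = self.lines.get(addr, (INVALID, None))
        if state == INVALID:           # miss: fetch from memory, enter Shared
            self.lines[addr] = (SHARED, self.memory[addr])
        return self.lines[addr][1]
    def write(self, addr, value):
        self.bus.broadcast_invalidate(addr, self)  # invalidate other copies
        self.lines[addr] = (MODIFIED, value)
        self.memory[addr] = value      # simplification: write through to memory

memory = {"X": 0}
bus = Bus()
c1, c2 = Cache(bus, memory), Cache(bus, memory)
c1.read("X"); c2.read("X")   # both caches hold X in Shared state
c1.write("X", 1)             # c2's copy is invalidated over the bus
print(c2.read("X"))          # prints 1: miss, refetch, coherent value
```

Compared with the incoherent toy model earlier, the broadcast invalidate forces the second cache to miss and refetch the up-to-date value.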
Implementation Complications
• Write Races:
– Cannot update cache until bus is obtained
• Otherwise, another processor may get bus first,
and then write the same cache block!
– Two step process:
• Arbitrate for bus
• Place miss on bus and complete operation
– If a miss occurs to the block while waiting for the bus,
handle the miss (an invalidate may be needed) and then restart.
– Split transaction bus:
• Bus transaction is not atomic:
can have multiple outstanding transactions for a block
• Multiple misses can interleave,
allowing two caches to grab block in the Exclusive state
• Must track and prevent multiple misses for one block
• Must support interventions and invalidations
Performance Measurement
• Overall cache performance is a combination of
– Uniprocessor cache miss traffic
– Traffic caused by communication – invalidation and subsequent cache
misses
• Changing the processor count, cache size, and block size can affect these two
components of miss rate
• Uniprocessor miss rate: compulsory, capacity, conflict
• Communication miss rate: coherence misses
– True sharing misses + false sharing misses
Assume x1 and x2 are in the same cache block, and that the block is in the shared
state in the caches of both P1 and P2.

Time   P1         P2
1      Write x1
2                 Read x2
3      Write x1
4                 Write x2
5      Read x2
Example Result
• 1: True sharing miss
– x1 was read by P2, so P2's copy must be invalidated
• 2: False sharing miss
– x2 was invalidated by the write of x1 in P1, but that value of x1 is not used in
P2
• 3: False sharing miss
– The block containing x1 is marked shared due to the read in P2, but P2 did
not read x1. A write miss is required to obtain exclusive access to the
block
• 4: False sharing miss
– The write of x2 needs exclusive access, but P2 does not use the new value of x1
• 5: True sharing miss
– P1 reads the value of x2 written by P2
• In addition to tracking the state of each cache block, we must track the processors
that have copies of the block when it is shared (usually a bit vector for each memory
block: 1 if processor has copy)
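The sharing-status bookkeeping described above can be sketched as a small data structure (illustrative Python; the state names and fields are assumptions for the sketch):

```python
# Sketch of a directory entry: per memory block, a state plus a bit
# vector with one bit per processor marking which caches hold a copy.
class DirectoryEntry:
    def __init__(self, num_procs):
        self.state = "Uncached"          # Uncached / Shared / Exclusive
        self.sharers = [0] * num_procs   # bit vector: 1 if processor has copy

    def add_sharer(self, p):
        self.sharers[p] = 1
        self.state = "Shared"

    def sharer_list(self):
        # processors that must receive invalidates on a write
        return [p for p, bit in enumerate(self.sharers) if bit]

entry = DirectoryEntry(num_procs=4)
entry.add_sharer(0)
entry.add_sharer(2)
print(entry.state, entry.sharer_list())  # prints: Shared [0, 2]
```

On a write, the directory would send invalidates only to the processors in `sharer_list()`, which is what lets the scheme avoid broadcast and scale to larger processor counts.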
• Keep it simple(r):
– Writes to non-exclusive data => write miss
– Processor blocks until access completes
– Assume messages received and acted upon in order sent
• What is consistency? When must a processor see a new value? Consider the following example:
P1: A = 0; P2: B = 0;
..... .....
A = 1; B = 1;
L1: if (B == 0) ... L2: if (A == 0) ...
• Impossible for both if statements L1 & L2 to be true?
– What if write invalidate is delayed & processor continues?
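One way to see why both if statements cannot be true under sequential consistency is to enumerate every interleaving that preserves each thread's program order (an illustrative brute-force sketch; the operation names are assumptions):

```python
# Thread 1: A = 1 (wA), then read B (rB)    -- L1 tests B == 0
# Thread 2: B = 1 (wB), then read A (rA)    -- L2 tests A == 0
# Under SC, the four operations execute in some global interleaving
# that preserves each thread's program order.
from itertools import permutations

def run(order):
    mem = {"A": 0, "B": 0}
    regs = {}
    ops = {
        "wA": lambda: mem.__setitem__("A", 1),
        "rB": lambda: regs.__setitem__("B", mem["B"]),
        "wB": lambda: mem.__setitem__("B", 1),
        "rA": lambda: regs.__setitem__("A", mem["A"]),
    }
    for op in order:
        ops[op]()
    return regs["A"], regs["B"]

both_true = False
for order in permutations(["wA", "rB", "wB", "rA"]):
    # keep only interleavings preserving each thread's program order
    if order.index("wA") < order.index("rB") and order.index("wB") < order.index("rA"):
        a, b = run(order)
        if a == 0 and b == 0:   # both L1 and L2 would be taken
            both_true = True
print(both_true)  # prints False: impossible under sequential consistency
```

If the write invalidate is delayed and the processor continues past it, the effective order no longer matches any of these interleavings, and both reads can return 0.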
• Memory consistency models:
what are the rules for such cases?
• Sequential consistency: result of any execution is the same as if the accesses of
each processor were kept in order and the accesses among different processors
were interleaved ⇒ assignments before ifs above
– SC: delay all memory accesses until all invalidates done
• Schemes exist that give faster execution than sequential consistency
• Not an issue for most programs; they are synchronized
– A program is synchronized if all access to shared data are ordered by
synchronization operations
write (x)
...
release (s) {unlock}
...
acquire (s) {lock}
...
read(x)
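The write(x) … release(s) / acquire(s) … read(x) pattern above can be sketched with a Python lock (an illustrative mapping, not the text's notation): the reader can acquire s only after the writer releases it, so the read is ordered after the write by the synchronization operations.

```python
# Synchronized sharing: the lock s orders the read of x after the write.
import threading

x = 0
s = threading.Lock()
s.acquire()                  # lock starts held on behalf of the writer

def writer():
    global x
    x = 42                   # write(x)
    s.release()              # release(s) {unlock}

result = []
def reader():
    s.acquire()              # acquire(s) {lock}: blocks until the writer releases
    result.append(x)         # read(x) -- ordered after the write

t_read = threading.Thread(target=reader)
t_write = threading.Thread(target=writer)
t_read.start()               # reader starts first but must wait on s
t_write.start()
t_write.join(); t_read.join()
print(result)                # prints [42]
```

Because every access to x is ordered by operations on s, the program is synchronized in the sense defined above, and its outcome does not depend on relative processor speeds.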
• Only programs willing to be nondeterministic are not synchronized: they contain a
"data race", whose outcome is a function of processor speed
• Several relaxed models for memory consistency exist, since most programs are
synchronized; they are characterized by their attitude towards RAR, WAR, RAW, and WAW
orderings to different addresses
Relaxed Consistency Models: The Basics
• Key idea: allow reads and writes to complete out of order, but use
synchronization operations to enforce ordering, so that a synchronized
program behaves as if the processor were sequentially consistent
– By relaxing orderings, may obtain performance advantages
– Also specifies range of legal compiler optimizations on shared data
– Unless synchronization points are clearly defined and programs
are synchronized, the compiler could not interchange a read and a write
of 2 shared data items because it might affect the semantics of the
program
• 3 major sets of relaxed orderings:
1. W → R ordering (all writes completed before next read)
• Because it retains ordering among writes, many programs that
operate under sequential consistency operate under this model
without additional synchronization. Called processor consistency
2. W → W ordering (all writes completed before next write)
3. R → W and R → R orderings, a variety of models depending on
ordering restrictions and how synchronization operations enforce
ordering
• Many complexities remain in relaxed consistency models: defining precisely
what it means for a write to complete, and deciding when processors can see
values that they have written