BiD 05
Nizamettin AYDIN
[email protected]
http://www.yildiz.edu.tr/~naydin
Introduction
• Blocks: 8 or 16 bytes
• Tags: location in main memory
• Cache controller
— hardware that checks tags
• Cache Line
— Unit of transfer between storage and cache memory
• Hit Ratio: ratio of hits out of total requests
• Synchronizing cache and memory
— Write through
— Write back
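The hit ratio and the two synchronization policies can be sketched in a few lines. This is a minimal illustration only: the one-line cache model and the names `hit_ratio` and `memory_writes` are assumptions made for the example, not part of the slides.

```python
def hit_ratio(hits, total_requests):
    """Hit ratio = hits out of total requests."""
    return hits / total_requests


def memory_writes(written_lines, policy):
    """Count writes that reach main memory for a sequence of CPU writes.

    Assumed model: a cache that holds a single line at a time.
    written_lines is the sequence of line ids the CPU writes to.
    """
    if policy == "write-through":
        # Every CPU write is also sent to main memory immediately.
        return len(written_lines)
    # Write back: memory is updated only when a modified line is replaced.
    writes = 0
    current = None
    for line in written_lines:
        if current is not None and line != current:
            writes += 1  # the modified line is evicted and written back
        current = line
    if current is not None:
        writes += 1  # final flush of the last modified line
    return writes


print(hit_ratio(90, 100))                         # 0.9
print(memory_writes("AAABB", "write-through"))    # 5
print(memory_writes("AAABB", "write-back"))       # 2
```

Under this model, write back saves memory traffic whenever the same line is written several times before being replaced.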
Step-by-Step Use of Cache
Cache vs. Virtual Memory
• Cache speeds up memory access
• Virtual memory increases amount of
perceived storage
—independence from the configuration and
capacity of the memory system
—low cost per bit
Cache/Main Memory Structure
Cache line    Main memory blocks held
0             000000, 010000, ..., FF0000
1             000004, 010004, ..., FF0004
...           ...
2^14 - 1      00FFFC, 01FFFC, ..., FFFFFC
For example:
For the memory location 16339C
1 6 3 3 9 C
0001 0110 0011 0011 1001 1100
Word = 00 = 0
Line = 00 1100 1110 0111 = 0CE7
Tag = 0001 0110 = 16
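The same split can be checked with a short sketch. It assumes the field widths of this example (24-bit address, 8-bit tag, 14-bit line, 2-bit word); the function name `split_direct` is made up for illustration.

```python
WORD_BITS = 2    # least significant bits: word within the block
LINE_BITS = 14   # middle bits: cache line number


def split_direct(addr):
    """Split a 24-bit address into (tag, line, word) for direct mapping."""
    word = addr & ((1 << WORD_BITS) - 1)
    line = (addr >> WORD_BITS) & ((1 << LINE_BITS) - 1)
    tag = addr >> (WORD_BITS + LINE_BITS)
    return tag, line, word


tag, line, word = split_direct(0x16339C)
print(f"tag={tag:02X} line={line:04X} word={word}")  # tag=16 line=0CE7 word=0
```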
Direct Mapping Summary
• Address length = (s + w) bits
• Number of addressable units = 2^(s+w) words
or bytes
• Block size = line size = 2^w words or bytes
• Number of blocks in main memory =
2^(s+w) / 2^w = 2^s
• Number of lines in cache = m = 2^r
• Size of tag = (s – r) bits
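As a quick check of these formulas, the sketch below plugs in the values of the running example, taken here to be s = 22, w = 2, r = 14 (24-bit address, 4-byte blocks, 2^14 lines). The function name is hypothetical.

```python
def direct_mapping_summary(s, w, r):
    """Evaluate the direct-mapping formulas for given field widths."""
    return {
        "address_bits": s + w,
        "addressable_units": 2 ** (s + w),
        "block_size": 2 ** w,
        "memory_blocks": 2 ** (s + w) // 2 ** w,  # equals 2**s
        "cache_lines": 2 ** r,
        "tag_bits": s - r,
    }


info = direct_mapping_summary(s=22, w=2, r=14)
for name, value in info.items():
    print(f"{name}: {value}")
```

With these inputs the tag comes out as 8 bits and the cache as 16384 lines, matching the 8/14/2 split used in the example.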
Direct Mapping pros & cons
• Simple
• Inexpensive
• Fixed location for given block
—If a program repeatedly accesses 2 blocks that
map to the same line, they keep evicting each
other and the miss rate is very high
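The conflict problem can be demonstrated with a small simulation. It is a sketch that assumes the 8/14/2 address split of the earlier example and models the cache as a dictionary from line number to stored tag; `count_misses` is an illustrative name.

```python
def count_misses(addresses, line_bits=14, word_bits=2):
    """Count misses in a direct-mapped cache for a sequence of addresses."""
    lines = {}  # line number -> tag of the block currently held
    misses = 0
    for addr in addresses:
        line = (addr >> word_bits) & ((1 << line_bits) - 1)
        tag = addr >> (word_bits + line_bits)
        if lines.get(line) != tag:
            misses += 1        # wrong block (or nothing) in this line
            lines[line] = tag  # load the requested block
    return misses


# 000000 and 010000 both map to line 0 but carry different tags.
conflicting = [0x000000, 0x010000] * 4
# 000000 and 000004 map to lines 0 and 1, so they do not collide.
friendly = [0x000000, 0x000004] * 4

print(count_misses(conflicting))  # 8: every access misses
print(count_misses(friendly))     # 2: only the first access to each block
```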
Associative Mapping
• A main memory block can load into any
line of cache
• Memory address is interpreted as tag and
word
• Tag uniquely identifies block of memory
• Every line’s tag is examined for a match
• Cache searching gets expensive
Fully Associative Cache Organization
Associative Mapping Example
Mapping for the example:
For the memory location 16339C
• Memory address
1 6 3 3 9 C
0001 0110 0011 0011 1001 1100
Word = 00 = 0
Tag = 00 0101 1000 1100 1110 0111 = 058CE7
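For associative mapping the address has only two fields, so the split is a single shift and mask. A minimal sketch, assuming the 22-bit tag and 2-bit word of this example (`split_associative` is a made-up name):

```python
WORD_BITS = 2  # least significant bits select the word within the block


def split_associative(addr):
    """Split a 24-bit address into (tag, word) for associative mapping."""
    word = addr & ((1 << WORD_BITS) - 1)
    tag = addr >> WORD_BITS
    return tag, word


tag, word = split_associative(0x16339C)
print(f"tag={tag:06X} word={word}")  # tag=058CE7 word=0
```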
Associative Mapping
Address Structure (for given example)
Tag: 22 bit | Word: 2 bit
• 22 bit tag stored with each 32 bit block of data
• Compare tag field with tag entry in cache to
check for hit
• Least significant 2 bits of address identify which
byte is required from the 32 bit data block
• e.g.
Address   Tag      Data       Cache line
FFFFFC    3FFFFF   24682468   3FFF
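The hit check described above amounts to comparing the address tag against every line. The sketch below models that with a plain list; real hardware compares all tags in parallel. The list layout, the name `lookup`, and the choice to index the 4-byte block with the 2-bit word field are assumptions for illustration.

```python
def lookup(cache_lines, addr, word_bits=2):
    """Search every line for a matching tag; return the byte or None on a miss.

    cache_lines is a list whose entries are None (empty line) or
    (tag, data) pairs, where data holds the 4 bytes of the block.
    """
    tag = addr >> word_bits
    word = addr & ((1 << word_bits) - 1)
    for entry in cache_lines:
        if entry is not None and entry[0] == tag:
            return entry[1][word]  # hit: pick the requested byte
    return None                    # miss: no line holds this tag


# A tiny 4-line cache holding the block from the example in its last line.
cache = [None, None, None, (0x3FFFFF, bytes.fromhex("24682468"))]

print(lookup(cache, 0xFFFFFC))  # 36 (0x24), a hit
print(lookup(cache, 0x16339C))  # None, a miss
```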
Associative Mapping Summary
• Address length = (s + w) bits
• Number of addressable units = 2^(s+w) words
or bytes
• Block size = line size = 2^w words or bytes
• Number of blocks in main memory =
2^(s+w) / 2^w = 2^s
• Number of lines in cache = undetermined
• Size of tag = s bits
Set Associative Mapping
• Cache is divided into a number of sets
• Each set contains a number of lines
• A given block maps to any line in a given
set
—e.g. Block B can be in any line of set i
• e.g. 2 lines per set
—2 way associative mapping
—A given block can be in one of 2 lines in only
one set
Two Way Set Associative Cache Organization
Set Associative Mapping Example
• 13 bit set number
• Set number = block number in main memory
modulo 2^13
• 000000, 008000, 010000, 018000 … map
to same set
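The modulo rule can be checked directly. This sketch assumes the 2-bit word field and 13-bit set field of the example, so addresses that differ by a multiple of 2^15 (8000 hex) land in the same set; `set_number` is an illustrative name.

```python
SET_BITS = 13
WORD_BITS = 2


def set_number(addr):
    """Set number = block number modulo 2**13."""
    block = addr >> WORD_BITS       # drop the word field to get the block number
    return block % (1 << SET_BITS)


for addr in (0x000000, 0x008000, 0x010000, 0x018000):
    print(f"{addr:06X} -> set {set_number(addr):04X}")  # all map to set 0000
```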
Two Way Set Associative Mapping Example
Mapping for the example:
For the memory location 16339C
Memory address
1 6 3 3 9 C
0001 0110 0011 0011 1001 1100
Word = 00 = 0
Tag: 9 bit | Set: 13 bit | Word: 2 bit
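The three-field split for this address can be worked out as follows. A minimal sketch, assuming the 9/13/2 layout shown above; the function name is hypothetical.

```python
WORD_BITS = 2
SET_BITS = 13


def split_set_associative(addr):
    """Split a 24-bit address into (tag, set, word) for the 9/13/2 layout."""
    word = addr & ((1 << WORD_BITS) - 1)
    set_no = (addr >> WORD_BITS) & ((1 << SET_BITS) - 1)
    tag = addr >> (WORD_BITS + SET_BITS)
    return tag, set_no, word


tag, set_no, word = split_set_associative(0x16339C)
print(f"tag={tag:03X} set={set_no:04X} word={word}")  # tag=02C set=0CE7 word=0
```

The block may then be placed in either of the two lines of set 0CE7, and both stored tags are compared against 02C on a lookup.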
• Main memory and virtual memory are divided into equal sized
pages.
• The entire address space required by a process need not be
in memory at once. Some parts can be on disk, while others
are in main memory.
• Further, the pages allocated to a process do not need to be
stored contiguously-- either on disk or in memory.
• In this way, only the needed pages are in memory at any
time; the pages not currently needed stay in slower disk storage.
Virtual Memory
• If the valid bit is zero in the page table entry for the logical
address, this means that the page is not in memory and must be
fetched from disk.
— This is a page fault.
— If necessary, a page is evicted from memory and is
replaced by the page retrieved from disk, and the valid bit
is set to 1.
• If the valid bit is 1, the virtual page number is replaced by the
physical frame number.
• The data is then accessed by appending the offset to the physical
frame number (physical address = frame number × page size + offset).
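The translation steps above can be sketched as a small function. The 4096-byte page size, the dictionary-based page table, and the names `translate` and `PageFault` are assumptions chosen for illustration; a real system does this in hardware with operating system support.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


class PageFault(Exception):
    """Raised when the page table entry is missing or its valid bit is 0."""


def translate(page_table, virtual_addr):
    """Translate a virtual address using a page table with valid bits."""
    page, offset = divmod(virtual_addr, PAGE_SIZE)
    entry = page_table.get(page)
    if entry is None or not entry["valid"]:
        raise PageFault(page)  # page must be fetched from disk first
    # Valid: replace the page number with the frame number, keep the offset.
    return entry["frame"] * PAGE_SIZE + offset


page_table = {
    0: {"valid": True, "frame": 5},
    1: {"valid": False, "frame": None},
}

print(hex(translate(page_table, 0x0123)))  # 0x5123
try:
    translate(page_table, 0x1004)
except PageFault as fault:
    print("page fault on page", fault.args[0])  # page fault on page 1
    # Simulate loading the page into frame 2 and setting the valid bit.
    page_table[1] = {"valid": True, "frame": 2}
    print(hex(translate(page_table, 0x1004)))   # 0x2004
```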
Virtual Memory
The next slide shows how all the pieces fit together.
Virtual Memory
• Large page tables are cumbersome and slow, but paging's
uniform memory mapping makes page operations fast.
Segmentation allows fast access to the segment table, but
segment loading is labor-intensive.
• Paging and segmentation can be combined to take advantage of
the best features of both by assigning fixed-size pages within
variable-sized segments.
• Each segment has a page table. This means that a memory
address will have three fields, one for the segment, another for
the page, and a third for the offset.
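A three-field address of this kind can be decoded with shifts and masks. The field widths below (10-bit segment, 10-bit page, 12-bit offset in a 32-bit address) are hypothetical and chosen only to make the example concrete.

```python
# Assumed widths for illustration; the slides do not specify them.
SEG_BITS, PAGE_BITS, OFFSET_BITS = 10, 10, 12


def split_segmented(addr):
    """Split an address into (segment, page, offset) fields."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    page = (addr >> OFFSET_BITS) & ((1 << PAGE_BITS) - 1)
    segment = addr >> (OFFSET_BITS + PAGE_BITS)
    return segment, page, offset


segment, page, offset = split_segmented(0x00C03ABC)
print(segment, page, hex(offset))  # 3 3 0xabc
```

The segment field selects that segment's page table, the page field indexes into it, and the offset locates the data within the page.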
Real-World Example
• Both SRAM and DRAM are volatile
— Power needed to preserve data
• Dynamic cell (DRAM)
— Simpler to build, smaller
— More dense
— Less expensive
— Needs refresh
— Used for larger memory units
• Static cell (SRAM)
— Faster
— Used for cache
Advanced DRAM Organization
• Basic DRAM same since first RAM chips
• Enhanced DRAM
—Contains small SRAM as well
—SRAM holds last line read (cf. cache)
• Cache DRAM
—Larger SRAM component
—Use as cache or serial buffer
Synchronous DRAM (SDRAM)
• Access is synchronized with an external clock
• Address is presented to RAM
• RAM finds data (CPU waits in conventional DRAM)
• Since SDRAM moves data in time with system
clock, CPU knows when data will be ready
• CPU does not have to wait; it can do something
else in the meantime
• Burst mode allows SDRAM to set up stream of
data and fire it out in block
• DDR-SDRAM sends data twice per clock cycle
(leading & trailing edge)
SDRAM
RAMBUS
• Adopted by Intel for Pentium & Itanium
• Main competitor to SDRAM
• Vertical package – all pins on one side
• Data exchange over 28 wires < cm long
• Bus addresses up to 320 RDRAM chips at
1.6 Gbps
• Asynchronous block protocol
—480ns access time
—Then 1.6 Gbps
RAMBUS Diagram
DDR SDRAM
• SDRAM can only send data once per clock
• Double-data-rate SDRAM can send data
twice per clock cycle
—Rising edge and falling edge
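The effect of sending data on both clock edges can be put in numbers. The 100 MHz clock and 8-byte (64-bit) bus below are hypothetical figures chosen for the arithmetic, not values from the slides.

```python
def peak_bandwidth(clock_hz, bus_bytes, transfers_per_clock):
    """Peak transfer rate in bytes per second."""
    return clock_hz * transfers_per_clock * bus_bytes


CLOCK_HZ = 100_000_000  # assumed 100 MHz memory clock
BUS_BYTES = 8           # assumed 64-bit data bus

sdr = peak_bandwidth(CLOCK_HZ, BUS_BYTES, transfers_per_clock=1)
ddr = peak_bandwidth(CLOCK_HZ, BUS_BYTES, transfers_per_clock=2)

print(sdr)        # 800000000 bytes/s, one transfer per clock
print(ddr)        # 1600000000 bytes/s, rising and falling edge
print(ddr / sdr)  # 2.0
```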
Cache DRAM
• Mitsubishi
• Integrates small SRAM cache (16 kb) onto
generic DRAM chip
• Used as true cache
—64-bit lines
—Effective for ordinary random access
• To support serial access of block of data
—E.g. refresh bit-mapped screen
– CDRAM can prefetch data from DRAM into SRAM
buffer
– Subsequent accesses solely to SRAM
Read Only Memory (ROM)
• Permanent storage
—Nonvolatile
• Used in...
—Microprogramming
—Library subroutines
—Systems programs (BIOS)
—Function tables
Types of ROM