Lab3 Suppl
Blocking
1
Outline
Memory organization
Caching
Different types of locality
Cache organization
Cache lab
Cache Structure
getopt/fscanf/Malloc
Page Replacement
LRU algorithm
FIFO algorithm
2
Memory Hierarchy
Smaller, faster, and costlier per byte toward the top of the hierarchy:
L0: Registers – CPU registers hold words retrieved from the L1 cache
L1: L1 cache (SRAM) – holds cache lines retrieved from the L2 cache
3
SRAM vs DRAM tradeoff
SRAM (cache)
Faster (L1 cache: 1 CPU cycle)
Smaller (Kilobytes (L1) or Megabytes (L2))
More expensive and “energy-hungry”
DRAM (main memory)
Relatively slower (hundreds of CPU cycles)
Larger (Gigabytes)
Cheaper
4
Locality
Temporal locality
Recently referenced items are likely
to be referenced again in the near future
After accessing address X in memory, save the bytes in cache for
future access
Spatial locality
Items with nearby addresses tend
to be referenced close together in time
After accessing address X, save the block of memory around X in
cache for future access
5
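Both kinds of locality show up in how a 2D array is traversed. A minimal sketch (the array size and function names are illustrative, not part of the lab):

```c
#include <stddef.h>

#define ROWS 256
#define COLS 256

/* Good spatial locality: C arrays are row-major, so this walks
 * memory sequentially and uses every byte of each fetched block. */
long sum_rowmajor(int a[ROWS][COLS]) {
    long sum = 0;
    for (size_t i = 0; i < ROWS; i++)
        for (size_t j = 0; j < COLS; j++)
            sum += a[i][j];
    return sum;
}

/* Poor spatial locality: consecutive accesses are COLS * sizeof(int)
 * bytes apart, so each one lands in a different cache block. */
long sum_colmajor(int a[ROWS][COLS]) {
    long sum = 0;
    for (size_t j = 0; j < COLS; j++)
        for (size_t i = 0; i < ROWS; i++)
            sum += a[i][j];
    return sum;
}
```

Both functions compute the same sum, but the row-major version reuses each fetched cache block for several consecutive iterations, while the column-major version jumps a full row ahead on every access.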
Memory Address
For example, addresses are 64 bits on the shark machines
6
Cache
A cache is an array of S = 2^s cache sets
7
Visual Cache Terminology
S = 2^s sets, with E lines per set
Each line holds: a valid bit, a t-bit tag, and a cache block of
B = 2^b bytes of data (bytes 0, 1, 2, ..., B-1)
Address of word: t tag bits | s set-index bits | b block-offset bits
8
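Pulling the tag, set index, and block offset out of an address is just shifting and masking. A sketch (the struct and function names are made up for illustration):

```c
#include <stdint.h>

/* The three fields of a 64-bit address, given s set-index bits
 * and b block-offset bits (the remaining high bits are the tag). */
typedef struct {
    uint64_t tag;
    uint64_t set;
    uint64_t offset;
} addr_parts;

addr_parts split_address(uint64_t addr, int s, int b) {
    addr_parts p;
    p.offset = addr & ((1ULL << b) - 1);        /* low b bits */
    p.set    = (addr >> b) & ((1ULL << s) - 1); /* next s bits */
    p.tag    = addr >> (s + b);                 /* everything above */
    return p;
}
```

For example, with s = 4 and b = 4, the address 0x12345678 splits into offset 0x8, set 0x7, and tag 0x123456.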
General Cache Concepts
[Figure: memory is an array of numbered blocks 0–15; the cache holds
copies of a small subset of those blocks]
9
General Cache Concepts: Miss
Request: 12 – the data in block b (here b = 12) is needed
Block b is not in the cache: Miss!
Block b is fetched from memory and stored in the cache
• Placement policy: determines where b goes
• Replacement policy: determines which block gets evicted (victim)
10
General Caching Concepts:
Types of Cache Misses
Cold (compulsory) miss
The first access to a block has to be a miss
Conflict miss
Conflict misses occur when the level k cache is large enough, but multiple
data objects all map to the same level k block
E.g., Referencing blocks 0, 8, 0, 8, 0, 8, ... would miss every time
Capacity miss
Occurs when the set of active cache blocks (working set) is larger than
the cache
11
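Why blocks 0 and 8 keep colliding: in a direct-mapped cache, block i maps to set i mod S. A tiny illustration, assuming S = 4 sets as an example:

```c
/* Direct-mapped placement: block i maps to set (i mod S).
 * With S = 4, blocks 0 and 8 both map to set 0, so the pattern
 * 0, 8, 0, 8, ... evicts and reloads the same set every time. */
unsigned set_index(unsigned block, unsigned S) {
    return block % S;
}
```

Even though the cache has room for 4 blocks, the two conflicting blocks never coexist — that is a conflict miss, not a capacity miss.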
Cache Simulator
A cache simulator is NOT a cache!
Memory contents NOT stored
Block offsets are NOT used – the b bits in your address don’t
matter.
Simply count hits, misses, and evictions
12
Cache structure
A cache is just a 2D array of cache lines:
struct cache_line cache[S][E];
S = 2^s is the number of sets
E is the associativity (lines per set)
13
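One plausible shape for `struct cache_line`, plus a per-set lookup using an LRU timestamp. The field names and return codes here are assumptions for illustration — the lab leaves the struct design up to you:

```c
#include <stdbool.h>
#include <stdint.h>

/* One cache line: no data array needed, since the simulator
 * only counts hits, misses, and evictions. */
struct cache_line {
    bool     valid;
    uint64_t tag;
    uint64_t lru;   /* timestamp of last use, for LRU replacement */
};

/* Simulate one access to a set of E lines.
 * Returns 0 on hit, 1 on cold miss, 2 on miss with eviction. */
int access_set(struct cache_line *set, int E, uint64_t tag, uint64_t now) {
    for (int i = 0; i < E; i++) {
        if (set[i].valid && set[i].tag == tag) {   /* hit */
            set[i].lru = now;
            return 0;
        }
    }
    for (int i = 0; i < E; i++) {                  /* free line: cold miss */
        if (!set[i].valid) {
            set[i].valid = true;
            set[i].tag = tag;
            set[i].lru = now;
            return 1;
        }
    }
    int victim = 0;                                /* evict least recently used */
    for (int i = 1; i < E; i++)
        if (set[i].lru < set[victim].lru)
            victim = i;
    set[victim].tag = tag;
    set[victim].lru = now;
    return 2;
}
```

Passing a monotonically increasing `now` on each access is one simple way to realize the counter-based LRU scheme discussed later in these slides.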
getopt
getopt() automates parsing of option arguments on the unix
command line (if its declaration is missing, #include <unistd.h>)
Typically called in a loop to retrieve arguments
Its return value is stored in a local variable
When getopt() returns -1, there are no more options
14
getopt
A switch statement is used on the local variable holding
the return value from getopt()
Each command line input case can be taken care of separately
“optarg” is an important variable – it will point to the value of the
option argument
15
getopt Example
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char **argv) {
    int opt, x, y;
    /* looping over arguments */
    while (-1 != (opt = getopt(argc, argv, "x:y:"))) {
        /* determine which argument it's processing */
        switch (opt) {
        case 'x':
            x = atoi(optarg);
            break;
        case 'y':
            y = atoi(optarg);
            break;
        default:
            printf("wrong argument\n");
            break;
        }
    }
}
Suppose the program executable was called "foo".
Then we would call "./foo -x 1 -y 3" to pass the value 1
to variable x and 3 to y.
16
fscanf
The fscanf() function is just like scanf() except it can specify
a stream to read from (scanf always reads from stdin)
parameters:
A stream pointer
format string with information on how to parse the file
the rest are pointers to variables to store the parsed data
You typically want to use this function in a loop. It returns EOF (-1)
when it hits end of file; on a matching failure it returns the number of
items successfully converted, which will be fewer than you asked for
For more information,
man fscanf
http://crasseux.com/books/ctutorial/fscanf.html
fscanf will be useful in reading lines from the trace files.
L 10,1
M 20,1
17
fscanf example
FILE *pFile;       // pointer to FILE object
char identifier;
unsigned address;
int size;
// Reading lines like " M 20,1" or "L 19,3"
pFile = fopen("tracefile.txt", "r");  // file name is illustrative
while (fscanf(pFile, " %c %x,%d", &identifier, &address, &size) > 0) {
    // handle one memory access here
}
fclose(pFile);     // remember to close the file when done
18
Malloc/free
Use malloc to allocate memory on the heap, and free every
allocation when you are done with it, or you will leak memory
19
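Because S and E are only known at run time (they come from the command-line flags), the `cache[S][E]` array from the cache-structure slide is usually allocated on the heap. A sketch, with an assumed `struct cache_line` layout:

```c
#include <stdint.h>
#include <stdlib.h>

struct cache_line {
    int      valid;
    uint64_t tag;
    uint64_t lru;
};

/* Allocate S sets of E lines each, zero-initialized
 * (calloc clears the valid bits for us). */
struct cache_line **make_cache(int S, int E) {
    struct cache_line **cache = malloc(S * sizeof(*cache));
    for (int s = 0; s < S; s++)
        cache[s] = calloc(E, sizeof(**cache));
    return cache;
}

/* Free everything we allocated: each set's lines, then the set array. */
void free_cache(struct cache_line **cache, int S) {
    for (int s = 0; s < S; s++)
        free(cache[s]);
    free(cache);
}
```

After `make_cache`, `cache[s][e]` indexes the cache exactly like the static 2D array on the earlier slide.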
Page Replacement Algorithms
When the cache is full, a cached block must be evicted to make
room for the new data
The new data is what is being referenced right now
To find space for it, choose a previously cached block as the victim
The victim should be the block least likely to be used again
Algorithms
First-In-First-Out (FIFO) Algorithm
Optimal Algorithm
Least Recently Used (LRU) Algorithm
Implementation of the algorithms in the cachelab
LRU should be implemented by default
FIFO and the optimal algorithm can be implemented for extra credit
20
First-In-First-Out (FIFO) Algorithm
Reference string: 7,0,1,2,0,3,0,4,2,3,0,3,0,3,2,1,2,0,1,7,0,1
Cache: direct mapped, 3 sets are instantiated
Note that the number of sets must be 2^s in your actual cachelab.
"Page" here corresponds to a data block (cache line)
21
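Running the reference string above through FIFO with 3 frames can be sketched as follows (the fixed-size slot array is an assumption for brevity):

```c
/* Count misses (page faults) for FIFO replacement with `frames` slots.
 * The slot array doubles as a circular queue: `next` always points
 * at the oldest resident page, so eviction just advances it. */
int fifo_misses(const int *refs, int n, int frames) {
    int slot[16];                    /* assumes frames <= 16 */
    int filled = 0, next = 0, misses = 0;
    for (int i = 0; i < n; i++) {
        int hit = 0;
        for (int j = 0; j < filled; j++)
            if (slot[j] == refs[i]) { hit = 1; break; }
        if (hit) continue;
        misses++;
        if (filled < frames) {
            slot[filled++] = refs[i];  /* a free slot is available */
        } else {
            slot[next] = refs[i];      /* evict the oldest page */
            next = (next + 1) % frames;
        }
    }
    return misses;
}
```

On the 22-reference string from this slide, FIFO with 3 frames produces 15 misses.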
Optimal Algorithm
Replace the page (data block) that will not be used for the longest
period of time
Used for measuring how well your algorithm performs
23
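Belady's optimal policy can be sketched the same way: on each miss with a full cache, evict the resident page whose next use lies farthest in the future (the helper name and fixed array size are illustrative). It requires knowing the whole reference string in advance, which is why it is only usable as a yardstick:

```c
/* Count misses for Belady's optimal replacement with `frames` slots. */
int opt_misses(const int *refs, int n, int frames) {
    int slot[16];                    /* assumes frames <= 16 */
    int filled = 0, misses = 0;
    for (int i = 0; i < n; i++) {
        int hit = 0;
        for (int j = 0; j < filled; j++)
            if (slot[j] == refs[i]) { hit = 1; break; }
        if (hit) continue;
        misses++;
        if (filled < frames) {
            slot[filled++] = refs[i];
            continue;
        }
        int victim = 0, farthest = -1;
        for (int j = 0; j < filled; j++) {
            int next = n;            /* "never used again" beats all */
            for (int k = i + 1; k < n; k++)
                if (refs[k] == slot[j]) { next = k; break; }
            if (next > farthest) { farthest = next; victim = j; }
        }
        slot[victim] = refs[i];      /* evict the farthest-used page */
    }
    return misses;
}
```

On the same 22-reference string as the FIFO slide, the optimal policy with 3 frames incurs only 9 misses, versus 15 for FIFO.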
LRU Implementation
Counter implementation
Every page entry has a counter; every time the page is referenced
through this entry, copy the current clock value into the counter
When a page needs to be replaced, look at the counters and evict
the one with the smallest value
Requires a search through the table
Stack implementation
Keep a stack of page numbers as a doubly linked list
Page referenced: move it to the top
(requires up to 6 pointers to be changed)
Each update is more expensive,
but no search is needed for replacement
24
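The stack (doubly linked list) scheme can be sketched as follows: on each reference the node is unlinked and relinked at the head, touching up to six pointers, and the tail is always the LRU victim. Struct and function names are illustrative:

```c
#include <stdlib.h>

/* One node per resident page, kept in MRU -> LRU order. */
struct node {
    int page;
    struct node *prev, *next;
};

struct lru_stack {
    struct node *head;   /* most recently used */
    struct node *tail;   /* least recently used: the victim */
};

/* Insert a brand-new page at the head of the stack. */
struct node *push_front(struct lru_stack *st, int page) {
    struct node *n = malloc(sizeof(*n));
    n->page = page;
    n->prev = NULL;
    n->next = st->head;
    if (st->head) st->head->prev = n;
    else          st->tail = n;
    st->head = n;
    return n;
}

/* On a reference, unlink n and relink it at the head
 * (up to 6 pointer updates, as the slide notes). */
void move_to_front(struct lru_stack *st, struct node *n) {
    if (st->head == n) return;            /* already MRU */
    if (n->prev) n->prev->next = n->next; /* unlink */
    if (n->next) n->next->prev = n->prev;
    if (st->tail == n) st->tail = n->prev;
    n->prev = NULL;                       /* relink at the head */
    n->next = st->head;
    st->head->prev = n;
    st->head = n;
}
```

No search is needed at replacement time — the victim is simply `st->tail` — at the cost of pointer surgery on every access.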
LRU Implementation Example with Stack
25