
6.897: Advanced Data Structures (Spring 2003)

Lecture 15, April 16, 2003
Prof. Erik Demaine
Scribes: Hans Robertson and Daniel Aguayo

1  Review of the Cache-Oblivious Model

The last lecture introduced the external-memory model, which models a two-level memory system. The first level of memory (the cache) has size M and is partitioned into blocks of size B. The second level (main memory or disk) is limitless, but we pay a charge for each access (a memory transfer or block transfer). In this model, we measure performance by the number of memory transfers (cache misses).

In the cache-oblivious model, we still measure performance by the number of memory transfers, but the algorithm has no knowledge of the values of M and B. We also have no explicit control over the cache (e.g., the ability to request "fetch block i and put it into cache slot j"). Instead, block fetches occur automatically, triggered by element accesses, and the cache system implements an omnisciently optimal block-replacement strategy.

Efficient algorithms in this model have the advantage that they need not be tuned for a given machine or memory subsystem. Indeed, because the amount of cache memory often varies across machines of the same model and is sometimes not exposed to the operating system, such tuning can be difficult in practice. More importantly, such algorithms perform well on machines with multi-level memory hierarchies. For more information on caches, see [HP03].

1.1  A Simple Example

One example of an operation that translates directly to the cache-oblivious model is scanning N elements in memory. In the external-memory model, if the elements are stored in adjacent positions with the first element on a block boundary, scanning requires ⌈N/B⌉ memory transfers. In the cache-oblivious model, without knowledge of the block size B, we cannot ensure that the first element lies on a block boundary, so scanning may require one extra block transfer: ⌈N/B⌉ + 1 transfers in total. This bound is very close to the external-memory performance.
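A minimal sketch (not from the notes; the function name blocks_touched is ours) that counts how many size-B blocks a contiguous scan touches, confirming that a misaligned start costs exactly one extra transfer in the worst case:

```python
def blocks_touched(start, n, b):
    """Number of distinct size-b blocks covering addresses start..start+n-1."""
    first_block = start // b
    last_block = (start + n - 1) // b
    return last_block - first_block + 1

B, N = 8, 100
aligned = blocks_touched(0, N, B)                        # start on a boundary
worst = max(blocks_touched(s, N, B) for s in range(B))   # any offset
print(aligned, worst)  # 13 = ceil(100/8), and 14 = ceil(100/8) + 1
```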

1.2  Block Replacement

Before continuing, we address some details of the cache behavior in the cache-oblivious model. One important aspect of a cache design is the block replacement strategy that determines which current block should be evicted from the cache when we fetch a new block from a lower level of memory. There are several simple possibilities:

- Least Recently Used (LRU): evict the block that was last accessed longest ago.
- Most Recently Used (MRU): evict the block that was most recently accessed. (This seems counterintuitive, but it can make sense if we know that recently referenced data is unlikely to be referenced again soon.)
- First In, First Out (FIFO): evict the block that has been in the cache longest.
- Random: evict a block chosen at random.
- Optimal (OPT): evict the block whose next access is furthest in the future.

Sleator and Tarjan [ST85] showed that both LRU and FIFO are within a constant factor of the optimal strategy, if we allow both the cache size M and the cost to change by a constant factor. Specifically:

    cost(LRU with cache of size 2M) ≤ 2 · cost(OPT with cache of size M)
    cost(FIFO with cache of size 2M) ≤ 2 · cost(OPT with cache of size M)

With this in mind, we can assume an optimal replacement strategy in the cache-oblivious model. For most algorithms, changing M by a constant factor changes the number of memory transfers by only a constant factor, and we are happy.
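To make the comparison concrete, here is a small simulation, our own sketch under a simple miss-counting model (not from the notes): LRU with a cache of 2M blocks versus Belady's OPT with M blocks.

```python
from collections import OrderedDict

def lru_misses(trace, m):
    cache, misses = OrderedDict(), 0
    for blk in trace:
        if blk in cache:
            cache.move_to_end(blk)            # mark as most recently used
        else:
            misses += 1
            if len(cache) == m:
                cache.popitem(last=False)     # evict least recently used
            cache[blk] = True
    return misses

def opt_misses(trace, m):
    cache, misses = set(), 0
    for i, blk in enumerate(trace):
        if blk in cache:
            continue
        misses += 1
        if len(cache) == m:
            # evict the cached block whose next use is furthest in the future
            def next_use(b):
                for j in range(i + 1, len(trace)):
                    if trace[j] == b:
                        return j
                return float('inf')
            cache.remove(max(cache, key=next_use))
        cache.add(blk)
    return misses

trace = [0, 1, 2, 3, 0, 1, 4, 0, 1, 2, 3, 4] * 3
m = 3
print(lru_misses(trace, 2 * m), opt_misses(trace, m))
assert lru_misses(trace, 2 * m) <= 2 * opt_misses(trace, m)  # [ST85] bound
```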

1.3  Associativity

The cache-oblivious model further assumes that any memory block can be stored anywhere in the cache; this property is known as full associativity. Many real caches allow a given block to be stored in only a subset of the available slots. If each such set has size k, the cache is said to be k-way set associative; if k = 1, the cache is direct mapped. On a miss, the block replacement strategy determines which of the k candidate blocks to evict. Frigo et al. [FLPR99] showed that automatic replacement on a fully associative cache can be simulated by manual replacement on a direct-mapped cache. The basic idea is to use 2-universal hashing: by hashing the block addresses, we avoid systematic conflicts. Such a simulation lies outside the cache-oblivious model, because it requires knowing B. Because of the constant-factor overheads, it is unclear whether this scheme is practical, but at least it is comforting from a theoretical point of view.
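As a toy illustration of the hashing idea (the parameters and hash family below are our assumptions, not taken from [FLPR99]), a random hash from a 2-universal family scatters block addresses across direct-mapped slots, breaking up fixed stride patterns that would otherwise always collide:

```python
import random

P = (1 << 61) - 1                         # a large prime
a, b = random.randrange(1, P), random.randrange(P)

def slot(block_addr, num_slots):
    """Carter-Wegman hash ((a*x + b) mod p) mod num_slots."""
    return ((a * block_addr + b) % P) % num_slots

# Blocks 0, 64, 128, ... would all map to slot 0 of a 64-slot direct-mapped
# cache; under the random hash they typically land in distinct slots.
print({slot(64 * i, 64) for i in range(8)})
```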

2  External-Memory Linked Lists

Consider the problem of storing a linked list on a machine with multiple levels of memory. First we address the problem in the external-memory model; in Section 3, we present a solution in the cache-oblivious model. Our linked list should support the following operations:

- Insert: insert a new element between two given elements.
- Delete: delete a given element.
- Traverse(K): given an element, visit the next K elements.

Our goal is to support traversals in O(⌈K/B⌉) memory transfers. A fairly simple data structure achieves this bound and supports updates in O(1) memory transfers. We cluster the elements of the list into Θ(N/B) chunks, each of size between B/2 and B, and store each chunk in one block. The underlying structure is an ordinary linked list of chunks; the only invariant we impose is that each block stores consecutive elements of the list. As a result, traversals take O(⌈K/B⌉) memory transfers.

Updates are also straightforward, as sketched below. An insertion proceeds as in a normal linked list, except that if a block becomes too full, we split it in two and put the new block at the end of memory. Similarly, if deleting an element results in a block merge and hence a hole in memory, we just move the last block in memory into the gap. Each update takes O(1) memory transfers.

In the external-memory model with two levels, cache and disk, elements may be ordered arbitrarily within each block, and the blocks may be stored at arbitrary locations. If we tried to extend this scheme to multiple levels, the layout would have to be more restricted to avoid misses at the higher levels, and the data structure would degrade because of O(L) levels of recursion to handle L levels of memory. Knowledge of B is essential to this scheme. In the next section we devise a cache-oblivious scheme.
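Here is a minimal sketch of this chunked list in Python. The chunk-size bounds follow the text; Python lists stand in for disk blocks, so the bookkeeping of physically moving the last block into a hole is elided, and the first chunk may be temporarily smaller than B/2:

```python
B = 8  # block size; known in the external-memory model

class Chunk:
    def __init__(self, items):
        self.items = items        # consecutive list elements in this block
        self.next = None

class ChunkedList:
    def __init__(self):
        self.head = Chunk([])

    def insert(self, chunk, pos, x):
        """Insert x at index pos within chunk; split if the block overflows."""
        chunk.items.insert(pos, x)
        if len(chunk.items) > B:
            new = Chunk(chunk.items[B // 2:])   # new block "at end of memory"
            chunk.items = chunk.items[:B // 2]
            new.next, chunk.next = chunk.next, new

    def delete(self, chunk, pos):
        """Delete the element at index pos; fix up if the block gets too empty."""
        del chunk.items[pos]
        nxt = chunk.next
        if len(chunk.items) < B // 2 and nxt is not None:
            if len(chunk.items) + len(nxt.items) <= B:
                chunk.items += nxt.items              # merge the two blocks
                chunk.next = nxt.next
            else:
                chunk.items.append(nxt.items.pop(0))  # borrow one element

    def traverse(self, chunk, pos, k):
        """Visit the k elements starting at (chunk, pos), block by block."""
        out = []
        while chunk is not None and len(out) < k:
            out.extend(chunk.items[pos:pos + k - len(out)])
            chunk, pos = chunk.next, 0
        return out

lst = ChunkedList()
for x in reversed(range(20)):
    lst.insert(lst.head, 0, x)            # repeatedly insert at the front
print(lst.traverse(lst.head, 0, 10))      # -> [0, 1, ..., 9]
```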

3  Cache-Oblivious Linked Lists

Maintaining a linked list in the cache-oblivious model is more challenging.

3.1  Via Ordered-File Maintenance

If we use the ordered-file maintenance structure presented in Lecture 14, we immediately obtain the following bounds:

- Insert and Delete in O((lg² N)/B) amortized memory transfers, assuming M ≥ 2B. When inserting an element, we perform two interleaved scans, one to the left and one to the right of the new element; rebalancing also uses two such scans.
- Traverse(K) in O(⌈K/B⌉) memory transfers. Because of the lower bound we impose on the density of elements in the structure, the gaps between consecutive elements have O(1) size.

This is the best cache-oblivious update bound known if we insist on an O(⌈K/B⌉) worst-case traversal bound.
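For intuition about the traversal bound, scanning the ordered-file array is just a rightward scan that skips gaps; because the density invariant keeps gap runs O(1) long, the K elements occupy O(K) consecutive cells, i.e. O(⌈K/B⌉) blocks. A toy sketch (the array contents here are illustrative):

```python
def traverse(ofm_array, start, k):
    """Scan rightward from index start, skipping gaps (None), for k elements."""
    out, i = [], start
    while i < len(ofm_array) and len(out) < k:
        if ofm_array[i] is not None:
            out.append(ofm_array[i])
        i += 1
    return out

ofm = [10, None, 20, 30, None, None, 40, 50, None, 60]
print(traverse(ofm, 0, 4))  # [10, 20, 30, 40]
```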

3.2  Sacrificing Traversals Slightly

Bender et al. [BCDF00] showed that if we relax the worst-case traversal bound slightly and store the elements out of order, we can support updates faster. This is somewhat counterintuitive: not knowing B, we do not even know how disordered the elements are allowed to be. Nonetheless, the solution supports updates in O((lg lg N)^(2+ε)/B) amortized memory transfers, for any ε > 0, and Traverse(K) in O(⌈K/B⌉ + [B^ε if B^(1-ε) ≤ K ≤ B]) transfers. This traversal bound is suboptimal only when K is in the range from B^(1-ε) to B. The algorithm achieving these bounds is somewhat complicated, so we will describe another one instead.

3.3  Self-Adjusting Data Structure

We will describe a data structure that achieves the same bounds as the external-memory data structure, except that both bounds become amortized:

- Insert and Delete in O(1) amortized memory transfers.
- Traverse(K) in O(⌈K/B⌉) amortized memory transfers.

Updates with a constant number of memory transfers are simple. For Insert, we just put the new element at the end of memory and update the pointers. For Delete, we just update the pointers and erase the element, making no effort to fill the gap. Both operations take O(1) transfers.

For Traverse, we simply visit the elements. If we are lucky and the elements were inserted in order, then we have a single run of consecutive elements, and the traversal uses the desired number of memory transfers. In the worst case, however, we might end up with K disjoint runs.

Figure 1: A linked list with 3 runs.

We define r to be the actual number of runs. Then the cost of the traversal is at most 2r + O(⌈K/B⌉). To improve the amortized performance, we can easily merge r - 2 runs, namely all but the first and last, into one run at the end of memory, as sketched below. We spare the first and last runs because moving them might break an existing run, negating our attempted improvement.
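A minimal sketch of the whole self-adjusting scheme (our illustration, not from the notes: one Python list plays the role of memory, and a node's index in it is its address):

```python
class Node:
    def __init__(self, val):
        self.val = val
        self.next = self.prev = None
        self.addr = None

memory = []  # nodes in address order; deletions and relocations leave holes

def alloc(node):
    node.addr = len(memory)            # the "end of memory"
    memory.append(node)
    return node

def insert_after(pred, val):
    """Insert: O(1) transfers; the new node simply goes at the end of memory."""
    node = alloc(Node(val))
    node.prev = pred
    node.next = pred.next if pred else None
    if node.next: node.next.prev = node
    if pred: pred.next = node
    return node

def delete(node):
    """Delete: O(1) transfers; unlink, and the old cell becomes a hole."""
    if node.prev: node.prev.next = node.next
    if node.next: node.next.prev = node.prev

def relocate(old):
    """Copy a node to the end of memory and splice the copy into the list."""
    new = alloc(Node(old.val))
    new.prev, new.next = old.prev, old.next
    if new.prev: new.prev.next = new
    if new.next: new.next.prev = new

def traverse(start, k):
    """Visit k elements, then merge the middle r - 2 runs into one fresh run."""
    visited, node = [], start
    while node and len(visited) < k:
        visited.append(node)
        node = node.next
    if not visited:
        return []
    # split the visited nodes into runs of consecutive addresses
    runs = [[visited[0]]]
    for a, b in zip(visited, visited[1:]):
        if b.addr == a.addr + 1:
            runs[-1].append(b)
        else:
            runs.append([b])
    # relocate all but the first and last run to the end of memory; the
    # copies land at consecutive addresses, so they form one new run
    for run in runs[1:-1]:
        for old in run:
            relocate(old)
    return [n.val for n in visited]
```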

3.4  Analysis of Traversals

Every run in the data structure must have been created by an update. The cost of a traversal is therefore O(⌈K/B⌉) amortized: of the 2r + O(⌈K/B⌉) actual cost, we can charge O(1) for each of the r - 3 runs destroyed by the Traverse to the updates that created those runs.
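This charging argument can be phrased as a potential function; here is a sketch, where the constant c ≥ 2 is our choice:

```latex
% Potential: Phi = c * r, where r is the current number of runs (c >= 2).
\Phi = c \cdot r
% Each Insert or Delete creates O(1) new runs, so its amortized cost is
\hat{c}_{\text{update}} = O(1) + c \cdot O(1) = O(1).
% A Traverse(K) over r runs costs at most 2r + O(\lceil K/B \rceil) actual
% transfers and merges all but the first and last run, so r drops by r - 3:
\hat{c}_{\text{Traverse}} \le 2r + O(\lceil K/B \rceil) - c\,(r - 3)
                          = O(\lceil K/B \rceil).
```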

3.5  Recompactification

In this scheme, each traversal might increase the size of the data structure, leaving long gaps in the middle. Thus, when the size of the structure grows by a constant factor, it is necessary to recompactify: shift the elements left to fill the holes and update all the pointers, as sketched below. Recompactifying requires O(N) memory transfers, but it too amortizes to O(1) per update.
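Continuing the sketch above (the factor-of-two growth trigger is our choice for illustration):

```python
def recompactify(head):
    """O(N) transfers: copy live nodes, in list order, into fresh memory."""
    global memory
    memory, node, prev = [], head, None
    while node:
        fresh = alloc(Node(node.val))
        fresh.prev = prev
        if prev: prev.next = fresh
        prev, node = fresh, node.next
    return memory[0] if memory else None

def maybe_recompactify(head, live_count):
    # Recompactify once holes make the array twice the live size.
    return recompactify(head) if len(memory) > 2 * live_count else head
```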

References
[BCDF00] Michael A. Bender, Richard Cole, Erik D. Demaine, and Martin Farach-Colton. Scanning and traversing: Maintaining data for traversals in a memory hierarchy. In Proceedings of the 10th Annual European Symposium on Algorithms (ESA 2002), volume 2461 of Lecture Notes in Computer Science, pages 139-151, Rome, Italy, September 2002.

[FLPR99] Matteo Frigo, Charles E. Leiserson, Harald Prokop, and Sridhar Ramachandran. Cache-oblivious algorithms. In Proceedings of the 40th Annual IEEE Symposium on Foundations of Computer Science, pages 285-297, New York, October 1999.

[HP03] John Hennessy and David Patterson. Computer Architecture: A Quantitative Approach. 3rd edition, Morgan Kaufmann, 2003.

[ST85] Daniel Sleator and Robert Tarjan. Amortized efficiency of list update and paging rules. Communications of the ACM, 28(2):202-208, 1985.
