This Unit: Caches
- Basic memory hierarchy concepts
- Speed vs. capacity
Readings
- MA:FSPTCM: Section 2.2; Sections 6.1, 6.2, 6.3.1
Start-of-class Exercise
You're a researcher
- You frequently use books from the library
- Your productivity is reduced while waiting for books

Paper
- Jouppi, "Improving Direct-Mapped Cache Performance by the Addition of a Small Fully-Associative Cache and Prefetch Buffers," ISCA 1990
- ISCA's most influential paper award, awarded 15 years later

How do you:
- Coordinate/organize/manage the books?
- Fetch the books from the library when needed?
- How do you reduce overall waiting?
- What techniques can you apply? Consider both simple & more clever approaches
Unobtainable goal:
- Memory that operates at processor speeds
- Memory as large as needed for all running programs
- Memory that is cost-effective
Types of Memory
Static RAM (SRAM)
- 6 or 8 transistors per bit
  - Two inverters (4 transistors) + transistors for reading/writing
- Optimized for speed (first) and density (second)
- Fast (sub-nanosecond latencies for small SRAM)
  - Speed roughly proportional to its area
- Mixes well with standard processor logic
Dynamic RAM (DRAM)
- 1 transistor + 1 capacitor per bit
  - Bits as capacitors, transistors as access ports
  - 1T cells: one access transistor per bit
- Optimized for density (in terms of cost per bit)
- Slow (>40ns internal access, ~100ns pin-to-pin)
- Different fabrication steps (does not mix well with logic)
Static
- Cross-coupled inverters hold state
Dynamic means
- Capacitors not connected to power/ground
- Stored charge decays over time
- Must be explicitly refreshed
To read
- Equalize (pre-charge bitlines to 0.5), swing, amplify via sense amp
To write
- Overwhelm (drive the bitlines to the new value)
DRAM process
- Same basic materials/steps as CMOS, but optimized for DRAM
Latency and Bandwidth
- Processors get faster more quickly than memory (note log scale)
[Figure: processor vs. memory access time over time. Copyright Elsevier Scientific 2003]
Burks, Goldstine, and Von Neumann, "Preliminary discussion of the logical design of an electronic computing instrument," IAS memo, 1946
Library Analogy
Consider books in a library
- Library has lots of books, but it is slow to access
  - Far away (time to walk to the library)
  - Big (time to walk within the library)
Temporal locality
- Recently referenced data is likely to be referenced again soon
- Reactive: cache recently used data in small, fast memory
Spatial locality
- More likely to reference data near recently referenced data
- Proactive: fetch data in large chunks to include nearby data
Caches ≈ bookshelves
- Moderate capacity, pretty fast to access
Connected by buses
Which also have latency and bandwidth issues
I$, D$
- Split instruction (I$) and data (D$)
- Typically 8KB to 64KB each
L2, L3
- L2 typically ~256KB to 512KB
- Last-level cache typically 4MB to 16MB
Main Memory
- Made of DRAM (Dynamic RAM)
- Typically 1GB to 4GB for desktops/laptops
- Servers can have 100s of GB
Disk
- Uses magnetic disks or flash drives
Cache organization
- ABC, miss classification
High-performance techniques
- Reducing misses, improving miss penalty, improving hit latency
Main memory
- DRAM-based memory systems
Virtual memory
- Disk
Warmup
What is a hash table?
What is it used for? How does it work?
Short answer:
- Maps a key to a value
- Constant time lookup/insert
- Have a table of some size, say N, of buckets
- Take a key value, apply a hash function to it
- Insert and lookup a key at hash(key) modulo N
- Need to store the key and value in each bucket
- Need to check to make sure the key matches
- Need to handle conflicts/overflows somehow (chaining, re-hashing)
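As a point of comparison for the cache lookup logic later in this unit, here is a minimal hash-table sketch in C; the bucket count, hash function, and chaining scheme are illustrative choices, not part of the slides.

```c
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 256   /* table size N (illustrative) */

typedef struct entry {
    char         *key;    /* stored key, checked on lookup */
    int           value;
    struct entry *next;   /* chaining handles conflicts/overflows */
} entry_t;

static entry_t *table[NBUCKETS];

/* Simple string hash (illustrative); any reasonable hash works. */
static unsigned hash(const char *key) {
    unsigned h = 5381;
    while (*key) h = h * 33 + (unsigned char)*key++;
    return h;
}

/* Insert key/value at bucket hash(key) % N. */
void ht_insert(const char *key, int value) {
    unsigned idx = hash(key) % NBUCKETS;
    entry_t *e = malloc(sizeof *e);
    e->key   = strdup(key);
    e->value = value;
    e->next  = table[idx];   /* chain onto existing entries in this bucket */
    table[idx] = e;
}

/* Lookup: index by hash, then confirm the stored key actually matches. */
int ht_lookup(const char *key, int *value_out) {
    unsigned idx = hash(key) % NBUCKETS;
    for (entry_t *e = table[idx]; e != NULL; e = e->next) {
        if (strcmp(e->key, key) == 0) { *value_out = e->value; return 1; }
    }
    return 0;  /* miss */
}
```

The same shape recurs in a cache: index into an array, then compare a stored "key" (the tag) to confirm the match.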
Cache Basics
Number of entries
- 1024 frames in this example, selected by 10 address (index) bits driving the wordlines
Size of entries
- Width of data accessed
- Data travels on bitlines
- 256 bits (32 bytes) in example
[Figure: SRAM array of 1024 frames; address bits decoded onto wordlines, data read out on bitlines]
To each frame attach a tag and a valid bit
- Compare frame tag to address tag bits
- No need to match index bits (why?)
Lookup algorithm
- Read frame indicated by index bits
- Hit if tag matches and valid bit is set
- Otherwise, a miss: get data from next level
Address: tag [31:15] | index [14:5] | offset [4:0]
- 32B frames → 5-bit offset; 1024 frames → 10-bit index
- 32-bit address − 5-bit offset − 10-bit index = 17-bit tag
- (17-bit tag + 1-bit valid) × 1024 frames = 18Kb tags = 2.2KB tags
- ~6% overhead
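A minimal C sketch of this lookup, using the geometry above (1024 frames of 32B); the tag/valid/data arrays and the fill_from_next_level() routine are hypothetical placeholders for the hardware structures.

```c
#include <stdint.h>
#include <stdbool.h>

#define NFRAMES    1024           /* 10-bit index */
#define BLOCK_SIZE 32             /* 32B frames -> 5-bit offset */

static uint32_t tags[NFRAMES];    /* 17-bit tag stored per frame */
static bool     valid[NFRAMES];
static uint8_t  data[NFRAMES][BLOCK_SIZE];

/* Hypothetical: fetch the whole block from the next level of the hierarchy. */
extern void fill_from_next_level(uint32_t block_addr, uint8_t *frame);

uint8_t cache_read_byte(uint32_t addr) {
    uint32_t offset = addr & (BLOCK_SIZE - 1);        /* bits [4:0]  */
    uint32_t index  = (addr >> 5) & (NFRAMES - 1);    /* bits [14:5] */
    uint32_t tag    = addr >> 15;                     /* bits [31:15] */

    if (!(valid[index] && tags[index] == tag)) {      /* miss */
        fill_from_next_level(addr & ~(uint32_t)(BLOCK_SIZE - 1), data[index]);
        tags[index]  = tag;
        valid[index] = true;
    }
    return data[index][offset];                        /* hit path */
}
```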
Cache Examples
4-bit addresses → 16B memory
- Simpler cache diagrams than 32 bits
8B cache, 2B blocks
- Figure out number of sets: 4 (capacity / block size)
- Figure out how address splits into offset/index/tag bits
  - Offset: least-significant log2(block size) = log2(2) = 1 bit
  - Index: next log2(number of sets) = log2(4) = 2 bits
  - Tag: rest = 4 − 1 − 2 = 1 bit
- Address split: tag (1 bit) | index (2 bits) | offset (1 bit)
[Figure: 16B main memory, addresses 0000–1111 holding data A–Q; 8B direct-mapped cache, address split tag (1 bit) | index (2 bits) | offset (1 bit)]
Cache Capacity
+ Miss rate decreases monotonically with capacity
  - Working set: insns/data the program is actively using
  - Diminishing returns
- However, thit increases
  - Latency proportional to sqrt(capacity)
- tavg?
[Figure: %miss vs. working set size]
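A quick back-of-the-envelope view of the tension, in C; the latencies and miss rates below are made-up illustrative numbers, not from the slides.

```c
#include <stdio.h>

/* tavg = thit + %miss * tmiss, for two hypothetical capacities. */
int main(void) {
    double tmiss = 20.0;                         /* cycles to next level (assumed) */
    double tavg_small = 1.0 + 0.10 * tmiss;      /* small cache: 1-cycle hit, 10% miss */
    double tavg_large = 2.0 + 0.05 * tmiss;      /* 4x capacity: ~2x hit latency (sqrt), 5% miss */
    printf("small: %.1f cycles, large: %.1f cycles\n", tavg_small, tavg_large);
    return 0;
}
```

With these particular numbers the two come out equal (3.0 cycles each), which is exactly the point: bigger is not automatically better for tavg.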
Block Size
Given capacity, manipulate %miss by changing organization
- One option: increase block size
  - e.g., 512 frames × 512 bits (64B blocks) instead of 1024 × 256 bits (32B blocks)
  + Exploits spatial locality
  - Notice index/offset bits change; tag remains the same
Ramifications
+ Reduce %miss (up to a point)
+ Reduce tag overhead (why? fewer frames for the same capacity → fewer tags)
- Potentially useless data transfer
- Premature replacement of useful data
- Fragmentation
Address: tag [31:15] | index [14:6] | offset [5:0]
- 64B frames → 6-bit offset; 512 frames → 9-bit index
- 32-bit address − 6-bit offset − 9-bit index = 17-bit tag
- (17-bit tag + 1-bit valid) × 512 frames = 9Kb tags = 1.1KB tags
+ ~3% overhead
[Figure: 8B cache with 4B blocks; address split tag (1 bit) | index (1 bit) | offset (2 bits); 16B main memory holding A–Q]
Critical Word First / Early Restart (CRF/ER)
- Requested word fetched first, pipeline restarts immediately
- Remaining words in block transferred/filled in the background
Cache Conflicts
[Figure: conflict example in the direct-mapped cache; 16B main memory, addresses 0000–1111 holding A–Q]
Set-Associativity
Set-associativity
- Block can reside in one of a few frames
- Frame groups are called sets; each frame in a set is called a way
- This is 2-way set-associative (SA)
  - 1-way → direct-mapped (DM)
  - 1-set → fully-associative (FA)
[Figure: 2-way set-associative array; 512 sets, frames 0–1023 arranged as sets × ways]
+ Reduces conflicts
- Increases latencyhit: additional tag match & muxing
- Note: valid bit not shown
Set-Associativity
Lookup algorithm
- Use index bits to find the set
- Read data/tags in all frames of the set in parallel
- Any way with a tag match and its valid bit set → hit
- Notice tag/index/offset bits
  - Only a 9-bit index (versus 10-bit for direct-mapped)
  - tag [31:14] | index [13:5] | offset [4:0]
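A C sketch of the same lookup for the 2-way organization above (512 sets, 32B blocks); the array names are hypothetical stand-ins for the per-way tag/valid/data storage.

```c
#include <stdint.h>
#include <stdbool.h>

#define NWAYS      2
#define NSETS      512            /* 9-bit index */
#define BLOCK_SIZE 32             /* 5-bit offset */

static uint32_t tags[NSETS][NWAYS];   /* 18-bit tags, bits [31:14] */
static bool     valid[NSETS][NWAYS];
static uint8_t  data[NSETS][NWAYS][BLOCK_SIZE];

/* Returns the matching way on a hit, or -1 on a miss. */
int sa_lookup(uint32_t addr) {
    uint32_t index = (addr >> 5) & (NSETS - 1);   /* bits [13:5]  */
    uint32_t tag   = addr >> 14;                  /* bits [31:14] */

    for (int way = 0; way < NWAYS; way++) {       /* hardware checks all ways in parallel */
        if (valid[index][way] && tags[index][way] == tag)
            return way;                            /* hit */
    }
    return -1;                                     /* miss: fetch from next level */
}
```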
Replacement Policies
Some options (see the sketch below for a 2-way LRU implementation)
- Random
- FIFO (first-in first-out)
- LRU (least recently used)
  - Fits with temporal locality; LRU = least likely to be used in the future
- NMRU (not most recently used)
  - An easier-to-implement approximation of LRU
  - Is LRU for 2-way set-associative caches
- Belady's: replace the block that will be used furthest in the future
  - Unachievable optimum
- Which policy is simulated in the previous example?
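For a 2-way set, LRU (equivalently NMRU) needs just one bit per set. A self-contained C sketch with hypothetical names; the cache controller would call these on hits and misses.

```c
#include <stdint.h>
#include <stdbool.h>

#define NWAYS 2
#define NSETS 512

static bool valid[NSETS][NWAYS];
static int  lru_way[NSETS];       /* one LRU bit per set: which way is least recent */

/* Called on every access that hits in (or fills) 'way'. */
void update_lru(uint32_t index, int way) {
    lru_way[index] = 1 - way;     /* the other way is now least recently used */
}

/* Called on a miss: pick the victim way to replace. */
int choose_victim(uint32_t index) {
    for (int way = 0; way < NWAYS; way++)
        if (!valid[index][way]) return way;   /* prefer an invalid way */
    return lru_way[index];                    /* otherwise evict the LRU way */
}
```

With more than 2 ways, true LRU needs an ordering over all ways, which is why NMRU (track only the most recently used way) is a common approximation.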
[Figure: 2-way set-associative example; 8B cache, 2B blocks, 2 sets; address split tag (2 bits) | index (1 bit) | offset (1 bit); each set holds an LRU bit plus per-way tag and data; 16B main memory A–Q]
Associativity
[Figure: %miss vs. associativity]
[Figure: parallel vs. serial tag/data access datapaths]
Way Predictor
+ Advantages: fast, low-power
- Disadvantage: more misses
[Figure: way-predicted set-associative access datapath]
Prefetching
Prefetching: put blocks in cache proactively/speculatively
- Key: anticipate upcoming miss addresses accurately
- Can do in software or hardware
- Simple example: next-block prefetching (see the sketch below)
  - Miss on address X → anticipate miss on X + block-size
  + Works for insns: sequential execution
  + Works for data: arrays
- Timeliness: initiate prefetches sufficiently in advance
- Coverage: prefetch for as many misses as possible
- Accuracy: don't pollute with unnecessary data (it evicts useful data)
[Figure: prefetch logic sits alongside the I$/D$, prefetching from L2]
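A minimal sketch of the next-block policy in C; issue_prefetch() is a hypothetical stand-in for the hardware request sent to the next level.

```c
#include <stdint.h>

#define BLOCK_SIZE 32

/* Hypothetical: request that a block be filled into the cache. */
extern void issue_prefetch(uint32_t block_addr);

/* Called by the cache controller on every demand miss to address x. */
void on_demand_miss(uint32_t x) {
    uint32_t this_block = x & ~(uint32_t)(BLOCK_SIZE - 1);
    issue_prefetch(this_block + BLOCK_SIZE);   /* anticipate a miss on X + block-size */
}
```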
Software Prefetching
Use a special prefetch instruction
- Tells the hardware to bring in the data; doesn't actually read it
- Just a hint
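With GCC and Clang, this hint is exposed in C as the __builtin_prefetch intrinsic. A sketch of prefetching ahead in an array traversal; the distance of 16 elements is an illustrative tuning choice, not from the slides.

```c
/* Sum an array, prefetching a fixed distance ahead of the current element. */
double sum(const double *a, long n) {
    double s = 0.0;
    for (long i = 0; i < n; i++) {
        if (i + 16 < n)
            __builtin_prefetch(&a[i + 16], 0, 1);  /* read access, low temporal locality; just a hint */
        s += a[i];
    }
    return s;
}
```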
Hardware Prefetching
What to prefetch?
Use a hardware table to detect strides, common patterns
Address-prediction
- Needed for non-sequential data: lists, trees, etc.
- Large table records (miss-addr → next-miss-addr) pairs
- On a miss, access the table to find out what will miss next
- It's OK for this table to be large and slow
- 20% performance improvement for large trees (>1M nodes)
- But ~15% slowdown for small trees (<1K nodes)
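A generic sketch of such a (miss-addr → next-miss-addr) table in C; the direct-mapped table size and hashing are illustrative, not the specific prefetcher those numbers come from.

```c
#include <stdint.h>

#define TABLE_SIZE 4096            /* "OK for this table to be large and slow" */

static uint32_t predicted_next[TABLE_SIZE];   /* indexed by (hashed) miss address */
static uint32_t last_miss_addr;

/* Hypothetical: request that a block be filled into the cache. */
extern void issue_prefetch(uint32_t block_addr);

static unsigned table_index(uint32_t addr) {
    return (addr >> 5) % TABLE_SIZE;           /* hash on the block address */
}

/* Called on every demand miss. */
void on_miss(uint32_t addr) {
    /* Record the observed pair: last miss -> this miss. */
    predicted_next[table_index(last_miss_addr)] = addr;
    last_miss_addr = addr;

    /* Consult the table: what missed right after this address last time? */
    uint32_t pred = predicted_next[table_index(addr)];
    if (pred != 0)
        issue_prefetch(pred);
}
```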
Write Issues
So far we have looked at reading from cache
Instruction fetches, loads
Tag/Data Access
Reads: read tag and data in parallel
- Tag mismatch → data is wrong (OK, just stall until good data arrives)
Write Propagation
When to propagate a new value to (lower-level) memory?
Option #1: Write-through: immediately
- On a hit, update the cache
- Immediately send the write to the next level
Option #2: Write-back: when the block is replaced
+ Key advantage: uses less bandwidth
- Reverse of other pros/cons above
- Used by Intel, AMD, and ARM
- Second-level and beyond are generally write-back caches
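A sketch contrasting the two policies for a store hit and an eviction; the per-frame dirty bit is the usual write-back mechanism, and the array names and write_next_level() are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

#define NFRAMES    1024
#define BLOCK_SIZE 32

static uint8_t data[NFRAMES][BLOCK_SIZE];
static bool    dirty[NFRAMES];                 /* used only by the write-back policy */

/* Hypothetical: write an entire block to the next level of the hierarchy. */
extern void write_next_level(uint32_t block_addr, const uint8_t *block);

/* Write-through: on a hit, update the cache and immediately send the write down. */
void store_write_through(uint32_t index, uint32_t offset, uint8_t value, uint32_t block_addr) {
    data[index][offset] = value;
    write_next_level(block_addr, data[index]);
}

/* Write-back: on a hit, update the cache and just mark the block dirty. */
void store_write_back(uint32_t index, uint32_t offset, uint8_t value) {
    data[index][offset] = value;
    dirty[index] = true;
}

/* Write-back: dirty data is propagated only when the block is evicted. */
void evict_write_back(uint32_t index, uint32_t block_addr) {
    if (dirty[index]) {
        write_next_level(block_addr, data[index]);
        dirty[index] = false;
    }
}
```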
Write miss?
- Technically, no instruction is waiting for the data, so why stall?
[Figure: store buffer (SB) between the processor and the cache]
Cache Hierarchies
Next-level cache
- Infrequent access → thit less important
- tmiss is bad → %miss important
- Higher capacity, associativity, and block size (to reduce %miss)
Exclusion
- Bring block from memory into L1 but not L2
- Move block to L2 on L1 eviction
  - L2 becomes a large victim cache
- Block is either in L1 or L2 (never both)
- Good if L2 is small relative to L1
  - Example: AMD's Duron, 64KB L1s, 64KB L2
Non-inclusion
- No guarantees
Hierarchy Performance
Performance metric: tavg, average access time

tavg = tavg-M1 = thit-M1 + (%miss-M1 × tmiss-M1)
tmiss-M1 = tavg-M2 = thit-M2 + (%miss-M2 × tmiss-M2)
tmiss-M2 = tavg-M3 = thit-M3 + (%miss-M3 × tmiss-M3)
tmiss-M3 = tavg-M4
...
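The recursion is easy to evaluate bottom-up; a C sketch with made-up per-level latencies and miss rates (not from the slides).

```c
#include <stdio.h>

/* tavg-Mi = thit-Mi + %miss-Mi * tmiss-Mi, where tmiss-Mi = tavg-M(i+1). */
int main(void) {
    double thit[]  = { 1.0, 10.0, 100.0 };   /* M1, M2, M3 hit times in cycles (assumed) */
    double pmiss[] = { 0.05, 0.20, 0.0 };    /* last level always hits */
    int levels = 3;

    double tavg = thit[levels - 1];          /* start at the bottom of the hierarchy */
    for (int i = levels - 2; i >= 0; i--)
        tavg = thit[i] + pmiss[i] * tavg;    /* a miss at level i costs tavg of level i+1 */

    printf("tavg = %.2f cycles\n", tavg);    /* 1 + 0.05*(10 + 0.20*100) = 2.50 */
    return 0;
}
```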
Calculate CPI
CPI = 1 + (30% × 5% × tmiss-D$)
tmiss-D$ = tavg-L2 = thit-L2 + (%miss-L2 × thit-Mem) = 10 + (20% × 50) = 20 cycles
Thus, CPI = 1 + (30% × 5% × 20) = 1.3
Summary
Average access time of a memory component
- latencyavg = latencyhit + (%miss × latencymiss)
- Hard to get low latencyhit and %miss in one structure → hierarchy
Memory hierarchy
- Cache (SRAM) → memory (DRAM) → virtual memory (Disk)
- Smaller, faster, more expensive → bigger, slower, cheaper