
International Journal of Engineering Research & Technology (IJERT)

ISSN: 2278-0181
Vol. 4 Issue 08, August-2015

Exploring Cache Coherency Design for Chip Multiprocessor using Multi2Sim

Vinh Ngo Quang, Hao Do, Trang Hoang, and Thanh Vu D.

IC Design Research and Education Center, University of Technology, VNU-HCM
Ho Chi Minh City, Vietnam

Abstract—Memory hierarchy design plays an important role in improving the performance of a chip multiprocessor (CMP), because CMP performance is strongly affected by the latency of fetching data from the memory system. Several organizations of the memory hierarchy have been explored to optimize this latency. Within the memory hierarchy, data movement is governed by a cache coherence protocol, which is the skeleton of the CMP's memory system. In this paper, we concentrate on exploring MOESI, a well-defined and popular cache coherence protocol for CMPs. Our experiments are based on the Splash-2 benchmark suite, which is widely used in publications on CMP design. The experimental results show that by rearranging the address ranges of the memory banks, the L2 hit ratio can be improved by up to 13.5%.

Keywords—Chip Multiprocessor; Memory Hierarchy; Coherence Protocol; MOESI; Memory Bank

I. INTRODUCTION

Nowadays, the chip multiprocessor (CMP) is the main trend in CPU design for high-performance devices. This originates from the fact that single-core chips have reached the limit of execution speed because of heat and power dissipation issues. Moreover, modern technologies allow millions of transistors to be integrated on one chip, which eases the design of multicore chips in terms of area. In fact, several CMPs have been commercialized in the market [1], [2], [3], [4].

In a CMP, memory hierarchy design is a concern that takes a lot of research effort, since the memory organization, not the CPU core, is the bottleneck in CMP design. Most memory systems have multiple levels of cache; for instance, Intel's commercial Ivy Bridge has three levels of cache. These cache levels and the main memory need a coherence protocol to keep memory consistent; for example, an L2 cache must ensure data consistency among the L1 caches above it. Moreover, a good coherence protocol also helps the CMP improve performance in terms of memory access latency. To the best of our knowledge, MOESI [6] is the most widely used cache coherence protocol in CMPs. In this paper, we first evaluate CMP performance using the MOESI protocol. Then, observing that the instruction and data caches have different access patterns, we concentrate on optimizing the last-level cache to exploit this difference and obtain a better L2 hit ratio. Our idea is to interleave the address ranges of the memory banks, and we show by simulation that performance can improve by up to 13.5% in comparison with a baseline model that arranges the memory addresses linearly. The experiment is carried out with Multi2Sim [5], an open-source simulator for heterogeneous multiprocessor design, and the Splash-2 benchmark suite [17] is used as the workload. In the results, we focus on analyzing the cache miss latency, because this parameter not only determines the efficiency of MOESI but also strongly affects CMP performance. Our contributions in this work are (1) statistical results on CMP performance using the MOESI protocol, and (2) a demonstration of the performance improvement obtained by rearranging the address ranges of the memory banks.

The paper is structured as follows. Section II surveys recent work on CMP cache organizations and coherence protocols. Section III presents the experimental method. Section IV gives and explains the results, and Section V concludes the paper.

II. RELATED WORK

Several works have been carried out to improve CMP performance by optimizing the on-chip memory hierarchy. There are different aspects of the memory hierarchy to consider, such as the shared or private last-level cache model, the cache coherence protocol, and the on-chip interconnection. While the L1 cache is always private to the processor core, the L2 cache can be designed to be private or shared, and many research papers explore the design space of the L2 cache [7]. A shared last-level cache has the advantage over a private one that it dynamically allocates the overall cache space among all cores on the chip; the last-level cache space is therefore better utilized and its miss ratio is reduced. A shared last-level cache (LLC) can be physically centralized or distributed with respect to the processor cores. In the first CMP designs, researchers proposed shared LLC organizations with uniform cache access time (UCA). Even though UCA is simple to design, it was soon replaced by non-uniform cache access (NUCA) techniques. With NUCA, cache banks nearer to the requesting core provide lower access latency than farther banks. NUCA was first proposed in [9]; the complexity of the interconnection network is the price to pay for using it, and the latency of a bank access depends on the bank's size and the number of network hops between the requesting core and the bank.


Kim et al. [9] investigated a model with a single core and a large L2 cache divided into multiple banks. They argued that a highly banked cache structure with a distributed cache controller is desirable for reducing cache access latency. The two main NUCA techniques proposed in their paper are static NUCA (S-NUCA) and dynamic NUCA (D-NUCA). In S-NUCA, data is statically mapped into banks, with the least significant bits determining the bank. S-NUCA is shown in [9] to have advantages over UCA for two reasons: first, the banks have non-uniform access times, so accesses to banks nearer the requesting core incur lower latency; second, different banks can be accessed simultaneously, which helps reduce contention. D-NUCA further improves on S-NUCA by dynamically mapping data into different LLC banks, i.e. frequently accessed data are placed in closer banks while less used data are cached in farther banks. D-NUCA raises data management policy issues: (1) how data are mapped to banks and in which banks a given datum may reside, (2) how to search for a cache line as quickly as possible in a large cache with many banks, and (3) how and when to migrate data between banks. D-NUCA is widely exploited in several works [10], [11], [12]. To the best of our knowledge, however, no prior work addresses the way in which the entire memory address range is divided among the memory banks. In this paper we show that the L2 hit ratio can be improved by up to 13.5% by interleaving the memory address range between the memory banks.

To ensure a consistent view of memory among all processor cores, a cache coherence protocol must be implemented in a CMP. A coherence mechanism has two main parts: (1) storage that holds the data-sharing information, and (2) a set of protocols that keep the data consistent using the information in (1). One essential piece of sharing information is the status of each cached copy, which is usually tracked by attaching a state to each cache block. The minimum states a coherence protocol must have are: (1) invalid (I), indicating that the cache block does not hold valid data; (2) shared (S), meaning that the block is shared by one or more other processor caches, may only be read (not written), and holds the same value as memory; and (3) modified (M), signifying that the block is uniquely held; a block in the M state must be written back to memory before being evicted. This simple coherence protocol is therefore called the three-state MSI protocol.

More sophisticated protocols employ additional cache block states to reduce coherence traffic and the latency of fetching a data block. Popular examples are MESI, MOSI, and MOESI [13]. MOESI is considered the most complex of these and encompasses all the states commonly used in the other protocols. The O (owned) state describes a dirty but shared block: it reduces coherence traffic because a block in the M state does not need to write its data back when it receives a read request; it simply changes to the O state instead. The E (exclusive) state signifies a clean and exclusive block: such a block can change to the M state without notifying the lower-level cache/memory, and when a block in the E state is evicted, no write-back is needed because its data is clean. Published results suggest that MOESI performs better than MESI and MOSI, although there is no conclusive evidence.
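To make the roles of these states concrete, the following minimal C sketch (our own illustration, not the protocol engine of any particular simulator) encodes the five MOESI states and the two transitions highlighted above: a remote read turns an M block into an O block without a write-back, and a write to an E block silently upgrades it to M.

#include <stdio.h>

/* The five MOESI states attached to every cache block. */
enum moesi_state { INVALID, SHARED, EXCLUSIVE, OWNED, MODIFIED };

struct cache_block {
    enum moesi_state state;
    int dirty;                    /* block differs from the lower-level copy */
};

/* A remote core issues a read: a block in M supplies the data and
 * becomes O (dirty but shared) instead of writing back to memory. */
void on_remote_read(struct cache_block *b)
{
    if (b->state == MODIFIED)
        b->state = OWNED;         /* no write-back needed yet */
    else if (b->state == EXCLUSIVE)
        b->state = SHARED;        /* clean copy is now shared */
}

/* The local core writes: a block held in E upgrades to M silently,
 * i.e. without notifying the lower-level cache/memory. */
void on_local_write(struct cache_block *b)
{
    if (b->state == EXCLUSIVE || b->state == SHARED || b->state == OWNED) {
        /* From S or O, other sharers would first have to be invalidated;
         * the coherence messages are omitted in this sketch. */
        b->state = MODIFIED;
    }
    b->dirty = 1;
}

/* Eviction: only M and O hold dirty data that must be written back. */
int needs_writeback_on_evict(const struct cache_block *b)
{
    return b->state == MODIFIED || b->state == OWNED;
}

int main(void)
{
    struct cache_block b = { MODIFIED, 1 };
    on_remote_read(&b);
    printf("state after remote read: %d (OWNED = %d)\n", b.state, OWNED);
    return 0;
}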
Depending on the interconnection, there are two major classes of cache coherence protocols: bus-based and directory-based. Directory-based protocols are preferable for large numbers of cores using a scalable interconnect such as a mesh. A directory-based protocol replaces the broadcast coherence messages of a bus-based protocol with point-to-point messages that involve only the appropriate nodes of the interconnection network. In more detail, the sharing information is logically centralized in a directory. The directory is usually co-located with the data block in memory, and each of its entries corresponds to one memory address. An entry keeps the information needed to track the block's current sharers and their read/write privileges. The directory-based mechanism dramatically reduces coherence traffic in comparison with a bus-based one. Moreover, it allows coherence messages to travel over dedicated, fast channels of the interconnection network rather than over a single shared bus. In this paper, we choose the MOESI, directory-based coherence protocol for the sake of performance and scalability [8].
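As an illustration of the per-block information such a directory must keep (a generic sketch of our own; the actual layout used by Multi2Sim differs), an entry can be reduced to a coherence state, an owner, and a sharer bit-vector with one bit per cache module, so that invalidations on a write are sent only to the nodes whose bits are set.

#include <stdint.h>
#include <stdbool.h>

#define NUM_NODES 8               /* e.g. 4 L1 + 2 L2 modules plus spares; an assumption for the sketch */

/* One directory entry per memory block address. */
struct dir_entry {
    uint8_t  state;               /* coherence state of the block (e.g. a MOESI state) */
    int8_t   owner;               /* node holding the M/O copy, -1 if none */
    uint32_t sharers;             /* bit-vector: bit i set if node i holds a copy */
};

/* On a write request, point-to-point invalidations are sent only to the
 * nodes whose bit is set, instead of broadcasting to every node. */
static inline bool must_invalidate(const struct dir_entry *e, int node)
{
    return (e->sharers >> node) & 1u;
}

static inline void add_sharer(struct dir_entry *e, int node)
{
    e->sharers |= 1u << node;
}

static inline int sharer_count(const struct dir_entry *e)
{
    int n = 0;
    for (int i = 0; i < NUM_NODES; i++)
        n += (e->sharers >> i) & 1u;
    return n;
}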
The simulator used in this research is Multi2Sim. It models an event-driven memory hierarchy that uses MOESI as the coherence protocol between the caches of different processor cores. It also supports multi-level cache organizations as well as directories for caches and main memory. The simulator is written purely in C, which makes it simpler to read and modify than gem5 [16]. Other well-known simulators for this kind of research are CACTI [14] and Simics [15], but CACTI builds only the cache model, which costs users extra effort to run benchmarks, while Simics is mainly intended for commercial use.

III. METHODOLOGY

We use Multi2Sim to run and simulate the operation of the CMP, focusing on the memory hierarchy. Multi2Sim is a simulation framework written in C for heterogeneous computing. It provides an easy way to design, configure, and launch a CMP for research on heterogeneous systems. The framework builds the CMP from INI configuration files, through which we can control the main memory, the caches, and the internal network. In the first experiment, we use the memory configuration and network configuration files to generate the CMP shown in Fig. 1; a hedged sketch of such a configuration follows below.

Fig. 1. The CMP architecture used for the experiment.
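The fragment below sketches what a Multi2Sim memory-configuration file for one L1/L2 pair and one main-memory bank could look like. It is an assumption-laden illustration: the section and key names follow our reading of the Multi2Sim documentation and should be checked against the manual of the simulator version in use, and the sizes and latencies are placeholders rather than the values of TABLE I.

; Hypothetical excerpt of a Multi2Sim memory configuration file
[CacheGeometry geo-l1]
Sets = 64
Assoc = 2
BlockSize = 64
Latency = 2

[CacheGeometry geo-l2]
Sets = 512
Assoc = 8
BlockSize = 64
Latency = 10

[Module mod-l1-0]
Type = Cache
Geometry = geo-l1
LowNetwork = net-l1-l2-0
LowModules = mod-l2-0

[Module mod-l2-0]
Type = Cache
Geometry = geo-l2
HighNetwork = net-l1-l2-0
LowNetwork = net-l2-mm
LowModules = mod-mm-0 mod-mm-1 mod-mm-2 mod-mm-3

[Module mod-mm-0]
Type = MainMemory
BlockSize = 64
Latency = 200
; Baseline: this bank serves one contiguous quarter of the 32-bit address space
AddressRange = BOUNDS 0x00000000 0x3FFFFFFF
; An interleaved alternative (1 KB granularity across 4 banks) would be
; expressed along the lines of: AddressRange = ADDR DIV 1024 MOD 4 EQ 0

[Entry core-0]
Arch = x86
Core = 0
Thread = 0
DataModule = mod-l1-0
InstModule = mod-l1-0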


The CMP comprises four processor cores, four L1 cache modules, two L2 cache modules, and a main memory module. Each L2 cache is shared by two L1 cache modules but is private from the main memory's point of view. We also create four threads for our four-core CMP. The detailed configuration of this system is shown in TABLE I.

TABLE I. Configuration of the CMP in detail.

We use a ring network to connect all modules to main memory, so the private L2 caches can directly access every memory bank on the ring. The MOESI protocol is used in this CMP design to keep data consistent between the L1 caches of the processor cores.

The experiment is run with the Splash-2 benchmark suite, which contains 11 applications solving 11 classic problems in parallel computing, such as N-Body, Cholesky factorization, and FFT. Since the main application field of CMPs is solving complex problems through parallel computing, this benchmark is a suitable workload for evaluating the effect of the MOESI protocol on CMP performance. We run the 11 applications and record the L1 and L2 hit ratios; the results are reported in Fig. 2. The average L1 hit ratio is very high: over 98% for L1-Data and 99% for L1-Instruction. For L2 the number is lower, about 77%. These are reasonable results, because the L1 caches hold the data and instructions most likely to be used, while the L2 hit ratio, although lower, is still acceptable: L2 is one level further from the core, and the core can proceed with other tasks that do not depend on data from L2. Besides, in this configuration the effective L2 size is limited by its private character. From these results we conclude that the Multi2Sim simulator and the MOESI protocol work correctly. Importantly, in this first experiment we divide the main memory into four equal banks, each holding one contiguous range of memory addresses. This way of separating the banks serves as the baseline for the next experiment, which shows the improvement in L2 hit ratio obtained by interleaving the banks' address ranges.

Fig. 2. The average hit ratio in the L1 and L2 caches.

In the second experiment, the main memory of the same model is separated into four banks by interleaving the memory address range. There are 32 bits to index memory bytes, but we do not use all of them. With interleaving, a memory bank becomes a set of equal, smaller ranges; if the interleaving ranges are too small, memory accesses become inefficient because of locality of reference. We therefore choose 1 KB as the smallest range, which leaves 22 bits to index a range. Since there are four memory banks, a pair of bits is needed to locate the bank that contains a block. In theory any pair could be used; in our experiment we use two consecutive bits within the 22 index bits. This approach, called interleaved memory-bank addressing, spreads the data and instructions of the applications across every memory bank. Fig. 3 illustrates the possible ways of dividing the memory addresses into banks: there are 21 pairs of consecutive bits within the 22 index bits, so the main memory can be divided in 21 different ways. Our goal is to identify which pair is the best choice and how much it improves the L2 hit ratio. First, we run the model of Fig. 1 for the 21 cases using one benchmark from Splash-2, arbitrarily chosen to be the Sparse Cholesky Factorization problem, and record the L2 hit ratio for each case. From these results we determine the best pair for interleaving. Next, we run all remaining Splash-2 benchmarks with that pair, compare the results against the baseline, and compute the improvement of the interleaved-bank model over the linear-range bank model. (A sketch of the two bank-selection schemes is given after Fig. 3.)

Fig. 3. Memory partitions using different pairs of bits.
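The following minimal C sketch (our own illustration, not simulator code) contrasts the two bank-selection schemes compared in this paper: the baseline, in which each of the four banks holds one contiguous quarter of the 32-bit address space, and the interleaved scheme, in which a pair of consecutive address bits above the 1 KB (10-bit) offset selects the bank. The parameter pair_index follows the 1-to-21 numbering used in the text.

#include <stdint.h>
#include <stdio.h>

#define NUM_BANKS   4u
#define MEM_SIZE    (1ull << 32)      /* 32-bit physical address space */

/* Baseline: bank i covers one contiguous quarter of the address space. */
unsigned bank_linear(uint32_t addr)
{
    return (unsigned)(addr / (MEM_SIZE / NUM_BANKS));
}

/* Interleaved: pair_index = 1..21 selects bits (9 + pair_index, 10 + pair_index),
 * i.e. an interleaving granularity of 2^(9 + pair_index) bytes.
 * pair_index = 1 -> 1 KB ranges; pair_index = 9 -> 256 KB ranges. */
unsigned bank_interleaved(uint32_t addr, int pair_index)
{
    int shift = 9 + pair_index;
    return (addr >> shift) & (NUM_BANKS - 1);
}

int main(void)
{
    uint32_t addr = 0x00740000;       /* arbitrary test address */
    printf("linear bank      : %u\n", bank_linear(addr));
    printf("interleaved bank : %u (pair 9, 256 KB ranges)\n",
           bank_interleaved(addr, 9));
    return 0;
}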


This design uses inclusive caches, so the directories in the four main-memory banks limit the number of cache blocks that can be held in all level-1 and level-2 cache modules. If the directories are full, blocks are evicted from the directories, and the corresponding blocks in the L1 and L2 caches are evicted as well. We therefore predict that dividing the main memory into four linear (contiguous) address-range banks gives lower performance than interleaving the address range across the banks.

IV. EXPERIMENT RESULTS

Fig. 4 shows two important observations from running Cholesky factorization: the hit ratio reaches its highest value when the ninth pair of bits is used for partitioning, and when the interleaving size becomes too large the hit ratio settles to a constant.

Fig. 4. The L2 hit ratios obtained with different pairs of bits used to partition main memory, when running Cholesky factorization.

First, we explain why the L2 hit ratio is constant when the interleaving size is large. Because the size of the application is limited, once the pair of bits is high enough, i.e. the range is large enough, the application code fits within one bank. Strictly speaking, the hit ratio is not exactly constant; due to factors such as the hardware and other running programs it fluctuates around a value, but the amplitude is so small that the fluctuation is not visible.

Second, the hit ratio reaches its maximum when the ninth pair of bits is used, which corresponds to an interleaved bank range of 256 KB. The reason is that L2 is a unified cache for both instructions and data, and the CMP runs four threads in parallel. By interleaving the memory address range, the instructions and data of the applications are distributed evenly across the four directories of the main-memory banks. This banking mechanism helps the directories store cache blocks efficiently given their limited capacity. Moreover, when a cache block is evicted from a directory, it is likely that the block will not be accessed again in the upper-level caches.
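As a consistency check on these numbers (a derivation added here, assuming the pairs are numbered from 1 starting just above the ten offset bits of the 1 KB granule, as in Section III), pair k implies an interleaving granularity of

\[
\text{granularity}(k) = 2^{\,10 + (k-1)} = 2^{\,9+k}\ \text{bytes}, \qquad k = 1,\dots,21,
\]
\[
\text{granularity}(9) = 2^{18}\ \text{bytes} = 262144\ \text{bytes} = 256\ \text{KB}.
\]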
Fig. 5. The L2 hit ratios when interleaving with 256 KB ranges, in comparison with the baseline.

In Fig. 5, the upper curve presents the hit ratio when interleaving with 256 KB ranges, and the other curve is the baseline. The average improvement is 7%, and up to 13.5% in the best case. We therefore conclude from these experiments that the interleaving method is the better choice.

V. CONCLUSION

This paper explores the MOESI protocol used in a CMP. Based on the different characteristics of instruction and data cache blocks, we reorganize the memory banks by interleaving the address range between the banks. The experimental results show that by interleaving 256 KB ranges among four memory banks, the L2 hit ratio of the CMP improves by up to 13.5% compared with the baseline model, in which the memory address space is divided into four contiguous address ranges. This result should also apply to other workloads; depending on the size of the workload, the interleaving coefficient may need to change accordingly to achieve the best L2 hit ratio.

ACKNOWLEDGMENT

This research was funded by Vietnam National University - Ho Chi Minh City under grant number C2015-40-01.

REFERENCES

[1] Chen, Thomas, et al., "Cell Broadband Engine architecture and its first implementation - a performance view," IBM Journal of Research and Development 51.5 (2007): 559-572.
[2] George, Varghese, T. Piazza, and H. Jiang, "Technology Insight: Intel Next Generation Microarchitecture Codename Ivy Bridge," 2011.
[3] Conway, Pat, et al., "Cache hierarchy and memory subsystem of the AMD Opteron processor," IEEE Micro 30.2 (2010): 16-29.
[4] P. Kongetira, K. Aingaran, and K. Olukotun, "Niagara: A 32-way multithreaded SPARC processor," IEEE Micro, vol. 25, pp. 21-29, March 2005.
[5] Ubal, Rafael, et al., "Multi2Sim: a simulation framework for CPU-GPU computing," Proceedings of the 21st International Conference on Parallel Architectures and Compilation Techniques, ACM, 2012.


[6] Milo M. K. Martin, Token Coherence, Ph.D. Dissertation, Dec. 2003.
[7] Balasubramonian, Rajeev, Norman P. Jouppi, and Naveen Muralimanohar, "Multi-core cache hierarchies," Synthesis Lectures on Computer Architecture 6.3 (2011): 1-153.
[8] Martin, Milo M. K., Mark D. Hill, and Daniel J. Sorin, "Why on-chip cache coherence is here to stay," Communications of the ACM 55.7 (2012): 78-89.
[9] Kim, Changkyu, Doug Burger, and Stephen W. Keckler, "An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches," ACM SIGPLAN Notices, Vol. 37, No. 10, ACM, 2002.
[10] Chang, Jichuan, and Gurindar S. Sohi, "Cooperative caching for chip multiprocessors," Vol. 34, No. 2, IEEE Computer Society, 2006.
[11] Zhang, Michael, and Krste Asanovic, "Victim replication: Maximizing capacity while hiding wire delay in tiled chip multiprocessors," ACM SIGARCH Computer Architecture News, Vol. 33, No. 2, IEEE Computer Society, 2005.
[12] Beckmann, Bradford M., and David A. Wood, "Managing wire delay in large chip-multiprocessor caches," MICRO-37: 37th International Symposium on Microarchitecture, IEEE, 2004.
[13] Sorin, Daniel J., Mark D. Hill, and David A. Wood, "A primer on memory consistency and cache coherence," Synthesis Lectures on Computer Architecture 6.3 (2011): 1-212.
[14] Wilton, Steven J. E., and Norman P. Jouppi, "CACTI: An enhanced cache access and cycle time model," IEEE Journal of Solid-State Circuits 31.5 (1996): 677-688.
[15] Magnusson, Peter S., et al., "Simics: A full system simulation platform," Computer 35.2 (2002): 50-58.
[16] Binkert, Nathan, et al., "The gem5 simulator," ACM SIGARCH Computer Architecture News 39.2 (2011): 1-7.
[17] S. C. Woo, M. Ohara, E. Torrie, J. P. Singh, and A. Gupta, "The SPLASH-2 programs: Characterization and methodological considerations," Proceedings of the 22nd International Symposium on Computer Architecture, pages 24-36, June 1995.
