
2024 4th International Conference on Advance Computing and Innovative Technologies in Engineering (ICACITE)




Performance Analysis of 3D Stacked Memory Architectures in High Performance Computing

979-8-3503-6016-5/24/$31.00 ©2024 IEEE | DOI: 10.1109/ICACITE60783.2024.10616405

Venkat Tulasi Krishna Gannavaram
School of Electrical, Computer and Energy Engineering
Arizona State University
Tempe, USA
[email protected]

Arun Kumar Gajula
Department of Electronics and Communication Engineering
Kakatiya Institute of Technology and Science
Warangal, India
[email protected]

Abstract: The speed of CPUs is increasing faster than that of RAM. This gives rise to the 'Memory Wall' problem, in which the processor stalls waiting for data. The I/O infrastructure is strained as the number of cores in CMPs rises, because more data is needed from the memory subsystem, and memory bandwidth becomes a performance bottleneck. By layering memories on top of logic, three-dimensional integrated circuits are proposed as a solution to this problem. We use the widely-used compute-in-memory benchmark framework 'NeuroSim' to investigate the benefits of a three-dimensional design for a Neural Network application.

Keywords: 3D Stacked Memory, Compute in Memory, Monolithic 3D Integration, Neural Networks

I. INTRODUCTION

Most computing tasks involve the processor (CPU) fetching data and instructions from main memory, usually external DRAM, and executing them in order. Processors have become more efficient at a rate of about 60% per year thanks to new methods and technologies. DRAM technology and access times, on the other hand, have improved at a rate of less than 10% per year. When more data needs to be retrieved from memory, the I/O system has to work harder to sustain sufficient memory bandwidth. In a conventional 2D integrated circuit, the main memory and processor sit on two different chips. Because the bus link between chips is long, it carries large capacitive loads, making the movement of data from main memory to the CPU slow and power-hungry. Because logic and memory do not scale together as well as they should, microprocessor makers have had to adopt complicated, energy-hungry architectures that allow out-of-order and speculative execution. To hide main-memory latency, computers have also been built with ever larger cache hierarchies. These problems are collectively called "Memory Wall problems." Applications that use a lot of data, such as machine learning, neural network computation, and real-time data analytics, are directly affected by the memory wall limit.

One way to deal with this problem is to bring the main memory closer to the processor die, thus reducing access latencies, by using 2.5D or 3D packaging. In a 2.5D structure, the processor and memory are placed side-by-side on a silicon interposer to achieve extremely high die-to-die interconnect density. In a 3D stacked structure, DRAM memory is stacked on top of the logic layer, bonded with vertical interconnects.

3D stacking increases bandwidth by using these dense interconnects instead of traditional I/O pins; allows dissimilar process technologies, such as high-speed CMOS and high-density DRAM, to be mixed, increasing on-chip memory capacity; and reduces power requirements by cutting the number of external I/O drivers and interconnects [1].

Building in the third dimension also raises issues: integrating different process technologies (and materials) can be challenging. Further, reliability problems can be induced by stress from higher tiers in the stack. Additionally, power dissipation has a major influence, as the high-power compute layers can create thermal hotspots in the stacked memory modules, especially those near the centre, which are far away from the heat sink.

II. BACKGROUND STUDY

G. H. Loh [2] examines the integration of 3D DRAM in multicore computers, with the control and peripheral access circuitry located on a distinct CMOS technology layer designated for this purpose. The bit cells are fabricated using vertically stacked NMOS technology, with Through-Silicon Vias (TSVs) employed for interconnecting the layers. This design implements a distributed architecture for DRAM ranks, utilising multiple layers instead of a single layer, and results in a 32% reduction in memory access time for a DRAM with five layers. The authors propose a vector bloom filter to enhance the L2 miss handling architecture (MHA) in order to exploit the additional capacity provided by the 3D-stacked memory system. The test findings indicate that the proposed memory organisation is 1.75 times more efficient than alternative 3D DRAM concepts when performing memory-intensive jobs on a quad-core CPU.

D. H. Woo et al. [3] suggest a 3D-stacked memory architecture with a vertical L2 network using a large array of high-density TSVs to improve memory bandwidth even further. Their results show at least a 1.27x speedup over traditional 3D stacked DRAM architectures.

D. Lee et al. [4] explore another approach to better utilize the total potential bandwidth increase offered by TSVs for 3D stacked DRAMs.



Their proposed architecture delivers an increase in internal DRAM bandwidth by accessing multiple DRAM layers simultaneously, thus making much greater use of the bandwidth that the TSVs offer.

In [5], the authors abandon die-to-die stacking, which uses TSVs as vertical interconnects, in favour of a monolithic 3D integration approach, where multiple tiers of devices are fabricated sequentially over one another. This technology uses monolithic inter-tier vias (MIVs) for vertical connections, which are over three orders of magnitude smaller than TSVs, allowing fine-grained vertical integration, and reports both performance and thermal advantages over TSV-based counterparts.

M. M. Sabry Aly et al. [6] introduce a new architecture incorporating monolithic 3D integration with new logic devices (such as carbon nanotube field-effect transistors) as well as high-density non-volatile memory (ReRAM and STT-RAM), improving the energy-delay product for common workloads by almost three orders of magnitude over conventional systems. Despite the potential benefits offered by 3D stacking, there are substantial concerns regarding its thermal effects. High power dissipation from the compute layer can create thermal hot spots in the stacked DRAM modules, especially those near the centre that are far away from the heat sink. This can lead to a higher peak temperature than in 2D chips, and higher temperatures hurt the performance, leakage power, and reliability of the circuit. The research presented in [7] reports that thermal constraints do place a limit on the operating frequency of 3D stacked memory, while still giving large performance benefits over traditional 2D designs.

III. PROPOSED WORK

We look at Compute-in-Memory (CIM) architectures, a popular use case of 3D integrated chips. CIM attempts to overcome the memory wall bottleneck by performing operations on data within the memory, where possible. The application we have selected is Deep Neural Networks (DNNs), a class of machine learning algorithms that employs multiple convolutional layers to perform inference. This structure leads to large memory bandwidth requirements and allows us to get a good comparison of the benefits of 3D integration. For the scope of this project, we will not delve deep into how CIM or DNNs work.

We employ NeuroSim [8], a commonly utilised benchmark platform for 3D integrated CIM accelerators specifically developed for DNN inference. NeuroSim is capable of facilitating both monolithic and heterogeneous 3D integration.

3.1 NeuroSim Framework

The desired DNN model is trained externally through a PyTorch wrapper, and the final trained weights are mapped to hardware synapses within NeuroSim. The tool takes a DNN network topology defined by the user and then, using hardware inferences, decides the floorplan of the CIM accelerator. This enables instruction-accurate evaluation of both the accuracy and the hardware performance of inference. The high-level chip floorplan is shown in figure 1.

Fig. 1. NeuroSim Chip Architecture (Peng et al. IEDM 2019)

The synaptic arrays contain one-transistor-one-resistor (1T1R) based resistive random-access-memory (RRAM) cells. Multiple such cells act together to mimic synapses in hardware. Additional peripheral circuits like multiplexers, analog-to-digital converters, shift-adders, etc. are present within the array block. The framework assumes that the on-chip memory is sufficient to store all weight data, while input data lies off-chip.
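As a concrete illustration of several cells acting together as one synapse, the following minimal Python sketch bit-slices a signed weight across single-bit 1T1R cells. The 4-bit width, the unsigned offset encoding, and the function names are our assumptions for illustration, not NeuroSim's internal representation.

```python
# Illustrative sketch only: one signed synaptic weight represented by
# several 1-bit 1T1R RRAM cells (bit slicing). Bit width and encoding
# are assumptions, not NeuroSim's internal scheme.

def weight_to_cells(w: float, n_bits: int = 4, w_max: float = 1.0) -> list[int]:
    """Quantize a weight in [-w_max, w_max] to n_bits and return one
    binary conductance state per 1T1R cell (MSB first)."""
    levels = 2 ** n_bits - 1
    # Shift to an unsigned range, since negative conductances do not
    # exist; a reference column or dual-array scheme restores the sign.
    q = round((w + w_max) / (2 * w_max) * levels)
    q = max(0, min(levels, q))
    return [(q >> b) & 1 for b in reversed(range(n_bits))]

def cells_to_weight(cells: list[int], w_max: float = 1.0) -> float:
    """Invert the mapping: binary cell states -> quantized weight."""
    q = 0
    for bit in cells:
        q = (q << 1) | bit
    return q / (2 ** len(cells) - 1) * 2 * w_max - w_max

cells = weight_to_cells(0.37)         # -> [1, 0, 1, 0]
print(cells, cells_to_weight(cells))  # round-trips to ~0.33 (4-bit step)
```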
The synaptic array size is decided by user input. Tile and PE sizes are iteratively optimized by NeuroSim to get the highest possible memory utilization. Multiple tiles can map to one layer, but multiple layers do not map to a single tile. Interconnects between modules use an H-tree based wiring structure, also visible in the figure.

Monolithic integration was chosen as the 3D integration method for this study. In monolithic 3D, multiple tiers are fabricated over each other sequentially, instead of die-stacking as in heterogeneous 3D. It utilizes finer-grained back-end-of-line (BEOL) monolithic inter-tier vias (MIVs) for inter-tier communication, resulting in higher memory bandwidth and lower access times compared to the through-silicon vias (TSVs) used in heterogeneous 3D.

NeuroSim implements a two-tiered chip structure for monolithic 3D, as shown in figure 2, where the bottom tier consists of logic elements (ADC and accumulation circuits) and the top tier is strictly dedicated to RRAM memory and its peripheries. This design keeps the area-consuming logic on a separate tier, allowing the use of advanced tech nodes (7 nm) for logic and older tech nodes (22 nm) for the memory tier.
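The array-count and utilization bookkeeping behind this mapping can be sketched in a few lines of Python. This mirrors the idea of tiling a layer's weight matrix onto fixed-size subarrays; it is not NeuroSim's actual floorplanning algorithm, and the layer dimensions below are an illustrative example.

```python
import math

# Sketch: tile a layer's (rows x cols) weight matrix onto fixed-size
# RRAM subarrays; utilization is the fraction of allocated cells that
# actually hold weights. Illustrative only, not NeuroSim's optimizer.

def subarrays_needed(rows: int, cols: int, size: int) -> int:
    """Number of size x size subarrays covering a rows x cols matrix."""
    return math.ceil(rows / size) * math.ceil(cols / size)

def utilization(rows: int, cols: int, size: int) -> float:
    """Fraction of allocated RRAM cells storing actual weights."""
    n = subarrays_needed(rows, cols, size)
    return (rows * cols) / (n * size * size)

# Example: a conv layer with 3x3 kernels, 64 input and 128 output
# channels unrolls to a (3*3*64) x 128 = 576 x 128 weight matrix.
rows, cols = 3 * 3 * 64, 128
for size in (64, 128, 256):
    print(size, subarrays_needed(rows, cols, size),
          f"{utilization(rows, cols, size):.1%}")
# 64 -> 18 arrays, 100.0%; 128 -> 5 arrays, 90.0%; 256 -> 3 arrays, 37.5%
```

Smaller arrays pack the matrix more tightly but multiply the number of array blocks (and hence peripherals), which foreshadows the trade-off measured in section IV.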



Fig. 2. Monolithic 3D Floorplan (Peng et al. IEDM 2020)

3.2 Optimizations by NeuroSim

To further improve memory utilization and the processing speed of the whole network as much as possible, weight duplication is introduced for each layer. Layer structures (such as input feature size, channel depth and kernel size) vary significantly within DNNs. Hence, NeuroSim iteratively decides the PE and tile sizes, and the possibilities of weight duplication among PEs. For example, if the weight matrix of a layer is smaller than the tile size, it is possible to duplicate the weight matrix, fetching in multiple neural activation vectors in parallel, thus speeding up the computation of this layer. Further, the slower shallow layers of the DNN can be sped up by using this parallelism of weight duplication, so that the deeper layers do not have to wait idly as long for the input feature maps (IFMs) to arrive [9].
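The duplication decision can be sketched under simplified assumptions of our own (copies are capped by a user-chosen speedup degree, and duplicated copies consume input vectors in parallel); the function names are ours, not NeuroSim's API.

```python
import math

# Sketch of weight duplication: if a layer's weight matrix is smaller
# than a tile, replicate it so several activation vectors are processed
# in parallel. Simplified illustration, not NeuroSim's exact optimizer.

def duplication_factor(weight_rows: int, weight_cols: int,
                       tile_rows: int, tile_cols: int,
                       speedup_degree: int = 8) -> int:
    """How many copies of the weight matrix fit in one tile,
    capped by the user-chosen speedup degree."""
    fit = (tile_rows // weight_rows) * (tile_cols // weight_cols)
    return max(1, min(fit, speedup_degree))

def layer_latency(n_input_vectors: int, dup: int,
                  latency_per_vector: float = 1.0) -> float:
    """Duplicated copies consume input vectors in parallel."""
    return math.ceil(n_input_vectors / dup) * latency_per_vector

# A shallow layer with a small 64x64 weight matrix on a 256x256 tile:
dup = duplication_factor(64, 64, 256, 256)  # 16 copies fit, capped at 8
print(dup, layer_latency(1024, dup))        # 8 copies: 128 vs. 1024 steps
```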
The novel weight mapping mechanism utilised by NeuroSim is detailed in [10]. The weights allocated to different sub-matrices at different spatial locations of each kernel are determined by the processing element (PE) size that is selected. The input data assigned to distinct spatial positions within each kernel is likewise conveyed to the corresponding sub-matrix.

One may designate a collection of subarrays that includes accumulation modules and input and output containers as a single processing element (PE). By enabling the recycling of input data across these PEs, this mapping reduces the need for inter-PE communication and, consequently, computational latency when compared to the conventional weight mapping method.

3.3 NeuroSim Parameters

The following parameters can be changed at run-time to tune the hardware implementation for an RRAM based monolithic 3D CIM accelerator.

● Network: A layer-by-layer network structure of the desired DNN, including the sizes of the IFM and weight matrices. This directly influences the floorplan in NeuroSim [10].

● Sub Array Size: Decides the size of each RRAM sub-array module in the chip architecture. Influences the PE and tile sizes [11].

● Speedup Degree: Allows NeuroSim to more aggressively perform weight duplication on the given network structure [12].

● Mapping Style: Choice between conventional and novel weight mapping.

Some additional hardware parameters include the selection of tech nodes for the logic and memory blocks, RRAM crossbar sizing, buffer and ADC types, etc. A hypothetical configuration capturing these knobs is sketched below.
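One way to picture these knobs together is a single configuration dictionary. All key names and the layer tuples here are hypothetical illustrations, not NeuroSim's actual flags or file format.

```python
# Hypothetical configuration mirroring the run-time parameters listed
# above. Key names and values are illustrative, not NeuroSim's flags.

config = {
    # Layer-by-layer network structure: (IFM_H, IFM_W, IFM_CH,
    # kernel_H, kernel_W, out_CH) per layer; two early VGG-8-style
    # conv layers are shown as an assumed example.
    "network": [
        (32, 32, 3,   3, 3, 128),
        (32, 32, 128, 3, 3, 128),
    ],
    "sub_array_size": (128, 128),  # RRAM sub-array rows x columns
    "speedup_degree": 8,           # cap on per-layer weight duplication
    "mapping_style": "novel",      # or "conventional"
    # Additional hardware parameters:
    "logic_tech_node_nm": 7,       # advanced node for the logic tier
    "memory_tech_node_nm": 22,     # older node for the RRAM tier
}
```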
IV. RESULTS

The NeuroSim inference implementation results for a VGG-8 model trained on the CIFAR-10 dataset are given in table I. It is evident from the results that 3D integration improves throughput and access energies for CIM systems, given the same sub-array architecture. Smaller synaptic array sizes improve memory utilization, as they lead to fewer arrays with empty memory cells, but the added need to communicate between blocks, as well as the increase in peripheral circuitry, reduces performance and increases power [13].

Table 1. Inference implementation results in NeuroSim for VGG-8 model trained on CIFAR-10 dataset.

Similarly, larger array sizes lead to an increase in performance, as more computation can take place within the array, reducing the amount of inter-module communication [14]. However, more arrays would now have empty cells due to the larger size, leading to low memory utilization. For this DNN structure, a sub-array size of 128x128 proves to be a balanced design option.

Further, we also compare different sub-array sizes layer by layer for the DNN in NeuroSim, as shown in table II. The layer in question is layer 7 of the VGG-8 network, a fully connected layer of size 1024. All three implementations have 100% memory utilization.

Table 2. NeuroSim latency and energy parameters for layer 7 of VGG-8 network.

The 64x64 implementation is the worst performing for this layer. This is possibly due to the energy leakage and access time latencies caused by the additional peripherals.



The 256x256 floorplan has the lowest read energy. This could be due to the reduction in the total number of tiles, which reduces the number of buffers required per module, so data needs to be read through fewer buffers in total. The leakage power also reduces, as there are fewer peripheral circuits for the entire layer. However, this design has the largest read latency. Finally, the 128x128 design has the lowest read latency, proving to be the perfect size for this layer [15]. The remaining parameters are also balanced in comparison to the other implementations. Other layers also seem to follow this trend, if we do not include weight duplication. Enabling speed-up for layers results in different amounts of weight duplication per layer depending on the sub-array sizes, and not just the speed-up degree. This can lead to varying trends across sub-array sizes, due to different degrees of latency and utilization benefits from parallelism.
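A quick back-of-the-envelope check shows why all three sub-array sizes reach 100% utilization on layer 7, and how strongly the peripheral count varies between them. We assume the flattened input of this fully connected layer is 4x4x512 = 8192, the usual VGG-8 topology; the latency and energy trends in table II then track the array counts below.

```python
# Assumed layer-7 weight matrix for VGG-8: 8192 inputs x 1024 outputs.
# Both dimensions divide evenly by 64, 128 and 256, so every sub-array
# is completely filled (100% utilization in all three designs) and only
# the array count, and with it the number of buffers and ADCs, changes.

rows, cols = 8192, 1024
for size in (64, 128, 256):
    n_arrays = (rows // size) * (cols // size)  # exact division here
    print(f"{size}x{size}: {n_arrays:4d} arrays")
# 64x64: 2048 arrays; 128x128: 512 arrays; 256x256: 128 arrays
```

Sixteen times fewer arrays at 256x256 means far fewer peripheral circuits to leak and fewer buffers to traverse per read, consistent with its lowest read energy, while the longer rows per array are consistent with its largest read latency.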
V. CONCLUSION

A monolithic 3D based CIM accelerator for DNNs was successfully simulated for various hardware parameters in NeuroSim. The improvements of a 3D based chip over traditional 2D architectures for data-hungry applications were tested [16]. The importance of proper hardware planning is also apparent from the results.

REFERENCES

[1] G. H. Loh and Y. Xie, "3D Stacked Microprocessor: Are We There Yet?" IEEE Micro, vol. 30, no. 3, pp. 60-64, May-June 2010, doi: 10.1109/MM.2010.45.
[2] G. H. Loh, "3D-Stacked Memory Architectures for Multi-core Processors," 2008 International Symposium on Computer Architecture (ISCA), Beijing, China, 2008, pp. 453-464, doi: 10.1109/ISCA.2008.15.
[3] D. H. Woo, N. H. Seong, D. L. Lewis and H.-H. S. Lee, "An optimized 3D-stacked memory architecture by exploiting excessive, high-density TSV bandwidth," HPCA-16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture, Bangalore, India, 2010, pp. 1-12, doi: 10.1109/HPCA.2010.5416628.
[4] D. Lee, S. Ghose, G. Pekhimenko, S. Khan and O. Mutlu, "Simultaneous Multi-Layer Access: Improving 3D-Stacked Memory Bandwidth at Low Cost," ACM Trans. Archit. Code Optim., vol. 12, no. 4, Article 63, January 2016, doi: 10.1145/2832911.
[5] A. I. Arka, B. K. Joardar, R. G. Kim, D. H. Kim, J. R. Doppa and P. P. Pande, "HeM3D: Heterogeneous Manycore Architecture Based on Monolithic 3D Vertical Integration," ACM Trans. Des. Autom. Electron. Syst., vol. 26, no. 2, Article 16, March 2021, doi: 10.1145/3424239.
[6] M. M. Sabry Aly et al., "The N3XT Approach to Energy-Efficient Abundant-Data Computing," Proceedings of the IEEE, vol. 107, no. 1, pp. 19-48, Jan. 2019, doi: 10.1109/JPROC.2018.2882603.
[7] G. Loi, B. Agrawal, N. Srivastava, S. Lin, T. Sherwood and K. Banerjee, "A thermally aware performance analysis of vertically integrated (3-D) processor-memory hierarchy," Proceedings of the 43rd Annual Design Automation Conference (DAC), 2006, pp. 991-996.
[8] X. Peng, S. Huang, Y. Luo, X. Sun and S. Yu, "DNN+NeuroSim: An End-to-End Benchmarking Framework for Compute-in-Memory Accelerators with Versatile Device Technologies," 2019 IEEE International Electron Devices Meeting (IEDM), San Francisco, CA, USA, 2019, pp. 32.5.1-32.5.4, doi: 10.1109/IEDM19573.2019.8993491.
[9] X. Peng, W. Chakraborty, A. Kaul, W. Shim, M. S. Bakir, S. Datta and S. Yu, "Benchmarking Monolithic 3D Integration for Compute-in-Memory Accelerators: Overcoming ADC Bottlenecks and Maintaining Scalability to 7nm or Beyond," 2020 IEEE International Electron Devices Meeting (IEDM), 2020.
[10] X. Peng, R. Liu and S. Yu, "Optimizing weight mapping and data flow for convolutional neural networks on RRAM based processing-in-memory architecture," 2019 IEEE International Symposium on Circuits and Systems (ISCAS), 2019.
[11] J. Sun, P. Houshmand and M. Verhelst, "Analog or Digital In-Memory Computing? Benchmarking Through Quantitative Modeling," 2023 IEEE/ACM International Conference on Computer Aided Design (ICCAD), San Francisco, CA, USA, 2023, pp. 1-9, doi: 10.1109/ICCAD57390.2023.10323763.
[12] J. Rhe, K. E. Jeon, J. C. Lee, S. Jeong and J. H. Ko, "Kernel Shape Control for Row-Efficient Convolution on Processing-In-Memory Arrays," 2023 IEEE/ACM International Conference on Computer Aided Design (ICCAD), San Francisco, CA, USA, 2023, pp. 1-9, doi: 10.1109/ICCAD57390.2023.10323749.
[13] Y. Halawani, H. Tesfai, B. Mohammad and H. Saleh, "FORSA: Exploiting Filter Ordering to Reduce Switching Activity for Low Power CNNs," 2023 IEEE 66th International Midwest Symposium on Circuits and Systems (MWSCAS), Tempe, AZ, USA, 2023, pp. 561-565, doi: 10.1109/MWSCAS57524.2023.10406115.
[14] L. Han, P. Huang, Z. Zhou, Y. Chen, X. Liu and J. Kang, "A Convolution Neural Network Accelerator Design with Weight Mapping and Pipeline Optimization," 2023 60th ACM/IEEE Design Automation Conference (DAC), San Francisco, CA, USA, 2023, pp. 1-6, doi: 10.1109/DAC56929.2023.10247977.
[15] O. Krestinskaya, L. Zhang and K. N. Salama, "Towards Efficient In-Memory Computing Hardware for Quantized Neural Networks: State-of-the-Art, Open Challenges and Perspectives," IEEE Transactions on Nanotechnology, vol. 22, pp. 377-386, 2023, doi: 10.1109/TNANO.2023.3293026.
[16] J. Song, X. Tang, X. Qiao, Y. Wang, R. Wang and R. Huang, "A 28 nm 16 Kb Bit-Scalable Charge-Domain Transpose 6T SRAM In-Memory Computing Macro," IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 70, no. 5, pp. 1835-1845, May 2023, doi: 10.1109/TCSI.2023.3244338.

