Exploring Better Speculation and Data Locality in Sparse Matrix Vector Multiplication On Intel Xeon
ICCD’2020
Outline
SpMV on Intel XEON
Sparse Matrix-Vector Multiplication
• (Very) Sparse Matrix
• Dense Vector
• (Normally) Many Iterations with the Same Matrix and Multiple Vectors

For i in Range(Iter):
    y_i = A * x_i
End

[Figure: a sparse matrix A (entries a–l) multiplied by a dense vector x]
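The iteration above can be sketched in plain Python over CSR storage (the format the optimized matrix ultimately uses); this is an illustrative sketch, not MKL's `mkl_sparse_d_mv` API:

```python
# Minimal sketch of one SpMV iteration y = A @ x, with A stored in CSR.
# Function and variable names are illustrative, not from the paper.

def spmv_csr(indptr, indices, data, x):
    """y = A @ x for a CSR matrix A given by (indptr, indices, data)."""
    n_rows = len(indptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Inner-loop trip count = NNZ of row i; its exit branch is what
        # the CPU must predict on every iteration.
        for k in range(indptr[i], indptr[i + 1]):
            acc += data[k] * x[indices[k]]
        y[i] = acc
    return y

# A 3x3 example:  [[1, 0, 2],
#                  [0, 3, 0],
#                  [4, 0, 5]]
indptr  = [0, 2, 3, 5]
indices = [0, 2, 1, 0, 2]
data    = [1.0, 2.0, 3.0, 4.0, 5.0]
print(spmv_csr(indptr, indices, data, [1.0, 1.0, 1.0]))  # [3.0, 3.0, 9.0]
```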
SpMV on Intel XEON
CPU
• Intel Xeon Gold 6146 (Skylake)
• CPU Cores: 12
• Frequency: 3.2–4.2 GHz
• L1 Cache: 32 KB I/D
• L2 Cache: 1 MB Private
• L3 Cache: 24.75 MB Shared
Tested Function
• Intel® Math Kernel Library
• Version: l_mkl_2019.5.281
• Routine: mkl_sparse_d_mv
Profiling
• Intel® VTune Profiler 2020
Graph-based Matrices (SuiteSparse Matrix Collection)
[Figure: test matrices — web-google, dblp-2010, Linux_call_graph, web-Stanford]
SpMV on Intel XEON
Some Take-Aways:
• Poor efficiency (only 34% valid execution)
• Two major bottlenecks:
  ➢ Bad Speculation
  ➢ Memory Access
• Observations applicable to most graph-based sparse matrices

[Figure: pipeline-slot breakdown — Retiring, Front-End, Bad Speculation, Back-End.Core, Back-End.Memory]
Outline
Bottleneck Analysis
• Speculation
• A wrong prediction causes a serious penalty:
  ➢ Invalidate speculatively-executed instructions
  ➢ Recover CPU state to the branch point
• Shorter rows tend to have more frequent wrong predictions

[Figure: branch history of row R1 (T T NT) trains the predictor; on short row R2 the predicted outcome is wrong: 100% mispredict]
Bottleneck Analysis
• Speculation
• Shorter rows tend to have more frequent wrong predictions
• Longer rows tend to have less frequent wrong predictions
• The speculation penalty is influenced by the NNZ density of each row.

[Figure: with a longer row R2, the trained predictor is wrong only at the row's exit: 20% mispredict]
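One rough way to see the short-row effect is to simulate a toy 2-bit saturating-counter predictor (a deliberately simple stand-in for the real Skylake predictor) over the inner-loop branch stream, where each row of NNZ nonzeros produces NNZ−1 Taken branches and one Not-Taken exit:

```python
# Toy model of the inner-loop exit branch: the back-edge is Taken (T)
# for every nonzero except the last, then Not-Taken (NT) at row exit.
# A 2-bit saturating counter mispredicts the NT exit of every row, so
# the mispredict *rate* grows as rows get shorter.

def mispredict_rate(row_lengths):
    state = 2                  # 2-bit counter: 0,1 predict NT; 2,3 predict T
    branches = wrong = 0
    for nnz in row_lengths:
        outcomes = [True] * (nnz - 1) + [False]   # T ... T NT per row
        for taken in outcomes:
            predicted = state >= 2
            if predicted != taken:
                wrong += 1
            state = min(state + 1, 3) if taken else max(state - 1, 0)
            branches += 1
    return wrong / branches

print(mispredict_rate([2] * 1000))   # short rows:  0.5
print(mispredict_rate([20] * 1000))  # long rows:   0.05
```

Under this toy model a matrix of 2-NNZ rows mispredicts half of all branches, while 20-NNZ rows mispredict only one branch in twenty, matching the slide's 100% vs. 20% per-row-exit contrast in spirit.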
Bottleneck Analysis
• Memory Access
• Split the original matrix at a row-density threshold (NNZ = 20) into a Dense Sub-Matrix and a Sparse Sub-Matrix
• Run each sub-matrix through the MKL SpMV routine and perform the bottleneck analysis on each separately

[Figure: flow — Dense/Sparse Sub-Matrix → MKL SpMV Routine → Bottleneck Analysis]
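The split step can be sketched as follows over CSR storage; the function name and the row-id bookkeeping are illustrative, not from the paper:

```python
# Sketch of the dense/sparse split at threshold NNZ = 20: rows with at
# least `threshold` nonzeros go to the dense sub-matrix, the rest to the
# sparse one; each sub-matrix can then be profiled separately.

def split_by_row_density(indptr, indices, data, threshold=20):
    dense, sparse = ([0], [], []), ([0], [], [])   # (indptr, indices, data)
    dense_rows, sparse_rows = [], []
    for i in range(len(indptr) - 1):
        lo, hi = indptr[i], indptr[i + 1]
        part, rows = (dense, dense_rows) if hi - lo >= threshold else (sparse, sparse_rows)
        part[1].extend(indices[lo:hi])
        part[2].extend(data[lo:hi])
        part[0].append(len(part[1]))
        rows.append(i)        # remember original row ids to merge y back later
    return dense, dense_rows, sparse, sparse_rows

# Small demo with threshold=3: row 0 has 3 NNZ (dense), rows 1-2 fewer (sparse).
dense, d_rows, sparse, s_rows = split_by_row_density(
    [0, 3, 4, 6], [0, 1, 2, 1, 0, 2], [1., 1., 1., 2., 3., 3.], threshold=3)
print(d_rows, s_rows)  # [0] [1, 2]
```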
Bottleneck Analysis
• Sparse vs. Dense

[Figure: comparison of the sparse and dense sub-matrices by runtime, row count, and NNZ, with pipeline-slot breakdowns (Retiring, Front-End, Bad Speculation, Back-End.Memory, Back-End.Core)]
Optimize Memory Access

[Figure: rows (e.g. R1, R6–R9) grouped by memory penalty, from high to low]
2. Buckets Generation: each bucket receives a binary tag (Tag: 0000, 0001, 0011, 1101, …)
3. Buckets Ordering: buckets are ordered along the 4-bit reflected Gray-code sequence of their tags (0000, 0001, 0011, 0010, 0110, 0111, 0101, 0100, 1100, 1101, 1111, 1110, 1010, 1011, 1001, 1000), so adjacent buckets differ in a single tag bit
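The ordering step can be sketched as below; the tag sequence on the slide is the 4-bit reflected Gray code, and the `order_buckets` helper and its tag-to-rows mapping are illustrative assumptions:

```python
# Sketch of the bucket-ordering step: buckets carry 4-bit tags, and
# ordering them along the reflected Gray-code sequence makes consecutive
# buckets differ in exactly one tag bit.

def gray_sequence(bits):
    """Reflected Gray code: i ^ (i >> 1) for i = 0 .. 2**bits - 1."""
    return [i ^ (i >> 1) for i in range(2 ** bits)]

def order_buckets(buckets):
    """buckets: dict mapping a 4-bit tag -> list of row ids (illustrative)."""
    rank = {tag: pos for pos, tag in enumerate(gray_sequence(4))}
    return sorted(buckets, key=rank.__getitem__)

seq = gray_sequence(4)
print([format(t, "04b") for t in seq[:4]])  # ['0000', '0001', '0011', '0010']
```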
Optimize Memory Access

[Figure: quadrant chart — Memory Penalty (y, low→high) vs. Speculation Penalty (x, low→high). Poorly banded sub-matrices sit at high memory penalty; sub-matrices with less NNZ per row sit at high speculation penalty. More NNZ + poorly banded → Optimize Locality (BBO); less NNZ + poorly banded → Optimize Speculation & Locality (BBO + DBO); highly banded sub-matrices sit at low memory penalty]
Outline
Evaluations
8 Threads:
• Ours: 1.8x
• MKL Opt: 1.2x
• Ours + MKL Opt: 3.6x
Conclusion:
• Ours outperforms MKL Opt by a large margin.
• Our method helps MKL Opt achieve a higher vectorization rate and much higher efficiency.
Evaluations
Pre-processing Cost
• Ours: 4.2x
• MKL Opt: 26.5x
• Ours + MKL Opt: 31.9x
Conclusion:
• Extremely low cost compared with MKL Opt or other approaches
• Negligible considering the numerous iterations over the same matrix.
Q&A
Optimization Scheme

[Figure: flowchart — the Original Matrix is split into a Sparse Sub-Matrix and a Dense Sub-Matrix; each is tested with "Highly Banded?"; depending on the Yes/No answers, DBO, DBO + BBO, or BBO is applied; the resulting Optimized Sparse Sub-matrix and Optimized Dense Sub-matrix are merged into the Optimized Matrix (CSR)]
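The routing in the flowchart can be sketched as below. `is_highly_banded`, `dbo`, and `bbo` are hypothetical stand-ins for the bandedness test, the DBO reordering, and the BBO reordering; the pass-through on the dense, highly banded branch is an assumption, since the flowchart shows no reordering box for it:

```python
# Sketch of the per-sub-matrix dispatch in the optimization scheme.
# `is_highly_banded`, `dbo`, `bbo` are placeholder callables (assumed
# names, not from the paper).

def optimize(sub, kind, is_highly_banded, dbo, bbo):
    """Route one sub-matrix ('sparse' or 'dense') through DBO/BBO."""
    if kind == "sparse":
        # Sparse (short) rows suffer branch mispredictions, so DBO runs;
        # BBO is added when the layout is not already highly banded.
        return dbo(sub) if is_highly_banded(sub) else bbo(dbo(sub))
    # Dense rows predict well; only poorly banded layouts need BBO
    # (assumption: the highly banded dense branch is left unchanged).
    return sub if is_highly_banded(sub) else bbo(sub)
```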