DeMM: A Decoupled Matrix Multiplication Engine Supporting Relaxed Structured Sparsity

Peltekis, Christodoulos; Titopoulos, Vasileios; Nicopoulos, Chrysostomos; Dimitrakopoulos, Giorgos

doi:10.1109/LCA.2024.3355178

Computer Science > Hardware Architecture

arXiv:2401.08179 (cs)

[Submitted on 16 Jan 2024]


Authors: Christodoulos Peltekis, Vasileios Titopoulos, Chrysostomos Nicopoulos, Giorgos Dimitrakopoulos


Abstract: Deep Learning (DL) has achieved unprecedented success in various application domains. Meanwhile, model pruning has emerged as a viable way to reduce the footprint of DL models in mobile applications without compromising their accuracy. To let matrix engines built for dense DL models also handle their pruned counterparts, pruned DL models follow a fine-grained structured sparsity pattern of 1:4 or 2:4, whereby in each group of four contiguous values at most one or two values, respectively, may be non-zero. Structured sparsity has recently also moved to coarser (relaxed) patterns of N:128 or N:256, for small values of N, targeting a wider range of sparsity (10%-90%) for DL models. In this work, we design an accelerator that operates, by construction, on wide blocks with relaxed structured sparsity. In contrast to the conventional systolic-array archetype, the new engine decouples the memory part of the systolic array from the multiply-add units. The memory block comprises 1 write port and N read ports, with the number of read ports equal to the number of non-zero elements per row. The multiply-add units connect directly to each read port and complete the multiplication in a row-wise product-first order. More importantly, simple reconfiguration also supports denser patterns. The experimental evaluation demonstrates substantial latency improvements over current state-of-the-art systolic-array engines built for fine-grained and relaxed structured sparsity.
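The abstract's two key ideas, N:M structured sparsity and a memory with N read ports feeding multiply-add units in row-wise product-first order, can be illustrated with a small functional model. This is a minimal sketch, not the authors' hardware or exact dataflow: the function names (`satisfies_n_m`, `compress_row`, `decoupled_matmul`) are hypothetical, and it assumes the dense operand B is stored one row per memory address, so that each non-zero column index of the sparse operand A acts as the address presented on one read port.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def satisfies_n_m(row: List[float], n: int, m: int) -> bool:
    """True if every group of m contiguous values holds at most n non-zeros."""
    return all(
        sum(1 for v in row[g:g + m] if v != 0) <= n
        for g in range(0, len(row), m)
    )


def compress_row(row: List[float], n: int, m: int) -> List[List[Tuple[int, float]]]:
    """Split a row into blocks of m columns, keeping (column, value) per non-zero."""
    if not satisfies_n_m(row, n, m):
        raise ValueError(f"row violates {n}:{m} structured sparsity")
    return [
        [(g + i, v) for i, v in enumerate(row[g:g + m]) if v != 0]
        for g in range(0, len(row), m)
    ]


def decoupled_matmul(a: Matrix, b: Matrix, n: int, m: int) -> Matrix:
    """C = A x B for N:M-sparse A and dense B, in row-wise product-first order.

    Hypothetical model of the decoupled engine: B is held in a memory with one
    row per address; within each block of m columns of A, the (at most n)
    non-zero column indices drive n read ports, and the multiply-add unit
    attached to each port scales the B row it reads and accumulates it into
    the output row of C.
    """
    cols = len(b[0])
    c: Matrix = []
    for row in a:
        out = [0.0] * cols
        for block in compress_row(row, n, m):
            # At most n entries per block, i.e. at most n read ports busy.
            for k, v in block:
                b_row = b[k]  # read port fetches row k of B
                for j in range(cols):  # multiply-add unit for this port
                    out[j] += v * b_row[j]
        c.append(out)
    return c
```

For example, with the 1:4-sparse `A = [[0, 2, 0, 0, 0, 0, 3, 0], [1, 0, 0, 0, 0, 0, 0, 4]]` and `B = [[r, r + 1] for r in range(8)]`, `decoupled_matmul(A, B, n=1, m=4)` returns `[[20.0, 25.0], [28.0, 33.0]]`, the same result as a dense product. Passing `m=128` or `m=256` models the relaxed patterns; in this model only the non-zeros of A are ever read or multiplied, which is the source of the work reduction over a dense engine.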

Comments:	Accepted in IEEE Computer Architecture Letters
Subjects:	Hardware Architecture (cs.AR)
Cite as:	arXiv:2401.08179 [cs.AR]
	(or arXiv:2401.08179v1 [cs.AR] for this version)
	https://doi.org/10.48550/arXiv.2401.08179
Related DOI:	https://doi.org/10.1109/LCA.2024.3355178

Submission history

From: Christodoulos Peltekis
[v1] Tue, 16 Jan 2024 07:51:15 UTC (1,179 KB)
