IPSJ Transactions on System and LSI Design Methodology
Online ISSN : 1882-6687
ISSN-L : 1882-6687
Parallelism-flexible Convolution Core for Sparse Convolutional Neural Networks on FPGA
Salita Sombatsiri, Seiya Shibata, Yuki Kobayashi, Hiroaki Inoue, Takashi Takenaka, Takeo Hosomi, Jaehoon Yu, Yoshinori Takeuchi

2019 Volume 12 Pages 22-37

Abstract

Recent CNN accelerators fall short of their peak performance because they cannot maximize parallel computation in every convolutional layer: the available parallelism varies from layer to layer throughout the CNN. Furthermore, exploiting multiple types of parallelism can reduce the ability to skip calculations. This paper proposes a convolution core for sparse CNNs that efficiently leverages multiple types of parallelism together with weight sparsity to achieve high performance. It adapts the dataflow and the scheduling of parallel computation to the parallelism available in each convolutional layer, exploiting both intra- and inter-output parallelism to maximize multiplier utilization. In addition, it eliminates multiply-accumulate (MACC) operations made redundant by weight sparsity. The proposed convolution core realizes both capabilities with simple dataflow control by using a parallelism controller, which schedules parallel MACCs on the processing elements (PEs), and a weight broadcaster, which broadcasts non-zero weights to the PEs according to that schedule. The proposed convolution core was evaluated on the 13 convolutional layers of a sparse VGG-16 benchmark. It achieves a 4x speedup over a baseline dense-CNN architecture that exploits only intra-output parallelism, and 3x the effective GMACS of prior CNN accelerators in total performance.
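
The calculation-skip idea described in the abstract can be summarized in software terms. The following Python/NumPy sketch is an illustration only, not the authors' hardware design: the function name sparse_conv2d, the dense tensor layout, and the loop structure are assumptions. It shows how iterating over only the non-zero weights removes redundant MACCs, while each non-zero weight updates a whole output plane at once, loosely analogous to the weight broadcaster feeding one non-zero weight to many PEs.

import numpy as np

def sparse_conv2d(ifmap, weights):
    # Software sketch of zero-weight skipping (calculation skip) for one
    # convolutional layer; an illustrative analogue, not the proposed RTL.
    # ifmap:   input feature maps, shape (C, H, W)
    # weights: kernels, shape (K, C, R, S); zero entries are never multiplied
    K, C, R, S = weights.shape
    _, H, W = ifmap.shape
    out_h, out_w = H - R + 1, W - S + 1
    ofmap = np.zeros((K, out_h, out_w))

    # Weight-broadcaster analogue: visit only non-zero weights, so every
    # MACC that is issued contributes to the result.
    for k, c, r, s in np.argwhere(weights != 0):
        w = weights[k, c, r, s]
        # Intra-output-parallelism analogue: all output pixels of plane k
        # that use this weight are updated in one vectorized step, like PEs
        # receiving the same broadcast weight in parallel.
        ofmap[k] += w * ifmap[c, r:r + out_h, s:s + out_w]
    return ofmap

Pruning more weights to zero shrinks the index list returned by np.argwhere and hence the number of MACCs issued; this is the effect the proposed core exploits in hardware.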

© 2019 by the Information Processing Society of Japan