30th PACT 2021: Atlanta, GA, USA
- Jaejin Lee, Albert Cohen:
30th International Conference on Parallel Architectures and Compilation Techniques, PACT 2021, Atlanta, GA, USA, September 26-29, 2021. IEEE 2021, ISBN 978-1-6654-4278-7
- Minxuan Zhou, Guoyang Chen, Mohsen Imani, Saransh Gupta, Weifeng Zhang, Tajana Rosing:
PIM-DL: Boosting DNN Inference on Digital Processing In-Memory Architectures via Data Layout Optimizations. 1
- Phitchaya Mangpo Phothilimthana, Amit Sabne, Nikhil Sarda, Karthik Srinivasa Murthy, Yanqi Zhou, Christof Angermueller, Mike Burrows, Sudip Roy, Ketan Mandke, Rezsa Farahani, Yu Emma Wang, Berkin Ilbeyi, Blake A. Hechtman, Bjarke Roune, Shen Wang, Yuanzhong Xu, Samuel J. Kaufman:
A Flexible Approach to Autotuning Multi-Pass Machine Learning Compilers. 1-16
- Alexander Brauckmann, Andrés Goens, Jerónimo Castrillón:
PolyGym: Polyhedral Optimizations as an Environment for Reinforcement Learning. 17-29
- Geonhwa Jeong, Gokcen Kestor, Prasanth Chatarasi, Angshuman Parashar, Po-An Tsai, Sivasankaran Rajamanickam, Roberto Gioiosa, Tushar Krishna:
Union: A Unified HW-SW Co-Design Ecosystem in MLIR for Evaluating Tensor Operations on Spatial Accelerators. 30-44
- William S. Moses, Lorenzo Chelini, Ruizhe Zhao, Oleksandr Zinenko:
Polygeist: Raising C to Polyhedral MLIR. 45-59
- Bruce Collie, Michael F. P. O'Boyle:
Program Lifting using Gray-Box Behavior. 60-74
- Joonsung Kim, Suyeon Hur, Eunbok Lee, Seungho Lee, Jangwoo Kim:
NLP-Fast: A Fast, Scalable, and Flexible System to Accelerate Large-Scale Heterogeneous NLP Models. 75-89
- Myeonggyun Han, Woongki Baek:
HERTI: A Reinforcement Learning-Augmented System for Efficient Real-Time Inference on Heterogeneous Embedded Systems. 90-102
- Naveen Vedula, Reza Hojabr, Ahmad Khonsari, Arrvindh Shriraman:
X-Layer: Building Composable Pipelined Dataflows for Low-Rank Convolutions. 103-115
- Daehyeon Baek, Soojin Hwang, Taekyung Heo, Daehoon Kim, Jaehyuk Huh:
InnerSP: A Memory Efficient Sparse Matrix Multiplication Accelerator with Locality-Aware Inner Product Processing. 116-128
- Maximilian Lam, Zachary Yedidia, Colby R. Banbury, Vijay Janapa Reddi:
Precision Batching: Bitserial Decomposition for Efficient Neural Network Inference on GPUs. 129-141
- Wanling Gao, Fei Tang, Jianfeng Zhan, Xu Wen, Lei Wang, Zheng Cao, Chuanxin Lan, Chunjie Luo, Xiaoli Liu, Zihan Jiang:
AIBench Scenario: Scenario-Distilling AI Benchmarking. 142-158
- Amirali Boroumand, Saugata Ghose, Berkin Akin, Ravi Narayanaswami, Geraldo F. Oliveira, Xiaoyu Ma, Eric Shiu, Onur Mutlu:
Google Neural Network Models for Edge Devices: Analyzing and Mitigating Machine Learning Inference Bottlenecks. 159-172
- Guodong Liu, Sa Wang, Yungang Bao:
SEER: A Time Prediction Model for CNNs from GPU Kernel's View. 173-185
- Minxuan Zhou, Lingxi Wu, Muzhou Li, Niema Moshiri, Kevin Skadron, Tajana Rosing:
Ultra Efficient Acceleration for De Novo Genome Assembly via Near-Memory Computing. 199-212
- Nadja Ramhöj Holtryd, Madhavan Manivannan, Per Stenström, Miquel Pericàs:
CBP: Coordinated management of cache partitioning, bandwidth partitioning and prefetch throttling. 213-225
- Mingcan Zhu, Amna Shahab, Antonios Katsarakis, Boris Grot:
Invalidate or Update? Revisiting Coherence for Tomorrow's Cache Hierarchies. 226-241
- Suyash Mahar, Sihang Liu, Korakit Seemakhupt, Vinson Young, Samira Manabi Khan:
Write Prediction for Persistent Memory Systems. 242-257
- Akash Panda, Ashish Panwar, Arkaprava Basu:
nuKSM: NUMA-aware Memory De-duplication on Multi-socket Servers. 258-273
- Xiaowei Shang, Weiwei Jia, Jianchen Shan, Xiaoning Ding:
CoPlace: Effectively Mitigating Cache Conflicts in Modern Clouds. 274-288
- Daniel Mawhirter, Sam Reinehr, Wei Han, Noah Fields, Miles Claver, Connor Holmes, Jedidiah McClurg, Tongping Liu, Bo Wu:
Dryadic: Flexible and Fast Graph Pattern Matching at Scale. 289-303
- Pengyu Wang, Chao Li, Jing Wang, Taolei Wang, Lu Zhang, Jingwen Leng, Quan Chen, Minyi Guo:
Skywalker: Efficient Alias-Method-Based Graph Sampling and Random Walk on GPUs. 304-317
- Chuangyi Gui, Xiaofei Liao, Long Zheng, Pengcheng Yao, Qinggang Wang, Hai Jin:
SumPA: Efficient Pattern-Centric Graph Mining with Pattern Abstraction. 318-330
- Octavi Obiols-Sales, Abhinav Vishnu, Nicholas Malaya, Aparna Chandramowlishwaran:
SURFNet: Super-Resolution of Turbulent Flows with Transfer Learning using Small Datasets. 331-344
- Sultan Durrani, Muhammad Saad Chughtai, Mert Hidayetoglu, Rashid Tahir, Abdul Dakkak, Lawrence Rauchwerger, Fareed Zaffar, Wen-Mei W. Hwu:
Accelerating Fourier and Number Theoretic Transforms using Tensor Cores and Warp Shuffles. 345-355