Showing 1–2 of 2 results for author: Lumetta, S

  1. arXiv:2408.10013 [pdf, other]

    cs.DC cs.NE

    TBA: Faster Large Language Model Training Using SSD-Based Activation Offloading

    Authors: Kun Wu, Jeongmin Brian Park, Xiaofan Zhang, Mert Hidayetoğlu, Vikram Sharma Mailthody, Sitao Huang, Steven Sam Lumetta, Wen-mei Hwu

    Abstract: The growth rate of the GPU memory capacity has not been able to keep up with that of the size of large language models (LLMs), hindering the model training process. In particular, activations -- the intermediate tensors produced during forward propagation and reused in backward propagation -- dominate the GPU memory use. To address this challenge, we propose TBA to efficiently offload activations…

    Submitted 19 August, 2024; originally announced August 2024.

  2. ASAP: Accelerated Short-Read Alignment on Programmable Hardware

    Authors: Subho S. Banerjee, Mohamed El-Hadedy, Jong Bin Lim, Zbigniew T. Kalbarczyk, Deming Chen, Steve Lumetta, Ravishankar K. Iyer

    Abstract: The proliferation of high-throughput sequencing machines ensures rapid generation of up to billions of short nucleotide fragments in a short period of time. This massive amount of sequence data can quickly overwhelm today's storage and compute infrastructure. This paper explores the use of hardware acceleration to significantly improve the runtime of short-read alignment, a crucial step in preproc…

    Submitted 23 May, 2018; v1 submitted 6 March, 2018; originally announced March 2018.