Arxiv - 20210823 - Deepak Narayanan - Efficient Large-Scale Language Model Training On GPU Clusters Using Megatron-LM
Deepak Narayanan‡★, Mohammad Shoeybi†, Jared Casper†, Patrick LeGresley†,
Mostofa Patwary†, Vijay Korthikanti†, Dmitri Vainbrand†, Prethvi Kashinkunti†,
Julie Bernauer†, Bryan Catanzaro†, Amar Phanishayee∗, Matei Zaharia‡
† NVIDIA ‡ Stanford University ∗ Microsoft Research
ABSTRACT
arXiv:2104.04473v5 [cs.CL] 23 Aug 2021
Large language models have led to state-of-the-art accuracies across