Optimizing Multi-GPU Parallelization Strategies for Deep Learning Training

Pal, Saptadeep; Ebrahimi, Eiman; Zulfiqar, Arslan; Fu, Yaosheng; Zhang, Victor; Migacz, Szymon; Nellans, David; Gupta, Puneet

doi:10.1109/MM.2019.2935967

arXiv:1907.13257 (cs)

[Submitted on 30 Jul 2019]

Abstract: Deploying deep learning (DL) models across multiple compute devices to train large and complex models continues to grow in importance because of the demand for faster and more frequent training. Data parallelism (DP) is the most widely used parallelization strategy, but as the number of devices in data parallel training grows, so does the communication overhead between devices. Additionally, a larger aggregate batch size per step leads to statistical efficiency loss, i.e., a larger number of epochs are required to converge to a desired accuracy. These factors affect overall training time and beyond a certain number of devices, the speedup from leveraging DP begins to scale poorly. In addition to DP, each training step can be accelerated by exploiting model parallelism (MP). This work explores hybrid parallelization, where each data parallel worker is comprised of more than one device, across which the model dataflow graph (DFG) is split using MP. We show that at scale, hybrid training will be more effective at minimizing end-to-end training time than exploiting DP alone. We project that for Inception-V3, GNMT, and BigLSTM, the hybrid strategy provides an end-to-end training speedup of at least 26.5%, 8%, and 22% respectively compared to what DP alone can achieve at scale.
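
The trade-off described in the abstract can be illustrated with a toy cost model. The sketch below is not the paper's analytical model: the class `ToyModel`, its linear epoch-growth rule, its linear communication term, and every constant in it are hypothetical, chosen only to show how growing the global batch and the number of DP workers can erode DP-only scaling, while splitting each worker across several devices with MP keeps the worker count and global batch smaller.

```python
from dataclasses import dataclass


@dataclass
class ToyModel:
    # Every default below is a hypothetical placeholder, not a measurement.
    dataset_size: int = 1_000_000      # samples per epoch
    per_worker_batch: int = 64         # mini-batch size per DP worker
    step_compute_s: float = 0.2        # single-device compute time per step
    base_epochs: float = 10.0          # epochs to converge at small batch sizes
    critical_batch: int = 4096         # global batch beyond which epochs grow
    comm_per_worker_s: float = 0.0005  # added sync cost per extra DP worker

    def epochs(self, global_batch: int) -> float:
        """Assumed statistical-efficiency loss: epochs grow linearly past critical_batch."""
        excess = max(0, global_batch - self.critical_batch)
        return self.base_epochs * (1.0 + excess / self.critical_batch)

    def training_time(self, devices: int, mp_degree: int, mp_speedup: float) -> float:
        """End-to-end time when each DP worker spans mp_degree devices."""
        workers = devices // mp_degree
        global_batch = workers * self.per_worker_batch
        steps_per_epoch = self.dataset_size / global_batch
        # MP shortens the compute part of a step; sync cost grows with worker count.
        step_time = (self.step_compute_s / mp_speedup
                     + self.comm_per_worker_s * (workers - 1))
        return self.epochs(global_batch) * steps_per_epoch * step_time


model = ToyModel()
for devices in (16, 256):
    dp_only = model.training_time(devices, mp_degree=1, mp_speedup=1.0)
    # Assumed: 2-way MP gives a 1.5x per-step speedup (hypothetical value).
    hybrid = model.training_time(devices, mp_degree=2, mp_speedup=1.5)
    print(f"{devices:4d} devices: DP-only {dp_only:8.1f} s, hybrid {hybrid:8.1f} s")
```

Under these assumed constants, DP alone is faster at 16 devices and the hybrid configuration is faster at 256 devices, which mirrors the qualitative claim that hybrid training pays off at scale. The crossover point depends entirely on the placeholder values and says nothing about the speedups the paper projects.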

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (stat.ML)
Cite as:	arXiv:1907.13257 [cs.LG]
	(or arXiv:1907.13257v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1907.13257
Related DOI:	https://doi.org/10.1109/MM.2019.2935967

Submission history

From: Saptadeep Pal
[v1] Tue, 30 Jul 2019 23:20:50 UTC (1,393 KB)
