Course4 Efficiency
The cost of pre-training LMs
Compute (C)
For a given compute level C, there exists an optimal model size N and an optimal amount of training data D
Training a bigger model on less data => worse (higher loss at the same compute)
Training a smaller model on more data => worse (higher loss at the same compute)
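The trade-off above can be sketched numerically. The slides do not give a formula, so this sketch assumes two widely used rules of thumb that are not from the source: training compute C ≈ 6 * N * D (N = parameters, D = training tokens) and a Chinchilla-style ratio of roughly 20 tokens per parameter. The function names are hypothetical, and the constants are rough approximations, not exact values.

```python
import math

# Assumption (not from the slides): training FLOPs C ≈ 6 * N * D.
FLOPS_PER_PARAM_TOKEN = 6.0
# Assumption (not from the slides): rule of thumb D ≈ 20 * N at the optimum.
TOKENS_PER_PARAM = 20.0


def compute_optimal_split(compute_flops: float) -> tuple[float, float]:
    """Return (n_params, n_tokens) with C = 6*N*D and D = 20*N."""
    if compute_flops <= 0:
        raise ValueError("compute_flops must be positive")
    # Substituting D = 20*N into C = 6*N*D gives C = 120*N^2.
    n_params = math.sqrt(
        compute_flops / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM)
    )
    n_tokens = TOKENS_PER_PARAM * n_params
    return n_params, n_tokens


def tokens_for_model(compute_flops: float, n_params: float) -> float:
    """Tokens affordable for a fixed model size under C = 6*N*D."""
    if compute_flops <= 0 or n_params <= 0:
        raise ValueError("inputs must be positive")
    return compute_flops / (FLOPS_PER_PARAM_TOKEN * n_params)


if __name__ == "__main__":
    budget = 1.2e22  # hypothetical compute budget in FLOPs
    n_opt, d_opt = compute_optimal_split(budget)
    print(f"balanced split: N={n_opt:.3g} params, D={d_opt:.3g} tokens")
    # A model 4x larger than the balanced size gets 4x fewer tokens.
    print(f"4x bigger model: D={tokens_for_model(budget, 4 * n_opt):.3g} tokens")
```

Under these assumptions, moving away from the balanced split in either direction keeps C fixed but shifts the budget toward parameters or toward tokens, which is the situation the two bullets above describe as worse.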
Memory-efficient implementation