Is Local SGD Better than Minibatch SGD?

Blake Woodworth, Kumar Kshitij Patel, Sebastian Stich, Zhen Dai, Brian Bullins, Brendan Mcmahan, Ohad Shamir, Nathan Srebro
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:10334-10343, 2020.

Abstract

We study local SGD (also known as parallel SGD and federated SGD), a natural and frequently used distributed optimization method. Its theoretical foundations are currently lacking, and we highlight how all existing error guarantees in the convex setting are dominated by a simple baseline, minibatch SGD. (1) For quadratic objectives we prove that local SGD strictly dominates minibatch SGD and that accelerated local SGD is minimax optimal for quadratics; (2) For general convex objectives we provide the first guarantee that at least sometimes improves over minibatch SGD, but our guarantee does not always improve over, nor even match, minibatch SGD; (3) We show that indeed local SGD does not dominate minibatch SGD by presenting a lower bound on the performance of local SGD that is worse than the minibatch SGD guarantee.
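To make the comparison concrete, here is a minimal sketch of the two algorithms being compared. In both, M machines spend K stochastic gradients per communication round over R rounds; minibatch SGD evaluates every gradient at the shared iterate and takes one averaged step per round, while local SGD lets each machine take K local steps before the iterates are averaged. The least-squares objective, noise scale, step size, and helper names below are illustrative assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stochastic least-squares objective F(x) = E[0.5/n * ||A x - b + noise||^2]
# (illustrative only; the paper treats general convex and quadratic objectives).
A = rng.standard_normal((50, 10))
b = rng.standard_normal(50)

def stochastic_grad(x, rng):
    """One noisy gradient of the least-squares objective."""
    noise = 0.1 * rng.standard_normal(A.shape[0])
    return A.T @ (A @ x - b + noise) / A.shape[0]

def minibatch_sgd(x0, M, K, R, lr, rng):
    """R rounds; each round, M machines average K gradients at the SHARED iterate, then one step."""
    x = x0.copy()
    for _ in range(R):
        g = np.mean([np.mean([stochastic_grad(x, rng) for _ in range(K)], axis=0)
                     for _ in range(M)], axis=0)
        x -= lr * g
    return x

def local_sgd(x0, M, K, R, lr, rng):
    """R rounds; each round, every machine takes K local SGD steps, then iterates are averaged."""
    x = x0.copy()
    for _ in range(R):
        local_iterates = []
        for _ in range(M):
            y = x.copy()
            for _ in range(K):
                y -= lr * stochastic_grad(y, rng)
            local_iterates.append(y)
        x = np.mean(local_iterates, axis=0)
    return x

x0 = np.zeros(10)
for name, algo in [("minibatch SGD", minibatch_sgd), ("local SGD", local_sgd)]:
    x = algo(x0, M=8, K=16, R=20, lr=0.05, rng=np.random.default_rng(1))
    print(name, "objective:", 0.5 * np.mean((A @ x - b) ** 2))
```

Both routines use the same communication budget (R rounds) and the same total number of stochastic gradients (M·K·R), which is the setting in which the paper compares the two methods.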

Cite this Paper


BibTeX
@InProceedings{pmlr-v119-woodworth20a,
  title     = {Is Local {SGD} Better than Minibatch {SGD}?},
  author    = {Woodworth, Blake and Patel, Kumar Kshitij and Stich, Sebastian and Dai, Zhen and Bullins, Brian and Mcmahan, Brendan and Shamir, Ohad and Srebro, Nathan},
  booktitle = {Proceedings of the 37th International Conference on Machine Learning},
  pages     = {10334--10343},
  year      = {2020},
  editor    = {III, Hal Daumé and Singh, Aarti},
  volume    = {119},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--18 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v119/woodworth20a/woodworth20a.pdf},
  url       = {https://proceedings.mlr.press/v119/woodworth20a.html}
}
Endnote
%0 Conference Paper
%T Is Local SGD Better than Minibatch SGD?
%A Blake Woodworth
%A Kumar Kshitij Patel
%A Sebastian Stich
%A Zhen Dai
%A Brian Bullins
%A Brendan Mcmahan
%A Ohad Shamir
%A Nathan Srebro
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-woodworth20a
%I PMLR
%P 10334--10343
%U https://proceedings.mlr.press/v119/woodworth20a.html
%V 119
APA
Woodworth, B., Patel, K. K., Stich, S., Dai, Z., Bullins, B., Mcmahan, B., Shamir, O. & Srebro, N. (2020). Is Local SGD Better than Minibatch SGD? Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:10334-10343. Available from https://proceedings.mlr.press/v119/woodworth20a.html.