Sparsified SGD with Memory

Stich, Sebastian U.; Cordonnier, Jean-Baptiste; Jaggi, Martin

Computer Science > Machine Learning

arXiv:1809.07599 (cs)

[Submitted on 20 Sep 2018 (v1), last revised 28 Nov 2018 (this version, v2)]

Title: Sparsified SGD with Memory

Authors: Sebastian U. Stich, Jean-Baptiste Cordonnier, Martin Jaggi

Abstract: Huge-scale machine learning problems are nowadays tackled by distributed optimization algorithms, i.e., algorithms that leverage the compute power of many devices for training. The communication overhead is a key bottleneck that hinders perfect scalability. Various recent works have proposed using quantization or sparsification techniques to reduce the amount of data that needs to be communicated, for instance by sending only the most significant entries of the stochastic gradient (top-k sparsification). Whilst such schemes have shown very promising performance in practice, they have so far eluded theoretical analysis.
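To make the communication saving concrete, the following is a minimal sketch of the two compression operators named in the paper, top-k and random-k, on plain Python lists. The helper names top_k, rand_k and to_sparse are hypothetical, and rand_k here keeps the chosen entries without rescaling. That choice is an assumption of this sketch, not a statement of the authors' exact definition.

```python
import heapq
import random

def top_k(vec, k):
    """Top-k sparsification: keep the k largest-magnitude entries, zero the rest."""
    # heapq.nlargest avoids a full sort of all d indices.
    idx = heapq.nlargest(k, range(len(vec)), key=lambda i: abs(vec[i]))
    out = [0.0] * len(vec)
    for i in idx:
        out[i] = vec[i]
    return out

def rand_k(vec, k, rng=random):
    """Random-k sparsification: keep k uniformly chosen entries.

    Assumption of this sketch: no rescaling by d/k is applied.
    """
    idx = rng.sample(range(len(vec)), k)
    out = [0.0] * len(vec)
    for i in idx:
        out[i] = vec[i]
    return out

def to_sparse(vec):
    """The (index, value) pairs a worker would actually transmit."""
    return [(i, v) for i, v in enumerate(vec) if v != 0.0]

g = [0.1, -2.0, 0.3, 1.5, -0.05]
print(to_sparse(top_k(g, 2)))  # [(1, -2.0), (3, 1.5)]
```

With k = 1 on a d-dimensional gradient, a worker sends a single (index, value) pair instead of d values, roughly a factor-of-d reduction in transmitted values (ignoring the cost of encoding indices).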
In this work we analyze Stochastic Gradient Descent (SGD) with k-sparsification or compression (for instance top-k or random-k) and show that this scheme converges at the same rate as vanilla SGD when equipped with error compensation (keeping track of accumulated errors in memory). That is, communication can be reduced by a factor of the dimension of the problem (sometimes even more) whilst still converging at the same rate. We present numerical experiments to illustrate the theoretical findings and the better scalability for distributed applications.
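The error-compensated scheme can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the update v = m + η·g, x ← x − comp_k(v), m ← v − comp_k(v), so that whatever the compressor drops is carried forward in memory instead of being lost. The function name mem_sgd, the toy quadratic objective, the noise level and the constant step size 0.02 are hypothetical choices made only for illustration.

```python
import heapq
import math
import random

def top_k(vec, k):
    """Top-k sparsification: keep the k largest-magnitude entries."""
    idx = heapq.nlargest(k, range(len(vec)), key=lambda i: abs(vec[i]))
    out = [0.0] * len(vec)
    for i in idx:
        out[i] = vec[i]
    return out

def mem_sgd(stoch_grad, x0, k, lr, steps, compress=top_k):
    """SGD with k-sparsified updates and error feedback (memory).

    stoch_grad(x) returns a stochastic gradient at x; lr(t) is the step size.
    """
    x = list(x0)
    m = [0.0] * len(x)  # memory: the part of past updates not yet applied
    for t in range(steps):
        g = stoch_grad(x)
        eta = lr(t)
        # Add the scaled gradient to the accumulated error.
        v = [mi + eta * gi for mi, gi in zip(m, g)]
        # Only the k nonzeros of the compressed vector would be communicated.
        update = compress(v, k)
        x = [xi - ui for xi, ui in zip(x, update)]
        # Everything that was dropped is remembered for later steps.
        m = [vi - ui for vi, ui in zip(v, update)]
    return x

# Toy problem (assumed for illustration): f(x) = 0.5 * ||x - target||^2
# with Gaussian gradient noise, d = 10, and only k = 1 entry sent per step.
rng = random.Random(0)
target = [1.0] * 10

def noisy_grad(x):
    return [xi - ti + rng.gauss(0.0, 0.1) for xi, ti in zip(x, target)]

x_final = mem_sgd(noisy_grad, [0.0] * 10, k=1, lr=lambda t: 0.02, steps=3000)
dist = math.sqrt(sum((xi - ti) ** 2 for xi, ti in zip(x_final, target)))
print(f"distance to optimum after 3000 steps: {dist:.3f}")
```

One way to see why memory helps: the quantity x − m evolves exactly like an uncompressed SGD iterate, x − m ← (x − m) − η·g, with the gradient evaluated at the current x. The compressor therefore only delays the dropped coordinates instead of discarding them. In this sketch the constant step is kept well below k/d so that the memory stays bounded.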

Comments:	to appear at NIPS 2018
Subjects:	Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS); Machine Learning (stat.ML)
MSC classes:	68W40, 68W15, 90C25, 90C06
ACM classes:	G.1.6; F.2.1; E.4
Cite as:	arXiv:1809.07599 [cs.LG]
	(or arXiv:1809.07599v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1809.07599

Submission history

From: Jean-Baptiste Cordonnier
[v1] Thu, 20 Sep 2018 13:02:14 UTC (732 KB)
[v2] Wed, 28 Nov 2018 21:13:10 UTC (748 KB)