Asynchronous Mini-Batch Gradient Descent with Variance Reduction for Non-Convex Optimization

Authors

  • Zhouyuan Huo, University of Texas at Arlington
  • Heng Huang, University of Texas at Arlington

DOI:

https://doi.org/10.1609/aaai.v31i1.10940

Keywords:

Non-Convex Optimization, Asynchronous Mini-batch Gradient Descent, Variance Reduction

Abstract

We provide the first theoretical analysis of the convergence rate of asynchronous mini-batch gradient descent with variance reduction (AsySVRG) for non-convex optimization. Asynchronous stochastic gradient descent (AsySGD) has been broadly used for deep learning optimization, and it has been proved to converge at a rate of O(1/\sqrt{T}) for non-convex optimization. Recently, the variance reduction technique was proposed and shown to greatly accelerate the convergence of SGD. It has also been shown that asynchronous SGD with variance reduction achieves a linear convergence rate when the problem is strongly convex. However, there has been no work analyzing the convergence rate of this method for non-convex problems. In this paper, we consider two asynchronous parallel implementations of mini-batch gradient descent with variance reduction: one on a distributed-memory architecture and the other on a shared-memory architecture. We prove that both methods converge at a rate of O(1/T) for non-convex optimization, and that linear speedup is achievable as the number of workers increases. We evaluate our methods by optimizing multi-layer neural networks on two real datasets (MNIST and CIFAR-10), and the experimental results support our theoretical analysis.
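For readers unfamiliar with the variance reduction technique the abstract refers to, the sketch below shows a minimal serial mini-batch SVRG loop: an outer loop takes a snapshot and computes its full gradient, and an inner loop applies the variance-reduced estimator on sampled mini-batches. This is only an illustration under assumed names (minibatch_svrg, grad_fn, the step size); it omits the asynchronous distributed-memory and shared-memory scheduling that the paper actually analyzes.

```
# Minimal serial sketch of mini-batch SVRG (illustrative; not the paper's
# asynchronous algorithm). grad_fn(x, idx) is an assumed callback returning
# the average gradient of the losses indexed by idx, evaluated at x.
import numpy as np

def minibatch_svrg(grad_fn, x0, n, epochs=10, inner_iters=100,
                   batch_size=16, lr=0.01, seed=0):
    rng = np.random.default_rng(seed)
    x = x0.copy()
    for _ in range(epochs):
        snapshot = x.copy()
        # Full gradient at the snapshot point, reused by the inner loop.
        full_grad = grad_fn(snapshot, np.arange(n))
        for _ in range(inner_iters):
            idx = rng.choice(n, size=batch_size, replace=False)
            # Variance-reduced estimator:
            #   g = grad_B(x) - grad_B(snapshot) + full_grad
            g = grad_fn(x, idx) - grad_fn(snapshot, idx) + full_grad
            x -= lr * g
    return x
```

In the asynchronous variants studied in the paper, multiple workers evaluate such variance-reduced gradients (possibly on stale parameter copies) and apply them to shared or server-held parameters without locking; the analysis shows this still attains the O(1/T) rate for non-convex objectives.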

Published

2017-02-13

How to Cite

Huo, Z., & Huang, H. (2017). Asynchronous Mini-Batch Gradient Descent with Variance Reduction for Non-Convex Optimization. Proceedings of the AAAI Conference on Artificial Intelligence, 31(1). https://doi.org/10.1609/aaai.v31i1.10940