Parallel training algorithms for deep neural networks (DNNs) are needed to train on large data sets. Deep learning has grown rapidly since 2006, following the introduction of deep belief nets, in which DNNs are initialized using restricted Boltzmann machines. Deep learning has performed well on a variety of classification problems, and many deep learning applications perform better with more data. However, training a DNN on a large data set takes a long time, so as data sets grow, faster training methods are needed.
Many parallel learning algorithms introduce various approximations to speed up training. Stochastic gradient descent (SGD) is the most widely used method for training DNNs. Because SGD is an inherently sequential process, parallelizing it is difficult: when its sequential steps are parallelized, delayed gradient problems occur.
To avoid the gradient mismatch caused by delayed gradients, we improve Pipelined SGD by storing the model parameters of each module according to its time delay.
The proposed method achieved a 2.25x speedup using 4 GPUs without significant performance degradation on the CIFAR-10 dataset.
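The idea of storing each module's parameters according to its time delay can be illustrated with a toy sketch. The abstract does not specify the actual mechanism, so everything below is an assumption made for illustration: the class name `StashedLinear`, the scalar linear module, and the FIFO stash are hypothetical and are not the proposed implementation. The sketch only shows why a delayed backward pass should use the parameters that were in effect during its own forward pass rather than the current ones.

```python
from collections import deque


class StashedLinear:
    """Toy scalar module y = w * x (hypothetical, for illustration only).

    Each forward pass stashes the input and the weight it used, so a
    delayed backward pass can be computed with matching parameters.
    """

    def __init__(self, w, lr):
        self.w = w
        self.lr = lr
        self.stash = deque()  # FIFO: oldest in-flight mini-batch first

    def forward(self, x):
        # Remember the input and the weight version used for this mini-batch.
        self.stash.append((x, self.w))
        return self.w * x

    def backward(self, grad_y):
        # Retrieve the state of the oldest in-flight mini-batch.
        x, w_used = self.stash.popleft()
        grad_w = grad_y * x
        # The gradient passed to the previous module uses the stashed
        # weight, not self.w, which may have been updated in the meantime.
        grad_x = grad_y * w_used
        self.w -= self.lr * grad_w
        return grad_x


module = StashedLinear(w=2.0, lr=0.1)
y0 = module.forward(1.0)   # mini-batch 0 enters the pipeline
y1 = module.forward(3.0)   # mini-batch 1 enters before batch 0 returns
g0 = module.backward(1.0)  # updates w while batch 1 is still in flight
g1 = module.backward(1.0)  # delayed backward uses the stashed weight 2.0
```

In this sketch the second backward pass propagates a gradient computed with the weight from its own forward pass, even though the module's weight was updated in between. The cost is extra memory: one stored parameter copy per in-flight mini-batch.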