Hybrid Distributed Stochastic Gradient Descent for Federated Learning

Lin, Xiaofeng
Federated Learning
With the advancement of information technology over the past decades, the world has entered the era of 'Big Data', in which large volumes of data are produced at high velocity and the demand for processing these data keeps growing. Such an environment is a natural fit for deep learning, which can exploit large volumes of data to accomplish a variety of tasks. However, as both the volume of data and the complexity of neural network architectures rise, it becomes increasingly expensive to train a model on a single machine. Federated learning has become an active research topic in recent years. It decentralizes the conventional deep learning architecture by distributing data storage and/or computation across multiple machines, and it requires no exchange of information about the local training data, so data privacy is preserved. In the literature, two transmission approaches for federated learning have been studied, analog-based and digital-based, and it was shown that the analog-based approach considerably outperforms the digital-based approach by exploiting the waveform superposition property of the wireless access medium. In this thesis, we propose Hybrid Distributed Stochastic Gradient Descent (Hybrid DSGD), a training scheme for federated learning that combines the advantages of digital and analog transmission to reduce communication overhead and latency. We demonstrate why conventional analog-based transmission schemes perform poorly when the number of workers participating in training and/or the power available to each worker is restricted. We then explain how our scheme addresses this issue. We show through experiments that Hybrid DSGD outperforms the conventional analog-based transmission scheme under these conditions.
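As a rough illustration of the waveform superposition idea mentioned in the abstract, the sketch below simulates a generic analog (over-the-air) aggregation step in Python. It is not the thesis's system model or the Hybrid DSGD scheme: the additive Gaussian noise channel, the equal per-worker power scaling, and the names `analog_aggregate`, `mean_squared_error`, and `simulate` are all assumptions made for this example. Under those assumptions, the noise in the averaged gradient scales as `noise_std**2 / (num_workers**2 * power)`, which gives some intuition for why restricting the number of workers or their transmit power hurts a purely analog scheme.

```python
import random


def analog_aggregate(gradients, power, noise_std, rng):
    """Estimate the average gradient from one simulated over-the-air transmission.

    Assumed model (hypothetical, for illustration only): every worker scales
    its gradient by sqrt(power), all workers transmit at once, the channel
    adds the signals, and the receiver sees the sum plus Gaussian noise.
    """
    num_workers = len(gradients)
    dim = len(gradients[0])
    amplitude = power ** 0.5
    estimate = []
    for j in range(dim):
        # Superposition: the channel itself sums the transmitted signals.
        superposed = sum(amplitude * g[j] for g in gradients)
        received = superposed + rng.gauss(0.0, noise_std)
        # Undo the power scaling and divide by the worker count to average.
        estimate.append(received / (num_workers * amplitude))
    return estimate


def mean_squared_error(estimate, gradients):
    """Mean squared error between the estimate and the exact average gradient."""
    num_workers = len(gradients)
    dim = len(estimate)
    true_avg = [sum(g[j] for g in gradients) / num_workers for j in range(dim)]
    return sum((e - t) ** 2 for e, t in zip(estimate, true_avg)) / dim


def simulate(num_workers, power, noise_std=1.0, dim=2000, seed=0):
    """Draw random gradients, aggregate them over the noisy channel, return the MSE."""
    rng = random.Random(seed)
    gradients = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                 for _ in range(num_workers)]
    estimate = analog_aggregate(gradients, power, noise_std, rng)
    return mean_squared_error(estimate, gradients)


if __name__ == "__main__":
    print("few workers, low power:   ", simulate(num_workers=4, power=0.25))
    print("many workers, higher power:", simulate(num_workers=40, power=1.0))
```

Running the script prints the aggregation error for a small, power-limited group of workers and for a larger, better-powered group; in this toy model the first should be markedly larger than the second.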