- Advanced Neural Network Applications
- Stochastic Gradient Optimization Techniques
- Ferroelectric and Negative Capacitance Devices
- Advanced Memory and Neural Computing
- IoT and Edge/Fog Computing
- Privacy-Preserving Technologies in Data
- Wireless Body Area Networks
- Cloud Computing and Resource Management
- Advanced Optical Network Technologies
- Sparse and Compressive Sensing Techniques
- Context-Aware Activity Recognition Systems
- Software-Defined Networks and 5G
- Advanced Data Compression Techniques
- Machine Learning and ELM
National University of Defense Technology
2021-2023
Deep learning's widespread adoption in various fields has made distributed training across multiple computing nodes essential. However, frequent communication between nodes can significantly slow down training speed, creating a communication bottleneck. To address this issue, researchers are focusing on communication optimization algorithms for distributed deep learning systems. In this paper, we propose a standard that systematically classifies these algorithms based on mathematical modeling, which is not achieved by existing surveys in the field. We categorize...
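To make the bottleneck concrete, here is a minimal NumPy sketch of synchronous data-parallel SGD on a synthetic linear-regression task (all names and constants are illustrative, not from the paper): each worker computes a gradient on its own shard, and the averaged "all-reduce" step stands in for the inter-node communication that these optimization algorithms target.

```python
# Minimal sketch (not the paper's method): synchronous data-parallel SGD,
# simulated with NumPy workers on synthetic data.
import numpy as np

rng = np.random.default_rng(0)
num_workers, dim, lr = 4, 10, 0.1
w = rng.normal(size=dim)                      # shared model parameters
data = [rng.normal(size=(32, dim)) for _ in range(num_workers)]
targets = [x @ np.ones(dim) for x in data]    # synthetic regression targets

for step in range(100):
    # Compute phase: each worker derives a local gradient on its own shard.
    local_grads = []
    for x, y in zip(data, targets):
        err = x @ w - y
        local_grads.append(x.T @ err / len(x))
    # Communication phase: average gradients across workers ("all-reduce").
    # With large models this exchange dominates iteration time.
    g = np.mean(local_grads, axis=0)
    w -= lr * g
```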
Communication overhead is the key challenge in distributed training. Gradient compression is a widely used approach to reduce communication traffic. When combined with a parallel mechanism such as pipelining, gradient compression can greatly alleviate the impact of communication overhead. However, two problems remain to be solved. First, compression brings extra computation cost, which delays the next training iteration. Second, it usually leads to a decrease in convergence accuracy. In this paper, we combine quantization and...
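As a generic illustration of gradient compression (not necessarily the quantizer used in this paper), the sketch below shows 1-bit sign quantization with a per-tensor scale plus error feedback, a common remedy for the accuracy loss mentioned above; the class name and interface are hypothetical.

```python
# Sketch of a simple gradient-compression scheme: sign quantization with a
# scale, plus error feedback to limit the hit to convergence accuracy.
import numpy as np

class SignCompressor:
    """1-bit sign quantization with error feedback (illustrative only)."""
    def __init__(self):
        self.residual = None                      # compression error memory

    def compress(self, grad):
        if self.residual is None:
            self.residual = np.zeros_like(grad)
        corrected = grad + self.residual          # add back what was lost last step
        scale = np.mean(np.abs(corrected))
        signs = np.sign(corrected)
        self.residual = corrected - signs * scale # remember what compression loses
        return signs, scale                       # payload to send over the network

    @staticmethod
    def decompress(signs, scale):
        return signs * scale

comp = SignCompressor()
g = np.random.default_rng(1).normal(size=8)
signs, scale = comp.compress(g)
print(comp.decompress(signs, scale))              # receiver's reconstruction
```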
Intensive communication and synchronization cost for gradients and parameters is the well-known bottleneck of distributed deep learning training. Based on the observations that Synchronous SGD (SSGD) obtains good convergence accuracy while Asynchronous SGD (ASGD) delivers faster raw training speed, we propose Several Steps Delay SGD (SSD-SGD) to combine their merits, aiming to tackle the communication bottleneck via sparsification. SSD-SGD explores both global synchronous updates in the parameter servers and local updates in the workers in each periodic...
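For intuition, the following NumPy sketch shows a generic delayed-synchronization scheme with top-k sparsification: workers run several local SGD steps, then push only their largest update entries to a central parameter copy. It illustrates the kind of hybrid synchronous/asynchronous design described above, not the exact SSD-SGD algorithm; all constants and the synthetic data are assumptions.

```python
# Generic local-update / periodic-sync sketch with top-k sparsified pushes.
import numpy as np

rng = np.random.default_rng(0)
num_workers, dim, lr, local_steps, k = 4, 20, 0.05, 5, 4
server_w = rng.normal(size=dim)                      # parameter-server copy
data = [rng.normal(size=(64, dim)) for _ in range(num_workers)]
targets = [x @ np.ones(dim) for x in data]           # synthetic regression targets

def top_k_sparsify(update, k):
    """Keep only the k largest-magnitude entries of an update vector."""
    out = np.zeros_like(update)
    idx = np.argpartition(np.abs(update), -k)[-k:]
    out[idx] = update[idx]
    return out

for sync_round in range(20):
    deltas = []
    for x, y in zip(data, targets):
        w = server_w.copy()                          # pull latest global weights
        for _ in range(local_steps):                 # delayed (local) updates
            err = x @ w - y
            w -= lr * (x.T @ err / len(x))
        deltas.append(top_k_sparsify(w - server_w, k))  # push a sparse update
    server_w += np.mean(deltas, axis=0)              # global synchronous update
```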