Enda Yu

ORCID: 0000-0003-2661-0889
Research Areas
  • Advanced Neural Network Applications
  • Stochastic Gradient Optimization Techniques
  • Ferroelectric and Negative Capacitance Devices
  • Advanced Memory and Neural Computing
  • IoT and Edge/Fog Computing
  • Privacy-Preserving Technologies in Data
  • Wireless Body Area Networks
  • Cloud Computing and Resource Management
  • Advanced Optical Network Technologies
  • Sparse and Compressive Sensing Techniques
  • Context-Aware Activity Recognition Systems
  • Software-Defined Networks and 5G
  • Advanced Data Compression Techniques
  • Machine Learning and ELM

National University of Defense Technology
2021-2023

Deep learning's widespread adoption in various fields has made distributed training across multiple computing nodes essential. However, frequent communication between nodes can significantly slow down training speed, creating a communication bottleneck for distributed training. To address this issue, researchers have focused on communication optimization algorithms for distributed deep learning systems. In this paper, we propose a standard that systematically classifies these algorithms based on mathematical modeling, which is not achieved by existing surveys in the field. We categorize...

10.1109/tpds.2023.3323282 article EN IEEE Transactions on Parallel and Distributed Systems 2023-10-10
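
The survey entry above concerns communication optimization for data-parallel training. As a rough orientation only, the minimal NumPy sketch below (not taken from the paper; all names such as top_k_sparsify and simulated_allreduce are illustrative assumptions) shows where a compression operator plugs into a data-parallel SGD step, trading gradient fidelity for reduced traffic.

```python
# Minimal sketch (assumptions, not the paper's method) of a data-parallel SGD
# step with a pluggable gradient-compression operator before the all-reduce.
import numpy as np

def top_k_sparsify(grad, ratio=0.01):
    """Keep only the largest-magnitude entries; a common compression operator."""
    flat = grad.ravel()
    k = max(1, int(flat.size * ratio))
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    sparse = np.zeros_like(flat)
    sparse[idx] = flat[idx]
    return sparse.reshape(grad.shape)

def simulated_allreduce(worker_grads):
    """Average the (compressed) worker gradients, standing in for the network step."""
    return sum(worker_grads) / len(worker_grads)

def distributed_sgd_step(params, per_worker_grads, lr=0.1, compress=top_k_sparsify):
    compressed = [compress(g) for g in per_worker_grads]   # communication optimization
    global_grad = simulated_allreduce(compressed)          # less traffic on the wire
    return params - lr * global_grad

# Toy usage: 4 workers, a 10-dimensional parameter vector.
rng = np.random.default_rng(0)
params = rng.normal(size=10)
grads = [rng.normal(size=10) for _ in range(4)]
params = distributed_sgd_step(params, grads, compress=lambda g: top_k_sparsify(g, 0.3))
```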

Communication overhead is the key challenge for distributed training. Gradient compression is a widely used approach to reduce communication traffic. When combined with a parallel mechanism such as pipelining, gradient compression can greatly alleviate the impact of communication overhead. However, two problems remain to be solved. Firstly, compression brings in extra computation cost, which will delay the next training iteration. Secondly, compression usually leads to a decrease in convergence accuracy. In this paper, we combine quantization and...

10.1145/3472456.3472508 article EN 2021-08-09
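
The abstract above mentions combining quantization with another compression step. The following self-contained sketch is only an illustration of that general idea under assumed choices (top-k sparsification followed by uniform 8-bit quantization); it is not the paper's exact scheme, and the function names are hypothetical.

```python
# Hedged illustration: sparsify first (keep top-k values), then quantize the
# survivors to 8 bits, shrinking both the count and the width of sent values.
import numpy as np

def compress(grad, ratio=0.01, bits=8):
    flat = grad.ravel()
    k = max(1, int(flat.size * ratio))
    idx = np.argpartition(np.abs(flat), -k)[-k:]          # sparsification: keep top-k
    vals = flat[idx]
    scale = np.max(np.abs(vals)) or 1.0                   # quantization scale
    levels = 2 ** (bits - 1) - 1
    q = np.round(vals / scale * levels).astype(np.int8)   # 8-bit quantized payload
    return idx, q, scale

def decompress(idx, q, scale, shape, bits=8):
    levels = 2 ** (bits - 1) - 1
    flat = np.zeros(int(np.prod(shape)))
    flat[idx] = q.astype(np.float64) / levels * scale     # rebuild a dense gradient
    return flat.reshape(shape)

rng = np.random.default_rng(1)
g = rng.normal(size=(4, 5))
idx, q, scale = compress(g, ratio=0.2)
g_hat = decompress(idx, q, scale, g.shape)
```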

Communication overhead is the key challenge for distributed training. Gradient compression is a widely used approach to reduce communication traffic. When combined with a parallel mechanism such as pipelining, gradient compression can greatly alleviate the impact of communication overhead. However, two problems remain to be solved. Firstly, compression brings in extra computation cost, which will delay the next training iteration. Secondly, compression usually leads to a decrease in convergence accuracy.

10.48550/arxiv.2106.10796 preprint EN cc-by arXiv (Cornell University) 2021-01-01
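
The accuracy decrease noted in this preprint's abstract is commonly mitigated in the literature with error feedback, where the part of the gradient discarded by compression is stored locally and re-injected before the next round. The sketch below shows that generic remedy only; it is not claimed to be the authors' algorithm, and the class name is illustrative.

```python
# Generic error-feedback sketch (a common remedy, not the paper's method):
# accumulate the compression residual locally and add it back next iteration.
import numpy as np

class ErrorFeedbackCompressor:
    def __init__(self, shape, ratio=0.01):
        self.residual = np.zeros(shape)   # locally accumulated compression error
        self.ratio = ratio

    def __call__(self, grad):
        corrected = grad + self.residual              # re-inject what was dropped earlier
        flat = corrected.ravel()
        k = max(1, int(flat.size * self.ratio))
        idx = np.argpartition(np.abs(flat), -k)[-k:]
        sent = np.zeros_like(flat)
        sent[idx] = flat[idx]                         # values actually communicated
        self.residual = (flat - sent).reshape(grad.shape)  # remember the rest
        return sent.reshape(grad.shape)

rng = np.random.default_rng(2)
comp = ErrorFeedbackCompressor(shape=(8,), ratio=0.25)
for _ in range(3):
    sent = comp(rng.normal(size=8))
```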

Intensive communication and synchronization cost for gradients and parameters is the well-known bottleneck of distributed deep learning training. Based on the observations that Synchronous SGD (SSGD) obtains good convergence accuracy while Asynchronous SGD (ASGD) delivers a faster raw training speed, we propose Several Steps Delay SGD (SSD-SGD) to combine their merits, aiming at tackling the communication bottleneck via sparsification. SSD-SGD explores both global synchronous updates in the parameter servers and local updates in the workers in each periodic...

10.1145/3563038 article EN ACM Transactions on Architecture and Code Optimization 2022-09-14
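
To make the synchronous/asynchronous trade-off described above concrete, here is a toy sketch of a delayed-synchronization loop under assumed structure (not the published SSD-SGD implementation): workers update local parameter copies for several steps, then a synchronous averaging step at a simulated parameter server reconciles all copies.

```python
# Toy delayed-synchronization loop (assumed structure, names are illustrative):
# local asynchronous-style updates for `delay` steps, then a global sync.
import numpy as np

def local_step(params, grad, lr=0.05):
    return params - lr * grad                       # local update, no waiting

def global_sync(worker_params):
    return sum(worker_params) / len(worker_params)  # synchronous server-side average

rng = np.random.default_rng(3)
num_workers, dim, delay = 4, 10, 5
server = rng.normal(size=dim)
locals_ = [server.copy() for _ in range(num_workers)]

for step in range(20):
    # Each worker trains on its own (here: random) gradient independently.
    locals_ = [local_step(p, rng.normal(size=dim)) for p in locals_]
    if (step + 1) % delay == 0:                     # periodic global synchronization
        server = global_sync(locals_)
        locals_ = [server.copy() for _ in range(num_workers)]
```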