- Cloud Computing and Resource Management
- Advanced Neural Network Applications
- Advanced Data Storage Technologies
- IoT and Edge/Fog Computing
- Parallel Computing and Optimization Techniques
- Graph Theory and Algorithms
Alibaba Group (China)
2021-2023
Nowadays, it is prevalent to train deep learning (DL) models on cloud-native platforms that actively leverage containerization and orchestration technologies for high elasticity, low and flexible operation cost, and many other benefits. However, this approach also faces new challenges, and our work focuses on those related to I/O throughput for training, including complex data access with complicated performance tuning, a lack of cache capacity with specialized hardware to match its dynamic requirement, inefficient resource sharing...
Deep learning (DL) is becoming increasingly popular in many domains, including computer vision, speech recognition, and self-driving automobiles. GPUs can train DL models efficiently but are expensive, which motivates users to share resources to reduce monetary costs in practice. To ensure efficient sharing among multiple users, it is necessary to develop management and scheduling solutions. However, existing ones have several shortcomings. First, they require the users to specify the job requirement, which is usually quite...
Nowadays, it is prevalent to train deep learning models on cloud-native platforms that actively leverage containerization and orchestration technologies for high elasticity, low and flexible operation cost, and many other benefits. However, this approach also faces new challenges, and our work focuses on those related to I/O throughput for training, including complex data access, a mismatch with the dynamic requirement, and inefficient resource scheduling across different jobs. We propose <italic...