- Parallel Computing and Optimization Techniques
- Distributed and Parallel Computing Systems
- Advanced Data Storage Technologies
- Distributed systems and fault tolerance
- Software Testing and Debugging Techniques
- Interconnection Networks and Systems
- Cloud Computing and Resource Management
- Radiation Effects in Electronics
- Software Engineering Research
- Embedded Systems Design Techniques
- Security and Verification in Computing
- Advanced Malware Detection Techniques
- Animal Vocal Communication and Behavior
- Image Enhancement Techniques
- Graph Theory and Algorithms
- Advanced Image Fusion Techniques
- Video Surveillance and Tracking Methods
- Blind Source Separation Techniques
- Advanced Data Compression Techniques
- Radiation Therapy and Dosimetry
- Hand Gesture Recognition Systems
- Wildlife-Road Interactions and Conservation
- Underwater Vehicles and Communication Systems
- Software System Performance and Reliability
- Spam and Phishing Detection
National University of Defense Technology
2015-2024
South China University of Technology
2023-2024
University of Electronic Science and Technology of China
2022
China University of Mining and Technology
2019
University of Maryland, College Park
2018
In this paper, we describe our experiment in developing an implementation of the Linpack benchmark for TianHe-1, a petascale CPU/GPU supercomputer system and the largest GPU-accelerated system attempted to date. An adaptive optimization framework is presented to balance workload distribution across GPUs and CPUs with negligible runtime overhead, resulting in better performance than static or training partitioning methods. The CPU-GPU communication overhead is effectively hidden by software pipelining...
Multithreaded programs execute nondeterministically on conventional architectures and operating systems. This complicates many tasks, including debugging and testing. Deterministic multithreading (DMT) makes the output of a multithreaded program depend on its inputs only, which can solve the above problem. However, current DMT implementations suffer from a common inefficiency: they use frequent global barriers to enforce a deterministic ordering on memory accesses. In this paper, we eliminate that...
Bird species detection is critical for applications such as the analysis of bird population dynamics and diversity. However, this task remains challenging due to local structural similarities and class imbalances among species. Currently, most deep learning algorithms focus on designing feature extraction modules while ignoring the importance of global information. Global information is essential for accurate detection. To address this limitation, we propose BSD-Net, a bird species detection network. BSD-Net efficiently learns in pixels...
Facing the challenges of next-generation exascale computing, the National University of Defense Technology has developed a prototype system to explore opportunities, solutions, and limits toward the Tianhe system. This paper briefly introduces the prototype system, which is deployed at the National Supercomputer Center in Tianjin and has a theoretical peak performance of 3.15 Pflops. The system comprises 512 compute nodes, each with three proprietary CPUs called Matrix-2000+. The memory is 98.3 TB, and the storage is 1.4 PB in total.
Responses of auditory cortical neurons encode sound features of incoming acoustic stimuli and are also shaped by stimulus context and history. Previous studies of mammalian auditory cortex have reported a variable time course for such contextual effects, ranging from milliseconds to minutes. However, in secondary auditory forebrain areas of songbirds, long-term stimulus-specific neuronal habituation can persist for much longer periods of time, from hours to days. Such habituation in the songbird is a form of memory that requires gene expression. Although...
Recent research has sought to improve fuzzing performance via parallel computing. However, researchers focus on improving efficiency while ignoring the increasing cost of testing resources. Parallel fuzzing in a distributed environment amplifies the resource-wasting problem caused by the random nature of fuzzing. In parallel mode, owing to the lack of an appropriate task dispatching scheme and timely status synchronization among different fuzzing instances, task conflicts and workload imbalance occur, making the resource-wasting problem severe. In this paper, we design...
Data races hidden in concurrent programs have caused severe failures. To improve reliability, many race detectors have been proposed. However, most of the reported races are not harmful, so identifying the harmful races consumes manual effort. This paper proposes RaceChecker, which can detect potential and harmful races effectively and efficiently. Unlike previous detectors, RaceChecker combines the happens-before relation and ad-hoc synchronization to prune infeasible races, so fewer races are required to be verified. Before verification, RaceChecker groups the remaining races,...
Natural sounds such as vocalizations often have covarying acoustic attributes, resulting in redundancy in neural coding. The efficient coding hypothesis proposes that sensory systems are able to detect such covariation and adapt to reduce redundancy, leading to more efficient coding. Recent psychoacoustic studies have shown that the auditory system can rapidly and efficiently encode two dimensions as a single dimension, following passive exposure to sounds in which temporal and spectral attributes covaried in a correlated fashion. However, these observed cost...
Concurrency bugs, such as atomicity violations, are difficult to detect due to the uncertainty of thread scheduling. It is particularly difficult to conduct a thorough bug fix when an atomicity-violation bug can be triggered by different buggy interleavings. This paper proposes a prediction-based approach to comprehensively detect atomicity-violation bugs. A bug fix can be incomplete when the developer cannot have all the buggy interleavings. Based on candidate interleavings, this approach can predict unmanifested bugs from a non-buggy execution and display the buggy interleavings for the same bug to assist the fix. We use monitored record...
Current deterministic systems generally incur large overhead due to the difficulty of detecting and eliminating data races. This paper presents RaceFree, a novel multi-threading runtime that adopts a relaxed deterministic model to provide a data-race-free environment for parallel programs. RaceFree cuts off unnecessary shared-memory communication by isolating threads in separated memories, which eliminates direct data races. Meanwhile, we leverage the happen-before relation defined by the applications themselves as one-way communication pipes to perform...