Kai Lu

ORCID: 0000-0003-2284-7897
Research Areas
  • Parallel Computing and Optimization Techniques
  • Distributed and Parallel Computing Systems
  • Advanced Data Storage Technologies
  • Distributed systems and fault tolerance
  • Software Testing and Debugging Techniques
  • Interconnection Networks and Systems
  • Cloud Computing and Resource Management
  • Radiation Effects in Electronics
  • Software Engineering Research
  • Embedded Systems Design Techniques
  • Security and Verification in Computing
  • Advanced Malware Detection Techniques
  • Animal Vocal Communication and Behavior
  • Image Enhancement Techniques
  • Graph Theory and Algorithms
  • Advanced Image Fusion Techniques
  • Video Surveillance and Tracking Methods
  • Blind Source Separation Techniques
  • Advanced Data Compression Techniques
  • Radiation Therapy and Dosimetry
  • Hand Gesture Recognition Systems
  • Wildlife-Road Interactions and Conservation
  • Underwater Vehicles and Communication Systems
  • Software System Performance and Reliability
  • Spam and Phishing Detection

National University of Defense Technology
2015-2024

South China University of Technology
2023-2024

University of Electronic Science and Technology of China
2022

China University of Mining and Technology
2019

University of Maryland, College Park
2018

In this paper, we describe our experiment in developing an implementation of the Linpack benchmark for TianHe-1, a petascale CPU/GPU supercomputer system and the largest GPU-accelerated system ever attempted. An adaptive optimization framework is presented to balance workload distribution across GPUs and CPUs with negligible runtime overhead, resulting in better performance than static or training-based partitioning methods. The CPU-GPU communication overhead is effectively hidden by software pipelining...

10.1109/cluster.2010.12 article EN 2010-09-01
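
The adaptive balancing idea in the abstract above can be illustrated with a minimal Python sketch. This is not the paper's framework: the function name `update_gpu_fraction`, the timing inputs, and the throughput-proportional update rule are all assumptions chosen for illustration. After each step, the share of work given to the GPU is recomputed from the throughput each device was observed to deliver.

```python
def update_gpu_fraction(gpu_fraction, gpu_time, cpu_time):
    """Return the GPU work share that would equalize both finish times.

    Hypothetical helper: gpu_fraction is the share of the last step's work
    given to the GPU (strictly between 0 and 1); gpu_time and cpu_time are
    the measured durations of each device's part, in seconds.
    """
    if not 0.0 < gpu_fraction < 1.0:
        raise ValueError("gpu_fraction must be strictly between 0 and 1")
    if gpu_time <= 0.0 or cpu_time <= 0.0:
        raise ValueError("measured times must be positive")
    # Observed throughput of each device, in fractions of a step per second.
    gpu_rate = gpu_fraction / gpu_time
    cpu_rate = (1.0 - gpu_fraction) / cpu_time
    # Splitting in proportion to throughput makes both devices finish together.
    return gpu_rate / (gpu_rate + cpu_rate)


# Example: an even split where the GPU finished four times sooner than the CPU.
next_fraction = update_gpu_fraction(0.5, gpu_time=0.125, cpu_time=0.5)
```

Because the update uses only two timings that the run already produces, its cost is negligible. A static split, by contrast, would keep the initial 0.5 regardless of the measurements.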

Multithreaded programs execute nondeterministically on conventional architectures and operating systems. This complicates many tasks, including debugging and testing. Deterministic multithreading (DMT) makes the output of a multithreaded program depend on its inputs only, which can totally solve the above problem. However, current DMT implementations suffer from a common inefficiency: they use frequent global barriers to enforce a deterministic ordering on memory accesses. In this paper, we eliminate that...

10.1145/2555243.2555252 article EN 2014-02-06

Bird species detection is critical for applications such as the analysis of bird population dynamics and diversity. However, this task remains challenging due to local structural similarities and class imbalances among species. Currently, most deep learning algorithms focus on designing feature extraction modules while ignoring the importance of global information. This information is essential for accurate detection. To address this limitation, we propose a network called BSD-Net. BSD-Net efficiently learns in pixels...

10.3390/s25010291 article EN cc-by Sensors 2025-01-06

Facing the challenges of next-generation exascale computing, the National University of Defense Technology has developed a prototype system to explore opportunities, solutions, and limits toward the Tianhe system. This paper briefly introduces the system, which is deployed at the Supercomputer Center in Tianjin and has a theoretical peak performance of 3.15 Pflops. It comprises 512 compute nodes, each with three proprietary CPUs called Matrix-2000+. The memory totals 98.3 TB and the storage 1.4 PB.

10.26599/tst.2020.9010009 article EN Tsinghua Science & Technology 2020-10-12

10.1016/j.jpdc.2012.02.008 article EN Journal of Parallel and Distributed Computing 2012-02-18

Responses of auditory cortical neurons encode sound features of incoming acoustic stimuli and are also shaped by stimulus context and history. Previous studies of mammalian cortex have reported a variable time course for such contextual effects, ranging from milliseconds to minutes. However, in secondary forebrain areas of songbirds, long-term stimulus-specific neuronal habituation can persist for much longer periods of time, from hours to days. Such habituation in the songbird is a form of memory that requires gene expression. Although...

10.1523/jneurosci.2118-18.2018 article EN cc-by-nc-sa Journal of Neuroscience 2018-09-28

10.1007/s10766-014-0304-y article EN International Journal of Parallel Programming 2014-02-21

Recent research has sought to improve fuzzing performance via parallel computing. However, researchers focus on improving efficiency while ignoring the increasing cost of testing resources. Parallel fuzzing in a distributed environment amplifies the resource-wasting problem caused by the random nature of fuzzing. In parallel mode, owing to the lack of an appropriate task dispatching scheme and timely status synchronization among different instances, conflicts and workload imbalance occur, making the problem severe. In this paper, we design...

10.1109/tse.2022.3219520 article EN cc-by IEEE Transactions on Software Engineering 2022-11-04

Data races hidden in concurrent programs have caused severe failures. To improve reliability, many race detectors have been proposed. However, most of the reported races are not harmful, and identifying the harmful ones consumes manual effort. This paper proposes RaceChecker, which can detect potential races effectively and efficiently. Unlike previous detectors, RaceChecker combines the happens-before relation and ad-hoc synchronization to prune infeasible races, so fewer races are required to be verified. Before verification, it groups the remaining races,...

10.1109/pdp.2015.19 article EN 2015-03-01
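
As a hedged illustration of how a happens-before relation can prune infeasible race reports, the sketch below uses vector clocks, a standard technique that the abstract above does not confirm RaceChecker uses. The `Access` record, `happens_before`, and `may_race` are hypothetical names, and the ad-hoc synchronization analysis mentioned in the abstract is not modeled.

```python
from typing import NamedTuple, Tuple


class Access(NamedTuple):
    """One recorded memory access (hypothetical record layout)."""
    thread: int
    variable: str
    is_write: bool
    clock: Tuple[int, ...]  # vector clock at the time of the access


def happens_before(a: Tuple[int, ...], b: Tuple[int, ...]) -> bool:
    """True if clock a is ordered strictly before clock b."""
    return a != b and all(x <= y for x, y in zip(a, b))


def may_race(first: Access, second: Access) -> bool:
    """True if the two accesses conflict and are not ordered by happens-before."""
    if first.thread == second.thread or first.variable != second.variable:
        return False
    if not (first.is_write or second.is_write):
        return False  # two reads never conflict
    ordered = (happens_before(first.clock, second.clock)
               or happens_before(second.clock, first.clock))
    return not ordered


# Thread 0 writes x; thread 1 reads x with and without prior synchronization.
write = Access(thread=0, variable="x", is_write=True, clock=(1, 0))
unsynced_read = Access(thread=1, variable="x", is_write=False, clock=(0, 1))
synced_read = Access(thread=1, variable="x", is_write=False, clock=(1, 1))
```

Here `may_race(write, unsynced_read)` holds because neither clock dominates the other, while `may_race(write, synced_read)` does not, so that pair would be pruned before any verification step.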

Natural sounds such as vocalizations often have covarying acoustic attributes, resulting in redundancy in neural coding. The efficient coding hypothesis proposes that sensory systems are able to detect covariation and adapt to reduce redundancy, leading to more efficient coding. Recent psychoacoustic studies have shown that the auditory system can rapidly and efficiently encode two dimensions as a single dimension, following passive exposure in which temporal and spectral attributes covaried in a correlated fashion. However, these observed cost...

10.1523/jneurosci.0141-19.2019 article EN cc-by-nc-sa Journal of Neuroscience 2019-09-13

10.1145/2692916.2555252 article EN ACM SIGPLAN Notices 2014-02-06

Summary: Concurrency bugs, such as atomicity violations, are difficult to detect due to the uncertainty of thread scheduling. It is particularly difficult to conduct a thorough bug fix when an atomicity violation can be triggered by different buggy interleavings. This paper proposes a prediction-based approach to handle such bugs comprehensively. A fix is incomplete when the developer cannot have all the buggy interleavings. Based on candidate interleavings, this approach can predict unmanifested bugs from a non-buggy execution and display the interleavings for the same bug to assist the fix. We use monitored record...

10.1002/cpe.5160 article EN Concurrency and Computation Practice and Experience 2019-02-20

Current deterministic systems generally incur large overhead due to the difficulty of detecting and eliminating data races. This paper presents RaceFree, a novel multi-threading runtime that adopts a relaxed model to provide a data-race-free environment for parallel programs. It cuts off unnecessary shared-memory communication by isolating threads in separated memories, which eliminates direct data races. Meanwhile, we leverage the happen-before relation defined by the applications themselves as one-way pipes to perform...

10.1145/2442516.2442553 article EN 2013-02-23

10.1007/s11432-015-0203-2 article EN Science China Information Sciences 2016-10-13