Heming Cui

ORCID: 0000-0001-7746-440X
Research Areas
  • Distributed Systems and Fault Tolerance
  • Software Testing and Debugging Techniques
  • Parallel Computing and Optimization Techniques
  • Cloud Computing and Resource Management
  • Adversarial Robustness in Machine Learning
  • Advanced Data Storage Technologies
  • Security and Verification in Computing
  • Advanced Malware Detection Techniques
  • Mobile Ad Hoc Networks
  • Blockchain Technology Applications and Security
  • Advanced Neural Network Applications
  • Robotic Path Planning Algorithms
  • Software Engineering Research
  • Software System Performance and Reliability
  • Robotics and Sensor-Based Localization
  • Distributed and Parallel Computing Systems
  • Cloud Data Security Solutions
  • Wireless Networks and Protocols
  • Software Reliability and Analysis Research
  • Energy Efficient Wireless Sensor Networks
  • Robotics and Automated Systems
  • Caching and Content Delivery
  • Embedded Systems Design Techniques
  • Advanced Memory and Neural Computing
  • Domain Adaptation and Few-Shot Learning

University of Hong Kong
2016-2025

Shanghai Artificial Intelligence Laboratory
2023-2024

Chinese University of Hong Kong
2018-2023

Beijing Academy of Artificial Intelligence
2023

Shanghai Zhangjiang Laboratory
2022

Columbia University
2010-2014

Tsinghua University
2008

Multithreaded programs are hard to get right. A key reason is that the contract between developers and runtimes grants exponentially many schedules to the runtimes. We present Parrot, a simple, practical runtime with a new contract to developers. By default, it orders thread synchronizations in a well-defined round-robin order, vastly reducing schedules to provide determinism (more precisely, deterministic synchronizations) and stability (i.e., robustness against input or code perturbations, a more useful property than determinism)...

10.1145/2517349.2522735 article EN 2013-10-08
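
The default round-robin ordering of synchronizations described above can be illustrated with a minimal sketch. It assumes a fixed set of thread ids 0..n-1 and a hypothetical `RoundRobinScheduler` class; it is not Parrot's implementation (which intercepts real synchronization calls), only an illustration of why a fixed turn order makes the synchronization order deterministic regardless of OS scheduling.

```python
import threading

class RoundRobinScheduler:
    """Hypothetical sketch: lets synchronization points run only in a fixed
    round-robin order over thread ids 0..n-1 (not Parrot's actual code)."""

    def __init__(self, n_threads):
        self.n = n_threads
        self.turn = 0                       # id of the thread allowed to sync next
        self.active = set(range(n_threads))
        self.cv = threading.Condition()

    def _advance(self):
        # Pass the turn to the next thread id that has not exited yet.
        for step in range(1, self.n + 1):
            cand = (self.turn + step) % self.n
            if cand in self.active:
                self.turn = cand
                return

    def sync(self, tid, action):
        # Block until it is tid's turn, run the synchronization action, pass the turn.
        with self.cv:
            self.cv.wait_for(lambda: self.turn == tid)
            action()
            self._advance()
            self.cv.notify_all()

    def exit(self, tid):
        # A finishing thread also waits for its turn, then leaves the rotation.
        with self.cv:
            self.cv.wait_for(lambda: self.turn == tid)
            self.active.discard(tid)
            if self.active:
                self._advance()
            self.cv.notify_all()

def run_demo(n_threads=3, rounds=2):
    sched = RoundRobinScheduler(n_threads)
    log = []

    def worker(tid):
        for i in range(rounds):
            sched.sync(tid, lambda: log.append((tid, i)))
        sched.exit(tid)

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Calling `run_demo(3, 2)` yields the log `[(0, 0), (1, 0), (2, 0), (0, 1), (1, 1), (2, 1)]` on every run, because each synchronization waits for its turn rather than racing.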

Coverage-guided fuzzing has become mainstream in fuzzing to automatically expose program vulnerabilities. Recently, a group of fuzzers have been proposed that adopt a random search mechanism, namely Havoc, explicitly or implicitly, to augment their edge exploration. However, they only tend to adopt the default setup of Havoc as an implementation option, while none of them attempts to explore its power under diverse setups or inspect its rationale for potential improvement. In this paper, to address such issues, we conduct the first empirical study...

10.1145/3510003.3510174 article EN Proceedings of the 44th International Conference on Software Engineering 2022-05-21
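
A Havoc-style mutator stacks a random number of random mutations on one input. The sketch below is modeled loosely on AFL's havoc stage (the power-of-two stacking depth is an assumption); the function `havoc`, its small operator set, and the `max_stack_pow2` knob are hypothetical and stand in for the kind of "setup" parameters such a study might vary.

```python
import random

def havoc(data: bytes, rng: random.Random, max_stack_pow2: int = 7) -> bytes:
    """Apply a random stack of byte-level mutations (hypothetical sketch)."""
    buf = bytearray(data)
    # Stack depth is a random power of two: 2, 4, ..., 2**max_stack_pow2.
    n_mut = 1 << (1 + rng.randrange(max_stack_pow2))
    for _ in range(n_mut):
        op = rng.randrange(4)
        if op == 0 and buf:                  # flip one random bit
            pos = rng.randrange(len(buf) * 8)
            buf[pos // 8] ^= 1 << (pos % 8)
        elif op == 1 and buf:                # overwrite one byte with a random value
            buf[rng.randrange(len(buf))] = rng.randrange(256)
        elif op == 2 and len(buf) > 1:       # delete one byte (keep at least one)
            del buf[rng.randrange(len(buf))]
        elif op == 3:                        # insert one random byte
            buf.insert(rng.randrange(len(buf) + 1), rng.randrange(256))
    return bytes(buf)
```

Seeding the generator makes a mutation sequence reproducible, e.g. `havoc(b"hello", random.Random(1))` returns the same bytes every time it is called with that seed.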

Deterministic multithreading (DMT) eliminates many pernicious software problems caused by nondeterminism. It works by constraining a program to repeat the same thread interleavings, or schedules, when given the same input. Despite much recent research, it remains an open challenge to build both deterministic and efficient DMT systems for general programs on commodity hardware. To deterministically resolve a data race, a DMT system must enforce a deterministic schedule of shared memory accesses, or mem-schedule, which can incur...

10.1145/2043556.2043588 article EN 2011-10-23

A deterministic multithreading (DMT) system eliminates nondeterminism in thread scheduling, simplifying the development of multithreaded programs. However, existing DMT systems are unstable; they may force a program to (ad)venture into vastly different schedules even for slightly different inputs or execution environments, defeating many benefits of determinism. Moreover, few work with server programs whose inputs arrive continuously and nondeterministically. TERN is a stable DMT system. The key novelty in TERN is the idea...

10.5555/1924943.1924958 article EN Operating Systems Design and Implementation 2010-10-04

State machine replication (SMR) uses Paxos to enforce the same inputs for a program (e.g., Redis) replicated on a number of hosts, tolerating various types of failures. Unfortunately, traditional Paxos protocols incur prohibitive performance overhead on server programs due to their high consensus latency on TCP/IP. Worse, the consensus latency of extant protocols increases drastically when more concurrent client connections or hosts are added. This paper presents APUS, the first RDMA-based Paxos protocol that aims to be fast and scalable to client connections and hosts. APUS...

10.1145/3127479.3128609 article EN 2017-09-24

Systems code must obey many rules, such as "opened files must be closed." One approach to verifying rules is static analysis, but this technique cannot infer precise runtime effects of code, often emitting false positives. An alternative is symbolic execution, a technique that verifies program paths over all inputs up to a bounded size. However, when applied to verify rules, existing symbolic execution systems blindly explore redundant paths while missing relevant ones that may contain bugs.

10.1145/2451116.2451152 article EN 2013-03-16
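
The abstract's concern is which paths symbolic execution explores; once a path has been explored, checking a rule such as "opened files must be closed" along it is simple. The sketch below assumes a hypothetical encoding of one path as a list of `("open", handle)` / `("close", handle)` events and only illustrates the per-path check, not the path-selection technique.

```python
def check_open_close(path):
    """Report violations of 'opened files must be closed' on one explored path.
    The event encoding ("open"/"close", handle) is a hypothetical assumption."""
    open_handles = set()
    violations = []
    for op, handle in path:
        if op == "open":
            open_handles.add(handle)
        elif op == "close":
            if handle in open_handles:
                open_handles.discard(handle)
            else:
                violations.append(("close-without-open", handle))
    # Anything still open when the path ends is a leak on this path.
    violations.extend(("leak", h) for h in sorted(open_handles))
    return violations

# Two paths through a toy function: the normal path and an early-return branch.
paths = [
    [("open", "f"), ("close", "f")],
    [("open", "f")],
]
```

Running the checker over `paths` flags only the second (early-return) path, with `[("leak", "f")]`; a path-selection strategy matters because such a leaking path may be one of many, and only the relevant ones need exploring.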

Deep Reinforcement Learning (DRL) suffers from uncertainties and inaccuracies in the observation signal in real-world applications. Adversarial attack is an effective method for evaluating the robustness of DRL agents. However, existing attack methods targeting individual sampled actions have limited impacts on the overall policy distribution, particularly in continuous action spaces. To address these limitations, we propose Distribution-Aware Projected Gradient Descent (DAPGD). DAPGD uses distribution similarity...

10.48550/arxiv.2501.03562 preprint EN arXiv (Cornell University) 2025-01-07
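
DAPGD builds on Projected Gradient Descent. As a point of reference, the sketch below shows only the standard L-infinity PGD loop (signed gradient ascent, then projection back into an epsilon-ball around the clean input) on a toy linear loss; the distribution-similarity objective of DAPGD is not reproduced, and `pgd_linf` and its parameters are hypothetical names.

```python
def sign(v):
    # Returns 1, -1, or 0.
    return (v > 0) - (v < 0)

def pgd_linf(x0, grad_fn, eps, alpha, steps):
    """Standard L-infinity PGD on a perturbed input (e.g. an observation).
    grad_fn(x) returns the gradient of the attacker's loss at x."""
    x = list(x0)
    for _ in range(steps):
        g = grad_fn(x)
        # Ascent step along the sign of the gradient.
        x = [xi + alpha * sign(gi) for xi, gi in zip(x, g)]
        # Project back into the ball |x - x0|_inf <= eps.
        x = [min(max(xi, x0i - eps), x0i + eps) for xi, x0i in zip(x, x0)]
    return x

# Toy loss L(x) = 1*x[0] - 2*x[1] + 0*x[2], so its gradient is constant.
w = [1.0, -2.0, 0.0]
adv = pgd_linf([0.0, 0.0, 0.0], lambda x: w, eps=0.1, alpha=0.03, steps=10)
```

For this linear loss the iterate saturates at the corner of the epsilon-ball, giving `adv == [0.1, -0.1, 0.0]`; coordinates with zero gradient are left unchanged.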

Mixture-of-Experts (MoE) has emerged as a promising sparse paradigm for scaling up pre-trained models (PTMs) with remarkable cost-effectiveness. However, the dynamic nature of MoE leads to rapid fluctuations and imbalances in expert loads during training, resulting in significant straggler effects that hinder training performance when using expert parallelism (EP). Existing systems attempt to mitigate these effects through rearrangement strategies, but they face challenges in terms of memory efficiency and timeliness...

10.48550/arxiv.2502.02581 preprint EN arXiv (Cornell University) 2025-02-04
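
The straggler effect under expert parallelism can be made concrete: with experts placed on devices, a training step waits for the most loaded device. The sketch assumes a fixed, hypothetical expert-to-device mapping and measures imbalance as max load over mean load; it illustrates the problem only, not the paper's rearrangement method.

```python
from collections import Counter

def device_loads(assignments, expert_to_device, n_devices):
    """Tokens handled by each device, given each token's routed expert."""
    per_expert = Counter(assignments)
    loads = [0] * n_devices
    for expert, count in per_expert.items():
        loads[expert_to_device[expert]] += count
    return loads

def straggler_ratio(loads):
    """Max load divided by mean load; 1.0 means perfectly balanced."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean if mean else 1.0

# 8 tokens routed over 4 experts; expert 0 is "hot".
routing = [0, 0, 0, 1, 2, 3, 0, 1]
placement_a = {0: 0, 1: 0, 2: 1, 3: 1}   # hot expert shares a device with expert 1
placement_b = {0: 0, 1: 1, 2: 1, 3: 0}   # a rearranged placement
```

Here `placement_a` gives device loads `[6, 2]` (ratio 1.5), while `placement_b` gives `[5, 3]` (ratio 1.25), showing why re-placing experts as loads fluctuate can shorten the slowest device's work.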

10.1109/icassp49660.2025.10890540 article EN ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025-03-12

As the adoption of LLMs becomes more widespread in software coding ecosystems, a pressing issue has emerged: does the generated code contain social bias and unfairness, such as that related to age, gender, and race? This issue concerns the integrity, fairness, and ethical foundation of applications that depend on the code generated by these models, yet it is underexplored in the literature. This paper presents a novel testing framework that is specifically designed for code generation tasks. Based on this framework, we conduct an extensive empirical study of the biases...

10.1145/3724117 article EN ACM Transactions on Software Engineering and Methodology 2025-03-18

Recently, deep reinforcement learning (DRL) has emerged as a promising approach for robotic control. However, the deployment of DRL in real-world robots is hindered by its sensitivity to environmental perturbations. While existing white-box adversarial attacks rely on local gradient information and apply uniform perturbations across all states to evaluate robustness, they fail to account for temporal dynamics and state-specific vulnerabilities. To combat the above challenge, we first conduct a theoretical...

10.48550/arxiv.2503.20613 preprint EN arXiv (Cornell University) 2025-03-26

Deep reinforcement learning (DRL) has emerged as a promising approach for robotic control, but its real-world deployment remains challenging due to its vulnerability to environmental perturbations. Existing white-box adversarial attack methods, adapted from supervised learning, fail to effectively target DRL agents because they overlook temporal dynamics and indiscriminately perturb all state dimensions, limiting their impact on long-term rewards. To address these challenges, we propose the Adaptive...

10.48550/arxiv.2503.20844 preprint EN arXiv (Cornell University) 2025-03-26

State machine replication (SMR) leverages distributed consensus protocols such as Paxos to keep multiple replicas of a program consistent in the face of replica failures or network partitions. This fault tolerance makes it enticing to implement a principled SMR system that replicates general programs, especially server programs that demand high availability. Unfortunately, SMR assumes deterministic execution, but most server programs are multithreaded and thus nondeterministic. Moreover, existing SMR systems provide narrow state...

10.1145/2815400.2815427 article EN 2015-10-01

DNNs of increasing computational complexity have achieved unprecedented successes in various areas such as machine vision and natural language processing (NLP); e.g., the recent advanced Transformer has billions of parameters. However, as large-scale DNNs significantly exceed a GPU's physical memory limit, they cannot be trained by conventional methods such as data parallelism. Pipeline parallelism, which partitions a large DNN into small subnets and trains them on different GPUs, is a plausible solution. Unfortunately,...

10.1109/tpds.2021.3094364 article EN cc-by-nc-nd IEEE Transactions on Parallel and Distributed Systems 2021-07-02
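
Partitioning a DNN into pipeline stages can be framed as splitting a sequence of layers into contiguous groups so that the slowest stage, which bounds pipeline throughput, is as fast as possible. The sketch below is a textbook dynamic program under simplifying assumptions (known per-layer costs, no communication or memory constraints); the function name is hypothetical and this is not the paper's partitioning algorithm.

```python
from functools import lru_cache

def partition_layers(costs, n_stages):
    """Split per-layer costs into n_stages contiguous stages (n_stages <= len(costs))
    minimizing the largest stage cost. Returns (bottleneck, stage end indices)."""
    n = len(costs)
    prefix = [0]
    for c in costs:
        prefix.append(prefix[-1] + c)

    @lru_cache(maxsize=None)
    def best(i, k):
        # Smallest achievable bottleneck for layers i..n-1 using exactly k stages.
        if k == 1:
            return prefix[n] - prefix[i], (n,)
        result = None
        # First stage is layers i..j-1; leave at least one layer per remaining stage.
        for j in range(i + 1, n - k + 2):
            rest, cuts = best(j, k - 1)
            cand = max(prefix[j] - prefix[i], rest)
            if result is None or cand < result[0]:
                result = (cand, (j,) + cuts)
        return result

    bottleneck, cuts = best(0, n_stages)
    return bottleneck, list(cuts)
```

For example, `partition_layers([4, 2, 3, 1, 5, 2], 3)` yields a bottleneck of 7, and each returned index marks where a stage ends (exclusive), so the last index always equals the number of layers.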

As a widely used platform to support various Java-bytecode-based applications, the Java Virtual Machine (JVM) incurs severe performance loss caused by its real-time program interpretation mechanism. To tackle this issue, the Just-in-Time compiler (JIT) has been widely adopted to strengthen the efficacy of the JVM. Therefore, how to effectively and efficiently detect JIT bugs becomes critical to ensuring the correctness of the JVM. In this paper, we propose a coverage-guided fuzzing framework, namely JITfuzz, to automatically detect JIT bugs...

10.1109/icse48619.2023.00017 article EN 2023-05-01
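
The core of any coverage-guided fuzzer is a loop that mutates corpus inputs and keeps only those reaching new coverage. The sketch below shows that generic loop against a hypothetical instrumented `toy_target` that reports which branch ids it hit; JITfuzz's JIT-specific mutators and JVM instrumentation are not modeled, and all names here are illustrative assumptions.

```python
import random

def toy_target(data: bytes) -> set:
    """Hypothetical instrumented program: returns the set of branch ids hit."""
    hit = {"entry"}
    if len(data) > 0 and data[0] == ord("J"):
        hit.add("b1")
        if len(data) > 1 and data[1] == ord("I"):
            hit.add("b2")
            if len(data) > 2 and data[2] == ord("T"):
                hit.add("b3")
    return hit

def mutate(data: bytes, rng: random.Random) -> bytes:
    # Overwrite one random byte, and sometimes append one.
    buf = bytearray(data or b"\x00")
    buf[rng.randrange(len(buf))] = rng.randrange(256)
    if rng.random() < 0.3:
        buf.append(rng.randrange(256))
    return bytes(buf)

def fuzz(seeds, target, iterations, seed=0):
    rng = random.Random(seed)
    corpus = list(seeds)
    covered = set()
    for s in corpus:
        covered |= target(s)
    for _ in range(iterations):
        child = mutate(rng.choice(corpus), rng)
        cov = target(child)
        if not cov <= covered:       # keep the input only if it reaches new branches
            covered |= cov
            corpus.append(child)
    return corpus, covered
```

A run such as `fuzz([b"AAA"], toy_target, iterations=2000)` grows the corpus only when an input reaches a branch no earlier input reached, which is what lets the fuzzer climb nested conditions one branch at a time.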

Deployed multithreaded applications contain many races because these applications are difficult to write, test, and debug. Worse, the number of races in deployed applications may drastically increase due to the rise of multicore hardware and the immaturity of current race detectors. LOOM is a live-workaround system designed to quickly and safely bypass application races at runtime. LOOM provides a flexible and safe language for developers to write execution filters that explicitly synchronize code. It then uses an evacuation algorithm to safely install the filters to live applications to avoid races...

10.5555/1924943.1924953 article EN Operating Systems Design and Implementation 2010-10-04

A permissioned blockchain framework typically runs an efficient Byzantine consensus protocol and is attractive to deploy fast trading applications among a large number of mutually untrusted participants (e.g., companies). Unfortunately, all existing permissioned blockchain frameworks adopt sequential workflows for invoking the consensus protocol and executing applications' transactions, making the performance of these applications much lower than deploying them in traditional systems (e.g., an in-datacenter stock exchange).

10.1145/3477132.3483574 article EN 2021-10-19

Fuzzing nowadays has been commonly modeled as an optimization problem, e.g., maximizing code coverage under a given time budget via typical search-based solutions such as evolutionary algorithms. However, such solutions are widely argued to cause inefficient computing resource usage, i.e., inefficient mutations. To address this issue, two neural program-smoothing-based fuzzers, Neuzz and MTFuzz, have recently been proposed to approximate program branching behaviors via neural network models, which take the byte sequences of a seed as input and output...

10.1145/3510003.3510089 article EN Proceedings of the 44th International Conference on Software Engineering 2022-05-21

Parallel programs are known to be difficult to analyze. A key reason is that they typically have an enormous number of execution interleavings, or schedules. Static analysis over all schedules requires over-approximations, resulting in poor precision; dynamic analysis rarely covers more than a tiny fraction of all schedules. We propose an approach called schedule specialization to analyze a parallel program over only a small set of schedules for precision, and then enforce these schedules at runtime for soundness of the static analysis results. We build a schedule specialization framework for C/C++...

10.1145/2254064.2254090 article EN 2012-06-11

Stable multithreading dramatically simplifies the interleaving behaviors of parallel programs, offering new hope for making parallel programming easier.

10.1145/2500875 article EN Communications of the ACM 2014-02-26

Code generation models have increasingly become integral to aiding software development, offering assistance in tasks such as code completion, debugging, and code translation. Although current research has thoroughly examined the correctness of code produced by code generation models, a vital aspect, i.e., the efficiency of the generated code, has often been neglected. This paper presents EffiBench, a benchmark with 1,000 efficiency-critical coding problems for assessing the efficiency of code generated by code generation models. EffiBench contains a diverse set of LeetCode coding problems. Each...

10.48550/arxiv.2402.02037 preprint EN arXiv (Cornell University) 2024-02-03

A distributed database utilizing widespread edge computing servers to provide low-latency data access with a serializability guarantee is highly desirable for emerging edge computing applications. In an edge database, nodes are divided into regions, and a transaction can be categorized as intra-region (IRT) or cross-region (CRT) based on whether it accesses data in different regions. In addition to serializability, we insist that a practical edge database should provide low tail latency for both IRTs and CRTs, and such low latency must be scalable to a large number of...

10.1145/3447786.3456238 article EN 2021-04-21
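
The IRT/CRT distinction above depends only on which regions a transaction's data lives in. The sketch assumes a hypothetical `key_region` map standing in for the database's data-placement metadata; it shows the classification rule, not how the system then executes each class.

```python
def classify_transaction(keys, key_region):
    """Return 'IRT' if every accessed key lives in one region, else 'CRT'.
    key_region is a hypothetical key -> region placement map."""
    regions = {key_region[k] for k in keys}
    return "IRT" if len(regions) <= 1 else "CRT"

# Hypothetical placement of four keys across two edge regions.
placement = {"a": "HK", "b": "HK", "c": "SZ", "d": "SZ"}
```

With this placement, a transaction touching keys `a` and `b` is intra-region, while one touching `a` and `c` must coordinate across regions and is therefore a CRT.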

Just as bugs in single-threaded programs can lead to vulnerabilities, bugs in multithreaded programs can also lead to concurrency attacks. We studied 31 real-world concurrency attacks, including privilege escalations, hijacking code executions, and bypassing security checks. We found that, compared to concurrency bugs' traditional consequences (e.g., program crashes), concurrency attacks' consequences are often implicit and extremely hard for developers to observe and diagnose. Moreover, in addition to bug-inducing inputs, extra subtle inputs are needed to trigger the attacks. These features make...

10.1109/dsn.2018.00033 article EN 2018-06-01