NFDI4DS | UHH-SEMS - Publication Details

Xusheng Chen

ORCID: 0000-0002-2807-9780

About

Contact & Profiles

A5101527007

Research Areas

Distributed systems and fault tolerance
Cloud Computing and Resource Management
Blockchain Technology Applications and Security
Advanced Neural Network Applications
Advanced Data Storage Technologies
Topic Modeling
Renal cell carcinoma treatment
Caching and Content Delivery
Security and Verification in Computing
Domain Adaptation and Few-Shot Learning
Cryptography and Data Security
Advanced Computational Techniques and Applications
Power Systems and Technologies
Railway Systems and Energy Efficiency
Electric and Hybrid Vehicle Technologies
Age of Information Optimization
Energy Efficient Wireless Sensor Networks
Image Retrieval and Classification Techniques
IoT and Edge/Fog Computing
Parallel Computing and Optimization Techniques
Power Systems and Renewable Energy
Ferroelectric and Negative Capacitance Devices
Smart Grid and Power Systems
Machine Fault Diagnosis Techniques
Ferroptosis and cancer prognosis

Beijing Hua Xin Hospital
2024

Tianjin Medical University Cancer Institute and Hospital
2010-2023

Xiangtan University
2021-2023

University of Hong Kong
2017-2023

Chongqing University
2023

Chinese University of Hong Kong
2018-2020

Beijing Union University
2009-2013

University of Electronic Science and Technology of China
2013

Tianjin People's Hospital
2011

Xinyang Normal University
2007-2009

APUS

OPENALEX - Publications

Cheng Wang Jianyu Jiang Xusheng Chen Ning Yi Heming Cui

State machine replication (SMR) uses Paxos to enforce the same inputs for a program (e.g., Redis) replicated on a number of hosts, tolerating various types of failures. Unfortunately, traditional Paxos protocols incur prohibitive performance overhead on server programs due to their high consensus latency on TCP/IP. Worse, the consensus latency of extant Paxos protocols increases drastically when more concurrent client connections or hosts are added. This paper presents APUS, the first RDMA-based Paxos protocol that aims to be fast and scalable to client connections and hosts. APUS...

10.1145/3127479.3128609 article EN 2017-09-24

DeepFlow: Serverless Large Language Model Serving at Scale

Junhao Hu Jiang Xu Yulong He Y. Chen Gengyuan Dan and 16 more

This paper introduces DeepFlow, a scalable and serverless AI platform designed to efficiently serve large language models (LLMs) at scale in cloud environments. DeepFlow addresses key challenges such as resource allocation, serving efficiency, and cold start latencies through four main design components. First, it uses a simple abstraction called the request-job-task model, which helps manage workloads across post-training and model serving tasks. Second, it builds an in-house serving engine, FlowServe, using...

10.48550/arxiv.2501.14417 preprint EN arXiv (Cornell University) 2025-01-24

Hecate: Unlocking Efficient Sparse Model Training via Fully Sharded Sparse Data Parallelism

Yuhao Qing Guannan Zhu Fanxin Li Lei Liang Z. T. Sun and 6 more

Mixture-of-Experts (MoE) has emerged as a promising sparse paradigm for scaling up pre-trained models (PTMs) with remarkable cost-effectiveness. However, the dynamic nature of MoE leads to rapid fluctuations and imbalances in expert loads during training, resulting in significant straggler effects that hinder training performance when using expert parallelism (EP). Existing systems attempt to mitigate these effects through expert rearrangement strategies, but they face challenges in terms of memory efficiency and timeliness...

10.48550/arxiv.2502.02581 preprint EN arXiv (Cornell University) 2025-02-04

Efficient Long-Decoding Inference with Reasoning-Aware Attention Sparsity

Junhao Hu W. Huang Weidong Wang Zhenwen Li Tiancheng Hu and 4 more

Large Language Models (LLMs) have demonstrated strong capabilities across various domains, with recent advancements in challenging reasoning tasks such as mathematics and programming. However, solving such tasks often requires long decoding chains (of thoughts), which incur $O(N)$ time and memory consumption, where $N$ is the chain length. To mitigate this, existing sparsity-based algorithms propose retaining only the most critical token's intermediate data (i.e., key-value cache) and discarding the rest. However, these algorithms struggle...

10.48550/arxiv.2502.11147 preprint EN arXiv (Cornell University) 2025-02-16

vPipe: A Virtualized Acceleration System for Achieving Efficient and Scalable Pipeline Parallel DNN Training

Shixiong Zhao Fanxin Li Xusheng Chen Xiuxian Guan Jianyu Jiang and 8 more

The increasing computational complexity of DNNs has achieved unprecedented successes in various areas such as machine vision and natural language processing (NLP), e.g., the recent advanced Transformer has billions of parameters. However, as large-scale DNNs significantly exceed a GPU's physical memory limit, they cannot be trained by conventional methods such as data parallelism. Pipeline parallelism, which partitions a large DNN into small subnets and trains them on different GPUs, is a plausible solution. Unfortunately,...

10.1109/tpds.2021.3094364 article EN cc-by-nc-nd IEEE Transactions on Parallel and Distributed Systems 2021-07-02

Bidl

Ji Qi Xusheng Chen Yunpeng Jiang Jianyu Jiang Tianxiang Shen and 6 more

A permissioned blockchain framework typically runs an efficient Byzantine consensus protocol and is attractive for deploying fast trading applications among a large number of mutually untrusted participants (e.g., companies). Unfortunately, all existing permissioned blockchain frameworks adopt sequential workflows for invoking the consensus protocol and executing applications' transactions, making the performance of these applications much lower than deploying them in traditional systems (e.g., an in-datacenter stock exchange).

10.1145/3477132.3483574 article EN 2021-10-19

Inference without Interference: Disaggregate LLM Inference for Mixed Downstream Workloads

Cunchen Hu Heyang Huang Liangliang Xu Xusheng Chen Jiang Xu and 7 more

Transformer-based large language model (LLM) inference serving is now the backbone of many cloud services. LLM inference consists of a prefill phase and a decode phase. However, existing LLM deployment practices often overlook the distinct characteristics of these phases, leading to significant interference. To mitigate interference, our insight is to carefully schedule and group inference requests based on their characteristics. We realize this idea in TetriInfer through three pillars. First, it partitions prompts into fixed-size...

10.48550/arxiv.2401.11181 preprint EN cc-by arXiv (Cornell University) 2024-01-01

Achieving low tail-latency and high scalability for serializable transactions in edge computing

Xusheng Chen Haoze Song Jianyu Jiang Chaoyi Ruan Cheng Li and 4 more

A distributed database utilizing the wide-spread edge computing servers to provide low-latency data access with a serializability guarantee is highly desirable for emerging edge computing applications. In an edge database, nodes are divided into regions, and a transaction can be categorized as intra-region (IRT) or cross-region (CRT) based on whether it accesses data in different regions. In addition to serializability, we insist that a practical edge database should provide low tail latency for both IRTs and CRTs, and such low latency must be scalable to a large number of...

10.1145/3447786.3456238 article EN 2021-04-21

A Geography-Based P2P Overlay Network for Fast and Robust Blockchain Systems

Haoran Qiu Tao Ji Shixiong Zhao Xusheng Chen Ji Qi and 2 more

Numerous blockchain systems with various consensus protocols have emerged to achieve high transaction rates (2$\sim$10K tps). However, their underlying P2P network primitives constrain further improvements due to two problems: (i) message redundancy and (ii) long broadcast convergence time. The first problem is caused by the excessive robustness of the dominant approach, Gossip. All state-of-the-art only tolerate 20-50%...

10.1109/tsc.2022.3189667 article EN IEEE Transactions on Services Computing 2022-01-01

CRONUS: Fault-isolated, Secure and High-performance Heterogeneous Computing for Trusted Execution Environment

Jianyu Jiang Ji Qi Tianxiang Shen Xusheng Chen Shixiong Zhao and 5 more

With the trend of processing a large volume of sensitive data on PaaS services (e.g., DNN training), a TEE architecture that supports general heterogeneous accelerators, enables spatial sharing on one accelerator, and enforces strong isolation across accelerators is highly desirable. However, none of the existing TEE solutions meets all three requirements. In this paper, we propose CRONUS, the first TEE architecture that achieves the three crucial requirements. The key idea of CRONUS is to partition heterogeneous computation into isolated TEE enclaves, where each enclave encapsulates...

10.1109/micro56248.2022.00019 article EN 2022-10-01

CaraServe: CPU-Assisted and Rank-Aware LoRA Serving for Generative LLM Inference

Suyi Li Hanfeng Lu Tianyuan Wu Minchen Yu Qizhen Weng and 4 more

Pre-trained large language models (LLMs) often need specialization for domain-specific tasks. Low-Rank Adaptation (LoRA) is a popular approach that adapts a base model to multiple tasks by adding lightweight trainable adapters. In this paper, we present CaraServe, a system that efficiently serves many LoRA adapters derived from a common base model. CaraServe maintains the base model on GPUs and dynamically loads activated LoRA adapters from main memory. As GPU loading results in a cold-start that substantially delays token generation, CaraServe employs...

10.48550/arxiv.2401.11240 preprint EN cc-by arXiv (Cornell University) 2024-01-01

Improvements in manufacturability, bonding strength, and curing efficiency of a silicone adhesive

Ye Yang Jinfeng Xu Xusheng Chen Yang Wang Lina Si and 4 more

10.1016/j.jajp.2024.100243 article EN cc-by-nc-nd Journal of Advanced Joining Processes 2024-08-23

Robot robust object recognition based on fast SURF feature matching

Mingfang Du Junzheng Wang Jing Li Haiqing Cao Guangtao Cui and 3 more

The local invariant feature SURF (Speeded Up Robust Features) is introduced into the robot visual recognition field to handle scale changes, rotation, perspective changes, changes in illumination, and other problems. A Speeded up SURF (SSURF) algorithm is proposed to meet the needs of robot identification. In the SSURF algorithm, the main direction determination step is modified so that the search scope becomes {-α, +α} (0 ≤ α ≤ 30°) instead of the original 360°. According to compressed sensing ideas and the interest points distribution histogram, space...

10.1109/cac.2013.6775802 article EN Chinese Automation Congress 2013-11-01

Uranus

Jianyu Jiang Xusheng Chen TszOn Li Cheng Wang Tianxiang Shen and 4 more

Applications written in Java have strengths to tackle diverse threats in public clouds, but these applications are still prone to privileged attacks when processing plaintext data. Intel SGX is powerful against these attacks, and traditional SGX systems rewrite a Java application's sensitive functions, which process plaintext data, using the C/C++ SGX API. Although this code-rewrite approach achieves good efficiency and a small TCB, it requires SGX expert knowledge and can be tedious and error-prone. To tackle the limitations of rewriting Java to C/C++, recent SGX systems propose...

10.1145/3320269.3384763 article EN 2020-10-05

Fold3D: Rethinking and Parallelizing Computational and Communicational Tasks in the Training of Large DNN Models

Fanxin Li Shixiong Zhao Yuhao Qing Xusheng Chen Xiuxian Guan and 3 more

Training a large DNN (e.g., GPT3) efficiently on commodity clouds is challenging even with the latest 3D parallel training systems (e.g., Megatron v3.0). In particular, along the pipeline parallelism dimension, computational tasks that produce a whole DNN's gradients with multiple input batches should be concurrently activated; along the data parallelism dimension, a set of heavy-weight communications (for aggregating the accumulated outputs of computational tasks)...

10.1109/tpds.2023.3247883 article EN cc-by IEEE Transactions on Parallel and Distributed Systems 2023-03-20

NASPipe: high performance and reproducible pipeline parallel supernet training via causal synchronous parallelism

Shixiong Zhao Fanxin Li Xusheng Chen Tianxiang Shen Li Chen and 4 more

Supernet training, a prevalent and important paradigm in Neural Architecture Search, embeds the whole DNN architecture search space into one monolithic supernet, iteratively activates a subset of the supernet (i.e., a subnet) for fitting each batch of data, and searches for a high-quality subnet that meets specific requirements. Although training subnets in parallel on multiple GPUs is desirable for acceleration, there inherently exists a race hazard in that concurrent subnets may access the same DNN layers. Existing systems support...

10.1145/3503222.3507735 article EN 2022-02-22

Comparison of tyrosine kinase inhibitors in the treatment of metastatic renal cell carcinoma with rhabdoid and sarcomatoid differentiations

Kun Wang Pengqiang Duan Xusheng Chen Qing Yang Guowei Feng and 3 more

Objective: To investigate the efficacy of tyrosine kinase inhibitors (TKIs) in the treatment of metastatic renal cell carcinoma (mRCC) with rhabdoid (mRCC‐R) and sarcomatoid (mRCC‐S) differentiations. Materials and Methods: In this single‐institutional cohort study, we included patients with RCC with rhabdoid (RCC‐R) and sarcomatoid (RCC‐S) differentiation, who were treated with TKIs after metastasis at our institute from 2013 to 2021. Patient characteristics, treatments, and clinical outcomes were recorded and analyzed. Results: We identified 111...

10.1002/cam4.6081 article EN cc-by Cancer Medicine 2023-06-16

DAENet: Making Strong Anonymity Scale in a Fully Decentralized Network

Tianxiang Shen Jianyu Jiang Yunpeng Jiang Xusheng Chen Ji Qi and 4 more

Traditional anonymous networks (e.g., Tor) are vulnerable to traffic analysis attacks that monitor the whole network traffic to determine which users are communicating. To preserve user anonymity against traffic analysis attacks, the emerging mix networks mess up the order of packets through a set of centralized and explicit shuffling nodes. However, this centralized design is insecure against targeted DoS attacks that can completely block these shuffling nodes. In this article, we present DAENet...

10.1109/tdsc.2021.3052831 article EN cc-by IEEE Transactions on Dependable and Secure Computing 2021-02-05

Low differentiated microvascular density and low expression of platelet‐derived growth factor‐BB (PDGF‐BB) predict distant metastasis and poor prognosis in clear cell renal cell carcinoma

Lifeng Qi Jun Du Zhenting Zhang Lei Diao Xusheng Chen and 1 more

Objective: To examine the prognostic significance of the expression of platelet‐derived growth factor‐BB (PDGF‐BB) and differentiated microvascular density (MVD) in patients with clear cell renal cell carcinoma (ccRCC). Patients and Methods: We used the vascular marker cluster of differentiation 34 (CD34) to identify tumour blood vessels. The expression of PDGF‐BB and CD34 was detected by immunohistochemistry (IHC) in tissue microarrays (TMAs) from 100 ccRCCs. Prognostic effects of individual parameters were calculated using Cox regression models...

10.1111/bju.12191 article EN BJU International 2013-07-23

A Fast, General Storage Replication Protocol for Active-Active Virtual Machine Fault Tolerance

Cheng Wang Xusheng Chen Zixu Wang Youwei Zhu Heming Cui

Cloud computing enables more and more online services to be deployed in virtual machines (VMs), making fast VM fault tolerance particularly crucial. Unfortunately, despite much effort, achieving fast VM fault tolerance remains an open problem. A traditional way to provide VM fault tolerance is the active-passive approach, which frequently transfers tremendous updated states, including memory and storage, of a primary VM to a suspended secondary VM. The other emerging way, namely the active-active approach, runs the secondary VM concurrently with the primary. Compared with active-passive, faster...

10.1109/icpads.2017.00031 article EN 2017-12-01

The Design of Group Decision Support System for Emergency Management

Deng Jingyi Xusheng Chen

China has entered a period in which emergencies occur with high frequency. Lack of professional knowledge is one of the main causes of emergency response failure. Therefore, an emergency decision support system is imperative for reducing disaster losses and improving the efficiency of emergency resource allocation. Based on an analysis of the emergency decision-making process, an emergency group decision support system (E-GDSS) framework is designed, and the functions defined include a case querying system, an assessment system, and so on. A prototype is developed in the context of public health emergencies. Practical...

10.1109/isecs.2010.38 article EN 2010-07-01

Efficient and DoS-resistant Consensus for Permissioned Blockchains

Xusheng Chen Shixiong Zhao Ji Qi Jianyu Jiang Haoze Song and 8 more

10.1016/j.peva.2021.102244 article EN Performance Evaluation 2021-10-19

Efficient and DoS-resistant Consensus for Permissioned Blockchains

Xusheng Chen Shixiong Zhao Ji Qi Jianyu Jiang Haoze Song and 8 more

Existing permissioned blockchain systems designate a fixed and explicit group of committee nodes to run a consensus protocol that confirms the same sequence of blocks among all nodes. Unfortunately, when such a system runs at large scale on the Internet, these explicit committee nodes can be easily turned down by denial-of-service (DoS) or network partition attacks. Although recent studies proposed scalable BFT protocols that run on a larger number of committee nodes, these protocols' efficiency drops dramatically when only a small number of nodes are attacked.

10.1145/3529113.3529134 article EN ACM SIGMETRICS Performance Evaluation Review 2022-03-22
