- Distributed systems and fault tolerance
- Cloud Computing and Resource Management
- Blockchain Technology Applications and Security
- Advanced Neural Network Applications
- Advanced Data Storage Technologies
- Topic Modeling
- Renal cell carcinoma treatment
- Caching and Content Delivery
- Security and Verification in Computing
- Domain Adaptation and Few-Shot Learning
- Cryptography and Data Security
- Advanced Computational Techniques and Applications
- Power Systems and Technologies
- Railway Systems and Energy Efficiency
- Electric and Hybrid Vehicle Technologies
- Age of Information Optimization
- Energy Efficient Wireless Sensor Networks
- Image Retrieval and Classification Techniques
- IoT and Edge/Fog Computing
- Parallel Computing and Optimization Techniques
- Power Systems and Renewable Energy
- Ferroelectric and Negative Capacitance Devices
- Smart Grid and Power Systems
- Machine Fault Diagnosis Techniques
- Ferroptosis and cancer prognosis
Beijing Hua Xin Hospital
2024
Tianjin Medical University Cancer Institute and Hospital
2010-2023
Xiangtan University
2021-2023
University of Hong Kong
2017-2023
Chongqing University
2023
Chinese University of Hong Kong
2018-2020
Beijing Union University
2009-2013
University of Electronic Science and Technology of China
2013
Tianjin People's Hospital
2011
Xinyang Normal University
2007-2009
State machine replication (SMR) uses Paxos to enforce the same inputs for a program (e.g., Redis) replicated on a number of hosts, tolerating various types of failures. Unfortunately, traditional Paxos protocols incur prohibitive performance overhead on server programs due to their high consensus latency on TCP/IP. Worse, the consensus latency of extant Paxos protocols increases drastically when more concurrent client connections or hosts are added. This paper presents APUS, the first RDMA-based Paxos protocol that aims to be fast and scalable to client connections and hosts. APUS...
This paper introduces DeepFlow, a scalable and serverless AI platform designed to efficiently serve large language models (LLMs) at scale in cloud environments. DeepFlow addresses key challenges such as resource allocation, serving efficiency, and cold start latencies through four main design components. First, it uses a simple abstraction called the request-job-task model, which helps manage workloads across post-training and model serving tasks. Second, it builds an in-house serving engine, FlowServe, using...
Mixture-of-Experts (MoE) has emerged as a promising sparse paradigm for scaling up pre-trained models (PTMs) with remarkable cost-effectiveness. However, the dynamic nature of MoE leads to rapid fluctuations and imbalances in expert loads during training, resulting in significant straggler effects that hinder training performance when using expert parallelism (EP). Existing MoE training systems attempt to mitigate these effects through expert rearrangement strategies, but they face challenges in terms of memory efficiency and timeliness...
Large Language Models (LLMs) have demonstrated strong capabilities across various domains, with recent advancements in challenging reasoning tasks such as mathematics and programming. However, solving reasoning tasks often requires long decoding chains (of thoughts), which incur $O(N)$ time and memory consumption, where $N$ is the chain length. To mitigate this, existing sparsity-based algorithms propose retaining only the most critical tokens' intermediate data (i.e., key-value cache) and discarding the rest. However, these algorithms struggle...
The increasing computational complexity of DNNs has achieved unprecedented successes in various areas such as machine vision and natural language processing (NLP); e.g., the recent advanced Transformer has billions of parameters. However, as large-scale DNNs significantly exceed a GPU's physical memory limit, they cannot be trained by conventional methods such as data parallelism. Pipeline parallelism, which partitions a large DNN into small subnets and trains them on different GPUs, is a plausible solution. Unfortunately,...
A permissioned blockchain framework typically runs an efficient Byzantine consensus protocol and is attractive for deploying fast trading applications among a large number of mutually untrusted participants (e.g., companies). Unfortunately, all existing permissioned blockchain frameworks adopt sequential workflows for invoking the consensus protocol and executing applications' transactions, making the performance of these applications much lower than deploying them in traditional systems (e.g., an in-datacenter stock exchange).
Transformer-based large language model (LLM) inference serving is now the backbone of many cloud services. LLM inference consists of a prefill phase and a decode phase. However, existing LLM deployment practices often overlook the distinct characteristics of these phases, leading to significant interference. To mitigate interference, our insight is to carefully schedule and group inference requests based on their characteristics. We realize this idea in TetriInfer through three pillars. First, it partitions prompts into fixed-size...
A distributed database utilizing the wide-spread edge computing servers to provide low-latency data access with a serializability guarantee is highly desirable for emerging edge computing applications. In an edge database, nodes are divided into regions, and a transaction can be categorized as intra-region (IRT) or cross-region (CRT) based on whether it accesses data in different regions. In addition to serializability, we insist that a practical edge database should provide low tail latency for both IRTs and CRTs, and such low latency must be scalable to a large number of...
Numerous blockchain systems with various consensus protocols have emerged to achieve high transaction rates ($2\sim10$K tps). However, their underlying P2P network primitives constrain further improvements due to two problems: (i) message redundancy and (ii) long broadcast convergence time. The first problem is caused by the excessive robustness of the dominant P2P approach, Gossip. All state-of-the-art consensus protocols only tolerate 20-50%...
With the trend of processing a large volume of sensitive data on PaaS services (e.g., DNN training), a TEE architecture that supports general heterogeneous accelerators, enables spatial sharing on one accelerator, and enforces strong isolation across accelerators is highly desirable. However, none of the existing TEE solutions meet all three requirements. In this paper, we propose CRONUS, the first TEE architecture that achieves the three crucial requirements. The key idea of CRONUS is to partition computation into isolated enclaves, where each enclave encapsulates...
Pre-trained large language models (LLMs) often need specialization for domain-specific tasks. Low-Rank Adaptation (LoRA) is a popular approach that adapts a base model to multiple tasks by adding lightweight trainable adapters. In this paper, we present CaraServe, a system that efficiently serves many LoRA adapters derived from a common base model. CaraServe maintains the base model on GPUs and dynamically loads activated LoRA adapters from main memory. As GPU loading results in a cold-start that substantially delays token generation, CaraServe employs...
The local invariant feature SURF (Speeded Up Robust Features) is introduced into the robot visual recognition field to handle scale changes, rotation, perspective changes, changes in illumination, and other problems. A Speeded up SURF (SSURF) algorithm is proposed to meet the needs of robot visual identification. In the SSURF algorithm, the main direction determination step is modified so that the search scope becomes {-α, +α} (0 ≤ α ≤ 30°) instead of the original 360°. According to compressed sensing ideas and the interest points distribution histogram, the space...
Applications written in Java have strengths to tackle diverse threats in public clouds, but these applications are still prone to privileged attacks when processing plaintext data. Intel SGX is powerful for tackling these attacks, and traditional SGX systems rewrite a Java application's sensitive functions, which process plaintext data, using the C/C++ SGX API. Although this code-rewrite approach achieves good efficiency and a small TCB, it requires SGX expert knowledge and can be tedious and error-prone. To tackle the limitations of rewriting Java to C/C++, recent SGX systems propose...
Training a large DNN (e.g., GPT3) efficiently on commodity clouds is challenging even with the latest 3D parallel training systems (e.g., Megatron v3.0). In particular, along the pipeline parallelism dimension, computational tasks that produce a whole DNN's gradients with multiple input batches should be concurrently activated; along the data parallelism dimension, a set of heavy-weight communications (for aggregating the accumulated outputs of computational tasks)...
Supernet training, a prevalent and important paradigm in Neural Architecture Search, embeds the whole DNN architecture search space into one monolithic supernet, iteratively activates a subset of the supernet (i.e., a subnet) for fitting each batch of data, and searches for a high-quality subnet which meets specific requirements. Although training subnets in parallel on multiple GPUs is desirable for acceleration, there inherently exists a race hazard in that concurrent subnets may access the same DNN layers. Existing systems support...
Objective: To investigate the efficacy of tyrosine kinase inhibitors (TKIs) in the treatment of metastatic renal cell carcinoma (mRCC) with rhabdoid (mRCC‐R) and sarcomatoid (mRCC‐S) differentiations. Materials and Methods: In this single‐institutional cohort study, we included patients with RCC with rhabdoid (RCC‐R) and sarcomatoid (RCC‐S) differentiation, who were treated with TKIs after metastasis at our institute from 2013 to 2021. Patient characteristics, treatments, and clinical outcomes were recorded and analyzed. Results: We identified 111...
Traditional anonymous networks (e.g., Tor) are vulnerable to traffic analysis attacks that monitor the whole network traffic to determine which users are communicating. To preserve user anonymity against traffic analysis attacks, the emerging mix networks mess up the order of packets through a set of centralized and explicit shuffling nodes. However, this centralized design is insecure against targeted DoS attacks that can completely block these shuffling nodes. In this article, we present DAENet...
Objective: To examine the prognostic significance of the expression of platelet‐derived growth factor‐BB (PDGF‐BB) and differentiated microvascular density (MVD) in patients with clear cell renal cell carcinoma (ccRCC). Patients and Methods: We used the vascular marker cluster of differentiation 34 (CD34) to identify tumour blood vessels. The expression of PDGF‐BB and CD34 was detected by immunohistochemistry (IHC) in tissue microarrays (TMAs) from 100 ccRCCs. Prognostic effects of individual parameters were calculated using Cox regression models...
Cloud computing enables more and more online services to be deployed in virtual machines (VMs), making fast VM fault tolerance particularly crucial. Unfortunately, despite much effort, achieving fast VM fault tolerance remains an open problem. A traditional way to provide VM fault tolerance is the active-passive approach, which frequently transfers tremendous updated states, including memory and storage, of a primary VM to a suspended secondary VM. The other emerging way, namely the active-active approach, runs the secondary VM concurrently with the primary. Compared with active-passive, active-active is faster...
China has entered a period in which emergencies occur with high frequency. Lack of professional knowledge is one of the main causes of emergency response failure. Therefore, an emergency decision support system is imperative for the reduction of disaster losses and the efficiency improvement of emergency resources allocation. Based on the analysis of the emergency decision-making process, an emergency group decision support system (E-GDSS) framework is designed, and its functions are defined to include a case querying system, an assessment system, and so on. A prototype system is developed in the context of public health emergencies. Practical...
Existing permissioned blockchain systems designate a fixed and explicit group of committee nodes to run a consensus protocol that confirms the same sequence of blocks among all nodes. Unfortunately, when such a system runs at large scale on the Internet, these explicit committee nodes can be easily turned down by denial-of-service (DoS) or network partition attacks. Although recent studies proposed scalable BFT protocols that run on a larger number of committee nodes, these protocols' efficiency drops dramatically when only a small number of nodes are attacked.