- Parallel Computing and Optimization Techniques
- Embedded Systems Design Techniques
- Interconnection Networks and Systems
- Advanced Neural Network Applications
- Advanced Memory and Neural Computing
- Ferroelectric and Negative Capacitance Devices
- VLSI and FPGA Design Techniques
- Acute Myeloid Leukemia Research
- Innovative Microfluidic and Catalytic Techniques Innovation
- CCD and CMOS Imaging Sensors
- Green IT and Sustainability
- Advanced Data Storage Technologies
- Protein Degradation and Inhibitors
- Caching and Content Delivery
- Ginseng Biological Effects and Applications
- Low-power high-performance VLSI design
- Traditional Chinese Medicine Analysis
- Algorithms and Data Compression
- IoT and Edge/Fog Computing
- Cloud Computing and Resource Management
- Domain Adaptation and Few-Shot Learning
- 3D Printing in Biomedical Research
- Histone Deacetylase Inhibitors Research
- Video Surveillance and Tracking Methods
- Anomaly Detection Techniques and Applications
First Affiliated Hospital of Zhengzhou University
2021-2025
Brown University
2024-2025
John Brown University
2024-2025
University of Pittsburgh
2021-2024
St. Jude Children's Research Hospital
2020-2024
Swanson Center
2023
Zhengzhou University
2023
University of Maryland, College Park
2023
University of California, Los Angeles
2014-2021
Anqing Normal University
2021
With the recent advancement of multilayer convolutional neural networks (CNN), deep learning has achieved amazing success in many areas, especially in visual content understanding and classification. To improve the performance and energy efficiency of the computation-demanding CNN, FPGA-based acceleration emerges as one of the most attractive alternatives. In this paper we design and implement Caffeine, a hardware/software co-designed library to efficiently accelerate the entire CNN on FPGAs. First, we propose a uniformed...
With the recent advancement of multilayer convolutional neural networks (CNNs) and fully connected networks (FCNs), deep learning has achieved amazing success in many areas, especially in visual content understanding and classification. To improve the performance and energy efficiency of the computation-demanding CNN, FPGA-based acceleration emerges as one of the most attractive alternatives. In this paper, we design and implement Caffeine, a hardware/software co-designed library to efficiently accelerate the entire CNN and FCN on FPGAs....
With the recent advancement of multilayer convolutional neural networks (CNN), deep learning has achieved amazing success in many areas, especially in visual content understanding and classification. To improve the performance and energy efficiency of the computation-demanding CNN, FPGA-based acceleration emerges as one of the most attractive alternatives. In this paper we design and implement Caffeine, a hardware/software co-designed library to efficiently accelerate the entire CNN on FPGAs. Based on a portable high-level...
CD8+ cytotoxic T cells (CTLs) orchestrate antitumour immunity and exhibit inherent heterogeneity¹,², with precursor exhausted T (Tpex) cells but not terminally exhausted T (Tex) cells capable of responding to existing immunotherapies³⁻⁷. The gene regulatory network that underlies CTL differentiation, and whether Tex cell responses can be functionally reinvigorated, are incompletely understood. Here we systematically mapped causal gene regulatory networks using single-cell CRISPR screens in vivo and discovered checkpoints for CTL differentiation....
Stencil computation is one of the most important kernels in many application domains such as image processing, solving partial differential equations, and cellular automata. Many stencil kernels are complex, usually consisting of multiple stages or iterations, and are often computation-bound. Such kernels are often offloaded to FPGAs to take advantage of the efficiency of dedicated hardware. However, implementing such complex kernels efficiently is not trivial, due to complicated data dependencies, the difficulty of programming FPGAs with RTL, and the large design space....
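The Python sketch below shows what a multi-iteration stencil kernel looks like. It is not the FPGA design the abstract describes. It applies a 5-point Jacobi stencil to a 2D grid. The function names, the 0.25 averaging coefficients, and the fixed-boundary rule are assumptions chosen for illustration. Each iteration reads only values produced by the previous iteration. This is the kind of cross-iteration data dependency that makes chaining several iterations in hardware non-trivial.

```python
def jacobi_step(grid):
    """Apply one 5-point stencil iteration; boundary cells stay fixed (assumed rule)."""
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]  # copy so the input grid is not modified
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            # New value depends only on the four neighbours from the previous iteration.
            out[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                + grid[i][j - 1] + grid[i][j + 1])
    return out


def jacobi(grid, iterations):
    """Run the stencil for a fixed number of iterations."""
    for _ in range(iterations):
        grid = jacobi_step(grid)
    return grid


# Hypothetical example: a hot top edge diffusing into a 4x4 grid.
g = [[1.0] * 4] + [[0.0] * 4 for _ in range(3)]
result = jacobi(g, 2)
```

Because `jacobi_step` writes into a fresh copy, every cell in an iteration can be computed independently. The dependency exists only between successive iterations.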
Future processor chips will not be limited by transistor resources, but will mainly be constrained by energy efficiency. Reconfigurable fabrics bring higher energy efficiency than CPUs via customized hardware that adapts to user applications. Among different reconfigurable fabrics, coarse-grained reconfigurable arrays (CGRAs) can be even more efficient than fine-grained FPGAs when bit-level customization is not necessary in target applications. CGRAs were originally developed in an era when transistor resources were more critical than energy efficiency. Previous work shares hardware among different operations via modulo...
Oncolytic virotherapy is one of the modern experimental techniques to treat human cancers. Here we studied the antitumor activity of wild-type Newcastle disease virus (NDV) isolates from Russian migratory birds. We showed that NDV could selectively kill malignant cells without affecting healthy cells. We evaluated the oncolytic effect of 44 NDV isolates in 4 histogenetically different cell lines (HCT116, HeLa, A549, MCF7). Safety was also tested on normal peripheral blood mononuclear cells (PBMC). The viability of tumor cells after...
High-level synthesis (HLS) is getting increasing attention from both academia and industry for high-quality and high-productivity designs. However, when inferring primitive-type arrays in HLS designs into on-chip memory buffers, commercial HLS tools fail to effectively organize FPGAs' on-chip BRAM building blocks to realize high-bandwidth data communication; this often leads to sub-optimal quality of results. This paper addresses this issue via automated on-chip buffer restructuring. Specifically, we present three...
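The abstract is truncated before it lists its three restructuring techniques. The sketch below therefore illustrates only the baseline idea of spreading one array across several memory banks. This is what HLS array-partitioning directives do, for example the Vitis HLS `ARRAY_PARTITION` pragma with `cyclic` or `block` type. The helper names are hypothetical. The sketch also assumes a simplified model in which each bank serves one access per cycle, whereas real BRAMs typically offer two ports. It checks whether one unrolled group of reads lands in distinct banks.

```python
def cyclic_bank(index, num_banks):
    """Cyclic partitioning: element i lives in bank i % B at offset i // B."""
    return index % num_banks, index // num_banks


def block_bank(index, length, num_banks):
    """Block partitioning: the array is cut into B contiguous chunks."""
    chunk = -(-length // num_banks)  # ceiling division: elements per bank
    return index // chunk, index % chunk


def conflict_free(indices, bank_of):
    """True if all accesses in one parallel group map to distinct banks
    (simplified model: each bank serves one access per cycle)."""
    banks = [bank_of(i)[0] for i in indices]
    return len(set(banks)) == len(banks)


# Hypothetical unrolled loop body reading a[4], a[5], a[6], a[7] in the same cycle.
group = range(4, 8)
print(conflict_free(group, lambda i: cyclic_bank(i, 4)))     # True: banks 0, 1, 2, 3
print(conflict_free(group, lambda i: block_bank(i, 16, 4)))  # False: all in bank 1
```

With cyclic partitioning, consecutive indices fall into different banks, so the four reads can proceed in parallel. With block partitioning of a 16-element array, they all fall into one bank and would be serialized under this model.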
Dense matrix multiply (MM) serves as one of the most heavily used kernels in deep learning applications. To cope with the high computation demands of these applications, heterogeneous architectures featuring both FPGA and dedicated ASIC accelerators have emerged as promising platforms. For example, the AMD/Xilinx Versal ACAP architecture combines general-purpose CPU cores and programmable logic (PL) with AI Engine processors (AIE) optimized for AI/ML. An array of 400 AIEs executing at 1 GHz can theoretically provide up to...
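Dense MM maps naturally onto an array of compute engines because it decomposes into independent tile-level products. The sketch below computes C = A B one tile at a time. In an accelerator, each tile-level product could be dispatched to a separate engine, but here it is plain Python. The function name and the square `tile` parameter are illustrative assumptions, not the mapping the paper proposes.

```python
def tiled_matmul(a, b, tile):
    """Compute C = A @ B (lists of lists) by iterating over square tiles."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):          # tile row of C
        for j0 in range(0, m, tile):      # tile column of C
            for k0 in range(0, k, tile):  # reduction tiles accumulate into C
                # One tile-level product: C[i0.., j0..] += A[i0.., k0..] @ B[k0.., j0..]
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        acc = c[i][j]
                        for t in range(k0, min(k0 + tile, k)):
                            acc += a[i][t] * b[t][j]
                        c[i][j] = acc
    return c


# Hypothetical example: a 2x3 by 3x2 product with 2x2 tiles (edge tiles are partial).
a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
b = [[7.0, 8.0], [9.0, 10.0], [11.0, 12.0]]
c = tiled_matmul(a, b, tile=2)
```

The tile size changes only the order of accumulation, not the result. On real hardware it instead determines how much data each engine must hold on-chip.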
Background: In patients with hepatitis B virus (HBV)-related hepatocellular carcinoma (HCC), virus-specific cytotoxic T lymphocytes (CTLs) fail to eliminate HCC cells expressing HBV antigens. As the expression of viral antigen in HBV-associated HCC may decrease to allow tumor cells to escape immune attacks, we hypothesized that an HBV surface antigen (HBsAg)-specific affinity-improved T-cell receptor (TCR) will enable CTLs to target HCC cells more effectively than the corresponding wild-type TCR. We also postulated that TCR promiscuity can be...
Multi-head self-attention (attention mechanism) has been employed in a variety of fields such as machine translation, language modeling, and image processing due to its superiority in feature extraction and sequential data analysis. This stems from the large number of parameters and the sophisticated model architecture behind the attention mechanism. To efficiently deploy the attention mechanism on resource-constrained devices, existing works propose to reduce the model size by building a customized smaller model or compressing a big standard...
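For reference, the sketch below spells out standard multi-head scaled dot-product self-attention in dependency-free Python. Each head computes softmax(QK^T / sqrt(d)) V, and the head outputs are concatenated. The function names are hypothetical, and the final output projection used in Transformers is omitted for brevity. The sketch shows the weight-heavy computation that model compression targets (three projection matrices per head). It is not the method proposed in the abstract.

```python
import math


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both as lists of lists."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax(row):
    peak = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V; also returns weights."""
    d = len(q[0])
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, v), weights


def multi_head_self_attention(x, heads):
    """heads: list of (w_q, w_k, w_v) projection matrices, one tuple per head.
    Every head attends over the same input x; outputs are concatenated per token.
    The final output projection is omitted (simplifying assumption)."""
    outputs = []
    for w_q, w_k, w_v in heads:
        out, _ = attention(matmul(x, w_q), matmul(x, w_k), matmul(x, w_v))
        outputs.append(out)
    return [sum((o[i] for o in outputs), []) for i in range(len(x))]


# Hypothetical example: two tokens, d_model = 2, two heads with identity projections.
identity = [[1.0, 0.0], [0.0, 1.0]]
x = [[1.0, 0.0], [0.0, 1.0]]
y = multi_head_self_attention(x, [(identity, identity, identity)] * 2)
```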
With the increase in the computation intensity of the chip, the mismatch between layer shapes and available computation resources significantly limits the utilization of the chip. Driven by this observation, prior works discuss spatial accelerators or dataflow architectures to maximize throughput. However, using spatial accelerators could potentially increase execution latency. In this work, we first systematically investigate two models: (1) sequentially (temporally) launching one monolithic accelerator, and (2) spatially launching multiple accelerators. From our observations, we find...
As the increasing complexity of neural network (NN) models leads to high demands for computation, AMD introduces a heterogeneous programmable system-on-chip (SoC), i.e., the Versal ACAP architecture, featuring programmable logic (PL), CPUs, and dedicated AI engine (AIE) ASICs, with a theoretical throughput of up to 6.4 TFLOPs for FP32, 25.6 TOPs for INT16, and 102.4 TOPs for INT8. However, the higher level of heterogeneity makes it non-trivial to achieve high performance even for well-studied applications like matrix-matrix multiply. In this paper, we provide...
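The quoted peak figures are consistent with a simple product of tile count, clock rate, and per-tile vector MAC rate. The sketch below makes several assumptions. It uses 400 AIE tiles at 1 GHz, which are the array size and clock quoted in the dense-MM abstract. It assumes per-tile rates of 8 FP32, 32 INT16, and 128 INT8 MACs per cycle, chosen because they reproduce the 6.4 / 25.6 / 102.4 figures. It counts each MAC as two operations (multiply and add). It is back-of-the-envelope arithmetic, not a performance model.

```python
# Assumed per-tile vector MACs per cycle (inputs to this sketch, not taken from the abstract).
MACS_PER_CYCLE = {"FP32": 8, "INT16": 32, "INT8": 128}


def peak_tera_ops(num_tiles, clock_ghz, macs_per_cycle):
    """Peak throughput in tera-operations/s, counting one MAC as 2 ops."""
    ops_per_cycle = 2 * macs_per_cycle * num_tiles
    return ops_per_cycle * clock_ghz * 1e9 / 1e12


for dtype, macs in MACS_PER_CYCLE.items():
    print(dtype, round(peak_tera_ops(400, 1.0, macs), 1))
```

Achieved throughput on applications such as matrix-matrix multiply sits below these numbers whenever data movement or PL-AIE communication cannot keep every tile busy.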
While vision transformers (ViTs) have shown consistent progress in computer vision, deploying them for real-time decision-making scenarios (<1 ms) is challenging. Current computing platforms like CPUs, GPUs, or FPGA-based solutions struggle to meet this deterministic low-latency requirement, even with quantized ViT models. Some approaches use pruning or sparsity to reduce the model size and latency, but this often results in accuracy loss. To address the aforementioned constraints, in this work, we propose EQ-ViT, an...
Adenosine triphosphate (ATP)-dependent chromatin remodeling SWI/SNF-like BAF and PBAF complexes have been implicated in the regulation of stem cell function and cancers. Several subunits of BAF or PBAF, including BRG1, BAF53a, BAF45a, BAF180, and BAF250a, are known to be involved in hematopoiesis. Baf200, a subunit of the PBAF complex, plays a pivotal role in heart morphogenesis and coronary artery angiogenesis. However, little is known about the importance of Baf200 in normal and malignant hematopoiesis. Utilizing Tie2-Cre-, Vav-iCre-, and Mx1-Cre-mediated gene deletion...
Crystallization of active pharmaceutical ingredients (APIs) is a crucial process in the pharmaceutical industry due to its great impact on drug efficacy. However, conventional approaches for screening optimal crystallization conditions for APIs are usually time-consuming, labor-intensive and expensive. Recently, droplet microfluidic technology has offered an alternative strategy for high-throughput screening of crystallization conditions. Despite many advantages such as low sample consumption, reduced operation time, increased throughput, etc.,...
Conventionally, DNN models are trained once in the cloud and deployed on edge devices such as cars, robots, or unmanned aerial vehicles (UAVs) for real-time inference. However, there are many cases that require the models to adapt to new environments, domains, or users. In order to realize such domain adaptation and personalization, the DNN models on edge devices need to be continuously trained on the device. In this work, we design EF-Train, an efficient DNN training accelerator with a unified channel-level parallelism-based convolution kernel that can achieve end-to-end...