NFDI4DS | UHH-SEMS - Publication Details

Peipei Zhou

ORCID: 0000-0002-0493-1844

About

Contact & Profiles

A5063866156

Research Areas

Parallel Computing and Optimization Techniques
Embedded Systems Design Techniques
Interconnection Networks and Systems
Advanced Neural Network Applications
Advanced Memory and Neural Computing
Ferroelectric and Negative Capacitance Devices
VLSI and FPGA Design Techniques
Acute Myeloid Leukemia Research
Innovative Microfluidic and Catalytic Techniques Innovation
CCD and CMOS Imaging Sensors
Green IT and Sustainability
Advanced Data Storage Technologies
Protein Degradation and Inhibitors
Caching and Content Delivery
Ginseng Biological Effects and Applications
Low-power high-performance VLSI design
Traditional Chinese Medicine Analysis
Algorithms and Data Compression
IoT and Edge/Fog Computing
Cloud Computing and Resource Management
Domain Adaptation and Few-Shot Learning
3D Printing in Biomedical Research
Histone Deacetylase Inhibitors Research
Video Surveillance and Tracking Methods
Anomaly Detection Techniques and Applications

First Affiliated Hospital of Zhengzhou University
2021-2025

Brown University
2024-2025

John Brown University
2024-2025

University of Pittsburgh
2021-2024

St. Jude Children's Research Hospital
2020-2024

Swanson Center
2023

Zhengzhou University
2023

University of Maryland, College Park
2023

University of California, Los Angeles
2014-2021

Anqing Normal University
2021

Caffeine

OPENALEX - Publications

Chen Zhang, Zhenman Fang, Peipei Zhou, Peichen Pan, Jason Cong

With the recent advancement of multilayer convolutional neural networks (CNN), deep learning has achieved amazing success in many areas, especially in visual content understanding and classification. To improve the performance and energy-efficiency of the computation-demanding CNN, FPGA-based acceleration emerges as one of the most attractive alternatives. In this paper, we design and implement Caffeine, a hardware/software co-designed library to efficiently accelerate the entire CNN on FPGAs. First, we propose a uniformed...

10.1145/2966986.2967011 article EN 2016-10-18

Caffeine: Toward Uniformed Representation and Acceleration for Deep Convolutional Neural Networks

Chen Zhang, Guangyu Sun, Zhenman Fang, Peipei Zhou, Peichen Pan, and 1 more

With the recent advancement of multilayer convolutional neural networks (CNNs) and fully connected networks (FCNs), deep learning has achieved amazing success in many areas, especially in visual content understanding and classification. To improve the performance and energy efficiency of the computation-demanding CNN, FPGA-based acceleration emerges as one of the most attractive alternatives. In this paper, we design and implement Caffeine, a hardware/software co-designed library to efficiently accelerate the entire CNN and FCN on FPGAs....

10.1109/tcad.2017.2785257 article EN publisher-specific-oa IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 2018-10-18

Caffeine: Towards Uniformed Representation and Acceleration for Deep Convolutional Neural Networks

Chen Zhang, Guangyu Sun, Zhenman Fang, Peipei Zhou, Jason Cong

With the recent advancement of multilayer convolutional neural networks (CNN), deep learning has achieved amazing success in many areas, especially in visual content understanding and classification. To improve the performance and energy-efficiency of the computation-demanding CNN, FPGA-based acceleration emerges as one of the most attractive alternatives. In this paper, we design and implement Caffeine, a hardware/software co-designed library to efficiently accelerate the entire CNN on FPGAs. Based on portable high-level...

10.1145/3603165.3607390 article EN 2023-07-28

Single-cell CRISPR screens in vivo map T cell fate regulomes in cancer

Peipei Zhou, Hao Shi, Hongling Huang, Xiang Sun, Sujing Yuan, and 7 more

CD8+ cytotoxic T cells (CTLs) orchestrate antitumour immunity and exhibit inherent heterogeneity [1,2], with precursor exhausted T (Tpex) cells but not terminally exhausted T (Tex) cells capable of responding to existing immunotherapies [3-7]. The gene regulatory network that underlies CTL differentiation and whether Tex cell responses can be functionally reinvigorated are incompletely understood. Here we systematically mapped causal gene regulatory networks using single-cell CRISPR screens in vivo and discovered checkpoints for CTL differentiation....

10.1038/s41586-023-06733-x article EN cc-by Nature 2023-11-15

SODA

Yuze Chi, Jason Cong, Peng Wei, Peipei Zhou

Stencil computation is one of the most important kernels in many application domains such as image processing, solving partial differential equations, and cellular automata. Many of the stencil kernels are complex, usually consist of multiple stages or iterations, and are often computation-bounded. Such kernels are often offloaded to FPGAs to take advantage of the efficiency of dedicated hardware. However, implementing such complex kernels efficiently is not trivial, due to complicated data dependencies, difficulties of programming FPGAs with RTL, as well as the large design space....

10.1145/3240765.3240850 article EN 2018-11-05

A Fully Pipelined and Dynamically Composable Architecture of CGRA

Jason Cong, Hui Huang, Chiyuan Ma, Bingjun Xiao, Peipei Zhou

Future processor chips will not be limited by the transistor resources, but will be mainly constrained by energy efficiency. Reconfigurable fabrics bring higher energy efficiency than CPUs via customized hardware that adapts to user applications. Among different reconfigurable fabrics, coarse-grained reconfigurable arrays (CGRAs) can be even more efficient than fine-grained FPGAs when bit-level customization is not necessary in target applications. CGRAs were originally developed in the era when transistor resources were more critical than energy efficiency. Previous work shares hardware among different operations via modulo...

10.1109/fccm.2014.12 article EN 2014-05-01

Oncolytic effect of wild-type Newcastle disease virus isolates in cancer cell lines in vitro and in vivo on xenograft model

Kseniya S. Yurchenko, Peipei Zhou, А. В. Ковнер, E. L. Zavjalov, Л. В. Шестопалова, and 1 more

Oncolytic virotherapy is one of the modern experimental techniques to treat human cancers. Here we studied the antitumor activity of wild-type Newcastle disease virus (NDV) isolates from Russian migratory birds. We showed that NDV could selectively kill malignant cells without affecting healthy cells. We evaluated the oncolytic effect of 44 NDV isolates in 4 histogenetically different human cancer cell lines (HCT116, HeLa, A549, MCF7). The safety of the isolates was also tested in normal peripheral blood mononuclear cells (PBMC). The viability of tumor cells after...

10.1371/journal.pone.0195425 article EN cc-by PLoS ONE 2018-04-05

Bandwidth Optimization Through On-Chip Memory Restructuring for HLS

Jason Cong, Peng Wei, Cody Hao Yu, Peipei Zhou

High-level synthesis (HLS) is getting increasing attention from both academia and industry for high-quality and high-productivity designs. However, when inferring primitive-type arrays in HLS designs into on-chip memory buffers, commercial HLS tools fail to effectively organize FPGAs' on-chip BRAM building blocks to realize high-bandwidth data communication; this often leads to sub-optimal quality of results. This paper addresses this issue via automated on-chip buffer restructuring. Specifically, we present three...

10.1145/3061639.3062208 article EN 2017-06-13

CHARM: Composing Heterogeneous AcceleRators for Matrix Multiply on Versal ACAP Architecture

Jinming Zhuang, Jason Lau, Hanchen Ye, Zhuoping Yang, Yubo Du, and 8 more

Dense matrix multiply (MM) serves as one of the most heavily used kernels in deep learning applications. To cope with the high computation demands of these applications, heterogeneous architectures featuring both FPGA and dedicated ASIC accelerators have emerged as promising platforms. For example, the AMD/Xilinx Versal ACAP architecture combines general-purpose CPU cores and programmable logic (PL) with AI Engine processors (AIE) optimized for AI/ML. An array of 400 AI Engine processors executing at 1 GHz can theoretically provide up to...

10.1145/3543622.3573210 article EN cc-by-nc-sa 2023-02-10

In vivo therapeutic effects of affinity-improved-TCR engineered T-cells on HBV-related hepatocellular carcinoma

Qi Liu, Ye Tian, Yanyan Li, Wei Zhang, Wenxuan Cai, and 7 more

Background: In patients with hepatitis B virus (HBV)-related hepatocellular carcinoma (HCC), virus-specific cytotoxic T lymphocytes (CTLs) fail to eliminate HCC cells expressing HBV antigens. As the expression of viral antigen in HBV-associated HCC may decrease to allow the tumor to escape immune attacks, we hypothesized that an HBV surface antigen (HBsAg)-specific affinity-improved T-cell receptor (TCR) will enable T cells to target HCC more effectively than the corresponding wild-type TCR. We also postulated that TCR promiscuity can be...

10.1136/jitc-2020-001748 article EN cc-by-nc Journal for ImmunoTherapy of Cancer 2020-12-01

Algorithm-hardware Co-design of Attention Mechanism on FPGA Devices

Xinyi Zhang, Yawen Wu, Peipei Zhou, Xulong Tang, Jingtong Hu

Multi-head self-attention (attention mechanism) has been employed in a variety of fields such as machine translation, language modeling, and image processing due to its superiority in feature extraction and sequential data analysis. This benefits from the large number of parameters and the sophisticated model architecture behind the attention mechanism. To efficiently deploy the attention mechanism on resource-constrained devices, existing works propose to reduce the model size by building a customized smaller model or compressing a big standard...

10.1145/3477002 article EN ACM Transactions on Embedded Computing Systems 2021-09-17

Development and comparison of three 89Zr-labeled anti-CLDN18.2 antibodies to noninvasively evaluate CLDN18.2 expression in gastric cancer: a preclinical study

Guilan Hu, Wenjia Zhu, Yu Liu, Yuan Wang, Zheng Zhang, and 6 more

10.1007/s00259-022-05739-3 article EN European Journal of Nuclear Medicine and Molecular Imaging 2022-03-26

Integrated pharmacokinetics and pharmacometabolomics to reveal the synergistic mechanism of a multicomponent Chinese patent medicine, Mailuo Shutong pills against thromboangiitis obliterans

Xiao-bao Wang, Mengli Wang, Yaojuan Chu, Peipei Zhou, Xiangyu Zhang, and 9 more

10.1016/j.phymed.2023.154709 article EN Phytomedicine 2023-02-08

SSR: Spatial Sequential Hybrid Architecture for Latency Throughput Tradeoff in Transformer Acceleration

Jinming Zhuang, Zhuoping Yang, Shixin Ji, Heng Huang, Alex K. Jones, and 3 more

With the increase in the computation intensity of the chip, the mismatch between computation layer shapes and the available computation resource significantly limits the utilization of the chip. Driven by this observation, prior works discuss spatial accelerators or dataflow architecture to maximize the throughput. However, using spatial accelerators could potentially increase the execution latency. In this work, we first systematically investigate two execution models: (1) sequentially (temporally) launch one monolithic accelerator, and (2) spatially launch multiple accelerators. From the observations, we find...

10.1145/3626202.3637569 preprint EN cc-by-nc-sa 2024-04-01

High Performance, Low Power Matrix Multiply Design on ACAP: from Architecture, Design Challenges and DSE Perspectives

Jinming Zhuang, Zhuoping Yang, Peipei Zhou

As the increasing complexity of Neural Network (NN) models leads to high demands for computation, AMD introduces a heterogeneous programmable system-on-chip (SoC), i.e., Versal ACAP architectures featured with programmable logic (PL), CPUs, and dedicated AI engines (AIE) ASICs, which has a theoretical throughput of up to 6.4 TFLOPs for FP32, 25.6 TOPs for INT16 and 102.4 TOPs for INT8. However, the higher level of complexity makes it non-trivial to achieve the theoretical performance even for well-studied applications like matrix-matrix multiply. In this paper, we provide...

10.1109/dac56929.2023.10247981 article EN 2023-07-09

EQ-ViT: Algorithm-Hardware Co-Design for End-to-End Acceleration of Real-Time Vision Transformer Inference on Versal ACAP Architecture

Peiyan Dong, Jinming Zhuang, Zhuoping Yang, Shixin Ji, Yanyu Li, and 7 more

While vision transformers (ViTs) have shown consistent progress in computer vision, deploying them for real-time decision-making scenarios (<1 ms) is challenging. Current computing platforms like CPUs, GPUs, or FPGA-based solutions struggle to meet this deterministic low-latency real-time requirement, even with quantized ViT models. Some approaches use pruning or sparsity to reduce the model size and latency, but this often results in accuracy loss. To address the aforementioned constraints, in this work, we propose EQ-ViT, an...

10.1109/tcad.2024.3443692 article EN IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 2024-11-01

The chromatin remodeling subunit Baf200 promotes normal hematopoiesis and inhibits leukemogenesis

Lulu Liu, Xiaoling Wan, Peipei Zhou, Xiaoyuan Zhou, Wei Zhang, and 12 more

Adenosine triphosphate (ATP)-dependent chromatin remodeling SWI/SNF-like BAF and PBAF complexes have been implicated in the regulation of stem cell function and cancers. Several subunits of BAF or PBAF, including BRG1, BAF53a, BAF45a, BAF180, and BAF250a, are known to be involved in hematopoiesis. Baf200, a subunit of the PBAF complex, plays a pivotal role in heart morphogenesis and coronary artery angiogenesis. However, little is known on the importance of Baf200 in normal and malignant hematopoiesis. Utilizing Tie2-Cre-, Vav-iCre-, and Mx1-Cre-mediated Baf200 gene deletion...

10.1186/s13045-018-0567-7 article EN cc-by Journal of Hematology & Oncology 2018-02-26

A high-throughput system combining microfluidic hydrogel droplets with deep learning for screening the antisolvent-crystallization conditions of active pharmaceutical ingredients

Zhenning Su, Jinxu He, Peipei Zhou, Lü Huang, Jianhua Zhou

Crystallization of active pharmaceutical ingredients (APIs) is a crucial process in the pharmaceutical industry due to its great impact on drug efficacy. However, conventional approaches for screening the optimal crystallization conditions of APIs are usually time-consuming, labor-intensive and expensive. Recently, droplet microfluidic technology has offered an alternative strategy for high-throughput screening of crystallization conditions. Despite its many advantages such as low sample consumption, reduced operation time, increased throughput, etc.,...

10.1039/d0lc00153h article EN Lab on a Chip 2020-01-01

EF-Train: Enable Efficient On-device CNN Training on FPGA through Data Reshaping for Online Adaptation or Personalization

Yue Tang, Xinyi Zhang, Peipei Zhou, Jingtong Hu

Conventionally, DNN models are trained once in the cloud and deployed in edge devices such as cars, robots, or unmanned aerial vehicles (UAVs) for real-time inference. However, there are many cases that require the models to adapt to new environments, domains, or new users. In order to realize such domain adaption or personalization, the models on devices need to be continuously trained on the device. In this work, we design EF-Train, an efficient DNN training accelerator with a unified channel-level parallelism-based convolution kernel that can achieve end-to-end...

10.1145/3505633 article EN ACM Transactions on Design Automation of Electronic Systems 2022-02-24

SCARIF: Towards Carbon Modeling of Cloud Servers with Accelerators

Shixin Ji, Zhuoping Yang, X. Chen, Stephen Cahoon, Jingtong Hu, and 3 more

10.1109/isvlsi61997.2024.00095 article EN 2024-07-01

MTrain: Enable Efficient CNN Training on Heterogeneous FPGA-Based Edge Servers

Yue Tang, Alex K. Jones, Jinjun Xiong, Peipei Zhou, Jingtong Hu

10.1109/tcad.2025.3541486 article EN IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 2025-01-01

ARIES: An Agile MLIR-Based Compilation Flow for Reconfigurable Devices with AI Engines

Jinming Zhuang, Shaojie Xiang, Hongzheng Chen, Niansong Zhang, Zhuoping Yang, and 3 more

10.1145/3706628.3708870 article EN cc-by-nc-sa 2025-02-26

Towards Accelerator Customization in Real-time Safety-critical Systems

Shixin Ji, X. Chen, Wei Zhang, Zhuoping Yang, Jinming Zhuang, and 6 more

10.1145/3706628.3708841 article EN 2025-02-26
