Peipei Zhou

ORCID: 0000-0002-0493-1844
Research Areas
  • Parallel Computing and Optimization Techniques
  • Embedded Systems Design Techniques
  • Interconnection Networks and Systems
  • Advanced Neural Network Applications
  • Advanced Memory and Neural Computing
  • Ferroelectric and Negative Capacitance Devices
  • VLSI and FPGA Design Techniques
  • Acute Myeloid Leukemia Research
  • Innovative Microfluidic and Catalytic Techniques Innovation
  • CCD and CMOS Imaging Sensors
  • Green IT and Sustainability
  • Advanced Data Storage Technologies
  • Protein Degradation and Inhibitors
  • Caching and Content Delivery
  • Ginseng Biological Effects and Applications
  • Low-power high-performance VLSI design
  • Traditional Chinese Medicine Analysis
  • Algorithms and Data Compression
  • IoT and Edge/Fog Computing
  • Cloud Computing and Resource Management
  • Domain Adaptation and Few-Shot Learning
  • 3D Printing in Biomedical Research
  • Histone Deacetylase Inhibitors Research
  • Video Surveillance and Tracking Methods
  • Anomaly Detection Techniques and Applications

First Affiliated Hospital of Zhengzhou University
2021-2025

Brown University
2024-2025

John Brown University
2024-2025

University of Pittsburgh
2021-2024

St. Jude Children's Research Hospital
2020-2024

Swanson Center
2023

Zhengzhou University
2023

University of Maryland, College Park
2023

University of California, Los Angeles
2014-2021

Anqing Normal University
2021

With the recent advancement of multilayer convolutional neural networks (CNN), deep learning has achieved amazing success in many areas, especially in visual content understanding and classification. To improve the performance and energy efficiency of the computation-demanding CNN, FPGA-based acceleration emerges as one of the most attractive alternatives. In this paper, we design and implement Caffeine, a hardware/software co-designed library to efficiently accelerate the entire CNN on FPGAs. First, we propose a uniformed...

10.1145/2966986.2967011 article EN 2016-10-18

With the recent advancement of multilayer convolutional neural networks (CNNs) and fully connected networks (FCNs), deep learning has achieved amazing success in many areas, especially in visual content understanding and classification. To improve the performance and energy efficiency of the computation-demanding CNN, FPGA-based acceleration emerges as one of the most attractive alternatives. In this paper, we design and implement Caffeine, a hardware/software co-designed library to efficiently accelerate the entire CNN and FCN on FPGAs....

10.1109/tcad.2017.2785257 article EN publisher-specific-oa IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 2018-10-18

With the recent advancement of multilayer convolutional neural networks (CNN), deep learning has achieved amazing success in many areas, especially in visual content understanding and classification. To improve the performance and energy efficiency of the computation-demanding CNN, FPGA-based acceleration emerges as one of the most attractive alternatives. In this paper, we design and implement Caffeine, a hardware/software co-designed library to efficiently accelerate the entire CNN on FPGAs. Based on portable high-level...

10.1145/3603165.3607390 article EN 2023-07-28

CD8+ cytotoxic T cells (CTLs) orchestrate antitumour immunity and exhibit inherent heterogeneity [1,2], with precursor exhausted T (Tpex) cells but not terminally exhausted T (Tex) cells capable of responding to existing immunotherapies [3-7]. The gene regulatory network that underlies CTL differentiation and whether Tex cell responses can be functionally reinvigorated are incompletely understood. Here we systematically mapped causal gene regulatory networks using single-cell CRISPR screens in vivo and discovered checkpoints for CTL differentiation....

10.1038/s41586-023-06733-x article EN cc-by Nature 2023-11-15

Stencil computation is one of the most important kernels in many application domains such as image processing, solving partial differential equations, and cellular automata. Many stencil kernels are complex, usually consist of multiple stages or iterations, and are often computation-bounded. Such kernels are often offloaded to FPGAs to take advantage of the efficiency of dedicated hardware. However, implementing such complex kernels efficiently is not trivial, due to complicated data dependencies, the difficulties of programming FPGAs with RTL, as well as the large design space....

10.1145/3240765.3240850 article EN 2018-11-05

Future processor chips will not be limited by the transistor resources, but will be mainly constrained by energy efficiency. Reconfigurable fabrics bring higher energy efficiency than CPUs via customized hardware that adapts to user applications. Among different reconfigurable fabrics, coarse-grained reconfigurable arrays (CGRAs) can be even more efficient than fine-grained FPGAs when bit-level customization is not necessary in target applications. CGRAs were originally developed in the era when transistor resources were more critical than energy efficiency. Previous work shares hardware among operations via modulo...

10.1109/fccm.2014.12 article EN 2014-05-01

Oncolytic virotherapy is one of the modern experimental techniques to treat human cancers. Here we studied the antitumor activity of wild-type Newcastle disease virus (NDV) isolates from Russian migratory birds. We showed that NDV could selectively kill malignant cells without affecting healthy cells. We evaluated the oncolytic effect of 44 NDV isolates in 4 histogenetically different human cell lines (HCT116, HeLa, A549, MCF7). The safety of the isolates was also tested in normal peripheral blood mononuclear cells (PBMC). The viability of tumor cells after...

10.1371/journal.pone.0195425 article EN cc-by PLoS ONE 2018-04-05

High-level synthesis (HLS) is getting increasing attention from both academia and industry for high-quality and high-productivity designs. However, when inferring primitive-type arrays in HLS designs into on-chip memory buffers, commercial HLS tools fail to effectively organize FPGAs' on-chip BRAM building blocks to realize high-bandwidth data communication; this often leads to sub-optimal quality of results. This paper addresses this issue via automated on-chip buffer restructuring. Specifically, we present three...

10.1145/3061639.3062208 article EN 2017-06-13

Dense matrix multiply (MM) serves as one of the most heavily used kernels in deep learning applications. To cope with the high computation demands of these applications, heterogeneous architectures featuring both FPGA and dedicated ASIC accelerators have emerged as promising platforms. For example, the AMD/Xilinx Versal ACAP architecture combines general-purpose CPU cores and programmable logic (PL) with AI Engine processors (AIE) optimized for AI/ML. An array of 400 AI Engine processors executing at 1 GHz can theoretically provide up to...

10.1145/3543622.3573210 article EN cc-by-nc-sa 2023-02-10

Background: In patients with hepatitis B virus (HBV)-related hepatocellular carcinoma (HCC), virus-specific cytotoxic T lymphocytes (CTLs) fail to eliminate HCC cells expressing HBV antigens. As the expression of viral antigen in HBV-associated HCC may decrease to allow the tumor to escape immune attacks, we hypothesized that an HBV surface antigen (HBsAg)-specific affinity-improved T-cell receptor (TCR) will enable T cells to target HCC more effectively than the corresponding wild-type TCR. We also postulated that TCR promiscuity can be...

10.1136/jitc-2020-001748 article EN cc-by-nc Journal for ImmunoTherapy of Cancer 2020-12-01

Multi-head self-attention (attention mechanism) has been employed in a variety of fields such as machine translation, language modeling, and image processing due to its superiority in feature extraction and sequential data analysis. This benefits from the large number of parameters and the sophisticated model architecture behind the attention mechanism. To efficiently deploy the attention mechanism on resource-constrained devices, existing works propose to reduce the model size by building a customized smaller model or compressing a big standard...

10.1145/3477002 article EN ACM Transactions on Embedded Computing Systems 2021-09-17

With the increase in computation intensity of the chip, the mismatch between computation layer shapes and the available computation resource significantly limits the utilization of the chip. Driven by this observation, prior works discuss spatial accelerators or dataflow architecture to maximize the throughput. However, using spatial accelerators could potentially increase the execution latency. In this work, we first systematically investigate two execution models: (1) sequentially (temporally) launch one monolithic accelerator, and (2) spatially launch multiple accelerators. From the observations, we find...

10.1145/3626202.3637569 preprint EN cc-by-nc-sa 2024-04-01

As the increasing complexity of Neural Network (NN) models leads to high demands for computation, AMD introduces a heterogeneous programmable system-on-chip (SoC), i.e., Versal ACAP architectures featuring programmable logic (PL), CPUs, and dedicated AI engine (AIE) ASICs, which have a theoretical throughput of up to 6.4 TFLOPs for FP32, 25.6 TOPs for INT16 and 102.4 TOPs for INT8. However, the higher level of complexity makes it non-trivial to achieve the theoretical performance even for well-studied applications like matrix-matrix multiply. In this paper, we provide...

10.1109/dac56929.2023.10247981 article EN 2023-07-09

While vision transformers (ViTs) have shown consistent progress in computer vision, deploying them for real-time decision-making scenarios (<1 ms) is challenging. Current computing platforms like CPUs, GPUs, or FPGA-based solutions struggle to meet this deterministic low-latency requirement, even with quantized ViT models. Some approaches use pruning or sparsity to reduce the model size and latency, but this often results in accuracy loss. To address the aforementioned constraints, in this work, we propose EQ-ViT, an...

10.1109/tcad.2024.3443692 article EN IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 2024-11-01

Adenosine triphosphate (ATP)-dependent chromatin remodeling SWI/SNF-like BAF and PBAF complexes have been implicated in the regulation of stem cell function and cancers. Several subunits of BAF or PBAF, including BRG1, BAF53a, BAF45a, BAF180, and BAF250a, are known to be involved in hematopoiesis. Baf200, a subunit of the PBAF complex, plays a pivotal role in heart morphogenesis and coronary artery angiogenesis. However, little is known on the importance of Baf200 in normal and malignant hematopoiesis. Utilizing Tie2-Cre-, Vav-iCre-, and Mx1-Cre-mediated Baf200 gene deletion...

10.1186/s13045-018-0567-7 article EN cc-by Journal of Hematology & Oncology 2018-02-26

Crystallization of active pharmaceutical ingredients (APIs) is a crucial process in the pharmaceutical industry due to its great impact on drug efficacy. However, conventional approaches for screening the optimal crystallization conditions of APIs are usually time-consuming, labor-intensive and expensive. Recently, droplet microfluidic technology has offered an alternative strategy for high-throughput screening of crystallization conditions. Despite its many advantages such as low sample consumption, reduced operation time, increased throughput, etc.,...

10.1039/d0lc00153h article EN Lab on a Chip 2020-01-01

Conventionally, DNN models are trained once in the cloud and deployed in edge devices such as cars, robots, or unmanned aerial vehicles (UAVs) for real-time inference. However, there are many cases that require the models to adapt to new environments, domains, or users. In order to realize such domain adaption or personalization, the models on devices need to be continuously trained on the device. In this work, we design EF-Train, an efficient DNN training accelerator with a unified channel-level parallelism-based convolution kernel that can achieve end-to-end...

10.1145/3505633 article EN ACM Transactions on Design Automation of Electronic Systems 2022-02-24

10.1109/tcad.2025.3541486 article EN IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 2025-01-01