NFDI4DS | UHH-SEMS - Publication Details

Qijing Huang

ORCID: 0000-0001-9084-8520

About

Contact & Profiles

A5014354015

Research Areas

Advanced Neural Network Applications
Parallel Computing and Optimization Techniques
CCD and CMOS Imaging Sensors
Embedded Systems Design Techniques
Advanced Memory and Neural Computing
Advanced Image and Video Retrieval Techniques
VLSI and FPGA Design Techniques
Interconnection Networks and Systems
Machine Learning in Materials Science
Real-Time Systems Scheduling
Machine Learning and Algorithms
Ferroelectric and Negative Capacitance Devices
Domain Adaptation and Few-Shot Learning
Software Engineering Research
VLSI and Analog Circuit Testing
Industrial Vision Systems and Defect Detection
Low-power high-performance VLSI design
Machine Learning and Data Classification
Multimodal Machine Learning Applications
Membrane Separation Technologies
Software-Defined Networks and 5G
Evolutionary Algorithms and Applications
Radiation Effects in Electronics
Neural Networks and Applications
Brain Tumor Detection and Classification

Nvidia (United States)
2023-2025

South China University of Technology
2024

Shanghai Jiao Tong University
2023-2024

First Affiliated Hospital of Guangzhou University of Chinese Medicine
2022-2023

Guangzhou University of Chinese Medicine
2022

University of California, Berkeley
2017-2021

University of Hong Kong
2021

Berkeley College
2021

East China University of Science and Technology
2020

Universidad Técnica de Ambato
2019

FireSim: FPGA-Accelerated Cycle-Exact Scale-Out System Simulation in the Public Cloud

OPENALEX - Publications

Sagar Karandikar Howard Mao Donggyu Kim David Biancolin Alon Amid and 11 more

We present FireSim, an open-source simulation platform that enables cycle-exact microarchitectural simulation of large scale-out clusters by combining FPGA-accelerated simulation of silicon-proven RTL designs with a scalable, distributed network simulation. Unlike prior FPGA-accelerated simulation tools, FireSim runs on Amazon EC2 F1, a public cloud FPGA platform, which greatly improves usability, provides elasticity, and lowers the cost of large-scale FPGA-based experiments. We describe the design and implementation of FireSim and show how it can provide sufficient...

10.1109/isca.2018.00014 article EN 2018-06-01

Dendritic-cell-targeting virus-like particles as potent mRNA vaccine carriers

Di Yin Yiye Zhong Sikai Ling Sicong Lu Xiaoyuan Wang and 26 more

10.1038/s41551-024-01208-4 article EN Nature Biomedical Engineering 2024-05-07

AutoCkt: Deep Reinforcement Learning of Analog Circuit Designs

Keertana Settaluri Ameer Haj-Ali Qijing Huang Kourosh Hakhamaneshi Borivoje Nikolić

Domain specialization under energy constraints in deeply-scaled CMOS has been driving the need for agile development of Systems on a Chip (SoCs). While digital subsystems have design flows that are conducive to rapid iterations from specification to layout, analog and mixed-signal modules face the challenge of a long human-in-the-middle iteration loop that requires expert intuition to verify that post-layout circuit parameters meet the original specification. Existing automated solutions that optimize circuit parameters for a given target...

10.23919/date48585.2020.9116200 article EN Design, Automation & Test in Europe Conference & Exhibition (DATE), 2015 2020-03-01

Synetgy

Yifan Yang Qijing Huang Bichen Wu Tianjun Zhang Liang Ma and 6 more

Using FPGAs to accelerate ConvNets has attracted significant attention in recent years. However, FPGA accelerator design has not leveraged the latest progress of ConvNets. As a result, key application characteristics such as frames-per-second (FPS) are ignored in favor of simply counting GOPs, and results on accuracy, which is critical to application success, are often not even reported. In this work, we adopt an algorithm-hardware co-design approach to develop a ConvNet accelerator called Synetgy and a novel ConvNet model called DiracDeltaNet. Both the accelerator and the ConvNet are tailored...

10.1145/3289602.3293902 preprint EN 2019-02-20

CoSA: Scheduling by Constrained Optimization for Spatial Accelerators

Qijing Huang Minwoo Kang Grace Dinh Thomas Norell Aravind Kalaiah and 3 more

Recent advances in Deep Neural Networks (DNNs) have led to active development of specialized DNN accelerators, many of which feature a large number of processing elements laid out spatially, together with a multi-level memory hierarchy and flexible interconnect. While DNN accelerators can take advantage of data reuse and achieve high peak throughput, they also expose a large number of runtime parameters to the programmers, who need to explicitly manage how computation is scheduled both spatially and temporally. In fact, different...

10.1109/isca52012.2021.00050 article EN 2021-06-01

Full Stack Optimization of Transformer Inference: a Survey

Sehoon Kim Coleman Hooper Thanakul Wattanawong Minwoo Kang Ruohan Yan and 7 more

Recent advances in state-of-the-art DNN architecture design have been moving toward Transformer models. These models achieve superior accuracy across a wide range of applications. This trend has been consistent over the past several years since Transformer models were originally introduced. However, the amount of compute and bandwidth required for inference of recent Transformer models is growing at a significant rate, and this has made their deployment in latency-sensitive applications challenging. As such, there has been an increased focus on making Transformer models more...

10.48550/arxiv.2302.14017 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Antifouling Asymmetric Block Copolymer Nanofilms via Freestanding Interfacial Polymerization for Efficient and Sustainable Water Purification

Yu Chen Kaiyuan Song Ziying Li Yue Su Li Yu and 8 more

Membrane materials that resist nonspecific or specific adsorption are urgently required in widespread practical applications, such as water purification, food processing, and life sciences. In water purification, inevitable membrane fouling not only limits separation performance, leading to a decline in both permeance and selectivity, but also remarkably increases operation requirements and augments extra maintenance costs and higher energy consumption. In this work, we report a freestanding interfacial polymerization (IP)...

10.1002/anie.202408345 article EN Angewandte Chemie International Edition 2024-06-18

Integrating NVIDIA Deep Learning Accelerator (NVDLA) with RISC-V SoC on FireSim

Farzad Farshchi Qijing Huang Heechul Yun

NVDLA is an open-source deep neural network (DNN) accelerator which has received a lot of attention from the community since its introduction by Nvidia. It is a full-featured hardware IP and can serve as a good reference for conducting research and development of SoCs with integrated accelerators. However, an expensive FPGA board is required to do experiments with this IP in a real SoC. Moreover, since NVDLA is clocked at a lower frequency on an FPGA, it would be hard to do accurate performance analysis with such a setup. To overcome these limitations, we...

10.1109/emc249363.2019.00012 article EN 2019-02-01

CoDeNet: Efficient Deployment of Input-Adaptive Object Detection on Embedded FPGAs

Qijing Huang Dequan Wang Zhen Dong Yizhao Gao Yaohui Cai and 4 more

Deploying deep learning models on embedded systems for computer vision tasks has been challenging due to limited compute resources and strict energy budgets. The majority of existing work focuses on accelerating image classification, while other fundamental vision problems, such as object detection, have not been adequately addressed. Compared with image classification, detection problems are more sensitive to the spatial variance of objects and, therefore, require specialized convolutions to aggregate spatial information. To address this need,...

10.1145/3431920.3439295 preprint EN 2021-02-17

The Effect of Compiler Optimizations on High-Level Synthesis for FPGAs

Qijing Huang Ruolong Lian Andrew Canis Jongsok Choi Ryan Xi and 2 more

We consider the impact of compiler optimizations on the quality of high-level synthesis (HLS)-generated FPGA hardware. Using an HLS tool implemented within the state-of-the-art LLVM [1] compiler, we study the effect of compiler optimizations on the hardware metrics of circuit area, execution cycles, Fmax, and wall-clock time. We evaluate 56 different compiler optimizations and show that some optimizations significantly affect hardware quality. Moreover, hardware quality is also affected by the order in which optimizations are applied. We then present a new HLS-directed approach to compiler optimizations, wherein we execute partial HLS and profiling at...

10.1109/fccm.2013.50 article EN 2013-04-01

FPGA Accelerated INDEL Realignment in the Cloud

Lisa Wu David Bruns-Smith Frank Austin Nothaft Qijing Huang Sagar Karandikar and 7 more

The amount of data being generated in genomics is predicted to be between 2 and 40 exabytes per year for the next decade, making genomic analysis the new frontier and the new challenge for precision medicine. This paper explores targeted deployment of hardware accelerators in the cloud to improve the runtime and throughput of immense-scale genomic data analyses. In particular, INDEL (INsertion/DELetion) realignment is a critical operation that enables diagnostic testing of cancer through error correction prior to variant calling. It is the slowest part of the somatic...

10.1109/hpca.2019.00044 article EN 2019-02-01

HAO: Hardware-aware Neural Architecture Optimization for Efficient Inference

Zhen Dong Yizhao Gao Qijing Huang John Wawrzynek Hayden Kwok‐Hay So and 1 more

Automatic algorithm-hardware co-design for DNNs has shown great success in improving the performance of DNNs on FPGAs. However, this co-design process remains challenging due to the intractable search space of neural network architectures and hardware accelerator implementation. Differing from existing hardware-aware neural architecture search (NAS) algorithms that rely solely on expensive learning-based approaches, our work incorporates integer programming into the search algorithm to prune the design space. Given a set of hardware resource constraints,...

10.1109/fccm51124.2021.00014 article EN 2021-05-01

From software to accelerators with LegUp high-level synthesis

Andrew Canis Jongsok Choi Blair Fort Ruolong Lian Qijing Huang and 7 more

Embedded system designers can achieve energy and performance benefits by using dedicated hardware accelerators. However, implementing custom hardware accelerators for an application can be difficult and time intensive. LegUp is an open-source high-level synthesis framework that simplifies the hardware accelerator design process [8]. With LegUp, a designer can start from an embedded application running on a processor and incrementally migrate portions of the program to hardware implemented on an FPGA. The final application then executes on an automatically-generated...

10.1109/cases.2013.6662524 article EN 2013-09-01

The Effect of Compiler Optimizations on High-Level Synthesis-Generated Hardware

Qijing Huang Ruolong Lian Andrew Canis Jongsok Choi Ryan Xi and 3 more

We consider the impact of compiler optimizations on the quality of high-level synthesis (HLS)-generated field-programmable gate array (FPGA) hardware. Using an HLS tool implemented within the state-of-the-art LLVM compiler, we study the effect of compiler optimizations on the hardware metrics of circuit area, execution cycles, FMax, and wall-clock time. We evaluate 56 different compiler optimizations and show that some optimizations significantly affect hardware quality. Moreover, hardware quality is also affected by optimization parameter values, as well as the order in which optimizations are applied. We then present a new...

10.1145/2629547 article EN ACM Transactions on Reconfigurable Technology and Systems 2015-05-11

HA-DOPE-Modified Honokiol-Loaded Liposomes Targeted Therapy for Osteosarcoma

Xiangxiang Zhang Huaen Chen Yang Zhang Qijing Huang Jianjia Feng and 6 more

Osteosarcoma (OS) is the most common bone cancer, with a high risk of metastasis, a high growth rate, and poor prognosis. Honokiol (HNK) is a general ingredient of traditional Chinese medicine with a potential anti-tumor effect. However, HNK is insoluble in water and lacks drug targeting, which limits its clinical application. To improve the OS therapeutic effect of HNK, we used HNK-loaded liposomes modified with hyaluronic acid-phospholipid conjugates (HA-DOPE) to treat OS based on the interaction of HA with CD44. The liposomes were prepared via thin-film...

10.2147/ijn.s371934 article EN cc-by-nc International Journal of Nanomedicine 2022-11-01

ChipVQA: Benchmarking Visual Language Models for Chip Design

Haoyu Yang Qijing Huang Nathaniel Pinckney Walker J. Turner Wenfei Zhou and 4 more

10.23919/date64628.2025.10992791 article EN 2025-03-31

High-Throughput SAT Sampling

Arash Ardakani Minwoo Kang Kevin He Qijing Huang John Wawrzynek

10.23919/date64628.2025.10993248 article EN 2025-03-31

From software to accelerators with LegUp high-level synthesis

Andrew Canis Jongsok Choi Blair Fort Ruolong Lian Qijing Huang and 7 more

10.5555/2555729.2555747 article EN Compilers, Architecture, and Synthesis for Embedded Systems 2013-09-29

AutoPhase: Compiler Phase-Ordering for HLS with Deep Reinforcement Learning

Qijing Huang Ameer Haj-Ali William S. Moses John Xiang Ion Stoica and 2 more

The performance of the code generated by a compiler depends on the order in which the optimization passes are applied. In high-level synthesis, the quality of the generated circuit relates directly to the code generated by the front-end compiler. Choosing a good order, often referred to as the phase-ordering problem, is an NP-hard problem. In this paper, we evaluate a new technique to address the phase-ordering problem: deep reinforcement learning. We implement a framework in the context of the LLVM compiler to optimize the ordering for HLS programs and compare the performance of deep reinforcement learning to state-of-the-art algorithms that address the phase-ordering problem. Overall,...

10.1109/fccm.2019.00049 article EN 2019-04-01

BRU: Bandwidth Regulation Unit for Real-Time Multicore Processors

Farzad Farshchi Qijing Huang Heechul Yun

Poor time-predictability of multicore processors is a well-known issue that hinders their adoption in real-time systems due to contention in the shared memory resources. In this paper, we present the Bandwidth Regulation Unit (BRU), a drop-in hardware module that enables per-core memory bandwidth regulation at fine-grained time intervals. Additionally, BRU has the ability to regulate the memory access bandwidth of multiple cores collectively to improve bandwidth utilization. Besides eliminating the overhead of software regulation methods, our evaluation results using...

10.1109/rtas48715.2020.00011 article EN 2020-04-01

DEMOTIC: A Differentiable Sampler for Multi-Level Digital Circuits

Arash Ardakani Minwoo Kang Kevin He Qijing Huang Vighnesh Iyer and 2 more

Efficient sampling of satisfying formulas for circuit satisfiability (CircuitSAT), a well-known NP-complete problem, is essential in modern front-end applications for thorough testing and verification of digital circuits. Generating such samples is a hard computational problem due to the inherent complexity of digital circuits, the size of the search space, and the resource constraints involved in the process. Addressing these challenges has prompted the development of specialized algorithms that heavily rely on heuristics. However,...

10.1145/3658617.3697760 article EN Proceedings of the 30th Asia and South Pacific Design Automation Conference 2025-01-20

High-Throughput SAT Sampling

Arash Ardakani Minwoo Kang Kevin He Qijing Huang John Wawrzynek

In this work, we present a novel technique for GPU-accelerated Boolean satisfiability (SAT) sampling. Unlike conventional sampling algorithms that directly operate on conjunctive normal form (CNF), our method transforms the logical constraints of SAT problems by factoring their CNF representations into simplified multi-level, multi-output Boolean functions. It then leverages gradient-based optimization to guide the search for a diverse set of valid solutions. Our method operates directly on the circuit structure of refactored SAT instances,...

10.48550/arxiv.2502.08673 preprint EN arXiv (Cornell University) 2025-02-11
