Qijing Huang

ORCID: 0000-0001-9084-8520
Research Areas
  • Advanced Neural Network Applications
  • Parallel Computing and Optimization Techniques
  • CCD and CMOS Imaging Sensors
  • Embedded Systems Design Techniques
  • Advanced Memory and Neural Computing
  • Advanced Image and Video Retrieval Techniques
  • VLSI and FPGA Design Techniques
  • Interconnection Networks and Systems
  • Machine Learning in Materials Science
  • Real-Time Systems Scheduling
  • Machine Learning and Algorithms
  • Ferroelectric and Negative Capacitance Devices
  • Domain Adaptation and Few-Shot Learning
  • Software Engineering Research
  • VLSI and Analog Circuit Testing
  • Industrial Vision Systems and Defect Detection
  • Low-power high-performance VLSI design
  • Machine Learning and Data Classification
  • Multimodal Machine Learning Applications
  • Membrane Separation Technologies
  • Software-Defined Networks and 5G
  • Evolutionary Algorithms and Applications
  • Radiation Effects in Electronics
  • Neural Networks and Applications
  • Brain Tumor Detection and Classification

Nvidia (United States)
2023-2025

South China University of Technology
2024

Shanghai Jiao Tong University
2023-2024

First Affiliated Hospital of Guangzhou University of Chinese Medicine
2022-2023

Guangzhou University of Chinese Medicine
2022

University of California, Berkeley
2017-2021

University of Hong Kong
2021

Berkeley College
2021

East China University of Science and Technology
2020

Universidad Técnica de Ambato
2019

We present FireSim, an open-source simulation platform that enables cycle-exact microarchitectural simulation of large scale-out clusters by combining FPGA-accelerated simulation of silicon-proven RTL designs with a scalable, distributed network simulation. Unlike prior tools, FireSim runs on Amazon EC2 F1, a public cloud FPGA platform, which greatly improves usability, provides elasticity, and lowers the cost of large-scale FPGA-based experiments. We describe the design and implementation of FireSim and show how it can provide sufficient...

10.1109/isca.2018.00014 article EN 2018-06-01

Domain specialization under energy constraints in deeply-scaled CMOS has been driving the need for agile development of Systems on a Chip (SoCs). While digital subsystems have design flows that are conducive to rapid iterations from specification to layout, analog and mixed-signal modules face the challenge of a long human-in-the-middle iteration loop that requires expert intuition to verify that post-layout circuit parameters meet the original specification. Existing automated solutions that optimize circuit parameters for a given target...

10.23919/date48585.2020.9116200 article EN Design, Automation & Test in Europe Conference & Exhibition (DATE), 2015 2020-03-01

Using FPGAs to accelerate ConvNets has attracted significant attention in recent years. However, FPGA accelerator design has not leveraged the latest progress of ConvNets. As a result, key application characteristics such as frames-per-second (FPS) are ignored in favor of simply counting GOPs, and results on accuracy, which is critical to application success, are often not even reported. In this work, we adopt an algorithm-hardware co-design approach to develop a ConvNet accelerator called Synetgy and a novel ConvNet model called DiracDeltaNet. Both are tailored...

10.1145/3289602.3293902 preprint EN 2019-02-20

Recent advances in Deep Neural Networks (DNNs) have led to active development of specialized DNN accelerators, many of which feature a large number of processing elements laid out spatially, together with a multi-level memory hierarchy and flexible interconnect. While DNN accelerators can take advantage of data reuse to achieve high peak throughput, they also expose a large number of runtime parameters to the programmers, who need to explicitly manage how computation is scheduled both spatially and temporally. In fact, different...

10.1109/isca52012.2021.00050 article EN 2021-06-01

Recent advances in state-of-the-art DNN architecture design have been moving toward Transformer models. These models achieve superior accuracy across a wide range of applications. This trend has been consistent over the past several years since Transformer models were originally introduced. However, the amount of compute and bandwidth required for inference of recent Transformer models is growing at a significant rate, and this has made their deployment in latency-sensitive applications challenging. As such, there has been an increased focus on making Transformer models more...

10.48550/arxiv.2302.14017 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Membrane materials that resist nonspecific or specific adsorption are urgently required in widespread practical applications, such as water purification, food processing, and life sciences. Inevitable membrane fouling not only limits separation performance, leading to a decline in both permeance and selectivity, but also remarkably increases operation requirements and brings extra maintenance costs and higher energy consumption. In this work, we report a freestanding interfacial polymerization (IP)...

10.1002/anie.202408345 article EN Angewandte Chemie International Edition 2024-06-18

NVDLA is an open-source deep neural network (DNN) accelerator which has received a lot of attention from the community since its introduction by Nvidia. It is a full-featured hardware IP and can serve as a good reference for conducting research and development of SoCs with integrated accelerators. However, an expensive FPGA board is required to do experiments with this IP in a real SoC. Moreover, since NVDLA is clocked at a lower frequency on an FPGA, it would be hard to do accurate performance analysis with such a setup. To overcome these limitations, we...

10.1109/emc249363.2019.00012 article EN 2019-02-01

Deploying deep learning models on embedded systems for computer vision tasks has been challenging due to limited compute resources and strict energy budgets. The majority of existing work focuses on accelerating image classification, while other fundamental vision problems, such as object detection, have not been adequately addressed. Compared with image classification, detection problems are more sensitive to the spatial variance of objects and therefore require specialized convolutions to aggregate spatial information. To address this need,...

10.1145/3431920.3439295 preprint EN 2021-02-17

We consider the impact of compiler optimizations on the quality of high-level synthesis (HLS)-generated FPGA hardware. Using an HLS tool implemented within the state-of-the-art LLVM [1] compiler, we study the effect of compiler optimizations on the hardware metrics of circuit area, execution cycles, Fmax, and wall-clock time. We evaluate 56 different compiler optimizations and show that some optimizations significantly affect hardware quality. Moreover, hardware quality is also affected by the order in which optimizations are applied. We then present a new HLS-directed approach to compiler optimizations, wherein we execute partial profiling at...

10.1109/fccm.2013.50 article EN 2013-04-01

The amount of data being generated in genomics is predicted to be between 2 and 40 exabytes per year for the next decade, making genomic analysis the new frontier and the new challenge for precision medicine. This paper explores targeted deployment of hardware accelerators in the cloud to improve the runtime and throughput of immense-scale genomic data analyses. In particular, INDEL (INsertion/DELetion) realignment is a critical operation that enables diagnostic testing of cancer through error correction prior to variant calling. It is the slowest part of the somatic...

10.1109/hpca.2019.00044 article EN 2019-02-01

Automatic algorithm-hardware co-design for DNNs has shown great success in improving the performance of DNNs on FPGAs. However, this process remains challenging due to the intractable search space of neural network architectures and hardware accelerator implementation. Differing from existing hardware-aware neural architecture search (NAS) algorithms that rely solely on expensive learning-based approaches, our work incorporates integer programming into the search algorithm to prune the design space. Given a set of hardware resource constraints,...

10.1109/fccm51124.2021.00014 article EN 2021-05-01

Embedded system designers can achieve energy and performance benefits by using dedicated hardware accelerators. However, implementing custom hardware accelerators for an application can be difficult and time intensive. LegUp is an open-source high-level synthesis framework that simplifies the hardware accelerator design process [8]. With LegUp, a designer can start from an embedded application running on a processor and incrementally migrate portions of the program to hardware implemented on an FPGA. The final application then executes on an automatically-generated...

10.1109/cases.2013.6662524 article EN 2013-09-01

We consider the impact of compiler optimizations on the quality of high-level synthesis (HLS)-generated field-programmable gate array (FPGA) hardware. Using an HLS tool implemented within the state-of-the-art LLVM compiler, we study the effect of compiler optimizations on the hardware metrics of circuit area, execution cycles, FMax, and wall-clock time. We evaluate 56 different compiler optimizations and show that some optimizations significantly affect hardware quality. Moreover, hardware quality is also affected by optimization parameter values, as well as the order in which optimizations are applied. We then present a new...

10.1145/2629547 article EN ACM Transactions on Reconfigurable Technology and Systems 2015-05-11

Osteosarcoma (OS) is the most common bone cancer, with a high risk of metastasis, high growth rate, and poor prognosis. Honokiol (HNK) is a general ingredient of traditional Chinese medicine with a potential anti-tumor effect. However, HNK is insoluble in water and lacks drug targeting, which limits its clinical application. To improve the OS therapeutic effect of HNK, we used HNK-loaded liposomes modified with hyaluronic acid-phospholipid conjugates (HA-DOPE) to treat OS based on the HA interaction with CD44. The liposomes were prepared via thin-film...

10.2147/ijn.s371934 article EN cc-by-nc International Journal of Nanomedicine 2022-11-01

Embedded system designers can achieve energy and performance benefits by using dedicated hardware accelerators. However, implementing custom hardware accelerators for an application can be difficult and time intensive. LegUp is an open-source high-level synthesis framework that simplifies the hardware accelerator design process [8]. With LegUp, a designer can start from an embedded application running on a processor and incrementally migrate portions of the program to hardware implemented on an FPGA. The final application then executes on an automatically-generated...

10.5555/2555729.2555747 article EN Compilers, Architecture, and Synthesis for Embedded Systems 2013-09-29

The performance of the code generated by a compiler depends on the order in which the optimization passes are applied. In high-level synthesis, the quality of the generated circuit relates directly to the code generated by the front-end compiler. Choosing a good order, often referred to as the phase-ordering problem, is an NP-hard problem. In this paper, we evaluate a new technique to address the phase-ordering problem: deep reinforcement learning. We implement a framework in the context of the LLVM compiler to optimize the ordering for HLS programs and compare the performance of deep reinforcement learning to state-of-the-art algorithms that address the phase-ordering problem. Overall,...

10.1109/fccm.2019.00049 article EN 2019-04-01

Poor time-predictability of multicore processors is a well-known issue that hinders their adoption in real-time systems due to contention in the shared memory resources. In this paper, we present the Bandwidth Regulation Unit (BRU), a drop-in hardware module that enables per-core bandwidth regulation at fine-grained time intervals. Additionally, BRU has the ability to regulate the memory access bandwidth of multiple cores collectively to improve bandwidth utilization. Besides eliminating the overhead of software methods, our evaluation results using...

10.1109/rtas48715.2020.00011 article EN 2020-04-01

Efficient sampling of satisfying formulas for circuit satisfiability (CircuitSAT), a well-known NP-complete problem, is essential in modern front-end applications for thorough testing and verification of digital circuits. Generating such samples is a hard computational problem due to the inherent complexity of digital circuits, the size of the search space, and the resource constraints involved in the process. Addressing these challenges has prompted the development of specialized algorithms that heavily rely on heuristics. However,...

10.1145/3658617.3697760 article EN Proceedings of the 28th Asia and South Pacific Design Automation Conference 2025-01-20

In this work, we present a novel technique for GPU-accelerated Boolean satisfiability (SAT) sampling. Unlike conventional sampling algorithms that directly operate on conjunctive normal form (CNF), our method transforms the logical constraints of SAT problems by factoring their CNF representations into simplified multi-level, multi-output functions. It then leverages gradient-based optimization to guide the search for a diverse set of valid solutions. Our method operates on the circuit structure of the refactored instances,...

10.48550/arxiv.2502.08673 preprint EN arXiv (Cornell University) 2025-02-11