NFDI4DS | UHH-SEMS - Publication Details

Cheng Tan

ORCID: 0000-0003-3727-2889


About

Contact & Profiles

OpenAlex author ID: A5059973546

Research Areas

Embedded Systems Design Techniques
Parallel Computing and Optimization Techniques
Interconnection Networks and Systems
Radio Frequency Integrated Circuit Design
Microwave Engineering and Waveguides
Advanced Neural Network Applications
Advanced Memory and Neural Computing
Advanced Power Amplifier Design
Millimeter-Wave Propagation and Modeling
GaN-based semiconductor devices and materials
Modular Robots and Swarm Intelligence
Fault Detection and Control Systems
Ferroelectric and Negative Capacitance Devices
Adversarial Robustness in Machine Learning
Distributed and Parallel Computing Systems
VLSI and Analog Circuit Testing
Photonic and Optical Devices
Radiation Effects in Electronics
Antenna Design and Optimization
Software Testing and Debugging Techniques
Machine Learning and Data Classification
Pregnancy and preeclampsia studies
Advanced Graph Neural Networks
Membrane Separation Technologies
Software System Performance and Reliability

Affiliations

Google (United States)
2023-2025

National University of Singapore
2008-2025

Peking University
2025

Arizona State University
2025

Aerospace Information Research Institute
2020-2024

Chinese Academy of Sciences
2018-2024

Microsoft (United States)
2022-2024

Bellevue Hospital Center
2023

University of Chinese Academy of Sciences
2020-2023

Microsoft Research (United Kingdom)
2022-2023

A novel hybrid forward osmosis - nanofiltration (FO-NF) process for seawater desalination: Draw solution selection and system configuration

OPENALEX - Publications

Cheng Tan, How Yong Ng

A hybrid forward osmosis-nanofiltration (FO-NF) process for seawater desalination is proposed in this study. Seven potential draw solutions for the FO-NF process were investigated using laboratory-scale forward osmosis (FO) and nanofiltration (NF) test cells. Results from both the FO and NF tests suggested that a feasible FO-NF desalination process, with water fluxes of about 10 L/m2 h, could be achieved. Solute rejection by the NF membrane was maintained at over 99.4% for all seven solutes tested. Four selected ... achieve maximum...

10.5004/dwt.2010.1733 article EN cc-by-nc-nd Desalination and Water Treatment 2010-01-01

I-GCN: A Graph Convolutional Network Accelerator with Runtime Locality Enhancement through Islandization

Tong Geng, Chunshu Wu, Yongan Zhang, Cheng Tan, Chenhao Xie, and 4 more

Graph Convolutional Networks (GCNs) have drawn tremendous attention in the past three years. Compared with other deep learning modalities, high-performance hardware acceleration of GCNs is as critical but even more challenging. The hurdles arise from poor data locality and redundant computation due to the large size, high sparsity, and irregular non-zero distribution of real-world graphs.

10.1145/3466752.3480113 article EN 2021-10-17

OpenCGRA: An Open-Source Unified Framework for Modeling, Testing, and Evaluating CGRAs

Cheng Tan, Chenhao Xie, Ang Li, Kevin Barker, Antonino Tumeo

Coarse-grained reconfigurable arrays (CGRAs), loosely defined as arrays of functional units (e.g., adder, subtractor, multiplier, divider, or larger multi-operation units, but smaller than a general-purpose core) interconnected through a Network-on-Chip, provide higher flexibility than domain-specific ASIC accelerators while offering increased hardware efficiency with respect to fine-grained devices, such as Field Programmable Gate Arrays (FPGAs). The fast-evolving fields of machine learning and edge computing,...

10.1109/iccd50377.2020.00070 article EN 2020 IEEE 38th International Conference on Computer Design (ICCD) 2020-10-01

Ultra-Elastic CGRAs for Irregular Loop Specialization

Christopher Torng, Peitian Pan, Yanghui Ou, Cheng Tan, Christopher Batten

Reconfigurable accelerator fabrics, including coarse-grain reconfigurable arrays (CGRAs), have experienced a resurgence in interest because they allow fast-paced software algorithm development to continue evolving post-fabrication. CGRAs traditionally target regular workloads with data-level parallelism (e.g., neural networks, image processing), but once integrated into an SoC they remain idle and unused for irregular workloads. An emerging trend towards repurposing these resources raises...

10.1109/hpca51647.2021.00042 article EN 2021-02-01

An MLIR-based Compiler Flow for System-Level Design and Hardware Acceleration

Nicolas Bohm Agostini, Serena Curzel, Vinay Amatya, Cheng Tan, Marco Minutoli, and 4 more

The generation of custom hardware accelerators for applications implemented within high-level productive programming frameworks requires considerable manual effort. To automate this process, we introduce SODA-OPT, a compiler tool that extends the MLIR infrastructure. SODA-OPT automatically searches, outlines, tiles, and pre-optimizes relevant code regions to generate high-quality accelerators through high-level synthesis. SODA-OPT can support any high-level programming framework and domain-specific language that interfaces with MLIR. By leveraging MLIR, SODA-OPT solves...

10.1145/3508352.3549424 article EN Proceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design 2022-10-30

An X/Ku Dual-Band Switch-Free Reconfigurable GaAs LNA MMIC Based on Coupled Line

Chunshuang Xie, Zhongjun Yu, Cheng Tan

This article presents an X/Ku dual-band switch-free reconfigurable GaAs low-noise amplifier (LNA) realized by inter-stage and output-stage coupled lines. This is the first LNA design in a coupled-line structure. After amplification by a broadband drive stage, the input signal is divided into two parallel single-band stages (a high-band stage and a low-band stage) by the proposed coupled line. The two split-band signals are combined by a coupled line at the output port after the single-band stages. The coupled lines are also part of the matching networks. Dual-band operation is achieved...

10.1109/access.2020.3020396 article EN cc-by IEEE Access 2020-01-01

AURORA: Automated Refinement of Coarse-Grained Reconfigurable Accelerators

Cheng Tan, Chenhao Xie, Ang Li, Kevin Barker, Antonino Tumeo

Coarse-grained reconfigurable arrays (CGRAs), loosely defined as arrays of functional units interconnected through a network-on-chip (NoC), provide higher flexibility than domain-specific ASIC accelerators while offering increased hardware efficiency with respect to fine-grained devices, such as Field Programmable Gate Arrays (FPGAs). Unfortunately, designing a CGRA for a specific application domain involves enormous software/hardware engineering effort (e.g., designing the CGRA, mapping operations onto it, etc.) and...

10.23919/date51398.2021.9473955 article EN 2021 Design, Automation & Test in Europe Conference & Exhibition (DATE) 2021-02-01

Bridging Python to Silicon: The SODA Toolchain

Nicolas Bohm Agostini, Serena Curzel, Jeff Zhang, Ankur Limaye, Cheng Tan, and 7 more

Systems performing scientific computing, data analysis, and machine learning tasks have a growing demand for application-specific accelerators that can provide high computational performance while meeting strict size and power requirements. However, the algorithms and applications that need to be accelerated are evolving at a rate that is incompatible with manual design processes based on hardware description languages. Agile tools and compiler techniques can help by quickly producing an application-specific integrated circuit (ASIC)...

10.1109/mm.2022.3178580 article EN IEEE Micro 2022-06-01

Analysis and Design of a 2-40.5 GHz Low Noise Amplifier With Multiple Bandwidth Expansion Techniques

Jiaxuan Li, Jialong Zeng, Yang Yuan, Ding He, Jingxin Fan, and 2 more

This paper analyzes the main factors limiting the bandwidth expansion of low-noise amplifiers (LNAs) and designs a broadband LNA covering 2-40.5 GHz. The LNA is designed using multiple bandwidth expansion methods, including cascode, resistance feedback, and cascode Darlington amplifier. The amplitude-frequency characteristics and principles of the three structures are studied theoretically based on the small-signal equivalent circuit model. Thanks to these techniques, the three-stage LNA in a 0.15-μm GaAs pseudomorphic high-electron-mobility transistor (pHEMT)...

10.1109/access.2023.3243090 article EN cc-by IEEE Access 2023-01-01

ML-CGRA: An Integrated Compilation Framework to Enable Efficient Machine Learning Acceleration on CGRAs

Yixuan Luo, Cheng Tan, Nicolas Bohm Agostini, Ang Li, Antonino Tumeo, and 2 more

Coarse-Grained Reconfigurable Arrays (CGRAs) can achieve higher energy efficiency than general-purpose processors and accelerators or fine-grained reconfigurable devices, while maintaining adaptability to different computational patterns. CGRAs have shown some success as a platform to accelerate machine learning (ML) thanks to their flexibility, which allows them to support new models not considered by fixed accelerators. However, current solutions for CGRAs employ low-level instruction-based compiler...

10.1109/dac56929.2023.10247873 article EN 2023-07-09

Approximation-aware scheduling on heterogeneous multi-core architectures

Cheng Tan, Thannirmalai Somu Muthukaruppan, Tulika Mitra, Lei Ju

The high performance demand of embedded systems, along with the restrictive thermal design power (TDP) constraint, has led to the emergence of heterogeneous multi-core architectures, where cores with the same instruction-set architecture but different power-performance characteristics provide new opportunities for energy-efficient computing. Heterogeneity introduces challenges in scheduling tasks to appropriate cores and selecting the frequency assignment for each core. In this paper, we introduce an approximation-aware...

10.1109/aspdac.2015.7059077 article EN 2015-01-01

Enhancing CGRA Efficiency Through Aligned Compute and Communication Provisioning

Zhaoying Li, Pranav Dangi, Chenyang Yin, Thilini Kaushalya Bandara, Rohan Juneja, and 3 more

Coarse-grained Reconfigurable Arrays (CGRAs) are domain-agnostic accelerators that enhance the energy efficiency of resource-constrained edge devices. The CGRA landscape is diverse, exhibiting trade-offs between performance, efficiency, and architectural specialization. However, CGRAs often overprovision communication resources relative to their modest computing capabilities. This occurs because the theoretically provisioned programmability for CGRAs proves superfluous in practical implementations. In...

10.1145/3669940.3707230 preprint EN 2025-02-03

PICACHU: Plug-In CGRA Handling Upcoming Nonlinear Operations in LLMs

Jiaxiang Qin, Tianhua Xia, Cheng Tan, Jeff Zhang, Sai Qian Zhang

10.1145/3676641.3716013 article EN 2025-03-27

Click-Through Rate Prediction with Multi-Behavior Sequences and Shared Interest Learning

Biao Jin, Cheng Tan, Yunjie Xu, Wenqiang Jin

10.1016/j.im.2025.104177 article EN Information & Management 2025-05-01

Stitch: Fusible Heterogeneous Accelerators Enmeshed with Many-Core Architecture for Wearables

Cheng Tan, Manupa Karunaratne, Tulika Mitra, Li-Shiuan Peh

Wearable devices are now leveraging multi-core processors to cater to the increasing computational demands of applications via multi-threading. However, the power and performance constraints of many wearable applications can only be satisfied when thread-level parallelism is coupled with hardware acceleration of common kernels. ASIC accelerators with high performance/watt suffer from high non-recurring engineering costs. Configurable accelerators that can be reused across applications present a promising alternative. Autonomous configurable accelerators loosely coupled...

10.1109/isca.2018.00054 article EN 2018-06-01

OpenCGRA: Democratizing Coarse-Grained Reconfigurable Arrays

Cheng Tan, Nicolas Bohm Agostini, Jeff Zhang, Marco Minutoli, Vito Giovanni Castellana, and 5 more

Reconfigurable architectures are today experiencing a renewed interest for their ability to provide specialization without sacrificing the capability to adapt to disparate workloads. Coarse-grained reconfigurable arrays (CGRAs) provide higher flexibility than application-specific integrated circuits (ASICs) while offering increased hardware efficiency with respect to field-programmable gate arrays (FPGAs). This makes CGRAs a promising alternative to enable power-/area-efficient acceleration across different application...

10.1109/asap52443.2021.00029 article EN 2021-07-01

Synergy

Guanwen Zhong, Akshat Dubey, Cheng Tan, Tulika Mitra

Convolutional Neural Networks (CNNs) have been widely deployed in diverse application domains. There has been significant progress in accelerating both their training and inference using high-performance GPUs, FPGAs, and custom ASICs for datacenter-scale environments. The recent proliferation of mobile and Internet of Things (IoT) devices has necessitated real-time, energy-efficient deep neural network inference on embedded-class, resource-constrained platforms. In this context, we present Synergy, an automated,...

10.1145/3301278 article EN ACM Transactions on Embedded Computing Systems 2019-03-18

ARENA: Asynchronous Reconfigurable Accelerator Ring to Enable Data-Centric Parallel Computing

Cheng Tan, Chenhao Xie, Tong Geng, Andrés Márquez, Antonino Tumeo, and 2 more

The next generation of HPC and data centers is likely to be reconfigurable and data-centric due to the trend of hardware specialization and the emergence of data-driven applications. In this article, we propose ARENA, an asynchronous reconfigurable accelerator ring architecture, as a potential scenario of what future HPC and data centers will look like. Despite using coarse-grained reconfigurable arrays (CGRAs) as the substrate platform, our key contribution is not only the CGRA-cluster design itself, but also the ensemble of a new programming model that enables tasking across the cluster...

10.1109/tpds.2021.3081074 article EN publisher-specific-oa IEEE Transactions on Parallel and Distributed Systems 2021-05-19

LOCUS

Cheng Tan, Aditi Kulkarni, Vanchinathan Venkataramani, Manupa Karunaratne, Tulika Mitra, and 1 more

Application requirements, such as real-time response, are pushing wearable devices to leverage more powerful processors inside the SoC (system on chip). However, existing processors are not well suited for such challenging applications due to poor performance, and conventional many-core architectures are not appropriate either, due to the stringent power budget in this domain. We propose LOCUS, a low-power, customizable, many-core processor for next-generation wearable devices. LOCUS combines customizable cores with a network message-passing architecture...

10.1145/3122786 article EN ACM Transactions on Embedded Computing Systems 2017-11-14

An X/Ku Dual-Band Switchless Frequency Reconfigurable GaAs Power Amplifier

Chunshuang Xie, Peng Wu, Cheng Tan, Yang Yuan, Jialong Zeng, and 1 more

This letter presents an X/Ku dual-band switchless power amplifier (PA) with frequency reconfigurable operation in a 0.25-μm GaAs pHEMT process. The proposed PA consists of one drive stage and two single-band amplifiers in parallel. The first stage works...

10.1109/lmwc.2021.3139665 article EN IEEE Microwave and Wireless Components Letters 2022-01-17

Dnestmap

Manupa Karunaratne, Cheng Tan, Aditi Kulkarni, Tulika Mitra, Li-Shiuan Peh

Coarse-Grained Reconfigurable Arrays (CGRAs) provide high-performance, energy-efficient execution of the innermost loops of an application. Most real-world applications, however, comprise deeply-nested loops with complex and often irregular control flow structures that cannot be mapped to CGRAs by existing compilers. This leads to excessive data transfer costs as the application continuously alternates between outer loop-nests on the host processor and the innermost loop on the CGRA accelerator. Moreover, ultra-low power CGRAs can only include limited...

10.1145/3195970.3196027 article EN 2018-06-19

PyOCN: A Unified Framework for Modeling, Testing, and Evaluating On-Chip Networks

Cheng Tan, Yanghui Ou, Shunning Jiang, Peitian Pan, Christopher Torng, and 2 more

There is a growing interest in the open-source hardware movement to amortize non-recurring engineering costs by using plug-and-play system-on-chip (SoC) designs, where communication among different components is provided by an on-chip interconnection network. Unfortunately, building an on-chip network (OCN) that is suitable for a specific SoC design requires exploration of a large number of options and involves diverse research methodologies to evaluate performance, area, energy, and timing. In this paper, we propose PyOCN,...

10.1109/iccd46524.2019.00068 article EN 2019 IEEE 37th International Conference on Computer Design (ICCD) 2019-11-01

LOCUS

Cheng Tan, Aditi Kulkarni, Vanchinathan Venkataramani, Manupa Karunaratne, Tulika Mitra, and 1 more

The demands of applications, such as real-time response, are pushing wearable devices to leverage more power-efficient processors inside the SoC (System-on-chip). However, existing processors are not well suited for such challenging applications due to poor performance, while conventional powerful many-core architectures are not appropriate either, due to the stringent power budget in this domain. We propose LOCUS, a low-power, customizable, many-core processor for next-generation wearable devices. LOCUS combines customizable cores with a network on...

10.1145/2968455.2968506 article EN 2016-10-01

Automated Generation of Integrated Digital and Spiking Neuromorphic Machine Learning Accelerators

Serena Curzel, Nicolas Bohm Agostini, Shihao Song, Ismet Dagli, Ankur Limaye, and 8 more

The growing number of application areas for artificial intelligence (AI) methods has led to an explosion in the availability of domain-specific accelerators, which struggle to support every new machine learning (ML) algorithm advancement, clearly highlighting the need for a tool to quickly and automatically transition from algorithm definition to hardware implementation and to explore the design space along a variety of SWaP (size, weight, and power) metrics. The software defined architectures (SODA) synthesizer implements a modular...

10.1109/iccad51958.2021.9643474 article EN 2021 IEEE/ACM International Conference on Computer-Aided Design (ICCAD) 2021-11-01

A Survey: Handling Irregularities in Neural Network Acceleration with FPGAs

Tong Geng, Chunshu Wu, Cheng Tan, Chenhao Xie, Anqi Guo, and 5 more

In the last decade, Artificial Intelligence (AI) through Deep Neural Networks (DNNs) has penetrated virtually every aspect of science, technology, and business. Many types of DNNs have been and continue to be developed, including Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Graph Neural Networks (GNNs). The overall problem for all these Neural Networks (NNs) is that their target applications generally pose stringent constraints on latency and throughput, while also having strict accuracy requirements. There have been many previous efforts in...

10.1109/hpec49654.2021.9622877 article EN 2021-09-20
