- Embedded Systems Design Techniques
- Parallel Computing and Optimization Techniques
- Interconnection Networks and Systems
- Radio Frequency Integrated Circuit Design
- Microwave Engineering and Waveguides
- Advanced Neural Network Applications
- Advanced Memory and Neural Computing
- Advanced Power Amplifier Design
- Millimeter-Wave Propagation and Modeling
- GaN-based semiconductor devices and materials
- Modular Robots and Swarm Intelligence
- Fault Detection and Control Systems
- Ferroelectric and Negative Capacitance Devices
- Adversarial Robustness in Machine Learning
- Distributed and Parallel Computing Systems
- VLSI and Analog Circuit Testing
- Photonic and Optical Devices
- Radiation Effects in Electronics
- Antenna Design and Optimization
- Software Testing and Debugging Techniques
- Machine Learning and Data Classification
- Pregnancy and preeclampsia studies
- Advanced Graph Neural Networks
- Membrane Separation Technologies
- Software System Performance and Reliability
- Google (United States), 2023-2025
- National University of Singapore, 2008-2025
- Peking University, 2025
- Arizona State University, 2025
- Aerospace Information Research Institute, 2020-2024
- Chinese Academy of Sciences, 2018-2024
- Microsoft (United States), 2022-2024
- Bellevue Hospital Center, 2023
- University of Chinese Academy of Sciences, 2020-2023
- Microsoft Research (United Kingdom), 2022-2023
A hybrid forward osmosis-nanofiltration (FO-NF) process for seawater desalination is proposed in this study. Seven potential draw solutions for the FO-NF process were investigated using laboratory-scale forward osmosis (FO) and nanofiltration (NF) test cells. Results from both the FO and NF tests suggested that the hybrid FO-NF process is feasible for seawater desalination. Water fluxes of about 10 L/m² h could be achieved for the FO and NF processes. Solute rejection by the membrane was maintained at over 99.4% for all seven solutes tested. four selected achieve maximum...
Graph Convolutional Networks (GCNs) have drawn tremendous attention in the past three years. Compared with other deep learning modalities, high-performance hardware acceleration of GCNs is as critical but even more challenging. The hurdles arise from the poor data locality and redundant computation due to the large size, high sparsity, and irregular non-zero distribution of real-world graphs.
Coarse-grained reconfigurable arrays (CGRAs), loosely defined as arrays of functional units (e.g., adder, subtractor, multiplier, divider, or larger multi-operation units, but smaller than a general-purpose core) interconnected through a Network-on-Chip, provide higher flexibility than domain-specific ASIC accelerators while offering increased hardware efficiency with respect to fine-grained reconfigurable devices, such as Field Programmable Gate Arrays (FPGAs). The fast-evolving fields of machine learning and edge computing,...
Reconfigurable accelerator fabrics, including coarse-grain reconfigurable arrays (CGRAs), have experienced a resurgence in interest because they allow fast-paced software algorithm development to continue evolving post-fabrication. CGRAs traditionally target regular workloads with data-level parallelism (e.g., neural networks, image processing), but once integrated into an SoC remain idle and unused for irregular workloads. An emerging trend towards repurposing these resources raises...
The generation of custom hardware accelerators for applications implemented within high-level productive programming frameworks requires considerable manual effort. To automate this process, we introduce SODA-OPT, a compiler tool that extends the MLIR infrastructure. SODA-OPT automatically searches, outlines, tiles, and pre-optimizes relevant code regions to generate high-quality accelerators through high-level synthesis. SODA-OPT can support any high-level programming framework and domain-specific language that interfaces with the MLIR infrastructure. By leveraging MLIR, SODA-OPT solves...
This article presents an X/Ku dual-band switch-free reconfigurable GaAs low-noise amplifier (LNA) realized by inter-stage and output-stage coupled lines. This is the first LNA design in a coupled-line structure. After being amplified by a broadband drive stage, the input signal is divided into two parallel single-band stages (consisting of a high-band stage and a low-band stage) by the proposed inter-stage coupled line. The two split-band signals are combined by the output-stage coupled line at the output port after the single-band stages. The coupled lines are also included in the matching networks. Dual-band operation is achieved...
Coarse-grained reconfigurable arrays (CGRAs), loosely defined as arrays of functional units interconnected through a network-on-chip (NoC), provide higher flexibility than domain-specific ASIC accelerators while offering increased hardware efficiency with respect to fine-grained reconfigurable devices, such as Field Programmable Gate Arrays (FPGAs). Unfortunately, designing a CGRA for a specific application domain involves enormous software/hardware engineering effort (e.g., designing the CGRA, mapping operations onto the CGRA, etc.) and...
Systems performing scientific computing, data analysis, and machine learning tasks have a growing demand for application-specific accelerators that can provide high computational performance while meeting strict size and power requirements. However, the algorithms and applications that need to be accelerated are evolving at a rate that is incompatible with manual design processes based on hardware description languages. Agile hardware design tools based on compiler techniques can help by quickly producing an application-specific integrated circuit (ASIC)...
This paper analyzes the main factors limiting the bandwidth expansion of low-noise amplifiers (LNAs) and designs a broadband LNA covering 2-40.5 GHz. The LNA is designed using multiple methods, including cascode, resistance feedback, and cascode Darlington amplifier structures. The amplitude-frequency characteristics and operating principle of the three structures are studied theoretically based on the small-signal equivalent circuit model. Thanks to these techniques, a three-stage LNA in 0.15-μm GaAs pseudomorphic high-electron-mobility transistor (pHEMT)...
Coarse-Grained Reconfigurable Arrays (CGRAs) can achieve higher energy efficiency than general-purpose processors and accelerators or fine-grained reconfigurable devices, while maintaining adaptability to different computational patterns. CGRAs have shown some success as a platform to accelerate machine learning (ML) thanks to their flexibility, which allows them to support new models not considered by fixed accelerators. However, current solutions for CGRAs employ low-level instruction-based compiler...
The high performance demand of embedded systems along with restrictive thermal design power (TDP) constraints have led to the emergence of heterogeneous multi-core architectures, where cores with the same instruction-set architecture but different power-performance characteristics provide new opportunities for energy-efficient computing. Heterogeneity introduces challenges in scheduling the tasks to the appropriate cores and selecting the frequency assignment for each core. In this paper, we introduce an approximation-aware...
Coarse-grained Reconfigurable Arrays (CGRAs) are domain-agnostic accelerators that enhance the energy efficiency of resource-constrained edge devices. The CGRA landscape is diverse, exhibiting trade-offs between performance, efficiency, and architectural specialization. However, CGRAs often overprovision communication resources relative to their modest computing capabilities. This occurs because the theoretically provisioned programmability for CGRAs proves superfluous in practical implementations. In...
Wearable devices are now leveraging multi-core processors to cater to the increasing computational demands of applications via multi-threading. However, the power and performance constraints of many wearable applications can only be satisfied when thread-level parallelism is coupled with hardware acceleration of common kernels. ASIC accelerators with high performance/watt suffer from high non-recurring engineering costs. Configurable accelerators that can be reused across applications present a promising alternative. Autonomous configurable loosely-coupled...
Reconfigurable architectures are today experiencing a renewed interest for their ability to provide specialization without sacrificing the capability to adapt to disparate workloads. Coarse-grained reconfigurable arrays (CGRAs) provide higher flexibility than application-specific integrated circuits (ASICs) while offering increased hardware efficiency with respect to field-programmable gate arrays (FPGAs). This makes CGRAs a promising alternative to enable power-/area-efficient acceleration across different application...
Convolutional Neural Networks (CNNs) have been widely deployed in diverse application domains. There has been significant progress in accelerating both their training and inference using high-performance GPUs, FPGAs, and custom ASICs for datacenter-scale environments. The recent proliferation of mobile and Internet of Things (IoT) devices has necessitated real-time, energy-efficient deep neural network inference on embedded-class, resource-constrained platforms. In this context, we present Synergy, an automated,...
The next generation of HPC and data centers are likely to be reconfigurable and data-centric due to the trend of hardware specialization and the emergence of data-driven applications. In this article, we propose ARENA - an asynchronous reconfigurable accelerator ring architecture as a potential scenario for how future HPC and data centers will look. Despite using coarse-grained reconfigurable arrays (CGRAs) as the substrate platform, our key contribution is not only the CGRA-cluster design itself, but also the ensemble of a new architecture and programming model that enables asynchronous tasking across a cluster...
Application requirements, such as real-time response, are pushing wearable devices to leverage more powerful processors inside the SoC (system on chip). However, existing processors are not well suited for challenging applications due to poor performance, and conventional many-core architectures are not appropriate either due to the stringent power budget in this domain. We propose LOCUS, a low-power, customizable, many-core processor for next-generation wearable devices. LOCUS combines customizable processor cores with a customizable network on a message-passing architecture...
This letter presents an X/Ku dual-band switchless power amplifier (PA) with frequency-reconfigurable operation in a 0.25-μm GaAs pHEMT process. The proposed PA consists of one drive amplifier and two single-band amplifiers in parallel. The first stage works...
Coarse-Grained Reconfigurable Arrays (CGRAs) provide high-performance, energy-efficient execution of the innermost loops of an application. Most real-world applications, however, comprise deeply nested loops with complex and often irregular control flow structures that cannot be mapped to CGRAs by existing compilers. This leads to excessive data transfer costs as execution continuously alternates between the outer loop-nests on the host processor and the innermost loop on the CGRA accelerator. Moreover, ultra-low power CGRAs can only include limited...
There is a growing interest in the open-source hardware movement to amortize non-recurring engineering costs by using plug-and-play system-on-chip (SoC) designs, where the communication among different components is provided by an on-chip interconnection network. Unfortunately, building an on-chip network (OCN) that is suitable for a specific SoC design requires the exploration of a large number of design options and involves diverse research methodologies to evaluate performance, area, energy, and timing. In this paper, we propose PyOCN,...
Application demands, such as real-time response, are pushing wearable devices to leverage more power-efficient processors inside the SoC (System-on-chip). However, existing processors are not well suited for such challenging applications due to poor performance, while conventional powerful many-core architectures are not appropriate either due to the stringent power budget in this domain. We propose LOCUS - a low-power, customizable, many-core processor for next-generation wearable devices. LOCUS combines customizable processor cores with a customizable network on...
The growing number of application areas for artificial intelligence (AI) methods has led to an explosion in the availability of domain-specific accelerators, which struggle to support every new machine learning (ML) algorithm advancement, clearly highlighting the need for a tool to quickly and automatically transition from algorithm definition to hardware implementation and to explore the design space along a variety of SWaP (size, weight, and power) metrics. The software defined architectures (SODA) synthesizer implements a modular...
In the last decade, Artificial Intelligence (AI) through Deep Neural Networks (DNNs) has penetrated virtually every aspect of science, technology, and business. Many types of DNNs have been and continue to be developed, including Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Graph Neural Networks (GNNs). The overall problem for all of these Neural Networks (NNs) is that their target applications generally pose stringent constraints on latency and throughput, while also having strict accuracy requirements. There have been many previous efforts in...