NFDI4DS | UHH-SEMS - Publication Details

Hayden Kwok‐Hay So

ORCID: 0000-0002-6514-0237

A5020581824

Research Areas

Parallel Computing and Optimization Techniques
Embedded Systems Design Techniques
Interconnection Networks and Systems
Numerical Methods and Algorithms
Advanced Neural Network Applications
Low-power high-performance VLSI design
Digital Holography and Microscopy
Cell Image Analysis Techniques
Digital Filter Design and Implementation
Image Processing Techniques and Applications
VLSI and FPGA Design Techniques
Advanced Memory and Neural Computing
Advanced Fluorescence Microscopy Techniques
CCD and CMOS Imaging Sensors
Cryptography and Residue Arithmetic
Neural Networks and Applications
Optical measurement and interference techniques
Analog and Mixed-Signal Circuit Design
Advanced Vision and Imaging
Ultrasound Imaging and Elastography
Adversarial Robustness in Machine Learning
Graph Theory and Algorithms
Distributed and Parallel Computing Systems
Machine Learning and ELM
Advancements in PLL and VCO Technologies

University of Hong Kong
2016-2025

University of California, Berkeley
1979-2024

Hong Kong Baptist University
2024

Bridge University
2024

Hong Kong Science and Technology Parks Corporation
2022

City University of Hong Kong
2017-2020

Chinese University of Hong Kong
2012-2020

Princeton University
2017

Stanford University
2017

Xidian University
2017

High-Dimensional Dense Residual Convolutional Neural Network for Light Field Reconstruction

OPENALEX - Publications

Nan Meng Hayden Kwok‐Hay So Xing Sun Edmund Y. Lam

We consider the problem of high-dimensional light field reconstruction and develop a learning-based framework for spatial and angular super-resolution. Many current approaches either require disparity clues or restore the spatial and angular details separately. Such methods have difficulties with non-Lambertian surfaces or occlusions. In contrast, we formulate light field super-resolution (LFSR) as tensor restoration and develop a learning framework based on a two-stage restoration with 4-dimensional (4D) convolution. This allows our model to learn the features capturing the geometry...

10.1109/tpami.2019.2945027 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2019-10-01

PARC: ultrafast and accurate clustering of phenotypic data of millions of single cells

Shobana V. Stassen Dickson M. D. Siu Kelvin C. M. Lee Joshua W. K. Ho Hayden Kwok‐Hay So and 1 more

New single-cell technologies continue to fuel the explosive growth in the scale of heterogeneous single-cell data. However, existing computational methods are inadequately scalable to large datasets and therefore cannot uncover the complex cellular heterogeneity.

10.1093/bioinformatics/btaa042 article EN cc-by-nc Bioinformatics 2020-01-16

Large-Scale Multi-Class Image-Based Cell Classification With Deep Learning

Nan Meng Edmund Y. Lam Kevin K. Tsia Hayden Kwok‐Hay So

Recent advances in ultra-high-throughput microscopy have enabled a new generation of cell classification methodologies using image-based cell phenotypes alone. In contrast to current single-cell analysis techniques that rely solely on slow and costly genetic/epigenetic analysis, these analyses allow morphological profiling and screening of thousands or even millions of single cells at a fraction of the cost, and have been proven to demonstrate the statistical significance required for understanding the role of cell heterogeneity in diverse...

10.1109/jbhi.2018.2878878 article EN IEEE Journal of Biomedical and Health Informatics 2018-10-31

Mix and Match: A Novel FPGA-Centric Deep Neural Network Quantization Framework

Sung-En Chang Yanyu Li Mengshu Sun Runbin Shi Hayden Kwok‐Hay So and 3 more

Deep Neural Networks (DNNs) have achieved extraordinary performance in various application domains. To support diverse DNN models, efficient implementations of DNN inference on edge-computing platforms, e.g., ASICs, FPGAs, and embedded systems, are extensively investigated. Due to the huge model size and computation amount, model compression is a critical step to deploy DNN models on edge devices. This paper focuses on weight quantization, a hardware-friendly model compression approach that is complementary to weight pruning. Unlike existing methods...

10.1109/hpca51647.2021.00027 article EN 2021-02-01

PACoGen: A Hardware Posit Arithmetic Core Generator

Manish Kumar Jaiswal Hayden Kwok‐Hay So

This paper proposes an open-source hardware Posit Arithmetic Core Generator (PACoGen) for the recently developed universal number posit number system, along with a set of pipelined architectures. The posit number system is composed of a run-time varying exponent component, which is defined by a composition of a varying-length "regime-bit" and an "exponent-bit" (with a maximum size of ES bits, the exponent size). This in effect also makes the fraction part vary at run-time in size and position. These run-time variations inherit an interesting hardware design challenge for posit arithmetic, being in the infant stage of its...

10.1109/access.2019.2920936 article EN cc-by-nc-nd IEEE Access 2019-01-01
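
The posit format summarized in the abstract above (a variable-length regime field, up to ES exponent bits, and a fraction that takes whatever bits remain) can be illustrated with a small software decoder. This is a minimal sketch based on the commonly published posit definition, not code from PACoGen; the function name `decode_posit` and its parameters `n` (word size) and `es` (maximum exponent size) are assumptions made for illustration.

```python
import math


def decode_posit(bits: int, n: int = 8, es: int = 0) -> float:
    """Decode an n-bit posit with up to `es` exponent bits into a float.

    Illustrative sketch only; `decode_posit`, `n` and `es` are hypothetical
    names, not part of any hardware generator described in the listing.
    """
    mask = (1 << n) - 1
    bits &= mask
    if bits == 0:
        return 0.0
    if bits == 1 << (n - 1):
        return math.nan  # NaR ("not a real") pattern: 1 followed by zeros

    sign = bits >> (n - 1)
    if sign:
        bits = (-bits) & mask  # negative posits are stored in two's complement

    # Regime: a run of identical bits starting just below the sign bit.
    pos = n - 2
    first = (bits >> pos) & 1
    run = 0
    while pos >= 0 and ((bits >> pos) & 1) == first:
        run += 1
        pos -= 1
    k = run - 1 if first else -run
    pos -= 1  # skip the bit that terminates the regime, if there is one

    # Whatever is left holds the exponent (up to es bits), then the fraction.
    remaining = max(pos + 1, 0)
    tail = bits & ((1 << remaining) - 1)
    exp_bits = min(es, remaining)
    # Exponent bits cut off by a long regime are treated as zeros.
    exponent = (tail >> (remaining - exp_bits)) << (es - exp_bits)
    frac_bits = remaining - exp_bits
    fraction = tail & ((1 << frac_bits) - 1)

    scale = k * (1 << es) + exponent
    value = math.ldexp(1.0 + fraction / (1 << frac_bits), scale)
    return -value if sign else value
```

For example, `decode_posit(0x50, 8, 0)` reads the regime `10` (so k = 0) and a five-bit fraction `10000`, giving 1.5. Because the regime length is only known after scanning the word, the exponent and fraction fields shift position from value to value, which is the run-time variation the abstract identifies as the hardware design challenge.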

Quantitative Phase Imaging Flow Cytometry for Ultra‐Large‐Scale Single‐Cell Biophysical Phenotyping

Kelvin C. M. Lee Maolin Wang Kathryn S.E. Cheah Gcf Chan Hayden Kwok‐Hay So and 2 more

Cellular biophysical properties are the effective label‐free phenotypes indicative of differences in cell types, states, and functions. However, current biophysical phenotyping methods largely lack the throughput and specificity required in the majority of cell‐based assays that involve large‐scale single‐cell characterization for inquiring the inherently complex heterogeneity in many biological systems. Further confounded by the lack of reported robust reproducibility and quality control, widespread adoption of biophysical phenotyping in mainstream cytometry...

10.1002/cyto.a.23765 article EN Cytometry Part A 2019-04-22

A unified hardware/software runtime environment for FPGA-based reconfigurable computers using BORPH

Hayden Kwok‐Hay So R.W. Brodersen

This paper explores the design and implementation of BORPH, an operating system designed for FPGA-based reconfigurable computers. Hardware designs execute as normal UNIX processes under BORPH, having access to standard OS services, such as file system support. Hardware and software components of user designs may, therefore, run as communicating processes within BORPH's runtime environment. The familiar language independent UNIX kernel interface facilitates easy design reuse and rapid application development. To develop hardware designs, a Simulink-based design flow...

10.1145/1331331.1331338 article EN ACM Transactions on Embedded Computing Systems 2008-01-29

Medical Ultrasound Imaging: To GPU or Not to GPU?

Hayden Kwok‐Hay So Junying Chen Billy Y. S. Yiu Alfred C. H. Yu

Medical ultrasound imaging stands out from other modalities in providing real-time diagnostic capability at an affordable price while being physically portable. This article explores the suitability of using GPUs as the primary signal and image processors for future medical ultrasound imaging systems. A case study on synthetic aperture (SA) imaging illustrates the promise of using high-performance GPUs in such systems.

10.1109/mm.2011.65 article EN IEEE Micro 2011-07-26

Fringe Pattern Improvement and Super-Resolution Using Deep Learning in Digital Holography

Zhenbo Ren Hayden Kwok‐Hay So Edmund Y. Lam

Digital holographic imaging is a powerful technique that can provide the wavefront information of a three-dimensional object for biological and industrial applications. However, due to the constraint of the cost of sensors, the acquired digital hologram is limited in terms of pixel count, thus affecting the resolution of the reconstruction. To overcome this constraint, in this paper we propose a deep learning-based method to super-resolve holograms and improve the quality of low-resolution holograms by training a convolutional neural network with large-scale data...

10.1109/tii.2019.2913853 article EN IEEE Transactions on Industrial Informatics 2019-04-29

High-throughput time-stretch imaging flow cytometry for multi-class classification of phytoplankton

Queenie T. K. Lai Kelvin C. M. Lee Anson H. L. Tang Kenneth K. Y. Wong Hayden Kwok‐Hay So and 1 more

Time-stretch imaging has been regarded as an attractive technique for high-throughput imaging flow cytometry primarily owing to its real-time, continuous ultrafast operation. Nevertheless, two key challenges remain: (1) sufficiently high time-stretch image resolution and contrast is needed for visualizing the sub-cellular complexity of single cells, and (2) the ability to unravel the heterogeneity and complexity of a highly diverse population of cells - a central problem of single-cell analysis in life sciences - is required. We here demonstrate...

10.1364/oe.24.028170 article EN cc-by Optics Express 2016-11-28

Universal number posit arithmetic generator on FPGA

Manish Kumar Jaiswal Hayden Kwok‐Hay So

The posit number system format includes a run-time varying exponent component, defined by a combination of regime-bit (with run-time varying length) and exponent-bit (with size of up to ES bits, the exponent size). This also leads to a run-time variation in its mantissa field size and position. This run-time variation in posit format poses a hardware design challenge. Being a recent development, posit lacks adequate hardware arithmetic architectures. Thus, this paper is aimed towards the posit arithmetic algorithmic development and their generic hardware generator. It is focused on basic posit arithmetic (floating-point to posit conversion, posit to floating point...

10.23919/date.2018.8342187 article EN Design, Automation & Test in Europe Conference & Exhibition (DATE), 2018 2018-03-01

Deep-learning-assisted biophysical imaging cytometry at massive throughput delineates cell population heterogeneity

Dickson M. D. Siu Kelvin C. M. Lee Michelle C. K. Lo Shobana V. Stassen Maolin Wang and 8 more

The association of the intrinsic optical and biophysical properties of cells to homeostasis and pathogenesis has long been acknowledged. Defining these label-free cellular features obviates the need for costly and time-consuming labelling protocols that perturb the living cells. However, wide-ranging applicability of such label-free cell-based assays requires sufficient throughput, statistical power and sensitivity that are unattainable with current technologies. To close this gap, we present a large-scale, integrative imaging flow...

10.1039/d0lc00542h article EN Lab on a Chip 2020-01-01

RedCap: residual encoder-decoder capsule network for holographic image reconstruction

Tianjiao Zeng Hayden Kwok‐Hay So Edmund Y. Lam

A capsule network, as an advanced technique in deep learning, is designed to overcome information loss in the pooling operation and internal data representation of a convolutional neural network (CNN). It has shown promising results in several applications, such as digit recognition and image segmentation. In this work, we investigate for the first time the use of a capsule network in digital holographic reconstruction. The proposed residual encoder-decoder capsule network, which we call RedCap, uses a novel windowed spatial dynamic routing algorithm and residual capsule block,...

10.1364/oe.383350 article EN cc-by Optics Express 2020-01-27

NITI: Training Integer Neural Networks Using Integer-Only Arithmetic

Maolin Wang S A Rasoulinezhad Philip H. W. Leong Hayden Kwok‐Hay So

Low bitwidth integer arithmetic has been widely adopted in hardware implementations of deep neural network inference applications. However, despite the promised energy-efficiency improvements in demanding edge applications, the use of low bitwidth integer arithmetic for neural network training remains limited. Unlike inference, training demands high dynamic range and numerical accuracy for high quality results, making the use of low-bitwidth integer arithmetic particularly challenging. To address this challenge, we present a novel neural network training framework called NITI that exclusively utilizes low bitwidth integer arithmetic....

10.1109/tpds.2022.3149787 article EN IEEE Transactions on Parallel and Distributed Systems 2022-02-09

Lens-free motion analysis via neuromorphic laser speckle imaging

Ge Zhou Pei Zhang Yizhao Gao Hayden Kwok‐Hay So Edmund Y. Lam

Laser speckle imaging (LSI) is a powerful tool for motion analysis owing to the high sensitivity of laser speckles. Traditional LSI techniques rely on identifying changes from the sequential intensity speckle patterns, where each pixel performs synchronous measurements. However, a lot of redundant data of the static speckles without motion information in the scene will also be recorded, resulting in considerable resource consumption for data processing and storage. Moreover, the motion cues are inevitably lost during the "blind" time interval between...

10.1364/oe.444948 article EN cc-by Optics Express 2022-01-03

Reduced-Dimension MUSIC based Target Localization with Uniform Circular Array by using Signals of Opportunity from LEO Satellites

Meng Sun Dongqi Shi Jingjing Pan Xiaofei Zhang Yide Wang and 1 more

10.36227/techrxiv.173750254.47079022/v1 preprint EN cc-by 2025-01-21

Random resistive memory-based deep extreme point learning machine for unified visual processing

Shaocong Wang Yizhao Gao Yi Li Woyu Zhang Yifei Yu and 19 more

Visual sensors, including 3D light detection and ranging, neuromorphic dynamic vision sensors, and conventional frame cameras, are increasingly integrated into edge-side intelligent machines. However, their data are heterogeneous, causing complexity in system development. Moreover, conventional digital hardware is constrained by the von Neumann bottleneck and the physical limit of transistor scaling. The computational demands of training ever-growing models further exacerbate these challenges. We propose a hardware-software...

10.1038/s41467-025-56079-3 article EN cc-by-nc-nd Nature Communications 2025-01-23

TATAA: Programmable Mixed-Precision Transformer Acceleration with a Transformable Arithmetic Architecture

Jiajun Wu Mo Song Jingmin Zhao Yizhao Gao Jia Li and 1 more

Modern transformer-based deep neural networks present unique technical challenges for effective acceleration in real-world applications. Apart from the vast amount of linear operations needed due to their sizes, modern transformer models are increasingly reliant on precise non-linear computations that make traditional low-bitwidth quantization methods and fixed-dataflow matrix accelerators ineffective for end-to-end acceleration. To address this need to accelerate both linear and non-linear operations in a unified and programmable...

10.1145/3714416 article EN ACM Transactions on Reconfigurable Technology and Systems 2025-01-24

A Hardware-Software Design Framework for SpMV Acceleration with Flexible Access Pattern Portfolio

Zhenyu Wu Maolin Wang Hayden Kwok‐Hay So

10.1109/hpca61900.2025.00068 article EN 2025-03-01

A unified hardware/software runtime environment for FPGA-based reconfigurable computers using BORPH

Hayden Kwok‐Hay So Artem Tkachenko R.W. Brodersen

This paper presents a hw/sw codesign methodology based on BORPH, an operating system designed for FPGA-based reconfigurable computers (RCs). By providing native kernel support for FPGA hardware, BORPH offers a homogeneous UNIX interface for both software and hardware processes. Hardware processes inherit the same level of service from the kernel, such as file system support, as typical UNIX software processes. Hardware and software components of a design therefore run as hardware and software processes within BORPH's run-time environment. The familiar and language independent UNIX kernel interface facilitates easy design reuse...

10.1145/1176254.1176316 article EN 2006-10-22

Zero-Configuration Identity-Based Signcryption Scheme for Smart Grid

Hayden Kwok‐Hay So Sammy H. M. Kwok Edmund Y. Lam King‐Shan Lui

The success of future intelligent power delivery and transmission systems across the globe relies critically on the availability of a fast, scalable, and most importantly secure communication infrastructure between the energy producers and consumers. One major obstacle to ensuring secure communication among the various parties in a smart grid network hinges on the technical and implementation difficulties associated with key distribution in such a large-scale network with often-time disinterested consumers. This paper proposes the use of an identity-based signcryption (IBS) system...

10.1109/smartgrid.2010.5622061 article EN 2010-10-01

UE-TCAM: An ultra efficient SRAM-based TCAM

Zahid Ullah Manish Kumar Jaiswal Ray C. C. Cheung Hayden Kwok‐Hay So

Ternary content-addressable memories (TCAMs) are high speed memories; however, compared to static random-access memories (SRAMs), TCAMs suffer from low storage density, relatively slow access time, poor scalability, complexity in circuitry, and higher cost. To access the benefits of SRAM, several SRAM-based TCAMs, specifically on field-programmable gate array (FPGA) platforms, were proposed. To further improve the performance of SRAM-based TCAMs, this paper presents UE-TCAM, which reduces memory requirement, latency, power...

10.1109/tencon.2015.7372837 article EN 2015-11-01

Map-reduce processing of k-means algorithm with FPGA-accelerated computer cluster

Yuk-Ming Choi Hayden Kwok‐Hay So

The design and implementation of the k-means clustering algorithm on an FPGA-accelerated computer cluster is presented. The implementation followed the Map-Reduce programming model, with both the map and reduce functions executing autonomously to the CPU on multiple FPGAs. A hardware/software framework was developed to manage gateware execution on multiple FPGAs across the cluster. Using this k-means implementation as an example, a system-level tradeoff study between computation and I/O performance in the target multi-FPGA execution environment was performed. When compared to a similar software implementation executing over...

10.1109/asap.2014.6868624 article EN 2014-06-01
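
The Map-Reduce formulation of k-means mentioned in the abstract above can be sketched in plain software. This is a minimal single-process illustration of the general technique, assuming squared Euclidean distance and caller-supplied initial centroids; the function names `map_assign`, `reduce_update` and `kmeans` are hypothetical and do not reflect the paper's FPGA gateware or its framework.

```python
from collections import defaultdict


def map_assign(points, centroids):
    """Map step: emit (index of nearest centroid, point) for every point."""
    for p in points:
        dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
        yield dists.index(min(dists)), p


def reduce_update(pairs, centroids):
    """Reduce step: replace each centroid by the mean of its assigned points."""
    groups = defaultdict(list)
    for idx, p in pairs:
        groups[idx].append(p)
    updated = list(centroids)  # centroids with no points are left unchanged
    for idx, pts in groups.items():
        updated[idx] = tuple(sum(col) / len(pts) for col in zip(*pts))
    return updated


def kmeans(points, centroids, max_iter=10):
    """Alternate map and reduce steps until the centroids stop moving."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        updated = reduce_update(map_assign(points, centroids), centroids)
        if updated == centroids:
            break
        centroids = updated
    return centroids
```

The map step is independent per point, so it can be split across workers, while the reduce step only needs the per-cluster groups. That separation is what makes the algorithm a natural fit for a cluster of accelerators.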

Architecture Generator for Type-3 Unum Posit Adder/Subtractor

Manish Kumar Jaiswal Hayden Kwok‐Hay So

This paper is aimed towards the hardware architecture aspect of a recently proposed posit number system under type-3 unum (universal number system). Here, an algorithmic flow for posit addition/subtraction arithmetic is developed and its hardware architecture is designed. Compared to floating point, posit provides better dynamic range and accuracy over the same word size, along with more accurate and exact arithmetic support. Posit format includes a run-time varying exponent component, provided by a combination of regime-bits (of run-time varying length) and exponent-bits (of size up to ES...

10.1109/iscas.2018.8351142 article EN 2018 IEEE International Symposium on Circuits and Systems (ISCAS) 2018-01-01

QuickDough: A rapid FPGA loop accelerator design framework using soft CGRA overlay

Cheng Liu Ho-Cheung Ng Hayden Kwok‐Hay So

The use of FPGAs as compute accelerators has been demonstrated by numerous researchers as an effective solution to meet the performance requirement across many application domains. However, the design productivity of developing FPGA accelerators remains much lower compared to a typical software development flow. Although high-level tools may partly alleviate this shortcoming, the lengthy low-level implementation process including synthesis, placing and routing still dramatically limits the number of compile-debug-edit cycles...

10.1109/fpt.2015.7393130 article EN 2015-12-01
