Kin Wai Lau

ORCID: 0000-0001-5364-5070
Research Areas
  • Music and Audio Processing
  • Advanced Neural Network Applications
  • Speech and Audio Processing
  • Speech Recognition and Synthesis
  • Image and Video Quality Assessment
  • Advanced Image Processing Techniques
  • Domain Adaptation and Few-Shot Learning
  • Advanced Image and Video Retrieval Techniques
  • Image Retrieval and Classification Techniques
  • Advanced Image Fusion Techniques
  • Machine Learning and Data Classification
  • Music Technology and Sound Studies
  • Privacy-Preserving Technologies in Data
  • Machine Learning and ELM
  • Visual Attention and Saliency Detection
  • Stochastic Gradient Optimization Techniques
  • Image and Signal Denoising Methods
  • Advanced Vision and Imaging
  • Video Surveillance and Tracking Methods
  • Human Pose and Action Recognition
  • Generative Adversarial Networks and Image Synthesis
  • Robotics and Sensor-Based Localization
  • Multimodal Machine Learning Applications
  • Brain Tumor Detection and Classification
  • Image Enhancement Techniques

City University of Hong Kong
2019-2024

TCL (China)
2021-2024

Trinity College London
2022

TFI Digital Media Limited (China)
2018-2019

Deep convolutional neural networks (CNNs) have been successfully applied to no-reference image quality assessment (NR-IQA) with respect to human perception. Most of these methods deal with small patches and use the average patch score at test time to predict the quality of the whole image. We discovered that patches from homogeneous regions are unreliable for both network training and the final estimation. In addition, patches with complex structures have a much higher chance of achieving better prediction. Based on these findings, we enhanced conventional CNN-based NR-IQA...

10.1109/tcsvt.2019.2891159 article EN IEEE Transactions on Circuits and Systems for Video Technology 2019-01-09
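
The NR-IQA entry above predicts a whole-image score by aggregating patch-level scores while treating homogeneous patches as unreliable. Below is a minimal sketch of that aggregation, assuming a hypothetical patch_quality_model callable standing in for the trained CNN and using local intensity variance as one simple homogeneity test; the paper's actual selection or weighting rule may differ.

```python
import numpy as np

def predict_image_quality(image, patch_quality_model, patch=32, var_thresh=1e-3):
    """Aggregate patch-level quality scores into a whole-image score.

    image: HxW float array in [0, 1] (grayscale for simplicity).
    patch_quality_model: hypothetical callable mapping a patch to a scalar score.
    Patches whose variance falls below var_thresh are treated as homogeneous
    and excluded, since they carry little structural information.
    """
    h, w = image.shape[:2]
    scores, all_scores = [], []
    for y in range(0, h - patch + 1, patch):
        for x in range(0, w - patch + 1, patch):
            p = image[y:y + patch, x:x + patch]
            s = patch_quality_model(p)
            all_scores.append(s)
            if p.var() >= var_thresh:          # keep only structured patches
                scores.append(s)
    # Fall back to plain averaging if every patch was homogeneous.
    return float(np.mean(scores if scores else all_scores))
```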

Visual Attention Networks (VAN) with Large Kernel Attention (LKA) modules have been shown to provide remarkable performance, surpassing Vision Transformers (ViTs), on a range of vision-based tasks. However, the depth-wise convolutional layer in these LKA modules incurs a quadratic increase in computational and memory footprints with increasing kernel size. To mitigate these problems and enable the use of extremely large kernels in the attention modules of VAN, we propose a family of Large Separable Kernel Attention modules, termed LSKA. LSKA decomposes the 2D convolutional kernel of the depth-wise convolutional layer into cascaded...

10.2139/ssrn.4463661 preprint EN 2023-01-01
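
The LSKA preprint above replaces the 2D depth-wise kernel of LKA with cascaded 1D kernels so that cost grows linearly with kernel size. A rough PyTorch sketch of that separable decomposition follows; the layer names, the local/dilated kernel split, and the default sizes are illustrative assumptions rather than the authors' exact configuration.

```python
import torch
import torch.nn as nn

class LSKASketch(nn.Module):
    """Large-kernel attention via separable 1D depth-wise convolutions (sketch).

    A k x k depth-wise kernel is approximated by a 1 x k_local / k_local x 1
    pair plus a dilated 1 x k_dilated / k_dilated x 1 pair, so parameters grow
    linearly rather than quadratically with the effective kernel size
    (roughly k_local + dilation * (k_dilated - 1)). Kernel sizes must be odd.
    """
    def __init__(self, dim, k_local=5, k_dilated=7, dilation=3):
        super().__init__()
        pad_d = dilation * (k_dilated - 1) // 2
        self.conv_h = nn.Conv2d(dim, dim, (1, k_local), padding=(0, k_local // 2), groups=dim)
        self.conv_v = nn.Conv2d(dim, dim, (k_local, 1), padding=(k_local // 2, 0), groups=dim)
        self.conv_dil_h = nn.Conv2d(dim, dim, (1, k_dilated), padding=(0, pad_d),
                                    dilation=dilation, groups=dim)
        self.conv_dil_v = nn.Conv2d(dim, dim, (k_dilated, 1), padding=(pad_d, 0),
                                    dilation=dilation, groups=dim)
        self.pointwise = nn.Conv2d(dim, dim, 1)

    def forward(self, x):
        attn = self.conv_v(self.conv_h(x))             # local separable kernel
        attn = self.conv_dil_v(self.conv_dil_h(attn))  # dilated separable kernel
        attn = self.pointwise(attn)                    # channel mixing
        return x * attn                                # gate the input features
```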

Deep learning based image hashing methods learn hash codes by using powerful feature extractors and nonlinear transformations to achieve highly efficient retrieval. For most end-to-end deep hashing methods, the supervised learning process relies on pair-wise or triplet-wise information to provide an internal relationship of similarity within the data. However, the use of a triplet loss function is limited not only by expensive training costs but also by quantization errors. In this paper, we propose a novel semantic method for retrieval...

10.1109/access.2019.2939650 article EN cc-by IEEE Access 2019-01-01
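
For context on the pair-wise/triplet-wise supervision and the quantization error mentioned in the entry above, here is a generic triplet margin loss on continuous hash codes, with sign-based binarization at retrieval time. This is background for the problem setup, not the retrieval method the paper actually proposes.

```python
import torch
import torch.nn.functional as F

def triplet_hash_loss(anchor, positive, negative, margin=0.5):
    """Triplet margin loss on real-valued code embeddings.

    anchor/positive/negative: (B, code_len) outputs of a hashing network,
    typically passed through tanh so they can later be binarized with sign().
    """
    d_pos = F.pairwise_distance(anchor, positive)   # same-class distance
    d_neg = F.pairwise_distance(anchor, negative)   # different-class distance
    return F.relu(d_pos - d_neg + margin).mean()

def binarize(codes):
    """Quantize continuous codes to {-1, +1} bits; the gap between the continuous
    and binarized codes is the quantization error referred to above."""
    return torch.sign(codes)
```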

Self-supervised learning (SSL) aims to learn feature representations without human-annotated data. Existing methods approach this goal by encouraging the representations to be invariant under a set of task-irrelevant transformations and distortions defined a priori. However, multiple studies have shown that such an assumption often limits the expressive power of the model, which would perform poorly when downstream tasks violate the assumption. For example, being invariant to rotations prevents the features from retaining enough...

10.1109/cvprw56347.2022.00456 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2022-06-01
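
The assumption criticised above can be made concrete with the standard view-invariance objective: two augmented views of the same image are pulled together in feature space, so information about the augmentation itself (e.g. the rotation angle) is discarded. The cosine-similarity sketch below illustrates that objective only; it is not the remedy proposed in the paper.

```python
import torch
import torch.nn.functional as F

def invariance_loss(encoder, view_1, view_2):
    """Encourage identical representations across two augmented views.

    If the augmentations include rotations, minimizing this loss pushes the
    encoder toward rotation invariance, which hurts downstream tasks that
    need orientation information.
    """
    z1 = F.normalize(encoder(view_1), dim=-1)
    z2 = F.normalize(encoder(view_2), dim=-1)
    return (1.0 - (z1 * z2).sum(dim=-1)).mean()     # 1 - cosine similarity
```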

Deep learning based image quality assessment (IQA) has been shown to greatly improve the score prediction accuracy for images with a single distortion. However, because these models lack generalizability and the amount of multi-distortion data is relatively low, designing reliable IQA systems is still an open issue. In this paper, we propose to introduce long-range dependencies between local artifacts and high-order spatial pooling into a convolutional neural network (CNN) model to improve the performance of full-reference...

10.1109/access.2020.2984886 article EN cc-by IEEE Access 2020-01-01
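
One common way to realise the "high-order spatial pooling" mentioned above is second-order (bilinear) pooling of CNN feature maps. The sketch below shows that generic operation as an assumption about the flavour of pooling intended; it is not necessarily the exact layer used in the paper.

```python
import torch
import torch.nn.functional as F

def bilinear_pool(features):
    """Second-order (bilinear) spatial pooling of a CNN feature map.

    features: (B, C, H, W) tensor.
    Returns a (B, C*C) descriptor: the spatially averaged outer product of
    channel responses, a higher-order statistic than plain average pooling.
    """
    b, c, h, w = features.shape
    x = features.reshape(b, c, h * w)                      # flatten spatial dims
    gram = torch.bmm(x, x.transpose(1, 2)) / (h * w)       # (B, C, C) channel interactions
    out = gram.reshape(b, c * c)
    out = torch.sign(out) * torch.sqrt(out.abs() + 1e-12)  # signed square-root
    return F.normalize(out, dim=-1)                        # L2 normalisation
```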

Due to unreliable geometric matching and content misalignment, most conventional pose transfer algorithms fail to generate fine-grained person images. In this paper, we propose a novel framework, Spatial Content Alignment GAN (SCA-GAN), which aims to enhance the consistency of garment textures and the details of human characteristics. We first alleviate the spatial misalignment by transferring the edge content to the target pose in advance. Secondly, we introduce a new Content-Style DeBlk that can progressively synthesize photo-realistic person images...

10.1109/icme51207.2021.9428146 article EN 2021 IEEE International Conference on Multimedia and Expo (ICME) 2021-06-09

This report presents the technical details of our submission to the 2023 EPIC-Kitchens EPIC-SOUNDS Audio-Based Interaction Recognition Challenge. The task is to learn a mapping from audio samples to their corresponding action labels. To achieve this goal, we propose a simple yet effective single-stream CNN-based architecture called AudioInceptionNeXt that operates on time-frequency log-mel-spectrogram samples. Motivated by the design of InceptionNeXt, we propose parallel multi-scale depthwise separable convolutional...

10.48550/arxiv.2307.07265 preprint EN other-oa arXiv (Cornell University) 2023-01-01
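
A rough sketch of the kind of parallel multi-scale depth-wise separable block described above, operating on log-mel-spectrogram feature maps, is given below. The specific kernel sizes, the summation of branches, and the residual/normalisation layout are assumptions, not the published AudioInceptionNeXt configuration.

```python
import torch
import torch.nn as nn

class MultiScaleDWBlock(nn.Module):
    """Parallel multi-scale depth-wise separable convolution block (sketch).

    Input: (B, C, F, T) feature maps derived from log-mel spectrograms
    (frequency x time). Each branch is a depth-wise convolution with a
    different (odd) kernel size; branch outputs are summed and then mixed by
    a point-wise (1x1) convolution inside a residual block.
    """
    def __init__(self, dim, kernel_sizes=(3, 11, 21)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(dim, dim, k, padding=k // 2, groups=dim)
            for k in kernel_sizes
        ])
        self.pointwise = nn.Conv2d(dim, dim, 1)
        self.norm = nn.BatchNorm2d(dim)
        self.act = nn.GELU()

    def forward(self, x):
        y = sum(branch(x) for branch in self.branches)       # multi-scale context
        return x + self.act(self.norm(self.pointwise(y)))    # residual connection
```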

The integration of Federated Learning (FL) and Self-supervised Learning (SSL) offers a unique and synergistic combination to exploit audio data for general-purpose audio understanding, without compromising user privacy. However, rare efforts have been made to investigate SSL models in the FL regime, especially when the training data is generated by large-scale heterogeneous audio sources. In this paper, we evaluate the performance of feature-matching and predictive audio-SSL techniques when integrated into large-scale FL settings simulated with...

10.48550/arxiv.2402.02889 preprint EN arXiv (Cornell University) 2024-02-05
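
For readers unfamiliar with the FL side of this work, a minimal FedAvg-style aggregation step is sketched below, weighting each client's locally trained parameters by its number of samples. This is generic federated averaging, not the specific simulation protocol or audio-SSL objective evaluated in the paper.

```python
import numpy as np

def fed_avg(client_weights, client_num_samples):
    """Federated averaging of model parameters.

    client_weights: list of dicts mapping parameter name -> np.ndarray,
                    one dict per client after local training.
    client_num_samples: list of ints, local dataset sizes used as weights.
    Returns the aggregated global parameter dict for the next FL round.
    """
    total = float(sum(client_num_samples))
    global_weights = {}
    for name in client_weights[0]:
        global_weights[name] = sum(
            (n / total) * w[name]
            for w, n in zip(client_weights, client_num_samples)
        )
    return global_weights
```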

Recent research has successfully adapted vision-based convolutional neural network (CNN) architectures for audio recognition tasks using Mel-Spectrograms. However, these CNNs have high computational costs and memory requirements, limiting their deployment on low-end edge devices. Motivated by the success of efficient vision models like InceptionNeXt and ConvNeXt, we propose AudioRepInceptionNeXt, a single-stream architecture. Its basic building block breaks down the parallel multi-branch depth-wise...

10.48550/arxiv.2404.13551 preprint EN arXiv (Cornell University) 2024-04-21
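
The multi-branch design above is usually paired with structural re-parameterisation at inference time: parallel depth-wise kernels of different sizes can be merged into one equivalent kernel by zero-padding the smaller kernels to the largest size and summing weights and biases. The sketch below shows that generic merge under the stated alignment assumptions; it is not guaranteed to match the exact AudioRepInceptionNeXt procedure.

```python
import torch
import torch.nn.functional as F

def merge_depthwise_branches(kernels, biases):
    """Merge parallel depth-wise conv branches into a single equivalent kernel.

    kernels: list of (C, 1, k_i, k_i) depth-wise weights with odd k_i.
    biases:  list of (C,) bias tensors.
    Assumes every branch uses stride 1 and 'same' padding (k_i // 2), so branch
    outputs are spatially aligned; by linearity of convolution, summing the
    zero-padded kernels and the biases reproduces the sum of the branches.
    """
    k_max = max(k.shape[-1] for k in kernels)
    channels = kernels[0].shape[0]
    merged_w = kernels[0].new_zeros(channels, 1, k_max, k_max)
    for w in kernels:
        pad = (k_max - w.shape[-1]) // 2
        merged_w = merged_w + F.pad(w, (pad, pad, pad, pad))  # centre the smaller kernel
    merged_b = torch.stack(biases).sum(dim=0)
    # At inference, use merged_w in one depth-wise conv with padding k_max // 2.
    return merged_w, merged_b
```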

Federated Learning (FL) has emerged as a privacy-preserving method for training machine learning models in a distributed manner on edge devices. However, on-device models face inherent computational power and memory limitations, potentially resulting in constrained gradient updates. As the model's size increases, the frequency of gradient updates on edge devices decreases, ultimately leading to suboptimal training outcomes during any particular FL round. This limits the feasibility of deploying advanced and large-scale models on edge devices, hindering...

10.48550/arxiv.2409.15898 preprint EN arXiv (Cornell University) 2024-09-24

This paper is the report of the first Under-Display Camera (UDC) image restoration challenge, held in conjunction with the RLQ workshop at ECCV 2020. The challenge is based on a newly-collected database of images captured through an Under-Display Camera. The challenge tracks correspond to two types of display: a 4k Transparent OLED (T-OLED) and a phone Pentile OLED (P-OLED). Along with about 150 teams registered for the challenge, eight and nine teams submitted results during the testing phase for each track, respectively. The results represent the state-of-the-art performance on Under-Display Camera Restoration. Datasets are available at https://yzhouas.github.io/projects/UDC/udc.html.

10.48550/arxiv.2008.07742 preprint EN other-oa arXiv (Cornell University) 2020-01-01

Visual Attention Networks (VAN) with Large Kernel Attention (LKA) modules have been shown to provide remarkable performance, surpassing Vision Transformers (ViTs), on a range of vision-based tasks. However, the depth-wise convolutional layer in these LKA modules incurs a quadratic increase in computational and memory footprints with increasing kernel size. To mitigate these problems and enable the use of extremely large kernels in the attention modules of VAN, we propose a family of Large Separable Kernel Attention modules, termed LSKA. LSKA decomposes the 2D convolutional kernel of the depth-wise convolutional layer into cascaded...

10.48550/arxiv.2309.01439 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Recent research has successfully adapted vision-based convolutional neural network (CNN) architectures for audio recognition tasks using Mel-Spectrograms. However, these CNNs have high computational costs and memory requirements, limiting their deployment on low-end edge devices. Motivated by the success of efficient vision models like InceptionNeXt and ConvNeXt, we propose AudioRepInceptionNeXt, a single-stream architecture. Its basic building block breaks down the parallel multi-branch depth-wise...

10.2139/ssrn.4588783 preprint EN 2023-01-01

Uncertainty estimation aims to evaluate the confidence of a trained deep neural network. However, existing uncertainty estimation approaches rely on low-dimensional distributional assumptions and thus suffer from the high dimensionality of latent features. Existing approaches tend to focus on discrete classification probabilities, which leads to poor generalizability to other tasks. Moreover, most of the literature requires seeing the out-of-distribution (OOD) data in the training stage for better uncertainty estimation, which limits performance in practice because OOD...

10.48550/arxiv.2310.16587 preprint EN cc-by arXiv (Cornell University) 2023-01-01
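
As a concrete contrast to the discrete-probability approaches criticised above, a widely used feature-space baseline scores uncertainty as the Mahalanobis distance of a sample's latent feature from a Gaussian fitted to training features; its reliance on an invertible covariance is exactly where the high-dimensionality problem bites. This is a generic baseline sketch, not the method proposed in this preprint.

```python
import numpy as np

def fit_gaussian(train_features):
    """Fit a Gaussian (mean + covariance) to training latent features.

    train_features: (N, D) array of latent features from a trained network.
    Returns the mean and the inverse covariance for distance scoring.
    """
    mean = train_features.mean(axis=0)
    centered = train_features - mean
    cov = centered.T @ centered / len(train_features)
    cov += 1e-6 * np.eye(cov.shape[0])        # regularise for invertibility
    return mean, np.linalg.inv(cov)

def mahalanobis_uncertainty(feature, mean, cov_inv):
    """Larger distance from the training distribution => higher uncertainty."""
    diff = feature - mean
    return float(diff @ cov_inv @ diff)
```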