- Music and Audio Processing
- Advanced Neural Network Applications
- Speech and Audio Processing
- Speech Recognition and Synthesis
- Image and Video Quality Assessment
- Advanced Image Processing Techniques
- Domain Adaptation and Few-Shot Learning
- Advanced Image and Video Retrieval Techniques
- Image Retrieval and Classification Techniques
- Advanced Image Fusion Techniques
- Machine Learning and Data Classification
- Music Technology and Sound Studies
- Privacy-Preserving Technologies in Data
- Machine Learning and ELM
- Visual Attention and Saliency Detection
- Stochastic Gradient Optimization Techniques
- Image and Signal Denoising Methods
- Advanced Vision and Imaging
- Video Surveillance and Tracking Methods
- Human Pose and Action Recognition
- Generative Adversarial Networks and Image Synthesis
- Robotics and Sensor-Based Localization
- Multimodal Machine Learning Applications
- Brain Tumor Detection and Classification
- Image Enhancement Techniques
City University of Hong Kong
2019-2024
TCL (China)
2021-2024
Trinity College London
2022
TFI Digital Media Limited (China)
2018-2019
Deep convolutional neural networks (CNNs) have been successfully applied to no-reference image quality assessment (NR-IQA) with respect to human perception. Most of these methods deal with small patches and use the average patch score to predict the quality of the whole image. We discovered that scores from homogeneous regions are unreliable for both network training and final estimation. In addition, patches with complex structures have much higher chances of achieving better prediction. Based on these findings, we enhanced conventional CNN-based NR-IQA...
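A minimal sketch of the idea described above, not the authors' implementation: patch scores are aggregated with weights proportional to local variance, so homogeneous (low-variance) patches contribute less to the whole-image estimate. `predict_patch_quality` is a hypothetical stand-in for a trained CNN scorer.

```python
def variance(patch):
    """Sample variance of a flat list of pixel intensities."""
    n = len(patch)
    mean = sum(patch) / n
    return sum((p - mean) ** 2 for p in patch) / n

def weighted_image_score(patches, predict_patch_quality, eps=1e-6):
    """Aggregate per-patch quality scores, weighting each patch by its
    local variance so homogeneous regions are down-weighted."""
    weights = [variance(p) for p in patches]
    total = sum(weights) + eps  # eps guards against an all-flat image
    return sum(w * predict_patch_quality(p)
               for w, p in zip(weights, patches)) / total
```

With plain averaging, a flat patch and a textured patch would count equally; here the flat patch's zero variance removes it from the estimate.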
Visual Attention Networks (VAN) with Large Kernel Attention (LKA) modules have been shown to provide remarkable performance, surpassing Vision Transformers (ViTs), on a range of vision-based tasks. However, the depth-wise convolutional layer in these LKA modules incurs a quadratic increase in computational and memory footprints with increasing kernel size. To mitigate these problems and to enable the use of extremely large kernels in the attention modules of VAN, we propose a family of Large Separable Kernel Attention modules, termed LSKA. LSKA decomposes the 2D convolutional kernel of the depth-wise convolutional layer into cascaded...
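The decomposition rests on a standard linear-algebra fact that can be checked directly (an illustrative sketch, not the paper's code): a rank-1 k x k kernel is the outer product of a k x 1 and a 1 x k kernel, so one 2D convolution can be replaced by two cascaded 1D convolutions, cutting parameters from k*k to 2k.

```python
def conv2d_valid(img, kernel):
    """Plain 'valid' 2D cross-correlation on nested lists."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(img) - kh + 1):
        row = []
        for j in range(len(img[0]) - kw + 1):
            row.append(sum(kernel[u][v] * img[i + u][j + v]
                           for u in range(kh) for v in range(kw)))
        out.append(row)
    return out

def outer(col, row):
    """Outer product: the separable 2D kernel built from two 1D kernels."""
    return [[c * r for r in row] for c in col]
```

Convolving with `outer(col, row)` gives the same result as convolving with the vertical kernel `col` and then the horizontal kernel `row`, which is exactly the cascade used to replace the large 2D depth-wise kernel.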
Deep-learning-based image hashing methods learn hash codes by using powerful feature extractors and nonlinear transformations to achieve highly efficient retrieval. For most end-to-end deep hashing methods, the supervised learning process relies on pair-wise or triplet-wise information to provide an internal relationship of similarity data. However, the use of the triplet loss function is limited not only by expensive training costs but also by quantization errors. In this paper, we propose a novel semantic hashing method for retrieval...
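For readers unfamiliar with hashing-based retrieval, here is a minimal sketch of the general pipeline such methods build on (illustrative only; the paper's network and loss are not reproduced): continuous features are binarized into hash codes, and retrieval ranks items by Hamming distance.

```python
def binarize(features):
    """Map a real-valued feature vector to a +/-1 hash code via sign()."""
    return [1 if f >= 0 else -1 for f in features]

def hamming(a, b):
    """Number of positions where two hash codes differ."""
    return sum(1 for x, y in zip(a, b) if x != y)

def retrieve(query_feat, database_feats, top_k=2):
    """Return indices of the top_k database items closest in Hamming space."""
    q = binarize(query_feat)
    codes = [binarize(f) for f in database_feats]
    order = sorted(range(len(codes)), key=lambda i: hamming(q, codes[i]))
    return order[:top_k]
```

The gap between the continuous features and their sign-binarized codes is the quantization error the abstract refers to.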
Self-supervised learning (SSL) aims to learn feature representations without human-annotated data. Existing methods approach this goal by encouraging the representations to be invariant under a set of task-irrelevant transformations and distortions defined a priori. However, multiple studies have shown that such an assumption often limits the expressive power of the model, which would perform poorly when downstream tasks violate the assumption. For example, being invariant to rotations prevents features from retaining enough...
Deep-learning-based image quality assessment (IQA) has been shown to greatly improve the score prediction accuracy for images with a single distortion. However, because these models lack generalizability and multi-distortion data is relatively scarce, designing reliable IQA systems is still an open issue. In this paper, we propose to introduce long-range dependencies between local artifacts and high-order spatial pooling into a convolutional neural network (CNN) model to improve the performance of full-reference...
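As a hedged sketch of what "high-order spatial pooling" can mean in this context (illustrative; the paper's exact pooling operator is not reproduced): second-order (bilinear) pooling averages outer products of per-location feature vectors over all spatial positions, capturing feature co-occurrences rather than just means or maxima.

```python
def bilinear_pool(feature_map):
    """Second-order pooling: feature_map is a list of per-location
    feature vectors (each of length C). Returns the C x C matrix of
    spatially averaged outer products vec * vec^T."""
    n = len(feature_map)
    c = len(feature_map[0])
    pooled = [[0.0] * c for _ in range(c)]
    for vec in feature_map:
        for i in range(c):
            for j in range(c):
                pooled[i][j] += vec[i] * vec[j] / n
    return pooled
```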
Due to unreliable geometric matching and content misalignment, most conventional pose transfer algorithms fail to generate fine-grained person images. In this paper, we propose a novel framework, Spatial Content Alignment GAN (SCA-GAN), which aims to enhance the consistency of garment textures and the details of human characteristics. We first alleviate the spatial misalignment by transferring the edge content of the target in advance. Secondly, we introduce a new Content-Style DeBlk that can progressively synthesize photo-realistic images...
This report presents the technical details of our submission to the 2023 EPIC-Kitchens EPIC-SOUNDS Audio-Based Interaction Recognition Challenge. The task is to learn a mapping from audio samples to their corresponding action labels. To achieve this goal, we propose a simple yet effective single-stream CNN-based architecture called AudioInceptionNeXt that operates on time-frequency log-mel-spectrogram samples. Motivated by the design of InceptionNeXt, parallel multi-scale depthwise separable convolutional...
The integration of Federated Learning (FL) and Self-Supervised Learning (SSL) offers a unique synergistic combination to exploit audio data for general-purpose audio understanding, without compromising user privacy. However, rare efforts have been made to investigate SSL models in the FL regime, especially when the training data is generated by large-scale heterogeneous sources. In this paper, we evaluate the performance of feature-matching and predictive audio-SSL techniques when integrated into FL settings simulated with...
Recent research has successfully adapted vision-based convolutional neural network (CNN) architectures for audio recognition tasks using Mel-Spectrograms. However, these CNNs have high computational costs and memory requirements, limiting their deployment on low-end edge devices. Motivated by the success of efficient vision models like InceptionNeXt and ConvNeXt, we propose AudioRepInceptionNeXt, a single-stream architecture. Its basic building block breaks down the parallel multi-branch depth-wise...
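The "Rep" in such architectures usually refers to structural re-parameterization. The following is an assumed, illustrative sketch of that idea in 1D (not the authors' code): because convolution is linear, parallel depth-wise branches with kernel sizes 3 and 5 can be merged at inference time into a single size-5 kernel by zero-padding the smaller kernel and summing, removing the multi-branch overhead.

```python
def conv1d_same(signal, kernel):
    """'Same'-padded 1D cross-correlation on plain lists (odd kernel size)."""
    k = len(kernel)
    pad = k // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [sum(kernel[u] * padded[i + u] for u in range(k))
            for i in range(len(signal))]

def merge_kernels(small, large):
    """Zero-pad the smaller kernel to the larger size and add the two,
    producing one kernel equivalent to the sum of both branch outputs."""
    pad = (len(large) - len(small)) // 2
    padded_small = [0.0] * pad + list(small) + [0.0] * pad
    return [a + b for a, b in zip(padded_small, large)]
```

Summing the two branch outputs and convolving once with the merged kernel give identical results, which is what makes the inference-time simplification lossless.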
Federated Learning (FL) has emerged as a privacy-preserving method for training machine learning models in a distributed manner on edge devices. However, on-device training faces inherent computational power and memory limitations, potentially resulting in constrained gradient updates. As the model's size increases, the frequency of gradient updates on these devices decreases, ultimately leading to suboptimal training outcomes during any particular FL round. This limits the feasibility of deploying advanced large-scale models on edge devices, hindering...
This paper is the report of the first Under-Display Camera (UDC) image restoration challenge, held in conjunction with the RLQ workshop at ECCV 2020. The challenge is based on a newly collected Under-Display Camera database. The tracks correspond to two types of display: 4k Transparent OLED (T-OLED) and phone Pentile OLED (P-OLED). Of the about 150 teams registered for the challenge, eight and nine teams submitted results during the testing phase for each track, respectively. The results demonstrate state-of-the-art performance in UDC image restoration. Datasets are available at https://yzhouas.github.io/projects/UDC/udc.html.
Uncertainty estimation aims to evaluate the confidence of a trained deep neural network. However, existing uncertainty estimation approaches rely on low-dimensional distributional assumptions and thus suffer from the high dimensionality of latent features. Existing approaches tend to focus on discrete classification probabilities, which leads to poor generalizability to other tasks. Moreover, most of the literature requires seeing out-of-distribution (OOD) data during training to achieve better uncertainty estimation, which limits performance in practice because OOD...