Kim-Hui Yap

ORCID: 0000-0003-1933-4986
Research Areas
  • Advanced Image and Video Retrieval Techniques
  • Image Retrieval and Classification Techniques
  • Image and Signal Denoising Methods
  • Advanced Image Processing Techniques
  • Advanced Vision and Imaging
  • Image Processing Techniques and Applications
  • Multimodal Machine Learning Applications
  • Video Surveillance and Tracking Methods
  • Human Pose and Action Recognition
  • Remote-Sensing Image Classification
  • Advanced Neural Network Applications
  • Domain Adaptation and Few-Shot Learning
  • Video Coding and Compression Technologies
  • Robotics and Sensor-Based Localization
  • Face recognition and analysis
  • Generative Adversarial Networks and Image Synthesis
  • Advanced Data Compression Techniques
  • Video Analysis and Summarization
  • 3D Shape Modeling and Analysis
  • Autonomous Vehicle Technology and Safety
  • Neural Networks and Applications
  • Digital Media Forensic Detection
  • Sparse and Compressive Sensing Techniques
  • Digital and Cyber Forensics
  • Gait Recognition and Analysis

Nanyang Technological University
2014-2024

Advanced Digital Sciences Center
2018

Toronto Metropolitan University
2010

The University of Sydney
2000-2003

This paper proposes Attribute Attention Network (AANet), a new architecture that integrates person attributes and attribute attention maps into a classification framework to solve the person re-identification (re-ID) problem. Many re-ID models typically employ semantic cues such as body parts or human pose to improve performance. Attribute information, however, is often not utilized. The proposed AANet builds on a baseline model that uses body parts and integrates the key attribute information in a unified learning framework. AANet consists of a global ID task,...

10.1109/cvpr.2019.00730 article EN 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019-06-01

Point clouds provide intrinsic geometric information and surface context for scene understanding. Existing methods for point cloud segmentation require a large amount of fully labeled data. With advanced depth sensors, collecting a large-scale 3D dataset is no longer a cumbersome process. However, manually producing point-level labels on a large-scale dataset is time-consuming and labor-intensive. In this paper, we propose a weakly supervised approach to predict point-level results using weak labels on 3D point clouds. We introduce our multi-path region mining...

10.1109/cvpr42600.2020.00444 article EN 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020-06-01

This paper proposes a new algorithm to integrate image registration into image super-resolution (SR). Image SR is the process of reconstructing a high-resolution (HR) image by fusing multiple low-resolution (LR) images. A critical step in image SR is accurate registration of the LR images or, in other words, effective estimation of the motion parameters. Conventional SR algorithms assume either that the motion parameters estimated by existing registration methods are error-free or that the motion parameters are known a priori. This assumption, however, is impractical in many applications, as most existing registration algorithms still experience various...

10.1109/tip.2007.908074 article EN IEEE Transactions on Image Processing 2007-10-15

We examine two different techniques for parameter averaging in GAN training. Moving Average (MA) computes the time-average of parameters, whereas Exponential Moving Average (EMA) computes an exponentially discounted sum. Whilst MA is known to lead to convergence in bilinear settings, we provide, to our knowledge, the first theoretical arguments in support of EMA. We show that EMA converges to limit cycles around the equilibrium with vanishing amplitude as the discount parameter approaches one for simple bilinear games, and that it also enhances the stability of general GAN training. We establish...

10.48550/arxiv.1806.04498 preprint EN cc-by arXiv (Cornell University) 2018-01-01
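A minimal sketch of the two averaging schemes, assuming a toy bilinear game f(x, y) = x·y played with alternating gradient descent-ascent. The game, the step size `eta`, the discount `beta`, and the function name `averaged_gda` are illustrative choices, not taken from the paper. The raw iterates keep cycling around the equilibrium (0, 0), while both averages settle close to it; the EMA retains a small residual oscillation that shrinks as `beta` approaches one.

```python
import math


def averaged_gda(eta=0.1, beta=0.999, steps=20000, x0=1.0, y0=1.0):
    """Alternating GDA on the toy game f(x, y) = x * y, tracking MA and EMA of the iterates.

    Hypothetical illustration: x minimises f, y maximises f.
    """
    x, y = x0, y0
    ma_x, ma_y = x, y    # uniform time-average over theta_0..theta_t
    ema_x, ema_y = x, y  # exponential moving average with discount beta
    for t in range(1, steps + 1):
        x = x - eta * y  # descent step for x (df/dx = y)
        y = y + eta * x  # ascent step for y (df/dy = x), using the updated x
        # Running mean: the previous average covers t points, the new one t + 1.
        ma_x += (x - ma_x) / (t + 1)
        ma_y += (y - ma_y) / (t + 1)
        # Exponentially discounted average of the same iterates.
        ema_x = beta * ema_x + (1 - beta) * x
        ema_y = beta * ema_y + (1 - beta) * y
    return (x, y), (ma_x, ma_y), (ema_x, ema_y)


if __name__ == "__main__":
    raw, ma, ema = averaged_gda()
    print("raw |theta|:", math.hypot(*raw))
    print("MA  |theta|:", math.hypot(*ma))
    print("EMA |theta|:", math.hypot(*ema))
```

Alternating updates are used here because they keep the raw iterates on a bounded orbit instead of spiralling outward, which makes the effect of averaging easy to see.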

It is difficult to construct a data collection that includes all possible combinations of human actions and interacting objects, due to the combinatorial nature of human-object interactions (HOI). In this work, we aim to develop a transferable HOI detector for unseen interactions. Existing HOI detectors often treat interactions as discrete labels and learn a classifier according to a predetermined category space. This is inherently inapt for detecting unseen interactions, which fall outside the predefined categories. Conversely, we treat independent HOI labels as natural language supervision...

10.1109/cvpr52688.2022.00101 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01

This paper proposes a new bag-of-visual-phrases (BoP) approach for mobile landmark recognition based on discriminative learning of category-dependent visual phrases. Many previous works adopt the bag-of-words (BoW) method, which ignores the co-occurrence relationship between neighboring visual words in an image. Although some works that focus on visual phrases have appeared, they mainly construct a generalized dictionary from all categories for recognition, which lacks descriptive capability for a specific category. Another shortcoming of these works is...

10.1109/tmm.2014.2301978 article EN IEEE Transactions on Multimedia 2014-01-31

Conventional relevance feedback in content-based image retrieval (CBIR) systems uses only the labeled images for learning. Image labeling, however, is a time-consuming task, and users are often unwilling to label too many images during the feedback process. This gives rise to the small sample problem, where learning from a small number of training samples restricts the retrieval performance. To address this problem, we propose a technique based on the concept of pseudo-labeling in order to enlarge the training data set. As the name implies, a pseudo-labeled image is an image not explicitly...

10.1109/mci.2006.1626490 article EN IEEE Computational Intelligence Magazine 2006-05-01

We aim to detect human interactions with novel objects through zero-shot learning. Different from previous works, we allow unseen object categories by using their semantic word embeddings. To do so, we design a human-object region proposal network specifically for the human-object interaction detection task. The core idea is to leverage human visual clues to localize objects that are interacting with humans. We show that our proposed model can outperform existing methods on detecting interacting objects and generalize well to novel objects. To recognize...

10.1109/cvpr42600.2020.01167 article EN 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020-06-01

In this paper, we investigate an open research task of generating controllable 3D textured shapes from given textual descriptions. Previous works either require ground-truth caption labeling or extensive optimization time. To resolve these issues, we present a novel framework, TAPS3D, to train a text-guided 3D shape generator with pseudo captions. Specifically, based on rendered 2D images, we retrieve relevant words from the CLIP vocabulary and construct pseudo captions using templates. Our constructed captions provide...

10.1109/cvpr52729.2023.01612 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

Mobile-based landmark recognition is becoming increasingly appealing due to the proliferation of mobile devices coupled with improving processing techniques, imaging capability, and networking infrastructure. This article provides a general overview of existing mobile-based and nonmobile-based landmark recognition systems and their differences. We discuss content and context analysis and compare classification methods. We also present experimental results from our own evaluations based on content analysis, context analysis, and integrated content-context analysis.

10.1109/mis.2010.12 article EN IEEE Intelligent Systems 2010-01-01

This paper proposes a new soft bag-of-words (BoW) method for mobile landmark recognition based on discriminative learning of image patches. Conventional BoW methods often treat all patches/regions in images as equally important for learning. The few existing works that exploit the discriminative information of patches mainly focus on selecting representative patches for training and discard the others. This binary hard selection approach results in underutilization of the available information, as some discarded patches may still contain useful information....

10.1109/tcyb.2013.2267015 article EN IEEE Transactions on Cybernetics 2013-07-03
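To illustrate the contrast the abstract draws between binary hard selection and soft weighting of patches, here is a minimal sketch that builds a BoW histogram both ways. It assumes each patch has already been quantized to a visual-word index and assigned a weight in [0, 1]; the weights, the 0.5 threshold, and the function names `hard_bow` and `soft_bow` are hypothetical and do not reproduce the paper's discriminative learning step.

```python
from collections import Counter


def hard_bow(words, keep):
    """Binary hard selection: only patches flagged as representative are counted."""
    hist = Counter()
    for word, selected in zip(words, keep):
        if selected:
            hist[word] += 1.0
    return dict(hist)


def soft_bow(words, weights):
    """Soft weighting: every patch votes for its visual word, scaled by its weight."""
    hist = Counter()
    for word, weight in zip(words, weights):
        hist[word] += weight
    return dict(hist)


if __name__ == "__main__":
    words = [0, 2, 2, 1, 3]              # visual-word index of each patch (hypothetical)
    weights = [0.9, 0.8, 0.3, 0.1, 0.6]  # per-patch weights (hypothetical)
    keep = [w >= 0.5 for w in weights]   # hard selection by thresholding the same weights
    print(hard_bow(words, keep))         # word 1 is dropped entirely
    print(soft_bow(words, weights))      # word 1 survives with a small weight
```

With hard selection, visual word 1 vanishes from the histogram; with soft weighting it is kept at a reduced weight, which is the underutilization issue the abstract points to.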

Multifocal multiview (MFMV) is an emerging form of high-dimensional optical data that can record richer scene information but yields huge volumes of data. To unveil its imaging mechanism, we present an angular–focal–spatial representation model, which decomposes MFMV into angular, spatial, and focal dimensions. To construct a comprehensive MFMV dataset, we leverage representative imaging prototypes, including digital camera imaging, plenoptic refocusing, and synthesized Blender 3D creation. It is believed to be the...

10.1364/ol.505496 article EN Optics Letters 2024-01-02

Digital watermarking has become an important technique for copyright protection, and various schemes have been proposed. Singular Value Decomposition (SVD) has been used as a valuable transform for robust digital watermarking due to some superior characteristics not offered by the DCT, DFT, or DWT. In this paper, we present a new hybrid image watermarking scheme based on SVD and DCT. After applying SVD to the cover image blocks, we perform DCT on the macro block comprised of the first singular values (SVs) of each block. We also develop a method to embed the watermark in...

10.1109/icip.2011.6116241 article EN 2011-09-01

Recently, much research effort has been dedicated to the development of computer-vision-based driver fatigue detection systems. Most of them utilize RGB data and focus on driver status during the day. However, drivers are more likely to be tired and drowsy at night. In this paper, we present a driver fatigue detection system based on a CNN using depth video sequences, which helps provide proper alerts at night. Specifically, a two-stream architecture that incorporates the spatial information of the current frame and the temporal information of neighboring frames is represented by...

10.1109/icot.2017.8336111 article EN 2017-12-01

Ethnicity information is an integral part of human identity and a useful identifier for various applications, ranging from video surveillance and targeted advertisement to social media profiling. In recent years, Convolutional Neural Networks (CNNs) have shown state-of-the-art performance in many visual recognition problems. Currently, there are a few CNN-based approaches to ethnicity classification [1], [2]. However, these approaches suffer from the following limitations: (i) most face datasets do not include...

10.1109/iscas.2018.8351370 article EN 2018 IEEE International Symposium on Circuits and Systems (ISCAS) 2018-05-01

Visual food recognition on mobile devices has attracted increasing attention in recent years due to its roles in individual diet monitoring and social health management and analysis. Existing visual food recognition approaches usually use large server-based networks to achieve high accuracy. However, these networks are not compact enough to be deployed on mobile devices. Even though some compact architectures have been proposed, most of them are unable to match the performance of full-size networks. In view of this, this paper proposes a Joint-learning...

10.1109/jstsp.2020.2969328 article EN IEEE Journal of Selected Topics in Signal Processing 2020-01-24

Food-related applications and services are essential for the health and well-being of people. With the rapid development of social networks and mobile devices, food images captured by people can offer rich knowledge about food and are also necessary for dietary assistance to people who require special care. Known food recognition frameworks and approaches in computer vision rely heavily on many-shot training of a deep network on existing large-scale datasets. However, for many food categories it is difficult to collect enough images for training....

10.1109/wacv48630.2021.00175 article EN 2021-01-01

This paper proposes a blind image deconvolution scheme based on soft integration of parametric blur structures. Conventional blind image deconvolution methods encounter a difficult dilemma of either imposing stringent and inflexible preconditions on the problem formulation or experiencing poor restoration results due to lack of information. This paper attempts to address this issue by assessing the relevance of parametric blur information and incorporating that knowledge into a parametric double regularization (PDR) scheme. The PDR method assumes that the actual blur satisfies up to a certain...

10.1109/tip.2005.846024 article EN IEEE Transactions on Image Processing 2005-04-19

This paper presents a novel framework called fuzzy relevance feedback for interactive content-based image retrieval systems. Conventional binary labeling requires crisp decisions to be made on the relevance of the retrieved images. This is restrictive, as user interpretation of image similarity is imprecise and nonstationary in nature and may vary with respect to different information needs and perceptual subjectivity. It is, therefore, inadequate to model user perception with binary logic. In view of this, we propose a soft notion of relevance to integrate users' visual...

10.1109/tcsvt.2005.856912 article EN IEEE Transactions on Circuits and Systems for Video Technology 2005-12-01
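To make the contrast between crisp and soft relevance labels concrete, the sketch that follows swaps in the classic Rocchio query-point update and lets each retrieved image carry a relevance degree r in [0, 1] instead of a binary label. This is not the paper's fuzzy relevance feedback method; the function name `fuzzy_rocchio`, the weights `alpha`, `beta`, `gamma`, and the toy feature vectors are illustrative assumptions.

```python
def fuzzy_rocchio(query, feats, relevance, alpha=1.0, beta=0.75, gamma=0.25):
    """Rocchio-style query update with soft relevance degrees (hypothetical sketch).

    Each image pulls the query toward it in proportion to r and pushes it away
    in proportion to (1 - r); binary labels are the special case r in {0, 1}.
    """
    dim = len(query)
    pos_w = sum(relevance)                   # total "relevant" mass
    neg_w = sum(1.0 - r for r in relevance)  # total "non-relevant" mass
    new_q = [alpha * q for q in query]
    for x, r in zip(feats, relevance):
        for d in range(dim):
            if pos_w > 0:
                new_q[d] += beta * r * x[d] / pos_w
            if neg_w > 0:
                new_q[d] -= gamma * (1.0 - r) * x[d] / neg_w
    return new_q


if __name__ == "__main__":
    query = [0.0, 0.0]
    feats = [[1.0, 0.0], [0.0, 1.0]]  # toy feature vectors of two retrieved images
    print(fuzzy_rocchio(query, feats, [1.0, 0.0]))  # crisp labels
    print(fuzzy_rocchio(query, feats, [0.5, 0.5]))  # user is unsure about both
```

With crisp labels, the query moves toward the first image and away from the second; with equal soft degrees, the two images influence the query symmetrically, reflecting the user's uncertainty rather than forcing a binary decision.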

The growing usage of mobile devices has led to the proliferation of many mobile applications. One trend in mobile applications centers on landmark recognition, a new application that recognizes landmarks captured using a mobile device and retrieves related information. This paper presents a survey of mobile landmark recognition for information retrieval. A general overview of existing systems is given. The techniques and algorithms used in the literature, including content analysis of landmarks and classification methods for landmark recognition, are described.

10.1109/mdm.2009.107 article EN 2009-01-01

10.1007/s11042-011-0821-2 article EN Multimedia Tools and Applications 2011-05-23