- Advanced Image and Video Retrieval Techniques
- Image Retrieval and Classification Techniques
- Image and Signal Denoising Methods
- Advanced Image Processing Techniques
- Advanced Vision and Imaging
- Image Processing Techniques and Applications
- Multimodal Machine Learning Applications
- Video Surveillance and Tracking Methods
- Human Pose and Action Recognition
- Remote-Sensing Image Classification
- Advanced Neural Network Applications
- Domain Adaptation and Few-Shot Learning
- Video Coding and Compression Technologies
- Robotics and Sensor-Based Localization
- Face Recognition and Analysis
- Generative Adversarial Networks and Image Synthesis
- Advanced Data Compression Techniques
- Video Analysis and Summarization
- 3D Shape Modeling and Analysis
- Autonomous Vehicle Technology and Safety
- Neural Networks and Applications
- Digital Media Forensic Detection
- Sparse and Compressive Sensing Techniques
- Digital and Cyber Forensics
- Gait Recognition and Analysis
Nanyang Technological University
2014-2024
Advanced Digital Sciences Center
2018
Toronto Metropolitan University
2010
The University of Sydney
2000-2003
This paper proposes Attribute Attention Network (AANet), a new architecture that integrates person attributes and attribute attention maps into a classification framework to solve the person re-identification (re-ID) problem. Many re-ID models employ semantic cues such as body parts or human pose to improve performance. Attribute information, however, is often not utilized. The proposed AANet builds on a baseline model and uses key attribute information in a unified learning framework. AANet consists of a global ID task,...
Point clouds provide intrinsic geometric information and surface context for scene understanding. Existing methods for point cloud segmentation require a large amount of fully labeled data. With advanced depth sensors, collecting a large-scale 3D dataset is no longer a cumbersome process. However, manually producing point-level labels on such a dataset is time- and labor-intensive. In this paper, we propose a weakly supervised approach to predict point-level results using weak labels on point clouds. We introduce our multi-path region mining...
This paper proposes a new algorithm to integrate image registration into image super-resolution (SR). Image SR is a process to reconstruct a high-resolution (HR) image by fusing multiple low-resolution (LR) images. A critical step in image SR is accurate registration of the LR images or, in other words, effective estimation of the motion parameters. Conventional SR algorithms assume either that the motion parameters estimated by existing registration methods are error-free or that the motion parameters are known a priori. This assumption, however, is impractical in many applications, as most existing methods still experience various...
We examine two different techniques for parameter averaging in GAN training. Moving Average (MA) computes the time-average of the parameters, whereas Exponential Moving Average (EMA) computes an exponentially discounted sum. Whilst MA is known to lead to convergence in bilinear settings, we provide, to our knowledge, the first theoretical arguments in support of EMA. We show that EMA converges to limit cycles around the equilibrium with vanishing amplitude as the discount parameter approaches one for simple bilinear games, and that it also enhances the stability of general GAN training. We establish...
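The difference between the two averaging rules described above can be illustrated with a small toy sketch. Everything here is an assumption made for illustration and is not taken from the paper: the function names (`ema_update`, `ma_update`, `rotating_iterate`, `averaged_radius`) are hypothetical, and the "training trajectory" is simply a point rotating on the unit circle around an equilibrium at the origin, standing in for iterates that cycle without converging.

```python
import math

def ema_update(avg, theta, beta):
    """One EMA step: avg <- beta * avg + (1 - beta) * theta."""
    return [beta * a + (1.0 - beta) * t for a, t in zip(avg, theta)]

def ma_update(avg, theta, count):
    """Running uniform average after `count` iterates have been seen."""
    return [a + (t - a) / count for a, t in zip(avg, theta)]

def rotating_iterate(step, omega=0.1):
    """Toy iterate (an assumption): a point cycling on the unit circle
    around an equilibrium located at the origin."""
    return [math.cos(omega * step), math.sin(omega * step)]

def averaged_radius(beta=None, steps=20000):
    """Distance of the averaged parameters from the equilibrium.

    beta=None selects the uniform moving average (MA);
    otherwise EMA with discount parameter `beta` is used.
    """
    avg = rotating_iterate(0)
    for step in range(1, steps + 1):
        theta = rotating_iterate(step)
        if beta is None:
            avg = ma_update(avg, theta, step + 1)
        else:
            avg = ema_update(avg, theta, beta)
    return math.hypot(avg[0], avg[1])

if __name__ == "__main__":
    print("raw iterate radius: 1.0")
    print("MA radius:", averaged_radius(None))
    print("EMA radius, beta=0.9:", averaged_radius(0.9))
    print("EMA radius, beta=0.999:", averaged_radius(0.999))
```

In this toy setting the EMA-averaged point still cycles, but at a smaller radius than the raw iterates, and the radius shrinks as `beta` moves toward one. It says nothing about real GAN dynamics.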
It is difficult to construct a data collection including all possible combinations of human actions and interacting objects due to the combinatorial nature of human-object interactions (HOI). In this work, we aim to develop a transferable HOI detector for unseen interactions. Existing HOI detectors often treat interactions as discrete labels and learn a classifier according to a predetermined category space. This is inherently inapt for detecting unseen interactions which are out of the predefined categories. Conversely, independent natural language supervision...
This paper proposes a new bag-of-visual phrase (BoP) approach for mobile landmark recognition based on discriminative learning of category-dependent visual phrases. Many previous works adopt a bag-of-words (BoW) method, which ignores the co-occurrence relationship between neighboring visual words in an image. Although some works that focus on visual phrases have appeared, they mainly construct a generalized phrase dictionary from all categories for recognition, which lacks descriptive capability for a specific category. Another shortcoming of these works is...
Conventional relevance feedback in content-based image retrieval (CBIR) systems uses only the labeled images for learning. Image labeling, however, is a time-consuming task, and users are often unwilling to label too many images during the feedback process. This gives rise to the small sample problem, where learning from a small number of training samples restricts the retrieval performance. To address this problem, we propose a technique based on the concept of pseudo-labeling in order to enlarge the training data set. As the name implies, a pseudo-labeled image is an image not explicitly...
We aim to detect human interactions with novel objects through zero-shot learning. Different from previous works, we allow unseen object categories by using their semantic word embeddings. To do so, we design a human-object region proposal network specifically for the human-object interaction detection task. The core idea is to leverage human visual clues to localize objects which are interacting with humans. We show that our proposed model can outperform existing methods on detecting interacting objects, and generalize well to novel objects. recognize...
In this paper, we investigate an open research task of generating controllable 3D textured shapes from given textual descriptions. Previous works either require ground truth caption labeling or extensive optimization time. To resolve these issues, we present a novel framework, TAPS3D, to train a text-guided 3D shape generator with pseudo captions. Specifically, based on rendered 2D images, we retrieve relevant words from the CLIP vocabulary and construct pseudo captions using templates. Our constructed captions provide...
Mobile-based landmark recognition is becoming increasingly appealing due to the proliferation of mobile devices coupled with improving processing techniques, imaging capability, and networking infrastructure. This article provides a general overview of existing mobile-based and nonmobile-based landmark recognition systems and their differences. We discuss content and context analysis and compare landmark classification methods. We also present experimental results of our own evaluations based on content analysis, context analysis, and integrated content-context analysis.
This paper proposes a new soft bag-of-words (BoW) method for mobile landmark recognition based on discriminative learning of image patches. Conventional BoW methods often consider the patches/regions in the images as equally important for learning. The few existing works that consider the discriminative information of the patches mainly focus on selecting the representative patches for training, and discard the others. This binary hard selection approach results in underutilization of the information available, as some discarded patches may still contain useful information....
Multifocal multiview (MFMV) is an emerging form of high-dimensional optical data that records richer scene information but yields huge volumes of data. To unveil its imaging mechanism, we present an angular–focal–spatial representation model, which decomposes MFMV data into angular, spatial, and focal dimensions. To construct a comprehensive MFMV dataset, we leverage representative imaging prototypes, including digital camera imaging, plenoptic refocusing, and synthesized Blender 3D creation. It is believed to be the...
Digital watermarking has become an important technique for copyright protection, and various watermarking schemes have been proposed. Singular Value Decomposition (SVD) is used as a valuable transform for robust digital watermarking due to some superior characteristics not obtained by DCT, DFT or DWT. In this paper, we present a new hybrid image watermarking scheme based on SVD and DCT. After applying SVD to the cover image blocks, we perform DCT on the macro block comprised of the first singular values (SVs) of each image block. We also developed a method to embed the watermark in...
Recently, much research effort has been dedicated to the development of computer-vision-based driver fatigue detection systems. Most of them utilize RGB data and focus on driver status during the day. However, drivers are more likely to be tired and drowsy at night time. In this paper, we present a driver fatigue detection system based on a CNN using depth video sequences, which helps provide alerts properly. Specifically, a two-stream architecture incorporates spatial information of the current frame and temporal information of neighboring frames, which is represented by...
Ethnicity information is an integral part of human identity, and a useful identifier for various applications ranging from video surveillance and targeted advertisement to social media profiling. In recent years, Convolutional Neural Networks (CNNs) have shown state-of-the-art performance in many visual recognition problems. Currently, there are few CNN-based approaches to ethnicity classification [1], [2]. However, these approaches suffer from the following limitations: (i) most face datasets do not include...
Visual food recognition on mobile devices has attracted increasing attention in recent years due to its roles in individual diet monitoring and social health management and analysis. Existing visual food recognition approaches usually use large server-based networks to achieve high accuracy. However, these networks are not compact enough to be deployed on mobile devices. Even though some compact architectures have been proposed, most of them are unable to obtain the performance of full-size networks. In view of this, this paper proposes a Joint-learning...
Food-related applications and services are essential for the health and well-being of people. With the rapid development of social networks and mobile devices, food images captured by people can offer rich knowledge about the food and also the necessary dietary assistance for people that require special care. Known food recognition frameworks and approaches in computer vision rely heavily on many-shot training of a deep network on existing large-scale food datasets. However, it is common that, for many food categories, it is difficult to collect enough training....
This paper proposes a blind image deconvolution scheme based on soft integration of parametric blur structures. Conventional blind image deconvolution methods encounter a difficult dilemma of either imposing stringent and inflexible preconditions on the problem formulation or experiencing poor restoration results due to lack of information. This paper attempts to address this issue by assessing the relevance of parametric blur information, and incorporating the knowledge into the parametric double regularization (PDR) scheme. The PDR method assumes that the actual blur satisfies up to a certain...
This paper presents a novel framework called fuzzy relevance feedback in interactive content-based image retrieval systems. Conventional binary labeling requires crisp decisions to be made on the relevance of the retrieved images. This is restrictive, as user interpretation of image similarity is imprecise and nonstationary in nature and may vary with respect to different information needs and perceptual subjectivity. It is, therefore, inadequate to model user perception with crisp logic. In view of this, we propose a soft relevance notion to integrate users' visual...
The growing usage of mobile devices has led to the proliferation of many mobile applications. A growing trend in mobile applications is centered on mobile landmark recognition. It is a new mobile application that recognizes a captured landmark using the mobile device and retrieves related information. This paper presents a survey of mobile landmark recognition for information retrieval. A general overview of existing mobile landmark recognition systems is given. The techniques and algorithms used in the literature, including content analysis of landmarks and classification methods for landmark recognition, are described.