- Gaze Tracking and Assistive Technology
- Visual Attention and Saliency Detection
- Glaucoma and retinal disorders
- Advanced Image and Video Retrieval Techniques
- Advanced Neural Network Applications
- Handwritten Text Recognition Techniques
- E-commerce and Technology Innovations
- Retinal Imaging and Analysis
- Speech Recognition and Synthesis
- Advanced Computing and Algorithms
- COVID-19 diagnosis using AI
- Infrared Target Detection Methodologies
- Higher Education and Teaching Methods
- Spectroscopy Techniques in Biomedical and Chemical Research
- Image and Video Quality Assessment
- Olfactory and Sensory Function Studies
- Advanced Algorithms and Applications
- Image and Object Detection Techniques
- Multimodal Machine Learning Applications
- Face and Expression Recognition
- Sharing Economy and Platforms
- Blasting Impact and Analysis
- Video Surveillance and Tracking Methods
- Advanced Electrical Measurement Techniques
- EEG and Brain-Computer Interfaces
Hong Kong University of Science and Technology
2023
University of Hong Kong
2023
Lanzhou University
2023
Chinese Academy of Sciences
2020-2022
Guangzhou Regenerative Medicine and Health Guangdong Laboratory
2022
Guangzhou Institutes of Biomedicine and Health
2022
Shenzhen Institutes of Advanced Technology
2020-2021
Heihe University
2014-2020
City University of Hong Kong
2016-2019
Tencent (China)
2019
Large-scale vision-language pre-trained (VLP) models are prone to hallucinate non-existent visual objects when generating text based on visual information. In this paper, we systematically study the object hallucination problem from three aspects. First, we examine recent state-of-the-art VLP models, showing that they still hallucinate frequently, and that models achieving better scores on standard metrics (e.g., CIDEr) could be more unfaithful. Second, we investigate how different types of image encoding in VLP models influence...
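The abstract does not name the measure of unfaithfulness it relies on. As a hedged illustration only, the sketch below computes the CHAIR metrics introduced by Rohrbach et al. (2018) for object hallucination in image captioning: CHAIR_i is the fraction of mentioned object words that are absent from the image, and CHAIR_s is the fraction of captions containing at least one such word. The function name, single-word vocabulary matching, and lack of synonym handling are simplifying assumptions, not the paper's evaluation code.

```python
def chair(captions, gt_objects, vocabulary):
    """Return (CHAIR_i, CHAIR_s) for a batch of generated captions.

    captions:   list of caption strings
    gt_objects: list of sets of object words actually present in each image
    vocabulary: set of lowercase single-word object names (no synonym handling)
    """
    mentioned_total = hallucinated_total = captions_with_hallucination = 0
    for caption, present in zip(captions, gt_objects):
        # Each object word is counted at most once per caption in this sketch
        words = {w.strip(".,;:!?").lower() for w in caption.split()}
        mentioned = words & vocabulary
        hallucinated = mentioned - present
        mentioned_total += len(mentioned)
        hallucinated_total += len(hallucinated)
        captions_with_hallucination += bool(hallucinated)
    chair_i = hallucinated_total / mentioned_total if mentioned_total else 0.0
    chair_s = captions_with_hallucination / len(captions) if captions else 0.0
    return chair_i, chair_s


captions = ["A dog sits on a sofa.", "A cat and a dog."]
gt_objects = [{"dog", "sofa"}, {"cat"}]
vocabulary = {"dog", "cat", "sofa", "car"}
print(chair(captions, gt_objects, vocabulary))  # (0.25, 0.5)
```

Here the second caption mentions a dog that is not in its image, so one of four mentioned objects is hallucinated and one of two captions is affected.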
This article addresses two key issues in RGB-D salient object detection based on the convolutional neural network (CNN). 1) How to bridge the gap between the "data-hungry" nature of CNNs and the insufficient labeled training data in the depth modality? 2) How to take full advantage of the complementary information among modalities? To solve the first problem, we model depth-induced saliency as a CNN-based cross-modal transfer learning problem. Instead of directly adopting the RGB CNN as initialization, we additionally train a modality...
Convolutional neural networks have achieved wide success in RGB saliency detection. Recently, the advent of RGB-D sensors such as Kinect has provided additional geometric cues. However, the key challenge for RGB-D salient object detection, namely how to fuse RGB and depth information sufficiently, is still under-studied. Traditional works mainly follow the two-stream architecture and combine features/decisions at an early or late point. The multi-modal fusion stage is performed by directly concatenating features from two...
In the context of wearable gaze tracking techniques, the problems of two-dimensional (2-D) and three-dimensional (3-D) gaze estimation can be viewed as inferring 2-D epipolar lines and 3-D visual axes from eye monitoring cameras. To this end, in this article, a simple local polynomial model is proposed to back-project the pupil center onto its corresponding visual axis. Based on this approximation, a homography-like relation is derived, and, via a Leave-One-Out cross-validation criterion, training samples at one certain depth...
Gaze estimation in the mobile scenario often suffers from extrapolation and parallax errors. In this paper, we propose a novel calibration framework to achieve precise gaze estimation for head-mounted gaze trackers. Our proposed framework consists of two steps, which learn point-to-point and point-to-line relations, respectively. The aim of step I is to infer the relation between pupil centers and spatially constrained points of regard. By adopting the "CalibMe" data acquisition method, a sparse Gaussian Process using pseudo-inputs is used to capture the smooth...
Fusing RGB and depth data is compelling in boosting performance for various robotic and computer vision tasks. Typically, the streams of RGB and depth information are merged into a single fusion point at an early or late stage to generate combined features or decisions. The single fusion point also means a single fusion path, which is congested and inflexible for fusing all the information from different modalities. As a result, the fusion process is brute-force and consequently insufficient. To address this problem, we propose a multi-scale multi-path multi-modal network (M...
Denoising diffusion probabilistic models (DDPMs) have emerged as competitive generative models, yet they bring challenges to efficient sampling. In this paper, we propose novel bilateral denoising diffusion models (BDDMs), which take significantly fewer steps to generate high-quality samples. From a bilateral modeling objective, BDDMs parameterize the forward and reverse processes with a score network and a scheduling network, respectively. We show that a new lower bound tighter than the standard evidence lower bound can be derived as a surrogate objective for...
Infrared imaging spectrometers (IRIS) often suffer from overlapped bands and random noise, which limit the precision of subsequent processing in robot vision sensing. To address this problem, we propose a novel Gabor transform-based infrared spectrum restoration method that explores the intrinsic structure relating the clean IR spectrum to the degraded one. First, a total variation (TV) regularized coefficient-adjustment descriptor is designed and incorporated into the model. Then, the proposed model is inferred via an...
This paper proposes a new strategy for moving target detection and localization based on monocular vision. First, to accurately detect targets with large displacement and high speed, two consecutive video images captured by the camera are preprocessed using image enhancement and denoising methods. Then, the optical flow representing the motion information is calculated by an iteratively modified Lucas-Kanade method. Second, a region-of-interest extraction method is developed to overcome the negative impacts caused by noise in the background....
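For reference, the classic single-window Lucas-Kanade step can be sketched as follows, assuming grayscale images stored as nested Python lists indexed img[y][x]. It solves the 2×2 normal equations built from the brightness-constancy constraint I_x u + I_y v + I_t = 0 over a square window. This is the textbook, non-iterative form; the iterative modification used in the paper is not described in the abstract and is not reproduced here. The function name lucas_kanade and the half-width parameter are illustrative.

```python
def lucas_kanade(prev, curr, cx, cy, half=2):
    """Estimate the flow (u, v) of the (2*half+1)^2 window centred at (cx, cy).

    prev, curr: 2-D lists of floats indexed as img[y][x]; the window plus a
    one-pixel border must lie inside the images.
    """
    a11 = a12 = a22 = b1 = b2 = 0.0
    for y in range(cy - half, cy + half + 1):
        for x in range(cx - half, cx + half + 1):
            # Central differences for spatial gradients, frame difference in time
            ix = (prev[y][x + 1] - prev[y][x - 1]) / 2.0
            iy = (prev[y + 1][x] - prev[y - 1][x]) / 2.0
            it = curr[y][x] - prev[y][x]
            # Accumulate the structure tensor A^T A and the right-hand side -A^T I_t
            a11 += ix * ix
            a12 += ix * iy
            a22 += iy * iy
            b1 -= ix * it
            b2 -= iy * it
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("structure tensor is singular (aperture problem)")
    # Solve [[a11, a12], [a12, a22]] @ [u, v] = [b1, b2] by Cramer's rule
    u = (a22 * b1 - a12 * b2) / det
    v = (a11 * b2 - a12 * b1) / det
    return u, v


# Quadratic test pattern shifted by (u, v) = (0.3, -0.2)
prev = [[(x - 10) ** 2 + (y - 10) ** 2 for x in range(21)] for y in range(21)]
curr = [[(x - 10 - 0.3) ** 2 + (y - 10 + 0.2) ** 2 for x in range(21)] for y in range(21)]
print(lucas_kanade(prev, curr, 10, 10))
```

For this quadratic pattern the estimate matches the true shift up to rounding, because the window is symmetric about the pattern's center. A singular structure tensor signals the aperture problem (for example, a uniform window), so the sketch raises an error instead of returning an arbitrary flow.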
To improve frequency estimation accuracy, an algorithm based on cross information fusion was proposed. The algorithm is suitable for signals of short duration and low signal-to-noise ratio (SNR), which are common in engineering. First, several different signal groups were obtained by grouping multisegment signals according to the guidelines of combination. Second, rotation factors complementary to each group were computed. Third, the average spectrum was obtained as the arithmetic mean of all spectra calculated with the rotation factors. Finally,...
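The averaging step, an arithmetic mean of several spectra, resembles Bartlett's method of averaged periodograms. The sketch below shows only that plain averaging and a peak search over the non-negative frequency bins; it is not the paper's cross information fusion algorithm, and it omits the grouping and rotation-factor steps, which the abstract does not specify. The function names and the non-overlapping segmentation are assumptions, and the direct O(N²) DFT keeps the example dependency-free.

```python
import cmath
import math
import random


def dft_power(segment):
    """Power spectrum |X[k]|^2 of one segment via a direct O(N^2) DFT."""
    n = len(segment)
    return [
        abs(sum(segment[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
        for k in range(n)
    ]


def averaged_spectrum_peak(signal, seg_len, fs):
    """Split the signal into non-overlapping segments, average their power
    spectra (Bartlett's method) and return the peak frequency in Hz."""
    n_seg = len(signal) // seg_len
    if n_seg == 0:
        raise ValueError("signal shorter than one segment")
    avg = [0.0] * seg_len
    for s in range(n_seg):
        power = dft_power(signal[s * seg_len:(s + 1) * seg_len])
        for k in range(seg_len):
            avg[k] += power[k] / n_seg  # arithmetic mean across segments
    # Search bins 1 .. seg_len // 2 (skip DC, ignore mirrored negative frequencies)
    k_peak = max(range(1, seg_len // 2 + 1), key=lambda k: avg[k])
    return k_peak * fs / seg_len  # bin spacing is fs / seg_len


# A 10 Hz tone sampled at 64 Hz, buried in unit-variance Gaussian noise
rng = random.Random(0)
fs, f0 = 64.0, 10.0
x = [math.sin(2 * math.pi * f0 * t / fs) + rng.gauss(0.0, 1.0) for t in range(256)]
print(averaged_spectrum_peak(x, 64, fs))
```

Averaging K segments reduces the variance of the spectral estimate at the cost of coarser frequency resolution (fs / seg_len), which is exactly the trade-off that makes short, noisy signals difficult.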
Although the mobile head-mounted gaze tracker (HMGT) has achieved great success in human-machine interaction, the real implementation of HMGT still poses several significant challenges. Two of these challenges, the parallax error and the tedious calibration procedure, are addressed by our proposed two-step method. In the first step, instead of fixating at pre-defined points successively, the user is only required to change his or her head pose while gazing at one marker, with allowance for short-period...
Today, there have been many achievements in learning the association between voice and face. However, most previous models rely on cosine similarity or L2 distance to evaluate the likeness of voices and faces following contrastive learning, which is subsequently applied to retrieval and matching tasks. This method only considers the embeddings as high-dimensional vectors, utilizing a minimal scope of the available information. This paper introduces a novel framework within an unsupervised setting for learning voice-face associations. By...
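To make the criticized baseline concrete, the sketch below ranks candidate face embeddings against a voice embedding by cosine similarity or by L2 distance, the two measures the abstract names. It illustrates the prior practice only, not the proposed unsupervised framework; the embeddings and function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def l2_distance(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve_face(voice, faces, metric="cosine"):
    """Return the index of the face embedding that best matches the voice."""
    if metric == "cosine":
        scores = [cosine_similarity(voice, f) for f in faces]
        return max(range(len(faces)), key=scores.__getitem__)  # highest similarity
    if metric == "l2":
        dists = [l2_distance(voice, f) for f in faces]
        return min(range(len(faces)), key=dists.__getitem__)  # smallest distance
    raise ValueError(f"unknown metric: {metric}")


voice = [1.0, 0.0]
faces = [[10.0, 1.0], [0.5, 0.5]]
print(retrieve_face(voice, faces, "cosine"))  # 0
print(retrieve_face(voice, faces, "l2"))      # 1
```

The two measures need not agree: cosine similarity ignores vector length and picks the embedding pointing in the voice embedding's direction, whereas L2 distance picks the nearer but less aligned one.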
Aiming at the problems of low bandwidth, poor anti-interference ability and low detection accuracy of the traditional multi-path coherent vehicle network model, an intelligent acquisition model of traffic congestion information in the vehicle networking environment based on multiple features is proposed. The model uses a multi-sensor fusion identification method to cluster and mine traffic flow information. In the networking environment, traffic information is analysed through theoretical modelling, cross-fusion, and information-aware technologies covering text, location, image, audio, video and other information,...
Edge intelligence is the development trend of integrating ubiquitous computing and artificial intelligence, and autonomous systems represented by smart cars are playing an increasingly important role in edge architecture design, verification, application services, etc. This article takes accurate indoor mapping for intelligent vehicles as its research object and systematically designs a boundary point generation scheme that covers exploration, filtering, publishing, and other parts. A hybrid algorithm...
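Assuming that the "boundary points" correspond to frontiers in the usual frontier-based exploration sense (free cells bordering unknown space), the exploration and filtering steps might be sketched as below. Cell values follow the ROS nav_msgs/OccupancyGrid convention (0 free, 100 occupied, -1 unknown), though only those exact values are handled here; the 8-connected clustering, the min_size filter, and the use of cluster centroids as candidate goals are illustrative assumptions, not the article's hybrid algorithm.

```python
from collections import deque

FREE, OCCUPIED, UNKNOWN = 0, 100, -1  # value convention of ROS nav_msgs/OccupancyGrid
NEIGHBORS_4 = ((1, 0), (-1, 0), (0, 1), (0, -1))


def frontier_cells(grid):
    """Exploration: free cells that have at least one unknown 4-neighbour."""
    rows, cols = len(grid), len(grid[0])
    cells = set()
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != FREE:
                continue
            for dr, dc in NEIGHBORS_4:
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == UNKNOWN:
                    cells.add((r, c))
                    break
    return cells


def frontier_clusters(grid, min_size=2):
    """Filtering: group frontier cells into 8-connected clusters, drop clusters
    smaller than min_size, and return each survivor's centroid (row, col)."""
    remaining = frontier_cells(grid)
    goals = []
    while remaining:
        seed = remaining.pop()
        cluster, queue = [seed], deque([seed])
        while queue:  # breadth-first flood fill over the 8-neighbourhood
            r, c = queue.popleft()
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    nb = (r + dr, c + dc)
                    if nb in remaining:
                        remaining.remove(nb)
                        cluster.append(nb)
                        queue.append(nb)
        if len(cluster) >= min_size:
            goals.append((sum(p[0] for p in cluster) / len(cluster),
                          sum(p[1] for p in cluster) / len(cluster)))
    return goals


grid = [
    [FREE, FREE, FREE, UNKNOWN],
    [FREE, OCCUPIED, FREE, UNKNOWN],
    [FREE, FREE, FREE, UNKNOWN],
]
print(frontier_clusters(grid))  # [(1.0, 2.0)]
```

Filtering out tiny clusters is one simple way to suppress isolated frontier cells caused by sensor noise before candidate goals are published.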
In digital fringe projection (DFP) techniques, invalid points such as shadows and background cause ambiguity in the measurement. Manually segmenting the object is time-consuming, and improper selection of the threshold introduces errors. In this paper, we propose an automatic segmentation technique based on both the modulation histogram and the intensity histogram, which can segment the object from a complex background without losing useful information. The feasibility of the method is verified by experiments with binary defocusing at different defocus levels.
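One way to realize the idea, sketched here under stated assumptions, is to compute the per-pixel fringe modulation B from N-step phase-shifted images I_n = A + B cos(φ + δ_n) with δ_n = 2πn/N as B = (2/N)·sqrt((Σ I_n sin δ_n)² + (Σ I_n cos δ_n)²), and then threshold both B and the mean intensity A. The abstract does not say how thresholds are chosen from the two histograms; Otsu's between-class-variance criterion and the AND combination used below are substitutions for illustration, and all function names are hypothetical.

```python
import math


def modulation(frames):
    """Per-pixel fringe modulation B from N frames with phase shifts 2*pi*n/N.

    frames: list of N equally long lists of pixel intensities (N >= 3).
    """
    n = len(frames)
    out = []
    for pixel in zip(*frames):
        s = sum(i * math.sin(2 * math.pi * k / n) for k, i in enumerate(pixel))
        c = sum(i * math.cos(2 * math.pi * k / n) for k, i in enumerate(pixel))
        out.append(2.0 / n * math.hypot(s, c))
    return out


def otsu_threshold(values, bins=256):
    """Threshold maximising the between-class variance of the value histogram."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo
    width = (hi - lo) / bins
    hist = [0] * bins
    for v in values:
        hist[min(int((v - lo) / width), bins - 1)] += 1
    total = len(values)
    sum_all = sum(k * h for k, h in enumerate(hist))
    w0 = sum0 = 0.0
    best_k, best_var = 0, -1.0
    for k in range(bins):
        w0 += hist[k]
        sum0 += k * hist[k]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var_between = w0 * w1 * (m0 - m1) ** 2
        if var_between > best_var:
            best_k, best_var = k, var_between
    return lo + (best_k + 1) * width  # upper edge of the last "low" bin


def valid_mask(frames):
    """Keep pixels whose modulation AND mean intensity exceed their Otsu thresholds."""
    b = modulation(frames)
    mean = [sum(p) / len(p) for p in zip(*frames)]
    tb, ti = otsu_threshold(b), otsu_threshold(mean)
    return [bi > tb and mi > ti for bi, mi in zip(b, mean)]


# (A, B, phase) per pixel: three object pixels, one shadow, one bright flat background
pixels = [(100, 50, 0.3), (100, 50, 1.0), (100, 50, 2.0), (10, 1, 0.0), (120, 2, 0.5)]
frames = [[a + b * math.cos(phi + 2 * math.pi * k / 4) for a, b, phi in pixels] for k in range(4)]
print(valid_mask(frames))  # [True, True, True, False, False]
```

The shadow fails both tests, while the bright background passes the intensity test but fails the modulation test, which is why combining the two histograms can separate cases that either one alone would miss.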
In this paper, we propose a novel calibration framework for the gaze estimation of mobile gaze tracking systems. In our method, the user's eye and the camera are modeled as a central catadioptric camera. Thus, the epipolar geometry of the gaze tracker can be described by hybrid two-view geometry. To calibrate the model, the user is asked to gaze at points distributed in 3-D space but not all located on one plane. In light of the binocular training data, we apply a 3×6 local hybrid-fundamental matrix to register pupil centers with epipolar lines in the scene image. image...
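The registration step can be illustrated with a minimal sketch. It assumes the Veronese lifting of a homogeneous pupil center (x, y, 1) to the 6-vector (x², xy, y², x, y, 1), as in the hybrid two-view formulation commonly used for central catadioptric cameras, so that a 3×6 matrix F maps the pupil center to a line l = F·lift(p) in the perspective scene image. The monomial ordering, the mean point-to-line residual, and all function names are assumptions; estimating F itself from calibration pairs is not shown.

```python
import math


def lift(x, y):
    """Veronese lifting of (x, y, 1) to a 6-vector.
    The monomial order (x^2, xy, y^2, x, y, 1) is an assumption of this sketch."""
    return (x * x, x * y, y * y, x, y, 1.0)


def epipolar_line(F, pupil):
    """Map a pupil centre to a scene-image line (a, b, c) = F @ lift(p), F being 3x6."""
    v = lift(*pupil)
    return tuple(sum(f * c for f, c in zip(row, v)) for row in F)


def point_line_distance(point, line):
    """Euclidean distance from scene point (u, v) to the line a*u + b*v + c = 0."""
    a, b, c = line
    u, v = point
    return abs(a * u + b * v + c) / math.hypot(a, b)


def registration_error(F, pupils, scene_points):
    """Mean point-to-line distance over calibration samples."""
    pairs = list(zip(pupils, scene_points))
    return sum(point_line_distance(q, epipolar_line(F, p)) for p, q in pairs) / len(pairs)


# Toy F whose line for pupil (x, y) is x*u + y*v - 1 = 0
F = [[0, 0, 0, 1, 0, 0],
     [0, 0, 0, 0, 1, 0],
     [0, 0, 0, 0, 0, -1]]
print(point_line_distance((3, 5), epipolar_line(F, (1, 0))))  # 2.0
```

A residual of this kind is what a calibration procedure would drive toward zero when fitting F to pupil-center and scene-point pairs.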
In this work, we try to answer two questions: Can deeply learned features with discriminative power benefit an ASR system's robustness to acoustic variability? And how can they be learned without requiring framewise labelled sequence training data? As existing methods usually require knowing where the labels occur in the input sequence, they have so far been limited in many real-world learning tasks. We propose a novel method which simultaneously models both sequence learning and discriminative feature learning within a single network architecture, so that...