Dan Su

ORCID: 0000-0003-0072-0967
Research Areas
  • Gaze Tracking and Assistive Technology
  • Visual Attention and Saliency Detection
  • Glaucoma and Retinal Disorders
  • Advanced Image and Video Retrieval Techniques
  • Advanced Neural Network Applications
  • Handwritten Text Recognition Techniques
  • E-commerce and Technology Innovations
  • Retinal Imaging and Analysis
  • Speech Recognition and Synthesis
  • Advanced Computing and Algorithms
  • COVID-19 Diagnosis Using AI
  • Infrared Target Detection Methodologies
  • Higher Education and Teaching Methods
  • Spectroscopy Techniques in Biomedical and Chemical Research
  • Image and Video Quality Assessment
  • Olfactory and Sensory Function Studies
  • Advanced Algorithms and Applications
  • Image and Object Detection Techniques
  • Multimodal Machine Learning Applications
  • Face and Expression Recognition
  • Sharing Economy and Platforms
  • Blasting Impact and Analysis
  • Video Surveillance and Tracking Methods
  • Advanced Electrical Measurement Techniques
  • EEG and Brain-Computer Interfaces

Hong Kong University of Science and Technology
2023

University of Hong Kong
2023

Lanzhou University
2023

Chinese Academy of Sciences
2020-2022

Guangzhou Regenerative Medicine and Health Guangdong Laboratory
2022

Guangzhou Institutes of Biomedicine and Health
2022

Shenzhen Institutes of Advanced Technology
2020-2021

Heihe University
2014-2020

City University of Hong Kong
2016-2019

Tencent (China)
2019

Large-scale vision-language pre-trained (VLP) models are prone to hallucinate non-existent visual objects when generating text based on visual information. In this paper, we systematically study the object hallucination problem from three aspects. First, we examine recent state-of-the-art VLP models, showing that they still hallucinate frequently, and that models achieving better scores on standard metrics (e.g., CIDEr) could be more unfaithful. Second, we investigate how different types of image encoding in VLP influence...

10.18653/v1/2023.eacl-main.156 article EN cc-by 2023-01-01
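Object hallucination of the kind studied above is commonly quantified with CHAIR-style scores, which count generated object mentions absent from the ground-truth object set. The sketch below is a generic illustration of that idea, not the paper's exact evaluation code; the function name and toy captions are made up.

```python
def hallucination_rates(caption_objects, gt_objects):
    """CHAIR-style scores: instance-level is the fraction of mentioned
    objects not in the ground truth; sentence-level is the fraction of
    captions containing at least one such object."""
    total_mentions = hallucinated = captions_with_h = 0
    for mentioned, truth in zip(caption_objects, gt_objects):
        bad = [o for o in mentioned if o not in truth]
        total_mentions += len(mentioned)
        hallucinated += len(bad)
        captions_with_h += 1 if bad else 0
    chair_i = hallucinated / max(total_mentions, 1)
    chair_s = captions_with_h / max(len(caption_objects), 1)
    return chair_i, chair_s

# toy example: the second caption mentions a "dog" absent from the image
caps = [["cat", "sofa"], ["dog", "table"]]
gts = [{"cat", "sofa"}, {"table", "chair"}]
ci, cs = hallucination_rates(caps, gts)
```

Under these toy inputs, one of four mentions is hallucinated and one of two captions contains a hallucination.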

This article addresses two key issues in RGB-D salient object detection based on the convolutional neural network (CNN): 1) how to bridge the gap between the "data-hungry" nature of CNNs and the insufficient labeled training data in the depth modality, and 2) how to take full advantage of the complementary information between the modalities. To solve the first problem, we model depth-induced saliency detection as a CNN-based cross-modal transfer learning problem. Instead of directly adopting an RGB CNN as the initialization, we additionally train a modality...

10.1109/tcyb.2019.2934986 article EN IEEE Transactions on Cybernetics 2019-08-30

Convolutional neural networks have achieved wide success in RGB saliency detection. Recently, the advent of RGB-D sensors such as the Kinect has provided additional geometric cues. However, a key challenge for RGB-D salient object detection, namely how to fuse the RGB and depth information sufficiently, is still under-studied. Traditional works mainly follow a two-stream architecture and combine features/decisions at an early or late point. The multi-modal fusion stage is performed by directly concatenating features from the two...

10.1109/iros.2018.8594373 article EN 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2018-10-01

In the context of wearable gaze tracking techniques, the problems of two-dimensional (2-D) and three-dimensional (3-D) gaze estimation can be viewed as inferring 2-D epipolar lines and 3-D visual axes from eye-monitoring cameras. To this end, in this article, a simple local polynomial model is proposed to back-project the pupil center onto its corresponding visual axis. Based on this approximation, a homography-like relation is derived, and via a Leave-One-Out cross-validation criterion, training samples at one certain depth...

10.1109/tii.2019.2933481 article EN IEEE Transactions on Industrial Informatics 2019-08-06
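A local polynomial regression of the kind described can be illustrated with ordinary least squares over a second-order basis. The sketch below is a minimal stand-in that maps a pupil center to a gaze point rather than the paper's epipolar-line formulation; every name, the basis choice, and the synthetic calibration data are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def features(px, py):
    # second-order polynomial basis in the pupil-center coordinates
    return [1.0, px, py, px * py, px * px, py * py]

def fit_gaze_map(pupils, gazes):
    """Least-squares fit of one polynomial per gaze coordinate via the
    normal equations (X^T X) w = X^T y."""
    X = [features(px, py) for px, py in pupils]
    k = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    weights = []
    for d in range(2):  # gaze x and gaze y
        Xty = [sum(X[n][i] * gazes[n][d] for n in range(len(X))) for i in range(k)]
        weights.append(solve(XtX, Xty))
    return weights

def predict(weights, px, py):
    f = features(px, py)
    return tuple(sum(wi * fi for wi, fi in zip(w, f)) for w in weights)

# synthetic calibration data generated by a known polynomial mapping
truth = lambda px, py: (2 + 3 * px + 0.5 * py + 0.1 * px * py, 1 - px + 2 * py)
pupils = [(px, py) for px in range(5) for py in range(5)]
gazes = [truth(px, py) for px, py in pupils]
W = fit_gaze_map(pupils, gazes)
```

Because the synthetic mapping lies in the span of the basis, the fit recovers it exactly up to numerical error.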

Gaze estimation in the mobile scenario often suffers from extrapolation and parallax errors. In this paper, we propose a novel calibration framework to achieve precise gaze estimation for head-mounted trackers. The proposed framework consists of two steps that learn point-to-point and point-to-line relations, respectively. The aim of step I is to infer the relation between pupil centers and spatially constrained points. By adopting the "CalibMe" data acquisition method, a sparse Gaussian process using pseudo-inputs is used to capture the smooth...

10.1109/tii.2018.2867952 article EN IEEE Transactions on Industrial Informatics 2018-08-30

Fusing RGB and depth data is compelling for boosting the performance of various robotic and computer vision tasks. Typically, the streams of information are merged at a single fusion point at an early or late stage to generate combined features or decisions. The single fusion point also means a single path, which is congested and inflexible for fusing all the information from different modalities. As a result, the fusion process is brute-force and consequently insufficient. To address this problem, we propose a multi-scale multi-path multi-modal fusion network (M³)...

10.1109/iros.2017.8206370 article EN 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2017-09-01

Denoising diffusion probabilistic models (DDPMs) have emerged as competitive generative models yet bring challenges to efficient sampling. In this paper, we propose novel bilateral denoising diffusion models (BDDMs), which take significantly fewer steps to generate high-quality samples. From a bilateral modeling objective, BDDMs parameterize the forward and reverse processes with a score network and a scheduling network, respectively. We show that a new lower bound tighter than the standard evidence lower bound can be derived as a surrogate objective for...

10.48550/arxiv.2108.11514 preprint EN other-oa arXiv (Cornell University) 2021-01-01
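For context on why few-step sampling matters: the plain DDPM baseline simply strides over the trained timesteps, whereas BDDMs learn the schedule with a network. The sketch below shows only the generic strided baseline with a linear beta schedule, not the paper's method; the constants are the common DDPM defaults, assumed here.

```python
def strided_schedule(num_train_steps, num_sample_steps):
    """Evenly strided subsequence of the training timesteps, the usual
    baseline for few-step DDPM sampling."""
    stride = num_train_steps / num_sample_steps
    return sorted({min(num_train_steps - 1, round(i * stride))
                   for i in range(num_sample_steps)})

def alpha_bars(betas):
    """Cumulative products of (1 - beta_t): the fraction of signal kept
    at each timestep of the forward noising process."""
    out, acc = [], 1.0
    for b in betas:
        acc *= 1.0 - b
        out.append(acc)
    return out

T = 1000
betas = [1e-4 + (0.02 - 1e-4) * t / (T - 1) for t in range(T)]  # linear schedule
abar = alpha_bars(betas)
steps = strided_schedule(T, 50)  # sample with 50 of the 1000 steps
```

The point of the learned scheduler in BDDMs is to pick better step sizes than this uniform stride.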

Infrared imaging spectrometers (IRIS) often suffer from overlapped bands and random noise, which limit the precision of subsequent processing in robot vision sensing. To address this problem, we propose a novel Gabor transform-based infrared spectrum restoration method that explores the intrinsic structure of the clean IR spectrum from the degraded one. At first, a total variation (TV) regularized coefficient-adjustment descriptor is designed and incorporated into the model. Then, the proposed model is inferred via an...

10.1109/iros40897.2019.8967891 article EN 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2019-11-01

This paper proposes a new strategy for moving target detection and localization based on monocular vision. Firstly, to accurately detect targets with large displacement and high speed, two consecutive video images captured by the camera are preprocessed using enhancement and denoising methods. Then, the optical flow representing the motion information is calculated with an iteratively modified Lucas-Kanade method. Secondly, an interest-region extraction method is developed to overcome the negative impacts caused by noise in the background....

10.1109/rcar52367.2021.9517462 article EN 2021 IEEE International Conference on Real-time Computing and Robotics (RCAR) 2021-07-15
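The Lucas-Kanade step mentioned above solves a small least-squares system derived from the brightness-constancy equation Ix·u + Iy·v + It = 0. A minimal single-window, single-scale sketch (no pyramid and no iterative refinement, unlike the paper's modified variant); the synthetic images are a toy assumption:

```python
def lucas_kanade(I1, I2):
    """Estimate one translational flow (u, v) for the whole window by
    solving the 2x2 structure-tensor system from central-difference
    gradients and the temporal difference I2 - I1."""
    h, w = len(I1), len(I1[0])
    sxx = sxy = syy = sxt = syt = 0.0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix = (I1[y][x + 1] - I1[y][x - 1]) / 2.0
            iy = (I1[y + 1][x] - I1[y - 1][x]) / 2.0
            it = I2[y][x] - I1[y][x]
            sxx += ix * ix; sxy += ix * iy; syy += iy * iy
            sxt += ix * it; syt += iy * it
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("aperture problem: gradient matrix is singular")
    u = (-syy * sxt + sxy * syt) / det
    v = (sxy * sxt - sxx * syt) / det
    return u, v

# synthetic bilinear image I(x, y) = x * y shifted right by one pixel
I1 = [[x * y for x in range(8)] for y in range(8)]
I2 = [[(x - 1) * y for x in range(8)] for y in range(8)]
u, v = lucas_kanade(I1, I2)
```

Because the test image is bilinear, the central differences are exact and the recovered flow is exactly (1, 0).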

To improve frequency estimation accuracy, an algorithm based on cross information fusion was proposed. The algorithm is suitable for signals of short duration and low signal-to-noise ratio (SNR), which are common in engineering. Firstly, several different signal groups were obtained by grouping the multisegment signals according to the combination guidelines. Secondly, rotation factors complementary to each group were obtained. Thirdly, the average spectrum was achieved as the arithmetic mean value of all the spectra calculated with the rotation factors. Finally,...

10.1088/0957-0233/26/1/015004 article EN Measurement Science and Technology 2014-12-01
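Averaging the magnitude spectra of several segments and reading off the peak bin is the basic operation behind such spectrum-averaging estimators. The sketch below is a generic illustration using a naive DFT, not the paper's cross-information-fusion algorithm with rotation factors; the signal parameters are toy assumptions.

```python
import math

def dft_magnitude(x):
    """Naive DFT magnitude spectrum (O(N^2), fine for short segments)."""
    n = len(x)
    mags = []
    for k in range(n):
        re = sum(x[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = sum(-x[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        mags.append(math.hypot(re, im))
    return mags

def averaged_peak_frequency(segments, fs):
    """Arithmetic mean of the segment spectra, then peak-bin frequency."""
    n = len(segments[0])
    avg = [0.0] * n
    for seg in segments:
        for k, m in enumerate(dft_magnitude(seg)):
            avg[k] += m / len(segments)
    half = avg[1 : n // 2]  # ignore DC and mirrored bins
    k_peak = 1 + max(range(len(half)), key=half.__getitem__)
    return k_peak * fs / n

# three short segments of an 8 Hz sine sampled at 64 Hz, varying phase
fs = 64
segments = [[math.sin(2 * math.pi * 8 * t / fs + p) for t in range(fs)]
            for p in (0.0, 1.0, 2.0)]
est = averaged_peak_frequency(segments, fs)
```

With the tone exactly on a bin, the averaged spectrum peaks at bin 8 regardless of segment phase.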

Although the mobile head-mounted gaze tracker (HMGT) has achieved great success in human-machine interaction, the real implementation of an HMGT still poses several significant challenges. The parallax error and the tedious calibration procedure, as two of these challenges, will be addressed by our proposed two-step method. In the first step, instead of fixating at pre-defined points successively, the user is only required to change his or her head pose while gazing at one marker, with the allowance of short-period...

10.1109/robio.2016.7866370 article EN 2016 IEEE International Conference on Robotics and Biomimetics (ROBIO) 2016-12-01

Today, there have been many achievements in learning the association between voice and face. However, most previous work relies on cosine similarity or L2 distance to evaluate the likeness of voices and faces following contrastive learning, subsequently applied to retrieval and matching tasks. This method only considers the embeddings as high-dimensional vectors, utilizing a minimal scope of the available information. This paper introduces a novel framework within an unsupervised setting for voice-face associations. By...

10.48550/arxiv.2404.09509 preprint EN arXiv (Cornell University) 2024-04-15
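The cosine-similarity matching baseline the abstract critiques can be sketched in a few lines: each voice embedding is assigned the face embedding with the highest cosine similarity. The embeddings and names below are toy assumptions, not the paper's data or method.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def match(voice_embs, face_embs):
    """For each voice embedding, index of the most similar face
    embedding under cosine similarity (the retrieval baseline)."""
    return [max(range(len(face_embs)), key=lambda j: cosine(v, face_embs[j]))
            for v in voice_embs]

# toy embeddings: each voice points roughly at one face direction
voices = [[1.0, 0.1], [0.1, 1.0]]
faces = [[0.9, 0.0], [0.0, 0.8]]
pairs = match(voices, faces)
```

As the abstract notes, this scalar score discards most of the geometric information in the embeddings, which is the gap the proposed framework targets.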

Aiming at the problems of low bandwidth, poor anti-interference ability and low detection accuracy of the traditional multi-path coherent vehicle network model, an intelligent acquisition model for traffic congestion information in the vehicle networking environment based on multi-features is proposed. The model clusters the collected data and uses a multi-sensor fusion identification method to mine the traffic flow. In the networking environment, the information is analysed with cross-fusion of text, location, image, audio, video and other information-aware technologies,...

10.1504/ijvics.2019.101512 article EN International Journal of Vehicle Information and Communication Systems 2019-01-01

Edge intelligence is the development trend of integrating ubiquitous computing and artificial intelligence, and autonomous systems represented by smart cars are playing an increasingly important role in edge architecture design, verification, application services, etc. This article takes accurate indoor mapping for intelligent vehicles as the research object and systematically designs a boundary point generation scheme that covers exploration, filtering, publishing, and other parts. A hybrid algorithm...

10.1109/icemi59194.2023.10270059 article EN 2023-08-09

In digital fringe projection (DFP) techniques, invalid points such as shadows and background cause ambiguity in the measurement. Manually segmenting the object is time-wasting, and improper selection of the threshold introduces errors. In this paper, we propose an automatic segmentation technique based on both the modulation histogram and the intensity histogram, which can segment the object from a complex scene without losing useful information. The feasibility of the method is verified by experiments with binary defocusing at different defocus levels.

10.1109/icinfa.2017.8078905 article EN 2017-07-01
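Automatic histogram thresholding of the kind described is often done with Otsu's method, which picks the threshold maximizing between-class variance. The paper combines modulation and intensity histograms, so the sketch below is only the standard single-histogram baseline; the bimodal toy histogram is an assumption.

```python
def otsu_threshold(hist):
    """Otsu's method: return the bin t maximizing the between-class
    variance of the two classes {bins <= t} and {bins > t}."""
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w0 = sum0 = 0.0
    best_t, best_var = 0, -1.0
    for t, h in enumerate(hist):
        w0 += h                      # mass of the low class
        if w0 == 0 or w0 == total:
            continue
        sum0 += t * h
        m0 = sum0 / w0               # mean of the low class
        m1 = (sum_all - sum0) / (total - w0)  # mean of the high class
        var_between = w0 * (total - w0) * (m0 - m1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

# toy bimodal histogram: background peak near bin 2, object peak near bin 12
hist = [0, 5, 20, 5, 0, 0, 0, 0, 0, 0, 0, 5, 20, 5, 0, 0]
t = otsu_threshold(hist)
```

For a clearly bimodal histogram like this, the threshold lands in the valley between the two peaks.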

In this paper, we propose a novel calibration framework for the gaze estimation of mobile gaze tracking systems. In our method, the user's eye and the scene camera are modeled as a central catadioptric camera. Thus, the epipolar geometry of the tracker can be described by hybrid two-view geometry. To calibrate the model, the user is asked to look at points distributed in 3-D space but not all located on one plane. In light of the binocular training data, we apply a 3×6 local hybrid-fundamental matrix to register pupil centers with lines in the scene image. The image...

10.1109/robio.2017.8324434 article EN 2017 IEEE International Conference on Robotics and Biomimetics (ROBIO) 2017-12-01

Large-scale vision-language pre-trained (VLP) models are prone to hallucinate non-existent visual objects when generating text based on visual information. In this paper, we systematically study the object hallucination problem from three aspects. First, we examine recent state-of-the-art VLP models, showing that they still hallucinate frequently, and that models achieving better scores on standard metrics (e.g., CIDEr) could be more unfaithful. Second, we investigate how different types of image encoding in VLP influence...

10.48550/arxiv.2210.07688 preprint EN cc-by arXiv (Cornell University) 2022-01-01

In this work, we try to answer two questions: Can deeply learned features with discriminative power benefit an ASR system's robustness to acoustic variability? And how can we learn them without requiring framewise labelled sequence training data? As existing methods usually require knowing where the labels occur in the input sequence, they have so far been limited in many real-world sequence learning tasks. We propose a novel method which simultaneously models both the labels and the features within a single network architecture, such that...

10.1109/icassp.2019.8683088 preprint EN ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019-04-17