Xiao Li

ORCID: 0000-0003-0680-0220
Research Areas
  • Advanced Image and Video Retrieval Techniques
  • Advanced Neural Network Applications
  • Multimodal Machine Learning Applications
  • Advanced Vision and Imaging
  • Visual Attention and Saliency Detection
  • Medical Image Segmentation Techniques
  • Image Retrieval and Classification Techniques
  • Video Analysis and Summarization
  • Video Surveillance and Tracking Methods
  • Face and Expression Recognition
  • Image and Signal Denoising Methods
  • Image Enhancement Techniques
  • Image Processing and 3D Reconstruction
  • Generative Adversarial Networks and Image Synthesis
  • Computer Graphics and Visualization Techniques
  • Face Recognition and Analysis
  • Domain Adaptation and Few-Shot Learning
  • Biometric Identification and Security
  • Radiomics and Machine Learning in Medical Imaging
  • Advanced Data Processing Techniques
  • Image and Object Detection Techniques
  • Thermography and Photoacoustic Techniques
  • Robotics and Sensor-Based Localization
  • Sustainable Urban and Rural Development
  • Guidance and Control Systems

North University of China
2024

Xidian University
2015-2024

Hohai University
2024

Microsoft Research Asia (China)
2021-2023

Microsoft Research (United Kingdom)
2023

Jiangsu University
2023

Yanshan University
2022

Guilin University of Technology
2022

Xiangtan University
2018

University of Houston
2006-2009

Previous works on video object segmentation (VOS) are trained on densely annotated videos. Nevertheless, acquiring annotations at the pixel level is expensive and time-consuming. In this work, we demonstrate the feasibility of training a satisfactory VOS model on sparsely annotated videos: we merely require two labeled frames per training video while performance is sustained. We term this novel training paradigm two-shot video object segmentation, or two-shot VOS for short. The underlying idea is to generate pseudo labels for unlabeled frames during training and to optimize the model on the combination...

10.1109/cvpr52729.2023.00224 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

Talking head generation aims to generate a video based on a given source identity and target motion. However, current methods face several challenges that limit the quality and controllability of the generated videos. First, the generated face often has unexpected deformation and severe distortions. Second, the driving image does not explicitly disentangle movement-relevant information, such as poses and expressions, which restricts the manipulation of different attributes during generation. Third, the generated videos tend to have flickering artifacts due...

10.1109/cvpr52729.2023.00543 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

The normal operation of insulator strings affects the safety and stability of the power system, and string flashover is one of the important faults. In this paper, considering the characteristics of insulator strings in complex environments, noise is added to the collected data to simulate the actual environment, and then the data are rotationally transformed and filtered using an improved non-local mean filtering algorithm. To accurately locate the insulator strings, a geodesic active contour correction algorithm is employed to segment the image. This algorithm is developed based on the level set model, and replaces...

10.1109/access.2024.3424406 article EN cc-by-nc-nd IEEE Access 2024-01-01

The Multiplane Image (MPI), containing a set of fronto-parallel $RGB_{\alpha}$ layers, is an effective and efficient representation for view synthesis from sparse inputs. Yet, its fixed structure limits the performance, especially for surfaces imaged at oblique angles. We introduce the Structural MPI (S-MPI), where the plane structure approximates 3D scenes concisely. Conveying contexts with geometrically-faithful...

10.1109/cvpr52729.2023.01603 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

Multimodal and multi-domain stylization are two important problems in the field of image style transfer. Currently, there are few methods that can perform multimodal and multi-domain stylization simultaneously. In this study, we propose a unified framework for multimodal and multi-domain style transfer with support for both exemplar-based reference and randomly sampled guidance. The key component of our method is a novel distribution alignment module that eliminates the explicit gaps between various domains and reduces the risk of mode collapse. The diversity is ensured by either guidance from...

10.1145/3450525 article EN ACM Transactions on Multimedia Computing Communications and Applications 2021-07-22

We analyze localized textural consistencies in high-resolution X-ray computed tomography (CT) scans of coronary arteries to identify the appearance of diagnostically relevant changes in tissue. For efficient and accurate processing of CT volume data, we use fast wavelet algorithms associated with three-dimensional isotropic multiresolution wavelets that implement a redundant, frame-based image encoding without directional preference. Our algorithm identifies textures by correlating coefficients...

10.1002/cnm.1189 article EN Communications in Numerical Methods in Engineering 2009-01-19

Instance segmentation is a challenging task aiming at classifying and segmenting all object instances of specific classes. While two-stage box-based methods achieve top performance in the image domain, they cannot easily extend their superiority into the video domain. This is because they usually deal with features or images cropped from the detected bounding boxes without alignment, failing to capture pixel-level temporal consistency. We embrace the observation that bottom-up methods dealing with box-free features could offer...

10.1109/tmm.2022.3222643 article EN IEEE Transactions on Multimedia 2022-11-16

Recently, transformer-based image segmentation methods have achieved notable success against previous solutions. For video domains, however, how to effectively model temporal context with the attention of object instances across frames remains an open problem. In this paper, we propose an online video instance segmentation framework with a novel instance-aware temporal fusion method. We first leverage the representation, i.e., a latent code in the global context (instance code) and CNN feature maps, to represent instance- and pixel-level features. Based...

10.48550/arxiv.2112.01695 preprint EN other-oa arXiv (Cornell University) 2021-01-01

Referring Video Object Segmentation (R-VOS) is a challenging task that aims to segment an object in a video based on a linguistic expression. Most existing R-VOS methods have a critical assumption: the referred object must appear in the video. This assumption, which we refer to as semantic consensus, is often violated in real-world scenarios, where the expression may be queried against false videos. In this work, we highlight the need for a robust R-VOS model that can handle semantic mismatches. Accordingly, we propose an extended task called Robust R-VOS,...

10.48550/arxiv.2207.01203 preprint EN other-oa arXiv (Cornell University) 2022-01-01

To tackle the challenge of time-varying formation control for underactuated robots under model parameter uncertainties and environmental disturbances, this study proposes an affine formation control approach enhanced by an Extended State Observer. Initially, using positioning theory and polynomial interpolation, guidelines for selecting leader vehicles and trajectory planning methods are established, whereby the follower formation is uniquely determined through the stress matrix. To address the cumulative disturbances arising from factors impacting...

10.3390/act13120493 article EN cc-by Actuators 2024-12-02

In order to extract useful information from X-ray fluorescence (XRF) spectra and establish a high-accuracy prediction model of soil heavy metal contents, a hybrid model combining a deep belief network (DBN) with a tree-based model was proposed. The DBN was first introduced into the feature extraction of XRF spectral data, which can obtain layer features of the spectra. Owing to the strong regression ability of the tree-based model, it can offset the deficiency of the DBN in prediction, so it was used for predicting heavy metal contents based on the extracted features. To further improve the performance...

10.1177/00037028221104823 article EN Applied Spectroscopy 2022-05-18

The current studies on road edge detection are mainly focused on algorithms for finding and tracking road edges through optical images (Y. Wang et al., 1998) (R. 2002) (B. Ma 1999). In this study, the researchers developed a new road/trail edge detection system which is based on frequency-modulated continuous-wave (FMCW) radars. This system is able to provide much more information than optical images do. The key features of the system are as follows: 1) FMCW radars, a radar technology that works effectively during both daytime and nighttime, on any type of terrain, and in a variety...

10.1109/icnsc.2006.1673255 article EN 2006-08-15

Error propagation is a general but crucial problem in online semi-supervised video object segmentation. We aim to suppress error propagation through a correction mechanism with high reliability. The key insight is to disentangle the correction from the conventional mask propagation process with reliable cues. We introduce two modulators, propagation and correction modulators, to separately perform channel-wise re-calibration on the target frame embeddings according to local temporal correlations and reliable references respectively. Specifically, we assemble the modulators with a cascaded propagation-correction...

10.48550/arxiv.2112.02853 preprint EN public-domain arXiv (Cornell University) 2021-01-01

Here, an efficient framework is developed to address the problem of unconstrained face verification. In particular, an unsupervised feature learning method for face image representation and a novel similarity metric model are discussed. First, the authors propose an unsupervised feature learning method with a sparse auto-encoder (SAE) based local descriptor (SAELD). A set of filter operators is learned by the SAE from local patches, and descriptors are extracted by applying the filters to convolve the images. This can [...] discriminative issue. Then the pairwise SAELD features are projected into a weighted...

10.1049/iet-spr.2017.0017 article EN IET Signal Processing 2018-04-19

Remote sensing images can be degraded by a variety of causes during acquisition, transmission, compression, storage and reconstruction. Noise is one of the most important degradation factors. Quantifying its impact on image quality may be useful for applications such as improving the acquisition system and thus the quality of the produced images. Objective Image Quality Measure (IQA) methods can be classified by whether a reference image, representing the original signal, exists. In the case of remote sensing, the ideal un-degraded image is not available....

10.1117/12.2176894 article EN Proceedings of SPIE, the International Society for Optical Engineering/Proceedings of SPIE 2015-05-21

Previous works on video object segmentation (VOS) are trained on densely annotated videos. Nevertheless, acquiring annotations at the pixel level is expensive and time-consuming. In this work, we demonstrate the feasibility of training a satisfactory VOS model on sparsely annotated videos: we merely require two labeled frames per training video while performance is sustained. We term this novel training paradigm two-shot video object segmentation, or two-shot VOS for short. The underlying idea is to generate pseudo labels for unlabeled frames during training and to optimize the model on the combination...

10.48550/arxiv.2303.12078 preprint EN other-oa arXiv (Cornell University) 2023-01-01