Xiao Tan

ORCID: 0000-0001-9162-8570
Research Areas
  • Advanced Neural Network Applications
  • Advanced Image and Video Retrieval Techniques
  • Video Surveillance and Tracking Methods
  • Advanced Vision and Imaging
  • Robotics and Sensor-Based Localization
  • Human Pose and Action Recognition
  • Domain Adaptation and Few-Shot Learning
  • Multimodal Machine Learning Applications
  • Video Analysis and Summarization
  • Advanced Image Processing Techniques
  • Generative Adversarial Networks and Image Synthesis
  • 3D Surveying and Cultural Heritage
  • Image Processing Techniques and Applications
  • Remote Sensing and LiDAR Applications
  • Autonomous Vehicle Technology and Safety
  • Image Retrieval and Classification Techniques
  • 3D Shape Modeling and Analysis
  • Face Recognition and Analysis
  • Image and Object Detection Techniques
  • Anomaly Detection Techniques and Applications
  • Image Enhancement Techniques
  • Automated Road and Building Extraction
  • Robot Manipulation and Learning
  • Medical Image Segmentation Techniques
  • Industrial Vision Systems and Defect Detection

Baidu (China)
2017-2025

University of Zurich
2025

Hunan Vocational Institute of Technology
2024

Beijing Normal University
2024

Chengdu University of Information Technology
2023

Sichuan University
2023

Xidian University
2023

Chongqing University of Posts and Telecommunications
2020-2021

Southwest Jiaotong University
2020

Vision Technology (United States)
2019

To stimulate progress in automating the reconstruction of neural circuits, we organized the first international challenge on 2D segmentation of electron microscopic (EM) images of the brain. Participants submitted boundary maps predicted for a test set of images, and were scored based on their agreement with a consensus of human expert annotations. The winning team had no prior experience with EM images and employed a convolutional network. This "deep learning" approach has since become accepted as a standard for segmentation of EM images. The challenge has continued to...

10.3389/fnana.2015.00142 article EN cc-by Frontiers in Neuroanatomy 2015-11-05

In this paper, we propose a novel perspective-guided convolution (PGC) for convolutional neural network (CNN) based crowd counting (i.e., PGCNet), which aims to overcome the dramatic intra-scene scale variations of people due to the perspective effect. While most state-of-the-art methods adopt multi-scale or multi-column architectures to address this issue, they generally fail in modeling continuous scale variations, since only discrete representative scales are considered. PGCNet, on the other hand, utilizes perspective information to guide...

10.1109/iccv.2019.00104 article EN 2019 IEEE/CVF International Conference on Computer Vision (ICCV) 2019-10-01
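The continuous-scale idea behind perspective guidance can be illustrated with a toy sketch (this is not the paper's actual PGC operator; the function names and scaling rule are hypothetical): a per-row perspective value sets a per-row smoothing window, so the effective receptive field varies smoothly with scene depth instead of snapping to a few discrete scales.

```python
# Toy illustration of perspective-guided, continuously varying receptive
# fields (NOT the paper's actual PGC operator; names and the scaling rule
# are hypothetical).

def window_from_perspective(p, base=3.0, ref=1.0):
    """Map a perspective value p (roughly: pixels per metre at that image
    row) to a smoothing-window size. Larger p -> people appear larger ->
    use a wider window, growing continuously with p."""
    return max(1, round(base * p / ref))

def variable_blur(rows, perspective):
    """Average each row of a density map with a row-specific window size
    derived from that row's perspective value."""
    out = []
    for row, p in zip(rows, perspective):
        half = window_from_perspective(p) // 2
        blurred = []
        for i in range(len(row)):
            lo, hi = max(0, i - half), min(len(row), i + half + 1)
            blurred.append(sum(row[lo:hi]) / (hi - lo))
        out.append(blurred)
    return out
```

The point of the sketch is only that the window size is a continuous function of the perspective value, which is what multi-column architectures with a handful of fixed scales cannot express.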

Video recognition has drawn great research interest and much progress has been made. A suitable frame sampling strategy can improve both the accuracy and the efficiency of recognition. However, mainstream solutions generally adopt hand-crafted sampling strategies, which can degrade performance, especially on untrimmed videos, due to the variation of frame-level saliency. To this end, we concentrate on improving video classification by developing a learning-based frame sampling strategy. We intuitively formulate the sampling procedure as multiple...

10.1109/iccv.2019.00632 article EN 2019 IEEE/CVF International Conference on Computer Vision (ICCV) 2019-10-01
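Whatever produces the per-frame scores, the selection step of a learned sampling strategy ultimately reduces to keeping the top-k most salient frames in temporal order. A minimal sketch of that step, assuming saliency scores are already given (e.g. by a lightweight scorer):

```python
def sample_frames(saliency, k):
    """Keep the k highest-scoring frame indices, then restore temporal
    order so the downstream recognizer sees an ordered sub-clip."""
    ranked = sorted(range(len(saliency)), key=lambda i: saliency[i], reverse=True)
    return sorted(ranked[:k])
```

For an untrimmed video, this concentrates computation on the salient segment rather than spreading frames uniformly across long irrelevant spans.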

Object detection from 3D point clouds remains a challenging task, though recent studies have pushed the envelope with deep learning techniques. Owing to severe spatial occlusion and the inherent variance of point density with distance to sensors, the appearance of the same object varies a lot in point cloud data. Designing a feature representation robust to such appearance changes is hence a key issue in a 3D detection method. In this paper, we innovatively propose a domain-adaptation-like approach to enhance the robustness of the feature representation. More specifically, we bridge the gap...

10.1109/cvpr42600.2020.01334 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020-06-01

3D object detection is an essential task in autonomous driving and robotics. Though great progress has been made, challenges remain in estimating 3D pose for distant and occluded objects. In this paper, we present a novel framework named ZoomNet for stereo-imagery-based 3D detection. The pipeline of ZoomNet begins with an ordinary 2D object detection model, which is used to obtain pairs of left-right bounding boxes. To further exploit the abundant texture cues in RGB images for more accurate disparity estimation, we introduce a conceptually...

10.1609/aaai.v34i07.6945 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2020-04-03
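Stereo-based detectors like ZoomNet ultimately rely on the standard pinhole stereo relation depth = focal x baseline / disparity, which is why finer disparity estimation matters: for distant objects the disparity is small, so a fixed pixel error causes a large depth error. A self-contained helper (parameter names are ours):

```python
def disparity_to_depth(disparity_px, focal_px, baseline_m):
    """Pinhole stereo relation: depth = focal * baseline / disparity.
    Small disparity errors on distant objects translate into large
    depth errors."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px
```

For example, with a 1000 px focal length and 0.5 m baseline, a 10 px disparity means 50 m depth, and a 1 px mis-estimate there shifts the depth by several metres.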

Concurrent perception datasets for autonomous driving are mainly limited to the frontal view, with sensors mounted on the vehicle. None of them is designed for the overlooked roadside perception tasks. On the other hand, data captured from roadside cameras have strengths over frontal-view data, which are believed to facilitate a safer and more intelligent driving system. To accelerate progress in roadside perception, we present the first high-diversity challenging Roadside Perception 3D dataset, Rope3D, from a novel view. The dataset consists of 50k images and 1.5M objects...

10.1109/cvpr52688.2022.02065 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01

With basic Semi-Supervised Object Detection (SSOD) techniques, one-stage detectors generally obtain limited promotions compared with two-stage clusters. We experimentally find that the root lies in two kinds of ambiguities: (1) Selection ambiguity: selected pseudo labels are less accurate, since classification scores cannot properly represent localization quality. (2) Assignment ambiguity: samples are matched with improper labels in pseudo-label assignment, as the assignment strategy is misguided by missed objects and inaccurate boxes....

10.1109/cvpr52729.2023.01495 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01
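A common remedy for the selection ambiguity described above is to rate each pseudo box by a joint score that mixes classification confidence with a predicted localization quality (e.g. from an IoU-prediction branch), rather than trusting the classification score alone. The filter below sketches that idea; the weighting scheme, threshold, and names are illustrative, not the paper's exact formulation:

```python
def filter_pseudo_labels(candidates, tau=0.6, alpha=0.5):
    """candidates: iterable of (box, cls_score, iou_score) triples, where
    iou_score is a predicted localization quality in [0, 1].
    Keep boxes whose joint quality (a geometric-mean-style mix of the
    two scores) clears the threshold tau."""
    kept = []
    for box, cls_s, iou_s in candidates:
        quality = (cls_s ** alpha) * (iou_s ** (1.0 - alpha))
        if quality >= tau:
            kept.append(box)
    return kept
```

A box with high classification confidence but poor predicted localization (the problematic case under a classification-only criterion) is rejected by the joint score.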

We analyze the DETR-based framework on semi-supervised object detection (SSOD) and observe that (1) the one-to-one assignment strategy generates incorrect matching when the pseudo ground-truth bounding box is inaccurate, leading to training inefficiency; (2) DETR-based detectors lack deterministic correspondence between an input query and its prediction output, which hinders the applicability of the consistency-based regularization widely used in current SSOD methods. We present Semi-DETR, the first transformer-based end-to-end...

10.1109/cvpr52729.2023.02280 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

Discriminatively localizing sounding objects in cocktail-party, i.e., mixed sound, scenes is commonplace for humans but still challenging for machines. In this paper, we propose a two-stage learning framework to perform self-supervised class-aware sounding object localization. First, we learn robust object representations by aggregating the candidate localization results in single-source scenes. Then, localization maps are generated in cocktail-party scenarios by referring to the pre-learned object knowledge, and sounding objects are accordingly selected by matching audio...

10.48550/arxiv.2010.05466 preprint EN other-oa arXiv (Cornell University) 2020-01-01

In this report, we present the Baidu-UTS submission to the AICity Challenge in CVPR 2020. This is the winning solution for the vehicle re-identification (re-id) track. We focus on developing a robust vehicle re-id system for real-world scenarios. In particular, we aim to fully leverage the merits of synthetic data while arming it with real images to learn a robust representation of vehicles under different views and illumination conditions. By comprehensively investigating and evaluating various augmentation approaches and popular strong baselines, we analyze...

10.1109/cvprw50498.2020.00307 article EN 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2020-06-01

Monocular 3D object detection aims to predict the location, dimension, and orientation of objects in 3D space, alongside their category, given only a monocular image. It poses a great challenge due to its ill-posed property: the critical lack of depth information in the 2D image plane. While there exist approaches leveraging off-the-shelf depth estimation or relying on LiDAR sensors to mitigate this problem, the dependence on an additional depth model or expensive equipment severely limits their scalability to generic 3D perception. In this paper, we...

10.1109/lra.2022.3191849 article EN IEEE Robotics and Automation Letters 2022-07-18

Low-cost monocular 3D object detection plays a fundamental role in autonomous driving, whereas its accuracy is still far from satisfactory. In this paper, we dig into the task and reformulate it as the sub-tasks of object localization and appearance perception, which benefits the deep excavation of reciprocal information underlying the entire task. We introduce a Dynamic Feature Reflecting Network, named DFR-Net, which contains two novel standalone modules: (i) an Appearance-Localization module (ALFR) that first separates...

10.1109/iccv48922.2021.00271 article EN 2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2021-10-01

Multi-Target Multi-Camera tracking (MTMC) is an essential task in intelligent city and traffic analysis. It is a great challenge due to several problems, such as heavy occlusions and appearance variance caused by various camera perspectives and congested vehicles. In this paper, we propose a practical framework for dealing with the MTMC problem. The proposed framework contains three stages. Firstly, in the vehicle detection and Re-ID stage, the system leverages Cascade R-CNN to detect all vehicles and a Re-ID module to extract their features across cameras. Secondly,...

10.1109/cvprw53098.2021.00456 article EN 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2021-06-01

Though action recognition in videos has achieved great success recently, it remains a challenging task due to the massive computational cost. Designing lightweight networks is a possible solution, but it may degrade performance. In this paper, we innovatively propose a general dynamic inference idea to improve efficiency by leveraging the variation in the distinguishability of different videos. The approach can be applied from the aspects of network depth and the number of input video frames, or even in a joint input-wise and depth-wise...

10.1109/cvprw50498.2020.00346 article EN 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2020-06-01
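The depth-wise half of the dynamic inference idea is essentially early exit: run cheap classifiers first and stop once the prediction is confident enough, so easy (highly distinguishable) videos use less computation. A toy sketch with stub classifiers; the cascade, the confidence rule, and the threshold are illustrative assumptions:

```python
def dynamic_inference(clip, classifiers, threshold=0.9):
    """Run progressively deeper classifiers over the clip, exiting as
    soon as the most likely class is confident enough; always answer at
    the last stage. Returns (predicted_label, stages_used)."""
    for depth, clf in enumerate(classifiers, start=1):
        probs = clf(clip)
        conf = max(probs)
        if conf >= threshold or depth == len(classifiers):
            return probs.index(conf), depth
```

The same gating could in principle be applied input-wise, e.g. starting from a few frames and adding more only when the prediction stays uncertain.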

With the demands of intelligent traffic, vehicle counting has become a vital problem, which can be used to mitigate traffic congestion and elevate the efficiency of traffic lights. Traditional counting problems focus on vehicles in a single frame or consecutive frames. Nevertheless, they are not expected to count vehicles by movements of interest (MOI), pre-defined as all possible states of vehicles combining different lanes and directions. In this paper, we mainly address the movement-specific counting problem. A detection-tracking-counting (DTC) framework is...

10.1109/cvprw50498.2020.00315 article EN 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2020-06-01
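The counting stage of a detection-tracking-counting pipeline can be sketched as mapping each completed track's entry and exit zones to a pre-defined movement of interest. The zone names and the first/last-zone matching rule below are hypothetical simplifications:

```python
def count_by_movement(tracks, movements):
    """tracks: track_id -> ordered list of zone ids the vehicle visited.
    movements: MOI name -> (entry_zone, exit_zone) pair.
    Assign each completed track to the movement matching its first and
    last zones (a hypothetical, simplified matching rule)."""
    counts = {name: 0 for name in movements}
    for zones in tracks.values():
        if not zones:
            continue
        key = (zones[0], zones[-1])
        for name, pair in movements.items():
            if pair == key:
                counts[name] += 1
                break
    return counts
```

Tracks whose entry/exit pair matches no pre-defined MOI (e.g. fragments from broken tracking) are simply not counted, which mirrors why tracking quality dominates movement-specific counting accuracy.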

The human brain can effortlessly recognize and localize objects, whereas current 3D object detection methods based on LiDAR point clouds still report inferior performance for detecting occluded and distant objects: point cloud appearance varies greatly due to occlusion, and has inherent variance in point densities along the distance to sensors. Therefore, designing feature representations robust to such changes is critical. Inspired by associative recognition, we propose a novel framework that associates intact features...

10.1109/tpami.2021.3104172 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2021-01-01

We explore the way to alleviate the label-hungry problem in a semi-supervised setting for 3D instance segmentation. To leverage unlabeled data to boost model performance, we present a novel Two-Way Inter-label Self-Training framework named TWIST. It exploits the inherent correlations between semantic understanding and instance information of a scene. Specifically, we consider two kinds of pseudo labels for semantic- and instance-level supervision. Our key design is to provide object-level denoising and to make use of their correlation...

10.1109/cvpr52688.2022.00117 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01

The SoccerNet 2022 challenges were the second annual video understanding challenges organized by the SoccerNet team. In 2022, the challenges were composed of 6 vision-based tasks: (1) action spotting, focusing on retrieving action timestamps in long untrimmed videos, (2) replay grounding, focusing on retrieving the live moment of an action shown in a replay, (3) pitch localization, focusing on detecting line and goal part elements, (4) camera calibration, dedicated to retrieving intrinsic and extrinsic camera parameters, (5) player re-identification, focusing on retrieving the same players across multiple views, and (6) multiple object tracking, focusing on tracking...

10.1145/3552437.3558545 preprint EN 2022-09-30

An up-to-date city-scale lane-level map is an indispensable infrastructure and a key enabling technology for ensuring the safety and user experience of autonomous driving systems. In industrial scenarios, reliance on manual annotation for map updates creates a critical bottleneck. Lane-level updates require precise change information and must ensure consistency with adjacent data while adhering to strict standards. Traditional methods utilize a three-stage approach (construction, change detection, updating), which often...

10.1145/3690624.3709383 preprint EN 2025-04-04