Xiaodong Yang

ORCID: 0009-0003-4638-8039
Research Areas
  • Advanced Neural Network Applications
  • Human Pose and Action Recognition
  • Video Surveillance and Tracking Methods
  • Advanced Vision and Imaging
  • Generative Adversarial Networks and Image Synthesis
  • Autonomous Vehicle Technology and Safety
  • Anomaly Detection Techniques and Applications
  • Fermentation and Sensory Analysis
  • Advanced Image and Video Retrieval Techniques
  • Face recognition and analysis
  • Multimodal Machine Learning Applications
  • Visual Attention and Saliency Detection
  • Food Quality and Safety Studies
  • Robotics and Sensor-Based Localization
  • Traffic Prediction and Management Techniques
  • 3D Shape Modeling and Analysis
  • Domain Adaptation and Few-Shot Learning
  • Image Processing Techniques and Applications
  • Video Analysis and Summarization
  • Adversarial Robustness in Machine Learning
  • Vehicle License Plate Recognition
  • Traffic and Road Safety
  • Meat and Animal Product Quality
  • Image and Video Quality Assessment
  • Microbial Metabolic Engineering and Bioproduction

Shandong University of Science and Technology
2023

Computercraft (United States)
2023

Craft Engineering Associates (United States)
2020-2023

Zhejiang Gongshang University
2021

Nvidia (United States)
2016-2020

Nvidia (United Kingdom)
2017-2020

Hohai University
2020

Xiamen University
2019

City University of New York
2015

Science and Technology Department of Sichuan Province
2013

Person re-identification (re-id) remains challenging due to significant intra-class variations across different cameras. Recently, there has been a growing interest in using generative models to augment training data and enhance invariance to input changes. The generative pipelines of existing methods, however, stay relatively separate from the discriminative re-id learning stages. Accordingly, re-id models are often trained in a straightforward manner on the generated data. In this paper, we seek to improve the learned embeddings by...

10.1109/cvpr.2019.00224 article EN 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019-06-01

Urban traffic optimization using cameras as sensors is driving the need to advance state-of-the-art multi-target multi-camera (MTMC) tracking. This work introduces CityFlow, a city-scale traffic camera dataset consisting of more than 3 hours of synchronized HD videos from 40 cameras across 10 intersections, with the longest distance between two simultaneous cameras being 2.5 km. To the best of our knowledge, CityFlow is the largest-scale dataset in terms of spatial coverage and the number of cameras/videos in an urban environment. The dataset contains 200K...

10.1109/cvpr.2019.00900 article EN 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019-06-01

10.1016/j.jvcir.2013.03.001 article EN Journal of Visual Communication and Image Representation 2013-03-14

Weakly supervised learning has emerged as a compelling tool for object detection by reducing the need for strong supervision during training. However, major challenges remain: (1) differentiation of object instances can be ambiguous; (2) detectors tend to focus on discriminative parts rather than entire objects; (3) without ground truth, object proposals have to be redundant for high recalls, causing significant memory consumption. Addressing these challenges is difficult, as it often requires eliminating uncertainties and trivial...

10.1109/cvpr42600.2020.01061 article EN 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020-06-01

In comparison with person re-identification (ReID), which has been widely studied in the research community, vehicle ReID has received less attention. Vehicle ReID is challenging due to 1) high intra-class variability (caused by the dependency of shape and appearance on viewpoint), and 2) small inter-class variability (caused by the similarity between vehicles produced by different manufacturers). To address these challenges, we propose a Pose-Aware Multi-Task Re-Identification (PAMTRI) framework. This approach includes two innovations...

10.1109/iccv.2019.00030 article EN 2019 IEEE/CVF International Conference on Computer Vision (ICCV) 2019-10-01

The success of deep neural networks generally requires a vast amount of training data to be labeled, which is expensive and infeasible at scale, especially for video collections. To alleviate this problem, in this paper we propose 3DRotNet: a fully self-supervised approach to learn spatiotemporal features from unlabeled videos. A set of rotations is applied to all videos, and a pretext task is defined as the prediction of these rotations. When accomplishing this task, 3DRotNet is actually trained to understand the semantic concepts...

10.48550/arxiv.1811.11387 preprint EN other-oa arXiv (Cornell University) 2018-01-01
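The rotation-prediction pretext task described above can be illustrated with a short sketch: rotate every frame of a clip by a random multiple of 90 degrees and train a network to predict which rotation was applied. The backbone, tensor shapes, and helper names below are assumptions for illustration, not the paper's code.

```python
# A minimal sketch of a rotation-prediction pretext task for video clips,
# in the spirit of 3DRotNet. Backbone and shapes are assumptions.
import torch
import torch.nn as nn

ROTATIONS = [0, 1, 2, 3]  # multiples of 90 degrees

def rotate_clip(clip, k):
    # clip: (C, T, H, W); rotate every frame by k * 90 degrees in the spatial plane
    return torch.rot90(clip, k, dims=(2, 3))

class RotationPretextModel(nn.Module):
    def __init__(self, backbone, feat_dim):
        super().__init__()
        self.backbone = backbone            # any 3D CNN producing a feat_dim vector
        self.classifier = nn.Linear(feat_dim, len(ROTATIONS))

    def forward(self, clips):
        return self.classifier(self.backbone(clips))

def pretext_batch(clips):
    # Build a self-supervised batch: each clip gets a random rotation,
    # and the rotation index becomes the (free) label.
    labels = torch.randint(0, len(ROTATIONS), (clips.size(0),))
    rotated = torch.stack([rotate_clip(c, int(k)) for c, k in zip(clips, labels)])
    return rotated, labels

# Training step (sketch): standard cross-entropy on the predicted rotation.
# loss = nn.CrossEntropyLoss()(model(rotated), labels)
```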

The AI City Challenge was created to accelerate intelligent video analysis that helps make cities smarter and safer. Transportation is one of the largest segments that can benefit from actionable insights derived from data captured by sensors, where computer vision and deep learning have shown promise in achieving large-scale practical deployment. The 4th annual edition attracted 315 participating teams across 37 countries, who leveraged city-scale real traffic data and high-quality synthetic data to compete in four challenge...

10.1109/cvprw50498.2020.00321 article EN 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2020-06-01

The AI City Challenge was created with two goals in mind: (1) pushing the boundaries of research and development in intelligent video analysis for smarter cities use cases, and (2) assessing tasks where the level of performance is enough to cause real-world adoption. Transportation is a segment ripe for such advancement. The fifth edition attracted 305 participating teams across 38 countries, who leveraged city-scale real traffic data and high-quality synthetic data to compete in five challenge tracks. Track 1 addressed video-based automatic vehicle...

10.1109/cvprw53098.2021.00482 article EN 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2021-06-01

3D multi-object tracking in LiDAR point clouds is a key ingredient for self-driving vehicles. Existing methods are predominantly based on the tracking-by-detection pipeline and inevitably require a heuristic matching step for detection association. In this paper, we present SimTrack to simplify the hand-crafted tracking paradigm by proposing an end-to-end trainable model for joint detection and tracking from raw point clouds. Our key design is to predict the first-appear location of each object in a given snippet to get the tracking identity, and then update the location based on motion estimation...

10.1109/iccv48922.2021.01032 article EN 2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2021-10-01

Motion forecasting is a key module in an autonomous driving system. Due to the heterogeneous nature of multi-sourced input, multimodality in agent behavior, and the low latency required by onboard deployment, this task is notoriously challenging. To cope with these difficulties, this paper proposes a novel agent-centric model with anchor-informed proposals for efficient multimodal motion prediction. We design a modality-agnostic strategy to concisely encode the complex input in a unified manner. We generate diverse proposals,...

10.1109/cvpr52729.2023.02106 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

Person re-identification (re-id) remains challenging due to significant intra-class variations across different cameras. Recently, there has been a growing interest in using generative models to augment training data and enhance invariance to input changes. The generative pipelines of existing methods, however, stay relatively separate from the discriminative re-id learning stages. Accordingly, re-id models are often trained in a straightforward manner on the generated data. In this paper, we seek to improve the learned embeddings by...

10.48550/arxiv.1904.07223 preprint EN other-oa arXiv (Cornell University) 2019-01-01

Autonomous driving can benefit from motion behavior comprehension when interacting with diverse traffic participants in highly dynamic environments. Recently, there has been a growing interest in estimating class-agnostic motion directly from point clouds. Current motion estimation methods usually require a vast amount of annotated training data from self-driving scenes. However, manually labeling point clouds is notoriously difficult, error-prone and time-consuming. In this paper, we seek to answer the research question of...

10.1109/cvpr46437.2021.00320 article EN 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021-06-01

Partial convolution weights convolutions with binary masks and renormalizes on valid pixels. It was originally proposed for the image inpainting task, because a corrupted image processed by a standard convolutional network often leads to artifacts. Therefore, binary masks are constructed that define the valid pixels, so that partial convolution results are only calculated based on valid pixels. It has also been used for the conditional image synthesis task, so that when a scene is generated, the convolution results of an instance depend only on the feature values that belong to the same instance. One of the unexplored applications is padding, which...

10.1109/tpami.2022.3209702 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2022-01-01
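A minimal sketch of the partial-convolution idea described above: the convolution is computed on mask-weighted inputs and renormalized by the number of valid pixels under each window, and the mask is updated for the next layer. This is written from the abstract's description, not the authors' released implementation; the padding choice and epsilon are assumptions.

```python
# Sketch of partial convolution: mask-weighted convolution renormalized over valid pixels.
import torch
import torch.nn.functional as F

def partial_conv2d(x, mask, weight, bias=None, padding=1):
    # x:    (N, C_in, H, W) input features
    # mask: (N, 1, H, W) binary mask, 1 = valid pixel, 0 = invalid/hole
    # weight: (C_out, C_in, kH, kW)
    out = F.conv2d(x * mask, weight, bias=None, padding=padding)

    # Number of valid input entries under each window, and the renormalization
    # factor sum(1) / sum(M) so outputs depend only on valid pixels.
    ones = torch.ones_like(weight[:1])                     # (1, C_in, kH, kW)
    valid = F.conv2d(mask.expand(-1, weight.size(1), -1, -1), ones, padding=padding)
    window_size = float(weight.size(1) * weight.size(2) * weight.size(3))
    out = out * (window_size / valid.clamp(min=1e-8))

    if bias is not None:
        out = out + bias.view(1, -1, 1, 1)

    # Updated mask: a location is valid if it saw at least one valid input pixel;
    # invalid locations are zeroed out.
    new_mask = (valid > 0).float()
    return out * new_mask, new_mask
```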

Aerial vehicle detection has significant applications in aerial surveillance and traffic control. The pictures captured by a UAV are characterized by many tiny objects and vehicles obscuring each other, significantly increasing the detection challenge. In the research on detecting vehicles in aerial images, missed and false detections are a widespread problem. Therefore, we customize a model based on YOLOv5 to be more suitable for aerial images. Firstly, we add one additional prediction head to detect smaller-scale objects. Furthermore, to keep...

10.3390/s23125634 article EN cc-by Sensors 2023-06-16
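As a rough illustration of the modification described above (an additional prediction head on a finer feature map to catch tiny vehicles), here is a generic multi-scale detection-head sketch in PyTorch. The channel sizes, anchor count, and class count are placeholders, and this is not the authors' YOLOv5 configuration.

```python
# Generic sketch: an extra high-resolution detection head for small objects.
import torch
import torch.nn as nn

class DetectionHead(nn.Module):
    def __init__(self, in_channels, num_anchors, num_classes):
        super().__init__()
        # Per-anchor outputs: 4 box offsets + 1 objectness + class scores.
        self.pred = nn.Conv2d(in_channels, num_anchors * (5 + num_classes), kernel_size=1)

    def forward(self, x):
        return self.pred(x)

class MultiScaleDetector(nn.Module):
    def __init__(self, channels=(64, 128, 256, 512), num_anchors=3, num_classes=10):
        super().__init__()
        # channels[0] is the extra fine-resolution level added for tiny objects;
        # the remaining levels mirror a standard three-head setup.
        self.heads = nn.ModuleList(
            DetectionHead(c, num_anchors, num_classes) for c in channels
        )

    def forward(self, pyramid_feats):
        # pyramid_feats: list of feature maps ordered from fine (large H, W) to coarse
        return [head(f) for head, f in zip(self.heads, pyramid_feats)]
```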

Average precision (AP) is a widely used metric to evaluate detection accuracy of image and video object detectors. In this paper, we analyze object detection from videos and point out that mAP alone is not sufficient to capture the temporal nature of video object detection. To tackle this problem, we propose a complementary metric, average delay (AD), to measure and compare detection delay. To facilitate delay evaluation, we carefully select a subset of ImageNet VID, which we name VIDT, with an emphasis on complex trajectories. By extensively evaluating a wide range of detectors on VIDT, we show...

10.1109/iccv.2019.00066 article EN 2019 IEEE/CVF International Conference on Computer Vision (ICCV) 2019-10-01
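The delay notion behind the average delay (AD) metric can be sketched as follows: for each ground-truth instance, count the frames between its first appearance and the first frame in which the detector correctly matches it, then average over instances. The handling of never-detected instances and any normalization used in the paper are not reproduced here; the data layout is an assumption.

```python
# Hedged sketch of a per-instance detection delay of the kind AD averages.
from typing import Dict, List, Tuple

def detection_delay(first_appearance: int, matched_frames: List[int]) -> int:
    """first_appearance: frame index where the ground-truth instance appears.
    matched_frames: frame indices where the detector correctly matched it."""
    if not matched_frames:
        return -1  # never detected; treated separately in the paper
    return min(matched_frames) - first_appearance

def average_delay(instances: Dict[str, Tuple[int, List[int]]]) -> float:
    # instances: id -> (first_appearance, matched_frames); only detected instances averaged here.
    delays = [detection_delay(fa, mf) for fa, mf in instances.values()]
    delays = [d for d in delays if d >= 0]
    return sum(delays) / max(len(delays), 1)
```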

Recurrent neural networks (RNNs) have emerged as a powerful model for a broad range of machine learning problems that involve sequential data. While an abundance of work exists to understand and improve RNNs in the context of language and audio signals, such as language modeling and speech recognition, relatively little attention has been paid to analyzing or modifying RNNs for visual sequences, which by nature have distinct properties. In this paper, we aim to bridge this gap and present the first large-scale exploration of RNNs for visual sequence learning. In particular, with...

10.1109/cvpr.2018.00677 article EN 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2018-06-01

This article aims to use graphic engines to simulate a large amount of training data that have free annotations and possibly strongly resemble real-world data. Between synthetic and real data, a two-level domain gap exists, involving the content level and the appearance level. While the latter is concerned with style, the former problem arises from a different mechanism, i.e., a mismatch in attributes such as camera viewpoint, object placement and lighting conditions. In contrast to the widely-studied appearance-level gap,...

10.1109/tpami.2023.3338291 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2023-12-01

Monocular depth estimation is an ill-posed problem as the same 2D image can be projected from infinitely many 3D scenes. Although the leading algorithms in this field have reported significant improvement, they are essentially geared to the particular compound of pictorial observations and camera parameters (i.e., intrinsics and extrinsics), strongly limiting their generalizability in real-world scenarios. To cope with this challenge, this paper proposes a novel ground embedding module to decouple camera parameters from pictorial cues, thus promoting...

10.1109/iccv51070.2023.01168 article EN 2023 IEEE/CVF International Conference on Computer Vision (ICCV) 2023-10-01

3D perception based on the representations learned from multi-camera bird's-eye-view (BEV) is trending as cameras are cost-effective for mass production in the autonomous driving industry. However, there exists a distinct performance gap between multi-camera BEV and LiDAR based object detection. One key reason is that LiDAR captures accurate depth and other geometry measurements, while it is notoriously challenging to infer such 3D information merely from image input. In this work, we propose to boost the representation learning of a student...

10.1109/iccv51070.2023.00793 article EN 2023 IEEE/CVF International Conference on Computer Vision (ICCV) 2023-10-01
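A hedged sketch of the general cross-modal distillation idea referenced above: push the camera-based student's BEV features toward those of a LiDAR-based teacher. The 1x1 adapter, squared-error loss, and optional foreground weighting are illustrative assumptions rather than the paper's exact design.

```python
# Sketch of cross-modal BEV feature distillation (camera student, LiDAR teacher).
import torch
import torch.nn as nn

class BEVFeatureDistillation(nn.Module):
    def __init__(self, student_channels, teacher_channels):
        super().__init__()
        # Align student channels to the teacher's before comparing features.
        self.adapt = nn.Conv2d(student_channels, teacher_channels, kernel_size=1)

    def forward(self, student_bev, teacher_bev, fg_mask=None):
        # student_bev: (N, Cs, H, W) from multi-camera images
        # teacher_bev: (N, Ct, H, W) from LiDAR; detached so gradients stay in the student
        diff = (self.adapt(student_bev) - teacher_bev.detach()) ** 2
        if fg_mask is not None:
            # Optionally emphasize foreground BEV cells (e.g., around annotated boxes).
            diff = diff * fg_mask
        return diff.mean()
```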

This paper uses a graphic engine to simulate a large amount of training data with free annotations. Between synthetic and real data, there is a two-level domain gap, i.e., the content level and the appearance level. While the latter has been widely studied, we focus on reducing the content gap in attributes like illumination and viewpoint. To reduce the problem complexity, we choose a smaller and more controllable application, vehicle re-identification (re-ID). We introduce a large-scale synthetic dataset, VehicleX. Created in Unity, it contains 1,362...

10.48550/arxiv.1912.08855 preprint EN other-oa arXiv (Cornell University) 2019-01-01

We investigate two crucial and closely related aspects of CNNs for optical flow estimation: models and training. First, we design a compact but effective CNN model, called PWC-Net, according to simple and well-established principles: pyramidal processing, warping, and the use of a cost volume. PWC-Net is 17 times smaller in size, 2 times faster in inference, and 11% more accurate on Sintel final than the recent FlowNet2 model. It is the winning entry in the optical flow competition of the robust vision challenge. Next, we experimentally analyze the sources...

10.48550/arxiv.1809.05571 preprint EN other-oa arXiv (Cornell University) 2018-01-01
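One of the principles listed above, the cost volume, can be sketched as a correlation between the first image's features and the (warped) second image's features over a small displacement range. Shapes and the search radius below are assumptions, and the warping step is omitted for brevity.

```python
# Sketch of a partial cost volume, in the spirit of PWC-Net's matching-cost layer.
import torch
import torch.nn.functional as F

def cost_volume(feat1, feat2_warped, max_disp=4):
    # feat1, feat2_warped: (N, C, H, W) feature maps at one pyramid level
    n, c, h, w = feat1.shape
    padded = F.pad(feat2_warped, (max_disp, max_disp, max_disp, max_disp))
    costs = []
    for dy in range(-max_disp, max_disp + 1):
        for dx in range(-max_disp, max_disp + 1):
            # Shift feat2 by (dy, dx) via the zero-padded map, then take the
            # per-pixel correlation averaged over channels.
            shifted = padded[:, :, max_disp + dy: max_disp + dy + h,
                                   max_disp + dx: max_disp + dx + w]
            costs.append((feat1 * shifted).mean(dim=1, keepdim=True))
    # (N, (2*max_disp+1)^2, H, W) matching costs fed to the flow estimator.
    return torch.cat(costs, dim=1)
```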