Xinggang Wang

ORCID: 0000-0001-6732-7823
Research Areas
  • Advanced Neural Network Applications
  • Advanced Image and Video Retrieval Techniques
  • Domain Adaptation and Few-Shot Learning
  • Video Surveillance and Tracking Methods
  • Image Retrieval and Classification Techniques
  • Multimodal Machine Learning Applications
  • Human Pose and Action Recognition
  • Visual Attention and Saliency Detection
  • Advanced Vision and Imaging
  • Robotics and Sensor-Based Localization
  • Image Processing and 3D Reconstruction
  • Computer Graphics and Visualization Techniques
  • Gait Recognition and Analysis
  • Adversarial Robustness in Machine Learning
  • Medical Image Segmentation Techniques
  • 3D Shape Modeling and Analysis
  • Image Enhancement Techniques
  • Industrial Vision Systems and Defect Detection
  • Handwritten Text Recognition Techniques
  • Autonomous Vehicle Technology and Safety
  • AI in cancer detection
  • Video Analysis and Summarization
  • Machine Learning and Data Classification
  • Image Processing Techniques and Applications
  • Hand Gesture Recognition Systems

Huazhong University of Science and Technology
2016-2025

China Power Engineering Consulting Group (China)
2025

Chinese Academy of Sciences
2016-2024

Shenyang Institute of Computing Technology (China)
2024

Hefei Institutes of Physical Science
2024

Institute of Solid State Physics
2024

University of Science and Technology of China
2024

Army Medical University
2024

Southwest Hospital
2024

PLA Army Service Academy
2023

High-resolution representations are essential for position-sensitive vision problems, such as human pose estimation, semantic segmentation, and object detection. Existing state-of-the-art frameworks first encode the input image as a low-resolution representation through a subnetwork that is formed by connecting high-to-low resolution convolutions <i>in series</i> (e.g., ResNet, VGGNet), then recover the high-resolution representation from the encoded low-resolution representation. Instead, our proposed network, named High-Resolution...

10.1109/tpami.2020.2983686 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2020-04-01
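The parallel multi-resolution design described in the abstract can be illustrated with a minimal NumPy sketch. This is not HRNet itself (which exchanges information with strided 3x3 convolutions, 1x1 convolutions, and bilinear upsampling); it only shows the fusion pattern in which each resolution branch sums the resampled features of all branches:

```python
import numpy as np

def upsample2x(x):
    # Nearest-neighbor 2x upsampling (HRNet uses 1x1 conv + bilinear instead).
    return x.repeat(2, axis=0).repeat(2, axis=1)

def downsample2x(x):
    # Strided subsampling (HRNet uses strided 3x3 convolutions instead).
    return x[::2, ::2]

def fuse(high, low):
    """Exchange information across parallel resolution branches:
    each branch's output sums the (resampled) features of all branches."""
    new_high = high + upsample2x(low)
    new_low = downsample2x(high) + low
    return new_high, new_low

high = np.ones((4, 4))       # full-resolution feature map
low = np.full((2, 2), 2.0)   # half-resolution feature map
new_high, new_low = fuse(high, low)
print(new_high.shape, new_low.shape)  # (4, 4) (2, 2)
```

Repeating such fusions while keeping the high-resolution branch alive end-to-end is what distinguishes this design from encode-then-recover pipelines.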

Full-image dependencies provide useful contextual information to benefit visual understanding problems. In this work, we propose a Criss-Cross Network (CCNet) for obtaining such dependencies in a more effective and efficient way. Concretely, for each pixel, a novel criss-cross attention module in CCNet harvests the contextual information of all the pixels on its criss-cross path. By taking a further recurrent operation, each pixel can finally capture full-image dependencies from all pixels. Overall, CCNet has the following merits: 1) GPU memory friendly. Compared with the non-local block,...

10.1109/iccv.2019.00069 article EN 2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2019-10-01
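The criss-cross mechanism can be sketched in plain NumPy. This is a simplified illustration, not the paper's implementation: here the feature map itself plays the roles of query, key, and value, whereas CCNet uses separate 1x1-convolution projections and a more compact path of H+W-1 positions:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def criss_cross_attention(feat):
    """For every pixel, aggregate the features of the pixels on its row and
    column (its criss-cross path). The center pixel appears in both the row
    and the column; this sketch tolerates the duplication."""
    H, W, C = feat.shape
    out = np.zeros_like(feat)
    for i in range(H):
        for j in range(W):
            q = feat[i, j]                                    # query, (C,)
            # keys/values on the criss-cross path: row i plus column j
            path = np.concatenate([feat[i, :, :], feat[:, j, :]], axis=0)
            w = softmax(path @ q)                             # attention weights
            out[i, j] = w @ path                              # weighted sum
    return out

x = np.random.rand(4, 5, 8)
y = criss_cross_attention(x)
# One pass covers a pixel's row and column; a second (recurrent) pass lets
# information from every pixel reach every other pixel.
y2 = criss_cross_attention(y)
print(y2.shape)  # (4, 5, 8)
```

The memory saving comes from attending over H+W positions per pixel instead of the H*W positions a non-local block requires.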

Letting a deep network be aware of the quality of its own predictions is an interesting yet important problem. In the task of instance segmentation, the confidence of instance classification is used as the mask quality score in most instance segmentation frameworks. However, the mask quality, quantified as the IoU between the instance mask and its ground truth, is usually not well correlated with the classification score. In this paper, we study this problem and propose Mask Scoring R-CNN, which contains a network block to learn the quality of the predicted masks. The proposed block takes the instance feature and the corresponding predicted mask together to regress the mask IoU. The mask scoring...

10.1109/cvpr.2019.00657 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019-06-01
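The quantity the MaskIoU head is trained to regress is just the IoU between a predicted binary mask and its ground truth; a small NumPy sketch makes the miscorrelation concrete, since a confidently classified instance can still have a sloppy mask:

```python
import numpy as np

def mask_iou(pred, gt):
    """IoU between two binary masks: |intersection| / |union|."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union else 0.0

gt = np.zeros((8, 8), dtype=int)
gt[2:6, 2:6] = 1          # 4x4 ground-truth object
pred = np.zeros((8, 8), dtype=int)
pred[2:6, 2:8] = 1        # predicted mask spills over to the right
print(round(mask_iou(pred, gt), 3))  # 0.667  (16 / 24)
```

At inference time the final mask score is the classification score multiplied by this predicted IoU, which is the correction Mask Scoring R-CNN introduces.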

A challenging aspect of scene text recognition is to handle text with distortions or irregular layout. In particular, perspective text and curved text are common in natural scenes and difficult to recognize. In this work, we introduce ASTER, an end-to-end neural network model that comprises a rectification network and a recognition network. The rectification network adaptively transforms an input image into a new one, rectifying the text in it. It is powered by a flexible Thin-Plate Spline transformation which handles a variety of text irregularities and is trained without human annotations...

10.1109/tpami.2018.2848939 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2018-06-25

This paper presents an end-to-end trainable fast scene text detector, named TextBoxes, which detects scene text with both high accuracy and efficiency in a single network forward pass, involving no post-processing except for a standard non-maximum suppression. TextBoxes outperforms competing methods in terms of text localization accuracy and is much faster, taking only 0.09s per image in a fast implementation. Furthermore, combined with a text recognizer, TextBoxes significantly outperforms state-of-the-art approaches on word spotting and end-to-end recognition tasks.

10.1609/aaai.v31i1.11196 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2017-02-12
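The only post-processing step the abstract mentions is standard non-maximum suppression, which can be sketched in NumPy as follows (a generic greedy NMS, not the paper's code; boxes are `[x1, y1, x2, y2]`):

```python
import numpy as np

def nms(boxes, scores, iou_thr=0.5):
    """Greedy non-maximum suppression: repeatedly keep the highest-scoring
    box and discard remaining boxes that overlap it above `iou_thr`."""
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]   # indices by descending score
    keep = []
    while order.size:
        i = order[0]
        keep.append(int(i))
        # IoU of the top box with all remaining boxes
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        order = order[1:][iou <= iou_thr]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 10, 10], [20, 20, 30, 30]], float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # [0, 2] — the second box overlaps the first
```

Keeping all other logic inside a single forward pass is what makes the reported 0.09s-per-image speed possible.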

Accurate and rapid diagnosis of COVID-19 suspected cases plays a crucial role in timely quarantine and medical treatment. Developing a deep learning-based model for automatic COVID-19 detection on chest CT is helpful to counter the outbreak of SARS-CoV-2. A weakly-supervised deep learning-based software system was developed using 3D CT volumes to detect COVID-19. For each patient, the lung region was segmented using a pre-trained UNet; the segmented 3D lung region was then fed into a 3D deep neural network to predict the probability of COVID-19 infection. 499 CT volumes collected from Dec. 13, 2019, to Jan. 23,...

10.1101/2020.03.12.20027185 preprint EN cc-by-nc-nd medRxiv (Cold Spring Harbor Laboratory) 2020-03-17

Recognizing text in natural images is a challenging task with many unsolved problems. Different from text in documents, words in natural images often possess irregular shapes, which are caused by perspective distortion, curved character placement, etc. We propose RARE (Robust text recognizer with Automatic REctification), a recognition model that is robust to irregular text. RARE is a specially designed deep neural network, which consists of a Spatial Transformer Network (STN) and a Sequence Recognition Network (SRN). In testing, an image is firstly rectified via...

10.1109/cvpr.2016.452 article EN 2016-06-01

Accurate and rapid diagnosis of COVID-19 suspected cases plays a crucial role in timely quarantine and medical treatment. Developing a deep learning-based model for automatic COVID-19 diagnosis on chest CT is helpful to counter the outbreak of SARS-CoV-2. A weakly-supervised deep learning framework was developed using 3D CT volumes for COVID-19 classification and lesion localization. For each patient, the lung region was segmented using a pre-trained UNet; the segmented 3D lung region was then fed into a 3D deep neural network to predict the probability of COVID-19 infection; COVID-19 lesions are localized by combining...

10.1109/tmi.2020.2995965 article EN IEEE Transactions on Medical Imaging 2020-05-20

High-resolution representation learning plays an essential role in many vision problems, e.g., pose estimation and semantic segmentation. The high-resolution network (HRNet)~\cite{SunXLW19}, recently developed for human pose estimation, maintains high-resolution representations through the whole process by connecting high-to-low resolution convolutions in \emph{parallel} and produces strong high-resolution representations by repeatedly conducting fusions across the parallel convolutions. In this paper, we conduct a further study on high-resolution representations by introducing a simple yet...

10.48550/arxiv.1904.04514 preprint EN other-oa arXiv (Cornell University) 2019-01-01

This paper studies the problem of learning image semantic segmentation networks using only image-level labels as supervision, which is important since it can significantly reduce human annotation efforts. Recent state-of-the-art methods on this problem first infer sparse and discriminative regions for each object class using a deep classification network, then train a semantic segmentation network using the discriminative regions as supervision. Inspired by traditional seeded region growing, we propose to train the segmentation network starting from the discriminative regions and progressively increase the pixel-level...

10.1109/cvpr.2018.00733 article EN 2018-06-01
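The classical seeded region growing idea that inspires the paper can be sketched with a breadth-first search over a per-class score map. This is a simplified, non-differentiable illustration with an assumed fixed threshold, not the paper's in-network growing layer:

```python
from collections import deque

import numpy as np

def seeded_region_growing(score_map, seeds, thr=0.5):
    """Grow a segmentation mask outward from discriminative seed pixels,
    absorbing 4-connected neighbours whose class score exceeds `thr`."""
    H, W = score_map.shape
    mask = np.zeros((H, W), dtype=bool)
    q = deque(seeds)
    for i, j in seeds:
        mask[i, j] = True
    while q:
        i, j = q.popleft()
        for ni, nj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
            if 0 <= ni < H and 0 <= nj < W and not mask[ni, nj] \
                    and score_map[ni, nj] > thr:
                mask[ni, nj] = True
                q.append((ni, nj))
    return mask

scores = np.array([[0.9, 0.8, 0.2],
                   [0.7, 0.6, 0.1],
                   [0.1, 0.2, 0.9]])
mask = seeded_region_growing(scores, seeds=[(0, 0)])
print(mask.astype(int))
# [[1 1 0]
#  [1 1 0]
#  [0 0 0]]
```

Note the high-scoring pixel at (2, 2) is excluded because it is not connected to the seed, which is exactly the spatial-coherence property growing adds over thresholding alone.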

Contour detection serves as the basis of a variety of computer vision tasks such as image segmentation and object recognition. The mainstream works addressing this problem focus on designing engineered gradient features. In this work, we show that contour detection accuracy can be improved by instead making use of deep features learned from convolutional neural networks (CNNs). Rather than using the networks as a blackbox feature extractor, we customize the training strategy by partitioning contour (positive) data into subclasses and fitting each...

10.1109/cvpr.2015.7299024 article EN 2015-06-01

Of late, weakly supervised object detection has gained great importance in object recognition. Based on deep learning, weakly supervised detectors have achieved many promising results. However, compared with fully supervised detection, it is more challenging to train a detection network in a weakly supervised manner. Here we formulate weakly supervised detection as a Multiple Instance Learning (MIL) problem, where instance classifiers (object detectors) are put into the network as hidden nodes. We propose a novel online instance classifier refinement algorithm to integrate MIL and the classifier refinement procedure into a single deep network, trained end-to-end...

10.1109/cvpr.2017.326 article EN 2017-07-01
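The MIL formulation can be sketched with a WSDDN-style aggregation, the kind of base MIL detector such refinement builds on (an illustrative sketch, not the paper's exact architecture): one softmax over classes scores *what* each proposal is, one softmax over proposals scores *which* proposals matter, and summing their product over proposals yields an image-level score trainable with image-level labels only.

```python
import numpy as np

def softmax(z, axis):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def mil_image_scores(cls_logits, det_logits):
    """Aggregate per-proposal logits into image-level class scores.
    Shapes: (num_proposals, num_classes) -> (num_classes,)."""
    cls = softmax(cls_logits, axis=1)   # per-proposal class posterior
    det = softmax(det_logits, axis=0)   # per-class weighting over proposals
    proposal_scores = cls * det         # instance-level detection scores
    return proposal_scores.sum(axis=0)  # image-level class scores

rng = np.random.default_rng(0)
cls_logits = rng.normal(size=(10, 4))   # 10 proposals, 4 classes
det_logits = rng.normal(size=(10, 4))
image_scores = mil_image_scores(cls_logits, det_logits)
print(image_scores.shape)  # (4,)
```

The intermediate `proposal_scores` are the hidden instance classifiers; online refinement uses the top-scoring proposals of one classifier as pseudo-supervision for the next.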

This paper presents an end-to-end trainable fast scene text detector, named TextBoxes, which detects scene text with both high accuracy and efficiency in a single network forward pass, involving no post-processing except for a standard non-maximum suppression. TextBoxes outperforms competing methods in terms of text localization accuracy and is much faster, taking only 0.09s per image in a fast implementation. Furthermore, combined with a text recognizer, TextBoxes significantly outperforms state-of-the-art approaches on word spotting and end-to-end recognition tasks.

10.48550/arxiv.1611.06779 preprint EN other-oa arXiv (Cornell University) 2016-01-01

Contextual information is vital in visual understanding problems, such as semantic segmentation and object detection. We propose a criss-cross network (CCNet) for obtaining full-image contextual information in a very effective and efficient way. Concretely, for each pixel, a novel criss-cross attention module harvests the contextual information of all the pixels on its criss-cross path. By taking a further recurrent operation, each pixel can finally capture the full-image dependencies. Besides, a category consistent loss is proposed to enforce the criss-cross attention module to produce more discriminative features. Overall,...

10.1109/tpami.2020.3007032 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2020-07-03

Weakly Supervised Object Detection (WSOD), using only image-level annotations to train object detectors, is of growing importance in object recognition. In this paper, we propose a novel deep network for WSOD. Unlike previous networks that transfer the object detection problem to an image classification problem using Multiple Instance Learning (MIL), our strategy generates proposal clusters to learn refined instance classifiers by an iterative process. The proposals in the same cluster are spatially adjacent and associated with...

10.1109/tpami.2018.2876304 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2018-10-16

A panoptic driving perception system is an essential part of autonomous driving. A high-precision and real-time system can assist the vehicle in making reasonable decisions while driving. We present a panoptic driving perception network (YOLOP) to perform traffic object detection, drivable area segmentation and lane detection simultaneously. It is composed of one encoder for feature extraction and three decoders to handle the specific tasks. Our model performs extremely well on the challenging BDD100K dataset, achieving state-of-the-art on all three tasks in terms of accuracy...

10.1007/s11633-022-1339-y article EN cc-by Deleted Journal 2022-11-07

We launch EVA, a vision-centric foundation model to Explore the limits of Visual representation at scAle using only publicly accessible data. EVA is a vanilla ViT pre-trained to reconstruct masked-out, image-text aligned vision features conditioned on visible image patches. Via this pretext task, we can efficiently scale up EVA to one billion parameters, and it sets new records on a broad range of representative downstream tasks, such as image recognition, video action recognition, object detection, instance segmentation and semantic...

10.1109/cvpr52729.2023.01855 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01