Ming Tang

ORCID: 0000-0003-4976-3095
Research Areas
  • Video Surveillance and Tracking Methods
  • Human Pose and Action Recognition
  • Advanced Neural Network Applications
  • Advanced Image and Video Retrieval Techniques
  • Domain Adaptation and Few-Shot Learning
  • Multimodal Machine Learning Applications
  • Face recognition and analysis
  • Face and Expression Recognition
  • Advanced Vision and Imaging
  • Infrared Target Detection Methodologies
  • Fire Detection and Safety Systems
  • Anomaly Detection Techniques and Applications
  • Image Retrieval and Classification Techniques
  • Image Enhancement Techniques
  • Visual Attention and Saliency Detection
  • Robotics and Sensor-Based Localization
  • Topic Modeling
  • Medical Image Segmentation Techniques
  • Autonomous Vehicle Technology and Safety
  • Target Tracking and Data Fusion in Sensor Networks
  • Image Processing Techniques and Applications
  • Video Analysis and Summarization
  • Gait Recognition and Analysis
  • Image and Object Detection Techniques
  • IoT-based Smart Home Systems

St. Jude Children's Research Hospital
2024-2025

Institute of Automation
2014-2024

Chinese Academy of Sciences
2015-2024

Changzhou University
2023-2024

University of Chinese Academy of Sciences
2017-2024

Shanghai Electric (China)
2024

Nanchang University
2024

Third Affiliated Hospital of Southern Medical University
2024

Shandong Institute of Automation
2005-2023

Shanghai Maritime University
2023

The Visual Object Tracking challenge 2015, VOT2015, aims at comparing short-term single-object visual trackers that do not apply pre-learned models of object appearance. Results of 62 trackers are presented. The number of tested trackers makes VOT2015 the largest benchmark on tracking to date. For each participating tracker, a short description is provided in the appendix. Features of VOT2015 that go beyond its VOT2014 predecessor are: (i) a new dataset twice as large, with full annotation of targets by rotated bounding boxes and...

10.1109/iccvw.2015.79 preprint EN 2015-12-01

End-to-end discriminative trackers improve the state of the art significantly, yet the improvement in robustness and efficiency is restricted by the conventional discriminative model, i.e., least-squares based regression. In this paper, we present DTT, a novel single-object discriminative tracker, based on an encoder-decoder Transformer architecture. By self- and encoder-decoder attention mechanisms, our approach is able to exploit the rich scene information in an end-to-end manner, effectively removing the need for hand-designed discriminative models. In online tracking, given a new test...

10.1109/iccv48922.2021.00971 article EN 2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2021-10-01

In person re-identification (re-ID), extracting part-level features from person images has been verified to be crucial to offer fine-grained information. Most of the existing CNN-based methods only locate the human parts coarsely, or rely on pretrained human parsing models and fail in locating the identifiable nonhuman parts (e.g., knapsack). In this article, we introduce an alignment scheme in transformer architecture for the first time and propose the auto-aligned transformer (AAformer) to automatically locate both the human parts and the nonhuman ones at patch level. We introduce the "Part tokens...

10.1109/tnnls.2023.3301856 article EN IEEE Transactions on Neural Networks and Learning Systems 2023-08-25

Large Vision-Language Models (LVLMs) such as MiniGPT-4 and LLaVA have demonstrated the capability of understanding images and achieved remarkable performance in various visual tasks. Despite their strong abilities in recognizing common objects due to extensive training datasets, they lack specific domain knowledge and have a weaker understanding of localized details within objects, which hinders their effectiveness in the Industrial Anomaly Detection (IAD) task. On the other hand, most existing IAD methods only provide anomaly scores...

10.1609/aaai.v38i3.27963 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2024-03-24

Correlation filter based trackers are ranked top in terms of performances. Nevertheless, they only employ a single kernel at a time. In this paper, we will derive a multi-kernel correlation filter (MKCF) based tracker which fully takes advantage of the invariance-discriminative power spectrums of various features to further improve the performance. Moreover, it may easily introduce location and representation errors to search several discrete scales for the proper one of the object bounding box, because normally the candidate...

10.1109/iccv.2015.348 article EN 2015-12-01

Traffic surveillance has become an important topic in intelligent transportation systems (ITSs), which is aimed at monitoring and managing traffic flow. With the progress in computer vision, video-based surveillance systems have made great advances on traffic surveillance in ITSs. However, the performance of most existing surveillance systems is susceptible to challenging complex traffic scenes (e.g., object occlusion, pose variation, and cluttered background). Moreover, existing related research is mainly on a single video sensor node, which is incapable of addressing the surveillance of traffic road networks. Accordingly, we...

10.1109/tits.2016.2552778 article EN IEEE Transactions on Intelligent Transportation Systems 2016-05-03

Traffic surveillance has become an important topic in intelligent transportation systems (ITSs), which is aimed at monitoring and managing traffic flow. With the progress in computer vision, video-based surveillance systems have made great advances on traffic surveillance in ITSs. However, the performance of most existing surveillance systems is susceptible to challenging complex traffic scenes (e.g., object occlusion, pose variation, and cluttered background). Moreover, existing related research is mainly on a single video sensor node, which is incapable of addressing the surveillance of traffic road networks. Accordingly, we...

10.1109/tits.2014.2340701 article EN IEEE Transactions on Intelligent Transportation Systems 2014-08-11

Vehicle re-identification (re-ID) aims to identify the same vehicle across multiple non-overlapping cameras, which is rather a challenging task. On the one hand, subtle changes in viewpoint and illumination condition can make the same vehicle look much different. On the other hand, different vehicles, even different vehicle models, may look quite similar. In this paper, we propose a novel Two-level Attention network supervised by a Multi-grain Ranking loss (TAMR) to learn an efficient feature embedding for the vehicle re-ID task. The two-level attention network, consisting of hard...

10.1109/tip.2019.2910408 article EN IEEE Transactions on Image Processing 2019-04-16

Appearance model is a key component of tracking algorithms. Most existing approaches utilize the object information contained in the current and previous frames to construct the object appearance model and locate the object with the model in frame t + 1. This method may work well if the object appearance just fluctuates in short time intervals. Nevertheless, suboptimal locations will be generated in frame t + 1 if the visual appearance changes substantially from the model. Then, continuous changes would accumulate errors and finally result in a tracking failure. To cope with this problem, in this paper we propose a novel algorithm -...

10.1109/cvpr.2012.6247884 article EN 2012 IEEE Conference on Computer Vision and Pattern Recognition 2012-06-01

Moving object detection is an essential, well-studied but still open problem in computer vision and plays a fundamental role in many applications. Traditional approaches usually reconstruct background images with hand-crafted visual features, such as color, texture, and edge. Due to the lack of prior knowledge or semantic information, it is difficult to deal with complicated and rapidly changing scenes. To exploit the temporal structure of the pixel-level semantic information, in this paper, we propose an end-to-end deep sequence learning architecture...

10.1109/tcsvt.2017.2770319 article EN IEEE Transactions on Circuits and Systems for Video Technology 2017-11-06

The Thermal Infrared Visual Object Tracking challenge 2015, VOT-TIR2015, aims at comparing short-term single-object visual trackers that work on thermal infrared (TIR) sequences and do not apply pre-learned models of object appearance. VOT-TIR2015 is the first benchmark on tracking in TIR sequences. Results of 24 trackers are presented. For each participating tracker, a short description is provided in the appendix. The VOT-TIR2015 challenge is based on the VOT2013 challenge, but introduces the following novelties: (i) the newly collected LTIR (Linköping...

10.1109/iccvw.2015.86 article EN 2015-12-01

Correlation filter (CF) based trackers are currently ranked top in terms of their performances. Nevertheless, only some of them, such as KCF [26] and MKCF [48], are able to exploit the powerful discriminability of non-linear kernels. Although MKCF achieves more powerful discriminability than KCF through introducing multi-kernel learning (MKL) into KCF, its improvement over KCF is quite limited and its computational burden increases significantly in comparison with KCF. In this paper, we will introduce the MKL into KCF in a different way than MKCF. We reformulate the MKL version...

10.1109/cvpr.2018.00512 article EN 2018-06-01

Despite considerable similarities between multiple object tracking (MOT) and single object tracking (SOT) tasks, modern MOT methods have not benefited from the development of SOT ones to achieve satisfactory performance. The major reason for this situation is that it is inappropriate and inefficient to apply SOT models directly to the MOT task, although advanced SOT methods are of strong discriminative power and can run at fast speeds. In this paper, we propose a novel end-to-end trainable MOT architecture that extends CenterNet by adding an SOT branch for tracking objects in...

10.1109/cvpr46437.2021.00248 article EN 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021-06-01

To address the problem of long-tail distribution for the large vocabulary object detection task, existing methods usually divide the whole categories into several groups and treat each group with different strategies. These methods bring the following two problems. One is the training inconsistency between adjacent categories of similar sizes, and the other is that the learned model lacks discrimination for tail categories which are semantically similar to some of the head categories. In this paper, we devise a novel Adaptive Class Suppression Loss (ACSL) to effectively tackle...

10.1109/cvpr46437.2021.00312 article EN 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021-06-01

Transformer has achieved great success in computer vision, while how to split patches in an image remains a problem. Existing methods usually use a fixed-size patch embedding which might destroy the semantics of objects. To address this problem, we propose a new Deformable Patch (DePatch) module which learns to adaptively split the images into patches with different positions and scales in a data-driven way rather than using predefined fixed patches. In this way, our method can well preserve the semantics in patches. The DePatch module can work as a plug-and-play module,...

10.1145/3474085.3475467 article EN Proceedings of the 29th ACM International Conference on Multimedia 2021-10-17

Transformer has been widely used for self-supervised pre-training in Natural Language Processing (NLP) and achieved great success. However, it has not been fully explored in visual self-supervised learning. Meanwhile, previous methods only consider the high-level feature and learning representation from a global perspective, which may fail to transfer to the downstream dense prediction tasks focusing on local features. In this paper, we present a novel Masked Self-supervised Transformer approach named MST, which can explicitly capture the local context of an...

10.48550/arxiv.2106.05656 preprint EN cc-by arXiv (Cornell University) 2021-01-01

Recently, deep learning based facial landmark detection has achieved great success. Despite this, we notice that the semantic ambiguity greatly degrades the detection performance. Specifically, the semantic ambiguity means that some landmarks (e.g., those evenly distributed along the face contour) do not have clear and accurate definition, causing inconsistent annotations by annotators. Accordingly, these inconsistent annotations, which are usually provided by public databases, commonly work as the ground-truth to supervise network training, leading...

10.1109/cvpr.2019.00358 preprint EN 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019-06-01

Image matting plays an important role in image and video editing. However, the formulation of image matting is inherently ill-posed. Traditional methods usually employ interaction to deal with the image matting problem with trimaps and strokes, and cannot run on the mobile phone in real-time. In this paper, we propose a real-time automatic deep matting approach for mobile devices. By leveraging the densely connected blocks and the dilated convolution, a light full convolutional network is designed to predict a coarse binary mask for portrait image. And a feathering block, which...

10.1145/3123266.3123286 article EN Proceedings of the 25th ACM International Conference on Multimedia 2017-10-19