Tianheng Cheng

ORCID: 0009-0003-4100-1659
Research Areas
  • Advanced Neural Network Applications
  • Advanced Image and Video Retrieval Techniques
  • Multimodal Machine Learning Applications
  • Domain Adaptation and Few-Shot Learning
  • Human Pose and Action Recognition
  • Robotics and Sensor-Based Localization
  • Advanced Vision and Imaging
  • Anomaly Detection Techniques and Applications
  • Image Retrieval and Classification Techniques
  • Data Management and Algorithms
  • Medical Image Segmentation Techniques
  • Video Surveillance and Tracking Methods
  • Image Processing Techniques and Applications
  • Image and Object Detection Techniques
  • Natural Language Processing Techniques
  • Autonomous Vehicle Technology and Safety
  • Generative Adversarial Networks and Image Synthesis
  • 3D Surveying and Cultural Heritage
  • Automated Road and Building Extraction
  • Text and Document Classification Technologies
  • Industrial Vision Systems and Defect Detection
  • Infrared Target Detection Methodologies
  • 3D Shape Modeling and Analysis
  • Video Analysis and Summarization
  • Human Motion and Animation

Shenyang Pharmaceutical University
2024

Huazhong University of Science and Technology
2018-2024

Horizon Robotics (China)
2022

High-resolution representations are essential for position-sensitive vision problems, such as human pose estimation, semantic segmentation, and object detection. Existing state-of-the-art frameworks first encode the input image as a low-resolution representation through a subnetwork that is formed by connecting high-to-low resolution convolutions <i>in series</i> (e.g., ResNet, VGGNet), and then recover the high-resolution representation from the encoded representation. Instead, our proposed network, named High-Resolution...

10.1109/tpami.2020.2983686 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2020-04-01
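The core idea above (keep a high-resolution branch throughout and repeatedly fuse it with lower-resolution branches) can be illustrated with a minimal toy sketch. This is not the paper's architecture: 1-D lists stand in for feature maps, and `upsample`/`downsample` are naive resampling helpers rather than the learned strided convolutions and bilinear upsampling HRNet actually uses.

```python
# Toy sketch of HRNet-style parallel branches with repeated fusion.
# 1-D lists stand in for feature maps; the real network uses learned
# strided convolutions and bilinear upsampling instead.

def downsample(x):
    """Halve resolution by averaging adjacent pairs (toy 2x stride)."""
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x) - 1, 2)]

def upsample(x):
    """Double resolution by nearest-neighbour repetition."""
    return [v for v in x for _ in range(2)]

def fuse(high, low):
    """Multi-scale fusion: each branch receives the other branch
    resampled to its own resolution, then sums element-wise."""
    new_high = [h + u for h, u in zip(high, upsample(low))]
    new_low = [l + d for l, d in zip(low, downsample(high))]
    return new_high, new_low

high = [1.0, 2.0, 3.0, 4.0]   # high-resolution branch (kept throughout)
low = downsample(high)        # parallel low-resolution branch
for _ in range(2):            # repeated fusion across branches
    high, low = fuse(high, low)

print(len(high), len(low))    # resolutions are preserved: 4 2
```

The point of the sketch is that, unlike a series encoder-decoder, the high-resolution branch never disappears; it is refined in place by fusing in coarser context.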

We present MMDetection, an object detection toolbox that contains a rich set of object detection and instance segmentation methods as well as related components and modules. The toolbox started from the codebase of the MMDet team, who won the detection track of COCO Challenge 2018. It gradually evolves into a unified platform that covers many popular detection methods and contemporary modules. It not only includes training and inference codes, but also provides weights for more than 200 network models. We believe this is by far the most complete detection toolbox. In this paper, we introduce the various features...

10.48550/arxiv.1906.07155 preprint EN other-oa arXiv (Cornell University) 2019-01-01

High-resolution representation learning plays an essential role in many vision problems, e.g., pose estimation and semantic segmentation. The high-resolution network (HRNet)~\cite{SunXLW19}, recently developed for human pose estimation, maintains high-resolution representations through the whole process by connecting high-to-low resolution convolutions in \emph{parallel} and produces strong high-resolution representations by repeatedly conducting fusions across the parallel convolutions. In this paper, we conduct a further study, introducing a simple yet...

10.48550/arxiv.1904.04514 preprint EN other-oa arXiv (Cornell University) 2019-01-01

Configuration tuning is vital to optimize the performance of a database management system (DBMS). It becomes more tedious and urgent for cloud databases (CDB) due to the diverse database instances and query workloads, which make the database administrator (DBA) incompetent. Although there are some studies on automatic DBMS configuration tuning, they have several limitations. Firstly, they adopt a pipelined learning model but cannot optimize the overall performance in an end-to-end manner. Secondly, they rely on large-scale high-quality training samples, which are hard...

10.1145/3299869.3300085 article EN Proceedings of the 2019 International Conference on Management of Data 2019-06-18
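To make the tuning problem concrete, here is a minimal sketch of a knob-tuning loop. Everything in it is a stand-in: the knob names, bounds, and `mock_throughput` objective are invented for illustration, and plain hill climbing replaces the deep reinforcement learning agent the paper actually trains end-to-end against live DBMS metrics.

```python
import random

random.seed(0)

# Hypothetical knobs and a mock performance metric. The real system
# trains an RL agent end-to-end against measured DBMS throughput and
# latency instead of this stand-in function.
KNOBS = {"buffer_pool_mb": (64, 4096), "max_connections": (50, 500)}

def mock_throughput(cfg):
    """Pretend throughput peaks at a sweet spot for each knob (<= 0)."""
    return (-abs(cfg["buffer_pool_mb"] - 2048)
            - abs(cfg["max_connections"] - 200))

def tune(steps=200):
    """Trial-and-error loop: propose a neighbouring config, keep it if
    the observed metric improves (hill climbing, not the paper's RL)."""
    cfg = {k: random.randint(lo, hi) for k, (lo, hi) in KNOBS.items()}
    best = mock_throughput(cfg)
    for _ in range(steps):
        cand = dict(cfg)
        k = random.choice(list(KNOBS))
        lo, hi = KNOBS[k]
        cand[k] = min(hi, max(lo, cand[k] + random.randint(-100, 100)))
        score = mock_throughput(cand)
        if score > best:
            cfg, best = cand, score
    return cfg, best

cfg, best = tune()
```

The "end-to-end" contrast in the abstract is that a learned agent maps workload state directly to knob settings, rather than chaining separate feature, model, and search stages as in this naive loop.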

High-resolution representations are essential for position-sensitive vision problems, such as human pose estimation, semantic segmentation, and object detection. Existing state-of-the-art frameworks first encode the input image as a low-resolution representation through a subnetwork that is formed by connecting high-to-low resolution convolutions \emph{in series} (e.g., ResNet, VGGNet), and then recover the high-resolution representation from the encoded representation. Instead, our proposed network, named High-Resolution...

10.48550/arxiv.1908.07919 preprint EN other-oa arXiv (Cornell University) 2019-01-01

In this paper, we propose a conceptually novel, efficient, and fully convolutional framework for real-time instance segmentation. Previously, most instance segmentation methods heavily rely on object detection and perform mask prediction based on bounding boxes or dense centers. In contrast, we propose a sparse set of instance activation maps, as a new object representation, to highlight informative regions for each foreground object. Then instance-level features are obtained by aggregating features according to the highlighted regions for recognition and segmentation. Moreover,...

10.1109/cvpr52688.2022.00439 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01
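The aggregation step described above can be sketched in a few lines. This is only the arithmetic of the idea, with made-up numbers: each activation map weights every pixel, and the instance feature is the activation-weighted average of pixel features (the real model learns the maps and features end-to-end).

```python
# Toy instance-activation-map aggregation: an activation map assigns a
# weight to every pixel; the instance feature is the weighted average
# of pixel features. All values here are illustrative.

def aggregate(features, activation):
    """Weighted-sum pixel features into one instance feature vector."""
    total = sum(activation)
    dim = len(features[0])
    out = [0.0] * dim
    for w, f in zip(activation, features):
        for d in range(dim):
            out[d] += (w / total) * f[d]
    return out

# 4 "pixels" with 2-dim features; the map highlights pixels 2 and 3.
pixels = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [4.0, 0.0]]
act = [0.0, 0.0, 0.5, 0.5]
inst = aggregate(pixels, act)
print(inst)  # [3.0, 1.0]
```

Because each foreground object gets its own sparse map, recognition can proceed without bounding-box proposals or dense center scanning.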

10.1109/cvpr52733.2024.01599 article EN 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024-06-16

Labeling objects with pixel-wise segmentation requires a huge amount of human labor compared to bounding boxes. Most existing methods for weakly supervised instance segmentation focus on designing heuristic losses with priors from bounding boxes. Meanwhile, we find that box-supervised methods can produce some fine masks, and we wonder whether the detectors could learn from these fine masks while ignoring low-quality masks. To answer this question, we present BoxTeacher, an efficient end-to-end training framework for high-performance weakly supervised instance segmentation, which leverages...

10.1109/cvpr52729.2023.00307 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01
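The "learn from fine masks, ignore low-quality ones" intuition reduces, at its simplest, to confidence-based filtering of teacher pseudo-masks. The sketch below shows only that filtering step with invented mask names and scores; the actual framework couples a teacher and student network with noise-aware losses rather than a hard threshold.

```python
# Toy pseudo-mask selection: keep only confident teacher masks for the
# student to learn from. Names, scores, and threshold are illustrative.

def select_pseudo_masks(masks, scores, threshold=0.7):
    """Return (mask, score) pairs whose confidence passes the threshold."""
    return [(m, s) for m, s in zip(masks, scores) if s >= threshold]

masks = ["mask_a", "mask_b", "mask_c"]
scores = [0.9, 0.4, 0.75]
kept = select_pseudo_masks(masks, scores)
print([m for m, _ in kept])  # ['mask_a', 'mask_c']
```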

A high-definition (HD) map provides abundant and precise environmental information of the driving scene, serving as a fundamental and indispensable component for planning in an autonomous driving system. We present MapTR, a structured end-to-end Transformer for efficient online vectorized HD map construction. We propose a unified permutation-equivalent modeling approach, i.e., modeling a map element as a point set with a group of equivalent permutations, which accurately describes the shape of the map element and stabilizes the learning process. We design a hierarchical query...

10.48550/arxiv.2208.14437 preprint EN cc-by-nc-sa arXiv (Cornell University) 2022-01-01
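The permutation-equivalent idea can be made concrete with a toy matching loss. A lane line traversed start-to-end or end-to-start is the same shape, so the loss is taken as the minimum over all equivalent orderings of the ground-truth point set. The sketch below handles only the forward/reversed pair for an open polyline (closed polygons would additionally include cyclic shifts), with made-up coordinates.

```python
# Toy permutation-equivalent point-set matching for a map element.

def l1(pred, gt):
    """L1 distance between two equal-length 2D point sequences."""
    return sum(abs(px - gx) + abs(py - gy)
               for (px, py), (gx, gy) in zip(pred, gt))

def equivalent_orderings(points):
    """For an open polyline, forward and reversed orderings describe
    the same shape; closed polygons would also add cyclic shifts."""
    return [points, points[::-1]]

def perm_equiv_loss(pred, gt):
    """Match against the best equivalent ordering of the ground truth."""
    return min(l1(pred, g) for g in equivalent_orderings(gt))

gt = [(0, 0), (1, 0), (2, 0)]
pred = [(2, 0), (1, 0), (0, 0)]   # same line, opposite direction
print(perm_equiv_loss(pred, gt))  # 0 (matches the reversed ordering)
```

Without the minimum over orderings, the naive loss would be 4 here and would penalize a geometrically perfect prediction, which is exactly the training instability the abstract says this modeling avoids.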

10.1109/cvpr52733.2024.01915 article EN 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024-06-16

3D detection based on a surround-view camera system is a critical technique in autonomous driving. In this work, we present Polar Parametrization for 3D detection, which reformulates position parametrization, velocity decomposition, perception range, label assignment, and loss function in the polar coordinate system. Polar Parametrization establishes explicit associations between image patterns and prediction targets, exploiting the view symmetry of surround-view cameras as an inductive bias to ease optimization and boost performance. Based on Polar Parametrization,...

10.48550/arxiv.2206.10965 preprint EN cc-by-nc-sa arXiv (Cornell University) 2022-01-01
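The position half of this reparametrization is plain coordinate conversion: expressing an object's bird's-eye-view position as (azimuth, radius) instead of (x, y), so targets align with the angular layout of a surround-view rig. A minimal round-trip sketch:

```python
import math

# Toy polar reparametrization of a BEV object position.

def to_polar(x, y):
    """Cartesian (x, y) -> (azimuth in radians, radius)."""
    return math.atan2(y, x), math.hypot(x, y)

def to_cartesian(azimuth, radius):
    """Inverse conversion back to Cartesian coordinates."""
    return radius * math.cos(azimuth), radius * math.sin(azimuth)

az, r = to_polar(3.0, 4.0)
x, y = to_cartesian(az, r)
print(round(r, 3), round(x, 3), round(y, 3))  # 5.0 3.0 4.0
```

The paper goes further than this sketch, also decomposing velocity and defining range, assignment, and losses in the same polar frame, but the conversion above is the common starting point.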

Learning a Bird's Eye View (BEV) representation from surrounding-view cameras is of great importance for autonomous driving. In this work, we propose the Geometry-guided Kernel Transformer (GKT), a novel 2D-to-BEV representation learning mechanism. GKT leverages geometric priors to guide the transformer to focus on discriminative regions and unfolds kernel features to generate the BEV representation. For fast inference, we further introduce a look-up table (LUT) indexing method to get rid of the camera's calibrated parameters at...

10.48550/arxiv.2206.04584 preprint EN cc-by-nc-nd arXiv (Cornell University) 2022-01-01
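The LUT trick amounts to paying the projection cost once per calibration: every BEV cell is projected into the image offline, and runtime inference only does table lookups. The sketch below uses a deliberately simplified one-axis pinhole model with assumed intrinsics (`FOCAL`, `CX`), nothing like the full multi-camera geometry in the paper.

```python
# Toy BEV-cell -> image-column look-up table with fixed calibration.

FOCAL, CX = 100.0, 64.0   # assumed intrinsics for this sketch

def project(x_fwd, y_left):
    """Simplified pinhole: forward distance and lateral offset -> column."""
    if x_fwd <= 0:
        return None                      # behind the camera
    u = int(CX - FOCAL * y_left / x_fwd)
    return u if 0 <= u < 128 else None   # clip to a 128-px-wide image

# Precompute once per calibration: BEV cell -> image column.
lut = {(x, y): project(float(x), float(y))
       for x in range(1, 11) for y in range(-5, 6)}

# Runtime: pure table lookup, no projection math and no camera params.
print(lut[(10, 0)], lut[(5, 2)])  # 64 24
```

Because the table depends only on calibration, not on image content, it can be baked into the deployed model, which is what removes the camera parameters from the inference path.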

The You Only Look Once (YOLO) series of detectors have established themselves as efficient and practical tools. However, their reliance on predefined and trained object categories limits their applicability in open scenarios. Addressing this limitation, we introduce YOLO-World, an innovative approach that enhances YOLO with open-vocabulary detection capabilities through vision-language modeling and pre-training on large-scale datasets. Specifically, we propose a new Re-parameterizable Vision-Language Path...

10.48550/arxiv.2401.17270 preprint EN arXiv (Cornell University) 2024-01-30
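At classification time, the open-vocabulary mechanism boils down to scoring each detected region against text embeddings of user-supplied category names. The sketch below shows that matching step with fabricated 3-dim vectors; the real model uses a CLIP-style text encoder and learned region features of much higher dimension.

```python
import math

# Toy open-vocabulary region classification via cosine similarity
# between a region feature and per-category text embeddings.
# All vectors here are made up for illustration.

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

text_embeds = {              # embeddings of user-given category names
    "dog": [1.0, 0.1, 0.0],
    "traffic cone": [0.0, 0.2, 1.0],
}

region_feat = [0.9, 0.2, 0.1]  # feature of one detected region
label = max(text_embeds, key=lambda n: cosine(region_feat, text_embeds[n]))
print(label)  # dog
```

Because the category set is just a dictionary of text embeddings, the vocabulary can be changed at inference time without retraining the detector, which is the practical payoff the abstract describes.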

Recent Multimodal Large Language Models (MLLMs) have achieved remarkable performance but face deployment challenges due to their quadratic computational complexity, growing Key-Value cache requirements, and reliance on separate vision encoders. We propose mmMamba, a framework for developing linear-complexity native multimodal state space models through progressive distillation from existing MLLMs using moderate academic resources. Our approach enables the direct conversion of trained...

10.48550/arxiv.2502.13145 preprint EN arXiv (Cornell University) 2025-02-18

We have developed a structure-based virtual screening approach to explore non-sulfonamide CA IX inhibitors exhibiting distinctive structures in the FDA database.

10.1039/d3cp05846h article EN Physical Chemistry Chemical Physics 2024-01-01

Recently, the semantics of scene text has been proven to be essential in fine-grained image classification. However, existing methods mainly exploit the literal meaning of scene text for recognition, which might be irrelevant when it is not significantly related to the objects/scenes. We propose an end-to-end trainable network that mines the implicit contextual knowledge behind scene text and enhances the semantics and correlation to fine-tune the image representation. Unlike existing methods, our model integrates three modalities: visual feature extraction, correlating...

10.1109/cvpr52688.2022.00458 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01

Image restoration aims to reconstruct degraded images, e.g., by denoising or deblurring. Existing works focus on designing task-specific methods, and there are inadequate attempts at universal methods. However, simply unifying multiple tasks into one universal architecture suffers from uncontrollable and undesired predictions. To address those issues, we explore prompt learning in universal architectures for image restoration tasks. In this paper, we present Degradation-aware Visual Prompts, which encode various types of degradation,...

10.48550/arxiv.2306.13653 preprint EN cc-by arXiv (Cornell University) 2023-01-01

Small object detection requires the detection head to scan a large number of positions on image feature maps, which is extremely hard for computation- and energy-efficient lightweight generic detectors. To accurately detect small objects with limited computation, we propose a two-stage framework with low computation complexity, termed TinyDet. It enables high-resolution feature maps for dense anchoring to better cover small objects, proposes a sparsely-connected convolution for computation reduction, and enhances the early-stage features in the backbone,...

10.48550/arxiv.2304.03428 preprint EN cc-by arXiv (Cornell University) 2023-01-01

Video instance segmentation on mobile devices is an important yet very challenging edge AI problem. It mainly suffers from (1) heavy computation and memory costs for frame-by-frame pixel-level perception and (2) complicated heuristics for tracking objects. To address these issues, we present MobileInst, a lightweight and mobile-friendly framework for video instance segmentation on mobile devices. Firstly, MobileInst adopts a mobile vision transformer to extract multi-level semantic features and presents an efficient query-based dual-transformer instance decoder...

10.1609/aaai.v38i7.28555 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2024-03-24

Recent techniques built on generative adversarial networks (GANs), such as cycle-consistent GANs, are able to learn mappings among different domains from unpaired data sets through min–max optimization games between generators and discriminators. However, it remains challenging to stabilize the training process, and thus cyclic models fall into mode collapse accompanied by the success of the discriminator. To address this problem, we propose a novel Bayesian model and integrated framework for interdomain...

10.1109/tnnls.2020.3017669 article EN IEEE Transactions on Neural Networks and Learning Systems 2020-09-03

In this paper, we explore a novel point representation for 3D occupancy prediction from multi-view images, named Occupancy as Set of Points. Existing camera-based methods tend to exploit dense volume-based representations to predict the occupancy of the whole scene, making it hard to focus on special areas or areas out of the perception range. In comparison, we present Points of Interest (PoIs) to represent the scene and propose OSP, a novel framework for point-based 3D occupancy prediction. Owing to the inherent flexibility of the point-based representation, OSP achieves strong performance compared...

10.48550/arxiv.2407.04049 preprint EN arXiv (Cornell University) 2024-07-04