Yiming Cui

ORCID: 0000-0003-2423-8972
Research Areas
  • Advanced Image and Video Retrieval Techniques
  • Advanced Neural Network Applications
  • Video Surveillance and Tracking Methods
  • Multimodal Machine Learning Applications
  • Robotics and Sensor-Based Localization
  • Visual Attention and Saliency Detection
  • Domain Adaptation and Few-Shot Learning
  • Autonomous Vehicle Technology and Safety
  • Automated Road and Building Extraction
  • Remote Sensing and LiDAR Applications
  • Human Pose and Action Recognition
  • Yersinia bacterium, plague, ectoparasites research
  • Plant-based Medicinal Research
  • Bacillus and Francisella bacterial research
  • Plant Micronutrient Interactions and Effects
  • Plant nutrient uptake and metabolism
  • Cancer-related molecular mechanisms research
  • 3D Surveying and Cultural Heritage
  • Engineering Applied Research
  • Statistical Methods and Inference
  • Multi-Criteria Decision Making
  • Nuclear reactor physics and engineering
  • Video Coding and Compression Technologies
  • Bayesian Modeling and Causal Inference
  • Medical Imaging and Analysis

University of Florida
2020-2024

Fujian Agriculture and Forestry University
2023

Zhoukou Normal University
2023

Institute of Microbiology
2023

Wenzhou-Kean University
2023

Jilin University
2020

Video instance segmentation (VIS) is a new and critical task in computer vision. To date, top-performing VIS methods extend the two-stage Mask R-CNN by adding a tracking branch, leaving plenty of room for improvement. In contrast, we approach the VIS task from a new perspective and propose a one-stage spatial granularity network (SG-Net). Compared to conventional two-stage methods, SG-Net demonstrates four advantages: 1) our method has a compact architecture in which each head (detection, segmentation, tracking) is crafted interdependently...

10.1109/cvpr46437.2021.00969 article EN 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021-06-01

Video object detection is a challenging task because isolated video frames may encounter appearance deterioration, which introduces great confusion for detection. One of the popular solutions is to exploit temporal information and enhance the per-frame representation by aggregating features from neighboring frames. Despite achieving improvements in detection, existing methods focus on the selection of higher-level frames for aggregation rather than modeling lower-level temporal relations to increase the feature...

10.1109/iccv48922.2021.00803 article EN 2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2021-10-01
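The neighbor-frame aggregation idea described in this abstract can be sketched generically: enhance a target frame's feature with a similarity-weighted sum over the features of all frames in a clip. This is an illustrative sketch, not the paper's exact method; the function name and the cosine-similarity weighting are assumptions.

```python
import torch
import torch.nn.functional as F

def aggregate_neighbor_features(frame_feats: torch.Tensor, target_idx: int) -> torch.Tensor:
    """Enhance one frame's feature with a similarity-weighted sum over its neighbors.

    frame_feats: (T, C) per-frame feature vectors (e.g. pooled backbone features).
    Returns the aggregated (C,) feature for the target frame.
    """
    target = frame_feats[target_idx]  # (C,)
    # Cosine similarity between the target frame and every frame in the clip.
    sims = F.cosine_similarity(frame_feats, target.unsqueeze(0), dim=1)  # (T,)
    weights = F.softmax(sims, dim=0)  # adaptive aggregation weights summing to 1
    return (weights.unsqueeze(1) * frame_feats).sum(dim=0)  # (C,)

clip = torch.randn(5, 256)  # 5 frames, 256-dim features
enhanced = aggregate_neighbor_features(clip, target_idx=2)
print(enhanced.shape)  # torch.Size([256])
```

Because the weights come from a softmax over similarities, frames that look like the target contribute more, which is the intuition behind suppressing degraded frames.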

In this work, we introduce a Denser Feature Network (DenserNet) for visual localization. Our work provides three principal contributions. First, we develop a convolutional neural network (CNN) architecture which aggregates feature maps at different semantic levels for image representations. Using denser feature maps, our method can produce more keypoint features and increase retrieval accuracy. Second, the model is trained end-to-end without pixel-level annotation other than positive and negative GPS-tagged image pairs...

10.1609/aaai.v35i7.16760 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2021-05-18

Video captioning is a challenging task as it needs to accurately transform visual understanding into natural language description. To date, state-of-the-art methods inadequately model the global-local representation across video frames for caption generation, leaving plenty of room for improvement. In this work, we approach the video captioning task from a new perspective and propose the GL-RG framework, namely Global-Local Representation Granularity. Our GL-RG demonstrates three advantages over prior efforts: 1)...

10.24963/ijcai.2022/384 article EN Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence 2022-07-01

As the size of transformer-based models continues to grow, fine-tuning these large-scale pretrained vision models for new tasks has become increasingly parameter-intensive. Parameter-efficient learning has been developed to reduce the number of tunable parameters during fine-tuning. Although these methods show promising results, there is still a significant performance gap compared to full fine-tuning. To address this challenge, we propose an Effective and Efficient Visual Prompt Tuning (E2VPT)...

10.1109/iccv51070.2023.01604 article EN 2023 IEEE/CVF International Conference on Computer Vision (ICCV) 2023-10-01
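Visual prompt tuning, the family of methods this abstract builds on, keeps a pretrained backbone frozen and learns only a small set of prompt tokens prepended to the patch tokens. The sketch below shows that mechanism in miniature; the class name, sizes, and two-layer encoder are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

class PromptedEncoder(nn.Module):
    """Minimal visual prompt tuning sketch: a frozen transformer encoder with
    learnable prompt tokens prepended to the patch tokens. Only the prompts
    (plus a task head, not shown) would receive gradient updates."""

    def __init__(self, dim: int = 192, num_prompts: int = 8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        for p in self.encoder.parameters():  # freeze the "pretrained" backbone
            p.requires_grad = False
        self.prompts = nn.Parameter(torch.zeros(1, num_prompts, dim))  # tunable

    def forward(self, patch_tokens: torch.Tensor) -> torch.Tensor:
        b = patch_tokens.shape[0]
        tokens = torch.cat([self.prompts.expand(b, -1, -1), patch_tokens], dim=1)
        return self.encoder(tokens)

model = PromptedEncoder()
out = model(torch.randn(2, 16, 192))  # 2 images, 16 patch tokens each
print(out.shape)  # torch.Size([2, 24, 192]): 8 prompts + 16 patches
```

The parameter-efficiency claim follows directly: here only the 8 x 192 prompt tensor is trainable, versus the full encoder in standard fine-tuning.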

Geo-localization is a critical task in computer vision. In this work, we cast geo-localization as a 2D image retrieval task. Current state-of-the-art methods for geo-localization are not robust to locating scenes with drastic scale variations because they exploit features from only one semantic level of image representations. To address this limitation, we introduce a hierarchical attention fusion network using multi-scale features for geo-localization. We extract feature maps from a convolutional neural network (CNN) and organically fuse the extracted features. Our...

10.1109/icassp39728.2021.9414517 article EN ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021-05-13

Accurate localization is a foundational capacity required for autonomous vehicles to accomplish other tasks such as navigation or path planning. It is common practice to use GPS to acquire location information. However, the application of GPS can result in severe challenges when vehicles run within the inner city, where different kinds of structures may shadow the signal and lead to inaccurate results. To address such urban settings, we propose a novel feature voting technique for visual localization. Different from conventional...

10.1109/icpr48806.2021.9411961 article EN 2020 25th International Conference on Pattern Recognition (ICPR) 2021-01-10

Multi-object tracking and segmentation (MOTS) is a critical task for autonomous driving applications. Existing MOTS studies face two challenges: 1) the published datasets inadequately capture the real-world complexity needed for network training to address various driving settings; 2) the working pipeline of the annotation tool is under-studied in the literature, leaving room to improve the quality of learning examples. In this work, we introduce the DG-Labeler annotation tool and the DGL-MOTS dataset to facilitate training data annotation and accordingly improve accuracy and efficiency. DG-Labeler uses a novel...

10.1109/wacv51458.2022.00347 article EN 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2022-01-01

The advancement of computer vision has pushed visual analysis tasks from still images to the video domain. In recent years, video instance segmentation, which aims at tracking and segmenting multiple objects in video frames, has drawn much attention for its potential applications in various emerging areas such as autonomous driving, intelligent transportation, and smart retail. In this article, we propose an effective framework for instance-level visual analysis on videos that can simultaneously conduct object detection, instance segmentation, and multi-object tracking. The core...

10.1145/3632181 article EN ACM Journal on Autonomous Transportation Systems 2023-11-10

Cervical cancer is the fourth most common cancer in women, and its subtyping requires examining histopathological slides or digital images, such as whole slide images (WSIs). However, manually inspecting WSIs with gigapixel sizes can be laborious and prone to errors for pathologists. To address this issue, computer-aided approaches based on weakly-supervised learning techniques have been proposed. These methods predict disease types directly from WSIs and highlight diagnosis-relevant regions, which can help...

10.1109/tcsvt.2023.3294938 article EN IEEE Transactions on Circuits and Systems for Video Technology 2023-07-13

Video object detection needs to solve feature degradation situations that rarely happen in the image domain. One solution is to use temporal information and fuse features from neighboring frames. With Transformer-based detectors achieving better performance on image-domain tasks, recent works began to extend those methods to video object detection. However, the existing methods still follow the same pipeline as used for classical object detectors, like enhancing object representations by aggregation. In this work, we take a different...

10.1109/cvpr52729.2023.00616 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

Object detection is a fundamental task for autonomous driving systems. One bottleneck hindering detection accuracy is the shortage of well-annotated image data. Virtual reality has provided a feasible, low-cost way to facilitate computer-vision-related developments. In the autonomous driving area, existing public datasets from the real world generally have data biases and cannot represent a wide range of weather conditions, such as rainy or snowy roads. To address this challenge, we introduce a new large-scale simulation dataset which is generated...

10.1109/ijcnn48605.2020.9206716 article EN 2020 International Joint Conference on Neural Networks (IJCNN) 2020-07-01

Visual-based perception is the key module for autonomous driving. Among those visual tasks, video object detection is a primary yet challenging one because of feature degradation caused by fast motion or multiple poses. Current models usually aggregate features from neighboring frames to enhance the representations so that the task heads can generate more accurate predictions. Though achieving better performance, these methods rely on information from future frames and suffer from high computational complexity. Meanwhile,...

10.1145/3674117 article EN ACM Journal on Autonomous Transportation Systems 2024-06-20

Indoor navigation is a challenging task for mobile agents. The latest vision-based indoor navigation methods make remarkable progress in this field but do not fully leverage visual information for policy learning and struggle to perform well in unseen scenes. To address these limitations, we present a multimodal vision fusion model (MVFM). We implement a joint modality of different image recognition networks for policy learning. The proposed model incorporates object detection for target searching, depth estimation for distance...

10.1109/ijcnn48605.2020.9207265 article EN 2020 International Joint Conference on Neural Networks (IJCNN) 2020-07-01

This paper presents CLUSTERFORMER, a universal vision model based on the CLUSTERing paradigm with TransFORMER. It comprises two novel designs: 1. recurrent cross-attention clustering, which reformulates the cross-attention mechanism in the Transformer and enables recursive updates of cluster centers to facilitate strong representation learning; and 2. feature dispatching, which uses the updated cluster centers to redistribute image features through similarity-based metrics, resulting in a transparent pipeline. This elegant design streamlines an...

10.48550/arxiv.2309.13196 preprint EN cc-by arXiv (Cornell University) 2023-01-01
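The recurrent clustering idea named in this abstract, treating cluster centers as queries that attend over image features and are recursively updated, can be sketched with plain scaled dot-product attention. This is a generic illustration of the paradigm, not CLUSTERFORMER's actual module; the function name and iteration count are assumptions.

```python
import torch
import torch.nn.functional as F

def recurrent_clustering(feats: torch.Tensor, centers: torch.Tensor, iters: int = 3) -> torch.Tensor:
    """Iteratively refine cluster centers by cross-attention over image features.

    feats:   (N, C) flattened image features (keys/values).
    centers: (K, C) initial cluster centers (queries).
    Each iteration soft-assigns features to centers via attention, then
    recomputes each center as the attention-weighted mean of the features.
    """
    scale = feats.shape[1] ** 0.5
    for _ in range(iters):
        attn = F.softmax(centers @ feats.t() / scale, dim=1)  # (K, N) assignments
        centers = attn @ feats                                # (K, C) updated centers
    return centers

feats = torch.randn(64, 32)  # 64 feature vectors, 32-dim
centers = recurrent_clustering(feats, torch.randn(4, 32))
print(centers.shape)  # torch.Size([4, 32])
```

The "transparent pipeline" claim maps onto the attention matrix itself: each row is an explicit soft assignment of features to one cluster, which can be inspected directly.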

Object detection is a basic computer vision task that localizes and categorizes objects in a given image. Most state-of-the-art methods utilize a fixed number of proposals as an intermediate representation of object candidates, which is unable to adapt to different computational constraints during inference. In this paper, we propose a simple yet effective method that adapts to different computational resources by generating dynamic proposals for object detection. We first design a module that makes a single query-based model able to run inference with different numbers of proposals...

10.48550/arxiv.2207.05252 preprint EN other-oa arXiv (Cornell University) 2022-01-01
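In a query-based detector, varying the number of proposals at inference time can be as simple as slicing the learned query embeddings before decoding, which is the mechanism the abstract alludes to. The sketch below shows that idea with a single cross-attention layer; the class name, head design, and slicing scheme are assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class DynamicQueryDetector(nn.Module):
    """Sketch of a query-based detection head that can run inference with a
    variable number of proposals by slicing its learned query embeddings."""

    def __init__(self, max_queries: int = 100, dim: int = 64):
        super().__init__()
        self.queries = nn.Embedding(max_queries, dim)  # learned object queries
        self.decoder = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.box_head = nn.Linear(dim, 4)  # (cx, cy, w, h) per proposal

    def forward(self, img_feats: torch.Tensor, num_proposals: int) -> torch.Tensor:
        b = img_feats.shape[0]
        # Take only the first num_proposals queries; fewer queries = less compute.
        q = self.queries.weight[:num_proposals].unsqueeze(0).expand(b, -1, -1)
        attended, _ = self.decoder(q, img_feats, img_feats)  # cross-attention
        return self.box_head(attended)  # (B, num_proposals, 4)

det = DynamicQueryDetector()
feats = torch.randn(2, 49, 64)  # 2 images, 7x7 feature tokens
print(det(feats, num_proposals=10).shape)  # torch.Size([2, 10, 4])
print(det(feats, num_proposals=50).shape)  # torch.Size([2, 50, 4])
```

A single trained model can then trade accuracy for latency at deployment time by choosing `num_proposals` to match the available compute budget.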

Spatial transcriptomics (ST) has become an important methodology in the analysis of the tumor microenvironment (TME) due to its ability to provide gene expression information with spatial resolution, enabling the identification and characterization of TME markers. Deep learning methods have been proposed for analyzing spatial transcriptomic data by clustering regions based on gene expression. However, deep learning methods are often impeded by errors in the data, which can impact the accuracy of quantification and identification. To address this issue, we propose a...

10.1109/tcsvt.2023.3301677 article EN IEEE Transactions on Circuits and Systems for Video Technology 2023-08-03

Understanding a plant's root system architecture (RSA) is crucial for a variety of plant science problem domains, including sustainability and climate adaptation. Minirhizotron (MR) technology is a widely-used approach for phenotyping RSA non-destructively by capturing root imagery over time. Precisely segmenting roots from the soil in MR imagery is a critical step in studying RSA features. In this paper, we introduce a large-scale dataset of root images captured by MR technology. In total, there are 72K RGB images across six different species, including cotton,...

10.48550/arxiv.2201.08002 preprint EN cc-by arXiv (Cornell University) 2022-01-01

Video instance segmentation (VIS) is a new and critical task in computer vision. To date, top-performing VIS methods extend the two-stage Mask R-CNN by adding a tracking branch, leaving plenty of room for improvement. In contrast, we approach the VIS task from a new perspective and propose a one-stage spatial granularity network (SG-Net). Compared to conventional two-stage methods, SG-Net demonstrates four advantages: 1) our method has a compact architecture in which each head (detection, segmentation, tracking) is crafted interdependently...

10.48550/arxiv.2103.10284 preprint EN other-oa arXiv (Cornell University) 2021-01-01

Transformer-based detection and segmentation methods use a list of learned queries to retrieve information from the transformer network and learn to predict the location and category of one specific object from each query. We empirically find that random convex combinations of the learned queries are still good queries for the corresponding models. We then propose to learn a convex combination with dynamic coefficients based on the high-level semantics of the image. The generated queries, named modulated queries, better capture the prior of object locations and categories in different images. Equipped with our...

10.48550/arxiv.2307.12239 preprint EN other-oa arXiv (Cornell University) 2023-01-01
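The core operation in this abstract, forming queries as convex combinations of the learned queries with image-dependent coefficients, reduces to a softmax over predicted mixing logits followed by a matrix product. The sketch below shows that operation for one modulated query per image; the class name, coefficient network, and single-query simplification are assumptions, not the paper's design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QueryModulator(nn.Module):
    """Sketch: build a query as a convex combination of learned base queries,
    with combination coefficients predicted from the image feature."""

    def __init__(self, num_queries: int = 100, dim: int = 64):
        super().__init__()
        self.base_queries = nn.Parameter(torch.randn(num_queries, dim))
        self.coef_net = nn.Linear(dim, num_queries)  # predicts mixing logits

    def forward(self, img_feat: torch.Tensor) -> torch.Tensor:
        # img_feat: (B, dim) pooled image feature -> (B, num_queries) weights.
        # Softmax makes the weights non-negative and sum to 1: a convex combination.
        coefs = F.softmax(self.coef_net(img_feat), dim=-1)
        # One modulated query per image; a real head would emit many per image.
        return coefs @ self.base_queries  # (B, dim)

mod = QueryModulator()
q = mod(torch.randn(3, 64))
print(q.shape)  # torch.Size([3, 64])
```

Because the coefficients depend on the image, the same base queries yield different effective queries per input, which is how the modulated queries encode image-specific priors.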