Qimeng Wang

ORCID: 0000-0002-9715-836X
Research Areas
  • Advanced Neural Network Applications
  • Multimodal Machine Learning Applications
  • Advanced Image and Video Retrieval Techniques
  • Domain Adaptation and Few-Shot Learning
  • Natural Language Processing Techniques
  • Industrial Vision Systems and Defect Detection
  • Topic Modeling
  • Dental Radiography and Imaging
  • Video Analysis and Summarization
  • Dental Research and COVID-19
  • Intelligent Tutoring Systems and Adaptive Learning
  • COVID-19 diagnosis using AI
  • AI in cancer detection
  • Image and Object Detection Techniques
  • Human Pose and Action Recognition
  • Non-Destructive Testing Techniques
  • Integrated Circuits and Semiconductor Failure Analysis
  • Virtual Reality Applications and Impacts
  • VLSI and Analog Circuit Testing
  • Visual Attention and Saliency Detection
  • Advanced Optical Imaging Technologies
  • Anomaly Detection Techniques and Applications
  • Image and Video Quality Assessment
  • Engineering and Test Systems
  • Advanced Data Compression Techniques

Jiangnan University
2024-2025

Huazhong University of Science and Technology
2020-2022

University of Science and Technology Beijing
2021

Object detection has recently experienced substantial progress. Yet, the widely adopted horizontal bounding box representation is not appropriate for ubiquitous oriented objects such as those in aerial images and scene texts. In this paper, we propose a simple yet effective framework to detect multi-oriented objects. Instead of directly regressing the four vertices, we glide the vertex on each corresponding side to accurately describe the object. Specifically, we regress length ratios characterizing relative...

10.1109/tpami.2020.2974745 article EN publisher-specific-oa IEEE Transactions on Pattern Analysis and Machine Intelligence 2020-02-18

Temporal action detection (TAD) aims to determine the semantic label and temporal interval of every instance in an untrimmed video. It is a fundamental and challenging task in video understanding. Previous methods tackle this task with complicated pipelines. They often need to train multiple networks and involve hand-designed operations, such as non-maximal suppression and anchor generation, which limit flexibility and prevent end-to-end learning. In this paper, we propose a Transformer-based method for TAD, termed TadTR....

10.1109/tip.2022.3195321 article EN IEEE Transactions on Image Processing 2022-01-01

Multi-modal large language models (MLLMs), such as GPT-4, exhibit great comprehension capabilities on human instruction, as well as zero-shot ability on new downstream multi-modal tasks. To integrate the different modalities within a unified embedding space, previous MLLMs attempted to conduct visual instruction tuning with massive and high-quality image-text pair data, which requires substantial costs in data collection and training resources. In this article, we propose TOMGPT (Text-Only GPT),...

10.1145/3654674 article EN ACM Transactions on Knowledge Discovery from Data 2024-03-28

In defect detection on metal surfaces, there are many small defects with subtle features that are difficult to distinguish from the background environment using mainstream object detection methods. To alleviate this issue, this study proposes an improved CenterNet model, namely MSDD, for enhanced detection of such defects. In this work, we utilize an attention mechanism to reconstruct the basic feature extraction module in the network, aiming to enhance the focus on features related to defects. Additionally, we redesign an efficient deconvolution module to extract multi‐scale...

10.1002/adts.202301230 article EN Advanced Theory and Simulations 2024-06-11

Non-maximum suppression (NMS) is widely used in object detection pipelines for removing duplicated bounding boxes. The inconsistency between the confidence used for NMS and the real localization confidence seriously affects detection performance. Prior works propose to predict the Intersection-over-Union (IoU) between bounding boxes and their corresponding ground-truths to improve NMS, while accurately predicting IoU is still a challenging problem. We argue that the complex definition of IoU and feature misalignment make it difficult to predict IoU accurately. In this paper, we propose a novel...

10.1145/3474085.3475707 article EN Proceedings of the 30th ACM International Conference on Multimedia 2021-10-17
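
For readers unfamiliar with the terms in the abstract above, the following is a minimal sketch of conventional greedy NMS driven by IoU. It illustrates the baseline procedure the abstract refers to, not the method proposed in the paper. The function names `iou` and `nms`, the `(x1, y1, x2, y2)` box layout, and the default threshold of 0.5 are assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple

# Assumed box layout for this sketch: (x1, y1, x2, y2) with x1 <= x2, y1 <= y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-Union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float],
        iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: return indices of kept boxes, highest score first."""
    # Visit boxes from the most to the least confident.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    print(nms(boxes, scores))
```

Because the ranking relies only on the `scores` argument, a box with a high classification score but poor localization can suppress a better-localized neighbor, which is the inconsistency the abstract describes.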

A well-known dilemma in large vision-language models (e.g., GPT-4, LLaVA) is that while increasing the number of vision tokens generally enhances visual understanding, it also significantly raises memory and computational costs, especially in long-term, dense video frame streaming scenarios. Although learnable approaches like Q-Former and Perceiver Resampler have been developed to reduce the vision token burden, they overlook the context causally modeled by LLMs (i.e., key-value cache), potentially leading to missed...

10.48550/arxiv.2408.16730 preprint EN arXiv (Cornell University) 2024-08-29

Instructional documents are rich sources of knowledge for completing various tasks, yet their unique challenges in conversational question answering (CQA) have not been thoroughly explored. Existing benchmarks have primarily focused on basic factual question-answering from single narrative documents, making them inadequate for assessing a model's ability to comprehend complex real-world instructional documents and provide accurate step-by-step guidance in daily life. To bridge this gap, we present InsCoQA, a novel...

10.48550/arxiv.2410.00526 preprint EN arXiv (Cornell University) 2024-10-01

This article introduces a universal semiconductor Automatic Test Pattern Generation (ATPG) solution for the Automated Test Equipment (ATE) platform. With the increasing trend of Artificial Intelligence (AI) and Advanced Driving Assistance System (ADAS), communication between devices requires advanced protocols such as Mobile Industry Processor Interface (MIPI) and Point-to-point (P2P) protocols. A designer-based is developed to provide a one-click software approach to create test vectors for common and customized As...

10.1109/cstic52283.2021.9461259 article EN 2022 China Semiconductor Technology International Conference (CSTIC) 2021-03-14

CLIP (Contrastive Language-Image Pretraining) is well-developed for open-vocabulary zero-shot image-level recognition, while its applications in pixel-level tasks are less investigated, where most efforts directly adopt CLIP features without deliberative adaptations. In this work, we first demonstrate the necessity of image-pixel feature adaptation, then provide Multi-View Prompt learning (MVP-SEG) as an effective solution to achieve this adaptation and to solve semantic segmentation. Concretely, MVP-SEG...

10.48550/arxiv.2304.06957 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Non-maximum suppression (NMS) is widely used in object detection pipelines for removing duplicated bounding boxes. The inconsistency between the confidence used for NMS and the real localization confidence seriously affects detection performance. Prior works propose to predict the Intersection-over-Union (IoU) between bounding boxes and their corresponding ground-truths to improve NMS, while accurately predicting IoU is still a challenging problem. We argue that the complex definition of IoU and feature misalignment make it difficult to predict IoU accurately. In this paper, we propose a novel...

10.48550/arxiv.2202.00866 preprint EN other-oa arXiv (Cornell University) 2022-01-01