Mengxue Qu

ORCID: 0000-0001-9432-0205
Research Areas
  • Multimodal Machine Learning Applications
  • Human Pose and Action Recognition
  • Topic Modeling
  • Advanced Image and Video Retrieval Techniques
  • Industrial Vision Systems and Defect Detection
  • Domain Adaptation and Few-Shot Learning
  • Advanced Neural Network Applications
  • Adversarial Robustness in Machine Learning
  • Medical Image Segmentation Techniques
  • Subtitles and Audiovisual Media
  • Oceanographic and Atmospheric Processes
  • Heavy metals in environment
  • Arctic and Antarctic ice dynamics
  • Plant Stress Responses and Tolerance
  • Aluminum toxicity and tolerance in plants and animals
  • Advanced Vision and Imaging
  • Epigenetics and DNA Methylation
  • Video Analysis and Summarization
  • Cancer-related gene regulation
  • Explainable Artificial Intelligence (XAI)
  • Artificial Intelligence in Healthcare and Education
  • Underwater Acoustics Research
  • Plant tissue culture and regeneration
  • Plant Virus Research Studies
  • Visual Attention and Saliency Detection

Beijing Jiaotong University
2022-2025

Wuhan University
2023

Shandong Agricultural University
2023

First Institute of Oceanography
2022

Ministry of Natural Resources
2022

Advertising is pervasive in everyday life. Some advertisements are not readily comprehensible; they convey a deeper message or purpose, which is referred to as "meaningful advertising". These ads often aim to create an emotional connection with the audience or promote a social cause. Developing methods for automatically understanding meaningful advertising would benefit the dissemination and creation of such ads. However, current ad understanding models primarily focus on superficial aspects of images. In this...

10.1145/3720546 article EN ACM Transactions on Multimedia Computing, Communications, and Applications 2025-02-27

Referring Expression Segmentation (RES) can facilitate pixel-level semantic alignment between vision and language. Most existing RES approaches require massive annotations, which are expensive and exhaustive to collect. In this paper, we propose a new partially supervised training paradigm for RES, i.e., using abundant referring bounding boxes and only a few (e.g., 1%) masks. To maximize transferability from the REC model, we construct our model based on a point-based sequence prediction model. ...

10.1109/cvpr52729.2023.00295 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01
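The partially supervised setup above, where every sample has a referring box but only a small fraction also has a mask, can be sketched as a toy mixed loss. The L1 box term, the pixel-disagreement mask term, and the weight lam are illustrative assumptions, not the paper's actual objective:

```python
def mixed_loss(pred_box, gt_box, pred_mask=None, gt_mask=None, lam=1.0):
    """Box supervision for every sample; mask supervision only for the
    few samples that carry a mask annotation. The L1 box term, the
    pixel-disagreement mask term, and the weight lam are toy choices."""
    box_loss = sum(abs(p - g) for p, g in zip(pred_box, gt_box))
    if gt_mask is None:
        return box_loss          # box-only sample (the ~99% case)
    # mask term: fraction of disagreeing pixels over a flattened mask
    mask_loss = sum(p != g for p, g in zip(pred_mask, gt_mask)) / len(gt_mask)
    return box_loss + lam * mask_loss
```

A box-only sample falls back to pure box supervision, so the rare mask annotations only ever add signal rather than gate training.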

10.1109/cvprw63382.2024.00191 article EN 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2024-06-17

In real-life scenarios, humans seek out objects in the 3D world to fulfill their daily needs or intentions. This inspires us to introduce intention grounding, a new object detection task employing RGB-D that is based on human intention, such as "I want something to support my back". Closely related, visual grounding focuses on understanding a reference: to achieve it, a human must observe the scene, reason out the target that aligns with the intention ("pillow" in this case), and finally provide the reference to the AI system, e.g., "A pillow on the couch". Instead,...

10.48550/arxiv.2405.18295 preprint EN arXiv (Cornell University) 2024-05-28

Spatio-Temporal Video Grounding (STVG) aims at localizing the spatio-temporal tube of a specific object in an untrimmed video given a free-form natural language query. As annotating tubes is labor-intensive, recent works are motivated to explore weakly supervised approaches, which usually result in significant performance degradation. To achieve a less expensive STVG method with acceptable accuracy, this work investigates the "single-frame supervision" paradigm that requires a single frame...

10.1109/tpami.2024.3415087 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2024-01-01

Intention-oriented object detection aims to detect desired objects based on specific intentions or requirements. For instance, when we desire to "lie down and rest", we instinctively seek out a suitable option such as a "bed" or "sofa" that can fulfill our needs. Previous work in this area is limited either by the number of intention descriptions or by the affordance vocabulary available for objects. These limitations make it challenging to handle open environments effectively. To facilitate this research, we construct...

10.48550/arxiv.2310.17290 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Semi-Supervised Visual Grounding (SSVG) is a new challenge due to its sparse labeled data combined with the need for multimodal understanding. A previous study, RefTeacher, makes the first attempt to tackle this task by adopting a teacher-student framework that provides pseudo confidence supervision and attention-based supervision. However, this approach is incompatible with current state-of-the-art visual grounding models, which follow a Transformer-based pipeline. These pipelines directly regress results without region proposals or...

10.48550/arxiv.2407.03251 preprint EN arXiv (Cornell University) 2024-07-03
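The pseudo confidence supervision in a teacher-student framework like the one described above boils down to filtering teacher predictions by confidence before they supervise the student. A minimal sketch, where the threshold value and the (box, confidence) tuple format are assumptions for illustration:

```python
def select_pseudo_labels(teacher_preds, tau=0.7):
    """Keep only teacher predictions whose confidence clears the
    threshold tau; the survivors serve as pseudo labels for the
    student. Both tau and the (box, confidence) format are
    illustrative, not taken from the paper."""
    return [(box, conf) for box, conf in teacher_preds if conf >= tau]
```

Raising tau trades pseudo-label quantity for quality, which is the central tuning knob in this style of semi-supervised training.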

Video Temporal Grounding (VTG) aims to ground specific segments within an untrimmed video corresponding to a given natural language query. Existing VTG methods largely depend on supervised learning and extensive annotated data, which is labor-intensive and prone to human biases. To address these challenges, we present ChatVTG, a novel approach that utilizes dialogue with Large Language Models (LLMs) for zero-shot video temporal grounding. Our ChatVTG leverages LLMs to generate multi-granularity segment captions...

10.48550/arxiv.2410.12813 preprint EN arXiv (Cornell University) 2024-10-01
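Once segment captions exist, zero-shot grounding reduces to matching the query against them and returning the best segment's time span. A crude sketch of that matching step, using Jaccard word overlap as a hypothetical stand-in for whatever similarity the paper actually uses:

```python
def ground_query(segments, query):
    """Return the segment whose caption best matches the query by
    Jaccard word overlap -- a toy, hypothetical stand-in for the
    caption-query matching step in zero-shot temporal grounding."""
    q = set(query.lower().split())

    def score(seg):
        c = set(seg["caption"].lower().split())
        return len(q & c) / max(len(q | c), 1)  # Jaccard similarity

    return max(segments, key=score)


# Hypothetical usage: two captioned segments, one matching query
segs = [
    {"span": (0, 5), "caption": "a man opens the door"},
    {"span": (5, 12), "caption": "a dog runs across the yard"},
]
best = ground_query(segs, "the dog running in the yard")
```

The appeal of the zero-shot formulation is visible even in this toy: no temporal annotations are needed, only captions and a similarity function.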

Advances in CLIP and large multimodal models (LMMs) have enabled open-vocabulary and free-text segmentation, yet existing models still require predefined category prompts, limiting free-form self-generation. Most segmentation LMMs also remain confined to sparse predictions, restricting their applicability in open-set environments. In contrast, we propose ROSE, a Revolutionary Open-set dense SEgmentation LMM, which enables dense mask prediction and open-category generation through patch-wise perception. Our method...

10.48550/arxiv.2412.00153 preprint EN arXiv (Cornell University) 2024-11-29

In this paper, we investigate how to achieve better visual grounding with modern vision-language transformers, and propose a simple yet powerful Selective Retraining (SiRi) mechanism for this challenging task. In particular, SiRi conveys a significant principle to the research of visual grounding, i.e., a better initialized encoder helps the model converge to a better local minimum, advancing performance accordingly. Specifically, we continually update the parameters of the encoder as training goes on, while periodically re-initializing the rest of the parameters to compel the model to be...

10.48550/arxiv.2207.13325 preprint EN other-oa arXiv (Cornell University) 2022-01-01
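The SiRi idea above, keeping the encoder's accumulated progress while periodically re-initializing the remaining parameters, can be sketched with scalar stand-ins for the two parameter groups. The 0.1 update step, the targets, and the schedule are all toy assumptions, not the paper's training recipe:

```python
def train_siri(num_epochs, period):
    """Toy Selective Retraining loop: a scalar 'encoder' weight keeps
    all of its training progress, while a scalar 'decoder' weight is
    re-initialized every `period` epochs, so each retraining round
    restarts from a better-trained encoder."""
    enc, dec = 0.0, 0.0
    reinit_epochs = []
    for epoch in range(num_epochs):
        if epoch > 0 and epoch % period == 0:
            dec = 0.0                  # re-initialize all but the encoder
            reinit_epochs.append(epoch)
        enc += 0.1 * (1.0 - enc)       # encoder: uninterrupted progress
        dec += 0.1 * (1.0 - dec)       # decoder: restarts each period
    return enc, dec, reinit_epochs
```

With `train_siri(10, 4)` the decoder restarts at epochs 4 and 8, so the encoder ends far closer to its target than the decoder, a toy analogue of the claim that the continually trained encoder carries the benefit across re-initializations.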