NFDI4DS | UHH-SEMS - Publication Details

Mingchen Zhuge

ORCID: 0000-0003-2561-7712

About

Contact & Profiles

A5030077358

Research Areas

Advanced Image and Video Retrieval Techniques
Visual Attention and Saliency Detection
Multimodal Machine Learning Applications
Image Enhancement Techniques
Advanced Neural Network Applications
Generative Adversarial Networks and Image Synthesis
Video Analysis and Summarization
Reinforcement Learning in Robotics
Domain Adaptation and Few-Shot Learning
Speech and Audio Processing
Music and Audio Processing
Multi-Agent Systems and Negotiation
Topic Modeling
Natural Language Processing Techniques
Video Surveillance and Tracking Methods
Handwritten Text Recognition Techniques
Smart Grid Energy Management
Remote Sensing and Land Use
Cognitive Science and Education Research
Human Pose and Action Recognition
Language and cultural evolution
Scientific Computing and Data Management
Semantic Web and Ontologies
Remote-Sensing Image Classification
Simulation Techniques and Applications

King Abdullah University of Science and Technology
2023-2025

Alibaba Group (China)
2021-2023

Southern University of Science and Technology
2023

Inception Institute of Artificial Intelligence
2022

China University of Geosciences (Beijing)
2021-2022

Shandong University
2022

Alibaba Group (United States)
2021

China University of Geosciences
2020

Salient Object Detection via Integrity Learning

OPENALEX - Publications

Mingchen Zhuge Deng-Ping Fan Nian Liu Dingwen Zhang Dong Xu and 1 more

Although current salient object detection (SOD) works have achieved significant progress, they are limited when it comes to the integrity of the predicted salient regions. We define the concept of integrity at both a micro and macro level. Specifically, at the micro level, the model should highlight all parts that belong to a certain salient object. Meanwhile, at the macro level, the model needs to discover all salient objects in a given image. To facilitate integrity learning for SOD, we design a novel Integrity Cognition Network (ICON), which explores three important components for learning strong integrity features. 1) Unlike...

10.1109/tpami.2022.3179526 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2022-01-01

Fast Camouflaged Object Detection via Edge-based Reversible Re-calibration Network

Ge-Peng Ji Lei Zhu Mingchen Zhuge Keren Fu

10.1016/j.patcog.2021.108414 article EN Pattern Recognition 2021-11-02

CubeNet: X-shape connection for camouflaged object detection

Mingchen Zhuge Xiankai Lu Yiyou Guo Zhihua Cai Shuhan Chen

10.1016/j.patcog.2022.108644 article EN Pattern Recognition 2022-03-09

Kaleido-BERT: Vision-Language Pre-training on Fashion Domain

Mingchen Zhuge Dehong Gao Deng-Ping Fan Linbo Jin Ben Chen and 3 more

We present a new vision-language (VL) pre-training model dubbed Kaleido-BERT, which introduces a novel kaleido strategy for fashion cross-modality representations from transformers. In contrast to the random masking strategy of recent VL models, we design alignment guided masking to jointly focus more on image-text semantic relations. To this end, we carry out five novel tasks, i.e., rotation, jigsaw, camouflage, grey-to-color, and blank-to-color, for self-supervised VL pre-training at patches of different scale. Kaleido-BERT is conceptually simple and easy to extend...

10.1109/cvpr46437.2021.01246 article EN 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021-06-01

Masked Vision-language Transformer in Fashion

Ge-Peng Ji Mingchen Zhuge Dehong Gao Deng-Ping Fan Christos Sakaridis and 1 more

We present a masked vision-language transformer (MVLT) for fashion-specific multi-modal representation. Technically, we simply utilize the vision transformer architecture for replacing the bidirectional encoder representations from Transformers (BERT) in the pre-training model, making MVLT the first end-to-end framework for the fashion domain. Besides, we designed masked image reconstruction (MIR) for a fine-grained understanding of fashion. MVLT is an extensible and convenient architecture that admits raw multi-modal inputs without extra pre-processing models...

10.1007/s11633-022-1394-4 article EN cc-by Machine Intelligence Research 2023-02-27

Data Interpreter: An LLM Agent For Data Science

Sirui Hong Yizhang Lin Bangbang Liu Binhao Wu Danyang Li and 19 more

Large Language Model (LLM)-based agents have demonstrated remarkable effectiveness. However, their performance can be compromised in data science scenarios that require real-time data adjustment, expertise in optimization due to complex dependencies among various tasks, and the ability to identify logical errors for precise reasoning. In this study, we introduce the Data Interpreter, a solution designed to solve with code that emphasizes three pivotal techniques to augment problem-solving in data science: 1) dynamic planning...

10.48550/arxiv.2402.18679 preprint EN arXiv (Cornell University) 2024-02-28

Mindstorms in natural language-based societies of mind

Mingchen Zhuge Haozhe Liu Francesco Faccio Dylan R. Ashley Róbert Csordás and 21 more

10.26599/cvm.2025.9450460 article EN cc-by Computational Visual Media 2025-02-01

Beyond Outlining: Heterogeneous Recursive Planning for Adaptive Long-form Writing with Language Models

Ruibin Xiong Yimeng Chen Dmitrii Khizbullin Mingchen Zhuge Jürgen Schmidhuber

Long-form writing agents require flexible integration and interaction across information retrieval, reasoning, and composition. Current approaches rely on predetermined workflows and rigid thinking patterns to generate outlines before writing, resulting in constrained adaptability during writing. In this paper we propose a general agent framework that achieves human-like adaptive writing through recursive task decomposition and dynamic integration of three fundamental task types, i.e. retrieval, reasoning, and composition. Our methodology features: 1) a planning...

10.48550/arxiv.2503.08275 preprint EN arXiv (Cornell University) 2025-03-11

Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems

Bang Liu Xinfeng Li Jiayi Zhang Jinlin Wang Tanjin He and 42 more

The advent of large language models (LLMs) has catalyzed a transformative shift in artificial intelligence, paving the way for advanced intelligent agents capable of sophisticated reasoning, robust perception, and versatile action across diverse domains. As these agents increasingly drive AI research and practical applications, their design, evaluation, and continuous improvement present intricate, multifaceted challenges. This survey provides a comprehensive overview, framing intelligent agents within a modular, brain-inspired...

10.48550/arxiv.2504.01990 preprint EN arXiv (Cornell University) 2025-03-31

Cooperative Spectral–Spatial Attention Dense Network for Hyperspectral Image Classification

Zhimin Dong Yaoming Cai Zhihua Cai Xiaobo Liu Zhao-Yu Yang and 1 more

Recently, deep learning-based methods have made great progress in hyperspectral image (HSI) classification (HSIC). Different from ordinary images, the intrinsic complexity of HSIs data still limits the performance of many common convolutional neural network (CNN) models. Thus, the network architecture becomes more and more complex to extract discriminative spectral-spatial features. For instance, 3-D CNN usually has a large number of trainable parameters, thus increasing the computational complexity of HSIC. In this letter, we designed...

10.1109/lgrs.2020.2989437 article EN IEEE Geoscience and Remote Sensing Letters 2020-05-05

Fast Camouflaged Object Detection via Edge-based Reversible Re-calibration Network

Ge-Peng Ji Lei Zhu Mingchen Zhuge Keren Fu

Camouflaged Object Detection (COD) aims to detect objects with similar patterns (e.g., texture, intensity, colour, etc.) to their surroundings, and recently has attracted growing research interest. As camouflaged objects often present very ambiguous boundaries, how to determine object locations as well as their weak boundaries is challenging and also the key to this task. Inspired by the biological visual perception process when a human observer discovers camouflaged objects, this paper proposes a novel edge-based reversible re-calibration...

10.48550/arxiv.2111.03216 preprint EN cc-by arXiv (Cornell University) 2021-01-01

Salient Object Detection via Integrity Learning

Mingchen Zhuge Deng-Ping Fan Nian Liu Dingwen Zhang Dong Xu and 1 more

10.48550/arxiv.2101.07663 preprint EN other-oa arXiv (Cornell University) 2021-01-01

Mindstorms in Natural Language-Based Societies of Mind

Mingchen Zhuge Haozhe Liu Francesco Faccio Dylan R. Ashley Róbert Csordás and 21 more

Both Minsky's "society of mind" and Schmidhuber's "learning to think" inspire diverse societies of large multimodal neural networks (NNs) that solve problems by interviewing each other in a "mindstorm." Recent implementations of NN-based societies of minds consist of large language models (LLMs) and other NN-based experts communicating through a natural language interface. In doing so, they overcome the limitations of single LLMs, improving multimodal zero-shot reasoning. In these natural language-based societies of mind (NLSOMs), new agents -- all communicating through the same universal symbolic language -- are easily...

10.48550/arxiv.2305.17066 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Accurate Camouflaged Object Detection via Mixture Convolution and Interactive Fusion

Geng Chen Xinrui Chen Bo Dong Mingchen Zhuge Yongxiong Wang and 4 more

Camouflaged object detection (COD), which aims to identify the objects that conceal themselves into the surroundings, has recently drawn increasing research efforts in the field of computer vision. In practice, the success of deep learning based COD is mainly determined by two key factors, including (i) a significantly large receptive field, which provides rich context information, and (ii) an effective fusion strategy, which aggregates the rich multi-level features for accurate COD. Motivated by these observations, in this paper,...

10.48550/arxiv.2101.05687 preprint EN other-oa arXiv (Cornell University) 2021-01-01

AFlow: Automating Agentic Workflow Generation

Jiayi Zhang Jianhai Xiang Zhaoyang Yu Fei Teng Xionghui Chen and 9 more

Large language models (LLMs) have demonstrated remarkable potential in solving complex tasks across diverse domains, typically by employing agentic workflows that follow detailed instructions and operational sequences. However, constructing these workflows requires significant human effort, limiting scalability and generalizability. Recent research has sought to automate the generation and optimization of these workflows, but existing methods still rely on initial manual setup and fall short of achieving fully automated...

10.48550/arxiv.2410.10762 preprint EN arXiv (Cornell University) 2024-10-14

Skating-Mixer: Long-Term Sport Audio-Visual Modeling with MLPs

Jingfei Xia Mingchen Zhuge Tiantian Geng Shun Fan Yuantai Wei and 2 more

Figure skating scoring is challenging because it requires judging players’ technical moves as well as coordination with the background music. Most learning-based methods struggle for two reasons: 1) each move in figure skating changes quickly, hence simply applying traditional frame sampling will lose a lot of valuable information, especially in 3 to 5 minutes lasting videos; 2) prior methods rarely considered the critical audio-visual relationship in their models. Due to these reasons, we introduce a novel architecture,...

10.1609/aaai.v37i3.25392 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2023-06-26

Learning to Identify Critical States for Reinforcement Learning from Videos

Haozhe Liu Mingchen Zhuge Bing Li Yuhui Wang Francesco Faccio and 2 more

Recent work on deep reinforcement learning (DRL) has pointed out that algorithmic information about good policies can be extracted from offline data which lack explicit information about executed actions [45], [46], [30]. For example, videos of humans or robots may convey a lot of implicit information about rewarding action sequences, but a DRL machine that wants to profit from watching such videos must first learn by itself to identify and recognize relevant states/actions/rewards. Without relying on ground-truth annotations, our new method called Deep...

10.1109/iccv51070.2023.00187 article EN 2023 IEEE/CVF International Conference on Computer Vision (ICCV) 2023-10-01

Kaleido-BERT: Vision-Language Pre-training on Fashion Domain

Mingchen Zhuge Dehong Gao Deng-Ping Fan Linbo Jin Ben Chen and 3 more

We present a new vision-language (VL) pre-training model dubbed Kaleido-BERT, which introduces a novel kaleido strategy for fashion cross-modality representations from transformers. In contrast to the random masking strategy of recent VL models, we design alignment guided masking to jointly focus more on image-text semantic relations. To this end, we carry out five novel tasks, i.e., rotation, jigsaw, camouflage, grey-to-color, and blank-to-color, for self-supervised VL pre-training at patches of different scale. Kaleido-BERT is conceptually simple...

10.48550/arxiv.2103.16110 preprint EN cc-by arXiv (Cornell University) 2021-01-01

NewsNet: A Novel Dataset for Hierarchical Temporal Segmentation

Haoqian Wu Keyu Chen Haozhe Liu Mingchen Zhuge Bing Li and 10 more

Temporal video segmentation is the get-to-go automatic video analysis, which decomposes a long-form video into smaller components for the following-up understanding tasks. Recent works have studied several levels of granularity to segment a video, such as shot, event, and scene. Those segmentations can help compare the semantics in the corresponding scales, but lack a wider view of larger temporal spans, especially when the video is complex and structured. Therefore, we present two abstractive levels of temporal segmentations and study their hierarchy to the existing fine-grained...

10.1109/cvpr52729.2023.01028 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

Language Agents as Optimizable Graphs

Mingchen Zhuge Wenyi Wang Louis Kirsch Francesco Faccio Dmitrii Khizbullin and 1 more

Various human-designed prompt engineering techniques have been proposed to improve problem solvers based on Large Language Models (LLMs), yielding many disparate code bases. We unify these approaches by describing LLM-based agents as computational graphs. The nodes implement functions to process multimodal data or query LLMs, and the edges describe the information flow between operations. Graphs can be recursively combined into larger composite graphs representing hierarchies of inter-agent...

10.48550/arxiv.2402.16823 preprint EN arXiv (Cornell University) 2024-02-26

Goldfish: Vision-Language Understanding of Arbitrarily Long Videos

Kirolos Ataallah Xiaoqian Shen Eslam Abdelrahman Essam Sleiman Mingchen Zhuge and 4 more

Most current LLM-based models for video understanding can process videos within minutes. However, they struggle with lengthy videos due to challenges such as "noise and redundancy", as well as "memory and computation" constraints. In this paper, we present Goldfish, a methodology tailored for comprehending videos of arbitrary lengths. We also introduce the TVQA-long benchmark, specifically designed to evaluate models' capabilities in understanding long videos with questions in both vision and text content. Goldfish approaches these challenges with an efficient retrieval...

10.48550/arxiv.2407.12679 preprint EN arXiv (Cornell University) 2024-07-17

Agent-as-a-Judge: Evaluate Agents with Agents

Mingchen Zhuge Changsheng Zhao Dylan Ashley Wenyi Wang Dmitrii Khizbullin and 8 more

Contemporary evaluation techniques are inadequate for agentic systems. These approaches either focus exclusively on final outcomes -- ignoring the step-by-step nature of agentic systems, or require excessive manual labour. To address this, we introduce the Agent-as-a-Judge framework, wherein agentic systems are used to evaluate agentic systems. This is an organic extension of the LLM-as-a-Judge framework, incorporating agentic features that enable intermediate feedback for the entire task-solving process. We apply the Agent-as-a-Judge to the task of code generation. To overcome issues with existing...

10.48550/arxiv.2410.10934 preprint EN arXiv (Cornell University) 2024-10-14

Multimodal Inplace Prompt Tuning for Open-set Object Detection

Guilin Li Mengdan Zhang Xiawu Zheng Peixian Chen Zihan Wang and 7 more

10.1145/3664647.3681275 article EN 2024-10-26

OpenHands: An Open Platform for AI Software Developers as Generalist Agents

Xingyao Wang Boxuan Li Yufan Song Frank F. Xu Xiangru Tang and 19 more

Software is one of the most powerful tools that we humans have at our disposal; it allows a skilled programmer to interact with the world in complex and profound ways. At the same time, thanks to improvements in large language models (LLMs), there has also been a rapid development in AI agents that interact with and affect change in their surrounding environments. In this paper, we introduce OpenHands (f.k.a. OpenDevin), a platform for the development of powerful and flexible AI agents that interact with the world in similar ways to those of a human developer: by writing code, interacting with a command line, and browsing the web. We...

10.48550/arxiv.2407.16741 preprint EN arXiv (Cornell University) 2024-07-23
