NFDI4DS | UHH-SEMS - Publication Details

Ming Yan

ORCID: 0000-0002-4388-6708

About

Contact & Profiles

A5000844861

Research Areas

Natural Language Processing Techniques
Topic Modeling
Multimodal Machine Learning Applications
SARS-CoV-2 and COVID-19 Research
Advanced Image and Video Retrieval Techniques
Speech Recognition and Synthesis
Parallel Computing and Optimization Techniques
Domain Adaptation and Few-Shot Learning
Music and Audio Processing
Multi-Agent Systems and Negotiation
Anomaly Detection Techniques and Applications
Vaccines and Immunoinformatics Approaches
CAR-T Cell Therapy Research
Food Quality and Safety Studies
Mobile Agent-Based Network Management
Dementia and Cognitive Impairment Research
Semantic Web and Ontologies
Tea Polyphenols and Effects
Fermentation and Sensory Analysis
Speech and Dialogue Systems
Human Pose and Action Recognition
Reinforcement Learning in Robotics
Recommender Systems and Techniques
Advanced Computational Techniques and Applications
Speech and Audio Processing

Alibaba Group (United States)
2022-2024

University of California, Los Angeles
2024

Alibaba Group (China)
2023-2024

Alibaba Group (Cayman Islands)
2024

Yancheng Institute of Technology
2023

Yanching Institute of Technology
2023

Xiamen University
2023

Colorado State University
2023

Beijing Technology and Business University
2022

Huaqiao University
2021-2022

mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections

OPENALEX - Publications

Chenliang Li Haiyang Xu Junfeng Tian Wei Wang Ming Yan and 10 more

Chenliang Li, Haiyang Xu, Junfeng Tian, Wei Wang, Ming Yan, Bin Bi, Jiabo Ye, He Chen, Guohai Xu, Zheng Cao, Ji Zhang, Songfang Huang, Fei Huang, Jingren Zhou, Luo Si. Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. 2022.

10.18653/v1/2022.emnlp-main.488 article EN cc-by Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing 2022-01-01

mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration

Qinghao Ye Haiyang Xu Jiabo Ye Ming Yan Anwen Hu and 4 more

10.1109/cvpr52733.2024.01239 article EN 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024-06-16

mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video

Haiyang Xu Qinghao Ye Ming Yan Yaya Shi Jiabo Ye and 10 more

Recent years have witnessed a big convergence of language, vision, and multi-modal pretraining. In this work, we present mPLUG-2, a new unified paradigm with modularized design for multi-modal pretraining, which can benefit from modality collaboration while addressing the problem of modality entanglement. In contrast to predominant paradigms solely relying on sequence-to-sequence generation or encoder-based instance discrimination, mPLUG-2 introduces a multi-module composition network by sharing common universal modules...

10.48550/arxiv.2302.00402 preprint EN cc-by arXiv (Cornell University) 2023-01-01

Hallucination Augmented Contrastive Learning for Multimodal Large Language Model

Chaoya Jiang Haiyang Xu Mengfan Dong Jiaxing Chen Wei Ye and 5 more

10.1109/cvpr52733.2024.02553 article EN 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024-06-16

Machine Learning and Multi-Omics Integration to Reveal Biomarkers and Microbial Community Assembly Differences in Abnormal Stacking Fermentation of Sauce-Flavor Baijiu

Shuai Li Yueran Han Ming Yan Shuyi Qiu Jun Lu

Stacking fermentation is critical in sauce-flavor Baijiu production, but winter production often sees abnormal fermentations, like Waistline and Sub-Temp fermentation, affecting yield and quality. This study used three machine learning models (Logistic Regression, KNN, Random Forest) combined with multi-omics (metagenomics and flavoromics) to develop a classification model for abnormal fermentation. SHAP analysis identified 13 Fermentation 9 microbial biomarkers, along with 12 flavor biomarkers. Komagataeibacter...

10.3390/foods14020245 article EN cc-by Foods 2025-01-14

UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model

Jiabo Ye Anwen Hu Haiyang Xu Qinghao Ye Ming Yan and 9 more

Jiabo Ye, Anwen Hu, Haiyang Xu, Qinghao Ye, Ming Yan, Guohai Xu, Chenliang Li, Junfeng Tian, Qi Qian, Ji Zhang, Qin Jin, Liang He, Xin Lin, Fei Huang. Findings of the Association for Computational Linguistics: EMNLP 2023.

10.18653/v1/2023.findings-emnlp.187 article EN cc-by 2023-01-01

Smart Living Using Bluetooth-Based Android Smartphone

Ming Yan Hao Shi

With the development of modern technology and Android Smartphone, Smart Living is gradually changing people's life. Bluetooth technology, which aims to exchange data wirelessly in a short distance using short-wavelength radio transmissions, is providing the necessary technology to create convenience, intelligence and controllability. In this paper, a new Smart Living system called home lighting control system using Bluetooth-based Android Smartphone is proposed and prototyped. First, Smart Living system and Bluetooth technology are reviewed. Second, the system architecture, communication protocol and hardware...

10.5121/ijwmn.2013.5105 article EN International Journal of Wireless & Mobile Networks 2013-02-28

CIMI4D: A Large Multimodal Climbing Motion Dataset under Human-scene Interactions

Ming Yan Xin Wang Yudi Dai Siqi Shen Chenglu Wen and 3 more

Motion capture is a long-standing research problem. Although it has been studied for decades, the majority of research focuses on ground-based movements such as walking, sitting, dancing, etc. Off-grounded actions such as climbing are largely overlooked. As an important type of action in sports and the firefighting field, climbing is challenging to capture because of its complex back poses, intricate human-scene interactions, and difficult global localization. The research community does not have an in-depth understanding of the climbing action due to the lack of specific datasets. To...

10.1109/cvpr52729.2023.01247 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding

Jiabo Ye Anwen Hu Haiyang Xu Qinghao Ye Ming Yan and 8 more

Document understanding refers to automatically extracting, analyzing and comprehending information from various types of digital documents, such as a web page. Existing Multi-modal Large Language Models (MLLMs), including mPLUG-Owl, have demonstrated promising zero-shot capabilities in shallow OCR-free text recognition, indicating their potential for OCR-free document understanding. Nevertheless, without in-domain training, these models tend to ignore fine-grained OCR features, such as sophisticated tables or large...

10.48550/arxiv.2307.02499 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Validity, feasibility, and effectiveness of a voice‐recognition based digital cognitive screener for dementia and mild cognitive impairment in community‐dwelling older Chinese adults: A large‐scale implementation study

Xuhao Zhao Haoxuan Wen Guohai Xu Ting Pang Yaping Zhang and 6 more

INTRODUCTION: We investigated the validity, feasibility, and effectiveness of a voice recognition‐based digital cognitive screener (DCS) for detecting dementia and mild cognitive impairment (MCI) in large‐scale community elderly participants. METHODS: Eligible participants completed demographic, cognitive, and functional assessments and the DCS. Neuropsychological tests were used to assess domain‐specific and global cognition, while the diagnosis of MCI and dementia relied on the Clinical Dementia Rating Scale. RESULTS: Among 11,186...

10.1002/alz.13668 article EN cc-by-nc-nd Alzheimer's & Dementia 2024-02-01

Mobile-Agent-E: Self-Evolving Mobile Assistant for Complex Tasks

Zhenhailong Wang Haiyang Xu Junyang Wang Xi Zhang Ming Yan and 3 more

Smartphones have become indispensable in modern life, yet navigating complex tasks on mobile devices often remains frustrating. Recent advancements in large multimodal model (LMM)-based mobile agents have demonstrated the ability to perceive and act in mobile environments. However, current approaches face significant limitations: they fall short in addressing real-world human needs, struggle with reasoning-intensive and long-horizon tasks, and lack mechanisms to learn and improve from prior experiences. To overcome these challenges,...

10.48550/arxiv.2501.11733 preprint EN arXiv (Cornell University) 2025-01-20

Health disparities among older adults following tropical cyclone exposure in Florida

Kate Burrows G. Brooke Anderson Ming Yan Ander Wilson M. Benjamin Sabath and 4 more

Tropical cyclones (TCs) pose a significant threat to human health, and research is needed to identify high-risk subpopulations. We investigated whether hospitalization risks from TCs in Florida (FL), United States, varied across individuals and communities. We modeled the associations between all storms in FL from 1999 to 2016 and over 3.5 million Medicare hospitalizations for respiratory disease (RD) and cardiovascular disease (CVD). We estimated the relative risk (RR), comparing hospitalizations during TC-periods (2 days before to 7 days after) to matched...

10.1038/s41467-023-37675-7 article EN cc-by Nature Communications 2023-04-19

Intrusion detection based on improved density peak clustering for imbalanced data on sensor-cloud systems

Ming Yan Yewang Chen Xiaoliang Hu Dongdong Cheng Yi Chen and 1 more

10.1016/j.sysarc.2021.102212 article EN Journal of Systems Architecture 2021-06-24

A lightweight weakly supervised learning segmentation algorithm for imbalanced image based on rotation density peaks

Ming Yan Yewang Chen Yi Chen Guoyao Zeng Xiaoliang Hu and 1 more

10.1016/j.knosys.2022.108513 article EN Knowledge-Based Systems 2022-03-07

Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception

Junyang Wang Haiyang Xu Jiabo Ye Ming Yan Weizhou Shen and 3 more

Mobile device agents based on Multimodal Large Language Models (MLLM) are becoming a popular application. In this paper, we introduce Mobile-Agent, an autonomous multi-modal mobile device agent. Mobile-Agent first leverages visual perception tools to accurately identify and locate both the visual and textual elements within the app's front-end interface. Based on the perceived vision context, it then autonomously plans and decomposes the complex operation task, and navigates the mobile Apps through operations step by step. Different from...

10.48550/arxiv.2401.16158 preprint EN arXiv (Cornell University) 2024-01-29

Machine Learning Discrimination and Prediction of Different Quality Grades of Sauce-Flavor Baijiu Based on Biomarker and Key Flavor Compounds Screening

Shuai Li Tao Li Yueran Han Yan Pei Guohui Li and 4 more

10.2139/ssrn.4939729 preprint EN 2024-01-01

TinyChart: Efficient Chart Understanding with Program-of-Thoughts Learning and Visual Token Merging

Liang Zhang Anwen Hu Haiyang Xu Ming Yan Yichen Xu and 3 more

10.18653/v1/2024.emnlp-main.112 article EN Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing 2024-01-01

CValues: Measuring the Values of Chinese Large Language Models from Safety to Responsibility

Guohai Xu Jiayi Liu Ming Yan Haotian Xu Jinghui Si and 9 more

With the rapid evolution of large language models (LLMs), there is a growing concern that they may pose risks or have negative social impacts. Therefore, evaluation of human values alignment is becoming increasingly important. Previous work mainly focuses on assessing the performance of LLMs on certain knowledge and reasoning abilities, while neglecting the alignment to human values, especially in a Chinese context. In this paper, we present CValues, the first Chinese human values evaluation benchmark to measure the alignment ability of LLMs in terms of both safety and responsibility criteria. As...

10.48550/arxiv.2307.09705 preprint EN other-oa arXiv (Cornell University) 2023-01-01

ReAct Meets ActRe: Autonomous Annotations of Agent Trajectories for Contrastive Self-Training

Zonghan Yang Peng Li Ming Yan Ji Zhang Fei Huang and 1 more

Language agents have demonstrated autonomous decision-making abilities by reasoning with foundation models. Recently, efforts have been made to train language agents for performance improvement, with multi-step reasoning and action trajectories as the training data. However, collecting such trajectories still requires considerable human effort, by either artificial annotations or implementations of diverse prompting frameworks. In this work, we propose A$^3$T, a framework that enables the Autonomous Annotation of Agent Trajectories in the style...

10.48550/arxiv.2403.14589 preprint EN arXiv (Cornell University) 2024-03-21

TinyChart: Efficient Chart Understanding with Visual Token Merging and Program-of-Thoughts Learning

Liang Zhang Anwen Hu Haiyang Xu Ming Yan Yichen Xu and 3 more

Charts are important for presenting and explaining complex data relationships. Recently, multimodal large language models (MLLMs) have shown remarkable capabilities in various chart understanding tasks. However, the sheer size of these models in terms of parameters and computational requirements limits their use in resource-constrained environments. In this paper, we present TinyChart, an efficient MLLM for chart understanding with only 3B parameters. TinyChart overcomes two key challenges in efficient chart understanding: (1) reduce the burden of learning...

10.48550/arxiv.2404.16635 preprint EN arXiv (Cornell University) 2024-04-25

Breaking Barriers of System Heterogeneity: Straggler-Tolerant Multimodal Federated Learning via Knowledge Distillation

Jinqian Chen Hao‐Yu Tang Junhao Cheng Ming Yan Ji Zhang and 3 more

Internet of Things (IoT) devices possess valuable yet private multimodal data, calling for a decentralized machine learning scheme. Though several multimodal federated learning (MFL) methods have been proposed, most of them overlook the system heterogeneity across IoT devices, resulting in inadaptability to real-world applications. Aiming at this, we conduct theoretical analysis and exploration experiments on straggler impacts and uncover the fact that stragglers caused by system heterogeneity are fatal to MFL, resulting in catastrophic time overhead....

10.24963/ijcai.2024/419 article EN 2024-07-26

Machine learning discrimination and prediction of different quality grades of sauce-flavor baijiu based on biomarker and key flavor compounds screening

Shuai Li Tao Li Yueran Han Yan Pei Guohui Li and 4 more

The quality grade of base

10.1016/j.fochx.2024.101877 article EN cc-by-nc Food Chemistry X 2024-10-05

Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Dataset for Pre-training and Benchmarks

Haiyang Xu Qinghao Ye Xuan Wu Ming Yan Yuan Miao and 11 more

To promote the development of Vision-Language Pre-training (VLP) and multimodal Large Language Model (LLM) in the Chinese community, we firstly release the largest public Chinese high-quality video-language dataset named Youku-mPLUG, which is collected from Youku, a well-known Chinese video-sharing website, with strict criteria of safety, diversity, and quality. Youku-mPLUG contains 10 million Chinese video-text pairs filtered from 400 million raw videos across a wide range of 45 diverse categories for large-scale pre-training. In addition, to...

10.48550/arxiv.2306.04362 preprint EN cc-by-nc-sa arXiv (Cornell University) 2023-01-01

Learning Deep Semantic Model for Code Search using CodeSearchNet Corpus

Chen Wu Ming Yan

Semantic code search is the task of retrieving a relevant code snippet given a natural language query. Different from typical information retrieval tasks, code search requires bridging the semantic gap between the programming language and natural language, for better describing intrinsic concepts and semantics. Recently, deep neural networks for code search have been a hot research topic. Typical methods for neural code search first represent the code snippet and query text as separate embeddings, and then use vector distance (e.g. dot-product or cosine) to calculate the semantic similarity between them. There exist many...

10.48550/arxiv.2201.11313 preprint EN other-oa arXiv (Cornell University) 2022-01-01

A Comprehensive Study on Optimizing Systems with Data Processing Units

Shangyi Sun Chunpu Huang Rui Zhang Lulu Chen Yukai Huang and 2 more

New hardware, such as SmartNICs, has been released to offload network applications in data centers. Off-path SmartNICs, a type of multi-core SoC SmartNICs, have attracted the attention of many researchers. Unfortunately, they lack a full exploration of off-path SmartNICs. In this paper, we use the BlueField SmartNIC as an example to conduct a systematic study on the advantages and disadvantages of off-path SmartNICs. We make a detailed performance characterization including computing power and communication overhead, and propose the following advice: 1) Directly...

10.48550/arxiv.2301.06070 preprint EN other-oa arXiv (Cornell University) 2023-01-01
