Ming Yan

ORCID: 0000-0002-4388-6708
Research Areas
  • Natural Language Processing Techniques
  • Topic Modeling
  • Multimodal Machine Learning Applications
  • SARS-CoV-2 and COVID-19 Research
  • Advanced Image and Video Retrieval Techniques
  • Speech Recognition and Synthesis
  • Parallel Computing and Optimization Techniques
  • Domain Adaptation and Few-Shot Learning
  • Music and Audio Processing
  • Multi-Agent Systems and Negotiation
  • Anomaly Detection Techniques and Applications
  • Vaccines and Immunoinformatics Approaches
  • CAR-T Cell Therapy Research
  • Food Quality and Safety Studies
  • Mobile Agent-Based Network Management
  • Dementia and Cognitive Impairment Research
  • Semantic Web and Ontologies
  • Tea Polyphenols and Effects
  • Fermentation and Sensory Analysis
  • Speech and Dialogue Systems
  • Human Pose and Action Recognition
  • Reinforcement Learning in Robotics
  • Recommender Systems and Techniques
  • Advanced Computational Techniques and Applications
  • Speech and Audio Processing

Alibaba Group (United States)
2022-2024

University of California, Los Angeles
2024

Alibaba Group (China)
2023-2024

Alibaba Group (Cayman Islands)
2024

Yancheng Institute of Technology
2023

Yanching Institute of Technology
2023

Xiamen University
2023

Colorado State University
2023

Beijing Technology and Business University
2022

Huaqiao University
2021-2022

Chenliang Li, Haiyang Xu, Junfeng Tian, Wei Wang, Ming Yan, Bin Bi, Jiabo Ye, Hehong Chen, Guohai Xu, Zheng Cao, Ji Zhang, Songfang Huang, Fei Huang, Jingren Zhou, Luo Si. Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. 2022.

10.18653/v1/2022.emnlp-main.488 article EN cc-by Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing 2022-01-01

10.1109/cvpr52733.2024.01239 article EN 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024-06-16

Recent years have witnessed a big convergence of language, vision, and multi-modal pretraining. In this work, we present mPLUG-2, a new unified paradigm with modularized design for multi-modal pretraining, which can benefit from modality collaboration while addressing the problem of modality entanglement. In contrast to predominant paradigms of solely relying on sequence-to-sequence generation or encoder-based instance discrimination, mPLUG-2 introduces a multi-module composition network by sharing common universal modules...

10.48550/arxiv.2302.00402 preprint EN cc-by arXiv (Cornell University) 2023-01-01

Stacking fermentation is critical in sauce-flavor Baijiu production, but winter production often sees abnormal fermentations, like Waistline and Sub-Temp fermentation, affecting yield and quality. This study used three machine learning models (Logistic Regression, KNN, Random Forest) combined with multi-omics (metagenomics and flavoromics) to develop a classification model for fermentation. SHAP analysis identified 13 Fermentation 9 microbial biomarkers, along with 12 flavor biomarkers. Komagataeibacter...

10.3390/foods14020245 article EN cc-by Foods 2025-01-14

Jiabo Ye, Anwen Hu, Haiyang Xu, Qinghao Ye, Ming Yan, Guohai Xu, Chenliang Li, Junfeng Tian, Qi Qian, Ji Zhang, Qin Jin, Liang He, Xin Lin, Fei Huang. Findings of the Association for Computational Linguistics: EMNLP 2023.

10.18653/v1/2023.findings-emnlp.187 article EN cc-by 2023-01-01

With the development of modern technology and Android smartphones, Smart Living is gradually changing people's lives. Bluetooth technology, which aims to exchange data wirelessly over a short distance using short-wavelength radio transmissions, provides the necessary technology to create convenience, intelligence and controllability. In this paper, a new home lighting control system using a Bluetooth-based Android smartphone is proposed and prototyped. First, Bluetooth and Android technology are reviewed. Second, the system architecture, communication protocol and hardware...

10.5121/ijwmn.2013.5105 article EN International Journal of Wireless & Mobile Networks 2013-02-28

Motion capture is a long-standing research problem. Although it has been studied for decades, the majority of research focuses on ground-based movements such as walking, sitting, dancing, etc. Off-grounded actions such as climbing are largely overlooked. As an important type of action in the sports and firefighting fields, climbing is challenging to capture because of its complex back poses, intricate human-scene interactions, and difficult global localization. The research community does not have an in-depth understanding of climbing due to the lack of specific datasets. To...

10.1109/cvpr52729.2023.01247 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

Document understanding refers to automatically extracting, analyzing and comprehending information from various types of digital documents, such as a web page. Existing Multimodal Large Language Models (MLLMs), including mPLUG-Owl, have demonstrated promising zero-shot capabilities in shallow OCR-free text recognition, indicating their potential for OCR-free document understanding. Nevertheless, without in-domain training, these models tend to ignore fine-grained OCR features, such as sophisticated tables or large...

10.48550/arxiv.2307.02499 preprint EN other-oa arXiv (Cornell University) 2023-01-01

INTRODUCTION We investigated the validity, feasibility, and effectiveness of a voice recognition-based digital cognitive screener (DCS) for detecting dementia and mild cognitive impairment (MCI) in a large-scale community sample of elderly participants. METHODS Eligible participants completed demographic, cognitive, and functional assessments and the DCS. Neuropsychological tests were used to assess domain-specific and global cognition, while the diagnosis of MCI relied on the Clinical Dementia Rating Scale. RESULTS Among 11,186...

10.1002/alz.13668 article EN cc-by-nc-nd Alzheimer's & Dementia 2024-02-01

Smartphones have become indispensable in modern life, yet navigating complex tasks on mobile devices often remains frustrating. Recent advancements in large multimodal model (LMM)-based agents have demonstrated the ability to perceive and act in mobile environments. However, current approaches face significant limitations: they fall short in addressing real-world human needs, struggle with reasoning-intensive and long-horizon tasks, and lack mechanisms to learn and improve from prior experiences. To overcome these challenges,...

10.48550/arxiv.2501.11733 preprint EN arXiv (Cornell University) 2025-01-20

Tropical cyclones (TCs) pose a significant threat to human health, and research is needed to identify high-risk subpopulations. We investigated whether hospitalization risks from TCs in Florida (FL), United States, varied across individuals and communities. We modeled the associations between all storms in FL from 1999 to 2016 and over 3.5 million Medicare hospitalizations for respiratory disease (RD) and cardiovascular disease (CVD). We estimated the relative risk (RR), comparing hospitalizations during TC-periods (2 days before to 7 days after) with matched...

10.1038/s41467-023-37675-7 article EN cc-by Nature Communications 2023-04-19

Mobile device agents based on Multimodal Large Language Models (MLLMs) are becoming a popular application. In this paper, we introduce Mobile-Agent, an autonomous multi-modal mobile device agent. Mobile-Agent first leverages visual perception tools to accurately identify and locate both the visual and textual elements within the app's front-end interface. Based on the perceived vision context, it then autonomously plans and decomposes the complex operation task, and navigates the mobile apps through operations step by step. Different from...

10.48550/arxiv.2401.16158 preprint EN arXiv (Cornell University) 2024-01-29

10.18653/v1/2024.emnlp-main.112 article EN Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing 2024-01-01

With the rapid evolution of large language models (LLMs), there is a growing concern that they may pose risks or have negative social impacts. Therefore, evaluation of human values alignment is becoming increasingly important. Previous work mainly focuses on assessing the performance of LLMs on certain knowledge and reasoning abilities, while neglecting the alignment to human values, especially in a Chinese context. In this paper, we present CValues, the first Chinese human values evaluation benchmark to measure the alignment ability of LLMs in terms of both safety and responsibility criteria. As...

10.48550/arxiv.2307.09705 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Language agents have demonstrated autonomous decision-making abilities by reasoning with foundation models. Recently, efforts have been made to train language agents for performance improvement, with multi-step reasoning and action trajectories as the training data. However, collecting such trajectories still requires considerable human effort, by either artificial annotations or implementations of diverse prompting frameworks. In this work, we propose A³T, a framework that enables the Autonomous Annotation of Agent Trajectories in the style...

10.48550/arxiv.2403.14589 preprint EN arXiv (Cornell University) 2024-03-21

Charts are important for presenting and explaining complex data relationships. Recently, multimodal large language models (MLLMs) have shown remarkable capabilities in various chart understanding tasks. However, the sheer size of these models in terms of parameters and computational requirements limits their use in resource-constrained environments. In this paper, we present TinyChart, an efficient MLLM with only 3B parameters. TinyChart overcomes two key challenges in efficient chart understanding: (1) reduce the burden of learning...

10.48550/arxiv.2404.16635 preprint EN arXiv (Cornell University) 2024-04-25

Internet of Things (IoT) devices possess valuable yet private multimodal data, calling for a decentralized machine learning scheme. Though several multimodal federated learning (MFL) methods have been proposed, most of them overlook the system heterogeneity across IoT devices, resulting in poor adaptability to real-world applications. Aiming at this, we conduct theoretical analysis and exploration experiments on straggler impacts and uncover the fact that stragglers caused by system heterogeneity are fatal to MFL, resulting in catastrophic time overhead....

10.24963/ijcai.2024/419 article EN 2024-07-26

To promote the development of Vision-Language Pre-training (VLP) and multimodal Large Language Models (LLMs) in the Chinese community, we first release the largest public high-quality video-language dataset, named Youku-mPLUG, which is collected from Youku, a well-known video-sharing website, with strict criteria of safety, diversity, and quality. Youku-mPLUG contains 10 million video-text pairs filtered from 400 million raw videos across a wide range of 45 diverse categories for large-scale pre-training. In addition, to...

10.48550/arxiv.2306.04362 preprint EN cc-by-nc-sa arXiv (Cornell University) 2023-01-01

Semantic code search is the task of retrieving a relevant code snippet given a natural language query. Different from typical information retrieval tasks, code search requires bridging the semantic gap between programming language and natural language, for better describing intrinsic concepts and semantics. Recently, deep neural networks for code search have been a hot research topic. Typical methods first represent the code snippet and the query text as separate embeddings, and then use a vector distance (e.g. dot-product or cosine) to calculate the similarity between them. There exist many...

10.48550/arxiv.2201.11313 preprint EN other-oa arXiv (Cornell University) 2022-01-01
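The embedding-and-score step described in the code search abstract can be illustrated with a minimal sketch. It assumes the query and code snippets have already been encoded into vectors of equal length; the toy vectors, the snippet names and the helper functions `dot`, `cosine` and `rank_snippets` are hypothetical and are not the paper's model or code.

```python
import math


def dot(u, v):
    """Dot-product similarity between two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the L2 norms."""
    norm_u = math.sqrt(dot(u, u))
    norm_v = math.sqrt(dot(v, v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # undefined for zero vectors; treat as no similarity
    return dot(u, v) / (norm_u * norm_v)


def rank_snippets(query_vec, snippet_vecs, sim=cosine):
    """Return snippet ids sorted from most to least similar to the query."""
    scores = {sid: sim(query_vec, vec) for sid, vec in snippet_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy embeddings; a real system would produce these with trained encoders.
query = [0.9, 0.1, 0.0]
snippets = {
    "read_file": [0.8, 0.2, 0.1],
    "sort_list": [0.0, 0.3, 0.9],
    "parse_json": [0.5, 0.5, 0.2],
}
print(rank_snippets(query, snippets))  # prints ['read_file', 'parse_json', 'sort_list']
```

Cosine similarity ignores vector length, while the raw dot product rewards larger-norm embeddings. In this toy example both give the same ranking, but they can disagree when embedding norms vary widely.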

New hardware, such as SmartNICs, has been released to offload network applications in data centers. Off-path SmartNICs, a type of multi-core SoC SmartNIC, have attracted the attention of many researchers. Unfortunately, there has been no full exploration of off-path SmartNICs. In this paper, we use the BlueField SmartNIC as an example to conduct a systematic study of the advantages and disadvantages of off-path SmartNICs. We make a detailed performance characterization, including computing power and communication overhead, and propose the following advice: 1) Directly...

10.48550/arxiv.2301.06070 preprint EN other-oa arXiv (Cornell University) 2023-01-01