- Natural Language Processing Techniques
- Topic Modeling
- Multimodal Machine Learning Applications
- SARS-CoV-2 and COVID-19 Research
- Advanced Image and Video Retrieval Techniques
- Speech Recognition and Synthesis
- Parallel Computing and Optimization Techniques
- Domain Adaptation and Few-Shot Learning
- Music and Audio Processing
- Multi-Agent Systems and Negotiation
- Anomaly Detection Techniques and Applications
- Vaccines and Immunoinformatics Approaches
- CAR-T Cell Therapy Research
- Food Quality and Safety Studies
- Mobile Agent-Based Network Management
- Dementia and Cognitive Impairment Research
- Semantic Web and Ontologies
- Tea Polyphenols and Effects
- Fermentation and Sensory Analysis
- Speech and Dialogue Systems
- Human Pose and Action Recognition
- Reinforcement Learning in Robotics
- Recommender Systems and Techniques
- Advanced Computational Techniques and Applications
- Speech and Audio Processing
Alibaba Group (United States)
2022-2024
University of California, Los Angeles
2024
Alibaba Group (China)
2023-2024
Alibaba Group (Cayman Islands)
2024
Yancheng Institute of Technology
2023
Yanching Institute of Technology
2023
Xiamen University
2023
Colorado State University
2023
Beijing Technology and Business University
2022
Huaqiao University
2021-2022
Chenliang Li, Haiyang Xu, Junfeng Tian, Wei Wang, Ming Yan, Bin Bi, Jiabo Ye, He Chen, Guohai Xu, Zheng Cao, Ji Zhang, Songfang Huang, Fei Huang, Jingren Zhou, Luo Si. Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. 2022.
Recent years have witnessed a big convergence of language, vision, and multi-modal pretraining. In this work, we present mPLUG-2, a new unified paradigm with modularized design for multi-modal pretraining, which can benefit from modality collaboration while addressing the problem of modality entanglement. In contrast to predominant paradigms solely relying on sequence-to-sequence generation or encoder-based instance discrimination, mPLUG-2 introduces a multi-module composition network by sharing common universal modules...
Stacking fermentation is critical in sauce-flavor Baijiu production, but winter production often sees abnormal fermentations, like Waistline and Sub-Temp fermentation, affecting yield and quality. This study used three machine learning models (Logistic Regression, KNN, Random Forest) combined with multi-omics (metagenomics and flavoromics) to develop a classification model for fermentation. SHAP analysis identified 13 Fermentation 9 microbial biomarkers, along with 12 flavor biomarkers. Komagataeibacter...
Jiabo Ye, Anwen Hu, Haiyang Xu, Qinghao Ye, Ming Yan, Guohai Xu, Chenliang Li, Junfeng Tian, Qi Qian, Ji Zhang, Qin Jin, Liang He, Xin Lin, Fei Huang. Findings of the Association for Computational Linguistics: EMNLP 2023.
With the development of modern technology and the Android Smartphone, Smart Living is gradually changing people's lives. Bluetooth technology, which aims to exchange data wirelessly over a short distance using short-wavelength radio transmissions, is providing the necessary technology to create convenience, intelligence, and controllability. In this paper, a new system called home lighting control using a Bluetooth-based Smartphone is proposed and prototyped. First, Bluetooth technology is reviewed. Second, the architecture, communication protocol, and hardware...
Motion capture is a long-standing research problem. Although it has been studied for decades, the majority of research focuses on ground-based movements such as walking, sitting, dancing, etc. Off-ground actions such as climbing are largely overlooked. As an important type of action in sports and the firefighting field, climbing is challenging to capture because of its complex back poses, intricate human-scene interactions, and difficult global localization. The research community does not have an in-depth understanding of the climbing action due to the lack of specific datasets. To...
Document understanding refers to automatically extracting, analyzing, and comprehending information from various types of digital documents, such as a web page. Existing Multimodal Large Language Models (MLLMs), including mPLUG-Owl, have demonstrated promising zero-shot capabilities in shallow OCR-free text recognition, indicating their potential for document understanding. Nevertheless, without in-domain training, these models tend to ignore fine-grained OCR features, such as sophisticated tables or large...
INTRODUCTION: We investigated the validity, feasibility, and effectiveness of a voice recognition‐based digital cognitive screener (DCS) for detecting dementia and mild cognitive impairment (MCI) in large‐scale community elderly participants. METHODS: Eligible participants completed demographic, cognitive, and functional assessments and the DCS. Neuropsychological tests were used to assess domain‐specific and global cognition, while the diagnosis of MCI and dementia relied on the Clinical Dementia Rating Scale. RESULTS: Among 11,186...
Smartphones have become indispensable in modern life, yet navigating complex tasks on mobile devices often remains frustrating. Recent advancements in large multimodal model (LMM)-based mobile agents have demonstrated the ability to perceive and act in mobile environments. However, current approaches face significant limitations: they fall short in addressing real-world human needs, struggle with reasoning-intensive and long-horizon tasks, and lack mechanisms to learn and improve from prior experiences. To overcome these challenges,...
Tropical cyclones (TCs) pose a significant threat to human health, and research is needed to identify high-risk subpopulations. We investigated whether hospitalization risks from TCs in Florida (FL), United States, varied across individuals and communities. We modeled the associations between all storms in FL from 1999 to 2016 and over 3.5 million Medicare hospitalizations for respiratory (RD) and cardiovascular disease (CVD). We estimated the relative risk (RR), comparing hospitalizations during TC-periods (2 days before to 7 days after) to matched...
Mobile device agents based on Multimodal Large Language Models (MLLMs) are becoming a popular application. In this paper, we introduce Mobile-Agent, an autonomous multi-modal mobile device agent. Mobile-Agent first leverages visual perception tools to accurately identify and locate both the visual and textual elements within the app's front-end interface. Based on the perceived vision context, it then autonomously plans and decomposes the complex operation task, and navigates the mobile Apps through operations step by step. Different from...
With the rapid evolution of large language models (LLMs), there is a growing concern that they may pose risks or have negative social impacts. Therefore, evaluation of human values alignment is becoming increasingly important. Previous work mainly focuses on assessing the performance of LLMs on certain knowledge and reasoning abilities, while neglecting the alignment to human values, especially in a Chinese context. In this paper, we present CValues, the first Chinese human values evaluation benchmark to measure the alignment ability of LLMs in terms of both safety and responsibility criteria. As...
Language agents have demonstrated autonomous decision-making abilities by reasoning with foundation models. Recently, efforts have been made to train language agents for performance improvement, with multi-step reasoning and action trajectories as the training data. However, collecting such trajectories still requires considerable human effort, by either artificial annotations or implementations of diverse prompting frameworks. In this work, we propose A$^3$T, a framework that enables the Autonomous Annotation of Agent Trajectories in the style...
Charts are important for presenting and explaining complex data relationships. Recently, multimodal large language models (MLLMs) have shown remarkable capabilities in various chart understanding tasks. However, the sheer size of these models in terms of parameters and computational requirements limits their use in resource-constrained environments. In this paper, we present TinyChart, an efficient MLLM for chart understanding with only 3B parameters. TinyChart overcomes two key challenges in efficient chart understanding: (1) reduce the burden of learning...
Internet of Things (IoT) devices possess valuable yet private multimodal data, calling for a decentralized machine learning scheme. Though several multimodal federated learning (MFL) methods have been proposed, most of them overlook the system heterogeneity across IoT devices, resulting in inadaptability to real-world applications. Aiming at this, we conduct theoretical analysis and exploration experiments on straggler impacts and uncover the fact that stragglers caused by system heterogeneity are fatal to MFL, resulting in catastrophic time overhead....
The quality grade of base
To promote the development of Vision-Language Pre-training (VLP) and multimodal Large Language Models (LLMs) in the Chinese community, we release the largest public Chinese high-quality video-language dataset, named Youku-mPLUG, which is collected from Youku, a well-known Chinese video-sharing website, with strict criteria of safety, diversity, and quality. Youku-mPLUG contains 10 million Chinese video-text pairs filtered from 400 million raw videos across a wide range of 45 diverse categories for large-scale pre-training. In addition, to...
Semantic code search is the task of retrieving a relevant code snippet given a natural language query. Different from typical information retrieval tasks, code search requires bridging the semantic gap between the programming language and natural language, for better describing intrinsic concepts and semantics. Recently, deep neural networks for code search have been a hot research topic. Typical methods for neural code search first represent the code snippet and query text as separate embeddings, and then use vector distance (e.g. dot-product or cosine) to calculate the semantic similarity between them. There exist many...
New hardware, such as SmartNICs, has been released to offload network applications in data centers. Off-path SmartNICs, a type of multi-core SoC SmartNIC, have attracted the attention of many researchers. Unfortunately, off-path SmartNICs have not yet been fully explored. In this paper, we use a BlueField SmartNIC as an example to conduct a systematic study of the advantages and disadvantages of off-path SmartNICs. We make a detailed performance characterization of an off-path SmartNIC, including computing power and communication overhead, and propose the following advice: 1) Directly...