- Topic Modeling
- Natural Language Processing Techniques
- Multimodal Machine Learning Applications
- Text Readability and Simplification
- Speech and Dialogue Systems
- Computational Drug Discovery Methods
- Speech Recognition and Synthesis
- Video Analysis and Summarization
- Software Engineering Research
- Web Data Mining and Analysis
- Text and Document Classification Technologies
- Biomedical Text Mining and Ontologies
- Human Motion and Animation
- Machine Learning in Materials Science
- Bioinformatics and Genomic Networks
- Domain Adaptation and Few-Shot Learning
- Multi-Agent Systems and Negotiation
- Semantic Web and Ontologies
- Image Retrieval and Classification Techniques
- Handwritten Text Recognition Techniques
- Advanced Graph Neural Networks
- Human Pose and Action Recognition
- Artificial Intelligence in Games
- Science Education and Pedagogy
- Gastric Cancer Management and Outcomes
- Tencent (China): 2018-2025
- Alibaba Group (China): 2025
- Zhejiang University: 2024
- Dublin City University: 2015-2024
- Beijing Institute of Technology: 2024
- Xiangtan University: 2024
- University of Illinois Chicago: 2024
- Hunan University: 2024
- Macao Polytechnic University: 2024
- University of Hong Kong: 2020-2023
In translation, considering the document as a whole can help to resolve ambiguities and inconsistencies. In this paper, we propose a cross-sentence context-aware approach and investigate the influence of historical contextual information on the performance of neural machine translation (NMT). First, this history is summarized in a hierarchical way. We then integrate the historical representation into NMT in two strategies: 1) a warm-start of encoder and decoder states, and 2) an auxiliary context source for updating decoder states. Experimental results...
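The hierarchical summarization described in this abstract can be pictured as a sentence-level encoder stacked on a word-level encoder. The following is a minimal PyTorch sketch of that idea, not the authors' implementation; the GRU choice, module names, and dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class HierarchicalContextEncoder(nn.Module):
    """Illustrative sketch: summarize preceding sentences hierarchically.

    A word-level GRU encodes each history sentence into a vector; a
    sentence-level GRU then summarizes those vectors into one context
    vector, which could warm-start the NMT encoder/decoder states or
    serve as an auxiliary input when updating decoder states.
    """

    def __init__(self, vocab_size: int, emb_dim: int = 256, hid_dim: int = 512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.word_rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.sent_rnn = nn.GRU(hid_dim, hid_dim, batch_first=True)

    def forward(self, history: torch.Tensor) -> torch.Tensor:
        # history: (num_sents, sent_len) token ids of the preceding sentences
        _, sent_vecs = self.word_rnn(self.embed(history))  # (1, num_sents, hid)
        _, context = self.sent_rnn(sent_vecs)              # (1, 1, hid)
        return context.view(-1)                            # flat context vector

enc = HierarchicalContextEncoder(vocab_size=10000)
ctx = enc(torch.randint(0, 10000, (3, 12)))  # 3 history sentences, 12 tokens each
```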
While large language models (LLMs) have demonstrated remarkable capabilities across a range of downstream tasks, a significant concern revolves around their propensity to exhibit hallucinations: LLMs occasionally generate content that diverges from the user input, contradicts previously generated context, or misaligns with established world knowledge. This phenomenon poses a substantial challenge to their reliability in real-world scenarios. In this paper, we survey recent efforts on the detection,...
Large language models (LLMs) such as ChatGPT can produce coherent, cohesive, relevant, and fluent answers for various natural language processing (NLP) tasks. Taking document-level machine translation (MT) as a testbed, this paper provides an in-depth evaluation of LLMs’ ability on discourse modeling. The study focuses on three aspects: 1) Effects of Context-Aware Prompts, where we investigate the impact of different prompts on document-level translation quality and discourse phenomena; 2) Comparison of Translation Models, where we compare the translation performance of LLMs with commercial...
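A context-aware prompt of the kind examined here can be built by prepending previously translated sentences to the current request. The snippet below is a hypothetical sketch of one such template; the wording and the `build_context_prompt` helper are assumptions, not the paper's exact prompts.

```python
def build_context_prompt(history: list[tuple[str, str]], source: str,
                         src_lang: str = "Chinese", tgt_lang: str = "English") -> str:
    """Assemble a document-level MT prompt that carries discourse context.

    `history` holds (source, translation) pairs of earlier sentences so the
    model can keep pronouns, ellipsis, and terminology consistent.
    """
    lines = [f"Translate the following {src_lang} sentences into {tgt_lang}, "
             "staying consistent with the earlier translations."]
    for i, (src, tgt) in enumerate(history, 1):
        lines.append(f"Sentence {i}: {src}\nTranslation {i}: {tgt}")
    n = len(history) + 1
    lines.append(f"Sentence {n}: {source}\nTranslation {n}:")
    return "\n\n".join(lines)
```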
Baosong Yang, Longyue Wang, Derek F. Wong, Lidia S. Chao, Zhaopeng Tu. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019.
Jie Hao, Xing Wang, Baosong Yang, Longyue Wang, Jinfeng Zhang, Zhaopeng Tu. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019.
This work evaluates GPT-4V's multimodal capability for medical image analysis, focusing on three representative tasks: radiology report generation, medical visual question answering, and visual grounding. For the evaluation, a set of prompts is designed for each task to induce the corresponding GPT-4V model to produce sufficiently good outputs. Three evaluation ways, including quantitative analysis, human evaluation, and case study, are employed to achieve an in-depth and extensive evaluation. Our evaluation shows that GPT-4V excels in understanding medical images and can generate...
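Designing one prompt per task, as described, can be organized as a small template table. The mapping below is a hypothetical illustration of that setup; the study's actual prompts are not reproduced here.

```python
# Hypothetical per-task prompt templates for a multimodal medical evaluation;
# task names, wording, and placeholders are invented for the example.
TASK_PROMPTS = {
    "report_generation": (
        "You are a radiologist. Write the findings and impression for the "
        "attached chest X-ray image."
    ),
    "vqa": (
        "Answer the question about the attached medical image as briefly as "
        "possible.\nQuestion: {question}"
    ),
    "grounding": (
        "Locate the region described below in the attached image and return "
        "a bounding box as [x1, y1, x2, y2].\nDescription: {phrase}"
    ),
}

prompt = TASK_PROMPTS["vqa"].format(question="Is there pleural effusion?")
```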
Xing Wang, Zhaopeng Tu, Longyue Wang, Shuming Shi. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2019.
Machine Translation (MT) has greatly advanced over the years due to developments in deep neural networks. However, the emergence of Large Language Models (LLMs) like GPT-4 and ChatGPT is introducing a new phase in the MT domain. In this context, we believe that the future of MT is intricately tied to the capabilities of LLMs. These models not only offer vast linguistic understanding but also bring innovative methodologies, such as prompt-based techniques, that have the potential to further elevate MT. In this paper, we provide an overview...
Although instruction-tuned large language models (LLMs) have exhibited remarkable capabilities across various NLP tasks, their effectiveness on other data modalities beyond text has not been fully studied. In this work, we propose Macaw-LLM, a novel multi-modal LLM that seamlessly integrates visual, audio, and textual information. Macaw-LLM consists of three main components: a modality module for encoding multi-modal data, a cognitive module for harnessing pretrained LLMs, and an alignment module for harmonizing diverse...
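The alignment idea, projecting non-text encoder outputs into the LLM's token-embedding space so all modalities share one input sequence, can be sketched as below. This is a simplified illustration under assumed dimensions and an assumed attention-pooling design, not the Macaw-LLM code.

```python
import torch
import torch.nn as nn

class ModalityAligner(nn.Module):
    """Sketch: map image/audio encoder features into the LLM embedding space."""

    def __init__(self, feat_dim: int, llm_dim: int, num_tokens: int = 8):
        super().__init__()
        # Compress a variable-length feature sequence into a fixed number of
        # "soft tokens" via learned queries, then project to the LLM width.
        self.queries = nn.Parameter(torch.randn(num_tokens, feat_dim))
        self.attn = nn.MultiheadAttention(feat_dim, num_heads=8, batch_first=True)
        self.proj = nn.Linear(feat_dim, llm_dim)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, seq, feat_dim) from a frozen image/audio encoder
        q = self.queries.unsqueeze(0).expand(feats.size(0), -1, -1)
        aligned, _ = self.attn(q, feats, feats)
        return self.proj(aligned)  # (batch, num_tokens, llm_dim)

# The aligned tokens can then be concatenated with text token embeddings
# and fed to the pretrained LLM as one multimodal sequence.
aligner = ModalityAligner(feat_dim=512, llm_dim=4096)
```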
This paper presents a comprehensive evaluation of GPT-4V’s capabilities across diverse medical imaging tasks, including Radiology Report Generation, Medical Visual Question Answering (VQA), and Grounding. While prior efforts have explored GPT-4V’s performance in medical imaging, to the best of our knowledge, our study represents the first quantitative evaluation on publicly available benchmarks. Our findings highlight GPT-4V’s potential in generating descriptive reports for chest X-ray images, particularly when guided by...
Shilin He, Zhaopeng Tu, Xing Wang, Longyue Wang, Michael Lyu, Shuming Shi. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2019.
Knowledge distillation (KD) is essential for training non-autoregressive translation (NAT) models by reducing the complexity of the raw data with an autoregressive teacher model. In this study, we empirically show that as a side effect of this training, lexical choice errors on low-frequency words are propagated to the NAT model from the teacher model. To alleviate this problem, we propose to expose the raw data to NAT models to restore the useful information of low-frequency words, which is missed in the distilled data. To this end, we introduce an extra Kullback-Leibler divergence term derived by comparing...
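One way such an extra KL term could look is a penalty, added to the standard NAT loss on distilled data, that pulls the model's lexical predictions toward a raw-data distribution at low-frequency positions. The loss below is a hedged sketch of that general idea, not the paper's exact formulation; the `raw_dist` prior and the frequency mask are assumptions.

```python
import torch
import torch.nn.functional as F

def nat_loss_with_lexical_prior(logits, distilled_tgt, raw_dist, low_freq_mask,
                                alpha: float = 0.5):
    """Sketch: NAT cross-entropy on distilled data + KL toward a raw-data prior.

    logits:        (batch, len, vocab) NAT model outputs
    distilled_tgt: (batch, len) target ids from the teacher's distilled data
    raw_dist:      (batch, len, vocab) lexical distribution estimated from
                   the raw (pre-distillation) parallel data
    low_freq_mask: (batch, len) 1.0 where the target word is low-frequency
    """
    ce = F.cross_entropy(logits.transpose(1, 2), distilled_tgt, reduction="none")
    log_p = F.log_softmax(logits, dim=-1)
    # KL(raw || model), applied only at low-frequency positions
    kl = F.kl_div(log_p, raw_dist, reduction="none").sum(-1)
    return (ce + alpha * low_freq_mask * kl).mean()
```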
Position encoding (PE), an essential part of self-attention networks (SANs), is used to preserve the word order information for natural language processing tasks, generating fixed position indices for input sequences. However, in cross-lingual scenarios, e.g., machine translation, the PEs of source and target sentences are modeled independently. Due to word order divergences in different languages, modeling the cross-lingual positional relationships might help SANs tackle this problem. In this paper, we augment SANs with cross-lingual position representations to model...
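Concretely, one can add a second embedding table indexed by estimated target-side positions alongside the usual source-side indices. The sketch below illustrates this under the assumption that some reordering model has already produced the cross-lingual indices; it is not the paper's implementation.

```python
import torch
import torch.nn as nn

class CrossLingualPositionEmbedding(nn.Module):
    """Sketch: combine monolingual and cross-lingual position embeddings."""

    def __init__(self, max_len: int, d_model: int):
        super().__init__()
        self.mono_pe = nn.Embedding(max_len, d_model)   # usual source order
        self.cross_pe = nn.Embedding(max_len, d_model)  # estimated target order

    def forward(self, token_emb: torch.Tensor, cross_pos: torch.Tensor) -> torch.Tensor:
        # token_emb: (batch, len, d_model); cross_pos: (batch, len) indices
        # giving each source word's predicted position on the target side.
        batch, length, _ = token_emb.shape
        mono_pos = torch.arange(length, device=token_emb.device).expand(batch, length)
        return token_emb + self.mono_pe(mono_pos) + self.cross_pe(cross_pos)
```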
Liang Ding, Longyue Wang, Xuebo Liu, Derek F. Wong, Dacheng Tao, Zhaopeng Tu. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2021.
Virtual film production requires intricate decision-making processes, including scriptwriting, virtual cinematography, and precise actor positioning and actions. Motivated by recent advances in automated film production with language agent-based societies, this paper introduces FilmAgent, a novel LLM-based multi-agent collaborative framework for end-to-end film automation in our constructed 3D virtual spaces. FilmAgent simulates various crew roles, including directors, screenwriters, actors, and cinematographers, and covers key stages of...
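A minimal way to picture such a crew-of-agents setup is a loop in which role-conditioned LLM calls hand drafts to one another. The sketch below is purely illustrative and not FilmAgent's design; `chat` stands in for any real LLM API and the role prompts are invented for the example.

```python
# Purely illustrative role-based multi-agent collaboration sketch.
ROLES = {
    "screenwriter": "You write a short scene script for the given idea.",
    "director": "You review the script and give concrete revision notes.",
    "cinematographer": "You annotate each script line with camera setups.",
}

def chat(system: str, user: str) -> str:
    # Placeholder for an actual LLM client; echoes inputs so the sketch runs.
    return f"[response to: {user[:40]}...]"

def produce_scene(idea: str, rounds: int = 2) -> str:
    script = chat(ROLES["screenwriter"], idea)
    for _ in range(rounds):  # iterative critique-and-revise loop
        notes = chat(ROLES["director"], script)
        script = chat(ROLES["screenwriter"], f"Revise per notes:\n{notes}\n\n{script}")
    return chat(ROLES["cinematographer"], script)

print(produce_scene("Two astronauts argue over the last coffee pod."))
```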
Recent advancements in Multimodal Large Language Models (MLLMs) underscore the significance of scalable models and data to boost performance, yet this often incurs substantial computational costs. Although the Mixture of Experts (MoE) architecture has been employed to scale large language or visual-language models efficiently, these efforts typically involve fewer experts and limited modalities. To address this, our work presents a pioneering attempt to develop a unified MLLM with a MoE architecture, named Uni-MoE, that...
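The core MoE building block, a router that sends each token to its top-k experts and mixes their outputs, can be sketched as follows. The expert count, k, and dimensions are illustrative, and this simplified version computes every expert densely and omits the load-balancing losses real systems use; it is not Uni-MoE's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Sketch of a sparsely-gated MoE feed-forward layer with top-k routing."""

    def __init__(self, d_model: int, d_ff: int, num_experts: int = 4, k: int = 2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, len, d_model)
        gates = F.softmax(self.router(x), dim=-1)        # (b, l, num_experts)
        topv, topi = gates.topk(self.k, dim=-1)          # keep k experts per token
        topv = topv / topv.sum(-1, keepdim=True)         # renormalize gate weights
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):        # naive dense loop
            weight = (topv * (topi == e)).sum(-1, keepdim=True)  # (b, l, 1)
            out = out + weight * expert(x)
        return out
```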
Despite being pretrained on multilingual corpora, large language models (LLMs) exhibit suboptimal performance on low-resource languages. Recent approaches have leveraged multilingual encoders alongside LLMs by introducing trainable parameters connecting the two models. However, these methods typically focus on the encoder's output, overlooking valuable information from other layers. We propose \aname (\mname), a framework that integrates representations from all encoder layers, coupled with the \attaname mechanism to...
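Since the method's name and mechanism appear here only as macros, the sketch below shows a generic layer-wise fusion: a learned softmax weighting over all encoder layers followed by a projection into the LLM's embedding width. All names and dimensions are assumptions for illustration, not the paper's mechanism.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LayerwiseFusion(nn.Module):
    """Sketch: fuse hidden states from every encoder layer, not just the last."""

    def __init__(self, num_layers: int, enc_dim: int, llm_dim: int):
        super().__init__()
        self.layer_weights = nn.Parameter(torch.zeros(num_layers))
        self.proj = nn.Linear(enc_dim, llm_dim)

    def forward(self, all_hidden: torch.Tensor) -> torch.Tensor:
        # all_hidden: (num_layers, batch, len, enc_dim) from the encoder
        w = F.softmax(self.layer_weights, dim=0).view(-1, 1, 1, 1)
        fused = (w * all_hidden).sum(0)      # learned weighted mix of all layers
        return self.proj(fused)              # (batch, len, llm_dim) for the LLM
```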
Multi-aspect controllable text generation aims to control generated text in attributes from multiple aspects, making it a complex but powerful task in natural language processing. Supervised fine-tuning methods are often employed for this task due to their simplicity and effectiveness. However, they still have some limitations: low rank adaptation (LoRA) only fine-tunes a few parameters and has suboptimal control effects, while full fine-tuning (FFT) requires significant computational resources and is susceptible to overfitting, particularly when...
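For reference, the LoRA adaptation contrasted with FFT above replaces a dense weight update with a low-rank product, so only two small matrices are trained. Below is a minimal generic sketch of such a layer; the rank and scaling defaults are illustrative and the code is not tied to the paper's method.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Sketch: frozen base linear layer plus a trainable low-rank update A @ B."""

    def __init__(self, in_dim: int, out_dim: int, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = nn.Linear(in_dim, out_dim)
        self.base.weight.requires_grad_(False)   # base weights stay frozen
        self.base.bias.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(in_dim, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(rank, out_dim))  # zero init: no-op at start
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A @ self.B)
```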