- Natural Language Processing Techniques
- Topic Modeling
- Multimodal Machine Learning Applications
- Handwritten Text Recognition Techniques
- Speech and Dialogue Systems
- Speech Recognition and Synthesis
- Software Engineering Research
- Text Readability and Simplification
- Explainable Artificial Intelligence (XAI)
- Machine Learning and Data Classification
- Semantic Web and Ontologies
- Data Quality and Management
- Advanced Graph Neural Networks
- Biomedical Text Mining and Ontologies
- Neural Networks and Applications
- Advanced Text Analysis Techniques
- Sentiment Analysis and Opinion Mining
- Software Testing and Debugging Techniques
- Machine Learning in Bioinformatics
- Text and Document Classification Technologies
- EEG and Brain-Computer Interfaces
- Adversarial Robustness in Machine Learning
- Big Data and Digital Economy
- Second Language Acquisition and Learning
- Computational and Text Analysis Methods
Tencent (China)
2017-2024
Beijing Jiaotong University
2023
Bellevue Hospital Center
2023
Harbin Institute of Technology
2011-2019
National Institute of Information and Communications Technology
2016-2019
Shanghai Jiao Tong University
2018
While large language models (LLMs) have demonstrated remarkable capabilities across a range of downstream tasks, a significant concern revolves around their propensity to exhibit hallucinations: LLMs occasionally generate content that diverges from the user input, contradicts previously generated context, or misaligns with established world knowledge. This phenomenon poses a substantial challenge to their reliability in real-world scenarios. In this paper, we survey recent efforts on the detection,...
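The detection side of this survey lends itself to a quick illustration. Below is a minimal self-consistency sketch: sample several answers to the same prompt and flag low agreement as a hallucination signal. The `generate` function is a hypothetical stand-in for any LLM sampling call, and this is only one of many detection strategies a survey like this covers.

```python
# Minimal self-consistency check for hallucination detection: sample several
# answers and flag low agreement. `generate` is a hypothetical stand-in for
# a real LLM sampling call.
from collections import Counter

def generate(prompt: str, seed: int) -> str:
    # Placeholder: replace with a real LLM call (e.g., an API client).
    import random
    random.seed(seed)
    return random.choice(["Paris", "Paris", "Lyon"])

def consistency_score(prompt: str, n_samples: int = 5) -> float:
    answers = [generate(prompt, seed=i) for i in range(n_samples)]
    most_common = Counter(answers).most_common(1)[0][1]
    return most_common / n_samples  # low score -> likely hallucination

print(consistency_score("What is the capital of France?"))
```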
Instance weighting has been widely applied to domain adaptation for phrase-based machine translation. However, it is challenging to apply it to Neural Machine Translation (NMT) directly, because NMT is not a linear model. In this paper, two instance weighting technologies, i.e., sentence weighting and domain weighting with a dynamic weight learning strategy, are proposed for NMT domain adaptation. Empirical results on the IWSLT English-German/French tasks show that the proposed methods can substantially improve NMT performance by up to 2.7-6.7 BLEU points, outperforming existing baselines...
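As a rough illustration of sentence-level instance weighting, the sketch below scales each sentence's cross-entropy by a per-sentence domain weight; the shapes, the source of the weights, and the averaging scheme are assumptions for the example, not the paper's exact formulation.

```python
# Sketch of sentence-level instance weighting for NMT domain adaptation:
# scale each sentence's token-level cross-entropy by a per-sentence weight.
import torch
import torch.nn.functional as F

def weighted_nmt_loss(logits, targets, sent_weights, pad_id=0):
    # logits: (batch, tgt_len, vocab); targets: (batch, tgt_len)
    # sent_weights: (batch,), e.g. higher for in-domain-like sentences
    token_loss = F.cross_entropy(
        logits.transpose(1, 2), targets, ignore_index=pad_id, reduction="none"
    )  # (batch, tgt_len)
    mask = (targets != pad_id).float()
    sent_loss = (token_loss * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1)
    return (sent_weights * sent_loss).mean()

logits = torch.randn(2, 4, 100)
targets = torch.randint(1, 100, (2, 4))
weights = torch.tensor([1.0, 0.3])  # e.g., from a domain classifier
print(weighted_nmt_loss(logits, targets, weights))
```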
The attention mechanism is appealing for neural machine translation, since it is able to dynamically encode a source sentence by generating an alignment between each target word and source words. Unfortunately, it has been proved to be worse than conventional alignment models in alignment accuracy. In this paper, we analyze and explain this issue from the point of view of reordering, and propose a supervised attention mechanism which is learned with guidance from conventional alignment models. Experiments on two Chinese-to-English translation tasks show that the supervised attention mechanism yields...
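A minimal sketch of the supervision idea: add a penalty that pulls the model's attention distribution toward alignments produced by a conventional aligner such as GIZA++. The interpolation weight `lam` and the cross-entropy form of the penalty are illustrative assumptions.

```python
# Sketch of supervised attention: a loss term that pulls the model's
# attention weights toward alignments from an external aligner.
import torch

def supervised_attention_loss(attn, gold_align, eps=1e-8):
    # attn: (batch, tgt_len, src_len) model attention (rows sum to 1)
    # gold_align: same shape, row-normalized alignments from an aligner
    # Cross-entropy between the gold alignment distribution and attention.
    return -(gold_align * (attn + eps).log()).sum(dim=-1).mean()

attn = torch.softmax(torch.randn(2, 3, 5), dim=-1)
gold = torch.zeros(2, 3, 5)
gold[:, torch.arange(3), torch.arange(3)] = 1.0  # toy diagonal alignment
translation_nll = 0.0  # the usual NMT loss would go here
lam = 0.5              # hypothetical interpolation weight
loss = translation_nll + lam * supervised_attention_loss(attn, gold)
print(loss)
```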
Lemao Liu, Masao Utiyama, Andrew Finch, Eiichiro Sumita. Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2016.
Prior research suggests that neural machine translation (NMT) captures word alignment through its attention mechanism; however, this paper finds that attention may almost fail to capture word alignment for some NMT models. This paper thereby proposes two methods to induce word alignment, which are general and agnostic to specific NMT models. Experiments show that both methods induce word alignment much better than attention. The paper further visualizes the word alignment induced by NMT. In particular, it analyzes the effect of alignment errors on translation at the word level; quantitative analysis over many testing examples consistently demonstrates...
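One general way to induce alignment without relying on attention is a prediction-difference style probe. The sketch below, with a toy scorer standing in for a real NMT model, aligns each target word to the source word whose masking most reduces that target word's score; the details differ from the paper's exact methods.

```python
# Sketch of inducing word alignment by prediction difference: mask each
# source word in turn and measure the drop in each target word's log-prob.
import torch

class ToyNMT(torch.nn.Module):
    # Stand-in scorer: per-token log-probs of the target given source ids.
    def __init__(self, vocab=100, dim=16):
        super().__init__()
        self.emb = torch.nn.Embedding(vocab, dim)
    def forward(self, src_ids, tgt_ids):
        src = self.emb(src_ids).mean(dim=0)          # (dim,)
        tgt = self.emb(tgt_ids)                      # (tgt_len, dim)
        return torch.log_softmax(tgt @ src, dim=-1)  # toy per-token scores

@torch.no_grad()
def induce_alignment(model, src_ids, tgt_ids):
    base = model(src_ids, tgt_ids)                 # (tgt_len,)
    drops = []
    for j in range(src_ids.size(0)):
        masked = src_ids.clone()
        masked[j] = 0                              # mask the j-th source word
        drops.append(base - model(masked, tgt_ids))
    drops = torch.stack(drops, dim=1)              # (tgt_len, src_len)
    return drops.argmax(dim=1)                     # best source word per target

model = ToyNMT()
print(induce_alignment(model, torch.randint(1, 100, (5,)),
                       torch.randint(1, 100, (4,))))
```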
Recently, retrieval-augmented text generation has attracted increasing attention in the computational linguistics community. Compared with conventional generation models, it has remarkable advantages and has particularly achieved state-of-the-art performance in many NLP tasks. This paper aims to conduct a survey of retrieval-augmented text generation. It first highlights the generic paradigm of retrieval-augmented generation, then reviews notable approaches according to different tasks, including dialogue response generation, machine translation, and other tasks. Finally, it points out...
Recently, retrieval-augmented text generation has achieved state-of-the-art performance in many NLP tasks and has attracted increasing attention in the IR community; this tutorial thereby aims to present recent advances comprehensively and comparatively. It first highlights the generic paradigm of retrieval-augmented generation, then reviews notable works for different tasks, including dialogue response generation, machine translation, and other tasks, and finally points out some limitations and shortcomings to facilitate future research.
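To make the generic paradigm concrete, here is a minimal retrieve-then-generate sketch that uses token overlap as the retriever and prompt concatenation as the integration step. Real systems use dense retrievers and learned fusion, so treat this purely as an illustration of the pipeline.

```python
# Minimal retrieval-augmented generation sketch: fetch the nearest memory
# entry by token overlap and prepend it to the input before generation.
def retrieve(query: str, memory: list[str]) -> str:
    def overlap(a: str, b: str) -> int:
        return len(set(a.lower().split()) & set(b.lower().split()))
    return max(memory, key=lambda m: overlap(query, m))

def augmented_prompt(query: str, memory: list[str]) -> str:
    hint = retrieve(query, memory)
    return f"Reference: {hint}\nInput: {query}\nOutput:"

memory = ["the cat sat on the mat => le chat ...", "good morning => bonjour"]
print(augmented_prompt("good morning everyone", memory))
```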
Source dependency information has been successfully introduced into statistical machine translation. However, there are only a few preliminary attempts for Neural Machine Translation (NMT), such as concatenating the representations of a source word and its dependency label together. In this paper, we propose a novel NMT model with source dependency representation to improve the translation performance of NMT, especially on long sentences. Empirical results on the NIST Chinese-to-English task show that our method achieves 1.6 BLEU improvements...
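The concatenation baseline mentioned above is easy to sketch: embed each source word and its dependency label, concatenate, and project back to the encoder width. All dimensions and the projection choice are assumptions for illustration.

```python
# Sketch of a dependency-aware source embedding: concatenate each word's
# embedding with an embedding of its dependency label before encoding.
import torch
import torch.nn as nn

class DepAwareEmbedding(nn.Module):
    def __init__(self, vocab=1000, n_labels=50, w_dim=64, l_dim=16):
        super().__init__()
        self.word = nn.Embedding(vocab, w_dim)
        self.label = nn.Embedding(n_labels, l_dim)
        self.proj = nn.Linear(w_dim + l_dim, w_dim)  # back to encoder size
    def forward(self, word_ids, label_ids):
        x = torch.cat([self.word(word_ids), self.label(label_ids)], dim=-1)
        return self.proj(x)

emb = DepAwareEmbedding()
out = emb(torch.randint(0, 1000, (2, 7)), torch.randint(0, 50, (2, 7)))
print(out.shape)  # (2, 7, 64), ready for the NMT encoder
```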
In statistical machine translation, translation prediction considers not only the aligned source word itself but also its contextual information. Learning context representation is a promising method for improving translation results, particularly through neural networks. Most of the existing methods process context words sequentially and neglect long-distance dependencies. In this paper, we propose a novel neural approach to dependency-based translation prediction. The proposed model is capable of encoding long-distance dependencies and capturing functional...
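A minimal sketch of the idea, under assumed shapes: represent the aligned source word together with a pooled embedding of its dependency neighbors (head and children), so prediction can see long-distance context that a sequential window would miss. The pooling and scoring layers are illustrative, not the paper's architecture.

```python
# Sketch of dependency-based context for translation prediction: pool the
# embeddings of a word's dependency neighbors and score target candidates.
import torch
import torch.nn as nn

class DepContext(nn.Module):
    def __init__(self, vocab=1000, dim=64, tgt_vocab=1200):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.out = nn.Linear(2 * dim, tgt_vocab)
    def forward(self, word_id, neighbor_ids):
        # neighbor_ids: head + children along the dependency tree
        word = self.emb(word_id)
        ctx = self.emb(neighbor_ids).mean(dim=0)  # pool dependency context
        return self.out(torch.cat([word, ctx], dim=-1))  # target-word scores

m = DepContext()
scores = m(torch.tensor(7), torch.tensor([2, 15, 40]))  # head=2, children=15,40
print(scores.shape)
```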
Neural machine translation (NMT) has been prominent in many machine translation tasks. However, in some domain-specific tasks, only the corpora from similar domains can improve translation performance. If out-of-domain corpora are directly added into the in-domain corpus, the performance may even degrade. Therefore, domain adaptation techniques are essential to solve the NMT domain problem. Most existing methods for domain adaptation are designed for conventional phrase-based machine translation. For NMT domain adaptation, there have been a few studies on topics such as fine tuning, domain tags, and domain features. In...
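Of the adaptation routes mentioned, domain tags are the simplest to show: prepend a pseudo-token marking each sentence's domain so a single model can condition on it. The tag format below is a common convention, assumed here for illustration.

```python
# Sketch of domain-tag adaptation: mark each source sentence with a
# pseudo-token so one NMT model can condition on the domain.
def add_domain_tag(src: str, domain: str) -> str:
    return f"<2{domain}> {src}"  # e.g. "<2medical> the patient ..."

mixed_corpus = [("the patient was discharged", "medical"),
                ("the engine needs oil", "auto")]
tagged = [add_domain_tag(s, d) for s, d in mixed_corpus]
print(tagged[0])
```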
Qiuxiang He, Guoping Huang, Qu Cui, Li Li, Lemao Liu. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2021.
While LLMs have exhibited strong performance on various NLP tasks, it is noteworthy that most of these tasks rely on utilizing the vast amount of knowledge encoded in LLMs' parameters, rather than solving new problems without prior knowledge. In cognitive research, the latter ability is referred to as fluid intelligence, which is considered to be critical for assessing human intelligence. Recent research on fluid intelligence assessments has highlighted significant deficiencies in LLMs' abilities. In this paper, we analyze...
Recurrent neural networks, particularly the long short-term memory (LSTM) networks, are extremely appealing for sequence-to-sequence learning tasks. Despite their great success, they typically suffer from a fundamental shortcoming: they are prone to generate unbalanced targets with good prefixes but bad suffixes, and thus performance suffers when dealing with long sequences. We propose a simple yet effective approach to overcome this shortcoming. Our approach relies on the agreement between a pair of target-directional LSTMs, which generates more...
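A sketch of the agreement idea at rescoring time: combine a left-to-right score with a right-to-left score of the reversed hypothesis, and keep the candidate both directions like. The scorers and the interpolation weight are hypothetical placeholders; the paper integrates agreement more deeply than simple rescoring.

```python
# Sketch of target-bidirectional agreement: rescore left-to-right n-best
# hypotheses with a right-to-left scorer over the reversed sequence.
def agree_rescore(nbest, score_l2r, score_r2l, alpha=0.5):
    # nbest: list of candidate target sentences (token lists)
    def joint(hyp):
        return alpha * score_l2r(hyp) + (1 - alpha) * score_r2l(hyp[::-1])
    return max(nbest, key=joint)

# Toy scorers: one mildly prefers short candidates, the other length 3.
best = agree_rescore(
    [["a", "b", "c"], ["a", "b"]],
    score_l2r=lambda h: -0.1 * len(h),
    score_r2l=lambda h: -abs(len(h) - 3),
)
print(best)
```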
A translation memory (TM) has been proved to be helpful for improving neural machine translation (NMT). Existing approaches either pursue decoding efficiency by merely accessing local information in a TM, or encode the global information in a TM yet sacrifice efficiency due to redundancy. We propose an efficient approach to making use of the global information in a TM. The key idea is to pack a redundant TM into a compact graph and perform additional attention mechanisms over the packed graph for integrating the TM representation into the decoding network. We implement the model by extending the state-of-the-art NMT model, Transformer...
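The packing intuition in a few lines, heavily simplified: collapse repeated tokens across retrieved TM targets into unique nodes, then run one extra attention pass over the packed nodes. The real model builds a graph inside a Transformer decoder; the node construction and vectors here are toy assumptions.

```python
# Sketch of packing a redundant TM into a compact structure and attending
# over it: redundant tokens across TM matches collapse to single nodes.
import torch

def pack_tm(tm_sentences):
    nodes, seen = [], set()
    for sent in tm_sentences:
        for tok in sent:
            if tok not in seen:      # redundant tokens collapse to one node
                seen.add(tok)
                nodes.append(tok)
    return nodes

def attend(query, node_vecs):
    # query: (dim,), node_vecs: (n_nodes, dim) -> TM context vector
    weights = torch.softmax(node_vecs @ query, dim=0)
    return weights @ node_vecs

tm = [["the", "cat", "sat"], ["the", "cat", "slept"]]
nodes = pack_tm(tm)                  # 4 nodes instead of 6 tokens
vecs = torch.randn(len(nodes), 8)
print(nodes, attend(torch.randn(8), vecs).shape)
```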
Our purely neural network-based system represents a paradigm shift away from the techniques based on phrase-based statistical machine translation we have used in the past. The approach exploits the agreement between a pair of target-bidirectional LSTMs, in order to generate balanced targets with both good suffixes and prefixes. The evaluation results show that the method is able to match and even surpass the current state-of-the-art on most language pairs, but also exposes weaknesses on some tasks, motivating further study. The...
Yanling Xiao, Lemao Liu, Guoping Huang, Qu Cui, Shujian Huang, Shuming Shi, Jiajun Chen. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2022.
Guanlin Li, Lemao Liu, Guoping Huang, Conghui Zhu, Tiejun Zhao. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2019.
Recently many efforts have been devoted to interpreting black-box NMT models, but little progress has been made on metrics to evaluate explanation methods. Word Alignment Error Rate can be used as such a metric that matches human understanding; however, it cannot measure explanation methods on those target words that are not aligned to any source word. This paper thereby makes an initial attempt to evaluate explanation methods from an alternative viewpoint. To this end, it proposes a principled metric based on fidelity in regard to the predictive behavior of the NMT model. As the exact...
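The fidelity intuition can be sketched as follows: keep only the source words an explanation marks as relevant, and check whether the model's prediction is preserved. The toy scorer and the keep-and-mask scheme are assumptions; the paper develops a principled, approximate version of this kind of measure.

```python
# Sketch of a fidelity-style evaluation: mask everything except the words an
# explanation marks as relevant and check prediction agreement.
import torch

class ToyScorer(torch.nn.Module):
    # Stand-in model: target-token scores that depend on the source context.
    def __init__(self, vocab=100, dim=16):
        super().__init__()
        self.emb = torch.nn.Embedding(vocab, dim)
    def forward(self, src_ids, tgt_ids):
        ctx = self.emb(src_ids).mean(dim=0)                  # (dim,)
        return self.emb(tgt_ids) @ (self.emb.weight + ctx).T  # (tgt_len, vocab)

@torch.no_grad()
def fidelity(model, src_ids, tgt_ids, relevant_idx, mask_id=0):
    full_pred = model(src_ids, tgt_ids).argmax(dim=-1)
    masked = torch.full_like(src_ids, mask_id)
    masked[relevant_idx] = src_ids[relevant_idx]  # keep only explained words
    masked_pred = model(masked, tgt_ids).argmax(dim=-1)
    return (full_pred == masked_pred).float().mean().item()

m = ToyScorer()
src = torch.randint(1, 100, (6,))
tgt = torch.randint(1, 100, (4,))
print(fidelity(m, src, tgt, relevant_idx=torch.tensor([0, 2])))
```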
Automatic machine translation is highly efficient at producing translations, yet their quality is not guaranteed. This technical report introduces TranSmart, a practical human-machine interactive translation system that is able to trade off translation quality and efficiency. Compared with existing publicly available interactive translation systems, TranSmart supports three key features: word-level autocompletion, sentence-level autocompletion, and translation memory. By word-level and sentence-level autocompletion, TranSmart allows users to interactively translate words in their own manners rather than in the strict manner from left to right. In...
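Word-level autocompletion reduces, at its simplest, to picking the most probable word consistent with the typed prefix. The sketch below assumes a `word_probs` table standing in for the system's actual predictor, which would condition on the source sentence and the partial translation.

```python
# Sketch of word-level autocompletion: suggest the most probable full word
# consistent with the user's typed character prefix.
def autocomplete(prefix: str, word_probs: dict[str, float]) -> str:
    candidates = {w: p for w, p in word_probs.items() if w.startswith(prefix)}
    if not candidates:
        return prefix  # nothing to suggest; keep the user's input
    return max(candidates, key=candidates.get)

probs = {"translation": 0.4, "transmart": 0.1, "memory": 0.5}
print(autocomplete("tra", probs))  # -> "translation"
```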
In this paper we examine the effectiveness of neural network sequence-to-sequence transduction in the task of transliteration generation. In this year's shared evaluation we submitted two systems into all tasks. The primary system was based on the system used for the NEWS 2012 workshop, but was augmented with an additional feature, which was the generation probability from a neural network. The secondary system used the neural network model on its own, together with a simple beam search algorithm. Our results show that adding the neural network score as a feature to the phrase-based statistical machine transliteration system was able to increase...
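The feature-combination idea can be sketched as a log-linear rescoring of the phrase-based system's n-best list, with the network's generation log-probability as one extra feature. The weights below are illustrative, where a shared-task system would tune them.

```python
# Sketch of n-best rescoring with a neural generation probability as an
# extra feature in a log-linear combination.
import math

def rescore(nbest, nn_logprob, w_smt=1.0, w_nn=0.6):
    # nbest: list of (candidate, smt_score); nn_logprob: candidate -> log p
    scored = [(c, w_smt * s + w_nn * nn_logprob(c)) for c, s in nbest]
    return max(scored, key=lambda x: x[1])[0]

nbest = [("tokyo", -1.2), ("tokio", -1.0)]
nn = {"tokyo": math.log(0.7), "tokio": math.log(0.2)}
print(rescore(nbest, lambda c: nn[c]))  # -> "tokyo"
```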
Xintong Li, Lemao Liu, Zhaopeng Tu, Shuming Shi, Max Meng. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). 2018.
Lianhui Qin, Lemao Liu, Wei Bi, Yan Wang, Xiaojiang Liu, Zhiting Hu, Hai Zhao, Shuming Shi. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). 2018.
Lemao Liu, Haisong Zhang, Haiyun Jiang, Yangming Li, Enbo Zhao, Kun Xu, Linfeng Song, Suncong Zheng, Botong Zhou, Dick Zhu, Xiao Feng, Tao Chen, Tao Yang, Dong Yu, Feng Zhang, ZhanHui Kang, Shuming Shi. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations. 2021.