- Topic Modeling
- Natural Language Processing Techniques
- Multimodal Machine Learning Applications
- Advanced Image and Video Retrieval Techniques
- Domain Adaptation and Few-Shot Learning
- Adversarial Robustness in Machine Learning
- Image Retrieval and Classification Techniques
- Speech and Dialogue Systems
- Anomaly Detection Techniques and Applications
- Data Management and Algorithms
- Biomedical Text Mining and Ontologies
- Chaos Control and Synchronization
- Advanced Clustering Algorithms Research
- Face and Expression Recognition
- Sentiment Analysis and Opinion Mining
- Neural Networks and Reservoir Computing
- Advanced Neural Network Applications
- Machine Learning and ELM
- Nonlinear Dynamics and Pattern Formation
- Human Pose and Action Recognition
- Advanced Statistical Methods and Models
- Random Lasers and Scattering Media
- Bayesian Modeling and Causal Inference
- Advanced Graph Neural Networks
- AI in Service Interactions
Nanyang Technological University
2020-2024
Jinan University
2024
State Key Laboratory of Cryptology
2024
University of Illinois Urbana-Champaign
2023
Microsoft (United States)
2021-2023
Carnegie Mellon University
2021-2023
Nankai University
2021-2022
Southwest University
2021-2022
Civil Aviation Management Institute of China
2022
Microsoft (Finland)
2020-2022
Pretraining large neural language models, such as BERT, has led to impressive gains on many natural language processing (NLP) tasks. However, most pretraining efforts focus on general domain corpora, such as newswire and Web. A prevailing assumption is that even domain-specific pretraining can benefit by starting from general-domain language models. In this article, we challenge this assumption by showing that for domains with abundant unlabeled text, such as biomedicine, pretraining language models from scratch results in substantial gains over continual pretraining of general-domain language models. To facilitate this investigation, we compile...
Jacob Devlin, Hao Cheng, Hao Fang, Saurabh Gupta, Li Deng, Xiaodong He, Geoffrey Zweig, Margaret Mitchell. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers). 2015.
Large neural language models have transformed modern natural language processing (NLP) applications. However, fine-tuning such models for specific tasks remains challenging as model size increases, especially with small labeled datasets, which are common in biomedical NLP. We conduct a systematic study on fine-tuning stability, show that fine-tuning performance may be sensitive to pretraining settings, and present an exploration of techniques for addressing this instability. We show that these techniques can substantially improve fine-tuning for low-resource biomedical NLP. Specifically, freezing...
It is well known that deep neural networks (DNNs) are vulnerable to adversarial attacks, which are implemented by adding crafted perturbations onto benign examples. Min-max robust optimization based adversarial training can provide a notion of security against such attacks. However, adversarial robustness requires a significantly larger capacity of the network than natural training with only benign examples. This paper proposes a framework of concurrent adversarial training and weight pruning that enables model compression while still preserving adversarial robustness, and essentially tackles the dilemma...
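To make the min-max formulation concrete, here is a minimal PGD-style adversarial training sketch in PyTorch. It assumes a generic image classifier; the perturbation budget, step size, and iteration count are illustrative defaults rather than the paper's settings, and the weight-pruning component is omitted.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, epsilon=8/255, alpha=2/255, num_steps=10):
    """Inner maximization of the min-max objective: find a bounded
    perturbation delta that maximizes the classification loss."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(num_steps):
        loss = F.cross_entropy(model(x + delta), y)
        loss.backward()
        with torch.no_grad():
            # Gradient-ascent step, then project back into the L-inf ball.
            delta += alpha * delta.grad.sign()
            delta.clamp_(-epsilon, epsilon)
        delta.grad.zero_()
    return (x + delta).detach()

def adversarial_training_step(model, optimizer, x, y):
    """Outer minimization: train the network on adversarial examples."""
    x_adv = pgd_attack(model, x, y)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```

In the concurrent-pruning setting the abstract describes, the outer step would additionally enforce a sparsity constraint on the weights; that projection is left out here for brevity.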
Generalization and robustness are both key desiderata for designing machine learning methods. Adversarial training can enhance robustness, but past work often finds that it hurts generalization. In natural language processing (NLP), pre-training large neural language models such as BERT has demonstrated impressive gains in generalization on a variety of tasks, with further improvement from adversarial fine-tuning. However, these models are still vulnerable to adversarial attacks. In this paper, we show that adversarial pre-training can improve both generalization and robustness. We...
Task-oriented conversational systems often use dialogue state tracking to represent the user's intentions, which involves filling in values of pre-defined slots. Many approaches have been proposed, often using task-specific architectures with special-purpose classifiers. Recently, good results have been obtained using more general architectures based on pretrained language models. Here, we introduce a new variation of the language modeling approach that uses schema-driven prompting to provide task-aware history encoding that is used for both...
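As a rough illustration of schema-driven prompting for dialogue state tracking, the toy sketch below concatenates the dialogue history with a natural-language description of each slot and leaves value decoding to a pretrained LM. The schema, slot names, and prompt format are invented for illustration and are not taken from the paper.

```python
# Toy schema-driven prompting for dialogue state tracking: the dialogue
# history plus a natural-language slot description form one prompt per slot,
# and a language model would decode the slot value as free text.
# The schema below is hypothetical, not from the paper.
SCHEMA = {
    "restaurant-area": "area of the city where the restaurant is located",
    "restaurant-food": "type of cuisine the user wants to eat",
}

def build_prompt(history: list[str], slot: str) -> str:
    dialogue = "\n".join(history)
    return f"{dialogue}\n[slot] {slot}: {SCHEMA[slot]}\n[value]"

history = [
    "user: I'd like to find a cheap Italian place.",
    "system: Sure, which part of town?",
    "user: Somewhere in the centre, please.",
]
for slot in SCHEMA:
    print(build_prompt(history, slot))
    # Each prompt would be fed to a pretrained LM, which generates e.g.
    # "centre" for restaurant-area and "italian" for restaurant-food.
```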
We review the EfficientQA competition from NeurIPS 2020. The competition focused on open-domain question answering (QA), where systems take natural language questions as input and return natural language answers. The aim of the competition was to build systems that can predict correct answers while also satisfying strict on-disk memory budgets. These budgets were designed to encourage contestants to explore the trade-off between storing retrieval corpora or the parameters of learned models. In this report, we describe the motivation and organization of the competition, the best...
Few-shot classification aims to learn a discriminative feature representation to recognize unseen classes with few labeled support samples. While most few-shot learning methods focus on exploiting the spatial information of image samples, the frequency representation has also been proven essential in classification tasks. In this paper, we investigate the effect of different frequency components. To enhance the performance and generalizability of few-shot learning methods, we propose a novel Frequency-Guided Few-shot Learning framework (dubbed FGFL), which leverages task-specific...
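One plausible reading of "frequency components" here is a low/high-band split of the image spectrum. The sketch below performs such a split with a circular mask in the 2D Fourier domain; the cutoff radius is an arbitrary assumption, and FGFL's actual task-specific masking is not reproduced.

```python
import numpy as np

def split_frequency(image: np.ndarray, radius: int = 8):
    """Split a grayscale image into low- and high-frequency parts
    using a circular mask in the centered 2D Fourier spectrum."""
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    h, w = image.shape
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((yy - h / 2) ** 2 + (xx - w / 2) ** 2)
    low_mask = dist <= radius
    low = np.fft.ifft2(np.fft.ifftshift(spectrum * low_mask)).real
    high = np.fft.ifft2(np.fft.ifftshift(spectrum * ~low_mask)).real
    return low, high

image = np.random.rand(64, 64)  # stand-in for a support-set image
low, high = split_frequency(image)
assert np.allclose(low + high, image)  # the two bands sum back to the input
```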
Existing large language models (LLMs) can only afford fixed-size inputs due to the input length limit, preventing them from utilizing rich long-context information from past inputs. To address this, we propose a framework, Language Models Augmented with Long-Term Memory (LongMem), which enables LLMs to memorize long history. We design a novel decoupled network architecture with the original backbone LLM frozen as a memory encoder and an adaptive residual side-network as a memory retriever and reader. Such a decoupled memory design can easily cache and update...
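A heavily simplified sketch of the cached-memory idea: hidden states produced by a frozen encoder are stored, and the entries most similar to the current query are retrieved for the reader. The class below is an assumption-laden toy, not the paper's actual side-network or fusion mechanism.

```python
import numpy as np

class LongTermMemory:
    """Toy cache-and-retrieve memory in the spirit of LongMem."""

    def __init__(self, dim: int):
        self.keys = np.empty((0, dim))
        self.values = np.empty((0, dim))

    def cache(self, keys: np.ndarray, values: np.ndarray):
        # Append key/value states produced by the frozen memory encoder.
        self.keys = np.vstack([self.keys, keys])
        self.values = np.vstack([self.values, values])

    def retrieve(self, query: np.ndarray, k: int = 4) -> np.ndarray:
        # Dot-product similarity against all cached keys, top-k lookup.
        scores = self.keys @ query
        top = np.argsort(scores)[-k:]
        return self.values[top]

memory = LongTermMemory(dim=16)
memory.cache(np.random.randn(128, 16), np.random.randn(128, 16))
retrieved = memory.retrieve(np.random.randn(16))
print(retrieved.shape)  # (4, 16): memory entries handed to the reader
```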
Learning a generalizable feature representation is critical to few-shot image classification. While recent works exploited task-specific feature embedding using meta-tasks for few-shot learning, they are limited in many challenging tasks as they are distracted by excursive features such as the background, domain, and style of the image samples. In this work, we propose a novel disentangled feature representation framework, dubbed DFR, for few-shot learning applications. DFR can adaptively decouple the discriminative features that are modeled by the classification branch, from...
Hao Fang, Hao Cheng, Maarten Sap, Elizabeth Clark, Ari Holtzman, Yejin Choi, Noah A. Smith, Mari Ostendorf. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations. 2018.
Recent advances in Graph Neural Networks (GNNs) have achieved superior results on many challenging tasks, such as few-shot learning. Despite its capacity to learn and generalize a model from only a few annotated samples, GNN is limited in scalability, as deep GNN models usually suffer from severe over-fitting and over-smoothing. In this work, we propose a novel GNN framework with a triple-attention mechanism,...
Most of today's AI systems focus on using self-attention mechanisms and transformer architectures on large amounts of diverse data to achieve impressive performance gains. In this paper, we propose to augment the transformer architecture with an external attention mechanism to bring external knowledge and context to bear. By integrating external information into the prediction process, we hope to reduce the need for ever-larger models and increase the democratization of AI systems. We find that the proposed external attention mechanism can significantly improve the performance of existing AI systems, allowing...
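In spirit, external attention lets the model attend over its own context plus encoded outside knowledge. A minimal sketch, assuming plain dot-product attention and pre-encoded knowledge vectors (both hypothetical simplifications):

```python
import numpy as np

def external_attention(queries, keys, values, ext_keys, ext_values):
    """Attend jointly over the input's own key/value states and an
    external pool of knowledge key/value states (e.g. encoded KB text)."""
    K = np.vstack([keys, ext_keys])        # (n + m, d)
    V = np.vstack([values, ext_values])
    scores = queries @ K.T / np.sqrt(K.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V

d = 8
out = external_attention(
    np.random.randn(4, d),                              # 4 query tokens
    np.random.randn(4, d), np.random.randn(4, d),       # self context
    np.random.randn(16, d), np.random.randn(16, d),     # external knowledge
)
print(out.shape)  # (4, 8): each token mixes self and external information
```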
Data clustering is a difficult problem due to the complex and heterogeneous natures of multidimensional data. To improve the clustering accuracy, we propose a scheme to capture the local correlation structures: associate each cluster with an independent weighting vector and embed it in the subspace spanned by an adaptive combination of the dimensions. Our algorithm takes advantage of known pairwise instance-level constraints. The data points in the constraint set are divided into groups through inference; each group is assigned to a feasible cluster which...
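A loose sketch of the locally weighted idea: each cluster carries its own dimension-weighting vector, so point-to-cluster distance is computed in that cluster's weighted subspace. The constraint-group inference described above is omitted, and all numbers are illustrative.

```python
import numpy as np

def weighted_distance(x: np.ndarray, center: np.ndarray, w: np.ndarray) -> float:
    """Distance of point x to a cluster center under that cluster's own
    dimension weights, so each cluster lives in its own weighted subspace."""
    return float(np.sum(w * (x - center) ** 2))

# Two clusters, each with an independent weighting vector over 3 dimensions.
centers = np.array([[0.0, 0.0, 0.0], [5.0, 5.0, 5.0]])
weights = np.array([[0.8, 0.1, 0.1],   # cluster 0 mostly uses dimension 0
                    [0.1, 0.1, 0.8]])  # cluster 1 mostly uses dimension 2

x = np.array([0.5, 4.0, 0.2])
assignment = int(np.argmin([
    weighted_distance(x, c, w) for c, w in zip(centers, weights)
]))
print(assignment)  # the cluster whose weighted subspace best fits x
```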
Out-of-vocabulary name errors in speech recognition create significant problems for downstream language processing, but the fact that they are rare poses challenges for automatic detection, particularly in an open-domain scenario. To address this problem, a multi-task recurrent neural network model for sentence-level name detection is proposed for use in combination with out-of-vocabulary word detection. The model is also effective at leveraging external text data. Experiments show a 26% improvement in name-error F-score over a system...
We develop a novel bi-directional attention model for dependency parsing, which learns to agree on headword predictions from the forward and backward parsing directions. The parsing procedure for each direction is formulated as sequentially querying a memory component that stores continuous headword embeddings. The proposed parser makes use of soft headword embeddings, allowing the model to implicitly capture high-order parsing history without dramatically increasing computational complexity. We conduct experiments on English, Chinese, and 12 other...
Hao Cheng, Yelong Shen, Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2021.
Although deep neural networks (DNNs) have achieved great success in various computer vision tasks, it has recently been found that they are vulnerable to adversarial attacks. In this paper, we focus on the so-called backdoor attack, which injects a backdoor trigger into a small portion of the training data (also known as data poisoning) such that the trained DNN induces misclassification when facing examples with this trigger. To be specific, we carefully study the effect of both real and synthetic backdoor attacks on the internal response...
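To illustrate the poisoning setup described above, the toy sketch below stamps a small trigger patch on a random fraction of training images and flips their labels to an attacker-chosen target. Patch size, placement, and poisoning rate are illustrative assumptions, not settings from the paper.

```python
import numpy as np

def poison(images: np.ndarray, labels: np.ndarray,
           target_label: int, rate: float = 0.05, seed: int = 0):
    """Toy backdoor poisoning: stamp a small white patch in the corner of a
    random fraction of training images and relabel them to the target."""
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    n_poison = int(rate * len(images))
    idx = rng.choice(len(images), size=n_poison, replace=False)
    images[idx, -3:, -3:] = 1.0   # 3x3 trigger in the bottom-right corner
    labels[idx] = target_label    # mislabel so the DNN learns the shortcut
    return images, labels, idx

clean = np.random.rand(100, 28, 28)
y = np.random.randint(0, 10, size=100)
poisoned, y_poisoned, idx = poison(clean, y, target_label=7)
print(len(idx), "of", len(clean), "examples carry the trigger")
```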
Xiaodong Liu, Yu Wang, Jianshu Ji, Hao Cheng, Xueyun Zhu, Emmanuel Awa, Pengcheng He, Weizhu Chen, Hoifung Poon, Guihong Cao, Jianfeng Gao. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations. 2020.
Hao Cheng, Xiaodong Liu, Lis Pereira, Yaoliang Yu, Jianfeng Gao. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2021.
The retrieval model is an indispensable component for real-world knowledge-intensive tasks, e.g., open-domain question answering (ODQA). As separate retrieval skills are annotated in different datasets, recent work focuses on customized methods, limiting model transferability and scalability. In this work, we propose a modular retriever where individual modules correspond to key skills that can be reused across datasets. Our approach supports flexible skill configurations based on the target domain to boost performance...
Extracting patient information from unstructured text is a critical task in health decision-support and clinical research. Large language models (LLMs) have shown the potential to accelerate clinical curation via few-shot in-context learning, in contrast to supervised learning, which requires much more costly human annotations. However, despite drastic advances in modern LLMs such as GPT-4, they still struggle with issues regarding accuracy and interpretability, especially in mission-critical domains such as health. Here, we...
Hao Cheng, Hao Fang, Mari Ostendorf. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019.