- Topic Modeling
- Natural Language Processing Techniques
- Multimodal Machine Learning Applications
- Semantic Web and Ontologies
- Advanced Text Analysis Techniques
- Text Readability and Simplification
- Software Engineering Research
- Advanced Graph Neural Networks
- Data Quality and Management
- Recommender Systems and Techniques
- Scientific Computing and Data Management
- Domain Adaptation and Few-Shot Learning
- Computational and Text Analysis Methods
- Distributed and Parallel Computing Systems
- Cloud Computing and Resource Management
- Speech Recognition and Synthesis
- Biomedical Text Mining and Ontologies
- Text and Document Classification Technologies
- Speech and Dialogue Systems
- Advanced Neural Network Applications
- Advanced Data Storage Technologies
- Video Analysis and Summarization
- Reinforcement Learning in Robotics
- Neuroscience of respiration and sleep
- Emotion and Mood Recognition
Binzhou University
2024
Binzhou Medical University
2024
Tsinghua University
2018-2024
Zhejiang University
2024
Beijing Academy of Artificial Intelligence
2020-2022
Nanchang Institute of Science & Technology
2016
Hebei GEO University
2009
Southeast University
2004-2005
Abstract Pre-trained language representation models (PLMs) cannot well capture factual knowledge from text. In contrast, knowledge embedding (KE) methods can effectively represent the relational facts in knowledge graphs (KGs) with informative entity embeddings, but conventional KE models cannot take full advantage of the abundant textual information. In this paper, we propose a unified model for Knowledge Embedding and Pre-trained LanguagE Representation (KEPLER), which can not only better integrate factual knowledge into PLMs but also produce effective text-enhanced...
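The joint objective described in this abstract can be illustrated roughly as follows, assuming a RoBERTa encoder from HuggingFace Transformers, entity embeddings taken from encoded entity descriptions, and a TransE-style scoring function; all names, shapes, and the negative-sampling scheme are illustrative assumptions, not the released KEPLER implementation.

```python
# Minimal sketch (not the released KEPLER code): add a TransE-style knowledge-
# embedding loss, computed on PLM-encoded entity descriptions, to the usual
# masked-language-modeling loss, sharing one encoder for both objectives.
import torch
import torch.nn.functional as F
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForMaskedLM.from_pretrained("roberta-base")
relation_emb = torch.nn.Embedding(1000, model.config.hidden_size)  # assumed relation count

def encode_entity(descriptions):
    """Embed each entity as the encoder output at the first token of its description."""
    batch = tokenizer(descriptions, padding=True, truncation=True, return_tensors="pt")
    return model.base_model(**batch).last_hidden_state[:, 0]

def ke_loss(head_desc, tail_desc, relation_ids, margin=1.0):
    """TransE-style margin loss; negatives come from shuffling tails within the batch."""
    h, t = encode_entity(head_desc), encode_entity(tail_desc)
    r = relation_emb(relation_ids)
    pos = (h + r - t).norm(p=2, dim=-1)
    neg = (h + r - t[torch.randperm(t.size(0))]).norm(p=2, dim=-1)
    return F.relu(margin + pos - neg).mean()

def joint_loss(mlm_batch, head_desc, tail_desc, relation_ids):
    """KEPLER-style objective: L = L_MLM + L_KE on the shared encoder.
    `mlm_batch` holds masked input_ids, attention_mask, and MLM labels."""
    return model(**mlm_batch).loss + ke_loss(head_desc, tail_desc, relation_ids)
```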
Abstract With the prevalence of pre-trained language models (PLMs) and the pre-training–fine-tuning paradigm, it has been continuously shown that larger models tend to yield better performance. However, as PLMs scale up, fine-tuning and storing all the parameters is prohibitively costly and eventually becomes practically infeasible. This necessitates a new branch of research focusing on the parameter-efficient adaptation of PLMs, which optimizes a small portion of the model parameters while keeping the rest fixed, drastically cutting down...
Abstract As pre-trained language models (PLMs) have become the fundamental infrastructure for various NLP tasks and researchers have readily enjoyed themselves in the pretraining-finetuning paradigm, evidence from emerging research has continuously proven that larger models tend to yield better performance. However, despite the welcome outcome, the process of fine-tuning large-scale PLMs brings prohibitive adaptation costs. In fact, fine-tuning all the parameters of a colossal model and retaining separate instances for different...
Xiaozhi Wang, Xu Han, Zhiyuan Liu, Maosong Sun, Peng Li. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019.
Xiaozhi Wang, Ziqi Wang, Xu Han, Wangyi Jiang, Rong Han, Zhiyuan Liu, Juanzi Li, Peng Li, Yankai Lin, Jie Zhou. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2020.
Xiaozhi Wang, Ziqi Wang, Xu Han, Zhiyuan Liu, Juanzi Li, Peng Li, Maosong Sun, Jie Zhou, Xiang Ren. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2019.
Ziqi Wang, Xiaozhi Wang, Xu Han, Yankai Lin, Lei Hou, Zhiyuan Liu, Peng Li, Juanzi Li, Jie Zhou. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2021.
Pre-trained Language Models (PLMs) have proven to be beneficial for various downstream NLP tasks. Recently, GPT-3, with 175 billion parameters and 570 GB training data, drew a lot of attention due to the capacity of few-shot (even zero-shot) learning. However, applying GPT-3 to address Chinese NLP tasks is still challenging, as the training corpus of GPT-3 is primarily English and the parameters are not publicly available. In this technical report, we release the Chinese Pre-trained Language Model (CPM) with generative pre-training on large-scale Chinese training data. To the best of our knowledge, CPM, with 2.6...
Prompt tuning (PT) is a promising parameter-efficient method to utilize extremely large pre-trained language models (PLMs), which can achieve comparable performance to full-parameter fine-tuning by only tuning a few soft prompts. However, PT requires much more training time than fine-tuning. Intuitively, knowledge transfer can help to improve the efficiency. To explore whether we can improve PT via prompt transfer, we empirically investigate the transferability of soft prompts across different downstream tasks and PLMs in this work. We...
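As a rough illustration of soft-prompt tuning (and of what gets transferred in prompt transfer), the sketch below freezes a RoBERTa backbone and trains only a small prompt matrix prepended to the input embeddings; the model name, prompt length, and the trainable classification head are simplifying assumptions rather than the paper's setup. Transferring a prompt to another task or PLM would amount to initializing `soft_prompt` from one trained elsewhere.

```python
# Minimal prompt-tuning sketch: the PLM is frozen and only the soft prompt
# (plus, for simplicity, the tiny classification head) receives gradients.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=2)
for name, param in model.named_parameters():
    param.requires_grad = name.startswith("classifier.")  # freeze the whole backbone

num_prompt_tokens = 20
soft_prompt = torch.nn.Parameter(
    torch.randn(num_prompt_tokens, model.config.hidden_size) * 0.02)
optimizer = torch.optim.AdamW(
    [soft_prompt] + [p for p in model.parameters() if p.requires_grad], lr=1e-3)

def loss_with_prompt(texts, labels):
    """Prepend the soft prompt to the token embeddings and compute the task loss."""
    batch = tokenizer(texts, padding=True, truncation=True, max_length=128,
                      return_tensors="pt")
    token_embeds = model.get_input_embeddings()(batch["input_ids"])
    prompts = soft_prompt.unsqueeze(0).expand(token_embeds.size(0), -1, -1)
    attention_mask = torch.cat(
        [torch.ones(token_embeds.size(0), num_prompt_tokens, dtype=torch.long),
         batch["attention_mask"]], dim=1)
    return model(inputs_embeds=torch.cat([prompts, token_embeds], dim=1),
                 attention_mask=attention_mask,
                 labels=torch.tensor(labels)).loss
```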
Recently, pre-trained language models mostly follow the pre-train-then-fine-tuning paradigm and have achieved great performance on various downstream tasks. However, since the pre-training stage is typically task-agnostic and the fine-tuning stage usually suffers from insufficient supervised data, the models cannot always well capture domain-specific and task-specific patterns. In this paper, we propose a three-stage framework by adding a task-guided pre-training stage with selective masking between general pre-training and fine-tuning. In this stage, the model is trained...
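A hedged sketch of the selective-masking idea in the task-guided stage: score each token by how much deleting it changes a small task classifier's confidence, then preferentially mask the high-scoring tokens for in-domain masked language modeling. The scoring rule and masking ratio below are illustrative assumptions, not the paper's exact procedure.

```python
# Illustrative selective masking: task-important tokens are masked first.
import torch

def importance_scores(classifier, tokenizer, text, label):
    """Score each token by the confidence drop when it is deleted (assumed heuristic)."""
    tokens = tokenizer.tokenize(text)
    def prob(toks):
        batch = tokenizer(tokenizer.convert_tokens_to_string(toks), return_tensors="pt")
        return classifier(**batch).logits.softmax(-1)[0, label].item()
    base = prob(tokens)
    return [base - prob(tokens[:i] + tokens[i + 1:]) for i in range(len(tokens))]

def selective_mask(tokens, scores, mask_token="[MASK]", ratio=0.15):
    """Mask the top-`ratio` fraction of tokens ranked by task importance."""
    k = max(1, int(len(tokens) * ratio))
    top = set(sorted(range(len(tokens)), key=lambda i: -scores[i])[:k])
    return [mask_token if i in top else tok for i, tok in enumerate(tokens)]
```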
Despite the success, the process of fine-tuning large-scale PLMs brings prohibitive adaptation costs. In fact, fine-tuning all the parameters of a colossal model and retaining separate instances for different tasks are practically infeasible. This necessitates a new branch of research focusing on the parameter-efficient adaptation of PLMs, dubbed delta tuning in this paper. In contrast with standard fine-tuning, delta tuning only fine-tunes a small portion of the model parameters while keeping the rest untouched, largely reducing both the computation and storage costs. Recent studies have...
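One concrete instantiation of the delta-tuning idea is bias-only tuning (in the style of BitFit): freeze every weight matrix and update only the bias vectors plus the task head, so the trainable "delta" is a tiny fraction of the model. The sketch below is illustrative, assuming a HuggingFace BERT classifier, and is not the paper's benchmark code.

```python
# Bias-only tuning as a minimal example of delta tuning.
import torch
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

trainable = []
for name, param in model.named_parameters():
    if name.endswith(".bias") or name.startswith("classifier."):
        param.requires_grad = True   # the small tunable portion (the "delta")
        trainable.append(param)
    else:
        param.requires_grad = False  # the frozen backbone

optimizer = torch.optim.AdamW(trainable, lr=1e-4)
ratio = sum(p.numel() for p in trainable) / sum(p.numel() for p in model.parameters())
print(f"tuning {ratio:.2%} of parameters")  # well under 1% for bias-only tuning
```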
Numerous benchmarks have been established to assess the performance of foundation models on open-ended question answering, which serves as a comprehensive test of a model's ability to understand and generate language in a manner similar to humans. Most of these works focus on proposing new datasets; however, we see two main issues within previous benchmarking pipelines, namely testing leakage and evaluation automation. In this paper, we propose a novel benchmarking framework, Language-Model-as-an-Examiner, where the LM serves as a knowledgeable...
The unprecedented performance of large language models (LLMs) necessitates improvements in evaluations. Rather than merely exploring the breadth of LLM abilities, we believe meticulous and thoughtful designs are essential to thorough, unbiased, and applicable evaluations. Given the importance of world knowledge to LLMs, we construct a Knowledge-oriented LLM Assessment benchmark (KoLA), in which we carefully design three crucial factors: (1) For ability modeling, we mimic human cognition to form a four-level taxonomy of knowledge-related...
Conceptual knowledge is fundamental to human cognition and knowledge bases. However, existing knowledge probing works only focus on evaluating the factual knowledge of pre-trained language models (PLMs) and ignore conceptual knowledge. Since conceptual knowledge often appears as implicit commonsense behind texts, designing probes for it is hard. Inspired by knowledge representation schemata, we comprehensively evaluate the conceptual knowledge of PLMs with three tasks that probe whether PLMs organize entities by conceptual similarities, learn conceptual properties, and conceptualize entities in contexts, respectively. For the tasks, we collect...
Abstract Tokenization is fundamental to pretrained language models (PLMs). Existing tokenization methods for Chinese PLMs typically treat each character as an indivisible token. However, they ignore the unique feature of the Chinese writing system where additional linguistic information exists below the character level, i.e., at the sub-character level. To utilize such information, we propose sub-character (SubChar for short) tokenization. Specifically, we first encode the input text by converting each Chinese character into a short sequence based on its glyph or...
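The pronunciation-based variant of this encoding can be sketched as follows, assuming pypinyin for the character-to-pinyin mapping and the HuggingFace tokenizers library for subword learning; the toy corpus, vocabulary size, and space-separated encoding are illustrative choices, not the paper's exact pipeline.

```python
# Illustrative sub-character tokenization via pronunciation encoding plus BPE.
from pypinyin import lazy_pinyin
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

def encode_pronunciation(text):
    """Convert each character to its pinyin syllable, separated by spaces."""
    return " ".join(lazy_pinyin(text))

corpus = ["我爱自然语言处理", "知识就是力量"]
encoded = [encode_pronunciation(line) for line in corpus]

# Learn a small BPE vocabulary over the pinyin-encoded corpus, so units below
# the character level (shared initials/finals) can be reused across characters.
tok = Tokenizer(models.BPE(unk_token="[UNK]"))
tok.pre_tokenizer = pre_tokenizers.Whitespace()
tok.train_from_iterator(encoded, trainers.BpeTrainer(vocab_size=300, special_tokens=["[UNK]"]))
print(tok.encode(encode_pronunciation("语言模型")).tokens)
```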
Despite the recent emergence of video captioning models, how to generate vivid, fine-grained video descriptions based on background knowledge (i.e., long and informative commentary about domain-specific scenes with appropriate reasoning) is still far from being solved, which however has great applications such as automatic sports narrative. Based on soccer game videos and synchronized commentary data, we present GOAL, a benchmark of over 8.9k video clips, 22k sentences, and 42k knowledge triples for proposing a challenging new task setting...
Why can pre-trained language models (PLMs) learn universal representations and effectively adapt to broad NLP tasks that differ a lot superficially? In this work, we empirically find evidence indicating that the adaptations of PLMs to various few-shot tasks can be reparameterized as optimizing only a few free parameters in a unified low-dimensional intrinsic task subspace, which may help us understand why...
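The reparameterization can be pictured as generating a task's soft prompt from a low-dimensional task vector through a shared projection, so adapting to a new task optimizes only a handful of numbers. The random projection and dimensions below are illustrative assumptions, not the learned decomposition used in the paper.

```python
# Sketch: a soft prompt decoded from a few intrinsic parameters.
import torch

hidden_size, prompt_len, intrinsic_dim = 768, 20, 10

P = torch.randn(intrinsic_dim, prompt_len * hidden_size) / intrinsic_dim ** 0.5
P.requires_grad_(False)                              # shared projection, kept fixed
z = torch.nn.Parameter(torch.zeros(intrinsic_dim))   # the only task-specific parameters

def task_prompt():
    """Decode the low-dimensional task vector into a full soft prompt."""
    return (z @ P).view(prompt_len, hidden_size)

optimizer = torch.optim.Adam([z], lr=1e-2)  # 10 numbers instead of 20 * 768
# task_prompt() would be prepended to the input embeddings exactly as in prompt tuning.
```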
While there is abundant research on evaluating ChatGPT on natural language understanding and generation tasks, few studies have investigated how ChatGPT's behavior changes over time. In this paper, we collect a coarse-to-fine temporal dataset called ChatLog, consisting of two parts that update monthly and daily: ChatLog-Monthly is a dataset of 38,730 question-answer pairs collected every month, including questions from both reasoning and classification tasks. ChatLog-Daily, on the other hand, consists of...
Transformer-based pre-trained language models have demonstrated superior performance on various natural language processing tasks. However, it remains unclear how the skills required to handle these tasks distribute among model parameters. In this paper, we find that after prompt tuning for specific tasks, the activations of some neurons within pre-trained Transformers are highly predictive of the task labels. We dub these neurons skill neurons and confirm that they encode task-specific skills by finding that: (1) Skill neurons are crucial for handling tasks. Performances of pre-trained Transformers on a...
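A rough way to quantify how "predictive" a single neuron's activation is of binary task labels is to threshold it at a baseline value and measure agreement with the labels, as in the sketch below; the mean-activation baseline and the random data are simplifying assumptions, not the paper's exact predictivity measure.

```python
# Toy predictivity check for one neuron's activations against binary labels.
import numpy as np

def neuron_predictivity(activations, labels):
    """activations: per-example values of one neuron; labels: 0/1 array."""
    activations, labels = np.asarray(activations), np.asarray(labels)
    baseline = activations.mean()                 # assumed baseline threshold
    pred = (activations > baseline).astype(int)
    acc = (pred == labels).mean()
    return max(acc, 1.0 - acc)  # a neuron may encode the label in either direction

# Neurons scoring far above 0.5 on held-out data would be candidate skill neurons.
scores = [neuron_predictivity(np.random.randn(100), np.random.randint(0, 2, 100))
          for _ in range(5)]
print(scores)
```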