- Topic Modeling
- Natural Language Processing Techniques
- Advanced Text Analysis Techniques
- Speech and Dialogue Systems
- Multimodal Machine Learning Applications
- Text and Document Classification Technologies
- Speech Recognition and Synthesis
- Web Data Mining and Analysis
- Semantic Web and Ontologies
- Advanced Graph Neural Networks
- Sentiment Analysis and Opinion Mining
- Biomedical Text Mining and Ontologies
- Text Readability and Simplification
- Information Retrieval and Search Behavior
- Advanced Computational Techniques and Applications
- Data Quality and Management
- Software Engineering Research
- Recommender Systems and Techniques
- Artificial Intelligence in Healthcare and Education
- Language, Metaphor, and Cognition
- Misinformation and Its Impacts
- Image and Signal Denoising Methods
- Handwritten Text Recognition Techniques
- Complex Network Analysis Techniques
- Expert Finding and Q&A Systems
Harbin Institute of Technology, 2014-2024
Shanghai Jiao Tong University, 2023
South China Normal University, 2021
Microsoft (United States), 2021
University of California, Santa Barbara, 2021
Xi'an University of Architecture and Technology, 2021
Dalian Polytechnic University, 2020
University of Toronto, 2015
University at Albany, State University of New York, 2013
Stanford University, 2012
Jiang Guo, Wanxiang Che, David Yarowsky, Haifeng Wang, Ting Liu. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2015.
The emergence of large language models (LLMs) has marked a significant breakthrough in natural language processing (NLP), fueling a paradigm shift in information acquisition. Nevertheless, LLMs are prone to hallucination, generating plausible yet nonfactual content. This phenomenon raises concerns over the reliability of LLMs in real-world information retrieval (IR) systems and has attracted intensive research to detect and mitigate such hallucinations. Given the open-ended, general-purpose attributes inherent to LLMs, LLM hallucinations present...
In this paper, we study the problem of data augmentation for language understanding in task-oriented dialogue systems. In contrast to previous work, which augments an utterance without considering its relation with other utterances, we propose a sequence-to-sequence generation based framework that leverages one utterance's same semantic alternatives in the training data. A novel diversity rank is incorporated into the utterance representation to make the model produce diverse utterances, and these diversely augmented utterances help...
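The idea of favoring diverse alternatives can be illustrated with a toy greedy selection, scoring candidates by token-level Jaccard distance. This is only a minimal sketch of diversity-aware candidate selection; the function names are illustrative and this is not the paper's seq2seq model, where the diversity rank is part of the learned representation.

```python
def jaccard_distance(a, b):
    """Token-level Jaccard distance between two utterances."""
    sa, sb = set(a.split()), set(b.split())
    return 1.0 - len(sa & sb) / len(sa | sb)

def diversity_rank(utterance, alternatives, k=2):
    """Greedily pick k alternatives, each time choosing the candidate
    farthest (on average) from the source utterance and the picks so far."""
    picked, pool = [], list(alternatives)
    while pool and len(picked) < k:
        def score(cand):
            refs = [utterance] + picked
            return sum(jaccard_distance(cand, r) for r in refs) / len(refs)
        best = max(pool, key=score)
        picked.append(best)
        pool.remove(best)
    return picked
```

For example, `diversity_rank("book a table", [...])` prefers alternatives that overlap least with the source and with each other, which is the intuition behind producing diverse augmented utterances.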
Sentiment analysis, which addresses the computational treatment of opinion, sentiment, and subjectivity in text, has received considerable attention in recent years. In contrast to traditional coarse-grained sentiment analysis tasks, such as document-level sentiment classification, we are interested in fine-grained aspect-based sentiment analysis, which aims to identify the aspects that users comment on and these aspects' polarities. Aspect-based sentiment analysis relies heavily on syntactic features. However, the reviews that this task focuses on are natural and spontaneous, thus...
Yuxuan Wang, Wanxiang Che, Jiang Guo, Yijia Liu, Ting Liu. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2019.
Distant supervised relation extraction (RE) has been an effective way of finding novel relational facts from text without labeled training data. Typically it can be formalized as a multi-instance multi-label problem. In this paper, we introduce a neural approach for distant supervised RE with a specific focus on attention mechanisms. Unlike the feature-based logistic regression model and compositional neural models such as CNN, our approach includes two major attention-based memory components, which is capable of explicitly...
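The core of instance-level attention in multi-instance RE can be sketched as a softmax-weighted sum over the sentences in a bag, so that instances aligned with a relation query dominate the bag representation. This is a minimal NumPy sketch of selective attention in general, not the paper's two-component memory architecture; the vectors here are toy values.

```python
import numpy as np

def bag_attention(instance_reprs, relation_query):
    """Selective attention over the sentences in a bag: instances whose
    representation aligns with the relation query get higher weight."""
    scores = instance_reprs @ relation_query        # one score per instance
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                        # softmax normalization
    return weights @ instance_reprs, weights        # bag repr, attention
```

With a bag of two instances and a query aligned with the first, almost all attention mass goes to the first instance, which de-emphasizes noisy sentences that do not express the relation.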
Cross-lingual model transfer has been a promising approach for inducing dependency parsers for low-resource languages where annotated treebanks are not available. The major obstacles are two-fold: 1. Lexical features are not directly transferable across languages; 2. Target language-specific syntactic structures are difficult to recover. To address these two challenges, we present a novel representation learning framework for multi-source transfer parsing. Our framework allows transfer parsing using full lexical features straightforwardly. By...
Causal reasoning ability is crucial for numerous NLP applications. Despite the impressive emerging abilities of ChatGPT in various NLP tasks, it is unclear how well ChatGPT performs in causal reasoning. In this paper, we conduct the first comprehensive evaluation of ChatGPT's causal reasoning capabilities. Experiments show that ChatGPT is not a good causal reasoner, but a good causal interpreter. Besides, ChatGPT has a serious hallucination on causal reasoning, possibly due to the reporting biases between causal and non-causal relationships in natural language, as well as ChatGPT's upgrading processes, such as RLHF. The...
Paraphrase generation (PG) is important in plenty of NLP applications. However, the research on PG is far from enough. In this paper, we propose a novel method for statistical paraphrase generation (SPG), which can (1) achieve various applications based on a uniform statistical model, and (2) naturally combine multiple resources to enhance performance. In our experiments, we use the proposed method to generate paraphrases for three different applications. The results show that the method can be easily transformed from one application to another and generates valuable and interesting paraphrases.
In this paper, we formally define the problem of representing and leveraging abstract event causality to power downstream applications. We propose a novel solution to this problem, which is to build an abstract causality network and embed it into a continuous vector space. The network is generalized from a specific one, with nodes represented by frequently co-occurring word pairs. To perform the embedding task, we design a dual cause-effect transition model. Therefore, the proposed method can obtain general, frequent, and simple causality patterns, and meanwhile simplify...
Recent work on Chinese analysis has led to large-scale annotations of the internal structures of words, enabling character-level syntactic structures. In this paper, we investigate the problem of character-level dependency parsing, building dependency trees over characters. Character-level information can benefit downstream applications by offering flexible granularities for word segmentation while improving word-level parsing accuracies. We present novel adaptations of two major shift-reduce parsing algorithms...
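A shift-reduce parser of the kind adapted above can be sketched as an arc-standard transition executor over a character buffer: SHIFT moves a character onto the stack, and LEFT-ARC/RIGHT-ARC attach stack items. This is a generic arc-standard sketch, not the paper's adapted algorithms, and the action sequence below is a toy oracle.

```python
def arc_standard(chars, actions):
    """Execute arc-standard shift-reduce transitions over characters,
    returning head[i] for each character (an unattached root keeps -1)."""
    stack, buf = [], list(range(len(chars)))
    heads = [-1] * len(chars)
    for act in actions:
        if act == "SHIFT":
            stack.append(buf.pop(0))
        elif act == "LEFT-ARC":       # second-top takes top as its head
            dep = stack.pop(-2)
            heads[dep] = stack[-1]
        elif act == "RIGHT-ARC":      # top takes second-top as its head
            dep = stack.pop()
            heads[dep] = stack[-1]
    return heads
```

Running it on a three-character sequence with the sequence SHIFT, SHIFT, LEFT-ARC, SHIFT, RIGHT-ARC attaches both outer characters to the middle one, i.e. a character-level dependency tree.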
We focus on essay generation, which is a challenging task that generates paragraph-level text with multiple topics. Progress towards understanding different topics and expressing diversity in this task requires more powerful generators and richer training and evaluation resources. To address this, we develop a multi-topic aware long short-term memory (MTA-LSTM) network. In this model, we maintain a novel coverage vector, which learns the weight of each topic and is sequentially updated during the decoding process. Afterwards this vector is fed...
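The coverage-vector update can be sketched as a per-step subtraction: the vector tracks how much of each topic remains to be expressed and is reduced by the attention paid to each topic during decoding. This is a minimal NumPy sketch of the coverage idea under assumed toy values, not the trained MTA-LSTM update rule.

```python
import numpy as np

def decode_step(coverage, topic_attention):
    """One decoding step: reduce each topic's remaining coverage by the
    attention it received at this step, clipped at zero."""
    return np.clip(coverage - topic_attention, 0.0, None)

coverage = np.ones(3)             # three topics, all still unexpressed
att = np.array([0.6, 0.3, 0.1])   # attention over topics at one step
coverage = decode_step(coverage, att)
```

As decoding proceeds, heavily attended topics approach zero coverage, steering later steps toward topics that have not yet been expressed.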
Heng Gong, Xiaocheng Feng, Bing Qin, Ting Liu. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2019.
We propose a double-image-encryption algorithm using the affine transform in the gyrator domain. Two original images are converted into the real part and imaginary part of a complex function by employing the affine transform. The complex function is then encoded and transformed in the gyrator domain. The affine transform and encoding are performed twice in this encryption method. The parameters of the transforms are regarded as the key for the algorithm. Some numerical simulations have validated the feasibility of the proposed image encryption scheme.
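The pack-transform-encode pattern can be sketched with NumPy: two real images become the real and imaginary parts of one complex function, which is transformed and phase-encoded, and decryption inverts both steps. This is a minimal sketch in which a plain 2D FFT stands in for the gyrator transform (which NumPy does not provide), and the random-phase mask stands in for the paper's key parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((8, 8))            # first original image
B = rng.random((8, 8))            # second original image
phase = np.exp(1j * 2 * np.pi * rng.random((8, 8)))  # phase key

# Encrypt: pack both images into one complex function, transform, encode.
C = A + 1j * B
E = np.fft.fft2(C) * phase        # FFT stands in for the gyrator transform

# Decrypt: undo the phase encoding and the transform, then split parts.
D = np.fft.ifft2(E / phase)
A_rec, B_rec = D.real, D.imag
```

Because the transform and the phase encoding are both invertible, the two images are recovered exactly when the correct key is supplied; a wrong phase mask leaves only noise.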
In this paper, we model the problem of disfluency detection using a transition-based framework, which incrementally constructs and labels the disfluency chunks of input sentences with a new transition system and without syntax information. Compared with sequence labeling methods, it can capture non-local chunk-level features; compared with joint parsing methods, it is free from noise in syntax parsing. Experiments show that our model achieves a state-of-the-art f-score of 87.5% on the commonly used English Switchboard test set, as well as strong results on a set of in-house annotated Chinese data.
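The incremental construction can be sketched as a transition executor that walks the input buffer and either copies a token to the fluent output or labels it as part of a disfluent chunk. This is a deliberately simplified sketch with two toy actions (OUT/DEL) and a hand-written action sequence, not the paper's transition system or its learned classifier.

```python
def apply_transitions(tokens, actions):
    """Minimal transition executor: OUT copies a token to the fluent
    output; DEL labels it as part of a disfluent chunk and drops it."""
    fluent, disfluent = [], []
    for tok, act in zip(tokens, actions):
        if act == "OUT":
            fluent.append(tok)
        elif act == "DEL":
            disfluent.append(tok)
        else:
            raise ValueError(f"unknown action: {act}")
    return fluent, disfluent
```

On the classic repair "a flight to Boston uh I mean to Denver", deleting the reparandum "to Boston" and the editing phrase "uh I mean" leaves the fluent "a flight to Denver".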
Existing approaches for Chinese zero pronoun resolution typically utilize only syntactical and lexical features while ignoring semantic information. The fundamental reason is that zero pronouns have no descriptive information, which brings difficulty in explicitly capturing their semantic similarities with antecedents. Meanwhile, representing zero pronouns is challenging since they are merely gaps that convey no actual content. In this paper, we address this issue by building a deep memory network that is capable of encoding zero pronouns into vector...
Conditions play an essential role in scientific observations, hypotheses, and statements. Unfortunately, existing scientific knowledge graphs (SciKGs) represent factual knowledge as a flat relational network of concepts, in the same way as the KGs of the general domain, without considering the conditions under which the facts are valid, which loses important contexts for inference and exploration. In this work, we propose a novel representation of SciKG, which has three layers. The first layer has concept nodes and attribute nodes, as well as attaching links from an attribute to a concept. The second...
Many natural language processing (NLP) tasks can be generalized into a segmentation problem. In this paper, we combine semi-CRF with a neural network to solve such NLP tasks. Our model represents a segment both by composing the input units and by embedding the entire segment. We thoroughly study different composition functions and segment embeddings. We conduct extensive experiments on two typical tasks: named entity recognition (NER) and Chinese word segmentation (CWS). Experimental results show that our model benefits from representing...
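Semi-CRF decoding scores whole segments rather than single units, which a short dynamic program makes concrete: the best segmentation of the first j units extends the best segmentation of some earlier prefix by one scored segment. This is a generic semi-CRF Viterbi sketch with an assumed toy `seg_score` function, not the paper's neural scorer.

```python
def semi_crf_decode(n, max_len, seg_score):
    """Viterbi over segmentations: best[j] is the score of the best
    segmentation of the first j units; each segment (i, j] is scored
    as a whole by seg_score(i, j), with segment length capped at max_len."""
    NEG = float("-inf")
    best = [NEG] * (n + 1)
    back = [0] * (n + 1)
    best[0] = 0.0
    for j in range(1, n + 1):
        for i in range(max(0, j - max_len), j):
            s = best[i] + seg_score(i, j)
            if s > best[j]:
                best[j], back[j] = s, i
    # Recover the segment boundaries by walking the backpointers.
    segs, j = [], n
    while j > 0:
        i = back[j]
        segs.append((i, j))
        j = i
    return best[n], segs[::-1]
```

With a scorer that rewards length-2 segments, decoding four units yields the segmentation (0,2], (2,4], illustrating how segment-level scores (e.g. whole-segment embeddings) drive the choice of boundaries.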
In the age of Web 2.0, community user contributed questions and answers provide an important alternative for knowledge acquisition through web search. Question retrieval in current community-based question answering (CQA) services does not, in general, work well for long and complex queries, such as questions. The main reasons are the verboseness of natural language queries and the word mismatch between queries and candidate questions in the CQA archive during retrieval. To address these two problems, existing solutions try to refine the search queries by...
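One common flavor of such query refinement is to keep only the most discriminative terms of a verbose question, for example by ranking terms by IDF over a background corpus. The sketch below is a crude illustrative stand-in for query refinement in general, with a toy corpus; it is not the retrieval method this work actually proposes.

```python
import math

def refine_query(question, corpus, keep=3):
    """Reduce a verbose question to its `keep` most discriminative terms,
    ranked by IDF over a background corpus of documents."""
    docs = [set(d.lower().split()) for d in corpus]
    def idf(term):
        df = sum(term in d for d in docs)          # document frequency
        return math.log((1 + len(docs)) / (1 + df))
    # Deduplicate while preserving order, then rank by IDF (stable sort).
    terms = list(dict.fromkeys(question.lower().split()))
    return sorted(terms, key=idf, reverse=True)[:keep]
```

Frequent function words like "how do i" receive near-zero IDF and are dropped, so the refined query keeps the rare, content-bearing terms that best match candidate questions in the archive.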