- Topic Modeling
- Natural Language Processing Techniques
- Speech and dialogue systems
- Advanced Text Analysis Techniques
- Speech Recognition and Synthesis
- Anomaly Detection Techniques and Applications
- Multimodal Machine Learning Applications
- Advanced Malware Detection Techniques
- Domain Adaptation and Few-Shot Learning
- Forecasting Techniques and Applications
- Context-Aware Activity Recognition Systems
- Authorship Attribution and Profiling
- Spatial Cognition and Navigation
- Semantic Web and Ontologies
- Advanced Authentication Protocols Security
- Gait Recognition and Analysis
- Gaze Tracking and Assistive Technology
- Intelligent Tutoring Systems and Adaptive Learning
- User Authentication and Security Systems
- AI in Service Interactions
- Advanced Graph Neural Networks
- Complex Network Analysis Techniques
- Intellectual Property and Patents
- Online Learning and Analytics
- Stock Market Forecasting Methods
Jilin Province Science and Technology Department
2024
Jilin University
2024
City University of Hong Kong
2020-2024
Sun Yat-sen University
2024
City University of Hong Kong, Shenzhen Research Institute
2021-2023
University of Science and Technology of China
2018-2023
Northwestern Polytechnic University
2022
Henan University of Technology
2021
University of International Business and Economics
2021
Beijing University of Posts and Telecommunications
2020
Semantic role labeling (SRL) aims to extract the arguments for each predicate in an input sentence. Traditional SRL can fail to analyze dialogues because it only works on every single sentence, while ellipsis and anaphora frequently occur in dialogues. To address this problem, we propose the conversational SRL task, where an argument can be the dialogue participants, a phrase in the dialogue history, or the current sentence. As the existing SRL datasets are at the sentence level, we manually annotate semantic roles for 3,000 chit-chat dialogues (27,198 sentences) to boost research...
Patent litigation is an expensive legal process faced by many companies. To reduce the cost of patent litigation, one effective approach is proactive management based on predictive analysis. However, automatic prediction of patent litigation is still an open problem due to the complexity of lawsuits. In this paper, we propose a data-driven framework, Convolutional Tensor Factorization (CTF), to identify the patents that may cause litigations between two companies. Specifically, CTF is a hybrid modeling approach, where the content features from the patents are...
For multi-turn dialogue rewriting, the capacity to effectively model the linguistic knowledge in the dialogue context and to get rid of noise is essential to improve performance. Existing attentive models attend to all words without prior focus, which results in inaccurate concentration on some dispensable words. In this paper, we propose to use semantic role labeling (SRL), which highlights the core semantic information of who did what to whom, to provide additional guidance for the rewriter model. Experiments show that this information significantly...
Contrastive learning has shown great potential in unsupervised sentence embedding tasks, e.g., SimCSE (CITATION). However, these existing solutions are heavily affected by superficial features like the length of sentences or their syntactic structures. In this paper, we propose a semantic-aware contrastive learning framework for sentence embeddings, termed Pseudo-Token BERT (PT-BERT), which is able to explore the pseudo-token space (i.e., latent semantic space) representation of a sentence while eliminating the impact of superficial features such as sentence length and syntax...
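For readers unfamiliar with the contrastive objective this abstract builds on, here is a minimal sketch of a generic SimCSE-style InfoNCE loss with in-batch negatives. It is not PT-BERT, whose details are elided above. The function names, the pure-Python cosine similarity, and the default temperature of 0.05 are illustrative assumptions.

```python
import math
from typing import List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(anchors: List[Vector], positives: List[Vector],
                  temperature: float = 0.05) -> float:
    """Average InfoNCE loss with in-batch negatives (assumed setup).

    For anchor i, positives[i] is the target and every positives[j] with
    j != i acts as a negative. The loss is the cross-entropy of picking
    the true positive among all candidates in the batch.
    """
    n = len(anchors)
    total = 0.0
    for i in range(n):
        logits = [cosine(anchors[i], positives[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_sum - logits[i]
    return total / n


if __name__ == "__main__":
    anchors = [[1.0, 0.0], [0.0, 1.0]]
    positives = [[0.9, 0.1], [0.1, 0.9]]
    print(info_nce_loss(anchors, positives))                   # near zero
    print(info_nce_loss(anchors, list(reversed(positives))))   # large
```

Well-aligned pairs drive the loss toward zero, and mismatched pairs make it large. The abstract's concern is that such a loss can be minimized by exploiting superficial cues like sentence length rather than meaning.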
The Natural Language Inference (NLI) task requires an agent to determine the semantic relation between a premise sentence (p) and a hypothesis sentence (h), which demands sufficient understanding of sentences, from lexical knowledge to global semantics. Due to issues such as polysemy, ambiguity, and the fuzziness of sentences, fully understanding sentences is still challenging. To this end, we propose the Image-Enhanced Multi-Level Sentence Representation Net (IEMLRN), a novel architecture that is able to utilize images to enhance sentence semantic understanding at different scales...
Traditional methods of domain named entity recognition (NER) rely on manually-defined feature templates and experience. Aiming at the NER task in unstructured cyber threat intelligence (CTI), this paper proposes an approach based on a BiLSTM-CRF model with dictionary matching correction. This approach utilizes bi-directional Long Short-Term Memory (BiLSTM) to automatically capture features from context, Conditional Random Fields (CRF) to learn label constraint rules, and ontology-based dictionary matching for correction. Due to the lack of an available dataset, this paper adopts...
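The abstract names ontology-based dictionary matching as a correction step after BiLSTM-CRF tagging, but the snippet does not say how the correction works. The following is only one plausible reading, offered as an assumption. It does a longest-match scan over the tokens and overwrites the predicted BIO tags wherever a span exactly matches a dictionary entry. The function name, the dictionary format, and the entity labels in the usage example are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def correct_with_dictionary(tokens: List[str], tags: List[str],
                            dictionary: Dict[Tuple[str, ...], str],
                            max_len: Optional[int] = None) -> List[str]:
    """Hypothetical post-correction of BIO tags by dictionary lookup.

    Scans left to right, tries the longest dictionary span first, and
    overwrites the model's tags for any exact match with B-/I- tags of
    the dictionary's entity type. Unmatched tokens keep their predicted tags.
    """
    if max_len is None:
        max_len = max((len(k) for k in dictionary), default=0)
    corrected = list(tags)
    i = 0
    while i < len(tokens):
        match_len, match_type = 0, None
        # Longest match first so multi-token entries win over their prefixes.
        for length in range(min(max_len, len(tokens) - i), 0, -1):
            span = tuple(tokens[i:i + length])
            if span in dictionary:
                match_len, match_type = length, dictionary[span]
                break
        if match_len:
            corrected[i] = f"B-{match_type}"
            for k in range(i + 1, i + match_len):
                corrected[k] = f"I-{match_type}"
            i += match_len
        else:
            i += 1
    return corrected


if __name__ == "__main__":
    # Hypothetical entity types and dictionary entries, for illustration only.
    vocab = {("X-Agent",): "MALWARE", ("APT28",): "THREAT_ACTOR"}
    toks = ["APT28", "used", "X-Agent", "malware"]
    predicted = ["B-ORG", "O", "O", "O"]
    print(correct_with_dictionary(toks, predicted, vocab))
    # ['B-THREAT_ACTOR', 'O', 'B-MALWARE', 'O']
```

Letting dictionary hits override the model is a deliberately simple design choice. A real system might instead apply the dictionary only where the CRF output is low-confidence.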
Han Wu, Kun Xu, Linfeng Song, Lifeng Jin, Haisong Zhang, Linqi Song. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers). 2021.
Personalized learning is a promising educational approach that aims to provide high-quality personalized services for each student with minimal demands for practice data. The key to achieving this lies in the cognitive diagnosis task, which estimates the cognitive state of a student through his/her logged data from doing quizzes. Nevertheless, in this scenario, existing cognitive diagnosis models suffer from the inability to (1) quickly adapt to new students using a small amount of data, and (2) measure the reliability of the diagnosis result to avoid improper services that mismatch the student's actual state. In...
Honey vaults are useful tools for password management. A vault usually contains the usernames for each domain and the corresponding passwords, encrypted with a master password chosen by the owner. By generating decoy vaults for incorrect master password attempts, honey vaults force attackers who hold the vault's storage file to engage in online verification to distinguish the real vault, thus thwarting offline guessing attacks. However, sophisticated attackers can acquire additional information, such as personally identifiable information (PII) and partial passwords...
Compared to news and chat summarization, the development of meeting summarization is hugely decelerated by limited data. To this end, we introduce a versatile Chinese meeting summarization dataset, dubbed VCSum, consisting of 239 real-life meetings with a total duration of over 230 hours. We claim our dataset is versatile because we provide annotations of topic segmentation, headlines, segmentation summaries, overall meeting summaries, and salient sentences for each meeting transcript. As such, the dataset can adapt to various summarization tasks or methods, including segmentation-based...
While conversational semantic role labeling (CSRL) has shown its usefulness on Chinese conversational tasks, it is still under-explored in non-Chinese languages due to the lack of multilingual CSRL annotations for parser training. To avoid expensive data collection and the error propagation of translation-based methods, we present a simple but effective approach to perform zero-shot cross-lingual CSRL. Our model implicitly learns language-agnostic, conversational structure-aware, and semantically rich representations with hierarchical...
Conversational semantic role labeling (CSRL) is believed to be a crucial step towards dialogue understanding. However, it remains a major challenge for existing CSRL parsers to handle conversational structural information. In this paper, we present a simple and effective architecture for CSRL which aims to address this problem. Our model is based on a conversational structure-aware graph network which explicitly encodes speaker-dependent information. We also propose a multi-task learning method to further improve the model. Experimental results on benchmark...
Conversational semantic role labeling (CSRL) is believed to be a crucial step toward dialogue understanding. By incorporating CSRL information into conversational models, previous work [1] has confirmed the usefulness of CSRL for downstream conversation-based tasks, including multi-turn dialogue rewriting and response generation. However, Xu et al. found that the quality of the extracted CSRL structures would consequently...
Meetings typically involve multiple participants and lengthy conversations, resulting in redundant and trivial content. To overcome these challenges, we propose a two-step framework, Reconstruct before Summarize (RbS), for effective and efficient meeting summarization. RbS first leverages a self-supervised paradigm to annotate essential contents by reconstructing the meeting transcripts. Secondly, a relative positional bucketing (RPB) algorithm is used to equip (conventional) summarization models to generate the summary. Despite...
Text classification is a fundamental and classical problem in natural language processing. Existing methods in this area attach more attention to the structure modeling of texts, while largely ignoring the cognitive principles of human reading. Actually, as an important aspect of exploring the characteristics of human comprehension, neuroscience research in recent years has demonstrated the human instinct for abstract thinking, in which semantic processing and summarizing play essential roles. To this end, we propose a novel text classification method with...
Semantic role labeling (SRL) is widely used to extract predicate-argument pairs from sentences. Traditional SRL methods can perform well on single sentences but fail to work in the dialogue scenario, where ellipsis and anaphora frequently occur. Some research has been proposed to solve this problem, i.e., Conversational Semantic Role Labeling (CSRL), but there is still huge room for improvement. An error case study of a BERT-based CSRL model has shown that the majority of errors are observed in boundary matching, especially...
The daily average relative humidity is significant for both agriculture and industry. Due to its highly stochastic, intermittent, and non-linear characteristics, accurately forecasting daily average relative humidity is a very challenging task. To improve forecasting performance, this paper develops two LSTM-attention methods, with the attention mechanism added after the input layer and before the output layer, respectively. First, meteorological data from 1 January 1999 to 31 December 2017 from a station in Shaanxi, China, were analyzed, where rainfall mean transformed...
Existing dialogue modeling methods have achieved promising performance on various dialogue tasks with the aid of Transformer and large-scale pre-trained language models. However, some recent studies revealed that the context representations produced by these models suffer from the problem of anisotropy. In this paper, we find that the generated representations are also not conversational, losing conversation structure information during the context modeling stage. To this end, we identify two properties in dialogue modeling, i.e., locality and isotropy, and present a simple method for...
Humans learn language via multi-modal knowledge. However, due to the text-only pre-training scheme, most existing pre-trained language models (PLMs) are hindered from accessing multi-modal information. To inject visual knowledge into PLMs, existing methods incorporate either the text or the image encoder of vision-language models (VLMs) to encode the visual information, and update all the original parameters of the PLMs for knowledge fusion. In this paper, we propose a new plug-and-play module, X-adapter, to flexibly leverage the aligned visual and textual knowledge learned in VLMs and efficiently inject it into PLMs...