NFDI4DS | UHH-SEMS - Publication Details

Conversational Semantic Role Labeling

OPENALEX - Publications

Kun Xu Han Wu Linfeng Song Haisong Zhang Linqi Song and 1 more

Semantic role labeling (SRL) aims to extract the arguments for each predicate in an input sentence. Traditional SRL can fail to analyze dialogues because it only works on every single sentence, while ellipsis and anaphora frequently occur in dialogues. To address this problem, we propose the conversational SRL task, where an argument can be the dialogue participants, a phrase in the dialogue history or the current sentence. As the existing SRL datasets are at the sentence level, we manually annotate semantic roles for 3000 chit-chat dialogues (27198 sentences) to boost the research...

10.1109/taslp.2021.3074014 article EN IEEE/ACM Transactions on Audio Speech and Language Processing 2021-01-01

Patent Litigation Prediction: A Convolutional Tensor Factorization Approach

OPENALEX - Publications

Qi Liu Han Wu Yuyang Ye Hongke Zhao Chuanren Liu and 1 more

Patent litigation is an expensive legal process faced by many companies. To reduce the cost of patent litigation, one effective approach is proactive management based on predictive analysis. However, automatic prediction of patent litigation is still an open problem due to the complexity of lawsuits. In this paper, we propose a data-driven framework, Convolutional Tensor Factorization (CTF), to identify the patents that may cause litigations between two companies. Specifically, CTF is a hybrid modeling approach, where the content features from the patents are...

10.24963/ijcai.2018/701 article EN 2018-07-01

Semantic Role Labeling Guided Multi-turn Dialogue ReWriter

OPENALEX - Publications

Kun Xu Haochen Tan Linfeng Song Han Wu Haisong Zhang and 2 more

For multi-turn dialogue rewriting, the capacity of effectively modeling the linguistic knowledge in dialog context and getting rid of the noises is essential to improve its performance. Existing attentive models attend to all words without prior focus, which results in inaccurate concentration on some dispensable words. In this paper, we propose to use semantic role labeling (SRL), which highlights the core information of who did what to whom, to provide additional guidance for the rewriter model. Experiments show that this information significantly...

10.18653/v1/2020.emnlp-main.537 article EN cc-by 2020-01-01

A Sentence is Worth 128 Pseudo Tokens: A Semantic-Aware Contrastive Learning Framework for Sentence Embeddings

OPENALEX - Publications

Haochen Tan Wei Shao Han Wu Ke Yang Linqi Song

Contrastive learning has shown great potential in unsupervised sentence embedding tasks, e.g., SimCSE (CITATION). However, these existing solutions are heavily affected by superficial features like the length of sentences or syntactic structures. In this paper, we propose a semantic-aware contrastive learning framework for sentence embeddings, termed Pseudo-Token BERT (PT-BERT), which is able to explore the pseudo-token space (i.e., latent semantic space) representation of a sentence while eliminating the impact of superficial features such as sentence length and syntax....

10.18653/v1/2022.findings-acl.22 article EN cc-by Findings of the Association for Computational Linguistics: ACL 2022 2022-01-01

Image-Enhanced Multi-level Sentence Representation Net for Natural Language Inference

OPENALEX - Publications

Kun Zhang Guangyi Lv Le Wu Enhong Chen Qi Liu and 2 more

Natural Language Inference (NLI) task requires an agent to determine the semantic relation between a premise sentence (p) and a hypothesis sentence (h), which demands sufficient understanding about sentences from lexical knowledge to global semantics. Due to issues such as polysemy, ambiguity, as well as fuzziness of sentences, fully understanding sentences is still challenging. To this end, we propose an Image-Enhanced Multi-Level Sentence Representation Net (IEMLRN), a novel architecture that is able to utilize the image to enhance sentence understanding at different scales....

10.1109/icdm.2018.00090 article EN 2018 IEEE International Conference on Data Mining (ICDM) 2018-11-01

An Effective Approach of Named Entity Recognition for Cyber Threat Intelligence

OPENALEX - Publications

Han Wu Xiaoyong Li Yali Gao

Traditional methods of domain named entity recognition (NER) rely on manually-defined feature templates and experience. Aiming at the NER task for unstructured cyber threat intelligence (CTI), this paper proposes an approach based on the BiLSTM-CRF model and dictionary matching correction. This approach utilizes bi-directional Long Short-Term Memory (BiLSTM) to automatically capture features of the context, Conditional Random Fields (CRF) to learn the label constraint rule, and ontology-based dictionary matching for correction. Due to the lack of an available dataset, this paper adopts...

10.1109/itnec48623.2020.9085102 article EN 2020 IEEE 4th Information Technology, Networking, Electronic and Automation Control Conference (ITNEC) 2020-05-05

Domain-Adaptive Pretraining Methods for Dialogue Understanding

OPENALEX - Publications

Han Wu Kun Xu Linfeng Song Lifeng Jin Haisong Zhang and 1 more

Han Wu, Kun Xu, Linfeng Song, Lifeng Jin, Haisong Zhang, Linqi Song. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers). 2021.

10.18653/v1/2021.acl-short.84 article EN cc-by 2021-01-01

BETA-CD: A Bayesian Meta-Learned Cognitive Diagnosis Framework for Personalized Learning

OPENALEX - Publications

Haoyang Bi Enhong Chen Weidong He Han Wu Weihao Zhao and 2 more

Personalized learning is a promising educational approach that aims to provide high-quality personalized services for each student with minimum demands for practice data. The key to achieving this lies in the cognitive diagnosis task, which estimates the cognitive state of the student through his/her logged data of doing quizzes. Nevertheless, in the personalized learning scenario, existing cognitive diagnosis models suffer from the inability to (1) quickly adapt to new students using a small amount of data, and (2) measure the reliability of the diagnosis result to avoid improper services that mismatch the student's actual state. In...

10.1609/aaai.v37i4.25629 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2023-06-26

Honey password vaults tolerating leakage of both personally identifiable information and passwords

OPENALEX - Publications

Chao An YuTing Xiao HaiHang Liu Han Wu Rui Zhang

Honey vaults are useful tools for password management. A vault usually contains usernames for each domain, and the corresponding passwords, encrypted with a master password chosen by the owner. By generating decoy vaults for incorrect master password attempts, honey vaults force attackers with the vault's storage file to engage in online verification to distinguish the real vaults, thus thwarting offline guessing attacks. However, sophisticated attackers can acquire additional information, such as personally identifiable information (PII) and partial passwords...

10.1186/s42400-024-00236-6 article EN cc-by Cybersecurity 2024-10-04

VCSUM: A Versatile Chinese Meeting Summarization Dataset

OPENALEX - Publications

Han Wu Mingjie Zhan Haochen Tan Zhaohui Hou Liang Ding and 1 more

Compared to news and chat summarization, the development of meeting summarization is hugely decelerated by the limited data. To this end, we introduce a versatile Chinese meeting summarization dataset, dubbed VCSum, consisting of 239 real-life meetings, with a total duration of over 230 hours. We claim our dataset is versatile because we provide the annotations of topic segmentation, headlines, segmentation summaries, overall meeting summaries, and salient sentences for each meeting transcript. As such, the dataset can adapt to various summarization tasks or methods, including segmentation-based...

10.48550/arxiv.2305.05280 preprint EN cc-by-nc-sa arXiv (Cornell University) 2023-01-01

Zero-shot Cross-lingual Conversational Semantic Role Labeling

OPENALEX - Publications

Han Wu Haochen Tan Kun Xu Shuqi Liu Lianwei Wu and 1 more

While conversational semantic role labeling (CSRL) has shown its usefulness on Chinese conversational tasks, it is still under-explored in non-Chinese languages due to the lack of multilingual CSRL annotations for parser training. To avoid expensive data collection and the error propagation of translation-based methods, we present a simple but effective approach to perform zero-shot cross-lingual CSRL. Our model implicitly learns language-agnostic, structure-aware and semantically rich representations with hierarchical...

10.18653/v1/2022.findings-naacl.20 article EN cc-by Findings of the Association for Computational Linguistics: NAACL 2022 2022-01-01

CSAGN: Conversational Structure Aware Graph Network for Conversational Semantic Role Labeling

OPENALEX - Publications

Han Wu Kun Xu Linqi Song

Conversational semantic role labeling (CSRL) is believed to be a crucial step towards dialogue understanding. However, it remains a major challenge for existing CSRL parsers to handle conversational structural information. In this paper, we present a simple and effective architecture for CSRL which aims to address this problem. Our model is based on a conversational structure aware graph network which explicitly encodes the speaker-dependent information. We also propose a multi-task learning method to further improve the model. Experimental results on benchmark...

10.18653/v1/2021.emnlp-main.177 article EN cc-by Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing 2021-01-01

VCSUM: A Versatile Chinese Meeting Summarization Dataset

OPENALEX - Publications

Han Wu Mingjie Zhan Haochen Tan Zhaohui Hou Liang Ding and 1 more

Compared to news and chat summarization, the development of meeting summarization is hugely decelerated by the limited data. To this end, we introduce a versatile Chinese meeting summarization dataset, dubbed VCSum, consisting of 239 real-life meetings, with a total duration of over 230 hours. We claim our dataset is versatile because we provide the annotations of topic segmentation, headlines, segmentation summaries, overall meeting summaries, and salient sentences for each meeting transcript. As such, the dataset can adapt to various summarization tasks or methods, including segmentation-based...

10.18653/v1/2023.findings-acl.377 article EN cc-by Findings of the Association for Computational Linguistics: ACL 2023 2023-01-01

Structure-Aware Dialogue Modeling Methods for Conversational Semantic Role Labeling

OPENALEX - Publications

Han Wu Kun Xu Linqi Song

Conversational semantic role labeling (CSRL) is believed to be a crucial step toward dialogue understanding. By incorporating the CSRL information into conversational models, previous work [1] has confirmed the usefulness of CSRL in downstream conversation-based tasks, including multi-turn dialogue rewriting and response generation. However, Xu et al. found that the quality of the extracted CSRL structures would consequently...

10.1109/taslp.2023.3331576 article EN IEEE/ACM Transactions on Audio Speech and Language Processing 2023-11-20

Reconstruct Before Summarize: An Efficient Two-Step Framework for Condensing and Summarizing Meeting Transcripts

OPENALEX - Publications

Haochen Tan Han Wu Wei Shao Xinyun Zhang Mingjie Zhan and 3 more

Meetings typically involve multiple participants and lengthy conversations, resulting in redundant and trivial content. To overcome these challenges, we propose a two-step framework, Reconstruct before Summarize (RbS), for effective and efficient meeting summarization. RbS first leverages a self-supervised paradigm to annotate essential contents by reconstructing the meeting transcripts. Secondly, we propose a relative positional bucketing (RPB) algorithm to equip (conventional) summarization models to generate the summary. Despite...

10.18653/v1/2023.emnlp-main.812 article EN cc-by Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing 2023-01-01

Text Classification via Learning Semantic Dependency and Association

OPENALEX - Publications

Guanqi Zhu Hanqing Tao Han Wu Liyi Chen Ye Liu and 2 more

Text classification is a fundamental and classical problem in natural language processing. Existing methods in this area attach more attention to the structure modeling of texts, while largely ignoring the cognitive principles of human reading. Actually, as an important aspect of exploring the characteristics of human reading comprehension, neuroscience research in recent years has demonstrated the human instinct for abstract thinking, where semantic processing and summarizing play essential roles. To this end, we propose a novel text classification method with...

10.1109/ijcnn55064.2022.9892656 article EN 2022 International Joint Conference on Neural Networks (IJCNN) 2022-07-18

Towards Improving Interpretability of Language Model Generation through a Structured Knowledge Discovery Approach

OPENALEX - Publications

Shuqi Liu Han Wu Guanzhi Deng Jianshu Chen Xiaoyang Wang and 1 more

10.1109/jstsp.2024.3414147 article EN IEEE Journal of Selected Topics in Signal Processing 2024-06-13

The SES framework and Frequency domain information fusion strategy for Human activity recognition

OPENALEX - Publications

Han Wu Haotian Feng Lida Shi Hongda Zhang Hao Xu

10.1109/ijcnn60899.2024.10650417 article EN 2024 International Joint Conference on Neural Networks (IJCNN) 2024-06-30

Improving Transferable Targeted Adversarial Attacks with Model Self-Enhancement

OPENALEX - Publications

Han Wu Guanyan Ou Weibin Wu Zibin Zheng

10.1109/cvpr52733.2024.02324 article EN 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024-06-16

K-CSRL: Knowledge Enhanced Conversational Semantic Role Labeling

OPENALEX - Publications

Boyu He Han Wu Congduan Li Linqi Song Weigang Chen

Semantic role labeling (SRL) is widely used to extract predicate-argument pairs from sentences. Traditional SRL methods can perform well on the single sentence but fail to work in the dialogue scenario, where ellipsis and anaphora frequently occur. Some research has been proposed to solve this problem, i.e., Conversational Semantic Role Labeling (CSRL), but there is still huge room for improvement. The error case study of the BERT-based CSRL model has shown that the majority of errors are observed in boundary matching, especially...

10.1145/3457682.3457763 article EN 2021-02-26

Daily average relative humidity forecasting via two LSTM-attention methods

OPENALEX - Publications

Han Wu Liang Yan Pan-hai Zheng

The daily average relative humidity is significant for both agriculture and industry. Due to its highly stochastic, intermittent and non-linear characteristics by nature, the accurate forecasting of daily average relative humidity is a very challenging task. For improving the forecasting performance, two LSTM-attention methods, with the attention mechanism added after the input and before the output, are developed in this paper. First, meteorological data from 1 January 1999 to 31 December 2017 from a station in Shaanxi, China, were analyzed, where rainfall mean transformed...

10.23919/ccc55666.2022.9902384 article EN 2022 41st Chinese Control Conference (CCC) 2022-07-25

Learning Locality and Isotropy in Dialogue Modeling

OPENALEX - Publications

Han Wu Haochen Tan Mingjie Zhan Gangming Zhao Shaoqing Lu and 2 more

Existing dialogue modeling methods have achieved promising performance on various dialogue tasks with the aid of Transformer and large-scale pre-trained language models. However, some recent studies revealed that the context representations produced by these methods suffer from the problem of anisotropy. In this paper, we find that the generated representations are also not conversational, losing the conversation structure information during the context modeling stage. To this end, we identify two properties in dialogue modeling, i.e., locality and isotropy, and present a simple method for...

10.48550/arxiv.2205.14583 preprint EN cc-by-nc-sa arXiv (Cornell University) 2022-01-01

Towards Versatile and Efficient Visual Knowledge Integration into Pre-trained Language Models with Cross-Modal Adapters

OPENALEX - Publications

Xinyun Zhang Haochen Tan Han Wu Mingjie Zhan Liang Ding and 1 more

Humans learn language via multi-modal knowledge. However, due to the text-only pre-training scheme, most existing pre-trained language models (PLMs) are hindered from the multi-modal information. To inject visual knowledge into PLMs, existing methods incorporate either the text or image encoder of vision-language models (VLMs) to encode the visual information and update all the original parameters of PLMs for knowledge fusion. In this paper, we propose a new plug-and-play module, X-adapter, to flexibly leverage the aligned visual and textual knowledge learned in pre-trained VLMs and efficiently inject them into PLMs....

10.48550/arxiv.2305.07358 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Reconstruct Before Summarize: An Efficient Two-Step Framework for Condensing and Summarizing Meeting Transcripts

OPENALEX - Publications

Haochen Tan Han Wu Wei Shao Xinyun Zhang Mingjie Zhan and 3 more

Meetings typically involve multiple participants and lengthy conversations, resulting in redundant and trivial content. To overcome these challenges, we propose a two-step framework, Reconstruct before Summarize (RbS), for effective and efficient meeting summarization. RbS first leverages a self-supervised paradigm to annotate essential contents by reconstructing the meeting transcripts. Secondly, we propose a relative positional bucketing (RPB) algorithm to equip (conventional) summarization models to generate the summary. Despite...

10.48550/arxiv.2305.07988 preprint EN other-oa arXiv (Cornell University) 2023-01-01