Han Wu

ORCID: 0000-0002-8008-064X
Research Areas
  • Topic Modeling
  • Natural Language Processing Techniques
  • Speech and dialogue systems
  • Advanced Text Analysis Techniques
  • Speech Recognition and Synthesis
  • Anomaly Detection Techniques and Applications
  • Multimodal Machine Learning Applications
  • Advanced Malware Detection Techniques
  • Domain Adaptation and Few-Shot Learning
  • Forecasting Techniques and Applications
  • Context-Aware Activity Recognition Systems
  • Authorship Attribution and Profiling
  • Spatial Cognition and Navigation
  • Semantic Web and Ontologies
  • Advanced Authentication Protocols Security
  • Gait Recognition and Analysis
  • Gaze Tracking and Assistive Technology
  • Intelligent Tutoring Systems and Adaptive Learning
  • User Authentication and Security Systems
  • AI in Service Interactions
  • Advanced Graph Neural Networks
  • Complex Network Analysis Techniques
  • Intellectual Property and Patents
  • Online Learning and Analytics
  • Stock Market Forecasting Methods

Jilin Province Science and Technology Department
2024

Jilin University
2024

City University of Hong Kong
2020-2024

Sun Yat-sen University
2024

City University of Hong Kong, Shenzhen Research Institute
2021-2023

University of Science and Technology of China
2018-2023

Northwestern Polytechnic University
2022

Henan University of Technology
2021

University of International Business and Economics
2021

Beijing University of Posts and Telecommunications
2020

Semantic role labeling (SRL) aims to extract the arguments for each predicate in an input sentence. Traditional SRL can fail to analyze dialogues because it only works on every single sentence, while ellipsis and anaphora frequently occur in dialogues. To address this problem, we propose the conversational SRL task, where an argument can be the dialogue participants, a phrase in the dialogue history or the current sentence. As existing SRL datasets are at the sentence level, we manually annotate semantic roles for 3000 chit-chat dialogues (27198 sentences) to boost research...

10.1109/taslp.2021.3074014 article EN IEEE/ACM Transactions on Audio Speech and Language Processing 2021-01-01

Patent litigation is an expensive legal process faced by many companies. To reduce the cost of patent litigation, one effective approach is proactive management based on predictive analysis. However, automatic prediction is still an open problem due to the complexity of lawsuits. In this paper, we propose a data-driven framework, Convolutional Tensor Factorization (CTF), to identify patents that may cause litigations between two companies. Specifically, CTF is a hybrid modeling approach, where content features from patents are...

10.24963/ijcai.2018/701 article EN 2018-07-01

For multi-turn dialogue rewriting, the capacity of effectively modeling the linguistic knowledge in the dialog context and getting rid of noises is essential to improve its performance. Existing attentive models attend to all words without prior focus, which results in inaccurate concentration on some dispensable words. In this paper, we propose to use semantic role labeling (SRL), which highlights the core information of who did what to whom, to provide additional guidance for the rewriter model. Experiments show that significantly...

10.18653/v1/2020.emnlp-main.537 article EN cc-by 2020-01-01

Contrastive learning has shown great potential in unsupervised sentence embedding tasks, e.g., SimCSE (CITATION). However, these existing solutions are heavily affected by superficial features like the length of sentences or syntactic structures. In this paper, we propose a semantic-aware contrastive learning framework for sentence embeddings, termed Pseudo-Token BERT (PT-BERT), which is able to explore the pseudo-token space (i.e., latent semantic space) representation of a sentence while eliminating the impact of superficial features such as sentence length and syntax....

10.18653/v1/2022.findings-acl.22 article EN cc-by Findings of the Association for Computational Linguistics: ACL 2022 2022-01-01

The Natural Language Inference (NLI) task requires an agent to determine the semantic relation between a premise sentence (p) and a hypothesis sentence (h), which demands sufficient understanding of the sentences, from lexical knowledge to global semantics. Due to issues such as polysemy, ambiguity, as well as fuzziness of sentences, fully understanding sentences is still challenging. To this end, we propose the Image-Enhanced Multi-Level Sentence Representation Net (IEMLRN), a novel architecture that is able to utilize images to enhance sentence representations at different scales....

10.1109/icdm.2018.00090 article EN 2018 IEEE International Conference on Data Mining (ICDM) 2018-11-01

Traditional methods of domain named entity recognition (NER) rely on manually-defined feature templates and experience. Aiming at the NER task for unstructured cyber threat intelligence (CTI), this paper proposes an approach based on a BiLSTM-CRF model and dictionary matching correction. This approach utilizes bi-directional Long Short-Term Memory (BiLSTM) to automatically capture features of the context, Conditional Random Fields (CRF) to learn label constraint rules, and an ontology-based dictionary for correction. Due to the lack of an available dataset, this paper adopts...

10.1109/itnec48623.2020.9085102 article EN 2020 IEEE 4th Information Technology, Networking, Electronic and Automation Control Conference (ITNEC) 2020-05-05

Han Wu, Kun Xu, Linfeng Song, Lifeng Jin, Haisong Zhang, Linqi Song. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers). 2021.

10.18653/v1/2021.acl-short.84 article EN cc-by 2021-01-01

Personalized learning is a promising educational approach that aims to provide high-quality personalized services for each student with minimum demands for practice data. The key to achieving that lies in the cognitive diagnosis task, which estimates the cognitive state of the student through his/her logged data of doing quizzes. Nevertheless, in the personalized learning scenario, existing cognitive diagnosis models suffer from the inability to (1) quickly adapt to new students using a small amount of data, and (2) measure the reliability of the diagnosis result to avoid improper services that mismatch the student's actual state. In...

10.1609/aaai.v37i4.25629 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2023-06-26

Honey vaults are useful tools for password management. A vault usually contains usernames for each domain, and the corresponding passwords, encrypted with a master password chosen by the owner. By generating decoy vaults for incorrect master password attempts, honey vaults force attackers with the vault’s storage file to engage in online verification to distinguish the real vaults, thus thwarting offline guessing attacks. However, sophisticated attackers can acquire additional information, such as personally identifiable information (PII) and partial passwords...

10.1186/s42400-024-00236-6 article EN cc-by Cybersecurity 2024-10-04

Compared to news and chat summarization, the development of meeting summarization is hugely decelerated by the limited data. To this end, we introduce a versatile Chinese meeting summarization dataset, dubbed VCSum, consisting of 239 real-life meetings, with a total duration of over 230 hours. We claim our dataset is versatile because we provide the annotations of topic segmentation, headlines, segmentation summaries, overall meeting summaries, and salient sentences for each meeting transcript. As such, the dataset can adapt to various summarization tasks or methods, including segmentation-based...

10.48550/arxiv.2305.05280 preprint EN cc-by-nc-sa arXiv (Cornell University) 2023-01-01

While conversational semantic role labeling (CSRL) has shown its usefulness on Chinese conversational tasks, it is still under-explored in non-Chinese languages due to the lack of multilingual CSRL annotations for parser training. To avoid expensive data collection and the error propagation of translation-based methods, we present a simple but effective approach to perform zero-shot cross-lingual CSRL. Our model implicitly learns language-agnostic, conversational structure-aware and semantically rich representations with the hierarchical...

10.18653/v1/2022.findings-naacl.20 article EN cc-by Findings of the Association for Computational Linguistics: NAACL 2022 2022-01-01

Conversational semantic role labeling (CSRL) is believed to be a crucial step towards dialogue understanding. However, it remains a major challenge for existing CSRL parsers to handle conversational structural information. In this paper, we present a simple and effective architecture for CSRL which aims to address this problem. Our model is based on a conversational structure aware graph network which explicitly encodes the speaker dependent information. We also propose a multi-task learning method to further improve the model. Experimental results on benchmark...

10.18653/v1/2021.emnlp-main.177 article EN cc-by Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing 2021-01-01

Compared to news and chat summarization, the development of meeting summarization is hugely decelerated by the limited data. To this end, we introduce a versatile Chinese meeting summarization dataset, dubbed VCSum, consisting of 239 real-life meetings, with a total duration of over 230 hours. We claim our dataset is versatile because we provide the annotations of topic segmentation, headlines, segmentation summaries, overall meeting summaries, and salient sentences for each meeting transcript. As such, the dataset can adapt to various summarization tasks or methods, including segmentation-based...

10.18653/v1/2023.findings-acl.377 article EN cc-by Findings of the Association for Computational Linguistics: ACL 2023 2023-01-01

Conversational semantic role labeling (CSRL) is believed to be a crucial step toward dialogue understanding. By incorporating the CSRL information into conversational models, previous work [1] has confirmed the usefulness of CSRL to downstream conversation-based tasks, including multi-turn dialogue rewriting and multi-turn dialogue response generation. However, Xu et al. found that the quality of the extracted CSRL structures would consequently...

10.1109/taslp.2023.3331576 article EN IEEE/ACM Transactions on Audio Speech and Language Processing 2023-11-20

Meetings typically involve multiple participants and lengthy conversations, resulting in redundant and trivial content. To overcome these challenges, we propose a two-step framework, Reconstruct before Summarize (RbS), for effective and efficient meeting summarization. RbS first leverages a self-supervised paradigm to annotate essential contents by reconstructing the meeting transcripts. Secondly, we propose a relative positional bucketing (RPB) algorithm to equip (conventional) summarization models to generate the summary. Despite...

10.18653/v1/2023.emnlp-main.812 article EN cc-by Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing 2023-01-01

Text classification is a fundamental and classical problem in natural language processing. Existing methods in this area attach more attention to the structure modeling of texts, while largely ignoring the cognitive principles of human reading. Actually, as an important aspect of exploring the characteristics of human comprehension, neuroscience research in recent years has demonstrated the human instinct for abstract thinking, where semantic processing and summarizing play essential roles. To this end, we propose a novel text classification method with...

10.1109/ijcnn55064.2022.9892656 article EN 2022 International Joint Conference on Neural Networks (IJCNN) 2022-07-18

10.1109/ijcnn60899.2024.10650417 article EN 2024 International Joint Conference on Neural Networks (IJCNN) 2024-06-30

10.1109/cvpr52733.2024.02324 article EN 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024-06-16

Semantic role labeling (SRL) is widely used to extract predicate-argument pairs from sentences. Traditional SRL methods can perform well on a single sentence but fail to work in the dialogue scenario, where ellipsis and anaphora frequently occur. Some research has been proposed to solve this problem, i.e. Conversational Semantic Role Labeling (CSRL), but there is still huge room for improvement. The error case study of the BERT-based CSRL model has shown that the majority of errors are observed in boundary matching, especially...

10.1145/3457682.3457763 article EN 2021-02-26

The daily average relative humidity is significant for both agriculture and industry. Due to its highly stochastic, intermittent and non-linear characteristics by nature, the accurate forecasting of the daily average relative humidity is a very challenging task. For improving the forecasting performance, two LSTM-attention methods, with the attention mechanism added after the input and before the output, are developed in this paper. First, the meteorological data during 1 January 1999 to 31 December 2017 from a station in Shaanxi, China, were analyzed, where rainfall mean transformed...

10.23919/ccc55666.2022.9902384 article EN 2022 41st Chinese Control Conference (CCC) 2022-07-25

Existing dialogue modeling methods have achieved promising performance on various dialogue tasks with the aid of Transformer and large-scale pre-trained language models. However, some recent studies revealed that the context representations produced by these methods suffer the problem of anisotropy. In this paper, we find that the generated representations are also not conversational, losing the conversation structure information during the context modeling stage. To this end, we identify two properties in dialogue modeling, i.e., locality and isotropy, and present a simple method for...

10.48550/arxiv.2205.14583 preprint EN cc-by-nc-sa arXiv (Cornell University) 2022-01-01

Humans learn language via multi-modal knowledge. However, due to the text-only pre-training scheme, most existing pre-trained language models (PLMs) are hindered from the multi-modal information. To inject visual knowledge into PLMs, existing methods incorporate either the text or image encoder of vision-language models (VLMs) to encode the visual information and update all the original parameters of PLMs for knowledge fusion. In this paper, we propose a new plug-and-play module, X-adapter, to flexibly leverage the aligned visual and textual knowledge learned in pre-trained VLMs and efficiently inject them into PLMs....

10.48550/arxiv.2305.07358 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Meetings typically involve multiple participants and lengthy conversations, resulting in redundant and trivial content. To overcome these challenges, we propose a two-step framework, Reconstruct before Summarize (RbS), for effective and efficient meeting summarization. RbS first leverages a self-supervised paradigm to annotate essential contents by reconstructing the meeting transcripts. Secondly, we propose a relative positional bucketing (RPB) algorithm to equip (conventional) summarization models to generate the summary. Despite...

10.48550/arxiv.2305.07988 preprint EN other-oa arXiv (Cornell University) 2023-01-01