- Topic Modeling
- Natural Language Processing Techniques
- Sentiment Analysis and Opinion Mining
- Computational and Text Analysis Methods
- Advanced Text Analysis Techniques
- Web Data Mining and Analysis
- Text and Document Classification Technologies
- Speech and Dialogue Systems
- Speech Recognition and Synthesis
- Advanced Graph Neural Networks
- Model-Driven Software Engineering Techniques
- Semantic Web and Ontologies
- Seismology and Earthquake Studies
- Context-Aware Activity Recognition Systems
- Multimodal Machine Learning Applications
- Data Quality and Management
- Domain Adaptation and Few-Shot Learning
National University of Defense Technology
2021-2025
Recently, character-word lattice structures have achieved promising results for Chinese named entity recognition (NER), reducing word segmentation errors and adding word boundary information to character sequences. However, constructing the lattice structure is complex and time-consuming, so lattice-based models usually suffer from low inference speed. Moreover, the quality of the lexicon affects the accuracy of the NER model: noise words can potentially confuse the model, while limited coverage can cause it to degenerate into...
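As a rough illustration of the character-word lattice idea (not the model described above), the toy sketch below matches a small lexicon against a character sequence and records, for each character, the words that begin, continue, or end there; the function name `match_lexicon` and the example lexicon are invented for this sketch.

```python
# Minimal sketch: attach lexicon-matched words to each character of a Chinese
# sentence -- the kind of boundary information a character-word lattice encodes.
from collections import defaultdict

def match_lexicon(chars, lexicon, max_word_len=4):
    """For every character position, collect lexicon words that Begin,
    are inside (Middle), or End at that position."""
    feats = defaultdict(lambda: {"B": [], "M": [], "E": []})
    n = len(chars)
    for i in range(n):
        for j in range(i + 1, min(n, i + max_word_len) + 1):
            word = "".join(chars[i:j])
            if len(word) > 1 and word in lexicon:
                feats[i]["B"].append(word)          # word begins here
                feats[j - 1]["E"].append(word)      # word ends here
                for k in range(i + 1, j - 1):
                    feats[k]["M"].append(word)      # character inside the word
    return {i: feats[i] for i in range(n)}

if __name__ == "__main__":
    sentence = list("南京市长江大桥")
    lexicon = {"南京", "南京市", "市长", "长江", "大桥", "长江大桥"}
    for idx, f in match_lexicon(sentence, lexicon).items():
        print(idx, sentence[idx], f)
```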
Model editing techniques modify a minor proportion of knowledge in Large Language Models (LLMs) at relatively low cost and have demonstrated notable success. Existing methods assume that Transformer Layer (TL) hidden states are the values of key-value memories in the Feed-Forward Network (FFN). They usually optimize the TL hidden states to memorize the target knowledge and use them to update the FFN weights of LLMs. However, the information flow of TL hidden states comes from three parts: Multi-Head Self-Attention (MHSA), the FFN, and the residual connections; existing methods neglect the fact that...
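To make the key-value-memory view concrete, here is a minimal rank-one edit of an FFN projection in the spirit of locate-then-edit methods; it is a generic sketch, not the algorithm of the paper above, and `rank_one_edit` is a name invented for this example.

```python
# Illustrative rank-one edit of an FFN projection treated as a key-value memory.
# k is the FFN activation ("key") produced by the edit subject; v_target is the
# hidden state ("value") we want the layer to emit for that key.
import numpy as np

def rank_one_edit(W, k, v_target):
    """Return W' such that W' @ k == v_target, changing W only in the
    direction of k (a least-squares rank-one correction)."""
    k = k.astype(np.float64)
    residual = v_target - W @ k              # what the current weights get wrong
    return W + np.outer(residual, k) / (k @ k)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d_model, d_ff = 8, 32
    W = rng.normal(size=(d_model, d_ff))     # FFN down-projection
    k = rng.normal(size=d_ff)                # key for the fact being edited
    v_target = rng.normal(size=d_model)      # desired value (target hidden state)
    W_edited = rank_one_edit(W, k, v_target)
    print(np.allclose(W_edited @ k, v_target))   # True: the key now maps to the target
```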
Large Language Models (LLMs) exhibit strong general-purpose language capabilities. However, fine-tuning these models on domain-specific tasks often leads to catastrophic forgetting, where the model overwrites or loses essential knowledge acquired during pretraining. This phenomenon significantly limits the broader applicability of LLMs. To address this challenge, we propose a novel approach that computes the element-wise importance of the parameters crucial for preserving general knowledge during fine-tuning. Our method utilizes...
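Since the abstract is truncated before the estimator is specified, the following is only a generic sketch of element-wise importance scoring (diagonal squared gradients) and an importance-weighted penalty that discourages drift of important parameters during fine-tuning; the function names are invented for the example.

```python
# Generic element-wise parameter importance (diagonal-Fisher / EWC style) and a
# regularized fine-tuning loss that protects important elements.
import torch

def estimate_importance(model, data_loader, loss_fn):
    """Accumulate squared gradients per parameter element as an importance score."""
    importance = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    model.eval()
    for x, y in data_loader:
        model.zero_grad()
        loss_fn(model(x), y).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                importance[n] += p.grad.detach() ** 2
    return importance

def regularized_loss(model, task_loss, importance, ref_params, lam=1.0):
    """Task loss plus a penalty keeping important elements near their pretrained values."""
    penalty = sum((importance[n] * (p - ref_params[n]) ** 2).sum()
                  for n, p in model.named_parameters())
    return task_loss + lam * penalty

if __name__ == "__main__":
    model = torch.nn.Linear(4, 2)
    data = [(torch.randn(8, 4), torch.randint(0, 2, (8,)))]
    imp = estimate_importance(model, data, torch.nn.functional.cross_entropy)
    ref = {n: p.detach().clone() for n, p in model.named_parameters()}
    task_loss = torch.nn.functional.cross_entropy(model(data[0][0]), data[0][1])
    print(regularized_loss(model, task_loss, imp, ref, lam=0.1))
```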
Multimodal named entity recognition (MNER) is an emerging field that aims to automatically detect entities and classify their categories by utilizing input text and auxiliary resources such as images. While previous studies have leveraged object detectors to preprocess images and fuse textual semantics with the corresponding image features, these methods often overlook the potential finer-grained information within each modality and may exacerbate error propagation due to pre-detection. To address these issues, we propose...
The general capabilities of large language models (LLMs) make them the infrastructure for various AI applications, but updating their inner knowledge requires significant resources. Recent model editing is a promising technique for efficiently updating a small amount of knowledge in LLMs and has attracted much attention. In particular, local editing methods, which directly update model parameters, are proven to be suitable for updating small amounts of knowledge. Local methods update weights by computing closed-form least-squares solutions and identify the edited knowledge at the vector level...
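The closed-form least-squares view can be illustrated with plain ridge regression over key and value vectors; this is a generic sketch of the mathematical idea, not the specific update rule of any editing method mentioned above, and `solve_keys_to_values` is a name invented here.

```python
# Minimal closed-form least-squares fit of a weight matrix that maps a batch of
# key vectors to target value vectors (ridge-regularized).
import numpy as np

def solve_keys_to_values(K, V, lam=1e-3):
    """K: (d_k, n) keys, V: (d_v, n) target values.
    Returns W (d_v, d_k) minimizing ||W K - V||_F^2 + lam * ||W||_F^2."""
    d_k = K.shape[0]
    return V @ K.T @ np.linalg.inv(K @ K.T + lam * np.eye(d_k))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    K = rng.normal(size=(16, 5))      # five edited facts, 16-dim keys
    V = rng.normal(size=(8, 5))       # desired 8-dim values
    W = solve_keys_to_values(K, V)
    print(np.abs(W @ K - V).max())    # small residual: keys now map to the targets
```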
Multimodal Entity Linking (MEL) aims at linking ambiguous mentions with multimodal information to entities in a Knowledge Graph (KG) such as Wikipedia, which plays a key role in many applications. However, existing methods suffer from shortcomings, including modality impurity such as noise in the raw image and textual representation, which puts obstacles in the way of MEL. We formulate MEL as a neural text matching problem where each multimodal mention (text and image) is treated as a query, and the model learns the mapping from each query to the relevant candidate entities. This paper...
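A schematic of the "mention as query, entities as candidates" matching setup is sketched below with a toy dual encoder; the module, its dimensions, and the fusion by concatenation are placeholders rather than the paper's architecture.

```python
# Toy dual-encoder matcher: a fused (text + image) mention query is scored
# against candidate entity representations by cosine similarity.
import torch
import torch.nn as nn

class QueryEntityMatcher(nn.Module):
    def __init__(self, text_dim=32, image_dim=48, hidden=64):
        super().__init__()
        self.query_proj = nn.Linear(text_dim + image_dim, hidden)   # fuse mention text + image
        self.entity_proj = nn.Linear(text_dim, hidden)              # candidate entity encoder

    def forward(self, mention_text, mention_image, entity_texts):
        q = self.query_proj(torch.cat([mention_text, mention_image], dim=-1))
        e = self.entity_proj(entity_texts)
        q = nn.functional.normalize(q, dim=-1)
        e = nn.functional.normalize(e, dim=-1)
        return q @ e.T                          # similarity of the query to each candidate

if __name__ == "__main__":
    matcher = QueryEntityMatcher()
    scores = matcher(torch.randn(1, 32), torch.randn(1, 48), torch.randn(10, 32))
    print(scores.argmax(dim=-1))                # index of the best-matching candidate
```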
This review paper explores Multimodal Large Language Models (MLLMs), which integrate Large Language Models (LLMs) like GPT-4 to handle multimodal data such as text and vision. MLLMs demonstrate capabilities such as generating image narratives and answering image-based questions, bridging the gap towards real-world human-computer interactions and hinting at a potential pathway to artificial general intelligence. However, MLLMs still face challenges in processing the semantic gap in multimodality, which may lead to erroneous generation, posing risks to society...
Model editing has recently gained widespread attention. Current model editing methods primarily involve modifying model parameters or adding additional modules to the existing model. However, the former causes irreversible damage to LLMs, while the latter incurs additional inference overhead, and its fuzzy vector matching is not always reliable. To address these issues, we propose an expandable Subject Word Embedding Altering (SWEA) framework, which modifies the representation of subjects to achieve the goal of knowledge editing during the inference stage. SWEA uses...
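The general idea of editing knowledge by altering subject embeddings at inference time, rather than touching model weights, can be sketched as follows; this wrapper is a simplified stand-in and does not reproduce SWEA's actual fusion of editing and subject embeddings.

```python
# Simplified sketch: wrap a token embedding and add a stored editing vector to
# registered subject tokens, leaving the model weights untouched (and removable).
import torch
import torch.nn as nn

class EditableEmbedding(nn.Module):
    def __init__(self, base_embedding):
        super().__init__()
        self.base = base_embedding
        self.edits = {}                      # token_id -> editing vector

    def register_edit(self, token_id, vector):
        self.edits[token_id] = vector

    def forward(self, input_ids):
        out = self.base(input_ids)
        for tid, vec in self.edits.items():
            out = out + (input_ids == tid).unsqueeze(-1).float() * vec
        return out

if __name__ == "__main__":
    emb = EditableEmbedding(nn.Embedding(100, 16))
    emb.register_edit(token_id=7, vector=torch.ones(16))
    ids = torch.tensor([[3, 7, 9]])
    print(emb(ids)[0, 1][:4])                # token 7 now carries the edit offset
```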
Multimodal entity linking (MEL) aims to utilize multimodal information (usually textual and visual information) to link ambiguous mentions to unambiguous entities in a knowledge base. Current methods face three main issues: (1) treating the entire image as input, which may contain redundant information; (2) insufficient utilization of entity-related information, such as attributes and images; (3) semantic inconsistency between the knowledge base and its representation. To this end, we propose DWE+ for multimodal entity linking. DWE+ could capture finer...
Multimodal aspect-based sentiment analysis (MABSA) aims to understand opinions in a granular manner, advancing human-computer interaction and other fields. Traditionally, MABSA methods use a joint prediction approach to identify aspects and sentiments simultaneously. However, we argue that joint models are not always superior. Our analysis shows that they struggle to align relevant text tokens with image patches, leading to misalignment and ineffective image utilization. In contrast, a pipeline framework first identifies aspect terms through MATE...
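The pipeline-versus-joint distinction can be made concrete with a schematic two-stage flow, where a first stage proposes aspect spans and a second stage assigns a polarity to each span; the stage models below are toy stand-ins, not the paper's networks.

```python
# Schematic MABSA pipeline: stage 1 extracts aspect spans, stage 2 classifies
# the sentiment of each span using the text and the image feature.
from typing import Callable, List, Tuple

def pipeline_mabsa(tokens: List[str],
                   image_feature,
                   extract_aspects: Callable[[List[str], object], List[Tuple[int, int]]],
                   classify_sentiment: Callable[[List[str], Tuple[int, int], object], str]):
    results = []
    for span in extract_aspects(tokens, image_feature):
        polarity = classify_sentiment(tokens, span, image_feature)
        results.append((" ".join(tokens[span[0]:span[1]]), polarity))
    return results

if __name__ == "__main__":
    # Toy stand-ins so the pipeline runs end to end.
    toy_extract = lambda toks, img: [(1, 2)]                 # pretend "battery" is an aspect
    toy_classify = lambda toks, span, img: "negative"
    print(pipeline_mabsa("the battery drains fast".split(), None, toy_extract, toy_classify))
```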
Our study delves into Multimodal Entity Linking, aligning the mention in multimodal information with entities in a knowledge base. Existing methods are still facing challenges like ambiguous entity representations and limited image information utilization. Thus, we propose dynamic entity extraction using ChatGPT, which dynamically extracts entities and enhances the datasets. We also propose a method, Dynamically Integrate Multimodal information with knowledge base (DIM), employing the capability of a Large Language Model (LLM) for visual understanding. The LLM, such as BLIP-2,...
Knowledge editing has emerged as an efficient technology for updating the knowledge of large language models (LLMs), attracting increasing attention in recent years. However, there is a lack of effective measures to prevent the malicious misuse of this technology, which could lead to harmful edits in LLMs. These malicious modifications could cause LLMs to generate toxic content, misleading users into inappropriate actions. In the face of this risk, we introduce a new task, Knowledge Editing Type Identification (KETI), aimed at identifying different...
Multimodal Aspect-Based Sentiment Analysis (MABSA) aims to extract aspect terms and their corresponding sentiment polarities from multimodal information, including text and images. While traditional supervised learning methods have shown effectiveness in this task, the adaptability of large language models (LLMs) to MABSA remains uncertain. Recent advances in LLMs, such as Llama2, LLaVA, and ChatGPT, demonstrate strong capabilities in general tasks, yet their performance in complex and fine-grained scenarios like MABSA is...
Multimodal large language models (MLLMs) have shown remarkable progress in high-level semantic tasks such as visual question answering, image captioning, and emotion recognition. However, despite these advancements, there remains a lack of standardized benchmarks for evaluating MLLMs' performance in multi-object sentiment analysis, a key task in semantic understanding. To address this gap, we introduce MOSABench, a novel evaluation dataset designed specifically for multi-object sentiment analysis. MOSABench includes approximately 1,000 images...
Paper quality evaluation is of great significance, as it helps to select high-quality papers from the massive amount of academic papers. However, existing models need improvement in the interaction and aggregation of the hierarchical structure, and they also ignore the guiding role of the title and abstract in the paper text. To address the above two issues, we propose a well-designed modular hierarchical model (MHM) for paper quality evaluation. Firstly, the input of our model is mostly the paper text, with no additional information needed. Secondly, we fully exploit the inherent hierarchy of the text with...
Multi-turn dialogue is challenging because the semantic information is contained not only in the current utterance but also in the context. In fact, understanding multi-turn dialogue is a dynamic process: as the number of turns increases, what the user expresses keeps changing. In this case, we propose a network based on utterance hidden state transfer for task-oriented dialogue (USET). In our model, we first extract the hidden state of the previous utterance as its comprehension. Then, the comprehension is passed to the next turn, where we take it as prior knowledge to understand the current utterance. Finally, we put the prior knowledge and the current utterance together. In order to realize...
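The notion of passing an utterance-level hidden state forward as prior knowledge for the next turn can be sketched with a generic GRU encoder; this is an assumption-laden toy, not the USET architecture itself.

```python
# Toy turn-level state transfer: the final hidden state of each turn's encoder
# initializes the encoder of the next turn as prior knowledge.
import torch
import torch.nn as nn

class TurnStateTransfer(nn.Module):
    def __init__(self, emb_dim=16, hidden=32, vocab=100):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb_dim)
        self.encoder = nn.GRU(emb_dim, hidden, batch_first=True)

    def forward(self, turns):
        """turns: list of (batch, seq_len) token-id tensors, one per dialogue turn."""
        prior = None                                   # no prior knowledge at turn 1
        outputs = []
        for utterance in turns:
            enc_out, prior = self.encoder(self.embed(utterance), prior)
            outputs.append(enc_out)                    # per-turn representations
        return outputs, prior

if __name__ == "__main__":
    model = TurnStateTransfer()
    turns = [torch.randint(0, 100, (2, 5)) for _ in range(3)]
    outs, last_state = model(turns)
    print(len(outs), outs[-1].shape, last_state.shape)
```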
Mainstream SLU models, such as SDEN, adopt joint training of slot filling and intent detection because of their correlation, and add contextual information through a context vector to improve model performance. Although these models have proved effective, this design also brings challenges for slot filling: the decoder is fed with a deep-layer semantic encoding without alignment information, which will affect slot filling, and history utterances are attenuated in the context vector by the repeated fusion process, which is not conducive to further improvement. In order to solve...
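For readers unfamiliar with the joint setup, the sketch below shows a generic shared encoder with a per-token slot head and a per-utterance intent head trained together; it is illustrative only and is not SDEN.

```python
# Generic joint SLU model: one shared encoder, a token-level slot-filling head,
# and an utterance-level intent-detection head (losses would be summed in training).
import torch
import torch.nn as nn

class JointSLU(nn.Module):
    def __init__(self, vocab=100, emb=16, hidden=32, n_slots=10, n_intents=5):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.encoder = nn.LSTM(emb, hidden, batch_first=True, bidirectional=True)
        self.slot_head = nn.Linear(2 * hidden, n_slots)       # per-token slot tags
        self.intent_head = nn.Linear(2 * hidden, n_intents)   # per-utterance intent

    def forward(self, token_ids):
        enc, _ = self.encoder(self.embed(token_ids))
        slot_logits = self.slot_head(enc)                      # (batch, seq, n_slots)
        intent_logits = self.intent_head(enc.mean(dim=1))      # pooled utterance vector
        return slot_logits, intent_logits

if __name__ == "__main__":
    model = JointSLU()
    slots, intent = model(torch.randint(0, 100, (2, 7)))
    print(slots.shape, intent.shape)
```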
Recent dialogue state tracking (DST) models usually treat the utterance, system action, and ontology equally to estimate the slot types and values. In this way, the expression of user semantics in the utterance is restricted. As the main way to directly express user semantics, the utterance should receive further attention, and its proportion in the semantic representation should be dynamic according to its content. It is common to recognize the different importance of information in all DST models; however, most of them pay little attention to position in the utterance. In fact, semantics are related to position due to human grammatical...