Shezheng Song

ORCID: 0009-0007-9985-7619
Research Areas
  • Topic Modeling
  • Natural Language Processing Techniques
  • Sentiment Analysis and Opinion Mining
  • Computational and Text Analysis Methods
  • Advanced Text Analysis Techniques
  • Web Data Mining and Analysis
  • Text and Document Classification Technologies
  • Speech and Dialogue Systems
  • Speech Recognition and Synthesis
  • Advanced Graph Neural Networks
  • Model-Driven Software Engineering Techniques
  • Semantic Web and Ontologies
  • Seismology and Earthquake Studies
  • Context-Aware Activity Recognition Systems
  • Multimodal Machine Learning Applications
  • Data Quality and Management
  • Domain Adaptation and Few-Shot Learning

National University of Defense Technology
2021-2025

Recently, character-word lattice structures have achieved promising results for Chinese named entity recognition (NER), reducing word segmentation errors and adding boundary information to character sequences. However, constructing the structure is complex and time-consuming, so these lattice-based models usually suffer from low inference speed. Moreover, the quality of the lexicon affects the accuracy of the NER model: noise words can potentially confuse the NER model, while limited lexicon coverage can cause it to degenerate into...

10.1109/tnnls.2025.3528416 article EN IEEE Transactions on Neural Networks and Learning Systems 2025-01-01

Model editing techniques modify a minor proportion of knowledge in Large Language Models (LLMs) at relatively low cost and have demonstrated notable success. Existing methods assume that Transformer Layer (TL) hidden states are the values of the key-value memories of the Feed-Forward Network (FFN). They usually optimize the TL hidden states to memorize target knowledge and use them to update the FFN weights of LLMs. However, the information flow of TL hidden states comes from three parts: Multi-Head Self-Attention (MHSA), the FFN, and residual connections. Existing methods neglect the fact that...

10.1609/aaai.v38i17.29818 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2024-03-24
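The key-value-memory view of FFN weights that this line of work builds on can be sketched as a rank-one weight update: choose a key vector representing the edited subject and force the FFN's projection to map it to a target value, leaving other directions untouched. The toy dimensions, random vectors, and simplified update rule below are illustrative assumptions, not the paper's actual procedure.

```python
import numpy as np

# Toy FFN projection: maps key activations k to value vectors v = W @ k.
rng = np.random.default_rng(0)
d_k, d_v = 8, 4
W = rng.normal(size=(d_v, d_k))

# A hypothetical "key" for the edited subject and the target "value" the
# FFN should now produce for it.
k_star = rng.normal(size=d_k)
v_star = rng.normal(size=d_v)

# Rank-one update (simplified locate-then-edit style): pick delta so that
# (W + delta) @ k_star == v_star, while any direction orthogonal to k_star
# is mapped exactly as before.
residual = v_star - W @ k_star
delta = np.outer(residual, k_star) / (k_star @ k_star)
W_edited = W + delta

# The edited weights map k_star to the target value.
assert np.allclose(W_edited @ k_star, v_star)
```

Because `delta` is an outer product with `k_star`, it vanishes on vectors orthogonal to `k_star`, which is why such local edits leave most of the model's behavior unchanged.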

Large Language Models (LLMs) exhibit strong general-purpose language capabilities. However, fine-tuning these models on domain-specific tasks often leads to catastrophic forgetting, where the model overwrites or loses essential knowledge acquired during pretraining. This phenomenon significantly limits the broader applicability of LLMs. To address this challenge, we propose a novel approach to compute the element-wise importance of parameters crucial for preserving general knowledge during fine-tuning. Our method utilizes...

10.48550/arxiv.2501.13669 preprint EN arXiv (Cornell University) 2025-01-23
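The notion of element-wise parameter importance can be illustrated with a Fisher-information-style estimate, as used in EWC-like regularization: score each parameter element by the mean squared gradient of the loss. The linear model and data below are hypothetical stand-ins for the idea, not the paper's actual importance measure.

```python
import numpy as np

# Hypothetical setup: a linear model theta fit on data (X, y).
rng = np.random.default_rng(1)
n, d = 64, 5
X = rng.normal(size=(n, d))
theta = rng.normal(size=d)
y = X @ theta + 0.1 * rng.normal(size=n)

# Per-example squared-error loss L = 0.5 * (x @ theta - y)^2 has gradient
# (x @ theta - y) * x with respect to theta.
grads = (X @ theta - y)[:, None] * X        # per-example gradients, shape (n, d)

# Fisher-style element-wise importance: mean squared gradient per element.
importance = np.mean(grads**2, axis=0)

# During fine-tuning, high-importance elements could be frozen or penalized,
# e.g. an EWC-style term: lam * sum(importance * (theta_new - theta)**2).
frozen_mask = importance > np.median(importance)
```

Elements with large average squared gradients are those the pretraining loss is most sensitive to; protecting them is what mitigates catastrophic forgetting in this family of methods.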

Multimodal named entity recognition (MNER) is an emerging field that aims to automatically detect entities and classify their categories, utilizing input text and auxiliary resources such as images. While previous studies have leveraged object detectors to preprocess images and fuse textual semantics with corresponding image features, these methods often overlook the potential finer-grained information within each modality and may exacerbate error propagation due to pre-detection. To address these issues, we propose...

10.1109/tnnls.2025.3528567 article EN IEEE Transactions on Neural Networks and Learning Systems 2025-01-01

The general capabilities of large language models (LLMs) make them the infrastructure for various AI applications, but updating their inner knowledge requires significant resources. Recent model editing is a promising technique for efficiently updating a small amount of knowledge in LLMs and has attracted much attention. In particular, local editing methods, which directly update model parameters, have proven suitable for updating small amounts of knowledge. Local editing methods update weights by computing least-squares closed-form solutions and identify edited knowledge at the vector level...

10.1609/aaai.v39i23.34628 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2025-04-11

Multimodal Entity Linking (MEL) aims at linking ambiguous mentions with multimodal information to entities in a Knowledge Graph (KG) such as Wikipedia, which plays a key role in many applications. However, existing methods suffer from shortcomings, including modality impurity such as noise in the raw image and ambiguous textual representation, which puts obstacles in the way of MEL. We formulate MEL as a neural text matching problem where each modality (text or image) is treated as a query, and the model learns a mapping from each query to relevant candidate entities. This paper...

10.1609/aaai.v38i17.29867 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2024-03-24
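The text-matching formulation can be sketched with a toy dual encoder: queries built from the mention and candidate entities are embedded in the same space, and linking reduces to nearest-neighbor scoring. The bag-of-words "encoder", the fixed vocabulary, and the example entities below are illustrative stand-ins for the learned encoders in the paper.

```python
import numpy as np

# Tiny fixed vocabulary for the toy encoder (an assumption for illustration).
VOCAB = ["michael", "jordan", "basketball", "player", "professor",
         "machine", "learning"]

def encode(text):
    """Bag-of-words count vector over VOCAB, L2-normalized."""
    v = np.array([text.lower().split().count(w) for w in VOCAB], dtype=float)
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

# Query built from the mention's text (image features would be fused in
# alongside text in the full multimodal model).
query = encode("michael jordan basketball")

# Candidate KG entities, encoded the same way.
candidates = {
    "Michael Jordan (basketball player)": encode("michael jordan basketball player"),
    "Michael I. Jordan (professor)": encode("michael jordan machine learning professor"),
}

# Rank candidates by cosine similarity (dot product of unit vectors).
scores = {name: float(query @ vec) for name, vec in candidates.items()}
best = max(scores, key=scores.get)
# best → "Michael Jordan (basketball player)"
```

Treating each modality as its own query, as the abstract describes, would simply produce one such ranking per modality before the scores are combined.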

Model editing techniques modify a minor proportion of knowledge in Large Language Models (LLMs) at relatively low cost and have demonstrated notable success. Existing methods assume that Transformer Layer (TL) hidden states are the values of the key-value memories of the Feed-Forward Network (FFN). They usually optimize the TL hidden states to memorize target knowledge and use them to update the FFN weights of LLMs. However, the information flow of TL hidden states comes from three parts: Multi-Head Self-Attention (MHSA), the FFN, and residual connections. Existing methods neglect the fact that...

10.48550/arxiv.2308.08742 preprint EN other-oa arXiv (Cornell University) 2023-01-01

This review paper explores Multimodal Large Language Models (MLLMs), which integrate Large Language Models (LLMs) like GPT-4 to handle multimodal data such as text and vision. MLLMs demonstrate capabilities such as generating image narratives and answering image-based questions, bridging the gap towards real-world human-computer interactions and hinting at a potential pathway to artificial general intelligence. However, MLLMs still face challenges in processing the semantic gap in multimodality, which may lead to erroneous generation, posing risks to society...

10.48550/arxiv.2311.07594 preprint EN cc-by-nc-nd arXiv (Cornell University) 2023-01-01

Model editing has recently gained widespread attention. Current model editing methods primarily involve modifying model parameters or adding additional modules to the existing model. However, the former causes irreversible damage to LLMs, while the latter incurs additional inference overhead, and its fuzzy vector matching is not always reliable. To address these issues, we propose an expandable Subject Word Embedding Altering (SWEA) framework, which modifies the representation of subjects to achieve the goal of editing knowledge during the inference stage. SWEA uses...

10.48550/arxiv.2401.17809 preprint EN arXiv (Cornell University) 2024-01-31
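The inference-time embedding-altering idea can be sketched as adding a per-subject editing vector to the subject token's input embedding while leaving all model weights untouched, which also makes the edit trivially reversible. The embedding table, subject key, and random editing vector below are hypothetical, chosen only to illustrate the mechanism.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 8

# Hypothetical input-embedding table with one subject entry.
embedding_table = {"eiffel_tower": rng.normal(size=d)}

# Hypothetical editing vector for this subject (learned in the real
# framework; random here purely for illustration).
edit_vec = rng.normal(size=d)

def embed(token, edits):
    """Return the input embedding, plus an editing vector if one is active."""
    base = embedding_table[token]
    return base + edits.get(token, np.zeros_like(base))

# With the edit active, the subject's representation shifts by edit_vec;
# removing the entry from `edits` restores the original behavior exactly.
edited = embed("eiffel_tower", {"eiffel_tower": edit_vec})
unedited = embed("eiffel_tower", {})
assert np.allclose(edited - unedited, edit_vec)
```

Since the lookup-and-add happens outside the frozen network, multiple edits can be stacked or withdrawn without any weight surgery, which is the contrast the abstract draws with parameter-modifying methods.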

Multimodal entity linking (MEL) aims to utilize multimodal information (usually textual and visual information) to link ambiguous mentions to unambiguous entities in a knowledge base. Current methods face three main issues: (1) treating the entire image as input may contain redundant information; (2) insufficient utilization of entity-related information, such as attributes in images; (3) semantic inconsistency between the knowledge base and its representation. To this end, we propose DWE+ for multimodal entity linking. DWE+ could capture finer...

10.48550/arxiv.2404.04818 preprint EN arXiv (Cornell University) 2024-04-07

Multimodal aspect-based sentiment analysis (MABSA) aims to understand opinions in a granular manner, advancing human-computer interaction and other fields. Traditionally, MABSA methods use a joint prediction approach to identify aspects and sentiments simultaneously. However, we argue that joint models are not always superior. Our analysis shows that joint models struggle to align relevant text tokens with image patches, leading to misalignment and ineffective image utilization. In contrast, a pipeline framework first identifies aspects through MATE...

10.48550/arxiv.2406.00017 preprint EN arXiv (Cornell University) 2024-05-22

Our study delves into Multimodal Entity Linking, aligning the mention in multimodal information with entities in a knowledge base. Existing methods are still facing challenges like ambiguous entity representations and limited image utilization. Thus, we propose dynamic entity extraction using ChatGPT, which dynamically extracts entities and enhances datasets. We also propose a method, Dynamically Integrate Multimodal Entity Linking with knowledge base (DIM), employing the capability of a Large Language Model (LLM) for visual understanding. The LLM, such as BLIP-2,...

10.48550/arxiv.2407.12019 preprint EN arXiv (Cornell University) 2024-06-27

Knowledge editing has emerged as an efficient technology for updating the knowledge of large language models (LLMs), attracting increasing attention in recent years. However, there is a lack of effective measures to prevent the malicious misuse of this technology, which could lead to harmful edits in LLMs. These malicious modifications could cause LLMs to generate toxic content, misleading users into inappropriate actions. In front of this risk, we introduce a new task, Knowledge Editing Type Identification (KETI), aimed at identifying different...

10.48550/arxiv.2409.19663 preprint EN arXiv (Cornell University) 2024-09-29

Multimodal Aspect-Based Sentiment Analysis (MABSA) aims to extract aspect terms and their corresponding sentiment polarities from multimodal information, including text and images. While traditional supervised learning methods have shown effectiveness in this task, the adaptability of large language models (LLMs) to MABSA remains uncertain. Recent advances in LLMs, such as Llama2, LLaVA, and ChatGPT, demonstrate strong capabilities in general tasks, yet their performance in complex and fine-grained scenarios like MABSA is...

10.48550/arxiv.2411.15408 preprint EN arXiv (Cornell University) 2024-11-22

Multimodal large language models (MLLMs) have shown remarkable progress in high-level semantic tasks such as visual question answering, image captioning, and emotion recognition. However, despite these advancements, there remains a lack of standardized benchmarks for evaluating MLLM performance in multi-object sentiment analysis, a key task in semantic understanding. To address this gap, we introduce MOSABench, a novel evaluation dataset designed specifically for multi-object sentiment analysis. MOSABench includes approximately 1,000 images...

10.48550/arxiv.2412.00060 preprint EN arXiv (Cornell University) 2024-11-25

Paper quality evaluation is of great significance as it helps to select high-quality papers from the massive amount of academic papers. However, existing models need improvement on the interaction and aggregation of the hierarchical structure. These models also ignore the guiding role of the title and abstract in the paper text. To address the above two issues, we propose a well-designed modular hierarchical model (MHM) for paper quality evaluation. Firstly, the input of our model is most of the paper text, with no additional information needed. Secondly, we fully exploit the inherent hierarchy of the text with...

10.5121/csit.2023.130702 article EN Artificial Intelligence Advances 2023-04-29

Multimodal Entity Linking (MEL) aims at linking ambiguous mentions with multimodal information to entities in a Knowledge Graph (KG) such as Wikipedia, which plays a key role in many applications. However, existing methods suffer from shortcomings, including modality impurity such as noise in the raw image and ambiguous textual representation, which puts obstacles in the way of MEL. We formulate MEL as a neural text matching problem where each modality (text or image) is treated as a query, and the model learns a mapping from each query to relevant candidate entities. This paper...

10.48550/arxiv.2312.11816 preprint EN cc-by-nc-nd arXiv (Cornell University) 2023-01-01

Multi-turn dialogue is challenging because semantic information is not only contained in the current utterance, but also in the context. In fact, understanding multi-turn dialogue is a dynamic process: as the number of turns increases, users' intentions keep changing. For this case, we propose a network based on utterance hidden state transfer for task-oriented dialogue (USET). In our model, we first extract the hidden state of the previous utterance as its comprehension. Then, the comprehension is passed to the next turn, where we take it as prior knowledge to understand the current utterance. Finally, we put the prior knowledge and the current utterance together. In order to realize...

10.1109/ijcnn52387.2021.9534450 article EN 2022 International Joint Conference on Neural Networks (IJCNN) 2021-07-18

Mainstream SLU models, such as SDEN, adopt joint training of slot filling and intent detection because of their correlation, and add contextual information to improve model performance via a context vector. Although these models have proved effective, the approach also brings challenges for slot filling: the decoder is fed with deep-layer semantic encoding without alignment information, which will affect slot filling, and history utterances are attenuated in the context vector by the repeated fusion process, which is not conducive to performance improvement. In order to solve...

10.1109/ictai52525.2021.00201 article EN 2021 IEEE 33rd International Conference on Tools with Artificial Intelligence (ICTAI) 2021-11-01

Recent dialogue state tracking (DST) models usually treat the utterance, system action, and ontology equally to estimate the slot types and values. In this way, the expression of user semantics in the utterance is restricted. As the main way to directly express user semantics, the utterance should receive further attention, and its proportion in the semantic representation should be dynamic according to its content. It is common to recognize the different importance of information in all DST models. However, most of them pay little attention to the position of information in the utterance. In fact, semantics and position are related due to human grammatical...

10.1109/ictai52525.2021.00154 article EN 2021 IEEE 33rd International Conference on Tools with Artificial Intelligence (ICTAI) 2021-11-01