- Topic Modeling
- Natural Language Processing Techniques
- Sentiment Analysis and Opinion Mining
- Computational and Text Analysis Methods
- Advanced Text Analysis Techniques
- Web Data Mining and Analysis
- Text and Document Classification Technologies
- Speech and Dialogue Systems
- Speech Recognition and Synthesis
- Advanced Graph Neural Networks
- Model-Driven Software Engineering Techniques
- Semantic Web and Ontologies
- Seismology and Earthquake Studies
- Context-Aware Activity Recognition Systems
- Multimodal Machine Learning Applications
- Data Quality and Management
- Domain Adaptation and Few-Shot Learning
National University of Defense Technology
2021-2025
Recently, character-word lattice structures have achieved promising results for Chinese named entity recognition (NER), reducing word segmentation errors and adding word boundary information to character sequences. However, constructing the lattice structure is complex and time-consuming, so lattice-based models usually suffer from low inference speed. Moreover, the quality of the lexicon affects the accuracy of the NER model: noise words can potentially confuse the model, while limited coverage can cause it to degenerate into...
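As a rough illustration of the character-word lattice idea (not the model described above), the toy sketch below matches a small lexicon against a character sequence and records, for each character, the words that begin, continue, or end there; the function name `match_lexicon` and the example lexicon are invented for this sketch.

```python
# Minimal sketch: attach lexicon-matched words to each character of a Chinese
# sentence -- the kind of boundary information a character-word lattice encodes.
from collections import defaultdict

def match_lexicon(chars, lexicon, max_word_len=4):
    """For every character position, collect lexicon words that Begin,
    are inside (Middle), or End at that position."""
    feats = defaultdict(lambda: {"B": [], "M": [], "E": []})
    n = len(chars)
    for i in range(n):
        for j in range(i + 1, min(n, i + max_word_len) + 1):
            word = "".join(chars[i:j])
            if len(word) > 1 and word in lexicon:
                feats[i]["B"].append(word)          # word begins here
                feats[j - 1]["E"].append(word)      # word ends here
                for k in range(i + 1, j - 1):
                    feats[k]["M"].append(word)      # character inside the word
    return {i: feats[i] for i in range(n)}

if __name__ == "__main__":
    sentence = list("南京市长江大桥")
    lexicon = {"南京", "南京市", "市长", "长江", "大桥", "长江大桥"}
    for idx, f in match_lexicon(sentence, lexicon).items():
        print(idx, sentence[idx], f)
```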
Model editing techniques modify a minor proportion of knowledge in Large Language Models (LLMs) at relatively low cost and have demonstrated notable success. Existing methods assume that Transformer Layer (TL) hidden states are the values of key-value memories in the Feed-Forward Network (FFN). They usually optimize the TL hidden states to memorize the target knowledge and use them to update the FFN weights of LLMs. However, the information flow of TL hidden states comes from three parts: Multi-Head Self-Attention (MHSA), the FFN, and the residual connections; existing methods neglect the fact that...
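To make the key-value-memory view concrete, here is a minimal rank-one edit of an FFN projection in the spirit of locate-then-edit methods; it is a generic sketch, not the algorithm of the paper above, and `rank_one_edit` is a name invented for this example.

```python
# Illustrative rank-one edit of an FFN projection treated as a key-value memory.
# k is the FFN activation ("key") produced by the edit subject; v_target is the
# hidden state ("value") we want the layer to emit for that key.
import numpy as np

def rank_one_edit(W, k, v_target):
    """Return W' such that W' @ k == v_target, changing W only in the
    direction of k (a least-squares rank-one correction)."""
    k = k.astype(np.float64)
    residual = v_target - W @ k              # what the current weights get wrong
    return W + np.outer(residual, k) / (k @ k)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d_model, d_ff = 8, 32
    W = rng.normal(size=(d_model, d_ff))     # FFN down-projection
    k = rng.normal(size=d_ff)                # key for the fact being edited
    v_target = rng.normal(size=d_model)      # desired value (target hidden state)
    W_edited = rank_one_edit(W, k, v_target)
    print(np.allclose(W_edited @ k, v_target))   # True: the key now maps to the target
```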
Large Language Models (LLMs) exhibit strong general-purpose language capabilities. However, fine-tuning these models on domain-specific tasks often leads to catastrophic forgetting, where the model overwrites or loses essential knowledge acquired during pretraining. This phenomenon significantly limits the broader applicability of LLMs. To address this challenge, we propose a novel approach that computes the element-wise importance of the parameters crucial for preserving general knowledge during fine-tuning. Our method utilizes...
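Since the abstract is truncated before the estimator is specified, the following is only a generic sketch of element-wise importance scoring (diagonal squared gradients) and an importance-weighted penalty that discourages drift of important parameters during fine-tuning; the function names are invented for the example.

```python
# Generic element-wise parameter importance (diagonal-Fisher / EWC style) and a
# regularized fine-tuning loss that protects important elements.
import torch

def estimate_importance(model, data_loader, loss_fn):
    """Accumulate squared gradients per parameter element as an importance score."""
    importance = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    model.eval()
    for x, y in data_loader:
        model.zero_grad()
        loss_fn(model(x), y).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                importance[n] += p.grad.detach() ** 2
    return importance

def regularized_loss(model, task_loss, importance, ref_params, lam=1.0):
    """Task loss plus a penalty keeping important elements near their pretrained values."""
    penalty = sum((importance[n] * (p - ref_params[n]) ** 2).sum()
                  for n, p in model.named_parameters())
    return task_loss + lam * penalty

if __name__ == "__main__":
    model = torch.nn.Linear(4, 2)
    data = [(torch.randn(8, 4), torch.randint(0, 2, (8,)))]
    imp = estimate_importance(model, data, torch.nn.functional.cross_entropy)
    ref = {n: p.detach().clone() for n, p in model.named_parameters()}
    task_loss = torch.nn.functional.cross_entropy(model(data[0][0]), data[0][1])
    print(regularized_loss(model, task_loss, imp, ref, lam=0.1))
```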
Multimodal named entity recognition (MNER) is an emerging field that aims to automatically detect entities and classify their categories by utilizing input text and auxiliary resources such as images. While previous studies have leveraged object detectors to preprocess images and fuse textual semantics with the corresponding image features, these methods often overlook the potential finer-grained information within each modality and may exacerbate error propagation due to pre-detection. To address these issues, we propose...
The general capabilities of large language models (LLMs) make them the infrastructure for various AI applications, but updating their inner knowledge requires significant resources. Recent model editing is a promising technique for efficiently updating a small amount of knowledge in LLMs and has attracted much attention. In particular, local editing methods, which directly update model parameters, are proven to be suitable for updating small amounts of knowledge. Local methods update weights by computing closed-form least-squares solutions and identify the edited knowledge at the vector level...
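The closed-form least-squares view can be illustrated with plain ridge regression over key and value vectors; this is a generic sketch of the mathematical idea, not the specific update rule of any editing method mentioned above, and `solve_keys_to_values` is a name invented here.

```python
# Minimal closed-form least-squares fit of a weight matrix that maps a batch of
# key vectors to target value vectors (ridge-regularized).
import numpy as np

def solve_keys_to_values(K, V, lam=1e-3):
    """K: (d_k, n) keys, V: (d_v, n) target values.
    Returns W (d_v, d_k) minimizing ||W K - V||_F^2 + lam * ||W||_F^2."""
    d_k = K.shape[0]
    return V @ K.T @ np.linalg.inv(K @ K.T + lam * np.eye(d_k))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    K = rng.normal(size=(16, 5))      # five edited facts, 16-dim keys
    V = rng.normal(size=(8, 5))       # desired 8-dim values
    W = solve_keys_to_values(K, V)
    print(np.abs(W @ K - V).max())    # small residual: keys now map to the targets
```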
Multimodal Entity Linking (MEL) aims at linking ambiguous mentions with multimodal information to entities in a Knowledge Graph (KG) such as Wikipedia, which plays a key role in many applications. However, existing methods suffer from shortcomings, including modality impurity such as noise in the raw image and textual representation, which puts obstacles in the way of MEL. We formulate MEL as a neural text matching problem where each multimodal mention (text and image) is treated as a query, and the model learns the mapping from each query to the relevant candidate entities. This paper...
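A schematic of the "mention as query, entities as candidates" matching setup is sketched below with a toy dual encoder; the module, its dimensions, and the fusion by concatenation are placeholders rather than the paper's architecture.

```python
# Toy dual-encoder matcher: a fused (text + image) mention query is scored
# against candidate entity representations by cosine similarity.
import torch
import torch.nn as nn

class QueryEntityMatcher(nn.Module):
    def __init__(self, text_dim=32, image_dim=48, hidden=64):
        super().__init__()
        self.query_proj = nn.Linear(text_dim + image_dim, hidden)   # fuse mention text + image
        self.entity_proj = nn.Linear(text_dim, hidden)              # candidate entity encoder

    def forward(self, mention_text, mention_image, entity_texts):
        q = self.query_proj(torch.cat([mention_text, mention_image], dim=-1))
        e = self.entity_proj(entity_texts)
        q = nn.functional.normalize(q, dim=-1)
        e = nn.functional.normalize(e, dim=-1)
        return q @ e.T                          # similarity of the query to each candidate

if __name__ == "__main__":
    matcher = QueryEntityMatcher()
    scores = matcher(torch.randn(1, 32), torch.randn(1, 48), torch.randn(10, 32))
    print(scores.argmax(dim=-1))                # index of the best-matching candidate
```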
This review paper explores Multimodal Large Language Models (MLLMs), which integrate Large Language Models (LLMs) like GPT-4 to handle multimodal data such as text and vision. MLLMs demonstrate capabilities such as generating image narratives and answering image-based questions, bridging the gap towards real-world human-computer interactions and hinting at a potential pathway to artificial general intelligence. However, MLLMs still face challenges in processing the semantic gap in multimodality, which may lead to erroneous generation, posing risks to society...
Model editing has recently gained widespread attention. Current model editing methods primarily involve modifying model parameters or adding additional modules to the existing model. However, the former causes irreversible damage to LLMs, while the latter incurs additional inference overhead, and its fuzzy vector matching is not always reliable. To address these issues, we propose an expandable Subject Word Embedding Altering (SWEA) framework, which modifies the representation of subjects to achieve the goal of knowledge editing during the inference stage. SWEA uses...
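The general idea of editing knowledge by altering subject embeddings at inference time, rather than touching model weights, can be sketched as follows; this wrapper is a simplified stand-in and does not reproduce SWEA's actual fusion of editing and subject embeddings.

```python
# Simplified sketch: wrap a token embedding and add a stored editing vector to
# registered subject tokens, leaving the model weights untouched (and removable).
import torch
import torch.nn as nn

class EditableEmbedding(nn.Module):
    def __init__(self, base_embedding):
        super().__init__()
        self.base = base_embedding
        self.edits = {}                      # token_id -> editing vector

    def register_edit(self, token_id, vector):
        self.edits[token_id] = vector

    def forward(self, input_ids):
        out = self.base(input_ids)
        for tid, vec in self.edits.items():
            out = out + (input_ids == tid).unsqueeze(-1).float() * vec
        return out

if __name__ == "__main__":
    emb = EditableEmbedding(nn.Embedding(100, 16))
    emb.register_edit(token_id=7, vector=torch.ones(16))
    ids = torch.tensor([[3, 7, 9]])
    print(emb(ids)[0, 1][:4])                # token 7 now carries the edit offset
```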
Multimodal entity linking (MEL) aims to utilize multimodal information (usually textual and visual information) to link ambiguous mentions to unambiguous entities in a knowledge base. Current methods face three main issues: (1) treating the entire image as input, which may contain redundant information; (2) insufficient utilization of entity-related information, such as attributes and images; (3) semantic inconsistency between the knowledge base and its representation. To this end, we propose DWE+ for multimodal entity linking. DWE+ could capture finer...
Multimodal aspect-based sentiment analysis (MABSA) aims to understand opinions in a granular manner, advancing human-computer interaction and other fields. Traditionally, MABSA methods use a joint prediction approach to identify aspects and sentiments simultaneously. However, we argue that joint models are not always superior. Our analysis shows that they struggle to align relevant text tokens with image patches, leading to misalignment and ineffective image utilization. In contrast, a pipeline framework first identifies aspect terms through MATE...
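The pipeline-versus-joint distinction can be made concrete with a schematic two-stage flow, where a first stage proposes aspect spans and a second stage assigns a polarity to each span; the stage models below are toy stand-ins, not the paper's networks.

```python
# Schematic MABSA pipeline: stage 1 extracts aspect spans, stage 2 classifies
# the sentiment of each span using the text and the image feature.
from typing import Callable, List, Tuple

def pipeline_mabsa(tokens: List[str],
                   image_feature,
                   extract_aspects: Callable[[List[str], object], List[Tuple[int, int]]],
                   classify_sentiment: Callable[[List[str], Tuple[int, int], object], str]):
    results = []
    for span in extract_aspects(tokens, image_feature):
        polarity = classify_sentiment(tokens, span, image_feature)
        results.append((" ".join(tokens[span[0]:span[1]]), polarity))
    return results

if __name__ == "__main__":
    # Toy stand-ins so the pipeline runs end to end.
    toy_extract = lambda toks, img: [(1, 2)]                 # pretend "battery" is an aspect
    toy_classify = lambda toks, span, img: "negative"
    print(pipeline_mabsa("the battery drains fast".split(), None, toy_extract, toy_classify))
```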
Our study delves into Multimodal Entity Linking, aligning the mention in multimodal information with entities in a knowledge base. Existing methods are still facing challenges like ambiguous entity representations and limited image information utilization. Thus, we propose dynamic entity extraction using ChatGPT, which dynamically extracts entities and enhances the datasets. We also propose a method, Dynamically Integrate Multimodal information with knowledge base (DIM), employing the capability of a Large Language Model (LLM) for visual understanding. The LLM, such as BLIP-2,...
Knowledge editing has emerged as an efficient technology for updating the knowledge of large language models (LLMs), attracting increasing attention in recent years. However, there is a lack of effective measures to prevent the malicious misuse of this technology, which could lead to harmful edits in LLMs. These malicious modifications could cause LLMs to generate toxic content, misleading users into inappropriate actions. In the face of this risk, we introduce a new task, Knowledge Editing Type Identification (KETI), aimed at identifying different...
Multimodal Aspect-Based Sentiment Analysis (MABSA) aims to extract aspect terms and their corresponding sentiment polarities from multimodal information, including text and images. While traditional supervised learning methods have shown effectiveness in this task, the adaptability of large language models (LLMs) to MABSA remains uncertain. Recent advances in LLMs, such as Llama2, LLaVA, and ChatGPT, demonstrate strong capabilities in general tasks, yet their performance in complex and fine-grained scenarios like MABSA is...
Multimodal large language models (MLLMs) have shown remarkable progress in high-level semantic tasks such as visual question answering, image captioning, and emotion recognition. However, despite these advancements, there remains a lack of standardized benchmarks for evaluating MLLMs' performance in multi-object sentiment analysis, a key task in semantic understanding. To address this gap, we introduce MOSABench, a novel evaluation dataset designed specifically for multi-object sentiment analysis. MOSABench includes approximately 1,000 images...
Paper quality evaluation is of great significance, as it helps to select high-quality papers from the massive amount of academic papers. However, existing models need improvement in the interaction and aggregation of the hierarchical structure, and they also ignore the guiding role of the title and abstract in the paper text. To address the above two issues, we propose a well-designed modular hierarchical model (MHM) for paper quality evaluation. Firstly, the input of our model is mostly the paper text, with no additional information needed. Secondly, we fully exploit the inherent hierarchy of the text with...
Multi-turn dialogue is challenging because the semantic information is contained not only in the current utterance but also in the context. In fact, understanding multi-turn dialogue is a dynamic process: as the number of turns increases, what the user expresses keeps changing. In this case, we propose a network based on utterance hidden state transfer for task-oriented dialogue (USET). In our model, we first extract the hidden state of the previous utterance as its comprehension. Then, the comprehension is passed to the next turn, where we take it as prior knowledge to understand the current utterance. Finally, we put the prior knowledge and the current utterance together. In order to realize...
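The notion of passing an utterance-level hidden state forward as prior knowledge for the next turn can be sketched with a generic GRU encoder; this is an assumption-laden toy, not the USET architecture itself.

```python
# Toy turn-level state transfer: the final hidden state of each turn's encoder
# initializes the encoder of the next turn as prior knowledge.
import torch
import torch.nn as nn

class TurnStateTransfer(nn.Module):
    def __init__(self, emb_dim=16, hidden=32, vocab=100):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb_dim)
        self.encoder = nn.GRU(emb_dim, hidden, batch_first=True)

    def forward(self, turns):
        """turns: list of (batch, seq_len) token-id tensors, one per dialogue turn."""
        prior = None                                   # no prior knowledge at turn 1
        outputs = []
        for utterance in turns:
            enc_out, prior = self.encoder(self.embed(utterance), prior)
            outputs.append(enc_out)                    # per-turn representations
        return outputs, prior

if __name__ == "__main__":
    model = TurnStateTransfer()
    turns = [torch.randint(0, 100, (2, 5)) for _ in range(3)]
    outs, last_state = model(turns)
    print(len(outs), outs[-1].shape, last_state.shape)
```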
Mainstream SLU models, such as SDEN, adopt joint training of slot filling and intent detection because of their correlation, and add contextual information through a context vector to improve model performance. Although these models have proved effective, this design also brings challenges for slot filling: the decoder is fed with a deep-layer semantic encoding without alignment information, which will affect slot filling, and history utterances are attenuated in the context vector by the repeated fusion process, which is not conducive to further improvement. In order to solve...
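For readers unfamiliar with the joint setup, the sketch below shows a generic shared encoder with a per-token slot head and a per-utterance intent head trained together; it is illustrative only and is not SDEN.

```python
# Generic joint SLU model: one shared encoder, a token-level slot-filling head,
# and an utterance-level intent-detection head (losses would be summed in training).
import torch
import torch.nn as nn

class JointSLU(nn.Module):
    def __init__(self, vocab=100, emb=16, hidden=32, n_slots=10, n_intents=5):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.encoder = nn.LSTM(emb, hidden, batch_first=True, bidirectional=True)
        self.slot_head = nn.Linear(2 * hidden, n_slots)       # per-token slot tags
        self.intent_head = nn.Linear(2 * hidden, n_intents)   # per-utterance intent

    def forward(self, token_ids):
        enc, _ = self.encoder(self.embed(token_ids))
        slot_logits = self.slot_head(enc)                      # (batch, seq, n_slots)
        intent_logits = self.intent_head(enc.mean(dim=1))      # pooled utterance vector
        return slot_logits, intent_logits

if __name__ == "__main__":
    model = JointSLU()
    slots, intent = model(torch.randint(0, 100, (2, 7)))
    print(slots.shape, intent.shape)
```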
Recent dialogue state tracking (DST) models usually treat the utterance, system action, and ontology equally to estimate the slot types and values. In this way, the expression of user semantics in the utterance is restricted. As the main way to directly express user semantics, the utterance should receive further attention, and its proportion in the semantic representation should be dynamic according to its content. It is common to recognize the different importance of information in all DST models; however, most of them pay little attention to position in the utterance. In fact, semantics are related to position due to human grammatical...