- Mental Health via Writing
- Natural Language Processing Techniques
- Topic Modeling
- Multimodal Machine Learning Applications
- Cloud Computing and Resource Management
- Semantic Web and Ontologies
- Image Processing and 3D Reconstruction
- Geological Modeling and Analysis
- Advanced Database Systems and Queries
- Simulation and Modeling Applications
- Advanced Graph Neural Networks
Nanjing University
2007-2023
Multimodal Named Entity Recognition (MNER) aims to locate and classify named entities mentioned in a (text, image) pair. However, dominant work independently models the internal matching relations within a pair of image and text, ignoring the external matching relations between different pairs inside the dataset, though such relations are crucial for alleviating noise in the MNER task. In this paper, we primarily explore two kinds of external relations between pairs, i.e., inter-modal and intra-modal relations. On this basis, we propose a Relation-enhanced Graph Convolutional Network...
Multimodal Entity Linking (MEL) is a task that aims to link ambiguous mentions within multimodal contexts to referential entities in a knowledge base. Recent methods for MEL adopt a common framework: they first interact and fuse the text and image to obtain representations of the mention and the entity respectively, and then compute the similarity between them to predict the correct entity. However, these methods still suffer from two limitations: first, as they fuse the features before matching, they cannot fully exploit fine-grained alignment relations...
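The common framework described in this abstract (fuse text and image features, then rank candidate entities by similarity) can be illustrated with a minimal sketch. This is not the authors' method: the concatenation-based `fuse`, the cosine scoring, the function names, and the hand-made feature vectors are all assumptions chosen for illustration.

```python
import math


def fuse(text_vec, image_vec):
    """Toy fusion step: concatenate text and image features (an assumption)."""
    return list(text_vec) + list(image_vec)


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def link_mention(mention, candidates):
    """Return the best-matching candidate name and all similarity scores.

    mention:    (text_vec, image_vec) for the ambiguous mention
    candidates: dict mapping entity name -> (text_vec, image_vec)
    """
    fused_mention = fuse(*mention)
    scores = {
        name: cosine(fused_mention, fuse(*features))
        for name, features in candidates.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical feature vectors, for illustration only.
mention = ([1.0, 0.0], [0.0, 1.0])
candidates = {
    "entity_a": ([1.0, 0.0], [0.0, 1.0]),
    "entity_b": ([0.0, 1.0], [1.0, 0.0]),
}
best, scores = link_mention(mention, candidates)
```

Because the sketch fuses both modalities into one vector before matching, it only ever compares whole mention and entity representations, which is the kind of coarse matching the abstract lists as a limitation.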
Multimodal large language models (MLLMs) have attracted increasing attention in the past few years, but they may still generate descriptions that include objects not present in the corresponding images, a phenomenon known as object hallucination. To eliminate hallucinations, existing methods manually annotate paired responses with and without hallucinations, then employ various alignment algorithms to improve the alignment capability between images and text. However, they not only demand considerable computation resources during finetuning...