Xuwu Wang

ORCID: 0000-0003-3363-570X
Research Areas
  • Topic Modeling
  • Multimodal Machine Learning Applications
  • Natural Language Processing Techniques
  • Advanced Graph Neural Networks
  • Advanced Image and Video Retrieval Techniques
  • Domain Adaptation and Few-Shot Learning
  • Image Retrieval and Classification Techniques
  • Data Quality and Management
  • Biomedical Text Mining and Ontologies
  • Bayesian Modeling and Causal Inference
  • Human Pose and Action Recognition
  • Machine Learning in Healthcare
  • Web Data Mining and Analysis
  • Mental Health via Writing
  • Complex Network Analysis Techniques
  • Anomaly Detection Techniques and Applications
  • Context-Aware Activity Recognition Systems
  • Gaussian Processes and Bayesian Inference
  • Data Mining Algorithms and Applications
  • Machine Learning and Data Classification
  • Antiplatelet Therapy and Cardiovascular Diseases
  • Software Engineering Research
  • Atrial Fibrillation Management and Outcomes
  • Lipoproteins and Cardiovascular Health
  • Visual Attention and Saliency Detection

Fudan University
2019-2024

Chinese PLA General Hospital
2010

Recent years have witnessed the resurgence of knowledge engineering, which is featured by the fast growth of knowledge graphs. However, most existing knowledge graphs are represented with pure symbols, which hurts the machine's capability to understand the real world. The multi-modalization of knowledge graphs is an inevitable key step towards the realization of human-level machine intelligence. The results of this endeavor are Multi-modal Knowledge Graphs (MMKGs). In this survey on MMKGs constructed from texts and images, we first give definitions of MMKGs, followed by the preliminaries...

10.1109/tkde.2022.3224228 article EN IEEE Transactions on Knowledge and Data Engineering 2022-11-24
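
The survey's core premise, that purely symbolic entities lack grounding in the real world, can be made concrete with a toy data structure. Below is a minimal sketch of an MMKG record in which a symbolic entity carries textual and visual groundings; the class and field names are illustrative assumptions, not taken from the survey.

```python
from dataclasses import dataclass, field

@dataclass
class MMEntity:
    """A symbolic entity grounded in text and images (illustrative only)."""
    name: str                                             # symbolic label
    description: str = ""                                 # textual grounding
    image_uris: list[str] = field(default_factory=list)   # visual grounding

@dataclass
class Triple:
    head: MMEntity
    relation: str
    tail: MMEntity

paris = MMEntity("Paris", "Capital city of France", ["paris_01.jpg"])
tower = MMEntity("Eiffel Tower", "Landmark in Paris", ["tower_01.jpg"])
mmkg = [Triple(tower, "locatedIn", paris)]   # a one-triple MMKG
```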

Visual grounding focuses on establishing fine-grained alignment between vision and natural language, which has essential applications in multimodal reasoning systems. Existing methods use pre-trained, query-agnostic visual backbones to extract visual feature maps independently, without considering the query information. We argue that the features extracted by these backbones and the features really needed for visual grounding are inconsistent. One reason is that there are differences between the pre-training tasks and visual grounding. Moreover, since the backbones are query-agnostic, it is difficult...

10.1109/cvpr52688.2022.01506 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01
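
To make the query-aware idea concrete: instead of a query-agnostic backbone, the language query can modulate the visual feature map during extraction. The sketch below shows one generic way to do this with cross-attention in PyTorch; it illustrates the principle the abstract argues for, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class QueryAwareBlock(nn.Module):
    """Lets the flattened visual feature map attend to query tokens,
    so the extracted features depend on the query (generic illustration)."""
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, visual: torch.Tensor, query: torch.Tensor) -> torch.Tensor:
        # visual: (B, H*W, dim) flattened feature map; query: (B, L, dim) tokens
        attended, _ = self.attn(visual, query, query)  # vision attends to language
        return self.norm(visual + attended)            # residual fusion

feats = torch.randn(2, 49, 256)   # e.g. a 7x7 feature map, flattened
words = torch.randn(2, 10, 256)   # query token embeddings
print(QueryAwareBlock(256)(feats, words).shape)  # torch.Size([2, 49, 256])
```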

Multimodal named entity recognition (MNER) aims to detect and classify named entities in multimodal scenarios. It requires bridging the gap between natural language and the visual context, which presents a two-fold challenge: cross-modal alignment is diversified, and cross-modal interaction is sometimes implicit. Existing MNER methods are vulnerable to such implicit interactions and prone to overlook the significant features involved. To tackle this problem, we novelly propose to refine the attention by identifying and highlighting task-salient features. The...

10.1109/icme52920.2022.9859972 article EN 2022 IEEE International Conference on Multimedia and Expo (ICME) 2022-07-18
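
One generic way to "highlight task-salient features" is to gate the visual regions before text attends to them. The following sketch uses an assumed, simplified mechanism (a learned sigmoid gate) for illustration; it is not the refinement method proposed in the paper.

```python
import torch
import torch.nn as nn

class GatedCrossAttention(nn.Module):
    """Cross-modal attention where a learned gate re-weights image regions
    before text attends to them (assumed mechanism, for illustration)."""
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.gate = nn.Sequential(nn.Linear(dim, 1), nn.Sigmoid())

    def forward(self, text: torch.Tensor, image: torch.Tensor) -> torch.Tensor:
        # text: (B, Lt, dim) token features; image: (B, Li, dim) region features
        salient = image * self.gate(image)        # down-weight non-salient regions
        ctx, _ = self.attn(text, salient, salient)
        return text + ctx                         # fused text representation

out = GatedCrossAttention(128)(torch.randn(2, 12, 128), torch.randn(2, 36, 128))
print(out.shape)  # torch.Size([2, 12, 128])
```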

Xuwu Wang, Junfeng Tian, Min Gui, Zhixu Li, Rui Wang, Ming Yan, Lihan Chen, Yanghua Xiao. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2022.

10.18653/v1/2022.acl-long.328 article EN cc-by Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2022-01-01

Low-Rank Adaptation (LoRA) is widely used for adapting large language models (LLMs) to specific domains due to its efficiency and modularity. Meanwhile, vanilla LoRA struggles with task conflicts in multi-task scenarios. Recent works adopt Mixture of Experts (MoE) by treating each LoRA module as an expert, thereby mitigating task interference through multiple specialized LoRA modules. While effective, these methods often isolate knowledge within individual tasks, failing to fully exploit the shared knowledge across...

10.48550/arxiv.2501.15103 preprint EN arXiv (Cornell University) 2025-01-25
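
For readers unfamiliar with the MoE-over-LoRA setup this abstract builds on, the sketch below shows a frozen linear layer augmented with several low-rank experts and a token-level softmax router. It is a generic illustration of the baseline design, with illustrative hyperparameters, not the method proposed in this preprint.

```python
import torch
import torch.nn as nn

class MoELoRALinear(nn.Module):
    """A frozen linear layer plus several LoRA experts mixed by a router."""
    def __init__(self, base: nn.Linear, n_experts: int = 4, r: int = 8):
        super().__init__()
        self.base = base.requires_grad_(False)   # frozen pretrained weight
        d_in, d_out = base.in_features, base.out_features
        self.A = nn.Parameter(torch.randn(n_experts, d_in, r) * 0.01)
        self.B = nn.Parameter(torch.zeros(n_experts, r, d_out))
        self.router = nn.Linear(d_in, n_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, L, d_in); gates: per-token mixture weights over experts
        gates = torch.softmax(self.router(x), dim=-1)            # (B, L, E)
        delta = torch.einsum("bld,edr,ero->bleo", x, self.A, self.B)
        return self.base(x) + torch.einsum("ble,bleo->blo", gates, delta)

layer = MoELoRALinear(nn.Linear(64, 64))
print(layer(torch.randn(2, 5, 64)).shape)  # torch.Size([2, 5, 64])
```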

Image-text retrieval is a challenging cross-modal task that has aroused much attention. While traditional methods cannot break down the barriers between different modalities, Vision-Language Pre-trained (VLP) models greatly improve image-text retrieval performance based on massive image-text pairs. Nonetheless, VLP-based methods are still prone to produce retrieval results that cannot be aligned with the entities. Recent efforts try to fix this problem at the pre-training stage, which is not only expensive but also impractical due to the unavailability of full...

10.1145/3539597.3570481 article EN 2023-02-22
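
As background, VLP-style retrieval typically ranks candidates by embedding similarity. The sketch below shows that baseline scoring step, cosine similarity over L2-normalized embeddings; it does not implement the entity-alignment fix this paper studies.

```python
import numpy as np

def rank_images(text_emb: np.ndarray, image_embs: np.ndarray):
    """Rank images for one text query by cosine similarity."""
    t = text_emb / np.linalg.norm(text_emb)
    imgs = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    scores = imgs @ t                    # cosine similarity per image
    return np.argsort(-scores), scores   # best-matching indices first

rng = np.random.default_rng(0)
order, scores = rank_images(rng.normal(size=64), rng.normal(size=(5, 64)))
print(order)  # indices of images, most similar first
```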

Low-dimensional embeddings of knowledge graphs and behavior graphs have proved remarkably powerful in a variety of tasks, from predicting unobserved edges between entities to content recommendation. The two types of graphs can contain distinct and complementary information for the same entities/nodes. However, previous works focus either on knowledge graph embedding or behavior graph embedding, while few consider both in a unified way. Here we present BEM, a Bayesian framework that incorporates the information from both types of graphs. To be more specific, BEM takes as prior the pre-trained...

10.1145/3357384.3358014 article EN 2019-11-03
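
The "takes as prior" phrasing suggests a conjugate-Gaussian reading: the knowledge graph embedding acts as a prior mean that is updated by the behavior graph embedding. The sketch below is a textbook Gaussian posterior update used purely for illustration; BEM's actual model is more involved.

```python
import numpy as np

def gaussian_posterior(prior_mu, prior_var, obs, obs_var):
    """Conjugate update: N(prior_mu, prior_var) prior, N(z; obs, obs_var) likelihood."""
    post_var = 1.0 / (1.0 / prior_var + 1.0 / obs_var)
    post_mu = post_var * (prior_mu / prior_var + obs / obs_var)
    return post_mu, post_var

kg_emb = np.array([0.2, -1.0, 0.5])    # prior mean from the knowledge graph
beh_emb = np.array([0.4, -0.6, 0.1])   # observation from the behavior graph
mu, var = gaussian_posterior(kg_emb, prior_var=1.0, obs=beh_emb, obs_var=0.5)
print(mu)  # posterior mean, weighted toward the lower-variance source
```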

Referring expression comprehension aims to align natural language queries with visual scenes, which requires establishing fine-grained correspondence between vision and language. This has important applications in multi-modal reasoning systems. Existing methods typically use text-agnostic visual backbones to extract features independently, without considering the specific text input. However, we argue that the features extracted can be inconsistent with the referring expression, which hurts multimodal understanding. To address this, we first...

10.1145/3660638 article EN ACM Transactions on Multimedia Computing Communications and Applications 2024-04-25

In this paper, we introduce "InfiAgent-DABench", the first benchmark specifically designed to evaluate LLM-based agents on data analysis tasks. The benchmark contains DAEval, a dataset consisting of 311 questions derived from 55 CSV files, and an agent framework that incorporates LLMs to serve as data analysis agents. We adopt a format-prompting technique, ensuring the questions to be closed-form so that they can be automatically evaluated. Our extensive benchmarking of 23 state-of-the-art LLMs uncovers the current challenges they encounter in this task. In addition, we have developed DAAgent,...

10.48550/arxiv.2401.05507 preprint EN other-oa arXiv (Cornell University) 2024-01-01
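
Format prompting works because a fixed answer pattern turns grading into a string match. Below is a minimal sketch of that idea; the @answer[...] tag and prompt wording are hypothetical, not DAEval's actual template.

```python
import re

PROMPT_SUFFIX = (
    "\nAnswer the question using the CSV file. "
    "End your response with a single line of the form @answer[<value>]."
)

def extract_answer(response: str) -> str | None:
    """Pull the closed-form answer out of the fixed pattern, if present."""
    m = re.search(r"@answer\[(.*?)\]", response)
    return m.group(1).strip() if m else None

def grade(response: str, gold: str) -> bool:
    """Automatic evaluation: exact match on the extracted answer."""
    pred = extract_answer(response)
    return pred is not None and pred == gold

print(grade("The mean is 4.2.\n@answer[4.2]", "4.2"))  # True
```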

With the explosive growth of multi-modal information on the Internet, unimodal search cannot satisfy the requirements of Internet applications. Text-image retrieval research is needed to realize high-quality and efficient retrieval between different modalities. Existing text-image retrieval research is mostly based on general vision-language datasets (e.g., MS-COCO, Flickr30K), in which the query utterances are rigid and unnatural (i.e., verbose and formal). To overcome this shortcoming, we construct a new Compact and Fragmented Query challenge dataset (named...

10.48550/arxiv.2403.13317 preprint EN arXiv (Cornell University) 2024-03-20

Low-Rank Adaptation (LoRA) has emerged as a popular technique for fine-tuning large language models (LLMs) to various domains, due to its modular design and widespread availability on platforms like Huggingface. This modularity has sparked interest in combining multiple LoRAs to enhance LLM capabilities. However, existing methods for LoRA composition primarily focus on task-specific adaptations that require additional training, and current model merging techniques often fail to fully leverage LoRA's modular nature, leading...

10.48550/arxiv.2409.16167 preprint EN arXiv (Cornell University) 2024-09-24
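
The simplest training-free composition this abstract alludes to is a weighted sum of the low-rank updates delta_W = B @ A merged into the base weight. The sketch below shows that baseline; it is not the method proposed in this preprint.

```python
import numpy as np

def merge_loras(W, loras, weights):
    """W: (out, in) base weight; loras: list of (B, A) pairs with
    B of shape (out, r) and A of shape (r, in)."""
    delta = sum(w * (B @ A) for w, (B, A) in zip(weights, loras))
    return W + delta   # single merged weight matrix, no retraining

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 16))
loras = [(rng.normal(size=(16, 4)), rng.normal(size=(4, 16))) for _ in range(2)]
W_merged = merge_loras(W, loras, weights=[0.5, 0.5])
print(W_merged.shape)  # (16, 16)
```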

Large language models (LLMs) have become increasingly pivotal across various domains, especially in handling complex data types. This includes structured data processing, as exemplified by ChartQA and ChatGPT-Ada, and multimodal unstructured data processing, as seen in Visual Question Answering (VQA). These areas have attracted significant attention from both industry and academia. Despite this, there remains a lack of unified evaluation methodologies for these diverse scenarios. In response, we introduce BabelBench, an...

10.48550/arxiv.2410.00773 preprint EN arXiv (Cornell University) 2024-10-01

Recent years have witnessed the resurgence of knowledge engineering, which is featured by the fast growth of knowledge graphs. However, most existing knowledge graphs are represented with pure symbols, which hurts the machine's capability to understand the real world. The multi-modalization of knowledge graphs is an inevitable key step towards the realization of human-level machine intelligence. The results of this endeavor are Multi-modal Knowledge Graphs (MMKGs). In this survey on MMKGs constructed from texts and images, we first give definitions of MMKGs, followed by the preliminaries...

10.48550/arxiv.2202.05786 preprint EN other-oa arXiv (Cornell University) 2022-01-01

Low-dimensional embeddings of knowledge graphs and behavior graphs have proved remarkably powerful in a variety of tasks, from predicting unobserved edges between entities to content recommendation. The two types of graphs can contain distinct and complementary information for the same entities/nodes. However, previous works focus either on knowledge graph embedding or behavior graph embedding, while few consider both in a unified way. Here we present BEM, a Bayesian framework that incorporates the information from both types of graphs. To be more specific, BEM takes as prior the pre-trained...

10.48550/arxiv.1908.10611 preprint EN other-oa arXiv (Cornell University) 2019-01-01

Visual grounding focuses on establishing fine-grained alignment between vision and natural language, which has essential applications in multimodal reasoning systems. Existing methods use pre-trained, query-agnostic visual backbones to extract visual feature maps independently, without considering the query information. We argue that the features extracted by these backbones and the features really needed for visual grounding are inconsistent. One reason is that there are differences between the pre-training tasks and visual grounding. Moreover, since the backbones are query-agnostic, it is difficult...

10.48550/arxiv.2203.15442 preprint EN other-oa arXiv (Cornell University) 2022-01-01