- Multimodal Machine Learning Applications
- Video Analysis and Summarization
- Human Pose and Action Recognition
- Topic Modeling
- Domain Adaptation and Few-Shot Learning
- Advanced Graph Neural Networks
- Music and Audio Processing
- Bioinformatics and Genomic Networks
Tsinghua University
2021-2023
Tsinghua–Berkeley Shenzhen Institute
2022
Beijing Normal University
2018
Multilingual knowledge graphs (KGs) such as DBpedia and YAGO contain structured knowledge of entities in several distinct languages, and they are useful resources for cross-lingual AI and NLP applications. Cross-lingual KG alignment is the task of matching entities with their counterparts in different languages, which is an important way to enrich the cross-lingual links in multilingual KGs. In this paper, we propose a novel approach for cross-lingual KG alignment via graph convolutional networks (GCNs). Given a set of pre-aligned entities, our approach trains GCNs to embed entities of each language into a unified...
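The two building blocks the abstract names, GCN propagation over each KG and a distance between embeddings of pre-aligned seed entities, can be sketched as follows. This is a minimal illustration in NumPy, not the paper's implementation; the function names, the symmetric normalization, and the L1 alignment distance are assumptions for the sketch.

```python
import numpy as np

def gcn_layer(adj, feats, weight):
    """One GCN propagation step: H' = ReLU(D^-1/2 (A+I) D^-1/2 H W).

    adj: (n, n) adjacency matrix of one KG; feats: (n, d) entity features;
    weight: (d, d') layer weights (shared across languages in the paper's setup).
    """
    a_hat = adj + np.eye(adj.shape[0])                 # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))      # degree normalization
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ feats @ weight, 0.0)

def alignment_distance(emb_src, emb_tgt, seed_pairs):
    """L1 distance between embeddings of pre-aligned entity pairs.

    Training would push these distances down (e.g. with a margin-based loss)
    so that both languages land in one unified embedding space.
    """
    return np.array([np.abs(emb_src[i] - emb_tgt[j]).sum()
                     for i, j in seed_pairs])
```

A training loop would stack two or three such layers per KG and rank candidate counterparts by this distance at test time.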
Temporal Sentence Grounding in Videos (TSGV), i.e., grounding a natural language sentence which indicates complex human activities in a long and untrimmed video sequence, has received unprecedented attention over the last few years. Although each newly proposed method plausibly achieves better performance than previous ones, current TSGV models still tend to capture moment annotation biases and fail to take full advantage of multi-modal inputs. Even more incredibly, several extremely simple...
Temporal sentence grounding in videos (TSGV), which aims at localizing one target segment from an untrimmed video with respect to a given query, has drawn increasing attention from the research community over the past few years. Different from the task of temporal action localization, TSGV is more flexible since it can locate complicated activities via natural language, without the restriction of predefined categories. Meanwhile, it is more challenging since it requires both textual and visual understanding for semantic alignment...
Video Grounding (VG) aims to locate the desired segment from a video given a sentence query. Recent studies have found that current VG models are prone to over-rely on the ground-truth moment annotation distribution biases in the training set. To discourage the standard model's behavior of exploiting such temporal biases and to improve model generalization ability, we propose multiple negative augmentations in a hierarchical way, including cross-video augmentations at clip-/video-level and self-shuffled videos with masks. These augmentations can effectively diversify...
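The two kinds of negatives the abstract mentions, self-shuffled videos and masked segments, can be sketched on a frame sequence as below. This is a hypothetical illustration of the general idea, not the paper's pipeline; clip boundaries, the mask token, and function names are assumptions.

```python
import random

def shuffle_clips(frames, num_clips=4, seed=None):
    """Self-shuffled negative: split the frame sequence into clips and permute
    them, destroying temporal order while keeping the visual content."""
    rng = random.Random(seed)
    n = len(frames)
    bounds = [round(i * n / num_clips) for i in range(num_clips + 1)]
    clips = [frames[bounds[i]:bounds[i + 1]] for i in range(num_clips)]
    rng.shuffle(clips)
    return [f for clip in clips for f in clip]

def mask_moment(frames, start, end, mask_token="<mask>"):
    """Masked negative: blank out the annotated moment so the model cannot
    score that location highly from the annotation bias alone."""
    return frames[:start] + [mask_token] * (end - start) + frames[end:]
```

Training would then penalize the model for still predicting the original ground-truth moment on these augmented inputs.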
Temporal Sentence Grounding in Videos (TSGV), which aims to ground a natural language sentence that indicates complex human activities in an untrimmed video, has drawn widespread attention over the past few years. However, recent studies have found that current benchmark datasets may have obvious moment annotation biases, enabling several simple baselines even without training to achieve state-of-the-art (SOTA) performance. In this paper, we take a closer look at existing evaluation protocols for TSGV, and...
Temporal Sentence Grounding aims to retrieve a video moment given a natural language query. Most existing literature merely focuses on visual information in videos without considering the naturally accompanying audio, which may contain rich semantics. The few works that use audio simply regard it as an additional modality, overlooking that: i) it is non-trivial to explore the consistency and complementarity between audio and visual; ii) such exploration requires handling the different levels of information densities and noises in the two modalities. To...
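One simple way to handle two modalities with different noise levels, rather than treating audio as a fixed extra input, is a per-frame learned gate that decides how much of each modality to keep. The sketch below is an assumed illustration of that idea, not the paper's architecture; the gating formulation and names are hypothetical.

```python
import numpy as np

def gated_fusion(visual, audio, gate_w):
    """Fuse per-frame visual and audio features with a sigmoid gate.

    visual, audio: (T, D) frame-aligned features; gate_w: (2D, D) learned
    weights. The gate lets the model down-weight the noisier modality
    frame by frame instead of mixing them with a fixed ratio.
    """
    joint = np.concatenate([visual, audio], axis=-1)       # (T, 2D)
    gate = 1.0 / (1.0 + np.exp(-(joint @ gate_w)))         # (T, D) in (0, 1)
    return gate * visual + (1.0 - gate) * audio
```

Because the gate lies in (0, 1), each fused value stays a convex combination of the two modalities, so a noisy audio frame can be suppressed without discarding audio globally.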
Video Grounding (VG) has drawn widespread attention over the past few years, and numerous studies have been devoted to improving performance on various VG benchmarks. Nevertheless, the label annotation procedures produce imbalanced query-moment-label distributions in datasets, which severely deteriorate the learning model's capability of truly understanding video contents. Existing debiasing works either focus on adjusting the model or conducting video-level augmentation, failing to handle the temporal bias issue...
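The imbalanced query-moment-label distribution mentioned above can be made concrete with a simple diagnostic: histogram the normalized start positions of annotated moments across a dataset. This is an assumed illustration, not a procedure from the paper.

```python
from collections import Counter

def moment_location_histogram(annotations, num_bins=10):
    """Histogram of normalized moment start positions.

    annotations: iterable of (start_sec, end_sec, video_duration_sec).
    A heavily skewed histogram means a location prior that a model can
    exploit without truly understanding the video content.
    """
    counts = Counter()
    for start, end, duration in annotations:
        b = min(int(start / duration * num_bins), num_bins - 1)
        counts[b] += 1
    return [counts.get(b, 0) for b in range(num_bins)]
```

On biased benchmarks such a histogram concentrates mass near the start of videos, which is exactly what lets trivial baselines score well.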
Video grounding aims to ground a sentence query in a video by determining the start and end timestamps of the semantically matched segment. It is a fundamental and essential vision-and-language problem widely investigated by the research community, and it also has potential value when applied in industrial domains. This tutorial will give a detailed introduction to the development and evolution of this task, point out the limitations of existing benchmarks, and extend such a text-based task to more general scenarios, especially how it guides...
Temporal Sentence Grounding in Videos (TSGV), which aims to ground a natural language sentence in an untrimmed video, has drawn widespread attention over the past few years. However, recent studies have found that current benchmark datasets may have obvious moment annotation biases, enabling several simple baselines even without training to achieve SOTA performance. In this paper, we take a closer look at existing evaluation protocols, and find that both the prevailing dataset and metrics are devils that lead...