Xiaohan Lan

ORCID: 0000-0001-5382-6699
Research Areas
  • Multimodal Machine Learning Applications
  • Video Analysis and Summarization
  • Human Pose and Action Recognition
  • Topic Modeling
  • Domain Adaptation and Few-Shot Learning
  • Advanced Graph Neural Networks
  • Music and Audio Processing
  • Bioinformatics and Genomic Networks

Tsinghua University
2021-2023

Tsinghua–Berkeley Shenzhen Institute
2022

Beijing Normal University
2018

Multilingual knowledge graphs (KGs) such as DBpedia and YAGO contain structured knowledge of entities in several distinct languages, and they are useful resources for cross-lingual AI and NLP applications. Cross-lingual KG alignment is the task of matching entities with their counterparts in different languages, which is an important way to enrich the cross-lingual links in multilingual KGs. In this paper, we propose a novel approach for cross-lingual KG alignment via graph convolutional networks (GCNs). Given a set of pre-aligned entities, our approach trains GCNs to embed entities of each language into a unified...

10.18653/v1/d18-1032 article EN cc-by Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing 2018-01-01
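The abstract above is truncated, but as a rough illustration of the GCN-based alignment idea it describes, the Python sketch below embeds the entities of each language's KG with a small two-layer GCN and pulls pre-aligned entity pairs together with a margin-based loss. The layer sizes, L1 distance, and loss form are assumptions for illustration, not the paper's exact configuration.

import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # adj: row-normalized adjacency matrix of one language's KG
        return torch.relu(self.linear(adj @ x))

class GCNEncoder(nn.Module):
    # Embeds all entities of one KG into a shared vector space.
    def __init__(self, num_entities, dim=128):
        super().__init__()
        self.emb = nn.Embedding(num_entities, dim)
        self.gcn1 = GCNLayer(dim, dim)
        self.gcn2 = GCNLayer(dim, dim)

    def forward(self, adj):
        return self.gcn2(self.gcn1(self.emb.weight, adj), adj)

def alignment_loss(src_vec, tgt_vec, pairs, neg_pairs, margin=1.0):
    # Pre-aligned pairs should end up closer than sampled negative pairs.
    pos = (src_vec[pairs[:, 0]] - tgt_vec[pairs[:, 1]]).norm(p=1, dim=1)
    neg = (src_vec[neg_pairs[:, 0]] - tgt_vec[neg_pairs[:, 1]]).norm(p=1, dim=1)
    return torch.relu(pos - neg + margin).mean()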

Temporal Sentence Grounding in Videos (TSGV), i.e., grounding a natural language sentence which indicates complex human activities in a long and untrimmed video sequence, has received unprecedented attention over the last few years. Although each newly proposed method plausibly can achieve better performance than previous ones, current TSGV models still tend to capture the moment annotation biases and fail to take full advantage of multi-modal inputs. Even more incredibly, several extremely simple...

10.1145/3475723.3484247 preprint EN 2021-11-23
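The abstract cuts off before describing the simple baselines, so purely as a hypothetical illustration of what a bias-exploiting, training-free baseline can look like, the sketch below ignores both video and query and always predicts the most frequent normalized moment seen in the training annotations. It is an assumption for exposition, not necessarily one of the paper's baselines.

from collections import Counter

def fit_prior_baseline(train_moments, bins=10):
    # train_moments: list of (start, end) pairs normalized to [0, 1].
    discretized = [(round(s * bins) / bins, round(e * bins) / bins)
                   for s, e in train_moments]
    return Counter(discretized).most_common(1)[0][0]

def predict(prior_moment, video_duration):
    # Scale the single memorized interval to the test video's length.
    s, e = prior_moment
    return s * video_duration, e * video_duration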

Temporal sentence grounding in videos (TSGV), which aims at localizing one target segment from an untrimmed video with respect to a given query, has drawn increasing attention from the research community over the past few years. Different from the task of temporal action localization, TSGV is more flexible since it can locate complicated activities via natural languages, without restrictions from predefined categories. Meanwhile, it is more challenging since it requires both textual and visual understanding for semantic alignment...

10.1145/3532626 article EN ACM Transactions on Multimedia Computing Communications and Applications 2022-05-20

Video Grounding (VG) aims to locate the desired segment from a video given a sentence query. Recent studies have found that current VG models are prone to over-rely on the ground-truth moment annotation distribution biases in the training set. To discourage the standard VG model's behavior of exploiting such temporal biases and improve the model's generalization ability, we propose multiple negative augmentations in a hierarchical way, including cross-video augmentations at the clip-/video-level, and self-shuffled augmentations with masks. These augmentations can effectively diversify...

10.1609/aaai.v37i1.25204 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2023-06-26
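As a hedged sketch of one augmentation family named in the abstract (self-shuffled negatives with masks), the snippet below shuffles a video's clip features and zeroes out a random fraction of them, yielding a sample whose temporal structure no longer matches the query and can be treated as a negative during training. The masking scheme and ratio are assumptions, not the paper's exact settings.

import torch

def self_shuffled_negative(clip_feats, mask_ratio=0.3):
    # clip_feats: (num_clips, dim) features of one video.
    perm = torch.randperm(clip_feats.size(0))
    shuffled = clip_feats[perm].clone()        # destroy the temporal order
    drop = torch.rand(clip_feats.size(0)) < mask_ratio
    shuffled[drop] = 0.0                       # mask a random subset of clips
    return shuffled                            # use as a negative sample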

Temporal Sentence Grounding in Videos (TSGV), which aims to ground a natural language sentence that indicates complex human activities in an untrimmed video, has drawn widespread attention over the past few years. However, recent studies have found that current benchmark datasets may have obvious moment annotation biases, enabling several simple baselines even without training to achieve state-of-the-art (SOTA) performance. In this paper, we take a closer look at existing evaluation protocols for TSGV, and...

10.1145/3565573 article EN cc-by ACM Transactions on Multimedia Computing Communications and Applications 2022-10-07
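For context on the protocols being re-examined, the conventional TSGV metric is "R@1, IoU >= m": a prediction counts as a hit if its temporal IoU with the annotated moment reaches a threshold m (commonly 0.3, 0.5, or 0.7). Below is a minimal reference implementation of that standard metric; it is background for the abstract, not the corrected evaluation this paper proposes.

def temporal_iou(pred, gt):
    # pred, gt: (start, end) timestamps in seconds.
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = max(pred[1], gt[1]) - min(pred[0], gt[0])
    return inter / union if union > 0 else 0.0

def recall_at_1(predictions, ground_truths, threshold=0.5):
    # Fraction of queries whose top-1 prediction reaches the IoU threshold.
    hits = sum(temporal_iou(p, g) >= threshold
               for p, g in zip(predictions, ground_truths))
    return hits / len(ground_truths)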

Temporal Sentence Grounding aims to retrieve a video moment given a natural language query. Most existing literature merely focuses on visual information in videos without considering the naturally accompanied audio, which may contain rich semantics. The few works that do use audio simply regard it as an additional modality, overlooking that: i) it is non-trivial to explore consistency and complementarity between audio and visual information; ii) such exploration requires handling different levels of densities and noises in the two modalities. To...

10.1145/3581783.3612504 article EN cc-by 2023-10-26
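A minimal sketch, assuming per-clip visual and audio features of equal dimension, of how audio might be fused through a learned gate so that noisier audio contributes less at each clip. This only illustrates the general audio-visual fusion setting the abstract describes; it is not the paper's actual architecture.

import torch
import torch.nn as nn

class GatedAVFusion(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, visual, audio):
        # visual, audio: (num_clips, dim) features aligned per clip.
        g = self.gate(torch.cat([visual, audio], dim=-1))
        return visual + g * audio   # audio contributes only where the gate allows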

Video Grounding (VG) has drawn widespread attention over the past few years, and numerous studies have been devoted to improving performance on various VG benchmarks. Nevertheless, the label annotation procedures in VG produce imbalanced query-moment-label distributions in the datasets, which severely deteriorate the learning model's capability of truly understanding video contents. Existing debiased works either focus on adjusting the model or conducting video-level augmentation, failing to handle the temporal bias issue...

10.1145/3581783.3612401 article EN cc-by 2023-10-26
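The imbalanced query-moment-label distribution the abstract refers to can be made visible by histogramming where annotated moments fall within their videos; a sharp peak (e.g., moments clustered near the start) signals the temporal bias. The sketch below is a simple diagnostic of that kind, not the paper's debiasing method.

import numpy as np

def moment_location_histogram(annotations, bins=10):
    # annotations: list of (start, end, duration) tuples in seconds.
    centers = [((s + e) / 2.0) / d for s, e, d in annotations]
    hist, _ = np.histogram(centers, bins=bins, range=(0.0, 1.0))
    return hist / hist.sum()   # normalized distribution of moment centers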

Video grounding aims to ground a sentence query in a video by determining the start and end timestamps of the semantically matched segment. It is a fundamental and essential vision-and-language problem widely investigated by the research community, and it also has potential value when applied in industrial domains. This tutorial will give a detailed introduction to the development and evolution of this task, point out the limitations of existing benchmarks, and extend such a text-based task to more general scenarios, especially how it guides...

10.1145/3503161.3546971 article EN Proceedings of the 30th ACM International Conference on Multimedia 2022-10-10

Temporal sentence grounding in videos (TSGV), which aims to localize one target segment from an untrimmed video with respect to a given query, has drawn increasing attention from the research community over the past few years. Different from the task of temporal action localization, TSGV is more flexible since it can locate complicated activities via natural languages, without restrictions from predefined categories. Meanwhile, it is more challenging since it requires both textual and visual understanding for semantic alignment between...

10.48550/arxiv.2109.08039 preprint EN other-oa arXiv (Cornell University) 2021-01-01

Temporal Sentence Grounding in Videos (TSGV), which aims to ground a natural language sentence in an untrimmed video, has drawn widespread attention over the past few years. However, recent studies have found that current benchmark datasets may have obvious moment annotation biases, enabling several simple baselines even without training to achieve SOTA performance. In this paper, we take a closer look at existing evaluation protocols, and find that both the prevailing dataset and metrics are the devils that lead...

10.48550/arxiv.2203.05243 preprint EN other-oa arXiv (Cornell University) 2022-01-01