Linjun Li

ORCID: 0000-0003-1795-5535
Research Areas
  • Speech and Audio Processing
  • Sustainability and Ecological Systems Analysis
  • Spectral Theory in Mathematical Physics
  • Multimodal Machine Learning Applications
  • Music and Audio Processing
  • Housing Market and Economics
  • Quantum Chaos and Dynamical Systems
  • Speech Recognition and Synthesis
  • Human Pose and Action Recognition
  • Geometric and Algebraic Topology
  • Advanced Mathematical Modeling in Engineering
  • Semigroups and Automata Theory
  • Water Resources and Sustainability
  • Ecology and Vegetation Dynamics Studies
  • Homotopy and Cohomology in Algebraic Topology
  • Numerical Methods in Inverse Problems
  • Domain Adaptation and Few-Shot Learning
  • Landslides and Related Hazards
  • Conservation, Biodiversity, and Resource Management
  • Fire Effects on Ecosystems
  • Cognitive Science and Mapping
  • Ecosystem Dynamics and Resilience
  • Fibroblast Growth Factor Research
  • Graph Theory and Applications
  • Subtitles and Audiovisual Media

Nanjing Surveying and Mapping Research Institute (China)
2024

Zhejiang University
2023

University of Pennsylvania
2018-2022

Tsinghua University
2015-2021

Fujian Normal University
2020

Peking University
2013-2015

Chinese Academy of Sciences
2009-2014

South China Botanical Garden
2009-2013

Nanjing University of Posts and Telecommunications
2013

University of Chinese Academy of Sciences
2009-2011

Spatial and temporal patterns of carbon (C) storage in forest ecosystems significantly affect the terrestrial C budget, but such patterns are unclear in the forests of Hainan Province, the largest tropical island in China. Here, we estimated the spatial and temporal patterns of C storage in Hainan's forests from 1993 to 2008 by combining our measured data with data from four consecutive national forest inventories. Forest coverage increased from 20.7% in the 1950s to 56.4% in the 2010s. The average C density of 163.7 Mg C/ha in this study was slightly higher than that of China's mainland forests, remarkably...

10.1371/journal.pone.0108163 article EN cc-by PLoS ONE 2014-09-17

Multimedia communications facilitate global interaction among people. However, although researchers have explored cross-lingual translation techniques such as machine translation and audio speech translation to overcome language barriers, studies on visual speech are still scarce. This lack of research is mainly due to the absence of datasets containing translated text pairs. In this paper, we present AVMuST-TED, the first dataset for Audio-Visual Multilingual Speech...

10.48550/arxiv.2303.05309 preprint EN cc-by arXiv (Cornell University) 2023-01-01

Multi-modal Contrastive Representation (MCR) learning aims to encode different modalities into a semantically aligned shared space. This paradigm shows remarkable generalization ability on numerous downstream tasks across various modalities. However, its reliance on massive high-quality data pairs limits its further development to more modalities. This paper proposes a novel, training-efficient method for learning MCR without paired data, called Connecting Multi-modal Contrastive Representations (C-MCR). Specifically, given two existing MCRs pre-trained on (A, B)...

10.48550/arxiv.2305.14381 preprint EN cc-by arXiv (Cornell University) 2023-01-01

Rongjie Huang, Huadai Liu, Xize Cheng, Yi Ren, Linjun Li, Zhenhui Ye, Jinzheng He, Lichao Zhang, Jinglin Xiang Yin, Zhou Zhao. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2023.

10.18653/v1/2023.acl-long.479 article EN cc-by 2023-01-01

Audio-visual text generation aims to understand multi-modal content and translate it into text. Although various transfer learning techniques have been proposed, they focus on uni-modal analysis (e.g., text-to-text, visual-to-text) and lack consideration of multi-modal content and cross-modal relations. Motivated by the fact that humans can recognize the timbre of the same low-level concepts (e.g., footsteps, rainfall, laughing) even in different visual conditions, we aim to mitigate the domain discrepancies...

10.18653/v1/2023.acl-long.836 article EN cc-by 2023-01-01

Speech recognition builds a bridge between multimedia streams (audio-only, visual-only, or audio-visual) and the corresponding text transcriptions. However, training a model for a new domain often suffers from a lack of new-domain utterances, especially labeled visual utterances. To break through this restriction, we attempt to achieve zero-shot modality transfer by maintaining the multi-modality alignment in the phoneme space learned with unlabeled utterances in the high-resource domain during...

10.18653/v1/2023.acl-long.363 article EN cc-by 2023-01-01

We consider the Anderson model with Bernoulli potential on the three-dimensional (3D) lattice ℤ³ and prove localization of eigenfunctions corresponding to eigenvalues near zero, the lower boundary of the spectrum. We follow the framework of Bourgain–Kenig and Ding–Smart. Our main contribution is a 3D discrete unique continuation result, which says that any eigenfunction of the harmonic operator with bounded potential cannot be too small on a significant fractional portion of all points. Its proof relies on geometric arguments about the 3D lattice.

10.1215/00127094-2021-0038 article EN Duke Mathematical Journal 2022-02-01

The task of spoken video grounding aims to localize moments in videos that are relevant to descriptive queries. However, extracting semantic information from speech and modeling the cross-modal correlation pose two critical challenges. Previous studies solve them by representing queries based on matched frames, which requires tremendous effort for frame-level labeling. In this work, we investigate weakly-supervised spoken video grounding, i.e., learning without expensive temporal annotations. To effectively...

10.18653/v1/2023.acl-long.611 article EN cc-by 2023-01-01

10.1007/s00220-022-04366-1 article EN Communications in Mathematical Physics 2022-03-18

Visual segmentation from language queries has attracted significant research interest. Despite their effectiveness, existing works require expensive labeling and suffer severe degradation when deployed to an unseen domain. In this paper, we investigate a novel task, Cross-domain Query-based Segmentation (CQVS), which aims to adapt a model trained on a labeled domain to a new unlabeled domain. The challenges of CQVS stem from three discrepancies: (1) multi-modal content shift, (2) uni-modal feature gap, and (3) cross-modal relation bias...

10.18653/v1/2023.findings-acl.621 article EN cc-by Findings of the Association for Computational Linguistics: ACL 2023 2023-01-01

Visual temporal-aligned translation aims to transform a visual sequence into natural words and covers important tasks such as lipreading and fingerspelling recognition. However, the different ways in which speakers or signers perform specific words can lead to ambiguity, which has become a major obstacle for current methods. Given these constraints, the generalization ability of such systems should be further explored through evaluation on unseen performers. In...

10.18653/v1/2023.findings-acl.699 article EN cc-by Findings of the Association for Computational Linguistics: ACL 2023 2023-01-01