- Speech and Audio Processing
- Sustainability and Ecological Systems Analysis
- Spectral Theory in Mathematical Physics
- Multimodal Machine Learning Applications
- Music and Audio Processing
- Housing Market and Economics
- Quantum chaos and dynamical systems
- Speech Recognition and Synthesis
- Human Pose and Action Recognition
- Geometric and Algebraic Topology
- Advanced Mathematical Modeling in Engineering
- Semigroups and automata theory
- Water Resources and Sustainability
- Ecology and Vegetation Dynamics Studies
- Homotopy and Cohomology in Algebraic Topology
- Numerical methods in inverse problems
- Domain Adaptation and Few-Shot Learning
- Landslides and related hazards
- Conservation, Biodiversity, and Resource Management
- Fire effects on ecosystems
- Cognitive Science and Mapping
- Ecosystem dynamics and resilience
- Fibroblast Growth Factor Research
- Graph theory and applications
- Subtitles and Audiovisual Media
Nanjing Surveying and Mapping Research Institute (China)
2024
Zhejiang University
2023
University of Pennsylvania
2018-2022
Tsinghua University
2015-2021
Fujian Normal University
2020
Peking University
2013-2015
Chinese Academy of Sciences
2009-2014
South China Botanical Garden
2009-2013
Nanjing University of Posts and Telecommunications
2013
University of Chinese Academy of Sciences
2009-2011
Spatial and temporal patterns of carbon (C) storage in forest ecosystems significantly affect the terrestrial C budget, but such patterns are unclear for the forests of Hainan Province, the largest tropical island in China. Here, we estimated the spatial and temporal patterns of C storage in Hainan's forests from 1993 to 2008 by combining our measured data with data from four consecutive national forest inventories. Forest coverage increased from 20.7% in the 1950s to 56.4% in the 2010s. The average C density of 163.7 Mg C/ha in this study was slightly higher than that of China's mainland forests, remarkably...
Multimedia communications facilitate global interaction among people. However, although researchers have explored cross-lingual translation techniques such as machine translation and audio speech translation to overcome language barriers, studies on visual speech are still scarce. This lack of research is mainly due to the absence of datasets containing visual speech and translated text pairs. In this paper, we present AVMuST-TED, the first dataset for Audio-Visual Multilingual Speech...
Multi-modal Contrastive Representation (MCR) learning aims to encode different modalities into a semantically aligned shared space. This paradigm shows remarkable generalization ability on numerous downstream tasks across various modalities. However, its reliance on massive amounts of high-quality paired data limits its extension to more modalities. This paper proposes a novel training-efficient method for learning MCR without paired data, called Connecting Multi-modal Contrastive Representations (C-MCR). Specifically, given two existing MCRs pre-trained on (A, B)...
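Multi-modal contrastive representation learning of the kind summarized above is usually trained on paired samples so that matched pairs score higher than mismatched ones in the shared space. The sketch below is a generic CLIP-style symmetric InfoNCE loss in NumPy that illustrates this objective. It is not the C-MCR method. The function names, the temperature of 0.07, and the toy inputs are illustrative assumptions.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Project each row onto the unit sphere so dot products are cosine similarities.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def symmetric_info_nce(za, zb, temperature=0.07):
    """Generic symmetric InfoNCE (illustrative; not C-MCR).

    Row i of za (modality A) and row i of zb (modality B) form a positive pair;
    every other row in the batch serves as a negative.
    """
    za, zb = l2_normalize(za), l2_normalize(zb)
    logits = za @ zb.T / temperature          # (N, N) scaled cosine similarities
    labels = np.arange(len(za))               # positives sit on the diagonal

    def cross_entropy(lg):
        lg = lg - lg.max(axis=1, keepdims=True)                     # numerical stability
        log_probs = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()

    # Average the A->B and B->A retrieval directions.
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))

rng = np.random.default_rng(0)
za = rng.normal(size=(8, 16))                 # toy embeddings, hypothetical data
print(symmetric_info_nce(za, za), symmetric_info_nce(za, np.roll(za, 1, axis=0)))
```

Correctly aligned pairs give a small loss, and shifting the pairing raises it sharply. If every embedding is identical, all candidates are equally likely and the loss equals log N.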
Rongjie Huang, Huadai Liu, Xize Cheng, Yi Ren, Linjun Li, Zhenhui Ye, Jinzheng He, Lichao Zhang, Jinglin Liu, Xiang Yin, Zhou Zhao. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2023.
Audio-visual text generation aims to understand multi-modal content and translate it into text. Although various transfer learning techniques for text generation have been proposed, they focus on uni-modal analysis (e.g., text-to-text, visual-to-text) and lack consideration of multi-modal content and cross-modal relations. Motivated by the fact that humans can recognize the timbre of the same low-level concepts (e.g., footsteps, rainfall, laughing) even under different visual conditions, we aim to mitigate domain discrepancies...
Speech recognition builds a bridge between multimedia streams (audio-only, visual-only, or audio-visual) and the corresponding text transcriptions. However, training a model for a specific new domain is often hindered by the lack of new-domain utterances, especially labeled visual utterances. To break through this restriction, we attempt to achieve zero-shot modality transfer by maintaining the multi-modality alignment in the phoneme space learned with unlabeled utterances in the high-resource domain during...
We consider the Anderson model with Bernoulli potential on the three-dimensional (3D) lattice ℤ³ and prove localization of eigenfunctions corresponding to eigenvalues near zero, the lower boundary of the spectrum. We follow the framework of Bourgain–Kenig and Ding–Smart; our main contribution is a 3D discrete unique continuation principle, which says that any eigenfunction of the harmonic operator with bounded potential cannot be too small on a significant fractional portion of all the points. Its proof relies on geometric arguments about the 3D lattice.
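The finite-volume version of the Anderson–Bernoulli Hamiltonian is easy to write down. The sketch below assumes the common convention H = −Δ + δV, with (−Δu)(x) = Σ_{y∼x}(u(x) − u(y)) and i.i.d. potential values V(x) ∈ {0, 1}. This convention is not stated in the abstract. The sketch also replaces ℤ³ by a small periodic box. The function name, box size, disorder strength δ, and Bernoulli parameter p are illustrative assumptions. The code only builds the matrix and inspects its spectrum; it does not test localization.

```python
import numpy as np

def anderson_bernoulli_box(L, delta, p=0.5, seed=0):
    """Toy dense H = -Δ + δV on an L×L×L periodic box (stand-in for ℤ³).

    Assumed convention: (-Δu)(x) = Σ_{y~x} (u(x) - u(y)), with V(x) ∈ {0, 1}
    i.i.d. and P(V(x) = 1) = p. Requires L >= 3 so the six neighbours differ.
    """
    if L < 3:
        raise ValueError("use L >= 3")
    rng = np.random.default_rng(seed)
    n = L ** 3
    idx = np.arange(n).reshape(L, L, L)   # site (i, j, k) -> row/column index
    H = np.zeros((n, n))

    # Hopping part of -Δ: -1 between each site and its six periodic neighbours.
    for axis in range(3):
        for shift in (1, -1):
            nbr = np.roll(idx, shift, axis=axis)
            H[idx.ravel(), nbr.ravel()] -= 1.0

    # Diagonal: degree 6 from -Δ plus the Bernoulli potential δV(x).
    V = (rng.random(n) < p).astype(float)
    H[np.arange(n), np.arange(n)] += 6.0 + delta * V
    return H, V

H, V = anderson_bernoulli_box(L=6, delta=1.0)
eigs = np.linalg.eigvalsh(H)   # H is real symmetric
print(eigs.min(), eigs.max())
```

Under this convention the spectrum of −Δ lies in [0, 12], so every eigenvalue of the toy H lies in [0, 12 + δ]. On all of ℤ³ the almost-sure spectrum is [0, 12 + δ], and its lower edge 0 is the energy region referred to in the abstract.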
The task of spoken video grounding aims to localize moments in videos that are relevant to descriptive spoken queries. However, extracting semantic information from speech and modeling the cross-modal correlation pose two critical challenges. Previous studies address them by representing spoken queries based on the matched video frames, which requires tremendous effort for frame-level labeling. In this work, we investigate weakly-supervised spoken video grounding, i.e., learning without expensive temporal annotations. To effectively...
Visual segmentation from language queries has attracted significant research interest. Despite their effectiveness, existing works require expensive labeling and suffer severe degradation when deployed to an unseen domain. In this paper, we investigate a novel task, Cross-domain Query-based Visual Segmentation (CQVS), which aims to adapt a model from a labeled domain to a new unlabeled domain. The challenges of CQVS stem from three discrepancies: (1) multi-modal content shift, (2) uni-modal feature gap, and (3) cross-modal relation bias....
Visual temporal-aligned translation aims to transform a visual sequence into natural words and covers important practical tasks such as lipreading and fingerspelling recognition. However, the different ways in which individual speakers or signers perform specific words can lead to ambiguity, which has become a major obstacle to the development of current methods. Given these constraints, the generalization ability of the translation system should be further explored through evaluation on unseen performers. In...