- Speech Recognition and Synthesis
- Natural Language Processing Techniques
- Advanced Neural Network Applications
- Speech and Audio Processing
- Anomaly Detection Techniques and Applications
- Machine Learning and Data Classification
- Topic Modeling
- 3D Shape Modeling and Analysis
- Domain Adaptation and Few-Shot Learning
- Remote Sensing and LiDAR Applications
- 3D Surveying and Cultural Heritage
- Context-Aware Activity Recognition Systems
Nara Institute of Science and Technology
2022
Semi-supervised learning (SSL) improves model generalization by leveraging massive unlabeled data to augment limited labeled samples. However, popular SSL evaluation protocols are currently often constrained to computer vision (CV) tasks. In addition, previous work typically trains deep neural networks from scratch, which is time-consuming and environmentally unfriendly. To address the above issues, we construct a Unified SSL Benchmark (USB) for classification by selecting 15 diverse, challenging,...
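A common SSL building block evaluated by such benchmarks is thresholded pseudo-labeling (FixMatch-style): confident predictions on weakly augmented unlabeled inputs supervise the strongly augmented view. A minimal numpy sketch, assuming a two-view setup with a confidence threshold (the function name and default threshold are illustrative, not from the paper):

```python
import numpy as np

def pseudo_label_loss(weak_probs, strong_logits, threshold=0.95):
    """FixMatch-style unlabeled loss (illustrative sketch).

    weak_probs:    (N, C) softmax outputs for weakly augmented inputs
    strong_logits: (N, C) raw logits for strongly augmented inputs
    Only samples whose weak-view confidence exceeds `threshold` contribute.
    """
    pseudo = weak_probs.argmax(axis=1)           # hard pseudo-labels
    mask = weak_probs.max(axis=1) >= threshold   # confidence filter
    # cross-entropy of strong-view logits against the pseudo-labels
    log_probs = strong_logits - np.log(
        np.exp(strong_logits).sum(axis=1, keepdims=True))
    ce = -log_probs[np.arange(len(pseudo)), pseudo]
    return (ce * mask).sum() / max(mask.sum(), 1)
```

Low-confidence samples are simply masked out, so early in training the unlabeled loss is dominated by the few examples the model is already sure about.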
This paper introduces SpeeChain, an open-source PyTorch-based toolkit designed to develop the machine speech chain for large-scale use. The first release focuses on the TTS-to-ASR chain, a core component of the machine speech chain that refers to augmenting ASR with TTS-generated data from unspoken text. To build an efficient pipeline, we implement easy-to-use multi-GPU batch-level model inference, multi-dataloader batch generation, and on-the-fly data selection techniques. In this paper, we explain the overall procedure and the difficulties of each step. Then,...
The recent Segment Anything Model (SAM) has demonstrated remarkable zero-shot capability and flexible geometric prompting in general image segmentation. However, SAM often struggles when handling various unconventional images, such as aerial, medical, and non-RGB images. This paper presents CAT-SAM, a ConditionAl Tuning network that adapts SAM toward target tasks with just few-shot samples. CAT-SAM freezes the entire SAM and adapts its mask decoder and image encoder simultaneously with a small number of learnable parameters. The core...
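The general idea of freezing a large pretrained model while training only a small number of added parameters can be sketched with a low-rank adapter on a single linear layer. This is a generic parameter-efficient-tuning illustration, not CAT-SAM's actual architecture; the class and attribute names are hypothetical:

```python
import numpy as np

class FrozenLinearWithAdapter:
    """Sketch of parameter-efficient tuning: the pretrained weight W is
    frozen; only a small low-rank correction (A @ B) is trainable.
    Shapes and names are illustrative, not the paper's API.
    """
    def __init__(self, W, rank=2, rng=None):
        rng = rng or np.random.default_rng(0)
        self.W = W                                         # frozen (out, in)
        self.A = rng.normal(0, 0.01, (W.shape[0], rank))   # trainable
        self.B = np.zeros((rank, W.shape[1]))              # trainable, zero-init

    def forward(self, x):
        # frozen path plus low-rank learned correction
        return x @ (self.W + self.A @ self.B).T

    def trainable_params(self):
        return self.A.size + self.B.size
```

With `B` initialized to zero, the layer starts out behaving exactly like the frozen model, and only `rank * (out + in)` parameters are updated instead of `out * in`.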
Consistency regularization has recently been applied to semi-supervised sequence-to-sequence (S2S) automatic speech recognition (ASR). This principle encourages an ASR model to output similar predictions for the same input under different perturbations. The existing S2S paradigm utilizes SpecAugment as data augmentation and requires a static teacher to produce pseudo transcripts for untranscribed speech. However, this fails to take full advantage of consistency regularization. First, the masking operations...
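The two ingredients named above, a SpecAugment-style perturbation and a consistency term between teacher and student predictions, can be sketched as follows. This is a minimal frame-level illustration with assumed function names; real S2S ASR applies the consistency objective at the sequence level:

```python
import numpy as np

def time_mask(spec, max_width=10, rng=None):
    """SpecAugment-style time masking: zero out a random span of frames.
    spec: (freq_bins, time_frames) spectrogram."""
    rng = rng or np.random.default_rng(0)
    spec = spec.copy()
    t = spec.shape[1]
    w = int(rng.integers(0, max_width + 1))
    start = int(rng.integers(0, max(t - w, 1)))
    spec[:, start:start + w] = 0.0
    return spec

def consistency_loss(p_student, p_teacher):
    """KL(teacher || student) over posteriors: penalizes the student for
    deviating from the teacher's prediction on the perturbed input."""
    eps = 1e-8
    return float(np.sum(
        p_teacher * (np.log(p_teacher + eps) - np.log(p_student + eps))))
```

The criticism in the abstract is that with a static teacher, `p_teacher` never improves during training, so the consistency signal saturates early.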