- Speech Recognition and Synthesis
- Natural Language Processing Techniques
- Advanced Neural Network Applications
- Speech and Audio Processing
- Anomaly Detection Techniques and Applications
- Machine Learning and Data Classification
- Topic Modeling
- 3D Shape Modeling and Analysis
- Domain Adaptation and Few-Shot Learning
- Remote Sensing and LiDAR Applications
- 3D Surveying and Cultural Heritage
- Context-Aware Activity Recognition Systems
Nara Institute of Science and Technology
2022
Semi-supervised learning (SSL) improves model generalization by leveraging massive unlabeled data to augment limited labeled samples. However, popular SSL evaluation protocols are currently often constrained to computer vision (CV) tasks. In addition, previous work typically trains deep neural networks from scratch, which is time-consuming and environmentally unfriendly. To address the above issues, we construct a Unified SSL Benchmark (USB) for classification by selecting 15 diverse, challenging,...
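A common SSL building block evaluated by such benchmarks is thresholded pseudo-labeling (FixMatch-style): confident predictions on weakly augmented unlabeled inputs supervise the strongly augmented view. A minimal numpy sketch, assuming a two-view setup with a confidence threshold (the function name and default threshold are illustrative, not from the paper):

```python
import numpy as np

def pseudo_label_loss(weak_probs, strong_logits, threshold=0.95):
    """FixMatch-style unlabeled loss (illustrative sketch).

    weak_probs:    (N, C) softmax outputs for weakly augmented inputs
    strong_logits: (N, C) raw logits for strongly augmented inputs
    Only samples whose weak-view confidence exceeds `threshold` contribute.
    """
    pseudo = weak_probs.argmax(axis=1)           # hard pseudo-labels
    mask = weak_probs.max(axis=1) >= threshold   # confidence filter
    # cross-entropy of strong-view logits against the pseudo-labels
    log_probs = strong_logits - np.log(
        np.exp(strong_logits).sum(axis=1, keepdims=True))
    ce = -log_probs[np.arange(len(pseudo)), pseudo]
    return (ce * mask).sum() / max(mask.sum(), 1)
```

Low-confidence samples are simply masked out, so early in training the unlabeled loss is dominated by the few examples the model is already sure about.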
This paper introduces SpeeChain, an open-source PyTorch-based toolkit designed to develop the machine speech chain for large-scale use. The first release focuses on the TTS-to-ASR chain, a core component of the machine speech chain that refers to augmenting ASR with TTS-generated data from unspoken text. To build an efficient pipeline, we implement easy-to-use multi-GPU batch-level model inference, multi-dataloader batch generation, and on-the-fly data selection techniques. In this paper, we explain the overall procedure and the difficulties of each step. Then,...
The recent Segment Anything Model (SAM) has demonstrated remarkable zero-shot capability and flexible geometric prompting in general image segmentation. However, SAM often struggles when handling various unconventional images, such as aerial, medical, and non-RGB images. This paper presents CAT-SAM, a ConditionAl Tuning network that adapts SAM toward target tasks with just few-shot samples. CAT-SAM freezes the entire SAM and adapts its mask decoder and image encoder simultaneously with a small number of learnable parameters. The core...
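The general idea of freezing a large pretrained model while training only a small number of added parameters can be sketched with a low-rank adapter on a single linear layer. This is a generic parameter-efficient-tuning illustration, not CAT-SAM's actual architecture; the class and attribute names are hypothetical:

```python
import numpy as np

class FrozenLinearWithAdapter:
    """Sketch of parameter-efficient tuning: the pretrained weight W is
    frozen; only a small low-rank correction (A @ B) is trainable.
    Shapes and names are illustrative, not the paper's API.
    """
    def __init__(self, W, rank=2, rng=None):
        rng = rng or np.random.default_rng(0)
        self.W = W                                         # frozen (out, in)
        self.A = rng.normal(0, 0.01, (W.shape[0], rank))   # trainable
        self.B = np.zeros((rank, W.shape[1]))              # trainable, zero-init

    def forward(self, x):
        # frozen path plus low-rank learned correction
        return x @ (self.W + self.A @ self.B).T

    def trainable_params(self):
        return self.A.size + self.B.size
```

With `B` initialized to zero, the layer starts out behaving exactly like the frozen model, and only `rank * (out + in)` parameters are updated instead of `out * in`.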
Consistency regularization has recently been applied to semi-supervised sequence-to-sequence (S2S) automatic speech recognition (ASR). This principle encourages an ASR model to output similar predictions for the same input under different perturbations. The existing S2S paradigm utilizes SpecAugment as data augmentation and requires a static teacher to produce pseudo transcripts for untranscribed speech. However, this fails to take full advantage of consistency regularization. First, the masking operations...
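The two ingredients named above, a SpecAugment-style perturbation and a consistency term between teacher and student predictions, can be sketched as follows. This is a minimal frame-level illustration with assumed function names; real S2S ASR applies the consistency objective at the sequence level:

```python
import numpy as np

def time_mask(spec, max_width=10, rng=None):
    """SpecAugment-style time masking: zero out a random span of frames.
    spec: (freq_bins, time_frames) spectrogram."""
    rng = rng or np.random.default_rng(0)
    spec = spec.copy()
    t = spec.shape[1]
    w = int(rng.integers(0, max_width + 1))
    start = int(rng.integers(0, max(t - w, 1)))
    spec[:, start:start + w] = 0.0
    return spec

def consistency_loss(p_student, p_teacher):
    """KL(teacher || student) over posteriors: penalizes the student for
    deviating from the teacher's prediction on the perturbed input."""
    eps = 1e-8
    return float(np.sum(
        p_teacher * (np.log(p_teacher + eps) - np.log(p_student + eps))))
```

The criticism in the abstract is that with a static teacher, `p_teacher` never improves during training, so the consistency signal saturates early.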