- Speech and Audio Processing
- Speech Recognition and Synthesis
- Music and Audio Processing
- Blind Source Separation Techniques
- Phonocardiography and Auscultation Techniques
- Diverse Musicological Studies
- Neural Networks and Applications
- Advanced Adaptive Filtering Techniques
- Spectroscopy and Chemometric Analyses
- Topic Modeling
- Natural Language Processing Techniques
- Handwritten Text Recognition Techniques
- Silicone and Siloxane Chemistry
- Target Tracking and Data Fusion in Sensor Networks
- Vehicle License Plate Recognition
- Advanced Image and Video Retrieval Techniques
- Multimodal Machine Learning Applications
- Speech and dialogue systems
- Vehicle Noise and Vibration Control
- Advanced Optical Imaging Technologies
- Surface Roughness and Optical Measurements
- Advanced Sensor and Energy Harvesting Materials
- Image Enhancement Techniques
- Domain Adaptation and Few-Shot Learning
- Synthesis and properties of polymers
Kyungpook National University
2020-2025
Electronics and Telecommunications Research Institute
2002-2019
Philips (Finland)
2008
Korea Advanced Institute of Science and Technology
1999-2006
In this paper, we propose new speech features obtained by applying independent component analysis (ICA) to human speech. When ICA is applied to speech signals for efficient encoding, the adapted basis functions resemble Gabor-like features. The trained basis functions have some redundancies, so we select a subset of them by a reordering method. The selected basis functions are almost ordered from low-frequency vectors to high-frequency vectors, which is compatible with the fact that speech carries much more information in the low-frequency range. These features can be used in automatic speech recognition systems, and the method gives better recognition rates than conventional...
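The frequency-based reordering described above can be sketched in a few lines; the toy basis vectors and sampling rate below are illustrative stand-ins for ICA-learned bases, not the paper's actual data.

```python
import numpy as np

def reorder_by_dominant_frequency(basis, sr=16000):
    """Sort basis vectors from low to high dominant frequency.

    basis: (n_basis, frame_len) array of learned basis functions.
    Returns the reordered basis and each one's dominant frequency (Hz).
    """
    spectra = np.abs(np.fft.rfft(basis, axis=1))
    freqs = np.fft.rfftfreq(basis.shape[1], d=1.0 / sr)
    dom = freqs[np.argmax(spectra, axis=1)]
    order = np.argsort(dom)
    return basis[order], dom[order]

# toy "learned" basis: pure tones at shuffled frequencies
sr, n = 16000, 160
t = np.arange(n) / sr
tones = np.stack([np.sin(2 * np.pi * f * t) for f in (3000, 500, 1500)])
ordered, dom = reorder_by_dominant_frequency(tones, sr)
print(dom)  # dominant frequencies now ascending, ~[500, 1500, 3000]
```

Real ICA bases are noisy Gabor-like wavelets rather than pure tones, but the same argmax-over-spectrum ordering applies.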
Despite the remarkable advances in deep learning technology, achieving satisfactory performance in lung sound classification remains a challenge due to the scarcity of available data. Moreover, respiratory samples are collected from a variety of electronic stethoscopes, which could potentially introduce biases into trained models. When a significant distribution shift occurs within the test dataset or in a practical scenario, it can substantially decrease performance. To tackle this issue, we propose a cross-domain...
Despite considerable advancements in deep learning, optimizing respiratory sound classification (RSC) models remains challenging. This is partly due to bias from inconsistent recording processes and imbalanced representation of demographics, which leads to poor performance when a model trained on one dataset is applied to real-world use cases. RSC datasets usually include various metadata attributes describing certain aspects of the data, such as environmental and demographic factors. To address the issues caused...
A method for directly extracting clean speech features from noisy speech is proposed. The process is based on independent component analysis (ICA) and a new feature extraction technique that reduces the computational complexity of frequency-domain ICA. For signals recorded in real environments, this method yielded considerable performance improvement.
Audio classification related to military activities is a challenging task due to high levels of background noise and the lack of suitable publicly available datasets. To bridge this gap, this paper constructs and introduces a new audio dataset, named MAD, for training and evaluating such systems. The proposed MAD dataset, extracted from various videos, contains 8,075 sound samples in 7 classes corresponding to approximately 12 hours of audio, exhibiting distinctive characteristics not present in existing academic datasets...
We propose a novel feature processing technique which can provide a cepstral liftering effect in the log-spectral domain. Cepstral liftering aims at equalizing the variances of cepstral coefficients for distance-based speech recognizers and, as a result, provides robustness to additive noise and speaker variability. However, in the popular hidden Markov model based framework, it has no effect on recognition performance. We derive a filtering method in the log-spectral domain corresponding to cepstral liftering. The proposed filtering performs a high-pass operation for decorrelation of filter-bank...
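As a rough sketch of the correspondence between cepstral liftering and filtering in the log-spectral domain, the following uses an FFT-based real cepstrum (a simplification; the paper's exact derivation and lifter shape may differ):

```python
import numpy as np

def lifter_logspec(log_spec, lifter):
    """Apply cepstral liftering to a log-spectrum.

    Equivalent view of liftering as log-spectral filtering: go to the
    cepstral domain, weight the coefficients, and transform back.
    """
    cep = np.fft.ifft(log_spec).real   # real cepstrum of the log-spectrum
    cep *= lifter                      # weight cepstral coefficients
    return np.fft.fft(cep).real        # back to the log-spectral domain

n = 64
log_spec = np.log(1.0 + np.abs(np.sin(np.linspace(0, 3 * np.pi, n))))
# a high-pass lifter that zeroes c0, the overall-level (DC) term
lifter = np.ones(n)
lifter[0] = 0.0
out = lifter_logspec(log_spec, lifter)
print(out.mean())  # near zero: removing c0 removes the mean log level
```

Zeroing low-order cepstral coefficients acts as a high-pass filter on the log-spectrum, which is one way to see the variance-equalization effect.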
We address the low performance of elderly speakers in automatic speech recognition (ASR) through feature adaptation agnostic to the ASR model. Most datasets for ASR models consist of speech collected from adult speakers. Consequently, the majority of commercial systems typically tend to perform well only on adult speech. In other words, the limited diversity of speakers in training data yields unreliable performance for minority groups (e.g., the elderly) due to the infeasible acquisition of sufficient data. In response, this paper suggests a neural network-based voice conversion framework to enhance...
This paper introduces a robust human-robot interface (HRI) system using speech recognition and user localization. For indoor environments with unknown noises and acoustic reverberations, a blind source separation (BSS) algorithm is implemented with block-wise processing on a developed digital signal processing board to guarantee real-time operation. A reverberation-robust sound localization method using the separated signals is also proposed. Although the BSS method cannot completely preserve room information, the proposed method overcomes this problem with target...
Successful applications of deep learning technologies in the natural language processing domain have improved text-based intent classification. However, in practical spoken dialogue applications, users' articulation styles and background noises cause automatic speech recognition (ASR) errors, and these errors may lead models to misclassify intents. To overcome the limited performance of the intent classification task in such a system, we propose a novel approach that jointly uses both the recognized text obtained by the ASR model and the given...
Deep generative models have emerged as a promising approach in the medical image domain to address data scarcity. However, their use for sequential data like respiratory sounds is less explored. In this work, we propose a straightforward approach to augment imbalanced respiratory sound data using an audio diffusion model as a conditional neural vocoder. We also demonstrate a simple yet effective adversarial fine-tuning method to align features between synthetic and real samples and improve classification performance. Our experimental...
Conventional environment adaptation for robust speech recognition is usually conducted using transform-based techniques. Here, we present a discriminative adaptation strategy based on a multi-condition-trained model and propose a new method to provide universal application to an environment's specific conditions. Experimental results show that a system adapted with the proposed method works successfully in other conditions as well as those of the adaptation environment.
Image deformation caused by an outside force is observed to remain for hours at high gray levels in liquid-crystal displays (LCDs) in the multi-domain (MD) vertical-alignment (VA) mode. This so-called moving-image-sticking phenomenon produced a non-symmetric luminance profile in the left and right viewing directions of MDVA-mode LCDs, which originally have symmetric viewing-angle characteristics. The generation of a stable reverse-tilt domain was assumed to be the cause of this phenomenon, and its stability under...
In this paper, we propose a rank-weighted reconstruction feature to improve the robustness of a feed-forward deep neural network (FFDNN)-based acoustic model. In an FFDNN-based acoustic model, an input is constructed by vectorizing a submatrix created by slicing the feature vectors of frames within a context window. In this type of feature construction, an appropriate window size is important because it determines the amount of trivial or discriminative information, such as redundancy or temporal features. However, it is not ascertained whether a single parameter...
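The context-window input construction described above can be sketched as follows; the window size, feature dimension, and edge-padding policy here are illustrative choices, not the paper's exact setup.

```python
import numpy as np

def splice_frames(feats, context):
    """Build FFDNN inputs by splicing each frame with its neighbors.

    feats: (n_frames, dim) acoustic feature matrix.
    context: frames taken on each side; edges are padded by repetition.
    Returns (n_frames, dim * (2*context + 1)) vectorized submatrices.
    """
    padded = np.pad(feats, ((context, context), (0, 0)), mode="edge")
    windows = [padded[i:i + len(feats)] for i in range(2 * context + 1)]
    return np.concatenate(windows, axis=1)

feats = np.arange(12, dtype=float).reshape(4, 3)  # 4 frames, 3-dim features
x = splice_frames(feats, context=2)
print(x.shape)  # (4, 15): each row vectorizes a 5-frame submatrix
```

The center chunk of each spliced row is the original frame itself, with the flanking chunks supplying temporal context.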
In this paper, we propose a deep neural network (DNN) model parameter reduction technique for an efficient acoustic model. One of the most common DNN parameter reduction techniques is low-rank matrix approximation. Although it can reduce a significant number of parameters, there are two problems to be considered: one is performance degradation, and the other is appropriate rank selection. To solve these problems, retraining is carried out and the so-called explained variance is used. However, retraining takes additional time and is not directly...
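The low-rank approximation baseline can be illustrated with a truncated SVD of a single weight matrix; the layer sizes below are hypothetical, and this shows only the factorization step, not the paper's rank-selection or retraining procedure.

```python
import numpy as np

def low_rank_factorize(W, rank):
    """Replace weight matrix W (m x n) by factors A (m x r) and B (r x n)
    via truncated SVD, so one layer becomes two thinner layers.

    Parameter count drops from m*n to r*(m + n) when r is small enough.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]   # (m, rank), singular values folded in
    B = Vt[:rank]                # (rank, n)
    return A, B

rng = np.random.default_rng(0)
# a weight matrix with true rank <= 64, so rank-64 truncation is lossless
W = rng.standard_normal((512, 64)) @ rng.standard_normal((64, 1024))
A, B = low_rank_factorize(W, rank=64)
orig, reduced = W.size, A.size + B.size
print(orig, reduced)  # 524288 vs 98304 parameters
```

For real trained weights the spectrum decays gradually rather than cutting off, which is exactly why rank selection and accuracy degradation become the issues the abstract mentions.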
Mel-frequency filter bank (MFB) based approaches have the advantage of higher learning speeds compared to using the raw spectrum due to a smaller number of features. However, speech generators with the MFB approach require an additional computationally expensive vocoder for the training process. The pre- and post-processing needed by the vocoder is not essential to convert human voices, because it is possible to generate different-style voices with clear pronunciation without it. In this paper, we introduce a vocoder-free end-to-end voice...
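A minimal mel filter bank construction, to illustrate why MFB features are so much smaller than the raw spectrum; the parameters below are illustrative defaults, not the paper's configuration.

```python
import numpy as np

def mel_filter_bank(n_mels=8, n_fft=512, sr=16000):
    """Triangular mel filters mapping a (n_fft//2 + 1)-bin linear
    spectrum down to n_mels MFB features."""
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)

    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    # filter edges equally spaced on the mel scale, mapped to FFT bins
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        lo, center, hi = bins[i - 1], bins[i], bins[i + 1]
        fb[i - 1, lo:center] = (np.arange(lo, center) - lo) / max(center - lo, 1)
        fb[i - 1, center:hi] = (hi - np.arange(center, hi)) / max(hi - center, 1)
    return fb

fb = mel_filter_bank()
print(fb.shape)  # (8, 257): 257 spectral bins compressed to 8 features
```

Applying `fb @ power_spectrum` collapses hundreds of linear-frequency bins into a handful of perceptually spaced features, which is the learning-speed advantage the abstract refers to.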
This paper presents an investigation of the minimum verification error linear regression (MVELR) method for discriminative linear-transform based adaptation. The MVE criterion is employed to estimate a set of transformations which achieve the smallest empirical average loss with the given adaptation data. MVELR directly minimizes the total detection errors, some of which are the result of characteristic mismatch in the data. In this study, segment-based phonetic detectors reflecting an important processing layer of speech event...
Unsupervised learning-based approaches for training speech vector representations (SVR) have recently been widely applied. While pretrained SVR models excel in relatively clean automatic speech recognition (ASR) tasks, such as those recorded in laboratory environments, they are still insufficient for practical applications with various types of noise, intonation, and dialects. To cope with this problem, we present a novel unsupervised learning method for end-to-end ASR models. Our approach involves designing...
Recent advancements in AI have democratized its deployment as a healthcare assistant. While pretrained models from large-scale visual and audio datasets have demonstrably generalized to this task, surprisingly, no studies have explored speech models which, as human-originated sounds, intuitively would bear a closer resemblance to lung sounds. This paper explores the efficacy of speech models for respiratory sound classification. We find that there is a characterization gap between speech and lung sound samples, and to bridge this gap, data augmentation...
Respiratory sound classification (RSC) is challenging due to varied acoustic signatures, primarily influenced by patient demographics and recording environments. To address this issue, we introduce a text-audio multimodal model that utilizes metadata of respiratory sounds, which provides useful complementary information for RSC. Specifically, we fine-tune a pretrained model using free-text descriptions derived from the samples' metadata, which includes the gender and age of patients, the type of recording devices, and the recording location on the patient's body. Our...
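The metadata-to-text step can be sketched as follows; the field names and rendering template are hypothetical, not the paper's actual prompt format.

```python
def metadata_to_text(meta):
    """Render respiratory-sample metadata as a free-text description
    for a text-audio model (field names here are hypothetical)."""
    return (f"A respiratory sound from a {meta['age']}-year-old "
            f"{meta['sex']} patient, recorded with a {meta['device']} "
            f"at the {meta['location']}.")

desc = metadata_to_text({"age": 63, "sex": "female",
                         "device": "digital stethoscope",
                         "location": "posterior chest"})
print(desc)
```

Such descriptions pair naturally with the audio branch of a text-audio model, letting demographic and device context condition the classifier without any architectural change.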
In this paper, we introduce the Actions and Objects Pathways (AOPath) for out-of-domain generalization in video question answering tasks. AOPath leverages features from a large pretrained model to enhance generalizability without the need for explicit training on unseen domains. Inspired by the human brain, it dissociates the features into action and object features, and subsequently processes them through separate reasoning pathways. It utilizes a novel module which converts the features into domain-agnostic ones without introducing any trainable weights. We...