- Speech and Audio Processing
- Speech Recognition and Synthesis
- Music and Audio Processing
- Blind Source Separation Techniques
- Phonocardiography and Auscultation Techniques
- Diverse Musicological Studies
- Neural Networks and Applications
- Advanced Adaptive Filtering Techniques
- Spectroscopy and Chemometric Analyses
- Topic Modeling
- Natural Language Processing Techniques
- Handwritten Text Recognition Techniques
- Silicone and Siloxane Chemistry
- Target Tracking and Data Fusion in Sensor Networks
- Vehicle License Plate Recognition
- Advanced Image and Video Retrieval Techniques
- Multimodal Machine Learning Applications
- Speech and dialogue systems
- Vehicle Noise and Vibration Control
- Advanced Optical Imaging Technologies
- Surface Roughness and Optical Measurements
- Advanced Sensor and Energy Harvesting Materials
- Image Enhancement Techniques
- Domain Adaptation and Few-Shot Learning
- Synthesis and properties of polymers
Kyungpook National University
2020-2025
Electronics and Telecommunications Research Institute
2002-2019
Philips (Finland)
2008
Korea Advanced Institute of Science and Technology
1999-2006
In this paper, we propose new speech features obtained by applying independent component analysis (ICA) to human speech. When ICA is applied to speech signals for efficient encoding, the adapted basis functions resemble Gabor-like features. The trained basis functions have some redundancies, so we select a subset of them by a reordering method. The selected basis functions are almost ordered from low-frequency vectors to high-frequency vectors, which is compatible with the fact that speech carries much more information in the low-frequency range. These features can be used in automatic speech recognition systems, and the method gives better recognition rates than conventional...
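The frequency-based reordering described above can be sketched in a few lines; the toy basis vectors and sampling rate below are illustrative stand-ins for ICA-learned bases, not the paper's actual data.

```python
import numpy as np

def reorder_by_dominant_frequency(basis, sr=16000):
    """Sort basis vectors from low to high dominant frequency.

    basis: (n_basis, frame_len) array of learned basis functions.
    Returns the reordered basis and each one's dominant frequency (Hz).
    """
    spectra = np.abs(np.fft.rfft(basis, axis=1))
    freqs = np.fft.rfftfreq(basis.shape[1], d=1.0 / sr)
    dom = freqs[np.argmax(spectra, axis=1)]
    order = np.argsort(dom)
    return basis[order], dom[order]

# toy "learned" basis: pure tones at shuffled frequencies
sr, n = 16000, 160
t = np.arange(n) / sr
tones = np.stack([np.sin(2 * np.pi * f * t) for f in (3000, 500, 1500)])
ordered, dom = reorder_by_dominant_frequency(tones, sr)
print(dom)  # dominant frequencies now ascending, ~[500, 1500, 3000]
```

Real ICA bases are noisy Gabor-like wavelets rather than pure tones, but the same argmax-over-spectrum ordering applies.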
Despite the remarkable advances in deep learning technology, achieving satisfactory performance in lung sound classification remains a challenge due to the scarcity of available data. Moreover, respiratory samples are collected from a variety of electronic stethoscopes, which could potentially introduce biases into trained models. When a significant distribution shift occurs within the test dataset or in a practical scenario, it can substantially decrease performance. To tackle this issue, we propose a cross-domain...
Despite considerable advancements in deep learning, optimizing respiratory sound classification (RSC) models remains challenging. This is partly due to bias from inconsistent recording processes and imbalanced representation of demographics, which leads to poor performance when a model trained on one dataset is applied to real-world use cases. RSC datasets usually include various metadata attributes describing certain aspects of the data, such as environmental and demographic factors. To address the issues caused...
A method for directly extracting clean speech features from noisy speech is proposed. The process is based on independent component analysis (ICA) and a new feature extraction technique that reduces the computational complexity of frequency-domain ICA. For signals recorded in real environments, this method yielded considerable performance improvement.
Audio classification related to military activities is a challenging task due to high levels of background noise and the lack of suitable publicly available datasets. To bridge this gap, this paper constructs and introduces a new audio dataset, named MAD, for training and evaluating such systems. The proposed MAD dataset, extracted from various videos, contains 8,075 sound samples in 7 classes corresponding to approximately 12 hours of audio, exhibiting distinctive characteristics not present in existing academic datasets...
We propose a novel feature processing technique which can provide a cepstral liftering effect in the log-spectral domain. Cepstral liftering aims at equalizing the variances of cepstral coefficients for distance-based speech recognizers and, as a result, provides robustness to additive noise and speaker variability. However, in the popular hidden Markov model based framework, it has no effect on recognition performance. We derive a filtering method in the log-spectral domain corresponding to cepstral liftering. The proposed filtering performs a high-pass operation for decorrelation of filter-bank...
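As a rough sketch of the correspondence between cepstral liftering and filtering in the log-spectral domain, the following uses an FFT-based real cepstrum (a simplification; the paper's exact derivation and lifter shape may differ):

```python
import numpy as np

def lifter_logspec(log_spec, lifter):
    """Apply cepstral liftering to a log-spectrum.

    Equivalent view of liftering as log-spectral filtering: go to the
    cepstral domain, weight the coefficients, and transform back.
    """
    cep = np.fft.ifft(log_spec).real   # real cepstrum of the log-spectrum
    cep *= lifter                      # weight cepstral coefficients
    return np.fft.fft(cep).real        # back to the log-spectral domain

n = 64
log_spec = np.log(1.0 + np.abs(np.sin(np.linspace(0, 3 * np.pi, n))))
# a high-pass lifter that zeroes c0, the overall-level (DC) term
lifter = np.ones(n)
lifter[0] = 0.0
out = lifter_logspec(log_spec, lifter)
print(out.mean())  # near zero: removing c0 removes the mean log level
```

Zeroing low-order cepstral coefficients acts as a high-pass filter on the log-spectrum, which is one way to see the variance-equalization effect.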
We address the low performance of elderly speakers in automatic speech recognition (ASR) through feature adaptation agnostic to the ASR model. Most datasets for ASR models consist of speech collected from adult speakers. Consequently, the majority of commercial systems typically tend to perform well only on adult speech. In other words, the limited diversity of speakers in training data yields unreliable performance for minority groups (e.g., the elderly) due to the infeasible acquisition of sufficient data. In response, this paper suggests a neural network-based voice conversion framework to enhance...
This paper introduces a robust human-robot interface (HRI) system using speech recognition and user localization. For indoor environments with unknown noises and acoustic reverberations, a blind source separation (BSS) algorithm is implemented with block-wise processing on a developed digital signal processing board to guarantee real-time operation. A reverberation-robust sound localization method using the separated signals is also proposed. Although the BSS method cannot completely preserve room information, the proposed method overcomes this problem with target...
Successful applications of deep learning technologies in the natural language processing domain have improved text-based intent classification. However, in practical spoken dialogue applications, users' articulation styles and background noises cause automatic speech recognition (ASR) errors, and these errors may lead models to misclassify intents. To overcome the limited performance of the intent classification task in such a system, we propose a novel approach that jointly uses both the recognized text obtained by the ASR model and the given...
Deep generative models have emerged as a promising approach in the medical image domain to address data scarcity. However, their use for sequential data like respiratory sounds is less explored. In this work, we propose a straightforward approach to augment imbalanced respiratory sound data using an audio diffusion model as a conditional neural vocoder. We also demonstrate a simple yet effective adversarial fine-tuning method to align features between synthetic and real samples and improve classification performance. Our experimental...
Conventional environment adaptation for robust speech recognition is usually conducted using transform-based techniques. Here, we present a discriminative adaptation strategy based on a multi-condition-trained model and propose a new method to provide universal application to an environment's specific conditions. Experimental results show that a system adapted with the proposed method works successfully in other conditions as well as those of the adaptation environment.
Image deformation caused by an outside force is observed to remain for hours at high gray levels in liquid-crystal displays (LCDs) in the multi-domain (MD) vertical-alignment (VA) mode. This so-called moving-image-sticking phenomenon produced a non-symmetric luminance profile in the left and right viewing directions of MDVA-mode LCDs, which originally have symmetric viewing-angle characteristics. The generation of a stable reverse-tilt domain was assumed to be the cause of this phenomenon, and its stability under...
In this paper, we propose a rank-weighted reconstruction feature to improve the robustness of a feed-forward deep neural network (FFDNN)-based acoustic model. In an FFDNN-based acoustic model, an input is constructed by vectorizing a submatrix created by slicing the feature vectors of frames within a context window. In this type of feature construction, an appropriate window size is important because it determines the amount of trivial or discriminative information, such as redundancy or temporal features. However, it is not ascertained whether a single parameter...
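The context-window input construction described above can be sketched as follows; the window size, feature dimension, and edge-padding policy here are illustrative choices, not the paper's exact setup.

```python
import numpy as np

def splice_frames(feats, context):
    """Build FFDNN inputs by splicing each frame with its neighbors.

    feats: (n_frames, dim) acoustic feature matrix.
    context: frames taken on each side; edges are padded by repetition.
    Returns (n_frames, dim * (2*context + 1)) vectorized submatrices.
    """
    padded = np.pad(feats, ((context, context), (0, 0)), mode="edge")
    windows = [padded[i:i + len(feats)] for i in range(2 * context + 1)]
    return np.concatenate(windows, axis=1)

feats = np.arange(12, dtype=float).reshape(4, 3)  # 4 frames, 3-dim features
x = splice_frames(feats, context=2)
print(x.shape)  # (4, 15): each row vectorizes a 5-frame submatrix
```

The center chunk of each spliced row is the original frame itself, with the flanking chunks supplying temporal context.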
In this paper, we propose a deep neural network (DNN) model parameter reduction technique for an efficient acoustic model. One of the most common DNN parameter reduction techniques is low-rank matrix approximation. Although it can reduce a significant number of parameters, there are two problems to be considered: one is performance degradation, and the other is appropriate rank selection. To solve these problems, retraining is carried out and the so-called explained variance is used. However, retraining takes additional time and is not directly...
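The low-rank approximation baseline can be illustrated with a truncated SVD of a single weight matrix; the layer sizes below are hypothetical, and this shows only the factorization step, not the paper's rank-selection or retraining procedure.

```python
import numpy as np

def low_rank_factorize(W, rank):
    """Replace weight matrix W (m x n) by factors A (m x r) and B (r x n)
    via truncated SVD, so one layer becomes two thinner layers.

    Parameter count drops from m*n to r*(m + n) when r is small enough.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]   # (m, rank), singular values folded in
    B = Vt[:rank]                # (rank, n)
    return A, B

rng = np.random.default_rng(0)
# a weight matrix with true rank <= 64, so rank-64 truncation is lossless
W = rng.standard_normal((512, 64)) @ rng.standard_normal((64, 1024))
A, B = low_rank_factorize(W, rank=64)
orig, reduced = W.size, A.size + B.size
print(orig, reduced)  # 524288 vs 98304 parameters
```

For real trained weights the spectrum decays gradually rather than cutting off, which is exactly why rank selection and accuracy degradation become the issues the abstract mentions.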
Mel-frequency filter bank (MFB) based approaches have the advantage of higher learning speeds compared to using the raw spectrum due to a smaller number of features. However, speech generators with the MFB approach require an additional computationally expensive vocoder for the training process. The pre- and post-processing needed by the vocoder is not essential to convert human voices, because it is possible to generate different-style voices with clear pronunciation without it. In this paper, we introduce a vocoder-free end-to-end voice...
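A minimal mel filter bank construction, to illustrate why MFB features are so much smaller than the raw spectrum; the parameters below are illustrative defaults, not the paper's configuration.

```python
import numpy as np

def mel_filter_bank(n_mels=8, n_fft=512, sr=16000):
    """Triangular mel filters mapping a (n_fft//2 + 1)-bin linear
    spectrum down to n_mels MFB features."""
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)

    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    # filter edges equally spaced on the mel scale, mapped to FFT bins
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        lo, center, hi = bins[i - 1], bins[i], bins[i + 1]
        fb[i - 1, lo:center] = (np.arange(lo, center) - lo) / max(center - lo, 1)
        fb[i - 1, center:hi] = (hi - np.arange(center, hi)) / max(hi - center, 1)
    return fb

fb = mel_filter_bank()
print(fb.shape)  # (8, 257): 257 spectral bins compressed to 8 features
```

Applying `fb @ power_spectrum` collapses hundreds of linear-frequency bins into a handful of perceptually spaced features, which is the learning-speed advantage the abstract refers to.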
This paper presents an investigation of the minimum verification error linear regression (MVELR) method for discriminative linear-transform based adaptation. The MVE criterion is employed to estimate a set of transformations which achieve the smallest empirical average loss with the given adaptation data. MVELR directly minimizes the total detection errors, some of which are the result of characteristic mismatch in the data. In this study, segment-based phonetic detectors reflecting an important processing layer of speech event...
Unsupervised learning-based approaches for training speech vector representations (SVR) have recently been widely applied. While pretrained SVR models excel in relatively clean automatic speech recognition (ASR) tasks, such as those recorded in laboratory environments, they are still insufficient for practical applications with various types of noise, intonation, and dialects. To cope with this problem, we present a novel unsupervised learning method for end-to-end ASR models. Our approach involves designing...
Recent advancements in AI have democratized its deployment as a healthcare assistant. While pretrained models from large-scale visual and audio datasets have demonstrably generalized to this task, surprisingly, no studies have explored speech models which, as human-originated sounds, intuitively would bear a closer resemblance to lung sounds. This paper explores the efficacy of speech models for respiratory sound classification. We find that there is a characterization gap between speech and lung sound samples, and to bridge this gap, data augmentation...
Respiratory sound classification (RSC) is challenging due to varied acoustic signatures, primarily influenced by patient demographics and recording environments. To address this issue, we introduce a text-audio multimodal model that utilizes metadata of respiratory sounds, which provides useful complementary information for RSC. Specifically, we fine-tune a pretrained model using free-text descriptions derived from the samples' metadata, which includes the gender and age of patients, the type of recording devices, and the recording location on the patient's body. Our...
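The metadata-to-text step can be sketched as follows; the field names and rendering template are hypothetical, not the paper's actual prompt format.

```python
def metadata_to_text(meta):
    """Render respiratory-sample metadata as a free-text description
    for a text-audio model (field names here are hypothetical)."""
    return (f"A respiratory sound from a {meta['age']}-year-old "
            f"{meta['sex']} patient, recorded with a {meta['device']} "
            f"at the {meta['location']}.")

desc = metadata_to_text({"age": 63, "sex": "female",
                         "device": "digital stethoscope",
                         "location": "posterior chest"})
print(desc)
```

Such descriptions pair naturally with the audio branch of a text-audio model, letting demographic and device context condition the classifier without any architectural change.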
In this paper, we introduce the Actions and Objects Pathways (AOPath) for out-of-domain generalization in video question answering tasks. AOPath leverages features from a large pretrained model to enhance generalizability without the need for explicit training on unseen domains. Inspired by the human brain, it dissociates the features into action and object features, and subsequently processes them through separate reasoning pathways. It utilizes a novel module which converts the features into domain-agnostic ones without introducing any trainable weights. We...