Ho‐Young Jung

ORCID: 0000-0003-0398-831X
Research Areas
  • Speech and Audio Processing
  • Speech Recognition and Synthesis
  • Music and Audio Processing
  • Blind Source Separation Techniques
  • Phonocardiography and Auscultation Techniques
  • Diverse Musicological Studies
  • Neural Networks and Applications
  • Advanced Adaptive Filtering Techniques
  • Spectroscopy and Chemometric Analyses
  • Topic Modeling
  • Natural Language Processing Techniques
  • Handwritten Text Recognition Techniques
  • Silicone and Siloxane Chemistry
  • Target Tracking and Data Fusion in Sensor Networks
  • Vehicle License Plate Recognition
  • Advanced Image and Video Retrieval Techniques
  • Multimodal Machine Learning Applications
  • Speech and dialogue systems
  • Vehicle Noise and Vibration Control
  • Advanced Optical Imaging Technologies
  • Surface Roughness and Optical Measurements
  • Advanced Sensor and Energy Harvesting Materials
  • Image Enhancement Techniques
  • Domain Adaptation and Few-Shot Learning
  • Synthesis and properties of polymers

Kyungpook National University
2020-2025

Electronics and Telecommunications Research Institute
2002-2019

Philips (Finland)
2008

Korea Advanced Institute of Science and Technology
1999-2006

In this paper, we propose new speech features obtained by applying independent component analysis (ICA) to human speech. When ICA is applied to speech signals for efficient encoding, the adapted basis functions resemble Gabor-like features. The trained basis functions have some redundancies, so a subset is selected by a reordering method. The basis functions are almost ordered from low-frequency vectors to high-frequency vectors, which is compatible with the fact that speech carries much more information in the low-frequency range. These features can be used in automatic speech recognition systems, and the method gives better recognition rates than conventional...

10.1109/icassp.2000.862023 article EN 2002-11-07
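The general technique described above can be sketched as follows: learn ICA basis functions from framed audio and reorder them by dominant frequency. This is an illustrative sketch using `sklearn.decomposition.FastICA` on synthetic noise, not the paper's data or exact procedure.

```python
import numpy as np
from sklearn.decomposition import FastICA

# Illustrative stand-in for real speech at 16 kHz (synthetic noise here).
rng = np.random.default_rng(0)
signal = rng.standard_normal(16000)
frame_len, hop = 128, 64
frames = np.stack([signal[i:i + frame_len]
                   for i in range(0, len(signal) - frame_len, hop)])

# Learn an ICA basis over frames; columns of mixing_ are basis functions.
ica = FastICA(n_components=32, whiten="unit-variance",
              max_iter=500, tol=1e-3, random_state=0)
sources = ica.fit_transform(frames)   # per-frame ICA coefficients
basis = ica.mixing_                   # shape: (frame_len, n_components)

# Reorder basis vectors from low to high frequency, as the abstract
# describes, using the peak bin of each vector's magnitude spectrum.
peak_bins = np.abs(np.fft.rfft(basis, axis=0)).argmax(axis=0)
basis_ordered = basis[:, np.argsort(peak_bins)]
print(basis_ordered.shape)  # (128, 32)
```

On real speech, the learned columns would be the localized, Gabor-like waveforms the abstract refers to; the reordering step is one simple way to impose the low-to-high frequency ordering.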

Despite the remarkable advances in deep learning technology, achieving satisfactory performance in lung sound classification remains a challenge due to the scarcity of available data. Moreover, respiratory sound samples are collected from a variety of electronic stethoscopes, which could potentially introduce biases into the trained models. When a significant distribution shift occurs within the test dataset or in a practical scenario, it can substantially decrease performance. To tackle this issue, we introduce a cross-domain...

10.1109/icassp48485.2024.10447734 article EN ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024-03-18

Despite considerable advancements in deep learning, optimizing respiratory sound classification (RSC) models remains challenging. This is partly due to the bias from inconsistent recording processes and the imbalanced representation of demographics, which leads to poor performance when a model trained on one dataset is applied to real-world use cases. RSC datasets usually include various metadata attributes describing certain aspects of the data, such as environmental and demographic factors. To address the issues caused...

10.1109/jbhi.2025.3545159 article EN IEEE Journal of Biomedical and Health Informatics 2025-01-01

A method for directly extracting clean speech features from noisy speech is proposed. The process is based on independent component analysis (ICA) and a new feature extraction technique that reduces the computational complexity of frequency-domain ICA. For signals recorded in real environments, this method yielded a considerable performance improvement.

10.1049/el:19991358 article EN Electronics Letters 1999-11-11

10.1023/a:1015777200976 article EN Neural Processing Letters 2002-01-01

Abstract Audio classification related to military activities is a challenging task due to the high levels of background noise and the lack of suitable publicly available datasets. To bridge this gap, this paper constructs and introduces a new audio dataset, named MAD, for training and evaluating such systems. The proposed MAD dataset, extracted from various videos, contains 8,075 sound samples in 7 classes corresponding to approximately 12 hours of audio, exhibiting distinctive characteristics not present in existing academic datasets...

10.1038/s41597-024-03511-w article EN cc-by Scientific Data 2024-06-22

We propose a novel feature processing technique which can provide a cepstral liftering effect in the log-spectral domain. Cepstral liftering aims at equalizing the variance of cepstral coefficients for distance-based speech recognizers and, as a result, provides robustness to additive noise and speaker variability. However, in the popular hidden Markov model based framework, it has no effect on recognition performance. We derive a filtering method in the log-spectral domain corresponding to cepstral liftering. The proposed method performs high-pass filtering on decorrelated filter-bank...

10.4218/etrij.04.0203.0033 article EN ETRI Journal 2004-06-15
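The correspondence described above can be demonstrated numerically: weighting cepstral coefficients (liftering) is equivalent to applying a fixed linear transform to the log filter-bank vector. A minimal sketch, with an illustrative sinusoidal lifter rather than the paper's exact design:

```python
import numpy as np
from scipy.fftpack import dct, idct

rng = np.random.default_rng(0)
log_mel = rng.standard_normal(24)   # one frame of log filter-bank energies

# 1) Classical route: DCT -> lifter weights -> liftered cepstra.
cep = dct(log_mel, type=2, norm="ortho")
n = np.arange(len(cep))
lifter = 1.0 + 11.0 * np.sin(np.pi * n / 22)  # common sinusoidal lifter (L=22)
cep_liftered = cep * lifter

# 2) Equivalent route: realize the same weighting directly in the
#    log-spectral domain via the inverse DCT of the weighted spectrum.
log_mel_filtered = idct(cep_liftered, type=2, norm="ortho")

# Taking the DCT of the filtered log spectrum recovers the liftered
# cepstra exactly, confirming the two routes are the same linear operation.
back = dct(log_mel_filtered, type=2, norm="ortho")
assert np.allclose(back, cep_liftered)
```

Because the lifter de-emphasizes the lowest-quefrency terms, the equivalent log-spectral operation behaves like a high-pass filter across filter-bank channels, which is the effect the abstract exploits.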

We address the low performance of automatic speech recognition (ASR) for the elderly through feature adaptation that is agnostic to the ASR model. Most datasets for ASR models consist of speech collected from adult speakers. Consequently, the majority of commercial systems typically tend to perform well on adult speech. In other words, the limited diversity of speakers in training yields unreliable performance for minority groups (e.g., the elderly) due to the infeasible acquisition of their data. In response, this paper suggests a neural network-based voice conversion framework to enhance...

10.1109/access.2021.3115608 article EN cc-by-nc-nd IEEE Access 2021-01-01

This paper introduces a robust human-robot interface (HRI) system using speech recognition and user localization. For indoor environments with unknown noises and acoustic reverberations, a blind source separation (BSS) algorithm is implemented with block-wise processing on a developed digital signal processing board to guarantee real-time operation. A reverberation-robust sound localization method using the separated signals is also proposed. Although the BSS method cannot completely preserve room information, the proposed method overcomes this problem by target...

10.1109/roman.2009.5326264 article EN 2009-09-01

Successful applications of deep learning technologies in the natural language processing domain have improved text-based intent classification. However, in practical spoken dialogue applications, users' articulation styles and background noises cause automatic speech recognition (ASR) errors, and these may lead models to misclassify intents. To overcome the limited performance of the intent classification task in such a system, we propose a novel approach that jointly uses both the recognized text obtained by the ASR model and the given...

10.3390/s22041509 article EN cc-by Sensors 2022-02-15

Deep generative models have emerged as a promising approach in the medical image domain to address data scarcity. However, their use for sequential data like respiratory sounds is less explored. In this work, we propose a straightforward approach to augment imbalanced respiratory sound data using an audio diffusion model as a conditional neural vocoder. We also demonstrate a simple yet effective adversarial fine-tuning method to align features between synthetic and real samples and improve classification performance. Our experimental...

10.48550/arxiv.2311.06480 preprint EN cc-by arXiv (Cornell University) 2023-01-01

Conventional environment adaptation for robust speech recognition is usually conducted using transform-based techniques. Here, we present a discriminative adaptation strategy based on a multi-condition-trained model and propose a new method to provide universal application beyond the environment's specific conditions. Experimental results show that a system adapted by the proposed method works successfully in other conditions as well as those of the target environment.

10.4218/etrij.08.0208.0256 article EN ETRI Journal 2008-12-04

Abstract— Image deformation caused by an outside force is observed to remain for hours at high gray levels in liquid‐crystal displays (LCDs) in the multi‐domain (MD) vertical‐alignment (VA) mode. This so‐called moving‐image‐sticking phenomenon produces a non‐symmetric luminance profile between the left and right viewing directions in MDVA‐mode LCDs, which originally have symmetric viewing‐angle characteristics. The generation of a stable reverse‐tilt domain was assumed to be the cause of this phenomenon, and its stability under...

10.1889/1.2966451 article EN Journal of the Society for Information Display 2008-07-22

In this paper, we propose a rank-weighted reconstruction feature to improve the robustness of a feed-forward deep neural network (FFDNN)-based acoustic model. In an FFDNN-based model, an input is constructed by vectorizing a submatrix created by slicing the feature vectors of frames within a context window. In this type of construction, the appropriate window size is important because it determines the amount of trivial or discriminative information, such as redundancy or temporal features. However, it has not been ascertained whether a single parameter...

10.4218/etrij.2018-0189 article EN publisher-specific-oa ETRI Journal 2019-02-03

In this paper, we propose a deep neural network (DNN) model parameter reduction technique for an efficient acoustic model. One of the most common DNN compression techniques is low-rank matrix approximation. Although it can reduce a significant number of parameters, there are two problems to be considered: one is performance degradation, and the other is appropriate rank selection. To solve these problems, retraining is carried out and the so-called explained variance is used. However, retraining takes additional time and is not directly...

10.1109/ijcnn.2019.8852021 article EN 2019 International Joint Conference on Neural Networks (IJCNN) 2019-07-01
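The low-rank approximation and explained-variance rank selection mentioned above can be sketched with a truncated SVD. This is a generic illustration (the 90% threshold and factorization layout are assumptions, not the paper's exact settings):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((512, 512))   # stand-in for a dense DNN weight matrix

# Truncated SVD; choose the smallest rank whose cumulative explained
# variance (squared singular values) reaches the threshold.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
explained = np.cumsum(s**2) / np.sum(s**2)
rank = int(np.searchsorted(explained, 0.90)) + 1   # keep 90% of the energy

# Factor W ≈ A @ B: one 512x512 layer becomes two thin layers.
A = U[:, :rank] * s[:rank]
B = Vt[:rank, :]
W_approx = A @ B

params_before = W.size
params_after = A.size + B.size
print(rank, params_before, params_after)
```

In a network, `A` and `B` would replace the original layer's weight matrix, trading a controlled reconstruction error (bounded by the discarded singular values) for fewer parameters and multiply-accumulates.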

Mel-frequency filter bank (MFB) based approaches have the advantage of higher learning speeds compared to using the raw spectrum, due to a smaller number of features. However, speech generators with the MFB approach require an additional, computationally expensive vocoder for the training process. The pre- and post-processing needed by the vocoder is not essential for converting human voices, because it is possible to use only the raw spectrum to generate different-style voices with clear pronunciation. In this paper, we introduce a vocoder-free end-to-end voice...

10.1109/ijcnn48605.2020.9207653 article EN 2020 International Joint Conference on Neural Networks (IJCNN) 2020-07-01

This paper presents an investigation of the minimum verification error linear regression (MVELR) method for discriminative linear-transform based adaptation. The MVE criterion is employed to estimate a set of transformations which achieve the smallest empirical average loss with the given adaptation data. MVELR directly minimizes the total detection errors, some of which are the results of characteristic mismatch in the data. In this study, segment-based phonetic detectors reflecting an important processing layer of speech event...

10.1109/icassp.2010.5495659 article EN IEEE International Conference on Acoustics Speech and Signal Processing 2010-01-01

Unsupervised learning-based approaches for training speech vector representations (SVR) have recently been widely applied. While pretrained SVR models excel in relatively clean automatic speech recognition (ASR) tasks, such as those recorded in laboratory environments, they are still insufficient for practical applications with various types of noise, intonation, and dialects. To cope with this problem, we present a novel unsupervised learning method for end-to-end ASR models. Our approach involves designing...

10.3390/math11030622 article EN cc-by Mathematics 2023-01-26

Despite the remarkable advances in deep learning technology, achieving satisfactory performance in lung sound classification remains a challenge due to the scarcity of available data. Moreover, respiratory sound samples are collected from a variety of electronic stethoscopes, which could potentially introduce biases into the trained models. When a significant distribution shift occurs within the test dataset or in a practical scenario, it can substantially decrease performance. To tackle this issue, we introduce a cross-domain...

10.48550/arxiv.2312.09603 preprint EN cc-by arXiv (Cornell University) 2023-01-01

Recent advancements in AI have democratized its deployment as a healthcare assistant. While pretrained models from large-scale visual and audio datasets have demonstrably generalized to this task, surprisingly, no studies have explored pretrained speech models, which, as human-originated sounds, intuitively would share a closer resemblance to lung sounds. This paper explores the efficacy of pretrained speech models for respiratory sound classification. We find that there is a characterization gap between speech and lung sound samples, and to bridge this gap, data augmentation...

10.48550/arxiv.2405.02996 preprint EN arXiv (Cornell University) 2024-05-05

Respiratory sound classification (RSC) is challenging due to varied acoustic signatures, primarily influenced by patient demographics and recording environments. To address this issue, we introduce a text-audio multimodal model that utilizes metadata of respiratory sounds, which provides useful complementary information for RSC. Specifically, we fine-tune a pretrained text-audio model using free-text descriptions derived from the samples' metadata, which includes the gender and age of patients, the type of recording devices, and the recording location on the patient's body. Our...

10.48550/arxiv.2406.06786 preprint EN arXiv (Cornell University) 2024-06-10
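The metadata-to-text step described above can be sketched as a simple templating function. The field names, template wording, and sample values below are illustrative assumptions, not the paper's exact format:

```python
# Turn respiratory-sample metadata into a free-text description of the
# kind a text-audio model could be fine-tuned on.
def describe(meta: dict) -> str:
    return (
        f"A respiratory sound from a {meta['age']}-year-old "
        f"{meta['gender']} patient, recorded with a {meta['device']} "
        f"at the {meta['location']}."
    )

# Hypothetical sample metadata record.
sample = {"age": 63, "gender": "female",
          "device": "digital stethoscope", "location": "left posterior chest"}
print(describe(sample))
# → A respiratory sound from a 63-year-old female patient, recorded with a digital stethoscope at the left posterior chest.
```

Pairing each audio clip with such a description lets a pretrained text-audio model treat the demographic and device attributes as complementary input rather than discarding them.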

Respiratory sound classification (RSC) is challenging due to varied acoustic signatures, primarily influenced by patient demographics and recording environments. To address this issue, we introduce a text-audio multimodal model that utilizes metadata of respiratory sounds, which provides useful complementary information for RSC. Specifically, we fine-tune a pretrained text-audio model using free-text descriptions derived from the samples' metadata, which includes the gender and age of patients, the type of recording devices, and the recording location on the patient's body. Our...

10.21437/interspeech.2024-492 article EN Interspeech 2024 2024-09-01

In this paper, we introduce the Actions and Objects Pathways (AOPath) for out-of-domain generalization in video question answering tasks. AOPath leverages features from a large pretrained model to enhance generalizability without the need for explicit training on unseen domains. Inspired by the human brain, it dissociates the features into action and object features, and subsequently processes them through separate reasoning pathways. It utilizes a novel module which converts the features into domain-agnostic features without introducing any trainable weights. We...

10.48550/arxiv.2411.19434 preprint EN arXiv (Cornell University) 2024-11-28