Wooil Kim

ORCID: 0000-0002-3854-2783
Research Areas
  • Speech and Audio Processing
  • Speech Recognition and Synthesis
  • Music and Audio Processing
  • Advanced Adaptive Filtering Techniques
  • Phonetics and Phonology Research
  • Voice and Speech Disorders
  • Emotion and Mood Recognition
  • Advanced Data Compression Techniques
  • Acoustic Wave Phenomena Research
  • Blind Source Separation Techniques
  • Direction-of-Arrival Estimation Techniques
  • Wireless Networks and Protocols
  • Molecular Biology Techniques and Applications
  • Video Surveillance and Tracking Methods
  • Wireless Communication Networks Research
  • Natural Language Processing Techniques
  • Biometric Identification and Security
  • Software Testing and Debugging Techniques
  • Sentiment Analysis and Opinion Mining
  • Cleft Lip and Palate Research
  • Dysphagia Assessment and Management
  • Bayesian Methods and Mixture Models
  • Underwater Acoustics Research
  • Autonomous Vehicle Technology and Safety
  • Energy and Environmental Systems

Incheon National University
2013-2022

The University of Texas at Dallas
2008-2015

Seokyeong University
2014

Carnegie Mellon University
2005

Korea University
2000-2004

Deep Neural Network (DNN) based transfer learning has been shown to be effective in Visual Object Classification (VOC) for complementing the deficit of target domain training samples by adapting classifiers that have been pre-trained on other large-scale databases (DB). Although there exists an abundance of acoustic data, it can also be said that datasets of specific acoustic scenes are sparse for training Acoustic Scene Classification (ASC) models. By exploiting the VOC DNN's ability beyond its pre-trained environments, this paper proposes DNN based transfer learning for ASC. Effectiveness...

10.1109/icassp.2017.7952265 article EN 2017-03-01

In real-life conditions, mismatch between the development and test domains degrades speaker recognition performance. To solve the issue, many researchers have explored domain adaptation approaches using a matched in-domain dataset. However, adaptation would not be effective if the in-domain dataset is insufficient to estimate the channel variability of the domain. In this paper, we explore the problem of performance degradation under such a situation of insufficient channel information. In order to exploit the limited in-domain dataset effectively, we propose an unsupervised domain adaptation approach using Autoencoder based...

10.21437/interspeech.2017-49 preprint EN Interspeech 2017 2017-08-16

The problem of detecting psychological stress from speech is challenging due to differences in how speakers convey stress. Changes in speech production due to speaker state are not linearly dependent on changes in stress. Research is further complicated by the existence of different stress types and the lack of metrics capable of discriminating stress levels. This study addresses the problem of automatic detection of speech under stress using a previously developed feature extraction scheme based on the Teager Energy Operator (TEO). To improve detection performance, (i) selected sub-band frequency...

10.1155/2011/906789 article EN cc-by EURASIP Journal on Advances in Signal Processing 2011-03-07
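
The Teager Energy Operator named in the abstract above has a standard discrete-time form, psi[n] = x[n]^2 - x[n-1]*x[n+1]. The following is a minimal sketch of that textbook operator only; the function name `teager_energy` and the sinusoid parameters are illustrative assumptions, and the paper's sub-band feature extraction scheme is not reproduced here.

```python
import numpy as np

def teager_energy(x):
    """Discrete Teager Energy Operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Returns an array two samples shorter than the input, because the
    first and last samples have no complete neighbourhood.
    """
    x = np.asarray(x, dtype=float)
    if x.size < 3:
        raise ValueError("need at least 3 samples")
    return x[1:-1] ** 2 - x[:-2] * x[2:]

# Illustrative input (assumed values): a pure sinusoid A*cos(omega*n).
# For such a signal the operator is constant and equals A^2 * sin(omega)^2.
n = np.arange(200)
amp, omega = 0.5, 0.3
psi = teager_energy(amp * np.cos(omega * n))
```

For a pure tone the output tracks both amplitude and frequency, which is one reason the operator is attractive as a basis for stress-sensitive features.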

A new voice activity detector for noisy environments is proposed. In conventional algorithms, the endpoint of speech is found by applying an edge detection filter that finds the abrupt changing point in a feature domain. However, since the frame energy feature is unstable in noisy environments, it is difficult to accurately find the endpoint of speech. Therefore, a novel feature extraction algorithm based on the double-combined Fourier transform and envelope line fitting is proposed. It is combined with an edge detection filter for effective detection of endpoints. Effectiveness of the proposed algorithm is evaluated and compared...

10.1155/2014/146040 article EN cc-by The Scientific World Journal 2014-01-01

A speech state-dependent spectral subtraction method to regulate the blind spectral subtraction for improved enhancement is proposed. In this method, a modified subtraction rule is applied over the speech selectively, contingent on the speech state being voiced or unvoiced, in an effort to incorporate the acoustic characteristics of phonemes. The aim to remedy the subtraction-induced signal distortion is attained by two procedures: spectrum sharpening and minimum spectral bound. In order to remove the residual noise, the proposed method employs a procedure utilising the masking effect. The proposed subtraction, including noise...

10.1049/ip-vis:20000408 article EN IEE Proceedings - Vision Image and Signal Processing 2000-01-01
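
As background for the abstract above, generic magnitude spectral subtraction with a spectral floor can be sketched as follows. This is a textbook form under stated assumptions, not the paper's state-dependent rule: the names `spectral_subtract`, `alpha` and `floor` are hypothetical, and the floor is a generic stand-in that may differ from the paper's minimum bound.

```python
import numpy as np

def spectral_subtract(noisy_mag, noise_mag, alpha=1.0, floor=0.05):
    """Generic magnitude spectral subtraction for one frame.

    noisy_mag : magnitude spectrum of the noisy frame
    noise_mag : estimated noise magnitude spectrum (same shape)
    alpha     : over-subtraction factor (hypothetical parameter)
    floor     : lower bound as a fraction of the noisy magnitude,
                which keeps the result non-negative
    """
    noisy_mag = np.asarray(noisy_mag, dtype=float)
    noise_mag = np.asarray(noise_mag, dtype=float)
    cleaned = noisy_mag - alpha * noise_mag
    return np.maximum(cleaned, floor * noisy_mag)

# Illustrative values only: the middle bin would go negative without the floor.
enhanced = spectral_subtract([1.0, 0.2, 0.5], [0.3, 0.3, 0.3], alpha=1.0, floor=0.1)
```

A state-dependent variant could, for instance, choose different `alpha` or `floor` values for voiced and unvoiced frames; that choice is an assumption here and is not taken from the paper.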

Band-limited speech represents one of the most challenging factors for robust speech recognition. This is especially true in supporting audio corpora from sources that have a range of conditions in spoken document retrieval requiring effective automatic speech recognition. The missing-feature reconstruction method has a problem when applied to band-limited speech reconstruction, since it assumes the observations in the unreliable regions are always greater than the latent original clean speech. The approach developed here depends only on reliable...

10.1109/tasl.2009.2015080 article EN IEEE Transactions on Audio Speech and Language Processing 2009-07-17

Recent acoustic event classification research has focused on training suitable filters to represent acoustic events. However, due to the limited availability of target event databases and the linearity of conventional filters, there is still room for improving performance. By exploiting the non-linear modeling of deep neural networks (DNNs) and their ability to learn beyond pre-trained environments, this letter proposes a DNN-based feature extraction scheme for acoustic event classification. The effectiveness and robustness to noise of the proposed method are demonstrated using...

10.1587/transinf.2017edl8048 article EN IEICE Transactions on Information and Systems 2017-01-01

This study proposes an effective angry speech detection approach by leveraging content structure within the input speech. A classifier based on an "emotional" language model score is formulated and combined with acoustic feature based classifiers including TEO-based features and conventional Mel frequency cepstral coefficients (MFCC). The proposed algorithm is evaluated on real-life conversational speech, which was recorded between customers and call center operators over a telephone network. Analysis of the corpus presents distinctive...

10.1109/icassp.2010.5495021 article EN IEEE International Conference on Acoustics Speech and Signal Processing 2010-01-01

This paper proposes a novel missing-feature reconstruction method to improve speech recognition in background noise environments. The existing missing-feature reconstruction method utilizes log-spectral correlation across frequency bands. In this paper, we propose to employ temporal spectral feature analysis to improve the missing-feature reconstruction performance by leveraging the log-spectral correlation across neighboring frames. In a similar manner with the conventional method, a Gaussian mixture model is obtained by training over the temporal spectral feature set. The final estimates for missing-feature reconstruction are a selective combination of the original frequency correlation based and proposed...

10.1109/tasl.2010.2041698 article EN IEEE Transactions on Audio Speech and Language Processing 2010-06-14

Various types of classifiers and feature extraction methods for acoustic scene classification have been recently proposed in the IEEE Detection and Classification of Acoustic Scenes and Events (DCASE) 2016 Challenge Task 1. The results of the final evaluation, however, have shown that even the top 10 ranked teams showed extremely low accuracy performance in particular class pairs with similar sounds. Due to such sound classes being difficult to distinguish by human ears, conventional deep learning based methods, as used...

10.1587/transinf.2017edl8132 article EN IEICE Transactions on Information and Systems 2017-01-01

This study proposes an effective feature compensation method to improve speech recognition in real-life speech conditions, where (i) severe background noise and channel distortion simultaneously exist, (ii) no development data is available, and (iii) the clean data for ASR training and the latent clean speech of the test data are mismatched in acoustic structure. The proposed feature compensation method employs an online GMM adaptation procedure, which is based on MLLR, and a minimum statistics replacement technique for non-speech segments. The DARPA Tank corpus is used...

10.1109/icassp.2012.6288825 article EN 2012-03-01

In this paper, we propose an effective mask-estimation method for missing-feature reconstruction in order to achieve robust speech recognition in unknown noise environments. In previous work, it was found that training a model for mask estimation on speech corrupted by white noise did not provide environment-independent recognition accuracy. In this paper, we describe a training method based on bands of colored noise that is more effective in reflecting spectral variations across neighboring frames and subbands. We also achieved further improvement in recognition accuracy by reconsidering appeared...

10.21437/interspeech.2005-248 article EN Interspeech 2005 2005-09-04

This letter presents a novel confidence measure for the purpose of improving user performance in Spoken Document Retrieval (SDR). The proposed confidence measure is based on the phonetic distance between subword models, employing an anti-model which is determined to be discriminative to the target model using offline training data. As an advancement from our previous work, the proposed method employs separate phonetic similarity knowledge for vowels and consonants, resulting in a more reliable confidence measure over diverse SDR recorded speech conditions. A transcript...

10.1109/lsp.2009.2034551 article EN IEEE Signal Processing Letters 2009-10-16

One of the most recent speaker recognition methods that demonstrates outstanding performance in noisy environments involves extracting the speaker embedding using an attention mechanism instead of average or statistics pooling. In the attention method, the speaker recognition performance is improved by employing multiple heads rather than a single head. In this paper, we propose advanced methods to extract a new embedding by compensating for the disadvantages of the single-head and multi-head attention methods. The combination method comprising split-based attentions shows a 5.39% Equal Error Rate (EER)....

10.3390/electronics9122201 article EN Electronics 2020-12-21
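
To illustrate the idea in the abstract above of replacing average pooling with attention, the following sketch pools frame-level features with softmax weights and concatenates several heads. It is a simplified illustration, not the paper's architecture: the linear scoring vector per head and the names `attention_pool` and `multi_head_attention_pool` are assumptions.

```python
import numpy as np

def attention_pool(frames, w):
    """Pool frame-level features with softmax attention weights.

    frames : (T, D) array of frame-level features
    w      : (D,) scoring vector (assumed here; learned in practice)
    Returns the (D,) pooled vector and the (T,) attention weights.
    """
    frames = np.asarray(frames, dtype=float)
    scores = frames @ np.asarray(w, dtype=float)
    scores = scores - scores.max()          # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights @ frames, weights

def multi_head_attention_pool(frames, head_ws):
    """Concatenate one attention-pooled vector per head."""
    return np.concatenate([attention_pool(frames, w)[0] for w in head_ws])

# Illustrative data: 4 frames of 3-dimensional features.
frames = np.arange(12, dtype=float).reshape(4, 3)
pooled, weights = attention_pool(frames, np.zeros(3))   # zero scores: uniform weights
embedding = multi_head_attention_pool(frames, [np.zeros(3), np.ones(3)])
```

With an all-zero scoring vector the weights are uniform and the result reduces to average pooling, which makes the relationship between the two pooling schemes easy to check.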

This paper proposes a novel mask estimation method for missing-feature reconstruction to improve speech recognition performance in time-varying background noise conditions. Conventional mask estimation methods based on noise estimates and spectral subtraction fail to reliably estimate the mask. The proposed mask estimation method utilizes a posterior-based representative mean (PRM) vector for determining the reliability of the input speech spectrum, which is obtained as a weighted sum of the mean parameters of the speech model with posterior probabilities. To obtain the noise-corrupted...

10.1109/asru.2009.5373398 article EN 2009-12-01
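
The abstract above describes the PRM vector as a posterior-weighted sum of model mean parameters. A minimal sketch of that computation, assuming a diagonal-covariance Gaussian mixture model (the model type, the function name `posterior_weighted_mean`, and all numeric values are assumptions, not taken from the paper), could look like this:

```python
import numpy as np

def posterior_weighted_mean(x, weights, means, variances):
    """Posterior-weighted sum of GMM component means for one observation.

    x         : (D,) observation vector
    weights   : (K,) mixture weights
    means     : (K, D) component means
    variances : (K, D) diagonal covariances
    Returns the (D,) weighted mean and the (K,) posteriors.
    """
    x = np.asarray(x, dtype=float)
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    # Log-likelihood of x under each diagonal Gaussian component.
    log_lik = -0.5 * np.sum(
        np.log(2.0 * np.pi * variances) + (x - means) ** 2 / variances, axis=1
    )
    log_post = np.log(weights) + log_lik
    log_post -= log_post.max()              # numerical stability
    post = np.exp(log_post)
    post /= post.sum()
    return post @ means, post

# Illustrative two-component model; the observation lies midway between the means.
prm, post = posterior_weighted_mean(
    [5.0, 5.0], [0.5, 0.5], [[0.0, 0.0], [10.0, 10.0]], np.ones((2, 2))
)
```

How the resulting vector is compared with the input spectrum to decide reliability is described in the paper itself and is not sketched here.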