- Speech and Audio Processing
- Advanced Adaptive Filtering Techniques
- Speech Recognition and Synthesis
- Music and Audio Processing
- Blind Source Separation Techniques
- Hearing Loss and Rehabilitation
- Metal complexes synthesis and properties
- Metal-Organic Frameworks: Synthesis and Applications
- Emotion and Mood Recognition
- Molecular Sensors and Ion Detection
- Crystal structures of chemical compounds
- Crystallography and molecular interactions
- Advanced Data Compression Techniques
- Magnetism in coordination complexes
- Image and Signal Denoising Methods
- Metal-Catalyzed Oxygenation Mechanisms
- Acoustic Wave Phenomena Research
- Indoor and Outdoor Localization Technologies
- Neural dynamics and brain function
- Natural Language Processing Techniques
- Sentiment Analysis and Opinion Mining
- Luminescence and Fluorescent Materials
- Advanced Nanomaterials in Catalysis
- Gaussian Processes and Bayesian Inference
- Diverse Topics in Contemporary Research
Gwangju Institute of Science and Technology
2015-2025
Kookmin University
2025
Korea Institute of Science & Technology Information
2016-2020
Kyung Hee University
2017
Kyungpook National University
2009-2015
Seoul National University
2004-2009
Seoul National University of Science and Technology
2008
University of California, Santa Barbara
2006
Seoul Media Institute of Technology
2005
The feasibility of a high-speed pattern recognition system using a 1k-bit cross-point synaptic RRAM array and a CMOS-based neuron chip has been experimentally demonstrated. The learning capability of a neuromorphic system comprising RRAM synapses and CMOS neurons has been confirmed experimentally for the first time.
A synthetic approach to highly efficient thermally activated delayed fluorescence (TADF) is proposed that uses ortho donor (D)–acceptor (A) compounds (PXZoB, DPAoB, and CzoB), wherein the acceptor is based on triarylboron and the donor is phenoxazine (PXZ), diphenylamine (DPA), or carbazole (Cz). Combined with the ortho D–A connectivity, the bulky nature of the triarylboron endows the dyads with an inherent steric "locking" for a twisted D–A arrangement, leading to a small energy difference between the singlet and triplet excited states (ΔEST) and thus exhibiting very efficient TADF...
The performance of most of the classical sound source localization algorithms degrades seriously in the presence of background noise or reverberation. Recently, deep neural networks (DNNs) have successfully been applied to sound source localization, which mainly aim to classify the direction-of-arrival (DoA) into one of the candidate sectors. In this paper, we propose a DNN-based phase difference enhancement for DoA estimation, which turned out to be better than the direct estimation of the DoAs from the input interchannel phase differences (IPDs)....
In this letter, we propose a new statistical model, the two-sided generalized gamma distribution (GΓD), for an efficient parametric characterization of speech spectra. GΓD forms a generalized class of parametric distributions, including the Gaussian, Laplacian, and Gamma probability density functions (pdfs) as special cases. We also propose a computationally inexpensive online maximum likelihood (ML) parameter estimation algorithm for GΓD. Likelihoods, coefficients of variation (CVs), and Kolmogorov-Smirnov (KS) tests...
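As a rough illustration of the special cases named in this abstract, the sketch below evaluates a two-sided generalized gamma density in one commonly used parameterization, f(x) = γβ^η / (2Γ(η)) · |x|^(ηγ−1) · exp(−β|x|^γ). The parameterization, the function names, and the parameter values are assumptions made for illustration; the abstract itself does not state the formula. Under this assumed form, γ = 2, η = 0.5 gives a Gaussian and γ = 1, η = 1 gives a Laplacian.

```python
import math


def two_sided_gen_gamma_pdf(x, gamma, eta, beta):
    """Two-sided generalized gamma density (assumed parameterization).

    f(x) = gamma * beta**eta / (2 * Gamma(eta))
           * |x|**(eta * gamma - 1) * exp(-beta * |x|**gamma)
    """
    ax = abs(x)
    coeff = gamma * beta ** eta / (2.0 * math.gamma(eta))
    return coeff * ax ** (eta * gamma - 1.0) * math.exp(-beta * ax ** gamma)


def gaussian_pdf(x, sigma):
    """Zero-mean Gaussian density, used only as a reference."""
    return math.exp(-x * x / (2.0 * sigma ** 2)) / (math.sqrt(2.0 * math.pi) * sigma)


def laplacian_pdf(x, b):
    """Zero-mean Laplacian density with scale b, used only as a reference."""
    return math.exp(-abs(x) / b) / (2.0 * b)


# Gaussian special case: gamma = 2, eta = 0.5, beta = 1 / (2 * sigma^2).
sigma = 1.5
ggd_as_gaussian = two_sided_gen_gamma_pdf(0.7, 2.0, 0.5, 1.0 / (2.0 * sigma ** 2))

# Laplacian special case: gamma = 1, eta = 1, beta = 1 / b.
b = 0.8
ggd_as_laplacian = two_sided_gen_gamma_pdf(-1.3, 1.0, 1.0, 1.0 / b)
```

Note that for ηγ < 1 the density is unbounded at x = 0, so the sketch should only be evaluated at nonzero x in that regime.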
This letter presents a speech enhancement technique combining statistical models and non-negative matrix factorization (NMF) with on-line update of noise bases. The model-based methods have been known to be less effective for non-stationary noises, while the template-based techniques can deal with them quite well. However, the template-based techniques usually rely on a priori information. To overcome the shortcomings of both approaches, we propose a novel method that combines the statistical model-based scheme with the NMF-based gain function. For better performance in...
In this paper, we propose a novel emotion recognition method to reflect affect salient information using acoustic and lexical features. The acoustic features are extracted from the speech signal by applying statistical functionals of emotionally salient high-level features derived from a Deep Neural Network (DNN). These acoustic features are early fused with two types of lexical features extracted from the text transcription of the speech signal, which are a distributed representation and affective lexicon-based dimensions. The fused features are fed to another DNN for utterance-level emotion classification. Experimental results on...
In this letter, we propose a novel approach to voice activity detection (VAD) based on the modified maximum a posteriori (MAP) criterion conditioned on the voice activity decision made in the previous frame. To exploit the inter-frame correlation of voice activity, the probability of voice presence conditioned on both the observed spectrum and the voice activity decision in the previous frame is employed instead of the conventional strategy that depends only on the current observation. The proposed conditional MAP criterion incorporating temporal correlations leads to two separate thresholds for the likelihood ratio test (LRT)...
From an investigation of a statistical model-based voice activity detection (VAD), it is discovered that a simple heuristic way like a geometric mean has been adopted for the decision rule based on the likelihood ratio (LR) test. For a successful VAD operation, the authors first review the behaviour mechanism of the support vector machine (SVM) and then propose a novel technique, which employs the decision function of the SVM using the LRs, while the conventional techniques perform VAD by comparing the geometric mean of the LRs with a given threshold value. The proposed SVM-based...
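The geometric-mean decision rule that this abstract describes as the conventional baseline can be sketched as follows. The per-bin likelihood ratio assumes the widely used complex-Gaussian model, log Λ_k = γ_k ξ_k / (1 + ξ_k) − log(1 + ξ_k), where γ_k is the a posteriori SNR and ξ_k is the a priori SNR. That model, the function names, and the threshold value are assumptions for illustration. The sketch shows only the baseline rule, not the SVM-based technique the authors propose.

```python
import math


def log_likelihood_ratios(noisy_power, noise_power, prior_snr):
    """Per-bin log likelihood ratios under an assumed complex-Gaussian model.

    noisy_power: |X_k|^2 for each frequency bin of the current frame
    noise_power: estimated noise variance for each bin
    prior_snr:   a priori SNR (xi_k) for each bin
    """
    llrs = []
    for power, noise_var, xi in zip(noisy_power, noise_power, prior_snr):
        post_snr = power / noise_var  # a posteriori SNR (gamma_k)
        llrs.append(post_snr * xi / (1.0 + xi) - math.log1p(xi))
    return llrs


def geometric_mean_vad(llrs, log_threshold=0.0):
    """Declare speech when the log of the geometric mean of the LRs
    (that is, the arithmetic mean of the log LRs) exceeds the threshold."""
    mean_llr = sum(llrs) / len(llrs)
    return mean_llr > log_threshold


# Hypothetical three-bin frames: one noise-like, one speech-like.
noise_frame = log_likelihood_ratios([1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0])
speech_frame = log_likelihood_ratios([10.0, 10.0, 10.0], [1.0, 1.0, 1.0], [9.0, 9.0, 9.0])

noise_decision = geometric_mean_vad(noise_frame)
speech_decision = geometric_mean_vad(speech_frame)
```

An SVM-based variant in the spirit of the abstract would instead pass the vector of per-bin LRs to a trained classifier's decision function rather than averaging them.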
Within a single speech emotion corpus, deep neural networks have shown decent performance in emotion recognition. However, the performance of emotion recognition based on data-driven learning methods degrades significantly for the cross-corpus scenario. To relieve this issue without any labeled samples from the target domain, we propose a cross-corpus speech emotion recognition based on few-shot learning and unsupervised domain adaptation, which is trained to learn the class (emotion) similarity from the source domain samples adapted to the target domain. In addition, we utilize multiple corpora in training to enhance the robustness to the unseen...
There is a surge in interest in self-supervised learning approaches for end-to-end speech encoding in recent years as they have achieved great success. Especially, WavLM showed state-of-the-art performance on various speech processing tasks. To better understand the efficacy of self-supervised learning models for speech enhancement, in this work, we design and conduct a series of experiments with three resource conditions by combining WavLM and two high-quality speech enhancement systems. Also, we propose a regression-based WavLM training objective and a noise-mixing data...
The reaction of N-(2-pyridylmethyl)iminodiethanol (H2pmide) and Fe(NO3)3·9H2O in MeOH led to the formation of a dimeric iron(III) complex, [(Hpmide)Fe(NO3)]2(NO3)2·2CH3OH (1). Its anion-exchanged form, [(pmide)Fe(N3)]2 (2), was prepared by the reaction of 1 and NaN3 in MeOH, during which the Hpmide ligand of 1 was also deprotonated. These compounds were investigated by single crystal X-ray diffraction and magnetochemistry. In complex 1, one iron(III) ion was bonded with a mono-deprotonated Hpmide ligand and a nitrate ion. The two iron(III) ions within the dinuclear unit were connected...
The cobalt(II) complex incorporating a π-conjugated substituent, [Co(Naph-C2-terpy)2](BF4)2 (1; Naph-C2-terpy = 4'-(2-naphthoxy(ethoxy))-2,2':6',2''-terpyridine), exhibits an abrupt spin transition (ST) behavior (cooperative factor C = 0.91) while its solvated product, 1·2MeOH, shows a gradual spin crossover (SCO) behavior (C = 0.49). Single crystal X-ray structural analyses demonstrated that the octahedral coordination core [CoN6] in 1 exhibits a larger distortion in both the high-spin and low-spin states than 1·2MeOH or another...
In this letter, we present results of distribution tests that indicate that for many natural images, the statistics of the discrete cosine transform (DCT) coefficients are best approximated by a generalized gamma function (GΓF), which includes the conventional Gaussian, Laplacian, and gamma probability density functions. The major parameter of the GΓF is estimated according to the maximum likelihood (ML) principle. Experimental results on a number of χ² tests indicate that the GΓF can be used effectively for modeling the distribution of DCT coefficients compared...
In this letter, we propose a spectro-temporal filtering algorithm for multichannel speech enhancement in the short-time Fourier transform (STFT) domain. Compared with the traditional multiplicative filtering technique, the proposed method takes account of the interdependencies between the components in adjacent frames and frequency bins. For the spectro-temporal filtering, the noise power spectral density (PSD) matrices are estimated based on an extended formulation utilizing temporal correlations, and a parametric noise reduction filter based on these PSD matrices is...
Speech enhancement based on statistical models has been studied for several decades. Recently, speech enhancement adopting a power spectral density (PSD) uncertainty model has been proposed. This approach distinguishes the true PSD from its estimate and considers both as random variables. It incorporates the prior distribution of spectra estimators to derive the uncertainty-aware counterpart of the conventional clean speech estimators, which results in performance improvement. However, the PSD uncertainty model has not yet been adopted for parameter estimations such...
Time-domain approaches have shown the potential to improve the performance of speaker verification, but the still predominant approaches utilize hand-crafted features such as mel filterbank energies. Although these features are based on speech perception models and have exhibited impressive performances, the fixed frame size does not allow good temporal and spectral resolutions at the same time, and there is information loss when taking the magnitude spectrum and during the frequency rescaling. In this paper, we propose to incorporate multi-resolution...
A series of Zn(II), Pd(II) and Cd(II) complexes, [(L)nMX2]m (L = L‐a–L‐c; M = Zn, Pd; X = Cl; M = Cd; X = Br; n, m = 1 or 2), containing 4‐methoxy‐N‐(pyridin‐2‐ylmethylene)aniline (L‐a), 4‐methoxy‐N‐(pyridin‐2‐ylmethyl)aniline (L‐b) and 4‐methoxy‐N‐methyl‐N‐(pyridin‐2‐ylmethyl)aniline (L‐c) have been synthesized and characterized. The X‐ray crystal structures of the Pd(II) complexes [LPdCl2] (L = L‐b and L‐c) revealed distorted square planar geometries obtained via coordinative interaction of the nitrogen atoms of the pyridine and amine moieties and two chloro ligands. The geometry around the Zn(II) center in [(L‐a)ZnCl...
We propose a voice activity detection (VAD) algorithm based on the generalized gamma distribution (GΓD). The distributions of noise spectra and noisy speech spectra, including speech-inactive intervals, are modeled by a set of GΓDs and applied to the likelihood ratio test (LRT) for VAD. The parameters of the GΓD are estimated through an on-line maximum likelihood (ML) estimation procedure where the global speech absence probability (GSAP) is incorporated under a forgetting scheme. Experimental results show that...
A novel approach to a voice activity detector (VAD) in noisy environments is presented. The generalised Gaussian distribution (GGD) is employed as a parametric model for speech, which enables tuning to the actual data. According to the experimental results, it was discovered that the proposed GGD is more effective for the VAD algorithm compared to the conventional Laplacian model.