- Speech Recognition and Synthesis
- Speech and Audio Processing
- Music and Audio Processing
- Speech and dialogue systems
- Distributed and Parallel Computing Systems
- X-ray Diffraction in Crystallography
- Cloud Computing and Resource Management
- Crystallization and Solubility Studies
- Advanced Adaptive Filtering Techniques
- Industrial Vision Systems and Defect Detection
- Natural Language Processing Techniques
- Voice and Speech Disorders
- Transportation Planning and Optimization
- Infant Health and Development
- Educational Technology and Assessment
- Image Processing and 3D Reconstruction
- Medicinal Plant Pharmacodynamics Research
- Cancer Mechanisms and Therapy
- Machine Learning and Data Classification
- NF-κB Signaling Pathways
- Smart Parking Systems Research
- Luminescence and Fluorescent Materials
- Organic Light-Emitting Diodes Research
- Data Stream Mining Techniques
- Digital Media Forensic Detection
Seoul National University
2017-2023
LG (South Korea)
2019-2023
Soongsil University
2019-2020
Korea Advanced Institute of Science and Technology
2010-2018
Ajou University
2013
Metastatic triple-negative breast cancer (mTNBC) is a fatal type of breast cancer (BC), and signal transducer and activator of transcription 3 (STAT3) has emerged as an effective target for mTNBC. In the present study, compound MC0704 was found to be a novel synthetic STAT3 pathway inhibitor, and its potential antitumor activity was demonstrated using in vitro and in vivo models of docetaxel-resistant TNBC cells. Based on marinacarboline (MC), a series of β-carboline derivatives were synthesized and investigated for their activities against...
This paper addresses the problem of recognizing speech uttered by patients with dysarthria, which is a motor disorder impeding the physical production of speech. Patients with dysarthria have articulatory limitations, and therefore they often have trouble pronouncing certain sounds, resulting in undesirable phonetic variation. Modern automatic speech recognition systems designed for regular speakers are ineffective for dysarthric sufferers due to this phonetic variation. To capture the phonetic variation, Kullback-Leibler divergence-based hidden Markov...
This paper presents a new method for automatically assessing the speech intelligibility of patients with dysarthria, which is a motor disorder impeding the physical production of speech. The proposed method consists of two main steps: feature representation and prediction. In the feature representation step, the speech utterance is converted into a phone sequence using an automatic speech recognition technique and then aligned with a canonical phone sequence from a pronunciation dictionary using a weighted finite state transducer to capture the pronunciation mappings such as match, substitution, and deletion....
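The alignment step described in the abstract above can be illustrated with a small sketch. This is not the authors' weighted finite state transducer: it swaps in a plain unit-cost edit-distance alignment, and the function name `align_phones`, the cost scheme, and the example phone symbols are all assumptions made for illustration.

```python
def align_phones(canonical, recognized):
    """Align two phone sequences with unit-cost edit distance.

    Illustrative stand-in for a WFST-based alignment (an assumption).
    Returns a list of (label, canonical_phone, recognized_phone) tuples.
    """
    n, m = len(canonical), len(recognized)
    # cost[i][j] = edit distance between canonical[:i] and recognized[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (canonical[i - 1] != recognized[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Trace back from the end to recover one optimal sequence of mappings.
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        same = i > 0 and j > 0 and canonical[i - 1] == recognized[j - 1]
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (not same):
            label = "match" if same else "substitution"
            ops.append((label, canonical[i - 1], recognized[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            ops.append(("deletion", canonical[i - 1], None))
            i -= 1
        else:
            ops.append(("insertion", None, recognized[j - 1]))
            j -= 1
    ops.reverse()
    return ops


# Hypothetical example: the speaker drops the vowel of "cat".
labels = [op[0] for op in align_phones(["k", "ae", "t"], ["k", "t"])]
print(labels)  # ['match', 'deletion', 'match']
```

Counts of each mapping type could then serve as features for a predictor, although the abstract is truncated before it describes the prediction step.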
In this paper, we propose a new pooling method called spatial pyramid encoding (SPE) to generate speaker embeddings for text-independent speaker verification. We first partition the output feature maps from a deep residual network (ResNet) into increasingly fine sub-regions and extract speaker embeddings from each sub-region through a learnable dictionary encoding layer. These embeddings are concatenated to obtain the final speaker representation. The SPE layer not only generates a fixed-dimensional speaker embedding for a variable-length speech segment, but also aggregates...
Voice activity detection (VAD) is an important preprocessing module in many speech applications. Choosing appropriate features and model structures is a significant challenge and an active area of current VAD research. Mel-scale features such as Mel-frequency cepstral coefficients (MFCCs) and log Mel-filterbank (LMFB) energies have been widely used in VAD as well as in speech recognition. The reason feature extraction on the Mel-frequency scale is one of the most popular methods is that it mimics how human ears process sound. However,...
This paper addresses the problem of query-by-example spoken term detection (QbE-STD) in the presence of background noises that are inevitable in real applications. To deal with this, we propose a convolutional neural network (CNN) based bottleneck feature representation for the keyword. A combined network is made by attaching a bottleneck layer on top of a CNN trained on the Wall Street Journal (WSJ) database. Finally, dynamic time warping (DTW) based template matching is performed to measure the distance between the enrollment and test feature matrices, which...
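The DTW template matching mentioned in the abstract above can be sketched as follows. This is a minimal textbook DTW under stated assumptions: Euclidean frame distance, the standard three-way step pattern, and no path-length normalization. The function name `dtw_distance` and the toy feature vectors are hypothetical, and the abstract does not say which variant the authors used.

```python
import math


def dtw_distance(enrollment, test):
    """Accumulated DTW cost between two sequences of feature vectors.

    Each argument is a list of equal-length frames (lists of floats).
    Assumptions: Euclidean local distance, steps (i-1,j), (i,j-1), (i-1,j-1).
    """
    n, m = len(enrollment), len(test)
    inf = float("inf")
    # acc[i][j] = best accumulated cost aligning the first i and j frames
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(enrollment[i - 1], test[j - 1])
            acc[i][j] = local + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m]


# A time-stretched copy of the template aligns at zero cost.
template = [[0.0, 0.0], [1.0, 1.0]]
stretched = [[0.0, 0.0], [0.0, 0.0], [1.0, 1.0]]
print(dtw_distance(template, stretched))  # 0.0
```

In a detection setting, a lower accumulated cost would indicate that the test segment more likely contains the enrolled keyword; the decision threshold would have to be tuned on held-out data.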
This paper addresses the problem of automatically detecting infant crying sounds. Infant crying sounds show distinct and regular time-frequency patterns that include a clear harmonic structure and a unique melody. Therefore, extracting appropriate features to properly represent these characteristics is important in achieving good performance. In this paper, we propose weighted segment-based two-dimensional linear-frequency cepstral coefficients to characterize the time-frequency patterns within a long-range segment of the target signal. A...
A proper representation that can well express the characteristics of a wake-up word plays an important role in wake-up word detection (WWD). However, it may be easily corrupted by the various types of environmental noise that occur in the places where WWD typically works, causing unreliable performance. To deal with this practical issue, we propose a novel strategy called cross-informed domain adversarial training (CiDAT) for noise-robust WWD. In the proposed method, additional paths were introduced to the conventional domain adversarial training (DAT) to encourage its...
In this paper, we present the method and procedure for collecting Korean distant multi-channel speech and noise databases, which were designed for developing a highly accurate distant speech recognition system for indoor conversational robot applications. The speech database was collected at four different positions in an indoor room, furnished to simulate a living room acoustically, by a playback-and-recording method that uses an artificial mouth for playing the clean source data and three kinds of microphone arrays for recording the data. further...
This paper addresses the problem of recognizing malicious sounds, such as sexual screams or moans, to detect and block objectionable multimedia contents. The malicious sounds show distinct characteristics that have large temporal variations and fast spectral transitions. Therefore, extracting appropriate features to properly represent these characteristics is important in achieving a better performance. In this paper, we employ segment-based two-dimensional Mel-frequency cepstral coefficients and histograms of gradient directions...
In this paper, we introduce a new feature engineering approach for deep learning-based acoustic modeling, which utilizes input feature contributions. For this purpose, we propose an auxiliary deep neural network (DNN) called a feature contribution network (FCN) whose output layer is composed of sigmoid-based contribution gates. In our framework, the FCN tries to learn element-level discriminative contributions of input features, and an acoustic model network (AMN) is trained by gated features generated by element-wise multiplication between contribution gate outputs and input features. In addition, we also...
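The gating operation described in the abstract above (sigmoid gate outputs multiplied element-wise with the input features) can be sketched in a few lines. The gate pre-activations would come from the FCN; here they are passed in as plain numbers, and the names `sigmoid`, `gate_features`, and `gate_logits` are assumptions for illustration, not the authors' code.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real value to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def gate_features(features, gate_logits):
    """Element-wise product of input features and sigmoid gate outputs.

    gate_logits stands in for the pre-activation outputs of the gating
    network (hypothetical input; one logit per feature element).
    """
    if len(features) != len(gate_logits):
        raise ValueError("features and gate_logits must have the same length")
    return [x * sigmoid(z) for x, z in zip(features, gate_logits)]


# A logit of 0 gives a gate of 0.5, so each feature is halved.
print(gate_features([2.0, -4.0], [0.0, 0.0]))  # [1.0, -2.0]
```

Because every gate lies strictly between 0 and 1, each feature element is scaled down rather than removed, which keeps the operation differentiable for joint training of the two networks.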
Previous research on acoustic word embeddings used in query-by-example spoken term detection has shown remarkable performance improvements when using a triplet network. However, the triplet network is trained using only limited information about the acoustic similarity between words. In this paper, we propose a novel architecture, the phonetically associated triplet network (PATN), which aims at increasing the discriminative power of acoustic word embeddings by utilizing phonetic information as well as word identity. The proposed model is learned to minimize a combined loss function that...
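For readers unfamiliar with triplet networks, the standard triplet hinge loss that such networks commonly minimize is sketched below. This is the generic formulation only: the abstract is truncated before it defines PATN's combined loss, so nothing here should be read as that loss. Euclidean distance, the margin value, and the name `triplet_loss` are assumptions.

```python
import math


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Generic triplet hinge loss: max(0, margin + d(a, p) - d(a, n)).

    anchor and positive are embeddings of the same word, negative is an
    embedding of a different word. Euclidean distance and the default
    margin are illustrative assumptions.
    """
    d_pos = math.dist(anchor, positive)
    d_neg = math.dist(anchor, negative)
    return max(0.0, margin + d_pos - d_neg)


# The negative is already far from the anchor, so the loss is zero.
print(triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0]))  # 0.0
```

The loss is zero once the negative is farther from the anchor than the positive by at least the margin, which is what pushes embeddings of different words apart during training.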
Previous research methods on wake-up word detection (WWD) have been proposed with a focus on finding a decent representation that can well express the characteristics of a wake-up word. However, there are various obstacles, such as noise and reverberation, which make this difficult in the real-world environments where WWD works. To tackle this, we propose a novel architecture called interlayer selective attention network (ISAN) that generates a more robust representation by introducing the concept of selective attention. Experiments in real-world scenarios...
Maximum a posteriori (MAP) adaptation is one of the popular and powerful methods for obtaining a speaker-specific acoustic model. Basically, MAP adaptation needs as much data storage for the speaker adaptive (SA) model as the speaker independent (SI) model needs. Modern speech recognition systems have a huge number of parameters and deal with millions of users. To reduce the data storage for SA models, in this paper, we propose constrained maximum likelihood estimation-based L1 regularization. With the proposed method, we can more efficiently perform adjustments of the models...
Self-supervised learning, which provides generalized speech representations, has recently received increasing attention. Wav2vec 2.0 is the most famous example, showing remarkable performance in numerous downstream speech processing tasks. Despite its success, it is challenging to use it directly for wake-up word detection on mobile devices due to its expensive computational cost. In this work, we propose LiteFEW, a lightweight feature encoder that preserves the inherent ability of wav2vec 2.0 with a minimum scale....
A typical statistical parametric speech synthesis (text-to-speech, TTS) system consists of separate modules, such as a text analysis module, an acoustic modeling module, and a speech synthesis module. This causes two problems: 1) expert knowledge of each module is required, and 2) errors generated in each module accumulate as they pass through each module. An end-to-end TTS system could avoid such problems by synthesizing voice signals directly from an input string. In this study, we implemented an end-to-end Korean TTS system using Google
Autonomic machine learning platforms must provide the necessary management tasks while monitoring the execution status of remotely running machine learning tasks and the performance of the model being trained. In this paper, we design a cluster monitoring framework. The proposed framework monitors distributed computing resources so that it helps the autonomic machine learning platform to select the proper algorithm and execute the model.
The diversity of technologies in today's world has made the smart city a reality and contributes to improving living conditions. Smart transportation is one of the smart city aspects that aims to solve many problems of cities, such as traffic congestion, lack of parking lots, and others. In this paper, we propose a platform for analyzing and designing private parking applications. The proposed model can serve as a blueprint for application projects in order to solve parking slot problems. The model can be used for deriving an optimal routing based on ... to alleviate traffic congestion during peak time...
By using information about specific road grades, one can predict the power required by a vehicle. The prediction of the required power enables the driver to choose the driving route with the best fuel economy, which results in cost and energy savings. A clinometer is one of the simpler tools used to measure road grades but requires significant time and effort. In this paper, a new method for measuring road grades from within a vehicle is proposed and experimentally verified.
The role of the statistical model-based voice activity detector (SMVAD) is to detect speech regions from input signals using the statistical models of noise and noisy speech. The decision rule of the SMVAD is based on the likelihood ratio test (LRT). The LRT-based decision rule may cause detection errors because of the statistical properties of noise and speech signals. In this article, we first analyze the reasons why the detection errors occur and then propose two modified decision rules using reliable likelihood ratios (LRs). We also propose an effective weighting scheme considering the spectral characteristics of noise and speech signals. In the experiments proposed in...
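The baseline LRT decision rule that the abstract above sets out to modify can be sketched as follows. This is the widely used complex-Gaussian formulation, in which the log likelihood ratio of frequency bin k is γ_k ξ_k / (1 + ξ_k) − ln(1 + ξ_k), with γ_k the a posteriori SNR and ξ_k the a priori SNR, and the frame decision compares the mean log LR with a threshold. The Gaussian model, the threshold value, and the function names are assumptions; the modified rules and weighting scheme proposed in the article are not reproduced here.

```python
import math


def log_likelihood_ratio(gamma, xi):
    """Per-bin log LR under a complex-Gaussian model (assumed here).

    gamma: a posteriori SNR of the bin, xi: a priori SNR of the bin.
    """
    return gamma * xi / (1.0 + xi) - math.log(1.0 + xi)


def lrt_vad(gammas, xis, threshold=0.0):
    """Return (is_speech, mean_log_lr) for one frame.

    The frame is declared speech when the mean of the per-bin log LRs
    exceeds the threshold (the default of 0.0 is a hypothetical value).
    """
    llrs = [log_likelihood_ratio(g, x) for g, x in zip(gammas, xis)]
    mean_llr = sum(llrs) / len(llrs)
    return mean_llr > threshold, mean_llr


# High a posteriori SNR in every bin: the frame is classified as speech.
is_speech, score = lrt_vad([10.0, 8.0], [1.0, 1.0])
print(is_speech)  # True
```

In practice the a priori SNR is not observed and must be estimated from the signal, for example with a decision-directed estimator, which is one place where unreliable likelihood ratios can arise.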
The main advantage of statistical parametric speech synthesis is its flexibility in changing voice characteristics. A personalized text-to-speech (TTS) system can be implemented by combining a speech synthesis system and a voice transformation system, and it is widely used in many application areas. It is known that the fundamental frequency and the spectral envelope of a speech signal can be independently modified to convert the voice characteristics. It is also important to maintain the naturalness of the transformed speech. In this paper, based on hidden Markov model-based speech synthesis (HMM-based speech synthesis, HTS) using...