- Speech Recognition and Synthesis
- Speech and Audio Processing
- Seismology and Earthquake Studies
- Music and Audio Processing
- Earthquake Detection and Analysis
- Advanced Image Processing Techniques
- Earthquake and Tectonic Studies
- Human Pose and Action Recognition
- Image Processing Techniques and Applications
- Hand Gesture Recognition Systems
- Human Motion and Animation
- Advanced Vision and Imaging
- Image and Signal Denoising Methods
- Anomaly Detection Techniques and Applications
- Seismic Waves and Analysis
- Face Recognition and Analysis
- Generative Adversarial Networks and Image Synthesis
- Heart Failure Treatment and Management
- Advanced Text Analysis Techniques
- Advanced Data Compression Techniques
- Female Genital Mutilation/Cutting Issues
- Industrial Vision Systems and Defect Detection
- Text and Document Classification Technologies
- Speech and Dialogue Systems
- Antenna Design and Optimization
Korea University
2020-2024
This paper reviews the NTIRE 2020 challenge on real world super-resolution. It focuses on the participating methods and final results. The challenge addresses the real world setting, where paired true high and low-resolution images are unavailable. For training, only one set of source input images is therefore provided along with a set of unpaired high-quality target images. In Track 1: Image Processing artifacts, the aim is to super-resolve images with synthetically generated image processing artifacts. This allows for quantitative benchmarking of the approaches w.r.t....
This paper reviews the NTIRE 2020 challenge on perceptual extreme super-resolution with a focus on the proposed solutions and results. The task was to super-resolve an input image with a magnification factor of ×16 based on a set of prior examples of low and corresponding high resolution images. The goal is to obtain a network design capable of producing high resolution results with the best perceptual quality, similar to the ground truth. The track had 280 registered participants, and 19 teams submitted final results. They gauge the state-of-the-art in single image super-resolution....
This paper proposes an unsupervised single-image Super-Resolution (SR) model using cycleGAN and a domain discriminator to solve the problem of SR with unknown degradation using an unpaired dataset. In previous approaches, a paired dataset is required for training with assumed levels of image degradation. In real world applications, however, training sets are typically not low and high resolution pairs, but only low resolution images are provided as inputs. To address this problem, we introduce cycle-in-cycle GAN based unsupervised learning. In addition, we combine...
This letter proposes a multiple station-based seismic event classification model using a deep convolution neural network (CNN) and a graph convolution network (GCN). To classify various seismic events, such as natural earthquakes, artificial earthquakes, and noise, the proposed model consists of weight-shared convolution layers, graph convolution layers, and fully connected layers. We employed graph convolution layers in order to aggregate features from multiple stations. Representative experimental results with Korean peninsula earthquake datasets from 2016 to 2019 showed that the proposed model is superior to single-station based state-of...
This letter presents a deep convolutional neural network (CNN) with an attention module that improves the performance of classification of various earthquake events. Addressing all possible earthquake events, including not only microearthquakes and artificial-earthquakes but also large-earthquakes, requires both a suitable feature expression and a classifier that can effectively discriminate seismic waveforms under adverse conditions. To robustly classify earthquake events, a CNN with an attention module was proposed for raw seismic waveforms. Representative experimental...
A mixed sample data augmentation strategy is proposed to enhance the performance of models on audio scene classification, sound event classification, and speech enhancement tasks. While there have been several augmentation methods shown to be effective in improving image classification performance, their efficacy toward time-frequency domain features is not assured. We propose a novel audio data augmentation approach named "Specmix" specifically designed for dealing with time-frequency domain features. The augmentation method consists of mixing two different data samples by applying time-frequency masks...
Single image extreme Super Resolution (SR) is a difficult task, as a scale factor in the order of 10X or greater is typically attempted. For instance, in the case of a 16x upscale of an image, a single pixel from the low resolution image gets expanded to a 16x16 image patch. Such attempts often result in fuzzy quality and loss of details in the reconstructed images. To handle these difficulties, we propose a network architecture composed of a series of connected blocks in recurrent and feedback fashions for enhanced SR reconstruction. By use of the network, refined...
This letter proposes a multifeature fusion model using deep convolution neural networks and a transfer learning approach for earthquake event classification. There are several feature representations for seismic analysis, such as the time domain, the frequency domain, and the time–frequency domain. To successfully classify various earthquake events, we propose a novel model that combines these features hierarchically. In addition, we apply transfer learning to mitigate the overfitting problem of the model while achieving high classification performance. To evaluate our...
When virtual agents interact with humans, gestures are crucial to delivering their intentions with speech. Previous multimodal co-speech gesture generation models required encoded features of all modalities to generate gestures. If some input modalities are removed or contain noise, the model may not generate gestures properly. To acquire robust and generalized encodings, we propose a novel framework with a multimodal pre-trained encoder for co-speech gesture generation. In the proposed method, the multi-head-attention-based encoder is trained with self-supervised learning to contain the information on...
Classifying seismic events and estimating their magnitude are crucial topics in the study of seismic waves. Due to disparities between global and local geologic features, models exclusively trained on global data may exhibit suboptimal performance in local contexts. To solve this problem, this paper proposes a method to evaluate the effectiveness of the Low-Rank Adaptation (LoRA) technique in seismic wave research using a convolution-augmented transformer (Conformer). We simplified and modified the Conformer model, reducing the number of parameters by more than...
Keyword Spotting (KWS) is an essential component in contemporary audio-based deep learning systems and should be of minimal design when the system is working in streaming and on-device environments. We presented a robust feature extraction with a single-layer dynamic convolution model in our previous work. In this letter, we expand the earlier study into a multi-layer operation and propose a Knowledge Distillation (KD) method. Based on the distribution between class-centroids and embedding vectors, we compute three distinct...
The evolution of Network Architecture (NA) has allowed Key-Word Spotting (KWS) to exhibit high performance. Generally, an NA for KWS is required to have low parameter and computation complexity while maintaining high classification performance. Most of the attempts so far have been based on manual approaches, and often the architectures developed from such efforts dwell in the balance of performance and network complexity. Then, several KWS models based on the Neural Architecture Search (NAS) technique were proposed. However, these methods do not consider the number of parameters and FLOPs in the search...
Recently, deep learning-based facial landmark detection for in-the-wild faces has achieved significant improvement. However, there are still challenges in face landmark detection in other domains (e.g. cartoon, caricature, etc). This is due to the scarcity of extensively annotated training data. To tackle this concern, we design a two-stage training approach that effectively leverages limited datasets and the pre-trained diffusion model to obtain aligned pairs of landmarks and faces in multiple domains. In the first stage, we train...
This paper presents a complex-valued deep neural network for sound source localization. Most neural network-based sound source localization approaches use time-frequency domain features. Even though both magnitude and phase play a pivotal role in solving the problem, only real-valued features are used because neural network structures generally accept real-valued inputs only. In contrast, complex-valued neural networks directly receive complex-valued features and extract hidden features. Therefore, the complex-valued network, which is proposed in this paper, has the potential to extract rich features. With a series of experiments, direction of arrival...
In recent times, there has been a growing interest in utilizing personalized large models on low-spec devices, such as mobile and CPU-only devices. However, utilizing a personalized large model on the on-device is inefficient, and sometimes limited due to computational cost. To tackle the problem, this paper presents a weights separation method to minimize on-device model weights using parameter-efficient fine-tuning methods. Moreover, some people speak multiple languages in an utterance, known as code-switching, and a personalized ASR model is necessary to address such cases. However, current multilingual...
Detecting earthquake events from seismic time series signals is a challenging task. Recently, detection methods based on machine learning have been developed to improve the accuracy and efficiency. However, the accuracy of those methods relies on a sufficient amount of high-quality training data. In many situations, high-quality data is difficult to obtain. We address and resolve this issue by using a Generative Adversarial Network (GAN) model for data synthesis. GAN already shows its powerful capability in generating high quality synthetic samples...
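This paper describes a diffusion model for co-speech gesture generation presented by the KU-ISPL entry of the GENEA Challenge 2023. We formulate the gesture generation problem as a co-speech gesture generation problem and a semantic gesture generation problem, and we focus on solving the co-speech gesture generation problem with a denoising diffusion probabilistic model with text, audio, and pre-pose conditions. We use a U-Net with cross-attention architecture as the denoising model, and propose a gesture autoencoder as a mapping function from the gesture domain to the latent domain. The collective evaluation released by the GENEA Challenge 2023 shows that our model successfully generates co-speech gestures. Our system receives a mean...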
Sound event classification has started to receive a lot of attention in recent years in the field of audio processing because of open datasets, which are recorded in various conditions, and the introduction of challenges. To use a sound event classification model in the wild, it needs to be independent of recording conditions. Therefore, a more generalized model, one that can be trained and tested in various recording conditions, must be researched. This paper presents a deep neural network with a dual-path frequency residual network and feedback modules for sound event classification. Most deep neural network based approaches...