- Topic Modeling
- Natural Language Processing Techniques
- Domain Adaptation and Few-Shot Learning
- Speech Recognition and Synthesis
- Advanced Neural Network Applications
- Multimodal Machine Learning Applications
- Speech and Audio Processing
- Music and Audio Processing
- Advanced Image and Video Retrieval Techniques
- Orthopedic Infections and Treatments
- Machine Learning and ELM
- Image Processing Techniques and Applications
- Neural Networks and Applications
- Millimeter-Wave Propagation and Modeling
- Advanced MIMO Systems Optimization
- Industrial Vision Systems and Defect Detection
- Advanced Vision and Imaging
- Speech and Dialogue Systems
- Digital Rights Management and Security
- Semantic Web and Ontologies
- Advanced Data Storage Technologies
- Image Retrieval and Classification Techniques
- Infectious Diseases and Tuberculosis
- Microwave Engineering and Waveguides
- Gait Recognition and Analysis
Seoul National University
2017-2025
Seoul Media Institute of Technology
2025
Qualcomm (United Kingdom)
2022
Market Matters
2022
Generalized zero-shot learning (GZSL) is a technique to train a deep model to identify unseen classes using class attributes. In this paper, we put forth a new GZSL technique that improves classification performance greatly. The key idea of the proposed approach, henceforth referred to as semantic feature extraction-based GZSL (SE-GZSL), is to use a feature containing only attribute-related information in learning the relationship between the image and the attributes. In doing so, we can remove the interference, if any, caused by attribute-irrelevant information contained in the image feature. To train the network...
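A minimal sketch of the idea, assuming a hypothetical module and illustrative sizes (this is not the paper's actual architecture): a semantic feature is tied to attribute-related information by regressing the class attribute vector from it.

```python
import torch
import torch.nn as nn

class SemanticExtractor(nn.Module):
    """Hypothetical sketch: keep only attribute-related information."""
    def __init__(self, feat_dim=2048, sem_dim=512, attr_dim=85):
        super().__init__()
        self.extract = nn.Sequential(nn.Linear(feat_dim, sem_dim), nn.ReLU())
        self.attr_head = nn.Linear(sem_dim, attr_dim)  # predicts class attributes

    def forward(self, image_feat):
        sem = self.extract(image_feat)   # candidate semantic feature
        attr_pred = self.attr_head(sem)  # supervised with ground-truth attributes
        return sem, attr_pred

model = SemanticExtractor()
feat = torch.randn(8, 2048)                        # e.g., CNN image features
sem, attr_pred = model(feat)
attr_gt = torch.rand(8, 85)                        # class attribute vectors
loss = nn.functional.mse_loss(attr_pred, attr_gt)  # ties sem to the attributes
```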
The beamforming technique realized by multiple-input multiple-output (MIMO) antenna arrays has been widely used to compensate for the severe path loss in millimeter-wave (mmWave) bands. In the 5G NR system, beam sweeping and beam refinement are employed to find out the best beam codeword aligned with the mobile. Due to the complicated handshaking and the finite resolution of the codebook, today's 5G-based beam management strategy is ineffective in various scenarios in terms of data rate, energy consumption, and also processing latency. An aim of this article...
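For context, a toy version of codebook-based beam sweeping (array size, DFT codebook, and line-of-sight channel are all illustrative assumptions): sweep every codeword and pick the one with the largest received power.

```python
import numpy as np

num_ant, num_beams = 16, 32
angles = np.linspace(-np.pi / 2, np.pi / 2, num_beams)
# DFT-style codebook: one steering vector per candidate angle
codebook = np.exp(-1j * np.pi * np.outer(np.arange(num_ant), np.sin(angles)))
codebook /= np.sqrt(num_ant)  # unit-norm codewords

true_aod = 0.3  # mobile's angle of departure (assumed for this toy example)
h = np.exp(-1j * np.pi * np.arange(num_ant) * np.sin(true_aod))  # LoS channel

rx_power = np.abs(h.conj() @ codebook) ** 2  # sweep all beams
best = int(np.argmax(rx_power))
print(f"best codeword: {best}, pointing angle: {angles[best]:.2f} rad")
```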
Compositional Zero-Shot Learning (CZSL) aims to identify unseen state-object compositions by leveraging knowledge learned from seen compositions. Existing approaches often independently predict states and objects, overlooking their relationships. In this paper, we propose a novel framework, learning primitive relations (LPR), designed to probabilistically capture the relationships between states and objects. By employing a cross-attention mechanism, LPR considers the dependencies between states and objects, enabling the model to infer the likelihood...
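A hedged sketch of the cross-attention step in this spirit (dimensions and the direction of attention are illustrative assumptions, not LPR's exact design): states attend to objects so each state representation is conditioned on the objects it may compose with.

```python
import torch
import torch.nn as nn

attn = nn.MultiheadAttention(embed_dim=256, num_heads=4, batch_first=True)

state_emb = torch.randn(1, 115, 256)   # candidate state embeddings
object_emb = torch.randn(1, 245, 256)  # candidate object embeddings

# Each state queries the objects; the attention map scores dependencies.
state_given_obj, attn_map = attn(query=state_emb, key=object_emb, value=object_emb)
print(attn_map.shape)  # (1, 115, 245): state-object dependency scores
```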
Generalized zero-shot learning (GZSL) is a technique to train a deep model to identify unseen classes using image attributes. In this paper, we put forth a new GZSL technique exploiting the Vision Transformer (ViT) to maximize the attribute-related information contained in the image feature. In ViT, the entire image region is processed without degradation of resolution, and local image information is preserved in the patch features. To fully enjoy these benefits, we exploit the patch features as well as the CLS feature in the attribute-related feature extraction. In particular, we propose a novel attention-based module, called the attribute...
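One plausible reading of such an attention-based module, sketched under assumptions (the query design and sizes are illustrative, not the paper's exact module): learnable attribute queries pool evidence from ViT patch features.

```python
import torch
import torch.nn as nn

patch_feat = torch.randn(8, 196, 768)            # ViT patch tokens (no CLS)
attr_query = nn.Parameter(torch.randn(85, 768))  # one learnable query per attribute

attn = nn.MultiheadAttention(768, num_heads=8, batch_first=True)
q = attr_query.unsqueeze(0).expand(8, -1, -1)    # broadcast queries over the batch
attr_feat, _ = attn(query=q, key=patch_feat, value=patch_feat)
print(attr_feat.shape)                           # (8, 85, 768) attribute-wise features
```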
Monocular depth estimation is very challenging because clues to the exact depth are incomplete in a single RGB image. To overcome the limitation, deep neural networks rely on various visual hints such as size, shade, and texture extracted from RGB information. However, we observe that if such hints are overly exploited, the network can be biased toward RGB information without considering the comprehensive view. We propose a novel model named RElative Depth Transformer (RED-T) that uses relative depth as guidance in self-attention. Specifically, it assigns high...
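A minimal sketch of depth-guided attention, assuming a simple additive bias (the exact bias form in RED-T is not given above): tokens at similar depths get higher attention weights.

```python
import torch

B, N, D = 2, 64, 32
q, k, v = (torch.randn(B, N, D) for _ in range(3))
depth = torch.rand(B, N)  # coarse per-token relative depth (assumed available)

logits = q @ k.transpose(-1, -2) / D ** 0.5          # standard attention logits
rel = (depth.unsqueeze(2) - depth.unsqueeze(1)).abs()
logits = logits - rel                                # closer depth -> higher weight
out = torch.softmax(logits, dim=-1) @ v
```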
Weakly-supervised semantic segmentation (WSSS) aims to train a segmentation network using weak labels. Recent approaches generate a pseudo-label from the image-level label and then exploit it as pixel-level supervision in training. A potential drawback of conventional WSSS is that the pseudo-label cannot accurately express object regions and their classes, causing degradation of segmentation performance. In this paper, we propose a new technique that trains the network without relying on the pseudo-label. The key idea of the proposed approach is that regions erased by the map are not detected...
Unsupervised semantic segmentation (USS) aims to discover and recognize meaningful categories without any labels. For successful USS, two key abilities are required: 1) information compression and 2) clustering capability. Previous methods have relied on feature dimension reduction for compression; however, this approach may hinder the process of clustering. In this paper, we propose a novel USS framework called Expand-and-Quantize Unsupervised Semantic Segmentation (EQUSS), which combines the benefits...
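A hedged sketch of the expand-and-quantize idea (sizes and the plain nearest-neighbor quantizer are illustrative assumptions): project features to a higher dimension instead of reducing it, then compress by snapping each feature to its nearest codebook entry.

```python
import torch
import torch.nn as nn

expand = nn.Linear(384, 2048)      # expansion instead of dimension reduction
codebook = torch.randn(512, 2048)  # 512 codes (would be learned in practice)

feat = expand(torch.randn(100, 384))  # per-pixel features
dist = torch.cdist(feat, codebook)    # distance to every code
codes = dist.argmin(dim=1)            # discrete cluster assignment
quantized = codebook[codes]           # compressed representation
```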
In few-shot open-set recognition (FSOSR), a network learns to recognize closed-set samples with a few support samples while rejecting samples with no class cue. Unlike conventional OSR, FSOSR considers more practical open worlds, where a class can be selected as a closed-set class in one task and as an open-set class in another testing task, and vice versa. Existing methods have commonly represented the open set with task-dependent extra modules. These modules decently handle varied closed-set classes but accompany an inevitable complexity increase. This paper shows that a single prototype...
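For orientation, a minimal prototypical-network sketch with open-set rejection (the fixed distance threshold is an illustrative stand-in, not the paper's method): classify by nearest class prototype and reject queries whose best distance is too large.

```python
import torch

def predict(query, support, support_labels, num_classes, reject_dist=10.0):
    # Class prototypes: mean embedding of each class's support samples
    protos = torch.stack([support[support_labels == c].mean(0)
                          for c in range(num_classes)])
    dist = torch.cdist(query, protos)          # (num_query, num_classes)
    min_dist, pred = dist.min(dim=1)
    pred[min_dist > reject_dist] = -1          # -1 marks an open-set rejection
    return pred

support = torch.randn(25, 64)                  # 5-way 5-shot embeddings
labels = torch.arange(5).repeat_interleave(5)
query = torch.randn(10, 64)
print(predict(query, support, labels, num_classes=5))
```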
Image-text retrieval is a task to search for the proper textual descriptions of the visual world, and vice versa. One challenge of this task is the vulnerability to input image/text corruptions. Such corruptions are often unobserved during training and degrade the model's decision quality substantially. In this paper, we propose a novel image-text retrieval technique, referred to as robust visual semantic embedding (RVSE), which consists of image-based and text-based augmentation techniques called semantic-preserving augmentation for image (SPAug-I) and text (SPAug-T). Since...
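A hedged sketch of what semantic-preserving augmentation can look like; these specific perturbations (mild pixel noise, light word dropout) are illustrative assumptions, not SPAug-I/SPAug-T themselves.

```python
import random
import torch

def augment_image(image):  # image: (C, H, W) in [0, 1]
    noisy = image + 0.02 * torch.randn_like(image)  # mild pixel noise
    return noisy.clamp(0.0, 1.0)                    # meaning is left intact

def augment_text(caption, p=0.1):
    words = caption.split()
    kept = [w for w in words if random.random() > p or len(words) <= 3]
    return " ".join(kept)                           # light dropout keeps the gist

img = augment_image(torch.rand(3, 224, 224))
print(augment_text("a brown dog runs across the wet grass"))
```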
Recently, the necessity of multiple attention heads in the transformer architecture has been questioned [1]. Removing less important heads from a large network is a promising strategy to reduce computation cost and parameters. However, pruning out heads from multi-head attention does not evenly reduce the overall load, because the feedforward modules are not affected. In this study, we apply head pruning on the All-attention [2] transformer, where the savings are proportional to the number of pruned heads. This improved computing efficiency comes at the cost of sensitivity, which...
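A minimal illustration of head pruning as such (the L2-norm importance score is a simple stand-in; real methods typically learn or estimate importance): score each head, keep the top-k, and drop the rest.

```python
import torch

B, H, N, Dh = 4, 8, 128, 64
head_out = torch.randn(B, H, N, Dh)  # per-head attention outputs

# One importance score per head: mean L2 norm of its output (illustrative)
importance = head_out.norm(dim=-1).mean(dim=(0, 2))
keep = importance.topk(k=4).indices  # prune half of the heads
pruned = head_out[:, keep]           # (B, 4, N, Dh)
print(f"kept heads: {sorted(keep.tolist())}")
```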
The customization of large language models (LLMs) for user-specified tasks is becoming important. However, maintaining all the customized LLMs on cloud servers incurs substantial memory and computational overheads, and uploading user data can also lead to privacy concerns. On-device LLMs offer a promising solution by mitigating these issues. Yet, the performance of on-device LLMs is inherently constrained by the limitations of small-scale models. To overcome these restrictions, we first propose Crayon, a novel approach for on-device LLM...
The expansion of speech models emphasizes the importance of parameter efficiency in practical automatic speech recognition (ASR) systems. Parameter sharing, which reuses the same parameters multiple times, has emerged as a promising solution to reduce storage requirements. However, previous studies have often faced challenges in balancing the number of parameters with performance. In this paper, we propose a novel architecture that effectively reduces the number of parameters while minimizing performance degradation. The key idea is to insert lightweight...
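A sketch of cross-layer parameter sharing under stated assumptions (the adapter placement and sizes are illustrative, since the abstract is truncated): one transformer block is reused several times, with a small per-repetition module so the repeats are not identical.

```python
import torch
import torch.nn as nn

class SharedEncoder(nn.Module):
    def __init__(self, d=256, repeats=6, adapter_dim=32):
        super().__init__()
        # One block's weights are shared across all repetitions
        self.block = nn.TransformerEncoderLayer(d, nhead=4, batch_first=True)
        self.adapters = nn.ModuleList(
            nn.Sequential(nn.Linear(d, adapter_dim), nn.ReLU(),
                          nn.Linear(adapter_dim, d))
            for _ in range(repeats))

    def forward(self, x):
        for adapter in self.adapters:  # same block, different lightweight adapters
            x = self.block(x)
            x = x + adapter(x)
        return x

enc = SharedEncoder()
out = enc(torch.randn(2, 50, 256))  # (batch, frames, features)
```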
A text encoder within Vision-Language Models (VLMs) like CLIP plays a crucial role in translating textual input into an embedding space shared with images, thereby facilitating the interpretative analysis of vision tasks through natural language. Despite the varying significance of different elements of a sentence depending on the context, efforts to account for this variation in importance when constructing embeddings have been lacking. We propose a framework of Semantic Token Reweighting to build Interpretable text embeddings (SToRI), which...
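A minimal sketch of token reweighting, assuming a simple weighted mean pooling (the actual weighting scheme in SToRI is not given above): per-token embeddings are scaled by importance weights before being pooled into one sentence embedding.

```python
import torch

tokens = torch.randn(1, 7, 512)  # token embeddings from a text encoder
# Hand-set importance weights; token 2 is emphasized, token 5 downplayed
weights = torch.tensor([1.0, 1.0, 2.5, 1.0, 1.0, 0.5, 1.0])

weighted = tokens * weights.view(1, -1, 1)
sentence_emb = weighted.sum(dim=1) / weights.sum()  # weighted mean pooling
sentence_emb = torch.nn.functional.normalize(sentence_emb, dim=-1)
```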
Handling long input contexts remains a significant challenge for Large Language Models (LLMs), particularly in resource-constrained environments such as mobile devices. Our work aims to address this limitation by introducing InfiniPot, a novel KV cache control framework designed to enable pre-trained LLMs to manage extensive sequences within fixed memory constraints efficiently, without requiring additional training. InfiniPot leverages Continual Context Distillation (CCD), an iterative process that...
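To make the fixed-memory constraint concrete, a toy bounded KV cache that evicts low-scoring entries once the budget is hit. This only illustrates cache control under a hard limit; it is not the CCD procedure, whose details the snippet above does not give.

```python
import torch

class BoundedKVCache:
    def __init__(self, budget=1024):
        self.budget = budget
        self.k, self.v, self.score = None, None, None

    def append(self, k, v, attn_mass):
        cat = lambda old, new: new if old is None else torch.cat([old, new])
        self.k, self.v = cat(self.k, k), cat(self.v, v)
        self.score = cat(self.score, attn_mass)  # e.g., accumulated attention
        if self.k.size(0) > self.budget:         # evict down to the budget
            keep = self.score.topk(self.budget).indices.sort().values
            self.k, self.v, self.score = self.k[keep], self.v[keep], self.score[keep]

cache = BoundedKVCache(budget=4)
for _ in range(10):
    cache.append(torch.randn(1, 64), torch.randn(1, 64), torch.rand(1))
print(cache.k.shape)  # torch.Size([4, 64]): memory never exceeds the budget
```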
Recently, we have observed that Large Multi-modal Models (LMMs) are revolutionizing the way machines interact with the world, unlocking new possibilities across various multi-modal applications. To adapt LMMs for downstream tasks, parameter-efficient fine-tuning (PEFT), which only trains additional prefix tokens or modules, has gained popularity. Nevertheless, there has been little analysis of how PEFT works in LMMs. In this paper, we delve into the strengths and weaknesses of each tuning strategy, shifting...
Few-Shot Open-Set Recognition (FSOSR) targets a critical real-world challenge, aiming to categorize inputs into known categories, termed closed-set classes, while identifying open-set inputs that fall outside these classes. Although transfer learning, where a model is tuned to a given few-shot task, has become a prominent paradigm in the closed-world setting, we observe that it fails to expand to the open-world setting. To unlock this, we propose a two-stage method which consists of open-set aware meta-learning followed by open-set free transfer learning. In the first stage, the model is trained...
Transformer-based deep neural networks have achieved great success in various sequence applications due to their powerful ability to model long-range dependency. The key module of the Transformer is self-attention (SA), which extracts features from the entire sequence regardless of the distance between positions. Although SA helps the Transformer perform particularly well on long-range tasks, it requires quadratic computation and memory complexity with the input length. Recently, attention map reuse, which groups multiple SA layers to share one attention map, has been...
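A minimal sketch of attention map reuse as described (the group size and single-head form are illustrative): the softmax attention map is computed once, then the next layers in the group apply only their own value projections, skipping their query/key computation.

```python
import torch
import torch.nn as nn

d, n = 64, 32
x = torch.randn(2, n, d)
wq, wk = nn.Linear(d, d), nn.Linear(d, d)
value_projs = nn.ModuleList(nn.Linear(d, d) for _ in range(3))  # 3 layers, 1 map

logits = wq(x) @ wk(x).transpose(-1, -2) / d ** 0.5
attn_map = torch.softmax(logits, dim=-1)  # computed once for the whole group

for wv in value_projs:          # every layer in the group reuses the map
    x = x + attn_map @ wv(x)    # residual update, no new Q/K computation
```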
The performance of automatic speech recognition (ASR) models can be greatly improved by proper beam-search decoding with an external language model (LM). There has been increasing interest in Korean speech recognition, but not many studies have focused on the decoding procedure. In this paper, we propose a tokenization method for the neural network-based LM used in Korean ASR. Although the common approach is to use the same tokenization as the ASR model, we show that it may not be the best choice for Korean. We propose a new method that inserts a special token, SkipTC, when there is no...