Yunsu Kim

ORCID: 0000-0002-1375-005X
Research Areas
  • Topic Modeling
  • Natural Language Processing Techniques
  • Speech Recognition and Synthesis
  • Multimodal Machine Learning Applications
  • Speech and dialogue systems
  • Text Readability and Simplification
  • Semantic Web and Ontologies
  • Advanced Text Analysis Techniques
  • Asian Culture and Media Studies
  • Advanced Clustering Algorithms Research
  • Data Stream Mining Techniques
  • Online Learning and Analytics
  • Text and Document Classification Technologies
  • Handwritten Text Recognition Techniques
  • Machine Learning and Algorithms
  • EEG and Brain-Computer Interfaces
  • Augmented Reality Applications
  • Multi-Agent Systems and Negotiation
  • Fiber-reinforced polymer composites
  • Technology-Enhanced Education Studies
  • Recycling and Waste Management Techniques
  • Interactive and Immersive Displays
  • Nanomaterials for catalytic reactions
  • Machine Learning and Data Classification
  • Aging and Gerontology Research

Inje University Ilsan Paik Hospital
2024-2025

Sungkyunkwan University
2024

Inje University
2024

Korea Post
2023

Pohang University of Science and Technology
2023

Korea Advanced Institute of Science and Technology
2023

RWTH Aachen University
2013-2019

FIR e. V. an der RWTH Aachen
2018-2019

Korea Railroad Research Institute
2005

Transfer learning or multilingual models are essential for low-resource neural machine translation (NMT), but their applicability is limited to cognate languages that share vocabularies. This paper shows effective techniques to transfer a pretrained NMT model to a new, unrelated language without shared vocabularies. We relieve the vocabulary mismatch by using cross-lingual word embeddings, train a more language-agnostic encoder by injecting artificial noises, and generate synthetic data easily from the pretraining data without back-translation. Our...

10.18653/v1/p19-1120 preprint EN cc-by 2019-01-01
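The artificial-noise technique mentioned in the abstract above can be illustrated with a small sketch; the noise types (word dropout and local shuffling) and the rates below are illustrative assumptions, not the paper's exact configuration.

```python
import random

def add_noise(tokens, drop_prob=0.1, shuffle_k=3, rng=None):
    """Perturb a source sentence: randomly drop words, then locally
    shuffle tokens so each moves at most shuffle_k - 1 positions."""
    rng = rng or random.Random(0)
    # word dropout (keep at least one token)
    kept = [t for t in tokens if rng.random() > drop_prob] or tokens[:1]
    # local shuffle via jittered sort keys
    keys = [i + rng.uniform(0, shuffle_k) for i in range(len(kept))]
    return [t for _, t in sorted(zip(keys, kept), key=lambda p: p[0])]

sent = "the quick brown fox jumps over the lazy dog".split()
print(add_noise(sent, rng=random.Random(42)))
```

Training the encoder on such perturbed inputs is one simple way to make its representations less dependent on source-language word order.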

Document-level context has received lots of attention for compensating neural machine translation (NMT) of isolated sentences. However, recent advances in document-level NMT focus on sophisticated integration of the context, explaining its improvement with only a few selected examples or targeted test sets. We extensively quantify the causes of improvements by a document-level model in general test sets, clarifying the limit of the usefulness of document-level context in NMT. We show that most of the improvements are not interpretable as utilizing the context. We also show that a minimal encoding is...

10.18653/v1/d19-6503 article EN cc-by 2019-01-01

Yunsu Kim, Petre Petrov, Pavel Petrushkov, Shahram Khadivi, Hermann Ney. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2019.

10.18653/v1/d19-1080 article EN cc-by 2019-01-01

This paper studies the practicality of current state-of-the-art unsupervised methods in neural machine translation (NMT). In ten translation tasks with various data settings, we analyze the conditions under which the unsupervised methods fail to produce reasonable translations. We show that their performance is severely affected by linguistic dissimilarity and domain mismatch between the source and target monolingual data. Such conditions are common for low-resource language pairs, where unsupervised learning works poorly. In all of our experiments, supervised...

10.48550/arxiv.2004.10581 preprint EN cc-by arXiv (Cornell University) 2020-01-01

10.1016/j.paid.2025.113094 article EN Personality and Individual Differences 2025-02-10

Abstract Background: The development of individual subtypes based on biomarkers offers an intriguing and timely avenue for capturing factors pertaining to mental health independent from individuals’ insights. Aims & Objectives: Incorporating 2-channel electroencephalography (EEG) and photoplethysmogram (PPG), we sought to develop a subtype classification system with clinical relevance. Method: One hundred healthy participants and 99 patients with psychiatric disorders were recruited. Classification...

10.1093/ijnp/pyae059.325 article EN cc-by-nc The International Journal of Neuropsychopharmacology 2025-02-01

Back-translation, i.e., data augmentation by translating target monolingual data, is a crucial component in modern neural machine translation (NMT). In this work, we reformulate back-translation in the scope of cross-entropy optimization of an NMT model, clarifying its underlying mathematical assumptions and approximations beyond its heuristic usage. Our formulation covers broader synthetic data generation schemes, including sampling from a target-to-source NMT model. With this formulation, we point out fundamental problems...

10.18653/v1/w19-5205 article EN cc-by 2019-01-01
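A minimal sketch of the back-translation scheme discussed above: generate synthetic source sentences from real target monolingual data with a target-to-source model, and pair them for training. The "model" here is a stand-in function, not an actual NMT system.

```python
def back_translate(target_mono, tgt2src_model):
    """Back-translation: pair (synthetic source, real target) so the
    forward source-to-target model can train on monolingual target data."""
    return [(tgt2src_model(t), t) for t in target_mono]

# Stand-in for a trained target-to-source model (assumption: a real
# system would be a beam-search or sampling NMT decoder).
dummy_tgt2src = lambda sent: " ".join(reversed(sent.split()))

mono = ["das ist ein test", "guten morgen"]
synthetic_pairs = back_translate(mono, dummy_tgt2src)
print(synthetic_pairs)
```

Swapping the stand-in for a sampling decoder instead of a beam-search one is exactly the kind of synthetic-data generation choice the formulation above covers.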

Unsupervised learning of cross-lingual word embeddings offers elegant matching of words across languages, but has fundamental limitations in translating sentences. In this paper, we propose simple yet effective methods to improve word-by-word translation with cross-lingual embeddings, using only monolingual corpora and without any back-translation. We integrate a language model for context-aware search, and use a novel denoising autoencoder to handle reordering. Our system surpasses state-of-the-art unsupervised translation systems...

10.18653/v1/d18-1101 article EN cc-by Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing 2018-01-01
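The word-by-word translation idea can be sketched as cosine nearest-neighbour lookup in a shared cross-lingual embedding space. The toy vocabularies and vectors below are made up for illustration; the paper additionally uses a language model and a denoising autoencoder, which are omitted here.

```python
import numpy as np

# Toy "aligned" embeddings: in practice these come from unsupervised
# cross-lingual embedding learning; the vectors below are invented.
src_vocab = {"hund": 0, "katze": 1}
tgt_vocab = ["dog", "cat"]
src_emb = np.array([[0.9, 0.1], [0.1, 0.9]])
tgt_emb = np.array([[1.0, 0.0], [0.0, 1.0]])

def translate(sentence):
    """Word-by-word translation via cosine nearest neighbour."""
    s = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
    t = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    sim = s @ t.T                      # cosine similarity matrix
    return [tgt_vocab[int(np.argmax(sim[src_vocab[w]]))]
            for w in sentence.split()]

print(translate("hund katze"))
```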

This paper describes the submission of RWTH Aachen University to the De→En parallel corpus filtering task of the EMNLP 2018 Third Conference on Machine Translation (WMT 2018). We use several rule-based, heuristic methods to preselect sentence pairs. These pairs are scored with count-based and neural systems as language and translation models. In addition to single sentence-pair scoring, we further implement a simple redundancy removing heuristic. Our best performing system relies on recurrent models based...

10.18653/v1/w18-6487 article EN cc-by 2018-01-01
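The rule-based pre-selection step can be sketched as simple length, length-ratio, and identity heuristics; the specific thresholds below are illustrative assumptions, not the submission's actual settings.

```python
def keep_pair(src, tgt, max_ratio=2.0, min_len=1, max_len=80):
    """Rule-based pre-selection of a sentence pair for corpus filtering."""
    s, t = src.split(), tgt.split()
    # reject sentences that are too short or too long
    if not (min_len <= len(s) <= max_len and min_len <= len(t) <= max_len):
        return False
    # reject pairs with a suspicious source/target length ratio
    if max(len(s), len(t)) / min(len(s), len(t)) > max_ratio:
        return False
    # reject identical pairs (likely untranslated copies)
    if src.strip() == tgt.strip():
        return False
    return True

pairs = [("das haus ist gross", "the house is big"),
         ("hallo", "one two three four five"),
         ("same line", "same line")]
print([keep_pair(s, t) for s, t in pairs])
```

Pairs that survive these cheap rules would then be scored by the language and translation models mentioned in the abstract.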

Bioconjugation of proteins can substantially expand the opportunities in biopharmaceutical development; however, applications are limited for gene editing machinery despite its tremendous therapeutic potential. Here, a self-delivered nanomedicine platform based on bioorthogonal CRISPR/Cas9 conjugates, which can be armed with a chemotherapeutic drug for combinatorial therapy, is introduced. It is demonstrated that multi-functionalized Cas9 and polymer form self-condensed nanocomplexes, which induce significant...

10.1002/advs.202302253 article EN cc-by Advanced Science 2023-07-23

Automated essay scoring (AES) aims to score essays written for a given prompt, which defines the writing topic. Most existing AES systems assume grading essays of the same prompt as used in training and assign only a holistic score. However, such settings conflict with real-education situations; pre-graded essays for a particular prompt are lacking, and detailed trait scores of sub-rubrics are required. Thus, predicting various trait scores of unseen-prompt essays (called cross-prompt essay trait scoring) is a remaining challenge of AES. In this paper, we propose a robust...

10.18653/v1/2023.findings-acl.98 article EN cc-by Findings of the Association for Computational Linguistics: ACL 2023 2023-01-01

In the realm of automatic speech recognition (ASR), the quest for models that not only perform with high accuracy but also offer transparency in their decision-making processes is crucial. The potential of quality estimation (QE) metrics is introduced and evaluated as a novel tool to enhance explainable artificial intelligence (XAI) in ASR systems. Through experiments and analyses, the capabilities of the NoRefER (No Reference Error Rate) metric are explored in identifying word-level errors to aid post-editors in refining...

10.48550/arxiv.2401.11268 preprint EN cc-by arXiv (Cornell University) 2024-01-01

Remarkable advances in large language models (LLMs) have enabled high-quality text summarization. However, this capability is currently accessible only through LLMs of substantial size or proprietary LLMs with usage fees. In response, smaller-scale LLMs (sLLMs) of easy accessibility and low costs have been extensively studied, yet they often suffer from missing key information and entities, i.e., low relevance, in particular when input documents are long. We hence propose a key-element-informed instruction tuning for...

10.21437/interspeech.2024-2389 article EN Interspeech 2024 2024-09-01

Transcranial photobiomodulation (tPBM) has been widely studied for its potential to enhance cognitive functions of the elderly. However, its efficacy varies, with some individuals exhibiting no significant response to treatment. Considering these inconsistencies, we introduce a machine learning approach aimed at distinguishing between individuals that respond and those that do not respond to tPBM treatment, based on functional near-infrared spectroscopy (fNIRS) data acquired before treatment. We measured nine cognitive scores and recorded fNIRS data from 62...

10.1109/tnsre.2024.3469284 article EN cc-by-nc-nd IEEE Transactions on Neural Systems and Rehabilitation Engineering 2024-01-01

This paper describes the statistical machine translation systems developed at RWTH Aachen University for the German→English, English→Turkish and Chinese→English translation tasks of the EMNLP 2018 Third Conference on Machine Translation (WMT 2018). We use ensembles of neural translation systems based on the Transformer architecture. Our main focus is the German→English task, where we scored first with respect to all automatic metrics provided by the organizers. We identify data selection, fine-tuning, batch size and model dimension as important...

10.18653/v1/w18-6426 article EN cc-by 2018-01-01

We introduce SOLAR 10.7B, a large language model (LLM) with 10.7 billion parameters, demonstrating superior performance in various natural language processing (NLP) tasks. Inspired by recent efforts to efficiently up-scale LLMs, we present a method for scaling LLMs called depth up-scaling (DUS), which encompasses depthwise scaling and continued pretraining. In contrast to other LLM up-scaling methods that use mixture-of-experts, DUS does not require complex changes to train and inference efficiently. We show experimentally that DUS is simple...

10.48550/arxiv.2312.15166 preprint EN cc-by arXiv (Cornell University) 2023-01-01
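Depthwise scaling can be sketched on a toy layer list: duplicate the stack and drop overlapping layers at the seam before concatenating. The layer counts below are illustrative; real DUS copies transformer blocks and then continues pretraining the merged model.

```python
def depth_up_scale(layers, overlap=2):
    """Depthwise-scaling sketch: concatenate two copies of the layer
    stack, dropping `overlap` layers from the seam of each copy."""
    top = layers[:-overlap]      # base model minus its last layers
    bottom = layers[overlap:]    # copy minus its first layers
    return top + bottom

base = [f"layer{i}" for i in range(8)]     # an 8-layer toy "model"
scaled = depth_up_scale(base, overlap=2)   # 6 + 6 = 12 layers
print(len(scaled))
```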

Transfer learning or multilingual models are essential for low-resource neural machine translation (NMT), but their applicability is limited to cognate languages that share vocabularies. This paper shows effective techniques to transfer a pre-trained NMT model to a new, unrelated language without shared vocabularies. We relieve the vocabulary mismatch by using cross-lingual word embeddings, train a more language-agnostic encoder by injecting artificial noises, and generate synthetic data easily from the pre-training data without back-translation....

10.48550/arxiv.1905.05475 preprint EN other-oa arXiv (Cornell University) 2019-01-01

This paper describes the unsupervised neural machine translation (NMT) systems of RWTH Aachen University developed for the English ↔ German news translation task of the EMNLP 2018 Third Conference on Machine Translation (WMT 2018). Our work is based on iterative back-translation using a shared encoder-decoder NMT model. We extensively compare different vocabulary types, word embedding initialization schemes and optimization methods for our model. We also investigate gating and weight normalization of the word embedding layer.

10.18653/v1/w18-6409 article EN cc-by 2018-01-01

We propose a novel extended translation model (ETM) to counteract some problems in phrase-based translation: the lack of context when using single-word phrases and uncaptured dependencies beyond phrase boundaries. The ETM operates on the word level and augments the IBM models by an additional bilingual word pair and a reordering operation. Its implementation in the decoder introduces context for single-word phrases and across phrase boundaries. Moreover, it incorporates an explicit treatment of multiple and empty alignments. Its integration outperforms...

10.18653/v1/w15-3033 article EN cc-by 2015-01-01

Recently, encoder-only pre-trained models such as BERT have been successfully applied in automated essay scoring (AES) to predict a single overall score. However, studies have yet to explore these models in multi-trait AES, possibly due to the inefficiency of replicating BERT-based models for each trait. Breaking away from the existing sole use of the encoder, we propose an autoregressive prediction of multi-trait scores (ArTS), incorporating a decoding process by leveraging the pre-trained T5. Unlike prior regression or classification methods, we redefine AES...

10.48550/arxiv.2403.08332 preprint EN arXiv (Cornell University) 2024-03-13

In table-text open-domain question answering, a retriever system retrieves relevant evidence from tables and text to answer questions. Previous studies in table-text open-domain question answering have two common challenges: firstly, their retrievers can be affected by false-positive labels in training datasets; secondly, they may struggle to provide appropriate evidence for questions that require reasoning across the table. To address these issues, we propose Denoised Table-Text Retriever (DoTTeR). Our approach involves utilizing...

10.48550/arxiv.2403.17611 preprint EN arXiv (Cornell University) 2024-03-26