Yunsu Kim

ORCID: 0000-0002-1375-005X
Research Areas
  • Topic Modeling
  • Natural Language Processing Techniques
  • Speech Recognition and Synthesis
  • Multimodal Machine Learning Applications
  • Speech and dialogue systems
  • Text Readability and Simplification
  • Semantic Web and Ontologies
  • Advanced Text Analysis Techniques
  • Asian Culture and Media Studies
  • Advanced Clustering Algorithms Research
  • Data Stream Mining Techniques
  • Online Learning and Analytics
  • Text and Document Classification Technologies
  • Handwritten Text Recognition Techniques
  • Machine Learning and Algorithms
  • EEG and Brain-Computer Interfaces
  • Augmented Reality Applications
  • Multi-Agent Systems and Negotiation
  • Fiber-reinforced polymer composites
  • Technology-Enhanced Education Studies
  • Recycling and Waste Management Techniques
  • Interactive and Immersive Displays
  • Nanomaterials for catalytic reactions
  • Machine Learning and Data Classification
  • Aging and Gerontology Research

Inje University Ilsan Paik Hospital
2024-2025

Sungkyunkwan University
2024

Inje University
2024

Korea Post
2023

Pohang University of Science and Technology
2023

Korea Advanced Institute of Science and Technology
2023

RWTH Aachen University
2013-2019

FIR e. V. an der RWTH Aachen
2018-2019

Korea Railroad Research Institute
2005

Transfer learning or multilingual models are essential for low-resource neural machine translation (NMT), but their applicability is limited to cognate languages that share vocabularies. This paper shows effective techniques to transfer a pretrained NMT model to a new, unrelated language without shared vocabularies. We relieve the vocabulary mismatch by using cross-lingual word embeddings, train a more language-agnostic encoder by injecting artificial noises, and generate synthetic data easily from the pretraining data without back-translation. Our...

10.18653/v1/p19-1120 preprint EN cc-by 2019-01-01
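The artificial-noise technique mentioned in the abstract above can be illustrated with a small sketch; the noise types (word dropout and local shuffling) and the rates below are illustrative assumptions, not the paper's exact configuration.

```python
import random

def add_noise(tokens, drop_prob=0.1, shuffle_k=3, rng=None):
    """Perturb a source sentence: randomly drop words, then locally
    shuffle tokens so each moves at most shuffle_k - 1 positions."""
    rng = rng or random.Random(0)
    # word dropout (keep at least one token)
    kept = [t for t in tokens if rng.random() > drop_prob] or tokens[:1]
    # local shuffle via jittered sort keys
    keys = [i + rng.uniform(0, shuffle_k) for i in range(len(kept))]
    return [t for _, t in sorted(zip(keys, kept), key=lambda p: p[0])]

sent = "the quick brown fox jumps over the lazy dog".split()
print(add_noise(sent, rng=random.Random(42)))
```

Training the encoder on such perturbed inputs is one simple way to make its representations less dependent on source-language word order.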

Document-level context has received lots of attention for compensating neural machine translation (NMT) of isolated sentences. However, recent advances in document-level NMT focus on sophisticated integration of the context, explaining its improvement with only a few selected examples or targeted test sets. We extensively quantify the causes of improvements by a document-level model in general test sets, clarifying the limit of the usefulness of document-level context in NMT. We show that most of the improvements are not interpretable as utilizing the context. We also show that a minimal encoding is...

10.18653/v1/d19-6503 article EN cc-by 2019-01-01

Yunsu Kim, Petre Petrov, Pavel Petrushkov, Shahram Khadivi, Hermann Ney. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2019.

10.18653/v1/d19-1080 article EN cc-by 2019-01-01

This paper studies the practicality of current state-of-the-art unsupervised methods in neural machine translation (NMT). In ten translation tasks with various data settings, we analyze the conditions under which the unsupervised methods fail to produce reasonable translations. We show that their performance is severely affected by linguistic dissimilarity and domain mismatch between the source and target monolingual data. Such conditions are common for low-resource language pairs, where unsupervised learning works poorly. In all of our experiments, supervised...

10.48550/arxiv.2004.10581 preprint EN cc-by arXiv (Cornell University) 2020-01-01

10.1016/j.paid.2025.113094 article EN Personality and Individual Differences 2025-02-10

Abstract Background: The development of individual subtypes based on biomarkers offers an intriguing and timely avenue for capturing factors pertaining to mental health independent from individuals’ insights. Aims & Objectives: Incorporating 2-channel electroencephalography (EEG) and photoplethysmogram (PPG), we sought to develop a subtype classification system with clinical relevance. Method: One hundred healthy participants and 99 patients with psychiatric disorders were recruited. Classification...

10.1093/ijnp/pyae059.325 article EN cc-by-nc The International Journal of Neuropsychopharmacology 2025-02-01

Back-translation, i.e., data augmentation by translating target monolingual data, is a crucial component in modern neural machine translation (NMT). In this work, we reformulate back-translation in the scope of cross-entropy optimization of an NMT model, clarifying its underlying mathematical assumptions and approximations beyond its heuristic usage. Our formulation covers broader synthetic data generation schemes, including sampling from a target-to-source NMT model. With this formulation, we point out fundamental problems...

10.18653/v1/w19-5205 article EN cc-by 2019-01-01
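A minimal sketch of the back-translation scheme discussed above: generate synthetic source sentences from real target monolingual data with a target-to-source model, and pair them for training. The "model" here is a stand-in function, not an actual NMT system.

```python
def back_translate(target_mono, tgt2src_model):
    """Back-translation: pair (synthetic source, real target) so the
    forward source-to-target model can train on monolingual target data."""
    return [(tgt2src_model(t), t) for t in target_mono]

# Stand-in for a trained target-to-source model (assumption: a real
# system would be a beam-search or sampling NMT decoder).
dummy_tgt2src = lambda sent: " ".join(reversed(sent.split()))

mono = ["das ist ein test", "guten morgen"]
synthetic_pairs = back_translate(mono, dummy_tgt2src)
print(synthetic_pairs)
```

Swapping the stand-in for a sampling decoder instead of a beam-search one is exactly the kind of synthetic-data generation choice the formulation above covers.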

Unsupervised learning of cross-lingual word embeddings offers elegant matching of words across languages, but has fundamental limitations in translating sentences. In this paper, we propose simple yet effective methods to improve word-by-word translation with cross-lingual embeddings, using only monolingual corpora and without any back-translation. We integrate a language model for context-aware search, and use a novel denoising autoencoder to handle reordering. Our system surpasses state-of-the-art unsupervised translation systems...

10.18653/v1/d18-1101 article EN cc-by Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing 2018-01-01
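The word-by-word translation idea can be sketched as cosine nearest-neighbour lookup in a shared cross-lingual embedding space. The toy vocabularies and vectors below are made up for illustration; the paper additionally uses a language model and a denoising autoencoder, which are omitted here.

```python
import numpy as np

# Toy "aligned" embeddings: in practice these come from unsupervised
# cross-lingual embedding learning; the vectors below are invented.
src_vocab = {"hund": 0, "katze": 1}
tgt_vocab = ["dog", "cat"]
src_emb = np.array([[0.9, 0.1], [0.1, 0.9]])
tgt_emb = np.array([[1.0, 0.0], [0.0, 1.0]])

def translate(sentence):
    """Word-by-word translation via cosine nearest neighbour."""
    s = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
    t = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    sim = s @ t.T                      # cosine similarity matrix
    return [tgt_vocab[int(np.argmax(sim[src_vocab[w]]))]
            for w in sentence.split()]

print(translate("hund katze"))
```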

This paper describes the submission of RWTH Aachen University to the De→En parallel corpus filtering task of the EMNLP 2018 Third Conference on Machine Translation (WMT 2018). We use several rule-based, heuristic methods to preselect sentence pairs. These pairs are scored with count-based and neural systems as language and translation models. In addition to single sentence-pair scoring, we further implement a simple redundancy removing heuristic. Our best performing system relies on recurrent models based...

10.18653/v1/w18-6487 article EN cc-by 2018-01-01
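The rule-based pre-selection step can be sketched as simple length, length-ratio, and identity heuristics; the specific thresholds below are illustrative assumptions, not the submission's actual settings.

```python
def keep_pair(src, tgt, max_ratio=2.0, min_len=1, max_len=80):
    """Rule-based pre-selection of a sentence pair for corpus filtering."""
    s, t = src.split(), tgt.split()
    # reject sentences that are too short or too long
    if not (min_len <= len(s) <= max_len and min_len <= len(t) <= max_len):
        return False
    # reject pairs with a suspicious source/target length ratio
    if max(len(s), len(t)) / min(len(s), len(t)) > max_ratio:
        return False
    # reject identical pairs (likely untranslated copies)
    if src.strip() == tgt.strip():
        return False
    return True

pairs = [("das haus ist gross", "the house is big"),
         ("hallo", "one two three four five"),
         ("same line", "same line")]
print([keep_pair(s, t) for s, t in pairs])
```

Pairs that survive these cheap rules would then be scored by the language and translation models mentioned in the abstract.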

Bioconjugation of proteins can substantially expand the opportunities in biopharmaceutical development; however, applications are limited for gene editing machinery despite its tremendous therapeutic potential. Here, a self-delivered nanomedicine platform based on bioorthogonal CRISPR/Cas9 conjugates, which can be armed with a chemotherapeutic drug for combinatorial therapy, is introduced. It is demonstrated that multi-functionalized Cas9 and polymer form self-condensed nanocomplexes, which induce significant...

10.1002/advs.202302253 article EN cc-by Advanced Science 2023-07-23

Automated essay scoring (AES) aims to score essays written for a given prompt, which defines the writing topic. Most existing AES systems assume grading essays of the same prompt as used in training and assign only a holistic score. However, such settings conflict with real-education situations; pre-graded essays for a particular prompt are lacking, and detailed trait scores of sub-rubrics are required. Thus, predicting various trait scores of unseen-prompt essays (called cross-prompt essay trait scoring) is a remaining challenge of AES. In this paper, we propose a robust...

10.18653/v1/2023.findings-acl.98 article EN cc-by Findings of the Association for Computational Linguistics: ACL 2023 2023-01-01

In the realm of automatic speech recognition (ASR), the quest for models that not only perform with high accuracy but also offer transparency in their decision-making processes is crucial. The potential of quality estimation (QE) metrics is introduced and evaluated as a novel tool to enhance explainable artificial intelligence (XAI) in ASR systems. Through experiments and analyses, the capabilities of the NoRefER (No Reference Error Rate) metric are explored in identifying word-level errors to aid post-editors in refining...

10.48550/arxiv.2401.11268 preprint EN cc-by arXiv (Cornell University) 2024-01-01

Remarkable advances in large language models (LLMs) have enabled high-quality text summarization. However, this capability is currently accessible only through LLMs of substantial size or proprietary LLMs with usage fees. In response, smaller-scale LLMs (sLLMs) of easy accessibility and low costs have been extensively studied, yet they often suffer from missing key information and entities, i.e., low relevance, in particular when input documents are long. We hence propose a key-element-informed instruction tuning for...

10.21437/interspeech.2024-2389 article EN Interspeech 2024 2024-09-01

Transcranial photobiomodulation (tPBM) has been widely studied for its potential to enhance cognitive functions of the elderly. However, its efficacy varies, with some individuals exhibiting no significant response to treatment. Considering these inconsistencies, we introduce a machine learning approach aimed at distinguishing between individuals that respond and those that do not respond to tPBM treatment, based on functional near-infrared spectroscopy (fNIRS) data acquired before treatment. We measured nine cognitive scores and recorded fNIRS data from 62...

10.1109/tnsre.2024.3469284 article EN cc-by-nc-nd IEEE Transactions on Neural Systems and Rehabilitation Engineering 2024-01-01

This paper describes the statistical machine translation systems developed at RWTH Aachen University for the German→English, English→Turkish and Chinese→English translation tasks of the EMNLP 2018 Third Conference on Machine Translation (WMT 2018). We use ensembles of neural translation systems based on the Transformer architecture. Our main focus is the German→English task, where we scored first with respect to all automatic metrics provided by the organizers. We identify data selection, fine-tuning, batch size and model dimension as important...

10.18653/v1/w18-6426 article EN cc-by 2018-01-01

We introduce SOLAR 10.7B, a large language model (LLM) with 10.7 billion parameters, demonstrating superior performance in various natural language processing (NLP) tasks. Inspired by recent efforts to efficiently up-scale LLMs, we present a method for scaling LLMs called depth up-scaling (DUS), which encompasses depthwise scaling and continued pretraining. In contrast to other LLM up-scaling methods that use mixture-of-experts, DUS does not require complex changes to train and inference efficiently. We show experimentally that DUS is simple...

10.48550/arxiv.2312.15166 preprint EN cc-by arXiv (Cornell University) 2023-01-01
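Depthwise scaling can be sketched on a toy layer list: duplicate the stack and drop overlapping layers at the seam before concatenating. The layer counts below are illustrative; real DUS copies transformer blocks and then continues pretraining the merged model.

```python
def depth_up_scale(layers, overlap=2):
    """Depthwise-scaling sketch: concatenate two copies of the layer
    stack, dropping `overlap` layers from the seam of each copy."""
    top = layers[:-overlap]      # base model minus its last layers
    bottom = layers[overlap:]    # copy minus its first layers
    return top + bottom

base = [f"layer{i}" for i in range(8)]     # an 8-layer toy "model"
scaled = depth_up_scale(base, overlap=2)   # 6 + 6 = 12 layers
print(len(scaled))
```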

Transfer learning or multilingual models are essential for low-resource neural machine translation (NMT), but their applicability is limited to cognate languages that share vocabularies. This paper shows effective techniques to transfer a pre-trained NMT model to a new, unrelated language without shared vocabularies. We relieve the vocabulary mismatch by using cross-lingual word embeddings, train a more language-agnostic encoder by injecting artificial noises, and generate synthetic data easily from the pre-training data without back-translation....

10.48550/arxiv.1905.05475 preprint EN other-oa arXiv (Cornell University) 2019-01-01

This paper describes the unsupervised neural machine translation (NMT) systems of RWTH Aachen University developed for the English ↔ German news translation task of the EMNLP 2018 Third Conference on Machine Translation (WMT 2018). Our work is based on iterative back-translation using a shared encoder-decoder NMT model. We extensively compare different vocabulary types, word embedding initialization schemes and optimization methods for our model. We also investigate gating and weight normalization of the word embedding layer.

10.18653/v1/w18-6409 article EN cc-by 2018-01-01

We propose a novel extended translation model (ETM) to counteract some problems in phrase-based translation: the lack of context when using single-word phrases and uncaptured dependencies beyond phrase boundaries. The ETM operates on the word level and augments the IBM models by an additional bilingual word pair and a reordering operation. Its implementation in the decoder introduces context for single-word phrases and across phrase boundaries. Moreover, it incorporates an explicit treatment of multiple and empty alignments. Its integration outperforms...

10.18653/v1/w15-3033 article EN cc-by 2015-01-01

Recently, encoder-only pre-trained models such as BERT have been successfully applied in automated essay scoring (AES) to predict a single overall score. However, studies have yet to explore these models in multi-trait AES, possibly due to the inefficiency of replicating BERT-based models for each trait. Breaking away from the existing sole use of the encoder, we propose an autoregressive prediction of multi-trait scores (ArTS), incorporating a decoding process by leveraging the pre-trained T5. Unlike prior regression or classification methods, we redefine AES...

10.48550/arxiv.2403.08332 preprint EN arXiv (Cornell University) 2024-03-13

In table-text open-domain question answering, a retriever system retrieves relevant evidence from tables and text to answer questions. Previous studies in table-text open-domain question answering have two common challenges: firstly, their retrievers can be affected by false-positive labels in training datasets; secondly, they may struggle to provide appropriate evidence for questions that require reasoning across the table. To address these issues, we propose Denoised Table-Text Retriever (DoTTeR). Our approach involves utilizing...

10.48550/arxiv.2403.17611 preprint EN arXiv (Cornell University) 2024-03-26