- Topic Modeling
- Natural Language Processing Techniques
- Speech Recognition and Synthesis
- Multimodal Machine Learning Applications
- Speech and dialogue systems
- Text Readability and Simplification
- Semantic Web and Ontologies
- Advanced Text Analysis Techniques
- Asian Culture and Media Studies
- Advanced Clustering Algorithms Research
- Data Stream Mining Techniques
- Online Learning and Analytics
- Text and Document Classification Technologies
- Handwritten Text Recognition Techniques
- Machine Learning and Algorithms
- EEG and Brain-Computer Interfaces
- Augmented Reality Applications
- Multi-Agent Systems and Negotiation
- Fiber-reinforced polymer composites
- Technology-Enhanced Education Studies
- Recycling and Waste Management Techniques
- Interactive and Immersive Displays
- Nanomaterials for catalytic reactions
- Machine Learning and Data Classification
- Aging and Gerontology Research
Inje University Ilsan Paik Hospital
2024-2025
Sungkyunkwan University
2024
Inje University
2024
Korea Post
2023
Pohang University of Science and Technology
2023
Korea Advanced Institute of Science and Technology
2023
RWTH Aachen University
2013-2019
FIR e. V. an der RWTH Aachen
2018-2019
Korea Railroad Research Institute
2005
Transfer learning or multilingual modeling is essential for low-resource neural machine translation (NMT), but the applicability is limited to cognate languages by sharing their vocabularies. This paper shows effective techniques to transfer a pretrained NMT model to a new, unrelated language without shared vocabularies. We relieve the vocabulary mismatch using cross-lingual word embedding, train a more language-agnostic encoder by injecting artificial noises, and generate synthetic data easily from the pretraining data without back-translation. Our...
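The artificial noise injection mentioned above can be sketched as follows. This is a minimal illustration of the general idea (random deletion, filler insertion, and bounded local shuffling of source words); the function names, noise types, and parameter values are assumptions for illustration, not the paper's exact recipe.

```python
import random

def inject_noise(tokens, p_drop=0.1, p_insert=0.1, shuffle_window=3,
                 noise_token="<blank>", seed=0):
    """Perturb a source sentence so the encoder cannot rely on exact
    word identity and order, making it more language-agnostic.

    Applies three perturbations: random word deletion, random filler
    insertion, and a local shuffle where each word moves at most
    `shuffle_window - 1` positions.
    """
    rng = random.Random(seed)
    # Randomly drop words.
    kept = [t for t in tokens if rng.random() > p_drop]
    # Randomly insert a filler token before some positions.
    noised = []
    for t in kept:
        if rng.random() < p_insert:
            noised.append(noise_token)
        noised.append(t)
    # Local shuffle: perturb each position key by less than `shuffle_window`.
    keys = [i + rng.uniform(0, shuffle_window) for i in range(len(noised))]
    return [t for _, t in sorted(zip(keys, noised), key=lambda x: x[0])]

sentence = "the quick brown fox jumps over the lazy dog".split()
print(inject_noise(sentence))
```

With `shuffle_window=1` and zero drop/insert probabilities the sentence passes through unchanged, which makes the noise strength easy to tune per language pair.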
Document-level context has received lots of attention for compensating neural machine translation (NMT) of isolated sentences. However, recent advances in document-level NMT focus on sophisticated integration of the context, explaining its improvement with only a few selected examples or targeted test sets. We extensively quantify the causes of improvements by a document-level model on general test sets, clarifying the limit of the usefulness of document-level NMT. We show that most of the improvements are not interpretable as utilizing the context. We also show that a minimal encoding is...
Yunsu Kim, Petre Petrov, Pavel Petrushkov, Shahram Khadivi, Hermann Ney. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2019.
This paper studies the practicality of current state-of-the-art unsupervised methods in neural machine translation (NMT). In ten translation tasks with various data settings, we analyze the conditions under which the unsupervised methods fail to produce reasonable translations. We show that their performance is severely affected by linguistic dissimilarity and domain mismatch between the source and target monolingual data. Such conditions are common for low-resource language pairs, where unsupervised learning works poorly. In all of our experiments, supervised...
Background: The development of individual subtypes based on biomarkers offers an intriguing and timely avenue for capturing factors pertaining to mental health independent of individuals' insights. Aims & Objectives: Incorporating 2-channel electroencephalography (EEG) and photoplethysmogram (PPG) measurements, we sought to develop a subtype classification system with clinical relevance. Method: One hundred healthy participants and 99 patients with psychiatric disorders were recruited. Classification...
Back-translation (data augmentation by translating target monolingual data) is a crucial component in modern neural machine translation (NMT). In this work, we reformulate back-translation in the scope of cross-entropy optimization of an NMT model, clarifying its underlying mathematical assumptions and approximations beyond its heuristic usage. Our formulation covers broader synthetic data generation schemes, including sampling from a target-to-source model. With this formulation, we point out fundamental problems...
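The back-translation pipeline being formalized above can be sketched as follows. The toy target-to-source "model" here is a stand-in (a word-level probability table), not a trained NMT system; it only serves to contrast greedy generation with the sampling scheme the formulation also covers.

```python
import random

# Toy target-to-source model: each target word maps to candidate source
# words with probabilities. A real system would be a trained NMT model.
T2S = {
    "hello": [("hallo", 0.9), ("servus", 0.1)],
    "world": [("welt", 1.0)],
}

def translate_t2s(target_sentence, mode="beam", rng=None):
    """Back-translate a target sentence into a synthetic source sentence.

    mode="beam" greedily picks the most probable source word (classic
    back-translation); mode="sampling" draws from the model distribution,
    an alternative synthetic-data generation scheme.
    """
    rng = rng or random.Random(0)
    out = []
    for word in target_sentence.split():
        cands = T2S.get(word, [(word, 1.0)])  # copy unknown words through
        if mode == "beam":
            out.append(max(cands, key=lambda c: c[1])[0])
        else:
            words, probs = zip(*cands)
            out.append(rng.choices(words, weights=probs, k=1)[0])
    return " ".join(out)

def make_synthetic_bitext(target_monolingual, mode="beam"):
    """Pair each monolingual target sentence with its synthetic source."""
    return [(translate_t2s(t, mode), t) for t in target_monolingual]

print(make_synthetic_bitext(["hello world"]))  # [('hallo welt', 'hello world')]
```

The resulting synthetic pairs are then mixed with genuine bitext to train the source-to-target model; the choice between greedy and sampled generation is exactly the kind of approximation the cross-entropy formulation makes explicit.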
Unsupervised learning of cross-lingual word embedding offers elegant matching of words across languages, but has fundamental limitations in translating sentences. In this paper, we propose simple yet effective methods to improve word-by-word translation with cross-lingual embeddings, using only monolingual corpora without any back-translation. We integrate a language model for context-aware search, and use a novel denoising autoencoder to handle reordering. Our system surpasses state-of-the-art unsupervised systems...
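The context-aware search described above can be illustrated with a minimal sketch: each source word's translation candidates are ranked by a combination of cross-lingual embedding similarity and a language-model score over the partial output. The toy 2-d embeddings and bigram log-probabilities below are made-up stand-ins for the aligned embeddings and monolingual LM used in the actual system.

```python
import math

# Toy cross-lingual embeddings and a toy bigram LM (illustrative values).
SRC_EMB = {"der": (1.0, 0.0), "hund": (0.0, 1.0)}
TGT_EMB = {"the": (0.9, 0.1), "a": (0.8, 0.3),
           "dog": (0.1, 0.95), "cat": (0.2, 0.9)}
BIGRAM_LOGP = {("<s>", "the"): -0.5, ("<s>", "a"): -1.0,
               ("the", "dog"): -0.7, ("the", "cat"): -1.5,
               ("a", "dog"): -1.2, ("a", "cat"): -1.3}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(u[0], u[1]) * math.hypot(v[0], v[1]))

def translate(src_tokens, lm_weight=1.0):
    """Greedy word-by-word translation with LM-rescored (context-aware) search."""
    prev, out = "<s>", []
    for s in src_tokens:
        def score(t):
            emb = cosine(SRC_EMB[s], TGT_EMB[t])   # embedding match
            lm = BIGRAM_LOGP.get((prev, t), -5.0)  # LM context score
            return emb + lm_weight * lm
        best = max(TGT_EMB, key=score)
        out.append(best)
        prev = best
    return out

print(translate(["der", "hund"]))  # ['the', 'dog']
```

Without the LM term, nearest-neighbor lookup would pick each word in isolation; the LM score is what lets the context disambiguate between near-synonymous candidates.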
This paper describes the submission of RWTH Aachen University for the De→En parallel corpus filtering task of the EMNLP 2018 Third Conference on Machine Translation (WMT 2018). We use several rule-based, heuristic methods to preselect sentence pairs. These pairs are scored with count-based and neural systems as language and translation models. In addition to single sentence-pair scoring, we further implement a simple redundancy removing heuristic. Our best performing system relies on recurrent models based...
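Rule-based preselection of the kind mentioned above can be sketched as follows. The thresholds and the particular rules (empty side, overlong sentence, untranslated copy, implausible length ratio) are illustrative assumptions, not the submission's exact values.

```python
def preselect(sentence_pairs, max_len=80, max_ratio=3.0):
    """Rule-based preselection of sentence pairs before model-based scoring.

    Keeps a (source, target) pair only if both sides are non-empty, not
    overly long, not identical copies, and have a plausible length ratio.
    """
    kept = []
    for src, tgt in sentence_pairs:
        s, t = src.split(), tgt.split()
        if not s or not t:
            continue                      # empty side
        if len(s) > max_len or len(t) > max_len:
            continue                      # overlong sentence
        if src.strip() == tgt.strip():
            continue                      # untranslated copy
        ratio = max(len(s), len(t)) / min(len(s), len(t))
        if ratio > max_ratio:
            continue                      # implausible length ratio
        kept.append((src, tgt))
    return kept

pairs = [("Guten Morgen", "Good morning"),
         ("Hallo", ""),                                       # empty target
         ("same line", "same line"),                          # copy
         ("ein Wort", "one two three four five six seven")]   # bad ratio
print(preselect(pairs))  # [('Guten Morgen', 'Good morning')]
```

Pairs surviving this cheap filter are then ranked by the (much more expensive) language- and translation-model scores.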
Bioconjugation of proteins can substantially expand the opportunities in biopharmaceutical development; however, applications are limited for gene editing machinery despite its tremendous therapeutic potential. Here, a self-delivered nanomedicine platform based on bioorthogonal CRISPR/Cas9 conjugates, which can be armed with a chemotherapeutic drug for combinatorial therapy, is introduced. It is demonstrated that the multi-functionalized Cas9 and polymer form self-condensed nanocomplexes, which induce significant...
Automated essay scoring (AES) aims to score essays written for a given prompt, which defines the writing topic. Most existing AES systems assume grading essays of the same prompt as used in training, and assign only a holistic score. However, such settings conflict with real-education situations; pre-graded essays for a particular prompt are lacking, and detailed trait scores of sub-rubrics are required. Thus, predicting various trait scores of essays written for unseen prompts (called cross-prompt trait scoring) is a remaining challenge of AES. In this paper, we propose a robust...
In the realm of automatic speech recognition (ASR), the quest for models that not only perform with high accuracy but also offer transparency in their decision-making processes is crucial. The potential of quality estimation (QE) metrics is introduced and evaluated as a novel tool to enhance explainable artificial intelligence (XAI) in ASR systems. Through experiments and analyses, the capabilities of the NoRefER (No Reference Error Rate) metric are explored for identifying word-level errors to aid post-editors in refining...
Remarkable advances in large language models (LLMs) have enabled high-quality text summarization. However, this capability is currently accessible only through LLMs of substantial size or proprietary LLMs with usage fees. In response, smaller-scale LLMs (sLLMs) of easy accessibility and low cost have been extensively studied, yet they often suffer from missing key information and entities, i.e., low relevance, in particular when input documents are long. We hence propose a key-element-informed instruction tuning for...
Transcranial photobiomodulation (tPBM) has been widely studied for its potential to enhance cognitive functions of the elderly. However, its efficacy varies, with some individuals exhibiting no significant response to treatment. Considering these inconsistencies, we introduce a machine learning approach aimed at distinguishing between individuals that respond and those that do not respond to tPBM treatment, based on functional near-infrared spectroscopy (fNIRS) data acquired before treatment. We measured nine cognitive scores and recorded fNIRS data from 62...
This paper describes the statistical machine translation systems developed at RWTH Aachen University for the German→English, English→Turkish and Chinese→English tasks of the EMNLP 2018 Third Conference on Machine Translation (WMT 2018). We use ensembles of neural machine translation systems based on the Transformer architecture. Our main focus is the German→English task, where we scored first with respect to all automatic metrics provided by the organizers. We identify data selection, fine-tuning, batch size and model dimension as important...
We introduce SOLAR 10.7B, a large language model (LLM) with 10.7 billion parameters, demonstrating superior performance in various natural language processing (NLP) tasks. Inspired by recent efforts to efficiently up-scale LLMs, we present a method for scaling LLMs called depth up-scaling (DUS), which encompasses depthwise scaling and continued pretraining. In contrast to other LLM up-scaling methods that use mixture-of-experts, DUS does not require complex changes to train and inference efficiently. We show experimentally that DUS is simple...
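The depthwise scaling step of DUS can be sketched over layer indices: duplicate the base model's layer stack, drop the final m layers from the first copy and the initial m layers from the second, and concatenate. This is a minimal sketch of the layer arithmetic only; the continued-pretraining step and all weight handling are omitted.

```python
def depth_up_scale(layers, m):
    """Depthwise scaling step of depth up-scaling (DUS).

    Duplicates the layer stack, removes the final `m` layers from the
    first copy and the initial `m` layers from the second, and
    concatenates the two, yielding 2n - 2m layers from an n-layer base.
    """
    first = layers[:len(layers) - m]   # bottom copy minus its top m layers
    second = layers[m:]                # top copy minus its bottom m layers
    return first + second

base = list(range(32))                 # a 32-layer base model, as layer indices
scaled = depth_up_scale(base, m=8)
print(len(scaled))                     # 48
```

With a 32-layer base and m=8, this yields the 48-layer stack that is then continued-pretrained; because the result is still a plain dense transformer, no inference-time changes (unlike mixture-of-experts routing) are required.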
This paper describes the unsupervised neural machine translation (NMT) systems of RWTH Aachen University developed for the English ↔ German news translation task of the EMNLP 2018 Third Conference on Machine Translation (WMT 2018). Our work is based on iterative back-translation using a shared encoder-decoder NMT model. We extensively compare different vocabulary types, word embedding initialization schemes and optimization methods for our model. We also investigate gating and weight normalization of the word embedding layer.
We propose a novel extended translation model (ETM) to counteract some problems in phrase-based translation: the lack of context when using single-word phrases and uncaptured dependencies beyond phrase boundaries. The ETM operates on the word level and augments the IBM models by an additional bilingual word pair and a reordering operation. Its implementation in the decoder introduces context for single-word phrases across phrase boundaries. Moreover, it incorporates an explicit treatment of multiple and empty alignments. Its integration outperforms...
Recently, encoder-only pre-trained models such as BERT have been successfully applied in automated essay scoring (AES) to predict a single overall score. However, studies have yet to explore these models in multi-trait AES, possibly due to the inefficiency of replicating BERT-based models for each trait. Breaking away from the existing sole use of the encoder, we propose an autoregressive prediction of multi-trait scores (ArTS), incorporating a decoding process by leveraging T5. Unlike prior regression or classification methods, we redefine AES...
In table-text open-domain question answering, a retriever system retrieves relevant evidence from tables and text to answer questions. Previous studies in this setting face two common challenges: firstly, their retrievers can be affected by false-positive labels in training datasets; secondly, they may struggle to provide appropriate evidence for questions that require reasoning across the table. To address these issues, we propose the Denoised Table-Text Retriever (DoTTeR). Our approach involves utilizing...