Yungi Kim

ORCID: 0000-0002-7557-6755
Research Areas
  • Topic Modeling
  • Natural Language Processing Techniques
  • Advanced Graph Neural Networks
  • Recommender Systems and Techniques
  • Speech Recognition and Synthesis
  • Mental Health via Writing
  • Web Data Mining and Analysis
  • Text and Document Classification Technologies
  • Multimodal Machine Learning Applications
  • Intelligent Tutoring Systems and Adaptive Learning

Hanyang University
2022-2024

In this paper, we focus on multimedia recommender systems using graph convolutional networks (GCNs), where multimodal features as well as user-item interactions are employed together. Our study aims to exploit multimodal features more effectively in order to accurately capture users' preferences for items. To this end, we point out the following two limitations of existing GCN-based systems: (L1) although the items a user has interacted with can reveal her preferences for items, existing methods utilize GCNs designed to capture only collaborative signals,...

10.1145/3616855.3635817 article EN 2024-03-04
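
For readers unfamiliar with the setup this abstract describes, here is a minimal NumPy sketch of GCN-style propagation over a user-item bipartite graph with multimodal item features fused into the item embeddings. The shapes, the fusion-by-addition step, and all variable names are illustrative assumptions, not the paper's actual architecture.

```python
# Minimal sketch: LightGCN-style propagation on a user-item graph that mixes
# collaborative signals with multimodal item features (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, d = 4, 6, 8

# Binary interaction matrix R (users x items) and ID embeddings.
R = (rng.random((n_users, n_items)) < 0.3).astype(float)
user_emb = rng.normal(size=(n_users, d))
item_emb = rng.normal(size=(n_items, d))

# Multimodal item features (e.g., visual/textual), assumed pre-projected to d.
item_mm = rng.normal(size=(n_items, d))
item_emb = item_emb + item_mm  # fuse modality features into the item nodes

# Symmetrically normalized adjacency of the bipartite graph.
Du = np.maximum(R.sum(1, keepdims=True), 1)
Di = np.maximum(R.sum(0, keepdims=True), 1)
A = R / np.sqrt(Du) / np.sqrt(Di)

# Each propagation layer averages neighbors' embeddings across the graph,
# so collaborative signals and modality features spread together.
for _ in range(2):
    user_emb, item_emb = A @ item_emb, A.T @ user_emb

# Preference score = dot product of the final user/item embeddings.
scores = user_emb @ item_emb.T
print(scores.shape)  # (4, 6)
```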

Online news providers such as Google News, Bing, and NAVER News collect a large number of articles from a variety of presses and distribute them to users via their portals. The dynamic nature of the news domain causes the problem of information overload, which makes it difficult for a user to find the articles she prefers. Motivated by this situation, NAVER Corp., the largest portal company in South Korea, identified four design considerations (DCs) for news recommendation that reflect the unique characteristics of the domain. In this paper, we introduce a large-scale...

10.1109/icde53745.2022.00319 article EN 2022 IEEE 38th International Conference on Data Engineering (ICDE) 2022-05-01

We introduce SOLAR 10.7B, a large language model (LLM) with 10.7 billion parameters, demonstrating superior performance in various natural language processing (NLP) tasks. Inspired by recent efforts to efficiently up-scale LLMs, we present a method for scaling LLMs called depth up-scaling (DUS), which encompasses depthwise scaling and continued pretraining. In contrast to other LLM up-scaling methods that use mixture-of-experts, DUS does not require complex changes to train and inference efficiently. We show experimentally that DUS is simple...

10.48550/arxiv.2312.15166 preprint EN cc-by arXiv (Cornell University) 2023-01-01
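
The depth up-scaling recipe lends itself to a toy illustration. The sketch below treats a model as a plain list of layers and splices two copies of a 32-layer base (dropping 8 layers at the seam from each copy) into a 48-layer model, which matches the layer arithmetic publicly described for SOLAR 10.7B; the string "layers" are stand-ins for real transformer blocks, and continued pretraining is only noted in a comment.

```python
# Hedged sketch of depth up-scaling (DUS) on a toy "model as a list of layers".
n_layers, n_drop = 32, 8  # drop 8 layers from each copy at the seam

base = [f"layer_{i}" for i in range(n_layers)]

# Copy A keeps the first n_layers - n_drop layers, copy B keeps the last ones;
# concatenating them yields 2 * (n_layers - n_drop) = 48 layers.
upscaled = base[: n_layers - n_drop] + base[n_drop:]
print(len(upscaled))  # 48

# Continued pretraining would then recover the performance lost at the seam.
```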

In a news recommender system, a user tends to click on an article if she is interested in its topic, which she understands by looking at the title. Such behavior is possible since, when viewing the title, humans naturally think of the contextual meaning of each title word by leveraging their own background knowledge. Motivated by this, we propose a novel personalized news recommendation framework, CAST (Context-aware Attention network with a Selection module for Title representation), which is capable of enriching title words with the body text that fully...

10.1145/3511808.3557619 article EN Proceedings of the 31st ACM International Conference on Information & Knowledge Management 2022-10-16
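
As a rough illustration of "enriching title words with body text", the sketch below runs plain scaled dot-product attention with title-word vectors as queries over body-word vectors. It is an assumption-laden simplification, not CAST's actual context-aware attention network or selection module.

```python
# Minimal sketch: title words attend to body-text words so that each title
# word becomes a context-aware mixture of related body words (illustrative).
import numpy as np

rng = np.random.default_rng(1)
d = 16
title = rng.normal(size=(5, d))   # 5 title-word vectors (queries)
body = rng.normal(size=(40, d))   # 40 body-word vectors (keys/values)

# Scaled dot-product attention weights, softmax over the body words.
logits = title @ body.T / np.sqrt(d)
weights = np.exp(logits - logits.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)

enriched_title = weights @ body
print(enriched_title.shape)  # (5, 16)
```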

To address the challenges associated with data processing at scale, we propose Dataverse, a unified open-source Extract-Transform-Load (ETL) pipeline for large language models (LLMs) with user-friendly design at its core. Easy addition of custom processors via a block-based interface in Dataverse allows users to readily and efficiently build their own ETL pipelines. We hope that Dataverse will serve as a vital tool for LLM development, and we open-source the entire library, welcoming community contributions. Additionally, we provide...

10.48550/arxiv.2403.19340 preprint EN arXiv (Cornell University) 2024-03-28
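
The "block-based interface with easy addition of custom processors" pattern can be sketched as a processor registry plus a pipeline runner. The decorator API, processor names, and record format below are hypothetical stand-ins, not Dataverse's real interface.

```python
# Sketch of a block-based ETL pipeline with pluggable processors.
REGISTRY = {}

def processor(name):
    """Register a function as a named pipeline block (hypothetical API)."""
    def register(fn):
        REGISTRY[name] = fn
        return fn
    return register

@processor("strip_whitespace")
def strip_whitespace(record):
    return record.strip()

@processor("drop_short")
def drop_short(record, min_len=10):
    # Returning None signals that the record should be filtered out.
    return record if len(record) >= min_len else None

def run_pipeline(records, steps):
    for name in steps:  # each step is one named block
        fn = REGISTRY[name]
        records = [r for r in map(fn, records) if r is not None]
    return records

print(run_pipeline(["  hello world, this is data  ", " hi "],
                   ["strip_whitespace", "drop_short"]))
```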

This paper introduces Evalverse, a novel library that streamlines the evaluation of Large Language Models (LLMs) by unifying disparate evaluation tools into a single, user-friendly framework. Evalverse enables individuals with limited knowledge of artificial intelligence to easily request LLM evaluations and receive detailed reports, facilitated by an integration with communication platforms like Slack. Thus, Evalverse serves as a powerful tool for the comprehensive assessment of LLMs, offering both researchers and practitioners...

10.48550/arxiv.2404.00943 preprint EN arXiv (Cornell University) 2024-04-01
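
A minimal sketch of the "unified entry point over disparate evaluation tools" idea follows. The backend functions, benchmark names, scores, and report format are all placeholders, not Evalverse's actual API.

```python
# Sketch: one user-facing call fans out to several evaluation backends.
def eval_mmlu(model_id):
    # Stand-in for invoking an external benchmark harness.
    return {"benchmark": "mmlu", "score": 0.62}

def eval_mtbench(model_id):
    # Stand-in for a judge-based chat benchmark run.
    return {"benchmark": "mt_bench", "score": 7.1}

BACKENDS = {"mmlu": eval_mmlu, "mt_bench": eval_mtbench}

def evaluate(model_id, benchmarks):
    """Single entry point that dispatches to each registered backend."""
    return [BACKENDS[b](model_id) for b in benchmarks]

# A chat-bot layer (e.g., Slack) would simply forward a user request here.
print(evaluate("my-model", ["mmlu", "mt_bench"]))
```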

With the increasing demand for substantial amounts of high-quality data to train large language models (LLMs), efficiently filtering large web corpora has become a critical challenge. For this purpose, KenLM, a lightweight n-gram-based language model that operates on CPUs, is widely used. However, the traditional method of training KenLM utilizes only high-quality data and, consequently, does not explicitly learn the linguistic patterns of low-quality data. To address this issue, we propose an ensemble approach that leverages two contrasting KenLMs:...

10.48550/arxiv.2409.09613 preprint EN arXiv (Cornell University) 2024-09-15
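
The two-KenLM ensemble idea can be sketched directly with the kenlm Python bindings: score each document under a "good" and a "bad" n-gram model, and keep documents that look much more like the good corpus. The .arpa paths, the log-perplexity margin rule, and the threshold are assumptions; the paper's exact scoring may differ, and the snippet needs two pre-trained models to run.

```python
# Sketch of ensemble filtering with two contrasting KenLMs: one trained on
# high-quality text, one on low-quality text (model files are placeholders).
import math
import kenlm  # requires the kenlm Python bindings and two trained models

good_lm = kenlm.Model("good_corpus.arpa")  # n-gram LM over high-quality data
bad_lm = kenlm.Model("bad_corpus.arpa")    # n-gram LM over low-quality data

def keep(document, margin=0.0):
    # Lower perplexity under the "good" LM than under the "bad" LM (by a
    # margin) suggests the document resembles high-quality text.
    ppl_good = good_lm.perplexity(document)
    ppl_bad = bad_lm.perplexity(document)
    return math.log(ppl_bad) - math.log(ppl_good) > margin

docs = ["Some web document ...", "spam spam click here spam"]
filtered = [d for d in docs if keep(d)]
```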

Creating high-quality, large-scale datasets for large language models (LLMs) often relies on resource-intensive, GPU-accelerated models for quality filtering, making the process time-consuming and costly. This dependence on GPUs limits accessibility for organizations lacking significant computational infrastructure. To address this issue, we introduce the Lightweight, Purpose-driven (LP) Data Pipeline, a framework that operates entirely on CPUs to streamline the processes of dataset extraction, filtering, and curation. Based on our four...

10.48550/arxiv.2411.11289 preprint EN arXiv (Cornell University) 2024-11-18
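
In the spirit of a CPU-only filtering stage, the sketch below chains cheap rule-based checks (length, symbol ratio, duplicate lines) instead of GPU model scoring. The specific rules and thresholds are illustrative assumptions, not the LP Data Pipeline's actual filters.

```python
# Sketch of a CPU-only quality-filtering stage: every check is a cheap
# heuristic, so no model inference (and no GPU) is required.
import re

def word_count_ok(text, lo=50, hi=100_000):
    return lo <= len(text.split()) <= hi

def symbol_ratio_ok(text, max_ratio=0.1):
    symbols = len(re.findall(r"[#@{}\\|<>]", text))
    return symbols / max(len(text), 1) <= max_ratio

def dedup_lines_ok(text, max_dup=0.3):
    lines = [l for l in text.splitlines() if l.strip()]
    return not lines or 1 - len(set(lines)) / len(lines) <= max_dup

FILTERS = [word_count_ok, symbol_ratio_ok, dedup_lines_ok]

def curate(corpus):
    return [doc for doc in corpus if all(f(doc) for f in FILTERS)]
```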

(Preprint version of the 2024 multimedia recommender systems article listed above; the abstract is identical.)

10.48550/arxiv.2312.09511 preprint EN cc-by-nc-nd arXiv (Cornell University) 2023-01-01