NFDI4DS | UHH-SEMS - Publication Details

Namyong Park

ORCID: 0000-0002-3344-2361

Contact & Profiles

OpenAlex ID: A5072397278

Research Areas

Advanced Graph Neural Networks
Topic Modeling
Neural Networks and Applications
Evolutionary Algorithms and Applications
Metaheuristic Optimization Algorithms Research
Tensor Decomposition and Applications
Graph Theory and Algorithms
Anomaly Detection Techniques and Applications
Recommender Systems and Techniques
Natural Language Processing Techniques
Complex Network Analysis Techniques
Parallel Computing and Optimization Techniques
Online Learning and Analytics
Human Mobility and Location-Based Analysis
Data Quality and Management
Asian Culture and Media Studies
Hate Speech and Cyberbullying Detection
Data Stream Mining Techniques
Ethics and Social Impacts of AI
Algorithms and Data Compression
Web Data Mining and Analysis
Authorship Attribution and Profiling
Domain Adaptation and Few-Shot Learning
Acute Kidney Injury Research
Advanced Memory and Neural Computing

Affiliations

Carnegie Mellon University
2019-2024

Water Research Institute
2021

Australian National University
2021

Amazon (United States)
2019

Seoul National University
2012-2018

Estimating Node Importance in Knowledge Graphs Using Graph Neural Networks

OPENALEX - Publications

Namyong Park, Andrey Kan, Xin Luna Dong, Tong Zhao, Christos Faloutsos

How can we estimate the importance of nodes in a knowledge graph (KG)? A KG is a multi-relational graph that has proven valuable for many tasks including question answering and semantic search. In this paper, we present GENI, a method for tackling the problem of estimating node importance in KGs, which enables several downstream applications such as item recommendation and resource allocation. While a number of approaches have been developed to address this problem for general graphs, they do not fully utilize information available in KGs, or lack flexibility...

10.1145/3292500.3330855 preprint EN 2019-07-25

EvoKG

OPENALEX - Publications

Namyong Park, Fu‐Chen Liu, Purvanshi Mehta, Dana Cristofor, Christos Faloutsos, and 1 more

How can we perform knowledge reasoning over temporal knowledge graphs (TKGs)? TKGs represent facts about entities and their relations, where each fact is associated with a timestamp. Reasoning over TKGs, i.e., inferring new facts from time-evolving KGs, is crucial for many applications to provide intelligent services. However, despite the prevalence of real-world data that can be represented as TKGs, most methods focus on reasoning over static knowledge graphs, or cannot predict future events. In this paper, we present a problem formulation that unifies two...

10.1145/3488560.3498451 article EN Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining 2022-02-11
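As a rough illustration of the data model described above (not EvoKG itself), a TKG can be stored as timestamped (subject, relation, object, time) facts. Forecasting future events then means conditioning only on facts observed before the query time. The class and method names below are hypothetical, chosen for this sketch.

```python
from collections import defaultdict
from typing import NamedTuple


class Fact(NamedTuple):
    """One timestamped fact (subject, relation, object, time) in a temporal KG."""
    subject: str
    relation: str
    obj: str
    time: int


class TemporalKG:
    """Hypothetical minimal container for a temporal knowledge graph."""

    def __init__(self, facts):
        # Group facts by timestamp so snapshots can be retrieved in time order.
        self.by_time = defaultdict(list)
        for f in facts:
            self.by_time[f.time].append(f)

    def history_before(self, t):
        """Return all facts with a timestamp strictly earlier than t.

        When predicting events at time t, only these facts may be used,
        which is what separates forecasting from static KG reasoning.
        """
        return [f for ts in sorted(self.by_time) if ts < t for f in self.by_time[ts]]

    def neighbors_at(self, entity, t):
        """Objects linked from `entity` by any relation in the snapshot at time t."""
        return sorted({f.obj for f in self.by_time.get(t, []) if f.subject == entity})


facts = [
    Fact("A", "meets", "B", 1),
    Fact("B", "visits", "C", 2),
    Fact("A", "meets", "C", 3),
]
tkg = TemporalKG(facts)
print(len(tkg.history_before(3)))  # 2
print(tkg.neighbors_at("A", 3))    # ['C']
```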

Scalable Tucker Factorization for Sparse Tensors - Algorithms and Discoveries

OPENALEX - Publications

Sejoon Oh, Namyong Park, Lee Sael, U Kang

Given sparse multi-dimensional data (e.g., (user, movie, time; rating) for movie recommendations), how can we discover latent concepts/relations and predict missing values? Tucker factorization has been widely used to solve such problems with multi-dimensional data, which are modeled as tensors. However, most Tucker factorization algorithms regard and estimate missing entries as zeros, which triggers a highly inaccurate decomposition. Moreover, the few methods focusing on accuracy exhibit limited scalability since they require huge memory and heavy...

10.1109/icde.2018.00104 article EN 2018 IEEE 34th International Conference on Data Engineering (ICDE) 2018-04-01
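Tucker factorization reconstructs each tensor entry from a core tensor G and one factor matrix per mode: x_ijk ≈ Σ_p Σ_q Σ_r g_pqr · a_ip · b_jq · c_kr. The minimal pure-Python sketch below (function names are hypothetical, and this is not the paper's algorithm) contrasts a loss computed over observed entries only with one that treats every missing entry as zero, the practice the abstract criticizes.

```python
from itertools import product


def tucker_entry(core, A, B, C, i, j, k):
    """Reconstruct one entry: x_ijk = sum over p, q, r of g_pqr * A[i][p] * B[j][q] * C[k][r]."""
    P, Q, R = len(core), len(core[0]), len(core[0][0])
    return sum(
        core[p][q][r] * A[i][p] * B[j][q] * C[k][r]
        for p, q, r in product(range(P), range(Q), range(R))
    )


def observed_loss(observed, core, A, B, C):
    """Squared error summed over observed entries only; missing entries are skipped."""
    return sum((x - tucker_entry(core, A, B, C, i, j, k)) ** 2
               for (i, j, k), x in observed.items())


def zero_filled_loss(observed, shape, core, A, B, C):
    """Squared error when every unobserved entry is treated as a 0 rating."""
    I, J, K = shape
    return sum(
        (observed.get((i, j, k), 0.0) - tucker_entry(core, A, B, C, i, j, k)) ** 2
        for i, j, k in product(range(I), range(J), range(K))
    )


# Toy rank-(1, 1, 1) model of a 2 x 2 x 1 tensor with two observed entries.
core = [[[1.0]]]
A = [[2.0], [1.0]]
B = [[1.0], [3.0]]
C = [[1.0]]
observed = {(0, 0, 0): 2.0, (1, 1, 0): 3.0}
print(observed_loss(observed, core, A, B, C))               # 0.0
print(zero_filled_loss(observed, (2, 2, 1), core, A, B, C))  # 37.0
```

The model fits both observed ratings exactly, yet the zero-filled loss still penalizes it for predicting nonzero values at the two unknown positions.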

BePI

OPENALEX - Publications

Jinhong Jung, Namyong Park, Lee Sael, U Kang

How can we measure similarity between nodes quickly and accurately on large graphs? Random walk with restart (RWR) provides a good measure, and has been used in various data mining applications including ranking, recommendation, link prediction, and community detection. However, existing methods for computing RWR do not scale to large graphs containing billions of edges; iterative methods are slow in query time, and preprocessing methods require too much memory.

10.1145/3035918.3035950 article EN 2017-05-09
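BePI's own method is not reproduced here. As a baseline illustration of the quantity it computes, the sketch below evaluates RWR scores by plain power iteration, the kind of iterative method the abstract describes as slow at query time on billion-edge graphs. The function name, the restart probability of 0.15, and the dangling-node convention are assumptions made for this sketch.

```python
def rwr(adj, seed, restart=0.15, tol=1e-10, max_iter=1000):
    """Random walk with restart scores with respect to `seed`, via power iteration.

    adj: dict mapping each node to a list of out-neighbors (every node must be a key).
    restart: probability c of jumping back to the seed at each step.
    Iterates r <- (1 - c) * W^T r + c * q, where W is the row-normalized
    adjacency matrix and q is the one-hot vector of the seed.
    """
    nodes = list(adj)
    r = {v: 1.0 if v == seed else 0.0 for v in nodes}
    for _ in range(max_iter):
        nxt = {v: restart if v == seed else 0.0 for v in nodes}
        for u in nodes:
            out = adj[u]
            if out:
                # Spread the non-restart mass of u evenly over its out-neighbors.
                share = (1 - restart) * r[u] / len(out)
                for v in out:
                    nxt[v] += share
            else:
                # Dangling node: send its walk mass back to the seed (one common convention).
                nxt[seed] += (1 - restart) * r[u]
        diff = sum(abs(nxt[v] - r[v]) for v in nodes)
        r = nxt
        if diff < tol:
            break
    return r


# Undirected triangle a-b-c with a pendant node d attached to c.
adj = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}
scores = rwr(adj, "a")
print(sorted(scores, key=scores.get, reverse=True))
```

Each iteration touches every edge, so the cost per query grows with graph size; this is the bottleneck that preprocessing-based methods try to avoid.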

CGC: Contrastive Graph Clustering for Community Detection and Tracking

OPENALEX - Publications

Namyong Park, Ryan A. Rossi, Eunyee Koh, Iftikhar Ahamath Burhanuddin, Sungchul Kim, and 3 more

Given entities and their interactions in the web data, which may have occurred at different times, how can we find communities of entities and track their evolution? In this paper, we approach this important task from a graph clustering perspective. Recently, state-of-the-art clustering performance in various domains has been achieved by deep clustering methods. Especially, deep graph clustering (DGC) methods have successfully extended deep clustering to graph-structured data by learning node representations and cluster assignments in a joint optimization framework. Despite some differences...

10.1145/3485447.3512160 article EN Proceedings of the ACM Web Conference 2022 2022-04-25

Personalized Graph-Based Retrieval for Large Language Models

OPENALEX - Publications

Steven Au, Cameron J. Dimacali, Ojasmitha Pedirappagari, Namyong Park, Franck Dernoncourt, and 5 more

As large language models (LLMs) evolve, their ability to deliver personalized and context-aware responses offers transformative potential for improving user experiences. Existing personalization approaches, however, often rely solely on user history to augment the prompt, limiting their effectiveness in generating tailored outputs, especially in cold-start scenarios with sparse data. To address these limitations, we propose Personalized Graph-based Retrieval-Augmented Generation (PGraphRAG), a framework that...

10.48550/arxiv.2501.02157 preprint EN arXiv (Cornell University) 2025-01-03

Memory-Efficient Fine-Tuning of Transformers via Token Selection

OPENALEX - Publications

Antoine Simoulin, Namyong Park, Xiaoyi Liu, Grey Yang

Fine-tuning provides an effective means to specialize pre-trained models for various downstream tasks. However, fine-tuning often incurs high memory overhead, especially for large transformer-based models, such as LLMs. While existing methods may reduce certain parts of the memory required for fine-tuning, they still require caching all intermediate activations computed in the forward pass to update weights during the backward pass. In this work, we develop TokenTune, a method to reduce memory usage, specifically the memory to store intermediate activations,...

10.48550/arxiv.2501.18824 preprint EN arXiv (Cornell University) 2025-01-30

From Selection to Generation: A Survey of LLM-based Active Learning

OPENALEX - Publications

Yu Xia, Subhojyoti Mukherjee, Zhouhang Xie, Junda Wu, Xintong Li, and 29 more

Active Learning (AL) has been a powerful paradigm for improving model efficiency and performance by selecting the most informative data points for labeling and training. In recent active learning frameworks, Large Language Models (LLMs) have been employed not only for selection but also for generating entirely new data instances and providing more cost-effective annotations. Motivated by the increasing importance of high-quality data and efficient model training in the era of LLMs, we present a comprehensive survey on LLM-based Active Learning. We introduce...

10.48550/arxiv.2502.11767 preprint EN arXiv (Cornell University) 2025-02-17
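The classic selection step that the abstract starts from, picking the most informative unlabeled points, is often illustrated with entropy-based uncertainty sampling. The sketch below is that generic textbook strategy, not an LLM-based method from the survey; the function names and the toy pool of predicted class probabilities are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_most_uncertain(pool_probs, k):
    """Return indices of the k pool items whose predictions have the highest entropy.

    pool_probs: list of class-probability lists, one per unlabeled item.
    """
    ranked = sorted(range(len(pool_probs)),
                    key=lambda i: entropy(pool_probs[i]), reverse=True)
    return ranked[:k]


# Model predictions for four unlabeled items in a binary task.
pool = [[0.98, 0.02], [0.55, 0.45], [0.70, 0.30], [0.50, 0.50]]
print(select_most_uncertain(pool, 2))  # [3, 1]
```

The selected items would then be sent to an annotator (human or, in the LLM-based frameworks the survey covers, possibly a model) before retraining.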

Predicting acute kidney injury in cancer patients using heterogeneous and irregular data

OPENALEX - Publications

Namyong Park, Eunjeong Kang, Minsu Park, Hajeong Lee, Hee-Gyung Kang, and 2 more

How can we predict the occurrence of acute kidney injury (AKI) in cancer patients based on machine learning with serum creatinine data? Given irregular and heterogeneous clinical data, how can we make the most of it for accurate AKI prediction? AKI is a common and significant complication in cancer patients, and correlates with substantial morbidity and mortality. Since no effective treatment for AKI exists yet, it is important to take timely preventive measures. While several approaches have been proposed for predicting AKI, their scope and applicability...

10.1371/journal.pone.0199839 article EN cc-by PLoS ONE 2018-07-19

BIGtensor

OPENALEX - Publications

Namyong Park, Byungsoo Jeon, Jungwoo Lee, U Kang

Many real-world data are naturally represented as tensors, or multi-dimensional arrays. Tensor decomposition is an important tool to analyze tensors for various applications such as latent concept discovery, trend analysis, clustering, and anomaly detection. However, existing tools for tensor analysis do not scale well for billion-scale tensors or offer limited functionalities. In this paper, we propose BIGtensor, a large-scale tensor mining library that tackles both of the above problems. Carefully designed...

10.1145/2983323.2983332 article EN 2016-10-24

High-Performance Tucker Factorization on Heterogeneous Platforms

OPENALEX - Publications

Sejoon Oh, Namyong Park, Jun-Gi Jang, Lee Sael, U Kang

Given large-scale multi-dimensional data (e.g., (user, movie, time; rating) for movie recommendations), how can we extract latent concepts/relations of such data? Tensor factorization has been widely used to solve such problems with multi-dimensional data, which are modeled as tensors. However, most tensor factorization algorithms exhibit limited scalability and speed since they require huge memory and heavy computational costs while updating factor matrices. In this paper, we propose GTA, a general framework for Tucker factorization on heterogeneous...

10.1109/tpds.2019.2908639 article EN IEEE Transactions on Parallel and Distributed Systems 2019-04-01

Acute kidney injury predicts all‐cause mortality in patients with cancer

OPENALEX - Publications

Eunjeong Kang, Minsu Park, Peong Gang Park, Namyong Park, Younglee Jung, and 8 more

Background: Acute kidney injury (AKI) is a critical issue in cancer patients because it is not only a morbid complication but is also able to interrupt timely diagnostic evaluation or planned optimal treatment. However, the impact of AKI on overall mortality remains unclear. Methods: We conducted a retrospective cohort study of 67,986 patients from 2004 to 2013 to evaluate the relationship between AKI and all‐cause mortality. We used the KDIGO definition and grading system. Results: During 3.9 ± 3.1 years of follow‐up, 33.8%...

10.1002/cam4.2140 article EN cc-by Cancer Medicine 2019-04-09
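The KDIGO creatinine criteria mentioned above can be checked mechanically against an irregularly sampled creatinine series. The sketch below is a simplified reading of those criteria: it uses serum creatinine only (no urine output or renal replacement therapy), and it assumes the baseline is the lowest value in the preceding 7 days, which is one of several possible conventions. The function name is hypothetical and this is not the cohort study's code.

```python
def kdigo_scr_stage(measurements):
    """Highest KDIGO AKI stage implied by serum creatinine alone (0 = no AKI).

    measurements: list of (time_in_hours, scr_mg_dl) pairs, possibly irregularly spaced.
    Criteria used: a rise of >= 0.3 mg/dL within 48 h, or SCr >= 1.5x baseline,
    where baseline is assumed to be the lowest value in the preceding 7 days (168 h).
    Stages: 1 (1.5-1.9x or >= 0.3 rise), 2 (2.0-2.9x), 3 (>= 3.0x or SCr >= 4.0 mg/dL).
    """
    pts = sorted(measurements)
    stage = 0
    for idx, (t, scr) in enumerate(pts):
        # Earlier measurements inside the 7-day look-back window.
        prior = [(s, v) for s, v in pts[:idx] if t - s <= 168]
        if not prior:
            continue
        baseline = min(v for _, v in prior)
        ratio = scr / baseline
        rise_48h = any(scr - v >= 0.3 for s, v in prior if t - s <= 48)
        if ratio >= 3.0 or (scr >= 4.0 and (ratio >= 1.5 or rise_48h)):
            stage = max(stage, 3)
        elif ratio >= 2.0:
            stage = max(stage, 2)
        elif ratio >= 1.5 or rise_48h:
            stage = max(stage, 1)
    return stage


print(kdigo_scr_stage([(0, 1.0), (24, 1.35)]))  # 1: rise of 0.35 mg/dL within 48 h
print(kdigo_scr_stage([(0, 1.0), (72, 2.1)]))   # 2: 2.1x baseline
```

Because the windows are defined in hours rather than sample counts, the same check works regardless of how irregularly the measurements were taken.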

MultiImport: Inferring Node Importance in a Knowledge Graph from Multiple Input Signals

OPENALEX - Publications

Namyong Park, Andrey Kan, Xin Luna Dong, Tong Zhao, Christos Faloutsos

Given multiple input signals, how can we infer node importance in a knowledge graph (KG)? Node importance estimation is a crucial and challenging task that can benefit a lot of applications including recommendation, search, and query disambiguation. A key challenge towards this goal is how to effectively use input from different sources. On the one hand, a KG is a rich source of information, with multiple types of nodes and edges. On the other hand, there are external input signals, such as the number of votes or pageviews, which can directly tell us about the importance of entities in a KG. While several methods...

10.1145/3394486.3403093 preprint EN 2020-08-20

Fast and Scalable Distributed Boolean Tensor Factorization

OPENALEX - Publications

Namyong Park, Sejoon Oh, U Kang

How can we analyze tensors that are composed of 0's and 1's? How can we efficiently analyze such Boolean tensors with millions or even billions of entries? Boolean tensors often represent relationship, membership, or occurrences of events such as subject-relation-object tuples in knowledge base data (e.g., 'Seoul'-'is the capital of'-'South Korea'). Boolean tensor factorization (BTF) is a useful tool for analyzing binary tensors to discover latent factors from them. Furthermore, BTF is known to produce more interpretable and sparser results than normal factorization methods. Although...

10.1109/icde.2017.152 article EN 2017-04-01
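One common form of BTF is the Boolean CP model, which replaces the sums and products of ordinary CP decomposition with logical OR and AND: x_ijk = OR over r of (a_ir AND b_jr AND c_kr). The sketch below uses hypothetical helper names and is not DBTF's distributed algorithm; it stores a binary tensor sparsely as the set of coordinates holding 1 and measures error as the number of mismatched entries.

```python
from itertools import product


def boolean_cp(A, B, C):
    """Boolean rank-R reconstruction: x_ijk = OR_r (A[i][r] AND B[j][r] AND C[k][r]).

    A, B, C: 0/1 factor matrices given as lists of rows with R columns each.
    Returns the set of (i, j, k) coordinates whose reconstructed value is 1.
    """
    R = len(A[0])
    ones = set()
    for i, j, k in product(range(len(A)), range(len(B)), range(len(C))):
        if any(A[i][r] and B[j][r] and C[k][r] for r in range(R)):
            ones.add((i, j, k))
    return ones


def reconstruction_error(observed_ones, A, B, C):
    """Number of entries where the Boolean reconstruction disagrees with the data."""
    return len(observed_ones ^ boolean_cp(A, B, C))


# Toy (subject, relation, object) tensor of shape 2 x 1 x 2 with Boolean rank 2.
A = [[1, 0], [0, 1]]
B = [[1, 1]]
C = [[1, 0], [0, 1]]
observed = {(0, 0, 0), (1, 0, 1), (1, 0, 0)}
print(sorted(boolean_cp(A, B, C)))               # [(0, 0, 0), (1, 0, 1)]
print(reconstruction_error(observed, A, B, C))  # 1
```

Because OR never cancels a 1, overlapping factors cannot subtract from each other, which is one reason Boolean factors tend to stay sparse and interpretable.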

INFOSHIELD: Generalizable Information-Theoretic Human-Trafficking Detection

OPENALEX - Publications

Meng-Chieh Lee, Catalina Vajiac, Aayushi Kulshrestha, Sacha Lévy, Namyong Park, and 3 more

Given a million escort advertisements, how can we spot near-duplicates? Such micro-clusters of ads are usually signals of human trafficking. How can we summarize them, visually, to convince law enforcement to act? Can we build a general tool that works for different languages? Spotting near-duplicate documents is useful in multiple, additional settings, including spam-bot detection in Twitter ads, plagiarism, and more. We present INFOSHIELD, which makes the following contributions: (a) Practical, being scalable...

10.1109/icde51399.2021.00101 article EN 2021 IEEE 37th International Conference on Data Engineering (ICDE) 2021-04-01

Fast and scalable method for distributed Boolean tensor factorization

OPENALEX - Publications

Namyong Park, Sejoon Oh, U Kang

10.1007/s00778-019-00538-z article EN The VLDB Journal 2019-03-18

Fairness-Aware Graph Neural Networks: A Survey

OPENALEX - Publications

April Chen, Ryan A. Rossi, Namyong Park, Puja Trivedi, Yu Wang, and 4 more

Graph Neural Networks (GNNs) have become increasingly important due to their representational power and state-of-the-art predictive performance on many fundamental learning tasks. Despite this success, GNNs suffer from fairness issues that arise as a result of the underlying graph data and the fundamental aggregation mechanism that lies at the heart of the large class of GNN models. In this article, we examine and categorize techniques for improving the fairness of GNNs. We categorize these techniques by whether they focus on improving fairness in pre-processing, in-processing (during training),...

10.1145/3649142 article EN ACM Transactions on Knowledge Discovery from Data 2024-02-24

GLEMOS: Benchmark for Instantaneous Graph Learning Model Selection

OPENALEX - Publications

Namyong Park, Ryan A. Rossi, Xing Wang, Antoine Simoulin, Nesreen K. Ahmed, and 1 more

The choice of a graph learning (GL) model (i.e., a GL algorithm and its hyperparameter settings) has a significant impact on the performance of downstream tasks. However, selecting the right GL model becomes increasingly difficult and time consuming as more and more GL models are developed. Accordingly, it is of great significance and practical value to equip users of GL with the ability to perform a near-instantaneous selection of an effective GL model without manual intervention. Despite recent attempts to tackle this important problem, there has been no...

10.48550/arxiv.2404.01578 preprint EN arXiv (Cornell University) 2024-04-01

Evolving the best known approximation to the Q function

OPENALEX - Publications

Đào Ngọc Phong, Nguyễn Xuân Hoài, Bob McKay, Constantin Siriteanu, Nguyen Quang Uy, and 1 more

The Gaussian Q-function is the integral of the tail of the Gaussian distribution; as such, it is important across a vast range of fields requiring stochastic analysis. No elementary closed form is possible, so a number of approximations have been proposed. We use a Genetic Programming (GP) system, Tree Adjoining Grammar Guided GP (TAG3P), with local search operators to evolve approximations in the form given by Benitez [1]. We found an approximation more accurate than any previously published. This confirms the practical importance of TAG3P.

10.1145/2330163.2330275 article EN 2012-07-07
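For reference, Q(x) is the probability that a standard normal variable exceeds x, and it can be evaluated exactly through the complementary error function: Q(x) = erfc(x / √2) / 2. The sketch below computes it with Python's `math.erfc` and scores a candidate closed form by its worst relative error on a grid. The candidate used here, the simple Chernoff-type bound Q(x) ≤ exp(-x²/2) / 2 for x ≥ 0, is only a stand-in; it is not the Benitez form or the evolved approximation from the paper, and the grid range is an assumption.

```python
import math


def q_function(x):
    """Gaussian Q-function Q(x) = P(Z > x) for Z ~ N(0, 1), via erfc(x / sqrt(2)) / 2."""
    return 0.5 * math.erfc(x / math.sqrt(2))


def chernoff_bound(x):
    """Simple closed-form upper bound Q(x) <= exp(-x^2 / 2) / 2, valid for x >= 0."""
    return 0.5 * math.exp(-x * x / 2)


def max_relative_error(approx, lo=0.0, hi=6.0, steps=600):
    """Largest relative error |approx(x) - Q(x)| / Q(x) over an evenly spaced grid."""
    worst = 0.0
    for n in range(steps + 1):
        x = lo + (hi - lo) * n / steps
        exact = q_function(x)
        worst = max(worst, abs(approx(x) - exact) / exact)
    return worst


print(q_function(0.0))                     # 0.5
print(max_relative_error(chernoff_bound))  # large: the bound is loose in the tail
```

Relative rather than absolute error is used because Q(x) decays rapidly, so an approximation can look accurate in absolute terms while being off by a large factor deep in the tail.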

Partition Aware Connected Component Computation in Distributed Systems

OPENALEX - Publications

Ha-Myung Park, Namyong Park, Sung-Hyon Myaeng, U Kang

How can we find all connected components in an enormous graph with billions of nodes and edges? Finding connected components is a fundamental operation for various graph computation tasks such as pattern recognition, reachability, graph compression, etc. Many algorithms have been proposed for decades, but most of them are not scalable enough to process recent web-scale graphs. Recently, a MapReduce algorithm was proposed to handle such large graphs. However, the algorithm repeatedly reads and writes numerous intermediate data that cause network overload and prolong the running...

10.1109/icdm.2016.0053 article EN 2016-12-01

PACC: Large scale connected component computation on Hadoop and Spark

OPENALEX - Publications

Ha-Myung Park, Namyong Park, Sung-Hyon Myaeng, U Kang

A connected component in a graph is a set of nodes linked to each other by paths. The problem of finding connected components has been applied to diverse graph analysis tasks such as graph partitioning, graph compression, and pattern recognition. Several distributed algorithms have been proposed to find connected components in enormous graphs. Ironically, the distributed algorithms do not scale enough due to unnecessary data IO & processing, massive intermediate data, numerous rounds of computations, and load balancing issues. In this paper, we propose a fast and scalable distributed algorithm PACC...

10.1371/journal.pone.0229936 article EN cc-by PLoS ONE 2020-03-18
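On a single machine, the output these distributed algorithms produce can be computed with a union-find (disjoint-set) structure, sketched below. This is a textbook sequential algorithm, not PACC's partition-aware distributed method; the function name and the convention of labeling each component by its smallest node are choices made for this illustration.

```python
def connected_components(edges, nodes=()):
    """Label each node with its component's smallest node, using union-find.

    edges: iterable of (u, v) pairs of an undirected graph (nodes must be comparable).
    nodes: optional extra nodes, so isolated vertices also get a label.
    Returns a dict mapping node -> representative (minimum node) of its component.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        # Path halving: point nodes at their grandparents while walking up.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the smaller root, so every root is the minimum node of its set.
            if rb < ra:
                ra, rb = rb, ra
            parent[rb] = ra

    for v in nodes:
        find(v)
    for u, v in edges:
        union(u, v)
    return {v: find(v) for v in parent}


print(connected_components([(1, 2), (2, 3), (4, 5)], nodes=[6]))
# {6: 6, 1: 1, 2: 1, 3: 1, 4: 4, 5: 4}
```

The sequential version needs the whole edge list and parent table on one machine, which is exactly what stops working at the billion-edge scale the abstract targets.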

Improvement of complex and refractory ecological models: Riverine water quality modelling using evolutionary computation

OPENALEX - Publications

Minhyeok Kim, Namyong Park, Bob McKay, Haisoo Shin, Yun-Geun Lee, and 2 more

10.1016/j.ecolmodel.2014.07.021 article EN Ecological Modelling 2014-08-28

Dropout Prediction over Weeks in MOOCs via Interpretable Multi-Layer Representation Learning

OPENALEX - Publications

Byungsoo Jeon, Namyong Park, Seojin Bang

Massive Open Online Courses (MOOCs) have become popular platforms for online learning. While MOOCs enable students to study at their own pace, this flexibility makes it easy for students to drop out of class. In this paper, our goal is to predict if a learner is going to drop out within the next week, given clickstream data for the current week. To this end, we present a multi-layer representation learning solution based on a branch and bound (BB) algorithm, which learns from low-level clickstreams in an unsupervised manner, produces...

10.48550/arxiv.2002.01598 preprint EN other-oa arXiv (Cornell University) 2020-01-01

Dropout Prediction over Weeks in MOOCs by Learning Representations of Clicks and Videos

OPENALEX - Publications

Byungsoo Jeon, Namyong Park

This paper addresses a key challenge in MOOC dropout prediction, namely to build meaningful representations from clickstream data. While a variety of feature extraction techniques have been explored extensively for such purposes, to our knowledge, no prior work has explored modeling educational content (e.g., video) and its correlation with the learner's behavior (e.g., clickstream) in this context. We bridge this gap by devising a method to learn representations for videos and the correlation between videos and clicks. The results indicate that clicks bring...

10.48550/arxiv.2002.01955 preprint EN other-oa arXiv (Cornell University) 2020-01-01

J-Recs: Principled and Scalable Recommendation Justification

OPENALEX - Publications

Namyong Park, Andrey Kan, Christos Faloutsos, Dong Xin

Online recommendation is an essential functionality across a variety of services, including e-commerce and video streaming, where items to buy, watch, or read are suggested to users. Justifying recommendations, i.e., explaining why a user might like the recommended item, has been shown to improve user satisfaction and the persuasiveness of recommendations. In this paper, we develop a method for generating post-hoc justifications that can be applied to the output of any recommendation algorithm. Existing methods are often limited in providing...

10.1109/icdm50108.2020.00151 article EN 2020 IEEE International Conference on Data Mining (ICDM) 2020-11-01
