Namyong Park

ORCID: 0000-0002-3344-2361
Research Areas
  • Advanced Graph Neural Networks
  • Topic Modeling
  • Neural Networks and Applications
  • Evolutionary Algorithms and Applications
  • Metaheuristic Optimization Algorithms Research
  • Tensor decomposition and applications
  • Graph Theory and Algorithms
  • Anomaly Detection Techniques and Applications
  • Recommender Systems and Techniques
  • Natural Language Processing Techniques
  • Complex Network Analysis Techniques
  • Parallel Computing and Optimization Techniques
  • Online Learning and Analytics
  • Human Mobility and Location-Based Analysis
  • Data Quality and Management
  • Asian Culture and Media Studies
  • Hate Speech and Cyberbullying Detection
  • Data Stream Mining Techniques
  • Ethics and Social Impacts of AI
  • Algorithms and Data Compression
  • Web Data Mining and Analysis
  • Authorship Attribution and Profiling
  • Domain Adaptation and Few-Shot Learning
  • Acute Kidney Injury Research
  • Advanced Memory and Neural Computing

Carnegie Mellon University
2019-2024

Water Research Institute
2021

Australian National University
2021

Amazon (United States)
2019

Seoul National University
2012-2018

How can we estimate the importance of nodes in a knowledge graph (KG)? A KG is a multi-relational graph that has proven valuable for many tasks including question answering and semantic search. In this paper, we present GENI, a method for tackling the problem of estimating node importance in KGs, which enables several downstream applications such as item recommendation and resource allocation. While a number of approaches have been developed to address this problem for general graphs, they do not fully utilize the information available in KGs, or lack flexibility...

10.1145/3292500.3330855 preprint EN 2019-07-25

How can we perform knowledge reasoning over temporal knowledge graphs (TKGs)? TKGs represent facts about entities and their relations, where each fact is associated with a timestamp. Reasoning over TKGs, i.e., inferring new facts from time-evolving KGs, is crucial for many applications to provide intelligent services. However, despite the prevalence of real-world data that can be represented as TKGs, most methods focus on static graphs, or cannot predict future events. In this paper, we present a problem formulation that unifies two...

10.1145/3488560.3498451 article EN Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining 2022-02-11

Given sparse multi-dimensional data (e.g., (user, movie, time; rating) for movie recommendations), how can we discover latent concepts/relations and predict missing values? Tucker factorization has been widely used to solve such problems with multi-dimensional data, which are modeled as tensors. However, most algorithms regard and estimate missing entries as zeros, which triggers a highly inaccurate decomposition. Moreover, the few methods that focus on accuracy exhibit limited scalability since they require huge memory and heavy...

10.1109/icde.2018.00104 article EN 2018 IEEE 34th International Conference on Data Engineering (ICDE) 2018-04-01

How can we measure similarity between nodes quickly and accurately on large graphs? Random walk with restart (RWR) provides a good measure, and has been used in various data mining applications including ranking, recommendation, link prediction, and community detection. However, existing methods for computing RWR do not scale to large graphs containing billions of edges; iterative methods are slow in query time, and preprocessing methods require too much memory.

10.1145/3035918.3035950 article EN 2017-05-09
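
To make the random walk with restart (RWR) measure in the abstract above concrete, here is a minimal sketch of the plain iterative (power-iteration) approach, which the abstract describes as slow at query time on very large graphs. It is not the method proposed in the paper. The function name `rwr_scores`, the adjacency-dictionary input, the default restart probability of 0.15, and the handling of nodes without out-neighbors are all assumptions made for illustration.

```python
def rwr_scores(adjacency, seed, restart_prob=0.15, tol=1e-10, max_iter=1000):
    """Approximate RWR scores with respect to `seed` by power iteration.

    `adjacency` maps every node to a list of its out-neighbors; each
    neighbor must also appear as a key (assumption of this sketch).
    """
    nodes = list(adjacency)
    scores = {v: 0.0 for v in nodes}
    scores[seed] = 1.0  # the walker starts at the seed node
    for _ in range(max_iter):
        nxt = {v: 0.0 for v in nodes}
        for u in nodes:
            neighbors = adjacency[u]
            if not neighbors:
                # Assumption: a node with no out-neighbors sends its
                # walking mass back to the seed.
                nxt[seed] += (1.0 - restart_prob) * scores[u]
                continue
            share = (1.0 - restart_prob) * scores[u] / len(neighbors)
            for v in neighbors:
                nxt[v] += share
        # With probability restart_prob the walker jumps back to the seed.
        nxt[seed] += restart_prob
        delta = sum(abs(nxt[v] - scores[v]) for v in nodes)
        scores = nxt
        if delta < tol:
            break
    return scores


# Hypothetical toy graph: a star with center "a".
toy_graph = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
toy_scores = rwr_scores(toy_graph, seed="a")
```

Each iteration touches every edge once, so answering a query costs many full passes over the graph; this is the query-time cost the abstract refers to.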

Given entities and their interactions in the web data, which may have occurred at different times, how can we find communities of entities and track their evolution? In this paper, we approach this important task from a graph clustering perspective. Recently, state-of-the-art clustering performance in various domains has been achieved by deep clustering methods. In particular, deep graph clustering (DGC) methods have successfully extended deep clustering to graph-structured data by learning node representations and cluster assignments in a joint optimization framework. Despite some differences...

10.1145/3485447.3512160 article EN Proceedings of the ACM Web Conference 2022 2022-04-25

As large language models (LLMs) evolve, their ability to deliver personalized and context-aware responses offers transformative potential for improving user experiences. Existing personalization approaches, however, often rely solely on user history to augment the prompt, limiting their effectiveness in generating tailored outputs, especially in cold-start scenarios with sparse data. To address these limitations, we propose Personalized Graph-based Retrieval-Augmented Generation (PGraphRAG), a framework that...

10.48550/arxiv.2501.02157 preprint EN arXiv (Cornell University) 2025-01-03

Fine-tuning provides an effective means to specialize pre-trained models for various downstream tasks. However, fine-tuning often incurs high memory overhead, especially for large transformer-based models, such as LLMs. While existing methods may reduce certain parts of the memory required for fine-tuning, they still require caching all intermediate activations computed in the forward pass to update weights during the backward pass. In this work, we develop TokenTune, a method to reduce memory usage, specifically the memory to store activations,...

10.48550/arxiv.2501.18824 preprint EN arXiv (Cornell University) 2025-01-30

Active Learning (AL) has been a powerful paradigm for improving model efficiency and performance by selecting the most informative data points for labeling and training. In recent active learning frameworks, Large Language Models (LLMs) have been employed not only for selection but also for generating entirely new data instances and providing more cost-effective annotations. Motivated by the increasing importance of high-quality data and efficient training in the era of LLMs, we present a comprehensive survey on LLM-based Active Learning. We introduce...

10.48550/arxiv.2502.11767 preprint EN arXiv (Cornell University) 2025-02-17

How can we predict the occurrence of acute kidney injury (AKI) in cancer patients based on machine learning with serum creatinine data? Given irregular and heterogeneous clinical data, how can we make the most of it for accurate AKI prediction? AKI is a common and significant complication in cancer patients, and correlates with substantial morbidity and mortality. Since no effective treatment for AKI exists yet, it is important to take timely preventive measures. While several approaches have been proposed for predicting AKI, their scope and applicability...

10.1371/journal.pone.0199839 article EN cc-by PLoS ONE 2018-07-19

Many real-world data are naturally represented as tensors, or multi-dimensional arrays. Tensor decomposition is an important tool to analyze tensors for various applications such as latent concept discovery, trend analysis, clustering, and anomaly detection. However, existing tools for tensor analysis do not scale well to billion-scale tensors or offer limited functionalities. In this paper, we propose BIGtensor, a large-scale tensor mining library that tackles both of the above problems. Carefully designed...

10.1145/2983323.2983332 article EN 2016-10-24

Given large-scale multi-dimensional data (e.g., (user, movie, time; rating) for movie recommendations), how can we extract latent concepts/relations of such data? Tensor factorization has been widely used to solve such problems with multi-dimensional data, which are modeled as tensors. However, most tensor factorization algorithms exhibit limited scalability and speed since they require huge memory and heavy computational costs while updating factor matrices. In this paper, we propose GTA, a general framework for Tucker factorization on heterogeneous...

10.1109/tpds.2019.2908639 article EN IEEE Transactions on Parallel and Distributed Systems 2019-04-01

Background: Acute kidney injury (AKI) is a critical issue in cancer patients because it is not only a morbid complication but can also interrupt timely diagnostic evaluation or planned optimal treatment. However, the impact of AKI on overall mortality in cancer patients remains unclear. Methods: We conducted a retrospective cohort study of 67 986 cancer patients, from 2004 to 2013, to evaluate the relationship between AKI and all‐cause mortality. We used the KDIGO AKI definition and grading system. Results: During 3.9 ± 3.1 years of follow‐up, 33.8%...

10.1002/cam4.2140 article EN cc-by Cancer Medicine 2019-04-09

Given multiple input signals, how can we infer node importance in a knowledge graph (KG)? Node importance estimation is a crucial and challenging task that can benefit a lot of applications including recommendation, search, and query disambiguation. A key challenge towards this goal is how to effectively use input from different sources. On the one hand, a KG is a rich source of information, with multiple types of nodes and edges. On the other hand, there are external input signals, such as the number of votes or pageviews, which can directly tell us about the importance of entities in a KG. While several methods...

10.1145/3394486.3403093 preprint EN 2020-08-20

How can we analyze tensors that are composed of 0's and 1's? How can we efficiently analyze such Boolean tensors with millions or even billions of entries? Boolean tensors often represent relationship, membership, or occurrences of events such as subject-relation-object tuples in knowledge base data (e.g., 'Seoul'-'is the capital of'-'South Korea'). Boolean tensor factorization (BTF) is a useful tool for analyzing binary tensors to discover latent factors from them. Furthermore, BTF is known to produce more interpretable and sparser results than normal factorization methods. Although...

10.1109/icde.2017.152 article EN 2017-04-01

Given a million escort advertisements, how can we spot near-duplicates? Such micro-clusters of ads are usually signals of human trafficking. How can we summarize them, visually, to convince law enforcement to act? Can we build a general tool that works for different languages? Spotting micro-clusters of near-duplicate documents is useful in multiple, additional settings, including spam-bot detection in Twitter ads, plagiarism, and more. We present INFOSHIELD, which makes the following contributions: (a) Practical, being scalable...

10.1109/icde51399.2021.00101 article EN 2021 IEEE 37th International Conference on Data Engineering (ICDE) 2021-04-01

Graph Neural Networks (GNNs) have become increasingly important due to their representational power and state-of-the-art predictive performance on many fundamental learning tasks. Despite this success, GNNs suffer from fairness issues that arise as a result of the underlying graph data and the aggregation mechanism that lies at the heart of the large class of GNN models. In this article, we examine and categorize fairness techniques for improving the fairness of GNNs. We categorize these techniques by whether they focus on improving fairness in pre-processing, in-processing (during training),...

10.1145/3649142 article EN ACM Transactions on Knowledge Discovery from Data 2024-02-24

The choice of a graph learning (GL) model (i.e., a GL algorithm and its hyperparameter settings) has a significant impact on the performance of downstream tasks. However, selecting the right GL model becomes increasingly difficult and time consuming as more and more GL models are developed. Accordingly, it is of great significance and practical value to equip users of GL with the ability to perform a near-instantaneous selection of an effective GL model without manual intervention. Despite the recent attempts to tackle this important problem, there has been no...

10.48550/arxiv.2404.01578 preprint EN arXiv (Cornell University) 2024-04-01

The Gaussian Q-function is the integral of the tail of the Gaussian distribution; as such, it is important across a vast range of fields requiring stochastic analysis. No elementary closed form is possible, so a number of approximations have been proposed. We use a Genetic Programming (GP) system, Tree Adjoining Grammar Guided GP (TAG3P) with local search operators, to evolve approximations of the Q-function in the form given by Benitez [1]. We found more accurate approximations than any previously published. This confirms the practical importance of TAG3P.

10.1145/2330163.2330275 article EN 2012-07-07
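
As a reference point for the Q-function described in the abstract above, the sketch below evaluates it numerically in two ways: through the standard identity Q(x) = 0.5 * erfc(x / sqrt(2)), and by directly integrating the tail of the standard normal density. Neither is the evolved approximation from the paper, whose form is not given here. The function names, the integration cutoff of 12 standard deviations, and the step count are assumptions chosen for illustration.

```python
import math


def gaussian_q(x: float) -> float:
    """Q(x) = P(Z > x) for a standard normal Z, via the erfc identity."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def gaussian_q_numeric(x: float, width: float = 12.0, steps: int = 100_000) -> float:
    """Midpoint-rule integral of the standard normal density over [x, x + width].

    The tail beyond x + width is ignored (assumption: it is negligible
    for the inputs of interest).
    """
    h = width / steps
    total = 0.0
    for i in range(steps):
        t = x + (i + 0.5) * h
        total += math.exp(-0.5 * t * t)
    return total * h / math.sqrt(2.0 * math.pi)
```

Closed-form approximations such as those discussed in the abstract are typically judged by how closely they track a reference evaluation like `gaussian_q` over a range of x.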

How can we find all connected components in an enormous graph with billions of nodes and edges? Finding connected components is a fundamental operation for various graph computation tasks such as pattern recognition, reachability, graph compression, etc. Many algorithms have been proposed for decades, but most of them are not scalable enough to process recent web scale graphs. Recently, a MapReduce algorithm was proposed to handle such large graphs. However, the algorithm repeatedly reads and writes numerous intermediate data that cause network overload and prolong the running...

10.1109/icdm.2016.0053 article EN 2016-12-01

A connected component in a graph is a set of nodes linked to each other by paths. The problem of finding connected components has been applied to diverse graph analysis tasks such as graph partitioning, graph compression, and pattern recognition. Several distributed algorithms have been proposed to find connected components in enormous graphs. Ironically, the distributed algorithms do not scale enough due to unnecessary data IO & processing, massive intermediate data, numerous rounds of computations, and load balancing issues. In this paper, we propose a fast and scalable distributed algorithm PACC...

10.1371/journal.pone.0229936 article EN cc-by PLoS ONE 2020-03-18
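
The definition of a connected component in the abstract above can be illustrated with a small single-machine sketch based on union-find (disjoint sets). This is a textbook technique swapped in for illustration, not the distributed PACC algorithm the paper proposes. The function name `connected_components` and the input format (a list of nodes plus a list of undirected edge pairs) are assumptions.

```python
def connected_components(nodes, edges):
    """Group `nodes` into connected components of an undirected graph.

    `nodes` is a list of hashable node ids; `edges` is an iterable of
    (u, v) pairs whose endpoints all appear in `nodes`.
    """
    parent = {v: v for v in nodes}

    def find(v):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v  # merge the two components

    groups = {}
    for v in nodes:
        groups.setdefault(find(v), set()).add(v)
    return list(groups.values())


# Hypothetical toy graph with three components.
toy_components = connected_components(
    [0, 1, 2, 3, 4, 5], [(0, 1), (1, 2), (3, 4)]
)
```

This version keeps the whole parent table in memory on one machine, which is exactly what stops working at the billion-node scale the abstracts target.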

Massive Open Online Courses (MOOCs) have become popular platforms for online learning. While MOOCs enable students to study at their own pace, this flexibility makes it easy for students to drop out of class. In this paper, our goal is to predict if a learner is going to drop out within the next week, given clickstream data for the current week. To this end, we present a multi-layer representation learning solution based on the branch and bound (BB) algorithm, which learns from low-level clickstreams in an unsupervised manner, produces...

10.48550/arxiv.2002.01598 preprint EN other-oa arXiv (Cornell University) 2020-01-01

This paper addresses a key challenge in MOOC dropout prediction, namely to build meaningful representations from clickstream data. While a variety of feature extraction techniques have been explored extensively for such purposes, to our knowledge, no prior works have explored modeling of educational content (e.g. video) and their correlation with the learner's behavior (e.g. clickstream) in this context. We bridge this gap by devising a method to learn representation for videos and the correlation between videos and clicks. The results indicate that clicks bring...

10.48550/arxiv.2002.01955 preprint EN other-oa arXiv (Cornell University) 2020-01-01

Online recommendation is an essential functionality across a variety of services, including e-commerce and video streaming, where items to buy, watch, or read are suggested to users. Justifying recommendations, i.e., explaining why a user might like the recommended item, has been shown to improve user satisfaction and the persuasiveness of the recommendation. In this paper, we develop a method for generating post-hoc justifications that can be applied to the output of any recommendation algorithm. Existing methods are often limited in providing...

10.1109/icdm50108.2020.00151 article EN 2020 IEEE International Conference on Data Mining (ICDM) 2020-11-01