- Advanced Graph Neural Networks
- Topic Modeling
- Neural Networks and Applications
- Evolutionary Algorithms and Applications
- Metaheuristic Optimization Algorithms Research
- Tensor Decomposition and Applications
- Graph Theory and Algorithms
- Anomaly Detection Techniques and Applications
- Recommender Systems and Techniques
- Natural Language Processing Techniques
- Complex Network Analysis Techniques
- Parallel Computing and Optimization Techniques
- Online Learning and Analytics
- Human Mobility and Location-Based Analysis
- Data Quality and Management
- Asian Culture and Media Studies
- Hate Speech and Cyberbullying Detection
- Data Stream Mining Techniques
- Ethics and Social Impacts of AI
- Algorithms and Data Compression
- Web Data Mining and Analysis
- Authorship Attribution and Profiling
- Domain Adaptation and Few-Shot Learning
- Acute Kidney Injury Research
- Advanced Memory and Neural Computing
Carnegie Mellon University, 2019-2024
Water Research Institute, 2021
Australian National University, 2021
Amazon (United States), 2019
Seoul National University, 2012-2018
How can we estimate the importance of nodes in a knowledge graph (KG)? A KG is a multi-relational graph that has proven valuable for many tasks including question answering and semantic search. In this paper, we present GENI, a method for tackling the problem of estimating node importance in KGs, which enables several downstream applications such as item recommendation and resource allocation. While a number of approaches have been developed to address this problem for general graphs, they do not fully utilize information available in KGs, or lack flexibility...
How can we perform knowledge reasoning over temporal knowledge graphs (TKGs)? TKGs represent facts about entities and their relations, where each fact is associated with a timestamp. Reasoning over TKGs, i.e., inferring new facts from time-evolving KGs, is crucial for many applications to provide intelligent services. However, despite the prevalence of real-world data that can be represented as TKGs, most methods focus on static graphs, or cannot predict future events. In this paper, we present a problem formulation that unifies the two...
Given sparse multi-dimensional data (e.g., (user, movie, time; rating) for movie recommendations), how can we discover latent concepts/relations and predict missing values? Tucker factorization has been widely used to solve such problems with multi-dimensional data, which are modeled as tensors. However, most Tucker factorization algorithms regard and estimate missing entries as zeros, which triggers a highly inaccurate decomposition. Moreover, the few methods that focus on accuracy exhibit limited scalability since they require huge memory and heavy...
How can we measure similarity between nodes quickly and accurately on large graphs? Random walk with restart (RWR) provides a good measure, and has been used in various data mining applications including ranking, recommendation, link prediction, and community detection. However, existing methods for computing RWR do not scale to large graphs containing billions of edges; iterative methods are slow in query time, and preprocessing methods require too much memory.
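To make the RWR measure concrete, here is a minimal sketch of the plain iterative (power-iteration) computation, the kind of method the abstract describes as slow at query time. It is not the method proposed in the paper. The function name `rwr`, the adjacency-dict input format, the restart probability of 0.15, and the handling of dangling nodes are all assumptions chosen for illustration.

```python
def rwr(adj, seed, restart=0.15, tol=1e-10, max_iter=1000):
    """Random walk with restart scores by power iteration (illustrative sketch).

    adj:  dict mapping each node to a list of its out-neighbors (assumed format).
    seed: the query node that the walker restarts from.
    Returns a dict mapping each node to its RWR score with respect to `seed`.
    """
    nodes = list(adj)
    scores = {v: 0.0 for v in nodes}
    scores[seed] = 1.0  # the walker starts at the seed
    for _ in range(max_iter):
        nxt = {v: 0.0 for v in nodes}
        for u in nodes:
            nbrs = adj[u]
            if not nbrs:
                # Assumption: a dangling node sends its walking mass back to the seed.
                nxt[seed] += (1.0 - restart) * scores[u]
                continue
            # With probability (1 - restart), move to a uniformly chosen neighbor.
            share = (1.0 - restart) * scores[u] / len(nbrs)
            for v in nbrs:
                nxt[v] += share
        # With probability `restart`, jump back to the seed (scores sum to 1).
        nxt[seed] += restart
        diff = sum(abs(nxt[v] - scores[v]) for v in nodes)
        scores = nxt
        if diff < tol:
            break
    return scores


# Example: a small undirected path graph a - b - c - d, queried from node "a".
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
path_scores = rwr(path, "a")
```

Each iteration touches every edge once, so answering one query costs a number of full passes over the graph. This is why purely iterative methods become slow on graphs with billions of edges.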
Given entities and their interactions in web data, which may have occurred at different times, how can we find communities of entities and track their evolution? In this paper, we approach this important task from a graph clustering perspective. Recently, state-of-the-art clustering performance in various domains has been achieved by deep clustering methods. In particular, deep graph clustering (DGC) methods have successfully extended deep clustering to graph-structured data by learning node representations and cluster assignments in a joint optimization framework. Despite some differences...
As large language models (LLMs) evolve, their ability to deliver personalized and context-aware responses offers transformative potential for improving user experiences. Existing personalization approaches, however, often rely solely on user history to augment the prompt, limiting their effectiveness in generating tailored outputs, especially in cold-start scenarios with sparse data. To address these limitations, we propose Personalized Graph-based Retrieval-Augmented Generation (PGraphRAG), a framework that...
Fine-tuning provides an effective means to specialize pre-trained models for various downstream tasks. However, fine-tuning often incurs high memory overhead, especially for large transformer-based models, such as LLMs. While existing methods may reduce certain parts of the memory required for fine-tuning, they still require caching all intermediate activations computed in the forward pass to update weights during the backward pass. In this work, we develop TokenTune, a method to reduce memory usage, specifically the memory to store intermediate activations,...
Active Learning (AL) has been a powerful paradigm for improving model efficiency and performance by selecting the most informative data points for labeling and training. In recent active learning frameworks, Large Language Models (LLMs) have been employed not only for selection but also for generating entirely new data instances and providing more cost-effective annotations. Motivated by the increasing importance of high-quality data and efficient model training in the era of LLMs, we present a comprehensive survey on LLM-based Active Learning. We introduce...
How can we predict the occurrence of acute kidney injury (AKI) in cancer patients based on machine learning with serum creatinine data? Given irregular and heterogeneous clinical data, how can we make the most of it for accurate AKI prediction? AKI is a common and significant complication in cancer patients, and correlates with substantial morbidity and mortality. Since no effective treatment for AKI exists yet, it is important to take timely preventive measures. While several approaches have been proposed for predicting AKI, their scope and applicability...
Many real-world data are naturally represented as tensors, or multi-dimensional arrays. Tensor decomposition is an important tool to analyze tensors for various applications such as latent concept discovery, trend analysis, clustering, and anomaly detection. However, existing tools for tensor analysis do not scale well to billion-scale tensors or offer limited functionalities. In this paper, we propose BIGtensor, a large-scale tensor mining library that tackles both of the above problems. Carefully designed...
Given large-scale multi-dimensional data (e.g., (user, movie, time; rating) for movie recommendations), how can we extract latent concepts/relations of such data? Tensor factorization has been widely used to solve such problems with multi-dimensional data, which are modeled as tensors. However, most tensor factorization algorithms exhibit limited scalability and speed since they require huge memory and heavy computational costs while updating factor matrices. In this paper, we propose GTA, a general framework for Tucker factorization on heterogeneous...
Background: Acute kidney injury (AKI) is a critical issue in cancer patients because it is not only a morbid complication but also able to interrupt timely diagnostic evaluation or planned optimal treatment. However, the impact of AKI on overall mortality in cancer patients remains unclear. Methods: We conducted a retrospective cohort study of 67 986 cancer patients, from 2004 to 2013, to evaluate the relationship between AKI and all-cause mortality. We used the KDIGO AKI definition and grading system. Results: During 3.9 ± 3.1 years of follow-up, 33.8%...
Given multiple input signals, how can we infer node importance in a knowledge graph (KG)? Node importance estimation is a crucial and challenging task that can benefit a lot of applications including recommendation, search, and query disambiguation. A key challenge towards this goal is how to effectively use input from different sources. On the one hand, a KG is a rich source of information, with multiple types of nodes and edges. On the other hand, there are external input signals, such as the number of votes or pageviews, which can directly tell us about the importance of entities in a KG. While several methods...
How can we analyze tensors that are composed of 0's and 1's? How can we efficiently analyze such Boolean tensors with millions or even billions of entries? Boolean tensors often represent relationship, membership, or occurrences of events such as subject-relation-object tuples in knowledge base data (e.g., 'Seoul'-'is the capital of'-'South Korea'). Boolean tensor factorization (BTF) is a useful tool for analyzing binary tensors to discover latent factors from them. Furthermore, BTF is known to produce more interpretable and sparser results than normal factorization methods. Although...
Given a million escort advertisements, how can we spot near-duplicates? Such micro-clusters of ads are usually signals of human trafficking. How can we summarize them, visually, to convince law enforcement to act? Can we build a general tool that works for different languages? Spotting micro-clusters of near-duplicate documents is useful in multiple, additional settings, including spam-bot detection in Twitter ads, plagiarism, and more. We present INFOSHIELD, which makes the following contributions: (a) Practical, being scalable...
Graph Neural Networks (GNNs) have become increasingly important due to their representational power and state-of-the-art predictive performance on many fundamental learning tasks. Despite this success, GNNs suffer from fairness issues that arise as a result of the underlying graph data and the fundamental aggregation mechanism that lies at the heart of the large class of GNN models. In this article, we examine and categorize fairness techniques for improving the fairness of GNNs. We categorize these techniques by whether they focus on improving fairness in the pre-processing, in-processing (during training),...
The choice of a graph learning (GL) model (i.e., a GL algorithm and its hyperparameter settings) has a significant impact on the performance of downstream tasks. However, selecting the right GL model becomes increasingly difficult and time consuming as more and more GL models are developed. Accordingly, it is of great significance and practical value to equip users of GL with the ability to perform a near-instantaneous selection of an effective GL model without manual intervention. Despite the recent attempts to tackle this important problem, there has been no...
The Gaussian Q-function is the integral of the tail of the Gaussian distribution; as such, it is important across a vast range of fields requiring stochastic analysis. No elementary closed form is possible, so a number of approximations have been proposed. We use a Genetic Programming (GP) system, Tree Adjoining Grammar Guided GP (TAG3P) with local search operators, to evolve approximations of the Q-function in the form given by Benitez [1]. We found more accurate approximations than any previously published. This confirms the practical importance of TAG3P.
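For reference, the quantity being approximated can be evaluated numerically through the complementary error function, using the standard identity Q(x) = 0.5 · erfc(x / √2). The sketch below shows only this definition alongside the well-known Chernoff-style bound Q(x) ≤ 0.5 · exp(−x²/2) for x ≥ 0. It does not reproduce the Benitez form or the evolved approximations from the paper, and the function names `gaussian_q` and `chernoff_bound` are illustrative.

```python
import math


def gaussian_q(x):
    """Tail probability of the standard normal distribution: Q(x) = P(Z > x)."""
    # Standard identity relating Q to the complementary error function.
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def chernoff_bound(x):
    """Simple closed-form upper bound on Q(x), valid for x >= 0."""
    return 0.5 * math.exp(-x * x / 2.0)


# Compare the exact value with the bound at a few points.
for x in (0.0, 1.0, 2.0, 3.0):
    print(f"x={x:.1f}  Q(x)={gaussian_q(x):.6e}  bound={chernoff_bound(x):.6e}")
```

The gap between the exact value and such a simple bound illustrates why tighter closed-form approximations are of practical interest.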
How can we find all connected components in an enormous graph with billions of nodes and edges? Finding connected components is a fundamental operation for various graph computation tasks such as pattern recognition, reachability, graph compression, etc. Many algorithms have been proposed for decades, but most of them are not scalable enough to process recent web-scale graphs. Recently, a MapReduce algorithm was proposed to handle such large graphs. However, the algorithm repeatedly reads and writes numerous intermediate data that cause network overload and prolong the running...
A connected component in a graph is a set of nodes linked to each other by paths. The problem of finding connected components has been applied to diverse graph analysis tasks such as graph partitioning, graph compression, and pattern recognition. Several distributed algorithms have been proposed to find connected components in enormous graphs. Ironically, the distributed algorithms do not scale enough due to unnecessary data IO & processing, massive intermediate data, numerous rounds of computations, and load balancing issues. In this paper, we propose a fast and scalable distributed algorithm PACC...
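As a point of reference for the problem itself, connected components on a graph that fits in one machine's memory can be found with a union-find (disjoint-set) structure. The sketch below is a single-machine baseline only, not PACC or any of the distributed algorithms mentioned above. The function name `connected_components`, the integer node ids, and the edge-list input format are assumptions for illustration.

```python
def connected_components(num_nodes, edges):
    """Label each node with the smallest node id in its connected component.

    num_nodes: number of nodes, with ids 0 .. num_nodes - 1 (assumed format).
    edges:     iterable of undirected (u, v) pairs.
    """
    parent = list(range(num_nodes))  # every node starts as its own root

    def find(x):
        # Follow parent pointers to the root, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            # Link the larger root under the smaller one, so that each root
            # is always the minimum node id of its component.
            if ru < rv:
                parent[rv] = ru
            else:
                parent[ru] = rv

    return [find(v) for v in range(num_nodes)]


# Example: two components {0, 1, 2} and {3, 4}, plus the isolated node 5.
labels = connected_components(6, [(0, 1), (1, 2), (3, 4)])
```

This approach needs the whole parent array in memory and processes edges sequentially, which is the limitation that motivates distributed algorithms for graphs with billions of nodes and edges.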
Massive Open Online Courses (MOOCs) have become popular platforms for online learning. While MOOCs enable students to study at their own pace, this flexibility makes it easy for students to drop out of class. In this paper, our goal is to predict if a learner is going to drop out within the next week, given clickstream data for the current week. To this end, we present a multi-layer representation learning solution based on the branch and bound (BB) algorithm, which learns from low-level clickstreams in an unsupervised manner, produces...
This paper addresses a key challenge in MOOC dropout prediction, namely to build meaningful representations from clickstream data. While a variety of feature extraction techniques have been explored extensively for such purposes, to our knowledge, no prior works have explored modeling of educational content (e.g. video) and their correlation with the learner's behavior (e.g. clickstream) in this context. We bridge this gap by devising a method to learn representation for videos and the correlation between videos and clicks. The results indicate that modeling videos and their correlation with clicks bring...
Online recommendation is an essential functionality across a variety of services, including e-commerce and video streaming, where items to buy, watch, or read are suggested to users. Justifying recommendations, i.e., explaining why a user might like the recommended item, has been shown to improve user satisfaction and persuasiveness of the recommendation. In this paper, we develop a method for generating post-hoc justifications that can be applied to the output of any recommendation algorithm. Existing post-hoc methods are often limited in providing...