Ruofan Wu

ORCID: 0000-0002-2005-6058
Research Areas
  • Advanced Graph Neural Networks
  • Complex Network Analysis Techniques
  • Privacy, Security, and Data Protection
  • Privacy-Preserving Technologies in Data
  • Imbalanced Data Classification Techniques
  • Domain Adaptation and Few-Shot Learning
  • Advanced Clustering Algorithms Research
  • Semantic Web and Ontologies
  • Epigenetics and DNA Methylation
  • Advanced Database Systems and Queries
  • Human Mobility and Location-Based Analysis
  • Ethics and Social Impacts of AI
  • Adversarial Robustness in Machine Learning
  • Machine Learning and Data Classification
  • Anomaly Detection Techniques and Applications
  • Recommender Systems and Techniques
  • Service-Oriented Architecture and Web Services

The last years have witnessed the emergence of a promising self-supervised learning strategy, referred to as masked autoencoding. However, there is a lack of theoretical understanding of how masking matters on graph autoencoders (GAEs). In this work, we present masked graph autoencoder (MaskGAE), a self-supervised learning framework for graph-structured data. Different from standard GAEs, MaskGAE adopts masked graph modeling (MGM) as a principled pretext task - masking a portion of edges and attempting to reconstruct the missing part with partially visible, unmasked...

10.1145/3580305.3599546 article EN Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining 2023-08-04
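
The edge-masking step described in the abstract above can be illustrated with a minimal sketch. This is not the MaskGAE implementation: the function name `mask_edges`, the `mask_ratio` and `seed` parameters, and the uniform random masking strategy are all assumptions made for illustration. The sketch only shows how an edge list might be split into a visible part (fed to an encoder) and a masked part (used as the reconstruction target).

```python
import random


def mask_edges(edges, mask_ratio=0.5, seed=0):
    """Split an edge list into (visible, masked) subsets.

    Hypothetical helper: uniform random masking is an assumption here,
    not necessarily the strategy used by any particular paper.
    """
    if not 0.0 <= mask_ratio <= 1.0:
        raise ValueError("mask_ratio must be in [0, 1]")
    rng = random.Random(seed)      # local RNG so results are reproducible
    shuffled = list(edges)         # copy, so the caller's list is untouched
    rng.shuffle(shuffled)
    n_masked = int(round(mask_ratio * len(shuffled)))
    masked = shuffled[:n_masked]   # edges hidden from the encoder
    visible = shuffled[n_masked:]  # edges the encoder is allowed to see
    return visible, masked


# Toy undirected graph given as an edge list.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (4, 5)]
visible, masked = mask_edges(edges, mask_ratio=0.5, seed=42)
```

In a full pipeline, a model would encode the graph built from `visible` and be trained to predict the edges in `masked`; that training loop is outside the scope of this sketch.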

Graph clustering, a fundamental and challenging task in graph mining, aims to classify nodes into several disjoint clusters. In recent years, graph contrastive learning (GCL) has emerged as a dominant line of research in graph clustering and advances the new state-of-the-art. However, GCL-based methods heavily rely on augmentation schemes, which may potentially introduce challenges such as semantic drift and scalability issues. Another promising line of research involves the adoption of modularity maximization, a popular and effective measure for...

10.1145/3637528.3671967 article EN cc-by Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining 2024-08-24
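
As a rough illustration of the quantity that modularity maximization optimizes, the sketch below computes the standard Newman modularity of a fixed partition, Q = sum over communities c of [L_c / m - (d_c / 2m)^2], where m is the number of edges, L_c the number of edges inside c, and d_c the total degree of c. Whether the paper above uses exactly this form is an assumption; the function name `modularity` and the toy graph are hypothetical.

```python
from collections import defaultdict


def modularity(edges, communities):
    """Newman modularity of a partition of a simple undirected graph.

    edges:       list of (u, v) pairs, each undirected edge listed once
    communities: dict mapping node -> community label
    """
    m = len(edges)
    if m == 0:
        raise ValueError("graph must have at least one edge")
    intra = defaultdict(int)       # L_c: edges with both ends in community c
    degree_sum = defaultdict(int)  # d_c: sum of node degrees in community c
    for u, v in edges:
        cu, cv = communities[u], communities[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            intra[cu] += 1
    return sum(intra[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Two triangles joined by a single bridge edge (2, 3).
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_clusters = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
one_cluster = {node: 0 for node in range(6)}

q_two = modularity(toy_edges, two_clusters)  # 5/14, about 0.357
q_one = modularity(toy_edges, one_cluster)   # 0.0 for the trivial partition
```

The partition that separates the two triangles scores higher than the trivial single-cluster partition, which is the behavior a modularity-based clustering objective relies on.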

In this work, we tackle the challenge of disambiguating queries in retrieval-augmented generation (RAG) to diverse yet answerable interpretations. State-of-the-art methods follow a Diversify-then-Verify (DtV) pipeline, where diverse interpretations are generated by an LLM, later used as search queries to retrieve supporting passages. Such a process may introduce noise in either interpretations or retrieval, particularly in enterprise settings, where LLMs -- trained on static data -- struggle with domain-specific disambiguations. Thus, a post-hoc...

10.48550/arxiv.2502.10352 preprint EN arXiv (Cornell University) 2025-02-14

Knowledge Graph Embedding (KGE) is a fundamental technique that extracts expressive representations from a knowledge graph (KG) to facilitate diverse downstream tasks. The emerging federated KGE (FKGE) collaboratively trains from distributed KGs held among clients while avoiding exchanging clients' sensitive raw KGs, which can still suffer from privacy threats as evidenced in other federated model trainings (e.g., neural networks). However, quantifying and defending against such privacy threats remain unexplored for FKGE...

10.1145/3543507.3583450 article EN Proceedings of the ACM Web Conference 2023 2023-04-26

Graph convolutional networks (GCNs) have been shown to be vulnerable to small adversarial perturbations, which becomes a severe threat and largely limits their applications in security-critical scenarios. To mitigate such a threat, considerable research efforts have been devoted to increasing the robustness of GCNs against adversarial attacks. However, current defense approaches are typically designed to prevent GCNs from untargeted adversarial attacks and focus on overall performance, making it challenging to protect important local nodes from more...

10.1145/3583780.3614903 article EN 2023-10-21

Graph autoencoders (GAEs) are self-supervised learning models that can learn meaningful representations of graph-structured data by reconstructing the input graph from a low-dimensional latent space. Over the past few years, GAEs have gained significant attention in academia and industry. In particular, the recent advent of GAEs with masked autoencoding schemes marks a significant advancement in research. While numerous GAEs have been proposed, the underlying mechanisms of GAEs are not well understood, and a comprehensive benchmark for GAEs is still lacking....

10.48550/arxiv.2410.10241 preprint EN arXiv (Cornell University) 2024-10-14

Imbalanced data are frequently encountered in real-world classification tasks. Previous works on imbalanced learning mostly focused on learning with a minority class of few samples. However, the notion of imbalance also applies to cases where the minority class contains abundant samples, which is usually the case for industrial applications like fraud detection in the area of financial risk management. In this paper, we take a population-level approach by proposing a new formulation called ultra-imbalanced classification (UIC). Under...

10.48550/arxiv.2409.04101 preprint EN arXiv (Cornell University) 2024-09-06

In many practical binary classification applications, such as financial fraud detection or medical diagnosis, it is crucial to optimize a model's performance on high-confidence samples whose scores are higher than a specific threshold, which is calculated by a given false positive rate according to requirements. However, the proportion of high-confidence samples is typically extremely small, especially in long-tailed datasets, which can lead to poor recall results and an alignment bias between realistic goals and the loss. To address this...

10.1145/3583780.3614764 article EN 2023-10-21
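
The thresholding step mentioned in the abstract above (a score cutoff derived from a given false positive rate) can be sketched as follows. This is an illustrative assumption of one common way to do it, not the paper's method: the function names `threshold_at_fpr` and `recall_at_threshold`, the strict `score > threshold` decision rule, and the toy scores are all hypothetical.

```python
def threshold_at_fpr(neg_scores, target_fpr):
    """Return a cutoff such that at most target_fpr of negatives score above it.

    A sample is predicted positive when score > threshold (strict inequality).
    """
    if not neg_scores:
        raise ValueError("need at least one negative score")
    if not 0.0 <= target_fpr <= 1.0:
        raise ValueError("target_fpr must be in [0, 1]")
    ranked = sorted(neg_scores, reverse=True)
    k = int(target_fpr * len(ranked))  # negatives allowed above the cutoff
    if k >= len(ranked):
        return float("-inf")           # every negative may be flagged
    return ranked[k]                   # at most k negatives exceed this value


def recall_at_threshold(pos_scores, threshold):
    """Fraction of positive samples scoring strictly above the cutoff."""
    if not pos_scores:
        raise ValueError("need at least one positive score")
    return sum(s > threshold for s in pos_scores) / len(pos_scores)


# Toy model scores (made up for illustration only).
neg = [0.05, 0.1, 0.2, 0.3, 0.35, 0.4, 0.5, 0.6, 0.8, 0.9]
pos = [0.55, 0.7, 0.95, 0.65]

thr = threshold_at_fpr(neg, target_fpr=0.2)  # two of ten negatives may pass
rec = recall_at_threshold(pos, thr)
```

With a small target false positive rate, only a handful of samples lie above the cutoff, which is the regime the abstract describes as making recall hard to optimize directly.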

Over the past few years, graph neural networks (GNNs) have become powerful and practical tools for learning on (static) graph-structure data. However, many real-world applications, such as social networks and e-commerce, involve temporal graphs where nodes and edges are dynamically evolving. Temporal graph neural networks (TGNNs) have progressively emerged as an extension of GNNs to address time-evolving graphs and have gradually become a trending research topic in both academics and industry. Advancing research and application in this emerging field necessitates the development of new...

10.48550/arxiv.2311.16605 preprint EN other-oa arXiv (Cornell University) 2023-01-01