Jeffrey Xu Yu

ORCID: 0000-0002-9738-827X
Research Areas
  • Data Management and Algorithms
  • Advanced Database Systems and Queries
  • Graph Theory and Algorithms
  • Complex Network Analysis Techniques
  • Advanced Graph Neural Networks
  • Data Mining Algorithms and Applications
  • Semantic Web and Ontologies
  • Caching and Content Delivery
  • Algorithms and Data Compression
  • Web Data Mining and Analysis
  • Peer-to-Peer Network Technologies
  • Complexity and Algorithms in Graphs
  • Cloud Computing and Resource Management
  • Advanced Data Storage Technologies
  • Advanced Graph Theory Research
  • Rough Sets and Fuzzy Logic
  • Data Stream Mining Techniques
  • Time Series Analysis and Forecasting
  • Advanced Image and Video Retrieval Techniques
  • Opinion Dynamics and Social Influence
  • Distributed Systems and Fault Tolerance
  • Data Quality and Management
  • Optimization and Search Problems
  • Geographic Information Systems Studies
  • Human Mobility and Location-Based Analysis

Chinese University of Hong Kong
2016-2025

University of Hong Kong
2003-2024

Nanjing Normal University
2024

Kaiser Permanente
2023

Guangzhou University
2023

University of Technology Sydney
2023

University of California, San Diego
2020-2022

Baycrest Hospital
2022

Health Sciences Centre
2022

The University of Texas at Arlington
2021

The goal of graph clustering is to partition vertices in a large graph into different clusters based on various criteria such as vertex connectivity or neighborhood similarity. Graph clustering techniques are very useful for detecting densely connected groups in a large graph. Many existing graph clustering methods mainly focus on the topological structure for clustering, but largely ignore the vertex properties, which are often heterogeneous. In this paper, we propose a novel graph clustering algorithm, SA-Cluster, based on both structural and attribute similarities through a unified...

10.14778/1687627.1687709 article EN Proceedings of the VLDB Endowment 2009-08-01

With the increasing amount of data and the need to integrate data from multiple data sources, a challenging issue is to find near-duplicate records efficiently. In this paper, we focus on efficient algorithms to find pairs of records such that their similarities are above a given threshold. Several existing algorithms rely on the prefix filtering principle to avoid computing similarity values for all possible pairs of records. We propose new filtering techniques by exploiting the ordering information; they are integrated into the existing methods and drastically reduce the candidate sizes and hence...

10.1145/1367497.1367516 article EN 2008-04-21
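
The prefix filtering principle named in the abstract above can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes records are token sets compared by Jaccard similarity, and the function name `prefix_filter_join`, the rare-token-first ordering, and the small epsilon guard are illustrative choices.

```python
from collections import Counter, defaultdict
from math import ceil


def jaccard(a, b):
    """Jaccard similarity of two non-empty sets."""
    return len(a & b) / len(a | b)


def prefix_filter_join(records, threshold):
    """Return pairs (j, i), j < i, whose Jaccard similarity is >= threshold.

    Only records that share a token in their prefixes are verified, so most
    dissimilar pairs are never compared.
    """
    # Global token order (an assumption of this sketch): rare tokens first.
    freq = Counter(tok for rec in records for tok in set(rec))
    ordered = [sorted(set(rec), key=lambda t: (freq[t], t)) for rec in records]

    index = defaultdict(list)  # token -> ids of earlier records with it in their prefix
    result = []
    for i, toks in enumerate(ordered):
        if not toks:
            continue
        # A record x needs at least ceil(threshold * |x|) shared tokens with any
        # match, so its first |x| - ceil(threshold * |x|) + 1 tokens must hit one.
        min_overlap = ceil(threshold * len(toks) - 1e-9)  # epsilon guards float error
        prefix = toks[: len(toks) - min_overlap + 1]

        candidates = set()
        for tok in prefix:
            candidates.update(index[tok])
        for j in sorted(candidates):
            if jaccard(set(toks), set(ordered[j])) >= threshold:
                result.append((j, i))
        for tok in prefix:
            index[tok].append(i)
    return result
```

For example, `prefix_filter_join([["a", "b", "c", "d"], ["a", "b", "c", "e"], ["x", "y"]], 0.5)` verifies only the first two records against each other and never compares either with the third.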

Community detection, which discovers densely connected structures in a network, has been studied a lot. In this paper, we study online community search, which is practically useful but less studied in the literature. Given a query vertex in a graph, the problem is to find meaningful communities that the vertex belongs to in an online manner. We propose a novel community model based on the k-truss concept, which brings nice structural and computational properties. We design a compact and elegant index structure which supports the efficient search of k-truss communities with a linear cost with respect to the community size. In addition,...

10.1145/2588555.2610495 article EN 2014-06-18

Graphs are widely used to model complicated data semantics in many applications. In this paper, we aim to develop efficient techniques to retrieve graphs, containing a given query graph, from a large set of graphs. Considering that the problem of testing subgraph isomorphism is generally NP-hard, most of the existing techniques are based on the framework of filtering-and-verification to reduce the precise computation costs; consequently, various novel feature-based indexes have been developed. While the existing techniques work well for small query graphs, the verification phase becomes...

10.14778/1453856.1453899 article EN Proceedings of the VLDB Endowment 2008-08-01

It is widely realized that the integration of database and information retrieval techniques will provide users with a wide range of high quality services. In this paper, we study processing an l-keyword query, p1, p2, ···, pl, against a relational database which can be modeled as a weighted graph, G(V, E). Here V is a set of nodes (tuples) and E is a set of edges representing foreign key references between tuples. Let Vi be the set of nodes in V that contain the keyword pi. We study finding top-k minimum cost connected trees that contain at least one node in every subset Vi, and denote...

10.1109/icde.2007.367929 article EN 2007-04-01

With the increasing amount of data and the need to integrate data from multiple data sources, one of the challenging issues is to identify near-duplicate records efficiently. In this article, we focus on efficient algorithms to find pairs of records such that their similarities are no less than a given threshold. Several existing algorithms rely on the prefix filtering principle to avoid computing similarity values for all possible pairs of records. We propose new filtering techniques by exploiting the token ordering information; they are integrated into the existing methods...

10.1145/2000824.2000825 article EN ACM Transactions on Database Systems 2011-08-01

RDF question/answering (Q/A) allows users to ask questions in natural languages over a knowledge base represented by RDF. To answer a natural language question, the existing work takes a two-stage approach: question understanding and query evaluation. Their focus is on question understanding to deal with the disambiguation of the natural language phrases. The most common technique is joint disambiguation, which has an exponential search space. In this paper, we propose a systematic framework to answer natural language questions over an RDF repository (RDF Q/A) from a graph data-driven perspective. We propose a semantic...

10.1109/tkde.2017.2766634 article EN IEEE Transactions on Knowledge and Data Engineering 2017-10-26

RDF question/answering (Q/A) allows users to ask questions in natural languages over a knowledge base represented by RDF. To answer a natural language question, the existing work takes a two-stage approach: question understanding and query evaluation. Their focus is on question understanding to deal with the disambiguation of the natural language phrases. The most common technique is joint disambiguation, which has an exponential search space. In this paper, we propose a systematic framework to answer natural language questions over an RDF repository (RDF Q/A) from a graph data-driven perspective. We...

10.1145/2588555.2610525 article EN 2014-06-18

Community search is a problem of finding densely connected subgraphs that satisfy the query conditions in a network, which has attracted much attention in recent years. However, all the previous studies on community search do not consider the influence of a community. In this paper, we introduce a novel community model called k-influential community based on the concept of k-core, which can capture the influence of a community. Based on the new community model, we propose a linear-time online search algorithm to find the top-r k-influential communities in a network. To further speed up the influential community search algorithm, we devise a linear-space...

10.14778/2735479.2735484 article EN Proceedings of the VLDB Endowment 2015-01-01

Recently, there has been significant interest in the study of the community search problem in social and information networks: given one or more query nodes, find densely connected communities containing the query nodes. However, most existing studies do not address the "free rider" issue, that is, nodes far away from the query nodes and irrelevant to them are included in the detected community. Some state-of-the-art models have attempted to address this issue, but not only are their formulated problems NP-hard, they do not admit any approximations without restrictive...

10.14778/2856318.2856323 article EN Proceedings of the VLDB Endowment 2015-12-01

Influence maximization has recently received significant attention for scheduling online campaigns or advertisements on social network platforms. However, most studies only focus on user influence via cyber interactions while ignoring their physical interactions, which are also essential to gauge influence propagation. Additionally, targeted campaigns or advertisements have not received sufficient attention. To address these issues, we first devise a novel holistic influence diffusion model that takes into account both cyber and physical user interactions in an effective and practical way. Based on the...

10.1109/tkde.2020.3003047 article EN IEEE Transactions on Knowledge and Data Engineering 2020-01-01

Graph reachability is fundamental to a wide range of applications, including XML indexing, geographic navigation, Internet routing, ontology queries based on RDF/OWL, etc. Many applications involve huge graphs and require fast answering of reachability queries. Several reachability labeling methods have been proposed for this purpose. They assign labels to the vertices, such that the reachability between any two vertices may be decided using their labels only. For sparse graphs, 2-hop based reachability labeling schemes answer reachability queries efficiently using relatively small label space....

10.1109/icde.2006.53 article EN 2006-01-01

The spatial and temporal databases have been studied widely and intensively over years. In this paper, we study how to answer queries of finding the best departure time that minimizes the total travel time from a place to another, over a road network, where the traffic conditions dynamically change from time to time. We study a generalized form of this problem, called the time-dependent shortest-path problem. A time-dependent graph GT is a graph that has an edge-delay function, wi,j(t), associated with each edge (vi, vj), to be stored in a database. The edge-delay function wi,j(t) specifies how much time it...

10.1145/1353343.1353371 article EN 2008-03-25

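Traditionally, building a classifier requires two sets of examples: positive examples and negative examples. This paper studies the problem of building a text classifier using positive examples (P) and unlabeled examples (U). The unlabeled examples are mixed with both positive and negative examples. Since no negative example is given explicitly, the task of building a reliable text classifier becomes far more challenging. Simply treating all of the unlabeled examples as negative examples and building a classifier thereafter is undoubtedly a poor approach to tackling this problem. Generally speaking, most of the studies solved this problem by a two-step heuristic: first, extract negative examples (N) from U. Second, build a classifier based on P and N. Surprisingly, most studies did not...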

10.1109/tkde.2006.16 article EN IEEE Transactions on Knowledge and Data Engineering 2006-01-01

In recent years, many networks have become available for analysis, including social networks, sensor networks, biological networks, etc. Graph clustering has shown its effectiveness in analyzing and visualizing large networks. The goal of graph clustering is to partition vertices in a large graph into clusters based on various criteria such as vertex connectivity or neighborhood similarity. Many existing graph clustering methods mainly focus on the topological structures, but largely ignore the vertex properties, which are often heterogeneous. Recently, a new graph clustering algorithm,...

10.1109/icdm.2010.41 article EN 2010-12-01

Due to rapid growth of the Internet technology and new scientific/technological advances, the number of applications that model data as graphs increases, because graphs have high expressive power to model complicated structures. The dominance of graphs in real-world applications asks for new graph data management so that users can access graph data effectively and efficiently. In this paper, we study a graph pattern matching problem over a large data graph. The problem is to find all patterns in a large data graph that match a user-given graph pattern. We propose a new two-step R-join (reachability join) algorithm with filter step...

10.1109/icde.2008.4497500 article EN 2008-04-01

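The k-core decomposition in a graph is a fundamental problem for social network analysis. The problem of k-core decomposition is to calculate the core number for every node in a graph. Previous studies mainly focus on k-core decomposition in a static graph. There exists a linear time algorithm for k-core decomposition in a static graph. However, in many real-world applications such as online social networks and the Internet, the graph typically evolves over time. In such applications, a key issue is to maintain the core numbers of nodes when the graph changes over time. A simple implementation is to perform the linear time algorithm to recompute the core number for every node after the graph is updated. Such a simple implementation is expensive when the graph is very large. In this paper, we propose a new...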

10.1109/tkde.2013.158 article EN IEEE Transactions on Knowledge and Data Engineering 2013-09-27
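
The static k-core decomposition that the abstract above takes as its starting point can be sketched as follows. This is a minimal illustration, assuming an undirected graph given as a dictionary of neighbour sets; it uses a heap-based peeling loop (O(m log n)) for brevity rather than the linear-time bucket algorithm the abstract refers to, and it is not the incremental maintenance method the paper proposes. The function name `core_numbers` is hypothetical.

```python
import heapq


def core_numbers(adj):
    """Compute the core number of every node by repeatedly peeling a min-degree node.

    adj: dict mapping each node to the set of its neighbours (undirected graph).
    Nodes must be mutually comparable (e.g. all ints) so heap ties can be broken.
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    heap = [(d, v) for v, d in degree.items()]
    heapq.heapify(heap)

    core = {}
    k = 0
    while heap:
        d, v = heapq.heappop(heap)
        if v in core or d != degree[v]:
            continue  # stale entry: v was already peeled or its degree has changed
        k = max(k, d)  # core numbers never decrease along the peeling order
        core[v] = k
        for u in adj[v]:
            if u not in core:
                degree[u] -= 1
                heapq.heappush(heap, (degree[u], u))
    return core
```

Recomputing `core_numbers` from scratch after every edge update is the simple but expensive baseline that the abstract describes.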

Revealing the latent community structure, which is crucial to understanding the features of networks, is an important problem in network and graph analysis. During the last decade, many approaches have been proposed to solve this challenging problem in diverse ways, i.e. different measures or data structures. Unfortunately, experimental reports on existing techniques fell short in validity and integrity since many comparisons were not based on a unified code base or merely discussed in theory. We engage in an in-depth benchmarking study...

10.14778/2794367.2794370 article EN Proceedings of the VLDB Endowment 2015-06-01

Social networks, sensor networks, biological networks, and many other information networks can be modeled as a large graph. Graph vertices represent entities, and graph edges represent their relationships or interactions. In many large graphs, there is usually one or more attributes associated with every graph vertex to describe its properties. In many application domains, graph clustering techniques are very useful for detecting densely connected groups in a large graph as well as for understanding and visualizing a large graph. The goal of graph clustering is to partition vertices in a large graph into different clusters based on various criteria...

10.1145/1921632.1921638 article EN ACM Transactions on Knowledge Discovery from Data 2011-02-01

Query processing on uncertain data streams has attracted a lot of attention lately, due to the imprecise nature of the data generated from a variety of streaming applications, such as readings from a sensor network. However, all of the existing works on uncertain data streams study unbounded streams. This paper takes the first step towards the important and challenging problem of answering sliding-window queries on uncertain data streams, with a focus on arguably one of the most important types of queries: top-k queries. The challenge of answering sliding-window top-k queries on uncertain data streams stems from the strict space and time requirements of processing both arriving...

10.14778/1453856.1453892 article EN Proceedings of the VLDB Endowment 2008-08-01

Maximal clique enumeration is a fundamental problem in graph theory and has important applications in many areas such as social network analysis and bioinformatics. The problem is extensively studied; however, the best existing algorithms require memory space linear in the size of the input graph. This has become a serious concern in view of the massive volume of today's fast-growing networks. We propose a general framework for designing external-memory algorithms for maximal clique enumeration in large graphs. The framework enables maximal cliques to be processed recursively in small subgraphs of the input graph, thus...

10.1145/2043652.2043654 article EN ACM Transactions on Database Systems 2011-12-01