- Data Management and Algorithms
- Graph Theory and Algorithms
- Advanced Graph Neural Networks
- Complex Network Analysis Techniques
- Advanced Database Systems and Queries
- Caching and Content Delivery
- Data Mining Algorithms and Applications
- Advanced Graph Theory Research
- Geographic Information Systems Studies
- Advanced Image and Video Retrieval Techniques
- Algorithms and Data Compression
- Data Quality and Management
- Web Data Mining and Analysis
- Automated Road and Building Extraction
- Complexity and Algorithms in Graphs
- Constraint Satisfaction and Optimization
- Peer-to-Peer Network Technologies
- Computational Geometry and Mesh Generation
- Optimization and Search Problems
- Topic Modeling
- Data Visualization and Analytics
- Privacy-Preserving Technologies in Data
- Distributed Systems and Fault Tolerance
- Human Mobility and Location-Based Analysis
- Semantic Web and Ontologies
Shanghai Jiao Tong University
2022-2025
Foshan University
2024-2025
Sun Yat-sen Memorial Hospital
2024
Sun Yat-sen University
2024
UNSW Sydney
2014-2023
East China Normal University
2013-2022
Shanghai Key Laboratory of Trustworthy Computing
2021
Beijing Urban Construction Design & Development Group (China)
2021
University of Technology Sydney
2013-2020
Zhejiang Lab
2018-2019
With the increasing amount of data and the need to integrate data from multiple sources, a challenging issue is to find near-duplicate records efficiently. In this paper, we focus on efficient algorithms to find pairs of records such that their similarities are above a given threshold. Several existing algorithms rely on the prefix filtering principle to avoid computing similarity values for all possible pairs of records. We propose new filtering techniques by exploiting ordering information; they are integrated into existing methods and drastically reduce candidate sizes and hence...
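The prefix filtering principle mentioned in this abstract can be illustrated with a minimal sketch. It assumes Jaccard similarity over token sets and a global token order by ascending frequency; the function names `jaccard` and `prefix_filter_join` are hypothetical, and the sketch shows only the baseline principle, not the ordering-based techniques the paper proposes.

```python
from collections import Counter, defaultdict
from math import ceil


def jaccard(a, b):
    """Jaccard similarity of two non-empty token sets."""
    return len(a & b) / len(a | b)


def prefix_filter_join(records, threshold):
    """Return (i, j, sim) for all pairs i < j with Jaccard(i, j) >= threshold.

    Assumption: 0 < threshold <= 1 and records are lists of hashable tokens.
    """
    # Global order: rarer tokens first, so prefixes contain selective tokens.
    freq = Counter(tok for rec in records for tok in set(rec))
    ordered = [sorted(set(rec), key=lambda t: (freq[t], t)) for rec in records]

    index = defaultdict(list)  # token -> ids of earlier records with it in their prefix
    results = []
    for rid, tokens in enumerate(ordered):
        if not tokens:
            continue
        # Two records with Jaccard >= threshold must share a token within
        # their first |x| - ceil(threshold * |x|) + 1 tokens.
        # The small epsilon guards against floating-point overshoot in ceil.
        prefix_len = len(tokens) - ceil(threshold * len(tokens) - 1e-9) + 1
        candidates = set()
        for tok in tokens[:prefix_len]:
            candidates.update(index[tok])
            index[tok].append(rid)
        # Verification: compute the exact similarity only for candidates.
        for cid in sorted(candidates):
            sim = jaccard(set(tokens), set(ordered[cid]))
            if sim >= threshold:
                results.append((cid, rid, sim))
    return results
```

Only records that share a prefix token are ever compared, so the number of exact similarity computations depends on the candidate set rather than on all possible pairs.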
Skyline computation has many applications including multi-criteria decision making. In this paper, we study the problem of selecting k skyline points so that the number of points that are dominated by at least one of these k skyline points is maximized. We first present an efficient dynamic programming based exact algorithm in a 2D space. Then, we show that the problem is NP-hard when the dimensionality is 3 or more and that it can be approximately solved by a polynomial time algorithm with a guaranteed approximation ratio of 1-1/e. To speed up the computation, an efficient,...
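As a rough illustration of the selection problem described above, the following sketch applies the standard greedy heuristic for maximum coverage, which is known to achieve a 1-1/e ratio. Whether this matches the paper's own approximation algorithm is an assumption; the names `dominates`, `skyline`, and `greedy_k_skyline` are hypothetical, and smaller values are assumed to be better in every dimension.

```python
def dominates(p, q):
    """p dominates q: no worse in every dimension, strictly better in one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points):
    """Points not dominated by any other point (quadratic scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def greedy_k_skyline(points, k):
    """Greedily pick up to k skyline points covering the most dominated points.

    Returns (chosen skyline points, number of points dominated by at least one).
    """
    sky = skyline(points)
    # For each skyline point, the indices of the points it dominates.
    cover = [
        {j for j, q in enumerate(points) if dominates(s, q)} for s in sky
    ]
    chosen, covered = [], set()
    remaining = set(range(len(sky)))
    for _ in range(min(k, len(sky))):
        # Pick the skyline point with the largest marginal gain.
        best = max(sorted(remaining), key=lambda i: len(cover[i] - covered))
        remaining.remove(best)
        chosen.append(sky[best])
        covered |= cover[best]
    return chosen, len(covered)
```

The quadratic dominance checks keep the sketch short; they are not meant to reflect the efficiency techniques the abstract alludes to.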
Multiview data clustering attracts more attention than its single-view counterpart because leveraging multiple independent and complementary sources of information from multiview feature spaces outperforms using a single one. Multiview spectral clustering aims at yielding a data partition in agreement over the local manifold structures by seeking eigenvalue-eigenvector decompositions. Among all methods, low-rank representation (LRR) is effective, exploring the multiview consensus structure beyond low rankness to boost clustering performance. However, as we observed,...
Graphs are widely used to model complicated data semantics in many applications. In this paper, we aim to develop efficient techniques to retrieve graphs containing a given query graph from a large set of graphs. Considering that the problem of testing subgraph isomorphism is generally NP-hard, most existing techniques are based on the framework of filtering-and-verification to reduce the precise computation costs; consequently, various novel feature-based indexes have been developed. While the existing techniques work well for small query graphs, the verification phase becomes...
It is widely realized that the integration of database and information retrieval techniques will provide users with a wide range of high quality services. In this paper, we study processing an l-keyword query, p1, p2, ···, pl, against a relational database which can be modeled as a weighted graph, G(V, E). Here V is a set of nodes (tuples) and E is a set of edges representing foreign key references between tuples. Let Vi be the set of nodes that contain the keyword pi. We study finding the top-k minimum cost connected trees that contain at least one node in every subset Vi, and denote...
With the increasing amount of text data stored in relational databases, there is a demand for RDBMS to support keyword queries over text data. As a search result is often assembled from multiple relational tables, traditional IR-style ranking and query evaluation methods cannot be applied directly.
More often than not, multimedia data described by multiple features, such as color and shape, can be naturally decomposed into multiple views. Since the views provide complementary information to each other, great endeavors have been dedicated to leveraging multiple views instead of a single view to achieve better clustering performance. To effectively exploit the correlation consensus among multi-views, in this paper, we study subspace clustering for multi-view data while keeping individual views well encapsulated. For characterizing...
With the increasing amount of data and the need to integrate data from multiple sources, one of the challenging issues is to identify near-duplicate records efficiently. In this article, we focus on efficient algorithms to find pairs of records such that their similarities are no less than a given threshold. Several existing algorithms rely on the prefix filtering principle to avoid computing similarity values for all possible pairs of records. We propose new filtering techniques by exploiting token ordering information; they are integrated into existing methods...
Nearest neighbor search is a fundamental and essential operation in applications from many domains, such as databases, machine learning, multimedia, and computer vision. Because exact search is not efficient in high-dimensional space, a lot of effort has turned to approximate nearest neighbor search. Although many algorithms have been continuously proposed in the literature each year, there is no comprehensive evaluation and analysis of their performance. In this paper, we conduct an experimental evaluation of state-of-the-art...
In this paper, we study the problem of subgraph matching that extracts all subgraph isomorphic embeddings of a query graph q in a large data graph G. The existing algorithms for subgraph matching follow Ullmann's backtracking approach; that is, they iteratively map query vertices to data vertices by following a matching order of query vertices. It has been shown that the matching order is a very important aspect of the efficiency of a subgraph matching algorithm. Recently, many advanced techniques, such as enforcing connectivity and merging similar vertices in a query or data graph, have been proposed to provide an effective matching order with the aim to reduce unpromising...
Given a query photo issued by a user (q-user), landmark retrieval is to return a set of photos with landmarks similar to those of the query, while existing studies on landmark retrieval focus on exploiting the geometries of landmarks for similarity matches between candidate photos and the query photo. We observe that the same landmarks provided by different users over a social media community may convey different geometry information depending on the viewpoints and/or angles, and may, subsequently, yield very different results. In fact, dealing with landmarks with low quality shapes caused by the photography of q-users is often...
Uncertain data is inherent in a few important applications such as environmental surveillance and mobile object tracking. Top-k queries (also known as ranking queries) are often natural and useful in analyzing uncertain data in those applications. In this paper, we study the problem of answering probabilistic threshold top-k queries on uncertain data, which computes the uncertain records taking a probability of at least p to be in the top-k list, where p is a user specified probability threshold. We present an efficient exact algorithm, a fast sampling algorithm, and a Poisson approximation based...
We consider the problem of efficiently computing the skyline against the most recent N elements in a data stream seen so far. Specifically, we study the n-of-N skyline queries; that is, computing the skyline for the most recent n (∀n ≤ N) elements. Firstly, we developed an effective pruning technique to minimize the number of elements to be kept. It can be shown that on average storing only O(log^d N) elements from the most recent N elements is sufficient to support the precise computation of all n-of-N skyline queries in a d-dimensional space if the data distribution on each dimension is independent. Then, a novel encoding scheme...
There has been considerable interest in similarity join in the research community recently. Similarity join is a fundamental operation in many application areas, such as data integration and cleaning, bioinformatics, and pattern recognition. We focus on efficient algorithms for similarity join with edit distance constraints. Existing approaches are mainly based on converting the edit distance constraint to a weaker constraint on the number of matching q-grams between a pair of strings. In this paper, we propose the novel perspective of investigating mismatching q-grams....
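The "weaker constraint on the number of matching q-grams" used by existing approaches can be sketched as the classic count filter: if two strings are within edit distance tau, they share at least max(|s|, |t|) - q + 1 - q·tau q-grams, because each edit operation affects at most q q-grams. The code below is a minimal sketch of that existing filter followed by exact verification, not the mismatching q-gram technique the paper proposes; all function names are hypothetical.

```python
from collections import Counter


def qgrams(s, q):
    """Multiset of all length-q substrings of s (no padding)."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def passes_count_filter(s, t, q, tau):
    """Necessary condition for edit_distance(s, t) <= tau."""
    need = max(len(s), len(t)) - q + 1 - q * tau
    shared = sum((qgrams(s, q) & qgrams(t, q)).values())
    return shared >= need


def edit_distance(s, t):
    """Standard dynamic-programming Levenshtein distance."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (cs != ct)))
        prev = cur
    return prev[-1]


def ed_join(strings, q, tau):
    """All index pairs (i, j), i < j, whose edit distance is at most tau."""
    out = []
    for i in range(len(strings)):
        for j in range(i + 1, len(strings)):
            s, t = strings[i], strings[j]
            if abs(len(s) - len(t)) > tau:  # length filter
                continue
            # Cheap filter first, exact verification only for survivors.
            if passes_count_filter(s, t, q, tau) and edit_distance(s, t) <= tau:
                out.append((i, j))
    return out
```

The nested loop over all pairs is kept for brevity; a practical join would use an inverted index on q-grams to generate candidates.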
Given an integer k, a representative skyline contains the k skyline points that best describe the tradeoffs among different dimensions offered by the full skyline. Although this topic has been previously studied, the existing solution may sometimes produce k points that appear in an arbitrarily tiny cluster, and therefore fail to be representative. Motivated by this, we propose a new definition of representative skyline that minimizes the distance between a non-representative skyline point and its nearest representative. We also study algorithms for computing distance-based representative skylines. In 2D...
Similarity join is a useful primitive operation underlying many applications, such as near-duplicate Web page detection, data integration, and pattern recognition. Traditional similarity joins require a user to specify a similarity threshold. In this paper, we study a variant of the similarity join, termed top-k set similarity join. It returns the top-k pairs of records ranked by their similarities, thus eliminating the guesswork users have to perform when the similarity threshold is unknown beforehand. An algorithm, topk-join, is proposed to answer top-k similarity joins efficiently. It is based on...
Empowering users to access databases using simple keywords can relieve the users from the steep learning curve of mastering a structured query language and understanding complex and possibly fast evolving data schemas. In this tutorial, we give an overview of the state-of-the-art techniques for supporting keyword search on structured and semi-structured data, including query result definition, ranking functions, result generation and top-k query processing, snippet generation, result clustering, query cleaning, performance optimization, and search quality evaluation....
Clustering on uncertain data, one of the essential tasks in mining uncertain data, poses significant challenges in both modeling similarity between uncertain objects and developing efficient computational methods. The previous methods extend traditional partitioning clustering methods like k-means and density-based clustering methods like DBSCAN to uncertain data, and thus rely on geometric distances between objects. Such methods cannot handle uncertain objects that are geometrically indistinguishable, such as products with the same mean but very different variances in customer ratings. Surprisingly, probability...
Query processing on uncertain data streams has attracted a lot of attention lately, due to the imprecise nature of the data generated from a variety of streaming applications, such as readings from a sensor network. However, all existing works on uncertain data streams study unbounded streams. This paper takes the first step towards the important and challenging problem of answering sliding-window queries on uncertain data streams, with a focus on arguably one of the most important types of queries: top-k queries. The challenge of answering sliding-window top-k queries on uncertain data streams stems from the strict space and time requirements of processing both arriving...
As graph data is prevalent for an increasing number of Internet applications, continuously monitoring structural patterns in dynamic graphs in order to generate real-time alerts and trigger prompt actions becomes critical for many applications. In this paper, we present a new system, GraphS, to efficiently detect constrained cycles in a dynamic graph, which is changing constantly, and return the satisfying cycles in real time. A hot point based index is built and maintained for each query so as to greatly speed up query time and achieve high system throughput....
CPU cache performance is one of the key issues for efficiency in database systems. It is reported that cache miss latency takes half of the execution time in database systems. To improve CPU cache performance, there are studies to support searching, including cache-oblivious and cache-conscious trees. In this paper, we focus on CPU cache speedup for graph computing in general by reducing the CPU cache miss ratio for different graph algorithms. The approaches dealing with trees are not applicable to graphs, which are complex in nature.