Xuemin Lin

ORCID: 0000-0003-2396-7225
Research Areas
  • Data Management and Algorithms
  • Graph Theory and Algorithms
  • Advanced Graph Neural Networks
  • Complex Network Analysis Techniques
  • Advanced Database Systems and Queries
  • Caching and Content Delivery
  • Data Mining Algorithms and Applications
  • Advanced Graph Theory Research
  • Geographic Information Systems Studies
  • Advanced Image and Video Retrieval Techniques
  • Algorithms and Data Compression
  • Data Quality and Management
  • Web Data Mining and Analysis
  • Automated Road and Building Extraction
  • Complexity and Algorithms in Graphs
  • Constraint Satisfaction and Optimization
  • Peer-to-Peer Network Technologies
  • Computational Geometry and Mesh Generation
  • Optimization and Search Problems
  • Topic Modeling
  • Data Visualization and Analytics
  • Privacy-Preserving Technologies in Data
  • Distributed systems and fault tolerance
  • Human Mobility and Location-Based Analysis
  • Semantic Web and Ontologies

Shanghai Jiao Tong University
2022-2025

Foshan University
2024-2025

Sun Yat-sen Memorial Hospital
2024

Sun Yat-sen University
2024

UNSW Sydney
2014-2023

East China Normal University
2013-2022

Shanghai Key Laboratory of Trustworthy Computing
2021

Beijing Urban Construction Design & Development Group (China)
2021

University of Technology Sydney
2013-2020

Zhejiang Lab
2018-2019

With the increasing amount of data and the need to integrate data from multiple data sources, a challenging issue is to find near-duplicate records efficiently. In this paper, we focus on efficient algorithms to find pairs of records such that their similarities are above a given threshold. Several existing algorithms rely on the prefix filtering principle to avoid computing similarity values for all possible pairs of records. We propose new filtering techniques by exploiting the ordering information; they are integrated into the existing methods and drastically reduce the candidate sizes and hence...

10.1145/1367497.1367516 article EN 2008-04-21
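
The prefix filtering principle named in the abstract above can be illustrated with a small sketch. This is not the paper's algorithm: it assumes Jaccard similarity over token sets, a global token order of rarest first, and the textbook prefix length |x| - ceil(t * |x|) + 1. The function names `jaccard` and `prefix_filter_join` are hypothetical.

```python
from collections import defaultdict
from math import ceil


def jaccard(a, b):
    """Jaccard similarity of two token collections."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a or b) else 1.0


def prefix_filter_join(records, threshold):
    """Return sorted pairs (i, j), i < j, whose Jaccard similarity >= threshold.

    Assumption: tokens are mutually comparable (e.g. all strings).
    """
    # Global token order: rarest tokens first, ties broken by token value.
    freq = defaultdict(int)
    for rec in records:
        for tok in set(rec):
            freq[tok] += 1
    ordered = [sorted(set(rec), key=lambda t: (freq[t], t)) for rec in records]

    index = defaultdict(list)  # token -> ids of earlier records with it in their prefix
    results = []
    for i, toks in enumerate(ordered):
        # Two records can only reach the threshold if their prefixes share a token.
        # The small epsilon guards against float round-up inside ceil().
        prefix_len = len(toks) - ceil(threshold * len(toks) - 1e-9) + 1
        candidates = set()
        for tok in toks[:prefix_len]:
            candidates.update(index[tok])
            index[tok].append(i)
        # Verification: compute the exact similarity only for the candidates.
        for j in candidates:
            if jaccard(toks, ordered[j]) >= threshold:
                results.append((j, i))
    return sorted(results)


if __name__ == "__main__":
    data = [["a", "b", "c", "d"], ["a", "b", "c", "e"], ["x", "y", "z"]]
    print(prefix_filter_join(data, 0.6))
```

Only records that share a prefix token are ever compared, which is what keeps the number of exact similarity computations below the all-pairs count.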

Skyline computation has many applications including multi-criteria decision making. In this paper, we study the problem of selecting k skyline points so that the number of points, which are dominated by at least one of these k skyline points, is maximized. We first present an efficient dynamic programming based exact algorithm in a 2d-space. Then, we show that the problem is NP-hard when the dimensionality is 3 or more and it can be approximately solved in polynomial time with the guaranteed approximation ratio 1-1/e. To speed-up the computation, an efficient,...

10.1109/icde.2007.367854 article EN 2007-04-01
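
As a rough illustration of the selection problem in the abstract above, the sketch below computes a skyline and then picks k skyline points greedily by marginal coverage. The abstract does not say which approximation algorithm the paper uses; the greedy rule here is the standard one for maximum coverage, which is known to achieve a 1-1/e ratio, and is an assumption on my part. Smaller values are assumed to be better in every dimension, and all function names are hypothetical.

```python
def dominates(p, q):
    """p dominates q: no worse in every dimension, strictly better in one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points):
    """Points not dominated by any other point (quadratic scan, for clarity)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def greedy_representatives(points, k):
    """Pick up to k skyline points, each time taking the one that dominates
    the most points not yet dominated by an earlier pick."""
    sky = list(dict.fromkeys(skyline(points)))  # drop duplicate skyline points
    covered = set()  # indices of points dominated by a chosen point
    chosen = []
    for _ in range(min(k, len(sky))):
        best, best_gain = None, -1
        for s in sky:
            if s in chosen:
                continue
            gain = sum(
                1
                for i, p in enumerate(points)
                if i not in covered and dominates(s, p)
            )
            if gain > best_gain:
                best, best_gain = s, gain
        chosen.append(best)
        covered.update(i for i, p in enumerate(points) if dominates(best, p))
    return chosen, len(covered)


if __name__ == "__main__":
    pts = [(1, 9), (2, 5), (5, 2), (9, 1), (3, 6), (4, 7), (6, 3), (7, 4), (8, 8)]
    print(greedy_representatives(pts, 2))
```

The quadratic scans keep the sketch short; they are not meant to reflect the efficiency techniques the paper develops.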

Multiview data clustering attracts more attention than their single-view counterparts due to the fact that leveraging multiple independent and complementary information from multiview feature spaces outperforms the single one. Multiview spectral clustering aims at yielding the data partition agreement over their local manifold structures by seeking eigenvalue-eigenvector decompositions. Among all the methods, low-rank representation (LRR) is effective, by exploring the multiview consensus structures beyond the low rankness to boost the clustering performance. However, as we observed,...

10.1109/tnnls.2017.2777489 article EN IEEE Transactions on Neural Networks and Learning Systems 2018-01-04

Graphs are widely used to model complicated data semantics in many applications. In this paper, we aim to develop efficient techniques to retrieve graphs, containing a given query graph, from a large set of graphs. Considering the problem of testing subgraph isomorphism is generally NP-hard, most of the existing techniques are based on a framework of filtering-and-verification to reduce the precise computation costs; consequently various novel feature-based indexes have been developed. While the existing techniques work well for small query graphs, the verification phase becomes...

10.14778/1453856.1453899 article EN Proceedings of the VLDB Endowment 2008-08-01

It is widely realized that the integration of database and information retrieval techniques will provide users with a wide range of high quality services. In this paper, we study processing an l-keyword query, p1, p2, ···, pl, against a relational database which can be modeled as a weighted graph, G(V, E). Here V is a set of nodes (tuples) and E is a set of edges representing foreign key references between tuples. Let Vi be a set of nodes that contain the keyword pi. We study finding top-k minimum cost connected trees that contain at least one node in every subset Vi, and denote...

10.1109/icde.2007.367929 article EN 2007-04-01

With the increasing amount of text data stored in relational databases, there is a demand for RDBMS to support keyword queries over text data. As a search result is often assembled from multiple relational tables, traditional IR-style ranking and query evaluation methods cannot be applied directly.

10.1145/1247480.1247495 article EN 2007-06-11

More often than not, a multimedia data described by multiple features, such as color and shape features, can be naturally decomposed of multi-views. Since multi-views provide complementary information to each other, great endeavors have been dedicated to leveraging multiple views instead of a single view to achieve the better clustering performance. To effectively exploit data correlation consensus among multi-views, in this paper, we study subspace clustering for multi-view data while keeping individual views well encapsulated. For characterizing...

10.1109/tip.2015.2457339 article EN IEEE Transactions on Image Processing 2015-07-16

With the increasing amount of data and the need to integrate data from multiple data sources, one of the challenging issues is to identify near-duplicate records efficiently. In this article, we focus on efficient algorithms to find a pair of records such that their similarities are no less than a given threshold. Several existing algorithms rely on the prefix filtering principle to avoid computing similarity values for all possible pairs of records. We propose new filtering techniques by exploiting the token ordering information; they are integrated into the existing methods...

10.1145/2000824.2000825 article EN ACM Transactions on Database Systems 2011-08-01

Nearest neighbor search is a fundamental and essential operation in applications from many domains, such as databases, machine learning, multimedia, and computer vision. Because exact searching results are not efficient for a high-dimensional space, a lot of efforts have turned to approximate nearest neighbor search. Although many algorithms have been continuously proposed in the literature each year, there is no comprehensive evaluation and analysis of their performance. In this paper, we conduct a comprehensive experimental evaluation of many state-of-the-art...

10.1109/tkde.2019.2909204 article EN IEEE Transactions on Knowledge and Data Engineering 2019-04-03

In this paper, we study the problem of subgraph matching that extracts all subgraph isomorphic embeddings of a query graph q in a large data graph G. The existing algorithms for subgraph matching follow Ullmann's backtracking approach; that is, iteratively map query vertices to data vertices by following a matching order of query vertices. It has been shown that the matching order of query vertices is a very important aspect to the efficiency of a subgraph matching algorithm. Recently, many advanced techniques, such as enforcing connectivity and merging similar vertices in query or data graphs, have been proposed to provide an effective matching order with the aim to reduce unpromising...

10.1145/2882903.2915236 article EN Proceedings of the 2016 International Conference on Management of Data 2016-06-14

Given a query photo issued by a user (q-user), the landmark retrieval is to return a set of photos with their landmarks similar to those of the query, while the existing studies on the landmark retrieval focus on exploiting geometries of landmarks for similarity matches between candidate photos and a query photo. We observe that the same landmarks provided by different users over social media community may convey different geometry information depending on the viewpoints and/or angles, and may, subsequently, yield very different results. In fact, dealing with the landmarks with low quality shapes caused by the photography of q-users is often...

10.1109/tip.2017.2655449 article EN IEEE Transactions on Image Processing 2017-01-18

Uncertain data is inherent in a few important applications such as environmental surveillance and mobile object tracking. Top-k queries (also known as ranking queries) are often natural and useful in analyzing uncertain data in those applications. In this paper, we study the problem of answering probabilistic threshold top-k queries on uncertain data, which computes uncertain records taking a probability of at least p to be in the top-k list where p is a user specified probability threshold. We present an efficient exact algorithm, a fast sampling algorithm, and a Poisson approximation based...

10.1145/1376616.1376685 article EN 2008-06-09

We consider the problem of efficiently computing the skyline against the most recent N elements in a data stream seen so far. Specifically, we study the n-of-N skyline queries; that is, computing the skyline for the most recent n (∀n ≤ N) elements. Firstly, we developed an effective pruning technique to minimize the number of elements to be kept. It can be shown that on average storing only O(log^d N) elements from the most recent N elements is sufficient to support the precise computation of all n-of-N skyline queries in a d-dimension space if the data distribution on each dimension is independent. Then, a novel encoding scheme...

10.1109/icde.2005.137 article EN 2005-04-19

There has been considerable interest in similarity join in the research community recently. Similarity join is a fundamental operation in many application areas, such as data integration and cleaning, bioinformatics, and pattern recognition. We focus on efficient algorithms for similarity join with edit distance constraints. Existing approaches are mainly based on converting the edit distance constraint to a weaker constraint on the number of matching q-grams between pair of strings. In this paper, we propose the novel perspective of investigating mismatching q-grams....

10.14778/1453856.1453957 article EN Proceedings of the VLDB Endowment 2008-08-01
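
The "weaker constraint on the number of matching q-grams" that the abstract above attributes to existing approaches is commonly stated as a count filter: two strings within edit distance tau share at least max(|s|, |t|) - q + 1 - q * tau q-grams, because one edit operation can destroy at most q of them. The sketch below applies that filter before an exact edit-distance check. It illustrates the existing matching-q-gram approach only, not the mismatching-q-gram technique the paper proposes, and the function names are hypothetical.

```python
from collections import Counter


def qgrams(s, q):
    """Multiset of the (unpadded) q-grams of s."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def edit_distance(s, t):
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (cs != ct)))
        prev = cur
    return prev[-1]


def passes_count_filter(s, t, q, tau):
    """Necessary condition for edit_distance(s, t) <= tau."""
    required = max(len(s), len(t)) - q + 1 - q * tau
    if required <= 0:
        return True  # the bound is vacuous, so the pair cannot be pruned
    shared = sum((qgrams(s, q) & qgrams(t, q)).values())
    return shared >= required


def edit_similarity_join(strings, q, tau):
    """All pairs (i, j), i < j, with edit distance at most tau."""
    out = []
    for i in range(len(strings)):
        for j in range(i + 1, len(strings)):
            s, t = strings[i], strings[j]
            if abs(len(s) - len(t)) > tau:  # length filter
                continue
            if passes_count_filter(s, t, q, tau) and edit_distance(s, t) <= tau:
                out.append((i, j))
    return out


if __name__ == "__main__":
    words = ["kitten", "sitten", "sitting", "mitten", "banana"]
    print(edit_similarity_join(words, q=2, tau=1))
```

The nested loop over all pairs is kept for readability; a practical join would use an inverted index over q-grams to generate candidates.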

Given an integer k, a representative skyline contains the k skyline points that best describe the tradeoffs among different dimensions offered by the full skyline. Although this topic has been previously studied, the existing solution may sometimes produce k points that appear in an arbitrarily tiny cluster, and therefore, fail to be representative. Motivated by this, we propose a new definition of representative skyline that minimizes the distance between a non-representative skyline point and its nearest representative. We also study algorithms for computing distance-based representative skylines. In 2D...

10.1109/icde.2009.84 article EN Proceedings - International Conference on Data Engineering 2009-03-01

10.1016/0020-0190(93)90079-o article EN Information Processing Letters 1993-10-01

Similarity join is a useful primitive operation underlying many applications, such as near duplicate Web page detection, data integration, and pattern recognition. Traditional similarity joins require a user to specify a similarity threshold. In this paper, we study a variant of the similarity join, termed top-k set similarity join. It returns the top-k pairs of records ranked by their similarities, thus eliminating the guess work users have to perform when the similarity threshold is unknown before hand. An algorithm, topk-join, is proposed to answer top-k similarity join efficiently. It is based on...

10.1109/icde.2009.111 article EN Proceedings - International Conference on Data Engineering 2009-03-01

Empowering users to access databases using simple keywords can relieve the users from the steep learning curve of mastering a structured query language and understanding complex and possibly fast evolving data schemas. In this tutorial, we give an overview of the state-of-the-art techniques for supporting keyword search on structured and semi-structured data, including query result definition, ranking functions, result generation and top-k query processing, snippet generation, result clustering, query cleaning, performance optimization, and search quality evaluation....

10.1145/1559845.1559966 article EN 2009-06-29

Clustering on uncertain data, one of the essential tasks in mining uncertain data, posts significant challenges on both modeling similarity between uncertain objects and developing efficient computational methods. The previous methods extend traditional partitioning clustering methods like k-means and density-based clustering methods like DBSCAN to uncertain data, thus rely on geometric distances between objects. Such methods cannot handle uncertain objects that are geometrically indistinguishable, such as products with the same mean but very different variances in customer ratings. Surprisingly, probability...

10.1109/tkde.2011.221 article EN IEEE Transactions on Knowledge and Data Engineering 2011-10-18

Query processing on uncertain data streams has attracted a lot of attention lately, due to the imprecise nature in the data generated from a variety of streaming applications, such as readings from a sensor network. However, all of the existing works on uncertain data streams study unbounded streams. This paper takes the first step towards the important and challenging problem of answering sliding-window queries on uncertain data streams, with a focus on arguably one of the most important types of queries: top-k queries. The challenge of answering sliding-window top-k queries on uncertain data streams stems from the strict space and time requirements of processing both arriving...

10.14778/1453856.1453892 article EN Proceedings of the VLDB Endowment 2008-08-01

As graph data is prevalent for an increasing number of Internet applications, continuously monitoring structural patterns in dynamic graphs in order to generate real-time alerts and trigger prompt actions becomes critical for many applications. In this paper, we present a new system GraphS to efficiently detect constrained cycles in a dynamic graph, which is changing constantly, and return the satisfying cycles in real-time. A hot point based index is built and maintained for each query so as to greatly speed-up query time and achieve high throughput....

10.14778/3229863.3229874 article EN Proceedings of the VLDB Endowment 2018-08-01

The CPU cache performance is one of the key issues to efficiency in database systems. It is reported that cache miss latency takes a half of the execution time in database systems. To improve the CPU cache performance, there are studies to support searching including cache-oblivious, and cache-conscious trees. In this paper, we focus on CPU speedup for graph computing in general by reducing the CPU cache miss ratio for different graph algorithms. The approaches dealing with trees are not applicable to graphs which are complex in nature.

10.1145/2882903.2915220 article EN Proceedings of the 2016 International Conference on Management of Data 2016-06-16