NFDI4DS | UHH-SEMS - Publication Details

Xuemin Lin

ORCID: 0000-0003-2396-7225

About

Contact & Profiles

A5079659938

Research Areas

Data Management and Algorithms
Graph Theory and Algorithms
Advanced Graph Neural Networks
Complex Network Analysis Techniques
Advanced Database Systems and Queries
Caching and Content Delivery
Data Mining Algorithms and Applications
Advanced Graph Theory Research
Geographic Information Systems Studies
Advanced Image and Video Retrieval Techniques
Algorithms and Data Compression
Data Quality and Management
Web Data Mining and Analysis
Automated Road and Building Extraction
Complexity and Algorithms in Graphs
Constraint Satisfaction and Optimization
Peer-to-Peer Network Technologies
Computational Geometry and Mesh Generation
Optimization and Search Problems
Topic Modeling
Data Visualization and Analytics
Privacy-Preserving Technologies in Data
Distributed systems and fault tolerance
Human Mobility and Location-Based Analysis
Semantic Web and Ontologies

Shanghai Jiao Tong University
2022-2025

Foshan University
2024-2025

Sun Yat-sen Memorial Hospital
2024

Sun Yat-sen University
2024

UNSW Sydney
2014-2023

East China Normal University
2013-2022

Shanghai Key Laboratory of Trustworthy Computing
2021

Beijing Urban Construction Design & Development Group (China)
2021

University of Technology Sydney
2013-2020

Zhejiang Lab
2018-2019

Efficient similarity joins for near duplicate detection

OPENALEX - Publications

Chuan Xiao Wei Wang Xuemin Lin Jeffrey Xu Yu

With the increasing amount of data and the need to integrate data from multiple sources, a challenging issue is to find near duplicate records efficiently. In this paper, we focus on efficient algorithms to find pairs of records such that their similarities are above a given threshold. Several existing algorithms rely on the prefix filtering principle to avoid computing similarity values for all possible pairs of records. We propose new filtering techniques by exploiting the ordering information; they are integrated into the existing methods and drastically reduce the candidate sizes and hence...

10.1145/1367497.1367516 article EN 2008-04-21
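
The prefix filtering principle mentioned in the abstract above can be illustrated with a small sketch. This is not the paper's algorithm, only the baseline idea it builds on, assuming Jaccard similarity over token sets, a threshold in (0, 1], and a global token order of ascending frequency. The function names `jaccard` and `prefix_filter_join` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def jaccard(a, b):
    """Jaccard similarity of two token collections, treated as sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


def prefix_filter_join(records, threshold):
    """Return index pairs (j, i), j < i, with Jaccard similarity >= threshold.

    Assumption: 0 < threshold <= 1. Only records that share a token in
    their prefixes are verified, so most pairs are never compared.
    """
    # Global order: rare tokens first, ties broken alphabetically.
    freq = Counter(tok for rec in records for tok in set(rec))
    ordered = [sorted(set(rec), key=lambda t: (freq[t], t)) for rec in records]

    index = defaultdict(list)  # prefix token -> ids of earlier records
    results = []
    for i, rec in enumerate(ordered):
        # Two records with similarity >= threshold must share a token
        # within their first |x| - ceil(threshold * |x|) + 1 tokens.
        # The small epsilon guards against floating-point round-up.
        prefix_len = len(rec) - math.ceil(threshold * len(rec) - 1e-9) + 1
        candidates = set()
        for tok in rec[:prefix_len]:
            candidates.update(index[tok])
            index[tok].append(i)
        # Verification step: compute the exact similarity for candidates only.
        for j in sorted(candidates):
            if jaccard(rec, ordered[j]) >= threshold:
                results.append((j, i))
    return results
```

For example, `prefix_filter_join([["a", "b", "c", "d"], ["a", "b", "c", "e"], ["x", "y", "z"]], 0.5)` verifies only the first two records against each other; the third never becomes a candidate because it shares no prefix token with them.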

Selecting Stars: The k Most Representative Skyline Operator

Xuemin Lin Yidong Yuan Qing Zhang Ying Zhang

Skyline computation has many applications including multi-criteria decision making. In this paper, we study the problem of selecting k skyline points so that the number of points, which are dominated by at least one of these k skyline points, is maximized. We first present an efficient dynamic programming based exact algorithm in a 2d-space. Then, we show that the problem is NP-hard when the dimensionality is 3 or more and that it can be approximately solved by a polynomial time algorithm with a guaranteed approximation ratio of 1-1/e. To speed-up the computation, an efficient,...

10.1109/icde.2007.367854 article EN 2007-04-01
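
The selection problem described in the abstract above is a form of maximum coverage, for which the standard greedy heuristic is known to reach a 1-1/e ratio. The sketch below shows that generic greedy selection, not necessarily the algorithm from the paper. It assumes points are tuples and that smaller values are better in every dimension; `dominates`, `skyline`, and `greedy_representative_skyline` are hypothetical names.

```python
def dominates(p, q):
    """True if p is no worse than q everywhere and strictly better somewhere."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points):
    """Points not dominated by any other point (duplicates collapsed)."""
    return list(dict.fromkeys(
        p for p in points if not any(dominates(q, p) for q in points)
    ))


def greedy_representative_skyline(points, k):
    """Pick up to k skyline points, each time taking the one that
    dominates the most points not yet covered by earlier picks."""
    sky = skyline(points)
    # For each skyline point, the indices of the points it dominates.
    coverage = {
        s: {i for i, p in enumerate(points) if dominates(s, p)} for s in sky
    }
    chosen, covered = [], set()
    for _ in range(min(k, len(sky))):
        best = max(
            (s for s in sky if s not in chosen),
            key=lambda s: len(coverage[s] - covered),
        )
        chosen.append(best)
        covered |= coverage[best]
    return chosen, len(covered)
```

The quadratic dominance checks keep the sketch short; they are not meant to reflect the efficiency techniques the abstract alludes to.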

Multiview Spectral Clustering via Structured Low-Rank Matrix Factorization

Yang Wang Lin Wu Xuemin Lin Junbin Gao

Multiview data clustering attracts more attention than its single-view counterpart because leveraging multiple independent and complementary sources of information from multiview feature spaces outperforms using a single one. Multiview spectral clustering aims at yielding a data partition agreement over the local manifold structures by seeking eigenvalue-eigenvector decompositions. Among all the methods, low-rank representation (LRR) is effective, exploring the multiview consensus structures beyond low rankness to boost the clustering performance. However, as we observed,...

10.1109/tnnls.2017.2777489 article EN IEEE Transactions on Neural Networks and Learning Systems 2018-01-04

Taming verification hardness

Haichuan Shang Ying Zhang Xuemin Lin Jeffrey Xu Yu

Graphs are widely used to model complicated data semantics in many applications. In this paper, we aim to develop efficient techniques to retrieve graphs, containing a given query graph, from a large set of graphs. Considering that the problem of testing subgraph isomorphism is generally NP-hard, most of the existing techniques are based on the framework of filtering-and-verification to reduce the precise computation costs; consequently various novel feature-based indexes have been developed. While the existing techniques work well for small query graphs, the verification phase becomes...

10.14778/1453856.1453899 article EN Proceedings of the VLDB Endowment 2008-08-01

Finding Top-k Min-Cost Connected Trees in Databases

Bolin Ding Jeffrey Xu Yu Shan Wang Lu Qin Xiao Zhang and 1 more

It is widely realized that the integration of database and information retrieval techniques will provide users with a wide range of high quality services. In this paper, we study processing an l-keyword query, p1, p2, ···, pl, against a relational database which can be modeled as a weighted graph, G(V, E). Here V is a set of nodes (tuples) and E is a set of edges representing foreign key references between tuples. Let Vi be the set of nodes that contain the keyword pi. We study finding top-k minimum cost connected trees that contain at least one node in every subset Vi, and denote...

10.1109/icde.2007.367929 article EN 2007-04-01

Spark

Yi Luo Xuemin Lin Wei Wang Xiaofang Zhou

With the increasing amount of text data stored in relational databases, there is a demand for RDBMS to support keyword queries over text data. As a search result is often assembled from multiple relational tables, traditional IR-style ranking and query evaluation methods cannot be applied directly.

10.1145/1247480.1247495 article EN 2007-06-11

Robust Subspace Clustering for Multi-View Data by Exploiting Correlation Consensus

Yang Wang Xuemin Lin Lin Wu Wenjie Zhang Qing Zhang and 1 more

More often than not, multimedia data described by multiple features, such as color and shape, can be naturally decomposed into multi-views. Since multi-views provide complementary information to each other, great endeavors have been dedicated to leveraging multiple views instead of a single view to achieve better clustering performance. To effectively exploit the correlation consensus among multi-views, in this paper, we study subspace clustering for multi-view data while keeping individual views well encapsulated. For characterizing...

10.1109/tip.2015.2457339 article EN IEEE Transactions on Image Processing 2015-07-16

Efficient similarity joins for near-duplicate detection

Chuan Xiao Wei Wang Xuemin Lin Jeffrey Xu Yu Guoren Wang

With the increasing amount of data and the need to integrate data from multiple sources, one of the challenging issues is to identify near-duplicate records efficiently. In this article, we focus on efficient algorithms to find pairs of records such that their similarities are no less than a given threshold. Several existing algorithms rely on the prefix filtering principle to avoid computing similarity values for all possible pairs of records. We propose new filtering techniques by exploiting the token ordering information; they are integrated into the existing methods...

10.1145/2000824.2000825 article EN ACM Transactions on Database Systems 2011-08-01

Approximate Nearest Neighbor Search on High Dimensional Data — Experiments, Analyses, and Improvement

Wen Li Ying Zhang Yifang Sun Wei Wang Mingjie Li and 2 more

Nearest neighbor search is a fundamental and essential operation in applications from many domains, such as databases, machine learning, multimedia, and computer vision. Because exact search is not efficient in high-dimensional space, a lot of effort has turned to approximate nearest neighbor search. Although many algorithms have been continuously proposed in the literature each year, there is no comprehensive evaluation and analysis of their performance. In this paper, we conduct a comprehensive experimental evaluation of many state-of-the-art...

10.1109/tkde.2019.2909204 article EN IEEE Transactions on Knowledge and Data Engineering 2019-04-03

A survey of community search over big graphs

Yixiang Fang Xin Huang Lu Qin Ying Zhang Wenjie Zhang and 2 more

10.1007/s00778-019-00556-x article EN The VLDB Journal 2019-07-20

Efficient Subgraph Matching by Postponing Cartesian Products

Fei Bi Lijun Chang Xuemin Lin Lu Qin Wenjie Zhang

In this paper, we study the problem of subgraph matching that extracts all subgraph isomorphic embeddings of a query graph q in a large data graph G. The existing algorithms for subgraph matching follow Ullmann's backtracking approach; that is, they iteratively map query vertices to data vertices by following a matching order of query vertices. It has been shown that the matching order of query vertices is a very important aspect of the efficiency of a subgraph matching algorithm. Recently, many advanced techniques, such as enforcing connectivity and merging similar vertices in a query or a data graph, have been proposed to provide an effective matching order with the aim to reduce unpromising...

10.1145/2882903.2915236 article EN Proceedings of the 2016 International Conference on Management of Data 2016-06-14
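
The backtracking approach that the abstract above attributes to existing algorithms can be sketched as follows. This is a plain illustration of mapping query vertices to data vertices one at a time, not the method proposed in the paper. It assumes undirected, unlabeled graphs given as adjacency dictionaries, and it uses dictionary insertion order as the matching order, which is an arbitrary choice; `subgraph_embeddings` is a hypothetical name.

```python
def subgraph_embeddings(query, data):
    """Yield every injective mapping of query vertices to data vertices
    under which each query edge maps onto a data edge.

    Both graphs are dicts: vertex -> set of neighbouring vertices.
    """
    order = list(query)  # matching order (assumption: insertion order)

    def extend(mapping):
        if len(mapping) == len(order):
            yield dict(mapping)  # complete embedding found
            return
        u = order[len(mapping)]  # next query vertex to map
        for v in data:
            if v in mapping.values():
                continue  # keep the mapping injective
            # Every already-mapped neighbour of u must map to a neighbour of v.
            if all(mapping[w] in data[v] for w in query[u] if w in mapping):
                mapping[u] = v
                yield from extend(mapping)
                del mapping[u]  # backtrack

    yield from extend({})
```

Because the order decides how early a partial mapping can be rejected, a poor order makes the search explore many partial mappings that never complete, which is the cost the abstract refers to.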

Effective Multi-Query Expansions: Collaborative Deep Networks for Robust Landmark Retrieval

Yang Wang Xuemin Lin Lin Wu Wenjie Zhang

Given a query photo issued by a user (q-user), landmark retrieval is to return a set of photos with landmarks similar to those of the query, while the existing studies on landmark retrieval focus on exploiting the geometries of landmarks for similarity matches between candidate photos and a query photo. We observe that the same landmarks provided by different users over a social media community may convey different geometry information depending on the viewpoints and/or angles, and may, subsequently, yield very different results. In fact, dealing with landmarks with low quality shapes caused by the photography of q-users is often...

10.1109/tip.2017.2655449 article EN IEEE Transactions on Image Processing 2017-01-18

Ranking queries on uncertain data

Hua Ming Jian Pei Wenjie Zhang Xuemin Lin

Uncertain data is inherent in a few important applications such as environmental surveillance and mobile object tracking. Top-k queries (also known as ranking queries) are often natural and useful in analyzing uncertain data in those applications. In this paper, we study the problem of answering probabilistic threshold top-k queries on uncertain data, which computes the uncertain records taking a probability of at least p to be in the top-k list, where p is a user specified probability threshold. We present an efficient exact algorithm, a fast sampling algorithm, and a Poisson approximation based...

10.1145/1376616.1376685 article EN 2008-06-09

Stabbing the Sky: Efficient Skyline Computation over Sliding Windows

Xuemin Lin Yidong Yuan Wei Wang Hongjun Lü

We consider the problem of efficiently computing the skyline against the most recent N elements in a data stream seen so far. Specifically, we study the n-of-N skyline queries; that is, computing the skyline for the most recent n (∀n ≤ N) elements. Firstly, we developed an effective pruning technique to minimize the number of elements to be kept. It can be shown that on average storing only O(log^d N) elements from the most recent N elements is sufficient to support the precise computation of all n-of-N skyline queries in a d-dimension space if the data distribution on each dimension is independent. Then, a novel encoding scheme...

10.1109/icde.2005.137 article EN 2005-04-19

Ed-Join

Chuan Xiao Wei Wang Xuemin Lin

There has been considerable interest in similarity join in the research community recently. Similarity join is a fundamental operation in many application areas, such as data integration and cleaning, bioinformatics, and pattern recognition. We focus on efficient algorithms for similarity join with edit distance constraints. Existing approaches are mainly based on converting the edit distance constraint to a weaker constraint on the number of matching q-grams between a pair of strings. In this paper, we propose the novel perspective of investigating mismatching q-grams....

10.14778/1453856.1453957 article EN Proceedings of the VLDB Endowment 2008-08-01
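
The "weaker constraint on the number of matching q-grams" that the abstract above ascribes to existing approaches is commonly known as count filtering. The sketch below shows that baseline only, not the mismatching q-gram technique the paper proposes. It relies on the standard bound that two strings within edit distance tau share at least max(|s|, |t|) - q + 1 - q·tau q-grams; all function names are hypothetical.

```python
from collections import Counter


def qgrams(s, q):
    """Multiset of all length-q substrings of s."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def edit_distance(s, t):
    """Levenshtein distance computed row by row."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (cs != ct)))  # substitution or match
        prev = cur
    return prev[-1]


def passes_count_filter(s, t, q, tau):
    """Necessary condition for edit_distance(s, t) <= tau."""
    required = max(len(s), len(t)) - q + 1 - q * tau
    shared = sum((qgrams(s, q) & qgrams(t, q)).values())
    return shared >= required


def within_edit_distance(s, t, q, tau):
    """Cheap filters first, exact verification only for survivors."""
    if abs(len(s) - len(t)) > tau:
        return False  # length filter
    return passes_count_filter(s, t, q, tau) and edit_distance(s, t) <= tau
```

The filter never rejects a true match, but it can let non-matches through, which is why the exact edit distance is still computed for the pairs that survive.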

Distance-Based Representative Skyline

Yufei Tao Ling Ding Xuemin Lin Jian Pei

Given an integer k, a representative skyline contains the k skyline points that best describe the tradeoffs among different dimensions offered by the full skyline. Although this topic has been previously studied, the existing solution may sometimes produce k points that appear in an arbitrarily tiny cluster, and therefore, fail to be representative. Motivated by this, we propose a new definition of representative skyline that minimizes the distance between a non-representative skyline point and its nearest representative. We also study algorithms for computing distance-based representative skylines. In 2D...

10.1109/icde.2009.84 article EN Proceedings - International Conference on Data Engineering 2009-03-01

A fast and effective heuristic for the feedback arc set problem

Peter Eades Xuemin Lin W.F. Smyth

10.1016/0020-0190(93)90079-o article EN Information Processing Letters 1993-10-01

Top-k Set Similarity Joins

Chuan Xiao Wei Wang Xuemin Lin Haichuan Shang

Similarity join is a useful primitive operation underlying many applications, such as near duplicate Web page detection, data integration, and pattern recognition. Traditional similarity joins require a user to specify a similarity threshold. In this paper, we study a variant of the similarity join, termed top-k set similarity join. It returns the top-k pairs of records ranked by their similarities, thus eliminating the guess work users have to perform when the similarity threshold is unknown beforehand. An algorithm, topk-join, is proposed to answer top-k similarity join efficiently. It is based on...

10.1109/icde.2009.111 article EN Proceedings - International Conference on Data Engineering 2009-03-01

Keyword search on structured and semi-structured data

Yi Chen Wei Wang Ziyang Liu Xuemin Lin

Empowering users to access databases using simple keywords can relieve the users from the steep learning curve of mastering a structured query language and understanding complex and possibly fast evolving data schemas. In this tutorial, we give an overview of the state-of-the-art techniques for supporting keyword search on structured and semi-structured data, including query result definition, ranking functions, result generation and top-k query processing, snippet generation, result clustering, query cleaning, performance optimization, and search quality evaluation....

10.1145/1559845.1559966 article EN 2009-06-29

Clustering Uncertain Data Based on Probability Distribution Similarity

Bin Jiang Jian Pei Yufei Tao Xuemin Lin

Clustering on uncertain data, one of the essential tasks in mining uncertain data, poses significant challenges in both modeling similarity between uncertain objects and developing efficient computational methods. The previous methods extend traditional partitioning clustering methods like k-means and density-based clustering methods like DBSCAN to uncertain data, and thus rely on geometric distances between objects. Such methods cannot handle uncertain objects that are geometrically indistinguishable, such as products with the same mean but very different variances in customer ratings. Surprisingly, probability...

10.1109/tkde.2011.221 article EN IEEE Transactions on Knowledge and Data Engineering 2011-10-18

Sliding-window top-k queries on uncertain streams

Cheqing Jin Ke Yi Lei Chen Jeffrey Xu Yu Xuemin Lin

Query processing on uncertain data streams has attracted a lot of attention lately, due to the imprecise nature of the data generated from a variety of streaming applications, such as readings from a sensor network. However, all the existing works on uncertain data streams study unbounded streams. This paper takes the first step towards the important and challenging problem of answering sliding-window queries on uncertain data streams, with a focus on arguably one of the most important types of queries: top-k queries. The challenge of answering sliding-window top-k queries on uncertain data streams stems from the strict space and time requirements of processing both arriving...

10.14778/1453856.1453892 article EN Proceedings of the VLDB Endowment 2008-08-01

Real-time constrained cycle detection in large dynamic graphs

Xiafei Qiu Wubin Cen Zhengping Qian You Peng Ying Zhang and 2 more

As graph data is prevalent for an increasing number of Internet applications, continuously monitoring structural patterns in dynamic graphs in order to generate real-time alerts and trigger prompt actions becomes critical for many applications. In this paper, we present a new system GraphS to efficiently detect constrained cycles in a dynamic graph, which is changing constantly, and return the satisfying cycles in real-time. A hot point based index is built and maintained for each query so as to greatly speed-up query time and achieve high system throughput....

10.14778/3229863.3229874 article EN Proceedings of the VLDB Endowment 2018-08-01

Speedup Graph Processing by Graph Ordering

Hao Wei Jeffrey Xu Yu Can Lu Xuemin Lin

The CPU cache performance is one of the key issues for efficiency in database systems. It is reported that cache miss latency takes half of the execution time in database systems. To improve the CPU cache performance, there are studies to support searching, including cache-oblivious and cache-conscious trees. In this paper, we focus on CPU cache speedup for graph computing in general by reducing the CPU cache miss ratio for different graph algorithms. The approaches dealing with trees are not applicable to graphs, which are complex in nature.

10.1145/2882903.2915220 article EN Proceedings of the 2016 International Conference on Management of Data 2016-06-16
