NFDI4DS | UHH-SEMS - Publication Details

LOF

OPENALEX - Publications

Markus Breunig Hans‐Peter Kriegel Raymond T. Ng Jörg Sander

For many KDD applications, such as detecting criminal activities in E-commerce, finding the rare instances or the outliers can be more interesting than finding the common patterns. Existing work in outlier detection regards being an outlier as a binary property. In this paper, we contend that for many scenarios, it is more meaningful to assign to each object a degree of being an outlier. This degree is called the local outlier factor (LOF) of an object. It is local in that the degree depends on how isolated the object is with respect to the surrounding neighborhood. We give a detailed formal analysis showing that LOF enjoys...

10.1145/335191.335388 article EN ACM SIGMOD Record 2000-05-16

OPTICS

OPENALEX - Publications

Mihael Ankerst Markus Breunig Hans‐Peter Kriegel Jörg Sander

Cluster analysis is a primary method for database mining. It is either used as a stand-alone tool to get insight into the distribution of a data set, e.g. to focus further analysis and data processing, or as a preprocessing step for other algorithms operating on the detected clusters. Almost all of the well-known clustering algorithms require input parameters which are hard to determine but have a significant influence on the clustering result. Furthermore, for many real-data sets there does not even exist a global parameter setting for which the result of the clustering algorithm describes the intrinsic...

10.1145/304181.304187 article EN ACM SIGMOD Record 1999-06-01

LOF

OPENALEX - Publications

Markus Breunig Hans‐Peter Kriegel Raymond T. Ng Jörg Sander

For many KDD applications, such as detecting criminal activities in E-commerce, finding the rare instances or the outliers can be more interesting than finding the common patterns. Existing work in outlier detection regards being an outlier as a binary property. In this paper, we contend that for many scenarios, it is more meaningful to assign to each object a degree of being an outlier. This degree is called the local outlier factor (LOF) of an object. It is local in that the degree depends on how isolated the object is with respect to the surrounding neighborhood. We give a detailed formal analysis showing that LOF enjoys...

10.1145/342009.335388 article EN 2000-05-16

DBSCAN Revisited, Revisited

OPENALEX - Publications

Erich Schubert Jörg Sander Martin Ester Hans‐Peter Kriegel Xiaowei Xu

At SIGMOD 2015, an article was presented with the title “DBSCAN Revisited: Mis-Claim, Un-Fixability, and Approximation” that won the conference’s best paper award. In this technical correspondence, we want to point out some inaccuracies in the way DBSCAN was represented, and why the criticism should have been directed at the assumption about the performance of spatial index structures such as R-trees and not at an algorithm that can use such indexes. We will also discuss the relationship of DBSCAN performance and the indexability of the dataset, and discuss some heuristics for choosing...

10.1145/3068335 article EN ACM Transactions on Database Systems 2017-07-31

OPENALEX - Publications

Jörg Sander Martin Ester Hans‐Peter Kriegel Xiaowei Xu

10.1023/a:1009745219419 article EN Data Mining and Knowledge Discovery 1998-01-01

OPTICS

OPENALEX - Publications

Mihael Ankerst Markus Breunig Hans‐Peter Kriegel Jörg Sander

Cluster analysis is a primary method for database mining. It is either used as a stand-alone tool to get insight into the distribution of a data set, e.g. to focus further analysis and data processing, or as a preprocessing step for other algorithms operating on the detected clusters. Almost all of the well-known clustering algorithms require input parameters which are hard to determine but have a significant influence on the clustering result. Furthermore, for many real-data sets there does not even exist a global parameter setting for which the result of the clustering algorithm describes the intrinsic...

10.1145/304182.304187 article EN 1999-06-01

Density‐based clustering

OPENALEX - Publications

Hans‐Peter Kriegel Peer Kröger Jörg Sander Arthur Zimek

Clustering refers to the task of identifying groups or clusters in a data set. In density‐based clustering, a cluster is a set of data objects spread in the data space over a contiguous region of high density of objects. Density‐based clusters are separated from each other by contiguous regions of low density of objects. Data objects located in low‐density regions are typically considered noise or outliers. © 2011 John Wiley & Sons, Inc. WIREs Data Mining Knowl Discov 2011, 1, 231–240.

10.1002/widm.30 article EN Wiley Interdisciplinary Reviews Data Mining and Knowledge Discovery 2011-04-05

On the evaluation of unsupervised outlier detection: measures, datasets, and an empirical study

OPENALEX - Publications

Guilherme O. Campos Arthur Zimek Jörg Sander Ricardo J. G. B. Campello Barbora Micenková and 3 more

10.1007/s10618-015-0444-8 article EN Data Mining and Knowledge Discovery 2016-01-16

Hierarchical Density Estimates for Data Clustering, Visualization, and Outlier Detection

OPENALEX - Publications

Ricardo J. G. B. Campello Davoud Moulavi Arthur Zimek Jörg Sander

An integrated framework for density-based cluster analysis, outlier detection, and data visualization is introduced in this article. The main module consists of an algorithm to compute hierarchical estimates of the level sets of a density, following Hartigan’s classic model of density-contour clusters and trees. Such an algorithm generalizes and improves existing density-based clustering techniques with respect to different aspects. It provides as a result a complete clustering hierarchy composed of all possible density-based clusters following the nonparametric model adopted, for an infinite range...

10.1145/2733381 article EN ACM Transactions on Knowledge Discovery from Data 2015-07-22

Ensembles for unsupervised outlier detection

OPENALEX - Publications

Arthur Zimek Ricardo J. G. B. Campello Jörg Sander

Ensembles for unsupervised outlier detection is an emerging topic that has been neglected for a surprisingly long time (although there are reasons why this is more difficult than supervised ensembles or even clustering ensembles). Aggarwal recently discussed algorithmic patterns of outlier detection ensembles, identified traces of the idea in the literature, and remarked on potential as well as unlikely avenues for future transfer of concepts from supervised ensembles. Complementary to his points, here we focus on the core ingredients for building...

10.1145/2594473.2594476 article EN ACM SIGKDD Explorations Newsletter 2014-03-17

Density-Based Clustering Validation

OPENALEX - Publications

Davoud Moulavi Pablo Andretta Jaskowiak Ricardo J. G. B. Campello Arthur Zimek Jörg Sander

Proceedings of the 2014 SIAM International Conference on Data Mining (SDM), pp. 839-847. One of the most challenging aspects of clustering is validation, which is the objective...

10.1137/1.9781611973440.96 article EN 2014-04-28

A distribution-based clustering algorithm for mining in large spatial databases

OPENALEX - Publications

Xiaowei Xu Martin Ester H.-P. Kriegel Jörg Sander

The problem of detecting clusters of points belonging to a spatial point process arises in many applications. In this paper, we introduce the new clustering algorithm DBCLASD (Distribution-Based Clustering of LArge Spatial Databases) to discover clusters of this type. The results of experiments demonstrate that DBCLASD, contrary to partitioning algorithms such as CLARANS (Clustering Large Applications based on RANdomized Search), discovers clusters of arbitrary shape. Furthermore, DBCLASD does not require any input parameters, in contrast to the clustering algorithm DBSCAN...

10.1109/icde.1998.655795 article EN 2002-11-27

Independent quantization: an index compression technique for high-dimensional data spaces

OPENALEX - Publications

Stefan Berchtold Christian Böhm H. V. Jagadish H.-P. Kriegel Jörg Sander

Two major approaches have been proposed to efficiently process queries in databases: speeding up the search by using index structures, and speeding up the search by operating on a compressed database, such as a signature file. Both approaches have their limitations: indexing techniques are inefficient in extreme configurations, such as high-dimensional spaces, where even a simple scan may be cheaper than an index-based search. Compression techniques are not very efficient in all other situations. We propose to combine both techniques to search for nearest neighbors in a high-dimensional space. For this...

10.1109/icde.2000.839456 article EN 2002-11-07

OPENALEX - Publications

Martin Ester Alexander Frommelt Hans‐Peter Kriegel Jörg Sander

10.1023/a:1009843930701 article EN Data Mining and Knowledge Discovery 2000-01-01

Subsampling for efficient and effective unsupervised outlier detection ensembles

OPENALEX - Publications

Arthur Zimek Matthew Gaudet Ricardo J. G. B. Campello Jörg Sander

Outlier detection and ensemble learning are well established research directions in data mining, yet the application of ensemble techniques to outlier detection has been rarely studied. Here, we propose and study subsampling as a technique to induce diversity among individual outlier detectors. We show analytically and experimentally that an outlier detector based on a subsample per se, besides inducing diversity, can, under certain conditions, already improve upon the results of the same outlier detector on the complete dataset. Building an ensemble on top of several subsamples is further...

10.1145/2487575.2487676 article EN 2013-08-11

Density‐based clustering

OPENALEX - Publications

Ricardo J. G. B. Campello Peer Kröger Jörg Sander Arthur Zimek

Clustering refers to the task of identifying groups or clusters in a data set. In density‐based clustering, a cluster is a set of data objects spread in the data space over a contiguous region of high density of objects. Density‐based clusters are separated from each other by contiguous regions of low density of objects. Data objects located in low‐density regions are typically considered noise or outliers. In this review article we discuss the statistical notion of density‐based clusters, classic algorithms for deriving a flat partitioning of density‐based clusters, methods for hierarchical density‐based clustering, and methods for semi‐supervised clustering....

10.1002/widm.1343 article EN Wiley Interdisciplinary Reviews Data Mining and Knowledge Discovery 2019-10-29

Analysis of SIGMOD's co-authorship graph

OPENALEX - Publications

Mário A. Nascimento Jörg Sander Jeffrey Pound

In this paper we investigate the co-authorship graph obtained from all papers published at SIGMOD between 1975 and 2002. We find some interesting facts, for instance, the identity of the authors who, on average, are "closest" to all other authors at a given time. We also show that SIGMOD's co-authorship graph is yet another example of a small world, a graph topology which has received a lot of attention recently. A companion web site for this paper can be found at http://db.cs.ualberta.ca/coauthorship.

10.1145/945721.945722 article EN ACM SIGMOD Record 2003-09-01

Finding non-redundant, statistically significant regions in high dimensional data

OPENALEX - Publications

Gabriela Moise Jörg Sander

Projected and subspace clustering algorithms search for clusters of points in subsets of attributes. Projected clustering computes several disjoint clusters, plus outliers, so that each cluster exists in its own subset of attributes. Subspace clustering enumerates clusters of objects in all subsets of attributes, typically producing many overlapping clusters. One problem of existing approaches is that their objectives are stated in a way that is not independent of the particular algorithm proposed to detect such clusters. A second problem is the definition of cluster density based on user-defined parameters, which makes it hard...

10.1145/1401890.1401956 article EN 2008-08-24