- Data Management and Algorithms
- Advanced Clustering Algorithms Research
- Data Mining Algorithms and Applications
- Anomaly Detection Techniques and Applications
- Advanced Database Systems and Queries
- Algorithms and Data Compression
- Time Series Analysis and Forecasting
- Complex Network Analysis Techniques
- Bayesian Methods and Mixture Models
- Gene Expression and Cancer Classification
- Energy Efficient Wireless Sensor Networks
- Advanced Statistical Methods and Models
- Data Visualization and Analytics
- Face and Expression Recognition
- Water Systems and Optimization
- Fault Detection and Control Systems
- Imbalanced Data Classification Techniques
- Bioinformatics and Genomic Networks
- Engineering and Materials Science Studies
- Data Stream Mining Techniques
- Opportunistic and Delay-Tolerant Networks
- Geographic Information Systems Studies
- Machine Learning and Data Classification
- Advanced Image and Video Retrieval Techniques
- Artificial Immune Systems Applications
University of Alberta
2015-2024
Data61
2017
University of Amsterdam
2017
Alberta Innovates
2015
IBM (Canada)
2014
Ludwig-Maximilians-Universität München
1997-2002
University of Michigan
2002
University of British Columbia
2001
Institut für Urheber- und Medienrecht
1997
University of Tübingen
1971
For many KDD applications, such as detecting criminal activities in E-commerce, finding the rare instances or the outliers can be more interesting than finding the common patterns. Existing work in outlier detection regards being an outlier as a binary property. In this paper, we contend that for many scenarios, it is more meaningful to assign to each object a degree of being an outlier. This degree is called the local outlier factor (LOF) of an object. It is local in that the degree depends on how isolated the object is with respect to the surrounding neighborhood. We give a detailed formal analysis showing that LOF enjoys...
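The idea of a degree of outlierness that depends on the surrounding neighborhood can be illustrated with a minimal brute-force sketch. It follows the commonly cited LOF definition (k-distance, reachability distance, local reachability density); the function name `lof_scores` and the parameter `k` are illustrative assumptions rather than names from the paper, and the sketch assumes the data do not contain more than `k` identical points.

```python
import math

def lof_scores(points, k):
    """Brute-force local outlier factor for each point (O(n^2) sketch)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    k_dist = []      # distance from each point to its k-th nearest neighbor
    neighbors = []   # k-distance neighborhood of each point (ties included)
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        kd = dist[i][order[k - 1]]
        k_dist.append(kd)
        neighbors.append([j for j in order if dist[i][j] <= kd])

    def local_reachability_density(i):
        # Reachability distance of i from j: max(k-distance(j), d(i, j)).
        reach = [max(k_dist[j], dist[i][j]) for j in neighbors[i]]
        return len(reach) / sum(reach)

    lrd = [local_reachability_density(i) for i in range(n)]

    # LOF: average ratio of the neighbors' density to the point's own density.
    return [
        sum(lrd[j] for j in neighbors[i]) / (len(neighbors[i]) * lrd[i])
        for i in range(n)
    ]

# Hypothetical data: a 3x3 grid plus one isolated point.
grid = [(float(x), float(y)) for x in range(3) for y in range(3)]
scores = lof_scores(grid + [(10.0, 10.0)], k=3)
```

Scores near 1 indicate a point whose density is comparable to that of its neighbors; the isolated point receives a markedly larger score.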
Cluster analysis is a primary method for database mining. It is either used as a stand-alone tool to get insight into the distribution of a data set, e.g. to focus further analysis and data processing, or as a preprocessing step for other algorithms operating on the detected clusters. Almost all of the well-known clustering algorithms require input parameters which are hard to determine but have a significant influence on the clustering result. Furthermore, for many real-data sets there does not even exist a global parameter setting for which the result of the clustering algorithm describes the intrinsic...
At SIGMOD 2015, an article was presented with the title “DBSCAN Revisited: Mis-Claim, Un-Fixability, and Approximation” that won the conference’s best paper award. In this technical correspondence, we want to point out some inaccuracies in the way DBSCAN was represented, and why the criticism should have been directed at the assumption about the performance of spatial index structures such as R-trees and not at an algorithm that can use such indexes. We will also discuss the relationship of DBSCAN performance and the indexability of the dataset, and discuss heuristics for choosing...
Clustering refers to the task of identifying groups or clusters in a data set. In density‐based clustering, a cluster is a set of data objects spread in the data space over a contiguous region of high density of objects. Density‐based clusters are separated from each other by contiguous regions of low density of objects. Data objects located in low‐density regions are typically considered noise or outliers. © 2011 John Wiley & Sons, Inc. WIREs Data Mining Knowl Discov 2011 1 231–240 DOI: 10.1002/widm.30 This article is categorized under: Technologies > Structure Discovery and
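The notion of clusters as contiguous high-density regions separated by low-density noise can be made concrete with a minimal DBSCAN-style sketch. DBSCAN is only one of several density-based algorithms and is used here purely as an illustration; the names `dbscan`, `eps`, `min_pts`, and `NOISE` are assumptions of this sketch, and the neighborhood search is a brute-force scan rather than an index lookup.

```python
import math
from collections import deque

NOISE = -1

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    n = len(points)

    def region(i):
        # All points within distance eps of point i (including i itself).
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    labels = [None] * n
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = region(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE          # may later become a border point
            continue
        labels[i] = cluster            # i is a core point: start a new cluster
        queue = deque(seeds)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = region(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is also core: keep expanding
        cluster += 1
    return labels

# Hypothetical data: two dense squares and one isolated point between them.
data = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5),
        (10, 10), (10, 11), (11, 10), (11, 11)]
labels = dbscan(data, eps=1.5, min_pts=3)
```

With these illustrative parameters, each square forms its own cluster and the isolated point is labeled as noise.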
An integrated framework for density-based cluster analysis, outlier detection, and data visualization is introduced in this article. The main module consists of an algorithm to compute hierarchical estimates of the level sets of a density, following Hartigan’s classic model of density-contour clusters and trees. Such an algorithm generalizes and improves existing density-based clustering techniques with respect to different aspects. It provides as a result a complete clustering hierarchy composed of all possible density-based clusters following the nonparametric model adopted, for an infinite range...
Ensembles for unsupervised outlier detection is an emerging topic that has been neglected for a surprisingly long time (although there are reasons why this is more difficult than supervised ensembles or even clustering ensembles). Aggarwal recently discussed algorithmic patterns of outlier detection ensembles, identified traces of the idea in the literature, and remarked on potential as well as unlikely avenues for future transfer of concepts from supervised ensembles. Complementary to his points, here we focus on the core ingredients for building...
Density-Based Clustering Validation. Davoud Moulavi, Pablo A. Jaskowiak, Ricardo J. G. B. Campello, Arthur Zimek, and Jörg Sander. Proceedings of the 2014 SIAM International Conference on Data Mining (SDM), pp. 839-847. DOI: https://doi.org/10.1137/1.9781611973440.96 Abstract: One of the most challenging aspects of clustering is validation, which is the objective...
The problem of detecting clusters of points belonging to a spatial point process arises in many applications. In this paper, we introduce the new clustering algorithm DBCLASD (Distribution-Based Clustering of LArge Spatial Databases) to discover clusters of this type. The results of experiments demonstrate that DBCLASD, contrary to partitioning algorithms such as CLARANS (Clustering Large Applications based on RANdomized Search), discovers clusters of arbitrary shape. Furthermore, DBCLASD does not require any input parameters, in contrast to the clustering algorithm DBSCAN...
Two major approaches have been proposed to efficiently process queries in databases: speeding up the search by using index structures, and speeding up the search by operating on a compressed database, such as a signature file. Both approaches have their limitations: indexing techniques are inefficient in extreme configurations, such as high-dimensional spaces, where even a simple scan may be cheaper than an index-based search. Compression techniques are not very efficient in all other situations. We propose to combine both techniques to search for nearest neighbors in a high-dimensional space. For this...
Outlier detection and ensemble learning are well established research directions in data mining, yet the application of ensemble techniques to outlier detection has been rarely studied. Here, we propose and study subsampling as a technique to induce diversity among individual outlier detectors. We show analytically and experimentally that an outlier detector based on a subsample per se, besides inducing diversity, can, under certain conditions, already improve upon the results of the same outlier detector on the complete dataset. Building an ensemble on top of several subsamples is further...
Clustering refers to the task of identifying groups or clusters in a data set. In density‐based clustering, a cluster is a set of data objects spread in the data space over a contiguous region of high density of objects. Density‐based clusters are separated from each other by contiguous regions of low density of objects. Data objects located in low‐density regions are typically considered noise or outliers. In this review article we discuss the statistical notion of density‐based clusters, classic algorithms for deriving a flat partitioning of density‐based clusters, methods for hierarchical density‐based clustering, and methods for semi‐supervised clustering....
In this paper we investigate the co-authorship graph obtained from all papers published at SIGMOD between 1975 and 2002. We find some interesting facts, for instance, the identity of the authors who, on average, are "closest" to all other authors at a given time. We also show that SIGMOD's co-authorship graph is yet another example of a small world, a graph topology which has received a lot of attention recently. A companion web site for this paper can be found at http://db.cs.ualberta.ca/coauthorship.
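One plausible reading of being "closest" on average is the mean shortest-path distance from an author to every other reachable author in the co-authorship graph. The sketch below illustrates that reading with a breadth-first search; the helper names `coauthorship_graph` and `average_distance` and the toy author lists are hypothetical, and the paper's exact measure may differ.

```python
from collections import deque

def coauthorship_graph(papers):
    """Build an undirected graph: authors are nodes, joint papers create edges."""
    graph = {}
    for authors in papers:
        for a in authors:
            graph.setdefault(a, set()).update(b for b in authors if b != a)
    return graph

def average_distance(graph, source):
    """Mean shortest-path length from source to all reachable other authors."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    others = [d for node, d in dist.items() if node != source]
    return sum(others) / len(others) if others else float("inf")

# Hypothetical author lists, one list per paper.
papers = [["A", "B"], ["B", "C"], ["C", "D"]]
graph = coauthorship_graph(papers)
closest = min(graph, key=lambda author: average_distance(graph, author))
```

A small-world graph is one in which such average distances stay short even though most authors are not direct co-authors.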
Projected and subspace clustering algorithms search for clusters of points in subsets of attributes. Projected clustering computes several disjoint clusters, plus outliers, so that each cluster exists in its own subset of attributes. Subspace clustering enumerates clusters of points in all subsets of attributes, typically producing many overlapping clusters. One problem of existing approaches is that their objectives are stated in a way that is not independent of the particular algorithm proposed to detect such clusters. A second problem is the definition of cluster density based on user-defined parameters, which makes it hard...