Toon Calders

ORCID: 0000-0002-4943-6978
Research Areas
  • Data Mining Algorithms and Applications
  • Rough Sets and Fuzzy Logic
  • Data Management and Algorithms
  • Advanced Database Systems and Queries
  • Imbalanced Data Classification Techniques
  • Semantic Web and Ontologies
  • Data Quality and Management
  • Ethics and Social Impacts of AI
  • Machine Learning and Data Classification
  • Natural Language Processing Techniques
  • Explainable Artificial Intelligence (XAI)
  • Topic Modeling
  • Complex Network Analysis Techniques
  • Human Mobility and Location-Based Analysis
  • Data Stream Mining Techniques
  • Business Process Modeling and Analysis
  • Time Series Analysis and Forecasting
  • Online Learning and Analytics
  • Hate Speech and Cyberbullying Detection
  • Data Visualization and Analytics
  • Adversarial Robustness in Machine Learning
  • Graph Theory and Algorithms
  • Service-Oriented Architecture and Web Services
  • Anomaly Detection Techniques and Applications
  • Software System Performance and Reliability

University of Antwerp
2008-2024

Université Libre de Bruxelles
2013-2020

ZNA Middelheim Hospital
2016-2019

University of Bremen
2019

Département d'Informatique
2013-2015

Eindhoven University of Technology
2006-2013

Siemens (United States)
2012-2013

Tamedia (Switzerland)
2005-2011

Fund for Scientific Research
2002

Recently, the following Discrimination-Aware Classification Problem was introduced: Suppose we are given training data that exhibit unlawful discrimination; e.g., toward sensitive attributes such as gender or ethnicity. The task is to learn a classifier that optimizes accuracy, but does not have this discrimination in its predictions on test data. This problem is relevant in many settings, such as when the data are generated by a biased decision process or when the sensitive attribute serves as a proxy for unobserved features. In this paper, we concentrate...

10.1007/s10115-011-0463-8 article EN cc-by-nc Knowledge and Information Systems 2011-12-03

In this paper, we investigate how to modify the naive Bayes classifier in order to perform classification that is restricted to be independent with respect to a given sensitive attribute. Such independency restrictions occur naturally when the decision process leading to the labels in the data-set was biased; e.g., due to gender or racial discrimination. This setting is motivated by many cases in which there exist laws that disallow decisions that are partly based on discrimination. Naive application of machine learning techniques would result in huge fines for...

10.1007/s10618-010-0190-x article EN cc-by-nc Data Mining and Knowledge Discovery 2010-07-26

In this paper we study the problem of classifier learning where the input data contains unjustified dependencies between some data attributes and the class label. Such cases arise, for example, when the training data is collected from different sources with different labeling criteria or when the data is generated by a biased decision process. When a classifier is trained directly on such data, these undesirable dependencies will carry over to the classifier's predictions. In order to tackle this problem, we study the classification with independency constraints problem: find an accurate model which...

10.1109/icdmw.2009.83 article EN IEEE ... International Conference on Data Mining workshops 2009-12-01

Classification models usually make predictions on the basis of training data. If the training data is biased towards certain groups or classes of objects, e.g., there is racial discrimination towards black people, the learned model will also show discriminatory behavior towards that particular community. This partial attitude of the learned model may lead to biased outcomes when labeling future unlabeled data objects. Often, however, impartial classification results are desired or even required by law for future data objects in spite of having biased training data. In this paper, we tackle this problem...

10.1109/ic4.2009.4909197 article EN 2009-02-01

Recently, the following discrimination aware classification problem was introduced: given a labeled dataset and an attribute B, find a classifier with high predictive accuracy that at the same time does not discriminate on the basis of the given attribute B. This problem is motivated by the fact that often the available historic data is biased due to discrimination, e.g., when B denotes ethnicity. Using the standard learners on this data may lead to wrongfully biased classifiers, even if the attribute B is removed from the training data. Existing solutions for this problem consist in "cleaning away"...

10.1109/icdm.2010.50 article EN 2010-12-01
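
The entry above asks for a classifier that "does not discriminate on the basis of B" but the truncated abstract does not say how discrimination is measured. As an illustration only, one commonly used measure is the difference in positive-prediction rates between the two groups defined by B. The function name `discrimination_score` and the argument `favored` are hypothetical, and this is a sketch under that assumption rather than the paper's definition.

```python
def discrimination_score(predictions, sensitive, favored):
    """Positive-prediction rate of the favored group minus that of everyone else.

    predictions: iterable of 0/1 class predictions
    sensitive:   iterable of values of the sensitive attribute B
    favored:     the value of B that identifies the favored group (assumed binary split)
    """
    fav = [p for p, b in zip(predictions, sensitive) if b == favored]
    rest = [p for p, b in zip(predictions, sensitive) if b != favored]
    if not fav or not rest:
        raise ValueError("both groups must be non-empty")
    return sum(fav) / len(fav) - sum(rest) / len(rest)


# Toy data: 3 of 4 'm' rows and 1 of 4 'f' rows receive the positive class.
preds = [1, 1, 1, 0, 1, 0, 0, 0]
attr_b = ["m", "m", "m", "m", "f", "f", "f", "f"]
score = discrimination_score(preds, attr_b, favored="m")
```

A score of 0 would mean both groups receive the positive class at the same rate; dropping B from the training data does not by itself guarantee that, which is the point the abstract makes.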

Historical data used for supervised learning may contain discrimination. We study how to train classifiers on such data, so that they are discrimination free with respect to a given sensitive attribute, e.g., gender. Existing techniques that deal with this problem aim at removing all discrimination and do not take into account that part of the discrimination may be explainable by other attributes, such as, e.g., education level. In this context, we introduce and analyze the issue of conditional non-discrimination in classifier design. We show that some of the differences in decisions...

10.1109/icdm.2011.72 article EN 2011-12-01

In data mining we often have to learn from biased data, because, for instance, data comes from different batches or there was a gender or racial bias in the collection of social data. In some applications it may be necessary to explicitly control this bias in the models we learn from the data. This paper is the first to study learning linear regression models under constraints that control the biasing effect of a given attribute such as gender or batch number. We show how propensity modeling can be used for factoring out the part of the bias that can be justified by externally provided explanatory attributes. Then...

10.1109/icdm.2013.114 article EN 2013-12-01

Pattern mining based on data compression has been successfully applied in many data mining tasks. For itemset data, the Krimp algorithm based on the minimum description length (MDL) principle was shown to be very effective in solving the redundancy issue in descriptive pattern mining. However, for sequence data, the redundancy issue of the set of frequent sequential patterns is not fully addressed in the literature. In this article, we study MDL‐based algorithms for mining non‐redundant sets of sequential patterns from a sequence database. First, we propose an encoding scheme for compressing sequence data with sequential patterns....

10.1002/sam.11192 article EN Statistical Analysis and Data Mining: The ASA Data Science Journal 2013-05-23

Pieter Delobelle, Ewoenam Tokpo, Toon Calders, Bettina Berendt. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022.

10.18653/v1/2022.naacl-main.122 article EN cc-by Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2022-01-01

All frequent itemset mining algorithms rely heavily on the monotonicity principle for pruning. This principle allows for excluding candidate itemsets from the expensive counting phase. In this paper, we present sound and complete deduction rules to derive bounds on the support of an itemset. Based on these deduction rules, we construct a condensed representation of all frequent itemsets, by removing those itemsets for which the support can be derived, resulting in the so called Non-Derivable Itemsets (NDI) representation. We also present connections between our proposal and recent...

10.1007/s10618-006-0054-6 article EN cc-by-nc Data Mining and Knowledge Discovery 2007-01-25
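
The entry above mentions deduction rules that bound the support of an itemset from the supports of its subsets, without listing the rules. The sketch below assumes they take the inclusion-exclusion form: for every proper subset X of an itemset I, the alternating sum over all J with X ⊆ J ⊂ I of (-1)^(|I \ J| + 1) · supp(J) is an upper bound on supp(I) when |I \ X| is odd and a lower bound when it is even. The function names and the dictionary layout are hypothetical.

```python
from itertools import combinations


def subsets(items):
    """Yield every subset of `items` as a frozenset (including the empty set)."""
    items = sorted(items)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def support_bounds(itemset, supp):
    """Return (lower, upper) bounds on the support of `itemset`.

    supp maps every proper subset of `itemset` (as a frozenset) to its support;
    supp[frozenset()] is the number of transactions.
    """
    itemset = frozenset(itemset)
    lower, upper = 0, supp[frozenset()]
    for x in subsets(itemset):
        if x == itemset:
            continue
        rest = itemset - x
        total = 0
        for extra in subsets(rest):
            j = x | extra
            if j == itemset:
                continue  # only proper subsets of the itemset contribute
            sign = 1 if (len(itemset - j) + 1) % 2 == 0 else -1
            total += sign * supp[j]
        if len(rest) % 2 == 1:
            upper = min(upper, total)  # odd |I \ X| gives an upper bound
        else:
            lower = max(lower, total)  # even |I \ X| gives a lower bound
    return lower, upper


# 10 transactions, 'a' occurs in 7 and 'b' in 6: supp(ab) must lie in [3, 6].
supp = {frozenset(): 10, frozenset("a"): 7, frozenset("b"): 6}
bounds_ab = support_bounds("ab", supp)

# If 'a' occurs in every transaction the bounds coincide, so supp(ab) can be
# derived without counting; such an itemset would be left out of the representation.
supp_derivable = {frozenset(): 10, frozenset("a"): 10, frozenset("b"): 6}
bounds_derivable = support_bounds("ab", supp_derivable)
```

With X equal to a single-item-smaller subset, the rule reduces to supp(I) ≤ supp(X), which is the monotonicity principle the abstract starts from.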

Educational Data Mining (EDM) is an emerging multidisciplinary research area, in which methods and techniques for exploring data originating from various educational information systems have been developed. EDM is both a learning science, as well as a rich application area for data mining, due to the growing availability of educational data. EDM contributes to the study of how students learn, and the settings in which they learn. It enables data-driven decision making for improving the current educational practice and learning material. We present a brief overview of EDM and introduce four...

10.1145/2207243.2207245 article EN ACM SIGKDD Explorations Newsletter 2012-05-01

Mining frequent item sets from transactional datasets is a well known problem with good algorithmic solutions. Most of these algorithms assume that the input data is free from errors. Real data, however, is often affected by noise. Such noise can be represented by uncertain datasets in which each item has an existence probability. Recently, Bernecker et al. (2009) proposed the frequentness probability, i.e., the probability that a given item set is frequent, to select item sets in an uncertain database. A dynamic programming approach to evaluate this measure was proposed as...

10.1109/icdm.2010.42 article EN 2010-12-01
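
The entry above refers to a dynamic programming evaluation of the frequentness probability without giving details. A minimal sketch, assuming each transaction contains the item set independently with a known probability, is the standard recurrence over "number of transactions containing the item set so far", capped at the support threshold. The function name `frequentness_probability` and its parameters are hypothetical, and this is not claimed to be the algorithm of Bernecker et al. or of this paper.

```python
def frequentness_probability(probs, minsup):
    """P(at least `minsup` transactions contain the item set).

    probs:  per-transaction probabilities that the item set is present,
            assumed independent across transactions.
    minsup: absolute support threshold (non-negative integer).
    """
    # dist[k] = probability of exactly k occurrences so far for k < minsup;
    # dist[minsup] accumulates the probability of minsup or more occurrences.
    dist = [1.0] + [0.0] * minsup
    for p in probs:
        updated = [0.0] * (minsup + 1)
        for k, mass in enumerate(dist):
            if k == minsup:
                updated[k] += mass  # threshold already reached, stays reached
            else:
                updated[k] += mass * (1.0 - p)  # item set absent from this transaction
                updated[k + 1] += mass * p      # item set present in this transaction
        dist = updated
    return dist[minsup]


# Two transactions, each containing the item set with probability 0.5.
p_at_least_one = frequentness_probability([0.5, 0.5], minsup=1)
p_at_least_two = frequentness_probability([0.5, 0.5], minsup=2)
```

Capping the table at `minsup` keeps the cost proportional to the number of transactions times the threshold, instead of the number of transactions squared.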

Well-designed object-oriented programs typically consist of a few key classes that work tightly together to provide the bulk of the functionality. As such, these key classes are excellent starting points for the program comprehension process. We propose a technique that uses Webmining principles on execution traces to discover these important and tightly interacting classes. Based on two medium-scale case studies - Apache Ant and Jakarta JMeter - and detailed architectural information from its developers, we show that our heuristic does in fact find...

10.1109/csmr.2005.12 article EN 2005-03-31

Mining frequent itemsets in a datastream proves to be a difficult problem, as itemsets arrive in rapid succession and storing parts of the stream is typically impossible. Nonetheless, it has many useful applications; e.g., opinion and sentiment analysis from social networks. Current stream mining algorithms are based on approximations. In earlier work, mining frequent items in a stream under the max-frequency measure proved to be effective for items. In this paper, we extended our work from items to itemsets. Firstly, an optimized incremental algorithm is presented. The...

10.1109/icdm.2007.66 article EN 2007-10-01
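
The entry above relies on a max-frequency measure that the truncated abstract does not define. The sketch below assumes it means the highest relative frequency of an item over all windows that end at the current position of the stream, optionally restricted to windows of at least `min_window` elements. Both that reading and the names `max_frequency` and `min_window` are assumptions, and the code is a naive full rescan, not the optimized incremental algorithm the abstract mentions.

```python
from fractions import Fraction


def max_frequency(stream, item, min_window=1):
    """Highest frequency of `item` over all suffixes of `stream` of length >= min_window."""
    best = Fraction(0)
    count = 0
    # Walk backwards from the newest element, growing the window one step at a time.
    for length, seen in enumerate(reversed(stream), start=1):
        if seen == item:
            count += 1
        if length >= min_window:
            best = max(best, Fraction(count, length))
    return best


stream = list("abaab")
freq_a = max_frequency(stream, "a")                      # best window: last 3 elements
freq_a_long = max_frequency(stream, "a", min_window=4)   # only windows of length 4 and 5
```

Exact fractions are used so that windows of different lengths can be compared without rounding effects.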