- Data Mining Algorithms and Applications
- Rough Sets and Fuzzy Logic
- Data Management and Algorithms
- Advanced Database Systems and Queries
- Imbalanced Data Classification Techniques
- Semantic Web and Ontologies
- Data Quality and Management
- Ethics and Social Impacts of AI
- Machine Learning and Data Classification
- Natural Language Processing Techniques
- Explainable Artificial Intelligence (XAI)
- Topic Modeling
- Complex Network Analysis Techniques
- Human Mobility and Location-Based Analysis
- Data Stream Mining Techniques
- Business Process Modeling and Analysis
- Time Series Analysis and Forecasting
- Online Learning and Analytics
- Hate Speech and Cyberbullying Detection
- Data Visualization and Analytics
- Adversarial Robustness in Machine Learning
- Graph Theory and Algorithms
- Service-Oriented Architecture and Web Services
- Anomaly Detection Techniques and Applications
- Software System Performance and Reliability
University of Antwerp
2008-2024
Université Libre de Bruxelles
2013-2020
ZNA Middelheim Hospital
2016-2019
University of Bremen
2019
Département d'Informatique
2013-2015
Eindhoven University of Technology
2006-2013
Siemens (United States)
2012-2013
Tamedia (Switzerland)
2005-2011
Fund for Scientific Research
2002
Recently, the following Discrimination-Aware Classification Problem was introduced: Suppose we are given training data that exhibit unlawful discrimination; e.g., toward sensitive attributes such as gender or ethnicity. The task is to learn a classifier that optimizes accuracy, but does not have this discrimination in its predictions on test data. This problem is relevant in many settings, such as when the data are generated by a biased decision process or when the sensitive attribute serves as a proxy for unobserved features. In this paper, we concentrate...
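To make "discrimination in its predictions" concrete, here is a minimal sketch of one simple measure: the gap between the positive-outcome rates of a favored and a deprived group. The function name, parameters, and toy data are our own illustration, not taken from the paper.

```python
from typing import Hashable, Sequence


def discrimination_score(
    sensitive: Sequence[Hashable],
    labels: Sequence[int],
    favored: Hashable,
    deprived: Hashable,
) -> float:
    """Return P(label = 1 | favored) - P(label = 1 | deprived)."""

    def positive_rate(group: Hashable) -> float:
        outcomes = [y for s, y in zip(sensitive, labels) if s == group]
        if not outcomes:
            raise ValueError(f"no rows for group {group!r}")
        return sum(outcomes) / len(outcomes)

    return positive_rate(favored) - positive_rate(deprived)


if __name__ == "__main__":
    gender = ["m", "m", "m", "m", "f", "f", "f", "f"]
    hired = [1, 1, 1, 0, 1, 0, 0, 0]
    # 3/4 of the men vs 1/4 of the women receive the positive label.
    print(discrimination_score(gender, hired, favored="m", deprived="f"))  # 0.5
```

A score of 0 means both groups receive the positive label at the same rate; a classifier's output can be scored the same way by passing predicted labels instead of training labels.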
In this paper, we investigate how to modify the naive Bayes classifier in order to perform classification that is restricted to be independent with respect to a given sensitive attribute. Such independency restrictions occur naturally when the decision process leading to the labels in the data-set was biased; e.g., due to gender or racial discrimination. This setting is motivated by many cases in which there exist laws that disallow a decision that is partly based on discrimination. Naive application of machine learning techniques would result in huge fines for...
In this paper we study the problem of classifier learning where the input data contains unjustified dependencies between some data attributes and the class label. Such cases arise for example when the training data is collected from different sources with different labeling criteria, or when the data is generated by a biased decision process. When a classifier is trained directly on such data, these undesirable dependencies will carry over to the classifier's predictions. In order to tackle this problem, we study the classification with independency constraints problem: find an accurate model which...
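The truncated abstract does not say how the paper enforces the independency constraint. As one illustration of what such a constraint asks of the training data, here is a sketch of instance reweighing, a preprocessing idea from the discrimination-aware classification literature: each (sensitive value, class) combination gets the weight P(s)·P(c) / P(s, c), so that in the weighted data the class is independent of the sensitive attribute. The function name is ours; a learner that accepts sample weights would then be trained on the weighted rows.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def reweigh(sensitive: Sequence[Hashable], labels: Sequence[Hashable]) -> List[float]:
    """Weight each row by P(s) * P(c) / P(s, c) for its sensitive value s and class c."""
    n = len(labels)
    count_s = Counter(sensitive)
    count_c = Counter(labels)
    count_sc = Counter(zip(sensitive, labels))
    # Expected count of (s, c) under independence, divided by its observed count.
    return [
        count_s[s] * count_c[c] / (n * count_sc[(s, c)])
        for s, c in zip(sensitive, labels)
    ]


if __name__ == "__main__":
    gender = ["m", "m", "m", "m", "f", "f", "f", "f"]
    hired = [1, 1, 1, 0, 1, 0, 0, 0]
    print(reweigh(gender, hired))
```

In the toy data, hired men and rejected women get weight 2/3 while the under-represented combinations (rejected men, hired women) get weight 2; the weighted positive rate is then 0.5 for both groups.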
Classification models usually make predictions on the basis of training data. If the training data is biased towards certain groups or classes of objects, e.g., there is racial discrimination towards black people, the learned model will also show discriminatory behavior towards that particular community. This partial attitude of the learned model may lead to biased outcomes when labeling future unlabeled data objects. Often, however, impartial classification results are desired or even required by law for future data objects in spite of having biased training data. In this paper, we tackle this problem...
Recently, the following discrimination-aware classification problem was introduced: given a labeled dataset and an attribute B, find a classifier with high predictive accuracy that at the same time does not discriminate on the basis of the given attribute B. This problem is motivated by the fact that often available historic data is biased due to discrimination, e.g., when B denotes ethnicity. Using standard learners on this data may lead to wrongfully biased classifiers, even if the attribute B is removed from the training data. Existing solutions for this problem consist in "cleaning away"...
Historical data used for supervised learning may contain discrimination. We study how to train classifiers on such data, so that they are discrimination free with respect to a given sensitive attribute, e.g., gender. Existing techniques that deal with this problem aim at removing all discrimination and do not take into account that part of the discrimination may be explainable by other attributes, such as education level. In this context, we introduce and analyze the issue of conditional non-discrimination in classifier design. We show that some of the differences in decisions...
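The abstract separates discrimination that is explainable by another attribute from the rest. The sketch below shows one way such a split could be computed by stratifying on the explanatory attribute. It rests on our own assumption that, within each stratum, the acceptable positive rate is the mean of the two groups' rates; it is not claimed to be the paper's exact definition, and all names are hypothetical. It requires every stratum to contain both groups.

```python
from typing import Hashable, List, Sequence, Tuple


def _rate(outcomes: List[int]) -> float:
    if not outcomes:
        raise ValueError("empty group or stratum")
    return sum(outcomes) / len(outcomes)


def split_discrimination(
    sensitive: Sequence[Hashable],
    explanatory: Sequence[Hashable],
    labels: Sequence[int],
    favored: Hashable,
    deprived: Hashable,
) -> Tuple[float, float, float]:
    """Return (total, explainable, unexplained) gaps in positive-label rates."""
    rows = list(zip(sensitive, explanatory, labels))

    def rate(group: Hashable, stratum: Hashable = None) -> float:
        return _rate([y for s, e, y in rows
                      if s == group and (stratum is None or e == stratum)])

    def share(group: Hashable, stratum: Hashable) -> float:
        # P(explanatory = stratum | sensitive = group)
        in_group = [e for s, e, _ in rows if s == group]
        return in_group.count(stratum) / len(in_group)

    total = rate(favored) - rate(deprived)
    explainable = 0.0
    for stratum in set(explanatory):
        # Assumed "fair" positive rate in this stratum: mean of both groups' rates.
        p_star = (rate(favored, stratum) + rate(deprived, stratum)) / 2
        explainable += (share(favored, stratum) - share(deprived, stratum)) * p_star
    return total, explainable, total - explainable


if __name__ == "__main__":
    gender = ["m"] * 4 + ["f"] * 4
    education = ["hi", "hi", "hi", "lo", "hi", "lo", "lo", "lo"]
    accepted = [1, 1, 1, 0, 1, 0, 0, 0]
    print(split_discrimination(gender, education, accepted, "m", "f"))
```

In the toy data, everyone with high education is accepted and nobody with low education is, so the whole 0.5 gap between men and women is attributed to men more often having high education, and the unexplained part is 0.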
In data mining we often have to learn from biased data, because, for instance, data comes from different batches or there was a gender or racial bias in the collection of social data. In some applications it may be necessary to explicitly control this bias in the models we learn from the data. This paper is the first to study learning linear regression models under constraints that control the biasing effect of a given attribute such as gender or batch number. We show how propensity modeling can be used for factoring out the part of the bias that can be justified by externally provided explanatory attributes. Then...
Pattern mining based on data compression has been successfully applied in many data mining tasks. For itemset data, the Krimp algorithm based on the minimum description length (MDL) principle was shown to be very effective in solving the redundancy issue in descriptive pattern mining. However, for sequence data, the redundancy issue of the set of frequent sequential patterns is not fully addressed in the literature. In this article, we study MDL‐based algorithms for mining non‐redundant sets of sequential patterns from a sequence database. First, we propose an encoding scheme for compressing sequence data with sequential patterns...
Pieter Delobelle, Ewoenam Tokpo, Toon Calders, Bettina Berendt. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022.
All frequent itemset mining algorithms rely heavily on the monotonicity principle for pruning. This principle allows for excluding candidate itemsets from the expensive counting phase. In this paper, we present sound and complete deduction rules to derive bounds on the support of an itemset. Based on these deduction rules, we construct a condensed representation of all frequent itemsets, by removing those itemsets for which the support can be derived, resulting in the so-called Non-Derivable Itemsets (NDI) representation. We also present connections between our proposal and recent...
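A minimal sketch of how support bounds of this kind can be computed, written from the standard inclusion-exclusion formulation: for every proper subset J of an itemset I, summing (-1)^(|I \ X| + 1) · supp(X) over all X with J ⊆ X ⊊ I gives an upper bound on supp(I) when |I \ J| is odd and a lower bound when it is even. This is an illustration, not the paper's implementation; function names are ours, and the enumeration is exponential in |I|.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterator, Tuple

Itemset = FrozenSet[str]


def subsets(items: Itemset) -> Iterator[Itemset]:
    ordered = sorted(items)
    for size in range(len(ordered) + 1):
        for combo in combinations(ordered, size):
            yield frozenset(combo)


def support_bounds(itemset: Itemset, support: Dict[Itemset, int]) -> Tuple[int, int]:
    """Lower and upper bound on support(itemset) from its proper subsets.

    itemset must be non-empty; support must map every proper subset of it,
    including frozenset() (the number of transactions), to its support.
    """
    lower, upper = [0], []
    for j in subsets(itemset):
        if j == itemset:
            continue
        # Inclusion-exclusion sum over all X with J <= X < I.
        bound = sum(
            (-1) ** (len(itemset - x) + 1) * support[x]
            for x in subsets(itemset)
            if j <= x and x != itemset
        )
        # |I \ J| odd gives an upper bound, even gives a lower bound.
        (upper if len(itemset - j) % 2 == 1 else lower).append(bound)
    return max(lower), min(upper)


if __name__ == "__main__":
    transactions = [{"a", "b"}, {"a", "b"}, {"a"}, {"a"}]
    target = frozenset({"a", "b"})
    supp = {x: sum(x <= t for t in transactions) for x in subsets(target) if x != target}
    print(support_bounds(target, supp))  # (2, 2): the support of {a, b} is derivable
```

An itemset whose lower and upper bounds coincide is derivable, which is what allows it to be left out of a condensed representation.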
Educational Data Mining (EDM) is an emerging multidisciplinary research area, in which methods and techniques for exploring data originating from various educational information systems have been developed. EDM is both a learning science and a rich application area for data mining, due to the growing availability of educational data. EDM contributes to the study of how students learn and the settings in which they learn. It enables data-driven decision making for improving current educational practice and learning material. We present a brief overview of EDM and introduce four...
Mining frequent item sets from transactional datasets is a well known problem with good algorithmic solutions. Most of these algorithms assume that the input data is free of errors. Real data, however, is often affected by noise. Such noise can be represented by uncertain data, in which each item has an existence probability. Recently, Bernecker et al. (2009) proposed the frequentness probability, i.e., the probability that a given item set is frequent, to select item sets in an uncertain database. A dynamic programming approach to evaluate this measure was proposed as...
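A minimal sketch of a dynamic program for the frequentness probability, assuming that transactions are independent and that items exist independently within a transaction (a common uncertain-data model; we do not claim it is exactly the recurrence of Bernecker et al.). The state tracks the probability of having seen exactly k containing transactions so far, with everything at or above `minsup` collapsed into one absorbing cell. Names are ours.

```python
from math import prod
from typing import Dict, FrozenSet, Sequence

UncertainTransaction = Dict[str, float]  # item -> existence probability


def containment_probability(itemset: FrozenSet[str], transaction: UncertainTransaction) -> float:
    # Assumes items exist independently of each other within a transaction.
    return prod(transaction.get(item, 0.0) for item in itemset)


def frequentness_probability(
    itemset: FrozenSet[str], database: Sequence[UncertainTransaction], minsup: int
) -> float:
    """P(support(itemset) >= minsup), assuming transactions are independent."""
    if minsup <= 0:
        return 1.0
    # dp[k] for k < minsup: P(exactly k processed transactions contain the itemset);
    # dp[minsup]: P(at least minsup of them do).
    dp = [1.0] + [0.0] * minsup
    for transaction in database:
        p = containment_probability(itemset, transaction)
        # Update from the top down so dp[k - 1] still holds the previous value.
        dp[minsup] += dp[minsup - 1] * p
        for k in range(minsup - 1, 0, -1):
            dp[k] = dp[k] * (1 - p) + dp[k - 1] * p
        dp[0] *= 1 - p
    return dp[minsup]


if __name__ == "__main__":
    db = [{"a": 0.5}, {"a": 0.5, "b": 0.8}, {"b": 0.9}]
    print(frequentness_probability(frozenset({"a"}), db, minsup=1))  # 0.75
```

Each transaction costs O(minsup) work, so the whole evaluation is O(|database| · minsup) instead of enumerating all possible worlds.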
Well-designed object-oriented programs typically consist of a few key classes that work tightly together to provide the bulk of the functionality. As such, these key classes are excellent starting points for the program comprehension process. We propose a technique that uses webmining principles on execution traces to discover these important and tightly interacting classes. Based on two medium-scale case studies (Apache Ant and Jakarta JMeter) and detailed architectural information from their developers, we show that our heuristic does in fact find...
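The abstract only says "webmining principles". As an assumption, this sketch uses HITS, a well-known webmining algorithm that scores hubs and authorities, on a directed graph of caller-to-callee class pairs such as might be extracted from an execution trace. Class names and the iteration count are hypothetical, and this is not presented as the paper's exact heuristic.

```python
from math import sqrt
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (calling class, called class)


def _normalize(scores: Dict[str, float]) -> Dict[str, float]:
    norm = sqrt(sum(v * v for v in scores.values())) or 1.0
    return {node: v / norm for node, v in scores.items()}


def hits(edges: List[Edge], iterations: int = 50) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Return (hub, authority) scores for a directed class-interaction graph."""
    nodes = sorted({node for edge in edges for node in edge})
    hub = {node: 1.0 for node in nodes}
    auth = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        # A class is a good authority if good hubs call it ...
        auth = _normalize({n: sum(hub[src] for src, dst in edges if dst == n) for n in nodes})
        # ... and a good hub if it calls good authorities.
        hub = _normalize({n: sum(auth[dst] for src, dst in edges if src == n) for n in nodes})
    return hub, auth


if __name__ == "__main__":
    # Hypothetical caller -> callee pairs from an execution trace.
    trace_edges = [("Main", "Parser"), ("Main", "Project"), ("Parser", "Project"), ("Task", "Project")]
    hub, auth = hits(trace_edges)
    print(max(hub, key=hub.get), max(auth, key=auth.get))  # Main Project
```

High-hub classes coordinate many others and high-authority classes are used by many others; both are plausible candidates for the "key classes" the abstract refers to.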
Mining frequent itemsets in a datastream proves to be a difficult problem, as itemsets arrive in rapid succession and storing parts of the stream is typically impossible. Nonetheless, it has many useful applications; e.g., opinion and sentiment analysis from social networks. Current stream mining algorithms are based on approximations. In earlier work, mining frequent items in a stream under the max-frequency measure proved to be effective for items. In this paper, we extended our work from items to itemsets. Firstly, an optimized incremental algorithm for mining frequent itemsets in a stream is presented. The...
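Max-frequency, as we read it, scores an itemset by its highest frequency over all windows that end at the current time, so the window length adapts to recent bursts. The sketch below recomputes this naively from a stored stream, which is exactly what a real stream miner cannot afford; the paper's incremental algorithm is not reproduced. The function name and the `min_window` parameter are our own assumptions.

```python
from fractions import Fraction
from typing import AbstractSet, FrozenSet, Sequence


def max_frequency(
    itemset: FrozenSet[str],
    stream: Sequence[AbstractSet[str]],
    min_window: int = 1,
) -> Fraction:
    """Largest frequency of itemset over windows ending at the newest transaction.

    min_window is a hypothetical parameter of this sketch: without it, an itemset
    contained in the newest transaction always has max-frequency 1.
    """
    best = Fraction(0)
    hits = 0
    # Walk backwards from the most recent transaction, growing the window by one.
    for size, transaction in enumerate(reversed(stream), start=1):
        if itemset <= transaction:
            hits += 1
        if size >= min_window:
            best = max(best, Fraction(hits, size))
    return best


if __name__ == "__main__":
    stream = [{"a"}, {"b"}, {"a"}, {"a", "b"}, {"b"}]  # oldest first
    print(max_frequency(frozenset({"a"}), stream))  # 2/3 (last three transactions)
```

Exact fractions avoid rounding ties between windows; a float version would work the same way.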