- Explainable Artificial Intelligence (XAI)
- Anomaly Detection Techniques and Applications
- Topic Modeling
- Domain Adaptation and Few-Shot Learning
- Advanced Neural Network Applications
- Privacy-Preserving Technologies in Data
- Gaussian Processes and Bayesian Inference
- Ethics and Social Impacts of AI
- Time Series Analysis and Forecasting
- Adversarial Robustness in Machine Learning
- Natural Language Processing Techniques
- Machine Learning in Materials Science
- Privacy, Security, and Data Protection
- Artificial Intelligence in Healthcare and Education
- Machine Learning in Healthcare
- Machine Learning and Data Classification
- Criminal Justice and Corrections Analysis
- Traffic and Road Safety
- Computability, Logic, AI Algorithms
- Data Stream Mining Techniques
- Speech and Dialogue Systems
- Machine Learning and Algorithms
- Advanced Graph Neural Networks
- Scientific Computing and Data Management
- Autonomous Vehicle Technology and Safety
University of Tübingen (2024)
TH Bingen University of Applied Sciences (2022-2023)
Stanford University (1990)
Heterogeneous tabular data are the most commonly used form of data and are essential for numerous critical and computationally demanding applications. On homogeneous data sets, deep neural networks have repeatedly shown excellent performance and have therefore been widely adopted. However, their adaptation to tabular data for inference or generation tasks remains challenging. To facilitate further progress in the field, this work provides an overview of state-of-the-art deep learning methods for tabular data. We categorize these methods into three groups:...
Tabular data is among the oldest and most ubiquitous forms of data. However, the generation of synthetic samples with the original data's characteristics remains a significant challenge for tabular data. While many generative models from the computer vision domain, such as variational autoencoders or generative adversarial networks, have been adapted for tabular data generation, less research has been directed towards recent transformer-based large language models (LLMs), which are also generative in nature. To this end, we propose GReaT (Generation of Realistic...
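As a rough illustration of the textual-encoding idea behind LLM-based tabular generation, the sketch below serializes rows into sentences an LLM can be fine-tuned on. The row format, feature shuffling, and column names are illustrative assumptions, not the exact GReaT pipeline:

```python
import random

# Hypothetical example rows; in an LLM-based pipeline each row is
# serialized into a natural-language sentence before fine-tuning.
rows = [
    {"age": 42, "income": 52000, "job": "teacher"},
    {"age": 29, "income": 61000, "job": "engineer"},
]

def encode_row(row: dict) -> str:
    # Shuffle the feature order so the model does not memorize a fixed
    # column ordering (a common trick in permutation-based encodings).
    items = list(row.items())
    random.shuffle(items)
    return ", ".join(f"{key} is {value}" for key, value in items)

for row in rows:
    print(encode_row(row))
# e.g. "income is 52000, job is teacher, age is 42"
# Sampling sentences from the fine-tuned LLM and parsing the
# "<key> is <value>" pairs back into columns yields synthetic rows.
```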
With a variety of local feature attribution methods being proposed in recent years, follow-up work has suggested several evaluation strategies. To assess the attribution quality across different techniques, the most popular among these strategies in the image domain use pixel perturbations. However, recent advances discovered that these strategies produce conflicting rankings and can be prohibitively expensive to compute. In this work, we present an information-theoretic analysis of evaluation strategies based on pixel perturbations. Our findings reveal that the results are strongly affected by...
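A minimal sketch of a pixel-perturbation evaluation, assuming a scalar-output model and a 2D attribution map; the `model`, `baseline` value, and step schedule below are placeholders, not the paper's protocol:

```python
import numpy as np

def perturbation_curve(model, image, attribution, k_steps=10, baseline=0.0):
    """Mask pixels in order of decreasing attribution and track the
    model's score; a steep drop suggests a faithful attribution map."""
    h, w = attribution.shape
    order = np.argsort(attribution.ravel())[::-1]  # most important first
    perturbed = image.copy()
    scores = [model(perturbed)]
    step = max(1, len(order) // k_steps)
    for i in range(0, len(order), step):
        ys, xs = np.unravel_index(order[i:i + step], (h, w))
        perturbed[ys, xs] = baseline  # replace pixels with a baseline value
        scores.append(model(perturbed))
    return np.array(scores)

# Toy usage: a "model" that sums the top-left quadrant of the image,
# and an attribution map that correctly highlights that quadrant.
model = lambda img: float(img[:14, :14].sum())
image = np.random.rand(28, 28)
attribution = np.zeros((28, 28))
attribution[:14, :14] = 1.0
print(perturbation_curve(model, image, attribution))
```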
Predicting the future trajectories of surrounding vehicles is an important challenge in automated driving, especially in highly interactive environments such as roundabouts. Many works approach the task with behavioral cloning: a single-step prediction model is established by learning a mapping from states to the corresponding actions from a fixed dataset. To achieve long-term trajectory prediction, the model is repeatedly executed. However, models learned by behavioral cloning are unable to compensate for accumulating errors that inevitably...
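The accumulating-error problem can be seen in a toy autoregressive rollout, where a slightly biased one-step model stands in for an imperfect behaviorally cloned policy; the dynamics and the bias term are invented for illustration:

```python
import numpy as np

def rollout(single_step_model, state, horizon):
    """Repeatedly apply a one-step prediction model to its own output.
    Small per-step errors compound because later inputs drift away
    from the states seen during training (covariate shift)."""
    trajectory = [state]
    for _ in range(horizon):
        state = single_step_model(state)
        trajectory.append(state)
    return np.array(trajectory)

# Toy example: the true dynamics rotate the state; the "learned" model
# carries a small constant bias per step.
theta = 0.1
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
true_step = lambda s: R @ s
learned_step = lambda s: R @ s + 0.01  # systematic single-step error

s0 = np.array([1.0, 0.0])
true_traj = rollout(true_step, s0, 50)
pred_traj = rollout(learned_step, s0, 50)
print(np.linalg.norm(true_traj - pred_traj, axis=1))  # error grows with horizon
```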
Explainable AI (XAI) is widely viewed as a sine qua non for ever-expanding AI research. A better understanding of the needs of XAI users, as well as human-centered evaluations of explainable models, are both a necessity and a challenge. In this paper, we explore how HCI and AI researchers conduct user studies in XAI applications based on a systematic literature review. After identifying and thoroughly analyzing 97 core papers with human-based XAI evaluations over the past five years, we categorize them along the measured characteristics of explanatory...
The streams of research on adversarial examples and counterfactual explanations have largely been growing independently. This has led to several recent works trying to elucidate their similarities and differences. Most prominently, it has been argued that adversarial examples, as opposed to counterfactual explanations, have a unique characteristic in that they lead to misclassification compared to the ground truth. However, the computational goals and methodologies employed by existing counterfactual explanation and adversarial example generation methods often lack alignment with this...
We examine machine learning models in a setup where individuals have the choice to share optional personal information with a decision-making system, as seen in modern insurance pricing models. Some users consent to their data being used whereas others object and keep their data undisclosed. In this work, we show that the decision not to share data can be considered information in itself that should be protected to respect users' privacy. This observation raises the overlooked problem of how to ensure that users who protect their data do not suffer any disadvantages as a result. To address...
We address the critical challenge of applying feature attribution methods to the transformer architecture, which dominates current applications in natural language processing and beyond. Traditional explainable AI (XAI) methods explicitly or implicitly rely on linear additive surrogate models to quantify the impact of input features on a model's output. In this work, we formally prove an alarming incompatibility: transformers are structurally incapable of aligning with popular surrogate models for feature attribution, undermining the grounding of these...
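For reference, the linear additive surrogate form assumed by popular attribution methods is the generic additive feature attribution template (e.g., the one underlying SHAP and LIME); the notation here is ours, not the paper's:

```latex
% Additive feature attribution: the surrogate g assigns one score \phi_i
% per feature and approximates the model f near the input of interest,
g(z) \;=\; \phi_0 + \sum_{i=1}^{d} \phi_i z_i, \qquad z \in \{0,1\}^d,
% where z_i indicates whether feature i is present and g(z) \approx f(x_z).
```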
We present GRACIE (Graph Recalibration and Adaptive Counterfactual Inspection and Explanation), a novel approach for generative classification and counterfactual explanations of dynamically changing graph data. We study these problems through the lens of generative classifiers. We propose a dynamic, self-supervised latent variable model that updates by identifying plausible counterfactuals for input graphs and recalibrating decision boundaries through contrastive optimization. Unlike prior work, we do not rely on linear separability between...
Psychological trauma can manifest following various distressing events and is captured in diverse online contexts. However, studies traditionally focus on a single aspect of trauma, often neglecting the transferability of findings across different scenarios. We address this gap by training language models of progressing complexity on trauma-related datasets, including genocide-related court data, a Reddit dataset on post-traumatic stress disorder (PTSD), counseling conversations, and Incel forum posts....
While retrieval augmented generation (RAG) has been shown to enhance the factuality of large language model (LLM) outputs, LLMs still suffer from hallucination, generating incorrect or irrelevant information. One common detection strategy involves prompting the LLM again to assess whether its response is grounded in the retrieved evidence, but this approach is costly. Alternatively, lightweight natural language inference (NLI) models can be used for efficient grounding verification at inference time. However, existing pre-trained...
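A hedged sketch of NLI-based grounding verification using an off-the-shelf MNLI model from Hugging Face; the model choice and the threshold are assumptions, not the paper's trained verifier:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Model choice is an assumption: any NLI model fine-tuned on MNLI works here.
name = "microsoft/deberta-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name)

def is_grounded(evidence: str, response: str, threshold: float = 0.5) -> bool:
    """Treat the retrieved evidence as the premise and the LLM response as
    the hypothesis; a high entailment probability suggests the response
    is grounded in the evidence."""
    inputs = tokenizer(evidence, response, return_tensors="pt", truncation=True)
    with torch.no_grad():
        probs = model(**inputs).logits.softmax(dim=-1)[0]
    # Look up which output index corresponds to the entailment label.
    entail_id = next(i for i, label in model.config.id2label.items()
                     if "entail" in label.lower())
    return probs[entail_id].item() >= threshold

print(is_grounded("The Eiffel Tower is in Paris.",
                  "The tower is located in Paris."))
```

A single forward pass through a small NLI model is far cheaper than re-prompting the LLM, which is the efficiency argument the abstract alludes to.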
We introduce a novel semi-supervised Graph Counterfactual Explainer (GCE) methodology, the Dynamic GRAph Counterfactual Explainer (DyGRACE). It leverages initial knowledge about the data distribution to search for valid counterfactuals while avoiding using information from potentially outdated decision functions in subsequent time steps. Employing two graph autoencoders (GAEs), DyGRACE learns the representation of each class in a binary classification scenario. The GAEs minimise the reconstruction error between the original graph and its...
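A schematic of the reconstruction-error idea, with tiny dense autoencoders standing in for the paper's graph autoencoders; the architecture and sizes are placeholders:

```python
import torch
import torch.nn as nn

class TinyGAE(nn.Module):
    """Minimal stand-in for a graph autoencoder: encodes a flattened
    adjacency matrix and reconstructs it (a real GAE would use GNN layers)."""
    def __init__(self, n_nodes: int, latent: int = 8):
        super().__init__()
        d = n_nodes * n_nodes
        self.enc = nn.Sequential(nn.Linear(d, latent), nn.ReLU())
        self.dec = nn.Sequential(nn.Linear(latent, d), nn.Sigmoid())

    def forward(self, adj):
        flat = adj.reshape(adj.shape[0], -1)
        return self.dec(self.enc(flat)).reshape(adj.shape)

def classify(adj, gae_class0, gae_class1):
    """One GAE per class: assign the class whose autoencoder reconstructs
    the graph with lower error. A counterfactual would then be a perturbed
    graph that the *other* class's GAE reconstructs well."""
    with torch.no_grad():
        err0 = ((gae_class0(adj) - adj) ** 2).mean().item()
        err1 = ((gae_class1(adj) - adj) ** 2).mean().item()
    return int(err1 < err0), (err0, err1)

adj = (torch.rand(1, 6, 6) > 0.5).float()  # toy binary adjacency matrix
g0, g1 = TinyGAE(6), TinyGAE(6)
print(classify(adj, g0, g1))
```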
As machine learning (ML) models are increasingly being deployed in high-stakes applications, policymakers have suggested tighter data protection regulations (e.g., GDPR, CCPA). One key principle is the "right to be forgotten", which gives users the right to have their data deleted. Another is the right to an actionable explanation, also known as algorithmic recourse, allowing users to reverse unfavorable decisions. To date, it is unknown whether these two principles can be operationalized simultaneously. Therefore, we introduce and study...
Many supervised machine learning tasks, such as future state prediction in dynamical systems, require precise modeling of a forecast's uncertainty. The Multiple Hypotheses Prediction (MHP) approach addresses this problem by providing several hypotheses that represent possible outcomes. Unfortunately, with the common l2 loss function, these hypotheses do not preserve the data distribution's characteristics. We propose an alternative loss for distribution-preserving MHP and review relevant theorems supporting our...
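For context, a common MHP training objective is a relaxed winner-takes-all loss over the hypotheses; the sketch below shows that standard baseline formulation, not the distribution-preserving alternative the paper proposes:

```python
import torch

def relaxed_wta_loss(hypotheses, target, eps=0.05):
    """Relaxed winner-takes-all objective often used for MHP: the hypothesis
    closest to the target receives most of the gradient weight (1 - eps),
    while the remaining hypotheses share a small weight eps."""
    # hypotheses: (K, D) predictions, target: (D,) observed outcome
    dists = ((hypotheses - target) ** 2).sum(dim=1)  # squared l2 per hypothesis
    k = hypotheses.shape[0]
    weights = torch.full((k,), eps / (k - 1))
    weights[dists.argmin()] = 1.0 - eps
    return (weights * dists).sum()

hyps = torch.randn(5, 2, requires_grad=True)   # 5 hypotheses in 2D
target = torch.tensor([0.5, -0.25])
loss = relaxed_wta_loss(hyps, target)
loss.backward()
print(loss.item(), hyps.grad.norm().item())
```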
We propose a novel and practical privacy notion called $f$-Membership Inference Privacy ($f$-MIP), which explicitly considers the capabilities of realistic adversaries under the membership inference attack threat model. Consequently, $f$-MIP offers interpretable privacy guarantees in addition to improved utility (e.g., better classification accuracy). In particular, we derive a parametric family of guarantees that we refer to as $\mu$-Gaussian Membership Inference Privacy ($\mu$-GMIP) by theoretically analyzing likelihood ratio-based attacks on...
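Assuming the notion mirrors the Gaussian trade-off functions of $f$-differential privacy (an analogy, not the paper's exact theorem), a $\mu$-parameterized guarantee of this kind bounds any membership attacker's power as:

```latex
% Gaussian trade-off bound: an attacker testing membership at
% false-positive rate \alpha achieves true-positive rate (power) at most
\mathrm{power}(\alpha) \;\le\; \Phi\!\big(\Phi^{-1}(\alpha) + \mu\big),
% where \Phi is the standard normal CDF; smaller \mu means stronger privacy.
```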
Interest in understanding and factorizing learned embedding spaces through conceptual explanations is steadily growing. When no human concept labels are available, concept discovery methods search trained embedding spaces for interpretable concepts, like object shape or color, that can provide post-hoc explanations of decisions. Unlike previous work, we argue that concept discovery should be identifiable, meaning that a number of known concepts can be provably recovered to guarantee the reliability of the explanations. As a starting point, we explicitly make the connection between classical...