Tobias Leemann

ORCID: 0000-0001-9333-228X
Research Areas
  • Explainable Artificial Intelligence (XAI)
  • Anomaly Detection Techniques and Applications
  • Topic Modeling
  • Domain Adaptation and Few-Shot Learning
  • Advanced Neural Network Applications
  • Privacy-Preserving Technologies in Data
  • Gaussian Processes and Bayesian Inference
  • Ethics and Social Impacts of AI
  • Time Series Analysis and Forecasting
  • Adversarial Robustness in Machine Learning
  • Natural Language Processing Techniques
  • Machine Learning in Materials Science
  • Privacy, Security, and Data Protection
  • Artificial Intelligence in Healthcare and Education
  • Machine Learning in Healthcare
  • Machine Learning and Data Classification
  • Criminal Justice and Corrections Analysis
  • Traffic and Road Safety
  • Computability, Logic, AI Algorithms
  • Data Stream Mining Techniques
  • Speech and dialogue systems
  • Machine Learning and Algorithms
  • Advanced Graph Neural Networks
  • Scientific Computing and Data Management
  • Autonomous Vehicle Technology and Safety

University of Tübingen
2024

TH Bingen University of Applied Sciences
2022-2023

Stanford University
1990

Heterogeneous tabular data are the most commonly used form of data and are essential for numerous critical and computationally demanding applications. On homogeneous data sets, deep neural networks have repeatedly shown excellent performance and have therefore been widely adopted. However, their adaptation to tabular data for inference or data generation tasks remains challenging. To facilitate further progress in the field, this work provides an overview of state-of-the-art deep learning methods for tabular data. We categorize these methods into three groups:...

10.1109/tnnls.2022.3229161 article EN cc-by IEEE Transactions on Neural Networks and Learning Systems 2022-12-23
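
As context for the survey's subject, here is a minimal sketch (not taken from the survey itself) of one widely used ingredient in deep models for heterogeneous tabular data: embedding categorical columns and concatenating them with numeric features before an MLP. All names and dimensions are illustrative.

```python
# Minimal sketch of an MLP for heterogeneous tabular data: categorical columns
# are embedded, numeric columns are passed through directly.
import torch
import torch.nn as nn

class TabularMLP(nn.Module):
    def __init__(self, cat_cardinalities, num_numeric, emb_dim=8, hidden=64, n_classes=2):
        super().__init__()
        # One embedding table per categorical column.
        self.embeddings = nn.ModuleList(
            nn.Embedding(card, emb_dim) for card in cat_cardinalities
        )
        in_dim = emb_dim * len(cat_cardinalities) + num_numeric
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, x_cat, x_num):
        # x_cat: (batch, n_cat) integer codes; x_num: (batch, n_numeric) floats.
        embs = [emb(x_cat[:, i]) for i, emb in enumerate(self.embeddings)]
        return self.net(torch.cat(embs + [x_num], dim=1))

# Toy usage: 2 categorical columns (cardinalities 5 and 3), 4 numeric columns.
model = TabularMLP([5, 3], num_numeric=4)
x_cat = torch.randint(0, 3, (16, 2))       # codes stay within both cardinalities
logits = model(x_cat, torch.randn(16, 4))
print(logits.shape)  # torch.Size([16, 2])
```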

Tabular data is among the oldest and most ubiquitous forms of data. However, the generation of synthetic samples with the original data's characteristics remains a significant challenge for tabular data. While many generative models from the computer vision domain, such as variational autoencoders or generative adversarial networks, have been adapted for tabular data generation, less research has been directed towards recent transformer-based large language models (LLMs), which are also generative in nature. To this end, we propose GReaT (Generation of Realistic...

10.48550/arxiv.2210.06280 preprint EN cc-by-nc-nd arXiv (Cornell University) 2022-01-01
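
The core idea is to serialize table rows as short natural-language sentences and fine-tune a causal language model on them. The sketch below shows only such a row-to-text serialization step; the feature names, template, and shuffling detail are illustrative assumptions, not the paper's exact implementation.

```python
# Sketch of the row-to-text serialization idea behind LLM-based tabular data
# generation (feature names and the exact template are illustrative).
import random

def row_to_text(row: dict, shuffle: bool = True) -> str:
    # Encode each cell as a short "<feature> is <value>" clause; shuffling the
    # feature order discourages the model from memorizing a fixed column order.
    parts = [f"{name} is {value}" for name, value in row.items()]
    if shuffle:
        random.shuffle(parts)
    return ", ".join(parts)

row = {"Age": 42, "Education": "Bachelors", "HoursPerWeek": 40, "Income": ">50K"}
print(row_to_text(row))
# e.g. "HoursPerWeek is 40, Age is 42, Income is >50K, Education is Bachelors"
# A causal LLM fine-tuned on such strings can then sample new strings, which
# are parsed back into table rows.
```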

With a variety of local feature attribution methods being proposed in recent years, follow-up work suggested several evaluation strategies. To assess the attribution quality across different techniques, the most popular among these strategies in the image domain use pixel perturbations. However, recent advances discovered that these strategies produce conflicting rankings and can be prohibitively expensive to compute. In this work, we present an information-theoretic analysis based on pixel perturbations. Our findings reveal that the results are strongly affected by...

10.48550/arxiv.2202.00449 preprint EN other-oa arXiv (Cornell University) 2022-01-01
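
A minimal sketch of the kind of pixel-perturbation (deletion-style) evaluation the abstract refers to: remove the most-attributed pixels first and track how the model's score degrades. The model and attribution map here are random stand-ins, and the step size and baseline value are arbitrary choices.

```python
# Minimal deletion-metric sketch: perturb the most-attributed pixels first and
# record how the model's score for the predicted class degrades.
import numpy as np

def deletion_curve(model_fn, image, attribution, n_steps=10, baseline=0.0):
    h, w = attribution.shape
    order = np.argsort(attribution.ravel())[::-1]      # most important first
    scores = [model_fn(image)]
    perturbed = image.copy()
    step = max(1, len(order) // n_steps)
    for start in range(0, len(order), step):
        idx = order[start:start + step]
        perturbed.reshape(h * w, -1)[idx] = baseline   # replace pixels by a baseline value
        scores.append(model_fn(perturbed))
    return np.array(scores)  # the area under this curve summarizes attribution quality

# Toy stand-ins: a "model" that sums the centre region, and a random attribution map.
rng = np.random.default_rng(0)
img = rng.random((8, 8, 3))
attr = rng.random((8, 8))
print(deletion_curve(lambda x: float(x[2:6, 2:6].sum()), img, attr))
```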

Heterogeneous tabular data are the most commonly used form of data and are essential for numerous critical and computationally demanding applications. On homogeneous data sets, deep neural networks have repeatedly shown excellent performance and have therefore been widely adopted. However, their adaptation to tabular data for inference or data generation tasks remains challenging. To facilitate further progress in the field, this work provides an overview of state-of-the-art deep learning methods for tabular data. We categorize these methods into three groups:...

10.48550/arxiv.2110.01889 preprint EN cc-by-nc-nd arXiv (Cornell University) 2021-01-01

Predicting the future trajectories of surrounding vehicles is an important challenge in automated driving, especially in highly interactive environments such as roundabouts. Many works approach the task with behavioral cloning: A single-step prediction model is established by learning a mapping from states to the corresponding actions from a fixed dataset. To achieve long-term trajectory prediction, the model is repeatedly executed. However, models learned by behavioral cloning are unable to compensate for accumulating errors that inevitably...

10.1109/itsc48978.2021.9564547 article EN 2021-09-19
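
A toy closed-loop rollout illustrating the accumulating-error problem described above: the model's prediction is fed back as the next input, so even a small per-step error compounds over the horizon. The dynamics and the cloned model's systematic error are invented purely for illustration.

```python
# Toy illustration of error accumulation when a single-step prediction model is
# rolled out for long-horizon trajectory prediction (closed loop).
import numpy as np

def true_step(state):
    return 0.95 * state + 1.0          # ground-truth dynamics (toy)

def learned_step(state):
    return 0.95 * state + 1.0 + 0.05   # cloned model with a small systematic error

state_true, state_pred = 0.0, 0.0
for t in range(1, 31):
    state_true = true_step(state_true)
    state_pred = learned_step(state_pred)   # prediction is fed back as the next input
    if t % 10 == 0:
        print(f"t={t:2d}  per-step error=0.05  closed-loop error={abs(state_pred - state_true):.2f}")
```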

Explainable AI (XAI) is widely viewed as a sine qua non for ever-expanding AI research. A better understanding of the needs of XAI users, as well as human-centered evaluations of explainable models, are both a necessity and a challenge. In this paper, we explore how HCI researchers conduct user studies in XAI applications based on a systematic literature review. After identifying and thoroughly analyzing 97 core papers with human-based evaluations over the past five years, we categorize them along the measured characteristics of explanatory...

10.48550/arxiv.2210.11584 preprint EN other-oa arXiv (Cornell University) 2022-01-01

The streams of research on adversarial examples and counterfactual explanations have largely been growing independently. This has led to several recent works trying to elucidate their similarities and differences. Most prominently, it has been argued that adversarial examples, as opposed to counterfactual explanations, have a unique characteristic in that they lead to a misclassification compared to the ground truth. However, the computational goals and methodologies employed by existing counterfactual explanation and adversarial example generation methods often lack alignment with this...

10.48550/arxiv.2403.10330 preprint EN arXiv (Cornell University) 2024-03-15
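
For context, a sketch contrasting two textbook formulations the two literatures build on: an FGSM-style adversarial perturbation versus a Wachter-style counterfactual search, both on a fixed logistic-regression model. These are generic variants for illustration, not the objectives analyzed in the paper.

```python
# Side-by-side sketch of two common formulations: a fixed-budget adversarial
# step vs. a proximity-penalized counterfactual search on logistic regression.
import numpy as np

w, b = np.array([1.5, -2.0]), 0.3
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
predict = lambda x: sigmoid(x @ w + b)          # probability of class 1

x = np.array([0.2, 0.4])                        # original point, predicted class 0
grad = (predict(x) - 1.0) * w                   # gradient of BCE loss w.r.t. x, target class 1

# Adversarial example: one fixed-budget sign step toward the opposite class
# (in the binary case this also maximizes the loss of the current label).
eps = 0.3
x_adv = x - eps * np.sign(grad)

# Counterfactual explanation: gradient descent on prediction loss + proximity penalty.
x_cf, lam = x.copy(), 0.1
for _ in range(500):
    g = (predict(x_cf) - 1.0) * w + lam * 2 * (x_cf - x)
    x_cf -= 0.05 * g

print("original      :", x, predict(x))
print("adversarial   :", x_adv, predict(x_adv))
print("counterfactual:", x_cf, predict(x_cf))
```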

We examine machine learning models in a setup where individuals have the choice to share optional personal information with a decision-making system, as seen in modern insurance pricing models. Some users consent to their data being used whereas others object and keep their data undisclosed. In this work, we show that the decision not to share data can be considered information in itself that should be protected to respect users' privacy. This observation raises the overlooked problem of how to ensure that users who protect their data do not suffer any disadvantages as a result. To address...

10.1609/aaai.v38i19.30126 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2024-03-24

We address the critical challenge of applying feature attribution methods to the transformer architecture, which dominates current applications in natural language processing and beyond. Traditional explainable AI (XAI) methods explicitly or implicitly rely on linear or additive surrogate models to quantify the impact of input features on a model's output. In this work, we formally prove an alarming incompatibility: transformers are structurally incapable of aligning with popular surrogate models for feature attribution, undermining the grounding of these...

10.48550/arxiv.2405.13536 preprint EN arXiv (Cornell University) 2024-05-22
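
A toy numerical check (not the paper's formal argument) of the underlying tension: the output of a single softmax-attention head is not an additive function of its input tokens, whereas linear/additive surrogate attributions presuppose per-feature contributions that simply sum up.

```python
# Toy check: a softmax-attention readout has a non-zero interaction term between
# its two input tokens, so it cannot be written as a sum of per-token effects.
import numpy as np

rng = np.random.default_rng(0)
d = 4
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

def attention_readout(tokens):
    # Single-head self-attention, then read out the first position.
    Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
    scores = Q @ K.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return (weights @ V)[0]

x1, x2 = rng.standard_normal(d), rng.standard_normal(d)
baseline = np.zeros(d)

full    = attention_readout(np.stack([x1, x2]))
only_x1 = attention_readout(np.stack([x1, baseline]))
only_x2 = attention_readout(np.stack([baseline, x2]))
empty   = attention_readout(np.stack([baseline, baseline]))

# If the readout were additive in its tokens, this interaction term would vanish.
interaction = full - only_x1 - only_x2 + empty
print("interaction norm:", np.linalg.norm(interaction))   # clearly non-zero
```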

We present GRACIE (Graph Recalibration and Adaptive Counterfactual Inspection and Explanation), a novel approach for generative classification and counterfactual explanations of dynamically changing graph data. We study these problems through the lens of generative classifiers. We propose a dynamic, self-supervised latent variable model that updates by identifying plausible counterfactuals for input graphs and recalibrating decision boundaries via contrastive optimization. Unlike prior work, we do not rely on linear separability between...

10.1145/3637528.3671831 article EN other-oa Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining 2024-08-24

Psychological trauma can manifest following various distressing events and is captured in diverse online contexts. However, studies traditionally focus on a single aspect of trauma, often neglecting the transferability of findings across different scenarios. We address this gap by training language models with progressing complexity on trauma-related datasets, including genocide-related court data, a Reddit dataset on post-traumatic stress disorder (PTSD), counseling conversations, and Incel forum posts...

10.48550/arxiv.2408.05977 preprint EN arXiv (Cornell University) 2024-08-12

While retrieval augmented generation (RAG) has been shown to enhance the factuality of large language model (LLM) outputs, LLMs still suffer from hallucination, generating incorrect or irrelevant information. One common detection strategy involves prompting the LLM again to assess whether its response is grounded in the retrieved evidence, but this approach is costly. Alternatively, lightweight natural language inference (NLI) models for efficient grounding verification can be used at inference time. While existing pre-trained...

10.48550/arxiv.2410.03461 preprint EN arXiv (Cornell University) 2024-10-04
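
A sketch of the lightweight NLI-based grounding check described above: treat the retrieved evidence as the premise and the LLM response as the hypothesis, and accept only entailed responses. The Hugging Face checkpoint and threshold below are illustrative assumptions, not the setup from the paper.

```python
# Sketch of NLI-based grounding verification for RAG outputs.
from transformers import pipeline

# Illustrative public NLI cross-encoder, not the model used in the paper.
nli = pipeline("text-classification", model="microsoft/deberta-large-mnli")

def is_grounded(evidence: str, response: str, threshold: float = 0.5) -> bool:
    out = nli({"text": evidence, "text_pair": response})
    out = out[0] if isinstance(out, list) else out
    return out["label"].lower() == "entailment" and out["score"] >= threshold

evidence = "The Eiffel Tower was completed in 1889 and is located in Paris."
print(is_grounded(evidence, "The Eiffel Tower is in Paris."))        # expected: True
print(is_grounded(evidence, "The Eiffel Tower was built in 1920."))  # expected: False
```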

We introduce a novel semi-supervised Graph Counterfactual Explainer (GCE) methodology, the Dynamic GRAph Counterfactual Explainer (DyGRACE). It leverages initial knowledge about the data distribution to search for valid counterfactuals while avoiding the use of information from potentially outdated decision functions in subsequent time steps. Employing two graph autoencoders (GAEs), DyGRACE learns the representation of each class in a binary classification scenario. The GAEs minimise the reconstruction error between the original graph and its...

10.48550/arxiv.2308.02353 preprint EN cc-by-nc-nd arXiv (Cornell University) 2023-01-01
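
A heavily simplified stand-in for the two-autoencoder idea: one autoencoder per class, with reconstruction error indicating which class a candidate counterfactual is compatible with. For brevity this uses dense autoencoders on flattened adjacency matrices rather than true graph autoencoders, so it only mimics the scoring logic, not DyGRACE itself.

```python
# Two class-specific autoencoders; reconstruction error as a counterfactual
# validity score (dense AEs on flattened adjacency matrices, for brevity).
import numpy as np
import torch
import torch.nn as nn

N = 8  # nodes per toy graph

def random_graph(p):
    # Symmetric adjacency matrix of an Erdos-Renyi-style graph, flattened.
    a = (np.random.rand(N, N) < p).astype(np.float32)
    a = np.triu(a, 1)
    return torch.tensor((a + a.T).ravel())

def make_autoencoder():
    return nn.Sequential(nn.Linear(N * N, 16), nn.ReLU(), nn.Linear(16, N * N), nn.Sigmoid())

def train(ae, graphs, epochs=200):
    opt = torch.optim.Adam(ae.parameters(), lr=1e-2)
    x = torch.stack(graphs)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.binary_cross_entropy(ae(x), x)
        loss.backward()
        opt.step()
    return ae

def reconstruction_error(ae, g):
    with torch.no_grad():
        return float(nn.functional.binary_cross_entropy(ae(g), g))

# One autoencoder per class: class 0 = sparse graphs, class 1 = dense graphs.
ae0 = train(make_autoencoder(), [random_graph(0.1) for _ in range(64)])
ae1 = train(make_autoencoder(), [random_graph(0.6) for _ in range(64)])

# A candidate counterfactual for a sparse input should be reconstructed better
# by the autoencoder of the opposite (dense) class.
candidate = random_graph(0.6)
print("error under class-0 AE:", reconstruction_error(ae0, candidate))
print("error under class-1 AE:", reconstruction_error(ae1, candidate))
```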

As machine learning (ML) models are increasingly being deployed in high-stakes applications, policymakers have suggested tighter data protection regulations (e.g., GDPR, CCPA). One key principle is the "right to be forgotten", which gives users the right to have their data deleted. Another is the right to an actionable explanation, also known as algorithmic recourse, allowing users to reverse unfavorable decisions. To date, it is unknown whether these two principles can be operationalized simultaneously. Therefore, we introduce and study...

10.48550/arxiv.2208.14137 preprint EN cc-by arXiv (Cornell University) 2022-01-01
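
A small illustration (not the paper's construction) of why the two principles interact: recourse is computed against one model, but honoring deletion requests triggers retraining, which can invalidate recourse that was only just sufficient. The deletion pattern below is deliberately unfavorable to make the effect visible.

```python
# Recourse issued under one model can be invalidated after deletion-driven retraining.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + 0.8 * X[:, 1] > 0).astype(int)
model = LogisticRegression().fit(X, y)

# A rejected individual receives minimal recourse: move along the weight vector
# just barely across the current decision boundary.
x = np.array([-0.8, -0.4])
w, b = model.coef_[0], model.intercept_[0]
step = -(x @ w + b) / (w @ w) + 1e-3
x_recourse = x + step * w
print("accepted before deletions:", model.predict([x_recourse])[0] == 1)

# Positive users closest to the boundary exercise their right to be forgotten
# (a deliberately unfavorable deletion pattern), and the model is retrained.
scores = model.decision_function(X)
drop = np.argsort(np.where(y == 1, scores, np.inf))[:20]
keep = np.ones(len(X), dtype=bool)
keep[drop] = False
model_after = LogisticRegression().fit(X[keep], y[keep])
print("accepted after retraining:", model_after.predict([x_recourse])[0] == 1)  # typically False
```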

Many supervised machine learning tasks, such as future state prediction in dynamical systems, require precise modeling of a forecast's uncertainty. The Multiple Hypotheses Prediction (MHP) approach addresses this problem by providing several hypotheses that represent possible outcomes. Unfortunately, with the common l2 loss function, these hypotheses do not preserve the data distribution's characteristics. We propose an alternative loss for distribution preserving MHP and review relevant theorems supporting our...

10.14428/esann/2021.es2021-16 article EN ESANN 2021 proceedings 2021-01-01
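
For reference, a sketch of the standard MHP setup with the common relaxed winner-takes-all l2 meta-loss that the abstract criticizes; the paper's distribution-preserving alternative is not reproduced here, and the relaxation constant is a conventional choice.

```python
# Multiple Hypotheses Prediction with the relaxed winner-takes-all l2 meta-loss.
import torch
import torch.nn as nn

class MHPHead(nn.Module):
    """Predicts K hypotheses per input (here: K independent linear heads)."""
    def __init__(self, in_dim, out_dim, n_hypotheses=4):
        super().__init__()
        self.heads = nn.ModuleList(nn.Linear(in_dim, out_dim) for _ in range(n_hypotheses))

    def forward(self, x):
        return torch.stack([head(x) for head in self.heads], dim=1)  # (batch, K, out_dim)

def wta_l2_loss(hypotheses, target, eps=0.05):
    # Relaxed winner-takes-all: the closest hypothesis receives most of the
    # gradient, the remaining ones a small share so they keep training.
    errors = ((hypotheses - target.unsqueeze(1)) ** 2).sum(dim=-1)   # (batch, K)
    k = errors.size(1)
    weights = torch.full_like(errors, eps / (k - 1))
    weights.scatter_(1, errors.argmin(dim=1, keepdim=True), 1.0 - eps)
    return (weights * errors).sum(dim=1).mean()

# Toy bimodal task: for the same input, the target is +1 or -1 with equal probability.
model = MHPHead(1, 1)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(300):
    x = torch.zeros(64, 1)
    y = torch.randint(0, 2, (64, 1)).float() * 2 - 1
    opt.zero_grad()
    loss = wta_l2_loss(model(x), y)
    loss.backward()
    opt.step()

# Typically at least one hypothesis settles near each mode (+1 and -1).
print(model(torch.zeros(1, 1)).detach().squeeze())
```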

We propose a novel and practical privacy notion called $f$-Membership Inference Privacy ($f$-MIP), which explicitly considers the capabilities of realistic adversaries under the membership inference attack threat model. Consequently, $f$-MIP offers interpretable privacy guarantees with improved utility (e.g., better classification accuracy). In particular, we derive a parametric family of $f$-MIP guarantees that we refer to as $\mu$-Gaussian Membership Inference Privacy ($\mu$-GMIP) by theoretically analyzing likelihood ratio-based membership inference attacks on...

10.48550/arxiv.2306.07273 preprint EN other-oa arXiv (Cornell University) 2023-01-01
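
A sketch of a loss-based likelihood-ratio membership inference attack of the general kind analyzed above, with Gaussian-modelled loss distributions for members and non-members. The loss populations are synthetic stand-ins, and this is not the paper's derivation of $\mu$-GMIP.

```python
# Loss-based likelihood-ratio membership inference attack (simplified sketch).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Stand-in loss populations: members tend to have lower loss than non-members.
member_losses = rng.normal(loc=0.3, scale=0.15, size=1000)
nonmember_losses = rng.normal(loc=0.7, scale=0.25, size=1000)

# The attacker estimates both distributions (e.g., from shadow models; here
# simply from the samples themselves).
mu_in, sd_in = member_losses.mean(), member_losses.std()
mu_out, sd_out = nonmember_losses.mean(), nonmember_losses.std()

def membership_score(loss):
    # Log-likelihood ratio: log p(loss | member) - log p(loss | non-member).
    return norm.logpdf(loss, mu_in, sd_in) - norm.logpdf(loss, mu_out, sd_out)

# Evaluate the attack: true positive rate at a fixed false positive rate.
scores_in = membership_score(member_losses)
scores_out = membership_score(nonmember_losses)
threshold = np.quantile(scores_out, 0.99)        # 1% false positive rate
tpr = (scores_in > threshold).mean()
print(f"TPR at 1% FPR: {tpr:.2f}")
```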

We examine machine learning models in a setup where individuals have the choice to share optional personal information with a decision-making system, as seen in modern insurance pricing models. Some users consent to their data being used whereas others object and keep their data undisclosed. In this work, we show that the decision not to share data can be considered information in itself that should be protected to respect users' privacy. This observation raises the overlooked problem of how to ensure that users who protect their data do not suffer any disadvantages as a result. To address...

10.48550/arxiv.2210.13954 preprint EN other-oa arXiv (Cornell University) 2022-01-01

Interest in understanding and factorizing learned embedding spaces through conceptual explanations is steadily growing. When no human concept labels are available, concept discovery methods search trained embedding spaces for interpretable concepts like object shape or color that can provide post-hoc explanations of decisions. Unlike previous work, we argue that concept discovery should be identifiable, meaning that a number of known concepts can be provably recovered to guarantee the reliability of the explanations. As a starting point, we explicitly make the connection between classical...

10.48550/arxiv.2206.13872 preprint EN other-oa arXiv (Cornell University) 2022-01-01
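
A toy sketch of concept discovery framed as a classical factorization problem: embeddings are generated as a linear mixture of independent ground-truth "concepts", and ICA recovers them up to sign and permutation. Identifiability here is only the textbook linear-ICA statement, not the paper's result for learned embedding spaces.

```python
# Concept discovery as a classical factorization problem (linear-ICA toy setting).
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
n_samples, n_concepts, emb_dim = 2000, 3, 8

concepts = rng.laplace(size=(n_samples, n_concepts))   # independent, non-Gaussian sources
mixing = rng.normal(size=(n_concepts, emb_dim))
embeddings = concepts @ mixing                         # the "learned" embedding space

ica = FastICA(n_components=n_concepts, random_state=0)
recovered = ica.fit_transform(embeddings)

# Correlate each ground-truth concept with its best-matching recovered component.
corr = np.abs(np.corrcoef(concepts.T, recovered.T)[:n_concepts, n_concepts:])
print(np.round(corr.max(axis=1), 3))   # close to 1.0 -> concepts recovered
```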