NFDI4DS | UHH-SEMS - Publication Details

Attention Mechanisms Don't Learn Additive Models: Rethinking Feature Importance for Transformers

FOS: Computer and information sciences Computer Science - Machine Learning Computer Science - Computation and Language Artificial Intelligence (cs.AI) Computer Science - Artificial Intelligence Computation and Language (cs.CL) Machine Learning (cs.LG)

DOI: 10.48550/arxiv.2405.13536 Publication Date: 2024-01-01

Abstract Supplemental Material References Cited by

AUTHORS (4)

Leemann, Tobias

Fastowski, Alina

Pfeiffer, Felix

Kasneci, Gjergji

ABSTRACT

Comment: TMLR Camera-Ready version<br/>We address the critical challenge of applying feature attribution methods to the transformer architecture, which dominates current applications in natural language processing and beyond. Traditional attribution methods to explainable AI (XAI) explicitly or implicitly rely on linear or additive surrogate models to quantify the impact of input features on a model's output. In this work, we formally prove an alarming incompatibility: transformers are structurally incapable of representing linear or additive surrogate models used for feature attribution, undermining the grounding of these conventional explanation methodologies. To address this discrepancy, we introduce the Softmax-Linked Additive Log Odds Model (SLALOM), a novel surrogate model specifically designed to align with the transformer framework. SLALOM demonstrates the capacity to deliver a range of insightful explanations with both synthetic and real-world datasets. We highlight SLALOM's unique efficiency-quality curve by showing that SLALOM can produce explanations with substantially higher fidelity than competing surrogate models or provide explanations of comparable quality at a fraction of their computational costs. We release code for SLALOM as an open-source project online at https://github.com/tleemann/slalom_explanations.<br/>

SUPPLEMENTAL MATERIAL

Coming soon ....

REFERENCES ()

CITATIONS ()

EXTERNAL LINKS

OPENAIRE - Products

PlumX Metrics

Attention Mechanisms Don't Learn Additive Models: Rethinking Feature Importance for Transformers

RECOMMENDATIONS

FAIR ASSESSMENT

Coming soon ....

JUPYTER LAB

Coming soon ....