Architectural Neural Backdoors from First Principles
DOI: 10.48550/arxiv.2402.06957
Publication Date: 2024-02-10
AUTHORS (5)
ABSTRACT
While previous research backdoored neural networks by changing their parameters, recent work uncovered a more insidious threat: backdoors embedded within the definition of a network's architecture. This involves injecting common architectural components, such as activation functions and pooling layers, to subtly introduce backdoor behavior that persists even after (full re-)training. However, the full scope of the implications has remained largely unexplored. Bober-Irizar et al. [2023] introduced the first architectural backdoor; they showed how to create one for a checkerboard pattern, but never explained how to target an arbitrary trigger pattern of choice. In this work we construct an arbitrary trigger detector which can be used to backdoor an architecture with no human supervision. This leads us to revisit the concept of architectural backdoors and taxonomise them, describing 12 distinct types. To gauge the difficulty of detecting such backdoors, we conducted a user study, revealing that ML developers identify suspicious components in model definitions in only 37% of cases, while they surprisingly preferred backdoored models in 33% of cases. To contextualize these results, we find that language models outperform humans at the detection of backdoors. Finally, we discuss defenses against architectural backdoors, emphasizing the need for robust and comprehensive strategies to safeguard the integrity of ML systems.
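To make the threat concrete, below is a minimal PyTorch sketch of the general idea described in the abstract: a parameter-free component hidden inside the model definition reacts to a fixed trigger pattern and overrides the classifier's output, so the behavior survives full (re-)training because the optimiser never touches it. This is an illustrative toy, not the construction from the paper; the module names (TriggerGate, BackdooredNet), the checkerboard trigger, its size and placement, and the gating scheme are all assumptions made for the example.

# Illustrative sketch of an architectural backdoor (assumed construction,
# not the one from the paper): a parameter-free gate detects a trigger
# pattern and forces the network's prediction to a target class.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TriggerGate(nn.Module):
    """Parameter-free component that reacts to a fixed input pattern.

    Because it holds no learnable weights, (re-)training the surrounding
    network cannot remove its behaviour -- the property that makes
    architectural backdoors persistent.
    """

    def __init__(self, trigger: torch.Tensor, threshold: float = 0.9):
        super().__init__()
        # A buffer lives in the module definition but is never optimised.
        self.register_buffer("trigger", trigger)
        self.threshold = threshold

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Compare the top-left corner of each input against the trigger.
        patch = x[:, :, : self.trigger.shape[-2], : self.trigger.shape[-1]]
        score = F.cosine_similarity(
            patch.flatten(1), self.trigger.expand_as(patch).flatten(1), dim=1
        )
        # Gate is 1 when the trigger is present, 0 otherwise.
        return (score > self.threshold).float().view(-1, 1)


class BackdooredNet(nn.Module):
    """Ordinary classifier whose logits are overridden when the gate fires."""

    def __init__(self, trigger: torch.Tensor, num_classes: int = 10, target: int = 0):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1)
        )
        self.classifier = nn.Linear(16, num_classes)
        self.gate = TriggerGate(trigger)
        self.target = target
        self.num_classes = num_classes

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        logits = self.classifier(self.features(x).flatten(1))
        gate = self.gate(x)  # shape (batch, 1): 1 iff trigger detected
        forced = F.one_hot(
            torch.full((x.shape[0],), self.target, device=x.device),
            self.num_classes,
        ).float() * 1e4
        # Without the trigger the network behaves like a normal classifier;
        # with it, the forced logits dominate and yield the target class.
        return (1 - gate) * logits + gate * forced


if __name__ == "__main__":
    # Checkerboard-style trigger in the 4x4 top-left corner (illustrative).
    trigger = ((torch.arange(4)[:, None] + torch.arange(4)[None, :]) % 2).float()
    model = BackdooredNet(trigger.view(1, 1, 4, 4))

    clean = torch.rand(2, 1, 28, 28)
    poisoned = clean.clone()
    poisoned[:, :, :4, :4] = trigger
    print(model(clean).argmax(1), model(poisoned).argmax(1))

Running the script shows the same untrained network classifying clean inputs normally while mapping any input stamped with the trigger to the attacker's target class, regardless of what the learnable weights are.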