Architectural Neural Backdoors from First Principles
DOI: 10.48550/arxiv.2402.06957
Publication Date: 2024-02-10
AUTHORS (5)
ABSTRACT
While previous research backdoored neural networks by changing their parameters, recent work uncovered a more insidious threat: backdoors embedded within the definition of a network's architecture. This involves injecting common architectural components, such as activation functions and pooling layers, to subtly introduce backdoor behavior that persists even after (full re-)training. However, the full scope of the implications has remained largely unexplored. Bober-Irizar et al. [2023] introduced the first architectural backdoor; they showed how to create one for a checkerboard pattern, but never explained how to target an arbitrary trigger pattern of choice. In this work we construct an arbitrary trigger detector which can be used to backdoor an architecture with no human supervision. This leads us to revisit the concept of architectural backdoors and taxonomise them, describing 12 distinct types. To gauge the difficulty of detecting such backdoors, we conducted a user study, revealing that ML developers identify suspicious components in model definitions in only 37% of cases, while they surprisingly preferred backdoored models in 33% of cases. To contextualize these results, we find that language models outperform humans at the detection of backdoors. Finally, we discuss defenses against architectural backdoors, emphasizing the need for robust and comprehensive strategies to safeguard the integrity of ML systems.
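To make the threat concrete, below is a minimal PyTorch sketch of the general idea described in the abstract: a parameter-free component hidden inside the model definition reacts to a fixed trigger pattern and overrides the classifier's output, so the behavior survives full (re-)training because the optimiser never touches it. This is an illustrative toy, not the construction from the paper; the module names (TriggerGate, BackdooredNet), the checkerboard trigger, its size and placement, and the gating scheme are all assumptions made for the example.

# Illustrative sketch of an architectural backdoor (assumed construction,
# not the one from the paper): a parameter-free gate detects a trigger
# pattern and forces the network's prediction to a target class.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TriggerGate(nn.Module):
    """Parameter-free component that reacts to a fixed input pattern.

    Because it holds no learnable weights, (re-)training the surrounding
    network cannot remove its behaviour -- the property that makes
    architectural backdoors persistent.
    """

    def __init__(self, trigger: torch.Tensor, threshold: float = 0.9):
        super().__init__()
        # A buffer lives in the module definition but is never optimised.
        self.register_buffer("trigger", trigger)
        self.threshold = threshold

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Compare the top-left corner of each input against the trigger.
        patch = x[:, :, : self.trigger.shape[-2], : self.trigger.shape[-1]]
        score = F.cosine_similarity(
            patch.flatten(1), self.trigger.expand_as(patch).flatten(1), dim=1
        )
        # Gate is 1 when the trigger is present, 0 otherwise.
        return (score > self.threshold).float().view(-1, 1)


class BackdooredNet(nn.Module):
    """Ordinary classifier whose logits are overridden when the gate fires."""

    def __init__(self, trigger: torch.Tensor, num_classes: int = 10, target: int = 0):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1)
        )
        self.classifier = nn.Linear(16, num_classes)
        self.gate = TriggerGate(trigger)
        self.target = target
        self.num_classes = num_classes

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        logits = self.classifier(self.features(x).flatten(1))
        gate = self.gate(x)  # shape (batch, 1): 1 iff trigger detected
        forced = F.one_hot(
            torch.full((x.shape[0],), self.target, device=x.device),
            self.num_classes,
        ).float() * 1e4
        # Without the trigger the network behaves like a normal classifier;
        # with it, the forced logits dominate and yield the target class.
        return (1 - gate) * logits + gate * forced


if __name__ == "__main__":
    # Checkerboard-style trigger in the 4x4 top-left corner (illustrative).
    trigger = ((torch.arange(4)[:, None] + torch.arange(4)[None, :]) % 2).float()
    model = BackdooredNet(trigger.view(1, 1, 4, 4))

    clean = torch.rand(2, 1, 28, 28)
    poisoned = clean.clone()
    poisoned[:, :, :4, :4] = trigger
    print(model(clean).argmax(1), model(poisoned).argmax(1))

Running the script shows the same untrained network classifying clean inputs normally while mapping any input stamped with the trigger to the attacker's target class, regardless of what the learnable weights are.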