Data sanitization against adversarial label contamination based on data complexity

Subject areas: 0202 Electrical Engineering, Electronic Engineering, Information Engineering; 02 Engineering and Technology
DOI: 10.1007/s13042-016-0629-5 Publication Date: 2017-01-24T19:05:00Z
ABSTRACT
Machine learning techniques may suffer from adversarial attacks, in which an attacker misleads a learning process by manipulating training samples. Data sanitization is one countermeasure against poisoning attacks: a data pre-processing step that filters suspect samples before learning. A number of data sanitization methods have recently been devised for label flip attacks, but their flexibility is limited by specific assumptions. We observe that the abrupt label flips caused by an attack change the complexity of classification. This paper proposes a data sanitization method based on data complexity, a measure of the difficulty of classification on a dataset. Our method measures the data complexity of a training set after removing a sample and its nearest samples; contaminated samples are then distinguished from untainted samples according to their data complexity values. Experimental results support the idea that data complexity can be used to identify attack samples, and the proposed method achieves better detection accuracy than the current sanitization method on well-known security application problems.
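The detection procedure described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: the abstract does not specify the complexity measure, the number of neighbours removed, or the decision rule, so here the leave-one-out 1-NN error rate stands in as the data complexity measure, `k = 2` neighbours are removed alongside each candidate, and the sample whose removal causes the largest complexity drop is flagged.

```python
# Minimal sketch of complexity-based label-flip detection.
# ASSUMPTIONS (not given by the abstract): complexity = leave-one-out
# 1-NN error rate; k = 2 neighbours removed with each candidate sample;
# "most suspicious" = largest complexity drop after removal.
import numpy as np

def one_nn_error(X, y):
    """Data-complexity proxy: leave-one-out 1-NN error rate.
    Higher values mean the classification task is harder."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)      # a point is not its own neighbour
    return float(np.mean(y[d.argmin(axis=1)] != y))

def complexity_scores(X, y, k=2):
    """Score each sample: remove it and its k nearest neighbours, then
    measure how much the complexity of the remaining set drops.
    A large drop suggests the removed neighbourhood was contaminated."""
    base = one_nn_error(X, y)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    scores = np.empty(len(X))
    for i in range(len(X)):
        drop = np.append(d[i].argsort()[:k], i)
        keep = np.setdiff1d(np.arange(len(X)), drop)
        scores[i] = base - one_nn_error(X[keep], y[keep])
    return scores                    # larger score => more suspicious

# Toy demo: two well-separated clusters with one adversarial label flip.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (10, 2)),
               rng.normal(3.0, 0.3, (10, 2))])
y = np.array([0] * 10 + [1] * 10)
y[5] = 1                             # the label-flip attack sample
scores = complexity_scores(X, y)
suspect = int(scores.argmax())       # flipped sample attains the top score
                                     # (possibly tied with its neighbours)
```

The intuition matches the abstract: deleting an untainted sample barely changes the 1-NN error, while deleting the flipped sample (and its neighbourhood) removes the local label inconsistency, so complexity drops sharply and the sample stands out.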