On the correlation between human fixations, handcrafted and CNN features

Artificial intelligence; Feature (linguistics); CNN features; Human fixations; Local image descriptors; Convolutional neural network; Image Segmentation; Pattern recognition (psychology); 03 medical and health sciences; 0302 clinical medicine; Image Feature Retrieval and Recognition Techniques; Image (mathematics); Similarity (geometry); Local Descriptors; Knowmad Institut; Scale-invariant feature transform; Deep learning; Linguistics; Visual Attention; Computer science; FOS: Philosophy, ethics and religion; Philosophy; Advances in Transfer Learning and Domain Adaptation; Physical Sciences; Computational Modeling of Visual Saliency Detection; FOS: Languages and literature; Computer vision; Computer Vision and Pattern Recognition; Top-Down Attention; Software; Feature Matching
DOI: 10.1007/s00521-021-05863-5 Publication Date: 2021-03-19T13:04:49Z
ABSTRACT
Traditional local image descriptors such as SIFT and SURF are based on processing similar to that which takes place in the early visual cortex. Convolutional neural networks (CNNs) likewise draw inspiration from the human visual system, integrating computational elements typical of higher visual cortical areas. Deep CNN architectures are intrinsically hard to interpret, so much effort has been made to dissect them in order to understand which types of features they learn. However, despite the resemblance to the human visual system, not enough attention has been devoted to understanding whether the image features learned by deep CNNs and used for classification correlate with the features that humans select when viewing images, the so-called human fixations, or whether they correlate with earlier handcrafted features such as SIFT and SURF. Exploring these correlations is meaningful because what we require from CNNs, and from features in general, is to recognize and correctly classify objects or subjects relevant to humans. In this paper, we establish the correlation between three families of image interest points: human fixations, handcrafted features and CNN features. We extract features from the feature maps of selected layers of several deep CNN architectures, from the shallowest to the deepest. All features and fixations are then compared using two types of measures, global and local, which reveal the degree of similarity between the areas of interest of the three families. Experiments carried out on the ETD human fixations database show that human fixations are positively correlated with handcrafted features and even more so with the deep layers of CNNs, and that handcrafted features correlate highly with one another, as do some CNNs.
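The abstract does not name the specific global and local measures, so the following is only an illustrative sketch of what such a comparison between two families of interest points could look like. It assumes, purely for illustration, that the global measure is a Pearson correlation between Gaussian-smoothed density maps and that the local measure is the fraction of points in one set that have a neighbour in the other set within a fixed radius. All function names, parameters (`sigma`, `radius`) and coordinates are hypothetical and are not taken from the paper.

```python
import math

def density_map(points, width, height, sigma=2.0):
    """Build a width x height map as a sum of isotropic Gaussians centred on each (x, y) point."""
    grid = [[0.0] * width for _ in range(height)]
    for px, py in points:
        for y in range(height):
            for x in range(width):
                d2 = (x - px) ** 2 + (y - py) ** 2
                grid[y][x] += math.exp(-d2 / (2.0 * sigma ** 2))
    return grid

def pearson(map_a, map_b):
    """Assumed global measure: Pearson correlation between two equally sized maps."""
    a = [v for row in map_a for v in row]
    b = [v for row in map_b for v in row]
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0  # correlation is undefined for a constant map; report 0 by convention
    return cov / math.sqrt(var_a * var_b)

def local_match_rate(points_a, points_b, radius=3.0):
    """Assumed local measure: fraction of points in A with at least one point of B within `radius`."""
    if not points_a:
        return 0.0
    hits = sum(
        1
        for ax, ay in points_a
        if any(math.hypot(ax - bx, ay - by) <= radius for bx, by in points_b)
    )
    return hits / len(points_a)

# Hypothetical toy data on a 32 x 32 image: fixations, nearby keypoints, distant keypoints.
fixations = [(5, 5), (20, 10)]
near_keypoints = [(6, 5), (21, 11)]
far_keypoints = [(30, 28), (2, 27)]

fix_map = density_map(fixations, 32, 32)
global_near = pearson(fix_map, density_map(near_keypoints, 32, 32))
global_far = pearson(fix_map, density_map(far_keypoints, 32, 32))
local_near = local_match_rate(fixations, near_keypoints)
local_far = local_match_rate(fixations, far_keypoints)
```

In this toy setup the keypoints that lie close to the fixations score higher on both measures than the distant ones. A global score summarises agreement over the whole image, while a local score checks agreement point by point, which is why the two can be reported side by side.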