- Visual Attention and Saliency Detection
- Visual perception and processing mechanisms
- Face Recognition and Perception
- Aesthetic Perception and Analysis
- Advanced Vision and Imaging
- Adversarial Robustness in Machine Learning
- Neural Networks and Applications
- Neural dynamics and brain function
- Bacillus and Francisella bacterial research
- Cell Image Analysis Techniques
- Gaze Tracking and Assistive Technology
- Advanced Optical Sensing Technologies
- Computer Graphics and Visualization Techniques
- Generative Adversarial Networks and Image Synthesis
- Image Enhancement Techniques
- Tactile and Sensory Interactions
- Image Retrieval and Classification Techniques
- Integrated Circuits and Semiconductor Failure Analysis
- Anomaly Detection Techniques and Applications
- Domain Adaptation and Few-Shot Learning
- Medical Image Segmentation Techniques
- Advanced Memory and Neural Computing
- Visual and Cognitive Learning Processes
- Multisensory perception and integration
- Glaucoma and retinal disorders
Harvard University
2021-2022
Massachusetts Institute of Technology
2020-2022
Harvard University Press
2019-2022
William James College
2021-2022
University of California, Santa Barbara
2015-2019
Virality of online content on social networking websites is an important but esoteric phenomenon often studied in fields like marketing, psychology and data mining. In this paper we study viral images from a computer vision perspective. We introduce three new image datasets from Reddit and define a virality score using social metadata. We train classifiers with state-of-the-art image features to predict the virality of individual images...
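The abstract does not include code; as an illustration only, here is a minimal sketch of the kind of pipeline it describes: fit a classifier on precomputed image features to predict a binarized virality label. The data loader, features, and threshold are hypothetical placeholders, not the paper's actual datasets or models.

```python
# Hypothetical sketch: predict image "virality" from precomputed image features.
# `load_reddit_features()` is a placeholder; the paper's datasets/features differ.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score

def load_reddit_features():
    # Placeholder: random features and virality scores stand in for real data.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 512))          # e.g. CNN or GIST descriptors
    virality_score = rng.normal(size=1000)    # metadata-derived score
    y = (virality_score > 0).astype(int)      # binarize: viral vs. not viral
    return X, y

X, y = load_reddit_features()
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = LinearSVC(C=1.0).fit(X_tr, y_tr)
print("held-out accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```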
When viewing objects depicted in a frame, observers prefer to view large objects, like cars, at larger visual sizes and smaller objects, like cups, at smaller visual sizes. That is, the visual size of an object that "looks best" is linked to its typical physical size in the world. Why is this the case? One intuitive possibility is that these preferences are driven by semantic knowledge: for example, when we recognize a sofa, we access our knowledge about its real-world size, which influences what size we prefer the sofa to be within the frame. However, might perceptual processing play a role in this phenomenon; that is, do features related...
The goal of this work is to characterize the representational impact that foveation operations have for machine vision systems, inspired by the foveated human visual system, which has higher acuity at the center of gaze and a texture-like encoding in the periphery. To do so, we introduce models consisting of a first-stage \textit{fixed} image transform followed by a second-stage \textit{learnable} convolutional neural network, and we vary the first-stage component. The primary model has a foveated-textural input stage, which we compare with...
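As a hedged illustration of the two-stage design the abstract names, the sketch below pairs a fixed, non-learnable image transform with a small learnable CNN. The fixed stage here is a plain Gaussian blur standing in for a foveation operation; the paper's foveated-texture transform and architecture are different.

```python
# Minimal sketch (not the paper's model): a fixed, non-learnable image transform
# followed by a small learnable CNN. The fixed stage is a Gaussian blur placeholder.
import torch
import torch.nn as nn
import torchvision.transforms.functional as TF

class FixedThenLearnable(nn.Module):
    def __init__(self, num_classes=10, blur_sigma=2.0):
        super().__init__()
        self.blur_sigma = blur_sigma  # fixed first stage: no trainable parameters
        self.cnn = nn.Sequential(     # learnable second stage
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, num_classes),
        )

    def forward(self, x):
        with torch.no_grad():  # first stage stays fixed during training
            x = TF.gaussian_blur(x, kernel_size=9, sigma=self.blur_sigma)
        return self.cnn(x)

model = FixedThenLearnable()
logits = model(torch.randn(4, 3, 64, 64))
print(logits.shape)  # torch.Size([4, 10])
```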
After years of experience, humans become experts at perceiving letters. Is this visual capacity attained by learning specialized letter features, or by reusing general features previously learned in the service of object categorization? To explore this question, we first measured the perceptual similarity of letters in two behavioral tasks, visual search and letter categorization. Then, we trained deep convolutional neural networks on either 26-way letter categorization or 1000-way object categorization, as a way to operationalize possible...
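One common way to relate behavioral similarity to network features is a representational-similarity-style comparison; the sketch below shows the bare mechanics with random placeholder matrices, and is not the paper's analysis.

```python
# Illustrative sketch: compare a behavioral letter-similarity matrix with a
# model-derived similarity matrix using a rank correlation (RSA-style analysis).
# The matrices below are random placeholders, not the paper's data.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_letters = 26
behavioral_sim = rng.random((n_letters, n_letters))
model_sim = rng.random((n_letters, n_letters))

# Use only the upper triangle (off-diagonal letter pairs) of each matrix.
iu = np.triu_indices(n_letters, k=1)
rho, p = spearmanr(behavioral_sim[iu], model_sim[iu])
print(f"Spearman rho = {rho:.3f}, p = {p:.3g}")
```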
The problem of $\textit{visual metamerism}$ is defined as finding a family of perceptually indistinguishable, yet physically different images. In this paper, we propose our NeuroFovea metamer model, a foveated generative model based on a mixture of peripheral representations and style-transfer forward-pass algorithms. Our gradient-descent-free model is parametrized by a VGG19 encoder-decoder, which allows us to encode images in a high-dimensional space and interpolate between the content and texture information with...
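The core operation the abstract mentions is encode, interpolate between content and texture, then decode. The sketch below shows that operation with tiny placeholder modules; the actual model uses a VGG19-based encoder-decoder and peripheral pooling, neither of which is implemented here.

```python
# Minimal sketch of encode -> interpolate -> decode. The encoder/decoder are
# tiny placeholders, not the VGG19 encoder-decoder used in the paper.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
decoder = nn.Sequential(nn.Conv2d(16, 3, 3, padding=1), nn.Sigmoid())

def metamer_like(content_img, texture_img, alpha=0.5):
    """Blend content and texture encodings with weight alpha, then decode."""
    z_content = encoder(content_img)
    z_texture = encoder(texture_img)
    z_mix = (1.0 - alpha) * z_content + alpha * z_texture  # latent interpolation
    return decoder(z_mix)

out = metamer_like(torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64), alpha=0.3)
print(out.shape)  # torch.Size([1, 3, 64, 64])
```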
This paper outlines the development and testing of a novel, feedback-enabled attention allocation aid (AAAD), which uses real-time physiological data to improve human performance in a realistic sequential visual search task. By optimizing over search duration, the aid improves search efficiency while preserving decision accuracy as the operator identifies and classifies targets within simulated aerial imagery. Specifically, using experimental eye-tracking measurements of target detectability across the visual field, we...
Previous studies have proposed image-based clutter measures that correlate with human search times and/or eye movements. However, most models do not take into account the fact that the effects of clutter interact with the foveated nature of the visual system: clutter further from the fovea has an increasingly detrimental influence on perception. Here, we introduce a new model to predict human performance in target search utilizing a forced-fixation task. We use Feature Congestion (Rosenholtz et al.) as our non-foveated clutter model, and stack a peripheral architecture on top of it for...
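As a rough illustration of weighting clutter by its distance from fixation, the sketch below scales a per-pixel clutter map by an eccentricity-dependent weight. The clutter map is random and the weighting function is a placeholder; Feature Congestion itself and the paper's peripheral architecture are not implemented here.

```python
# Illustrative sketch: combine a per-pixel clutter map with an eccentricity-
# dependent weight around a fixation point.
import numpy as np

def foveated_clutter_score(clutter_map, fixation_xy, gain=0.05):
    h, w = clutter_map.shape
    ys, xs = np.mgrid[0:h, 0:w]
    ecc = np.hypot(xs - fixation_xy[0], ys - fixation_xy[1])  # pixels from fovea
    weights = 1.0 + gain * ecc   # clutter far from fixation counts more
    return float((clutter_map * weights).mean())

clutter_map = np.random.default_rng(0).random((480, 640))
print(foveated_clutter_score(clutter_map, fixation_xy=(320, 240)))
```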
Recent work suggests that the representations learned by adversarially robust networks are more human perceptually-aligned than those of non-robust networks, as probed via image manipulations. Despite appearing closer to human visual perception, it is unclear whether the constraints in robust DNN representations match the biological constraints found in human vision. Human vision seems to rely on texture-based/summary-statistic representations in the periphery, which have been shown to explain phenomena such as crowding and performance in visual search tasks. To understand how these optimizations/representations compare...
With the advent of modern expert systems driven by deep learning that supplement human experts (e.g. radiologists, dermatologists, surveillance scanners), we analyze how and when such systems enhance human performance in a fine-grained small-target visual search task. We set up a two-session factorial experimental design in which humans visually search for targets with and without a Deep Learning (DL) system. We evaluate changes in detection performance and eye movements in the presence of the DL system, and find that the improvement with the system (computed via a Faster R-CNN with VGG16) interacts...
The main success stories of deep learning, starting with ImageNet, depend on convolutional networks, which on certain tasks perform significantly better than traditional shallow classifiers, such as support vector machines, and also better than fully connected networks; but what is so special about convolutional networks? Recent results in approximation theory have proved an exponential advantage of deep convolutional networks, with or without shared weights, in approximating functions with a hierarchical locality in their compositional structure. More recently, the...
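As a schematic illustration of the kind of "hierarchically local compositional" function the abstract refers to, the block below writes an eight-variable function as nested functions of two-variable blocks and states the approximation gap informally; the exponents are the commonly cited shallow-versus-deep rates, not figures taken from this paper.

```latex
% Schematic form of a hierarchically local compositional function of 8 variables:
% each constituent function depends on only two inputs (locality), and the nesting
% mirrors the layers of a deep convolutional network.
\[
f(x_1,\dots,x_8) \;=\;
  h_3\bigl(
    h_{21}\bigl(h_{11}(x_1,x_2),\, h_{12}(x_3,x_4)\bigr),\;
    h_{22}\bigl(h_{13}(x_5,x_6),\, h_{14}(x_7,x_8)\bigr)
  \bigr)
\]
% Informal statement of the exponential gap: approximating such an f to accuracy
% \epsilon requires on the order of \epsilon^{-2/m} parameters for a deep network
% matching the hierarchy, versus \epsilon^{-n/m} for a shallow network, where n is
% the input dimension and m the smoothness of the constituent functions.
```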
Modern high-scoring models of vision in the Brain-Score competition do not stem from Vision Transformers. However, in this paper we provide evidence against this unexpected trend by showing that a Vision Transformer (ViT) can be perceptually aligned with human visual representations: a dual-stream Transformer, a CrossViT $\textit{a la}$ Chen et al. (2021), trained under a joint rotationally-invariant and adversarial optimization procedure, yields 2nd place in the aggregate Brain-Score 2022 competition (Schrimpf et al., 2020b)...
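In the spirit of the "joint rotationally-invariant and adversarial optimization" the abstract names, the sketch below shows one training step that combines random rotation augmentation with a one-step (FGSM) adversarial perturbation. The model, epsilon, and rotation range are placeholders; the paper's procedure and the CrossViT architecture are not reproduced here.

```python
# Sketch of one training step: random rotation augmentation + FGSM perturbation.
import torch
import torch.nn as nn
import torchvision.transforms.functional as TF

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

def train_step(images, labels, eps=4 / 255):
    # Rotation augmentation (encourages rotational invariance).
    angle = float(torch.empty(1).uniform_(-30, 30))
    images = TF.rotate(images, angle)

    # FGSM: one-step adversarial perturbation of the rotated images.
    images = images.clone().requires_grad_(True)
    loss_fn(model(images), labels).backward()
    adv = (images + eps * images.grad.sign()).clamp(0, 1).detach()

    # Update the model on the adversarial batch.
    opt.zero_grad()
    loss = loss_fn(model(adv), labels)
    loss.backward()
    opt.step()
    return loss.item()

print(train_step(torch.rand(8, 3, 32, 32), torch.randint(0, 10, (8,))))
```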
The spatially-varying field of the human visual system has recently received a resurgence of interest with the development of virtual reality (VR) and neural networks. The computational demands of the high-resolution rendering desired for VR can be offset by savings in the periphery, while networks trained on foveated input have shown perceptual gains in i.i.d. and o.o.d. generalization. In this paper, we present a technique that exploits the CUDA GPU architecture to efficiently generate Gaussian-based foveated images at high definition (1920x1080...
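For illustration, the sketch below builds a Gaussian-based foveated image by blending progressively blurred copies of an image according to distance from a fixation point. It is a NumPy/SciPy stand-in rather than the paper's CUDA implementation, and the blur schedule is a placeholder.

```python
# Illustrative foveation: blend progressively blurred copies of an image by
# eccentricity from a fixation point (not the paper's CUDA method).
import numpy as np
from scipy.ndimage import gaussian_filter

def foveate(image, fixation_xy, sigmas=(0.0, 2.0, 4.0, 8.0), ring_px=150):
    h, w = image.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    ecc = np.hypot(xs - fixation_xy[0], ys - fixation_xy[1])
    # Blur level per pixel, growing with eccentricity in rings of ring_px pixels.
    level = np.clip((ecc // ring_px).astype(int), 0, len(sigmas) - 1)
    blurred = [gaussian_filter(image, sigma=(s, s, 0)) if s > 0 else image
               for s in sigmas]
    out = np.zeros_like(image)
    for i, b in enumerate(blurred):
        out[level == i] = b[level == i]
    return out

img = np.random.default_rng(0).random((1080, 1920, 3))
print(foveate(img, fixation_xy=(960, 540)).shape)  # (1080, 1920, 3)
```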
The main success stories of deep learning in visual perception tasks, starting with ImageNet, have relied on convolutional neural networks, which on certain tasks perform significantly better than traditional shallow classifiers, such as support vector machines. Is there something special about deep convolutional networks that other learning machines do not possess? Recent results in approximation theory have shown that there is an exponential advantage of deep convolutional networks (DCN) over fully connected networks (FCN) in approximating functions with a hierarchical locality in their...
When viewing objects depicted in a frame, most of us prefer to view large objects, like sofas, at larger sizes and smaller objects, like paperclips, at smaller sizes. In general, the visual size of an object that "looks best" is linked to its typical physical size in the world (Konkle & Oliva, 2011). Why is this the case? One intuitive possibility is that these preferences are driven by semantic knowledge: for example, when we recognize a sofa, we access our knowledge about its real-world size, which influences what size we prefer the sofa to be in the frame. However, might perceptual processing play a role in this phenomenon; that...
Self-supervised learning is a powerful way to learn useful representations from natural data. It has also been suggested as one possible means of building visual representations in humans, but the specific objective and algorithm are unknown. Currently, most self-supervised methods encourage the system to learn an invariant representation of different transformations of the same image, in contrast to those of other images. However, such transformations are generally non-biologically plausible, and often consist of contrived perceptual schemes such as random cropping and color...
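The invariance objective the abstract describes is commonly implemented as a contrastive (InfoNCE-style) loss over two augmented views of the same images; the minimal sketch below shows that loss with random placeholder embeddings and a hypothetical temperature, not any specific paper's recipe.

```python
# Minimal InfoNCE-style contrastive loss: representations of matching views are
# pulled together, all other pairs are pushed apart. Embeddings are random
# placeholders and the augmentation pipeline is not shown.
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature          # (N, N) similarity matrix
    targets = torch.arange(z1.size(0))          # positive pair is the diagonal
    return F.cross_entropy(logits, targets)

z_view1 = torch.randn(32, 128)  # embeddings of view 1 of a batch of images
z_view2 = torch.randn(32, 128)  # embeddings of view 2 of the same images
print(info_nce(z_view1, z_view2).item())
```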
Scene context guides eye movements and facilitates search performance (Torralba et al., 2006; Chen & Zelinsky, 2006; Eckstein et al., 2006). Here, we assess how scene context modulates the effect of the number of distractors on search with real scenes. Methods: Observers (n = 64) were presented with 24 grayscale images plus 96 fillers (22.53 deg x 15.03 deg), sampled from a dataset of 1224 desk scenes photographed from multiple viewpoints with varying numbers of distractor objects. Half of the images contained the target (a computer mouse). When present, the target appeared 60% of the time next to the monitor/keyboard...
Previous studies have proposed image-based measures of clutter and correlated them with subjective judgments of perceptual clutter (Yu et al., 2014) or threshold contrasts during a search task (Rosenholtz, 2007). Here we evaluate how multiple clutter metrics (Feature Congestion, FC; Subband Entropy, SE, Rosenholtz, 2005; Freeman & Simoncelli, 1995; ProtoObject Segmentation, PS, Yu) correlate with the time required for observers to fixate a searched target. In addition, we examine their influence on target detectability as a function of retinal...
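As an illustration of the correlation analysis the abstract describes, the sketch below relates per-image clutter scores to the mean time observers took to first fixate the target. All values are random placeholders, and the clutter metrics themselves are not implemented.

```python
# Illustrative sketch: correlate per-image clutter scores with time-to-fixation.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
n_images = 24
clutter_scores = rng.random(n_images)           # e.g. Feature Congestion per image
time_to_fixation = 0.5 + 2.0 * clutter_scores + rng.normal(0, 0.3, n_images)

r, p = pearsonr(clutter_scores, time_to_fixation)
print(f"Pearson r = {r:.2f}, p = {p:.3g}")
```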