Abhinav Shrivastava

ORCID: 0000-0001-8928-8554
Research Areas
  • Domain Adaptation and Few-Shot Learning
  • Advanced Neural Network Applications
  • Advanced Image and Video Retrieval Techniques
  • Multimodal Machine Learning Applications
  • Generative Adversarial Networks and Image Synthesis
  • Human Pose and Action Recognition
  • Advanced Vision and Imaging
  • Advanced Image Processing Techniques
  • Anomaly Detection Techniques and Applications
  • Video Analysis and Summarization
  • Image Retrieval and Classification Techniques
  • 3D Shape Modeling and Analysis
  • Visual Attention and Saliency Detection
  • Video Surveillance and Tracking Methods
  • Digital Media Forensic Detection
  • Video Coding and Compression Technologies
  • COVID-19 diagnosis using AI
  • Computer Graphics and Visualization Techniques
  • Image and Signal Denoising Methods
  • Adversarial Robustness in Machine Learning
  • Handwritten Text Recognition Techniques
  • Machine Learning and Data Classification
  • Medical Image Segmentation Techniques
  • Advanced Data Compression Techniques
  • Robotic Path Planning Algorithms

University of Maryland, College Park
2018-2025

Embedded Systems (United States)
2025

Academia Sinica
2020-2021

Google (United States)
2017-2020

Meta (Israel)
2020

Carnegie Mellon University
2011-2017

Pandit Sundarlal Sharma Open University
2016

The field of object detection has made significant advances riding on the wave of region-based ConvNets, but their training procedure still includes many heuristics and hyperparameters that are costly to tune. We present a simple yet surprisingly effective online hard example mining (OHEM) algorithm for training region-based ConvNet detectors. Our motivation is the same as it has always been - detection datasets contain an overwhelming number of easy examples and a small number of hard examples. Automatic selection of these hard examples can make training more effective and efficient. OHEM...

10.1109/cvpr.2016.89 article EN 2016-06-01
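The selection step the abstract describes - keep only the highest-loss region proposals for the backward pass - can be sketched in a few lines. This is an illustrative simplification, not the paper's implementation; the function name and inputs are assumptions.

```python
# Minimal sketch of the OHEM selection step: given per-RoI losses from a
# forward pass, keep only the B hardest (highest-loss) RoIs for backprop.
# `select_hard_examples` and its arguments are illustrative names.

def select_hard_examples(roi_losses, batch_size):
    """Return indices of the `batch_size` RoIs with the highest loss."""
    ranked = sorted(range(len(roi_losses)),
                    key=lambda i: roi_losses[i], reverse=True)
    return ranked[:batch_size]

# Example: six RoIs, keep the three hardest.
losses = [0.05, 2.3, 0.01, 1.7, 0.9, 0.02]
hard_indices = select_hard_examples(losses, 3)
```

In the actual detector, gradients would then be computed only for the selected RoIs, replacing heuristics such as fixed foreground/background sampling ratios.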

The success of deep learning in vision can be attributed to: (a) models with high capacity; (b) increased computational power; and (c) the availability of large-scale labeled data. Since 2012, there have been significant advances in the representation capabilities of models and in the computational capabilities of GPUs. But the size of the biggest dataset has surprisingly remained constant. What will happen if we increase the dataset size by 10x or 100x? This paper takes a step towards clearing the clouds of mystery surrounding the relationship between 'enormous data' and visual deep learning. By...

10.1109/iccv.2017.97 article EN 2017-10-01

Multi-task learning in Convolutional Networks has displayed remarkable success in the field of recognition. This success can be largely attributed to learning shared representations from multiple supervisory tasks. However, existing multi-task approaches rely on enumerating multiple network architectures specific to the tasks at hand, which do not generalize. In this paper, we propose a principled approach to learn shared representations in ConvNets using multi-task learning. Specifically, we propose a new sharing unit: the "cross-stitch" unit. These units combine...

10.1109/cvpr.2016.433 article EN 2016-06-01
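At its core, the "cross-stitch" unit described above is a learned linear mixing of activations from two task-specific branches. A toy pure-Python version (illustrative names; in the paper the alpha values are learned end-to-end and applied to ConvNet activation maps):

```python
# Toy cross-stitch unit: mix activations x_a, x_b from two task branches
# with a 2x2 matrix alpha = [[a_aa, a_ab], [a_ba, a_bb]].
# In the paper, alpha is learned jointly with the network weights.

def cross_stitch(x_a, x_b, alpha):
    out_a = [alpha[0][0] * a + alpha[0][1] * b for a, b in zip(x_a, x_b)]
    out_b = [alpha[1][0] * a + alpha[1][1] * b for a, b in zip(x_a, x_b)]
    return out_a, out_b

# alpha near the identity keeps the branches mostly task-specific;
# the off-diagonal weight controls how much the two tasks share.
ya, yb = cross_stitch([1.0, 2.0], [3.0, 4.0], [[0.9, 0.1], [0.1, 0.9]])
```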

How do we learn an object detector that is invariant to occlusions and deformations? Our current solution is to use a data-driven strategy - collect large-scale datasets which have object instances under different conditions. The hope is that the final classifier can use these examples to learn invariances. But is it really possible to see all the occlusions in a dataset? We argue that, like object categories, occlusions and deformations also follow a long-tail. Some are so rare that they hardly ever happen, yet we want to learn a model invariant to such occurrences. In this paper, we propose an alternative solution...

10.1109/cvpr.2017.324 article EN 2017-07-01

We propose NEIL (Never Ending Image Learner), a computer program that runs 24 hours per day and 7 days a week to automatically extract visual knowledge from Internet data. NEIL uses a semi-supervised learning algorithm that jointly discovers common sense relationships (e.g., "Corolla is a kind of/looks similar to Car", "Wheel is a part of Car") and labels instances of the given visual categories. It is an attempt to develop the world's largest visual structured knowledge base with minimum human labeling effort. As of 10th October 2013, NEIL has been continuously...

10.1109/iccv.2013.178 article EN 2013-12-01

In recent years, we have seen tremendous progress in the field of object detection. Most of these improvements have been achieved by targeting deeper feedforward networks. However, many hard object categories such as bottle, remote, etc. require representation of fine details and not just coarse, semantic representations. But most of these fine details are lost in the early convolutional layers. What we need is a way to incorporate finer details from lower layers into the detection architecture. Skip connections have been proposed to combine high-level and low-level...

10.48550/arxiv.1612.06851 preprint EN other-oa arXiv (Cornell University) 2016-01-01

The success of deep learning in vision can be attributed to: (a) models with high capacity; (b) increased computational power; and (c) the availability of large-scale labeled data. Since 2012, there have been significant advances in the representation capabilities of models and in the computational capabilities of GPUs. But the size of the biggest dataset has surprisingly remained constant. What will happen if we increase the dataset size by 10x or 100x? This paper takes a step towards clearing the clouds of mystery surrounding the relationship between `enormous data' and visual deep learning. By...

10.48550/arxiv.1707.02968 preprint EN other-oa arXiv (Cornell University) 2017-01-01

Recent advances in image editing techniques have posed serious challenges to the trustworthiness of multimedia data, which drives research on tampering detection. In this paper, we propose ObjectFormer to detect and localize image manipulations. To capture subtle manipulation traces that are no longer visible in the RGB domain, we extract high-frequency features of the images and combine them with RGB features as multimodal patch embeddings. Additionally, we use a set of learnable object prototypes as mid-level representations to model object-level...

10.1109/cvpr52688.2022.00240 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01

Weakly-supervised temporal action localization aims to recognize and localize action segments in untrimmed videos given only video-level labels for training. Without the boundary information of action segments, existing methods mostly rely on multiple instance learning (MIL), where the predictions of unlabeled instances (i.e., video snippets) are supervised by classifying labeled bags (i.e., untrimmed videos). However, this formulation typically treats snippets in a bag as independent instances, ignoring the underlying temporal structures within...

10.1109/cvpr52688.2022.01355 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01
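The MIL formulation above aggregates per-snippet scores into a video-level prediction that can be supervised with the video label. A common pooling choice is top-k mean pooling; the sketch below is illustrative, not the paper's exact model.

```python
def video_score(snippet_scores, k):
    """Aggregate per-snippet class scores into one video-level score by
    averaging the top-k snippets (a standard MIL pooling choice).
    Note: this treats snippets independently - exactly the limitation
    the abstract points out."""
    top = sorted(snippet_scores, reverse=True)[:k]
    return sum(top) / len(top)

# Four snippets, two clearly contain the action.
score = video_score([0.1, 0.9, 0.8, 0.2], k=2)
```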

The goal of this work is to find visually similar images even if they appear quite different at the raw pixel level. This task is particularly important for matching images across visual domains, such as photos taken over different seasons or lighting conditions, paintings, hand-drawn sketches, etc. We propose a surprisingly simple method that estimates the relative importance of different features in a query image based on the notion of "data-driven uniqueness". We employ standard tools from discriminative object detection in a novel way,...

10.1145/2024156.2024188 article EN 2011-12-12

The goal of this work is to find visually similar images even if they appear quite different at the raw pixel level. This task is particularly important for matching images across visual domains, such as photos taken over different seasons or lighting conditions, paintings, hand-drawn sketches, etc. We propose a surprisingly simple method that estimates the relative importance of different features in a query image based on the notion of "data-driven uniqueness". We employ standard tools from discriminative object detection in a novel way,...

10.1145/2070781.2024188 article EN ACM Transactions on Graphics 2011-11-30

There have been some recent efforts to build visual knowledge bases from Internet images. But most of these approaches have focused on a bounding box representation of objects. In this paper, we propose to enrich these knowledge bases by automatically discovering objects and their segmentations from noisy Internet images. Specifically, our approach combines the power of generative modeling for segmentation with the effectiveness of discriminative models for detection. The key idea behind our approach is to learn and exploit top-down segmentation priors based on visual subcategories. The strong priors learned...

10.1109/cvpr.2014.261 article EN 2014 IEEE Conference on Computer Vision and Pattern Recognition 2014-06-01

We present an approach for detecting human-object interactions (HOIs) in images, based on the idea that humans interact with functionally similar objects in a similar manner. The proposed model is simple and efficiently uses data, visual features of the human, relative spatial orientation of the human and the object, and knowledge of which interactions functionally similar objects take part in with humans. We provide extensive experimental validation of our approach and demonstrate state-of-the-art results for HOI detection. On the HICO-Det dataset our method achieves a gain of over 2.5% absolute points in mean average...

10.1609/aaai.v34i07.6616 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2020-04-03

We present a semi-supervised approach that localizes multiple unknown object instances in long videos. We start with a handful of labeled boxes and iteratively learn and label hundreds of thousands of object instances. We propose criteria for reliable object detection and tracking, constraining the semi-supervised learning process and minimizing semantic drift. Our approach does not assume exhaustive labeling of each object instance in any single frame, or any explicit annotation of negative data. Working in such a generic setting allows us to tackle multiple object instances in video, many of which are static...

10.1109/cvpr.2015.7298982 article EN 2015-06-01

Detecting manipulated images has become a significant emerging challenge. The advent of image sharing platforms and the easy availability of advanced photo editing software have resulted in large quantities of manipulated images being shared on the internet. While the intent behind such manipulations varies widely, concern over the spread of false news and misinformation is growing. Current state-of-the-art methods for detecting these manipulations suffer from a lack of training data due to the laborious labeling process. We address this problem in this paper, in which we...

10.1609/aaai.v34i07.7007 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2020-04-03

This paper focuses on multi-person action forecasting in videos. More precisely, given a history of H previous frames, the goal is to detect actors and predict their future actions for the next T frames. Our approach jointly models temporal and spatial interactions among different actors by constructing a recurrent graph, using actor proposals obtained with Faster R-CNN as nodes. Our method learns to select a subset of discriminative relations without requiring explicit supervision, thus enabling us to tackle challenging...

10.1109/cvpr.2019.00036 preprint EN 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019-06-01

We address the problem of scene layout generation for diverse domains such as images, mobile applications, documents, and 3D objects. Most complex scenes, natural or human-designed, can be expressed as a meaningful arrangement of simpler compositional graphical primitives. Generating a new layout or extending an existing layout requires understanding the relationships between these primitives. To do this, we propose LayoutTransformer, a novel framework that leverages self-attention to learn contextual relationships between layout elements and generate novel layouts in...

10.1109/iccv48922.2021.00104 article EN 2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2021-10-01

We propose a novel neural representation for videos (NeRV) which encodes videos in neural networks. Unlike conventional representations that treat videos as frame sequences, we represent videos as neural networks taking the frame index as input. Given a frame index, NeRV outputs the corresponding RGB image. Video encoding in NeRV is simply fitting a neural network to video frames, and the decoding process is a simple feedforward operation. As an image-wise implicit representation, NeRV outputs the whole image and shows great efficiency compared to pixel-wise implicit representations, improving the encoding speed by 25x to 70x and the decoding speed by 38x...

10.48550/arxiv.2110.13903 preprint EN cc-by-nc-nd arXiv (Cornell University) 2021-01-01
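The "frame index as input" idea is usually paired with a sinusoidal embedding of the index before it enters the network. Below is a minimal sketch of such an embedding; it is illustrative, not the exact NeRV input layer, and the full architecture additionally includes MLP and convolutional upsampling stages.

```python
import math

def positional_encoding(t, num_freqs):
    """Map a normalized frame index t in [0, 1] to sin/cos features at
    exponentially spaced frequencies. The video network then maps these
    features to the corresponding RGB frame."""
    feats = []
    for i in range(num_freqs):
        feats.append(math.sin(2 ** i * math.pi * t))
        feats.append(math.cos(2 ** i * math.pi * t))
    return feats

# Encoding = fit a network on (positional_encoding(t), frame_t) pairs;
# decoding frame t is then a single feedforward pass.
embedding = positional_encoding(0.5, num_freqs=3)
```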