Achal Dave

ORCID: 0000-0003-1948-5629
Research Areas
  • Video Surveillance and Tracking Methods
  • Advanced Neural Network Applications
  • Anomaly Detection Techniques and Applications
  • Human Pose and Action Recognition
  • Domain Adaptation and Few-Shot Learning
  • Autonomous Vehicle Technology and Safety
  • Advanced Image and Video Retrieval Techniques
  • Generative Adversarial Networks and Image Synthesis
  • Advanced Vision and Imaging
  • Multimodal Machine Learning Applications
  • Topic Modeling
  • Visual Attention and Saliency Detection
  • Computer Graphics and Visualization Techniques
  • Explainable Artificial Intelligence (XAI)
  • Face recognition and analysis
  • COVID-19 diagnosis using AI
  • Human-Animal Interaction Studies
  • Natural Language Processing Techniques
  • Video Analysis and Summarization
  • Machine Learning and Data Classification
  • Remote Sensing and LiDAR Applications
  • Air Quality Monitoring and Forecasting
  • 3D Shape Modeling and Analysis
  • Healthcare Technology and Patient Monitoring
  • Neural Networks and Applications

Toyota Research Institute
2023-2024

Toyota Industries (United States)
2024

Amazon (United States)
2022-2023

Amazon (Germany)
2023

Carnegie Mellon University
2017-2022

Seattle University
2022

Indian Institute of Technology Guwahati
2021

University of California, Berkeley
2014

We study how robust current ImageNet models are to distribution shifts arising from natural variations in datasets. Most research on robustness focuses on synthetic image perturbations (noise, simulated weather artifacts, adversarial examples, etc.), which leaves open how robustness on synthetic distribution shift relates to distribution shift arising in real data. Informed by an evaluation of 204 ImageNet models in 213 different test conditions, we find that there is often little to no transfer of robustness from current synthetic to natural distribution shift. Moreover, most current techniques provide no robustness to the natural distribution shifts in our testbed. The main exception is training on larger...

10.48550/arxiv.2007.00644 preprint EN other-oa arXiv (Cornell University) 2020-01-01
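Testbeds like the one above are typically analyzed by plotting accuracy under a natural shift against standard accuracy across many models, then measuring how far a candidate model sits above the trend fit to the baselines ("effective robustness"). A minimal sketch of that bookkeeping, assuming a linear trend in logit-transformed accuracies; function names are illustrative, not taken from the paper's code:

```python
import numpy as np

def logit(p):
    return np.log(p / (1 - p))

def effective_robustness(std_accs, shift_accs, model_std, model_shift):
    """Fit a linear trend (in logit space) of shifted vs. standard accuracy
    over a population of baseline models, then report how far a candidate
    model sits above that trend. Models on the trend score ~0; genuinely
    robust interventions score > 0."""
    slope, intercept = np.polyfit(logit(np.asarray(std_accs)),
                                  logit(np.asarray(shift_accs)), deg=1)
    predicted = 1 / (1 + np.exp(-(slope * logit(model_std) + intercept)))
    return model_shift - predicted
```

The logit transform is one common choice for linearizing the accuracy-accuracy relationship; a plain linear fit on raw accuracies is also used in practice.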

10.1109/cvpr52733.2024.00377 article EN 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024-06-16

Detecting and segmenting individual objects, regardless of their category, is crucial for many applications such as action detection or robotic interaction. While this problem has been well-studied under the classic formulation of spatio-temporal grouping, state-of-the-art approaches do not make use of learning-based methods. To bridge this gap, we propose a simple learning-based approach for spatio-temporal grouping. Our approach leverages motion cues from optical flow as a bottom-up signal for separating objects from each other. Motion cues are then combined...

10.1109/iccvw.2019.00187 article EN 2019-10-01
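The bottom-up motion cue mentioned above can be illustrated with a deliberately simple baseline: threshold optical-flow magnitude and treat each connected component of moving pixels as an object proposal. This is not the paper's learned model, just a sketch of the signal it builds on:

```python
import numpy as np
from scipy import ndimage

def motion_proposals(flow, mag_thresh=1.0, min_area=20):
    """Bottom-up motion cue: threshold per-pixel optical-flow magnitude and
    split the resulting foreground into connected components, yielding one
    binary mask per moving region. `flow` is an (H, W, 2) array of (dx, dy)
    displacement vectors."""
    magnitude = np.linalg.norm(flow, axis=-1)
    moving = magnitude > mag_thresh
    labels, n = ndimage.label(moving)
    masks = []
    for i in range(1, n + 1):
        mask = labels == i
        if mask.sum() >= min_area:  # drop tiny, noisy components
            masks.append(mask)
    return masks
```

In a learned system, such masks would serve as a training signal or be fused with appearance cues rather than used directly.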

Multiple existing benchmarks involve tracking and segmenting objects in video, e.g., Video Object Segmentation (VOS) and Multi-Object Tracking and Segmentation (MOTS), but there is little interaction between them due to the use of disparate benchmark datasets and metrics (e.g. $\mathcal{J}\& {\mathcal{F}}$, mAP, sMOTSA). As a result, published works usually target a particular benchmark and are not easily comparable to one another. We believe that the development of generalized methods that can tackle multiple tasks requires greater...

10.1109/wacv56688.2023.00172 article EN 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2023-01-01

Tracking and detecting any object, including ones never seen before during model training, is a crucial but elusive capability of autonomous systems. An agent that is blind to never-seen-before objects poses a safety hazard when operating in the real world - yet this is how almost all current systems work. One of the main obstacles towards advancing the tracking-any-object task is that it is notoriously difficult to evaluate. A benchmark that would allow us to perform an apples-to-apples comparison of existing efforts is a crucial first step towards advancing this important research field. This...

10.1109/cvpr52688.2022.01846 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01

While deep feature learning has revolutionized techniques for static-image understanding, the same does not quite hold for video processing. Architectures and optimization techniques used for video are largely based off those designed for static images, potentially underutilizing rich video information. In this work, we rethink both the underlying network architecture and the stochastic learning paradigm for temporal data. To do so, we draw inspiration from classic theory on linear dynamic systems for modeling time series. By extending such models to include...

10.1109/cvpr.2017.223 preprint EN 2017-07-01

Monocular object detection and tracking have improved drastically in recent years, but they rely on a key assumption: that objects are visible to the camera. Many offline approaches reason about occluded objects post-hoc, by linking together tracklets after the object re-appears, making use of reidentification (ReID). However, online tracking in embodied robotic agents (such as a self-driving vehicle) fundamentally requires object permanence, which is the ability to reason about occluded objects before they re-appear. In this work, we re-purpose tracking benchmarks and propose new...

10.1109/iccv48922.2021.00316 article EN 2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2021-10-01

Emerging head-worn computing devices can enable interactions with smart objects in physical spaces. We present the iterative design and evaluation of HOBS -- a Head-Orientation Based Selection technique for interacting with these devices at a distance. We augment a commercial wearable device, Google Glass, with an infrared (IR) emitter to select targets equipped with IR receivers. Our first design shows that a naive implementation can outperform list selection, but has poor performance when refinement between multiple targets is needed. A...

10.1145/2659766.2659773 article EN 2014-10-04

By design, average precision (AP) for object detection aims to treat all classes independently: AP is computed independently per category and averaged. On one hand, this is desirable as it treats all categories equally. On the other hand, it ignores cross-category confidence calibration, a key property in real-world use cases. Unfortunately, under important conditions (i.e., large vocabulary, high instance counts) the default implementation of AP is neither category independent, nor does it directly reward properly calibrated detectors. In...

10.48550/arxiv.2102.01066 preprint EN other-oa arXiv (Cornell University) 2021-01-01
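The class-independence described above is easy to see in code: AP ranks detections within each category and only then averages across categories, so rescaling one category's confidences never changes mAP. A minimal sketch (all-point AP; names and data layout are illustrative):

```python
import numpy as np

def average_precision(scores, is_tp, num_gt):
    """All-point AP for one category: sort detections by confidence, then
    accumulate precision at each true-positive hit, normalized by the
    number of ground-truth instances."""
    order = np.argsort(-np.asarray(scores))
    tp = np.asarray(is_tp, dtype=float)[order]
    cum_tp = np.cumsum(tp)
    precision = cum_tp / (np.arange(len(tp)) + 1)
    return float(np.sum(precision * tp) / num_gt)

def mean_ap(per_class_dets):
    """per_class_dets: {class_name: (scores, is_tp, num_gt)}. Ranking is
    per class, so cross-category score calibration is invisible to mAP --
    the blind spot the abstract describes."""
    return float(np.mean([average_precision(*d)
                          for d in per_class_dets.values()]))
```

Multiplying every score in one class by 0.01 leaves its internal ranking, and therefore the final mAP, unchanged, even though a downstream user thresholding all classes at a single confidence would behave very differently.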

We introduce DataComp for Language Models (DCLM), a testbed for controlled dataset experiments with the goal of improving language models. As part of DCLM, we provide a standardized corpus of 240T tokens extracted from Common Crawl, effective pretraining recipes based on the OpenLM framework, and a broad suite of 53 downstream evaluations. Participants in the DCLM benchmark can experiment with data curation strategies such as deduplication, filtering, and data mixing at model scales ranging from 412M to 7B parameters. As a baseline for DCLM, we conduct...

10.48550/arxiv.2406.11794 preprint EN arXiv (Cornell University) 2024-06-17

Drug-related errors are a leading cause of preventable patient harm in the clinical setting. We present the first wearable camera system to automatically detect potential errors prior to medication delivery. We demonstrate that, using deep learning algorithms, our system can detect and classify drug labels on syringes and vials in drug preparation events recorded in real-world operating rooms. We created a first-of-its-kind large-scale video dataset from head-mounted cameras comprising 4K footage across 13 anesthesiology providers, 2...

10.1038/s41746-024-01295-2 article EN cc-by-nc-nd npj Digital Medicine 2024-10-22

Vision models notoriously flicker when applied to videos: they correctly recognize objects in some frames, but fail on perceptually similar, nearby frames. In this work, we systematically analyze the robustness of image classifiers to such temporal perturbations in videos. To do so, we construct two new datasets, ImageNet-Vid-Robust and YTBB-Robust, containing a total of 57,897 images grouped into 3,139 sets of perceptually similar images. Our datasets were derived from ImageNet-Vid and Youtube-BB, respectively, and thoroughly...

10.1109/iccv48922.2021.00952 article EN 2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2021-10-01
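The flicker analysis above hinges on a worst-case metric over sets of perceptually similar frames: a classifier is credited for a set only if it is correct on every frame in it, so a single flickering frame counts as a failure. A minimal sketch of that metric (names are illustrative, not from the benchmark's code):

```python
def worst_case_accuracy(preds_per_set, label_per_set):
    """Fraction of frame sets on which the classifier is correct for
    *every* perceptually similar frame in the set. `preds_per_set` is a
    list of per-frame prediction lists; `label_per_set` gives one label
    per set (all frames in a set share a label)."""
    correct = sum(all(p == y for p in preds)
                  for preds, y in zip(preds_per_set, label_per_set))
    return correct / len(label_per_set)
```

Comparing this worst-case number against ordinary per-frame accuracy quantifies how much a model flickers under small temporal perturbations.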

Contrastively trained language-image models such as CLIP, ALIGN, and BASIC have demonstrated unprecedented robustness to multiple challenging natural distribution shifts. Since these language-image models differ from previous training approaches in several ways, an important question is what causes the large robustness gains. We answer this question via a systematic experimental investigation. Concretely, we study five different possible causes for the robustness gains: (i) the training set size, (ii) the training distribution, (iii) language supervision at training time, (iv) language supervision at test time, and (v)...

10.48550/arxiv.2205.01397 preprint EN other-oa arXiv (Cornell University) 2022-01-01

This paper studies the problem of concept-based interpretability of transformer representations for videos. Concretely, we seek to explain the decision-making process of video transformers based on high-level, spatiotemporal concepts that are automatically discovered. Prior research on concept-based interpretability has concentrated solely on image-level tasks. Comparatively, video models deal with the added temporal dimension, increasing complexity and posing challenges in identifying dynamic concepts over time. In this work, we systematically address these...

10.48550/arxiv.2401.10831 preprint EN other-oa arXiv (Cornell University) 2024-01-01

Scaling laws are useful guides for developing language models, but there are still gaps between current scaling studies and how language models are ultimately trained and evaluated. For instance, scaling is usually studied in the compute-optimal training regime (i.e., the "Chinchilla optimal" regime); however, in practice, models are often over-trained to reduce inference costs. Moreover, scaling laws mostly predict loss on next-token prediction, but models are ultimately compared based on downstream task performance. In this paper, we address both shortcomings. To do so, we create...

10.48550/arxiv.2403.08540 preprint EN arXiv (Cornell University) 2024-03-13
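The compute-optimal regime referenced above is usually studied through a parametric loss surface of the form L(N, D) = E + A/N^alpha + B/D^beta, fit to a grid of training runs. A sketch of that form, using the constants reported by Hoffmann et al. (2022) purely for illustration; the paper above extends such fits to the over-trained regime and to downstream-task metrics:

```python
def scaling_law_loss(n_params, n_tokens,
                     E=1.69, A=406.4, B=410.7, alpha=0.34, beta=0.28):
    """Chinchilla-style parametric loss: E is the irreducible loss, the
    A/N^alpha term is the penalty for finite model size N, and the
    B/D^beta term is the penalty for finite data D (tokens). Constants
    are the Hoffmann et al. (2022) fits, shown only as an example."""
    return E + A / n_params**alpha + B / n_tokens**beta
```

Over-training corresponds to evaluating this surface at a token count well beyond the compute-optimal D for a given N: loss keeps falling with D, which is why smaller, longer-trained models can be attractive when inference cost dominates.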

Recent work leverages the expressive power of generative adversarial networks (GANs) to generate labeled synthetic datasets. These dataset generation methods often require new annotations of synthetic images, which forces practitioners to seek out annotators, curate a set of synthetic images, and ensure the quality of the generated labels. We introduce the HandsOff framework, a technique capable of producing an unlimited number of synthetic images and corresponding labels after being trained on less than 50 preexisting labeled images. Our framework avoids the practical...

10.1109/cvpr52729.2023.00772 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

Amodal perception, the ability to comprehend complete object structures from partial visibility, is a fundamental skill, even for infants. Its significance extends to applications like autonomous driving, where a clear understanding of heavily occluded objects is essential. However, modern detection and tracking algorithms often overlook this critical capability, perhaps due to the prevalence of modal annotations in most benchmarks. To address the scarcity of amodal benchmarks, we introduce TAO-Amodal,...

10.48550/arxiv.2312.12433 preprint EN other-oa arXiv (Cornell University) 2023-01-01