- Video Surveillance and Tracking Methods
- Advanced Neural Network Applications
- Anomaly Detection Techniques and Applications
- Human Pose and Action Recognition
- Domain Adaptation and Few-Shot Learning
- Autonomous Vehicle Technology and Safety
- Advanced Image and Video Retrieval Techniques
- Generative Adversarial Networks and Image Synthesis
- Advanced Vision and Imaging
- Multimodal Machine Learning Applications
- Topic Modeling
- Visual Attention and Saliency Detection
- Computer Graphics and Visualization Techniques
- Explainable Artificial Intelligence (XAI)
- Face Recognition and Analysis
- COVID-19 Diagnosis Using AI
- Human-Animal Interaction Studies
- Natural Language Processing Techniques
- Video Analysis and Summarization
- Machine Learning and Data Classification
- Remote Sensing and LiDAR Applications
- Air Quality Monitoring and Forecasting
- 3D Shape Modeling and Analysis
- Healthcare Technology and Patient Monitoring
- Neural Networks and Applications
Toyota Research Institute, 2023-2024
Toyota Industries (United States), 2024
Amazon (United States), 2022-2023
Amazon (Germany), 2023
Carnegie Mellon University, 2017-2022
Seattle University, 2022
Indian Institute of Technology Guwahati, 2021
University of California, Berkeley, 2014
We study how robust current ImageNet models are to distribution shifts arising from natural variations in datasets. Most research on robustness focuses on synthetic image perturbations (noise, simulated weather artifacts, adversarial examples, etc.), which leaves open how robustness on synthetic distribution shift relates to distribution shift arising in real data. Informed by an evaluation of 204 ImageNet models in 213 different test conditions, we find that there is often little to no transfer of robustness from current synthetic to natural distribution shift. Moreover, most current techniques provide no robustness to the natural distribution shifts in our testbed. The main exception is training on larger...
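A minimal sketch of the kind of comparison this abstract describes: score a classifier on a standard test set and on a naturally shifted one, and report the accuracy drop. The function and dataset names are hypothetical stand-ins, not the paper's testbed.

```python
# Hypothetical robustness comparison: standard vs. shifted test accuracy.
from typing import Callable, Iterable, Tuple

def accuracy(model: Callable, dataset: Iterable[Tuple[object, int]]) -> float:
    """Fraction of examples the model labels correctly."""
    correct = total = 0
    for image, label in dataset:
        correct += int(model(image) == label)
        total += 1
    return correct / max(total, 1)

def robustness_gap(model, standard_set, shifted_set):
    """Accuracy on each set, plus the drop under distribution shift."""
    std = accuracy(model, standard_set)
    shifted = accuracy(model, shifted_set)
    return std, shifted, std - shifted
```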
Detecting and segmenting individual objects, regardless of their category, is crucial for many applications such as action detection or robotic interaction. While this problem has been well-studied under the classic formulation of spatio-temporal grouping, state-of-the-art approaches do not make use of learning-based methods. To bridge this gap, we propose a simple learning-based approach for spatio-temporal grouping. Our approach leverages motion cues from optical flow as a bottom-up signal for separating objects from each other. Motion cues are then combined...
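As a toy illustration of the bottom-up motion signal (not the paper's learned method), one can threshold optical-flow magnitude and take connected components as class-agnostic object proposals:

```python
# Toy motion-based grouping: moving pixels -> connected components.
import numpy as np
from scipy import ndimage

def motion_proposals(flow: np.ndarray, mag_thresh: float = 1.0):
    """flow: HxWx2 optical flow field. Returns (label_map, num_segments)."""
    magnitude = np.linalg.norm(flow, axis=-1)   # per-pixel motion strength
    moving = magnitude > mag_thresh             # separate movers from background
    labels, num = ndimage.label(moving)         # connected components = proposals
    return labels, num

# Hypothetical usage with a random field standing in for real flow:
flow = np.random.randn(240, 320, 2)
segments, n = motion_proposals(flow, mag_thresh=2.0)
print(f"{n} motion-based segments")
```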
Multiple existing benchmarks involve tracking and segmenting objects in video, e.g., Video Object Segmentation (VOS) and Multi-Object Tracking and Segmentation (MOTS), but there is little interaction between them due to the use of disparate benchmark datasets and metrics (e.g., $\mathcal{J}\&\mathcal{F}$, mAP, sMOTSA). As a result, published works usually target a particular benchmark, and are not easily comparable to one another. We believe that the development of generalized methods that can tackle multiple tasks requires greater...
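For reference, a compact sketch of the $\mathcal{J}\&\mathcal{F}$ metric named above: $\mathcal{J}$ is region IoU and $\mathcal{F}$ scores boundary agreement. The boundary term below is simplified (exact boundary pixels, no matching tolerance), so it only approximates the official evaluation.

```python
# Simplified J (region IoU) and F (boundary F1) for binary masks.
import numpy as np
from scipy.ndimage import binary_erosion

def jaccard(pred: np.ndarray, gt: np.ndarray) -> float:
    """J measure: intersection-over-union of two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    return np.logical_and(pred, gt).sum() / union if union else 1.0

def boundary_f1(pred: np.ndarray, gt: np.ndarray) -> float:
    """Simplified F measure: F1 over exact boundary pixels (no tolerance)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    pb = pred & ~binary_erosion(pred)   # boundary = mask minus its erosion
    gb = gt & ~binary_erosion(gt)
    tp = (pb & gb).sum()
    prec = tp / pb.sum() if pb.sum() else 1.0
    rec = tp / gb.sum() if gb.sum() else 1.0
    return 2 * prec * rec / (prec + rec) if (prec + rec) else 0.0
```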
Tracking and detecting any object, including ones never seen before during model training, is a crucial but elusive capability of autonomous systems. An agent that is blind to never-before-seen objects poses a safety hazard when operating in the real world, yet this is how almost all current systems work. One of the main obstacles to advancing the tracking-any-object task is that it is notoriously difficult to evaluate. A benchmark that would allow us to perform an apples-to-apples comparison of existing efforts is a crucial first step for this important research field. This...
While deep feature learning has revolutionized techniques for static-image understanding, the same does not quite hold for video processing. Architectures and optimization techniques used for video are largely based on those for static images, potentially underutilizing rich video information. In this work, we rethink both the underlying network architecture and the stochastic optimization paradigm for temporal data. To do so, we draw inspiration from classic theory on linear dynamic systems for modeling time series. By extending such models to include...
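For background, here is the textbook linear dynamical system the abstract cites as inspiration: a hidden state evolves linearly and emits observations. This is illustrative context only, not the paper's architecture.

```python
# Classic LDS: x_{t+1} = A x_t + w_t,  y_t = C x_t.
import numpy as np

def simulate_lds(A, C, x0, steps, noise=0.0, rng=None):
    """Simulate an LDS and return the observation sequence."""
    rng = rng or np.random.default_rng(0)
    x, ys = x0, []
    for _ in range(steps):
        ys.append(C @ x)                                  # emit observation
        x = A @ x + noise * rng.standard_normal(x.shape)  # linear state update
    return np.stack(ys)

A = np.array([[0.99, 0.1], [-0.1, 0.99]])   # slowly rotating hidden state
C = np.array([[1.0, 0.0]])                  # observe the first coordinate
y = simulate_lds(A, C, x0=np.array([1.0, 0.0]), steps=100, noise=0.01)
```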
Monocular object detection and tracking have improved drastically in recent years, but rely on a key assumption: that objects are visible to the camera. Many offline approaches reason about occluded objects post-hoc, by linking together tracklets after the object re-appears, making use of reidentification (ReID). However, online tracking in embodied robotic agents (such as a self-driving vehicle) fundamentally requires object permanence, which is the ability to reason about occluded objects before they re-appear. In this work, we re-purpose tracking benchmarks and propose new...
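A minimal constant-velocity sketch of the object-permanence idea: when a track loses its detection, coast its last known state forward instead of terminating it. This is purely illustrative; it is not the paper's model.

```python
# Coast occluded tracks under constant velocity rather than dropping them.
from dataclasses import dataclass

@dataclass
class Track:
    x: float
    y: float
    vx: float = 0.0   # last observed velocity
    vy: float = 0.0
    missed: int = 0   # consecutive frames without a detection

def step(track: Track, detection=None, max_missed: int = 10):
    """Advance one frame; returns the track, or None once it expires."""
    if detection is not None:
        track.vx, track.vy = detection[0] - track.x, detection[1] - track.y
        track.x, track.y = detection
        track.missed = 0
    else:
        # Occluded: propagate position with the last velocity estimate.
        track.x += track.vx
        track.y += track.vy
        track.missed += 1
    return None if track.missed > max_missed else track
```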
Emerging head-worn computing devices can enable interactions with smart objects in physical spaces. We present the iterative design and evaluation of HOBS -- a Head-Orientation Based Selection technique for interacting with these objects at a distance. We augment a commercial wearable device, Google Glass, with an infrared (IR) emitter to select targets equipped with IR receivers. Our first design shows that a naive implementation can outperform list selection, but has poor performance when refinement between multiple targets is needed. A...
By design, average precision (AP) for object detection aims to treat all classes independently: AP is computed independently per category and averaged. On the one hand, this is desirable as it treats all categories equally. On the other hand, it ignores cross-category confidence calibration, a key property in real-world use cases. Unfortunately, we find that under important conditions (i.e., large vocabulary, high instance counts) the default implementation of AP is neither category independent, nor does it directly reward properly calibrated detectors. In...
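A compact sketch of the evaluation design under discussion: AP is computed from a per-category ranking and then averaged, so confidence scores are never compared across categories. The detections below are made up for illustration.

```python
# Per-category AP, then the cross-category mean (mAP).
import numpy as np

def average_precision(scores, labels, num_gt):
    """AP for one category from detection scores and 0/1 match labels."""
    order = np.argsort(-np.asarray(scores))      # rank by confidence, descending
    labels = np.asarray(labels)[order]
    tp = np.cumsum(labels)
    fp = np.cumsum(1 - labels)
    precision = tp / (tp + fp)
    # Unsmoothed area under the precision-recall curve, for brevity.
    return float(np.sum(precision * labels) / num_gt)

per_category = {
    "cat": average_precision([0.9, 0.8, 0.3], [1, 0, 1], num_gt=2),
    "dog": average_precision([0.6, 0.2], [1, 1], num_gt=2),
}
mean_ap = sum(per_category.values()) / len(per_category)
```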
We introduce DataComp for Language Models (DCLM), a testbed for controlled dataset experiments with the goal of improving language models. As part of DCLM, we provide a standardized corpus of 240T tokens extracted from Common Crawl, effective pretraining recipes based on the OpenLM framework, and a broad suite of 53 downstream evaluations. Participants in the DCLM benchmark can experiment with data curation strategies such as deduplication, filtering, and data mixing at model scales ranging from 412M to 7B parameters. As a DCLM baseline, we conduct...
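A toy sketch of two of the curation steps named above, deduplication and filtering; real DCLM pipelines are far more involved (e.g., fuzzy deduplication and learned quality filters), so treat the thresholds here as placeholders.

```python
# Toy curation: crude length filter + exact dedup via content hashing.
import hashlib

def curate(documents, min_words=50):
    seen, kept = set(), []
    for doc in documents:
        if len(doc.split()) < min_words:    # quality filter (crude proxy)
            continue
        digest = hashlib.sha256(doc.encode("utf-8")).hexdigest()
        if digest in seen:                  # exact deduplication
            continue
        seen.add(digest)
        kept.append(doc)
    return kept
```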
Drug-related errors are a leading cause of preventable patient harm in the clinical setting. We present the first wearable camera system to automatically detect potential errors, prior to medication delivery. We demonstrate that, using deep learning algorithms, our system can detect and classify drug labels on syringes and vials in drug preparation events recorded in real-world operating rooms. We created a first-of-its-kind large-scale video dataset from head-mounted cameras comprising 4K footage across 13 anesthesiology providers, 2...
Vision models notoriously flicker when applied to videos: they correctly recognize objects in some frames, but fail on perceptually similar, nearby frames. In this work, we systematically analyze the robustness of image classifiers to such temporal perturbations in videos. To do so, we construct two new datasets, ImageNet-Vid-Robust and YTBB-Robust, containing a total of 57,897 images grouped into 3,139 sets of perceptually similar images. Our datasets were derived from ImageNet-Vid and Youtube-BB, respectively, and thoroughly...
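One way to operationalize this kind of temporal robustness check, sketched under the assumption that frames are pre-grouped into sets of perceptually similar neighbors: a prediction only counts as robustly correct if the model is right on every frame in the set.

```python
# Robust accuracy over sets of perceptually similar frames.
def robust_accuracy(model, frame_sets):
    """frame_sets: iterable of lists of (image, label) for nearby frames."""
    robust = 0
    for frames in frame_sets:
        # Correct on the whole neighborhood, or not at all.
        if all(model(img) == lbl for img, lbl in frames):
            robust += 1
    return robust / max(len(frame_sets), 1)
```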
Contrastively trained language-image models such as CLIP, ALIGN, and BASIC have demonstrated unprecedented robustness to multiple challenging natural distribution shifts. Since these language-image models differ from previous training approaches in several ways, an important question is what causes the large robustness gains. We answer this question via a systematic experimental investigation. Concretely, we study five different possible causes for the robustness gains: (i) the training set size, (ii) the training distribution, (iii) language supervision at training time, (iv) language supervision at test time, and (v)...
This paper studies the problem of concept-based interpretability of transformer representations for videos. Concretely, we seek to explain the decision-making process of video transformers based on high-level, spatiotemporal concepts that are automatically discovered. Prior research on concept-based interpretability has concentrated solely on image-level tasks. Comparatively, video models deal with the added temporal dimension, increasing complexity and posing challenges in identifying dynamic concepts over time. In this work, we systematically address these...
Scaling laws are useful guides for developing language models, but there are still gaps between current scaling studies and how language models are ultimately trained and evaluated. For instance, scaling is usually studied in the compute-optimal training regime (i.e., the "Chinchilla optimal" regime); in practice, however, models are often over-trained to reduce inference costs. Moreover, scaling laws mostly predict loss on next-token prediction, whereas models are ultimately compared based on downstream task performance. In this paper, we address both shortcomings. To do so, we create...
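A minimal sketch of the standard scaling-law machinery this line of work builds on: fit a saturating power law, loss(C) = a * C**(-b) + e, to (compute, loss) pairs from small runs, then extrapolate. The data points below are made up.

```python
# Fit and extrapolate a saturating power law over (compute, loss) pairs.
import numpy as np
from scipy.optimize import curve_fit

def power_law(c, a, b, e):
    """Saturating power law: loss = a * c**(-b) + e."""
    return a * c ** (-b) + e

compute = np.array([1.0, 10.0, 100.0, 1000.0])   # training compute (relative units)
loss = np.array([3.9, 3.3, 2.9, 2.6])            # hypothetical eval losses
(a, b, e), _ = curve_fit(power_law, compute, loss, p0=(1.0, 0.3, 2.0))
print(f"extrapolated loss at 10x more compute: {power_law(1e4, a, b, e):.2f}")
```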
Recent work leverages the expressive power of generative adversarial networks (GANs) to generate labeled synthetic datasets. These dataset generation methods often require new annotations of synthetic images, which forces practitioners to seek out annotators, curate a set of synthetic images, and ensure the quality of generated labels. We introduce the HandsOff framework, a technique capable of producing an unlimited number of synthetic images and corresponding labels after being trained on less than 50 pre-existing labeled images. Our framework avoids the practical...
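A schematic sketch in the spirit of this family of methods (not HandsOff itself, which relies on GAN inversion of real labeled images): learn a lightweight per-pixel classifier on generator features from a few labeled examples, after which every newly generated image comes with labels for free. The "generator" below is a random stand-in.

```python
# Few-shot label head over (fake) generator features -> labeled synthetic data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
W = rng.standard_normal((32, 8 * 8 * 16))   # frozen stand-in "generator" weights

def fake_generator(z):
    """Stand-in for a GAN generator: latent z -> per-pixel features (8, 8, 16)."""
    return np.tanh(z @ W).reshape(8, 8, 16)

# "Few labeled images": per-pixel features plus per-pixel segmentation labels.
feats = [fake_generator(rng.standard_normal(32)) for _ in range(10)]
labels = [rng.integers(0, 2, size=(8, 8)) for _ in range(10)]

X = np.concatenate([f.reshape(-1, 16) for f in feats])
y = np.concatenate([l.reshape(-1) for l in labels])
head = LogisticRegression(max_iter=1000).fit(X, y)

# Unlimited synthetic (image, label) pairs: generate, then predict labels.
new_feats = fake_generator(rng.standard_normal(32))
new_label_map = head.predict(new_feats.reshape(-1, 16)).reshape(8, 8)
```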
Amodal perception, the ability to comprehend complete object structures from partial visibility, is a fundamental skill, even for infants. Its significance extends to applications like autonomous driving, where a clear understanding of heavily occluded objects is essential. However, modern detection and tracking algorithms often overlook this critical capability, perhaps due to the prevalence of \textit{modal} annotations in most benchmarks. To address the scarcity of amodal benchmarks, we introduce TAO-Amodal,...