Kashyap Chitta

ORCID: 0000-0002-3891-3230
Research Areas
  • Advanced Neural Network Applications
  • Autonomous Vehicle Technology and Safety
  • Domain Adaptation and Few-Shot Learning
  • Machine Learning and Algorithms
  • Machine Learning and Data Classification
  • Video Surveillance and Tracking Methods
  • Human Pose and Action Recognition
  • Multimodal Machine Learning Applications
  • Adversarial Robustness in Machine Learning
  • Generative Adversarial Networks and Image Synthesis
  • Advanced Image and Video Retrieval Techniques
  • Simulation Techniques and Applications
  • Advanced Image Processing Techniques
  • AI in cancer detection
  • Traffic Prediction and Management Techniques
  • Transportation and Mobility Innovations
  • Data Stream Mining Techniques
  • Image Retrieval and Classification Techniques
  • Robotics and Sensor-Based Localization
  • Emotion and Mood Recognition
  • Advanced Graph Neural Networks
  • Real-time simulation and control systems
  • Visual Attention and Saliency Detection
  • Robotic Path Planning Algorithms
  • Gaussian Processes and Bayesian Inference

TH Bingen University of Applied Sciences
2022-2024

University of Tübingen
2020-2022

Max Planck Institute for Intelligent Systems
2020-2022

Max Planck Society
2019-2021

Weatherford College
2021

Istituto Tecnico Industriale Alessandro Volta
2021

Nvidia (United States)
2021

Carnegie Mellon University
2018-2019

R.V. College of Engineering
2016

How should representations from complementary sensors be integrated for autonomous driving? Geometry-based sensor fusion has shown great promise for perception tasks such as object detection and motion forecasting. However, for the actual driving task, the global context of the 3D scene is key, e.g. a change in traffic light state can affect the behavior of a vehicle geometrically distant from that light. Geometry alone may therefore be insufficient for effectively fusing representations in end-to-end driving models. In this work, we demonstrate...

10.1109/cvpr46437.2021.00700 article EN 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021-06-01

How should we integrate representations from complementary sensors for autonomous driving? Geometry-based fusion has shown promise for perception (e.g., object detection, motion forecasting). However, in the context of end-to-end driving, we find that imitation learning based on existing sensor fusion methods underperforms in complex driving scenarios with a high density of dynamic agents. Therefore, we propose TransFuser, a mechanism to integrate image and LiDAR representations using self-attention. Our approach uses transformer modules at...

10.1109/tpami.2022.3200245 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2022-08-19
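
A minimal PyTorch sketch of the kind of self-attention fusion TransFuser describes: image and LiDAR feature maps are flattened into tokens and processed jointly by a transformer encoder, so any location in one sensor can attend to any location in the other. All dimensions and the module layout here are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Fuse two sensor feature maps with global self-attention."""
    def __init__(self, dim=64, heads=4, layers=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)

    def forward(self, img_feat, lidar_feat):
        # img_feat, lidar_feat: (B, C, H, W) maps from two separate backbones.
        b, c, h, w = img_feat.shape
        img_tokens = img_feat.flatten(2).transpose(1, 2)      # (B, H*W, C)
        lidar_tokens = lidar_feat.flatten(2).transpose(1, 2)  # (B, H*W, C)
        fused = self.encoder(torch.cat([img_tokens, lidar_tokens], dim=1))
        # Split back into per-sensor features for downstream decoding.
        return fused[:, :h * w], fused[:, h * w:]

fusion = AttentionFusion()
img, lidar = torch.randn(1, 64, 8, 8), torch.randn(1, 64, 8, 8)
img_fused, lidar_fused = fusion(img, lidar)
print(img_fused.shape)  # torch.Size([1, 64, 64])
```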

Efficient reasoning about the semantic, spatial, and temporal structure of a scene is a crucial prerequisite for autonomous driving. We present NEural ATtention fields (NEAT), a novel representation that enables such reasoning for end-to-end imitation learning models. NEAT is a continuous function which maps locations in Bird's Eye View (BEV) scene coordinates to waypoints and semantics, using intermediate attention maps to iteratively compress high-dimensional 2D image features into a compact representation. This allows our model...

10.1109/iccv48922.2021.01550 article EN 2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2021-10-01
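
A toy sketch of the continuous-function view NEAT takes: an MLP maps a BEV query location plus a compact scene encoding to a semantic class distribution and a waypoint offset. The iterative attention-based compression from the paper is abstracted into a fixed scene vector here; all sizes and names are assumptions.

```python
import torch
import torch.nn as nn

class NeatField(nn.Module):
    def __init__(self, scene_dim=128, hidden=64, num_classes=6):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 + scene_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.semantics = nn.Linear(hidden, num_classes)  # class logits
        self.offset = nn.Linear(hidden, 2)               # waypoint offset

    def forward(self, xy, scene):
        # xy: (N, 2) BEV query points; scene: (scene_dim,) compressed features.
        h = self.mlp(torch.cat([xy, scene.expand(xy.size(0), -1)], dim=-1))
        return self.semantics(h), self.offset(h)

field = NeatField()
queries = torch.rand(100, 2) * 2 - 1    # query points in [-1, 1]^2 BEV space
logits, offsets = field(queries, torch.randn(128))
print(logits.shape, offsets.shape)      # (100, 6) (100, 2)
```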

The autonomous driving community has witnessed a rapid growth in approaches that embrace an end-to-end algorithm framework, utilizing raw sensor input to generate vehicle motion plans, instead of concentrating on individual tasks such as detection and prediction. End-to-end systems, in comparison to modular pipelines, benefit from joint feature optimization for perception and planning. This field has flourished due to the availability of large-scale datasets, closed-loop evaluation, and the increasing need for autonomous driving algorithms...

10.1109/tpami.2024.3435937 article EN cc-by-nc-nd IEEE Transactions on Pattern Analysis and Machine Intelligence 2024-07-30

Deep Neural Networks trained in a fully supervised fashion are the dominant technology in perception-based autonomous driving systems. While collecting large amounts of unlabeled data is already a major undertaking, only a subset of it can be labeled by humans due to the effort needed for high-quality annotation. Therefore, finding the right data to label has become a key challenge. Active learning is a powerful technique to improve the data efficiency of supervised learning methods, as it aims at selecting the smallest possible training set to reach a required...

10.1109/iv47402.2020.9304793 article EN 2020 IEEE Intelligent Vehicles Symposium (IV) 2020-10-19

Generative Adversarial Networks (GANs) produce high-quality images but are challenging to train. They need careful regularization, vast amounts of compute, and expensive hyper-parameter sweeps. We make significant headway on these issues by projecting generated and real samples into a fixed, pretrained feature space. Motivated by the finding that the discriminator cannot fully exploit features from deeper layers of the pretrained model, we propose a more effective strategy that mixes features across channels and resolutions. Our Projected...

10.48550/arxiv.2111.01007 preprint EN other-oa arXiv (Cornell University) 2021-01-01
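
A simplified sketch of the projected-discriminator idea: samples are embedded by a frozen pretrained network, passed through a fixed random 1x1 convolution (standing in for the paper's cross-channel mixing), and only a small discriminator on top is trainable. The choice of EfficientNet-B0 (weights download on first use) and all sizes are assumptions for illustration.

```python
import torch
import torch.nn as nn
from torchvision.models import efficientnet_b0

backbone = efficientnet_b0(weights="IMAGENET1K_V1").features.eval()
for p in backbone.parameters():
    p.requires_grad = False  # fixed feature space, never updated

mix = nn.Conv2d(1280, 256, kernel_size=1, bias=False)  # random projection
nn.init.normal_(mix.weight)
for p in mix.parameters():
    p.requires_grad = False

disc = nn.Sequential(  # small trainable discriminator on projected features
    nn.Conv2d(256, 64, 3, padding=1), nn.LeakyReLU(0.2),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 1),
)

def d_logits(images):
    with torch.no_grad():
        feats = mix(backbone(images))  # project into the fixed feature space
    return disc(feats)

print(d_logits(torch.randn(2, 3, 224, 224)).shape)  # torch.Size([2, 1])
```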

Data aggregation techniques can significantly improve vision-based policy learning within a training environment, e.g., learning to drive in a specific simulation condition. However, as on-policy data is sequentially sampled and added in an iterative manner, the policy can specialize and overfit to the training conditions. For real-world applications, it is useful for the learned policy to generalize to novel scenarios that differ from the training conditions. To improve generalization while maintaining robustness when training end-to-end driving policies, we perform an extensive analysis of data aggregation techniques in the CARLA environment. We...

10.1109/cvpr42600.2020.01178 article EN 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020-06-01
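
A schematic of the DAgger-style aggregation loop this line of work analyzes, with the environment, expert, and training step reduced to stubs: roll out the current policy, relabel the visited states with expert actions, grow the dataset, and retrain.

```python
import random

def rollout(policy, env_steps=100):
    # Placeholder: returns states visited by the current policy.
    return [random.random() for _ in range(env_steps)]

def expert_action(state):
    return 1.0 if state > 0.5 else 0.0  # stand-in for the expert driver

def train(dataset):
    # Placeholder for behavior cloning on the aggregated dataset.
    return lambda s: expert_action(s)   # pretend we fit the expert perfectly

dataset, policy = [], (lambda s: 0.0)
for it in range(5):  # on-policy data is added iteratively
    states = rollout(policy)
    dataset += [(s, expert_action(s)) for s in states]  # expert relabeling
    policy = train(dataset)
    print(f"iter {it}: dataset size = {len(dataset)}")
```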

Human drivers have a remarkable ability to drive in diverse visual conditions and situations, e.g., from maneuvering in rainy, limited visibility conditions with no lane markings to turning in a busy intersection while yielding to pedestrians. In contrast, we find that state-of-the-art sensorimotor driving models struggle when encountering diverse settings with varying relationships between observation and action. To generalize when making decisions across diverse conditions, humans leverage multiple types of situation-specific reasoning...

10.1109/cvpr42600.2020.01131 article EN 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020-06-01

End-to-end driving systems have recently made rapid progress, in particular on CARLA. Independent of their major contribution, they introduce changes to minor system components. Consequently, the source of improvements is unclear. We identify two biases that recur in nearly all state-of-the-art methods and are critical for the observed progress on CARLA: (1) lateral recovery via a strong inductive bias towards target point following, and (2) longitudinal averaging of multimodal waypoint predictions for slowing...

10.1109/iccv51070.2023.00757 article EN 2023 IEEE/CVF International Conference on Computer Vision (ICCV) 2023-10-01
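
Bias (2) can be seen with three lines of arithmetic: if one predicted mode continues at speed and another stops, their average is an intermediate trajectory, i.e. the vehicle slows down.

```python
import numpy as np

go   = np.array([[0, 2], [0, 4], [0, 6]])  # waypoints, driving at speed
stop = np.array([[0, 0], [0, 0], [0, 0]])  # waypoints, fully stopped
avg  = (go + stop) / 2                     # averaged multimodal prediction
print(avg)  # [[0. 1.] [0. 2.] [0. 3.]] -> half the original speed
```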

The release of nuPlan marks a new era in vehicle motion planning research, offering the first large-scale real-world dataset and evaluation schemes requiring both precise short-term planning and long-horizon ego-forecasting. Existing systems struggle to simultaneously meet both requirements. Indeed, we find that these tasks are fundamentally misaligned and should be addressed independently. We further assess the current state of closed-loop planning in the field, revealing the limitations of learning-based methods in complex scenarios and the value...

10.48550/arxiv.2306.07962 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Planning an optimal route in a complex environment requires efficient reasoning about the surrounding scene. While human drivers prioritize important objects and ignore details not relevant to the decision, learning-based planners typically extract features from dense, high-dimensional grid representations containing all vehicle and road context information. In this paper, we propose PlanT, a novel approach for planning in the context of self-driving that uses a standard transformer architecture. PlanT is based on...

10.48550/arxiv.2210.14222 preprint EN other-oa arXiv (Cornell University) 2022-01-01
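
A minimal sketch of an object-level planner in the spirit of PlanT: each vehicle or route segment becomes one token, a standard transformer encoder reasons over the set, and a small head regresses future waypoints. The feature layout, pooling, and sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ObjectPlanner(nn.Module):
    def __init__(self, obj_dim=6, dim=64, n_waypoints=4):
        super().__init__()
        self.n_waypoints = n_waypoints
        self.embed = nn.Linear(obj_dim, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, n_waypoints * 2)

    def forward(self, objects):
        # objects: (B, N, obj_dim), e.g. box center, extent, heading, speed.
        tokens = self.encoder(self.embed(objects))
        pooled = tokens.mean(dim=1)  # aggregate the scene into one vector
        return self.head(pooled).view(-1, self.n_waypoints, 2)

planner = ObjectPlanner()
scene = torch.randn(1, 12, 6)   # 12 objects in the scene
print(planner(scene).shape)     # torch.Size([1, 4, 2])
```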

Annotating the right data for training deep neural networks is an important challenge. Active learning using uncertainty estimates from Bayesian Neural Networks (BNNs) could provide an effective solution to this. Despite being theoretically principled, BNNs require approximations to be applied to large-scale problems, where both performance and uncertainty estimation are crucial. In this paper, we introduce Deep Probabilistic Ensembles (DPEs), a scalable technique that uses a regularized ensemble to approximate a deep BNN...

10.48550/arxiv.1811.03575 preprint EN other-oa arXiv (Cornell University) 2018-01-01
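
A sketch of the ensemble-as-approximate-BNN idea: K independently initialized networks act like posterior samples, and the entropy of their averaged softmax gives an uncertainty score usable for active learning. The paper's specific regularization over ensemble weights is omitted here; sizes are placeholders.

```python
import torch
import torch.nn as nn

K, num_classes = 5, 10
ensemble = [nn.Sequential(nn.Linear(32, 64), nn.ReLU(),
                          nn.Linear(64, num_classes)) for _ in range(K)]

def predictive_entropy(x):
    probs = torch.stack([m(x).softmax(-1) for m in ensemble])  # (K, B, C)
    mean = probs.mean(0)                 # marginalize over ensemble members
    return -(mean * mean.log()).sum(-1)  # high value = high disagreement

x = torch.randn(8, 32)
scores = predictive_entropy(x)
print(scores.shape)  # torch.Size([8]); label the highest-entropy samples
```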

It is well known that semantic segmentation can be used as an effective intermediate representation for learning driving policies. However, the task of street scene semantic segmentation requires expensive annotations. Furthermore, segmentation algorithms are often trained irrespective of the actual driving task, using auxiliary image-space loss functions which are not guaranteed to maximize driving metrics such as safety or distance traveled per intervention. In this work, we seek to quantify the impact of reducing segmentation annotation costs on learned behavior cloning...

10.1109/iros45743.2020.9340641 article EN 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2020-10-24

Deep Neural Networks (DNNs) often rely on vast datasets for training. Given the large size of such datasets, it is conceivable that they contain specific samples that either do not contribute to or even negatively impact the DNN's optimization. Modifying the training distribution to exclude such samples could provide an effective solution to improve performance and reduce training time. This paper proposes to scale up ensemble Active Learning (AL) methods to perform acquisition at a large scale (10k to 500k samples at a time). We do this with ensembles of hundreds of models,...

10.1109/tits.2021.3133268 article EN IEEE Transactions on Intelligent Transportation Systems 2021-12-31
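
A sketch of acquisition at scale: score an unlabeled pool chunk by chunk with an ensemble disagreement measure (a BALD-style mutual information here, as one plausible choice), then select the top-k samples in a single step. Pool sizes below are tiny stand-ins for the paper's 10k-500k.

```python
import torch

def score_chunk(chunk, ensemble):
    probs = torch.stack([m(chunk).softmax(-1) for m in ensemble])
    mean = probs.mean(0)
    entropy = -(mean * mean.log()).sum(-1)         # total uncertainty
    cond = -(probs * probs.log()).sum(-1).mean(0)  # expected member entropy
    return entropy - cond  # mutual information: ensemble disagreement

def acquire(pool, ensemble, k, chunk_size=1024):
    scores = torch.cat([score_chunk(pool[i:i + chunk_size], ensemble)
                        for i in range(0, len(pool), chunk_size)])
    return scores.topk(k).indices  # indices of the k most informative samples

ensemble = [torch.nn.Linear(32, 10) for _ in range(5)]
pool = torch.randn(5000, 32)
print(acquire(pool, ensemble, k=100).shape)  # torch.Size([100])
```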

The autonomous driving community has witnessed a rapid growth in approaches that embrace an end-to-end algorithm framework, utilizing raw sensor input to generate vehicle motion plans, instead of concentrating on individual tasks such as detection and prediction. End-to-end systems, in comparison to modular pipelines, benefit from joint feature optimization for perception and planning. This field has flourished due to the availability of large-scale datasets, closed-loop evaluation, and the increasing need for autonomous driving algorithms...

10.48550/arxiv.2306.16927 preprint EN cc-by-nc-sa arXiv (Cornell University) 2023-01-01

Training deep networks for semantic segmentation requires annotation of large amounts of data, which can be time-consuming and expensive. Unfortunately, these trained networks still generalize poorly when tested in domains not consistent with the training data. In this paper, we show that by carefully presenting a mixture of labeled source domain and proxy-labeled target domain data to a network, we can achieve state-of-the-art unsupervised domain adaptation results. With our design, the network progressively learns features specific...

10.48550/arxiv.1811.03542 preprint EN other-oa arXiv (Cornell University) 2018-01-01
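
A reduced sketch of the source/proxy-label mixture: a model trained on labeled source data pseudo-labels the target domain, only confident predictions are kept, and the next round trains on both sets. The paper's progressive schedule and architecture are abstracted away; the threshold and the stand-in model are assumptions.

```python
import torch

def proxy_label(model, target_images, threshold=0.9):
    with torch.no_grad():
        probs = model(target_images).softmax(-1)
        conf, labels = probs.max(-1)
    keep = conf > threshold  # only trust confident predictions
    return target_images[keep], labels[keep]

model = torch.nn.Linear(32, 10)  # stand-in for a trained source model
target = torch.randn(256, 32)
# Low threshold here only because the stand-in model is untrained.
tgt_x, tgt_y = proxy_label(model, target, threshold=0.2)
# Next round: train on labeled source plus (tgt_x, tgt_y), then re-label.
print(f"kept {len(tgt_x)} of {len(target)} target samples as proxy-labeled")
```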

Semantic segmentation with Convolutional Neural Networks is a memory-intensive task due to the high spatial resolution of feature maps and output predictions. In this paper, we present Quadtree Generating Networks (QGNs), a novel approach able to drastically reduce the memory footprint of modern semantic segmentation networks. The key idea is to use quadtrees to represent the predictions and target segmentation masks instead of dense pixel grids. Our quadtree representation enables hierarchical processing of an input image, with the most computationally demanding...

10.1109/wacv45572.2020.9093449 article EN 2020 IEEE Winter Conference on Applications of Computer Vision (WACV) 2020-03-01
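
A toy illustration of the quadtree representation: a label mask is split recursively and uniform quadrants collapse into single leaves, so large homogeneous regions cost almost nothing compared to a dense pixel grid.

```python
import numpy as np

def to_quadtree(mask):
    if (mask == mask.flat[0]).all():
        return int(mask.flat[0])  # uniform region -> one leaf
    h, w = mask.shape
    return [to_quadtree(mask[:h // 2, :w // 2]),  # recurse into quadrants
            to_quadtree(mask[:h // 2, w // 2:]),
            to_quadtree(mask[h // 2:, :w // 2]),
            to_quadtree(mask[h // 2:, w // 2:])]

mask = np.zeros((8, 8), dtype=int)
mask[6:, 6:] = 1  # small object in one corner
print(to_quadtree(mask))  # [0, 0, 0, [0, 0, 0, 1]]: only the corner subdivides
```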

SLEDGE is the first generative simulator for vehicle motion planning trained on real-world driving logs. Its core component is a learned model that is able to generate agent bounding boxes and lane graphs. The model's outputs serve as an initial state for traffic simulation. The unique properties of the entities to be generated by SLEDGE, such as their connectivity and variable count per scene, render the naive application of most modern generative models to this task non-trivial. Therefore, together with a systematic study of existing graph...

10.48550/arxiv.2403.17933 preprint EN arXiv (Cornell University) 2024-03-26
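
To make the generation target concrete, here is a minimal sketch (with assumed field names) of the kind of scene state described above: a variable number of agent boxes plus lane-graph nodes with arbitrary connectivity, a set/graph structure rather than a fixed-size tensor.

```python
from dataclasses import dataclass, field

@dataclass
class AgentBox:
    x: float; y: float; heading: float; length: float; width: float

@dataclass
class LaneNode:
    x: float; y: float
    successors: list = field(default_factory=list)  # indices of next nodes

@dataclass
class SceneState:
    agents: list  # variable count per scene
    lanes: list   # lane graph nodes with connectivity

scene = SceneState(
    agents=[AgentBox(0.0, 0.0, 0.0, 4.5, 2.0)],
    lanes=[LaneNode(0.0, 0.0, successors=[1]), LaneNode(10.0, 0.0)],
)
print(len(scene.agents), len(scene.lanes))  # counts differ scene to scene
```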

We address the problem of semi-supervised domain adaptation of classification algorithms through deep Q-learning. The core idea is to consider the predictions of a source domain network on target domain data as noisy labels, and learn a policy to sample from this data so as to maximize classification accuracy on a small annotated reward partition of the target domain. Our experiments show that learned sampling policies construct labeled sets that improve the accuracies of visual classifiers over baselines.

10.48550/arxiv.1805.07641 preprint EN other-oa arXiv (Cornell University) 2018-01-01
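
A heavily reduced, tabular stand-in for the sampling-policy idea: a Q-function over pseudo-label confidence bins learns whether accepting a noisy-labeled sample tends to pay off. The paper uses deep Q-learning with reward computed on the annotated partition; this toy replaces that with a direct label check and a one-step bandit update.

```python
import random

q = {(b, a): 0.0 for b in range(10) for a in (0, 1)}  # (confidence bin, action)
alpha, eps = 0.1, 0.2

for step in range(5000):
    conf = random.random()            # pseudo-label confidence
    correct = random.random() < conf  # higher confidence -> likelier correct
    b = min(int(conf * 10), 9)
    a = random.choice((0, 1)) if random.random() < eps else \
        max((0, 1), key=lambda act: q[(b, act)])
    reward = (1.0 if correct else -1.0) if a == 1 else 0.0  # accept vs skip
    q[(b, a)] += alpha * (reward - q[(b, a)])

policy = {b: max((0, 1), key=lambda act: q[(b, act)]) for b in range(10)}
print(policy)  # accepts high-confidence bins, skips low-confidence ones
```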

The general approach to facial expression recognition involves three stages: face acquisition, feature extraction and expression recognition. A series of steps are used during feature extraction, and the robustness of a model depends on its ability to handle exceptions over all these steps. This paper details experiments conducted to classify images by using reduced regions of interest, the discriminative salient patches of the face, while minimizing the number of steps required for their localization. The performance of various descriptors is analyzed, which...

10.1109/tencon.2016.7848553 article EN 2016 IEEE Region 10 Conference (TENCON) 2016-11-01
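
An illustrative classical pipeline in the spirit of the paper: crop a few salient facial patches (placeholder coordinates), compute a handcrafted descriptor per patch (HOG here, as one of the descriptor families such experiments compare), concatenate, and train an SVM. The actual patch localization and descriptors in the paper differ.

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import SVC

PATCHES = [(10, 10), (10, 34), (40, 20)]  # placeholder patch top-left corners

def describe(face):  # face: 64x64 grayscale image
    feats = [hog(face[r:r + 16, c:c + 16], pixels_per_cell=(8, 8),
                 cells_per_block=(1, 1)) for r, c in PATCHES]
    return np.concatenate(feats)  # one descriptor per salient patch

rng = np.random.default_rng(0)
faces = rng.random((20, 64, 64))   # stand-in for aligned face crops
labels = rng.integers(0, 2, 20)    # stand-in expression labels
clf = SVC().fit([describe(f) for f in faces], labels)
print(clf.predict([describe(faces[0])]))
```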