- Advanced Neural Network Applications
- Autonomous Vehicle Technology and Safety
- Domain Adaptation and Few-Shot Learning
- Machine Learning and Algorithms
- Machine Learning and Data Classification
- Video Surveillance and Tracking Methods
- Human Pose and Action Recognition
- Multimodal Machine Learning Applications
- Adversarial Robustness in Machine Learning
- Generative Adversarial Networks and Image Synthesis
- Advanced Image and Video Retrieval Techniques
- Simulation Techniques and Applications
- Advanced Image Processing Techniques
- AI in cancer detection
- Traffic Prediction and Management Techniques
- Transportation and Mobility Innovations
- Data Stream Mining Techniques
- Image Retrieval and Classification Techniques
- Robotics and Sensor-Based Localization
- Emotion and Mood Recognition
- Advanced Graph Neural Networks
- Real-time simulation and control systems
- Visual Attention and Saliency Detection
- Robotic Path Planning Algorithms
- Gaussian Processes and Bayesian Inference
TH Bingen University of Applied Sciences
2022-2024
University of Tübingen
2020-2022
Max Planck Institute for Intelligent Systems
2020-2022
Max Planck Society
2019-2021
Weatherford College
2021
Istituto Tecnico Industriale Alessandro Volta
2021
Nvidia (United States)
2021
Carnegie Mellon University
2018-2019
R.V. College of Engineering
2016
How should representations from complementary sensors be integrated for autonomous driving? Geometry-based sensor fusion has shown great promise for perception tasks such as object detection and motion forecasting. However, for the actual driving task, the global context of the 3D scene is key, e.g., a change in traffic light state can affect the behavior of a vehicle geometrically distant from that light. Geometry alone may therefore be insufficient for effectively fusing representations in end-to-end driving models. In this work, we demonstrate...
How should we integrate representations from complementary sensors for autonomous driving? Geometry-based fusion has shown promise for perception (e.g., object detection, motion forecasting). However, in the context of end-to-end driving, we find that imitation learning based on existing sensor fusion methods underperforms in complex driving scenarios with a high density of dynamic agents. Therefore, we propose TransFuser, a mechanism to integrate image and LiDAR representations using self-attention. Our approach uses transformer modules at...
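The abstract above describes fusing image and LiDAR representations with self-attention. A minimal sketch of token-level fusion across two modalities follows; the shapes, single attention head, and lack of residuals or normalization are simplifying assumptions for illustration, not the TransFuser architecture:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention_fusion(img_tokens, lidar_tokens, Wq, Wk, Wv):
    """Fuse two modalities by running self-attention over their
    concatenated token sequences (single head, toy version)."""
    tokens = np.concatenate([img_tokens, lidar_tokens], axis=0)  # (N, d)
    q, k, v = tokens @ Wq, tokens @ Wk, tokens @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])   # (N, N) pairwise affinities
    attn = softmax(scores, axis=-1)           # each row sums to 1
    return attn @ v                           # every token attends to both modalities

rng = np.random.default_rng(0)
d = 8
img = rng.normal(size=(4, d))   # 4 image-grid tokens (illustrative)
lid = rng.normal(size=(6, d))   # 6 LiDAR-grid tokens (illustrative)
W = [rng.normal(size=(d, d)) * 0.1 for _ in range(3)]
fused = self_attention_fusion(img, lid, *W)
print(fused.shape)  # (10, 8): one fused vector per input token
```

The key property is that the attention matrix spans the concatenated sequence, so image tokens can attend to LiDAR tokens and vice versa.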
Efficient reasoning about the semantic, spatial, and temporal structure of a scene is a crucial prerequisite for autonomous driving. We present NEural ATtention fields (NEAT), a novel representation that enables such reasoning for end-to-end imitation learning models. NEAT is a continuous function which maps locations in Bird's Eye View (BEV) scene coordinates to waypoints and semantics, using intermediate attention maps to iteratively compress high-dimensional 2D image features into a compact representation. This allows our model...
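The core idea above is a continuous function queried at BEV coordinates rather than a fixed grid. A toy sketch of such a coordinate-conditioned field follows; the small MLP, its sizes, and the 5-dimensional output (e.g., class logits plus a waypoint offset) are illustrative assumptions, and the attention-based iterative compression of image features is omitted:

```python
import numpy as np

def neural_field(xy, W1, b1, W2, b2):
    """Toy continuous function over BEV coordinates: maps each (x, y)
    location to an output vector via a small MLP, so it can be queried
    at arbitrary locations instead of fixed grid cells."""
    h = np.maximum(xy @ W1 + b1, 0.0)   # ReLU hidden layer
    return h @ W2 + b2

rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(2, 16)) * 0.5, np.zeros(16)
W2, b2 = rng.normal(size=(16, 5)) * 0.5, np.zeros(5)

# Query the field at arbitrary continuous BEV locations.
queries = np.array([[0.0, 1.5], [-3.2, 7.7], [10.0, 0.1]])
out = neural_field(queries, W1, b1, W2, b2)
print(out.shape)  # (3, 5): one output vector per queried location
```

Because the input is a coordinate, the representation's memory cost is decoupled from output resolution: denser predictions just mean more queries.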
The autonomous driving community has witnessed rapid growth in approaches that embrace an end-to-end algorithm framework, utilizing raw sensor input to generate vehicle motion plans instead of concentrating on individual tasks such as detection and motion prediction. End-to-end systems, in comparison to modular pipelines, benefit from joint feature optimization for perception and planning. This field has flourished due to the availability of large-scale datasets, closed-loop evaluation, and the increasing need for algorithms...
Deep Neural Networks trained in a fully supervised fashion are the dominant technology in perception-based autonomous driving systems. While collecting large amounts of unlabeled data is already a major undertaking, only a subset of it can be labeled by humans due to the effort needed for high-quality annotation. Therefore, finding the right data to label has become a key challenge. Active learning is a powerful technique to improve the data efficiency of supervised methods, as it aims at selecting the smallest possible training set to reach a required...
Generative Adversarial Networks (GANs) produce high-quality images but are challenging to train. They need careful regularization, vast amounts of compute, and expensive hyper-parameter sweeps. We make significant headway on these issues by projecting generated and real samples into a fixed, pretrained feature space. Motivated by the finding that the discriminator cannot fully exploit features from deeper layers of the pretrained model, we propose a more effective strategy that mixes features across channels and resolutions. Our Projected...
Data aggregation techniques can significantly improve vision-based policy learning within a training environment, e.g., learning to drive in a specific simulation condition. However, as on-policy data is sequentially sampled and added in an iterative manner, the policies can specialize and overfit to the training conditions. For real-world applications, it is useful for the learned policies to generalize to novel scenarios that differ from the training conditions. To improve generalization while maintaining robustness when training end-to-end driving policies, we perform an extensive analysis of data aggregation techniques in the CARLA environment. We...
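The iterative on-policy sampling described above follows the general shape of a DAgger-style loop: roll out the current policy, query an expert for labels on the visited states, aggregate, and refit. A self-contained toy version on a 1-D "stay near lane center" task follows; the environment, expert, and majority-vote "policy" are stand-ins invented for illustration, not the paper's setup:

```python
import random

def expert(state):
    """Toy expert: steer toward the lane center at 0."""
    return -1 if state > 0 else 1

def fit(dataset):
    """Toy 'policy fitting': majority action per sign of the state."""
    pos = [a for s, a in dataset if s > 0]
    neg = [a for s, a in dataset if s <= 0]
    def vote(acts, default):
        return max(set(acts), key=acts.count) if acts else default
    return {"pos": vote(pos, -1), "neg": vote(neg, 1)}

def rollout(policy, steps=20, seed=0):
    """Run the current policy and record the states it actually visits."""
    rng = random.Random(seed)
    state, visited = rng.uniform(-5, 5), []
    for _ in range(steps):
        visited.append(state)
        action = policy["pos"] if state > 0 else policy["neg"]
        state += action + rng.uniform(-0.2, 0.2)
    return visited

# Aggregation loop: label on-policy states with the expert, refit each round.
dataset = [(s, expert(s)) for s in (-4.0, -1.0, 2.0, 5.0)]  # initial demos
policy = fit(dataset)
for it in range(3):
    for s in rollout(policy, seed=it):
        dataset.append((s, expert(s)))   # aggregate expert labels on-policy data
    policy = fit(dataset)
print(len(dataset))  # 4 demos + 3 rounds x 20 states = 64
```

The overfitting concern in the abstract arises precisely because the aggregated states come from the policy's own visitation distribution in one training condition.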
Human drivers have a remarkable ability to drive in diverse visual conditions and situations, e.g., from maneuvering in rainy, limited-visibility conditions with no lane markings to turning in a busy intersection while yielding to pedestrians. In contrast, we find that state-of-the-art sensorimotor driving models struggle when encountering diverse settings with varying relationships between observation and action. To generalize when making decisions across diverse conditions, humans leverage multiple types of situation-specific reasoning...
End-to-end driving systems have recently made rapid progress, in particular on CARLA. Independent of their major contribution, these approaches often introduce changes to minor system components. Consequently, the source of the improvements is unclear. We identify two biases that recur in nearly all state-of-the-art methods and are critical for the observed progress on CARLA: (1) lateral recovery via a strong inductive bias towards target point following, and (2) longitudinal averaging of multimodal waypoint predictions for slowing...
The release of nuPlan marks a new era in vehicle motion planning research, offering the first large-scale real-world dataset and evaluation schemes requiring both precise short-term planning and long-horizon ego-forecasting. Existing systems struggle to simultaneously meet both requirements. Indeed, we find that these tasks are fundamentally misaligned and should be addressed independently. We further assess the current state of closed-loop planning in the field, revealing the limitations of learning-based methods in complex scenarios and the value...
Planning an optimal route in a complex environment requires efficient reasoning about the surrounding scene. While human drivers prioritize important objects and ignore details not relevant to the decision, learning-based planners typically extract features from dense, high-dimensional grid representations containing all vehicle and road context information. In this paper, we propose PlanT, a novel approach for planning in the context of self-driving that uses a standard transformer architecture. PlanT is based on...
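The contrast drawn above is between dense grid inputs and a sparse, object-level scene description. A minimal sketch of object-level tokenization follows; the specific feature layout (type flag, pose, extent) and the example values are illustrative assumptions, not PlanT's actual token format:

```python
def to_token(obj_type, x, y, yaw, extent):
    """Toy object-level tokenization: each vehicle or route segment
    becomes one fixed-size vector (type flag + pose + size) instead of
    contributing pixels to a dense rasterized grid."""
    return [float(obj_type), x, y, yaw, extent]

# Hypothetical scene: two nearby vehicles and two route segments,
# each given as (x, y, yaw, extent).
vehicles = [(1.2, 8.0, 0.1, 4.5), (-2.0, 15.0, 0.0, 4.2)]
route = [(0.0, 5.0, 0.0, 2.0), (0.0, 10.0, 0.05, 2.0)]

tokens = [to_token(0, *v) for v in vehicles] + [to_token(1, *r) for r in route]
print(len(tokens), len(tokens[0]))  # 4 tokens of dimension 5
```

A standard transformer can then operate on this short token sequence, which is far smaller than a rasterized BEV grid of the same scene.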
Annotating the right data for training deep neural networks is an important challenge. Active learning using uncertainty estimates from Bayesian Neural Networks (BNNs) could provide an effective solution to this. Despite being theoretically principled, BNNs require approximations to be applied to large-scale problems, where both performance and uncertainty estimation are crucial. In this paper, we introduce Deep Probabilistic Ensembles (DPEs), a scalable technique that uses a regularized ensemble to approximate a deep BNN.
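The reason an ensemble can stand in for a BNN is that averaging member predictions yields a predictive distribution whose entropy reflects disagreement. A minimal sketch follows; the logit values and the plain predictive entropy score are illustrative choices, not the DPE regularization scheme:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def ensemble_uncertainty(logits):
    """Predictive entropy of the averaged ensemble prediction.
    logits: (n_members, n_samples, n_classes)."""
    probs = softmax(logits)       # per-member class probabilities
    mean = probs.mean(axis=0)     # ensemble predictive distribution
    return -(mean * np.log(mean + 1e-12)).sum(axis=-1)

# Five members, one sample, three classes.
agree = np.tile(np.array([[4.0, 0.0, 0.0]]), (5, 1, 1))   # all confident in class 0
disagree = np.array([[[4.0, 0, 0]], [[0, 4.0, 0]], [[0, 0, 4.0]],
                     [[4.0, 0, 0]], [[0, 4.0, 0]]])       # confident but split

h_agree = ensemble_uncertainty(agree)[0]
h_disagree = ensemble_uncertainty(disagree)[0]
print(h_agree < h_disagree)  # True: disagreement raises predictive entropy
```

For active learning, samples with the highest such entropy are the ones prioritized for annotation.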
It is well known that semantic segmentation can be used as an effective intermediate representation for learning driving policies. However, the task of street scene semantic segmentation requires expensive annotations. Furthermore, segmentation algorithms are often trained irrespective of the actual driving task, using auxiliary image-space loss functions which are not guaranteed to maximize driving metrics such as safety or distance traveled per intervention. In this work, we seek to quantify the impact of reducing segmentation annotation costs on learned behavior cloning...
Deep Neural Networks (DNNs) often rely on vast datasets for training. Given the large size of such datasets, it is conceivable that they contain specific samples that either do not contribute to or negatively impact the DNN's optimization. Modifying the training distribution to exclude such samples could provide an effective solution to improve performance and reduce training time. This paper proposes to scale up ensemble Active Learning (AL) methods to perform acquisition at a large scale (10k to 500k samples at a time). We do this with ensembles of hundreds of models,...
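Large-batch acquisition of the kind described above reduces to scoring every pool sample by ensemble disagreement and taking the top-k in one shot. A minimal sketch using vote entropy over hard predictions follows; the 4-member ensemble and tiny pool are illustrative assumptions:

```python
from collections import Counter
import math

def vote_entropy(votes):
    """Disagreement among ensemble members' hard predictions for one sample."""
    n = len(votes)
    return -sum((c / n) * math.log(c / n) for c in Counter(votes).values())

def acquire(pool_votes, k):
    """Batch acquisition: rank the unlabeled pool by ensemble
    disagreement and take the k most contested indices at once."""
    ranked = sorted(range(len(pool_votes)),
                    key=lambda i: vote_entropy(pool_votes[i]),
                    reverse=True)
    return ranked[:k]

# Each row: hard labels predicted by 4 ensemble members for one pool sample.
pool = [
    [0, 0, 0, 0],   # full agreement  -> lowest priority
    [0, 1, 0, 1],   # even split      -> highest priority
    [2, 2, 2, 1],   # mild disagreement
]
print(acquire(pool, 2))  # [1, 2]
```

At the scale quoted in the abstract, the same ranking is simply applied to hundreds of thousands of pool samples per acquisition round.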
Training deep networks for semantic segmentation requires annotation of large amounts of data, which can be time-consuming and expensive. Unfortunately, these trained networks still generalize poorly when tested in domains not consistent with the training data. In this paper, we show that by carefully presenting a mixture of labeled source domain data and proxy-labeled target domain data to a network, we can achieve state-of-the-art unsupervised domain adaptation results. With our design, the network progressively learns features specific...
Semantic segmentation with Convolutional Neural Networks is a memory-intensive task due to the high spatial resolution of feature maps and output predictions. In this paper, we present Quadtree Generating Networks (QGNs), a novel approach able to drastically reduce the memory footprint of modern semantic segmentation networks. The key idea is to use quadtrees to represent the predictions and target segmentation masks instead of dense pixel grids. Our quadtree representation enables hierarchical processing of an input image, with the most computationally demanding...
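The memory saving comes from the quadtree collapsing uniform regions into single leaves. A self-contained sketch of encoding a label mask as a quadtree follows (the 8x8 toy mask is an invented example; QGNs generate such trees with a network rather than by recursion over a given mask):

```python
def quadtree(mask, x0, y0, size):
    """Represent a square label mask as a quadtree: a region collapses
    to a single leaf when it contains one class, otherwise it splits
    into four quadrants."""
    vals = {mask[y][x] for y in range(y0, y0 + size) for x in range(x0, x0 + size)}
    if len(vals) == 1 or size == 1:
        return vals.pop()                       # leaf: uniform region
    h = size // 2
    return [quadtree(mask, x0, y0, h),          # top-left
            quadtree(mask, x0 + h, y0, h),      # top-right
            quadtree(mask, x0, y0 + h, h),      # bottom-left
            quadtree(mask, x0 + h, y0 + h, h)]  # bottom-right

def count_leaves(node):
    return sum(map(count_leaves, node)) if isinstance(node, list) else 1

# 8x8 mask: class 0 everywhere except a 2x2 patch of class 1.
mask = [[0] * 8 for _ in range(8)]
mask[0][0] = mask[0][1] = mask[1][0] = mask[1][1] = 1
tree = quadtree(mask, 0, 0, 8)
print(count_leaves(tree), "leaves vs", 8 * 8, "pixels")  # 7 leaves vs 64 pixels
```

On real street scenes, large uniform regions (road, sky) collapse the same way, which is what makes the representation far cheaper than a dense per-pixel grid.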
SLEDGE is the first generative simulator for vehicle motion planning trained on real-world driving logs. Its core component is a learned model that is able to generate agent bounding boxes and lane graphs. The model's outputs serve as an initial state for traffic simulation. The unique properties of the entities to be generated by SLEDGE, such as their connectivity and variable count per scene, render the naive application of most modern generative models to this task non-trivial. Therefore, together with a systematic study of existing graph...
We address the problem of semi-supervised domain adaptation of classification algorithms through deep Q-learning. The core idea is to consider the predictions of a source network on target data as noisy labels, and to learn a policy to sample from this data so as to maximize the accuracy of a small annotated reward partition of the target domain. Our experiments show that the learned sampling policies construct labeled sets that improve the accuracies of visual classifiers over baselines.
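At its simplest, the sampling-policy idea is a reinforcement learning update: actions choose which pool of noisy-labeled data to draw from, and the reward is the resulting accuracy gain on the held-out reward partition. A heavily simplified tabular sketch follows; the one-step bandit setting, the two pools, and the fixed reward values are all invented for illustration and bear no relation to the paper's deep Q-network:

```python
import random

def q_learning(rewards, episodes=200, alpha=0.1, eps=0.2, seed=0):
    """Tabular Q-learning for a one-step 'which pool to sample from'
    decision: each action is a data pool, the (noisy) reward is a toy
    proxy for the accuracy gain of labeling from that pool."""
    rng = random.Random(seed)
    q = [0.0] * len(rewards)
    for _ in range(episodes):
        if rng.random() < eps:                              # explore
            a = rng.randrange(len(q))
        else:                                               # exploit
            a = max(range(len(q)), key=q.__getitem__)
        r = rewards[a] + rng.uniform(-0.05, 0.05)           # noisy reward signal
        q[a] += alpha * (r - q[a])                          # one-step Q update
    return q

# Toy setup: pool 1's noisy labels help the classifier more than pool 0's.
q = q_learning(rewards=[0.2, 1.0])
best = max(range(2), key=q.__getitem__)   # greedy action under learned values
```

The paper's setting replaces the two fixed pools with per-sample actions and the toy reward with measured accuracy on the annotated reward partition.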
The general approach to facial expression recognition involves three stages: face acquisition, feature extraction, and recognition. A series of steps are used during feature extraction, and the robustness of a model depends on its ability to handle exceptions over all these steps. This paper details experiments conducted to classify images by using reduced regions of interest and discriminative salient patches of the face, while minimizing the number of landmarks required for their localization. The performance of various descriptors is analyzed, which...