Konrad Habel
- Advanced Image and Video Retrieval Techniques
- Robotics and Sensor-Based Localization
- Multimodal Machine Learning Applications
- Domain Adaptation and Few-Shot Learning
- Video Analysis and Summarization
- Anomaly Detection Techniques and Applications
- Sports Analytics and Performance
- Advanced Neural Network Applications
- Remote-Sensing Image Classification
- Regional resilience and development
- Human Pose and Action Recognition
- Indoor and Outdoor Localization Technologies
- Software System Performance and Reliability
Technical University of Munich
2025
Universität der Bundeswehr München
2022-2024
Cross-View Geo-Localisation is still a challenging task where additional modules, specific pre-processing or zooming strategies are necessary to determine accurate positions of images. Since different views have different geometries, pre-processing like polar transformation helps to merge them. However, this results in distorted images which then have to be rectified. Adding hard negatives to the training batch could improve the overall performance, but with the default loss functions in geo-localisation it is difficult to include them. In this work, we...
Sports analytics benefits from recent advances in machine learning, providing a competitive advantage for teams or individuals. One important task in this context is the performance measurement of individual players to provide reports and log files for subsequent analysis. During sport events like basketball, this involves the re-identification of players during a match, either from multiple camera viewpoints or from a single camera viewpoint at different times. In this work, we investigate whether it is possible to transfer the outstanding zero-shot...
Retrieving relevant multimedia content is one of the main problems in a world that is increasingly data-driven. With the proliferation of drones, high-quality aerial footage is now available to a wide audience for the first time. Integrating this footage into applications can enable GPS-less geo-localisation or location correction. In this paper, we present an orientation-guided training framework for UAV-view geo-localisation. Through hierarchical localisation, the orientations of UAV images are estimated in relation to satellite imagery....
Cross-View Geo-Localisation is still a challenging task where additional modules, specific pre-processing or zooming strategies are necessary to determine accurate positions of images. Since different views have different geometries, pre-processing like polar transformation helps to merge them. However, this results in distorted images which then have to be rectified. Adding hard negatives to the training batch could improve the overall performance, but with the default loss functions in geo-localisation it is difficult to include them. In this article, we...
The SoccerNet 2024 challenges represent the fourth annual video understanding challenges organized by the SoccerNet team. These challenges aim to advance research across multiple themes in football, including broadcast understanding, field understanding, and player understanding. This year, the challenges encompass four vision-based tasks: (1) Ball Action Spotting, focusing on precisely localizing when and which soccer actions related to the ball occur, (2) Dense Video Captioning, focusing on describing the broadcast with natural language and anchored timestamps, (3) Multi-View Foul Recognition,...
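The SoccerNet 2023 challenges were the third annual video understanding challenges organized by the SoccerNet team. For this edition, the challenges were composed of seven vision-based tasks split into three main themes. The first theme, broadcast understanding, is composed of three high-level tasks related to describing events occurring in the broadcasts: (1) action spotting, focusing on retrieving all timestamps related to global actions in soccer, (2) ball action spotting, focusing on retrieving all timestamps related to the soccer ball change of state, and (3) dense video captioning, focusing on describing the broadcast with natural language and anchored timestamps. The second theme, field understanding, relates to the single task...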
Current architectures for multi-modality tasks such as visual question answering suffer from their high complexity. As a result, these architectures are difficult to train and require high computational resources. To address these problems, we present a CLIP-based architecture that does not require any fine-tuning of the feature extractors. A simple linear classifier is used on the concatenated features of the image and text encoder. During training, an auxiliary loss is added which operates on the answer types. The resulting classification then...