- Advanced Neural Network Applications
- Medical Image Segmentation Techniques
- Domain Adaptation and Few-Shot Learning
- COVID-19 diagnosis using AI
- Advanced Image and Video Retrieval Techniques
- Brain Tumor Detection and Classification
- Digital Imaging for Blood Diseases
- Image and Object Detection Techniques
- Image and Signal Denoising Methods
- Generative Adversarial Networks and Image Synthesis
- Advanced Vision and Imaging
- Advanced Text Analysis Techniques
- Digital Marketing and Social Media
- Face and Expression Recognition
- AI in cancer detection
- Cell Image Analysis Techniques
- Multimodal Machine Learning Applications
- Neural Networks and Applications
- Video Analysis and Summarization
- Industrial Vision Systems and Defect Detection
- Image Retrieval and Classification Techniques
- Semantic Web and Ontologies
- Reinforcement Learning in Robotics
- Sentiment Analysis and Opinion Mining
RWTH Aachen University
2019-2024
University College London
2023
Wellcome / EPSRC Centre for Interventional and Surgical Sciences
2023
Affymax (United States)
2023
Fine-tuning pre-trained vision models for specific tasks is a common practice in computer vision. However, this process becomes more expensive as models grow larger. Recently, parameter-efficient fine-tuning (PEFT) methods have emerged as a popular solution to improve training efficiency and reduce storage needs by tuning additional low-rank modules within backbones. Despite their advantages, they struggle with limited representation capabilities and misalignment with intermediate features. To address these...
Diffusion models have achieved remarkable success in text-to-image generation. However, their practical applications are hindered by the misalignment between generated images and the corresponding text prompts. To tackle this issue, reinforcement learning (RL) has been considered for diffusion model fine-tuning. Yet, RL's effectiveness is limited by the challenge of sparse reward, where feedback is only available at the end of the generation process. This makes it difficult to identify which actions during denoising...
Designing metrics for evaluating instance segmentation revolves around comprehensively considering object detection and segmentation accuracy. However, other important properties, such as sensitivity, continuity, and equality, are overlooked in the current study. In this paper, we reveal that most existing metrics have a limited resolution of segmentation quality. They are only conditionally sensitive to the change of masks or false predictions. For certain metrics, the score can change drastically in a narrow range, which could provide misleading...
Recently, it has been revealed that small semantic segmentation (SS) models exhibit a tendency to make errors in maintaining boundary region completeness and preserving target region connectivity, despite their effective segmentation of the main object regions. To address these errors, we propose a targeted boundary and relation distillation (BRD) strategy using knowledge distillation from large teacher models to small student models. Specifically, the boundary distillation extracts explicit boundaries from the hierarchical feature maps of the backbone network, subsequently enhancing the model's...
Advances in deep generative models have greatly accelerated video processing tasks such as enhancement and synthesis. Learning spatio-temporal video models requires capturing the temporal dynamics of a scene, in addition to the visual appearance of individual frames. Illumination consistency, which reflects the variations of illumination in dynamic video sequences, plays a vital role in video processing. Unfortunately, to date, no well-accepted quantitative metric has been proposed for video illumination consistency evaluation. In this paper, we propose...
Improving user experience and providing personalized search results on e-commerce platforms rely heavily on understanding purchase intention. However, existing methods for acquiring large-scale intentions bank on distilling large language models with human annotation for verification. Such an approach tends to generate product-centric intentions, overlooks valuable visual information from product images, and incurs high costs for scalability. To address these issues, we introduce MIND, a multimodal...
We introduce LLaVA-MoD, a novel framework designed to enable the efficient training of small-scale Multimodal Language Models (s-MLLM) by distilling knowledge from a large-scale MLLM (l-MLLM). Our approach tackles two fundamental challenges in MLLM distillation. First, we optimize the network structure of the s-MLLM by integrating a sparse Mixture of Experts (MoE) architecture into the language model, striking a balance between computational efficiency and model expressiveness. Second, we propose a progressive knowledge transfer strategy...
The long-tailed distribution of semantic categories, which has often been ignored in conventional methods, causes unsatisfactory semantic segmentation performance on tail categories. In this paper, we focus on the problem of long-tailed semantic segmentation. Although some long-tailed recognition methods (e.g., re-sampling/re-weighting) have been proposed for other problems, they can compromise crucial contextual information and are thus hardly adaptable to long-tailed semantic segmentation. To address this issue, we propose MEDOE, a novel framework for long-tailed semantic segmentation via...
To date, most instance segmentation approaches are based on supervised learning that requires a considerable amount of annotated object contours as training ground truth. Here, we propose a framework that searches for the target object based on a shape prior. The shape prior model is learned with a variational autoencoder that requires only a very limited amount of training data: in our experiments, a few dozen object shape patches from the target dataset, as well as purely synthetic shapes, were sufficient to achieve results on par with supervised methods with full access to training data on two out of three cell segmentation datasets...
Product detection in large retail stores requires extensive annotated real data, which is expensive and lacks adaptability when new products are introduced. This paper presents an end-to-end product detection approach using domain randomization to generate synthetic datasets for training. We propose a set of randomizations at the scene level and a method for generating large amounts of domain-randomized data. To evaluate the performance on this dataset, we propose a pipeline where the model is pre-trained on simulation data and fine-tuned on a small...