Armin Mustafa

ORCID: 0000-0002-1779-2775
Research Areas
  • Advanced Vision and Imaging
  • Human Pose and Action Recognition
  • Robotics and Sensor-Based Localization
  • Video Surveillance and Tracking Methods
  • Music and Audio Processing
  • Computer Graphics and Visualization Techniques
  • Image Enhancement Techniques
  • Multimodal Machine Learning Applications
  • 3D Shape Modeling and Analysis
  • Video Analysis and Summarization
  • Optical Measurement and Interference Techniques
  • Advanced Image Processing Techniques
  • Speech and Audio Processing
  • Anomaly Detection Techniques and Applications
  • Advanced Image and Video Retrieval Techniques
  • Digital Media Forensic Detection
  • Advanced Neural Network Applications
  • Hand Gesture Recognition Systems
  • Generative Adversarial Networks and Image Synthesis
  • Image and Signal Denoising Methods
  • Infrared Thermography in Medicine
  • 3D Surveying and Cultural Heritage
  • Autonomous Vehicle Technology and Safety
  • Music Technology and Sound Studies
  • Domain Adaptation and Few-Shot Learning

University of Surrey
2015-2024

Signal Processing (United States)
2024

Samsung (India)
2012

Indian Institute of Technology Kanpur
2011

With the increasing global popularity of self-driving cars, there is an immediate need for challenging real-world datasets for benchmarking and training various computer vision tasks such as 3D object detection. Existing datasets either represent simple scenarios or provide only day-time data. In this paper, we introduce a new challenging A*3D dataset which consists of RGB images and LiDAR data with a significant diversity of scene, time, and weather. The dataset consists of high-density images (≈ 10 times more than the pioneering KITTI dataset), heavy...

10.1109/icra40945.2020.9197385 article EN 2020-05-01

This paper introduces a general approach to dynamic scene reconstruction from multiple moving cameras without prior knowledge or limiting constraints on the scene structure, appearance, or illumination. Existing techniques for reconstruction from wide-baseline camera views primarily focus on accurate reconstruction in controlled environments, where the cameras are fixed and calibrated and the background is known. These approaches are not robust for general dynamic scenes captured with sparse moving cameras. Previous approaches for outdoor scenes assume prior knowledge of the static appearance and structure. The primary contributions...

10.1109/iccv.2015.109 article EN 2015-12-01

This paper presents an approach for reconstruction of 4D temporally coherent models of complex dynamic scenes. No prior knowledge is required of scene structure or camera calibration, allowing reconstruction from multiple moving cameras. Sparse-to-dense temporal correspondence is integrated with joint multi-view segmentation and reconstruction to obtain a complete representation of static and dynamic objects. Temporal coherence is exploited to overcome visual ambiguities, resulting in improved reconstruction. Robust initialization of dynamic objects is achieved by introducing geodesic star...

10.1109/cvpr.2016.504 article EN 2016-06-01

Text-to-image models (T2I) such as StableDiffusion have been used to generate high-quality images of people. However, due to the random nature of the generation process, the person has a different appearance, e.g. pose, face, and clothing, despite using the same text prompt. The appearance inconsistency makes T2I unsuitable for pose transfer. We address this by proposing a multimodal diffusion model that accepts text, pose, and visual prompting. Ours is the first unified method to perform all person image tasks - generation, pose transfer, and mask-less...

10.1109/iccvw60793.2023.00451 article EN 2023-10-02

Dense action detection involves detecting multiple co-occurring actions in an untrimmed video, while action classes are often ambiguous and represent overlapping concepts. To address this challenging task, we introduce a novel perspective inspired by how humans tackle complex tasks by breaking them into manageable sub-tasks. Instead of relying on a single network to address the entire problem, as in current approaches, we propose decomposing the problem into detecting the key concepts present in action classes, specifically, dense static and dynamic...

10.48550/arxiv.2501.18509 preprint EN arXiv (Cornell University) 2025-01-30

In this paper we propose a framework for spatially and temporally coherent semantic co-segmentation and reconstruction of complex dynamic scenes from multiple static or moving cameras. Semantic co-segmentation exploits the coherence in semantic class labels both spatially, between views at a single time instant, and temporally, between widely spaced time instants for objects with similar shape and appearance. We demonstrate that this results in improved segmentation and reconstruction for complex scenes. A joint formulation is proposed for semantically coherent object-based co-segmentation and reconstruction by enforcing consistent...

10.1109/cvpr.2017.592 article EN 2017-07-01

In the context of Audio Visual Question Answering (AVQA) tasks, the audio and visual modalities could be learnt on three levels: 1) Spatial, 2) Temporal, and 3) Semantic. Existing AVQA methods suffer from two major shortcomings: the audio-visual (AV) information passing through the network isn't aligned on the Spatial and Temporal levels; and the inter-modal (audio and visual) Semantic information is often not balanced within a context; this results in poor performance. In this paper, we propose a novel end-to-end Contextual Multi-modal Alignment...

10.1109/wacv57701.2024.00709 article EN 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2024-01-03

A common problem in wide-baseline matching is the sparse and non-uniform distribution of correspondences when using conventional detectors, such as SIFT, SURF, FAST, A-KAZE, and MSER. In this paper, we introduce a novel segmentation-based feature detector (SFD) that produces an increased number of accurate features for matching. A multi-scale SFD is proposed using bilateral image decomposition to produce a large number of scale-invariant features for reconstruction. All input images are over-segmented into regions using any existing...

10.1109/tip.2018.2872906 article EN IEEE Transactions on Image Processing 2018-09-28
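The core SFD idea of taking candidate features where several over-segmented regions meet can be sketched in a few lines. The toy label map and the function below are illustrative assumptions for exposition, not the paper's implementation (which operates on real over-segmentations and refines candidates further):

```python
def junction_points(labels):
    """Return (row, col) of 2x2 cell corners where three or more
    distinct segmentation regions meet - the SFD-style candidate
    feature locations."""
    h, w = len(labels), len(labels[0])
    points = []
    for r in range(h - 1):
        for c in range(w - 1):
            cell = {labels[r][c], labels[r][c + 1],
                    labels[r + 1][c], labels[r + 1][c + 1]}
            if len(cell) >= 3:
                points.append((r, c))
    return points

# Toy 4x4 over-segmentation: regions 0, 1 and 2 meet near the centre.
seg = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 2, 1, 1],
    [2, 2, 2, 2],
]
print(junction_points(seg))  # -> [(1, 1)]
```

Because every region boundary in an over-segmentation tends to follow image structure, such junctions are densely and fairly uniformly distributed, which is the property the abstract contrasts with sparse SIFT/SURF-style detections.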

In the domain of audio transformer architectures, prior research has extensively investigated isotropic architectures that capture global context through full self-attention, and hierarchical architectures that progressively transition from local to global context utilising hierarchical structures with convolutions or window-based attention. However, the idea of imbuing each individual block with both contexts, thereby creating a hybrid transformer block, remains relatively under-explored in the field. To facilitate this exploration, we introduce Multi Axis Audio...

10.1109/icassp48485.2024.10447697 article EN ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024-03-18
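The local-plus-global idea of a hybrid block can be illustrated by the attention pattern it admits: positions attend within a local window (block attention) and, additionally, across a strided grid (sparse global attention), loosely in the spirit of multi-axis (MaxViT-style) designs. The window size and grid rule below are illustrative assumptions, not the paper's exact configuration:

```python
def hybrid_attention_mask(seq_len, window):
    """Boolean mask: position i may attend to j if j lies in i's local
    window (block-local axis) OR j shares i's offset on a stride-
    `window` grid (sparse global axis)."""
    mask = [[False] * seq_len for _ in range(seq_len)]
    for i in range(seq_len):
        for j in range(seq_len):
            same_window = (i // window) == (j // window)   # local axis
            on_grid = (j % window) == (i % window)         # global axis
            mask[i][j] = same_window or on_grid
    return mask

m = hybrid_attention_mask(4, 2)
# Position 0 reaches its window {0, 1} plus grid position 2, not 3.
print(m[0])  # -> [True, True, True, False]
```

Stacking such blocks lets every position mix local detail and long-range context at every depth, instead of reserving global mixing for the late stages of a hierarchy.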

We present a new end-to-end learning framework to obtain detailed and spatially coherent reconstructions of multiple people from a single image. Existing multi-person methods suffer from two main drawbacks: they are often model-based and therefore cannot capture accurate 3D models of people with loose clothing and hair; or they require manual intervention to resolve occlusions and interactions. Our method addresses both limitations by introducing the first approach to perform model-free implicit reconstruction for realistic...

10.1109/cvpr46437.2021.01424 article EN 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021-06-01

We present PAT, a transformer-based network that learns complex temporal co-occurrence action dependencies in a video by exploiting multi-scale temporal features. In existing methods, the self-attention mechanism of transformers loses positional information, which is essential for robust action detection. To address this issue, we (i) embed relative positional encoding in the self-attention mechanism and (ii) exploit multi-scale temporal relationships by designing a novel non-hierarchical network, in contrast to recent approaches that use a hierarchical structure. We argue that joining the self-attention mechanism with...

10.1109/iccvw60793.2023.00321 article EN 2023-10-02
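The relative positional encoding the abstract refers to can be sketched as a bias term b[i - j] added to the raw attention scores before the softmax, which restores the temporal ordering that plain dot-product attention is invariant to. The toy scores and bias values below are illustrative, not PAT's learned parameters:

```python
import math

def attention_with_rel_bias(scores, bias):
    """Softmax attention where a relative-position bias bias[i - j]
    is added to the raw scores row by row."""
    n = len(scores)
    out = []
    for i in range(n):
        row = [scores[i][j] + bias[i - j] for j in range(n)]
        m = max(row)                      # stabilise the softmax
        exps = [math.exp(v - m) for v in row]
        s = sum(exps)
        out.append([v / s for v in exps])
    return out

# Uniform scores; a bias that prefers offset 0 (attending to self).
scores = [[0.0] * 3 for _ in range(3)]
bias = {-2: 0.0, -1: 0.0, 0: 1.0, 1: 0.0, 2: 0.0}
probs = attention_with_rel_bias(scores, bias)
```

With uniform content scores, the bias alone shifts attention mass toward offset 0, demonstrating that the mechanism is position-aware even when the content gives no preference.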

Light-field video has recently been used in virtual and augmented reality applications to increase realism and immersion. However, existing light-field methods are generally limited to static scenes due to the requirement to acquire a dense scene representation. The large amount of data and the absence of methods to infer temporal coherence pose major challenges in storage, compression, and editing compared to conventional video. In this paper, we propose the first method to extract a spatio-temporally coherent representation. A novel method to obtain Epipolar Plane...

10.1109/3dv.2017.00014 article EN 2017 International Conference on 3D Vision (3DV) 2017-10-01

Convolutional neural networks (CNNs) and Transformer-based networks have recently enjoyed significant attention for various audio classification and tagging tasks following their wide adoption in the computer vision domain. Despite the difference in information distribution between audio spectrograms and natural images, there has been limited exploration of effective information retrieval from spectrograms using domain-specific layers tailored to audio. In this paper, we leverage the power of the Multi-Axis Vision Transformer (MaxViT) to create DTF-AT (Decoupled...

10.1609/aaai.v38i16.29716 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2024-03-24

10.1109/cvprw63382.2024.00597 article EN 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2024-06-17

We introduce the first approach to solve the challenging problem of unsupervised 4D visual scene understanding for complex dynamic scenes with multiple interacting people from multi-view video. Our approach simultaneously estimates a detailed model that includes a per-pixel semantically and temporally coherent reconstruction, together with instance-level segmentation exploiting photo-consistency, semantic, and motion information. We further leverage recent advances in 3D pose estimation to constrain the joint instance...

10.1109/iccv.2019.01052 article EN 2019 IEEE/CVF International Conference on Computer Vision (ICCV) 2019-10-01

We present a generalised self-supervised learning approach for monocular estimation of the real depth across scenes with diverse depth ranges from 1--100s of meters. Existing supervised methods require accurate depth measurements for training. This limitation has led to the introduction of self-supervised methods that are trained on stereo image pairs with a fixed camera baseline to estimate disparity, which is transformed to depth given known calibration. Self-supervised approaches have demonstrated impressive results but do not generalise to scenes with different depth ranges or...

10.48550/arxiv.2004.06267 preprint EN other-oa arXiv (Cornell University) 2020-01-01
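The disparity-to-depth transform the abstract mentions is the standard rectified-stereo relation depth = f * B / d; it is what makes the fixed baseline and known calibration necessary in the stereo-trained setting. The calibration values below are illustrative KITTI-like numbers assumed for the example, not taken from the paper:

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Rectified-stereo relation: metric depth = f * B / d, where f is
    the focal length in pixels, B the stereo baseline in meters, and
    d the disparity in pixels."""
    return focal_px * baseline_m / disparity_px

# Illustrative calibration: focal length 721 px, baseline 0.54 m.
depth_m = depth_from_disparity(50.0, 721.0, 0.54)
print(round(depth_m, 2))  # -> 7.79
```

Because depth scales with f and B, a network trained on one rig's disparity does not transfer to scenes with a different range, which is the generalisation gap the paper targets.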

A common problem in wide-baseline stereo is the sparse and non-uniform distribution of correspondences when using conventional detectors such as SIFT, SURF, FAST, and MSER. In this paper we introduce a novel segmentation-based feature detector (SFD) that produces an increased number of 'good' features for accurate reconstruction. Each image is segmented into regions by over-segmentation, and feature points are detected at the intersection of the boundaries of three or more regions. Segmentation-based detection locates features at local maxima...

10.1109/3dv.2015.39 article EN International Conference on 3D Vision 2015-10-01

Abstract Existing techniques for dynamic scene reconstruction from multiple wide-baseline cameras primarily focus on reconstruction in controlled environments, with fixed calibrated cameras and strong prior constraints. This paper introduces a general approach to obtain a 4D representation of complex dynamic scenes from multi-view static or moving cameras without prior knowledge of the scene structure, appearance, or illumination. Contributions of this work are: an automatic method for initial coarse reconstruction to initialize joint estimation; sparse-to-dense temporal...

10.1007/s11263-020-01367-2 article EN cc-by International Journal of Computer Vision 2020-08-18

Abstract Simultaneous semantically coherent object-based long-term 4D scene flow estimation, co-segmentation, and reconstruction is proposed, exploiting the coherence in semantic class labels both spatially, between views at a single time instant, and temporally, between widely spaced time instants of dynamic objects with similar shape and appearance. In this paper we propose a framework for spatially and temporally coherent semantic 4D scene flow of general dynamic scenes from multiple view videos captured with a network of static or moving cameras. Semantic co-segmentation results...

10.1007/s11263-019-01241-w article EN cc-by International Journal of Computer Vision 2019-10-03

Existing methods for stereo work on narrow baseline image pairs, giving limited performance between wide views. This paper proposes a framework to learn and estimate dense stereo for people from wide-baseline image pairs. A synthetic people stereo patch dataset (S2P2) is introduced for matching people. The proposed framework not only learns human-specific features from the data but also exploits pooling layers and data augmentation to adapt to real data. The network matches the patches for wide-baseline stereo estimation. In addition to patch match learning, a constraint is introduced in the framework to solve the reconstruction of humans. ...

10.1109/iccvw.2019.00271 article EN 2019-10-01


10.48550/arxiv.1909.07541 preprint EN other-oa arXiv (Cornell University) 2019-01-01