- Multimodal Machine Learning Applications
- Speech and Audio Processing
- Advanced Image and Video Retrieval Techniques
- Face recognition and analysis
- Fuzzy and Soft Set Theory
- Domain Adaptation and Few-Shot Learning
- Human Pose and Action Recognition
- Handwritten Text Recognition Techniques
- Music and Audio Processing
- Speech Recognition and Synthesis
- Advanced Algebra and Logic
- Face and Expression Recognition
- Topic Modeling
- Natural Language Processing Techniques
- Digital Media Forensic Detection
- Biometric Identification and Security
- Multi-Criteria Decision Making
- Text and Document Classification Technologies
- X-ray Diffraction in Crystallography
- Economic Growth and Development
- Image Retrieval and Classification Techniques
- Voice and Speech Disorders
- Enzyme Structure and Function
- Speech and dialogue systems
- Machine Learning in Materials Science
- Deutsches Elektronen-Synchrotron DESY (2022-2024)
- Johannes Kepler University of Linz (2024)
- Shaheed Benazir Bhutto University (2024)
- Italian Institute of Technology (2020-2022)
- Hazara University (2016-2022)
- Institute of Informatics and Telematics (2021-2022)
- University of Insubria (2017-2021)
- University of Agriculture Faisalabad (2021)
- Jazan University (2021)
- Florida International University (2020)
Recently, attention-based networks have been successful for image restoration tasks. However, existing methods are either computationally expensive or have limited receptive fields, adding constraints to the model. They are also less resilient in spatial and contextual aspects and lack pixel-to-pixel correspondence, which may degrade feature representations. In this paper, we propose a novel, efficient architecture, the Single Stage Adaptive Multi-Attention Network (SSAMAN), for image restoration tasks, particularly denoising...
Human action recognition has emerged as a challenging research domain for video understanding and analysis. Subsequently, extensive research has been conducted to achieve improved recognition of human actions. Activity recognition has various real-time applications, such as patient monitoring, in which patients are monitored among a group of normal people and then identified based on their abnormal activities. Our goal is to render multi-class detection of individuals as well as groups from video sequences and to differentiate between multiple... In this...
With the massive explosion of social media platforms such as Twitter and Instagram, people share billions of multimedia posts every day, containing both images and text. Typically, the text in these posts is short, informal, and noisy, leading to ambiguities that can be resolved using the images. In this paper, we explore the text-centric Named Entity Recognition task on such posts. We propose an end-to-end model that learns a joint representation of text and image. Our model extends the multi-dimensional self-attention technique, where now the image helps...
Multi-modal approaches employ data from multiple input streams, such as the textual and visual domains. Deep neural networks have been successfully employed for these approaches. In this paper, we present a novel multi-modal approach that fuses images and text descriptions to improve classification performance in real-world scenarios. The proposed approach embeds an encoded text description onto the image to obtain an information-enriched image. To learn feature representations of the resulting images, standard Convolutional Neural...
In this paper, we propose a multimodal approach for real-world scenarios based on weighting and meta-learning combination methods that integrate the output probabilities obtained from text and visual classifiers. While a classifier built on the concatenation of features may worsen results, the model described in this paper can increase classification accuracy by over 6%. Typically, either text or images are used for classification; however, ambiguity in either the text or the image can reduce performance. This leads us to combine an object and concept approach...
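The abstract mentions combining the output probabilities of text and visual classifiers by weighting. A minimal sketch of such late fusion is shown below; the `weighted_fusion` helper and the fixed weight are illustrative assumptions, not the learned or meta-learned combination described in the paper.

```python
# Late fusion of text and image classifier probabilities by simple weighting.
import numpy as np

def weighted_fusion(p_text, p_image, w_text=0.5):
    """Combine per-class probabilities from a text and a visual classifier.

    p_text, p_image: (n_samples, n_classes) softmax outputs.
    w_text: weight of the text classifier; the image classifier gets 1 - w_text.
    """
    fused = w_text * p_text + (1.0 - w_text) * p_image
    return fused.argmax(axis=1)

# Toy usage with made-up probabilities for 2 samples and 3 classes.
p_text = np.array([[0.7, 0.2, 0.1], [0.3, 0.4, 0.3]])
p_image = np.array([[0.2, 0.5, 0.3], [0.1, 0.8, 0.1]])
print(weighted_fusion(p_text, p_image, w_text=0.6))  # predicted class indices
```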
We study the problem of learning the association between faces and voices. Prior works adopt pairwise or triplet loss formulations to learn an embedding space amenable to the associated matching and verification tasks. Albeit showing some progress, such formulations are restrictive due to their dependency on a distance-dependent margin parameter, poor run-time training complexity, and reliance on carefully crafted negative mining procedures. In this work, we hypothesize that an enriched feature representation coupled with an effective yet...
Serial crystallography experiments produce massive amounts of experimental data. Yet, in spite of these large-scale data sets, only a small percentage of the data are useful for downstream analysis. Thus, it is essential to differentiate reliably between acceptable images (hits) and unacceptable ones (misses). To this end, a novel pipeline is proposed to categorize the data, which extracts features from the images, summarizes them with the `bag of visual words' method, and then classifies the images using machine learning. In addition, we study various...
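The bag-of-visual-words step can be sketched as follows. The descriptor (ORB), vocabulary size, and classifier (linear SVM) are illustrative choices, not necessarily those evaluated in the paper.

```python
# Bag-of-visual-words sketch: local descriptors are clustered into a vocabulary,
# each image becomes a histogram of visual-word occurrences, and a classical
# classifier separates hits from misses.
import cv2
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

def orb_descriptors(image):
    orb = cv2.ORB_create()
    _, desc = orb.detectAndCompute(image, None)
    return desc if desc is not None else np.empty((0, 32), dtype=np.uint8)

def bovw_histograms(images, n_words=64):
    all_desc = [orb_descriptors(img) for img in images]
    vocab = KMeans(n_clusters=n_words, n_init=10).fit(
        np.vstack(all_desc).astype(np.float32))
    hists = []
    for desc in all_desc:
        if len(desc) == 0:
            hists.append(np.zeros(n_words))
            continue
        words = vocab.predict(desc.astype(np.float32))
        hists.append(np.bincount(words, minlength=n_words))
    return np.array(hists, dtype=np.float32), vocab

# images: list of grayscale detector frames (uint8 arrays); labels: 1 = hit, 0 = miss
# X, vocab = bovw_histograms(images)
# clf = LinearSVC().fit(X, labels)
```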
In recent years, an association has been established between the faces and voices of celebrities by leveraging large-scale audio-visual information from YouTube. The availability of such datasets has been instrumental in developing speaker recognition methods based on standard Convolutional Neural Networks. Thus, the aim of this paper is to leverage this information to improve the recognition task. To achieve this, we propose a two-branch network to learn joint representations in a multimodal system. Afterwards, features are extracted to train a classifier for recognition...
With the rapid growth of social media platforms, users are sharing billions of multimedia posts containing audio, images, and text. Researchers have focused on building autonomous systems capable of processing such data to solve challenging multimodal tasks, including cross-modal retrieval, matching, and verification. Existing works use separate networks to extract embeddings for each modality and to bridge the gap between them. The modular structure of their branched networks is fundamental in creating numerous applications and has...
We propose a novel deep training algorithm for the joint representation of audio and visual information, which consists of a single stream network (SSNet) coupled with a loss function to learn a shared latent space for multimodal information. The proposed framework characterizes the shared latent space by leveraging class centers, which helps eliminate the need for pairwise or triplet supervision. We quantitatively and qualitatively evaluate our approach on VoxCeleb, a benchmark audio-visual dataset, on a multitude of tasks including cross-modal verification,...
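A minimal sketch of the center-based idea, assuming a generic center-loss-style objective rather than the exact formulation used with SSNet: embeddings from both modalities are pulled toward a learnable per-identity center, so no pair or triplet mining is needed.

```python
# Center-based objective for a shared audio-visual latent space (generic illustration).
import torch
import torch.nn as nn

class SharedCenterLoss(nn.Module):
    def __init__(self, num_classes, embed_dim):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_classes, embed_dim))

    def forward(self, embeddings, labels):
        # embeddings: (batch, embed_dim) from either the face or the voice branch
        # labels:     (batch,) identity indices shared across modalities
        return ((embeddings - self.centers[labels]) ** 2).sum(dim=1).mean()

center_loss = SharedCenterLoss(num_classes=100, embed_dim=128)
face_emb, voice_emb = torch.randn(8, 128), torch.randn(8, 128)  # stand-ins for SSNet outputs
labels = torch.randint(0, 100, (8,))
loss = center_loss(face_emb, labels) + center_loss(voice_emb, labels)
print(loss.item())
```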
Recent years have seen a surge in finding associations between faces and voices within cross-modal biometric applications, alongside speaker recognition. Inspired by this, we introduce the challenging task of establishing the face-voice association across multiple languages spoken by the same set of persons. The aim of this paper is to answer two closely related questions: "Is the face-voice association language independent?" and "Can a speaker be recognized irrespective of the spoken language?". These questions are important to understand the effectiveness and to boost the development...
Fine-grained image classification is a challenging task due to the presence of a hierarchical coarse-to-fine-grained distribution in the dataset. Generally, object parts are used to discriminate between the various objects in fine-grained datasets; however, not all parts are beneficial and indispensable. In recent years, natural language descriptions have been used to obtain information on the discriminative parts of an object. This paper leverages natural language descriptions and proposes a strategy for learning a joint representation of text and images using a two-branch network with multiple layers...
Multimodal strategies combine different input sources into a joint representation that provides enhanced information compared with a unimodal strategy. In this article, we present a novel multimodal approach that fuses an image and an encoded text description to obtain an information-enriched image. This approach casts the obtained Word2Vec word embeddings into a visual form to be concatenated with the image. We employ standard convolutional neural networks to learn feature representations of the resulting images. Finally, we compare our approach against the unimodal approaches and their combination on three large-scale...
In this paper, we extend the idea of a neutrosophic triplet set to non-associative semihypergroups and define the neutrosophic triplet LA-semihypergroup. We discuss some basic results and properties. At the end, we provide an application of the proposed structure in football.
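For reference, the standard definition of a neutrosophic triplet as commonly given in the literature is reproduced below; the notation may differ slightly from the paper's.

```latex
% Standard neutrosophic triplet definition (notation may differ from the paper).
Let $(N, \ast)$ be a set with a binary operation. An element $a \in N$ has a
neutrosophic triplet $(a, \operatorname{neut}(a), \operatorname{anti}(a))$ if there
exist $\operatorname{neut}(a), \operatorname{anti}(a) \in N$ such that
\[
a \ast \operatorname{neut}(a) = \operatorname{neut}(a) \ast a = a,
\qquad
a \ast \operatorname{anti}(a) = \operatorname{anti}(a) \ast a = \operatorname{neut}(a).
\]
```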
The current method of meter reading in developing countries is manual and error-prone. A meter reader logs the reading to calculate the cost of electricity. In recent years, there have been multiple efforts to provide automated solutions to read the digits. However, existing systems extract digits based on a specific meter topology. In this paper, we propose an approach based on Faster R-CNN to recognize the digits of an electric meter. We compared our approach against several state-of-the-art object detection methods. The proposed approach is robust to different lighting conditions,...
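A generic torchvision recipe for fine-tuning a Faster R-CNN to detect digit classes is sketched below; the backbone, class count, and input size are illustrative assumptions, not the exact configuration reported in the paper.

```python
# Fine-tune a torchvision Faster R-CNN head for ten digit classes plus background.
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

def build_digit_detector(num_classes=11):  # 10 digits + background
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
    return model

model = build_digit_detector()
model.eval()
with torch.no_grad():
    # each prediction contains 'boxes', 'labels' (1..10 for digits 0..9), 'scores'
    preds = model([torch.rand(3, 480, 640)])
print(preds[0]["boxes"].shape)
```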
Serial crystallography experiments at X-ray free-electron laser facilities produce massive amounts of data, but only a fraction of these data are useful for downstream analysis. Thus, it is essential to differentiate between acceptable and unacceptable data, generally known as 'hit' and 'miss', respectively. Image classification methods from artificial intelligence, or more specifically convolutional neural networks (CNNs), classify the data into hit and miss categories in order to achieve data reduction. The...
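As a rough illustration of the hit/miss classification step, a small binary CNN over single-channel detector frames could look like the sketch below; it is a generic network, not the architecture evaluated in the paper.

```python
# Minimal CNN for binary hit/miss classification of detector frames.
import torch
import torch.nn as nn

class HitMissCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, 2)  # logits for miss / hit

    def forward(self, x):          # x: (batch, 1, H, W) detector frames
        return self.classifier(self.features(x).flatten(1))

logits = HitMissCNN()(torch.rand(4, 1, 256, 256))
print(logits.shape)  # (4, 2)
```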
Voice spoofing attacks pose a significant threat to automated speaker verification systems. Existing anti-spoofing methods often simulate specific attack types, such as synthetic or replay attacks. However, in real-world scenarios, the countermeasures are unaware of the generation schema of the attack, necessitating a unified solution. Current solutions struggle to detect the artefacts, especially those of recent generation mechanisms. For instance, these algorithms inject spectral and temporal anomalies, which are challenging to identify. To...
In this paper, we encode the semantics of a text document in an image to take advantage of the same Convolutional Neural Networks (CNNs) that have been successfully employed for image classification. We use Word2Vec, which is an estimation of word representations in a vector space that can maintain the semantic and syntactic relationships among words. The Word2Vec vectors are transformed into graphical words representing the word sequence of the document. The encoded images are classified using the AlexNet architecture. We introduce a new dataset named...
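The general idea can be sketched as below: map each word to its Word2Vec vector, stack the vectors into a fixed-size image, and classify the image with AlexNet. The encoding details (image size, scaling, channel replication) are illustrative assumptions, not the paper's exact scheme.

```python
# Encode a document as an image of its word vectors, then classify with AlexNet.
import numpy as np
import torch
import torch.nn.functional as F
import torchvision

def document_to_image(tokens, w2v, dim=100, max_words=227):
    """Stack per-word vectors into a (max_words, dim) array scaled to [0, 255]."""
    rows = [w2v[t] for t in tokens[:max_words] if t in w2v]
    canvas = np.zeros((max_words, dim), dtype=np.float32)
    if rows:
        m = np.stack(rows)
        canvas[: len(m)] = (m - m.min()) / (m.max() - m.min() + 1e-8) * 255.0
    return canvas

model = torchvision.models.alexnet(weights=None, num_classes=20)  # head sized to the dataset
toy_w2v = {"a": np.random.rand(100), "sample": np.random.rand(100)}
img = torch.from_numpy(document_to_image(["a", "sample"], toy_w2v))
x = img.unsqueeze(0).unsqueeze(0).repeat(1, 3, 1, 1)  # replicate to 3 channels
x = F.interpolate(x, size=(224, 224))                 # AlexNet-friendly input size
print(model(x).shape)                                 # (1, 20) class logits
```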
Deep metric learning plays an important role in measuring similarity through distance metrics among arbitrary groups of data. The MNIST dataset is typically used to measure similarity; however, it has few seemingly similar classes, making it less effective for deep metric learning methods. In this paper, we create a new handwritten dataset named Urdu-Characters with sets of classes suitable for deep metric learning. With this work, we compare the performance of two state-of-the-art methods, i.e. the Siamese and the Triplet network. We show that the Triplet network is more powerful than...
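For reference, the two loss formulations behind the compared networks are sketched below: the contrastive loss used by a Siamese network and the triplet loss. The margin values are illustrative, not the ones used in the experiments.

```python
# Contrastive (Siamese) loss vs. triplet loss on embedding vectors.
import torch
import torch.nn.functional as F

def contrastive_loss(z1, z2, same, margin=1.0):
    """Pull same-class pairs together, push different-class pairs past the margin."""
    d = F.pairwise_distance(z1, z2)
    return (same * d.pow(2) + (1 - same) * F.relu(margin - d).pow(2)).mean()

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Keep the anchor closer to the positive than to the negative by the margin."""
    return F.triplet_margin_loss(anchor, positive, negative, margin=margin)

z = torch.randn(8, 64)
print(contrastive_loss(z, torch.randn(8, 64), torch.randint(0, 2, (8,)).float()))
print(triplet_loss(z, torch.randn(8, 64), torch.randn(8, 64)))
```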
Convolutional Neural Networks (CNNs) have been widely used in computer vision tasks, such as face recognition and verification, and have achieved state-of-the-art results due to their ability to capture discriminative deep features. Conventionally, CNNs are trained with the softmax supervision signal to penalize the classification loss. In order to further enhance the discriminative capability of deep features, we introduce a joint supervision signal, Git loss, which leverages the softmax and center loss functions. The aim of our loss function is to minimize the intra-class...
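A hedged sketch of the joint-supervision idea is given below: softmax cross-entropy combined with a center-based "pull" term that reduces intra-class variance and a "push" term that discourages closeness to other class centers. The exact Git loss formulation and loss weights are defined in the paper; this only illustrates the concept.

```python
# Joint supervision: cross-entropy plus center-based pull and push terms (illustrative).
import torch
import torch.nn as nn
import torch.nn.functional as F

class PullPushLoss(nn.Module):
    def __init__(self, num_classes, feat_dim):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))

    def forward(self, feats, labels):
        d2 = ((feats.unsqueeze(1) - self.centers.unsqueeze(0)) ** 2).sum(-1)  # (B, C)
        onehot = F.one_hot(labels, num_classes=d2.size(1)).bool()
        pull = d2[onehot].mean()                    # shrink intra-class variance
        push = (1.0 / (1.0 + d2[~onehot])).mean()   # penalize closeness to other centers
        return pull, push

ce, aux = nn.CrossEntropyLoss(), PullPushLoss(num_classes=10, feat_dim=128)
feats, logits, labels = torch.randn(16, 128), torch.randn(16, 10), torch.randint(0, 10, (16,))
pull, push = aux(feats, labels)
loss = ce(logits, labels) + 0.01 * pull + 0.01 * push  # illustrative loss weights
print(loss.item())
```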
Conversational engagement estimation is posed as a regression problem, entailing the identification of the favorable attention and involvement of participants in a conversation. This task arises as a crucial pursuit to gain insights into human interaction dynamics and behavior patterns within conversations. In this research, we introduce a dilated convolutional Transformer for modeling and estimating human engagement in the MULTIMEDIATE 2023 competition. Our proposed system surpasses the baseline models, exhibiting a noteworthy 7% improvement on the test...
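A rough sketch of the overall architecture idea follows: a 1-D dilated convolutional front-end over per-frame behavioral features, a Transformer encoder, and a regression head producing one engagement score per sequence. Dimensions, depths, and pooling are illustrative assumptions, not the configuration submitted to MULTIMEDIATE 2023.

```python
# Dilated convolutions + Transformer encoder for sequence-level engagement regression.
import torch
import torch.nn as nn

class DilatedConvTransformer(nn.Module):
    def __init__(self, in_dim=64, d_model=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(in_dim, d_model, kernel_size=3, padding=1, dilation=1), nn.ReLU(),
            nn.Conv1d(d_model, d_model, kernel_size=3, padding=2, dilation=2), nn.ReLU(),
        )
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, 1)

    def forward(self, x):                 # x: (batch, time, in_dim) feature sequence
        h = self.conv(x.transpose(1, 2)).transpose(1, 2)
        h = self.encoder(h).mean(dim=1)   # temporal average pooling
        return self.head(h).squeeze(-1)   # one engagement score per sequence

print(DilatedConvTransformer()(torch.randn(2, 100, 64)).shape)  # (2,)
```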