- Multimodal Machine Learning Applications
- Human Pose and Action Recognition
- Generative Adversarial Networks and Image Synthesis
- Visual Attention and Saliency Detection
- Text Readability and Simplification
- Domain Adaptation and Few-Shot Learning
- Video Analysis and Summarization
- Natural Language Processing Techniques
- Human Mobility and Location-Based Analysis
- Aesthetic Perception and Analysis
- Smart Grid Energy Management
- Topic Modeling
- Sentiment Analysis and Opinion Mining
- Scientific Research and Philosophical Inquiry
- Engineering Education and Technology
- Advanced Data Processing Techniques
- Advanced Image Processing Techniques
- Energy Load and Power Forecasting
- Digital Media Forensic Detection
- Categorization, Perception, and Language
- Language, Metaphor, and Cognition
King Abdullah University of Science and Technology
2021-2024
Sejong University
2020
Tashkent University of Information Technology
2006
We present a novel large-scale dataset and accompanying machine learning models aimed at providing a detailed understanding of the interplay between visual content, its emotional effect, and explanations for the latter in language. In contrast to most existing annotation datasets in computer vision, we focus on the affective experience triggered by artworks and ask annotators to indicate the dominant emotion they feel for a given image and, crucially, to also provide a grounded verbal explanation for their choice. As we demonstrate...
Asking insightful questions is crucial for acquiring knowledge and expanding our understanding of the world. However, the importance of questioning has been largely overlooked in AI research, where models have primarily been developed to answer questions. With the recent advancements of large language models (LLMs) like ChatGPT, we discover their capability to ask high-quality questions when provided with a suitable prompt. This discovery presents a new opportunity to develop an automatic questioning system. In this paper, we introduce...
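A minimal sketch of the core idea of eliciting questions from an LLM with a suitable prompt. The prompt wording, model name, and function are illustrative assumptions, not the paper's actual setup:

```python
from openai import OpenAI  # assumes the openai SDK and an API key in the environment

client = OpenAI()

# Hypothetical prompt; the paper's actual prompt is not reproduced here.
PROMPT = (
    "You are a curious student. Read the passage below and ask five "
    "insightful questions that would deepen understanding of it.\n\n"
    "Passage: {passage}"
)

def generate_questions(passage: str, model: str = "gpt-3.5-turbo") -> str:
    """Prompt an LLM to pose high-quality questions about a passage."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT.format(passage=passage)}],
    )
    return response.choices[0].message.content
```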
The exponential growth in population and its overall reliance on the usage of electrical and electronic devices have increased the demand for energy production. This calls for precise management systems that can forecast consumers' future demand for policymaking. Embedded smart sensors attached to electricity meters and home appliances enable power suppliers to effectively analyze, generate, and distribute power into residential areas based on their level of consumption. Therefore, this paper proposes a clustering-based analysis of consumption...
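A minimal sketch of one common clustering-based approach to consumption analysis (k-means over daily load profiles); the cluster count and the synthetic data are assumptions for illustration, not the paper's pipeline:

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy stand-in for smart-meter readings: 200 households x 24 hourly averages.
rng = np.random.default_rng(0)
profiles = rng.random((200, 24))

# Group households by the shape of their daily load profile.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(profiles)
labels = kmeans.labels_               # cluster assignment per household
centroids = kmeans.cluster_centers_   # representative load profile per cluster
```

Forecasting and policymaking can then be done per cluster rather than per household, which is the usual motivation for this kind of grouping.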
Datasets that capture the connection between vision, language, and affection are limited, causing a lack of understanding of the emotional aspect of human intelligence. As a step in this direction, the ArtEmis dataset was recently introduced as a large-scale dataset of reactions to images along with language explanations of these chosen emotions. We observed a significant bias towards instance-rich emotions, making trained neural speakers less accurate when describing under-represented ones. We show that collecting new data in the same way is not...
Video captioning aims to convey dynamic scenes from videos using natural language, facilitating the understanding of spatiotemporal information within our environment. Although there have been recent advances, generating detailed and enriched video descriptions continues to be a substantial challenge. In this work, we introduce ChatCaptioner, an innovative approach for creating more comprehensive descriptions. Our method employs a ChatGPT model as a controller, specifically designed to select frames...
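A minimal sketch of a question-driven captioning loop in this spirit: a controller LLM repeatedly picks a frame and asks a question, a vision-language model answers, and the dialog is summarized into a caption. The `llm` and `vqa` wrappers and their interfaces are hypothetical assumptions, not the paper's API:

```python
def video_chat_caption(frames, llm, vqa, n_rounds=5):
    """Caption a video via controller-driven Q&A over its frames.

    `llm` and `vqa` are hypothetical wrappers (e.g., around a ChatGPT-style
    controller and a VQA model); their methods are assumed for illustration.
    """
    history = []
    for _ in range(n_rounds):
        # The controller picks a frame index and poses a question about it.
        frame_idx, question = llm.ask(num_frames=len(frames), history=history)
        # The vision-language model answers from the chosen frame.
        answer = vqa.answer(frames[frame_idx], question)
        history.append((frame_idx, question, answer))
    # Finally, the controller condenses the Q&A dialog into one caption.
    return llm.summarize(history)
```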
Research in vision and language has made considerable progress thanks to benchmarks such as COCO. COCO captions focused on unambiguous facts in English; ArtEmis introduced subjective emotions, and ArtELingo added some multilinguality (Chinese and Arabic). However, we believe there should be more multilinguality. Hence, we present ArtELingo-28, a vision-language benchmark that spans $\textbf{28}$ languages and encompasses approximately $\textbf{200,000}$ annotations ($\textbf{140}$ per image). Traditionally, research...
We introduce Affective Visual Dialog, an emotion explanation and reasoning task as a testbed for research on understanding the formation of emotions in visually grounded conversations. The task involves three skills: (1) Dialog-based Question Answering, (2) Emotion Prediction, and (3) explanation generation based on the dialog. Our key contribution is the collection of a large-scale dataset, dubbed AffectVisDial, consisting of 50K 10-turn dialogs as well as concluding emotion attributions and dialog-informed textual explanations, resulting in a total...
Opportunities for creating an effective energy system for mobile communication facilities are investigated.