- Advanced Neural Network Applications
- Domain Adaptation and Few-Shot Learning
- Anomaly Detection Techniques and Applications
- Multimodal Machine Learning Applications
- COVID-19 diagnosis using AI
- Advanced Image and Video Retrieval Techniques
- Data-Driven Disease Surveillance
- Generative Adversarial Networks and Image Synthesis
- Influenza Virus Research Studies
- Visual Attention and Saliency Detection
- Human Pose and Action Recognition
- Network Security and Intrusion Detection
- Robotics and Sensor-Based Localization
- Natural Language Processing Techniques
- Advanced Vision and Imaging
- Face recognition and analysis
- Adversarial Robustness in Machine Learning
- Mental Health Research Topics
- Advanced Image Processing Techniques
- Non-Destructive Testing Techniques
- Medical Imaging Techniques and Applications
- Stroke Rehabilitation and Recovery
- Image Enhancement Techniques
- Infrastructure Maintenance and Monitoring
- Machine Learning and ELM
Mitsubishi Electric (United States)
2020-2025
Mitsubishi Electric (Japan)
2019-2024
Siemens (United States)
2017-2019
Siemens (Germany)
2018
Cornell University
2013-2016
Weakly supervised learning with only coarse labels can obtain visual explanations of deep neural networks, such as attention maps, by back-propagating gradients. These attention maps are then available as priors for tasks such as object localization and semantic segmentation. In one common framework, we address three shortcomings of previous approaches in modeling such attention maps: we (1) make attention maps an explicit and natural component of end-to-end training for the first time, (2) provide self-guidance directly on these maps by exploring supervision from the network itself...
Incremental learning (IL) is an important task aimed at increasing the capability of a trained model, in terms of the number of classes recognizable by the model. The key problem in this task is the requirement of storing data (e.g. images) associated with existing classes while teaching the classifier to learn new classes. However, this is impractical, as it increases the memory requirement at every incremental step, which makes it impossible to implement IL algorithms on edge devices with limited memory. Hence, we propose a novel approach, called `Learning without...
This paper explores two new aspects of photos and human emotions. First, we show through psychovisual studies that different people have different emotional reactions to the same image, which is a strong and novel departure from previous work that only records and predicts a single dominant emotion for each image. Our studies also show that the same person may have multiple emotional reactions to one image. Predicting emotions in "distributions" instead of a single dominant emotion is important for many applications. Second, we show not only that we can often change the evoked emotion of an image by adjusting color tone and texture related features...
Which parts of an image evoke emotions in an observer? To answer this question, we introduce a novel problem in computer vision - predicting an Emotion Stimuli Map (ESM), which describes the pixel-wise contribution to evoked emotions. Building a new image database, EmotionROI, as a benchmark for predicting the ESM, we find that the regions selected by saliency and objectness detection do not correctly predict the image regions that evoke emotion. Although objects represent important regions for evoking emotion, the background regions are also important. Based on this fact, we propose using...
With only coarse labels, weakly supervised learning typically uses top-down attention maps generated by back-propagating gradients as priors for tasks such as object localization and semantic segmentation. While these attention maps are intuitive and informative explanations of a deep neural network, there is no effective mechanism to manipulate the network's attention during the learning process. In this paper, we address three shortcomings of previous approaches in modeling such attention maps in one common framework. First, we make attention maps a natural and explicit component...
Graph Convolutional Networks (GCNs) have been widely used to model the high-order dynamic dependencies for skeleton-based action recognition. Most existing approaches do not explicitly embed the high-order spatio-temporal importance into the joints’ spatial connection topology and intensity, and they do not have direct objectives on their attention module to jointly learn when and where to focus in the action sequence. To address these problems, we propose the To-a-T Spatio-Temporal Focus (STF), a skeleton-based action recognition framework that utilizes the spatio-temporal gradient to focus on relevant...
Most cross-domain unsupervised Video Anomaly Detection (VAD) works assume that at least a few task-relevant target domain training data are available for adaptation from the source to the target domain. However, this requires laborious model-tuning by the end-user, who may prefer to have a system that works "out-of-the-box." To address such practical scenarios, we identify a novel target domain (inference-time) VAD task where no target domain training data are available. To this end, we propose a new 'Zero-shot Cross-domain Video Anomaly Detection (zxVAD)' framework that includes a future-frame prediction...
Weakly supervised learning with only coarse labels can obtain visual explanations of deep neural networks, such as attention maps, by back-propagating gradients. These attention maps are then available as priors for tasks such as object localization and semantic segmentation. In one common framework, we address three shortcomings of previous approaches in modeling such attention maps: we (1) for the first time make attention maps an explicit and natural component of end-to-end training, (2) provide self-guidance directly on these maps by exploring supervision from the network itself...
Recent times have witnessed an increasing number of applications of deep neural networks towards solving tasks that require superior cognitive abilities, e.g., playing Go, generating art, question answering (e.g., ChatGPT), etc. Such dramatic progress raises the question: how generalizable are neural networks in solving problems that demand broad skills? To answer this question, we propose SMART: a Simple Multimodal Algorithmic Reasoning Task and the associated SMART-101 dataset...
Recent works on convolutional neural networks (CNN) show breakthrough performance on various tasks. However, most of them only use the features extracted from the topmost layer of the CNN instead of leveraging the features extracted from different layers. As the first group to explicitly address utilizing the features from different layers of a CNN, we propose cross-layer CNN features, which consist of the features extracted from multiple layers of the CNN. Our experimental results show that our proposed cross-layer CNN features outperform not only the state-of-the-art results but also the features commonly used in the traditional CNN framework on three tasks - artistic style, artist, and...
Recent developments in gradient-based attention modeling have seen attention maps emerge as a powerful tool for interpreting convolutional neural networks. Despite good localization for an individual class of interest, these techniques produce attention maps with substantially overlapping responses among different classes, leading to the problem of visual confusion and the need for discriminative attention. In this paper, we address this problem by means of a new framework that makes class-discriminative attention a principled part of the learning process. Our key...
Most works related to convolutional neural networks (CNN) use the traditional CNN framework, which extracts features at only one scale. We propose multi-scale convolutional neural networks (MSCNN), which can not only extract multi-scale features but also solve the issues of previous methods that use CNN to extract multi-scale features. With the assumption of the label-inheritable (LI) property, we also propose a method to generate exponentially more training examples for MSCNN from the given training set. Our experimental results show that MSCNN outperforms both the state-of-the-art methods and the traditional CNN framework on artist, artistic style, architectural style...
Most works using convolutional neural networks (CNN) show the efficacy of their methods in standard object recognition tasks, but not in abstract tasks such as emotion classification and memorability prediction, which are a subject of increasing importance (especially as machines become more autonomous, there is a need for semantic understanding). To verify whether CNN-based methods are effective in abstract tasks, we select 8 different abstract tasks in computer vision, evaluating the performance of 5 different training approaches on these tasks. We show that CNN-based approaches outperform...
Domain adaptation aims at adapting the knowledge learned from one domain (source-domain) to another (target-domain). Existing approaches typically require a portion of task-relevant target-domain data a priori. We propose an approach, zero-shot deep domain adaptation (ZDDA), which uses paired dual-domain task-irrelevant data to eliminate the need for task-relevant target-domain training data. ZDDA learns to generate common representations for source and target domain data. Then, either domain representation is used later to train a system that works on both domains or having...
Most works on affective image classification in computer vision treat each emotion category independently and predict hard labels, ignoring the correlation between emotion categories. In this work, inspired by psychological theories, we adopt a dimensional emotion model to model the correlation among certain emotion categories. We also propose a framework for changing image emotion by using our emotion predictor. Easily extendable to other feature transformations, our framework changes image emotion by color histogram specification, relaxing the limitation of the previous method that each emotion is associated with a monotonic...
Recent times have witnessed an increasing number of applications of deep neural networks towards solving tasks that require superior cognitive abilities, e.g., playing Go, generating art, ChatGPT, etc. Such dramatic progress raises the question: how generalizable are neural networks in solving problems that demand broad skills? To answer this question, we propose SMART: a Simple Multimodal Algorithmic Reasoning Task and the associated SMART-101 dataset, for evaluating the abstraction, deduction, and generalization abilities...
Compositionality of semantic concepts in image synthesis and analysis is appealing, as it can help in decomposing known data and generatively recomposing unknown data. For instance, we may learn concepts of changing illumination, geometry or albedo of a scene, and try to recombine them to generate physically meaningful, but unseen data for training and testing. In practice, however, we often do not have samples from the joint concept space available: we may have data on illumination change in one data set and on geometric change in another one without complete overlap. We pose...