- Advanced Vision and Imaging
- Robotics and Sensor-Based Localization
- Advanced Image and Video Retrieval Techniques
- Advanced Image Processing Techniques
- Multimodal Machine Learning Applications
- Domain Adaptation and Few-Shot Learning
- Image Processing Techniques and Applications
- Anomaly Detection Techniques and Applications
- Advanced Neural Network Applications
- Video Surveillance and Tracking Methods
- Generative Adversarial Networks and Image Synthesis
- Human Pose and Action Recognition
- Computer Graphics and Visualization Techniques
- Image Enhancement Techniques
- Optical measurement and interference techniques
- Image and Signal Denoising Methods
- 3D Shape Modeling and Analysis
- Image Retrieval and Classification Techniques
- Image and Object Detection Techniques
- Remote Sensing and LiDAR Applications
- Video Analysis and Summarization
- Neural Networks and Applications
- COVID-19 diagnosis using AI
- Infrastructure Maintenance and Monitoring
- Industrial Vision Systems and Defect Detection
Tohoku University
2016-2025
RIKEN Center for Advanced Intelligence Project
2017-2024
RIKEN
2021-2022
Tohoku University Hospital
2017
Japan Science and Technology Agency
2015
The University of Tokyo
1996-2002
Bunkyo University
2002
This paper considers the problem of single image depth estimation. The employment of convolutional neural networks (CNNs) has recently brought about significant advancements in research on this problem. However, most existing methods suffer from a loss of spatial resolution in the estimated depth maps; a typical symptom is distorted and blurry reconstruction of object boundaries. In this paper, toward more accurate estimation with a focus on depth maps with higher spatial resolution, we propose two improvements to existing approaches. One strategy...
A key solution to visual question answering (VQA) exists in how to fuse visual and language features extracted from an input image and question. We show that an attention mechanism that enables dense, bi-directional interactions between the two modalities contributes to boosting the accuracy of prediction of answers. Specifically, we present a simple architecture that is fully symmetric between visual and language representations, in which each question word attends on image regions and each image region attends on question words. It can be stacked to form a hierarchy for multi-step interactions between an image-question pair. We show through...
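The symmetric word-to-region and region-to-word attention described in this abstract can be illustrated with a minimal sketch. It assumes plain dot-product affinities between word and region vectors of equal dimension and uses no learned weights; the function names `softmax`, `dot`, and `co_attention` are hypothetical and are not taken from the paper, whose actual architecture is not specified in the text above.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def co_attention(words, regions):
    """Bi-directional attention between word vectors and region vectors.

    words:   list of T vectors (question words), each of dimension d
    regions: list of N vectors (image regions), each of dimension d
    Assumption: affinity is an unparameterised dot product.
    """
    dim = len(words[0])
    # Affinity between every word and every region (T x N).
    affinity = [[dot(w, r) for r in regions] for w in words]

    # Each word attends over all regions (softmax along each row).
    word_to_region = [softmax(row) for row in affinity]
    # Each region attends over all words (softmax along each column).
    region_to_word = [
        softmax([affinity[t][n] for t in range(len(words))])
        for n in range(len(regions))
    ]

    # Attended summaries: one region summary per word, one word summary per region.
    attended_regions = [
        [sum(a * r[d] for a, r in zip(weights, regions)) for d in range(dim)]
        for weights in word_to_region
    ]
    attended_words = [
        [sum(a * w[d] for a, w in zip(weights, words)) for d in range(dim)]
        for weights in region_to_word
    ]
    return attended_regions, attended_words, word_to_region, region_to_word
```

Because the same affinity matrix is normalised once per row and once per column, the two attention directions are treated identically, which is the sense in which the sketch is symmetric. Stacking would amount to feeding the attended summaries into another such layer.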
In this paper, we study the design of deep neural networks for tasks of image restoration. We propose a novel style of residual connections dubbed "dual residual connection", which exploits the potential of paired operations, e.g., up- and down-sampling or convolution with large- and small-size kernels. We design a modular block implementing this connection style; it is equipped with two containers to which arbitrary paired operations are inserted. Adopting the "unraveled" view of residual networks proposed by Veit et al., we point out that a stack of the proposed modular blocks allows the first operation in...
Many studies have been conducted so far on image restoration, the problem of restoring a clean image from its distorted version. There are many different types of distortion affecting image quality. Previous studies have focused on single types of distortion, proposing methods for removing them. However, image quality degrades due to multiple factors in the real world. Thus, depending on applications, e.g., vision for autonomous cars or surveillance cameras, we need to be able to deal with multiple combined distortions with unknown mixture ratios. For this purpose,...
Recently, convolutional neural networks (CNNs) have shown great success on the task of monocular depth estimation. A fundamental yet unanswered question is how CNNs can infer depth from a single image. Toward answering this question, we consider visualization of the inference of a CNN by identifying the pixels of an input image that are relevant to depth estimation. We formulate it as an optimization problem of identifying the smallest number of image pixels from which the CNN can estimate a depth map with the minimum difference from the estimate from the entire image. To cope with the difficulty of optimization through a deep CNN, we propose to use another network to predict...
This paper presents a framework for estimating the cause of damage to bridge members by combining Structure from Motion (SfM) and Visual Question Answering (VQA) techniques. A VQA model was developed that uses bridge images for dataset creation and outputs the damage or member name and its existence based on questions. In this model, the correct answer rates for the questions requiring names were 67.4% and 68.9%, respectively. The correct answer rate for yes/no questions was 99.1%. Based on the model, a damage cause estimation method was proposed. In the proposed method, the causes of damage are narrowed down by inputting new questions, which are determined...
This paper examines numerical algorithms for factorization of a low-rank matrix with missing components. We first propose a new method that incorporates a damping factor into the Wiberg method to solve the problem. The new method is characterized by the way it constrains the ambiguity of the matrix factorization, which helps improve both the global convergence ability and the local convergence speed. We then present experimental comparisons with the latest methods used to solve the problem. No comprehensive comparison of the methods that have been proposed recently has yet been reported in the literature. In our...
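To make the problem setting concrete, the following is a minimal sketch of rank-1 factorization with missing components. It is not the damped Wiberg method of the abstract: it substitutes a much simpler alternating least squares scheme with a small damping (ridge) term in each update, chosen only because it fits in a few lines. The function name `factorize_rank1` and the parameters `damping` and `iters` are assumptions made for illustration.

```python
def factorize_rank1(X, mask, damping=1e-6, iters=200):
    """Fit X[i][j] ~ u[i] * v[j] using only entries where mask[i][j] is True.

    Simplified damped alternating least squares (NOT the Wiberg method):
    each factor is updated in closed form with the other held fixed, and
    `damping` is added to the denominator to keep the update well defined.
    """
    rows, cols = len(X), len(X[0])
    u = [1.0] * rows
    v = [1.0] * cols
    for _ in range(iters):
        # Update u with v fixed, using observed entries in each row.
        for i in range(rows):
            num = sum(X[i][j] * v[j] for j in range(cols) if mask[i][j])
            den = sum(v[j] ** 2 for j in range(cols) if mask[i][j]) + damping
            u[i] = num / den
        # Update v with u fixed, using observed entries in each column.
        for j in range(cols):
            num = sum(X[i][j] * u[i] for i in range(rows) if mask[i][j])
            den = sum(u[i] ** 2 for i in range(rows) if mask[i][j]) + damping
            v[j] = num / den
    return u, v


if __name__ == "__main__":
    # Rank-1 matrix [1, 2, 3]^T [1, 2, 4] with two entries treated as missing.
    X = [[1.0, 2.0, 0.0], [2.0, 4.0, 8.0], [0.0, 6.0, 12.0]]
    mask = [[True, True, False], [True, True, True], [False, True, True]]
    u, v = factorize_rank1(X, mask)
    print("predicted missing entries:", u[0] * v[2], u[2] * v[0])
```

Note that the factors are only determined up to a scale (u can be multiplied by a constant and v divided by it), which is a small instance of the factorization ambiguity that the abstract says the proposed method constrains.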
This paper studies clothing and attribute recognition in the fashion domain. Specifically, in this paper, we turn our attention to the compatibility of clothing items and attributes (Fig. 1). For example, people do not wear a skirt and a dress at the same time, yet a jacket and a shirt are a preferred combination. We consider such inter-object or inter-attribute compatibility and formulate a Conditional Random Field (CRF) that seeks the most probable combination in the given picture. The model takes into account the location-specific appearance with respect to a human...
It is still challenging to build an AI system that can perform tasks that involve vision and language at human level. So far, researchers have singled out individual tasks separately, for each of which they have designed networks and trained them on its dedicated datasets. Although this approach has seen a certain degree of success, it comes with difficulties in understanding relations among different tasks and transferring the knowledge learned for a task to others. We propose a multi-task learning approach that enables learning vision-language...
Continuum robots can enter narrow spaces and are useful for search and rescue missions in disaster sites. The exploration efficiency at disaster sites improves if the robots can simultaneously acquire several pieces of information. However, a continuum robot that can simultaneously acquire information to such an extent has not yet been designed. This is because attaching multiple sensors to the robot without compromising its body flexibility is challenging. In this study, we installed multiple small sensors in a distributed manner to develop a continuum-robot system with...
This paper proposes a method for detecting temporal changes of the three-dimensional structure of an outdoor scene from its multi-view images captured at two separate times. For the images, we consider those captured by a camera mounted on a vehicle running in a city street. The method estimates scene structures probabilistically, not deterministically, and based on their estimates, it evaluates the probability of structural changes in the scene, where the inputs are the similarity of the local image patches among the multi-view images. The aim of the probabilistic treatment is to maximize...
Research on unsupervised anomaly detection (AD) has recently progressed, significantly increasing detection accuracy. This paper focuses on texture images and considers how few normal samples are needed for accurate AD. We first highlight the critical nature of a problem that previous studies have overlooked: accurate detection gets harder for anisotropic textures when image orientations are not aligned between inputs and normal samples. We then propose a zero-shot method, which detects anomalies without using a normal sample. The method is free from...
We propose action-agnostic point-level (AAPL) supervision for temporal action detection to achieve accurate action instance detection with a lightly annotated dataset. In the proposed scheme, a small portion of video frames is sampled in an unsupervised manner and presented to human annotators, who then label the frames with action categories. Unlike point-level supervision, which requires annotators to search for every action instance in an untrimmed video, the frames to annotate are selected without human intervention in AAPL supervision. We also propose a detection model and learning method to effectively utilize the labels....