Takayuki Okatani

ORCID: 0000-0001-9222-763X
Research Areas
  • Advanced Vision and Imaging
  • Robotics and Sensor-Based Localization
  • Advanced Image and Video Retrieval Techniques
  • Advanced Image Processing Techniques
  • Multimodal Machine Learning Applications
  • Domain Adaptation and Few-Shot Learning
  • Image Processing Techniques and Applications
  • Anomaly Detection Techniques and Applications
  • Advanced Neural Network Applications
  • Video Surveillance and Tracking Methods
  • Generative Adversarial Networks and Image Synthesis
  • Human Pose and Action Recognition
  • Computer Graphics and Visualization Techniques
  • Image Enhancement Techniques
  • Optical Measurement and Interference Techniques
  • Image and Signal Denoising Methods
  • 3D Shape Modeling and Analysis
  • Image Retrieval and Classification Techniques
  • Image and Object Detection Techniques
  • Remote Sensing and LiDAR Applications
  • Video Analysis and Summarization
  • Neural Networks and Applications
  • COVID-19 diagnosis using AI
  • Infrastructure Maintenance and Monitoring
  • Industrial Vision Systems and Defect Detection

Tohoku University
2016-2025

RIKEN Center for Advanced Intelligence Project
2017-2024

RIKEN
2021-2022

Tohoku University Hospital
2017

Japan Science and Technology Agency
2015

The University of Tokyo
1996-2002

Bunkyo University
2002

This paper considers the problem of single image depth estimation. The employment of convolutional neural networks (CNNs) has recently brought about significant advancements in research on this problem. However, most existing methods suffer from a loss of spatial resolution in the estimated depth maps; a typical symptom is distorted and blurry reconstruction of object boundaries. In this paper, toward more accurate estimation with a focus on depth maps with higher spatial resolution, we propose two improvements to existing approaches. One strategy...

10.1109/wacv.2019.00116 preprint EN 2019-01-01

A key solution to visual question answering (VQA) exists in how to fuse visual and language features extracted from an input image and question. We show that an attention mechanism that enables dense, bi-directional interactions between the two modalities contributes to boosting the accuracy of the prediction of answers. Specifically, we present a simple architecture that is fully symmetric between visual and language representations, in which each question word attends on image regions and each image region attends on question words. It can be stacked to form a hierarchy for multi-step interactions between an image-question pair. We show through...

10.1109/cvpr.2018.00637 article EN 2018-06-01
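
The symmetric attention described in the VQA abstract above (each word attends on regions, each region attends on words) can be illustrated with a minimal sketch. This is not the paper's architecture: the dot-product affinity, the plain softmax, and all function names (`co_attention`, `weighted_sum`, and so on) are assumptions chosen for illustration, and the learned projections and stacking of the real model are omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def weighted_sum(weights, vectors):
    """Sum of vectors, each scaled by its attention weight."""
    dim = len(vectors[0])
    return [sum(w * vec[k] for w, vec in zip(weights, vectors)) for k in range(dim)]


def co_attention(words, regions):
    """Hypothetical symmetric co-attention between word and region features.

    words:   list of word feature vectors (same dimension as regions)
    regions: list of image-region feature vectors
    """
    # affinity[i][j]: similarity between word i and region j (assumed dot product)
    affinity = [[dot(w, r) for r in regions] for w in words]
    # Each word distributes attention over regions (softmax along rows).
    word_over_regions = [softmax(row) for row in affinity]
    # Each region distributes attention over words (softmax along columns).
    region_over_words = [
        softmax([affinity[i][j] for i in range(len(words))])
        for j in range(len(regions))
    ]
    # Attended summaries: a region summary per word, a word summary per region.
    attended_for_words = [weighted_sum(a, regions) for a in word_over_regions]
    attended_for_regions = [weighted_sum(a, words) for a in region_over_words]
    return word_over_regions, region_over_words, attended_for_words, attended_for_regions
```

Both attention maps are derived from the same affinity matrix, which is what makes the sketch symmetric in the two modalities.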

In this paper, we study the design of deep neural networks for tasks of image restoration. We propose a novel style of residual connections dubbed "dual residual connection", which exploits the potential of paired operations, e.g., up- and down-sampling or convolution with large- and small-size kernels. We design a modular block implementing this connection style; it is equipped with two containers to which arbitrary paired operations are inserted. Adopting the "unraveled" view proposed by Veit et al., we point out that a stack of the blocks allows the first operation in...

10.1109/cvpr.2019.00717 article EN IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019-06-01

Many studies have been conducted so far on image restoration, the problem of restoring a clean image from its distorted version. There are many different types of distortion affecting image quality. Previous studies have focused on single types of distortion, proposing methods for removing them. However, image quality degrades due to multiple factors in the real world. Thus, depending on applications, e.g., vision for autonomous cars or surveillance cameras, we need to be able to deal with multiple combined distortions with unknown mixture ratios. For this purpose,...

10.1109/cvpr.2019.00925 article EN IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019-06-01

Recently, convolutional neural networks (CNNs) have shown great success on the task of monocular depth estimation. A fundamental yet unanswered question is how CNNs can infer depth from a single image. Toward answering this question, we consider visualization of the inference of a CNN by identifying the pixels of an input image that are relevant to depth estimation. We formulate it as an optimization problem of identifying the smallest number of image pixels from which the CNN can estimate a depth map with the minimum difference from the estimate from the entire image. To cope with the difficulty of optimization through a deep CNN, we propose to use another network to predict...

10.1109/iccv.2019.00397 article EN IEEE/CVF International Conference on Computer Vision (ICCV) 2019-10-01

This paper presents a framework for estimating the cause of damage to bridge members by combining Structure from Motion (SfM) and Visual Question Answering (VQA) techniques. A VQA model was developed that uses bridge images for dataset creation and outputs the damage or member name and its existence based on questions. In the model, the correct answer rates for the two kinds of questions requiring a name were 67.4% and 68.9%, respectively. The correct answer rate for yes/no questions was 99.1%. Based on the model, a damage cause estimation method was proposed. In the proposed method, causes are narrowed down by inputting new questions, which are determined...

10.1080/15732479.2024.2355929 article EN cc-by Structure and Infrastructure Engineering 2024-05-20

This paper examines numerical algorithms for factorization of a low-rank matrix with missing components. We first propose a new method that incorporates a damping factor into the Wiberg method to solve the problem. The new method is characterized by the way it constrains the ambiguity of the matrix factorization, which helps improve both the global convergence ability and the local convergence speed. We then present experimental comparisons with the latest methods used to solve the problem. No comprehensive comparison of the methods that have been proposed recently has yet been reported in the literature. In our...

10.1109/iccv.2011.6126324 article EN International Conference on Computer Vision 2011-11-01
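
To make the problem in the abstract above concrete, the following is a minimal sketch of low-rank factorization with missing components. It is not the damped Wiberg method of the paper: it swaps in a much simpler rank-1 alternating least squares with a proximal damping term, and the function name `damped_rank1_als`, the `damping` value, and the use of `None` for missing entries are all assumptions made for illustration.

```python
def damped_rank1_als(matrix, damping=0.1, iterations=500):
    """Fit matrix[i][j] ~ u[i] * v[j] using only the observed entries.

    matrix: list of rows; missing components are marked with None.
    damping: weight of the proximal term that keeps each update close to
             the previous value (an assumed stand-in for a damping factor).
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    u = [1.0] * n_rows
    v = [1.0] * n_cols
    for _ in range(iterations):
        # Update each u[i] with v fixed: damped one-dimensional least squares.
        for i in range(n_rows):
            num = damping * u[i]
            den = damping
            for j in range(n_cols):
                if matrix[i][j] is not None:
                    num += matrix[i][j] * v[j]
                    den += v[j] * v[j]
            u[i] = num / den
        # Update each v[j] with u fixed, in the same way.
        for j in range(n_cols):
            num = damping * v[j]
            den = damping
            for i in range(n_rows):
                if matrix[i][j] is not None:
                    num += matrix[i][j] * u[i]
                    den += u[i] * u[i]
            v[j] = num / den
    return u, v


# Rank-1 matrix [1, 2, 3]^T [1, 2] with the bottom-right entry missing.
observed = [[1.0, 2.0], [2.0, 4.0], [3.0, None]]
u, v = damped_rank1_als(observed)
missing_estimate = u[2] * v[1]  # estimate of the unobserved component
```

The damping term keeps every denominator positive even when a row or column has few observed entries; the scale ambiguity between `u` and `v` remains, which is the kind of ambiguity the abstract says the proposed method constrains.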

This paper studies clothing and attribute recognition in the fashion domain. Specifically, in this paper, we turn our attention to the compatibility of clothing items and attributes (Fig 1). For example, people do not wear a skirt and a dress at the same time, yet a jacket and a shirt are a preferred combination. We consider such inter-object or inter-attribute compatibility and formulate a Conditional Random Field (CRF) that seeks the most probable combination in the given picture. The model takes into account the location-specific appearance with respect to a human...

10.5244/c.29.51 article EN 2015-01-01

It is still challenging to build an AI system that can perform tasks that involve vision and language at human level. So far, researchers have singled out individual tasks separately, for each of which they have designed networks and trained them on its dedicated datasets. Although this approach has seen a certain degree of success, it comes with difficulties of understanding relations among different tasks and transferring the knowledge learned for a task to others. We propose a multi-task learning approach that enables us to learn vision-language...

10.1109/cvpr.2019.01074 article EN IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019-06-01

Continuum robots can enter narrow spaces and are useful for search and rescue missions in disaster sites. The exploration efficiency at disaster sites improves if the robots can simultaneously acquire several pieces of information. However, a continuum robot that can simultaneously acquire information to such an extent has not yet been designed. This is because attaching multiple sensors to the robot without compromising its body flexibility is challenging. In this study, we installed multiple small sensors in a distributed manner to develop a continuum-robot system with...

10.1186/s40648-022-00223-x article EN cc-by ROBOMECH Journal 2022-03-21

This paper proposes a method for detecting temporal changes of the three-dimensional structure of an outdoor scene from its multi-view images captured at two separate times. For the images, we consider those captured by a camera mounted on a vehicle running in a city street. The method estimates scene structures probabilistically, not deterministically, and based on their estimates, it evaluates the probability of structural changes in the scene, where the inputs are the similarity of the local image patches among the multi-view images. The aim of the probabilistic treatment is to maximize...

10.1109/cvpr.2013.25 article EN IEEE Conference on Computer Vision and Pattern Recognition 2013-06-01

10.1016/j.cviu.2015.09.001 article EN Computer Vision and Image Understanding 2015-09-14

Research on unsupervised anomaly detection (AD) has recently progressed, significantly increasing detection accuracy. This paper focuses on texture images and considers how few normal samples are needed for accurate AD. We first highlight the critical nature of a problem that previous studies have overlooked: accurate detection gets harder for anisotropic textures when image orientations are not aligned between inputs and normal samples. We then propose a zero-shot method, which detects anomalies without using a normal sample. The method is free from...

10.1109/wacv56688.2023.00552 article EN IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2023-01-01

10.1109/wacv61041.2025.00273 article EN IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2025-02-26

10.1109/wacv61041.2025.00453 article EN IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2025-02-26

We propose action-agnostic point-level (AAPL) supervision for temporal action detection to achieve accurate action instance detection with a lightly annotated dataset. In the proposed scheme, a small portion of video frames is sampled in an unsupervised manner and presented to human annotators, who then label the frames with action categories. Unlike point-level supervision, which requires annotators to search for every action instance in an untrimmed video, the frames to annotate are selected without human intervention in AAPL supervision. We also propose a detection model and learning method to effectively utilize the AAPL labels....

10.1609/aaai.v39i9.33037 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2025-04-11