Kiyoharu Aizawa

ORCID: 0000-0003-2146-6275
Research Areas
  • Advanced Vision and Imaging
  • Advanced Image and Video Retrieval Techniques
  • Video Analysis and Summarization
  • Image Retrieval and Classification Techniques
  • Image Processing Techniques and Applications
  • CCD and CMOS Imaging Sensors
  • Advanced Image Processing Techniques
  • Computer Graphics and Visualization Techniques
  • Video Surveillance and Tracking Methods
  • Advanced Data Compression Techniques
  • Infrared Target Detection Methodologies
  • Nutritional Studies and Diet
  • Image Enhancement Techniques
  • 3D Shape Modeling and Analysis
  • Human Pose and Action Recognition
  • Image and Signal Denoising Methods
  • Multimodal Machine Learning Applications
  • Handwritten Text Recognition Techniques
  • Robotics and Sensor-Based Localization
  • Visual Attention and Saliency Detection
  • Human Motion and Animation
  • Music and Audio Processing
  • Video Coding and Compression Technologies
  • Face Recognition and Analysis
  • Advanced Chemical Sensor Technologies

The University of Tokyo
2016-2025

Bunkyo University
2002-2025

Tokyo University of Information Sciences
2014-2024

Universidad Europea
2024

University of Tokyo Hospital
2023

Hitachi (Japan)
2020

University of Liverpool
2015

National Institute of Informatics
2015

Ube Frontier University
2002-2008

Shinshu University
2005

Deep neural networks (DNNs) trained on large-scale datasets have exhibited significant performance in image classification. Many large-scale datasets are collected from websites; however, they tend to contain inaccurate labels, termed noisy labels. Training on such noisy-labeled datasets causes performance degradation because DNNs easily overfit to noisy labels. To overcome this problem, we propose a joint optimization framework of learning DNN parameters and estimating true labels. Our framework can correct labels during training by alternately updating network parameters and labels. We conduct...

10.1109/cvpr.2018.00582 article EN 2018-06-01
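The alternating scheme described above can be illustrated with a minimal, hypothetical sketch of the label-update step only. It assumes labels are replaced by the average of the network's recent softmax predictions (a soft-label variant); the network-training step is omitted, and the names `softmax` and `update_labels` are illustrative, not taken from the paper's code.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def update_labels(prediction_history):
    """Assumed soft-label update: average the softmax outputs recorded
    for one training sample over the last few epochs."""
    n = len(prediction_history)
    num_classes = len(prediction_history[0])
    return [sum(p[c] for p in prediction_history) / n for c in range(num_classes)]

# Hypothetical logits for one sample over three epochs. The given (noisy)
# label was class 0, but the network consistently favours class 2.
logit_history = [[0.2, 0.1, 2.0], [0.1, 0.0, 2.5], [0.0, 0.2, 3.0]]
probs_history = [softmax(logits) for logits in logit_history]
new_label = update_labels(probs_history)
hard_label = max(range(len(new_label)), key=new_label.__getitem__)
print(hard_label)  # prints 2
```

In a full alternating loop, one step would train the network with the labels held fixed and the next would re-estimate the labels with the network held fixed, as sketched here.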

Can we detect common objects in a variety of image domains without instance-level annotations? In this paper, we present a framework for a novel task, cross-domain weakly supervised object detection, which addresses this question. For this task, we have access to images with instance-level annotations in a source domain (e.g., natural images) and images with image-level annotations in a target domain (e.g., watercolor). In addition, the classes to be detected in the target domain are all or a subset of those in the source domain. Starting from a fully supervised object detector, which is pre-trained on the source domain, we propose a two-step progressive domain adaptation...

10.1109/cvpr.2018.00525 preprint EN 2018-06-01

In this paper, we apply a convolutional neural network (CNN) to the tasks of detecting and recognizing food images. Because of the wide diversity of types of food, image recognition of food items is generally very difficult. However, deep learning has recently been shown to be a very powerful image recognition technique, and the CNN is a state-of-the-art approach to deep learning. We applied the CNN to the tasks of food detection and recognition through parameter optimization. We constructed a dataset of the most frequent food items in a publicly available food-logging system and used it to evaluate recognition performance. The CNN showed significantly...

10.1145/2647868.2654970 article EN 2014-11-03

The paper gives an overview of model-based approaches applied to image coding, by looking at image source models. In these model-based schemes, which are different from the various conventional waveform coding methods, the 3-D properties of the scenes are taken into consideration. They can achieve very low bit rate image transmission. 2-D model-based and 3-D model-based approaches are explained. Among them, a method using a 3-D facial model and a method utilizing 2-D deformable triangular patches are described. Works related to model-based coding of facial images and some remaining problems are also described.

10.1109/5.364463 article EN Proceedings of the IEEE 1995-01-01

This paper presents a robust photometric stereo method that effectively compensates for various non-Lambertian corruptions such as specularities, shadows, and image noise. We construct a constrained sparse regression problem that enforces both Lambertian, rank-3 structure and sparse, additive corruptions. A solution method is derived using a hierarchical Bayesian approximation to accurately estimate the surface normals while simultaneously separating the non-Lambertian corruptions. Extensive evaluations are performed that show state-of-the-art...

10.1109/cvpr.2012.6247691 article EN 2012 IEEE Conference on Computer Vision and Pattern Recognition 2012-06-01
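The Lambertian, rank-3 model that the method above enforces can be made concrete with the classic least-squares photometric stereo baseline for a single pixel. This is a minimal sketch under stated assumptions, not the paper's sparse-regression method: it assumes known light directions and no shadows or specularities, which is exactly the case the robust method relaxes. The helper names `solve3` and `lambertian_normal` are hypothetical.

```python
import math

def solve3(a, y):
    """Solve the 3x3 system a @ x = y by Gaussian elimination with partial pivoting."""
    m = [row[:] + [yi] for row, yi in zip(a, y)]  # augmented matrix
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):  # back substitution
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x

def lambertian_normal(lights, intensities):
    """Least-squares Lambertian photometric stereo for one pixel (assumes no shadows).

    Model: I_k = albedo * (l_k . n). Solve (L^T L) b = L^T I for b = albedo * n,
    then split b into albedo |b| and unit normal b / |b|.
    """
    ltl = [[sum(l[i] * l[j] for l in lights) for j in range(3)] for i in range(3)]
    lti = [sum(l[i] * v for l, v in zip(lights, intensities)) for i in range(3)]
    b = solve3(ltl, lti)
    albedo = math.sqrt(sum(v * v for v in b))
    return albedo, [v / albedo for v in b]

# Synthetic check: a known normal and albedo under four non-coplanar lights,
# all of which illuminate the surface (every l . n > 0, so no attached shadows).
true_albedo = 0.8
norm = math.sqrt(0.3 ** 2 + 0.2 ** 2 + 0.9 ** 2)
true_normal = [0.3 / norm, -0.2 / norm, 0.9 / norm]
lights = [[0.0, 0.0, 1.0], [0.5, 0.0, 0.866], [0.0, 0.5, 0.866], [-0.5, -0.5, 0.707]]
intensities = [true_albedo * sum(l * n for l, n in zip(light, true_normal)) for light in lights]
albedo, normal = lambertian_normal(lights, intensities)
```

A single specular highlight or cast shadow in `intensities` would bias this plain least-squares estimate, which is the failure mode that modelling the corruptions as sparse outliers is designed to address.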

We have created Manga109, a dataset of a variety of 109 Japanese comic books publicly available for academic purposes. This dataset provides numerous comic images but lacks the annotations of elements in the comics that are necessary for use by machine learning algorithms or for the evaluation of methods. In this paper, we present our ongoing project to build metadata for Manga109. We first define the metadata in terms of frames, texts, and characters. We then present our web-based software for efficiently creating the ground truth for these images. In addition, we provide a guideline...

10.1145/3011549.3011551 article EN 2016-12-04

We have investigated the "FoodLog" multimedia food-recording tool, whereby users upload photographs of their meals and a food diary is constructed using image-processing functions such as food-image detection and food-balance estimation. In this paper, following a brief introduction to FoodLog, we propose a Bayesian framework that makes use of personal dietary tendencies to improve both food-image detection and food-balance estimation. The Bayesian framework facilitates incremental learning. It incorporates three personal dietary tendencies that influence food analysis: likelihood, prior distribution, and mealtime...

10.1109/tmm.2013.2271474 article EN IEEE Transactions on Multimedia 2013-06-27

Since deep learning models have been implemented in many commercial applications, it is important to detect out-of-distribution (OOD) inputs correctly to maintain the performance of the models, ensure the quality of the collected data, and prevent the applications from being used for other-than-intended purposes. In this work, we propose a two-head convolutional neural network (CNN) and maximize the discrepancy between the two classifiers to detect OOD inputs. We train a two-head CNN consisting of one common feature extractor and two classifiers which have different...

10.1109/iccv.2019.00961 article EN 2019 IEEE/CVF International Conference on Computer Vision (ICCV) 2019-10-01
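The test-time idea above, flagging inputs on which the two classifier heads disagree, can be sketched as follows. This covers only the scoring step, not the training that maximizes the discrepancy. It assumes the discrepancy is measured as the L1 distance between the two heads' softmax outputs (a common choice in classifier-discrepancy methods, not necessarily the paper's exact measure), and the threshold of 0.5 is a hypothetical value.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def discrepancy(logits_a, logits_b):
    """Assumed measure: L1 distance between the two heads' class probabilities."""
    pa, pb = softmax(logits_a), softmax(logits_b)
    return sum(abs(x - y) for x, y in zip(pa, pb))

def is_ood(logits_a, logits_b, threshold=0.5):
    """Flag an input as OOD when the two heads disagree strongly.
    The threshold is hypothetical; in practice it would be tuned on held-out data."""
    return discrepancy(logits_a, logits_b) > threshold

# Hypothetical head outputs: both heads agree on class 0 for an in-distribution
# input, but favour different classes for an unfamiliar one.
in_dist = ([4.0, 0.0, 0.0], [3.5, 0.2, 0.0])
out_dist = ([4.0, 0.0, 0.0], [0.0, 0.0, 4.0])
```

Because both heads share one feature extractor, any disagreement comes from their classifier layers, which is what makes the discrepancy usable as an OOD score.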

In this work, travel destinations and business locations are taken as venues. Discovering a venue from a photograph is very important for visual context-aware applications. Unfortunately, few efforts have paid attention to complicated real images such as venue photographs generated by users. Our goal is fine-grained venue discovery from heterogeneous social multimodal data. To this end, we propose a novel deep learning model, category-based deep canonical correlation analysis. Given a photograph as input, this model performs: 1) exact venue search (find the...

10.1109/tnnls.2018.2856253 article EN IEEE Transactions on Neural Networks and Learning Systems 2018-08-10

This paper presents a photometric stereo method that is purely pixelwise and handles general isotropic surfaces in a stable manner. Following the recently proposed sum-of-lobes representation of the isotropic reflectance function, we constructed a constrained bivariate regression problem where the regression function is approximated by smooth, bivariate Bernstein polynomials. The unknown normal vector was separated from the reflectance function by considering the inverse of the image formation process, and then we could accurately compute the unknown surface normals by solving a simple and efficient...

10.1109/cvpr.2014.280 article EN 2014 IEEE Conference on Computer Vision and Pattern Recognition 2014-06-01

Currently, food image recognition tasks are evaluated against fixed datasets. However, in real-world conditions, there are cases in which the number of samples in each class continues to increase and samples from novel classes appear. In particular, dynamic datasets in which each individual user creates samples and continues the updating process often have content that varies considerably between different users, and the number of samples per person is very limited. A single classifier common to all users cannot handle such dynamic data. Bridging the gap between the laboratory environment and the real world...

10.1109/tmm.2018.2814339 article EN IEEE Transactions on Multimedia 2018-03-15

Manga, or comics, which are a type of multimodal artwork, have been left behind in the recent trend of deep learning applications because of the lack of a proper dataset. Hence, we built Manga109, a dataset consisting of a variety of 109 Japanese comic books (94 authors and 21,142 pages), and made it publicly available by obtaining author permissions for academic use. We carefully annotated the frames, speech texts, character faces, and character bodies; the total number of annotations exceeds 500k. This dataset provides numerous manga images...

10.1109/mmul.2020.2987895 article EN IEEE Multimedia 2020-04-01

The scene text recognition (STR) task has a common practice: all state-of-the-art STR models are trained on large synthetic data. In contrast to this practice, training STR models only on fewer real labels (STR with fewer labels) is important when we have to train STR models without synthetic data: for handwritten or artistic texts that are difficult to generate synthetically, and for languages other than English for which we do not always have synthetic data. However, there has been implicit common knowledge that training STR models on real data is nearly impossible because real data is insufficient. We consider that this common knowledge has obstructed the study of...

10.1109/cvpr46437.2021.00313 article EN 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021-06-01

This paper proposes new methods for analyzing image sequences and updating textures of the three-dimensional (3-D) facial model. It also describes a method for synthesizing various facial expressions. These three methods are key technologies for a model-based image coding system. The input image analysis technique directly and robustly estimates 3-D head motions and facial expressions without any two-dimensional (2-D) image entity correspondences. This technique resolves 2-D correspondence mismatch errors and provides quality reproduction of the original images by fully...

10.1109/76.305871 article EN IEEE Transactions on Circuits and Systems for Video Technology 1994-06-01

In this paper, we present continuous capture of our life log with various sensors plus additional data and propose effective retrieval methods using this context and content. Our life log system contains video, audio, acceleration sensor, gyro, GPS, annotations, documents, web pages, and emails. In our previous studies, we showed our retrieval methodology [8], [9], which mainly depends on context information from sensor data. In this paper, we extend our methodology with additional functions. They are (1) spatio-temporal sampling for extraction of key frames for summarization; and (2) conversation scene...

10.1145/1026653.1026656 article EN 2004-10-15

In this paper, we propose a novel method that combines monocular visual simultaneous localization and mapping (vSLAM) and deep-learning-based semantic segmentation. For stable operation, vSLAM requires feature points on static objects. In conventional vSLAM, random sample consensus (RANSAC) [5] is used to select those feature points. However, if a major portion of the view is occupied by moving objects, many feature points become inappropriate and RANSAC does not perform well. Based on our empirical studies, feature points in the sky and on cars often cause...

10.1109/cvprw.2018.00063 article EN 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2018-06-01
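The failure mode described above, RANSAC locking onto moving objects when they dominate the view, and the masking remedy can be illustrated with a toy sketch. This is not the paper's vSLAM pipeline: a 2-D line fit stands in for the camera-motion model, and the point data and the "building" label are hypothetical. Only the idea of discarding points labeled as sky or car before RANSAC comes from the text.

```python
import random

def ransac_line(points, iterations=200, tol=0.05, seed=0):
    """Fit y = a*x + b with RANSAC: repeatedly fit a line through two random
    points and keep the model with the most inliers (|residual| <= tol)."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iterations):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # vertical line; not representable as y = a*x + b
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        inliers = [(x, y) for x, y in points if abs(y - (a * x + b)) <= tol]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = (a, b), inliers
    return best_model, best_inliers

def mask_points(features, excluded_labels):
    """Drop feature points whose semantic label is excluded (e.g. sky, car)."""
    return [pt for pt, label in features if label not in excluded_labels]

# Hypothetical features: 20 static "building" points on y = 2x + 1 and a larger
# group of 30 moving "car" points that agree on a different line, y = -x + 10.
features = [((0.5 * i, 2 * (0.5 * i) + 1), "building") for i in range(20)]
features += [((5 + 0.2 * i, -(5 + 0.2 * i) + 10), "car") for i in range(30)]

unmasked_model, unmasked_inliers = ransac_line([pt for pt, _ in features])
masked_model, masked_inliers = ransac_line(mask_points(features, {"sky", "car"}))
```

Without masking, the larger car group wins the inlier vote and RANSAC returns the car line; once car points are removed, the static structure is recovered.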

FoodLog is a multimedia food-recording tool that offers a novel method for recording daily food intake, primarily for healthcare purposes. Its use of image-processing techniques presents significant potential for the development of new monitoring apps.

10.1109/mmul.2015.39 article EN IEEE Multimedia 2015-04-01

This work presents methods to automatically find optimal parameter settings for convolutional neural networks (CNNs) by using an evolutionary algorithm called particle swarm optimization (PSO). Even though the parameter space is extremely large (> 10^20), we experimentally show that a better parameter setting can be found for the AlexNet configuration on five different image datasets. We have also developed two candidate...

10.1109/bigmm.2017.69 article EN 2017-04-01
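The particle swarm optimization mentioned above can be sketched in its standard global-best form. This is a minimal illustration under stated assumptions, not the paper's method: a cheap continuous function stands in for CNN validation error (the paper's parameters are discrete and each evaluation requires training a network), and the inertia and acceleration coefficients are common textbook values rather than values from the paper.

```python
import random

def pso(objective, bounds, n_particles=20, iterations=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` over a box with global-best particle swarm optimization.

    bounds: list of (low, high) pairs, one per dimension.
    w: inertia weight; c1, c2: pull toward each particle's personal best and the swarm's best.
    """
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # clamp to the box
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

# Hypothetical stand-in for "validation error as a function of two hyperparameters".
def toy_error(p):
    return (p[0] - 3.0) ** 2 + (p[1] + 1.0) ** 2

best, best_val = pso(toy_error, bounds=[(-10.0, 10.0), (-10.0, 10.0)])
```

PSO only needs objective values, not gradients, which is why it suits settings where each evaluation is an expensive black box such as training and validating a CNN.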

We introduce Spatio-Temporal Vector of Locally Max Pooled Features (ST-VLMPF), a super vector-based encoding method specifically designed for local deep features encoding. The proposed method addresses an important problem of video understanding: how to build a video representation that incorporates the CNN features over the entire video. Feature assignment is carried out at two levels, by using the similarity and spatio-temporal information. For each assignment we build a specific encoding, focused on the nature of deep features, with the goal to capture the highest...

10.1109/cvpr.2017.341 article EN 2017-07-01

End-to-end distance metric learning (DML) has been applied to obtain features useful in many computer vision tasks. However, these DML studies have not provided equitable comparisons between features extracted from DML-based networks and softmax-based networks. In this paper, we present objective comparisons between these two approaches under the same network architecture.

10.1109/tpami.2019.2911075 article EN cc-by IEEE Transactions on Pattern Analysis and Machine Intelligence 2019-01-01

The Japanese comic format known as Manga is popular all over the world. It is traditionally produced in black and white, and colorization is time consuming and costly. Automatic colorization methods generally rely on greyscale values, which are not present in manga. Furthermore, due to copyright protection, colorized manga available for training is scarce. We propose a manga colorization method based on conditional Generative Adversarial Networks (cGAN). Unlike previous cGAN approaches that use many hundreds or thousands of training images, our method requires...

10.1109/icdar.2017.295 article EN 2017-11-01

Weakly supervised object detection (WSOD), where a detector is trained with only image-level annotations, is attracting more and more attention. As a method to obtain a well-performing detector, the detector and instance labels are updated iteratively. In this study, for more efficient iterative updating, we focus on the instance labeling problem, the problem of which label should be annotated to each region based on the last localization result. Instead of simply labeling the top-scoring region and its highly overlapping regions as positive and the others as negative, we propose...

10.1109/iccv.2019.00616 article EN 2019 IEEE/CVF International Conference on Computer Vision (ICCV) 2019-10-01