NFDI4DS | UHH-SEMS - Publication Details

Kiyoharu Aizawa

ORCID: 0000-0003-2146-6275

About

Contact & Profiles

A5069982192

Research Areas

Advanced Vision and Imaging
Advanced Image and Video Retrieval Techniques
Video Analysis and Summarization
Image Retrieval and Classification Techniques
Image Processing Techniques and Applications
CCD and CMOS Imaging Sensors
Advanced Image Processing Techniques
Computer Graphics and Visualization Techniques
Video Surveillance and Tracking Methods
Advanced Data Compression Techniques
Infrared Target Detection Methodologies
Nutritional Studies and Diet
Image Enhancement Techniques
3D Shape Modeling and Analysis
Human Pose and Action Recognition
Image and Signal Denoising Methods
Multimodal Machine Learning Applications
Handwritten Text Recognition Techniques
Robotics and Sensor-Based Localization
Visual Attention and Saliency Detection
Human Motion and Animation
Music and Audio Processing
Video Coding and Compression Technologies
Face recognition and analysis
Advanced Chemical Sensor Technologies

The University of Tokyo
2016-2025

Bunkyo University
2002-2025

Tokyo University of Information Sciences
2014-2024

Universidad Europea
2024

University of Tokyo Hospital
2023

Hitachi (Japan)
2020

University of Liverpool
2015

National Institute of Informatics
2015

Ube Frontier University
2002-2008

Shinshu University
2005

Joint Optimization Framework for Learning with Noisy Labels

OPENALEX - Publications

Daiki Tanaka Daiki Ikami Toshihiko Yamasaki Kiyoharu Aizawa

Deep neural networks (DNNs) trained on large-scale datasets have exhibited significant performance in image classification. Many large-scale datasets are collected from websites; however, they tend to contain inaccurate labels that are termed noisy labels. Training on such noisy labeled datasets causes performance degradation because DNNs easily overfit to noisy labels. To overcome this problem, we propose a joint optimization framework of learning DNN parameters and estimating true labels. Our framework can correct labels during training by alternating update of network parameters and labels. We conduct...

10.1109/cvpr.2018.00582 article EN 2018-06-01

Cross-Domain Weakly-Supervised Object Detection Through Progressive Domain Adaptation

OPENALEX - Publications

Naoto Inoue Ryosuke Furuta Toshihiko Yamasaki Kiyoharu Aizawa

Can we detect common objects in a variety of image domains without instance-level annotations? In this paper, we present a framework for a novel task, cross-domain weakly supervised object detection, which addresses this question. For this paper, we have access to images with instance-level annotations in a source domain (e.g., natural image) and images with image-level annotations in a target domain (e.g., watercolor). In addition, the classes to be detected in the target domain are all or a subset of those in the source domain. Starting from a fully supervised object detector, which is pre-trained on the source domain, we propose a two-step progressive domain adaptation...

10.1109/cvpr.2018.00525 preprint EN 2018-06-01

Food Detection and Recognition Using Convolutional Neural Network

OPENALEX - Publications

Hokuto Kagaya Kiyoharu Aizawa Makoto Ogawa

In this paper, we apply a convolutional neural network (CNN) to the tasks of detecting and recognizing food images. Because of the wide diversity of types of food, image recognition of food items is generally very difficult. However, deep learning has been shown recently to be a very powerful image recognition technique, and CNN is a state-of-the-art approach to deep learning. We applied CNN to the tasks of food detection and recognition through parameter optimization. We constructed a dataset of the most frequent food items in a publicly available food-logging system, and used it to evaluate recognition performance. CNN showed significantly...

10.1145/2647868.2654970 article EN 2014-11-03

Model-based analysis synthesis image coding (MBASIC) system for a person's face

OPENALEX - Publications

Kiyoharu Aizawa Hiroshi Harashima Toshikuni Saito

10.1016/0923-5965(89)90006-4 article EN Signal Processing Image Communication 1989-10-01

Model-based image coding: advanced video coding techniques for very low bit-rate applications

OPENALEX - Publications

Kiyoharu Aizawa Thomas S. Huang

The paper gives an overview of model-based approaches applied to image coding, by looking at image source models. In these model-based schemes, which are different from the various conventional waveform coding methods, 3-D properties of the scenes are taken into consideration. They can achieve very low bit rate image transmission. The 2-D model and 3-D model based approaches are explained. Among them, a 3-D model based method using a 3-D facial model and a 2-D model based method utilizing 2-D deformable triangular patches are described. Works related to 3-D model-based coding of facial images and some of the remaining problems are also described.

10.1109/5.364463 article EN Proceedings of the IEEE 1995-01-01

Robust photometric stereo using sparse regression

OPENALEX - Publications

Satoshi Ikehata David Wipf Yasuyuki Matsushita Kiyoharu Aizawa

This paper presents a robust photometric stereo method that effectively compensates for various non-Lambertian corruptions such as specularities, shadows, and image noise. We construct a constrained sparse regression problem that enforces both Lambertian, rank-3 structure and sparse, additive corruptions. A solution method is derived using a hierarchical Bayesian approximation to accurately estimate the surface normals while simultaneously separating the non-Lambertian corruptions. Extensive evaluations are performed that show state-of-the-art...

10.1109/cvpr.2012.6247691 article EN 2012 IEEE Conference on Computer Vision and Pattern Recognition 2012-06-01

Manga109 dataset and creation of metadata

OPENALEX - Publications

Azuma Fujimoto Toru Ogawa Kazuyoshi Yamamoto Yusuke Matsui Toshihiko Yamasaki and 1 more

We have created Manga109, a dataset of a variety of 109 Japanese comic books publicly available for use for academic purposes. This dataset provides numerous comic images but lacks the annotations of elements in the comics that are necessary for use by machine learning algorithms or evaluation of methods. In this paper, we present our ongoing project to build metadata for Manga109. We first define the metadata in terms of frames, texts and characters. We then present our web-based software for efficiently creating the ground truth for these images. In addition, we provide a guideline...

10.1145/3011549.3011551 article EN 2016-12-04

Food Balance Estimation by Using Personal Dietary Tendencies in a Multimedia Food Log

OPENALEX - Publications

Kiyoharu Aizawa Yuto Maruyama Li He Chamin Morikawa

We have investigated the "FoodLog" multimedia food-recording tool, whereby users upload photographs of their meals and a food diary is constructed using image-processing functions such as food-image detection and food-balance estimation. In this paper, following a brief introduction to FoodLog, we propose a Bayesian framework that makes use of personal dietary tendencies to improve both food-image detection and food-balance estimation. The Bayesian framework facilitates incremental learning. It incorporates three personal dietary tendencies that influence food analysis: likelihood, prior distribution, and mealtime...

10.1109/tmm.2013.2271474 article EN IEEE Transactions on Multimedia 2013-06-27

Unsupervised Out-of-Distribution Detection by Maximum Classifier Discrepancy

OPENALEX - Publications

Qing Yu Kiyoharu Aizawa

Since deep learning models have been implemented in many commercial applications, it is important to detect out-of-distribution (OOD) inputs correctly to maintain the performance of the models, ensure the quality of the collected data, and prevent the applications from being used for other-than-intended purposes. In this work, we propose a two-head deep convolutional neural network (CNN) and maximize the discrepancy between the two classifiers to detect OOD inputs. We train a two-head CNN consisting of one common feature extractor and two classifiers which have different...

10.1109/iccv.2019.00961 article EN 2019 IEEE/CVF International Conference on Computer Vision (ICCV) 2019-10-01
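
The abstract above describes detecting OOD inputs from the discrepancy between two classifier heads. The following is a minimal sketch of that idea only, not the authors' implementation: the L1 distance between the two softmax outputs, the threshold value, and the function names `softmax`, `l1_discrepancy`, and `is_ood` are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def l1_discrepancy(logits_a, logits_b):
    """L1 distance between the two heads' class distributions (assumed measure)."""
    probs_a, probs_b = softmax(logits_a), softmax(logits_b)
    return sum(abs(a - b) for a, b in zip(probs_a, probs_b))


def is_ood(logits_a, logits_b, threshold=0.5):
    """Flag an input as OOD when the heads disagree by more than a threshold.

    The threshold is hypothetical; in practice it would be tuned on held-out data.
    """
    return l1_discrepancy(logits_a, logits_b) > threshold


# Two heads that agree on class 0 versus two heads that disagree.
agreeing = is_ood([5.0, 0.0, 0.0], [4.5, 0.2, 0.0])
disagreeing = is_ood([5.0, 0.0, 0.0], [0.0, 5.0, 0.0])
```

In this toy setup the agreeing pair yields a small discrepancy and the disagreeing pair a large one, which is the signal a two-head detector of this kind relies on.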

Category-Based Deep CCA for Fine-Grained Venue Discovery From Multimodal Data

OPENALEX - Publications

Yi Yu Suhua Tang Kiyoharu Aizawa Akiko Aizawa

In this work, travel destinations and business locations are taken as venues. Discovering a venue by a photograph is very important for visual context-aware applications. Unfortunately, few efforts paid attention to complicated real images such as venue photographs generated by users. Our goal is fine-grained venue discovery from heterogeneous social multimodal data. To this end, we propose a novel deep learning model, category-based deep canonical correlation analysis. Given a photograph as input, this model performs: 1) exact venue search (find the...

10.1109/tnnls.2018.2856253 article EN IEEE Transactions on Neural Networks and Learning Systems 2018-08-10

Photometric Stereo Using Constrained Bivariate Regression for General Isotropic Surfaces

OPENALEX - Publications

Satoshi Ikehata Kiyoharu Aizawa

This paper presents a photometric stereo method that is purely pixelwise and handles general isotropic surfaces in a stable manner. Following the recently proposed sum-of-lobes representation of the isotropic reflectance function, we constructed a constrained bivariate regression problem where the regression function is approximated by smooth, bivariate Bernstein polynomials. The unknown normal vector was separated from the unknown reflectance function by considering the inverse representation of the image formation process, and then we could accurately compute the unknown surface normals by solving a simple and efficient...

10.1109/cvpr.2014.280 article EN 2014 IEEE Conference on Computer Vision and Pattern Recognition 2014-06-01

Personalized Classifier for Food Image Recognition

OPENALEX - Publications

Shota Horiguchi Sosuke Amano Makoto Ogawa Kiyoharu Aizawa

Currently, food image recognition tasks are evaluated against fixed datasets. However, in real-world conditions, there are cases in which the number of samples in each class continues to increase and samples from novel classes appear. In particular, dynamic datasets in which each individual user creates samples and continues the updating process often have content that varies considerably between different users, and the number of samples per person is very limited. A single classifier common to all users cannot handle such dynamic data. Bridging the gap between the laboratory environment and the real world...

10.1109/tmm.2018.2814339 article EN IEEE Transactions on Multimedia 2018-03-15

Building a Manga Dataset “Manga109” With Annotations for Multimedia Applications

OPENALEX - Publications

Kiyoharu Aizawa Azuma Fujimoto Atsushi Otsubo Toru Ogawa Yusuke Matsui and 2 more

Manga, or comics, which are a type of multimodal artwork, have been left behind in the recent trend of deep learning applications because of the lack of a proper dataset. Hence, we built Manga109, a dataset consisting of a variety of 109 Japanese comic books (94 authors and 21,142 pages) and made it publicly available by obtaining author permissions for academic use. We carefully annotated the frames, speech texts, character faces, and character bodies; the total number of annotations exceeds 500k. This dataset provides numerous manga images...

10.1109/mmul.2020.2987895 article EN IEEE Multimedia 2020-04-01

What If We Only Use Real Datasets for Scene Text Recognition? Toward Scene Text Recognition With Fewer Labels

OPENALEX - Publications

Jeonghun Baek Yusuke Matsui Kiyoharu Aizawa

Scene text recognition (STR) task has a common practice: all state-of-the-art STR models are trained on large synthetic data. In contrast to this practice, training STR models only on fewer real labels (STR with fewer labels) is important when we have to train STR models without synthetic data: for handwritten or artistic texts that are difficult to generate synthetically and for languages other than English for which we do not always have synthetic data. However, there has been implicit common knowledge that training STR models on real data is nearly impossible because real data is insufficient. We consider that this common knowledge has obstructed the study of...

10.1109/cvpr46437.2021.00313 article EN 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021-06-01

Analysis and synthesis of facial image sequences in model-based image coding

OPENALEX - Publications

Chang Seok Choi Kiyoharu Aizawa Hiroshi Harashima Tsuyoshi Takebe

This paper proposes new methods for analyzing image sequences and updating textures of the three-dimensional (3-D) facial model. It also describes a method for synthesizing various facial expressions. These three methods are the key technologies for the model-based image coding system. The input image analysis technique directly and robustly estimates the 3-D head motions and the facial expressions without any two-dimensional (2-D) image entity correspondences. This resolves the 2-D correspondence mismatch errors and provides quality reproduction of the original images by fully...

10.1109/76.305871 article EN IEEE Transactions on Circuits and Systems for Video Technology 1994-06-01

Efficient retrieval of life log based on context and content

OPENALEX - Publications

Kiyoharu Aizawa Datchakorn Tancharoen S. Kawasaki Toshihiko Yamasaki

In this paper, we present continuous capture of our life log with various sensors plus additional data and propose effective retrieval methods using this context and content. Our life log system contains video, audio, acceleration sensor, gyro, GPS, annotations, documents, web pages, and emails. In our previous studies, we showed our retrieval methodology [8], [9], which mainly depends on context information from sensor data. In this paper, we extend our methodology with additional functions. They are (1) spatio-temporal sampling for extraction of key frames for summarization; and (2) conversation scene...

10.1145/1026653.1026656 article EN 2004-10-15

Mask-SLAM: Robust Feature-Based Monocular SLAM by Masking Using Semantic Segmentation

OPENALEX - Publications

Masaya Kaneko Kazuya Iwami Toru Ogawa Toshihiko Yamasaki Kiyoharu Aizawa

In this paper, we propose a novel method that combines monocular visual simultaneous localization and mapping (vSLAM) and deep-learning-based semantic segmentation. For stable operation, vSLAM requires feature points on static objects. In conventional vSLAM, random sample consensus (RANSAC) [5] is used to select those feature points. However, if a major portion of the view is occupied by moving objects, many feature points become inappropriate and RANSAC does not perform well. Based on our empirical studies, feature points in the sky and on cars often cause...

10.1109/cvprw.2018.00063 article EN 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2018-06-01
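
The title and abstract above suggest discarding feature points that fall in regions a semantic segmentation marks as unreliable (for example sky or cars) before they reach the SLAM pipeline. The sketch below illustrates that filtering step under stated assumptions: the mask layout (`mask[row][col]` is `True` for excluded pixels), the `(x, y)` keypoint format, and the function name `filter_keypoints` are hypothetical and not taken from the paper.

```python
import math


def filter_keypoints(keypoints, mask):
    """Keep only keypoints that land on pixels not excluded by the mask.

    keypoints: list of (x, y) image coordinates.
    mask: 2-D list of booleans; True marks a pixel to exclude (assumed layout).
    Keypoints outside the image bounds are dropped as well.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    kept = []
    for x, y in keypoints:
        row, col = math.floor(y), math.floor(x)
        inside = 0 <= row < height and 0 <= col < width
        if inside and not mask[row][col]:
            kept.append((x, y))
    return kept


# Toy 3x4 image whose top row is labeled as an excluded class such as sky.
toy_mask = [
    [True, True, True, True],
    [False, False, False, False],
    [False, False, False, False],
]
toy_keypoints = [(0.5, 0.2), (1.0, 1.5), (3.9, 2.0), (5.0, 1.0)]
kept_keypoints = filter_keypoints(toy_keypoints, toy_mask)
```

Here the first keypoint is removed because it lies in the masked row and the last because it falls outside the image, leaving two points for downstream matching.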

FoodLog: Multimedia Tool for Healthcare Applications

OPENALEX - Publications

Kiyoharu Aizawa Makoto Ogawa

FoodLog is a multimedia food-recording tool that offers a novel method for recording daily food intake, primarily for healthcare purposes. Its use of image-processing techniques presents significant potential for the development of new monitoring apps.

10.1109/mmul.2015.39 article EN IEEE Multimedia 2015-04-01

Efficient Optimization of Convolutional Neural Networks Using Particle Swarm Optimization

OPENALEX - Publications

Toshihiko Yamasaki Takuto Honma Kiyoharu Aizawa

This work presents methods to automatically find optimal parameter settings for convolutional neural networks (CNNs) by using an evolutionary algorithm called particle swarm optimization (PSO). Even though the parameter space is extremely large (> 10^20), we experimentally show that a better parameter setting can be found for the Alexnet configuration for five different image datasets. We have also developed two candidate...

10.1109/bigmm.2017.69 article EN 2017-04-01
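
For readers unfamiliar with PSO, the search procedure named in the abstract above can be sketched as follows. This is a generic textbook-style PSO over a continuous space, not the authors' method: the objective `toy_validation_error` stands in for training and validating a CNN, and the two parameters (log learning rate and dropout), the bounds, and the coefficient values are assumptions chosen for illustration.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iters=100,
                 inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Minimize `objective` over box `bounds` with a basic particle swarm."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]               # each particle's best position
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]   # swarm-wide best so far
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity blends momentum, pull to own best, pull to swarm best.
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (g_pos[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val


def toy_validation_error(params):
    """Hypothetical stand-in for a CNN's validation error (minimum at -3, 0.5)."""
    log_lr, dropout = params
    return (log_lr + 3.0) ** 2 + (dropout - 0.5) ** 2


search_bounds = [(-6.0, 0.0), (0.0, 1.0)]
best_params, best_error = pso_minimize(toy_validation_error, search_bounds)
```

In a real setting each objective evaluation would be a full training run, so the number of particles and iterations would be far smaller than in this toy example.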

Spatio-Temporal Vector of Locally Max Pooled Features for Action Recognition in Videos

OPENALEX - Publications

Ionuţ Cosmin Duţă Bogdan Ionescu Kiyoharu Aizawa Nicu Sebe

We introduce Spatio-Temporal Vector of Locally Max Pooled Features (ST-VLMPF), a super vector-based encoding method specifically designed for local deep features encoding. The proposed method addresses an important problem of video understanding: how to build a video representation that incorporates the CNN features over the entire video. Feature assignment is carried out at two levels, by using the similarity and spatio-temporal information. For each assignment we build a specific encoding, focused on the nature of deep features, with the goal to capture the highest...

10.1109/cvpr.2017.341 article EN 2017-07-01

Significance of Softmax-based Features in Comparison to Distance Metric Learning-based Features

OPENALEX - Publications

Shota Horiguchi Daiki Ikami Kiyoharu Aizawa

End-to-end distance metric learning (DML) has been applied to obtain features useful in many computer vision tasks. However, these DML studies have not provided equitable comparisons between features extracted from DML-based networks and softmax-based networks. In this paper, we present objective comparisons between these two approaches under the same network architecture.

10.1109/tpami.2019.2911075 article EN cc-by IEEE Transactions on Pattern Analysis and Machine Intelligence 2019-01-01

cGAN-Based Manga Colorization Using a Single Training Image

OPENALEX - Publications

Paulina Hensman Kiyoharu Aizawa

The Japanese comic format known as Manga is popular all over the world. It is traditionally produced in black and white, and colorization is time consuming and costly. Automatic colorization methods generally rely on greyscale values, which are not present in manga. Furthermore, due to copyright protection, colorized manga available for training is scarce. We propose a manga colorization method based on conditional Generative Adversarial Networks (cGAN). Unlike previous cGAN approaches that use many hundreds or thousands of training images, our method requires...

10.1109/icdar.2017.295 article EN 2017-11-01

Object-Aware Instance Labeling for Weakly Supervised Object Detection

OPENALEX - Publications

Satoshi Kosugi Toshihiko Yamasaki Kiyoharu Aizawa

Weakly supervised object detection (WSOD), where a detector is trained with only image-level annotations, is attracting more and more attention. As a method to obtain a well-performing detector, the detector and the instance labels are updated iteratively. In this study, for more efficient iterative updating, we focus on the instance labeling problem, a problem of which label should be annotated to each region based on the last localization result. Instead of simply labeling the top-scoring region and its highly overlapping regions as positive and others as negative, we propose...

10.1109/iccv.2019.00616 article EN 2019 IEEE/CVF International Conference on Computer Vision (ICCV) 2019-10-01