Masanori Suganuma

ORCID: 0000-0002-1469-9663
Research Areas
  • Advanced Image Processing Techniques
  • Multimodal Machine Learning Applications
  • Advanced Image and Video Retrieval Techniques
  • Advanced Vision and Imaging
  • Image Processing Techniques and Applications
  • Image Enhancement Techniques
  • Anomaly Detection Techniques and Applications
  • Advanced Neural Network Applications
  • Evolutionary Algorithms and Applications
  • Domain Adaptation and Few-Shot Learning
  • Human Pose and Action Recognition
  • Image and Signal Denoising Methods
  • COVID-19 diagnosis using AI
  • Reinforcement Learning in Robotics
  • Machine Learning and Data Classification
  • Metaheuristic Optimization Algorithms Research
  • Video Analysis and Summarization
  • Robotics and Sensor-Based Localization
  • Cell Image Analysis Techniques
  • Generative Adversarial Networks and Image Synthesis
  • Video Surveillance and Tracking Methods
  • Visual Attention and Saliency Detection
  • Data-Driven Disease Surveillance
  • Infrastructure Maintenance and Monitoring
  • Structural Health Monitoring Techniques

Tohoku University
2018-2025

RIKEN Center for Advanced Intelligence Project
2018-2025

Yokohama National University
2013-2018

The convolutional neural network (CNN), which is one of the deep learning models, has seen much success in a variety of computer vision tasks. However, designing CNN architectures still requires expert knowledge and a lot of trial and error. In this paper, we attempt to automatically construct CNN architectures for an image classification task based on Cartesian genetic programming (CGP). In our method, we adopt highly functional modules, such as convolutional blocks and tensor concatenation, as the node functions in CGP. The CNN structure and connectivity...

10.1145/3071178.3071229 article EN Proceedings of the Genetic and Evolutionary Computation Conference 2017-06-30

In this paper, we study the design of deep neural networks for tasks of image restoration. We propose a novel style of residual connections dubbed "dual residual connection", which exploits the potential of paired operations, e.g., up- and down-sampling or convolution with large- and small-size kernels. We design a modular block implementing this connection style; it is equipped with two containers into which arbitrary paired operations are inserted. Adopting the "unraveled" view of residual networks proposed by Veit et al., we point out that a stack of the proposed modular blocks allows the first operation in...

10.1109/cvpr.2019.00717 article EN IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019-06-01

We propose a method for designing convolutional neural network (CNN) architectures based on Cartesian genetic programming (CGP). In the proposed method, the architectures of CNNs are represented by directed acyclic graphs, in which each node represents a highly functional module such as a convolutional block or a tensor operation, and each edge represents the connectivity of layers. The architecture is optimized by an evolutionary algorithm to maximize the classification accuracy on a validation dataset. We show that the proposed method can find competitive CNN architectures compared with...

10.24963/ijcai.2018/755 article EN 2018-07-01
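
The graph encoding described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the genotype layout, the module names ("conv_block", "concat", and so on), and the helper active_nodes are all hypothetical, chosen only to show how a feed-forward CGP genotype decodes into a directed acyclic graph in which some nodes end up unused.

```python
from typing import List, Tuple

# Hypothetical gene format: (module_name, indices_of_input_nodes).
# A node may only read from nodes with a smaller index, so the graph is acyclic.
Gene = Tuple[str, Tuple[int, ...]]


def active_nodes(genotype: List[Gene], output: int) -> List[int]:
    """Return the nodes reachable backwards from the output node.

    Because every node only reads from earlier nodes, the sorted index list
    is also a valid topological (execution) order.
    """
    seen = set()
    stack = [output]
    while stack:
        idx = stack.pop()
        if idx in seen:
            continue
        seen.add(idx)
        stack.extend(genotype[idx][1])
    return sorted(seen)


# Illustrative genotype (assumed, not taken from the paper).
genotype: List[Gene] = [
    ("input", ()),          # node 0: the image
    ("conv_block", (0,)),   # node 1
    ("conv_block", (0,)),   # node 2
    ("pool", (1,)),         # node 3: not connected to the output
    ("concat", (1, 2)),     # node 4
    ("softmax", (4,)),      # node 5: output node
]

order = active_nodes(genotype, output=5)
for idx in order:
    name, inputs = genotype[idx]
    print(f"node {idx}: {name} <- {list(inputs)}")
```

Node 3 never feeds the output, so it is skipped when the network is built. In an evolutionary search, mutating such inactive genes changes the genotype without changing the decoded architecture.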

Many studies have been conducted so far on image restoration, the problem of restoring a clean image from its distorted version. There are many different types of distortion affecting image quality. Previous studies have focused on single types of distortion, proposing methods for removing them. However, image quality degrades due to multiple factors in the real world. Thus, depending on applications, e.g., vision for autonomous cars or surveillance cameras, we need to be able to deal with multiple combined distortions with unknown mixture ratios. For this purpose,...

10.1109/cvpr.2019.00925 article EN IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019-06-01

The convolutional neural network (CNN), one of the deep learning models, has demonstrated outstanding performance in a variety of computer vision tasks. However, as network architectures become deeper and more complex, designing CNN architectures requires more expert knowledge and trial and error. In this article, we attempt to automatically construct high-performing CNN architectures for a given task. Our method uses Cartesian genetic programming (CGP) to encode the CNN architectures, adopting highly functional modules such as a convolutional block and tensor...

10.1162/evco_a_00253 article EN cc-by-nc Evolutionary Computation 2019-03-22

The convolutional neural network (CNN), which is one of the deep learning models, has seen much success in a variety of computer vision tasks. However, designing CNN architectures still requires expert knowledge and a lot of trial and error. In this paper, we attempt to automatically construct CNN architectures for an image classification task based on Cartesian genetic programming (CGP). In our method, we adopt highly functional modules, such as convolutional blocks and tensor concatenation, as the node functions in CGP. The CNN structure and connectivity...

10.48550/arxiv.1704.00764 preprint EN other-oa arXiv (Cornell University) 2017-01-01

This paper explores the application of visual question answering (VQA) in bridge inspection using recent advancements in multimodal artificial intelligence (AI) systems. VQA involves an AI model providing natural language answers to questions about the content of an input image. However, applying VQA to bridge inspection poses challenges due to the high cost of creating training data that requires expert knowledge. To address this, we propose leveraging existing bridge inspection reports, which already include image–text pairs, as external...

10.1111/mice.13086 article EN cc-by-nc-nd Computer-Aided Civil and Infrastructure Engineering 2023-08-18

Computer vision has become increasingly prevalent in solving real-world problems across diverse domains, including smart agriculture, fishery, and livestock management. These applications may not require processing many image frames per second, leading practitioners to use single-board computers (SBCs). Although many lightweight networks have been developed for "mobile/edge" devices, they primarily target smartphones with more powerful processors, not SBCs with low-end CPUs. This paper introduces a...

10.1109/wacv57701.2024.00116 article EN IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2024-01-03

The application of Multi-modal Large Language Models (MLLMs) in Autonomous Driving (AD) faces significant challenges due to their limited training on traffic-specific data and the absence of dedicated benchmarks for spatiotemporal understanding. This study addresses these issues by proposing TB-Bench, a comprehensive benchmark designed to evaluate MLLMs on understanding traffic behaviors across eight perception tasks from ego-centric views. We also introduce vision-language instruction tuning...

10.48550/arxiv.2501.05733 preprint EN arXiv (Cornell University) 2025-01-10

10.1109/wacv61041.2025.00273 article EN IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2025-02-26

10.1109/wacv61041.2025.00453 article EN IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2025-02-26

Climate change exacerbates natural disasters, demanding rapid damage and risk assessment. However, expert-reliant analyses delay responses despite drone-aided data collection. This study develops and compares multimodal AI approaches using advanced large language models (LLMs) for expert-level landslide image analysis. We tackle landslide-specific challenges: capturing nuanced geotechnical reasoning beyond digitization (specific to geological features assessment), developing specialized...

10.1111/mice.13482 article EN cc-by-nc Computer-Aided Civil and Infrastructure Engineering 2025-04-11

Previous studies on unsupervised industrial anomaly detection mainly focus on 'structural' types of anomalies, such as cracks and color contamination, by matching or learning local feature representations. While achieving significantly high detection performance on this kind of anomaly, they are faced with 'logical' anomalies that violate long-range dependencies, such as a normal object placed in the wrong position. Noting that reverse distillation approaches under the encoder-decoder paradigm could learn from high abstract level knowledge, we...

10.1109/wacv57701.2024.00022 article EN IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2024-01-03

There is a growing interest in the community in making an embodied AI agent perform a complicated task while interacting with an environment following natural language directives. Recent studies have tackled the problem using ALFRED, a well-designed dataset for the task, but achieved only very low accuracy. This paper proposes a new method, which outperforms the previous methods by a large margin. It is based on a combination of several new ideas. One is a two-stage interpretation of the provided instructions. The method first selects...

10.24963/ijcai.2021/128 article EN 2021-08-01

The application of computer algorithms to identify patterns in data is referred to as machine learning. The algorithms are used to learn complex relationships and build models for various predictions. Herein, the k-means method, one of the unsupervised learning methods of machine learning, is used to predict Young's modulus and ultimate tensile strength (UTS) of carbon-fiber-reinforced polymers (CFRPs), and their experimental Young's modulus and UTS values are compared. The k-means method categorizes CFRP into four colors: carbon fiber, epoxy resin matrix, defects, and contamination....

10.1002/adem.202101072 article EN Advanced Engineering Materials 2022-02-18
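
As an illustration of the clustering step mentioned in this abstract, the following is a minimal sketch of Lloyd's k-means on scalar pixel intensities with four clusters. It is not the pipeline used in the paper: the function kmeans_1d, the evenly spaced initialization, and the sample intensities are assumptions made for a small, deterministic example.

```python
from typing import List, Tuple


def kmeans_1d(values: List[float], k: int, iters: int = 50) -> Tuple[List[float], List[int]]:
    """Plain Lloyd's algorithm on scalar values (requires k >= 2).

    Centers are initialized evenly between the minimum and maximum value,
    which keeps the example deterministic.
    """
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: nearest center for every value.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Update step: mean of each cluster (empty clusters keep their center).
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


# Hypothetical grayscale intensities standing in for pixels of a CFRP image.
pixels = [10, 12, 11, 90, 95, 92, 170, 168, 172, 250, 248, 252]
centers, labels = kmeans_1d(pixels, k=4)

# Fraction of pixels assigned to each of the four clusters.
fractions = [labels.count(c) / len(labels) for c in range(4)]
print(centers, labels, fractions)
```

On a real image the same idea would typically run on color vectors rather than scalars, and the cluster fractions (for example, the share of pixels labeled as fiber or defect) could then serve as inputs to a property estimate.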

In this paper, we design a hierarchical feature construction method for image classification. Our method has two stages: (1) feature construction by a combination of primitive image processing filters, and (2) feature construction by a combination of evolved filters. We verify the classification performance of the proposed method on the MIT urban and nature scene dataset. The experimental results show that the two-stage feature construction improves the classification accuracy compared to single-stage construction. In addition, the proposed method outperforms several existing methods.

10.1109/smc.2016.7844436 article EN IEEE International Conference on Systems, Man, and Cybernetics (SMC) 2016-10-01

Convolution is an equivariant operation, and image position does not affect its result. A recent study shows that the zero-padding employed in convolutional layers of CNNs provides position information to the CNNs. The study further claims that the position information enables accurate inference for several tasks, such as object recognition, segmentation, etc. However, there is a technical issue with the design of the experiments of the study, and thus the correctness of the claim is yet to be verified. Moreover, the absolute image position may not be essential for the segmentation of natural images, in which target...

10.48550/arxiv.2005.03463 preprint EN other-oa arXiv (Cornell University) 2020-01-01
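
The role of zero-padding discussed in this abstract can be seen in a toy one-dimensional example. This sketch is only an illustration, not an experiment from the paper: conv1d_same is a hypothetical helper that applies a sliding-window sum with zero padding so that the output has the same length as the input.

```python
from typing import List, Sequence


def conv1d_same(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Cross-correlation with zero padding; output length equals input length."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [
        sum(w * padded[i + j] for j, w in enumerate(kernel))
        for i in range(len(signal))
    ]


kernel = [1.0, 1.0, 1.0]

# A constant input carries no positional cue, yet the borders respond differently
# because the filter window overlaps the zero padding there.
flat = conv1d_same([1.0] * 5, kernel)

# Away from the borders the operation is shift-equivariant:
# moving the impulse one step right moves the response one step right.
left = conv1d_same([0, 0, 1, 0, 0, 0, 0], kernel)
right = conv1d_same([0, 0, 0, 1, 0, 0, 0], kernel)

print(flat)
print(left)
print(right)
```

The reduced response at the two ends of the constant signal is the kind of boundary cue from which later layers could, in principle, infer absolute position.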

Semantic segmentation requires a lot of training data, which necessitates costly annotation. There have been many studies on unsupervised domain adaptation (UDA) from one domain to another, e.g., from computer graphics to real images. However, there is still a gap in accuracy between UDA and supervised training on native domain data. It is arguably attributable to the class-level misalignment between the source and target domain data. To cope with this, we propose a method that applies adversarial training to align two feature distributions in the target domain. It uses self-training...

10.1016/j.cviu.2023.103743 article EN cc-by-nc-nd Computer Vision and Image Understanding 2023-06-10

Recent studies on visual anomaly detection (AD) of industrial objects/textures have achieved quite good performance. They consider an unsupervised setting, specifically the one-class setting, in which we assume the availability of a set of normal (i.e., anomaly-free) images for training. In this paper, we consider a more challenging scenario of unsupervised AD, in which we detect anomalies in a given set of images that might contain both normal and anomalous samples. The setting does not assume the availability of known normal data and thus is completely free from human annotation, which differs from the standard AD...

10.1007/s00138-024-01511-9 article EN cc-by Machine Vision and Applications 2024-03-01