NFDI4DS | UHH-SEMS - Publication Details

Masanori Suganuma

ORCID: 0000-0002-1469-9663

About

Contact & Profiles

A5042810984

Research Areas

Advanced Image Processing Techniques
Multimodal Machine Learning Applications
Advanced Image and Video Retrieval Techniques
Advanced Vision and Imaging
Image Processing Techniques and Applications
Image Enhancement Techniques
Anomaly Detection Techniques and Applications
Advanced Neural Network Applications
Evolutionary Algorithms and Applications
Domain Adaptation and Few-Shot Learning
Human Pose and Action Recognition
Image and Signal Denoising Methods
COVID-19 diagnosis using AI
Reinforcement Learning in Robotics
Machine Learning and Data Classification
Metaheuristic Optimization Algorithms Research
Video Analysis and Summarization
Robotics and Sensor-Based Localization
Cell Image Analysis Techniques
Generative Adversarial Networks and Image Synthesis
Video Surveillance and Tracking Methods
Visual Attention and Saliency Detection
Data-Driven Disease Surveillance
Infrastructure Maintenance and Monitoring
Structural Health Monitoring Techniques

Tohoku University
2018-2025

RIKEN Center for Advanced Intelligence Project
2018-2025

Yokohama National University
2013-2018

A genetic programming approach to designing convolutional neural network architectures

OPENALEX - Publications

Masanori Suganuma Shinichi Shirakawa Tomoharu Nagao

The convolutional neural network (CNN), which is one of the deep learning models, has seen much success in a variety of computer vision tasks. However, designing CNN architectures still requires expert knowledge and a lot of trial and error. In this paper, we attempt to automatically construct CNN architectures for an image classification task based on Cartesian genetic programming (CGP). In our method, we adopt highly functional modules, such as convolutional blocks and tensor concatenation, as the node functions in CGP. The CNN structure and connectivity...

10.1145/3071178.3071229 article EN Proceedings of the Genetic and Evolutionary Computation Conference 2017-06-30

Dual Residual Networks Leveraging the Potential of Paired Operations for Image Restoration

OPENALEX - Publications

Xing Liu Masanori Suganuma Zhun Sun Takayuki Okatani

In this paper, we study the design of deep neural networks for tasks of image restoration. We propose a novel style of residual connections dubbed "dual residual connection", which exploits the potential of paired operations, e.g., up- and down-sampling or convolution with large- and small-size kernels. We design a modular block implementing this connection style; it is equipped with two containers to which arbitrary paired operations are inserted. Adopting the "unraveled" view of the residual networks proposed by Veit et al., we point out that a stack of the proposed modular blocks allows the first operation in...

10.1109/cvpr.2019.00717 article EN 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019-06-01

A Genetic Programming Approach to Designing Convolutional Neural Network Architectures

OPENALEX - Publications

Masanori Suganuma Shinichi Shirakawa Tomoharu Nagao

We propose a method for designing convolutional neural network (CNN) architectures based on Cartesian genetic programming (CGP). In the proposed method, the architectures of CNNs are represented by directed acyclic graphs, in which each node represents highly-functional modules such as convolutional blocks and tensor operations, and each edge represents the connectivity of layers. The architecture is optimized to maximize the classification accuracy for a validation dataset by an evolutionary algorithm. We show that the proposed method can find competitive CNN architectures compared with...

10.24963/ijcai.2018/755 article EN 2018-07-01

Attention-Based Adaptive Selection of Operations for Image Restoration in the Presence of Unknown Combined Distortions

OPENALEX - Publications

Masanori Suganuma Xing Liu Takayuki Okatani

Many studies have been conducted so far on image restoration, the problem of restoring a clean image from its distorted version. There are many different types of distortion affecting image quality. Previous studies have focused on single types of distortion, proposing methods for removing them. However, image quality degrades due to multiple factors in the real world. Thus, depending on applications, e.g., vision for autonomous cars or surveillance cameras, we need to be able to deal with multiple combined distortions with unknown mixture ratios. For this purpose,...

10.1109/cvpr.2019.00925 article EN 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019-06-01

Evolution of Deep Convolutional Neural Networks Using Cartesian Genetic Programming

OPENALEX - Publications

Masanori Suganuma Masayuki Kobayashi Shinichi Shirakawa Tomoharu Nagao

The convolutional neural network (CNN), one of the deep learning models, has demonstrated outstanding performance in a variety of computer vision tasks. However, as the network architectures become deeper and more complex, designing CNN architectures requires more expert knowledge and trial and error. In this article, we attempt to automatically construct high-performing CNN architectures for a given task. Our method uses Cartesian genetic programming (CGP) to encode the CNN architectures, adopting highly functional modules such as a convolutional block and tensor...

10.1162/evco_a_00253 article EN cc-by-nc Evolutionary Computation 2019-03-22

A Genetic Programming Approach to Designing Convolutional Neural Network Architectures

OPENALEX - Publications

Masanori Suganuma Shinichi Shirakawa Tomoharu Nagao

10.48550/arxiv.1704.00764 preprint EN other-oa arXiv (Cornell University) 2017-01-01

Improving visual question answering for bridge inspection by pre‐training with external data of image–text pairs

OPENALEX - Publications

Thannarot Kunlamai Tatsuro Yamane Masanori Suganuma Pang-jo Chun Takayuki Okatani

This paper explores the application of visual question answering (VQA) in bridge inspection using recent advancements in multimodal artificial intelligence (AI) systems. VQA involves an AI model providing natural language answers to questions about the content of an input image. However, applying VQA to bridge inspection poses challenges due to the high cost of creating training data that requires expert knowledge. To address this, we propose leveraging existing bridge inspection reports, which already include image–text pairs, as external...

10.1111/mice.13086 article EN cc-by-nc-nd Computer-Aided Civil and Infrastructure Engineering 2023-08-18

SBCFormer: Lightweight Network Capable of Full-size ImageNet Classification at 1 FPS on Single Board Computers

OPENALEX - Publications

Xiangyong Lu Masanori Suganuma Takayuki Okatani

Computer vision has become increasingly prevalent in solving real-world problems across diverse domains, including smart agriculture, fishery, and livestock management. These applications may not require processing many image frames per second, leading practitioners to use single board computers (SBCs). Although many lightweight networks have been developed for "mobile/edge" devices, they primarily target smartphones with more powerful processors and not SBCs with the low-end CPUs. This paper introduces a...

10.1109/wacv57701.2024.00116 article EN 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2024-01-03

TB-Bench: Training and Testing Multi-Modal AI for Understanding Spatio-Temporal Traffic Behaviors from Dashcam Images/Videos

OPENALEX - Publications

Korawat Charoenpitaks Van‐Quang Nguyen Masanori Suganuma Arai Kentaro S. Totsuka and 2 more

The application of Multi-modal Large Language Models (MLLMs) in Autonomous Driving (AD) faces significant challenges due to their limited training on traffic-specific data and the absence of dedicated benchmarks for spatiotemporal understanding. This study addresses these issues by proposing TB-Bench, a comprehensive benchmark designed to evaluate MLLMs on understanding traffic behaviors across eight perception tasks from ego-centric views. We also introduce vision-language instruction tuning...

10.48550/arxiv.2501.05733 preprint EN arXiv (Cornell University) 2025-01-10

RefVSR++: Exploiting Reference Inputs for Reference-based Video Super-resolution

OPENALEX - Publications

Han Zou Masanori Suganuma Takayuki Okatani

10.1109/wacv61041.2025.00273 article EN 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2025-02-26

Inverting the Generation Process of Denoising Diffusion Implicit Models: Empirical Evaluation and a Novel Method

OPENALEX - Publications

Yan Zeng Masanori Suganuma Takayuki Okatani

10.1109/wacv61041.2025.00453 article EN 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2025-02-26

Multimodal artificial intelligence approaches using large language models for expert‐level landslide image analysis

OPENALEX - Publications

Kittitouch Areerob Van‐Quang Nguyen Xianfeng Li Shogo Inadomi Toru Shimada and 6 more

Climate change exacerbates natural disasters, demanding rapid damage and risk assessment. However, expert‐reliant analyses delay responses despite drone‐aided data collection. This study develops and compares multimodal AI approaches using advanced large language models (LLMs) for expert‐level landslide image analysis. We tackle landslide‐specific challenges: capturing nuanced geotechnical reasoning beyond digitization (specific to geological features assessment), developing specialized...

10.1111/mice.13482 article EN cc-by-nc Computer-Aided Civil and Infrastructure Engineering 2025-04-11

Contextual Affinity Distillation for Image Anomaly Detection

OPENALEX - Publications

Mingjie Zhang Masanori Suganuma Takayuki Okatani

Previous studies on unsupervised industrial anomaly detection mainly focus on 'structural' types of anomalies such as cracks and color contamination by matching or learning local feature representations. While achieving significantly high detection performance on this kind of anomaly, they are faced with 'logical' anomalies that violate the long-range dependencies, such as a normal object placed in the wrong position. Noting that reverse distillation approaches under the encoder-decoder paradigm could learn from high abstract level knowledge, we...

10.1109/wacv57701.2024.00022 article EN 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2024-01-03

Look Wide and Interpret Twice: Improving Performance on Interactive Instruction-following Tasks

OPENALEX - Publications

Van-Quang Nguyen Masanori Suganuma Takayuki Okatani

There is a growing interest in the community in making an embodied AI agent perform a complicated task while interacting with an environment following natural language directives. Recent studies have tackled the problem using ALFRED, a well-designed dataset for the task, but achieved only very low accuracy. This paper proposes a new method, which outperforms the previous methods by a large margin. It is based on a combination of several new ideas. One is a two-stage interpretation of the provided instructions. The method first selects...

10.24963/ijcai.2021/128 article EN 2021-08-01

k‐Means Clustering for Prediction of Tensile Properties in Carbon Fiber‐Reinforced Polymer Composites

OPENALEX - Publications

Hiroki Kurita Masanori Suganuma Yinli Wang Fumio Narita

The application of computer algorithms to identify patterns in data is referred to as machine learning. The algorithms are used to learn complex relationships and build models for various predictions. Herein, the k‐means method, one of the unsupervised learning methods in machine learning, is used to predict the Young's modulus and ultimate tensile strength (UTS) of carbon‐fiber‐reinforced polymers (CFRPs), and the predictions are compared with their experimental Young's modulus and UTS values. The k‐means method categorizes CFRP into four colors: carbon fiber, epoxy resin matrix, defects, contamination....

10.1002/adem.202101072 article EN Advanced Engineering Materials 2022-02-18

Hierarchical feature construction for image classification using Genetic Programming

OPENALEX - Publications

Masanori Suganuma Tsuchiya Daiki Shinichi Shirakawa Tomoharu Nagao

In this paper, we design a hierarchical feature construction method for image classification. Our method has two stages: (1) feature construction by combination of primitive image processing filters, and (2) feature construction by combination of evolved filters. We verify the classification performance of the proposed method on the MIT urban and nature scene dataset. The experimental results show that the two-stage feature construction improves the classification accuracy compared to single stage construction. In addition, the proposed method outperforms several existing methods.

10.1109/smc.2016.7844436 article EN 2016 IEEE International Conference on Systems, Man, and Cybernetics (SMC) 2016-10-01

How Can CNNs Use Image Position for Segmentation?

OPENALEX - Publications

Rito Murase Masanori Suganuma Takayuki Okatani

Convolution is an equivariant operation, and image position does not affect its result. A recent study shows that the zero-padding employed in convolutional layers of CNNs provides position information to the CNNs. The study further claims that the position information enables accurate inference for several tasks, such as object recognition, segmentation, etc. However, there is a technical issue with the design of the experiments of the study, and thus the correctness of the claim is yet to be verified. Moreover, the absolute image position may not be essential for the segmentation of natural images, in which target...

10.48550/arxiv.2005.03463 preprint EN other-oa arXiv (Cornell University) 2020-01-01

Unsupervised domain adaptation for semantic segmentation via cross-region alignment

OPENALEX - Publications

Zhijie Wang Xing Liu Masanori Suganuma Takayuki Okatani

Semantic segmentation requires a lot of training data, which necessitates costly annotation. There have been many studies on unsupervised domain adaptation (UDA) from one domain to another, e.g., from computer graphics to real images. However, there is still a gap in accuracy between UDA and supervised training on native domain data. It is arguably attributable to the class-level misalignment between the source and target domain data. To cope with this, we propose a method that applies adversarial training to align two feature distributions in the target domain. It uses a self-training...

10.1016/j.cviu.2023.103743 article EN cc-by-nc-nd Computer Vision and Image Understanding 2023-06-10

That’s BAD: blind anomaly detection by implicit local feature clustering

OPENALEX - Publications

Mingjie Zhang Masanori Suganuma Takayuki Okatani

Recent studies on visual anomaly detection (AD) of industrial objects/textures have achieved quite good performance. They consider an unsupervised setting, specifically the one-class setting, in which we assume the availability of a set of normal (i.e., anomaly-free) images for training. In this paper, we consider a more challenging scenario of unsupervised AD, in which we detect anomalies in a given set of images that might contain both normal and anomalous samples. The setting does not assume the availability of known normal data and thus is completely free from human annotation, which differs from the standard AD...

10.1007/s00138-024-01511-9 article EN cc-by Machine Vision and Applications 2024-03-01
