- Advanced Image Processing Techniques
- Multimodal Machine Learning Applications
- Advanced Image and Video Retrieval Techniques
- Advanced Vision and Imaging
- Image Processing Techniques and Applications
- Image Enhancement Techniques
- Anomaly Detection Techniques and Applications
- Advanced Neural Network Applications
- Evolutionary Algorithms and Applications
- Domain Adaptation and Few-Shot Learning
- Human Pose and Action Recognition
- Image and Signal Denoising Methods
- COVID-19 diagnosis using AI
- Reinforcement Learning in Robotics
- Machine Learning and Data Classification
- Metaheuristic Optimization Algorithms Research
- Video Analysis and Summarization
- Robotics and Sensor-Based Localization
- Cell Image Analysis Techniques
- Generative Adversarial Networks and Image Synthesis
- Video Surveillance and Tracking Methods
- Visual Attention and Saliency Detection
- Data-Driven Disease Surveillance
- Infrastructure Maintenance and Monitoring
- Structural Health Monitoring Techniques
Tohoku University
2018-2025
RIKEN Center for Advanced Intelligence Project
2018-2025
Yokohama National University
2013-2018
The convolutional neural network (CNN), which is one of the deep learning models, has seen much success in a variety of computer vision tasks. However, designing CNN architectures still requires expert knowledge and a lot of trial and error. In this paper, we attempt to automatically construct CNN architectures for an image classification task based on Cartesian genetic programming (CGP). In our method, we adopt highly functional modules, such as convolutional blocks and tensor concatenation, as the node functions in CGP. The CNN structure and connectivity...
In this paper, we study the design of deep neural networks for tasks of image restoration. We propose a novel style of residual connections dubbed "dual residual connection", which exploits the potential of paired operations, e.g., up- and down-sampling or convolution with large- and small-size kernels. We design a modular block implementing this connection style; it is equipped with two containers to which arbitrary paired operations are inserted. Adopting the "unraveled" view of the residual networks proposed by Veit et al., we point out that a stack of the proposed modular blocks allows the first operation in...
We propose a method for designing convolutional neural network (CNN) architectures based on Cartesian genetic programming (CGP). In the proposed method, the architectures of CNNs are represented by directed acyclic graphs, in which each node represents highly-functional modules such as convolutional blocks and tensor operations, and each edge represents the connectivity of layers. The architecture is optimized to maximize the classification accuracy for a validation dataset by an evolutionary algorithm. We show that the proposed method can find competitive CNN architectures compared with...
Many studies have been conducted so far on image restoration, the problem of restoring a clean image from its distorted version. There are many different types of distortion affecting image quality. Previous studies have focused on single types of distortion, proposing methods for removing them. However, image quality degrades due to multiple factors in the real world. Thus, depending on applications, e.g., vision for autonomous cars or surveillance cameras, we need to be able to deal with multiple combined distortions with unknown mixture ratios. For this purpose,...
The convolutional neural network (CNN), one of the deep learning models, has demonstrated outstanding performance in a variety of computer vision tasks. However, as the network architectures become deeper and more complex, designing CNN architectures requires more expert knowledge and trial and error. In this article, we attempt to automatically construct high-performing CNN architectures for a given task. Our method uses Cartesian genetic programming (CGP) to encode the CNN architectures, adopting highly functional modules such as a convolutional block and tensor...
This paper explores the application of visual question answering (VQA) in bridge inspection using recent advancements in multimodal artificial intelligence (AI) systems. VQA involves an AI model providing natural language answers to questions about the content of an input image. However, applying VQA to bridge inspection poses challenges due to the high cost of creating training data that requires expert knowledge. To address this, we propose leveraging existing bridge inspection reports, which already include image–text pairs, as external...
Computer vision has become increasingly prevalent in solving real-world problems across diverse domains, including smart agriculture, fishery, and livestock management. These applications may not require processing many image frames per second, leading practitioners to use single board computers (SBCs). Although lightweight networks have been developed for "mobile/edge" devices, they primarily target smartphones with more powerful processors and not SBCs with the low-end CPUs. This paper introduces a...
The application of Multi-modal Large Language Models (MLLMs) in Autonomous Driving (AD) faces significant challenges due to their limited training on traffic-specific data and the absence of dedicated benchmarks for spatiotemporal understanding. This study addresses these issues by proposing TB-Bench, a comprehensive benchmark designed to evaluate MLLMs on understanding traffic behaviors across eight perception tasks from ego-centric views. We also introduce vision-language instruction tuning...
Climate change exacerbates natural disasters, demanding rapid damage and risk assessment. However, expert‐reliant analyses delay responses despite drone‐aided data collection. This study develops and compares multimodal AI approaches using advanced large language models (LLMs) for expert‐level landslide image analysis. We tackle landslide‐specific challenges: capturing nuanced geotechnical reasoning beyond digitization (specific to geological features assessment), developing specialized...
Previous studies on unsupervised industrial anomaly detection mainly focus on 'structural' types of anomalies such as cracks and color contamination by matching or learning local feature representations. While achieving significantly high detection performance on this kind of anomaly, they are faced with 'logical' anomalies that violate the long-range dependencies, such as a normal object placed in the wrong position. Noting that reverse distillation approaches under the encoder-decoder paradigm could learn from abstract level knowledge, we...
There is a growing interest in the community in making an embodied AI agent perform a complicated task while interacting with an environment following natural language directives. Recent studies have tackled the problem using ALFRED, a well-designed dataset for the task, but achieved only very low accuracy. This paper proposes a new method, which outperforms the previous methods by a large margin. It is based on a combination of several new ideas. One is a two-stage interpretation of the provided instructions. The method first selects...
The application of computer algorithms to identify patterns in data is referred to as machine learning. Machine learning algorithms are used to learn complex relationships and build models for various predictions. Herein, the k‐means method, one of the unsupervised learning methods of machine learning, is used to predict the Young's modulus and ultimate tensile strength (UTS) of carbon‐fiber‐reinforced polymers (CFRPs), and their experimental UTS values are compared. The method categorizes CFRP into four colors: carbon fiber, epoxy resin matrix, defects, and contamination....
In this paper, we design a hierarchical feature construction method for image classification. Our method has two stages: (1) feature construction by combination of primitive image processing filters, and (2) feature construction by combination of evolved filters. We verify the classification performance of the proposed method on the MIT urban and nature scene dataset. The experimental results show that the two-stage feature construction improves the classification accuracy compared to single stage construction. In addition, the proposed method outperforms several existing methods.
Convolution is an equivariant operation, and image position does not affect its result. A recent study shows that the zero-padding employed in convolutional layers of CNNs provides position information to the CNNs. The study further claims that the position information enables accurate inference for several tasks, such as object recognition, segmentation, etc. However, there is a technical issue with the design of the experiments of the study, and thus the correctness of the claim is yet to be verified. Moreover, the absolute image position may not be essential for the segmentation of natural images, in which target...
Semantic segmentation requires a lot of training data, which necessitates costly annotation. There have been many studies on unsupervised domain adaptation (UDA) from one domain to another, e.g., from computer graphics to real images. However, there is still a gap in accuracy between UDA and supervised training on native domain data. It is arguably attributable to the class-level misalignment between the source and target domain data. To cope with this, we propose a method that applies adversarial training to align two feature distributions in the target domain. It uses self-training...
Recent studies on visual anomaly detection (AD) of industrial objects/textures have achieved quite good performance. They consider an unsupervised setting, specifically the one-class setting, in which we assume the availability of a set of normal (i.e., anomaly-free) images for training. In this paper, we consider a more challenging scenario of unsupervised AD, in which we detect anomalies in a given set of images that might contain both normal and anomalous samples. The setting does not assume the availability of known normal data and thus is completely free from human annotation, which differs from the standard AD...