- Speech and Audio Processing
- Music and Audio Processing
- Video Surveillance and Tracking Methods
- Face and Expression Recognition
- Advanced Image and Video Retrieval Techniques
- Advanced Vision and Imaging
- Advanced Adaptive Filtering Techniques
- Underwater Acoustics Research
- Structural Health Monitoring Techniques
- Anomaly Detection Techniques and Applications
- Neural Networks and Applications
- Indoor and Outdoor Localization Technologies
- Direction-of-Arrival Estimation Techniques
- Blind Source Separation Techniques
- Speech Recognition and Synthesis
- Seismology and Earthquake Studies
- Earthquake Detection and Analysis
- Gait Recognition and Analysis
- Human Pose and Action Recognition
- Acoustic Wave Phenomena Research
- Robotic Path Planning Algorithms
- Hearing Loss and Rehabilitation
- Memory and Neural Mechanisms
- Advanced Measurement and Detection Methods
- Robotics and Sensor-Based Localization
Tokyo Institute of Technology
2019-2024
Systems Biology Institute
2024
Institute of Science Tokyo
2024
Honda (Japan)
2019
Erasmus MC
2018
National Institute of Advanced Industrial Science and Technology
2005-2016
This paper addresses the problem of 2D sound source localization using multiple microphone arrays in an outdoor environment. Two main issues exist in such localization. Since localization performance depends on a variety of parameters, the lack of knowledge about how to design the localization system is one of those issues. A thorough analysis with respect to the accuracy of the localization results under different simulation conditions has been performed. The obtained characteristics lead to a discussion of the limitations and applicability of the system. The distinction between...
Support Vector Machines (SVMs), though accurate, are still difficult to solve in large-scale applications due to their computational and storage requirements. To relieve this problem, we propose the RANSAC-SVM method, which trains a number of small SVMs on randomly selected subsets of the training set while tuning their parameters to fit the whole training set. RANSAC-SVM achieves good generalization performance, close to the Bayesian estimation, with a small subset of the training samples, and outperforms the full SVM solution under some conditions.
A novel visual tracking algorithm is proposed in this paper. The algorithm uses pixel-pair features to discriminate between an image patch with the object in the correct position and image patches with the object in an incorrect position. The pixel-pair feature is considered to be robust to illumination change, and also to partial occlusion when appropriate features are selected in every video frame. The tracking precision for a deforming object (a skier) is examined, and a detection method is described.
This paper proposes an environmental sound segmentation method using Mask U-Net. In recent years, human–robot interactions, especially speech dialogue, have been assessed by auditory scene analysis. Methods such as noise reduction, section detection, and source separation have been proposed for robot audition, acoustic signal processing, and machine learning. However, conventional approaches have three drawbacks: (1) Many studies have analyzed individual functions, which are regarded as being a cascade. Cascade...
This paper proposes an environmental sound segmentation method using Mask U-Net. Recent research in robot audition has analyzed noise reduction, section detection, and source separation for use in a real-world environment with many noises and overlaps. However, conventional methods apply the respective functions in cascades. The biggest problem of cascade systems is the accumulation of errors generated at each function block. Although methods for the human voice have been proposed, robots operating in a real-world environment must be able to separate...
This paper proposes a multichannel environmental sound segmentation method. Environmental sound segmentation is an integrated method to achieve sound source localization, sound source separation, and classification simultaneously. When multiple microphones are available, spatial features can be used to improve the localization and separation accuracy of sounds from different directions; however, conventional methods have three drawbacks: (a) Sound source localization and separation using spatial features and classification using spectral features are trained in the same neural network, which may overfit to the relationship...
We examine a classification task in which signals of naturally occurring earthquakes are categorized, ranging from minor to major, based on their magnitude. Generalized to a single-label classification task, most prior investigations have focused on assessing whether an earthquake’s magnitude falls into the minor or large categories. This procedure is often not practical, since the tremor it generates has a wide range of variation in neighboring regions based on distance, depth, type of surface, and several other factors. We present an integrated...
This paper verifies the performance of a method for estimating sound source positions using observations with multiple microphone arrays while calibrating their positions, orientations, and time offsets in a real environment. The conventional method calculates the orientation and time offset independently, and it is difficult to obtain a correct solution when one of the optimizations cannot be performed well. Therefore, a method was proposed that can simultaneously estimate the position, orientation, and time offset by combining two types of objective functions. We evaluate...
This paper presents a method for sound source localization and tracking using drones with microphone arrays. Microphone array processing is an established technique for estimating the direction of a sound source. However, in search and rescue applications, the mere direction is insufficient, since the source location in terms of distance is essential. Accordingly, we propose a novel method that we call Multiple Triangulation and Gaussian Sum Filter Tracking (MT-GSFT). MT-GSFT obtains the source location by triangulation using microphone arrays attached to multiple drones. Since the triangulation results differ among...
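The triangulation step mentioned in this abstract can be illustrated with a minimal 2D sketch. It is not the MT-GSFT method itself: the function name `triangulate_2d`, the flat 2D geometry, and the noise-free bearings are all assumptions made for illustration, and the Gaussian sum filter stage is not shown.

```python
import math

def triangulate_2d(p1, theta1, p2, theta2, eps=1e-9):
    """Intersect two bearing rays (hypothetical helper, 2D only).

    p1, p2: (x, y) positions of two microphone arrays.
    theta1, theta2: direction-of-arrival angles in radians, measured
    from the x-axis. Returns the (x, y) intersection, or None when the
    rays are near-parallel or the intersection lies behind an array.
    """
    d1 = (math.cos(theta1), math.sin(theta1))
    d2 = (math.cos(theta2), math.sin(theta2))
    # 2D cross product of the ray directions; zero means parallel rays.
    cross = d1[0] * d2[1] - d1[1] * d2[0]
    if abs(cross) < eps:
        return None
    rx, ry = p2[0] - p1[0], p2[1] - p1[1]
    # Solve p1 + t1*d1 == p2 + t2*d2 for the ray parameters t1 and t2.
    t1 = (rx * d2[1] - ry * d2[0]) / cross
    t2 = (rx * d1[1] - ry * d1[0]) / cross
    if t1 < 0 or t2 < 0:
        return None
    return (p1[0] + t1 * d1[0], p1[1] + t1 * d1[1])

# Two arrays 10 m apart, both hearing a source located at (5, 5).
estimate = triangulate_2d((0.0, 0.0), math.radians(45),
                          (10.0, 0.0), math.radians(135))
```

With noisy bearings, each pair of arrays would give a slightly different intersection point, which is the situation the abstract goes on to describe.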
This paper proposes a multi-channel environmental sound segmentation method. Environmental sound segmentation is an integrated method that deals with sound source localization, sound source separation, and class identification. When multiple microphones are available, spatial features can be used to improve the separation accuracy of signals from different directions; however, conventional methods have two drawbacks: (a) Since sound source localization and separation using spatial features and class identification using spectral features are trained in the same neural network, it overfits to the relationship between...
We propose a novel framework for reliable automatic classification of earthquake magnitudes. Using deep learning methods, we aim to classify the earthquake magnitudes into different categories. The method is based on a convolutional recurrent neural network in which a new approach to feature extraction using Log-Mel spectrogram representations is applied to seismic signals. The method is able to classify magnitudes from minor to major. The Stanford Earthquake Dataset (STEAD) is used to train and validate the proposed method. The evaluation results demonstrate the efficacy...
This paper presents a novel algorithm for road plane detection from an on-board camera. The algorithm employs the temporal difference of the homography matrix, which is termed differential homography, caused by camera motion. Differential homography is estimated from optical flows in road regions, while using RANSAC to extract the majority flows. The differential homography gives the relationship between an image coordinate (a location in the image) and the optical flow at that location in the image. The proposed algorithm does not require estimation of the homography matrix itself. Therefore, it can be applied without calibration. The...
This paper addresses multi-channel environmental sound segmentation. Unlike traditional sound source separation using spatial information, environmental sound segmentation is a technique that simultaneously detects, separates, and identifies sound sections based on the properties of previously learned sound sources. One such method, a U-Net-based method proposed for image semantic segmentation, has been proposed. However, since this method does not use spatial information, its performance degrades with overlapping sounds. Deeplabv3+ is one of the state-of-the-art methods; however,...
Face tracking continues to be an important topic in computer vision. We describe a tracking algorithm based on a static face detector. Our face detector is a rectangle-feature-based boosted classifier, which outputs the confidence of whether an input image is a face. The function that outputs this confidence, called a score function, contains information about the location of a moving target. A target that has moved will be located in the gradient direction of the score function from the location before moving. Therefore, our tracker will go to the region where the score is maximum using the gradient of the score function. We show that the tracker works by...
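The idea of following a score function uphill toward the target can be sketched as follows. This is only an illustration under stated assumptions: `toy_score` is a hypothetical smooth bump standing in for the boosted classifier's confidence, and `climb` replaces the gradient with a discrete comparison of the eight neighbouring positions.

```python
import math

def toy_score(x, y, target=(12, 7)):
    """Hypothetical stand-in for a detector confidence.

    A real tracker would evaluate the face classifier on the image
    patch at (x, y); here a smooth bump peaks at `target`.
    """
    dx, dy = x - target[0], y - target[1]
    return math.exp(-(dx * dx + dy * dy) / 50.0)

def climb(score, start, max_steps=100):
    """Greedy ascent: move to the best-scoring neighbour until none improves."""
    x, y = start
    for _ in range(max_steps):
        # Score the current position and its eight neighbours.
        candidates = [(score(x + dx, y + dy), x + dx, y + dy)
                      for dx in (-1, 0, 1) for dy in (-1, 0, 1)]
        best_score, bx, by = max(candidates)
        if best_score <= score(x, y):
            break  # local maximum of the score reached
        x, y = bx, by
    return (x, y)

# Start from the previous frame's position and climb to the new peak.
new_position = climb(toy_score, (0, 0))
```

Starting the climb from the previous frame's location reflects the abstract's observation that a moved target lies in the uphill direction of the score from where it was before.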
One of the major challenges of human movement identification in indoor environments is the sensitivity to many uncommon indoor interactions, such as falling off an object or moving a chair. This work investigates human footstep movements using multiple modalities and analyzes their representations from a small self-collected dataset of acoustic and vibration-based sensors. The core idea of this study is to learn apparent similarities between two sensory traits (not limited to microphone and geophone) and combine them. For this purpose, we describe...
This paper describes the improvement of Direction of Arrival (DOA) estimation performance using a quaternion output in the Detection and Classification of Acoustic Scenes and Events (DCASE) 2019 Task 3. DCASE 2019 Task 3 focuses on sound event localization and detection (SELD), which is a task that simultaneously estimates the sound source direction in addition to conventional sound event detection (SED). In the baseline method, the sound source direction angle is directly regressed. However, the angle is a periodic function and it has discontinuities, which may make learning unstable. Specifically, even though...
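The discontinuity of a directly regressed angle can be seen in a small numeric sketch. Note the substitution: the abstract uses a quaternion output, whereas this example uses a simpler 2D unit vector (cos, sin) to show the same wrap-around issue. All function names are hypothetical.

```python
import math

def naive_error_deg(a, b):
    """Plain difference of two angles in degrees, ignoring periodicity."""
    return abs(a - b)

def wrapped_error_deg(a, b):
    """Smallest angular difference in degrees, accounting for wrap-around."""
    return abs((a - b + 180.0) % 360.0 - 180.0)

def to_unit_vector(deg):
    """Encode an angle as a point on the unit circle (a continuous target)."""
    r = math.radians(deg)
    return (math.cos(r), math.sin(r))

def from_unit_vector(v):
    """Decode a (cos, sin) pair back to an angle in degrees."""
    return math.degrees(math.atan2(v[1], v[0]))

# 179 and -179 degrees are only 2 degrees apart on the circle, yet a
# direct regression target sees them as 358 degrees apart.
naive = naive_error_deg(179.0, -179.0)      # 358.0
wrapped = wrapped_error_deg(179.0, -179.0)  # 2.0
chord = math.dist(to_unit_vector(179.0), to_unit_vector(-179.0))
```

In the unit-vector encoding the two directions are close together (`chord` is small), so a regression loss on that target does not jump at the ±180 degree boundary.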
This paper addresses the learning of periodic information such as phase for deep neural networks (DNN). To solve this problem, we propose a novel activation function based on the von Mises distribution used in directional statistics, and construct a von-Mises-Bernoulli DNN by replacing the first hidden layer of a conventional Bernoulli-Bernoulli DNN with the proposed activation function. We theoretically validate that this simple change can handle periodic information using restricted Boltzmann machines. In addition, we practically show that the constructed...
Drone audition techniques are helpful for listening to target sound sources from the sky, which can be used for human searching tasks in disaster sites. Among the many techniques required for drone audition, sound source tracking is an essential technique, and thus several tracking methods have been proposed. The authors have also proposed a sound source tracking method that utilizes multiple microphone arrays to obtain the likelihood distribution of sound source locations. These methods have been demonstrated in benchmark experiments. However, the performance against various sound sources with different distances...
A novel visual tracking algorithm is proposed in this paper. The algorithm plays an important role in a cooperative driving support system (DSSS) that is aimed at reducing traffic fatalities and injuries. The input to the algorithm is a gray-scale image for every video frame from a roadside camera, and the algorithm can be used to detect the existence of vehicles on the road and then track their trajectories. In the algorithm, discriminative pixel-pair feature selection is adopted to discriminate between an image patch with the object in the correct position and image patches with objects in an incorrect position...
An example-based classification algorithm to improve generalization performance for detecting objects in images is presented. The classifier integrates component-based classifiers according to the AdaBoost algorithm. A probability estimate by a kernel-SVM is used for the outputs of the base learners, which are independently trained for local features. The base learners are determined by selecting the optimal local feature according to sample weights determined by the boosting algorithm with cross-validation. Our method was applied to the MIT CBCL pedestrian image database, and 54