- Robotics and Sensor-Based Localization
- Remote Sensing and LiDAR Applications
- 3D Surveying and Cultural Heritage
- Speech Recognition and Synthesis
- 3D Shape Modeling and Analysis
- Music and Audio Processing
- Speech and Audio Processing
- Natural Language Processing Techniques
- Robotic Locomotion and Control
- Topic Modeling
- Robot Manipulation and Learning
- Indoor and Outdoor Localization Technologies
- Advanced Neural Network Applications
- Medical Image Segmentation Techniques
- Smart Agriculture and AI
- Distributed and Parallel Computing Systems
- Robotic Path Planning Algorithms
- Advanced Image Processing Techniques
- Scientific Computing and Data Management
- Computer Graphics and Visualization Techniques
- Remote Sensing in Agriculture
- Advanced Image and Video Retrieval Techniques
- Advanced Optical Sensing Technologies
- IoT and Edge/Fog Computing
- Image Processing and 3D Reconstruction
Amazon (United Kingdom)
2023
Science Oxford
2018-2021
University of Oxford
2018-2021
In this paper, we present a 3D factor-graph LiDAR-SLAM system which incorporates a state-of-the-art deeply learned feature-based loop closure detector to enable a legged robot to localize and map in industrial environments. Point clouds are accumulated using an inertial-kinematic state estimator before being aligned using ICP registration. To close loops, we use a proposal mechanism which matches individual segments between clouds. We trained a descriptor offline to match these segments. The efficiency of our method...
We present a universal neural vocoder based on Parallel WaveNet, with an additional conditioning network called Audio Encoder. Our vocoder offers real-time high-quality speech synthesis on a wide range of use cases. We tested it on 43 internal speakers of diverse age and gender, speaking 20 languages in 17 unique styles, of which 7 voices and 5 styles were not exposed during training. We show that the proposed vocoder significantly outperforms speaker-dependent vocoders overall. We also show that it outperforms several existing architectures in terms...
Localization in challenging, natural environments such as forests or woodlands is an important capability for many applications, from guiding a robot navigating along a forest trail to monitoring vegetation growth with handheld sensors. In this work we explore laser-based localization in both urban and natural environments, which is suitable for online applications. We propose a deep learning approach capable of learning meaningful descriptors directly from 3D point clouds by comparing triplets (anchor, positive and negative...
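The triplet comparison mentioned in the abstract above can be illustrated with a margin-based triplet loss. This is a minimal sketch, not the authors' implementation: the abstract is truncated before it states the loss, so the Euclidean distance, the `margin` parameter, and the function names are all assumptions made for illustration.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two descriptor vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Assumed loss form: zero once the negative is at least `margin`
    farther from the anchor than the positive is."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

# Hypothetical 2D descriptors, chosen so the distances are 1 and 5.
anchor = [0.0, 0.0]
positive = [0.0, 1.0]
negative = [3.0, 4.0]

easy = triplet_margin_loss(anchor, positive, negative)  # triplet already satisfied
hard = triplet_margin_loss(anchor, negative, positive)  # roles swapped, so it is violated
```

With these values the satisfied triplet contributes no loss, while the swapped one is penalised by the distance gap plus the margin.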
During localization and mapping the success of point cloud registration can be compromised when there is an absence of geometric features or constraints in corridors or across doorways, or when the volumes scanned only partly overlap, due to occlusions or constrictions between subsequent observations. This work proposes a strategy to predict and prevent laser-based localization failure. Our solution relies on an explicit analysis of the point cloud content prior to registration. A model predicting the risk of a failed alignment is learned by analysing the degree of spatial...
We present SKD, a novel keypoint detector that uses saliency to determine the best candidates from a point cloud for tasks such as registration and reconstruction. The approach can be applied to any differentiable deep learning descriptor by using the gradients of that descriptor with respect to the 3D position of the input points as a measure of their saliency. The saliency is combined with the original descriptor and context information in a neural network, which is trained to learn robust keypoint candidates. The key intuition behind this approach is that keypoints are not extracted solely as a result of the geometry...
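The idea of scoring each point by the gradient of a descriptor with respect to that point's 3D position can be sketched as follows. This is an illustration under stated assumptions, not SKD itself: `toy_descriptor` is a hypothetical scalar stand-in for a learned descriptor, and central finite differences replace the backpropagated gradients a deep learning framework would provide.

```python
import math

def toy_descriptor(points):
    """Hypothetical stand-in descriptor: mean squared distance to the centroid."""
    n = len(points)
    centroid = [sum(p[k] for p in points) / n for k in range(3)]
    return sum(sum((p[k] - centroid[k]) ** 2 for k in range(3)) for p in points) / n

def point_saliency(descriptor, points, eps=1e-4):
    """Saliency of each point: norm of the descriptor's gradient with respect
    to that point's position, estimated by central finite differences."""
    saliency = []
    for i in range(len(points)):
        grad = []
        for k in range(3):
            plus = [list(p) for p in points]
            minus = [list(p) for p in points]
            plus[i][k] += eps
            minus[i][k] -= eps
            grad.append((descriptor(plus) - descriptor(minus)) / (2 * eps))
        saliency.append(math.sqrt(sum(g * g for g in grad)))
    return saliency

# Hypothetical cloud: three nearby points and one far from the rest.
cloud = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (5.0, 5.0, 5.0)]
scores = point_saliency(toy_descriptor, cloud)
most_salient = max(range(len(cloud)), key=lambda i: scores[i])
```

For this toy descriptor the point farthest from the centroid moves the descriptor most, so it receives the highest score. The further step described in the abstract, feeding saliency into a trained network, is not shown.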
In this work we introduce Natural Segmentation and Matching (NSM), an algorithm for reliable localization, using laser, in both urban and natural environments. Current state-of-the-art global approaches do not generalize well to structure-poor vegetated areas such as forests or orchards. In these environments clutter and perceptual aliasing prevent repeatable extraction of distinctive landmarks between different test runs. In forests, tree trunks are not distinctive, foliage intertwines and there is a complete...
Localization for autonomous robots in prior maps is crucial to their functionality. This paper offers a solution to this problem for indoor environments called InstaLoc, which operates on an individual lidar scan to localize it within a prior map. We draw inspiration from how humans navigate and position themselves by recognizing the layout of distinctive objects and structures. Mimicking the human approach, InstaLoc identifies and matches object instances in the scene with those in the map. As far as we know, this is the first method to use panoptic...
This work focuses on modelling a speaker's accent that does not have a dedicated text-to-speech (TTS) frontend, including a grapheme-to-phoneme (G2P) module. Prior work on modelling accents assumes a phonetic transcription is available for the target accent, which might not be the case for low-resource, regional accents. In our work, we propose an approach whereby we first augment the target accent data to sound like the donor voice via voice conversion, then train a multi-speaker multi-accent TTS model on the combination of recordings and synthetic data, to generate...
Localization is a key challenge in many robotics applications. In this work we explore LIDAR-based global localization in both urban and natural environments and develop a method suitable for online application. Our approach leverages an efficient deep learning architecture capable of learning compact point cloud descriptors directly from 3D data. The method uses an efficient feature space representation of a set of segmented point clouds to match between the current scene and the prior map. We show that down-sampling in the inner layers of the network can...
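Matching segment descriptors between the current scene and a prior map can be pictured as a nearest-neighbour search in feature space. The abstract does not say how the matching is carried out, so the brute-force search, the `max_dist` threshold, and the function name below are assumptions chosen for illustration only.

```python
import math

def match_segments(scene_descriptors, map_descriptors, max_dist=0.5):
    """For each scene descriptor, find the closest map descriptor and keep
    the pair only if the distance is within `max_dist` (assumed threshold)."""
    matches = []
    for i, d in enumerate(scene_descriptors):
        j, dist = min(
            ((j, math.dist(d, m)) for j, m in enumerate(map_descriptors)),
            key=lambda pair: pair[1],
        )
        if dist <= max_dist:
            matches.append((i, j))
    return matches

# Hypothetical 2D descriptors; the third scene segment has no close map segment.
scene = [(0.1, 0.0), (0.9, 1.0), (5.0, 5.0)]
prior_map = [(1.0, 1.0), (0.0, 0.0)]
pairs = match_segments(scene, prior_map)
```

Here the first two scene segments pair with their nearby map segments and the third is rejected by the threshold. A practical system would likely use a spatial index rather than brute force and would verify matches geometrically.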
In this work, we introduce a framework for cross-lingual speech synthesis, which involves an upstream Voice Conversion (VC) model and a downstream Text-To-Speech (TTS) model. The proposed framework consists of 4 stages. In the first two stages, we use a VC model to convert utterances in the target locale to the voice of the target speaker. In the third stage, the converted data is combined with the linguistic features and durations from recordings in the target language, which are then used to train a single-speaker acoustic model. Finally, the last stage entails training a locale-independent...