- Speech Recognition and Synthesis
- Music and Audio Processing
- Speech and Audio Processing
- Natural Language Processing Techniques
- Topic Modeling
- Speech and dialogue systems
- Animal Vocal Communication and Behavior
- Hydrocarbon exploration and reservoir analysis
- Enhanced Oil Recovery Techniques
- Hydraulic Fracturing and Reservoir Analysis
- Marine animal studies overview
- Drilling and Well Engineering
- Seismic Imaging and Inversion Techniques
- Advanced Image Processing Techniques
- Image and Signal Denoising Methods
- AI in Service Interactions
- Mineral Processing and Grinding
- Context-Aware Activity Recognition Systems
- Metaheuristic Optimization Algorithms Research
- Non-Destructive Testing Techniques
- Intelligent Tutoring Systems and Adaptive Learning
- Underwater Acoustics Research
- Evolutionary Algorithms and Applications
- Subtitles and Audiovisual Media
- Text and Document Classification Technologies
Apple (United Kingdom)
2020-2023
UNSW Sydney
2018-2022
Emotech (United Kingdom)
2018-2019
University of Edinburgh
2012-2017
Akademia Tarnowska
2007-2012
We investigate convolutional neural networks (CNNs) for large vocabulary distant speech recognition, trained using speech recorded from a single distant microphone (SDM) and multiple distant microphones (MDM). In the MDM case we explore a beamformed signal input representation compared with the direct use of multiple acoustic channels as a parallel input to the CNN. We have explored different weight sharing approaches, and propose a channel-wise convolution with two-way pooling. Our experiments, using the AMI meeting corpus, found that CNNs improve the word error rate...
This paper proposes a simple yet effective model-based neural network speaker adaptation technique that learns speaker-specific hidden unit contributions given adaptation data, without requiring any form of speaker-adaptive training, or labelled adaptation data. An additional amplitude parameter is defined for each hidden unit; the amplitude parameters are tied for each speaker, and are learned using unsupervised adaptation. We conducted experiments on the TED talks data, as used in the International Workshop on Spoken Language Translation (IWSLT)...
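The per-unit amplitude idea in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names, the toy layer, and the `2 * sigmoid(r)` mapping used to keep each amplitude in the range (0, 2) are choices made here for the example. Only the amplitude parameters would be updated during adaptation; the layer weights stay speaker-independent.

```python
import math


def sigmoid(v):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-v))


def hidden_layer(x, weights, biases):
    """Speaker-independent hidden layer: sigmoid(W x + b)."""
    return [
        sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, biases)
    ]


def lhuc_amplitudes(r):
    """Map unconstrained per-unit parameters r to amplitudes in (0, 2).

    Assumption for this sketch: amplitude = 2 * sigmoid(r), so r = 0
    gives amplitude 1 and leaves the unit unchanged.
    """
    return [2.0 * sigmoid(v) for v in r]


def lhuc_layer(x, weights, biases, speaker_r):
    """Rescale each hidden unit by a speaker-specific amplitude."""
    h = hidden_layer(x, weights, biases)
    a = lhuc_amplitudes(speaker_r)
    return [ai * hi for ai, hi in zip(a, h)]


# Hypothetical toy layer: 2 inputs, 3 hidden units.
W = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.8]]
b = [0.0, 0.1, -0.1]
x = [1.0, 2.0]

# One amplitude vector per speaker (tied across all of that speaker's frames).
speaker_params = {
    "unadapted": [0.0, 0.0, 0.0],   # amplitudes all equal to 1
    "speaker_b": [1.0, -1.0, 0.0],  # boosts unit 0, attenuates unit 1
}

for name, r in speaker_params.items():
    print(name, lhuc_layer(x, W, b, r))
```

With all parameters at zero the adapted layer reproduces the speaker-independent output, which is a natural starting point before any adaptation data is seen.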
Despite the growing interest in unsupervised learning, extracting meaningful knowledge from unlabelled audio remains an open challenge. To take a step in this direction, we recently proposed a problem-agnostic speech encoder (PASE), that combines a convolutional encoder followed by multiple neural networks, called workers, tasked to solve self-supervised problems (i.e., ones that do not require manual annotations as ground truth). PASE was shown to capture relevant speech information, including speaker voice-print and...
We investigate multilingual modeling in the context of a deep neural network (DNN) - hidden Markov model (HMM) hybrid, where the DNN outputs are used as the HMM state likelihoods. By viewing neural networks as a cascade of feature extractors followed by a logistic regression classifier, we hypothesise that the hidden layers, which act as feature extractors, will be transferable between languages. As a corollary, we propose that training the hidden layers on multiple languages makes them more suitable for such cross-lingual transfer. We experimentally confirm...
We investigate the use of cross-lingual acoustic data to initialise deep neural network (DNN) acoustic models by means of unsupervised restricted Boltzmann machine (RBM) pre-training. DNNs for German are pretrained using one or all of German, Portuguese, Spanish and Swedish. The DNNs are used in a tandem configuration, where the network outputs are used as features for a hidden Markov model (HMM) whose emission densities are modeled by Gaussian mixture models (GMMs), as well as in a hybrid configuration, where the network outputs are used as the HMM state likelihoods. The experiments show that pretraining is more crucial...
We investigate the application of deep neural network (DNN)-hidden Markov model (HMM) hybrid acoustic models for far-field speech recognition of meetings recorded using microphone arrays. We show that the hybrid models achieve significantly better accuracy than conventional systems based on Gaussian mixture models (GMMs). We observe up to 8% absolute word error rate (WER) reduction from a discriminatively trained GMM baseline when using a single distant microphone, and between 4-6% absolute WER reduction when using beamforming on various combinations of array...
Spoken Language Understanding (SLU) infers semantic meaning directly from audio data, and thus promises to reduce error propagation and misunderstandings in end-user applications. However, publicly available SLU resources are limited. In this paper, we release SLURP, a new SLU package containing the following: (1) A new challenging dataset in English spanning 18 domains, which is substantially bigger and linguistically more diverse than existing datasets; (2) Competitive baselines based on state-of-the-art NLU and ASR...
This work presents a broad study on the adaptation of neural network acoustic models by means of learning hidden unit contributions (LHUC) - a method that linearly re-combines hidden units in a speaker- or environment-dependent manner using small amounts of unsupervised adaptation data. We also extend LHUC to a speaker adaptive training (SAT) framework that leads to a more adaptable DNN acoustic model, working both in a speaker-dependent and a speaker-independent manner, without the requirements to maintain auxiliary speaker-dependent feature extractors or to introduce...
A major advantage of statistical parametric speech synthesis (SPSS) over unit-selection speech synthesis is its adaptability and controllability in changing speaker characteristics and speaking style. Recently, several studies using deep neural networks (DNNs) as acoustic models for SPSS have shown promising results. However, the adaptability of DNNs in SPSS has not been systematically studied. In this paper, we conduct an experimental analysis of speaker adaptation for DNN-based speech synthesis at different levels. In particular, we augment a low-dimensional...
We have recently seen the emergence of several publicly available Natural Language Understanding (NLU) toolkits, which map user utterances to structured, but more abstract, Dialogue Act (DA) or Intent specifications, while making this process accessible to the lay developer. In this paper, we present the first wide coverage evaluation and comparison of some of the most popular NLU services, on a large, multi-domain (21 domains) dataset of 25K user utterances that we have collected and annotated with Intent and Entity Type specifications and which will be released as...
Pore‐scale digital images are usually obtained from microcomputed tomography data that has been segmented into void and grain space. Image segmentation is a crucial step in the process of digital rock analysis that can influence pore‐scale characterization studies and/or the numerical simulation of petrophysical properties. This is concerning since all segmentation methods have user‐selected parameters that result in biases. Convolutional neural networks (CNNs) provide a way forward since, once trained, a CNN can provide consistent and reliable image...
Rory Beard, Ritwik Das, Raymond W. M. Ng, P. G. Keerthana Gopalakrishnan, Luka Eerens, Pawel Swietojanski, Ondrej Miksik. Proceedings of the 22nd Conference on Computational Natural Language Learning. 2018.
We present a structured overview of adaptation algorithms for neural network-based speech recognition, considering both hybrid hidden Markov model / neural network systems and end-to-end neural network systems, with a focus on speaker adaptation, domain adaptation, and accent adaptation. The overview characterizes adaptation algorithms as based on embeddings, model parameter adaptation, or data augmentation. We present a meta-analysis of the performance of speech recognition adaptation algorithms, based on relative error rate reductions as reported in the literature.
High‐resolution X‐ray microcomputed tomography (micro‐CT) data are used for the accurate determination of rock petrophysical properties. High‐resolution data, however, result in a small field of view, and thus, the representativeness of a simulation domain can be brought into question when dealing with geophysical applications. This paper applies a cycle‐in‐cycle generative adversarial network (CinCGAN) to improve the resolution of 3‐D micro‐CT data and create a super‐resolution image using unpaired training images. Effective...
X-ray imaging of porous media has revolutionized the interpretation of various microscale phenomena in subsurface systems. The volumetric images acquired from this technology, known as digital rocks (DR), make it a suitable candidate for machine learning and computer-vision applications. The current routine DR frameworks involving image processing and modeling are susceptible to user bias and expensive computation requirements, especially for large domains. In comparison, the inference with trained...
Mineral and hydrocarbon exploration relies heavily on geological and geotechnical information extracted from drill cores. Traditional drill-core characterization is based purely on the subjective expertise of a geologist. New technologies can provide automatic mineral analysis and high-resolution drill core images in a non-destructive manner. However, automated rock mass characterization presents a significant challenge due to its lack of generalization and robustness. To date, the estimation of rock quality designation (RQD), a key...
Recently there has been increasing interest in ways of using out-of-domain (OOD) data to improve automatic speech recognition performance in domains where only limited data is available. This paper focuses on one such domain, namely that of disordered speech, for which very small databases exist, but where normal speech can be considered OOD. Standard approaches for handling such domains use adaptation from OOD models into the target domain; here we investigate an alternative approach with its focus on the feature extraction stage: OOD data is used to train...
We explore the use of the maxout neuron in various aspects of acoustic modelling for large vocabulary speech recognition systems, including the low-resource scenario and multilingual knowledge transfers. Through experiments on voice search and short message dictation datasets, we found that maxout networks are around three times faster to train and offer lower or comparable word error rates on several tasks, when compared with networks with logistic nonlinearity. We also present a detailed study of the maxout unit internal behaviour suggesting...
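For readers unfamiliar with the maxout neuron named in the abstract above, a minimal sketch follows. It uses the standard definition (each output is the maximum over a small group of linear activations); the function name, the grouping of consecutive rows, and the toy weights are assumptions made for this example and are not taken from the paper.

```python
def maxout_layer(x, weights, biases, group_size):
    """Maxout layer: each output is the max over `group_size` linear pieces.

    `weights` has one row per linear piece; consecutive rows form a group
    (an arbitrary layout chosen for this sketch).
    """
    z = [
        sum(w * xi for w, xi in zip(row, x)) + b
        for row, b in zip(weights, biases)
    ]
    if len(z) % group_size != 0:
        raise ValueError("number of linear pieces must be a multiple of group_size")
    return [max(z[i:i + group_size]) for i in range(0, len(z), group_size)]


# Hypothetical toy layer: 2 inputs, 4 linear pieces, 2 pieces per maxout unit.
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
b = [0.0, 0.0, 0.0, 0.0]
x = [1.0, -2.0]

print(maxout_layer(x, W, b, group_size=2))  # linear pieces are [1, -2, -1, 2]
```

Unlike a logistic unit, the maxout output is piecewise linear in its input, so no saturating nonlinearity is applied after the linear pieces.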
In this paper we investigate techniques to combine hybrid HMM-DNN (hidden Markov model - deep neural network) and tandem HMM-GMM (hidden Markov model - Gaussian mixture model) acoustic models using: (1) model averaging, and (2) lattice combination with Minimum Bayes Risk decoding. We have performed experiments on the "TED Talks" task following the protocol of the IWSLT-2012 evaluation. Our experimental results suggest that DNN-based and GMM-based acoustic models are complementary, with error rates being reduced by up to 8% relative when the DNN and GMM systems...
Distant conversational speech recognition is challenging owing to the presence of multiple, overlapping talkers, additional non-speech acoustic sources, and the effects of reverberation. In this paper we review work on distant speech recognition, with an emphasis on approaches which combine multichannel signal processing and acoustic modelling, and investigate the use of hybrid neural network / hidden Markov model acoustic models for distant speech recognition of meetings recorded using microphone arrays. In particular we investigate the use of convolutional and fully-connected neural networks with different...