Michael Riley

ORCID: 0009-0002-9449-025X
Research Areas
  • Speech Recognition and Synthesis
  • Natural Language Processing Techniques
  • Algorithms and Data Compression
  • Speech and dialogue systems
  • semigroups and automata theory
  • Topic Modeling
  • Music and Audio Processing
  • Speech and Audio Processing
  • Machine Learning and Algorithms
  • Network Packet Processing and Optimization
  • Formal Methods in Verification
  • DNA and Biological Computing
  • Logic, programming, and type systems
  • Neural Networks and Applications
  • Privacy-Preserving Technologies in Data
  • Security and Verification in Computing
  • Millimeter-Wave Propagation and Modeling
  • Digital Communication and Language
  • Wireless Body Area Networks
  • Sensor Technology and Measurement Systems
  • Distributed systems and fault tolerance
  • Blind Source Separation Techniques
  • Music Technology and Sound Studies
  • Landfill Environmental Impact Studies
  • Radiation Effects in Electronics

Google (United States)
2014-2023

Oracle (United Kingdom)
2023

University of Warwick
2023

University of Utah
2019

Carnegie Mellon University
2017-2018

Alphabet (United States)
2016

University of Cincinnati
2009-2010

New York University
2007

AT&T (United States)
1996-2005

Massachusetts Institute of Technology
2005

This paper proposes and evaluates the hybrid autoregressive transducer (HAT) model, a time-synchronous encoder-decoder model that preserves the modularity of conventional automatic speech recognition systems. The HAT model provides a way to measure the quality of the internal language model that can be used to decide whether inference with an external language model is beneficial or not. We evaluate our proposed model on a large-scale voice search task. Our experiments show significant improvements in WER compared to the state-of-the-art approaches...

10.1109/icassp40776.2020.9053600 article EN ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2020-04-09

10.1016/s0304-3975(99)00014-6 article EN publisher-specific-oa Theoretical Computer Science 2000-01-01

Finite-state automata are a very effective tool in natural language processing. However, in a variety of applications and especially in speech processing, it is necessary to consider more general machines in which arcs are assigned weights or costs. We briefly describe some of the main theoretical and algorithmic aspects of these machines. In particular, we describe an efficient composition algorithm for weighted transducers, and give examples illustrating the value of determinization and minimization algorithms for weighted automata.

10.48550/arxiv.cs/0503077 preprint EN other-oa arXiv (Cornell University) 2005-01-01

Most current spoken-dialog systems only extract sequences of words from a speaker's voice. This largely ignores other useful information that can be inferred from speech such as gender, age, dialect, or emotion. These characteristics of a speaker's voice, voice signatures, whether static or dynamic, can be useful for speech mining applications or for the design of a natural spoken-dialog system. This paper explores the problem of extracting automatically and accurately voice signatures from a speaker's voice. We investigate two approaches for extracting speaker traits: the first focuses on general acoustic and prosodic...

10.1109/asru.2003.1318399 article EN 2004-09-07

Mingqing Chen, Ananda Theertha Suresh, Rajiv Mathews, Adeline Wong, Cyril Allauzen, Françoise Beaufays, Michael Riley. Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL). 2019.

10.18653/v1/k19-1012 article EN cc-by 2019-01-01

We introduce a technique for dynamically applying contextually-derived language models to a state-of-the-art speech recognition system. These generally small-footprint models can be seen as a generalization of cache-based models [1], whereby contextually salient n-grams are derived from relevant sources (not just user generated language) to produce a model intended for combination with the baseline language model. The derived models are applied during first-pass decoding as a form of on-the-fly composition between the decoder search graph and a set of weighted...

10.21437/interspeech.2015-340 article EN Interspeech 2015 2015-09-06

Several applications of statistical tree-based modelling are described here to problems in speech and language. Classification and regression trees are well suited to many of the pattern recognition problems encountered in this area since they (1) statistically select the most significant features involved, (2) provide "honest" estimates of their performance, (3) permit both categorical and continuous features to be considered, and (4) allow human interpretation and exploration of their result. First the method is summarized, then its application to automatic stop...

10.3115/1075434.1075492 article EN 1989-01-01

This paper explores various static interpolation methods for approximating a single dynamically-interpolated language model used for a variety of recognition tasks on the Google Android platform. The goal is to find the statically-interpolated first-pass LM that best reduces search errors in a two-pass system or that even allows eliminating the more complex dynamic second pass entirely. Static interpolation weights that are uniform, prior-weighted, and the maximum likelihood, maximum a posteriori, and Bayesian solutions are considered. Analysis argues...

10.21437/interspeech.2011-249 article EN Interspeech 2011 2011-08-27

We present the concepts of weighted language, transduction and automaton from algebraic automata theory as a general framework for describing and implementing decoding cascades in speech and language processing. This generality allows us to represent uniformly such information sources as pronunciation dictionaries, language models and lattices, and to use uniform algorithms for building decoding stages and for optimizing and combining them. In particular, a single automata join algorithm can be used either to combine information sources such as a pronunciation dictionary and a context-dependency model during...

10.3115/1075812.1075870 article EN 1994-01-01

The authors investigate an automatic approach to segmentation of labeled speech and to labeling and segmentation of speech when only the orthographic transcription is available. The technique is based on a phone recognition system and a trigram phonotactic model, gamma distribution duration models, a spectral model, and five different structures for the phone models, varying the contextual dependencies. The alignment of the speech with the given phone sequence is performed as a very constrained recognition task on the given sequence. When only the orthographic transcription is provided, classification-tree-based prediction of the most likely phone realizations...

10.1109/icassp.1991.150379 article EN 1991-01-01

We combine our earlier approach to context-dependent network representation with our algorithm for determining weighted networks to build optimized networks for large-vocabulary speech recognition combining an n-gram language model, a pronunciation dictionary and context-dependency modeling. While fully-expanded networks have been used before in restrictive settings (medium vocabulary or no cross-word contexts), we demonstrate that our network determination method makes it practical to use fully-expanded networks also in large-vocabulary recognition with full cross-word context modeling. For the DARPA North...

10.1109/icassp.1998.675352 article EN 2002-11-27

We showed in previous work that weighted finite-state transducers provide a common representation for many components of a speech recognition system and described general algorithms for combining these representations to build a single optimized and compact transducer integrating all these components, directly mapping from HMM states to words. This approach works well for certain well-controlled input transducers, but presents some problems related to the efficiency of composition and the applicability of determinization...

10.1109/icassp.2004.1326097 article EN IEEE International Conference on Acoustics Speech and Signal Processing 2004-09-28

Methods to predict detailed phonetic pronunciations from a coarse phonemic transcription are described. The phonemic base forms, obtainable from orthographic text by dictionary lookup and other means, do not specify fine phonetic detail such as flapping, glottal stop insertion, or the formation of syllabic nasals and liquids. These phenomena depend on the phonemic context (often spanning word boundaries), stress environment, speaking rate, and dialect. A procedure is presented that builds decision trees, trained on the TIMIT database, using...

10.1109/icassp.1991.150446 article EN 1991-01-01

10.21437/icslp.2002-401 article EN 7th International Conference on Spoken Language Processing (ICSLP 2002) 2002-09-16