- Speech Recognition and Synthesis
- Natural Language Processing Techniques
- Topic Modeling
- Music and Audio Processing
- Speech and Audio Processing
- Speech and dialogue systems
- Hand Gesture Recognition Systems
- Hearing Impairment and Communication
- Blind Source Separation Techniques
- Advanced Text Analysis Techniques
- Algorithms and Data Compression
- Handwritten Text Recognition Techniques
- Text and Document Classification Technologies
- Human Pose and Action Recognition
- Phonocardiography and Auscultation Techniques
- Video Analysis and Summarization
- Gait Recognition and Analysis
- Advanced Data Compression Techniques
- Sentiment Analysis and Opinion Mining
- Web Data Mining and Analysis
- Time Series Analysis and Forecasting
- Advanced Adaptive Filtering Techniques
- Neural Networks and Applications
- Flow Measurement and Analysis
- Phonetics and Phonology Research
Stantec (Canada)
2022
Boğaziçi University
2012-2021
IBM (United States)
2013
Brigham Young University - Idaho
2012
Philips (Netherlands)
2008
AT&T (United States)
2002-2006
Google (United States)
2004
Johns Hopkins University
2000-2002
Carnegie Mellon University
1999
Ever-increasing computing power and connectivity bandwidth, together with falling storage costs, are resulting in an overwhelming amount of data of various types being produced, exchanged, and stored. Consequently, information search and retrieval has emerged as a key application area. Text-based search is the most active area, with applications that range from Web and local network search to searching for personal information residing on one's own hard-drive. Speech search has received less attention perhaps because large collections of spoken...
This paper considers the problem of constructing an efficient inverted index for the spoken term detection (STD) task. More specifically, we construct a deterministic weighted finite-state transducer storing soft-hits in the form of (utterance ID, start time, end time, posterior score) quadruplets. We propose a generalized factor transducer structure which retains the time information necessary for performing STD. The required information is embedded into the path weights without disrupting the inherent optimality. We also describe how to index all...
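As a rough illustration of the soft-hit quadruplets mentioned in this abstract, the sketch below stores (utterance ID, start time, end time, posterior score) records in a plain dictionary keyed by term. This is a deliberately simplified stand-in, not the weighted finite-state transducer index the paper constructs; the names `Hit`, `build_index`, and `search`, and the 0.5 threshold, are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    """One soft-hit: where a term may occur and how confident we are."""
    utterance_id: str
    start_time: float
    end_time: float
    posterior: float


def build_index(hits_by_term):
    """Map each term to its soft-hits, sorted by descending posterior.

    hits_by_term: iterable of (term, list_of_Hit) pairs (assumed input shape).
    """
    index = defaultdict(list)
    for term, hits in hits_by_term:
        index[term].extend(hits)
    for term in index:
        index[term].sort(key=lambda h: h.posterior, reverse=True)
    return dict(index)


def search(index, term, threshold=0.5):
    """Return the hits for a term whose posterior meets the threshold."""
    return [h for h in index.get(term, []) if h.posterior >= threshold]


if __name__ == "__main__":
    idx = build_index([
        ("ankara", [Hit("utt1", 0.4, 0.9, 0.3), Hit("utt2", 1.2, 1.8, 0.8)]),
    ])
    print(search(idx, "ankara"))
```

A dictionary lookup only handles terms that were indexed as whole units; indexing arbitrary substrings of lattices is what the transducer-based structure in the abstract is meant to address.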
This paper describes discriminative language modeling for a large vocabulary speech recognition task. We contrast two parameter estimation methods: the perceptron algorithm, and a method based on conditional random fields (CRFs). The models are encoded as deterministic weighted finite state automata, and are applied by intersecting the automata with word-lattices that are the output from a baseline recognizer. The perceptron algorithm has the benefit of automatically selecting a relatively small feature set in just a couple of passes over...
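The perceptron estimation mentioned in this abstract can be sketched on n-best lists rather than lattices. The following is a minimal, unaveraged perceptron with n-gram count features, assuming each training example is an n-best list of (words, baseline score) pairs plus the index of the lowest-error hypothesis. The function names, the fixed baseline weight, and the input layout are all assumptions for illustration; the paper itself applies the model as a weighted automaton intersected with word lattices.

```python
from collections import Counter


def ngram_features(words, n=2):
    """Count all n-grams up to order n, with sentence boundary markers."""
    padded = ["<s>"] + list(words) + ["</s>"]
    feats = Counter()
    for order in range(1, n + 1):
        for i in range(len(padded) - order + 1):
            feats[tuple(padded[i:i + order])] += 1
    return feats


def score(weights, words, baseline_score, baseline_weight=1.0):
    """Baseline recognizer score plus the learned n-gram feature weights."""
    feats = ngram_features(words)
    return baseline_weight * baseline_score + sum(
        weights.get(f, 0.0) * c for f, c in feats.items()
    )


def rerank(weights, nbest):
    """Index of the highest-scoring hypothesis in an n-best list."""
    return max(range(len(nbest)), key=lambda i: score(weights, *nbest[i]))


def train_perceptron(data, epochs=2):
    """data: list of (nbest, oracle_index) pairs (assumed layout).

    On each mistake, move weights toward the oracle hypothesis features
    and away from the features of the currently preferred hypothesis.
    """
    weights = {}
    for _ in range(epochs):
        for nbest, oracle in data:
            best = rerank(weights, nbest)
            if best == oracle:
                continue
            for f, c in ngram_features(nbest[oracle][0]).items():
                weights[f] = weights.get(f, 0.0) + c
            for f, c in ngram_features(nbest[best][0]).items():
                weights[f] = weights.get(f, 0.0) - c
    return weights


if __name__ == "__main__":
    nbest = [("a b".split(), -1.0), ("a c".split(), -0.5)]
    w = train_perceptron([(nbest, 0)])
    print(rerank(w, nbest))
```

Only features that differ between the oracle and the preferred hypothesis end up with non-zero weights, which is one way to see why the perceptron tends to select a relatively small feature set.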
We explore the use of morph-based language models in large-vocabulary continuous-speech recognition systems across four so-called morphologically rich languages: Finnish, Estonian, Turkish, and Egyptian Colloquial Arabic. The morphs are subword units discovered in an unsupervised, data-driven way using the Morfessor algorithm. By estimating n-gram language models over sequences of morphs instead of words, the quality of the language model is improved through better vocabulary coverage and reduced data sparsity. Standard word models suffer from high...
This paper summarizes our recent efforts for building a Turkish Broadcast News transcription and retrieval system. The agglutinative nature of Turkish leads to a high number of out-of-vocabulary (OOV) words, which in turn lower automatic speech recognition (ASR) accuracy. This situation compromises the performance of retrieval systems based on ASR output. Therefore using word-based ASR is not adequate for transcribing Turkish. To alleviate this problem, various sub-word-based units are utilized. These units solve the OOV problem with...
Conversational speech exhibits considerable pronunciation variability, which has been shown to have a detrimental effect on the accuracy of automatic speech recognition. There have been many attempts to model pronunciation variation, including the use of decision trees to generate alternate word pronunciations from phonemic baseforms. Use of pronunciation models during recognition is known to improve accuracy. This paper describes the incorporation of pronunciation models into acoustic model training in addition to recognition. Subtle difficulties in the straightforward use of alternatives to canonical pronunciations are first...
Much of the massive quantities of digitized data widely available, e.g., text, speech, hand-written sequences, are either given directly, or, as a result of some prior processing, as weighted automata. These are compact representations of a large number of alternative sequences and their weights reflecting the uncertainty or variability of the data. Thus, the indexation of such data requires indexing weighted automata.
In this paper, we present a baseline spoken term detection (STD) system for Turkish broadcast news. The agglutinative structure of Turkish causes a high out-of-vocabulary (OOV) rate and increases the word error rate (WER) in automatic speech recognition. Several approaches are attempted to reduce the negative effect on the STD system. Sub-word units are used to handle OOV queries, and lattice-based indexing is used to obtain different operating points in high WER cases. A recently proposed method for setting term specific thresholds is also evaluated...
The aim of this study is to find a useful methodology to classify multiple distinct pulmonary conditions including the healthy condition and various pathological types, using pulmonary sounds data. Fourteen-channel pulmonary sounds data of 40 subjects (healthy and pathological, where the pathologies are of obstructive and restrictive types) are modeled using a second order 250-point vector autoregressive model. The estimated model parameters are fed to support vector machine and Gaussian mixture model (GMM) classifiers which are used in various configurations, resulting in eight different...
We describe a method for discriminative training of a language model that makes use of syntactic features. We follow a reranking approach, where a baseline recogniser is used to produce 1000-best output for each acoustic input, and a second "reranking" model is then used to choose an utterance from these lists. The reranking model makes use of syntactic features together with a parameter estimation method that is based on the perceptron algorithm. We describe experiments on the Switchboard speech recognition task. The syntactic features provide an additional 0.3% reduction in test-set error rate beyond the model of (Roark et al., 2004a;...
It is practically impossible to build a word-based lexicon for speech recognition in agglutinative languages that would cover all the relevant words. The problem is that words are generally built by concatenating several prefixes and suffixes to the word roots. Together with compounding and inflections this leads to millions of different, but still frequent word forms. Due to inflections, ambiguity and other phenomena, it is also not trivial to automatically split the words into meaningful parts. Rule-based morphological analyzers can...
This paper describes the AT&T WATSON real-time speech recognizer, the product of several decades of research at AT&T. The recognizer handles a wide range of vocabulary sizes and is based on continuous-density hidden Markov models for acoustic modeling and finite state networks for language modeling. The recognition network is optimized for efficient search. We identify the algorithms used for high-accuracy, low-latency recognition. We present results for small and large vocabulary tasks taken from the VoiceTone® service, showing word...
The spoken term detection (STD) task aims to return relevant segments from a spoken archive that contain the query terms, whether or not they are in the system vocabulary. This paper focuses on pronunciation modeling for out-of-vocabulary (OOV) terms, which frequently occur in STD queries. The STD system described in this paper indexes word-level and sub-word level lattices or confusion networks produced by an LVCSR system using weighted finite state transducers (WFST). We investigate the inclusion of n-best pronunciation variants for OOV terms (obtained from letter-to-sound...
Wind power may present undesirable discontinuities and fluctuations due to considerable variations in wind speed, which adversely affect the smooth operation of the grid. An effective wind power forecast is essential in order to report the amount of energy supply with high accuracy, which is crucial for the planning of energy resources by system operators. Variations in wind power cannot be sufficiently estimated by persistence type basic forecasting methods, particularly in the medium and long terms. Therefore a new statistical method is presented here in this paper based on...
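For context on the "persistence type" baseline that this abstract compares against: a persistence forecast simply predicts that the value h steps ahead equals the most recent observation. The sketch below shows that baseline and its mean absolute error on a made-up series. The function names and the sample numbers are illustrative assumptions, and nothing here reflects the statistical method the paper proposes.

```python
def persistence_forecast(series, horizon):
    """Forecast series[t + horizon] as series[t] for every valid t.

    The returned list aligns with series[horizon:].
    """
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    return list(series[:len(series) - horizon])


def mean_absolute_error(series, horizon):
    """MAE of the persistence forecast at the given horizon."""
    forecasts = persistence_forecast(series, horizon)
    targets = series[horizon:]
    if not targets:
        raise ValueError("series is too short for this horizon")
    return sum(abs(f - t) for f, t in zip(forecasts, targets)) / len(targets)


if __name__ == "__main__":
    wind_power = [3.0, 4.0, 6.0, 5.0]  # hypothetical readings
    for h in (1, 2):
        print(h, mean_absolute_error(wind_power, h))
```

Comparing the error at several horizons gives a simple reference point against which a more elaborate forecaster can be judged.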
In this paper, we propose a method for out-of-vocabulary (OOV) word detection and take a step toward open vocabulary automatic speech recognition. The proposed method uses a hybrid language model combining words and subword units such as phones or syllables. We describe a detection algorithm based on the posterior count of the OOV given the model, and compare it to a method using the posterior probability of the best string given a conventional word only model. Experimental results on the Switchboard corpus are presented for different vocabulary sizes. The new method yields a gain of over 10% in OOV detection....
This paper investigates error-corrective language modeling using the perceptron algorithm on word lattices. The resulting model is encoded as a weighted finite-state automaton, and is used by intersecting the model with word lattices, making it simple and inexpensive to apply during decoding. We present results for various training scenarios for the Switchboard task, including using n-gram features of different orders, and performing n-best extraction versus using full word lattices. We demonstrate the importance of making the training conditions as close as possible to testing conditions....
We explore morphology-based and sub-word language modeling approaches proposed for morphologically rich languages, and evaluate and contrast them for the Turkish broadcast news transcription task. In addition, as a morphology-based model, we improve our previously proposed morphology-integrated model for automatic speech recognition. This model is built by composing the finite-state transducer of the morphological parser with the language model over lexical morphemes. This approach provides a search network with an unlimited vocabulary, generating only valid word forms while...
This paper introduces two complementary language modeling approaches for morphologically rich languages aiming to alleviate the out-of-vocabulary (OOV) word problem and to exploit morphology as a knowledge source. The first model, the morpholexical language model, is a generative n-gram model, where the modeling units are lexical-grammatical morphemes instead of commonly used words...
Keyword search, in the context of low resource languages, has emerged as a key area of research. The dominant approach in keyword search is to use Automatic Speech Recognition (ASR) as a front end to produce a representation of audio that can be indexed. The biggest drawback of this approach lies in its inability to deal with out-of-vocabulary words and query terms that are not in the ASR system output. In this paper we present an empirical study evaluating various approaches based on using confusion models as query expansion techniques to address this problem. We...