- Speech Recognition and Synthesis
- Speech and Audio Processing
- Music and Audio Processing
- Natural Language Processing Techniques
- Speech and dialogue systems
- Neural Networks and Applications
- Advanced Data Compression Techniques
- Bayesian Methods and Mixture Models
- Medical Image Segmentation Techniques
- Target Tracking and Data Fusion in Sensor Networks
- Topic Modeling
- Algorithms and Data Compression
- Blind Source Separation Techniques
- Cerebrovascular and Carotid Artery Diseases
- Phonetics and Phonology Research
- Privacy-Preserving Technologies in Data
- Stochastic Gradient Optimization Techniques
- Advanced Image Processing Techniques
- Time Series Analysis and Forecasting
- Bayesian Modeling and Causal Inference
- Flow Measurement and Analysis
- Embedded Systems Design Techniques
- Heat Transfer and Numerical Methods
- Advanced Vision and Imaging
- Control Systems and Identification
Technical University of Crete
2000-2016
First Technical University
2006
Boston University
1991-2005
SRI International
1992-2003
University of Geneva
2003
Menlo School
1993-2002
University of Crete
1999-2002
Many alternative models have been proposed to address some of the shortcomings of the hidden Markov model (HMM), which is currently the most popular approach to speech recognition. In particular, a variety of models that could be broadly classified as segment models have been described for representing a variable-length sequence of observation vectors in speech recognition applications. Since there are many aspects in common between these approaches, including the general recognition and training problems, it is useful to consider them in a unified framework. The paper...
A trend in automatic speech recognition systems is the use of continuous mixture-density hidden Markov models (HMMs). Despite the good recognition performance that these systems achieve on average in large vocabulary applications, there is a large variability in performance across speakers. Performance degrades dramatically when the user is radically different from the training population. A popular technique that can improve the performance and robustness of a speech recognition system is adapting the speech models to the speaker, and more generally to the channel and the task. In continuous mixture-density HMMs the number of component densities is typically very large, and it may...
A nontraditional approach to the problem of estimating the parameters of a stochastic linear system is presented. The method is based on the expectation-maximization algorithm and can be considered as the continuous analog of the Baum-Welch estimation algorithm for hidden Markov models. The algorithm is used for training the parameters of a dynamical system model that is proposed for better representing the spectral dynamics of speech for recognition. It is assumed that the observed feature vectors of a phone segment are the output of a stochastic linear dynamical system, and it is shown how the evolution of the dynamics as a function of the segment length can be modeled using alternative...
An algorithm is proposed that achieves a good tradeoff between modeling resolution and robustness by using a new, general scheme for tying of mixture components in continuous mixture-density hidden Markov model (HMM)-based speech recognizers. The sets of HMM states that share the same mixture components are determined automatically using agglomerative clustering techniques. Experimental results on ARPA's Wall Street Journal corpus show that this scheme reduces errors by 25% over typical tied-mixture systems. New fast algorithms for computing...
The authors describe a technique called progressive search, which is useful for developing and implementing speech recognition systems with high computational requirements. The scheme iteratively uses more and more complex recognition schemes, where each iteration constrains the search space of the next. An algorithm, the forward-backward word-life algorithm, is described. It can generate a word lattice in a progressive search that would be used as a language model embedded in a succeeding recognition pass to reduce computation requirements. It is shown that speed-ups of more than an order of magnitude are achievable with only...
Adapting the parameters of a statistical speaker-independent continuous-speech recognizer to the speaker and the channel can significantly improve the recognition performance and robustness of the system. In continuous mixture-density hidden Markov models the number of component densities is typically very large, and it may not be feasible to acquire a sufficient amount of adaptation data for robust maximum-likelihood estimates. To solve this problem, we have recently proposed a constrained estimation technique for Gaussian mixture densities....
We examine alternative architectures for a client-server model of speech-enabled applications over the World Wide Web (WWW). We compare a server-only processing model, where the client encodes and transmits the speech signal to the server, to a model where the recognition front end runs locally at the client and encodes and transmits the cepstral coefficients to the recognition server over the Internet. We follow a novel encoding paradigm, trying to maximize recognition performance instead of perceptual reproduction, and we find that by transmitting the cepstral coefficients we can achieve significantly higher recognition performance at a fraction of the bit rate required when encoding the speech signal directly....
This original volume describes the Spoken Language Translator (SLT), one of the first major automatic speech translation projects. The SLT system can translate between English, French, and Swedish in the domain of air travel planning, using a vocabulary of about 1500 words, with an accuracy of 75%. The authors detail the language processing components, largely built on top of the SRI Core Language Engine, using a combination of general grammars and techniques that allow them to be rapidly customized to specific domains. They base speech recognition on Hidden...
We propose a scheme that improves the robustness of continuous HMM systems that use mixture observation densities by sharing the same mixture components among different HMM states. The sets of HMM states that share the same mixture components are determined automatically using agglomerative clustering techniques. Experimental results on the Wall Street Journal corpus show that our new form of output distributions achieves a 25% reduction in error rate over typical tied-mixture systems.
A simple and general method is described that can combine different knowledge sources to reorder N-best lists of hypotheses produced by a speech recognizer. The method is automatically trainable, acquiring information from both positive and negative examples. In experiments, the method was tested on a 1000-utterance sample of unseen ATIS data.
A dynamical system model is proposed for better representing the spectral dynamics of speech for recognition. It is assumed that the observed feature vectors of a phone segment are the output of a stochastic linear dynamical system, and two alternative assumptions regarding the relationship of the segment length and the evolution of the dynamics are considered. Training is equivalent to the identification of a stochastic linear system, and a nontraditional approach based on the estimate-maximize algorithm is followed. This model is evaluated on a phoneme classification task using the TIMIT database. It is shown that the performance obtained...
Methods for reducing the computation requirements of joint segmentation and recognition of phones using the stochastic segment model are presented. The approach uses a fast segment classification method that reduces computation by a factor of two to four, depending on the confidence of choosing the most probable model. A split-and-merge segmentation algorithm is proposed as an alternative to the typical dynamic programming solution of the segmentation and recognition problem, with computation savings increasing proportionally with model complexity. Although the current recognizer uses context-independent phone models,...
The mismatch that frequently occurs between the training and testing conditions of an automatic speech recognizer can be efficiently reduced by adapting the parameters of the recognizer to the testing conditions. Two measures that characterize the performance of an adaptation algorithm are the speed with which it adapts to the new conditions, and its computational complexity, which is important for online applications. A family of adaptation algorithms for continuous-density hidden Markov model (HMM) based speech recognizers has appeared that are based on constrained reestimation of the distribution...
The recognition accuracy in previous large vocabulary automatic speech recognition (ASR) systems is highly related to the existing mismatch between the training and testing sets. For example, dialect differences across the training and testing speakers result in a significant degradation in recognition performance. Some popular adaptation approaches improve the recognition performance of speech recognizers based on hidden Markov models with continuous mixture densities by using linear transformations to adapt the means, and possibly the covariances, of the mixture Gaussians. The linear assumption, however, is too...
Several adaptation approaches have been proposed in an effort to improve the speech recognition performance in mismatched conditions. However, the application of these approaches had been mostly constrained to the speaker or channel adaptation tasks. We first investigate the effect of mismatched dialects between training and testing speakers in an automatic speech recognition (ASR) system. We find that a mismatch in dialects significantly influences the recognition accuracy. Consequently, we apply several adaptation approaches to develop a dialect-specific recognition system using a dialect-dependent system trained on a different dialect and a small number...
The performance and robustness of a speech recognition system can be improved by adapting the speech models to the speaker, the channel and the task. In continuous mixture-density hidden Markov models the number of component densities is typically very large, and it may not be feasible to acquire a large amount of adaptation data for robust maximum-likelihood estimates. To solve this problem, we propose a constrained estimation technique for Gaussian mixture densities, and combine it with Bayesian techniques to improve its asymptotic properties. We evaluate...
This paper summarizes the work of the "Rapid Speech Recognizer Adaptation" team in the workshop held at Johns Hopkins University in the summer of 1998. The project addressed the modeling of dependencies between units of speech, with the goal of making more effective use of small amounts of data for speaker adaptation. A variety of methods were investigated, and their effectiveness in a rapid adaptation task defined on the Switchboard conversational speech corpus is reported.