- Speech Recognition and Synthesis
- Speech and Audio Processing
- Music and Audio Processing
- Natural Language Processing Techniques
- Topic Modeling
- Speech and dialogue systems
- Multimodal Machine Learning Applications
- Advanced Adaptive Filtering Techniques
- Neural Networks and Applications
- Text Readability and Simplification
- Advanced Text Analysis Techniques
- Domain Adaptation and Few-Shot Learning
- Advanced Graph Neural Networks
- Acoustic Wave Phenomena Research
- Data Quality and Management
- Phonetics and Phonology Research
- Hearing Loss and Rehabilitation
- Sentiment Analysis and Opinion Mining
- Anomaly Detection Techniques and Applications
- Software Engineering Research
- Semantic Web and Ontologies
- Generative Adversarial Networks and Image Synthesis
- Advanced Data Compression Techniques
- Blind Source Separation Techniques
- Algorithms and Data Compression
Bellevue Hospital Center
2017-2025
Tencent (China)
2018-2025
Beijing Language and Culture University
2014-2024
Shandong Normal University
2024
Chuzhou University
2022-2024
Anhui Medical University
2022-2024
Institute of Electrical and Electronics Engineers
2023
KLA (United States)
2018-2023
Zhejiang University
2023
Signal Processing (United States)
2023
Most current speech recognition systems use hidden Markov models (HMMs) to deal with the temporal variability of speech and Gaussian mixture models (GMMs) to determine how well each state of each HMM fits a frame, or a short window of frames, of coefficients that represents the acoustic input. An alternative way to evaluate the fit is to use a feed-forward neural network that takes several frames of coefficients as input and produces posterior probabilities over HMM states as output. Deep neural networks (DNNs) that have many hidden layers and are trained using new methods have been shown to outperform GMMs on...
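The abstract above describes replacing GMM likelihood scoring with a feed-forward network that maps a spliced window of acoustic frames to posterior probabilities over HMM states. A minimal NumPy sketch of that idea follows; all sizes, random weights, and the two-hidden-layer shape here are hypothetical illustrations, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical sizes: 11 frames of 13 cepstral coefficients spliced into
# one input vector, two hidden layers, 42 tied HMM states.
n_frames, n_ceps, n_hidden, n_states = 11, 13, 64, 42
W1 = rng.standard_normal((n_frames * n_ceps, n_hidden)) * 0.1
W2 = rng.standard_normal((n_hidden, n_hidden)) * 0.1
W3 = rng.standard_normal((n_hidden, n_states)) * 0.1

def state_posteriors(window):
    """Map a spliced window of acoustic frames to P(state | window)."""
    h = np.maximum(0.0, window @ W1)   # hidden layer 1 (ReLU)
    h = np.maximum(0.0, h @ W2)        # hidden layer 2
    return softmax(h @ W3)             # posteriors over HMM states

window = rng.standard_normal(n_frames * n_ceps)
p = state_posteriors(window)
```

In a hybrid system these posteriors would then be converted to scaled likelihoods for HMM decoding; that step is omitted here.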
In practical terms, the transition to personalized medicine should combine the study of molecular-genetic predisposition to diseases with the analysis of transitional states of the organism along the path toward possible pathology. Classification and monitoring of such states can be carried out effectively using artificial-intelligence methods.
Recently, the hybrid deep neural network (DNN)-hidden Markov model (HMM) has been shown to significantly improve speech recognition performance over the conventional Gaussian mixture model (GMM)-HMM. The improvement is partially attributed to the ability of the DNN to model complex correlations in speech features. In this paper, we show that further error rate reduction can be obtained by using convolutional neural networks (CNNs). We first present a concise description of the basic CNN and explain how it can be used for speech recognition. We then propose...
We show empirically that in SGD training of deep neural networks, one can, at no or nearly no loss of accuracy, quantize the gradients aggressively, to as little as one bit per value, if the quantization error is carried forward across minibatches (error feedback). This size reduction makes it feasible to parallelize SGD through data-parallelism with fast processors like recent GPUs. We implement data-parallel deterministically distributed SGD by combining this finding with AdaGrad, automatic minibatch-size selection, double...
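The key mechanism in the abstract above is error feedback: the quantization residual is not discarded but added back before quantizing the next minibatch's gradient. A small NumPy sketch under simplifying assumptions (one worker, a single shared scale per tensor; the paper's exact quantizer details are not reproduced here):

```python
import numpy as np

def one_bit_quantize(grad, residual):
    """Quantize a gradient to one bit per value with error feedback.

    The carried-over quantization residual is added before quantizing,
    so the error is not lost but applied in a later minibatch.
    """
    g = grad + residual
    # One bit: the sign; a shared scale keeps magnitudes roughly right.
    scale = np.mean(np.abs(g))
    q = np.where(g >= 0, scale, -scale)
    new_residual = g - q          # error fed back into the next step
    return q, new_residual

rng = np.random.default_rng(1)
residual = np.zeros(8)
total_true, total_sent = np.zeros(8), np.zeros(8)
for _ in range(200):
    grad = rng.standard_normal(8)
    q, residual = one_bit_quantize(grad, residual)
    total_true += grad
    total_sent += q

# With error feedback, the accumulated quantized updates track the
# accumulated true gradients up to the final (bounded) residual.
drift = np.abs((total_sent + residual) - total_true).max()
```

The loop illustrates why accuracy is preserved: summed over time, the transmitted bits plus the remaining residual equal the true gradient sum.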
In this paper, we propose the utterance-level permutation invariant training (uPIT) technique. uPIT is a practically applicable, end-to-end, deep-learning-based solution for speaker-independent multitalker speech separation. Specifically, uPIT extends the recently proposed permutation invariant training (PIT) technique with an utterance-level cost function, hence eliminating the need for solving an additional permutation problem during inference, which is otherwise required by frame-level PIT. We achieve this using recurrent neural networks (RNNs) that, during training, minimize...
We propose a novel deep learning training criterion, named permutation invariant training (PIT), for speaker-independent multi-talker speech separation, commonly known as the cocktail-party problem. Different from the multi-class regression technique and the deep clustering (DPCL) technique, our approach minimizes the separation error directly. This strategy effectively solves the long-lasting label permutation problem that has prevented progress on deep-learning-based techniques for speech separation. We evaluated PIT on the WSJ0 and Danish mixed-speech separation tasks and found that it...
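The label permutation problem described above arises because the network's output channels have no fixed assignment to speakers. PIT resolves this by scoring every assignment and training against the best one. A minimal NumPy sketch with a mean-squared-error criterion (the signals, sizes, and the use of plain MSE are illustrative assumptions):

```python
import numpy as np
from itertools import permutations

def pit_mse(estimates, targets):
    """Permutation invariant MSE: score every assignment of estimated
    sources to reference speakers and keep the best (lowest) one."""
    best = None
    for perm in permutations(range(len(targets))):
        err = np.mean([np.mean((estimates[i] - targets[p]) ** 2)
                       for i, p in enumerate(perm)])
        best = err if best is None else min(best, err)
    return best

rng = np.random.default_rng(2)
a, b = rng.standard_normal(100), rng.standard_normal(100)

# Perfect separation, but the network emitted the speakers in swapped
# order: a fixed-assignment loss would punish this heavily ...
plain_mse = 0.5 * (np.mean((b - a) ** 2) + np.mean((a - b) ** 2))
# ... while PIT recognizes the swap as a valid assignment.
loss_swapped = pit_mse([b, a], [a, b])
loss_matched = pit_mse([a, b], [a, b])
```

Note the cost of enumerating permutations grows factorially in the number of speakers, which is acceptable for the two- and three-talker settings typically studied.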
Deep learning is becoming a mainstream technology for speech recognition at industrial scale. In this paper, we provide an overview of the work by Microsoft speech researchers since 2009 in this area, focusing on more recent advances which shed light on the basic capabilities and limitations of current deep learning technology. We organize this overview along the feature-domain and model-domain dimensions, according to the conventional approach to analyzing speech systems. Selected experimental results, including those from related applications such as spoken dialogue...
Speech emotion recognition is a challenging problem, partly because it is unclear what features are effective for the task. In this paper we propose to utilize deep neural networks (DNNs) to extract high-level features from raw data and show that they are effective for speech emotion recognition. We first produce an emotion state probability distribution for each speech segment using DNNs. We then construct utterance-level features from these segment-level distributions. These utterance-level features are fed into an extreme learning machine (ELM), a special simple and efficient...
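The pipeline above turns per-segment emotion probability distributions into a single utterance-level feature vector for the ELM classifier. A small NumPy sketch of that aggregation step; the choice of statistics (per-class max, min, mean) and the toy numbers are illustrative assumptions, not the paper's exact feature set:

```python
import numpy as np

def utterance_features(segment_probs):
    """Build an utterance-level feature vector from per-segment emotion
    state probability distributions (one hypothetical choice of
    statistics: per-class max, min, and mean)."""
    P = np.asarray(segment_probs)          # shape: (segments, classes)
    return np.concatenate([P.max(0), P.min(0), P.mean(0)])

# Toy utterance: 4 segments, 3 hypothetical emotion classes.
probs = [[0.7, 0.2, 0.1],
         [0.6, 0.3, 0.1],
         [0.2, 0.7, 0.1],
         [0.5, 0.4, 0.1]]
feats = utterance_features(probs)
```

The resulting fixed-length vector (here 3 classes x 3 statistics = 9 values) is what makes a simple utterance-level classifier applicable regardless of utterance duration.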
Recently, a new acoustic model based on deep neural networks (DNN) has been introduced. While the DNN has generated significant improvements over GMM-based systems on several tasks, there has been no evaluation of the robustness of such systems to environmental distortion. In this paper, we investigate the noise robustness of DNN-based acoustic models and find that they can match state-of-the-art performance on the Aurora 4 task without any explicit noise compensation. This performance can be further improved by incorporating information about the environment into training using...
We investigate the potential of Context-Dependent Deep-Neural-Network HMMs, or CD-DNN-HMMs, from a feature-engineering perspective. Recently, we had shown that for speaker-independent transcription of phone calls (NIST RT03S Fisher data), CD-DNN-HMMs reduced the word error rate by as much as one third: from 27.4%, obtained by discriminatively trained Gaussian-mixture HMMs with HLDA features, to 18.5%, using 300+ hours of training data (Switchboard), 9000+ tied triphone states, and up to 9 hidden network layers.
In the deep neural network (DNN), the hidden layers can be considered as increasingly complex feature transformations, and the final softmax layer as a log-linear classifier making use of the most abstract features computed in the hidden layers. While the log-linear classifier should be different for different languages, the feature transformations can be shared across languages. In this paper we propose a shared-hidden-layer multilingual DNN (SHL-MDNN), in which the hidden layers are made common across many languages while the softmax layers are language dependent. We demonstrate that the SHL-MDNN can reduce errors by 3-5%, relatively, for all...
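The architecture described above pairs one shared stack of hidden layers with a separate softmax layer per language. A minimal NumPy sketch of the forward pass; the layer sizes, the two example languages, and their senone counts are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(3)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Shared hidden layers, common to all languages ...
W_shared = [rng.standard_normal((40, 32)) * 0.1,
            rng.standard_normal((32, 32)) * 0.1]
# ... and one language-dependent softmax layer per language
# (hypothetical output sizes per language).
W_lang = {"en": rng.standard_normal((32, 50)) * 0.1,
          "zh": rng.standard_normal((32, 60)) * 0.1}

def forward(x, lang):
    h = x
    for W in W_shared:
        h = np.maximum(0.0, h @ W)     # shared feature transformation
    return softmax(h @ W_lang[lang])   # language-specific classifier

x = rng.standard_normal(40)
p_en, p_zh = forward(x, "en"), forward(x, "zh")
```

Because only `W_lang` differs per language, adding a new language mainly requires training one new output layer on top of the shared feature transformations.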
Conversational speech recognition has served as a flagship task since the release of the Switchboard corpus in the 1990s. In this paper, we measure the human error rate on the widely used NIST 2000 test set, and find that our latest automated system has reached human parity. The error rate of professional transcribers is 5.9% for the Switchboard portion of the data, in which newly acquainted pairs of people discuss an assigned topic, and 11.3% for the CallHome portion, where friends and family members have open-ended conversations. In both cases, our system establishes a new state of the art, and edges past...
Semantic slot filling is one of the most challenging problems in spoken language understanding (SLU). In this paper, we propose to use recurrent neural networks (RNNs) for this task, and present several novel architectures designed to efficiently model past and future temporal dependencies. Specifically, we implemented and compared several important RNN architectures, including Elman, Jordan, and hybrid variants. To facilitate reproducibility, we implemented these networks with the publicly available Theano neural network toolkit and completed experiments on...
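In slot filling, the tagger assigns one slot label per input word, and an Elman-style RNN lets each label depend on all past words through its recurrent hidden state. A minimal NumPy sketch of an Elman forward pass over random "embeddings"; all sizes and weights are hypothetical, and real systems would of course train these weights:

```python
import numpy as np

rng = np.random.default_rng(4)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical sizes: 16-dim word embeddings, 24 hidden units, 5 slot labels.
d_in, d_h, n_labels = 16, 24, 5
W_xh = rng.standard_normal((d_in, d_h)) * 0.1
W_hh = rng.standard_normal((d_h, d_h)) * 0.1
W_hy = rng.standard_normal((d_h, n_labels)) * 0.1

def elman_tag(sequence):
    """Elman RNN tagger: the hidden state feeds back into itself, so the
    label at each position can depend on all preceding words."""
    h = np.zeros(d_h)
    labels = []
    for x in sequence:
        h = np.tanh(x @ W_xh + h @ W_hh)   # recurrent hidden update
        labels.append(int(np.argmax(softmax(h @ W_hy))))
    return labels

seq = rng.standard_normal((7, d_in))       # 7 word embeddings
tags = elman_tag(seq)
```

A Jordan variant would instead feed the previous output (label distribution) back into the hidden update; the hybrid variants mentioned in the abstract combine both feedback paths.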
Automatically generated malware is a significant problem for computer users. Analysts are able to manually investigate a small number of unknown files, but the best large-scale defense for detecting malware is automated malware classification. Malware classifiers often use sparse binary features, and the number of potential features can be on the order of tens or hundreds of millions. Feature selection reduces the number of features to a manageable number for training simpler algorithms such as logistic regression, but this number is still too large for more complex algorithms such as neural networks. To overcome...
The purpose of this article is to introduce the readers to the emerging technologies enabled by deep learning and to review the research work conducted in this area that is of direct relevance to signal processing. We also point out, in our view, the future research directions that may attract interest and require efforts from more signal processing researchers and practitioners for advancing information technology and applications.
We propose a novel regularized adaptation technique for context-dependent deep neural network hidden Markov models (CD-DNN-HMMs). The CD-DNN-HMM has a large output layer and many large hidden layers, each with thousands of neurons. The huge number of parameters in the CD-DNN-HMM makes adaptation a challenging task, especially when the adaptation set is small. The technique developed in this paper adapts the model conservatively by forcing the senone distribution estimated from the adapted model to be close to that from the unadapted model. This constraint is realized by adding a Kullback-Leibler divergence...
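Conservative adaptation of this kind can be expressed as a modified cross-entropy loss in which the hard training targets are interpolated with the unadapted model's senone posteriors; minimizing it pulls the adapted model toward the unadapted one. A toy NumPy sketch (frame counts, senone counts, probabilities, and the interpolation weight `rho` are all illustrative assumptions):

```python
import numpy as np

def kld_regularized_loss(p_adapted, p_unadapted, targets, rho=0.5):
    """Cross-entropy against an interpolated target distribution: a
    sketch of KL-regularized adaptation, where rho controls how strongly
    the adapted model is kept close to the unadapted one."""
    eps = 1e-12  # guards log(0)
    target = (1.0 - rho) * targets + rho * p_unadapted
    return -np.sum(target * np.log(p_adapted + eps), axis=-1).mean()

# Toy example: 2 frames, 3 senones.
targets = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])  # hard labels
p_un = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]])     # unadapted model
p_ad = np.array([[0.7, 0.2, 0.1], [0.1, 0.7, 0.2]])     # adapted model
loss_conservative = kld_regularized_loss(p_ad, p_un, targets, rho=0.5)
loss_plain = kld_regularized_loss(p_ad, p_un, targets, rho=0.0)
```

With `rho = 0` this reduces to ordinary cross-entropy training on the adaptation data; larger `rho` gives more conservative adaptation, which helps when the adaptation set is small.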
Manually crafted combinatorial features have been the "secret sauce" behind many successful models. For web-scale applications, however, the variety and volume of features make these manually crafted features expensive to create, maintain, and deploy. This paper proposes the Deep Crossing model, a deep neural network that automatically combines features to produce superior models. The input is a set of individual features that can be either dense or sparse. The important crossing features are discovered implicitly by the networks, which are comprised of an embedding and stacking layer, as well as...
This paper reports our recent exploration of the layer-by-layer learning strategy for training a multi-layer generative model of patches of speech spectrograms. The top layer learns binary codes that can be used for efficient compression and could also be used for scalable recognition or rapid content retrieval. Each layer is fully connected to the layer below, and the weights on these connections are pretrained efficiently by using the contrastive divergence approximation to the log likelihood gradient. After layer-by-layer pre-training we “unroll”...
Recently, convolutional neural networks (CNNs) have been shown to outperform standard fully connected deep neural networks within the hybrid deep neural network / hidden Markov model (DNN/HMM) framework on the phone recognition task. In this paper, we extend the earlier basic form of the CNN and explore it in multiple ways. We first investigate several CNN architectures, including full and limited weight sharing, convolution along frequency and time axes, and stacking of convolution layers. We then develop a novel weighted softmax pooling layer so that the size can be...
Recurrent Neural Network Language Models (RNN-LMs) have recently shown exceptional performance across a variety of applications. In this paper, we modify the architecture to perform language understanding, and advance the state of the art for the widely used ATIS dataset. The core of our approach is to take words as input, as in a standard RNN-LM, and then to predict slot labels rather than words on the output side. We present several variations that differ in the amount of word context on the input side, and in the use of non-lexical features. Remarkably, our simplest model...
Neural network based approaches have recently produced record-setting performances in natural language understanding tasks such as word labeling. In the word labeling task, a tagger is used to assign a label to each word in an input sequence. Specifically, simple recurrent neural networks (RNNs) and convolutional neural networks (CNNs) have been shown to significantly outperform the previous state of the art, conditional random fields (CRFs). This paper investigates using long short-term memory (LSTM) neural networks, which contain input, output...