- Speech Recognition and Synthesis
- Speech and Audio Processing
- Music and Audio Processing
- Natural Language Processing Techniques
- Topic Modeling
- Speech and dialogue systems
- Multimodal Machine Learning Applications
- Advanced Adaptive Filtering Techniques
- Neural Networks and Applications
- Text Readability and Simplification
- Advanced Text Analysis Techniques
- Domain Adaptation and Few-Shot Learning
- Advanced Graph Neural Networks
- Acoustic Wave Phenomena Research
- Data Quality and Management
- Phonetics and Phonology Research
- Hearing Loss and Rehabilitation
- Sentiment Analysis and Opinion Mining
- Anomaly Detection Techniques and Applications
- Software Engineering Research
- Semantic Web and Ontologies
- Generative Adversarial Networks and Image Synthesis
- Advanced Data Compression Techniques
- Blind Source Separation Techniques
- Algorithms and Data Compression
Bellevue Hospital Center
2017-2025
Tencent (China)
2018-2025
Beijing Language and Culture University
2014-2024
Shandong Normal University
2024
Chuzhou University
2022-2024
Anhui Medical University
2022-2024
Institute of Electrical and Electronics Engineers
2023
KLA (United States)
2018-2023
Zhejiang University
2023
Signal Processing (United States)
2023
Most current speech recognition systems use hidden Markov models (HMMs) to deal with the temporal variability of speech and Gaussian mixture models (GMMs) to determine how well each state of each HMM fits a frame, or a short window of frames, of coefficients that represents the acoustic input. An alternative way to evaluate the fit is to use a feed-forward neural network that takes several frames of coefficients as input and produces posterior probabilities over HMM states as output. Deep neural networks (DNNs) that have many hidden layers and are trained using new methods have been shown to outperform GMMs on...
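The abstract above describes replacing GMM likelihood scoring with a feed-forward network that maps a spliced window of acoustic frames to posterior probabilities over HMM states. A minimal NumPy sketch of that idea follows; all sizes, random weights, and the two-hidden-layer shape here are hypothetical illustrations, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical sizes: 11 frames of 13 cepstral coefficients spliced into
# one input vector, two hidden layers, 42 tied HMM states.
n_frames, n_ceps, n_hidden, n_states = 11, 13, 64, 42
W1 = rng.standard_normal((n_frames * n_ceps, n_hidden)) * 0.1
W2 = rng.standard_normal((n_hidden, n_hidden)) * 0.1
W3 = rng.standard_normal((n_hidden, n_states)) * 0.1

def state_posteriors(window):
    """Map a spliced window of acoustic frames to P(state | window)."""
    h = np.maximum(0.0, window @ W1)   # hidden layer 1 (ReLU)
    h = np.maximum(0.0, h @ W2)        # hidden layer 2
    return softmax(h @ W3)             # posteriors over HMM states

window = rng.standard_normal(n_frames * n_ceps)
p = state_posteriors(window)
```

In a hybrid system these posteriors would then be converted to scaled likelihoods for HMM decoding; that step is omitted here.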
In practical terms, the transition to personalized medicine should combine the study of molecular-genetic predisposition to diseases with the analysis of transitional states of the organism along the path toward possible pathology. Classification and monitoring of such states can be carried out effectively using artificial-intelligence methods.
Recently, the hybrid deep neural network (DNN)-hidden Markov model (HMM) has been shown to significantly improve speech recognition performance over the conventional Gaussian mixture model (GMM)-HMM. The improvement is partially attributed to the ability of the DNN to model complex correlations in speech features. In this paper, we show that further error rate reduction can be obtained by using convolutional neural networks (CNNs). We first present a concise description of the basic CNN and explain how it can be used for speech recognition. We then propose...
We show empirically that in SGD training of deep neural networks, one can, at no or nearly no loss of accuracy, quantize the gradients aggressively, to as little as one bit per value, if the quantization error is carried forward across minibatches (error feedback). This size reduction makes it feasible to parallelize SGD through data-parallelism with fast processors like recent GPUs. We implement data-parallel deterministically distributed SGD by combining this finding with AdaGrad, automatic minibatch-size selection, double...
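The key mechanism in the abstract above is error feedback: the quantization residual is not discarded but added back before quantizing the next minibatch's gradient. A small NumPy sketch under simplifying assumptions (one worker, a single shared scale per tensor; the paper's exact quantizer details are not reproduced here):

```python
import numpy as np

def one_bit_quantize(grad, residual):
    """Quantize a gradient to one bit per value with error feedback.

    The carried-over quantization residual is added before quantizing,
    so the error is not lost but applied in a later minibatch.
    """
    g = grad + residual
    # One bit: the sign; a shared scale keeps magnitudes roughly right.
    scale = np.mean(np.abs(g))
    q = np.where(g >= 0, scale, -scale)
    new_residual = g - q          # error fed back into the next step
    return q, new_residual

rng = np.random.default_rng(1)
residual = np.zeros(8)
total_true, total_sent = np.zeros(8), np.zeros(8)
for _ in range(200):
    grad = rng.standard_normal(8)
    q, residual = one_bit_quantize(grad, residual)
    total_true += grad
    total_sent += q

# With error feedback, the accumulated quantized updates track the
# accumulated true gradients up to the final (bounded) residual.
drift = np.abs((total_sent + residual) - total_true).max()
```

The loop illustrates why accuracy is preserved: summed over time, the transmitted bits plus the remaining residual equal the true gradient sum.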
In this paper, we propose the utterance-level permutation invariant training (uPIT) technique. uPIT is a practically applicable, end-to-end, deep-learning-based solution for speaker-independent multitalker speech separation. Specifically, uPIT extends the recently proposed permutation invariant training (PIT) technique with an utterance-level cost function, hence eliminating the need for solving an additional permutation problem during inference, which is otherwise required by frame-level PIT. We achieve this using recurrent neural networks (RNNs) that, during training, minimize...
We propose a novel deep learning training criterion, named permutation invariant training (PIT), for speaker-independent multi-talker speech separation, commonly known as the cocktail-party problem. Different from the multi-class regression technique and the deep clustering (DPCL) technique, our approach minimizes the separation error directly. This strategy effectively solves the long-lasting label permutation problem that has prevented progress on deep-learning-based techniques for speech separation. We evaluated PIT on the WSJ0 and Danish mixed-speech separation tasks and found that it...
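The label permutation problem described above arises because the network's output channels have no fixed assignment to speakers. PIT resolves this by scoring every assignment and training against the best one. A minimal NumPy sketch with a mean-squared-error criterion (the signals, sizes, and the use of plain MSE are illustrative assumptions):

```python
import numpy as np
from itertools import permutations

def pit_mse(estimates, targets):
    """Permutation invariant MSE: score every assignment of estimated
    sources to reference speakers and keep the best (lowest) one."""
    best = None
    for perm in permutations(range(len(targets))):
        err = np.mean([np.mean((estimates[i] - targets[p]) ** 2)
                       for i, p in enumerate(perm)])
        best = err if best is None else min(best, err)
    return best

rng = np.random.default_rng(2)
a, b = rng.standard_normal(100), rng.standard_normal(100)

# Perfect separation, but the network emitted the speakers in swapped
# order: a fixed-assignment loss would punish this heavily ...
plain_mse = 0.5 * (np.mean((b - a) ** 2) + np.mean((a - b) ** 2))
# ... while PIT recognizes the swap as a valid assignment.
loss_swapped = pit_mse([b, a], [a, b])
loss_matched = pit_mse([a, b], [a, b])
```

Note the cost of enumerating permutations grows factorially in the number of speakers, which is acceptable for the two- and three-talker settings typically studied.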
Deep learning is becoming a mainstream technology for speech recognition at industrial scale. In this paper, we provide an overview of the work by Microsoft speech researchers since 2009 in this area, focusing on more recent advances which shed light on the basic capabilities and limitations of current deep learning technology. We organize this overview along the feature-domain and model-domain dimensions, according to the conventional approach to analyzing speech systems. Selected experimental results, including those from related applications such as spoken dialogue...
Speech emotion recognition is a challenging problem, partly because it is unclear what features are effective for the task. In this paper we propose to utilize deep neural networks (DNNs) to extract high-level features from raw data and show that they are effective for speech emotion recognition. We first produce an emotion state probability distribution for each speech segment using DNNs. We then construct utterance-level features from these segment-level distributions. These utterance-level features are fed into an extreme learning machine (ELM), a special simple and efficient...
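The pipeline above turns per-segment emotion probability distributions into a single utterance-level feature vector for the ELM classifier. A small NumPy sketch of that aggregation step; the choice of statistics (per-class max, min, mean) and the toy numbers are illustrative assumptions, not the paper's exact feature set:

```python
import numpy as np

def utterance_features(segment_probs):
    """Build an utterance-level feature vector from per-segment emotion
    state probability distributions (one hypothetical choice of
    statistics: per-class max, min, and mean)."""
    P = np.asarray(segment_probs)          # shape: (segments, classes)
    return np.concatenate([P.max(0), P.min(0), P.mean(0)])

# Toy utterance: 4 segments, 3 hypothetical emotion classes.
probs = [[0.7, 0.2, 0.1],
         [0.6, 0.3, 0.1],
         [0.2, 0.7, 0.1],
         [0.5, 0.4, 0.1]]
feats = utterance_features(probs)
```

The resulting fixed-length vector (here 3 classes x 3 statistics = 9 values) is what makes a simple utterance-level classifier applicable regardless of utterance duration.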
Recently, a new acoustic model based on deep neural networks (DNN) has been introduced. While the DNN has generated significant improvements over GMM-based systems on several tasks, there has been no evaluation of the robustness of such systems to environmental distortion. In this paper, we investigate the noise robustness of DNN-based acoustic models and find that they can match state-of-the-art performance on the Aurora 4 task without any explicit noise compensation. This performance can be further improved by incorporating information about the environment into training using...
We investigate the potential of Context-Dependent Deep-Neural-Network HMMs, or CD-DNN-HMMs, from a feature-engineering perspective. Recently, we had shown that for speaker-independent transcription of phone calls (NIST RT03S Fisher data), CD-DNN-HMMs reduced the word error rate by as much as one third: from 27.4%, obtained by discriminatively trained Gaussian-mixture HMMs with HLDA features, to 18.5%, using 300+ hours of training data (Switchboard), 9000+ tied triphone states, and up to 9 hidden network layers.
In the deep neural network (DNN), the hidden layers can be considered as increasingly complex feature transformations, and the final softmax layer as a log-linear classifier making use of the most abstract features computed in the hidden layers. While the log-linear classifier should be different for different languages, the feature transformations can be shared across languages. In this paper we propose a shared-hidden-layer multilingual DNN (SHL-MDNN), in which the hidden layers are made common across many languages while the softmax layers are language dependent. We demonstrate that the SHL-MDNN can reduce errors by 3-5%, relatively, for all...
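The architecture described above pairs one shared stack of hidden layers with a separate softmax layer per language. A minimal NumPy sketch of the forward pass; the layer sizes, the two example languages, and their senone counts are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(3)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Shared hidden layers, common to all languages ...
W_shared = [rng.standard_normal((40, 32)) * 0.1,
            rng.standard_normal((32, 32)) * 0.1]
# ... and one language-dependent softmax layer per language
# (hypothetical output sizes per language).
W_lang = {"en": rng.standard_normal((32, 50)) * 0.1,
          "zh": rng.standard_normal((32, 60)) * 0.1}

def forward(x, lang):
    h = x
    for W in W_shared:
        h = np.maximum(0.0, h @ W)     # shared feature transformation
    return softmax(h @ W_lang[lang])   # language-specific classifier

x = rng.standard_normal(40)
p_en, p_zh = forward(x, "en"), forward(x, "zh")
```

Because only `W_lang` differs per language, adding a new language mainly requires training one new output layer on top of the shared feature transformations.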
Conversational speech recognition has served as a flagship task since the release of the Switchboard corpus in the 1990s. In this paper, we measure the human error rate on the widely used NIST 2000 test set, and find that our latest automated system has reached human parity. The error rate of professional transcribers is 5.9% for the Switchboard portion of the data, in which newly acquainted pairs of people discuss an assigned topic, and 11.3% for the CallHome portion, where friends and family members have open-ended conversations. In both cases, our system establishes a new state of the art, and edges past...
Semantic slot filling is one of the most challenging problems in spoken language understanding (SLU). In this paper, we propose to use recurrent neural networks (RNNs) for this task, and present several novel architectures designed to efficiently model past and future temporal dependencies. Specifically, we implemented and compared several important RNN architectures, including Elman, Jordan, and hybrid variants. To facilitate reproducibility, we implemented these networks with the publicly available Theano neural network toolkit and completed experiments on...
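In slot filling, the tagger assigns one slot label per input word, and an Elman-style RNN lets each label depend on all past words through its recurrent hidden state. A minimal NumPy sketch of an Elman forward pass over random "embeddings"; all sizes and weights are hypothetical, and real systems would of course train these weights:

```python
import numpy as np

rng = np.random.default_rng(4)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical sizes: 16-dim word embeddings, 24 hidden units, 5 slot labels.
d_in, d_h, n_labels = 16, 24, 5
W_xh = rng.standard_normal((d_in, d_h)) * 0.1
W_hh = rng.standard_normal((d_h, d_h)) * 0.1
W_hy = rng.standard_normal((d_h, n_labels)) * 0.1

def elman_tag(sequence):
    """Elman RNN tagger: the hidden state feeds back into itself, so the
    label at each position can depend on all preceding words."""
    h = np.zeros(d_h)
    labels = []
    for x in sequence:
        h = np.tanh(x @ W_xh + h @ W_hh)   # recurrent hidden update
        labels.append(int(np.argmax(softmax(h @ W_hy))))
    return labels

seq = rng.standard_normal((7, d_in))       # 7 word embeddings
tags = elman_tag(seq)
```

A Jordan variant would instead feed the previous output (label distribution) back into the hidden update; the hybrid variants mentioned in the abstract combine both feedback paths.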
Automatically generated malware is a significant problem for computer users. Analysts are able to manually investigate a small number of unknown files, but the best large-scale defense for detecting malware is automated malware classification. Malware classifiers often use sparse binary features, and the number of potential features can be on the order of tens or hundreds of millions. Feature selection reduces the number of features to a manageable number for training simpler algorithms such as logistic regression, but this number is still too large for more complex algorithms such as neural networks. To overcome...
The purpose of this article is to introduce the readers to the emerging technologies enabled by deep learning and to review the research work conducted in this area that is of direct relevance to signal processing. We also point out, in our view, the future research directions that may attract interest and require efforts from more signal processing researchers and practitioners for advancing information technology and applications.
We propose a novel regularized adaptation technique for context-dependent deep neural network hidden Markov models (CD-DNN-HMMs). The CD-DNN-HMM has a large output layer and many large hidden layers, each with thousands of neurons. The huge number of parameters in the CD-DNN-HMM makes adaptation a challenging task, especially when the adaptation set is small. The technique developed in this paper adapts the model conservatively by forcing the senone distribution estimated from the adapted model to be close to that from the unadapted model. This constraint is realized by adding a Kullback-Leibler divergence...
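Conservative adaptation of this kind can be expressed as a modified cross-entropy loss in which the hard training targets are interpolated with the unadapted model's senone posteriors; minimizing it pulls the adapted model toward the unadapted one. A toy NumPy sketch (frame counts, senone counts, probabilities, and the interpolation weight `rho` are all illustrative assumptions):

```python
import numpy as np

def kld_regularized_loss(p_adapted, p_unadapted, targets, rho=0.5):
    """Cross-entropy against an interpolated target distribution: a
    sketch of KL-regularized adaptation, where rho controls how strongly
    the adapted model is kept close to the unadapted one."""
    eps = 1e-12  # guards log(0)
    target = (1.0 - rho) * targets + rho * p_unadapted
    return -np.sum(target * np.log(p_adapted + eps), axis=-1).mean()

# Toy example: 2 frames, 3 senones.
targets = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])  # hard labels
p_un = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]])     # unadapted model
p_ad = np.array([[0.7, 0.2, 0.1], [0.1, 0.7, 0.2]])     # adapted model
loss_conservative = kld_regularized_loss(p_ad, p_un, targets, rho=0.5)
loss_plain = kld_regularized_loss(p_ad, p_un, targets, rho=0.0)
```

With `rho = 0` this reduces to ordinary cross-entropy training on the adaptation data; larger `rho` gives more conservative adaptation, which helps when the adaptation set is small.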
Manually crafted combinatorial features have been the "secret sauce" behind many successful models. For web-scale applications, however, the variety and volume of features make these manually crafted features expensive to create, maintain, and deploy. This paper proposes the Deep Crossing model, a deep neural network that automatically combines features to produce superior models. The input is a set of individual features that can be either dense or sparse. The important crossing features are discovered implicitly by the networks, which are comprised of an embedding and stacking layer, as well as...
This paper reports our recent exploration of the layer-by-layer learning strategy for training a multi-layer generative model of patches of speech spectrograms. The top layer learns binary codes that can be used for efficient compression and could also be used for scalable recognition or rapid content retrieval. Each layer is fully connected to the layer below, and the weights on these connections are pretrained efficiently by using the contrastive divergence approximation to the log likelihood gradient. After layer-by-layer pre-training we “unroll”...
Recently, convolutional neural networks (CNNs) have been shown to outperform standard fully connected deep neural networks within the hybrid deep neural network / hidden Markov model (DNN/HMM) framework on the phone recognition task. In this paper, we extend the earlier basic form of the CNN and explore it in multiple ways. We first investigate several CNN architectures, including full and limited weight sharing, convolution along frequency and time axes, and stacking of convolution layers. We then develop a novel weighted softmax pooling layer so that the size can be...
Recurrent Neural Network Language Models (RNN-LMs) have recently shown exceptional performance across a variety of applications. In this paper, we modify the architecture to perform language understanding, and advance the state of the art for the widely used ATIS dataset. The core of our approach is to take words as input, as in a standard RNN-LM, and then to predict slot labels rather than words on the output side. We present several variations that differ in the amount of word context on the input side, and in the use of non-lexical features. Remarkably, our simplest model...
Neural network based approaches have recently produced record-setting performances in natural language understanding tasks such as word labeling. In the word labeling task, a tagger is used to assign a label to each word in an input sequence. Specifically, simple recurrent neural networks (RNNs) and convolutional neural networks (CNNs) have been shown to significantly outperform the previous state of the art, conditional random fields (CRFs). This paper investigates using long short-term memory (LSTM) neural networks, which contain input, output...