- Speech Recognition and Synthesis
- Natural Language Processing Techniques
- Topic Modeling
- Speech and Audio Processing
- Music and Audio Processing
- Advanced Graph Neural Networks
- Speech and Dialogue Systems
- Multimodal Machine Learning Applications
- Neural Networks and Applications
- Domain Adaptation and Few-Shot Learning
- Biomedical Text Mining and Ontologies
- Reinforcement Learning in Robotics
- Blind Source Separation Techniques
- Bayesian Modeling and Causal Inference
- Advanced Computational Techniques and Applications
- Advanced Image and Video Retrieval Techniques
- Advanced Neural Network Applications
- Bioinformatics and Genomic Networks
- Neural Networks and Reservoir Computing
- Cognitive Science and Mapping
- Recommender Systems and Techniques
- Gene Expression and Cancer Classification
- Hate Speech and Cyberbullying Detection
- Advanced Sensor and Control Systems
- Child and Animal Learning Development
Chengdu University of Technology
2010-2024
Sichuan University
2024
West China Hospital of Sichuan University
2024
National Sun Yat-sen University
2010-2023
Johns Hopkins University
2020-2022
Meta (United States)
2020-2021
Meta (Israel)
2020-2021
University of Science and Technology of China
2019-2020
JDSU (United States)
2019
Xiamen University
2015-2017
Knowledge graph embedding has been an active research topic for knowledge base completion, with progressive improvement from the initial TransE, TransH, DistMult et al. to the current state-of-the-art ConvE. ConvE uses 2D convolution over embeddings and multiple layers of nonlinear features to model knowledge graphs. The model can be efficiently trained and is scalable to large knowledge graphs. However, there is no structure enforcement in the embedding space of ConvE. The recent graph convolutional network (GCN) provides another way of learning graph node embedding by successfully utilizing...
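As a concrete illustration of the distance-based scoring these embedding models build on, here is a minimal pure-Python sketch of the TransE score; the toy 3-dimensional embeddings and entity names are invented for the example, not taken from any trained model:

```python
import math

def transe_score(head, relation, tail):
    """TransE plausibility score: negative L2 distance of (head + relation)
    from tail. Scores closer to zero mean the triple is more plausible."""
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))

# Toy embeddings (illustrative values only).
paris = [0.9, 0.1, 0.0]
capital_of = [0.0, 0.8, 0.1]
france = [0.9, 0.9, 0.1]
berlin = [0.1, 0.5, 0.7]

# (Paris, capital_of, France) should score higher than (Paris, capital_of, Berlin).
assert transe_score(paris, capital_of, france) > transe_score(paris, capital_of, berlin)
```

ConvE replaces this fixed translational form with learned 2D convolutions over the embeddings, which is what motivates the structure-enforcement discussion above.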
Multi-hop reading comprehension (RC) across documents poses a new challenge over single-document RC because it requires reasoning over multiple documents to reach the final answer. In this paper, we propose a new model to tackle the multi-hop RC problem. We introduce a heterogeneous graph with different types of nodes and edges, which is named the Heterogeneous Document-Entity (HDE) graph. The advantage of the HDE graph is that it contains different granularity levels of information, including candidates, documents and entities in specific document contexts. Our proposed...
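The heterogeneous-graph idea can be sketched with a tiny data structure in which every node and edge carries a type; the node ids and edge-type names below are hypothetical placeholders, not the paper's schema:

```python
from collections import defaultdict

def build_hetero_graph(typed_nodes, typed_edges):
    """Minimal heterogeneous graph: each node has a type (e.g. 'candidate',
    'document', 'entity') and each edge has a type, so information at
    different granularity levels can be connected in one graph."""
    graph = {"node_type": dict(typed_nodes), "adj": defaultdict(list)}
    for src, dst, edge_type in typed_edges:
        graph["adj"][src].append((dst, edge_type))
        graph["adj"][dst].append((src, edge_type))  # undirected for simplicity
    return graph

# Toy example: a candidate answer and an entity both mentioned in a document.
g = build_hetero_graph(
    [("cand:Paris", "candidate"), ("doc:1", "document"), ("ent:France", "entity")],
    [("cand:Paris", "doc:1", "candidate-in-document"),
     ("ent:France", "doc:1", "entity-in-document")],
)
```

A graph neural network would then propagate messages along these typed edges, with edge-type-specific transformations.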
Ann Lee, Peng-Jen Chen, Changhan Wang, Jiatao Gu, Sravya Popuri, Xutai Ma, Adam Polyak, Yossi Adi, Qing He, Yun Tang, Juan Pino, Wei-Ning Hsu. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2022.
We introduce fairseq S2T, a fairseq extension for speech-to-text (S2T) modeling tasks such as end-to-end speech recognition and speech-to-text translation. It follows fairseq's careful design for scalability and extensibility. We provide end-to-end workflows from data pre-processing and model training to offline (online) inference. We implement state-of-the-art RNN-based, Transformer-based as well as Conformer-based models and open-source detailed training recipes. Fairseq's machine translation models and language models can be seamlessly integrated into S2T workflows for multi-task learning...
Xian Li, Changhan Wang, Yun Tang, Chau Tran, Yuqing Tang, Juan Pino, Alexei Baevski, Alexis Conneau, Michael Auli. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2021.
We introduce fairseq S2T, a fairseq extension for speech-to-text (S2T) modeling tasks such as end-to-end speech recognition and speech-to-text translation. It follows fairseq's careful design for scalability and extensibility. We provide end-to-end workflows from data pre-processing and model training to offline (online) inference. We implement state-of-the-art RNN-based as well as Transformer-based models and open-source detailed training recipes. Fairseq's machine translation models and language models can be seamlessly integrated into S2T workflows for multi-task learning or transfer...
Distance-based knowledge graph embeddings have shown substantial improvement on the link prediction task, from TransE to the latest state-of-the-art RotatE. However, complex relations such as N-to-1, 1-to-N and N-to-N still remain challenging to predict. In this work, we propose a novel distance-based approach for knowledge graph link prediction. First, we extend RotatE from the 2D complex domain to a high dimensional space with orthogonal transforms to model relations. The orthogonal transform embedding keeps the capability of modeling symmetric/anti-symmetric,...
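The 2D rotation-based scoring that RotatE uses (and that the proposed orthogonal transforms generalize to higher dimensions) can be sketched with complex numbers; the embeddings below are toy values for illustration:

```python
import cmath

def rotate_score(head, phases, tail):
    """RotatE-style score: a relation is an element-wise rotation in the
    complex plane; the score is the negative distance between the rotated
    head embedding and the tail embedding."""
    rotated = [h * cmath.exp(1j * theta) for h, theta in zip(head, phases)]
    return -sum(abs(r - t) for r, t in zip(rotated, tail))

# A 90-degree rotation maps (1+0j) onto (0+1j), so this triple scores ~0.
head = [1 + 0j, 0 + 1j]
tail = [0 + 1j, 0 + 1j]
relation = [cmath.pi / 2, 0.0]  # rotate the first dimension, keep the second
```

Because a rotation by theta composed with a rotation by -theta is the identity, this form naturally models inverse and (anti-)symmetric relations, which is the property the orthogonal-transform extension preserves.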
Attention-based sequence-to-sequence modeling provides a powerful and elegant solution for applications that need to map one sequence to a different sequence. Its success heavily relies on the availability of large amounts of training data. This presents a challenge for speech applications, where labelled data is very expensive to obtain, such as automatic speech recognition (ASR) and speech translation (ST). In this study, we propose a general multi-task learning framework to leverage text data for ASR and ST tasks. Two auxiliary tasks, denoising...
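The way a multi-task objective folds auxiliary tasks into the primary loss can be sketched as a weighted sum; the weights and loss values below are illustrative numbers, not the framework's actual settings:

```python
def joint_loss(primary_loss, aux_losses, aux_weights):
    """Multi-task objective: the primary task loss (e.g. ST) plus a weighted
    sum of auxiliary task losses (e.g. ASR, text denoising) computed on
    additional, often text-only, data."""
    assert len(aux_losses) == len(aux_weights)
    return primary_loss + sum(w * l for w, l in zip(aux_weights, aux_losses))

# Illustrative values: ST loss 2.0, ASR loss 1.0 (weight 0.5), denoising
# loss 4.0 (weight 0.25) combine into a single training objective of 3.5.
total = joint_loss(2.0, [1.0, 4.0], [0.5, 0.25])
```

In practice all tasks share encoder/decoder parameters, so minimizing the combined loss lets the cheap text tasks regularize the data-starved speech tasks.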
Yun Tang, Juan Pino, Xian Li, Changhan Wang, Dmitriy Genzel. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2021.
This paper aims to improve the widely used deep speaker embedding x-vector model. We propose the following improvements: (1) a hybrid neural network structure using both time delay neural networks (TDNN) and long short-term memory networks (LSTM) to generate complementary speaker information at different levels; (2) a multi-level pooling strategy to collect speaker information from both TDNN and LSTM layers; (3) a regularization scheme on the embedding extraction layer to make the extracted embeddings suitable for the following fusion step. The synergy of these improvements is shown on the NIST...
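The multi-level statistics-pooling idea can be sketched as follows: each layer's frame-level outputs are reduced to per-dimension mean and standard deviation, then the pooled vectors are concatenated. The layer shapes and values are toy examples, not the paper's configuration:

```python
import math

def stats_pool(frames):
    """Statistics pooling: frame-level features -> [means, stds] per dimension."""
    dim, n = len(frames[0]), len(frames)
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    stds = [math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
            for d in range(dim)]
    return means + stds

def multi_level_pool(layer_outputs):
    """Concatenate pooled statistics from several layers
    (e.g. one TDNN layer and one LSTM layer)."""
    pooled = []
    for frames in layer_outputs:
        pooled.extend(stats_pool(frames))
    return pooled

# Two toy "layers", each with two 2-dimensional frames.
tdnn_frames = [[1.0, 2.0], [3.0, 4.0]]
lstm_frames = [[0.0, 0.0], [2.0, 2.0]]
embedding_input = multi_level_pool([tdnn_frames, lstm_frames])
```

The concatenated vector (here length 8: mean and std per dimension per layer) is what a subsequent embedding-extraction layer would consume.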
One of the main challenges for end-to-end speech translation is data scarcity. We leverage pseudo-labels generated from unlabeled audio by a cascade and an end-to-end model. This provides 8.3 and 5.7 BLEU gains over a strong semi-supervised baseline on the MuST-C English-French and English-German datasets, reaching state-of-the-art performance. The effect of the pseudo-label quality is investigated. Our approach is shown to be more effective than simply pre-training the encoder on the speech recognition task. Finally, we demonstrate the effectiveness of self-training...
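The pseudo-labeling loop behind this kind of self-training can be sketched generically; `train` and `predict_with_confidence` below are stand-ins for real model training and decoding, and the confidence threshold is an illustrative choice:

```python
def self_train(labeled, unlabeled, train, predict_with_confidence,
               threshold=0.9, rounds=2):
    """Self-training sketch: fit a model on labeled data, pseudo-label the
    unlabeled pool with confident predictions, then retrain on the union."""
    data = list(labeled)
    model = train(data)
    for _ in range(rounds):
        pseudo = []
        for x in unlabeled:
            y, confidence = predict_with_confidence(model, x)
            if confidence >= threshold:
                pseudo.append((x, y))
        model = train(data + pseudo)
    return model

# Toy usage: the "model" is just the training set, and prediction is an
# oracle-style sign check (both hypothetical placeholders).
train = lambda data: list(data)
predict = lambda model, x: (("pos" if x > 0 else "neg"), 1.0)
model = self_train([(1, "pos")], [2, -3], train, predict, rounds=1)
```

In the speech-translation setting, the "unlabeled pool" is raw audio and the pseudo-labels are translations produced by a cascade or an end-to-end model.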
The application of neural networks to seismic first break (FB) picking has been developed for many years. Numerous multitrace FB picking methods based on convolutional neural networks (CNNs) have been proposed. Among them, the pickup method based on semantic segmentation with fully convolutional networks (FCNs) has proven to have stronger noise immunity. However, when applied to data with drastic variations of FBs between locally adjacent traces, because the feature extraction in FCNs aggregates information from around each data point, the network output tends to be smooth at the edges, which leads to a...
We present a simple yet effective approach to build multilingual speech-to-text (ST) translation through efficient transfer learning from a pretrained speech encoder and text decoder. Our key finding is that minimalistic LNA (LayerNorm and Attention) finetuning can achieve zero-shot crosslingual and cross-modality transfer ability by finetuning less than 10% of the parameters. This enables effectively leveraging large pretrained models at low training cost. Using wav2vec 2.0 for acoustic modeling and mBART for text generation, our approach advanced the new...
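A sketch of the parameter-selection step behind LNA finetuning: keep only LayerNorm and attention parameters trainable and freeze everything else. The name patterns below follow common Transformer naming conventions and are an assumption, not fairseq's exact parameter names:

```python
def lna_trainable(param_names,
                  patterns=("layer_norm", "self_attn", "encoder_attn")):
    """Select LayerNorm and (self-/cross-)attention parameters for
    finetuning; all other parameters stay frozen."""
    return [name for name in param_names if any(p in name for p in patterns)]

# Hypothetical parameter names in a Transformer-style model.
params = [
    "encoder.layers.0.self_attn.q_proj.weight",
    "encoder.layers.0.fc1.weight",
    "encoder.layers.0.final_layer_norm.weight",
    "decoder.layers.0.encoder_attn.k_proj.weight",
    "decoder.embed_tokens.weight",
]
trainable = lna_trainable(params)
```

Because the selected subset is small (under 10% of parameters in the paper's setting), the large pretrained encoder and decoder weights are reused almost unchanged.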
Recent years have seen great success in the use of neural seq2seq models on the text-to-SQL task. However, little work has paid attention to how these models generalize to realistic unseen data, which naturally raises a question: does this impressive performance signify a perfect generalization model, or are there still some limitations? In this paper, we first diagnose the bottleneck of the text-to-SQL task by providing a new testbed, and observe that existing models present poor generalization ability on rarely-seen data. The above analysis encourages us to design...
This paper presents a new approach to feature analysis in automatic speech recognition (ASR) based on locality preserving projections (LPP). LPP is a manifold based dimensionality reduction algorithm which can be trained once and then applied as a linear projection to ASR features. Conventional manifold learning algorithms are generally restricted to batch mode implementation, making it difficult in practice to apply them to unseen data. It is argued that feature vectors can be assumed to lie in a nonlinear embedding subspace characterized by local relations among input features, so...
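Once trained, an LPP transform is just a linear map, so applying it to unseen features is a matrix-vector product. A minimal sketch, with a toy 2x3 projection matrix standing in for the learned LPP transform:

```python
def project(features, W):
    """Apply a trained linear projection W (one row per output dimension)
    to a feature vector: y_i = sum_j W[i][j] * x[j]. The same W can be
    applied to any unseen vector, unlike batch-only manifold methods."""
    return [sum(w * x for w, x in zip(row, features)) for row in W]

# Toy projection from 3 dimensions down to 2 (illustrative values).
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
reduced = project([3.0, 4.0, 5.0], W)
```

This out-of-sample applicability is exactly what the abstract contrasts against batch-mode manifold learning.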
Abstract: There is a growing chorus of voices in the scientific community calling for greater openness in sharing the raw data that lead to publication. In this commentary, we discuss the merits of sharing, common concerns that are raised, and practical issues that arise in developing a data-sharing policy. We suggest that cognitive science take up the topic and establish a data-sharing...
Speech emotion recognition (SER) has attracted great attention in recent years due to the high demand for emotionally intelligent speech interfaces. Deriving speaker-invariant representations is crucial. In this paper, we propose to apply adversarial training to SER to learn speaker-invariant representations. Our model consists of three parts: a representation learning sub-network with a time-delay neural network (TDNN) and LSTM with statistical pooling, an emotion classification network and a speaker classification network. Both classification networks take the representation learning output as input. Two...
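Adversarial speaker-invariance of this kind is commonly implemented with a gradient reversal layer between the representation network and the speaker classifier. A framework-free sketch of its forward/backward behavior follows; this is a conceptual illustration of the technique, not the paper's exact implementation:

```python
class GradReverse:
    """Gradient reversal layer: identity in the forward pass, negated
    (and scaled) gradient in the backward pass. Placed before the speaker
    classifier, it pushes the shared representation to discard speaker
    information while the emotion branch still learns from it."""

    def __init__(self, lam=1.0):
        self.lam = lam  # scaling factor for the reversed gradient

    def forward(self, x):
        return x  # pass activations through unchanged

    def backward(self, grad):
        return [-self.lam * g for g in grad]  # flip the gradient sign

# Toy usage: activations flow forward untouched, gradients come back negated.
layer = GradReverse(lam=0.5)
out = layer.forward([1.0, 2.0])
grads = layer.backward([1.0, -2.0])
```

In an autograd framework this would be a custom function with these two behaviors; the pure-Python class above just makes the mechanism explicit.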
This paper investigates the impact of subspace based techniques for acoustic modeling in automatic speech recognition (ASR). There are many well known approaches to subspace based speaker adaptation which represent sources of variability as a projection within a low dimensional subspace. A new approach to ASR, referred to as the subspace Gaussian mixture model (SGMM), represents the phonetic space as a set of projections applied at the state level of a hidden Markov model (HMM). The ability of the SGMM to represent these intrinsic sources of variability is evaluated on a continuous speech recognition (CSR) task. It is shown...
Predictive state representations (PSRs) are powerful methods for modeling dynamical systems by representing state through observational data. Most current PSR techniques focus on learning a complete model over the entire state space. Consequently, they are often not scalable due to the curse of dimensionality, which limits the applications of PSRs. In this paper, we propose a new PSR learning technique. Instead of directly learning one model at a time, we learn a set of local models, each constructed on a sub-state space, and then combine the learnt local models. We employ landmark...
Speaker space-based adaptation methods for automatic speech recognition have been shown to provide significant performance improvements on tasks where only a few seconds of adaptation data is available. However, these techniques are not widely used in practical applications because they require large amounts of speaker-dependent training data and computer memory. The authors propose a robust, low-complexity technique within this general class that has been shown to reduce the word error rate and the storage requirements associated with...