Yun Tang

ORCID: 0000-0002-3122-5881
Research Areas
  • Speech Recognition and Synthesis
  • Natural Language Processing Techniques
  • Topic Modeling
  • Speech and Audio Processing
  • Music and Audio Processing
  • Advanced Graph Neural Networks
  • Speech and dialogue systems
  • Multimodal Machine Learning Applications
  • Neural Networks and Applications
  • Domain Adaptation and Few-Shot Learning
  • Biomedical Text Mining and Ontologies
  • Reinforcement Learning in Robotics
  • Blind Source Separation Techniques
  • Bayesian Modeling and Causal Inference
  • Advanced Computational Techniques and Applications
  • Advanced Image and Video Retrieval Techniques
  • Advanced Neural Network Applications
  • Bioinformatics and Genomic Networks
  • Neural Networks and Reservoir Computing
  • Cognitive Science and Mapping
  • Recommender Systems and Techniques
  • Gene expression and cancer classification
  • Hate Speech and Cyberbullying Detection
  • Advanced Sensor and Control Systems
  • Child and Animal Learning Development

Chengdu University of Technology
2010-2024

Sichuan University
2024

West China Hospital of Sichuan University
2024

National Sun Yat-sen University
2010-2023

Johns Hopkins University
2020-2022

Meta (United States)
2020-2021

Meta (Israel)
2020-2021

University of Science and Technology of China
2019-2020

JDSU (United States)
2019

Xiamen University
2015-2017

Knowledge graph embedding has been an active research topic for knowledge base completion, with progressive improvement from the initial TransE, TransH, DistMult et al. to the current state-of-the-art ConvE. ConvE uses 2D convolution over embeddings and multiple layers of nonlinear features to model knowledge graphs. The model can be efficiently trained and is scalable to large knowledge graphs. However, there is no structure enforcement in the embedding space of ConvE. The recent graph convolutional network (GCN) provides another way of learning graph node embedding by successfully utilizing...

10.1609/aaai.v33i01.33013060 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2019-07-17
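As a toy illustration of the distance-based family this abstract starts from, TransE scores a triple by how close the head embedding plus the relation embedding lands to the tail embedding. This is a minimal sketch of the TransE baseline named in the abstract, not the paper's ConvE/GCN model:

```python
import math

def transe_score(h, r, t):
    """TransE plausibility score: negative L2 distance ||h + r - t||.

    h, r, t are embedding vectors (lists of floats) for head entity,
    relation, and tail entity; higher (less negative) scores mean the
    triple is more plausible. Toy illustration only.
    """
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

# A triple whose embeddings satisfy h + r == t scores 0 (the best possible).
good = transe_score([1.0, 0.0], [0.0, 1.0], [1.0, 1.0])
bad = transe_score([1.0, 0.0], [0.0, 1.0], [3.0, -1.0])
assert good > bad
```

Link prediction then ranks all candidate tails by this score; ConvE replaces the additive translation with a learned 2D convolution over the embeddings.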

Multi-hop reading comprehension (RC) across documents poses new challenges over single-document RC because it requires reasoning over multiple documents to reach the final answer. In this paper, we propose a new model to tackle the multi-hop RC problem. We introduce a heterogeneous graph with different types of nodes and edges, which is named the Heterogeneous Document-Entity (HDE) graph. The advantage of the HDE graph is that it contains different granularity levels of information, including candidates, documents and entities in specific document contexts. Our proposed...

10.18653/v1/p19-1260 preprint EN 2019-01-01

Ann Lee, Peng-Jen Chen, Changhan Wang, Jiatao Gu, Sravya Popuri, Xutai Ma, Adam Polyak, Yossi Adi, Qing He, Yun Tang, Juan Pino, Wei-Ning Hsu. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2022.

10.18653/v1/2022.acl-long.235 article EN cc-by Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2022-01-01

We introduce fairseq S2T, a fairseq extension for speech-to-text (S2T) modeling tasks such as end-to-end speech recognition and speech-to-text translation. It follows fairseq's careful design for scalability and extensibility. We provide end-to-end workflows from data pre-processing and model training to offline (online) inference. We implement state-of-the-art RNN-based, Transformer-based as well as Conformer-based models and open-source detailed training recipes. Fairseq's machine translation models and language models can be seamlessly integrated into S2T workflows for multi-task learning...

10.48550/arxiv.2010.05171 preprint EN other-oa arXiv (Cornell University) 2020-01-01

Xian Li, Changhan Wang, Yun Tang, Chau Tran, Yuqing Tang, Juan Pino, Alexei Baevski, Alexis Conneau, Michael Auli. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2021.

10.18653/v1/2021.acl-long.68 article EN cc-by 2021-01-01

We introduce fairseq S2T, a fairseq extension for speech-to-text (S2T) modeling tasks such as end-to-end speech recognition and speech-to-text translation. It follows fairseq's careful design for scalability and extensibility. We provide end-to-end workflows from data pre-processing and model training to offline (online) inference. We implement state-of-the-art RNN-based as well as Transformer-based models and open-source detailed training recipes. Fairseq's machine translation models and language models can be seamlessly integrated into S2T workflows for multi-task learning or transfer...

10.18653/v1/2020.aacl-demo.6 preprint EN 2020-01-01

Distance-based knowledge graph embeddings have shown substantial improvement on the link prediction task, from TransE to the latest state-of-the-art RotatE. However, complex relations such as N-to-1, 1-to-N and N-to-N still remain challenging to predict. In this work, we propose a novel distance-based approach for knowledge graph link prediction. First, we extend RotatE from the 2D complex domain to high dimensional space with orthogonal transforms to model relations. The orthogonal transform embedding keeps the capability of modeling symmetric/anti-symmetric,...

10.18653/v1/2020.acl-main.241 article EN cc-by 2020-01-01
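The 2D rotations this abstract generalizes can be sketched with complex arithmetic, as in RotatE: each relation is a unit-modulus rotation applied elementwise to the head embedding, and the score is the negative distance to the tail. A toy sketch of the 2D case, not the authors' higher-dimensional orthogonal-transform implementation:

```python
import cmath

def rotate_score(h, r_phase, t):
    """RotatE-style score: score(h, r, t) = -|h o r - t|, where the
    relation r is an elementwise rotation e^{i*phi} (|r_i| = 1) in the
    complex plane. A 2D rotation is the simplest orthogonal transform,
    which the paper extends to higher dimensions. Toy illustration only.
    """
    rotated = [hi * cmath.exp(1j * phi) for hi, phi in zip(h, r_phase)]
    return -sum(abs(ri - ti) for ri, ti in zip(rotated, t))

# Rotating h = 1+0j by pi/2 gives ~1j, so t = 1j is a (near-)perfect match.
s = rotate_score([1 + 0j], [cmath.pi / 2], [1j])
assert abs(s) < 1e-9
```

Because rotations preserve norms and compose, this family can model symmetric, anti-symmetric, and inverse relation patterns, which is the property the proposed orthogonal transforms retain.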

Attention-based sequence-to-sequence modeling provides a powerful and elegant solution for applications that need to map one sequence to a different sequence. Its success heavily relies on the availability of large amounts of training data. This presents a challenge for speech applications, where labelled data is very expensive to obtain, such as automatic speech recognition (ASR) and speech translation (ST). In this study, we propose a general multi-task learning framework to leverage text data for ASR and ST tasks. Two auxiliary tasks, the denoising...

10.1109/icassp39728.2021.9415058 article EN ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021-05-13
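A denoising auxiliary task of the kind this abstract pairs with ASR/ST training corrupts a text sequence and trains the model to recover the original. A minimal sketch of the corruption step; the function name, mask token, and masking ratio are illustrative choices, not taken from the paper:

```python
import random

def mask_tokens(tokens, mask_ratio=0.3, mask="<mask>", seed=0):
    """Corrupt a token sequence by independently replacing each token
    with a mask symbol with probability mask_ratio. The seeded RNG makes
    the example deterministic. Toy sketch of a denoising objective's
    input corruption; hyperparameters are illustrative.
    """
    rng = random.Random(seed)
    return [mask if rng.random() < mask_ratio else tok for tok in tokens]

corrupted = mask_tokens(["the", "cat", "sat", "down"])
assert len(corrupted) == 4
```

The denoising decoder is then trained to emit the uncorrupted sequence, letting abundant text data shape the shared decoder used by the speech tasks.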

Yun Tang, Juan Pino, Xian Li, Changhan Wang, Dmitriy Genzel. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2021.

10.18653/v1/2021.acl-long.328 article EN cc-by 2021-01-01

This paper aims to improve the widely used deep speaker embedding x-vector model. We propose the following improvements: (1) a hybrid neural network structure using both time delay neural networks (TDNN) and long short-term memory networks (LSTM) to generate complementary information at different levels; (2) a multi-level pooling strategy to collect speaker information from both TDNN and LSTM layers; (3) a regularization scheme on the embedding extraction layer to make the extracted embeddings suitable for the following fusion step. The synergy of these improvements is shown on the NIST...

10.1109/icassp.2019.8682712 article EN ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019-04-17
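The pooling step an x-vector system relies on can be sketched as plain statistics pooling: concatenating the per-dimension mean and standard deviation over all frames turns a variable-length sequence into a fixed-size utterance vector. A toy single-layer sketch; the paper's multi-level variant pools from several TDNN and LSTM layers, which this omits:

```python
import math

def stats_pool(frames):
    """Statistics pooling: given frames as a list of equal-length
    feature vectors, return [mean_1..mean_d, std_1..std_d], a
    fixed-size summary regardless of the number of frames.
    Population (not sample) standard deviation is used here.
    """
    n, dim = len(frames), len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    stds = [math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
            for d in range(dim)]
    return means + stds

pooled = stats_pool([[1.0, 2.0], [3.0, 4.0]])
# means = [2.0, 3.0]; population stds = [1.0, 1.0]
assert pooled == [2.0, 3.0, 1.0, 1.0]
```

A fully connected embedding layer after this pooled vector is where the x-vector itself is extracted.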

One of the main challenges for end-to-end speech translation is data scarcity. We leverage pseudo-labels generated from unlabeled audio by a cascade and an end-to-end model. This provides 8.3 and 5.7 BLEU gains over a strong semi-supervised baseline on the MuST-C English-French and English-German datasets, reaching state-of-the-art performance. The effect of the pseudo-label quality is investigated. Our approach is shown to be more effective than simply pre-training the encoder on the speech recognition task. Finally, we demonstrate the effectiveness of self-training...

10.21437/interspeech.2020-2938 article EN Interspeech 2020 2020-10-25
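A self-training pipeline like the one described typically filters pseudo-labels by model confidence before adding them to the training set. A minimal sketch of that filtering step; the tuple fields and threshold are illustrative choices, not details from the paper:

```python
def filter_pseudo_labels(hypotheses, threshold=0.9):
    """Keep only pseudo-labelled utterances whose model confidence
    clears a threshold. `hypotheses` is a list of
    (audio_id, translation, confidence) tuples; returns the
    (audio_id, translation) pairs to add to the training set.
    Field names and the default threshold are illustrative.
    """
    return [(aid, text) for aid, text, conf in hypotheses if conf >= threshold]

hyps = [("utt1", "bonjour", 0.95), ("utt2", "???", 0.30)]
kept = filter_pseudo_labels(hyps)
assert kept == [("utt1", "bonjour")]
```

The surviving pairs are mixed with the human-labelled data for another round of training, and the threshold trades pseudo-label quantity against quality.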

10.1016/s0076-6879(08)03811-1 article EN Methods in Enzymology 2009-01-01

The application of neural networks to seismic first break (FB) picking has been developed for many years. Numerous multitrace FB picking methods based on convolutional neural networks (CNNs) have been proposed. Among them, the picking method based on semantic segmentation with fully convolutional networks (FCNs) has proven to have stronger noise immunity. However, in data with drastic variations of FBs between locally adjacent traces, because the feature extraction of FCNs converges information from around the data, the network output edge tends to be smooth, which leads to a...

10.1109/lgrs.2023.3248233 article EN IEEE Geoscience and Remote Sensing Letters 2023-01-01

We present a simple yet effective approach to build multilingual speech-to-text (ST) translation by efficient transfer learning from a pretrained speech encoder and text decoder. Our key finding is that minimalistic LNA (LayerNorm and Attention) finetuning can achieve zero-shot crosslingual and cross-modality transfer ability by finetuning only less than 10% of the parameters. This enables effectively leveraging large pretrained models with low training cost. Using wav2vec 2.0 for acoustic modeling and mBART for text generation, our approach advanced the new...

10.48550/arxiv.2010.12829 preprint EN other-oa arXiv (Cornell University) 2020-01-01

Multi-hop reading comprehension (RC) across documents poses new challenges over single-document RC because it requires reasoning over multiple documents to reach the final answer. In this paper, we propose a new model to tackle the multi-hop RC problem. We introduce a heterogeneous graph with different types of nodes and edges, which is named the Heterogeneous Document-Entity (HDE) graph. The advantage of the HDE graph is that it contains different granularity levels of information, including candidates, documents and entities in specific document contexts. Our proposed...

10.48550/arxiv.1905.07374 preprint EN other-oa arXiv (Cornell University) 2019-01-01

Recent years have seen great success in the use of neural seq2seq models on the text-to-SQL task. However, little work has paid attention to how these models generalize to realistic unseen data, which naturally raises a question: does this impressive performance signify a perfect generalization model, or are there still some limitations? In this paper, we first diagnose the bottleneck of the text-to-SQL task by providing a new testbed, in which we observe that existing models present poor generalization ability on rarely-seen data. The above analysis encourages us to design...

10.1609/aaai.v34i05.6246 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2020-04-03

This paper presents a new approach to feature analysis in automatic speech recognition (ASR) based on locality preserving projections (LPP). LPP is a manifold-based dimensionality reduction algorithm which can be trained and applied as a linear projection on ASR features. Conventional manifold learning algorithms are generally restricted to batch mode implementation, making it difficult in practice to apply them to unseen data. It is argued that model feature vectors can be assumed to lie in a nonlinear embedding subspace described by local relations among the input features, so...

10.1109/icassp.2008.4517923 article EN Proceedings of the ... IEEE International Conference on Acoustics, Speech, and Signal Processing 2008-03-01

There is a growing chorus of voices in the scientific community calling for greater openness in the sharing of raw data that lead to a publication. In this commentary, we discuss the merits of sharing, common concerns that are raised, and practical issues that arise in developing a sharing policy. We suggest that the cognitive science community discuss the topic and establish data-sharing...

10.1111/tops.12006 article EN Topics in Cognitive Science 2013-01-01

Speech emotion recognition (SER) has attracted great attention in recent years due to the high demand for emotionally intelligent speech interfaces. Deriving speaker-invariant representations is crucial. In this paper, we propose to apply adversarial training to SER to learn speaker-invariant representations. Our model consists of three parts: a representation learning sub-network with a time-delay neural network (TDNN) and LSTM with statistical pooling, an emotion classification network and a speaker classification network. Both classification networks take the output of the representation learning network as input. Two...

10.48550/arxiv.1903.09606 preprint EN other-oa arXiv (Cornell University) 2019-01-01

This paper investigates the impact of subspace based techniques for acoustic modeling in automatic speech recognition (ASR). There are many well known approaches to subspace based speaker adaptation which represent sources of variability as a projection within a low dimensional subspace. A new approach to ASR, referred to as the subspace Gaussian mixture model (SGMM), represents the phonetic space as a set of projections applied at the state level of a hidden Markov model (HMM). The ability of the SGMM to model these intrinsic sources of variability is evaluated on a continuous speech recognition (CSR) task. It is shown...

10.1109/icassp.2011.5947356 article EN 2011-05-01

Predictive state representations (PSRs) are powerful methods of modeling dynamical systems by representing state through observational data. Most of the current PSR techniques focus on learning a complete model from the entire state space. Consequently, they are often not scalable due to the curse of dimensionality, which limits the applications of PSRs. In this paper, we propose a new PSR learning technique. Instead of learning the model directly at one time, we learn a set of local models, each of which is constructed on a sub-state space, and then combine the learnt local models. We employ landmark...

10.5555/2772879.2773312 article EN Adaptive Agents and Multi-Agents Systems 2015-05-04

Speaker space-based adaptation methods for automatic speech recognition have been shown to provide significant performance improvements for tasks where only a few seconds of adaptation speech is available. However, these techniques are not widely used in practical applications because they require large amounts of speaker-dependent training data and computer memory. The authors propose a robust, low-complexity technique within this general class that has been shown to reduce word error rate and the storage requirements associated with...

10.1109/tasl.2008.916530 article EN IEEE Transactions on Audio Speech and Language Processing 2008-02-15