- Speech Recognition and Synthesis
- Natural Language Processing Techniques
- Topic Modeling
- Speech and Audio Processing
- Music and Audio Processing
- Advanced Graph Neural Networks
- Speech and Dialogue Systems
- Multimodal Machine Learning Applications
- Neural Networks and Applications
- Domain Adaptation and Few-Shot Learning
- Biomedical Text Mining and Ontologies
- Reinforcement Learning in Robotics
- Blind Source Separation Techniques
- Bayesian Modeling and Causal Inference
- Advanced Computational Techniques and Applications
- Advanced Image and Video Retrieval Techniques
- Advanced Neural Network Applications
- Bioinformatics and Genomic Networks
- Neural Networks and Reservoir Computing
- Cognitive Science and Mapping
- Recommender Systems and Techniques
- Gene Expression and Cancer Classification
- Hate Speech and Cyberbullying Detection
- Advanced Sensor and Control Systems
- Child and Animal Learning Development
Chengdu University of Technology
2010-2024
Sichuan University
2024
West China Hospital of Sichuan University
2024
National Sun Yat-sen University
2010-2023
Johns Hopkins University
2020-2022
Meta (United States)
2020-2021
Meta (Israel)
2020-2021
University of Science and Technology of China
2019-2020
JDSU (United States)
2019
Xiamen University
2015-2017
Knowledge graph embedding has been an active research topic for knowledge base completion, with progressive improvement from the initial TransE, TransH, DistMult et al. to the current state-of-the-art ConvE. ConvE uses 2D convolution over embeddings and multiple layers of nonlinear features to model knowledge graphs. The model can be efficiently trained and is scalable to large knowledge graphs. However, there is no structure enforcement in the embedding space of ConvE. The recent graph convolutional network (GCN) provides another way of learning graph node embedding by successfully utilizing...
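As a concrete illustration of the distance-based scoring these embedding models build on, here is a minimal pure-Python sketch of the TransE score; the toy 3-dimensional embeddings and entity names are invented for the example, not taken from any trained model:

```python
import math

def transe_score(head, relation, tail):
    """TransE plausibility score: negative L2 distance of (head + relation)
    from tail. Scores closer to zero mean the triple is more plausible."""
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))

# Toy embeddings (illustrative values only).
paris = [0.9, 0.1, 0.0]
capital_of = [0.0, 0.8, 0.1]
france = [0.9, 0.9, 0.1]
berlin = [0.1, 0.5, 0.7]

# (Paris, capital_of, France) should score higher than (Paris, capital_of, Berlin).
assert transe_score(paris, capital_of, france) > transe_score(paris, capital_of, berlin)
```

ConvE replaces this fixed translational form with learned 2D convolutions over the embeddings, which is what motivates the structure-enforcement discussion above.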
Multi-hop reading comprehension (RC) across documents poses a new challenge over single-document RC because it requires reasoning over multiple documents to reach the final answer. In this paper, we propose a new model to tackle the multi-hop RC problem. We introduce a heterogeneous graph with different types of nodes and edges, which is named the Heterogeneous Document-Entity (HDE) graph. The advantage of the HDE graph is that it contains different granularity levels of information, including candidates, documents and entities in specific document contexts. Our proposed...
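The heterogeneous-graph idea can be sketched with a tiny data structure in which every node and edge carries a type; the node ids and edge-type names below are hypothetical placeholders, not the paper's schema:

```python
from collections import defaultdict

def build_hetero_graph(typed_nodes, typed_edges):
    """Minimal heterogeneous graph: each node has a type (e.g. 'candidate',
    'document', 'entity') and each edge has a type, so information at
    different granularity levels can be connected in one graph."""
    graph = {"node_type": dict(typed_nodes), "adj": defaultdict(list)}
    for src, dst, edge_type in typed_edges:
        graph["adj"][src].append((dst, edge_type))
        graph["adj"][dst].append((src, edge_type))  # undirected for simplicity
    return graph

# Toy example: a candidate answer and an entity both mentioned in a document.
g = build_hetero_graph(
    [("cand:Paris", "candidate"), ("doc:1", "document"), ("ent:France", "entity")],
    [("cand:Paris", "doc:1", "candidate-in-document"),
     ("ent:France", "doc:1", "entity-in-document")],
)
```

A graph neural network would then propagate messages along these typed edges, with edge-type-specific transformations.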
Ann Lee, Peng-Jen Chen, Changhan Wang, Jiatao Gu, Sravya Popuri, Xutai Ma, Adam Polyak, Yossi Adi, Qing He, Yun Tang, Juan Pino, Wei-Ning Hsu. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2022.
We introduce fairseq S2T, a fairseq extension for speech-to-text (S2T) modeling tasks such as end-to-end speech recognition and speech-to-text translation. It follows fairseq's careful design for scalability and extensibility. We provide end-to-end workflows from data pre-processing and model training to offline (online) inference. We implement state-of-the-art RNN-based, Transformer-based as well as Conformer-based models and open-source detailed training recipes. Fairseq's machine translation models and language models can be seamlessly integrated into S2T workflows for multi-task learning...
Xian Li, Changhan Wang, Yun Tang, Chau Tran, Yuqing Tang, Juan Pino, Alexei Baevski, Alexis Conneau, Michael Auli. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2021.
We introduce fairseq S2T, a fairseq extension for speech-to-text (S2T) modeling tasks such as end-to-end speech recognition and speech-to-text translation. It follows fairseq's careful design for scalability and extensibility. We provide end-to-end workflows from data pre-processing and model training to offline (online) inference. We implement state-of-the-art RNN-based as well as Transformer-based models and open-source detailed training recipes. Fairseq's machine translation models and language models can be seamlessly integrated into S2T workflows for multi-task learning or transfer...
Distance-based knowledge graph embeddings have shown substantial improvement on the link prediction task, from TransE to the latest state-of-the-art RotatE. However, complex relations such as N-to-1, 1-to-N and N-to-N still remain challenging to predict. In this work, we propose a novel distance-based approach for knowledge graph link prediction. First, we extend RotatE from the 2D complex domain to a high dimensional space with orthogonal transforms to model relations. The orthogonal transform embedding keeps the capability of modeling symmetric/anti-symmetric,...
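The 2D rotation-based scoring that RotatE uses (and that the proposed orthogonal transforms generalize to higher dimensions) can be sketched with complex numbers; the embeddings below are toy values for illustration:

```python
import cmath

def rotate_score(head, phases, tail):
    """RotatE-style score: a relation is an element-wise rotation in the
    complex plane; the score is the negative distance between the rotated
    head embedding and the tail embedding."""
    rotated = [h * cmath.exp(1j * theta) for h, theta in zip(head, phases)]
    return -sum(abs(r - t) for r, t in zip(rotated, tail))

# A 90-degree rotation maps (1+0j) onto (0+1j), so this triple scores ~0.
head = [1 + 0j, 0 + 1j]
tail = [0 + 1j, 0 + 1j]
relation = [cmath.pi / 2, 0.0]  # rotate the first dimension, keep the second
```

Because a rotation by theta composed with a rotation by -theta is the identity, this form naturally models inverse and (anti-)symmetric relations, which is the property the orthogonal-transform extension preserves.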
Attention-based sequence-to-sequence modeling provides a powerful and elegant solution for applications that need to map one sequence to a different sequence. Its success heavily relies on the availability of large amounts of training data. This presents a challenge for speech applications, where labelled data is very expensive to obtain, such as automatic speech recognition (ASR) and speech translation (ST). In this study, we propose a general multi-task learning framework to leverage text data for ASR and ST tasks. Two auxiliary tasks, denoising...
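The way a multi-task objective folds auxiliary tasks into the primary loss can be sketched as a weighted sum; the weights and loss values below are illustrative numbers, not the framework's actual settings:

```python
def joint_loss(primary_loss, aux_losses, aux_weights):
    """Multi-task objective: the primary task loss (e.g. ST) plus a weighted
    sum of auxiliary task losses (e.g. ASR, text denoising) computed on
    additional, often text-only, data."""
    assert len(aux_losses) == len(aux_weights)
    return primary_loss + sum(w * l for w, l in zip(aux_weights, aux_losses))

# Illustrative values: ST loss 2.0, ASR loss 1.0 (weight 0.5), denoising
# loss 4.0 (weight 0.25) combine into a single training objective of 3.5.
total = joint_loss(2.0, [1.0, 4.0], [0.5, 0.25])
```

In practice all tasks share encoder/decoder parameters, so minimizing the combined loss lets the cheap text tasks regularize the data-starved speech tasks.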
Yun Tang, Juan Pino, Xian Li, Changhan Wang, Dmitriy Genzel. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2021.
This paper aims to improve the widely used deep speaker embedding x-vector model. We propose the following improvements: (1) a hybrid neural network structure using both time delay neural networks (TDNN) and long short-term memory networks (LSTM) to generate complementary speaker information at different levels; (2) a multi-level pooling strategy to collect speaker information from both TDNN and LSTM layers; (3) a regularization scheme on the embedding extraction layer to make the extracted embeddings suitable for the following fusion step. The synergy of these improvements is shown on the NIST...
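The multi-level statistics-pooling idea can be sketched as follows: each layer's frame-level outputs are reduced to per-dimension mean and standard deviation, then the pooled vectors are concatenated. The layer shapes and values are toy examples, not the paper's configuration:

```python
import math

def stats_pool(frames):
    """Statistics pooling: frame-level features -> [means, stds] per dimension."""
    dim, n = len(frames[0]), len(frames)
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    stds = [math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
            for d in range(dim)]
    return means + stds

def multi_level_pool(layer_outputs):
    """Concatenate pooled statistics from several layers
    (e.g. one TDNN layer and one LSTM layer)."""
    pooled = []
    for frames in layer_outputs:
        pooled.extend(stats_pool(frames))
    return pooled

# Two toy "layers", each with two 2-dimensional frames.
tdnn_frames = [[1.0, 2.0], [3.0, 4.0]]
lstm_frames = [[0.0, 0.0], [2.0, 2.0]]
embedding_input = multi_level_pool([tdnn_frames, lstm_frames])
```

The concatenated vector (here length 8: mean and std per dimension per layer) is what a subsequent embedding-extraction layer would consume.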
One of the main challenges for end-to-end speech translation is data scarcity. We leverage pseudo-labels generated from unlabeled audio by a cascade and an end-to-end model. This provides 8.3 and 5.7 BLEU gains over a strong semi-supervised baseline on the MuST-C English-French and English-German datasets, reaching state-of-the-art performance. The effect of the pseudo-label quality is investigated. Our approach is shown to be more effective than simply pre-training the encoder on the speech recognition task. Finally, we demonstrate the effectiveness of self-training...
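The pseudo-labeling loop behind this kind of self-training can be sketched generically; `train` and `predict_with_confidence` below are stand-ins for real model training and decoding, and the confidence threshold is an illustrative choice:

```python
def self_train(labeled, unlabeled, train, predict_with_confidence,
               threshold=0.9, rounds=2):
    """Self-training sketch: fit a model on labeled data, pseudo-label the
    unlabeled pool with confident predictions, then retrain on the union."""
    data = list(labeled)
    model = train(data)
    for _ in range(rounds):
        pseudo = []
        for x in unlabeled:
            y, confidence = predict_with_confidence(model, x)
            if confidence >= threshold:
                pseudo.append((x, y))
        model = train(data + pseudo)
    return model

# Toy usage: the "model" is just the training set, and prediction is an
# oracle-style sign check (both hypothetical placeholders).
train = lambda data: list(data)
predict = lambda model, x: (("pos" if x > 0 else "neg"), 1.0)
model = self_train([(1, "pos")], [2, -3], train, predict, rounds=1)
```

In the speech-translation setting, the "unlabeled pool" is raw audio and the pseudo-labels are translations produced by a cascade or an end-to-end model.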
The application of neural networks to seismic first break (FB) picking has been developed for many years. Numerous multitrace FB picking methods based on convolutional neural networks (CNNs) have been proposed. Among them, the pickup method based on semantic segmentation with fully convolutional networks (FCNs) has proven to have stronger noise immunity. However, when applied to data with drastic variations of FBs between locally adjacent traces, because the feature extraction in FCNs aggregates information from around each data point, the network output tends to be smooth at the edges, which leads to a...
We present a simple yet effective approach to build multilingual speech-to-text (ST) translation through efficient transfer learning from a pretrained speech encoder and text decoder. Our key finding is that minimalistic LNA (LayerNorm and Attention) finetuning can achieve zero-shot crosslingual and cross-modality transfer ability by finetuning less than 10% of the parameters. This enables effectively leveraging large pretrained models at low training cost. Using wav2vec 2.0 for acoustic modeling and mBART for text generation, our approach advanced the new...
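A sketch of the parameter-selection step behind LNA finetuning: keep only LayerNorm and attention parameters trainable and freeze everything else. The name patterns below follow common Transformer naming conventions and are an assumption, not fairseq's exact parameter names:

```python
def lna_trainable(param_names,
                  patterns=("layer_norm", "self_attn", "encoder_attn")):
    """Select LayerNorm and (self-/cross-)attention parameters for
    finetuning; all other parameters stay frozen."""
    return [name for name in param_names if any(p in name for p in patterns)]

# Hypothetical parameter names in a Transformer-style model.
params = [
    "encoder.layers.0.self_attn.q_proj.weight",
    "encoder.layers.0.fc1.weight",
    "encoder.layers.0.final_layer_norm.weight",
    "decoder.layers.0.encoder_attn.k_proj.weight",
    "decoder.embed_tokens.weight",
]
trainable = lna_trainable(params)
```

Because the selected subset is small (under 10% of parameters in the paper's setting), the large pretrained encoder and decoder weights are reused almost unchanged.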
Recent years have seen great success in the use of neural seq2seq models on the text-to-SQL task. However, little work has paid attention to how these models generalize to realistic unseen data, which naturally raises a question: does this impressive performance signify a perfect generalization model, or are there still some limitations? In this paper, we first diagnose the bottleneck of the text-to-SQL task by providing a new testbed, and observe that existing models present poor generalization ability on rarely-seen data. The above analysis encourages us to design...
This paper presents a new approach to feature analysis in automatic speech recognition (ASR) based on locality preserving projections (LPP). LPP is a manifold based dimensionality reduction algorithm which can be trained once and then applied as a linear projection to ASR features. Conventional manifold learning algorithms are generally restricted to batch mode implementation, making it difficult in practice to apply them to unseen data. It is argued that feature vectors can be assumed to lie in a nonlinear embedding subspace characterized by local relations among input features, so...
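Once trained, an LPP transform is just a linear map, so applying it to unseen features is a matrix-vector product. A minimal sketch, with a toy 2x3 projection matrix standing in for the learned LPP transform:

```python
def project(features, W):
    """Apply a trained linear projection W (one row per output dimension)
    to a feature vector: y_i = sum_j W[i][j] * x[j]. The same W can be
    applied to any unseen vector, unlike batch-only manifold methods."""
    return [sum(w * x for w, x in zip(row, features)) for row in W]

# Toy projection from 3 dimensions down to 2 (illustrative values).
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
reduced = project([3.0, 4.0, 5.0], W)
```

This out-of-sample applicability is exactly what the abstract contrasts against batch-mode manifold learning.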
Abstract: There is a growing chorus of voices in the scientific community calling for greater openness in sharing the raw data that lead to publication. In this commentary, we discuss the merits of sharing, common concerns that are raised, and practical issues that arise in developing a data-sharing policy. We suggest that cognitive science take up the topic and establish a data-sharing...
Speech emotion recognition (SER) has attracted great attention in recent years due to the high demand for emotionally intelligent speech interfaces. Deriving speaker-invariant representations is crucial. In this paper, we propose to apply adversarial training to SER to learn speaker-invariant representations. Our model consists of three parts: a representation learning sub-network with a time-delay neural network (TDNN) and LSTM with statistical pooling, an emotion classification network and a speaker classification network. Both classification networks take the representation learning output as input. Two...
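Adversarial speaker-invariance of this kind is commonly implemented with a gradient reversal layer between the representation network and the speaker classifier. A framework-free sketch of its forward/backward behavior follows; this is a conceptual illustration of the technique, not the paper's exact implementation:

```python
class GradReverse:
    """Gradient reversal layer: identity in the forward pass, negated
    (and scaled) gradient in the backward pass. Placed before the speaker
    classifier, it pushes the shared representation to discard speaker
    information while the emotion branch still learns from it."""

    def __init__(self, lam=1.0):
        self.lam = lam  # scaling factor for the reversed gradient

    def forward(self, x):
        return x  # pass activations through unchanged

    def backward(self, grad):
        return [-self.lam * g for g in grad]  # flip the gradient sign

# Toy usage: activations flow forward untouched, gradients come back negated.
layer = GradReverse(lam=0.5)
out = layer.forward([1.0, 2.0])
grads = layer.backward([1.0, -2.0])
```

In an autograd framework this would be a custom function with these two behaviors; the pure-Python class above just makes the mechanism explicit.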
This paper investigates the impact of subspace based techniques for acoustic modeling in automatic speech recognition (ASR). There are many well known approaches to subspace based speaker adaptation which represent sources of variability as a projection within a low dimensional subspace. A new approach to ASR, referred to as the subspace Gaussian mixture model (SGMM), represents the phonetic space as a set of projections applied at the state level of a hidden Markov model (HMM). The ability of the SGMM to represent these intrinsic sources of variability is evaluated on a continuous speech recognition (CSR) task. It is shown...
Predictive state representations (PSRs) are powerful methods for modeling dynamical systems by representing state through observational data. Most current PSR techniques focus on learning a complete model over the entire state space. Consequently, they are often not scalable due to the curse of dimensionality, which limits the applications of PSRs. In this paper, we propose a new PSR learning technique. Instead of directly learning one model at a time, we learn a set of local models, each constructed on a sub-state space, and then combine the learnt local models. We employ landmark...
Speaker space-based adaptation methods for automatic speech recognition have been shown to provide significant performance improvements on tasks where only a few seconds of adaptation data is available. However, these techniques are not widely used in practical applications because they require large amounts of speaker-dependent training data and computer memory. The authors propose a robust, low-complexity technique within this general class that has been shown to reduce the word error rate and the storage requirements associated with...