Zhang Yu

ORCID: 0000-0003-2012-226X
Research Areas
  • Topic Modeling
  • Natural Language Processing Techniques
  • Speech Recognition and Synthesis
  • Advanced Computational Techniques and Applications
  • Chinese history and philosophy
  • Speech and Audio Processing
  • Advanced Text Analysis Techniques
  • Service-Oriented Architecture and Web Services
  • Semantic Web and Ontologies
  • Music and Audio Processing
  • Translation Studies and Practices
  • Web Data Mining and Analysis
  • Speech and dialogue systems
  • Simulation and Modeling Applications
  • Industrial Technology and Control Systems
  • Biomedical Text Mining and Ontologies
  • Educational Reforms and Innovations
  • Remote Sensing and Land Use
  • Language, Metaphor, and Cognition
  • Educational Technology and Pedagogy
  • Recommender Systems and Techniques
  • Text and Document Classification Technologies
  • Geomechanics and Mining Engineering
  • Multimodal Machine Learning Applications
  • Data Quality and Management

Harbin Institute of Technology
2010-2024

Jiamusi University
2024

China University of Geosciences (Beijing)
2024

Affiliated Hospital of Chengde Medical College
2023

Dalian University of Technology
2019-2022

Google (United States)
2019-2022

Qingdao University
2022

BGI Group (China)
2021

Kunming Metallurgy College
2021

Guangdong Institute of Intelligent Manufacturing
2020

This paper introduces a new speech corpus called "LibriTTS" designed for text-to-speech use. It is derived from the original audio and text materials of the LibriSpeech corpus, which has been used for training and evaluating automatic speech recognition systems. The new corpus inherits desired properties of the LibriSpeech corpus while addressing a number of issues which make LibriSpeech less than ideal for text-to-speech work. The released corpus consists of 585 hours of speech data at a 24kHz sampling rate from 2,456 speakers and the corresponding texts. Experimental results show that neural end-to-end TTS models trained...

10.21437/interspeech.2019-2441 article EN Interspeech 2019 2019-09-13

Sequence-to-sequence models have shown success in end-to-end speech recognition. However, these models have only used shallow acoustic encoder networks. In our work, we successively train very deep convolutional networks to add more expressive power and better generalization for end-to-end ASR models. We apply network-in-network principles, batch normalization, residual connections and convolutional LSTMs to build very deep recurrent and convolutional structures. Our models exploit the spectral structure of the feature space and add computational depth without overfitting issues....

10.1109/icassp.2017.7953077 preprint EN 2017-03-01

We employ a combination of recent developments in semi-supervised learning for automatic speech recognition to obtain state-of-the-art results on LibriSpeech, utilizing the unlabeled audio of the Libri-Light dataset. More precisely, we carry out noisy student training with SpecAugment using giant Conformer models pre-trained using wav2vec 2.0 pre-training. By doing so, we are able to achieve word-error-rates (WERs) of 1.4%/2.6% on the LibriSpeech test/test-other sets against the current state-of-the-art WERs of 1.7%/3.3%.

10.48550/arxiv.2010.10504 preprint EN other-oa arXiv (Cornell University) 2020-01-01

This paper presents a method to train end-to-end automatic speech recognition (ASR) models using unpaired data. Although the end-to-end approach can eliminate the need for expert knowledge such as pronunciation dictionaries to build ASR systems, it still requires a large amount of paired data, i.e., speech utterances and their transcriptions. Cycle-consistency losses have been recently proposed as a way to mitigate the problem of limited paired data. These approaches compose a reverse operation with a given transformation, e.g., text-to-speech (TTS)...

10.1109/icassp.2019.8683307 article EN ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019-04-17

Recent neural text-to-speech (TTS) models with fine-grained latent features enable precise control of the prosody of synthesized speech. Such models typically incorporate a fine-grained variational autoencoder (VAE) structure, extracting latent features at each input token (e.g., phonemes). However, generating samples with the standard VAE prior often results in unnatural and discontinuous speech, with dramatic prosodic variation between tokens. This paper proposes a sequential prior in a discrete latent space which can generate more naturally sounding samples....

10.1109/icassp40776.2020.9053436 article EN ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2020-04-09
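
The motivation above, that an independent standard-normal prior makes adjacent token latents jump erratically while a sequential prior varies smoothly, can be illustrated with a toy numerical sketch (the AR(1) prior and all numbers here are illustrative assumptions, not the paper's model):

```python
import numpy as np

rng = np.random.default_rng(0)
T = 2000  # number of input tokens (e.g., phonemes)

# Independent standard-normal prior: each token's latent is drawn i.i.d.
z_iid = rng.standard_normal(T)

# Toy sequential (AR(1)) prior: each latent depends on the previous one,
# with noise scale sqrt(1 - a^2) so the marginal variance stays at 1.
a = 0.95
z_ar = np.empty(T)
z_ar[0] = rng.standard_normal()
for t in range(1, T):
    z_ar[t] = a * z_ar[t - 1] + np.sqrt(1 - a**2) * rng.standard_normal()

# Average jump between adjacent tokens: the sequential prior varies far less,
# mirroring the motivation for smoother, more natural prosody.
jump_iid = np.mean(np.abs(np.diff(z_iid)))
jump_ar = np.mean(np.abs(np.diff(z_ar)))
print(jump_ar < jump_iid)
```

Both priors have the same per-token marginal distribution; only the dependence between neighboring tokens differs, which is exactly the property the sequential prior exploits.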

We present Maestro, a self-supervised training method to unify representations learnt from speech and text modalities. Self-supervised learning from speech signals aims to learn the latent structure inherent in the signal, while self-supervised learning from text attempts to capture lexical information. Learning aligned representations from unpaired speech and text sequences is a challenging task. Previous work either implicitly enforced these two modalities to be aligned in the latent space through multitasking and parameter sharing, or explicitly through conversion of modalities via speech synthesis. While the former suffers from interference between...

10.21437/interspeech.2022-10937 article EN Interspeech 2022 2022-09-16

Unsupervised pre-training is now the predominant approach for both text and speech understanding. Self-attention models pre-trained on large amounts of unannotated data have been hugely successful when fine-tuned on downstream tasks from a variety of domains and languages. This paper takes the universality of unsupervised language pre-training one step further, by unifying speech and text pre-training within a single model. We build a single encoder with the BERT objective on unlabeled text together with the w2v-BERT objective on unlabeled speech. To further align our model representations across...

10.48550/arxiv.2110.10329 preprint EN cc-by arXiv (Cornell University) 2021-01-01

Building ASR models across many languages is a challenging multi-task learning problem due to large variations and heavily unbalanced data. Existing work has shown positive transfer from high resource to low resource languages. However, degradations on high resource languages are commonly observed due to interference from the heterogeneous multilingual data and reduction in per-language capacity. We conduct a capacity study on a 15-language task, with the amount of data per language varying from 7.6K to 53.5K hours. We adopt GShard [1] to efficiently scale up to 10B...

10.1109/asru51503.2021.9687871 article EN 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2021-12-13

Recent neural network models for Chinese zero pronoun resolution gain great performance by capturing semantic information for zero pronouns and candidate antecedents, but tend to be short-sighted, operating solely by making local decisions. They typically predict coreference links between the zero pronoun and one single candidate antecedent at a time while ignoring their influence on future decisions. Ideally, modeling useful information of preceding potential antecedents is crucial for classifying later zero pronoun-candidate antecedent pairs, a need which leads...

10.18653/v1/p18-1053 article EN cc-by Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2018-01-01

Table-based fact verification is expected to perform both linguistic reasoning and symbolic reasoning. Existing methods lack attention to take advantage of the combination of linguistic information and symbolic information. In this work, we propose HeterTFV, a graph-based reasoning approach, that learns to combine linguistic information and symbolic information effectively. We first construct a program graph to encode programs, a kind of LISP-like logical form, to learn the semantic compositionality of the programs. Then we construct a heterogeneous graph to incorporate both kinds of information by introducing program nodes into the graph. Finally, we propose a graph-based reasoning approach to reason...

10.18653/v1/2020.coling-main.466 article EN cc-by Proceedings of the 28th International Conference on Computational Linguistics 2020-01-01

English promotional videos are crucial tools for image dissemination in universities, playing a key role in shaping institutional branding, attracting potential students, and enhancing social awareness. While numerous studies have explored the linguistic characteristics of university promotional videos from a semiotic perspective, systematic analyses from the perspective of textual meta-function remain relatively scarce. This study, utilizing the UAM Corpus Tool, investigates the thematic structure of promotional videos from 89 universities in China and abroad through...

10.32996/ijllt.2025.6.4.21 article EN International Journal of Linguistics, Literature and Translation 2025-04-17

Research indicates that the performance gap between English Language Learners (ELLs) and their non-ELL peers is partly due to ELLs' difficulty in understanding assessment language. Accommodations have been shown to narrow this performance gap, but many accommodation studies have not used a randomized design and are based on relatively small sample sizes. Addressing such issues, we administered a standard-based mathematics assessment to approximately 3,000 Grade 9 ELL students under five different...

10.1111/emip.12328 article EN Educational Measurement Issues and Practice 2020-04-12

Automatically extracting relations between chemicals and diseases plays an important role in biomedical text mining. Chemical-disease relation (CDR) extraction aims at extracting complex semantic relationships between entities in documents, which contain both intrasentence and intersentence relations. Most previous methods did not consider the dependency syntactic information across the sentences, which is very valuable for the task, in particular, for extracting intersentence relations accurately. In this paper, we propose a novel end-to-end neural network based on graph...

10.2196/17638 article EN cc-by JMIR Medical Informatics 2020-04-25

In this paper, we propose a novel approach to identifying user intents of search engine queries. Specifically, we recast it as a classification problem, in which four types of features are adopted. The features are based on deep linguistic analysis of queries as well as user feedbacks. We evaluate the method with real web query data. The results show that about 88% of test queries can be correctly identified by the framework via combining all four features.

10.1109/pcspa.2010.40 article EN 2010-09-01
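
The combining step described above can be sketched as a minimal score combiner (the feature names, intent classes, and scores below are invented for illustration; the paper's actual features and combination method are not reproduced here):

```python
# Toy combiner: four hypothetical feature types each score the candidate
# intent classes; the intent with the highest combined score wins.
INTENTS = ["navigational", "informational", "transactional"]

def classify(feature_scores):
    """feature_scores: dict mapping feature name -> {intent: score}."""
    combined = {intent: 0.0 for intent in INTENTS}
    for scores in feature_scores.values():
        for intent, s in scores.items():
            combined[intent] += s
    return max(combined, key=combined.get)

# Example: linguistic cues lean informational, but click feedback and
# URL-like tokens in the query lean navigational.
query_features = {
    "linguistic":   {"informational": 0.6, "navigational": 0.2, "transactional": 0.2},
    "feedback":     {"informational": 0.3, "navigational": 0.5, "transactional": 0.2},
    "url_tokens":   {"informational": 0.1, "navigational": 0.7, "transactional": 0.2},
    "query_length": {"informational": 0.4, "navigational": 0.3, "transactional": 0.3},
}
print(classify(query_features))  # -> navigational
```

The point of combining feature types is that no single signal decides: here the linguistic feature alone would say "informational", but the summed evidence tips the decision to "navigational".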

This paper introduces a new speech corpus called "LibriTTS" designed for text-to-speech use. It is derived from the original audio and text materials of the LibriSpeech corpus, which has been used for training and evaluating automatic speech recognition systems. The new corpus inherits desired properties of the LibriSpeech corpus while addressing a number of issues which make LibriSpeech less than ideal for text-to-speech work. The released corpus consists of 585 hours of speech data at a 24kHz sampling rate from 2,456 speakers and the corresponding texts. Experimental results show that neural end-to-end TTS models trained...

10.48550/arxiv.1904.02882 preprint EN other-oa arXiv (Cornell University) 2019-01-01

Acoustic feature similarity between search results has been shown to be very helpful for the task of spoken term detection (STD). A graph-based re-ranking approach for STD is proposed, based on the concept that search results which are acoustically similar to other results with higher confidence scores should have higher scores themselves. In this approach, all search results for a given query are considered as nodes in a graph, and the confidence scores can propagate through the graph. Since this approach can improve performance without any additional labelled data, it is especially suitable for languages with limited amounts...

10.21437/interspeech.2014-526 article EN Interspeech 2014 2014-09-14
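
The propagation idea can be sketched numerically (a minimal illustration with made-up similarities and an interpolation-style update; this is not the paper's exact formulation): detector confidences are repeatedly mixed with scores diffused over a row-normalized acoustic-similarity graph.

```python
import numpy as np

def rerank(scores, similarity, alpha=0.5, iters=50):
    """Propagate confidence scores over an acoustic-similarity graph.

    scores: initial detector confidences, shape (n,)
    similarity: symmetric nonnegative similarity matrix, shape (n, n)
    alpha: weight on propagated (neighbor) scores vs. the original ones
    """
    P = similarity / similarity.sum(axis=1, keepdims=True)  # row-normalize
    s = scores.astype(float)
    for _ in range(iters):
        s = (1 - alpha) * scores + alpha * P @ s  # mix in neighbor scores
    return s

# Three hypothesized detections: 0 and 1 are acoustically similar; 2 is isolated.
scores = np.array([0.9, 0.3, 0.3])
similarity = np.array([
    [0.0, 1.0, 0.1],
    [1.0, 0.0, 0.1],
    [0.1, 0.1, 0.0],
])
new = rerank(scores, similarity)
# Result 1, being similar to high-confidence result 0, rises above result 2
# even though both started with the same score.
print(new[1] > new[2])
```

This captures the core intuition: no new labels are needed, since the re-ranking uses only the detector's own scores plus pairwise acoustic similarity.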

Recent neural text-to-speech (TTS) models with fine-grained latent features enable precise control of the prosody of synthesized speech. Such models typically incorporate a fine-grained variational autoencoder (VAE) structure, extracting latent features at each input token (e.g., phonemes). However, generating samples with the standard VAE prior often results in unnatural and discontinuous speech, with dramatic prosodic variation between tokens. This paper proposes a sequential prior in a discrete latent space which can generate more naturally sounding samples....

10.48550/arxiv.2002.03788 preprint EN other-oa arXiv (Cornell University) 2020-01-01

Effective communication between humans often embeds both temporal and spatial context. While spatial context captures the geographic settings of objects in the environment, temporal context describes their changes over time. In this paper, we propose temporal spatial inverse semantics (TeSIS) to extend the inverse semantics approach to also consider temporal context for robots communicating with humans. Inverse semantics generates natural language requests while taking into account how well human listeners would interpret those requests given the current context. Compared to inverse semantics, our approach incorporates temporal context by...

10.1109/icra.2018.8460754 article EN 2018-05-01

End-to-end automatic speech recognition (ASR) models, including both attention-based models and the recurrent neural network transducer (RNN-T), have shown superior performance compared to conventional systems. However, previous studies have focused primarily on short utterances that typically last for just a few seconds or, at most, a few tens of seconds. Whether such architectures are practical on long utterances that last from minutes to hours remains an open question. In this paper, we investigate and improve the performance of end-to-end...

10.48550/arxiv.1911.02242 preprint EN other-oa arXiv (Cornell University) 2019-01-01

10.16511/j.cnki.qhdxxb.2018.25.016 article EN Journal of Tsinghua University(Science and Technology) 2018-03-15

In the pre-deep-learning era, part-of-speech (POS) tags were considered indispensable ingredients for feature engineering in dependency parsing, and quite a few works focused on joint tagging and parsing models to avoid error propagation. In contrast, recent studies suggest that POS tagging becomes much less important or even useless for neural parsing, especially when using character-based word representations. Yet there are not enough investigations focusing on this issue, both empirically and linguistically....

10.48550/arxiv.2003.03204 preprint EN other-oa arXiv (Cornell University) 2020-01-01